Identification of SARS-CoV-2 main protease inhibitors from FDA-approved drugs by artificial intelligence-supported activity prediction system

Abstract
Although several vaccine products against severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) have shown a certain level of efficacy and safety, the unmet medical need for orally active small-molecule therapeutics remains very high. The SARS-CoV-2 main protease (Mpro) has drawn attention as a key drug target, and a large number of in silico screenings, some of them supported by artificial intelligence (AI), have been conducted to identify Mpro inhibitors through both drug repurposing and drug discovery approaches. Most drug-repurposing studies have relied on docking simulation-based technologies, which have contributed to the identification of several Mpro binders. In contrast, AI-guided INTerprotein's Engine for New Drug Design (AI-guided INTENDD), an AI-supported activity prediction system for small molecules, proposes potential binders on the basis of proprietary AI scores rather than docking scores, and was therefore expected to identify novel potential Mpro binders among FDA-approved drugs. Using AI-guided INTENDD, we selected 20 potential Mpro binders, of which 13 drugs showed an Mpro-binding signal by the surface plasmon resonance (SPR) method. Six of the 13 positive drugs were identified for the first time in the present study. Furthermore, vorapaxar bound to Mpro with a Kd value of 27 µM by the SPR method and inhibited virus replication in SARS-CoV-2-infected cells with an EC50 value of 11 µM.
Communicated by Ramaswamy H. Sarma


Introduction
Several vaccine products appear to be contributing to a certain level of containment of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). However, therapeutic treatment with orally available small-molecule drugs has not yet been established (Dowara et al., 2021), and many pharmaceutical companies and academic research organizations are conducting research and development to identify novel small-molecule drugs that clearly inhibit SARS-CoV-2 replication (Salasc et al., 2021).
The genome of SARS-CoV-2, in common with most coronavirus genomes, encodes two large polyproteins, pp1a and pp1ab, which are cleaved by two proteases, the main protease (Mpro, also called 3-chymotrypsin-like protease or 3CLpro) and the papain-like protease (PLpro). Mpro is crucial for virus replication and important for the propagation of the virus. Furthermore, few mutations greatly affect its enzymatic activity (Gimeno et al., 2020). Therefore, Mpro is considered one of the key drug targets for SARS-CoV-2 (Jin et al., 2020). In fact, many Mpro inhibitors have been identified through both drug repurposing and drug discovery approaches (Paul et al., 2021), and some of these inhibitors are being evaluated in the clinic (NS Healthcare Staff Writer, 2021).
Many drug-repurposing studies have been conducted to identify approved drugs that inhibit Mpro. These studies are principally based on molecular dynamics (MD) and/or docking simulation and have produced many crucial results (Dotolo et al., 2021). On the other hand, our artificial intelligence (AI)-supported drug discovery system, 'Artificial Intelligence-guided INTerprotein's Engine for New Drug Design (AI-guided INTENDD®)', is distinguishable from the current main virtual and/or AI-based screening systems in that it does not depend on MD or docking simulation in the final step of selecting hit candidates (Komatsu et al., 2019a, 2019b; Sato et al., 2021). Therefore, we hypothesized that our approach could identify approved drugs that had not been reported previously. As a result, although some of the drugs proposed by AI-guided INTENDD had already been reported elsewhere as potential Mpro binders, several drugs were newly identified as Mpro binder candidates. Furthermore, we conducted wet-lab assessments and confirmed that 13 drugs did bind to Mpro and that 1 drug showed antiviral activity in a SARS-CoV-2-infected cell-based assay.

Materials and methods
Selection of SARS-CoV-2 Mpro binder candidates
SARS-CoV-2 Mpro binder candidates were selected with AI-guided INTENDD (Komatsu et al., 2019a, 2019b; Sato et al., 2021). In brief, we prepared a unique description of feature values by pre-processing the coordinate information corresponding to protein-ligand binding mechanisms (atomic species, 3D atomic positions, interactions, functional groups) using Protein Data Bank (PDB) data. The feature values and the ligand activity information derived from PDBbind data were used to train a deep convolutional neural network-based machine learning system, which was originally developed by Interprotein Corporation and named AI-guided INTENDD. When structure data of target proteins and of compounds without activity information are input to this system, it provides predicted binding poses of the compounds and proprietary AI scores that indicate the probabilities of the activity prediction. AI-guided INTENDD assigns predicted activity classes of 0 to 7 in accordance with the rank order of predicted potency. To calculate the AI score, we prepare an evaluation function that relates the prediction probability of the most potent class (x) to an evaluation score (y); the typical evaluation function is described by the following general formula: $y = 1.0000 + (a - 1.0000)/(1 + \exp((x - b)/c))$. Finally, the AI score is obtained by multiplying the evaluation score (y) by a weight (in the range of 0.3 to 1.0) based on the predicted most potent class.
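The AI score calculation described above can be sketched as follows. This is only an illustration of the general formula: the parameters a, b and c are not given in the text, so the values used here are hypothetical, the function names are our own, and the exponent is read as (x - b)/c, which is an assumption about the printed formula.

```python
import math


def evaluation_score(x: float, a: float, b: float, c: float) -> float:
    """Evaluation score y = 1 + (a - 1) / (1 + exp((x - b) / c)).

    x is the prediction probability of the most potent class.
    a, b, c are shape parameters (hypothetical values in this sketch).
    """
    return 1.0 + (a - 1.0) / (1.0 + math.exp((x - b) / c))


def ai_score(x: float, weight: float, a: float, b: float, c: float) -> float:
    """AI score = evaluation score multiplied by a class-dependent weight."""
    if not 0.3 <= weight <= 1.0:
        raise ValueError("weight is expected to lie in the range 0.3 to 1.0")
    return weight * evaluation_score(x, a, b, c)


# Hypothetical parameters, chosen only so that y rises from a towards 1
# as the prediction probability x increases.
A, B, C = 0.0, 0.5, 0.1

if __name__ == "__main__":
    for prob in (0.1, 0.5, 0.9):
        print(prob, round(ai_score(prob, weight=1.0, a=A, b=B, c=C), 4))
```

With these illustrative parameters the evaluation score is a logistic curve centred at x = b, so a weight of 1.0 leaves the curve unchanged and smaller weights scale it down.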
In the present study, the structure with PDB ID 6Y2G and the FDA-approved compounds registered in Namiki Repurposing Library 2020 (Namiki Shoji, Tokyo, Japan) were used as the target protein (Mpro) structure and the compound source, respectively (Figure 1). More than 2700 FDA-approved drugs are registered in DrugBank (https://go.drugbank.com). Of these drugs, however, the number of molecules whose coordinate information we could use was limited to 1944. Furthermore, because this number included compounds with unreliable structures, such as disconnected or incorrect bonds that might have been formed during conversion from 2D to 3D structures by Open Babel (O'Boyle et al., 2011), we removed such compounds, which narrowed the target drugs down to 1741. These compounds were subjected to docking simulation, which was conducted with a widely used standard open-source program with a minor modification. In the docking simulation, the protonation of amino acid residues and ligands was regulated in accordance with pKa at pH 7.4, indicating that certain residues of the target protein are protonated around the neutral range. In addition, we used ligand coordinates with protonated polar hydrogens, whereas no amino acids of the target protein were protonated. Test compounds with 3D coordinates that were obtained from PubChem (PubChem, 2021) or generated with Open Babel were docked into the area of the main protease shown in Figure 2A (PDB: 6Y2G), resulting in the generation of 34,766 binding poses. The number of poses per compound was 20 or lower. AI-guided INTENDD selected 1605 poses (423 compounds) as compounds with potential affinities for Mpro with Kd values of less than 0.1 µM. Finally, AI-guided INTENDD proposed 20 compounds that fulfilled the conditions on AI score (> 0.7), molecular weight (350–800) and docking score (−6.5 kcal/mol).
We set 0.7 as the cut-off value of the AI score for compounds with acceptable potencies, out of the overall AI score range of 0.16 to 1.0, because 'AI score > 0.7' means that the compounds are theoretically expected to show pharmacological activity at concentrations of less than 100 nM in the output of AI-guided INTENDD. The 20 proposed compounds were purchased from Namiki Shoji (Tokyo, Japan).
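The final selection step can be illustrated with a minimal filter. The thresholds come from the text, but the direction of the docking score condition is not stated there, so treating −6.5 kcal/mol as an upper bound is an assumption; the `Candidate` record and the example compounds are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one compound after AI scoring."""
    name: str
    ai_score: float
    mol_weight: float
    docking_score: float  # kcal/mol, more negative means stronger predicted binding


AI_SCORE_MIN = 0.7          # from the text: AI score > 0.7
MW_MIN, MW_MAX = 350.0, 800.0  # from the text: molecular weight 350-800
DOCKING_MAX = -6.5          # assumption: docking score at or below -6.5 kcal/mol


def passes(c: Candidate) -> bool:
    """Return True when a compound satisfies all three selection conditions."""
    return (
        c.ai_score > AI_SCORE_MIN
        and MW_MIN <= c.mol_weight <= MW_MAX
        and c.docking_score <= DOCKING_MAX
    )


def select(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep passing compounds, highest AI score first."""
    return sorted((c for c in candidates if passes(c)),
                  key=lambda c: c.ai_score, reverse=True)


if __name__ == "__main__":
    demo = [
        Candidate("compound_A", 0.82, 492.6, -7.1),  # passes
        Candidate("compound_B", 0.91, 310.0, -8.0),  # too light
        Candidate("compound_C", 0.65, 500.0, -7.5),  # AI score too low
        Candidate("compound_D", 0.75, 600.0, -5.0),  # docking score too weak
    ]
    print([c.name for c in select(demo)])
```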

Measurement of binding affinity of compounds to Mpro
The binding affinity of the proposed compounds to Mpro was measured by a standard surface plasmon resonance (SPR) method with a Biacore T-200 (Cytiva, MA). Briefly, Mpro (ACRO Biosystems, DE) was immobilized on flow cell 2 of sensor chips (CM5, Cytiva, MA) at 7000–14000 resonance units (RU) by an amine coupling method. Human IgG (Sigma-Aldrich, MA) was immobilized on reference flow cell 1 of the sensor chips as a negative control protein at an RU range similar to that of Mpro. The RU of flow cell 1 was subtracted from that of flow cell 2, and the resulting value was defined as the specific binding level of each compound. For some compounds, Kd values were calculated by the software supplied with the Biacore T-200.

Virus
The SARS-CoV-2 viral strain JPN/TY/WK/521 was provided by the National Institute of Infectious Diseases, Tokyo, Japan. Viruses were propagated in monolayers of VeroE6/TMPRSS2 cells in DMEM supplemented with 2% FBS and 1 mg/mL G418 at a multiplicity of infection of 0.01.

Virus titration
Infectivity was titrated by a focus-forming assay as described previously (Yasugi et al., 2013) with slight modifications. Viruses were serially diluted 10-fold in DMEM supplemented with 2% FBS and used to infect confluent VeroE6 cells for 8 hours. The cells were fixed with 4% paraformaldehyde for 15 minutes and washed with phosphate-buffered saline (PBS) three times. The cells were permeabilized with 0.1% Triton X-100 for 15 minutes and washed with PBS three times. The cells were incubated with a rabbit anti-nucleocapsid monoclonal antibody (Thermo Fisher Scientific) at a dilution of 1:1000 for 1 hour at 37 °C. After being washed with PBS three times, the cells were incubated with Alexa488-conjugated goat anti-rabbit IgG (Thermo Fisher Scientific) at a dilution of 1:500 for 45 minutes at 37 °C. After the cells were washed with PBS, the antigen-positive cells were counted under a fluorescence microscope.

Cell viability test
After monolayers of VeroE6 cells were incubated in DMEM supplemented with 2% FBS and 0 to 50 µM GC376, vorapaxar, or DMSO for 72 hours, cell viability was measured by the cell counting kit-8 (CCK-8) assay (Dojindo, Kumamoto, Japan) according to the manufacturer's instructions.

Virus inhibition test
Monolayers of VeroE6 cells in DMEM supplemented with 2% FBS were mock-treated or treated with 0.2 to 50 µM GC376, vorapaxar, or DMSO for 1 hour prior to infection with 150 focus-forming units (FFU) of virus at 37 °C for 1 hour. After infection, the mixtures were replaced with fresh DMEM containing 2% FBS and 0 to 50 µM GC376, vorapaxar, or DMSO. The cells were then cultured at 37 °C for 8 hours. After incubation, the cells were stained by immunofluorescence as described in 'Virus titration'. The percentage of inhibition was estimated from the viral infectivity under compound-treated conditions compared with that without compounds.

Statistics
The cell viability test and the virus inhibition test were performed three times independently, and the mean and standard deviation (SD) were calculated with Microsoft Excel (version 2109 build 16.0.14430.20154). In the virus inhibition test, the half maximal (50%) effective concentration (EC50) was determined by a linear regression method with the two measurement points that gave less than and more than 50% inhibition. The selectivity index (SI) was calculated by dividing the CC50 (50% cytotoxic concentration) by the EC50.

Table 1 note. RU values of 10 or more in the SPR assay are indicated as '+' and values of less than 10 as '−'. 'Yes' in the column 'Precedent report' means that the effect on SARS-CoV-2 Mpro had been examined elsewhere in silico and/or in a wet assay before the present study. 'Yes' in the column 'Clinical study' means that a clinical study in COVID-19 patients is registered in 'ClinicalTrials.gov'. ND, not determined; NT, not tested.
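The EC50 and SI calculations described in the Statistics section can be sketched as below. The text does not say whether the two-point regression was done on a linear or a logarithmic concentration scale; this sketch assumes a linear scale, and the function names and example values are hypothetical.

```python
def ec50_two_point(conc_low: float, inhib_low: float,
                   conc_high: float, inhib_high: float) -> float:
    """Estimate EC50 from the two points that bracket 50% inhibition.

    A straight line is drawn through (conc_low, inhib_low) and
    (conc_high, inhib_high), and the concentration at which it crosses
    50% inhibition is returned. Concentrations are on a linear scale
    (an assumption of this sketch); inhibition is in percent.
    """
    if not inhib_low < 50.0 < inhib_high:
        raise ValueError("the two points must bracket 50% inhibition")
    slope = (inhib_high - inhib_low) / (conc_high - conc_low)
    return conc_low + (50.0 - inhib_low) / slope


def selectivity_index(cc50: float, ec50: float) -> float:
    """SI = CC50 / EC50 (both in the same concentration unit)."""
    return cc50 / ec50


if __name__ == "__main__":
    # Hypothetical data: 40% inhibition at 10 uM and 60% at 20 uM.
    ec50 = ec50_two_point(10.0, 40.0, 20.0, 60.0)
    print(ec50)
    # If CC50 is only known as "> 50 uM", the result is a lower bound on SI.
    print(selectivity_index(50.0, ec50))
```

When the CC50 is reported only as a lower limit, as in a viability test in which no cytotoxicity is seen up to the highest tested concentration, the computed SI should likewise be read as a lower limit.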

Selected SARS-CoV-2 Mpro binder candidates
First, we identified a 'target pocket' as a small-molecule binding site on the surface of Mpro using the crystal structure 6Y2G from the PDB (Figure 2). This area is surrounded by the catalytic dyad (His41 and Cys145), Met49, Asn142 and Gln189, and a co-crystal structure in complex with α-ketoamide 13b has also been reported. After the coordinate information of this area was input to the AI-guided INTENDD system, the structure information of the FDA-approved drugs was submitted to the system. As a result, AI-guided INTENDD proposed 20 approved compounds, the chemical structures of which are shown in Figure S1. The original indications of these compounds were not restricted to antiviral use but covered a broad variety (Table 1).

Binding of compounds to Mpro
The affinity of the proposed compounds for Mpro was measured by SPR. In the first screening, the compounds were applied to the Biacore T-200 at concentrations of 12.5, 25 and 50 µM, except for enasidenib (25, 50 and 100 µM). RU values of 10 or higher at the maximal concentration were defined as a positive binding signal and indicated as '+' in the column '1st screening' of Table 1. RU values of less than 10 were indicated as '−' and taken as a negative binding signal. The sensorgram of each compound is shown in Figure S1. In total, SPR analysis revealed that 13/20 (65%) drugs showed a positive Mpro-binding signal in the first screening. The 13 positive compounds were subjected to quantification of the binding affinity. Although 8 of the 13 compounds did not reach obvious saturation because of insufficient solubility, the Kd values of the remaining 5 compounds could be determined (Figure 3; Table 1). Among these 5 compounds, vorapaxar and dasabuvir showed relatively high affinity for Mpro, with Kd values of 27 and 3.1 µM, respectively.
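The first-screening call can be written down as a short sketch: the reference flow cell signal is subtracted from the Mpro flow cell signal, and the result at the highest tested concentration is compared with the 10 RU threshold. The data layout, the function names and the example readings are hypothetical and are not taken from the instrument software.

```python
from typing import Mapping, Tuple

RU_THRESHOLD = 10.0  # from the text: 10 RU or higher is a positive signal


def specific_binding(fc2_ru: float, fc1_ru: float) -> float:
    """Specific binding = flow cell 2 (Mpro) minus flow cell 1 (reference IgG)."""
    return fc2_ru - fc1_ru


def first_screen_call(responses: Mapping[float, Tuple[float, float]]) -> str:
    """Return '+' or '-' from readings keyed by concentration (uM).

    Each value is a (flow cell 2 RU, flow cell 1 RU) pair. Only the
    highest tested concentration is used for the call, as in the text.
    """
    top_conc = max(responses)
    fc2_ru, fc1_ru = responses[top_conc]
    return "+" if specific_binding(fc2_ru, fc1_ru) >= RU_THRESHOLD else "-"


if __name__ == "__main__":
    # Hypothetical readings at 12.5, 25 and 50 uM.
    binder = {12.5: (20.0, 15.0), 25.0: (30.0, 18.0), 50.0: (45.0, 30.0)}
    non_binder = {12.5: (5.0, 2.0), 25.0: (8.0, 3.0), 50.0: (12.0, 4.0)}
    print(first_screen_call(binder), first_screen_call(non_binder))
```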

Effect of compounds on SARS-CoV-2 replication and cell viability
Based on the binding affinities for Mpro, we initially focused on vorapaxar and dasabuvir, but finally decided to perform the cell-based assessment only for vorapaxar because dasabuvir showed cytotoxicity at concentrations of 3 µM and higher in a preliminary experiment (data not shown). In the present study, GC376 (Fu et al., 2020) was used as a positive control and showed a clear inhibitory effect on virus replication, at slightly higher concentrations than in the previous report (Fu et al., 2020; Figure 4). Vorapaxar also inhibited SARS-CoV-2 replication in a concentration-dependent manner, although its efficacy and potency seemed to be lower than those of GC376. The EC50 values of GC376 and vorapaxar were 3.5 and 11 µM, respectively. The CC50 values of the 2 compounds were >50 µM; therefore, their SI values were estimated to be >14 and >4.4, respectively.

Discussion
Regarding drug development for COVID-19, a certain level of efficacy and safety of vaccine products has been demonstrated, but therapeutic treatment with orally available small molecules remains to be established. Among the SARS-CoV-2 target proteins for small-molecule drugs, RNA-dependent RNA polymerase (RdRp) and Mpro have received the most attention. In fact, Merck has sought FDA emergency use authorization for an RdRp inhibitor, molnupiravir (Towey, 2021), and Pfizer (Pfizer Inc, 2021a) has started the global Phase 2/3 EPIC-PEP (Evaluation of Protease Inhibition for COVID-19 in Post-Exposure Prophylaxis) study of an Mpro inhibitor, PF-07321332, as a novel oral antiviral candidate against COVID-19. As of November 2021, Merck and Ridgeback Biotherapeutics announced that the United Kingdom Medicines and Healthcare products Regulatory Agency (MHRA) has granted authorization in the United Kingdom (U.K.) for molnupiravir (MK-4482, EIDD-2801), the first oral antiviral medicine authorized for the treatment of mild-to-moderate COVID-19 in adults with a positive SARS-CoV-2 diagnostic test and who have at least one risk factor for developing severe illness (Merck & Co., Inc., 2021). Pfizer Inc. announced that it sought emergency use authorization (EUA) of its investigational oral antiviral candidate, PAXLOVID™ (PF-07321332; ritonavir), for the treatment of mild to moderate COVID-19 in patients at increased risk of hospitalization or death (Pfizer Inc, 2021b). Before the research and development of these new chemical entities, many drug-repurposing studies had been started, especially for Mpro. Virtual screening methods, including AI-supported systems, were used as the drug-repurposing approach in many of these studies.
The methodology of those approaches was mainly docking simulation-based ranking, in which drugs with low binding free energies tended to be selected. Some of the selected drugs showed actual binding activity for Mpro, indicating that docking simulation studies have made remarkable contributions to the identification of Mpro binders (Dotolo et al., 2021). On the other hand, the feature values used for machine learning in AI-guided INTENDD do not contain binding free energy-related factors but rather information on the 3-dimensional coordinates of atoms that contribute to crucial interactions between proteins and ligands. Therefore, we hypothesized that AI-guided INTENDD would propose, as potential Mpro binders, drugs that rank relatively low by docking score. In fact, the binding free energies of the drugs listed in Table 1 seemed to show relatively high values. Although it might be difficult to compare our approach exactly with the other current main technologies, there seem to be at least some differences from AtomNet, a machine learning (deep convolutional neural network)-based activity prediction system developed by Atomwise (Schroedl, 2019; Wallach et al., 2015). For example, in AtomNet, the coordinate of each atom is located in a voxel grid of 0.5–1.0 Å on a side, and the position in the voxel grid seems to be used for the machine learning. In AI-guided INTENDD, by contrast, a voxel grid is not used; instead, the unique feature values derived from coordinates with the original number of digits registered in the PDB are used for the machine learning. In addition, we have not sufficiently investigated the effect of protonation, but we recognize the significance of protonation states in activity prediction. Therefore, we plan to examine the effect of protonation to further improve the prediction probability of our AI system.

Figure 5. Comparison of the proposed binding pose of vorapaxar (A) and the X-ray co-crystal structure of α-ketoamide 13b (B). S1–S4 indicate the canonical binding pockets presented by Zhang et al. (2020).
In the present study, AI-guided INTENDD proposed 20 FDA-approved drugs as Mpro binder candidates. Of these drugs, to the best of our knowledge, 8 drugs, namely aprepitant, vorapaxar (Gul et al., 2021), fluvastatin (Reiner et al., 2020), palbociclib, imatinib (Molavi et al., 2021), enasidenib (Molavi et al., 2021), doxazosin (Milligan et al., 2021) and dasabuvir (Ajeet et al., 2020), had been suggested to be potential inhibitors of Mpro by virtual screening (Table 1). However, the results of wet-lab assessment of these compounds do not seem to have been reported so far. Although Gul et al. (2021) selected vorapaxar as one of the top 15 potential binders to Mpro, it was not included in their interacting residue-based top 5 drugs or in their molecular mechanics generalized Born surface area (MM/GBSA) binding free energy (BFE)-based top 7 compounds.
Among the 20 drugs proposed by AI-guided INTENDD, 13 (65%) showed an Mpro-binding signal in the SPR analysis. Directly comparable studies are limited, but, for instance, Mody et al. (2021) reported that they selected 47 potential Mpro binders from FDA-approved drugs in silico, of which 6 drugs (13%) inhibited Mpro enzymatic activity. These findings suggest that AI-guided INTENDD demonstrated high performance in the prediction of binding affinity for the target protein. Six of the 13 positive compounds were Mpro binders newly identified by AI-guided INTENDD. Because vorapaxar and dasabuvir showed relatively high affinity for Mpro, we planned to assess the antiviral effects of these compounds in SARS-CoV-2-infected cells. However, dasabuvir showed relatively strong cytotoxicity in the preliminary experiment, so the cell-based assay was conducted only for vorapaxar. As a result, vorapaxar inhibited the replication of SARS-CoV-2 with an EC50 value of 11 µM, although its potency was lower than that of GC376, the positive control. Although the EC50 value of GC376 was 0.70 µM in a previous report (Fu et al., 2020), the test concentrations were shifted toward a higher range in the present study on the basis of a preliminary experiment, resulting in an EC50 value of 3.5 µM. Both GC376 and vorapaxar showed partial cytotoxicity at a concentration of 100 µM, whereas no observable change was seen at 50 µM. The potential binding ability of vorapaxar to Mpro had been predicted by Gul et al. (2021), but no wet-lab assessment of vorapaxar had been reported, meaning that its inhibitory effect on SARS-CoV-2 replication was verified for the first time in the present study.
Unfortunately, because the maximum plasma concentration (Cmax) of vorapaxar administered in accordance with the approved dosage and administration is around 65 ng/mL (131.96 nM) (Chen et al., 2014) (Table S1), vorapaxar would not be expected to exert a beneficial effect in COVID-19 patients at the original dosage.
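The unit conversion behind this comparison can be checked with a few lines. The molecular weight of vorapaxar is not stated in the text; the value of 492.58 g/mol used here is an assumption (free base) that reproduces the quoted 131.96 nM, and the function name is our own.

```python
VORAPAXAR_MW = 492.58  # g/mol, assumed free-base molecular weight (not from the text)


def ng_per_ml_to_nanomolar(conc_ng_per_ml: float, mol_weight: float) -> float:
    """Convert a mass concentration in ng/mL to a molar concentration in nM.

    ng/mL equals ug/L, so dividing by g/mol gives umol/L (uM);
    multiplying by 1000 converts uM to nM.
    """
    return conc_ng_per_ml / mol_weight * 1000.0


if __name__ == "__main__":
    cmax_nm = ng_per_ml_to_nanomolar(65.0, VORAPAXAR_MW)
    ec50_nm = 11.0 * 1000.0  # EC50 of 11 uM expressed in nM
    print(round(cmax_nm, 2))            # plasma Cmax in nM
    print(round(ec50_nm / cmax_nm, 1))  # fold gap between EC50 and Cmax
```

Under this assumption the cell-based EC50 is roughly 80-fold higher than the reported Cmax, which is the gap underlying the statement that the approved dosage is unlikely to be effective.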
Based on these findings, we consider it probable that vorapaxar binds to the target pocket (Figure 2) and inhibits Mpro-mediated virus replication. In addition, the binding pose of vorapaxar proposed by AI-guided INTENDD was compared with the X-ray co-crystal structure of α-ketoamide 13b (Figure 5). Its binding pose seemed to differ considerably from that of α-ketoamide 13b. Among the canonical binding pockets S1–S4 presented by Zhang et al. (2020), at least S1 does not seem to be used for the binding of vorapaxar. Therefore, vorapaxar derivatives that functionally fill the space of the S1 pocket would be expected to show higher affinity for Mpro. The proposed binding poses (3D) and supposed interactions with Mpro (2D) of all 20 drugs selected by AI-guided INTENDD are shown in Figure S3. Many of the compounds, including dasabuvir, which has a relatively high affinity for Mpro (Table 1), do not seem to use the S1 pocket effectively. This finding might also support the hypothesis that use of the S1 pocket would contribute to the identification of more potent Mpro binders.

Conclusion
We selected 20 SARS-CoV-2 Mpro binder candidates from FDA-approved drugs using AI-guided INTENDD, of which 12 compounds (60%) were, to the best of our knowledge, proposed as potential Mpro binders for the first time. SPR analysis revealed that 13 (65%) of the 20 selected drugs showed positive Mpro-binding signals, among which 6 compounds (46%) were first identified as Mpro binders in the present study. Vorapaxar, one of the 13 Mpro binders, exhibited a Kd value of 27 µM for Mpro in the SPR assay and inhibited virus replication with an EC50 value of 11 µM in SARS-CoV-2-infected cells. The binding pose of vorapaxar proposed by AI-guided INTENDD differed considerably from the binding pose of α-ketoamide 13b in the X-ray co-crystal structure. These findings suggest that AI-guided INTENDD is a practical system for the identification of potential binders to drug target proteins and that vorapaxar might become a good lead compound for identifying a novel Mpro inhibitor.