A systematic drug repurposing approach to identify promising inhibitors from FDA-approved drugs against Nsp4 protein of SARS-CoV-2

Abstract COVID-19 is caused by SARS-CoV-2 and responsible for the ongoing global pandemic in the world. After more than a year, we are still in lurch to combat and control the situation. Therefore, new therapeutic options to control the ongoing COVID-19 are urgently in need. In our study, we found that nonstructural protein 4 (Nsp4) of SARS-CoV-2 could be a potential target for drug repurposing. Due to availability of only the crystal structure of C-terminal domain of Nsp4 (Ct-Nsp4) and its crucial participation in viral RNA synthesis, we have chosen Ct-Nsp4 as a target for screening the 1600 FDA-approved drugs using molecular docking. Top 102 drugs were found to have the binding energy equal or less than –7.0 kcal/mol. Eribulin and Suvorexant were identified as the two most promising drug molecules based on the docking score. The dynamics of Ct-Nsp4-drug binding was monitored using 100 ns molecular dynamics simulations. From binding free energy calculation over the simulation, both the drugs were found to have considerable binding energy. The present study has identified Eribulin and Suvorexant as promising drug candidates. This finding will be helpful to accelerate the drug discovery process against COVID-19 disease. Communicated by Ramaswamy H. Sarma


Introduction
Coronavirus disease 2019 (COVID-19) is a severe respiratory illness caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which is a member of the genus Betacoronavirus and species SARS-related coronavirus (Gorbalenya et al., 2020). According to the World Health Organization report on 18 May 2021, the global cumulative cases of deaths and confirmed infections are more than 3 million and 162 million (WHO. Coronavirus Disease, 2019). Symptoms of COVID-19 include dry cough, fatigue, myalgia, fever, dyspnea, etc. However, the disease progression leads to severe illness and reported death is 2% in confirmed cases due to major alveolar damage and developing respiratory failure (Xu et al., 2020). Currently, there are several potential vaccine candidates with adequate immunogenicity and acceptable safety profiles available in the market for the treatment of COVID-19, such as ChAdOx1 nCoV-19 vaccine (AZD1222), a simian adenovirus nonreplicating vector ChAdOx1 comprising the full-length structural spike protein of SARS-CoV-2 developed by University of Oxford and AstraZeneca (Rawat et al., 2021); the mRNA-1273 vaccine, a modified messenger RNA (mRNA)-based vaccine developed by the Moderna and NIAID (Jackson et al., 2020) and the lipid-nanoparticle-formulated, nucleoside-modified mRNAbased vaccine candidate BNT162b1 developed by BioNTech, Fosun Pharma, and Pfizer (Mulligan et al., 2020). In spite of having promising vaccine candidates, effective drugs to control this pandemic are yet to be found. The U.S. Food and Drug Administration has approved Remdesivir treatment for COVID-19 in adults and pediatric patients (FDA, 2021a) and in combination with Baricitinib for emergency use (FDA, 2021b). It also authorized monoclonal antibody such as REGEN-COV (Casirivimab and Imdevimab) and Bamlanivimab in combination with Etesevimab (FDA, 2021b). In addition, many therapies are being tested in clinical trials to evaluate whether they are safe and effective in combating COVID-19. The Remdesivir showed some positive results in the control of SARS-CoV-2 infection in vitro (Wang et al., 2020a). Recent data showed that Remdesivir was able to shorten the recovery time in adults with COVID-19 , though in severe COVID-19 cases, Remdesivir did not show statistically significant clinical benefits (Wang et al., 2020b). So according to the current treatment scenario worldwide, drug candidates repurposing from FDA-approved drugs list would be an effective alternative method to quickly resolve the current pandemic with more medicinal cures.
The genome of SARS-CoV-2 (26-32 kb) is organized into segments of structural and nonstructural proteins (Nsp) (Ye et al., 2020). The structural component of virus consists of four proteins: Spike (S), Membrane (M), Envelope (E), and Nucleocapsid (N) protein, respectively. The nonstructural segment encodes for the 16 nonstructural proteins (Nsp1-16) (Ye et al., 2020). These proteins are essential for viral replication as well as infection, whereas functions of some are yet to be identified. Nsp4 protein is a subunit of replicase polyprotein, having membrane-spanning domains (Manolaridis et al., 2009). The large membrane-spanning part of the Nsp4 subunit in coronaviruses is hydrophobic, and a conserved Cterminal domain of about 100 amino acids is predominantly an alpha helix and exposed in cytoplasm (Manolaridis et al., 2009). Nsp4 together with Nsp3 might play critical functions during the process of viral RNA synthesis and double-membrane vesicle (DMV) formation in Middle East respiratory syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS) (Oudshoorn et al., 2017). The crystal structure of Ct-Nsp4 from Feline coronavirus (FCoV) predicted to undergo dimerization and functions as an anchor for the assembly of the viral Replication Transcription Complex (RTC) and also reported to play an important role in specific virus-host protein-protein interactions (Manolaridis et al., 2009). Strong immunolabeling of the Nsp4 C-terminal domain was also observed on the DMVs from SARS-CoV-infected cells (Van Hemert et al., 2008). The high conservation of this fragment among different groups of coronaviruses (Xu et al., 2009), and its involvement in assembly of the DMVs, suggests that this cytoplasmic domain of Nsp4 may play an important role in coronavirus replication complex assembly.
Therefore, we hypothesized that Ct-Nsp4 of SARS-CoV-2 can be a target for drug-designing studies, and its inhibition can potentially attenuate the virus. We have generated a three-dimensional (3D) model of the Ct-Nsp4 of SARS-CoV-2 and used this model to execute virtual screening to identify potential inhibitor compounds. Our in silico study is directed within the database of FDA-approved drugs (ZINC database) and found Eribulin and Suvorexant being the most suitable drugs to retain high potential to act as Ct-Nsp4 inhibitors.

Protein structure prediction and validation of drug target
The sequence of nonstructural protein 4 (Nsp4) of SARS-CoV-2 (YP_009724389.1) was retrieved from NCBI (Sayers et al., 2019). Monomeric crystal structure of C-terminal (Ct) domain of Nsp4 of SARS-CoV-2 was generated using Phyre2 web portal (Kelley et al., 2015). To validate our proposed structure of Ct domain of Nsp4, we used PROCHECK, an online suite of programs (Laskowski et al., 1993). The structure file of Ct domain of Nsp4 in pdb format was uploaded to the PROCHECK program. The outputs comprise of number of plots in PostScript format, and a comprehensive residue-byresidue listing and model validation was done by analyzing the Ramachandran plot. The reliability of the refined model was further assessed through ProSA web (Protein Structure Analysis), an established tool for validation and refinement of experimental protein structures (Wiederstein & Sippl, 2007). ProSA calculates an overall quality score for a specific input structure. After uploading the structure file of Ct domain of Nsp4, the Z-score and plot of the residues energy as an output was found. ProQ, which is a predictor based on neural network with a number of structural features, help in the prediction of the quality of a protein model as measured either by LGscore or MaxSub. Correct models should have LGscore > 1.5 and MaxSub > 0.1, whereas incorrect models should have LGscore < 1.5 and MaxSub < 0.1 (Wallner & Elofsson, 2003). Further, the 3D conformation of the refined model was verified by ERRAT server, a program for verifying protein structures determined by crystallography studies. Error values are plotted as a function of the position of a sliding nine-residue window. The error function is based on the statistics of nonbonded atom-atom interactions in the reported structure to a database of reliable high-resolution structures (Colovos & Yeates, 1993). The generally accepted range is >50% for a high-quality model (Messaoudi et al., 2013).

Identification of potential ligand-binding sites
The potential ligand-binding sites of C-terminal (Ct) domain of Nsp4 protein of SARS-CoV-2 was determined by server MetaPocket 2.0, which is a consensus method for the prediction of binding sites from eight different methods such as LIGSITEcs, PASS, Q-SiteFinder, SURFNET, Fpocket, GHECOM, ConCavity, and POCASA (Zhang et al., 2011).

Refinement of the Ct-Nsp4 protein structure for docking
Before docking, the structure of the Ct-Nsp4 was energy minimized to remove the steric clashes and unusual overlap between the side chains. The protein is modeled using CHARMM36 all atom force field (Best et al., 2012), and a 5000 steps of energy minimization following Steepest Descent (2500 steps) and Conjugate Gradient (2500 steps) algorithms were performed in CHARMM. The energy minimization steps were chosen to be sufficiently large so that the energy values after minimization get converged. The structure obtained after energy minimization is considered for virtual screening and docking in the following section.

Virtual screening and analysis of ligand interaction profiles
Virtual screening method is used as an efficient framework for filtering compounds for drug discovery. It was performed using PyRx 0.8 (Virtual Screening Tools) (Kulkarni et al., 2020). The 3D coordinates of the 1600 FDA-approved drug compounds were retrieved from ZINC database (https://zinc.docking.org) in SDF format in a single file and imported in Open Babel of PyRx, and further energy minimization of all the drugs was performed using mmff94 force field and 200 total numbers of minimization steps were performed utilizing conjugate gradient optimization algorithm in PyRx followed by conversion of them into AutoDock PDBQT format. Further, these drugs were subjected to docking against prepared target protein by AutoDock Vina (Trott & Olson, 2009)

Molecular dynamics simulation of the proteinligand complex
The complexes of Ct-Nsp4 with the ligands having the most favorable docking scores are subjected to all-atom molecular dynamics simulation. The N-terminal and C-terminal end of the protein are protected using Acetyl and Amide group, respectively. The Nsp4-ligand complex is solvated in a cubic box of TIP3P water (Mark & Nilsson, 2001). The dimension of the box is set in such a way that there is a minimum distance of 1 nm between any atom of the complex and the edge of the box. The Nsp4 and the small molecule ligands are parameterized using CHARMM36 (Best et al., 2012) and CGENFF (Vanommeslaeghe et al., 2010) force field, respectively. The CGENFF force field is used frequently to model the protein-ligand complexes and has been shown good promises. With the recent modifications of the parameter set, it can model large variety of small molecules (Lin & MacKerell, 2019). Each of the complexes solvated in the water box is energy minimized for 1000 step followed by 1 ns equilibration in NVT ensemble. A 5-ns equilibration at the isothermal-isobaric (NPT) ensemble where the temperature was kept constant at 310 K by applying a V-rescale thermostat (Bussi et al., 2007), and the pressure was maintained at 1 atm using a Parrinello-Rahman barostat (Parrinello & Rahman, 1981) with a pressure relaxation time of 2 ps. After that, these are subjected to 100 ns production dynamics. Coordinates are saved at each 20 ps of the production dynamics. Simulations are performed using GROMACS 2018.8 with GPU acceleration (Van Der Spoel et al., 2005). A spherical cutoff distance of 1 nm is chosen for estimating the electrostatic and van der Waals interactions. Long-range electrostatic interactions are calculated using the particle mesh Ewald (PME) method (Darden et al., 1993). Periodic boundary conditions have been used in all three directions to remove edge effects. SHAKE restraining algorithm was used (Andersen, 1983), algorithm which allowed 2 fs time step for production dynamics. Different structural properties are calculated using GROMACS. Graphical representations of the structures are prepared using VMD (Humphrey et al., 1996). The binding free energy between Ct-Nsp4 and the small molecule ligand is calculated as sum of enthalpy and entropy. Different components of enthalpy were computed using the Generalized Born model utilizing Molecular Volume (GBMV) (Lee et al., 2003), method implemented in CHARMM. GBMV calculates the solvation free energy as a sum of polar and nonpolar parts. Entropy was obtained as the quasi-harmonic entropy calculated from covariance over the trajectories. Average value and standard error of each component were calculated over the last 50 ns (5000 conformations) of 100 ns trajectories. The details of the method have been described elsewhere (Priya et al., 2015;Wymore et al., 2004). The calculation of binding free energy using "Generalized Born"-based methods has shown promising results in several studies while compared with experimental binding affinities (Chen et al., 2011;Greenidge et al., 2013;Maity et al., 2021). Different components of the binding energy are listed in Table 2.

Conservation of the Ct-Nsp4 among the HCoV
The multiple sequence alignment of the Ct-Nsp4 from seven known HCoVs to date (Ye et al., 2020) showed conserved region having 24 out of 100 residues being identical and having conserved aromatic amino acids and also 48 residues are conserved according to their physiochemical properties ( Figure 1). In Ct-Nsp4 of SARS-CoV-2, the Tyr487 (Y), Pro489 (P), and Pro490 (P) form an YxPP motif which is conserved among all seven HCoVs (Figure 1). This motif is a putative protein-protein interactions site and can be recognized by other proteins having very small protein interaction modules called Class I WW domains (Di Leva et al., 2006). PROSITE (http://www.expasy.org/prosite/) analysis of all Nsp proteins of SARS-CoV-2 did not identify the presence of any possible Class I WW domains, suggesting that the YxPP motif of Nsp4 can interact with host protein but not any other Nsps and might play a role in specific virus-host protein-protein interaction which is in agreement with the previous study (Manolaridis et al., 2009). Also, localization of the Proline-Proline motif may help in protection of the extended C-terminus from proteolytic cleavage by host enzymes (Vanhoof et al., 1995). Above results suggested that Ct-Nsp4 of SARS-CoV-2 may play a similar and conserved mode of operation that is replication complex assembly as well as specific virus-host protein-protein interactions.

Structure prediction and validation of C-terminal domain of Nsp4 protein
Due to the availability of crystal structure of only the C-terminal domain of Nsp4 (Manolaridis et al., 2009), the 3D model of the Ct-Nsp4 domain of SARS-CoV-2 (Figure 2(a)) was generated by Phyre2 web portal (Kelley et al., 2015). A total of 90 residues from the C-terminal end ( ̴ 100 residues) have been modeled with 100.0% confidence by the topranked template having highest raw alignment score, that is chain D of the C-terminal domain of FCoV (PDB ID-3GZF, with 41% identity and score of 321.39) found to be predominantly an alpha-helical domain. The generated 3D model was validated by Ramachandran plot by PROCHECK (Laskowski et al., 1993), and it was observed that 86.9%, 9.5%, 1.2%, and 2.4% residues were located in most favorable, additionally allowed, generously allowed, and disallowed regions, respectively (Figure 2(b)), which implies the accuracy of model prediction.
The compatible Z-score value, which was found to be -4.76 (Figure 2(c)), was obtained from ProSA-web evaluation (Wiederstein & Sippl, 2007) and divulged that the 3D model is well fit within the range of its native conformation of crystal structures. Further, the overall energy of Ct-Nsp4 was found to be negative that corresponds to correct and reliable parts of the Ct domain (Figure 2(d)). The Levitt-Gerstein (LG) score and the predicted MaxSub of Ct domain found to be 2.607 and 0.436 by ProQ tool [2003], whereas the LG score of > 2.5 implies the predicted structure is of acceptable quality. By using the ERRAT plot (Colovos & Yeates, 1993), same suppositions were achieved, where the overall quality factor is 64.286% (Figure 2(e)).

Potent inhibitors identified by in silico screening and their interactions with ligand-binding site of Ct-Nsp4
Further, the ligand-binding site of Ct-Nsp4 was determined by MetaPocket 2.0 (Zhang et al., 2011) (Figure 3(a) and (b)). The ligand-binding site was found to be in integration with the dimerization domain and protein-protein interaction (YxPP motif) domain (Figure 3(a)), suggesting the trustworthiness of the predicted ligand-binding site as target for drug inhibition studies. The Ramachandran plot of the refined and energy minimized structure of Ct-Nsp4 showed no single residue angle errors (Supplementary Figure S1) and further considered for virtual screening. Then, virtual screening of FDA-approved drugs from ZINC database (https://zinc.docking.org) against Ct-Nsp4 as target was done using AutoDock Vina (Trott & Olson, 2009) in PyRx 0.8 (Kulkarni et al., 2020). The top 102 drug-hits according to their binding energies ranging from -8.1 to -7 (kcal/mol) were tabulated in Table S1 in supplementary material. The top three categories belong to anticancer, antihypertensive, and anti-HIV drugs at occupancy of about 29.4%, 10.8%, and 8.8%, respectively (Figure 3(c)). From this list of 102 drugs, based on the lowest binding energy, top two drugs, Eribulin and Suvorexant, were further analyzed by PLIP (Kulkarni et al., 2020) and determined their interacting residues with the target (Figure 4). Both the drugs were found to have binding site overlapping with dimerization domain (Figure 4). Eribulin having lowest binding energy (-8.1 kcal/mol) and interacting with TYR 448, PHE 454, ALA 474, VAL 485, LEU 486, GLN 491 of Ct-Nsp4 with two hydrogen bonds and four hydrophobic interaction (Figure 4(a), Table 1). Suvorexant with binding energy of -7.9 kcal/mol was found have interaction with ALA 474, VAL 485, GLN 491, LEU 486, GLN 488 residues having four hydrogen bonds and four hydrophobic interactions (Figure 4(b), Table 1).

Dynamics of the Nsp4-drug complex
To check the stability of the two drugs identified by virtual screening inside Ct-nsp4 binding pocket, 100 ns all-atom    Figure 5(a)). The plot of radius of gyration also confirms overall preservation of the globular shape of the complexes as well as the apo protein ( Figure 5(b)). A comparison of residue-wise fluctuation from the RMSF plot shows more or less similar fluctuations of amino acids in apo Ct-Nsp4 and the two complexes ( Figure 5(c)). Throughout the 100 ns simulation, the drugs are found to be well anchored to the binding pocket. Eribulin, which is relatively larger in size, is mostly stabilized by hydrophobic interaction, whereas for Suvorexant, both hydrophobic and polar interactions are responsible for drug binding (Figure 5(d)). Quantitative estimation of the binding free energy between Ct-Nsp4 and drug molecules shows (Table 2) to have reasonable binding energy (-10.7 kcal/mol for Eribulin and -13.42 kcal/mol for Suvorexant). Detailed analyses of the different components of binding energy reveal that there are contributions of both hydrophobic and polar interactions in the Ct-Nsp4 and small molecule binding. The hydrophobic contribution is relatively higher for Eribulin (-81.92 kcal/mol) compared to Suvorexant (-67.55 kcal/mol) ( Table 2). The overall polar contribution toward binding is determined by the sum of the electrostatic interaction and polar solvation energy. The positive value of DE elec þ GB (Polar) indicates a decrease in overall polar interaction because of the binding process. However, the value is lower for Suvorexant (23.14 kcal/mol) than Eribulin (39.63 kcal/mol) which indicates favorable polar interaction between Ct-Nsp4 and Suvorexant (  Figure S2).

Discussion
COVID-19 caused by SARS-CoV-2 has raised global public health concerns. Person-to-person transmission and exposure to the infected environments lead the number of death toll rising rapidly. While all the efforts should be directed toward the prevention and containment of the pandemic, it is obvious that contingency measures with experimental therapeutics are urgently needed. In our present study, we employed the strategy of drug repurposing or drug reprofiling that promises to identify the potent inhibitors from FDAapproved drugs against this deadly virus in a time-critical fashion. We have chosen Ct-Nsp4 of SARS-CoV-2, important for viral replication complex assembly as well as specific virus-host protein-protein interactions as a target for screening FDA-approved drugs using molecular docking and molecular dynamics simulation studies. The multiple sequence alignment of the Ct-Nsp4 from seven known HCoVs showed high level of conservation (Figure 1) of this fragment that may play a similar and conserved mode of operation in viral replication. 3D model of the Ct-Nsp4 domain of SARS-CoV-2 found to be predominantly an alpha-helical domain which is in agreement with previous study (Manolaridis et al., 2009) and structural validation implies high accuracy and reliability in model prediction. In silico screening demonstrated a vast  category of drugs to target Ct-Nsp4 domain of SARS-CoV-2, where the majority belongs to anticancer, antihypertensive, and anti-HIV drugs. The top two drug molecules based on lowest binding energies were observed to form hydrogen bonds and hydrophobic interactions with the ligand-binding site residues of Ct-Nsp4 domain of SARS-CoV-2 (Table 1). 100 ns all-atom molecular dynamics simulation of the top two drug complexes showed deviations (RMSD) from initial structure within the considerable range of 0.2 nm, and the drugs are found to be well anchored to the binding pocket throughout the simulation period. GBMV binding free energy analysis showed reasonable binding free energies between Ct-Nsp4 and the drug molecules, where binding of Suvorexant is highly favored by hydrophobic interactions and both favorable polar and hydrophobic interactions were observed for Eribulin (Supplementary Figure S2, Table 2). A comparison of the binding free energy values suggests that Suvorexant has slightly better binding affinity than Eribulin. However, that needs further experimental studies to be confirmed. Eribulin mesylate is a microtubule inhibitor used for the treatment of metastatic breast cancer (Jain & Vahdat, 2011).
In in silico study, it was found to have binding affinity to main protease of SARS-CoV-2 and has the ability to interact with its key catalytic residues (Kalhotra et al., 2021). Molecular docking and supervised machine learning algorithms also found Eribulin as a potential candidate against the nucleocapsid protein of SARS-CoV-2 (Kadioglu et al., 2021). On other hand, Suvorexant functions as an orexin receptor antagonist, which is used to treat insomnia (Winrow et al., 2011). It was found to have great potential for binding at the receptor binding motif (RBM) of the spike glycoprotein (Ram ırez-Salinas et al., 2020) as well as to the main protease of SARS-CoV-2 in in silico study (Odhar et al., 2020). We also found these two drugs as a potential inhibitor of Nsp4 protein of SARS-CoV-2; therefore, using them against COVID-19 can be a multi-target approach which needs further study.

Conclusion
There is an immediate need for anti-COVID-19 drugs to address the global pandemic emergency. Several drugs are currently employed with firsthand clinical knowledge along with some contradictions though. Our study showed that promising drug candidates such as Eribulin and Suvorexant could bind to the conserved Ct-Nsp4 protein of SARS-CoV-2 and might inhibit its essential functions which hinder viral replication complex assembly as well as specific virus-host protein-protein interactions. Therefore, these FDA-approved drugs can be potent candidates to inhibit the spread of COVID-19. Despite having promising results from this study, validations such as in vitro and in vivo experiment should be explored further to accelerate the drug discovery and find out the best strategy to combat the present pandemic due to COVID-19.  The values are in kcal/mol.