Computational studies on the design of NCI natural products as inhibitors to SARS-CoV-2 main protease

Abstract The pandemic coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has resulted in more than 5 million deaths globally. Currently there are no effective drugs available to treat COVID-19. The viral protease replication can be blocked by the inhibition of main protease that is encoded in polyprotein 1a and is therefore a potential protein target for drug discovery. We have carried out virtual screening of NCI natural compounds followed by molecular docking in order to identify hit molecules as probable SARS-CoV-2 main protease inhibitors. The molecular dynamics (MD) simulations of apo form in complex with N3, α-ketoamide and NCI natural products was used to validate the screened compounds. The MD simulations trajectories were analyzed using normal mode analysis and principal component analysis revealing dynamical nature of the protein. These findings aid in understanding the binding of natural products and molecular mechanisms of SARS-CoV-2 main protease inhibition. Communicated by Ramaswamy H. Sarma


Introduction
The coronavirus disease  is an acute respiratory disease first reported in December 2019 in Wuhan, China that is caused by the infection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Team, 2020). This disease was declared as a pandemic by the World Health Organization (WHO) on March 11, 2020 (https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-atthe-media-briefing-on-covid-19-11-march-2020). As of 6th February 2022, SARS-CoV-2 has cumulatively infected more than 394 million people and more than 5.7 million deaths worldwide. Coronavirus (CoV) is a positive sense single-strand RNA virus belonging to nidovirale, family coronaviridae, and subfamily orthocoronavirinae (Satija et al. 2007). The SARS-CoV-2 genome comprises nearly 30 K nucleotides translated into two overlapping polypeptides, polyprotein 1a (pp1a) and polyprotein 1ab (pp1ab). The pp1a harbors two proteases; papain-like protease and 3 C-like protease also called main protease that cleave the polypeptides into individual functional proteins. The viral protease replication can be blocked by the inhibition of these proteases. The specific recognition sequence motif for cleavage (#) by the main protease is "Leu-Gln#(Ser/ Ala/Gly)". Also, in earlier CoV infections the main protease has been proposed as a drug target as it cleaves the SARS-CoV polyprotein at eleven conserved sites (Ziebuhr et al., 2000) and is therefore responsible for the viral replication. Following the outbreak of COVID-19 pandemic several drug candidates from the repository of existing drugs have been tested for activity against SARS-CoV-2 (J acome et al., 2020;Touret et al., 2020). The Food and Drug Administration (FDA) has created a special emergency program, the Coronavirus Treatment Accelerated Program (CTAP) that has so far reviewed 470þ trials, eleven COVID-19 treatments authorized for emergency use and one FDA approved treatment from the 660þ drug development programs (https://www.fda.gov/drugs/coronavirus-covid-19-drugs/ coronavirus-treatment-acceleration-program-ctap). Several existing anti-viral drugs that target the viral replicating mechanism, remdesivir, hydroxychloroquine, chloroquine, lopinavir, darunavir, baloxavir, imatinib, and favipiravir were under investigation for the treatment of COVID-19 (Javorac et al., 2020). Computational studies on SARS-CoV-2 main protease inhibition by screening of plant-based natural compounds (Majumder & Mandal, 2020) was studied using molecular docking, molecular dynamics and Gibbs free energy calculations. The in-silico screening of phytochemical compounds of Tinospora cordifolia identified tinosponone as a potential inhibitor of SARS-CoV-2 main protease (Krupanidhi et al., 2021). The screening of existing drugs as potential main protease inhibitors from computational and experimental methods identified dipyridamole as a probable inhibitor , anti-viral compounds against SARS-CoV-2 main protease using computational studies are reported (Peele et al., 2020). The molecule N3 shows good inhibitory activity and covalent interactions with the main protease as an irreversible inhibitor and can specifically inhibit main protease from multiple coronaviruses, including SARS-CoV and MERS-CoV (Ren et al., 2013;Wang et al., 2016;Xue et al., 2008;Yang et al., 2005). N3 shows IC 50 value of 16.77 lM inhibitory activity on the SARS-CoV-2 main protease (Banerjee et al., 2020). The a-ketoamide inhibitor 13 b inhibits SARS-CoV-2 main protease with an EC 50 value of 4 to 5 lM in human Calu-3 cells infected with SARS-CoV-2 .
The crystal structure of SARS-CoV-2 main protease (PDB id: 6LU7) binding with the peptide inhibitor N3 (Jin et al., 2020), a-ketoamide inhibitor (6Y2G) , GC376, a broad-spectrum dipeptidyl inhibitor at the active site of main protease (7CB7)  were reported. So far studies on dengue viruses reported the NCI diversity database compounds to exhibit good inhibitory activity (Abduraman et al., 2018). Encouraged by these results, we screened the molecules from natural products sets of NCI database by carrying out docking-based virtual screening in the active site of main protease. Molecular docking and molecular dynamics simulations were carried out to study the binding interactions of the screened molecules followed by their binding free energy calculations to evaluate their binding affinity when complexed with SARS-CoV-2 main protease.

Protein preparation
The crystal structures of SARS-CoV-2 main protease complexed with inhibitor N3 (PDB id: 6LU7) (Jin et al., 2020) and a-ketoamide inhibitor (PDB id:6Y2G)  deposited in protein data bank (PDB) were used for this study. The heteroatoms and water molecules were deleted from the protein crystal structures, hydrogen atoms were added in order to prepare the protein for screening the molecules and the three-dimensional structure coordinates were saved in pdbqt format for virtual screening using PyRx server (Dallakyan & Olson, 2015).

Ligand preparation
The coordinates of N3 and a-ketoamide inhibitors were extracted from the crystal structures of SARS-CoV-2 main protease. Library of compounds from the National Cancer Institute (NCI) natural compounds set database (II, III, IV and V) (https:// wiki.nci.nih.gov/display/ncidtpdata/compound+sets) comprising 1046 molecules were downloaded in .sdf format. Hydrogen atoms were added at pH 7.0 and the coordinates of the compounds were saved in .pdbqt format.

Virtual screening and molecular docking
Docking-based virtual screening using PyRx software was performed as the initial step to identify potential main protease inhibitors. The 1046 natural compounds were screened by docking into the active site at the a-ketoamide inhibitor binding location in the PDB id: 6Y2G. The compounds were ranked according to their binding mode and scoring analysis. The best molecules obtained from virtual screening were docked using AutoDock Vina (Morris et al., 2009;Trott & Olson, 2010) that employs protein-ligand flexible docking using the Broyden-Fletcher-Goldfarb-Shanno method. The protein structure with all the compounds was loaded and ten conformations were generated for each ligand molecule by AutoDock Vina, the grid box was centered at 30.71, 50.48, 4.10 Å in x, y, z coordinates, respectively, with a grid spacing; 0.492 Å, box size of 25 Â 25 Â 25 points and exhaustiveness was set to 10. Initially, the molecules were loaded; torsions were set and saved in .pdbqt format. The screened-in molecules were docked within a 5 Å cavity defined around the a-ketoamide binding pocket in the SARS-CoV-2 main protease. The best conformer selected based on binding affinity and the number of hydrogen bonding interactions between the docked pose of natural product and protein were manually visualized. The virtual screening and molecular docking methods were validated by re-docking the crystal ligands N3 and a-ketoamide inhibitors into the receptor active site.

Validation of molecular docking
The top-ranked molecules from AutoDock Vina were further proceeded for another round of docking studies using CDOCKER (Gagnon et al., 2016) available in Discovery Studio 3.5 (DS 3.5). A sphere of 5 Å radius was generated around a-ketoamide inhibitor to define the active site of protein.
Ten docking poses were generated for each molecule in the protein active site. The binding conformations of the molecules in SARS-CoV-2 main protease were analyzed using "scoring ligand poses" implemented in receptor-ligand interactions protocol in DS 3.5. The scoring functions PLP1, PLP2 and PMF (Gehlhaar et al., 1995;Muegge, 2006;Muegge et al., 1999;Parrill & Reddy, 1999) were used to assess the docking poses. The selection of docking pose was based on top scores and inter-molecular interactions with SARS-CoV-2 main protease. The best hit-molecules chosen from both AutoDock Vina and CDOCKER docking methods were subjected to study their drug-like properties. The best docking pose of the hit-molecules obtained from CDOCKER methodology were proceeded for MD simulations.

Lipinski and ADME properties
The drug-like properties of the best docked compounds were studied by analyzing the pharmacokinetics profile using the SwissADME server (http://www.swissadme.ch/index.php). This is a software tool to calculate molecular properties such as absorption, distribution, metabolism and excretion (ADME), and physicochemical properties such as solubility, lipophilicity and pharmacokinetics. The Lipinski's rule of five (Lipinski, 2004;Lipinski et al., 1997Lipinski et al., , 2012 are an essential criterion to ensure a drug-like profile for orally administered drugs. The hit-molecules that qualify the ADME properties were studied by MD simulations studies in complex with SARS-CoV-2 main protease.

Molecular dynamics simulations
Molecular dynamics simulations of the apo and SARS-CoV-2 main protease in complex with hit molecules was carried out using GROMACS-5.1.4 for 150 ns. These studies reveal the stability of protein-ligand complexes during MD simulations. The Amber99sb force field (Hornak et al., 2006) was applied to the protein, force fields were assigned to the small molecules using ACPYPE script (Da Silva & Vranken, 2012) with AM1-BCC charges in Antechamber (Wang et al., 2006). The molecular systems were immersed in a cubic box, SPC waters were added to the system, Na 1 and Clions were added to neutralize (Berendsen et al., 1981) the systems and periodic boundary conditions were applied. Energy minimization was carried out with a tolerance of 1000 kJ/mol/nm. The systems were heated until 300 K for 100 ps; in the subsequent step, the system was equilibrated at 1 atm and 300 K for 1000 ps until it reaches proper density. The temperature was maintained using a V-rescale thermostat (Bussi et al., 2007) and Parrinello-Rahman method was used to control the pressure (Parrinello & Rahman, 1981). The long-range electrostatics were handled using the Particle Mesh Ewald (PME) method (Darden et al., 1993;Essmann et al., 1995). The equilibration of molecular systems was performed under NVT and NPT ensembles for 1000 ps. The Lennard-Jones (LJ) interactions and the real-space electrostatic interactions were truncated at 9 Å. Hydrogen bonds were constrained using the LINCS algorithm (Hess et al., 1997). The coordinates from production MD trajectories were generated and saved for every 2 ps. The final models in all the systems were obtained by averaging the snapshots from the trajectories generated by MD simulations after the structure stabilization was achieved. The root mean square deviation (RMSD) of the Ca atoms concerning their starting structures was calculated using gmx rms, and the root mean square fluctuations (RMSF) were calculated using gmx rmsf commands in GROMACS. The xmgrace software was used to plot the data, UCSF Chimera (Pettersen et al., 2004) was used for structure superposition and PyMOL was used for cartoon image generation. For the sake of comparison; apo, N3 and a-ketoamide bound SARS-CoV-2 main protease were also studied by MD simulations.

Binding free energies of protein-ligand complexes
The protein-ligand binding affinities describe the extent of inter-molecular recognition. The ligand binding free energies based on molecular mechanics-Poisson-Boltzmann surface area (MM-PBSA) were calculated (Kumari et al., 2014) using g_mmpbsa tools. Where, Equation (1) was used for calculating the binding free energy of protein-ligand complexes, E represents the energy minimized state of protein-ligand used in the calculation of Solv and SA that represent the solvation energy and surface area associated energy, respectively.
The ligand-binding free energies (DG LIE ) (Alml€ of et al., 2004;Brandsdal et al., 2003) were calculated using the commands gmx energy and gmx lie for the SARS-CoV-2 main protease -hit molecule complexes from the output trajectories of MD simulations. This was computed as the mean of vdW and coulomb (cou) interaction energy differences of the inhibitor with its neighboring atoms upon its incorporation. The individual ligand in the solvent is the unbound state denoted as subscript u and the inhibitor in the binding mode with SARS-CoV-2 main protease is the bound state denoted as subscript b.
The above equation was used for calculating linear interaction energy. The coefficient c is a constant that is associated with the alteration of the hydrophobic nature of the binding cleft, while the coefficients a and b correspond to parameters for nonpolar and polar interactions, respectively. The values taken for a, b and c were 0.181, 0.5 and 0, respectively.

Normal mode analysis and mechanical stiffness
Normal Mode Analysis (NMA) can provide a quick and systematic investigation of protein dynamics. We have developed elastic network model-based NMA using dihedral angels as independent variables for all molecular systems using the software suite of programs in Prodynamics (Atilgan et al., 2001;Uyar et al., 2011). Mechanical stiffness plots of all molecular systems in response to all possible pulling directions were constructed by using Anisotropic Network Model (ANM) using the software suite of programs in Prodynamics (Eyal et al., 2015).

Principal component analysis
Principal component analysis (PCA) was performed to study the overall motion of SARS-CoV-2 main protease in all the simulated systems using MODE-TASK (Ross et al., 2018). A 3 N Â 3 N covariance matrix was created using Cartesian coordinates, followed by the construction of eigenvectors by diagonalization of the covariance matrix. The PCA was calculated from 0 to 150 ns MD simulations trajectories.

Results and discussion
The crystal structure of SARS-CoV-2 main protease comprises three domains; domain I (1-99 amino acid residues), domain II (100-182 residues) and domain III (199-307 residues). The domains I and -II, each comprise six-stranded b-barrel with the substrate binding site at the intersection of the two domains. The binding site is made up of subsites S1, S2, S3, S4, and S1' as shown in supplementary material Figure S1 represented based on the location of the substrate (Jin et al., 2020). The domains I and II connected with hinge region residues (182)(183)(184)(185)(186)(187)(188)(189)(190)(191)(192)(193)(194)(195)(196)(197)(198), contribute to the S3 and S4 subsites formation. According to the crystal structure of the a-ketoamide bound protein, the lactam ring at P1 position of the inhibitor is in the S1 subsite formed by the side-chains of Phe140, Asn142, Glu166, His163 and His172. The lactam nitrogen at P1 position makes hydrogen bonding interactions with mainchain carbonyl oxygen of Phe140. The cyclopropyl methyl group at P2 position is inserted into the S2 subsite formed by the side-chains of His41, Met49, Tyr54, Met165 and Asp187. Carbonyl oxygen of inhibitor close to the lactam ring forms hydrogen bond with His41 side-chain. The amide nitrogen located between the lactam and cyclopropyl methyl group makes hydrogen bonding interactions with main-chain carbonyl oxygen of His164. The carbonyl oxygen on pyridone makes hydrogen bonding interactions with main-chain NH of Glu166. The OH functional group on imine carbon in the inhibitor makes hydrogen bonds with main-chain NH of Ser144 and Cys145 located in the S1' subsite. The N3 inhibitor formed a covalent bond with Cys145 and hydrogen bonding interactions with Phe140, Gly143, His164, Glu166 (S1 subsite), Gln189, Thr190 (S4 subsite) (Jin et al., 2020). Both crystal structures are highly superimposable with RMSD of 0.69 Å as shown in the supplementary material Figure S2.

Virtual screening and molecular docking
The N3 and a-ketoamide inhibitors binding site is considered as the active site of SARS-CoV-2 main protease. PyRx server based virtual screening of natural products (1046 molecules) into the active site of protein successfully screened 736 potential hit molecules. These screened-in molecules were docked into the SARS-CoV-2 main protease active site using AutoDock Vina. The N3 and a-ketoamide inhibitors docked into the protein active site with binding affinity -7.8 kcal/mol and -9.6 kcal/mol, respectively, and formed hydrogen bonding interactions similar to the crystal structure. Thirty natural product molecules were retrieved with AutoDock Vina score lower than -7.0 kcal/mol that also contribute hydrogen bonding interactions similar to the reference molecules. These 30 molecules were studied for another round of docking by CDOCKER using receptor ligand interaction protocols available in DS 3.5. The docking protocols validated by redocking the reference molecules N3 and a-ketoamide at the active site of SARS-CoV-2 main protease is shown in supplementary material Figure S3. Eight best compounds (Table  1) were selected based on inter-molecular hydrogen bonding interactions with SARS-CoV-2 main protease active site and the highest docking scores from AutoDock Vina and CDOCKER.

Drug-likeness properties
The eight molecules selected from both docking methods were assessed for Lipinski's rule of five and ADME properties. The results shown in Table 2 reveal that the selected molecules were within the acceptable range of synthetic accessibility (score less than 5.93), topological polar surface area (TPSA) was between 20 and 140 Å, lipophilicity; expressed as cLogP was less than 4.4, and water solubility expressed as Log S shows that most molecules are soluble or moderately soluble in water. The skin permeation possibility expressed as Log Kp was also reasonable, indicating the possibility of skin permeation. The ADME properties also lie within the range of acceptable values. Based on these results, all the eight hit molecules were selected for in silico validation using MD simulations.

Molecular dynamics simulations
Classical MD simulations of all the selected molecular systems; apo SARS-CoV-2 main protease, complexes with inhibitors N3, a-ketoamide, and the screened-in molecules was performed using GROMACS 5.1.4 for 150 ns. Out of eight screened-in hit molecules, four molecules (NSC36398, NSC281245, NSC44175, and NSC11926) showed stability at the active site of SARS-CoV-2 main protease as shown in Figure 1. The covalent bond between the N3 inhibitor and active site residue Cys145 was not observed during the MD simulations because the Amber99SB force filed cannot account for covalent bond formation. The inter-molecular hydrogen bonding interactions during the MD simulations is shown in supplementary material Figures S4A and S4B.
Studies on natural compounds-like bioactive molecules are reported as inhibitors of SARS-CoV-2 drug targets. For example, assafoetidnol A, conferol, farnesiferol B, sesamin, sesaminol, sesamolin show potential activity in targeting main protease, spike protein, and human ACE-2 receptors (Natesh et al., 2021). Plant based natural compounds such as apigenin, coriandrin, curcumin, glabridin and oleanolic acid (Sampangi-Ramaiah et al., 2020;Verma et al., 2020) have been reported as main protease inhibitors, and some of the spice molecules (piperine, capsaicin, gingerol and terpinen-4ol) (Rout et al., 2020) have been shown to bind spike protein and main protease. All these studies described that natural compounds show good inhibitory activity on SARS-CoV-2 proteins. The inhibition of these target proteins may lead to either attenuation of viral replication or reduce the infectivity of this virus. In this manuscript we report the screened molecules from natural compounds NCI database that show good binding affinity and non-bonding interactions with SARS-CoV-2 main protease. The results demonstrated that protein attains stability when it binds with screened-in hit molecules and maintain the hydrogen bonding interactions with important amino acids compared with reference molecules (N3 and a-ketoamide). The amino acid Cys145 shows covalent interactions with reference molecules, which also maintains distance with screened-in hit molecules throughout the MD simulations. Cys145 interaction is most significant in inhibition of the drug inside the active site of SARS-CoV-2 main protease. The RMSD plots in Figure 2(A) revealed that the structures attained stability within the first 10 ns of MD simulations. The main protease when complexed with NSC36398 showed greater RMSD ($0.3 nm) among all the systems studied. The N3, a-ketoamide, NSC44175, NSC281245 bound main protease displayed lower RMSD ($0.22 nm) indicting greater structural stability of these four complexes. The screened-in hit molecules NSC11926, NSC281245 and NSC44175 exhibit lower RMSD (lower than 0.1 nm) whereas the reference molecules N3 and a-ketoamide showed relatively higher RMSD as shown in Figure 2(B). The RMSF plots analyzed the residual fluctuations of protein during MD simulations. Higher fluctuations are observed in the regions; Asp153-Val157 and Asn221-Thr225 that are away from the active site and dimer interface of the SARS-CoV-2 main protease. The region, Leu50-Arg60 that contributes to the S2 subsite undergoes fluctuations up to 0.25 nm, the Table 1. Docking scores of N3, a-ketoamide and screened molecules along with the interacting residues in the SARS-CoV-2 main protease binding site. region Glu270-Gly283 located at the inter-subunit interface shows fluctuations between 0.2 to 0.3 nm in all the molecular systems studied. The residues located in the region Cys117-Pro122 show greater fluctuations in N3 and a-ketoamide complexed proteins, Ser139-Cys145 (S1' subsite) region also shows higher fluctuations in the N3 binding protein. These regions of fluctuations in the apo and complexed SARS-CoV-2 main protease is shown in the RMSF plots, Figure 2(C).
The radius of gyration provides information about the compactness of protein throughout the MD simulations. It was observed that the apo and NSC281245 complexed SARS-CoV-2 main protease have a relatively higher Rg among all the systems studied as shown in supplementary material Figure S4. The stability of reference and screened molecules in the protein active site was analysed by comparing the initial and average structures, different types of non-bonding interactions were measured within 5 Å around the ligand. The inter-molecular interactions in the structures between the protein -reference and screened-in molecules before and after MD simulations showed that the N3-inhibitor maintains interactions with Gly143, Ser144, Cys145, Glu166, Glu189, Gln192; and a-ketoamide also has interactions with Leu141, Ser144, Glu166, His164 and Gln189 throughout MD simulations. The protein complexed with NSC36398 made hydrogen bonding interactions with Leu141, Ser144, His163, Glu166, Arg188, and Gln189;NSC281245 with Ser46, Ser144, Cys145, and Glu166;NSC44175 with Gly143, Cys145, and Glu166 and NSC281245 with Ser144, Cys145, and Glu166. These hydrogen-bonding interactions shown in supplementary material Figure S5(A,B) are retained during the MD simulations.

Normal mode and mechanical stiffness analysis
The NMA is a fast and simple method to calculate protein flexibility (Alexandrov et al., 2005) involving atomic fluctuations. It reveals the structural variations and mobility in protein which are a collection of micro-ensemble states fluctuating about thermodynamically stable states. The RMSF plots revealed certain flexible regions in SARS-CoV-2 main protease during the MD simulations. To further confirm this observation, we performed NMA for all molecular systems. Ten normal modes were obtained for each system from MD trajectories, the first mode was selected and the structural variations were compared with apo structure of main protease. The regions that displayed higher RMSF (Leu50-Arg60, Asn221-Thr225, Glu240-Asp245 and Glu270-Gly283), also display normal modes with higher mobility in the presence of screened molecules as shown in supplementary material Figure S6. This study explains the conformational changes in apo and ligand bound complexes of main protease and indicate relatively higher flexibility in domain-III. The mechanical stiffness plots are useful to identify the anisotropic response of the protein structure to external perturbations, and the determination of weak and strong pairs of interactions that depend on the direction of the external force (Eyal & Bahar, 2008). Lower mechanical stiffness is indicative of the weak regions and higher  mechanical stiffness is indicative of strong regions. In all molecular systems, structural deformations were noted when compared with apo protein. From these plots, it is observed that the regions (Leu50-Arg60, Asn221-Thr225, Glu240-Asp245 and Glu270-Gly283) exhibit lower effective stiffness in all molecular systems. In the mean plots of mechanical stiffness, the effective spring constant value for fluctuating regions of residues was greater than 8 k (a.u) and was larger than 12 k (a.u) for the stable regions in proteins. These values indicated that the elastic nature of regions Leu50-Arg60, Asn221-Thr225, Glu240-Asp245 and Glu270-Gly283 is higher in the all-molecular systems throughout MD simulations as shown in supplementary material Figure S7(A-G). From the results of mechanical stiffness and NMA we propose that the regions of residues Leu50-Arg60, Asn221-Thr225, Glu240-Asp245 and Glu270-Gly283 in the protein exhibit mechanically weak behavior. These large deviations of conformational changes indicated the elastic nature of protein in all systems studied.

Principal component analysis
PCA deciphers the conformational changes in a protein as a function of time from the MD simulations trajectories. The PCA scatter plots of all molecular systems studied is shown in Figure 3(A-G). The conformational changes of the SARS-CoV-2 main protease in apo form, N3, a-ketoamide and natural products bound molecular systems were monitored. The Ca-atoms distribution is greater in NSC36398 bound molecular system which indicates that greater conformational changes of protein are observed. This demonstrated that the conformational distributions of main protease bound with NSC36398 was remarkably different from other molecular systems. The frequencies of PCA scatter plots were quantified and the highest-frequency is observed in NSC36398 bound main protease. These results indicated that SARS-CoV-2 main protease bound with NSC36398 displayed higher protein conformational changes compared to other molecular systems.

Binding free energy and residue contribution analysis
The binding free energies of the reference and screened natural products calculated using MM-PBSA (supplementary material Table 1 and Figure S8) and LIE methods are shown in Table 3. The contributions from van der Waals (vdW), electrostatic and polar solvation energies for MM-PBSA binding free energies show compatibility with each other and reference molecules already reported. The binding free energies for N3 and a-ketoamide inhibitors in complex with SARS-CoV-2 main protease were -150.06 kJ/mol and -90.11 kJ/mol, respectively. The binding energies observed from AutoDock Vina and the MM-PBSA scores observed in this work are in correspondence with previous reports (Keretsu et al., 2020). The binding free energies for the natural products selected were NSC281245 (-133.79 kJ/mol), NSC11926 (-93.22 kJ/mol), NSC44175 (-81.97 kJ/mol) and NSC36398 (-70.75 kJ/mol). In all the complexes, the binding site residues Leu27, His41, Gly143, Ser144, Cys145, His164, Met165 and Glu166 contribute to the highest binding energies in all the complexes studied. Pro168 contributes to binding energy in both the reference molecules as shown in supplementary material Figure S9 and Table 2. The LIE values show that the binding energies of reference and screened-in hit molecules with SARS-CoV-2 main protease N3 shows (-140.69 kJ/mol), a-ketoamide (-143.78 kJ/mol), NSC281245 (-117.63 kJ/mol), NSC36398 (-83.71 kJ/mol), NSC11926 (-78.65 kJ/mol), and NSC44175 (-73.82 kJ/mol) and these data are provided in Table 3.

Conclusions
Computer Aided Drug Design (CADD) methodologies can be used effectively to speed-up the process of developing therapeutic agents for the treatment of COVID-19 disease. In this manuscript, we used docking based virtual screening of NCI diversity set of natural compounds to identify the potential leads for SARS-CoV-2 main protease. Molecular docking and molecular dynamics simulations were carried out to study the binding interactions between protein and ligand molecules. All natural compounds display drug-likeness properties. Binding free energy calculations were performed to identify the potential lead molecules for SARS-CoV-2 main protease. These studies identified four compound that showed good binding affinity and stability in the protein active site throughout 150 ns MD simulations. The amino acid residues Cys145, Met165 and Glu166 have high contribution to the binding free energies of all the molecules studied. In all molecular systems studied, certain regions in SARS-CoV-2 main protease domain III showed greater flexibility and NSC36398 bound protein displayed higher protein conformational changes revealing the molecular mechanisms of protein-NCI natural products interactions. All natural compounds studied also displayed drug-likeness properties indicating their suitability as probable inhibitors for SARS-CoV-2 main protease.