Traditional herbal compounds as candidates to inhibit the SARS-CoV-2 main protease: an in silico study

Abstract COVID-19, a disease caused by the SARS-CoV-2 virus, is responsible for a pandemic since March 2020 and it has no cure. Therefore, herein, different theoretical methods were used to obtain potential candidates from herbal compounds to inhibit the SARS-CoV-2 main protease (Mpro). Initially, the 16 best-scored compounds were selected from a library containing 4066 ligands using virtual screening by molecular docking. Among them, six molecules (physalin B 5,6-epoxide (PHY), methyl amentoflavone (MAM), withaphysalin C (WPC), daphnoline or trilobamine (TRI), cepharanoline (CEP) and tetrandrine (TET)) were selected based on Lipinski’s rule and ADMET analysis as criteria. These compounds complexed with the Mpro were submitted to triplicate 100 ns molecular dynamics simulations. RMSD, RMSF, and radius of gyration results show that the overall protein structure is preserved along the simulation time. The average ΔGbinding values, calculated by the MM/PBSA method, were −41.7, −55.8, −45.2, −38.7, −49.3, and −57.9 kcal/mol for the PHY-Mpro, MAM-Mpro, WPC-Mpro, CEP-Mpro, TRI-Mpro, and TET-Mpro complexes, respectively. Pairwise decomposition analyses revealed that the binding pocket is formed by His41-Val42, Met165-Glu166-Leu167, Asp187, and Gln189. The PLS regression model generated by QSPR analysis indicated that non-polar and polar groups with the presence of hydrogen bond acceptors play an important role in the herbal compounds-Mpro interactions. Overall, we found six potential candidates to inhibit the SARS-CoV-2 Mpro and highlighted key residues from the binding pocket that can be used for future drug design. Communicated by Ramaswamy H. Sarma


Introduction
Since March 2020, the Word coexists with the COVID-19 (coronavirus disease 2019) pandemic as declared by the World Health Organization (WHO). The initial cases were observed in Wuhan (Hubei, China) in 2019, when it was reported an outbreak of many pneumonia cases . This disease is caused by the SARS-CoV-2 (severe acute respiratory syndrome 2) virus (Guarner, 2020) and it has rapidly spread worldwide. Initially, the use of masks, social isolation and social distancing were the main actions adopted to control the COVID-19. In extreme situations, more restrictive measures such as lockdown were applied in many countries around the world.
In parallel, hundreds of vaccine candidates have been developed to combat the virus by antibody generation in humans. However, it was only in December 2020 that the first COVID-19 vaccine was developed by Pfizer and BioNTech and approved by a regulatory agency (Medicines and Healthcare Products Regulatory Agency) in the United Kingdom (UK). To date, there are eight vaccines used with emergency authorizations (Sadarangani et al., 2021): Janssen, Pfizer/BioNTech, Astrazeneca-SK Bio, Serum Institute of India, Astra Zeneca EU, Moderna, Sinopharm, and Sinovac-CoronaVac. However, most of these vaccines require two doses to achieve optimal efficacy, which has turned the vaccination process slow. In addition, new variants with multiple mutations in the SARS-CoV-2 have emerged and studies showing the efficacy of these vaccines against these variants are still in the initial phase. Therefore, it is of great importance and urgency to develop drugs for treating COVID-19 by inhibiting some of its protein/enzyme. SARS-CoV-2 belongs to the SARS-CoV family, as reported by Xu et al. (Xu et al., 2020), and therefore it is composed by glycosylated spike protein (S-protein), membrane protein, envelope protein and nucleocapsid protein (Chen et al., 2006). The S-protein is responsible for the first contamination stage, binding to the human cellular receptor angiotensinconverting enzyme 2 (ACE2) (Ou et al., 2020;Shang et al., 2020). Therefore, the antibodies developed by the immune system after some vaccine application recognize and bind to the S-protein RBD (receptor-binding domain) region avoiding the formation of S-protein-ACE2 complex and posterior contamination. Although the S-protein is an interesting target, it has been observed the formation of 41 new SARS-CoV-2 Sproteins due to the mutation of 17 amino acids that are essential for the S-protein-ACE2 interaction . Among them, the Delta variant spreads faster and causes more infections than native virus. Therefore, it is expected that one specific drug can not inhibit all variants and S-protein mutations making the drug development a difficult task.
On the other hand, another interesting protein is the M pro (main protease), also called 3 C-like protease (3CL pro ). This enzyme plays an essential role in the process of viral maturation by processing the polyproteins that are translated in the viral RNA (Hilgenfeld, 2014, Zhang et al., 2020. Therefore, the M pro inhibition avoids viral replication. In addition, there is not any human protease homolog to SARS-CoV-2 M pro , indicating that an inhibitor must be non-toxic.
Contrary to the S-protein, to date, mutations in the M pro enzyme have not been observed. Regarding these statements, the M pro appears to be an ideal drug target against SARS-CoV-2. Therefore, the M pro is widely used as a target in the search for drugs . However, an efficient compound to inhibit this enzyme has not been found.
An interesting class of compounds used in drug design is based on natural products, i.e. traditional herbals. This class of compounds has the advantage of being safe (low toxicity) and bioavailable, which minimizes possible side effects. In addition, traditional and herbal medicines have been used for medical benefits throughout the centuries, and the isolation of their active principle can provide a lead compound. For instance, Jan et al. (Jan et al., 2021) have shown that extracts of some traditional Chinese herbal medicines present in vitro anti-SARS-CoV-2 activity when using the M pro as target. Moreover, in a recent review, the authors reported the importance of Chinese herbal medicine against SARS-CoV-2 infection (Wang & Yang, 2021). In another study, the Scutellaria baicalensis plant extract presented in vitro M pro activity of IC 50 equal to 8.52 lg/mL and its main active component (baicalein) has IC 50 of 0.39 lM (Liu et al., 2021). The Ginkgo biloba leaves extract present IC 50 of 6.68 lg/mL and their twenty isolated phytochemicals have IC50 < 10 lM (Xiong et al., 2021). Although these in vitro and in vivo studies are essential and effective to obtain a drug against a disease, they are time consuming and demand high cost. Therefore, in silico studies based on theoretical methods (virtual screening, docking molecular, QSAR, molecular dynamics simulation, and others) can help to optimize experimental research with a relatively lower cost (time and financial). Therefore, many in silico studies have been used to identify potential anti-SARS-CoV-2 M pro lead molecule from Moroccan (Aanouz et al., 2021), Chinese (UI Qamar et al., 2020), South African (Dwarka et al., 2020), Indian (Thakkar et al., 2021), Asian  medicinal plants. It is interesting to point out that natural products were also used as candidates to inhibit other SARS-CoV-2 proteins/enzymes from in silico studies. For instance, different plant bioactives were tested to inhibit the nonstructural protein 1 , 15 , and 16 , RNAdependent RNA polymerase , and Spike receptor binding domain .
Likewise, herein an in silico study using the SWEETLEAD library (Novick et al., 2013), that is composed of approved drugs and traditional medicinal herbs, will be used. The library has the advantage of being elaborated using the SMILES format, which permits to generate all isomers and to convert them for another format (i.e. pdb). Recently, we used this library to search for potential candidates to inhibit the SARS-CoV-2 spike protein (Oliveira et al., 2021). Also, Smith and Smith (2020) used it for molecular docking calculation in the SARS-CoV-2 spike protein and spike protein-human ACE2 interface.
Therefore, molecular docking, molecular dynamics simulations and MM/PBSA methods were used herein to find potential compounds candidates from herbal compounds to inhibit the SARS-CoV-2 M pro . Also, QSPR (quantitative analysis of structure-property relationships) was performed in order to: (i) verify whether this class of descriptors can be significantly correlated with the energy of ligand-receptor interaction; (ii) define the variations in the values of these descriptors that can favorably influence these interactions and, finally, (iii) give some clue about the structural modifications in the compounds that would increase the biological activity.

Molecular docking
The crystallographic structure of the SARS-CoV-2 main protease (M pro ) was retrieved from the Protein Data Bank with PDB code: 6W63 (Mesecarr, 2020), in which it is complexed with the X77 ligand (N-(4-tert-butylphenyl The protein M pro was prepared by removing the X77 ligand and crystallographic water molecules. Posteriorly, the AutoDockTools (ADT) (Morris et al., 2009) was used to generate the file in PDBQT format for docking calculations. The traditional herbals were obtained from SWEETLEAD library (Novick et al., 2013), which is composed by of herbals and approved drugs. Recently, this library was used by our group to identify potential candidates to inhibit the spike protein from the SARS-CoV-2 (Oliveira et al., 2021). Herein, 4,066 compounds from traditional herbals were taken into account and some of their isomers previously converted into PDBQT format by the Smith and Smith (2020). Also, the non-covalent inhibitor X77 was employed to validate the docking procedure. The virtual screening was performed using the AutoDock Vina program (Trott & Olson, 2010) for docking the compounds into the active site of the M pro , considering a grid size of 26 Â 26 Â 26 Å centered at 16.209 Â 24.741 Â34.118 Å in the active site of the M pro . In docking calculations, the ligands were defined as flexible structures, while the protein was treated as a rigid structure. A total of 20 binding models were generated for each ligand. The results and figures were analyzed and prepared by the Discovery Studio (DS) Visualizer (Discovery Studio Visualizer Software, 2020) and software PyMol (DeLano, 2009).

Molecular dynamics (MD) simulation
MD simulations were carried out for the six best-docked traditional herbals with lowest docking energy, which obey Lipinski's rule. The ATP version 3.0 (http://compbio.chemistry. uq.edu.au/atb/) (Stroet et al., 2018) was used to generate the structural and Lennard-Jones parameters for all ligands in the GROMOS force field formalism. All ligands were optimized using the DFT//B3LYP/6-31G Ã method as implemented in the ORCA package (Neese, 2010). The RESP (restrained electrostatic potential) charges were obtained from the wavefunction generated by these calculations using the MULTIWFN program (Lu & Chen, 2012). The structural and energetic parameters for the M pro were described by the GROMOS54a7 force field (Schmid et al., 2011). For water solvent, the single point charge (SPC) model (Berendsen et al., 1981) was considered. Each complex was merged into a cubic box with edge of 9.0 nm containing 22,537 water molecules. Four Na þ ions were added to keep the system electrostatically neutral. Periodic boundary conditions were considered with a cut-off of 1.0 nm for non-bonded interactions. The smooth Particle-Mesh Ewald (PME) method (Berendsen, 2007;Deserno & Holm, 1998) with real space interactions truncated at a cut-off radius of 1.0 nm was used to describe the long-range electrostatic forces. The NpT ensemble was considered with a temperature of 310 K and pressure of 1.0 bar. The temperature was controlled by stochastic velocity rescaling method (Bussi et al., 2007) with a time constant of 1 ps. For the pressure, the Berendsen barostat (Berendsen et al., 1984) was used with a time constant of 5 ps. In the first step, the energy of the systems was minimized until it reaches a convergence of 200 kJ.mol À1 .nm À1 using the steepest descent method (Morse & Feshbach, 1953) to prevent unfavorable contacts between the atoms. In a second step, 500 ps of MD simulation was used to equilibrate the solvent, while the M pro was kept partially rigid by imposing a positional restraint potential with force constant of 1000 kJ.mol À1 .nm À2 . In the last step, triplicate 100 ns MD simulations were carried out for each system. An integration step of 2 fs was considered by keeping the covalent bonds to hydrogen atoms constrained by the P-LINCS method (Hess, 2008). The leap-frog algorithm (Berendsen, 2007) was used to integrate the motion equations. All MD simulations were performed in the GROMACS package version 2020.3 (Berendsen et al., 1995;Hess et al., 2008).

MM/PBSA
The molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA) method was used to calculate the binding free energy (DG bind ) of the herbal-M pro complexes obtained along triplicate 100 ns MD simulation. For each system, 500 configurations were selected at intervals of 200 ps along the trajectory. In this approach, the DG bind is calculated with the following equation (Kumari et al., 2014), where G complex is the Gibbs free-energy of the herbal-M pro complex, while G Mpro and G herbal are the total G of M pro and the herbal in the presence of the solvent, respectively. The individual G for complex, M pro , and the herbal was calculated with equation: where hE MM i is the average of the potential energy obtained from the MD simulation in vacuum. The hG solv. i is the Gibbs free energy of solvation which is calculated using the equation, where G polar is calculated by solving the Poisson-Boltzmann equation (Baker et al., 2001;Honig & Nicholls, 1995;Srinivasan et al., 1998), while the G nonpolar term is obtained from the solvent accessible surface area (SASA). Herein, both terms were calculated by the APBS program (Baker et al., 2001). The entropy contributions (hSi) were not calculated here due to the high computational cost. In addition, similar entropy contributions are expected for the herbal-M pro complexes upon herbal binding to the same protein (M pro ). The g_mmpbsa tool (Kumari et al., 2014) implemented in the GROMACS package was used to obtain the DG bind and its components.

Compounds selection
Among the four thousand binding energy (Eb) values previously calculated for docking of ligands with M pro, which varied from À1.2 to À10.2 kcal/mol, only 165 compounds were chosen in such a way that the full range of binding energy values was uniformly covered. (Table S1, Supplementary material).

Partial least square regression
The physicochemical descriptors for the 165 compounds (Table S1) were organized in a matrix X, while the binding energy values (E b in kcal/mol) were arranged in a column vector y. After autoscaling (mean centering and scaling to unity variance) the data, the vector y was correlated with the descriptors from X matrix using the partial least square regression (PLS), through the QSAR modeling software (Martins & Ferreira, 2013). Descriptors presenting Pearson correlation with E b lower than 0.3 were eliminated (Ferreira, 2015) and then, the OPS algorithm (Te ofilo et Figure S1). The outliers 177, 401, 3033 and 3433 presented studentized residuals beyond the critical value of À2.0, while other seven compounds presented studentized residuals above the critical value of 2.0 (Ferreira, 2002). In all, eleven compounds were removed from the model. After removing outliers, the data set was randomly split into two subsets: training set containing 105 compounds and the test set with the 49 remaining compounds. The final regression model was assessed by analyzing the coefficient of multiple determination (R 2 ), standard error of calibration (SEC), crossvalidated correlation coefficient (Q 2 ) and standard error of cross validation (SEV) ( Table S2). The model was validated by external validation and by tests such as leave-N-out crossvalidation (LNO) and y-randomization, through the QSAR modeling software (Martins & Ferreira, 2013). Leave-N-out (LNO) cross-validation was used to test the robustness of a model. In this test, the training set is divided into consecutive blocks of N samples. Each block is excluded once and a new model is built without it. Then, the values of the dependent variable are predicted for the removed block and used to calculate Q 2 LNO . LNO was repeated 100 times and the average of Q 2 LNO with its standard deviation were calculated. The test was performed for N ¼ 2 to 10. The critical N is the maximum value of N for which Q 2 LNO is still stable and high. For a good model, the average of Q 2 LNO should stay close to Q 2 LNO , with small variations at all values for N. By our experience, two standard deviations should not be greater than 0.1 (Q 2 LOO ± 0.05)) for N ¼ 2, 3, etc.
The presence of chance correlation between the dependent variable and descriptors which are statistically well correlated to y, although in reality not related to the problem under investigation, was investigated by y-randomization (Kiraj & Ferreira, 2009). For this test, the matrix X is left untouched, and only the vector y is randomized, y rand . The models obtained under such conditions should be of poor quality and without real meaning. It is expected that the and Q 2 yrand ) should be significantly lower than those obtained for non-scrambled data (R 2 and Q 2 LOO ). In this work, 100 randomization runs were carried out. To judge whether the real model is characterized by chance correlation, the absolute value of the Pearson correlation coefficient, R(y, y rand ), between the original vector y and randomized vector y rand were calculated and two y randomization plots, R(y, y rand ) vs Q 2 and R(y, y rand ) vs R 2 , were drawn for all randomized and real models. Two linear equation lines of R(y, y rand ) vs Q 2 and R(y, y rand ) vs R 2 were obtained. It has been recommended that for a model free of chance correlation, the intercepts are a Q < 0.05 and a R < 0.3.
The external validation of the model was performed using the test set to assess the prediction ability of the model, through the standard error of prediction (SEP) and coefficient of multiple determination of prediction R 2 pred (Table S2).

Molecular docking
Docking calculations were carried out in the virtual screening process of 4,066 traditional herbals to identify potential candidates to inhibit the SARS-CoV-2 main protease. In molecular docking, the ability of the software and algorithm to reconstruct the ligand-receptor complex is a very important procedure, known as self-docking, applied in order to validate the docking method used. Therefore, the ligand X77 was removed and placed far from the M pro , and the docking calculation was carried out using the AutoDock Vina. The binding energy obtained was À8.4 kcal/mol. Figure 1 shows the M pro complexed with the ligand X77 obtained by self-docking procedure and its crystallographic structure.
In Figure 1, we can see clearly that the conformations of X77 docked with the protein M pro and in the crystallized complex are very similar. Moreover, the calculated RMSD by superposition of both structures is 1.0 Å and this small value shows that there is not significant structural difference between the self-docking and X77 in the crystallographic structure of the complex. Therefore, these results show that the AutoDock Vina is efficient and sufficient for the present study. Table 1 presents the 16 best-scored ligands, with binding energy values lower than À9.7 kcal/mol, and some properties obtained from the SwissADME (http://www.swissadme.ch) and DataWarrior (Sander et al., 2015).
In Table 1, it is interesting to observe that all compounds have lower binding energy than the docked X77 (-8.4 kcal/ mol) implying that the ligands tested in the present work are more efficient to inhibit the M pro than X77. Based on Lipinski's rule (MW < 500 Da, ClogP < 5, H-bond donor < 5, and H-bond acceptor < 10), six traditional herbals were selected as good candidates to inhibit the M pro : physalin B 5,6-epoxide (C 28 H 30 O 10 , PHY), methyl amentoflavone (C 31 H 20 O 10 , MAM), withaphysalin C (C 28 H 36 O 7 , WPC), daphnoline or trilobamine (C 35 H 36 N 2 O 6 , TRI), cepharanoline (C 36 H 36 N 2 O 6 , CEP) and tetrandrine (C 38 H 42 N 2 O 6 , TET). These compounds, except withaphysalin C, violate only one Lipinski's rule (molar weight). Moreover, they are not mutagenic and tumorigenic. For that reasons, more attention was paid to these compounds than to the others in this study. As it is known, the COVID-19 is characterized by the high intensity of the inflammatory process in the organism. Therefore, the first top-scored physalin B 5,6-epoxide appears to be a good candidate to be used in COVID-19 because physalins have anti-inflammatory activity (Pinto et al., 2010;Soares et al., 2003   amentoflavone favorably binds to the SARS-CoV-2 M pro . Likewise, the docking calculations performed in this study yielded binding energies of À9.6 kcal/mol for amentoflavone (not presented in Table 1) and of À10.2 kcal/mol for methyl amentoflavone (Table 1). Consequently, this derivative can be more efficient to M pro inhibition than amentoflavone, and these results indicate that this class of molecule is a good candidate for future in vitro studies and drug design against SARS-CoV-2 M pro . For instance, this molecule presents an IC 50 ¼ 8.3 lM against SARS-CoV as it was shown by fluorescence resonance energy transfer method (Ryu et al., 2010). For SARS-CoV-2, the IC 50 value is 8.65 ± 3.72 lM according to Xiong et al. (Xiong et al., 2021). In a recent study (Oliveira et al., 2021), it was shown through docking calculations that the withaphysalin C binds favorably to the SARS-CoV-2 S-protein with a binding energy of À8.2 kcal/mol. Hence, the results show that the withaphysalin C bind favorably in both S-protein and M pro , but with more intensity in the M pro (-10.1 kcal/mol). Trilobamine or daphnoline, cepharanoline, and tetrandrine belong to the bisbenzylisoquinoline (BBIQ) alkaloid class, presenting different biological activities (Weber & Opatz, 2019). In detail, trilobamine presents activity against trypanosomatid parasites (Fournet et al., 1998(Fournet et al., , 2000, cepharanoline has antiplasmodial activity (Chea et al., 2010) and tetrandrine was used as anti-inflammatory (Ferrante et al., 1990). In order to search for an approved drug similar to these three compounds, the SwissSimilarity tool was used (Zoete et al., 2016). According to this tool, the tubocurarine (C 37 H 42 N 2 O 6 ) has a score of $0.99 with the present BBIQ alkaloids. The tubocurarine is a FDA approved drug (1945) used as a muscle relaxant and, historically, it is known for its use as an arrow poison by indigenous tribes. As the tubocurarine molecule is not present in the library used in this study, docking calculations were performed and the binding energy obtained was À8.8 kcal/mol. Compared to the binding energy values presented in Table 1, the BBIQ compounds used in the present work have higher affinity for the M pro than the tubocurarine. For other compounds (physalin B 5,6epoxide, methyl amentoflavone, and withaphysalin C), it was not found any approved drug with score higher than 0.6 using the SwissSimilarity tool. As tubocurarine has the nitrogen atoms protonated, docking calculations were carried out for the trilobamine, cepharanoline, and tetrandrine in their protonated form. The binding energies and best-docked structures obtained for these compounds were similar to those for neutral forms. In the charged form, the binding energies were À9.8, À9.8 and À9.7 kcal/mol for trilobamine, cepharanoline, and tetrandrine, respectively. For best visualization of the interactions between the docked herbals and M pro , Figure 2 displays the 2 D maps of their interactions.
Although the DS Visualizer does not show any specific residues interacting with the withaphysalin C, it is surrounded by the following amino acids at a 3.0 Å cutoff: Thr25, His41, Cys44, Met49, His164, Glu166, Pro168, Arg188, and Gln189. Overall, in Figure 2, it is observed that the bestscored compounds interact with the catalytic residues His41 and/or Cys145. Therefore, these compounds can inhibit the M pro by blocking the substrate access to its active site. Moreover, it is possible to see the formation of hydrogen bond between all ligands and residues in the M pro active site region. Molecular docking is a powerful and useful technique to bind a ligand to a specific region of a protein. However, it does not take into consideration the protein flexibility, solvent, and temperature effects. Therefore, in the next section, we describe the results obtained from MD simulations for the six top scored herbals complexed with the M pro .

MD simulations
To overcome some limitations of the docking calculations, 50 ns of MD simulations were performed for six herbal-M pro complexes previously obtained in the molecular docking. Another advantage of MD technique is that the stability of the complex can be monitored along the simulation time, which is essential since the ligand should be bound to the protein for a long time in the inhibition process. Therefore, we carried out triplicate 100 ns MD simulations to equilibrate each system in aqueous media, and temperature and pressure effects also were taken into account. Figure 3 shows the active site region of the complexes obtained at 0 and 100 ns of MD simulation.
Overall, along the triplicate MD simulations, all ligands were kept in a similar position in the M pro pocket. Specifically, as can be seen in Figure 3, the herbal compounds are into the active site and nearby the His41 and Cys145 residues at the end (100 ns) of the MD simulation. Also, it is possible to see that the M pro has similar conformation at 0 and 100 ns of MD trajectory for all complexes and for protein in apo form. Only for the PHY-M pro complex, the Cys41 appears to be slightly away from the His41 and PHY compound. In MD simulation, we considered the TUB-M pro complex because TUB is an approved drug with higher similarity to the present BBIQ compounds (CEP, TRI, and TET).
To better quantify the structural change along the MD simulation, the root mean square deviation (RMSD), the root mean square fluctuation (RMSF), and radius of gyration (Rg) for the present complexes are presented in Figure 4.
In the Supplementary material, all properties presented in Figure 4 are plotted for the triplicate MD simulations ( Figures S2-S5), for each system. For PHY-M pro , we considered a duplicate MD simulation because the PHY was dislocated from the M pro active site in one trajectory, but it kept interacting with the protein. These small values show that the protein conformations sampled in MD are similar to its initial structure. Therefore, the presence of the herbal compounds do not significantly affect the M pro structure along the simulation time, since the average RMSD values are similar to the M pro in the apo form (0.33 nm). The higher RMSD values were observed for the MAM-M pro (0.40 nm) implying that the MAM molecule slightly destabilize the M pro conformation. In the RMSF results presented in Figure 4C, the three initial residues from C-and N-terminal (both formed by loops) were not shown due to their high flexibility. Small RMSF values ($0.15 nm) were observed for the His41 and the Cys145 catalytic residues, indicating low fluctuation in the MD simulation. Overall, RMSF values higher than 0.3 nm are attributed to the regions formed by flexible loops. Regarding the M pro structure compactness, we also calculated the radius of gyration ( Figure 4B). The average radius  Except for the TET-M pro complex, the higher average SASA for other complexes indicate that herbal compounds induce the protein expansion in agreement with their higher radius of gyration. In general, the RMSD, RMSF, Rg, and SASA analyses show that the present compounds do not significantly alter the overall structure and the active site of the M pro along the 100 ns of the MD simulation.

MM/PBSA
The MD simulations show that the 6 compounds remain within the M pro active site region along the 100 ns of the trajectory. Thus, to quantify the interaction between these ligands and M pro , the binding free energy and its components were calculated with the MM/PBSA method and the results are presented in Table 2.
In Table 2, negative values of the binding free energy (DG binding ) indicate favorable interaction between the ligand and the M pro . However, the lowest energy value obtained for the TUB-M pro complex can be attributed to some artifact generated during the MM/PBSA calculation, mainly in the description of electrostatic contributions and polar solvation energy. It should be noted that the TUB compound has a þ 2 charge. On the other hand, it is known that the MM/PBSA method tends to overestimate the value of the binding free energy. The same pattern was observed in MM/PBSA calculations considering the CEP, TRI, and TET with charge of þ2. The calculated binding energies were À234.6 ± 0.9, À227.7 ± 0.7, and À231.0 ± 0.7 kcal/mol for the CEP-M pro , TRI-M pro and TET-M pro , respectively (data not presented in Table   Figure 3. Snapshots obtained at 0 and 100 ns of the MD simulation for all complexes. The herbal compounds are in sticks. The blue and magenta ball representations are the C-alpha of His41 and Cys145, respectively. The snapshots were elaborated from the complex with lowest binding free energy as predicted in the next section.  2). In these simulations, we considered MD trajectories of 50 ns. These lower values must be analysed and interpreted carefully. Recently, Patil et al. (2021) used the MM/PBSA approach to calculate the binding free energy of the amentoflavone complexed with the SARS-CoV-2 M pro (PDB code: 6WNP) and they found a value of À32.1 ± 4.7 kcal/mol. Therefore, the lower docking energy (-10.2 kcal/mol) and the lower binding free energy (-55.8 ± 0.3 kcal/mol) obtained herein indicate that the methyl amentoflavone could be more efficient to inhibit the M pro than amentoflavone. Except for WPC, the E van der Waals term is the major contributor to the binding energy in the complexes. Therefore, in the drug design from the compounds present in Table 2, the van der Waals interaction must be taken into account. Although we do not performed MD simulation for the crystallographic ligand (X77) complexed with M pro ,Weng (Weng et al., 2021) obtained a binding free energy of À34.4 kcal/ mol from the MM/GBSA method (this approach is similar to the MM/PBSA). Thus, the present herbal molecules are more efficient to inhibit the M pro than the X77. Overall, all of the selected compounds are promising candidates to inhibit the SARS-CoV-2 M pro , mainly the methyl amentoflavone and tetrandrine.
In addition, pairwise decomposition analysis ( Figure 5) was performed to quantify the contribution of each residue to the total DG binding obtained from MM/PBSA calculations.
Overall, pairwise decomposition analysis ( Figure 5) revealed that three regions with key residues are responsible for the stabilization (lower DG binding ) of the present compounds. The first region is formed by residues His41 (catalytic residue) and Val42; the second by Met165-Glu166-Leu167; and the thirty by Asp187 and Gln189. For instance, the residue Glu166 contributes with more than À3.5 kcal/mol in thePHY-M pro , MAM-M pro , and WPC-M pro complexes; and residues Asp187 and Gln189 make a significant contribution of at least À1.5 kcal/mol to the binding free energy in the MAM-M pro and TUB-M pro complexes. These interactions play an important role for the stabilization of these compounds into the M pro active site, leading to the lower binding free energies that were obtained for these complexes (Table 2). For other complexes, Figure 5 also shows favorable interactions between these residues and ligands; however, with lower magnitude. Conversely, the residues Arg40 and Lys61 present unfavorable interactions with all compounds, as predicted by positive binding free energies. In the Supplementary material (Figures S6), the 2 D interaction diagrams obtained from the snapshot at 100 ns of MD trajectory reveal that the van der Waals and hydrogen bonds are responsible for the main interactions between the present herbal compounds and M pro in agreement with data presented in the Table 2. Although the catalytic residue Cys145 has a small contribution to the binding free energy, it favorably interacts with the PHY via van der Waals force and forms Pi-Alkyl contact with other ligands (Figures S6, A). Likewise, the pairwise decomposition analysis presented in Figure 5 shows that the MM/PBSA method overestimated the interactions between the tubocurarine and M pro residues. The fact that the three different classes of molecules (physalin, amentoflavone, and BBIQ) bind to the same region (His41-Val42, Met165-Glu166-Leu167, Asp187, and Gln189) of the M pro imply that the pocket formed by these residues can be used as target for future drug design for SARS-CoV-2 M pro .

QSPR analysis -partial least square regression
A pharmacologically active drug, besides being capable of interacting with the receptor, needs to be assessed regarding its toxicity and the degree of absorption, which depends on the solubility and permeability. It is known that some physicochemical descriptors strongly influence the pharmacokinetic parameters, such as absorption, distribution, metabolism, excretion, and toxicity (ADMET) (Duchowicz et al., 2007). Molecular weight (MW), partition coefficient (clogP), water solubility (clogS), number of H-bond acceptors (H-acceptors), number of H-bond donors (H-donors), total surface area (surf area), druglikeness (Drugl) and the efficiency metrics such as ligand efficiency (LE), lipophilic ligand efficiency (LLE) and lipophilicity corrected LE (LELP) are descriptors commonly used to describe the ADMET properties. In the present study, these descriptors were correlated to the binding energy (E b ) obtained from docking of the ligands into the receptor, through partial least square regression (PLS). The best regression model for the training set was built with 3 factors describing 70.42% of the total variance, and the selected descriptors, MW, clogS, H-accept, surf area, druglikeness, LE and LLE. The regression model (Figure 6) presented the following statistics, R 2 ¼ 0.94; SEC ¼ 0.59; Q 2 ¼ 0.92; SEV ¼ 0.65, which satisfies the minimal requirements for QSAR studies (R 2 > 0.6 and Q 2 LOO > 0.5) (Kiraj & Ferreira, 2009;OECD, 2007).
The validation tests performed in the training set are shown in Figure 7. The y-randomization graphs obtained with 100 randomizations (Figure 7A-C) show that the model is free of chance correlation. Q 2 and R 2 for randomized y ( Figure 7A) were below 0.10 and 0.2, respectively, confirming that the randomized models were of poor quality, as expected. In addition, the intercepts for R 2 vs. R(y rand , y) and Q 2 vs. R(y rand ,y) y-randomization plots in Figures 7B and 7C, must be lower than 0.3 and 0.05, respectively (Eriksson et al., 2003). The values found for this model were À0.003 for R 2 vs. R(y rand , y) and À0.15 for Q 2 vs. R(y rand ,y).
LNO cross-validation ( Figure 7D) was repeated a hundred times leaving one to ten compounds out from the training set once at a time. The average Q 2 LNO values are close to the value of Q 2 LOO, and the standard deviation for each N values is small, indicating that the QSPR model is robust.
For external validation ( Figure 6) the following results were obtained: R 2 pred ¼ 0.88 and standard error of prediction, SEP ¼ 0.61. The mean percentage error for the test set was 9.23. The residuals and calculated percentage error for the training and test sets are displayed in the Table S3. The values indicated that the PLS model can be used for an approximate prediction of new compounds.
From the autoscaled regression coefficients signals, in equation 4, the binding energy is negatively correlated to MW, H-Acceptors, surface area and druglikeness, such that E b decreases (becomes more negative) when these descriptors increase. This means that the interaction energy is favored when the values of molecular mass, number of hydrogen acceptors, surface area and druglikeness increase, as expected. Observing the ligand interaction diagram for the six compounds selected as good candidates to inhibit the M pro ( Figure  2), one can see the presence of hydrogen bonds, where the ligands participate as H-bond acceptors and donors. On the other hand, E b increases (become less negative) with the increasing of cLogS, LE and LLE descriptors. Water solubility influences the absorption of a compound, so that, although solubility is a necessary feature of drugs, excessively polar compounds could be difficult to pass through the biological membranes (Duchowicz et al., 2007). From the PLS model, E b values increase with cLogS, indicating that highly water-soluble compounds are unfavorable to interactions with the receptor and that the presence of polar groups must be balanced with non-polar regions. This result suggests that the interactions with the receptor must occur through both polar and non-polar groups, besides corroborates the importance of the balance between water solubility and its ability to diffuse biological barriers.
The signals of the coefficients for the correlation between E b and the descriptors were also investigated, in order to evaluate the consistency of the model (Kiraj & Ferreira, 2010). From the coincidence of the signals for the regression coefficients in the model (eq. 3) and the signals of correlation coefficients between each respective descriptor and E b (-0.89 MW; þ0.78 cLogS À0.71 H-Acc; À0.90 surf; À0.45 druglikeness; þ0.74 LE; þ0.43 LLE), this model proved to be self-consistent.

Conclusions
SARS-CoV-2 is a new virus discovered at the end of December 2019, which is responsible by COVID-19 disease. Currently, there is no approved drug for this disease. In this way, molecular docking, molecular dynamics simulations, MM/PBSA, and QSPR methods were used herein to search for potential herbal compounds which are good candidates Figure 7. Plots of y-randomization test for 100 repetitions, where R(y rand ,y) means R (Eb randomized , E b ). (A) Q 2 vs. R 2 , where Q 2 is the coefficient of determination for LOO cross-validation. (B) R 2 vs. R (y rand ,y); (C) Q 2 vs. R (y rand ,y); (D) LNO cross-validation (N ¼ 1 to 10) for 100 repetitions.
to inhibit the SARS-CoV M pro . In our virtual screening using molecular docking, a set of 4066 compounds was used. In this calculation, 16 compounds were selected based on their lower docking energy values, which were lower than À9.7 kcal/mol. These compounds were filtered to six (physalin B 5,6-epoxide (PHY), methyl amentoflavone (MAM), withaphysalin C (WPC), daphnoline or trilobamine (TRI), cepharanoline (CEP), and tetrandrine (TET)) using the Lipinski's rule and ADMET analysis. Triplicate 100 ns MD simulations were performed for these selected compounds complexed with the M pro and for the protein in the apo form in aqueous solution. The structural analysis based on the RMSD, RMSF and Rg show that the binding of these compounds into M pro active site do not alter the overall structure of the protein along the triplicate 100 ns of the MD trajectories. The average binding free energies, calculated by the MM/PBSA method, indicated favorable interactions between the compounds and the M pro with the following values: À41.7, À55.8, À45.2, À38.7, À49.3, and À57.9 kcal/mol for the PHY-M pro , MAM-M pro , WPC-M pro , CEP-M pro , TRI-M pro , and TET-M pro complexes, respectively. In our study, the tubocurarine also was considered and the binding free energy was À227.9 kcal/mol. Likewise, similar values were obtained for the CEP, TRI, and TET with charge of þ2. Pairwise decomposition analysis and 2 D interaction diagram revealed that the residues His41-Val42, Met165-Glu166-Leu167, Asp187, and Gln189 are the ones that most contribute to the binding free energy, implying that the pocket formed by these residues can be used as a target for future drug design for SARS-CoV-2 M pro . The QSPR analysis, performed through the relationships between the ligand-receptor binding energies and the descriptors correlated to the pharmacokinetics of the drugs, proved useful. The PLS regression model was successfully validated and indicated that the interactions with the receptor can be favored by bulk compounds containing both, non-polar and polar groups with the presence of hydrogen bond acceptors. These finds are in accordance with the docking studies, besides corroborate the importance of the equilibrium between the aqueous solubility of the drug and its ability to diffuse through the biological membranes. Finally, we highlight six potential candidates to be used against COVID-19 by inhibiting the SARS-CoV-2 M pro .