Accelerating the discovery of the beyond rule of five compounds that have high affinities toward SARS-CoV-2 spike RBD

Abstract The battle against SARS-CoV-2 coronavirus is the focal point for the global pandemic that has affected millions of lives worldwide. The need for effective and selective therapeutics for the treatment of the disease caused by SARS-CoV-2 is critical. Herein, we performed a hierarchical computational approach incorporating molecular docking studies, molecular dynamics simulations, absolute binding energy calculations, and steered molecular dynamics simulations for the discovery of potential compounds with high affinity towards SARS-CoV-2 spike RBD. By leveraging ZINC15 database, a total of 1282 in-clinical and FDA approved drugs were filtered out from nearly 0.5 million protomers of relatively large compounds (MW > 500, and LogP ≤ 5). Our results depict plausible mechanistic aspects related to the blockage of SARS-CoV-2 spike RBD by the top hits discovered. We found that the most promising candidates, namely, ZINC95628821, ZINC95617623, ZINC3979524, and ZINC261494658, strongly bind to the spike RBD and interfere with the human ACE2 receptor. These findings accelerate the rational design of selective inhibitors targeting the spike RBD protein of SARS-CoV-2. Communicated by Ramaswamy H. Sarma


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that emerged in late 2019 has caused serious illness and death all over the world. COVID-19, a disease caused by SARS-CoV-2, led the world health organization (WHO) to announce a global pandemic on March 11, 2020. To date, about 380 million infections and more than five million deaths have been reported by the WHO, and the numbers continue to rise (WHO, 2021). Among several vaccine candidates, three vaccines made by Pfizer-BioNTech, Moderna, and Johnson & Johnson companies were approved by the U.S. food and drug administration (FDA) for an emergency use against veklury (remdesivir), an antiviral drug, has also been approved by the U.S. FDA for the use as a treatment for COVID-19 in adults and pediatric patients (12 years of age and older) requiring hospitalization. Although studies reported that remdesivir failed to show clinical benefits for moderately severe COVID-19 patients Young et al., 2021), it is the only drug that has been approved so far for COVID-19 by the U.S. FDA. Therefore, accelerating the discovery of a safe and an effective COVID-19 medication is a must to control the pandemic.
The coronavirus genome encodes distinct structural and nonstructural proteins. The structural proteins (i.e. the membrane, the envelop, the spike, and the nucleocapsid) are responsible for key functions such as host infection , membrane fusion , self-assembly , release of virus-like particles (Siu et al., 2008), and other functions (Mittal et al., 2020). The nonstructural proteins facilitate viral replication-transcription processes (Haque et al., 2020). The extensive amount of ongoing research is mainly focused on both nonstructural and structural proteins of SARS-CoV-2 as drug targets in order to develop effective therapeutics for COVID-19 (Abu-Saleh et al., 2020;Awoonor-Williams & Abu-Saleh, 2021;Izda et al., 2021;Li et al., 2020;Liu et al., 2021;Mulholland & Amaro, 2020;Zhang et al., 2020). Among the structural proteins, the spike protein is the key machinery that empowers virus entry into the host cell (Wrapp et al., 2020). Structural characterization of the spike protein would give atomic-level information to guide structure-based drug design. Recently, crystallographic and cryo-EM methods have been utilized to determine various structures and conformational states of the SARS-CoV-2 spike proteins Wrapp et al., 2020). Amaro and coworkers have carried out massive molecular dynamics (MD) simulations to promote and extend the available structural data (Casalino et al., 2020). Moreover, Choi et al performed MD simulations of the spike protein in a viral bilayer (Choi et al., 2021). These computational results added a detailed insight on the structural and dynamics of the full-length glycosylated SARS-CoV-2 spike protein at an atomic level (Casalino et al., 2020).
Detailed mechanisms of the cell entry of SARS-CoV-2 have been recently investigated Tang et al., 2021;. The spike protein has two subunits S1 and S2, which are critical for receptor recognition and membrane fusion, respectively. The S1 contains the receptor-binding domain (RBD) of the viral spike that specifically binds to the human angiotensin-converting enzyme 2 (hACE2). The RBD can be in either an up conformation, which is the receptor-accessible state (i.e. enables binding to the host receptor), or a down conformation, which is the receptor-inaccessible state Wrapp et al., 2020). Gur et al investigated conformational dynamics and the transition pathway between down to up states of the spike protein by using all-atom MD simulations (Gur et al., 2020). For membrane fusion, host cell proteases cleave SARS-CoV-2 spikes at the S1/S2 boundary followed by structural changes of the S2 that promote a host cell entry (Borkotoky et al., 2021;Letko et al., 2020;. Interestingly, experimental results have revealed that SARS-CoV-2 RBD has stronger hACE2 binding affinity than that possessed by the previous SARS-CoV RBD. However, the entire SARS-CoV-2 spike has hACE2 binding affinity similar to or lower than SARS-CoV-2 spike (Ou, 2020;Walls et al., 2020). Moreover, a computational study of the RBD-accessible and -inaccessible conformations and their binding strength to the hACE2 is in agreement with the experimental findings . Therefore, the RBD is considered as a key target for designing and developing compounds that suppress the virus entry into the human cell.
As far as we are aware, no in silico screening has been conducted in relatively large molecules in the 'beyond rule of 5' chemical space that have high RBD binding affinity. Herein, we performed a comprehensive computational protocol, which incorporates molecular docking, MD simulations, absolute binding energy calculations, and steered MD simulations for the discovery of relatively large compounds that would bind to SARS-CoV-2 spike RBD very tightly, therefore, blocking viral attachment to the host cell.

Results and discussion
We performed an integrated computational strategy including molecular docking, MD simulations, absolute binding energy calculations, and SMD simulations to accelerate the identification of promising drug candidates for COVID-19. 1283 compounds (in-clinical trials and FDA approved drugs) were retrieved from the ZINC15 database (Sterling & Irwin, 2015) and advanced to flexible molecular docking using both AutoDock Vina (Trott & Olson, 2010) and Kdeep (Jim enez et al., 2018) scoring functions. Consensus docking scores were chosen as an initial measure of the RBD-ligand binding affinities to rank the poses of the screened compounds. Docking results of the top 15 compounds are listed in Table 1. The two-dimensional structures and the threedimensional docked conformations of these compounds are also provided in the SI (Figures S1 and S2, respectively). Docking results of all screened compounds are provided in the SI.
The best 15 RBD-ligand complexes from molecular docking were used as starting structures for the conventional MD simulations. Ensemble MD simulations consolidate the effect of solvation, flexibility of both ligand and protein target, and better estimate of protein-ligand nonbonded interactions. The structure stability of the RBD-ligand model systems were evaluated by the RMSD with respect to the equilibrated structures (see Figure S3 in the SI). The time-evolution of the RMSD show that most ligands with invalid binding modes reveal significant signs of binding instability in the RBD binding pocket. Snapshots of the last frame of the stable simulated RBD-ligand model systems are depicted in Figure 1.
The use of absolute binding energy calculations is crucial to enhance the hit identification. Consequently, rigorous absolute binding energy calculations were only conducted on the stable RBD-ligand systems resulting form the MD simulations. In absolute binding energy calculations, the resulting contributions from the geometric restraints (see Methods' section) are computed sequentially using the extended adaptive biasing force (eABF) method (Fu et al., 2016) with the unbiased corrected z-averaged restraint (CZAR) estimator (Lesage et al., 2017)   binding energy (DG bind) of ligands to the SARS-CoV-2 spike RBD protein are provided in Table 2.
Unambiguously, the four ligands, namely, ZINC95628821, ZINC95617623, ZINC3979524, and ZINC261494658 bind spontaneously to the SARS-CoV-2 spike RBD with DG bind of À23, À19, À12, and À8, respectively. In contrast, the two ligands, namely, ZINC150339052 and ZINC3978005 have non-spontaneous binding behavior toward the RBD with G bind ¼ 3 and 4 kJ mol À1 , respectively. These values represent the noncovalent interaction energies between the ligands and the spike RBD and give a good estimate of their binding affinities.
To validate the potency of the RBD-ligand binding affinities, we need to investigate if the top hits are able to interfere with the RBD-hACE2 binding event. For this purpose, SMD simulations were employed for distinguishing active from decoy ligands. Both strong and weak ligands listed in Table 2 were considered for SMD simulations as a validation strategy. In SMD simulations, the center of mass of the RBD was kept fixed (i.e. PHE400 of the RBD is an anchoring residue), whereas the center of mass of the hACE2 was steered (i.e. TYR515 of the hACE2 is a pulling residue). Figure 2 shows results of 20 ns SMD simulations for the RBD-hACE2 in the presence and absence of ligands.
The initial and the equilibrated conformations of RBD-ligand-hACE2 model systems are also provided in the SI, Figure S5. The time-evolution force profile of the unbinding event of the RBD-ACE2 model system has a maximum rupture force of 704 pN, in a good agreement with the computed maximum rupture force of 751 pN reported recently (Nguyen et al., 2020). Taka et al also reported the unbinding process of SARS-CoV-2 spike RBD from the hACE2 by carrying out SMD simulations (Taka et al., 2021). It is well established that the rupture force is directly proportional with the receptor-ligand binding affinity (Vuong et al., 2015). The moving average (black line) of the time-evolution force profiles in   Figure 2b show that ligands, namely, ZINC95628821, ZINC261494658, and ZINC95617623 are strong binders to the RBD protein. This is anticipated due to the fact that these ligands have relatively better binding affinity to the RBD than other ligands, namely,  ZINC150339052, and ZINC3978005. On the contrary, during the SMD simulations, ZINC3979524 and ZINC3978005 left the RBD and attached to the hACE2 receptor, revealing that these ligands have more favorable interactions to the hACE2 than the RBD. SMD simulations also revealed that ZINC150339052 has non promising binding affinity toward neither the RBD nor to the hACE2 proteins (see Figure 2b).
Detailed analyses of the ligand interaction diagram of the equilibrated RBD-ligand-hACE2 complexes are shown in Figure 3.
Ligands that exhibit robust RBD binding stability during the SMD simulations, namely, ZINC95628821, ZINC261494658, and ZINC95617623, indeed form more noncovalent interactions (i.e. hydrogen bonding, pi-cation, pi-pi stacking) with key amino acid residues of the RBD and enjoy more exposure to the RBD binding pose than other ligands. Notably, these robust binders have direct polar contact with ARG403 of the RBD. Moreover, these ligands also interfere with key residues of the hACE2 protein (i.e. ASP38, TYR41, and LYS353). Interestingly, ARG403 of the RBD and residues ASP38, TYR41, and LYS353 of the hACE2 were reported as true hot spots that contribute to the stability of the RBD-hACE2 interface (Laurini et al., 2020). Many studies reported that hydrophobic interactions also play a critical role in anchoring the RBD to the hACE2 receptor (Taka et al., 2021;. All ligands that possess spontaneous standard binding energies have at least four direct contacts with the RBD hydrophobic residues (see Figure 3). The polar and non-polar interactions of the RBD-ligand complexes contribute to their total standard binding energy. Based on rigorous binding energy calculations and SMD simulations, we suggest that the top drug candidates are ZINC95628821 followed by ZINC95617623, ZINC3979524, and ZINC261494658. These top hits certify critical interactions associated with the RBD-ligand-hACE2 binding, which in turn disrupt the RBD-hACE2 binding event.

Conclusions
In this work, we performed hierarchical virtual screening that comprise of molecular docking, MD simulations, non-covalent absolute binding energy calculations, and SMD simulations for the discovery of SARS-CoV-2 spike RBD-based inhibitors. Vina (Trott & Olson, 2010) and Kdeep (Jim enez et al., 2018) scoring functions were leveraged for structure-based screening and ranking of in-clinical trials and FDA approved drugs. Absolute binding energy calculations in combinations with the SMD simulations provided more effective ranking procedures of the top hits. Based on integrated in silico approach, our findings suggest that ZINC95628821, ZINC95617623, ZINC3979524, and ZINC261494658 have potential antiviral effects by blocking the RBD-hACE2 recognition. Clinical trials have shown that ZINC3979524 is a novel inhibitor of a respiratory syncytial virus with the EC50 of 1.4 nM, and thus, this compound along with the other drug candidates suggested herein need to be considered for further in vitro and in vivo investigation. This study provides a rational drug design of RBD-based inhibitors for accelerating the research and development of effective COVID-19 medications.

Model system preparation
The initial structure of SARS-CoV-2 RBD bound to the hACE2 was retrieved from the RCSB protein data bank [PDB entry 6LZG] . The protonation states of the titratable residues were determined using the Hþþ Web server under the physiological pH (Anandakrishnan et al., 2012). Moreover, salinity, internal and external dielectric constants were set to be 0.15, 10.0, and 80.0 respectively.

Molecular docking
3-D chemical structures (MW > 500, and LogP 5) were retrieved from ZINC15 database (Sterling & Irwin, 2015), resulting in about 0.5 million protomers. Among these protomers, only in-clinical trials and FDA approved drugs (a total of 1283 protomers) were selected for docking. OpenBabel software (O'Boyle, 2011) was used to minimize selected compounds. These compounds were docked into a preprocessed SARS-CoV-2 RBD protein pocket by utilizing the AutoDock Vina (Trott & Olson, 2010). For improving the reliability of docking, these compounds were also docked using the BindScope web application (Skalic et al., 2019). The xyz grid cell origin of À37.0 Å, 30.0 Å, and 3.5 Å with dimensions of 26.0, 45.5, and 22.0 in the x, y, and z directions, respectively, was built for docking calculations. The grid cell was constructed at key residues located within receptor binding motif of the RBD that are directly involved in the interaction with the hACE2. The spike RBD-ligand binding affinities were ranked based on a consensus docking results from two scoring functions Vina (Trott & Olson, 2010) and 3D-convolutional neural networks method (Kdeep) (Jim enez et al., 2018). Consensus docking scores were chosen as the measure of protein-ligand binding affinities to rank the poses of the studied compounds. For the next virtual screening, we selected the top 13 compounds that have a consensus docking score of À38 kJ mol À1 (À9 kcal mol (WHO, 2021)). Moreover, we selected one compound from each scoring functions Vina (Trott & Olson, 2010) and Kdeep algorithm (Jim enez et al., 2018) that has the highest protein-ligand binding affinity. All molecular docking results are provided in the supporting information (SI).

MD simulations
All-atom MD simulations were done using the NAMD 2.13 package (Phillips et al., 2005). We performed MD simulations on the model systems with top docking results (total of 15 RBD-ligand complexes). Model systems of the RBD-ligand complexes were prepared in an explicit solvent using the TIP3P water model (Jorgensen et al., 1983) in a simulation cell with dimensions of 86 Â 71 Â 96 Å 3 . All crystallographically resolved water molecules were removed from model systems. The parameters for the ligands and the spike RBD protein model structures were all set using the CHARMM general force field (CGenFF) (Yu et al., 2012), and the CHARMM36 force field (Best et al., 2012), respectively. Model systems were neutralized using chloride ions. The MD protocols involve minimization, annealing, equilibration, and production. The protein backbone atoms were restrained in the minimization and annealing simulations. The Ca atoms of the protein backbone were restrained in the 1 ns equilibration simulations. For the 100 ns MD production simulations, no atoms were restrained. The isothermal-isobaric (NPT) ensemble and a 2 fs time-step was chosen for all MD simulations. The pressure was set at 1 atm using the Nos e-Hoover Langevin piston barostat (Nos e, 1984;Nos e & Klein, 1983) with the Langevin piston decay of 0.2 ps and a period of ps. The temperature was set at 298.15 K using the Langevin thermostat (Grest & Kremer, 1986) and a damping frequency of 1 ps À1 . A cutoff distance of 10.0 Å was applied for Lennard-Jones interactions with a pair list distance of 12 Å, interactions were smoothly truncated at 8.0 Å. The particlemesh Ewald (PME) method was utilized to treat the longrange electrostatic interactions (Essmann et al., 1995), where a grid spacing of 1.0 Å was used for all simulation cells. Covalent bonds involving hydrogen atoms were constrained using the SHAKE algorithm (Ryckaert et al., 1977). See the SI for more information about the MD simulations protocol.

Absolute binding energy calculations
One of the main goals in computer-aided drug discovery is the accurate and reliable prediction of the absolute binding energy of receptor-ligand systems. Here, we calculated the absolute binding energies of the stable RBD-ligand complexes that resulted from the MD simulations. Starting from the equilibrated RBD-ligand structures, we conducted the absolute binding energy calculations using the potential of mean force (PMF) approach as described by Chipot and coworkers (Fu et al., 2017(Fu et al., , 2018. It has been reported that the PMF approach accurately predicts protein-ligand binding affinities (Fu et al., 2017;Gumbart et al., 2013). A set of geometrical restraints on collective variables (Fiorin et al., 2013) are exploited to accurately determine the translational, rotational, and conformational entropies that accompany the binding process and, thus, computing protein-ligand absolute binding energies. These restraints include the root mean-square deviation (RMSD) of the ligand in the bound (site) and unbound (bulk) states compared to an equilibrated reference structure of the RBD-ligand complex. Additional restraints include three Euler (H, U, W) and two spherical (h, u) angles that describe the relative orientation and position of the ligand with respect to the RBD protein, respectively, see Figure S4 in the SI. The equilibrium binding constant (K eq ) can be determined as, where the DG site c and DG bulk c denote the Gibbs energy changes of the conformational ligand RMSD in the site and bulk states, respectively, the DG site o and DG site a represent the Gibbs energy changes associated with the orientational and the positional of the ligand in the binding site, respectively, and the DG bulk o represents the Gibbs energy change associated with the ligand orientation in the bulk. The last term in Equation (1) represents the separation of the ligand from the binding site into the bulk. It should be mentioned that the DG bulk o is calculated analytically as reported by Chipot and coworkers (Fu et al., 2017). More details on theoretical background of the binding energy calculations using the PMF route can be obtained elsewhere (Fu et al., 2017). Eventually, the standard binding energy is where the k B and T are the Boltzmann constant and simulation temperature, respectively, and C ¼ 1/1661 Å 3 is the standard one molar concentration. For each model system, the RBD-ligand separation, RMSD (site), RMSD (bulk), and angular course variables were run for 50 ns, 50 ns, 100 ns, and 50 ns, respectively. By leveraging GPU-accelerated NAMD engine (Phillips et al., 2020), the simulations were performed in triplicate, yielding an aggregate total of 750 ns simulation time for each model system. Absolute binding energies reported in this study are the averages of the triplicate runs. See the SI for more details about the absolute binding energy protocol.

Steered molecular dynamics (SMD) simulations
To probe the selectivity and the potency of the top hits binding to the spike RBD protein, SMD simulations of the RBD-ligand-hACE2 model systems were conducted. The RBD-ligand-hACE2 model systems were solvated using a TIP3P water model with a 12 Å buffer around the system and then extended 30.0 Å in the SMD pulling axis, resulting in a simulation cell with dimensions of 121 Â 121 Â 167 Å 3 . The SMD simulations were employed by harmonically restraining the position of the spike RBD protein center-ofmass, and moving the second restraint center-of-mass of the hACE2 protein, with a constant pulling speed of 1.5 Å ns À1 in the z-axis direction. More details about the SMD protocol are provided in the SI.