Novel drug design framework as a response to neglected and emerging diseases

Abstract A new de novo drug design procedure, applicable in the circumstances of scarcity of experimental data, is proposed. The proposed method integrates de novo and fragment-based drug design approaches. This protocol is uncomplicated, uses openly available resources and can be used to design new molecules that would inhibit the newly discovered or under researched targets. The new potential inhibitors are modelled based on energy minima of small molecules in the binding site. The designed structures are modified following an in-house developed, unambiguous algorithm. These modifications improve predicted binding affinities and pharmacokinetic properties of the designed compounds. The suitability of the proposed drug design approach is confirmed by molecular dynamics simulations, and analysis of both binding modes and binding energies of drug candidates (calculated using MM/PBSA method). Lastly, alternative retrosynthetic approaches to the drug candidates are proposed. Communicated by Ramaswamy H. Sarma


Introduction
Multiple drug targets are limitedly researched, or are not research fast enough (Schnorrenberg, 2018). Some of the neglected tropical diseases might serve as an example because they afflict a large part of the humanity, but are not sufficiently investigated for various reasons (such as limited finances, limited geographical accessibility or ongoing wars in the diseaseafflicted regions) (Hotez & Pecoul, 2010). Conversely, new target proteins are also discovered every year. For example, it has been pointed out that only about 3% of all human proteins have been subjected to drug design. This leaves a lot of space for the new research (Overington et al., 2006). Lastly, new diseases emerge in a fast rate.
The 'traditional' drug design method is based on a selecting of the most potent molecule from a set the existing molecules after their screening against a target (Prieto-Mart ınez et al., Shurtleff et al., 2012). This approach is time and cost consuming and/or limited by the size of the available databases of the existing molecules. The de novo approaches (i.e. the ligand-based approach and the structure-based de novo approaches) facilitate finding drug candidates without explicit examination of all possibilities (Lounnas et al., 2013). They can largely alleviate the costs of experimental research and increase the speed of the design procedures. In the structurebased de novo procedure, the binding site is first mapped, and later, the best-fitting drug candidate is proposed. The drug candidate is usually a completely new structure, previously unknown to the science (Schneider & Clark, 2019). The benefits of this feature become evident on consideration of the number of druglike molecules, estimated in the range of 10 12 À10 63 (Gorse, 2006). Through the recent years, the de novo approach was increasingly gaining in recognition because of efficient resolution of the problem of chemically impossible solutions and alleviation of the threat of combinatorial explosion (Fischer et al., 2019).
The current leading de novo methods relied heavily on the existing experimental data. The majority of existing procedures included the steps of growing, recombination or linking of the fragments of existing inhibitors of a target (Kutchukian & Shakhnovich, 2010;Schneider & Clark, 2019). Therefore, in addition to the crystal structure of the target protein, the crystal structures of the protein-ligand complexes were usually needed to complete a design procedure. We perceived that this creates a somewhat paradoxical situation, i.e. the already discovered inhibitors of a certain receptor were needed to discover inhibitors of this very receptor. Thus, the de novo methods seemed to be impossible to employ in case of neglected, emerging or newly discovered drug targets. Several solutions of this issue were proposed. For example, the lattice-based de novo design strategy does not rely on large quantities of experimental data (Schneider & Fechner, 2005). However, this approach is currently limited to the design of aliphatic structures. It was suggested that significant advancements would be necessary to improve its usability (Bywater et al., 2004). Alternatively, several of the more recent pieces of software introduced the option of random generation of seed for de novo linking (Yuan et al., 2020). However, it was suggested that an overly reliance on stochastic search might limit the design efficiency. Consequently, the introduction of knowledge-based elements to the procedure is desirable (Hartenfeller & Schneider, 2010;Schneider & Fechner, 2005).
Inspired by the CONCERTS software developed in the early ages of the de novo design (Pearlman & Murcko, 1996), we envisioned that some elements of fragment-based design could be introduced to the procedure. The starting seed for the design could be generated using computationally determined energy minima of chosen ligands bound to the receptor. This would enable the generation of seed structures in a very efficient way. The seed structures would be linked by the de novo software into bigger entities. The employment of the evolutionary de novo algorithm would also result in elimination of less prospective entities on early stages of the procedure (Devi et al., 2015;Douguet et al., 2000). Lastly, the de novo software would optimize and output the best solutions. The only experimental data necessary to conduct this procedure would be the crystal structure file of the receptor. Consequently, such procedure would be a novel resourcea drug design method applicable in the circumstances of scarcity of experimental data. Such protocol could be used to design the possible drug candidates for the 2019 coronavirus disease that has emerged only recently. More specifically, it could be used to design the inhibitors of the main protease (M pro ) of severe acute respiratory syndrome corona virus 2. This viral protein that has been suggested as possibly the most attractive drug target in SARS-CoV-2 treatment (Ullrich & Nitsche, 2020). However, it was discovered approximately two years ago, and therefore, is relatively new to science (Liu et al., 2020). Consequently, it would be difficult to design an inhibitor of this protein utilizing data-reliant drug design approaches.
Since number of applicable protocols is limited, a new insight into the design methods is needed, that would respond to the current circumstances resulting from SARS-CoV-2 pandemic.
Moreover, we felt that it would be necessary to verify the synthetic feasibility of the proposed drug candidates using an openly available tool, e.g. Spaya online retrosynthetic server (Spaya, 2020). In this manner, our new protocol consisting of generation, verification, modification and retrosynthetic authentication of the de novo solutions could become a tool for creating libraries of potential drug candidates, consisting of completely novel, previously unknown chemical entities. Furthermore, such protocol would be applicable for the design of inhibitors of newly discovered proteins, or drugs targeting the tropical or emerging diseases.

Results and discussion
2.1. The protocol for in silico assembly and evaluation of the potential inhibitors The proposed protocol featured a structure-based in silico assembly of a potential inhibitor from molecular fragments localized in the binding site ( Figure 1). The protocol consisted of six main stages.
2.1.1. Stage 1. Preparation of seed fragments for the assembly procedure First, energy minima of molecules from a fragment library were identified. The fragments used belong to three groups: very small fragments (e.g. water, ammonia), rigid linkers (e.g. benzene, thiophene) and other fragments of diverse activity (e.g. sulphonamide, uracil). For criteria of the fragment choice, as well as list of all the fragments used, see the Supporting Information. A major improvement to the existing approaches (Kutchukian & Shakhnovich, 2010;Schneider & Clark, 2019) was generation of the seed by computational identification of binding poses of small molecules. Our procedure is applicable in case of neglected, emerging or newly discovered drug targets since an analysis of experimentally measured protein-ligand crystal structures was not employed for the seed fragments preparation.

Stage 2. De novo assembly of the candidate molecules
This stage was realized in 15 cycles. In the first cycle, the seed fragments were randomly linked, grown or mutated, yielding a pool of solutions. In each next cycle, these actions were repeated with the best solutions from the previous cycle. Consequently, the process resembled the process of biological evolution (Yuan et al., 2020). The process was repeated until it resulted in approximately 190,000 assembled molecules with predicted IC 50 lower than 0.1 lM.

Stage 3. Preliminary selection of the candidate molecules
The topological polar surface area (TPSA, strongly correlate with drug bioavailability (Ertl, 2007;Martin, 2005)) was used to predict pharmacokinetic properties of the candidate molecules. Next, the predicted binding affinity and predicted pharmacokinetic properties of the candidate molecules were evaluated.

Stage 4. In silico structural modification of the candidate molecules
The modifications aimed at increasing binding affinity of the candidate molecules and improving their predicted pharmacokinetic properties. In oppose to the existing approaches (Friedrich et al., 2016;Rodrigues et al., 2013), the de novo solutions were modified algorithmically, rather than in a caseby-case manner. Owing to implementation of our in-house algorithm the process was time efficient. Moreover, its quality was consistently high and the results were replicable. Hence, the algorithmic structure of this procedure could allow its future automation. This process is described in detail in the Discussion.

Stage 5. Molecular dynamics simulations of potential drug candidates
The simulations aimed at investigation of binding poses and binding energies of potential drug candidates with account to the solvent and dynamic effects. The energy of the protein-ligand complexes was calculated using MM-PBSA method (Homeyer & Gohlke, 2012). Each of the protein-ligand candidate complexes was simulated in three independent, 100 nanosecond molecular dynamics (MD) simulations. The three molecules yielding the most stable complexes were chosen as potential drug candidates. The drug candidate structures and simulation results are described in detail in the Discussion.
2.1.6. Stage 6. In silico retrosynthetic analysis Spaya (2020) online openly available retrosynthetic server was employed (Sun & Sahinidis, 2022). The results revealed synthetic feasibility our de novo developed potential drug candidates. This analysis is described in detail in the Discussion.

The algorithm of in silico structural modification of the candidate molecules
It was reported that in most of the de novo drug design procedures, de novo search was usually a component of a larger workflow. Specifically, the de novo solutions were often modified to increase their binding affinity, alleviate the problem of chemically impossible motifs, or increase synthetic feasibility of proposed compounds (Schneider & Clark, 2019). It was observed that those modifications were usually introduced in a case-by-case manner, without a clear and easily replicable algorithm behind them (Friedrich et al., 2016; Rodrigues et al., Figure 1. The proposed protocol for the de novo drug design shown on the example of the best potential drug candidate. 2013). It was suggested that, while not necessarily detrimental on small scale, this would become a significant difficulty while scaling up the procedure (Kutchukian & Shakhnovich, 2010). Taking into account the envisaged number of potential drug candidates, our project could be considered as conducted on relatively small scale. However, a development and examination of an algorithm applicable on a large scale was one of our research goals. For the complete algorithm, see the Supporting Information. Briefly, our devised algorithm prioritized compliance with the criteria of chemical correctness and the criterion of bioavailability (TPSA < 140 Å 2 ). When the drug candidates complied with these two criteria, they were modified to increase their binding affinity. The modifications were effectuated mainly by heteroatom exchange and removal. The direction of the exchanges was guided by the algorithm, as well as by the cluster analysis conducted beforehand. To avoid a combinatorial explosion, the modifications were introduced consequently, building a nondivergent stream of modifications, rather than a decision tree. Ultimately, the structures resulting from every modification step were scored and the influence of every modification was reviewed. Based on this review, an attempt was made to propose an even better modification, e.g. by omitting detrimental steps. It was observed that the influence of subsequent steps could be investigated separately, because both descriptors considered in the modification process (TPSA and score) were almost additive (Wang et al., 1998). This measure greatly improved time-efficiency of the process.
The modification process was exemplified in Figure 2. Structure 1 was chemically correct and had a high binding affinity, but its TPSA was higher than 140 Å 2 . We introduced the modifications in two different manners: by route (1) or by route (2). First, at the beginning of route (1), an attempt was made to reduce the TPSA by substituting the -NH 3 þ group with an -OH group (modification (1a)). This modification resulted in a limited decrease of the TPSA value (by 7.41 Å 2 ) and a significant decrease of binding affinity (by 0.88 score). Therefore, the subsequent replacement of two triazole nitrogen atoms by carbon atoms was introduced (modification (1b)). Modification (1b) resulted in a significant decrease of TPSA (by 25.78 Å 2 ) and an increase of the score (by 0.24). Consequently, it was envisioned that it would be possible to introduce the modifications in a different manner (route (2)): by implementing the replacement of the triazole nitrogen atoms by the carbon atoms, while omitting the detrimental replacement of the -NH 3 þ group by an -OH group (modification (1a)). The TPSA of such modified structure 2a was calculated to be equal to 139.01 Å 2 , which was slightly below the limit. Its binding affinity was equal to 7.33 score, better than the parent structure 1.
Overall, 25 molecules of varying chemical structures were subjected to modifications using the above-presented algorithm. Among them, one solution both had acceptable TPSA value and was chemically correct. Its structural modification improved the score by 0.43 and increased the TPSA by 20.23 Å 2 . However, 24 structures from the starting pool did not comply either with the criterion of TPSA or the criterion of chemical correctness before the modification process. The modification of these 24 structures in accordance with our algorithm resulted in successful correction of 23 of them (96%). Furthermore, in 13 of those 24 cases (54%) we fixed the occurring problems with the TPSA value or chemical correctness and improved the score of the designed molecules by a mean value of 0.33 score. The correction process did not give the expected results in one of the 25 starting structures. Consequently, the proposed algorithm was useful in modification of drug candidates resulting from structurebased design. Moreover, the algorithmic structure of this procedure could allow its future automation.

MD simulations and drug candidate structures
In order to maintain high efficiency of the design, the de novo software used in our study relied on a number of approximations (Yuan et al., 2020): e.g. the binding site was treated as rigid, the dynamic effects were omitted and the solvent was modelled implicitly. Consequently, MD simulations of the protein-ligand candidate complexes were conducted to evaluate the drug candidates and investigate their binding poses with account for the solvent effects and dynamic effects.
Ten best scoring ligand structures from the previous steps were subjected to the MD simulations. Each of the protein-ligand complexes was simulated in three independent, 100 nanosecond runs. The energy of the protein-ligand complexes was calculated using MM-PBSA method (Homeyer & Gohlke, 2012). The structures of the simulated ligands, as well as their MM-PBSA computed binding energies are shown on Figure 3. The calculated energy of the complexes differed from À172.9 kJ/mol to À73.2 kJ/mol. The energies of the three most stable complexes were very similar (-172.9, À172.5 and À169.8 kJ/mol for molecules 3, 4 and 5, respectively). They were lower than the energy of the complex with the native ligand (-151.4 kJ/mol). IC 50 of the drug candidates were predicted using the scoring function of the de novo design software because in the MM-PBSA method the binding energy tends to be underestimated (Misini Ignjatovi c et al., 2016). The predictions were 38.9 nM, 29.5 nM and 38.0 nM for compounds 3, 4 and 5, respectively. A brief visual inspection of our drug candidates revealed three structural similarities with the previously reported SARS-CoV-2 M pro inhibitors, i.e. (a) the presence of a four-branched skeleton enabling binding to four subcavities of the binding site, (b) analogous distribution of the hydrogen bond donors and acceptors within the molecule skeleton and (c) a small  10 10 4 8 9 5 5 7 6 9 1 3 7 8 9 8 9 1 2 9 7 9 10 6 7 11 9 9 a The variant consisting of less synthetic steps.
branch capable of penetrating the narrow subcavity within the binding site (Mengist et al., 2021). These similarities confirmed the suitability of our drug design protocol. Moreover, skeletons of the proposed drug candidates consisted mostly of carbon atoms, and the enzyme-linking functions were located on their outer shell.
The RMSD plots derived from the simulations revealed that each examined protein-ligand complex underwent conformational changes in relation to its original conformation. However, none of the complexes dissociated. Each of them eventually stabilized. This observation confirmed the correctness of our protocol, as well as suitability of the designed molecules. Each of the ligands retained certain conformational freedom in the binding site. The observed conformational freedom could be due to the fact that the binding cavity of the SARS-CoV-2 M pro is not completely buried within the protein. The de novo proposed binding poses and the binding poses derived from the simulations were in good agreement (by a visual inspection). On the example of molecule 7 (Figure 4), while the orientation of specific moieties in the binding site differed between the poses, the orientation of the whole ligand was conserved. This result also suggested that each branch of the ligand was selectively recognized by one of four subcavities of the binding site.
Structural analysis of these complexes revealed that the three best drug candidates were all significantly buried within the cavity. Similarly to the previously reported M pro inhibitors (Singh et al., 2020), all of our drug candidates formed strong interactions with Leu27, His41, Met49, Gly143 and Cys145 residues. This observation suggested the correctness of our protocol and suitability of the designed molecules. Additionally, all of the inspected drug candidates formed strong interactions with 165Met, 187Asp and 189Gln. This was due to the presence of a ligand fragment buried deeply within the protein.

Synthetic perspectives
The in silico retrosynthesis attracted a lot of attention in the recent years (Sun & Sahinidis, 2022). We used the openly available Spaya (2020) online retrosynthetic server to evaluate the synthetic feasibility of our drug candidates. The analysis of compounds 2a and 3-11 was carried out in two variants (Table 1, where variant (1) was defined as that consisting of less synthetic steps). For retrosynthesis of all drug candidates, see the Supporting Information. On example of compound 7, variant 2 ( Figure 5). Variant 2 included more retrosynthetic steps than alternative variant 1. However, it was chosen as the exemplary analysis, because it clearly displayed certain characteristic features of all analyzed compounds. First, similarly as in the other considered procedures, the assembly of the target structure (compound 9) started from a heteroaromatic core (compound 12). In subsequent steps, the core 12 was enriched with successive carbon branches bearing heteroatom functions crucial for ligand-enzyme interactions (steps APD-FGA 1 , APD-FGA 2 , APD-FGA 3 ) or precursors of those functions (steps FGI 1 , APD-FGA 4 , APD-FGA 5 , APD-FGA 6 ). The retrosynthetic sequence contained also the steps of protection of susceptible moieties (FGP) or removal of moieties after branch introduction (FGA). Introduction of certain branches resulted in generation of stereocenters (steps APD-FGA 2 or APD-FGA 4 ). The conducted analysis supported our hypothesis that the de novo proposed potential drug candidates were synthetically feasible. Moreover, it confirmed that the retrosynthetic analysis of druglike molecules could be successfully conducted using openly available tools.

Cross-validation of the drug design protocol
The general character of our protocol was checked by generating a set of inhibitors of other molecular target, H 1 histamine receptor (3RZE PDB structure (Shimamura et al., 2011)). The minimal PKD value was increased to 8.50 and the molecular weight threshold of output molecules was changed to [250,400]. The growing sequence has been changed to '90009311215333442'. Approximately 25,000 structures were designed and analyzed by LigBuilder Cluster module. The structures were grouped into clusters according to the positions of their carbon atoms (456 clusters). Fifty clusters including 75% of all ligands were inspected. The clusters including compounds structurally similar to the existing inhibitors of H 1 histamine receptor were identified. The numbers of compounds present in these clusters were summated. At least 25% of all designed molecules bore significant similarity to the FDA-approved antihistamine agents from alkylamine or ethanolamine classes (Simons, 2004). Structural comparison of the representative antihistamine designed by our procedure with the FDA-approved antihistamine is shown on Figure 6.

Seed generation
7BQY PDB structure (Liu et al., 2020) was chosen as the receptor file. The solvent and the ligand have been removed, and the binding cavity has been identified using LigBuilder v.3.0 Cavity module (Yuan et al., 2020). A total number of 61 fragment molecules were chosen from the LigBuilder fragment library according to the criteria described in the Discussion. The fragments were converted from the .mol2 format to the .pdbqt format using Raccoon v.1.2 plugin (Forli et al., 2016). The docking was done using AutoDock Vina v.1. 1.2 (Trott & Olson, 2010). The grid box was set to fully encompass the binding site (24 Â 20 Â 24 Å, center: x ¼ 11.6, y ¼ 1.7, z ¼ 22.1). The following settings were employed: exhaustiveness, 64; maximal number of modes, 20; and the maximal energy range, 1 kcal/mol. The docking settings were checked by redocking the native ligand present in the PDB structure. The obtained pose was in acceptable agreement with the native pose. The output files from docking the seed were later converted to the .mol2 format using PyMol (The PyMOL Molecular Graphics System). If the output contained two closely identical binding poses (i.e. there was less than 1 Å of distance between the corresponding atoms), the pose of lower energy was being chosen for further analysis. Missing hydrogen atoms were added using Avogadro toolbox (Hanwell et al., 2012). The files were then passed through the LigBuilder seed extractor module on its default settings.

De novo design of the inhibitors
LigBuilder v.3.0 was used in the de novo design. The program worked in the 'linking' design mode with the set of fragments prepared as above. Unless stated otherwise, the program settings were kept default. The minimal PKD value (predicted negative logarithm of protein-ligand binding constant) was increased to 7.00 and the molecular weight threshold of output molecules was changed to [250,500]. One of the program's own libraries, 'rotate_drug_40', was used for growing. Several additional structural limitations were implemented: thiol, thioether, ester and a,b-unsaturated carbonyl moieties were added to the forbidden fragment library. Halogen atoms have also been disallowed. Accordingly, the growing sequence has been changed to '2092921121533340'. The software was run in the 'Continue' mode. The calculation was performed on a Intel(R) Pentium(R) CPU B960 @ 2.20 GHz and ran for 250 CPU hours. Following the calculation, LigBuilder was run subsequently in the 'Process', 'Filter' and 'Score' modes. The procedure returned a set of approximately 200,000 structures of unverified chemical correctness, but with specified positions in the binding site. Therefore, the output could be considered a set of data, rather than a set of chemical entities. Consequently, it seemed more appropriate to call the resulting structures 'solutions', rather than 'compounds'.

Postprocessing of the de novo designed solutions
OpenBabel v.3.0.0 (O'Boyle et al., 2011) was used to calculate InChI identifiers of the solutions. If two solutions had identical identifiers, the one with lower score was rejected. Consequently, conformers were considered the same molecule and optical isomers were considered different ones. The molecules with lowest scores were also rejected, until a group of 5000 chemically different solutions with the highest scores was obtained. The resulting group of best solutions was docked to the receptor, analogously as before. The exhaustiveness of the docking was lowered to 8 (the default value). The results were analyzed by comparing the binding pose proposed by LigBuilder with the binding poses predicted by AutoDock Vina. If the binding pose proposed by LigBuilder did not correspond to any of the poses proposed by AutoDock Vina, the result was excluded from further analysis. The comparison of the two poses proceeded as follows. (1) The list of coordinates of heteroatoms and aromatic carbon atoms from the LigBuilder pose was created.
(2) An analogical list was created for the pose proposed by AutoDock Vina. (3) Every atom from the Autodock Vina list was examined. If there was no atom of its type closer than 2 Å from it in the LigBuilder list, the poses were considered different. The functionality of this procedure was verified empirically.
Openbabel was used to generate canonical SMILES identifiers of the solutions. Molecular weight and heavy atom number were derived from LigBuilder output files. XlogP and numbers of rotors, rings, acceptors and donors were calculated using XlogP ver 3.2.2 (Cheng et al., 2007). Furthermore, TPSA and logP were calculated using Cxcalc (ChemAxon's Calculator).

Cluster analysis
The narrowed group of the solutions was analyzed by LigBuilder Cluster module on default settings and grouped into clusters according to their chemical similarity. The clusters were ordered by the score of the best scoring solution in the given cluster. Twenty-five best clusters were chosen for further analysis. Subsequently, the solutions were analyzed with regard to the moieties appearing in numerous molecules and positioned in the cavity in a similar manner. The influence of such 'persisting motifs' on TPSA and score of the molecules was researched. An approximate influence of a selected motif on the score was calculated as a mean of the scores of the fragments isolated from four chosen molecules. Interchangeable persisting motifs were grouped in the families of 'close analogs'.

Structural modifications of the best solutions
Modifications were introduced into the best structures, following a precise algorithm. The detailed procedure is given in the Supporting Information. Its major steps were as follows. (1) Taking the interactions between a solution and the binding pocket into consideration, the solution with the highest LigBuilder score was chosen from each cluster. (2) Modifications were introduced. Chemical correctness and valid protonation states were prioritized. Further, the TPSA below 140 Å 2 was ensured. The modifications were introduced consequently, in a stream. Consequently, building a tree of all possible combinations was avoided. (3) The modifications were evaluated. The binding poses were cross-validated by AutoDock Vina. Molecular descriptors were calculated. (4) The impact of each modification from the stream was examined. An attempt was made to propose an even better modification by excluding the modifications that were detrimental to the score.

Selection of final candidates
Modifications, scores and molecular descriptors of the de novo proposed solutions were compiled into one sheet. The solutions scored between 7.13 and 8.12 (i.e., presumably in the range of one order of magnitude of IC 50 from the best (Wang et al., 1998)) and with TPSA > 140 Å 2 were analyzed further. The selection was narrowed further by choosing only the best scoring solution from each cluster. Ten 'solutions' chosen at this stage were the ultimate drug candidates. As they were chemically correct structures and their synthetic feasibility was to be considered, it seemed more appropriate to call them 'compounds' from this point onward.

MD and free energy calculations
MD simulations of the ligand-protein complexes were performed in order to evaluate their structure and stability. The energy of each of the systems was minimized prior to the simulation. GROMACS package (Bekker et al., 1993), version 2021.4 was used for the analysis. GROMOS 54a7 force field (Schmid et al., 2011) was used to describe the system. ATB server (Malde et al., 2011) (version 3.0) was used to prametrize of the ligands using their all-atom representations. The protein-ligand complexes were simulated in a triclinic box. The distance between the complex and the edge of the box was set to be at least 1.0 nm during the simulation. The complex was solvated with SPC water molecules (Berendsen et al., 1981) and neutralized with sodium cations.
Each of the systems was subsequently equilibrated under NVT conditions for 100 ps and simulated under NPT conditions for 100 ns. During the NVT equilibration, the complex was restrained with a force constant of 1000 kJ/mol in every direction. No restraints were applied during the production run. Periodic boundary conditions were used in all three dimensions. The particle mesh Ewald (PME) method (Darden et al., 1993) was applied to calculate long-range electrostatic interactions, and 0.9 nm cutoff for nonbonded interactions was applied. Single-range cutoff was used for nonbonded interactions in order to avoid an influence of the force field parametrization (Hess et al., 2019). Lincs algorithm (Hess et al., 1997) was used to constrain all bonds involving hydrogen. The equations of motion were integrated using leapfrog algorithm with a time step of 2 fs. The temperature was maintained at 298 K using V-rescale method (Bussi et al., 2007) with a coupling time constant of 0.1 ps. Protein and ligand were coupled separately from ions and water. The pressure was maintained at 1 bar using Berendsen barostat (Berendsen et al., 1984) with 2 ps time constant, compressibility of 4.5Á10 À5 bar À1 and isotropic coupling.
GROMACS analysis tools were used to analyze root-meansquare deviation (RMSD) of the ligands and of the protein backbone during the simulation. The free binding energy of the complexes was calculated by MM-PBSA method (Homeyer & Gohlke, 2012) (g_mmpbsa tool (Kumari et al., 2014), version 5.1.2). The solvent dielectric constant was set to 80, the solute dielectric constant was set to 8 and SASA model with the solvent probe radius of 1.4 Å was used to calculate the free binding energies. Per-residue free energy decomposition was also performed to identify the contribution of individual residues to the binding free energy of the protein-ligand complexes.
The protein-ligand complex of the native ligand present in the 7BQY PDB structure was also examined. The complex was parametrized with the ligand bound noncovalently to the protein. It was simulated and its binding energy was calculated analogously to the other ligands.

Retrosynthetic analysis
Spaya online retrosynthetic server (Spaya, 2020), v. Beta 1.0 was used to search for possible retrosynthetic routes leading to the selected compounds. Two alternative routes were found for each drug candidate. This part of the work proceeded according to the following procedure. (1) A search using Spaya on default settings (maximum of 15 steps to reach a commercially available product) was performed. (2) The pathways with the smallest number of steps were inspected. (3) If every path contained a deeply doubtful disconnection, the software was requested to design other possible paths, starting from this disconnection. (4) The procedure was repeated until a favorable path was reached.

Conclusion
In this study, we proposed the new protocol for de novo drug design using very little experimental data. This protocol is a significant advancement, as the de novo linking protocols used until now required the data on previously designed inhibitors of their targets. In particular, we consider that our protocol could prove very useful when the experimental data concerning the target is scarce. We anticipate its usefulness in the design of drug candidates targeting neglected tropical diseases or emerging diseases. When SARS-CoV-2 M Pro (an example of a newly discovered protein) was subjected to our protocol, the designed new inhibitors were predicted to display nanomolar potency.
The proposed procedure does not only utilize little experimental data, but also introduces innovative approaches to the key components of de novo drug design: to the generation and modification of the de novo drug candidates. First, the seed for the de novo procedure was generated computationally from binding poses of small molecules in their energy minima in the binding site. The seed structures were then linked into bigger, druglike entities with the use of de novo measures. Our in-house algorithm for structural modifications of the designed molecules is highly intuitive, efficient and unambiguous. It yielded molecules with high predicted binding affinities and good estimated bioavailability. The suitability of our drug candidates was confirmed by the MD simulations and comparative analyses of the designed compounds with known inhibitors of the target protein. Lastly, the synthetic feasibility of the designed potential drug candidates was verified with the use of an openly available tool, Spaya (2020) online retrosynthetic server.
The software used in our drug design protocol was widely accessible and user-friendly, but it was used in new and inventive ways. We hope that our protocol might be of interest to a wide range of researchers who conduct practical investigations, but are not primarily engaged on the field of computational chemistry. We find that their experimental investigations can be enriched by our simple, intuitive and openly available protocol.