Ligand and structure-based discovery of phosphorus-containing compounds as potential metalloproteinase inhibitors

ABSTRACT In this study, a methodology is proposed, combining ligand- and structure-based virtual screening tools, for the identification of phosphorus-containing compounds as inhibitors of zinc metalloproteases. First, we use Dragon molecular descriptors to develop a Linear Discriminant Analysis classification model, which is widely validated according to the OECD principles. This model is simple, robust, stable and has good discriminating power. Furthermore, it has a defined applicability domain and it is used for virtual screening of the DrugBank database. Second, docking experiments are carried out on the identified compounds that showed good binding energies to the enzyme thermolysin. Considering the potential toxicity of phosphorus-containing compounds, their toxicological profile is evaluated according to Protox II. Of the five molecules evaluated, two show carcinogenic and mutagenic potential at small LD50, not recommended as drugs, while three of them are classified as non-toxic, and could constitute a starting point for the development of new vasoactive metalloprotease inhibitor drugs. According to molecular dynamics simulation, two of them show stable interactions with the active site maintaining coordination with the metal. A high agreement is evident between QSAR, docking and molecular dynamics results, demonstrating the potentialities of the combination of these tools.


Introduction
Metalloproteinases are currently attractive pharmacological targets for the search for new therapeutic alternatives, in the control of non-communicable chronic diseases that require prolonged treatment.These enzymes play a key role in the proteolytic degradation or activation of proteins that modulate important physiological processes [1][2][3].Taking these aspects into account, enzyme activity can be regulated through the use of selective inhibitors depending on the expected effect.
Zinc-dependent metallopeptidase inhibitors of Angiotensin-converting enzyme (ACE, EC:3.4.15.1) have been used in the treatment of cardiovascular diseases for more than 50 years [4].The success of these drugs is undeniable; however, they show a series of adverse reactions that limit the tolerance of treatment for many patients.In addition, synergistic combinations of drugs have been explored, as well as the use of inhibitors with activity on several metalloproteases (such as dual ACE-Neprilysin inhibitors).Both enzymes are involved in a wide variety of different biochemical reactions that lead to the regulation of blood pressure (BP).For its part, ACE induces the conversion of angiotensin I into the powerful vasoconstrictor angiotensin II, promoting an increase in BP, and Neprilysin (NEP, EC:3.4.24.11) is responsible for the metabolism of natriuretic peptides, which have the function of reducing water, sodium and fat from adipose tissue into the circulatory system [3], thereby reducing BP.Thus, dual ACE-NEP inhibitors prevent the formation of angiotensin II and increase the half-life of natriuretic peptides, thus promoting a reduction in blood pressure through two different mechanisms, but they have also shown limitations associated with undesirable effects, which means that the search for new candidates continues [5].
Recently, studies are directed at combining NEP inhibitors with angiotensin-II receptor antagonists (sacubitril/valsartan).In this case, ACE can exert its action on bradykinin, and the risk of angioedema is reduced.However, these enzymes share a great structural similarity with each other in the catalytic site and with the bacterial enzyme Thermolysin (TLN, EC: 3.4.24.27) [6].In addition to having a zinc ion in the active site, they contain a homologous helical structure, as well as two generally conserved structural sequences HExxH and ExxxD that are part of the catalytic domain, and are involved in the interaction and coordination of the ligand [7][8][9][10].For this reason, the TLN enzyme can be used as a pharmacological model for the inhibition of Zn metalloproteases as therapeutic alternatives.
Several functional groups have been reported that could interact with the active site of these enzymes [11], forming a pentacoordinated complex with two histidines, a glutamate and a water molecule.Among them, the phosphinate anionic group stands out; there are analogues of the natural inhibitor phosphoramidon, in which the electric charge provides a Coulombic attraction to the zinc cation, blocking the substrate [12,13].
Currently, there are many phosphorus-containing drugs, and they have been widely used in modern medicine because of the therapeutic properties they possess [14].However, there are reports of organophosphate compounds with high toxicity on human health.Although it is true that phosphorus-containing compounds have great pharmaceutical potential, their toxicity must be taken into account before selecting them as drug candidates [15,16].
However, the cost of the process of identification and development of a new drug, useful in therapeutics, has increased significantly in the last years.According to a study conducted at the Tufts Center for the Study of Drug Development, out-ofpocket cost per approved drug is $1861 million and capitalized cost is $2870 million [17].The pharmaceutical industry reports that only five of every 5000-10000 initially tested chemical compounds ultimately receive approval from the FDA, and it is estimated that 91% of drugs that reach the clinical trial status do not reach the market [18].With the purpose of reducing those problems, new technologies have emerged to replace the ´classic´ strategy, which has been gradually replaced by molecular target-based drug discovery [19,20].An important role has been played by computer-aided drug design techniques based on ligand and structure drug discovery approaches [21][22][23][24][25]. Therefore, the main aim of the present study was to identify promising phosphorus-containing compounds as metalloprotease inhibitors, with a low probability of toxic effects on the organism, and its interactions with the active site.

Ligand-based study
A mathematical model, based on a linear multivariate analysis technique, was proposed for the study of TLN inhibition and the selection of candidates for phosphorus-containing drugs, based on the principle that the biological activity of a compound depends on its molecular structure [26].

Dataset
The dataset of TLN inhibitory compounds was compiled from reports in the international literature and previously published [27].A database of enzyme inhibition coefficient (K i ) ranging from 6.8 × 10 −8 to 430 mM was collected.Because of the wide range of inhibition, we worked with pK i (-log K i ) as the experimental response or dependent variable, and the corresponding molecular descriptors for each compound, as independent variables.Compounds reporting IC 50 were not considered, and those reported as non-inhibitors were maintained in the inactive class (see Supplementary Material, Table S1).

Molecular descriptors
All the families of molecular descriptors implemented in the DRAGON software [28], which encode chemical information of different nature, that is, from constitutional or 0D descriptors to three-dimensional or 3D descriptors, were calculated.We started from a three-dimensional conformation of the molecules, because of the stereospecificity of the studied enzyme, previously optimized with the AM1 semi-empirical method [29] that is implemented in Mopac [30].It was decided to use only a conformation generated by the method, without verifying frequencies, since the nature of the stationary point in the interaction of the inhibitor with the enzyme is not known.

Training and prediction series
The process of obtaining the categorical variable, and the selection of the training (TS) and prediction (PS) series for the classification study was described in a previous work [27], and is given as Supplementary Material.Discretization of the response variable was performed using a piecewise regression method.The selection of both series was performed using a k-means cluster analysis according to Figure 1, to guarantee structural representativeness in both series.

Obtaining and validation of the classification model
In order to obtain the QSAR (Quantitative Structure-Activity Relationship) classification models, the Linear Discriminant Analysis (LDA) was selected.For this, we used the Statistica 7.0 software [31] with exponential and sequential exhaustive search techniques (best subset and forward stepwise, respectively) for variables selection.The developed QSAR-LDA model was used to discriminate between two classes: TLN inhibitors (1) and non-inhibitors (−1), searching for the linear combination of the independent variables that best allowed differentiating (discriminating) the groups, according to equation 1: where a is a constant and b 1 -b m are the coefficients of the function [31].
In order to assess the quality and robustness of the discriminant models obtained, several statistical parameters were evaluated, such as Wilks' λ, the F value and the square of the Mahalanobis distance (D 2 ).Another factor, which was taken into consideration to evaluate the ability of the classification functions obtained, was the percentages of good classification (Accuracy, Q) for all the models in the TS.In the same way, the statistical parameters recommended in the medical-statistical literature were calculated.In this sense, the quality of the models obtained was also expressed through the Matthews correlation coefficient (C), sensitivity (Sens), specificity (Spec) and the false positive rate (FPR) according to equations 2-6 [32]: where TP and TN are the true positives and negatives, respectively, and FP and FN are the false positives and negatives, correspondingly.
In order to test the robustness of the models and demonstrate their predictive power, internal and external validation exercises were carried out.In the case of LDA, for internal validation, a cross-validation strategy was carried out, leaving 5, 10, 15, 20, 25 and 30% of the compounds outside (Leave-Many-Out), generating new models with which both the new TS used, and the group not included on each occasion were predicted.In addition, the y-scrambling test was carried out, also as an internal validation method, where the values of the assignment of the active and inactive series were exchanged at 5, 10, 20, 30 and 40% of the data and, subsequently, the new models for the TS were built.
As a second validation exercise, the PS generated by means of cluster analysis (Figure 1) was submitted to the evaluation of the classification functions, so that the percentages of good classification were calculated for each case, as well as the recommended statistical parameters (C, Sens, Spec and RFP) [32].
In order to define the applicability domain (AD), the William graph was used, considering that the reliable predictions of the model have leverage values lower than the critical leverage with ± 3 standard deviations, and that the compounds that are outside these intervals can be considered outliers [33].

Virtual screening
For the identification of compounds with potential activity as TLN inhibitors, the DrugBank database (www.drugbank.ca),which is available at the University of Alberta (Canada), was evaluated.It contains FDA-approved molecules, bioactive drugs as well as experimental drugs.The selection of the most promising candidates was restricted to the following selection criteria: 1) it is predicted as active by the model, 2) it is within the chemical search space, 3) it meets all the considerations of Lipinski's Rule of Five [34] for oral absorption and 4) it contains some phosphorus atom in its structure.

Structure-based studies
The docking methodology was used to determine the binding mode of the compounds selected according to the QSAR-LDA model as the best candidates.A 100 ns Molecular Dynamics (MD) simulation was subsequently carried out for the non-toxic compounds, to verify the stability of the ligands in the active site of the enzyme.

Ligands preparation
Taking into account the importance of the three-dimensional stability of the compounds for the interaction between the enzyme and a ligand, the 3D structures of each of the selected ligands were prepared and optimized, using the LigPrep tool, implemented by Schrödinger, 2022-2 [35,36].Subsequently, the Schrödinger protein preparation wizard was used to prepare the reference crystallographic complexes for the studied enzyme system.In this study, a Thermolysin-Phosphoramidate crystallographic complex was used (pdb_Code: 6YMS, Resolution: 1.32 Å).During the process, the water molecules and traces of solvent from crystallization were removed from each reference complex.In addition, to obtain a more refined model from the structural point of view, the bond orders, the missing hydrogens and the different protonation states of the residues were assigned at pH = 7.0 ± 2.0.

Grid construction
In this step, the Gridgeneration tool was used to construct a box of dimensions 10.0 × 10.0 × 10.0 Å 3 , located at the centre of mass of the co-crystallized ligand, with the objective of delimiting the conformational search space of the ligands docked within the active site of TLN.In order to specify the position and extent of the region for which the grids of each receiver are calculated, the 'Centre and Size' options were used.The grids (purple and green colours) shown in Figure 2 were used to place the ligands to be docked in a suitable position.

Selection of the docking method
Ligand-receptor docking experiments for compounds, identified by the QSAR model as potential inhibitors of TLN in the virtual screening, were performed using the Glide software of Schrödinger's molecular modelling toolkit [37].The standard (SP) and extraprecision (XP) Glide docking modes were used keeping the default parameters defined in the tool.Glide SP was used to evaluate the ability of the method to obtain conformations that match the known pharmacophore of TLN inhibitory ligands, and the more precise Glide XP was used to refine and find the final mating conformations.After finding several conformations for each compound, only those showing the best scoring energies, and those essential chemical interactions described for the crystallized TLN inhibitors and published in PDB were considered.The most obvious essential chemical interaction is that where oxygen present in pharmacophoric groups (negatively charged) such as phosphates, sulphates, and carboxylates coordinate to the Zn 2+ ion present in the active site of this enzyme.Therefore, based on the reasoning that similar ligands should have similar chemical interactions when functional groups are conserved, the best docking solution for each compound was the conformation that had the best scoring energy and met this essential chemical interaction.

Evaluation of the safety profile of the identified molecules
For the evaluation of the safety toxicological profile of the molecules previously proposed as drug candidates, the ProTox-II tool [38] was used.This provides in silico information on several toxicity endpoints, such as acute toxicity, hepatotoxicity, cytotoxicity, carcinogenicity, mutagenicity, immunotoxicity, adverse outcome pathways (Tox21) and toxicity targets using molecular similarity, pharmacophores, fragment propensities and machinelearning models.
As selection criteria we used compounds classified as non-toxic (Class 6), according to the globally harmonized system of classification of labelling of chemicals (GHS) with an LD 50 > 5000 mg/kg.The established toxicity classes can be consulted in Table S1 of the Supplementary Material.

Molecular dynamics simulations
The Desmond package (Desmond, Schrödinger, LLC, New York, NY, 2021-4) was utilized for conducting MD simulations of protein-ligand complexes (TLN/DB2824, TLN/DB2854, and TLN/DB3018), which demonstrated superior performance in both XP GScore and ADMET profile evaluations.Atomic parameters were derived using the OPLS3e force field [39].The complexes underwent minimization employing the Steepest Descent algorithm, succeeded by the Broyden-Fletcher-Goldfarb-Shanno (LBFGS) method across three phases.In the initial phase, the heavy atoms of water were constrained with a constant force of 1000 kcal/mol Å^2 over 5000 steps (1000 SD, 4000 LBFGS), adhering to a convergence criterion of 50 kcal/mol Å^2.In the subsequent phase, main chains of the protein were constrained using a force constant of 10 kcal/mol Å^2, with a convergence criterion of 10 kcal/mol Å^2 across 2000 steps (1000 SD, 1000 LBFGS).Finally, the systems were unrestrictedly minimized for 1000 steps (750 SD, 250 LBFGS), applying a convergence criterion of 1 kcal/mol Å^2.
Equilibration was methodically conducted in several steps.Initially, 250 ps of Brownian Dynamics with the Berendsen thermostat were executed.Following this, NVT ensemble simulations were progressively heated from 10 K to 300 K over 3000 ps, imposing constraints on the solute's heavy atoms at a constant of 50 kcal/mol.Subsequently, additional 10 ns of equilibration in the NPT ensemble was performed, utilizing the Langevin thermostat/barostat combination without restrictions on the heavy atoms in the system.
Later, the systems underwent 120 ns of production under the NPT ensemble, initiating with a randomly selected seed.Initial temperature and pressure settings were established at 300 K and 1.01325 bar, respectively, using the Nose-Hoover thermostat [40,41] and the Martyna-Tobias-Klein (MTK) barostat [42].Calculation of short and long-range Coulombic interaction forces considered a cut-off radius of 9 Å in an unrestricted system.Integration was conducted at intervals of 1.2 fs, recording every 100 ps.In conclusion, drawing inspiration from the recently published analysis protocol recently published [43,44]; the Simulation Interactions Diagram Panel from Schrödinger LLC, New York, USA, 2021-4, was employed for analysing trajectory frames accumulated throughout the MD simulations.

Development of the LDA model
The LDA constitutes an important tool in computational chemistry and its use has been extensively reported for the prediction of chemical and pharmacologic properties [45][46][47].The best model is shown in Equation 7together with its statistical parameters: where λ is Wilks' statistic, D 2 is the square Mahalanobis distance, F is the Fisher ratio and p-value is its significance level.
The LDA model (Eq.7) presents a value of Q = 92.308%for the TS with eight predictive variables.The high C value of 0.846, showed by the model, explains the strong linear relationship between the classifications and the selected descriptors [48].The model shows an adequate performance evaluating the obtained values of Q, Sen, Spec and FPR.
The obtained model contains information from several families of molecular descriptors.For example, the descriptors nSH (number of thiols), and nROH (number of hydroxyl groups) bring information about the functional groups of the molecules that are important for the inhibitory activity, since they are groups able to coordinate with the zinc ion of the active site of these enzymes.Besides, the C-003 (atom-centred fragment CHR3) is a descriptor related to the presence of the CHR3 fragment in the molecules, which together with the fingerprints: F04[C-C] (frequency of C-C at topological distance 04), B04[C-P] (presence/absence of C-P at topological distance 04) and B01[N-P] (presence/ absence of N-P at topological distance 01) could explain the binding of the molecule to the active site, as well as to the pockets of the enzyme.Finally, descriptors G1s (1 st component symmetry directional WHIM index/weighted by atomic electrotopological states) and R3m+ (R maximal autocorrelation of lag3/weighted by atomic masses) provide three-dimensional information about the inhibitors, which is feasible since the enzymes are stereospecific.

Validation
Our model was widely validated according to the OECD principles that guarantee their soundness for future predictions [49][50][51].In order to verify the robustness and stability, internal and external validation exercises were developed.First, an LGO strategy was carried out, and the accuracies for the new training and prediction sets were computed.It permitted us to assess, by comparison, the quality of the model (see Figure 3(a)).Afterwards, the y-scrambling test was performed, and the results can be seen in Figure 3(b) indicating that, while the random group size is increased, the accuracy of the model gradually decayed.
As can be observed in Figure 3(a), the model presents a high stability to perturbations within the database, which expresses its robustness.Notice that all the Q values for the training series are greater than 90%, and the external series performance in the LMO is also close to these values.Moreover, in Figure 3(b), we can see that the performances of the corresponding scrambled models drastically decrease, affecting the correlation between molecular descriptors and TLN inhibition.This indicates that the fit goodness of the model is not because structural redundancy or casual correlation in the TS.Although the model is robust, it does not guarantee that predictions are trustworthy if the experimental space where these predictions are reliable is not determined.
However, the statistics for external PS is the most important measure for accepting or rejecting the discriminant model.The model achieved an accuracy value of 87.500% (C = 0.750), a value of 86.957% of both sensitivity and specificity, and a false-positive rate of 12.000% for the prediction set.
The definition of the AD model is a critical point in chemometric and QSAR studies.In this work, to evaluate the AD, a study was developed to access the chemical scope of our model, by achieving its leverage values (h) and standardized residuals (Std.Res. or σ). Figure 4 shows the Williams plot of the model for both TS and PS.From this graph, the AD is defined inside a square area within ± 3 σ and at the left of the critical leverage (h*) of 0.1888.
As can be seen in Figure 4, most of the compounds are inside the AD area.Three TS and three PS compounds show leverage greater than h*.In addition, one TS and three PS compounds show with deviations > ±3 σ.Of them, only two PS compounds are completely outside the AD and can be considered outliers since their predictions are not reliable.For this reason, they were removed from the database; the others are not considered as outliers but, instead, as influential compounds.Consequently, the model predictions can be used with high confidence.
Once rejected these statistical outliers, a better performance and discriminant power is achieved (see Table 1).The most desirable, in this work, is to obtain a low FPR value in order to decrease the probability of evaluating inactive compounds, although we take a little risk of rejecting some active compounds.Considering the proven quality of this model, we used this for the virtual screening (VS) of the DrugBank database.

Virtual screening
The main objective of this section is to identify a series of DrugBank database compounds as promising TLN inhibitors, using the QSAR-LDA model.Figure 5 illustrates the structures of several compounds that have high drug-like probabilities, and their posterior probabilities (P) are shown in Table 2. Here, we report five compounds as potential TLN inhibitors, and they were evaluated with the docking methodology.

Docking
According to the proposed methodology to validate the docking method, the ligand was extracted from the active site of the crystallized enzyme and recoupled to it.Figure 6 shows the superposition of the conformations obtained in the docking vs. the phosphoramidate ligand crystal.The experimental conformation was represented in green (crystal conformation published in the PDB) and in yellow the conformation obtained by the docking method.As can be seen, the method can reproduce, with great precision, the ligand poses in the enzyme crystal.A very small root mean square deviation (RMSD = 0.03) was obtained, so it can be used to study possible interactions of selected inhibitors at the active site of the TLN.

Analysis of candidate interactions with the catalytic site of TLN
The compounds identified as promising metalloprotease inhibitors using the QSAR model have structural characteristics similar to the reported inhibitors, as well as functional  groups that could form a coordination complex with the zinc of the active site.However, the interaction not only depends on the ligand, but also involves the chemical environment of the catalytic site and the pocket containing it.Therefore, it is very important to determine the interactions that the ligand could establish at this site to orient the molecule towards zinc and stabilize the complex.In this sense, the five ligands obtained from DrugBank showed an adequate conformation in the active site.Table 3 shows the   ligand-protein binding affinity values (XP GScore) obtained by Glide's extra-precision (XP) docking method, in which only active compounds will have available conformations and whose fundamental purpose is to eliminate false positives.Furthermore, the ligand efficiency values are shown, which can be defined as the ratio between the Gibbs free energy (ΔG) and the number of non-hydrogen atoms within the compound.In order to prevent active compounds from docking if they are not compatible with the particular conformation of the receptor being used, they were docked to multiple conformations of the receptor, and the results of the individual docking runs were combined.Figure 7 shows the main interactions of the ligands in the active site where coordination with zinc is evident in all cases.
In the DB2085 compound, in addition to showing coordination with the metal, a salt bridge is observed between O 2-and Zn 2+ , which strengthens the enzyme-inhibitor interaction.Interactions can also be seen by hydrogen bonds with polar residues such as His231 and Asn112, as well as the negatively charged residue Asp150, and a non-covalent interaction of π -π stacking type between the π-bonds of the aromatic rings of the molecule with His146.
Coordination with the metal, and a salt bridge between the inhibitor and the zinc of the enzyme are also shown in the DB2792 compound.Interactions by hydrogen bonds are evident with His231, and with hydrophobic residues such as Ala113 and Trp115 through the -OH groups.
An extensive coordination area and a salt bridge between S 2-and Zn 2+ are observed in the DB2824 compound.Acceptor hydrogen bonds are established with the polar residues His231 and Asn165, and the hydrophobic residue Trp115.In addition, donor hydrogen bonds are formed between the -SH group, and residues Asn112 and Ala113, the -OH group and Tyr157, and the -NH group and Asp150.
In the DB2854 compound, coordination with the metal and two salt bridges between S 2-and Zn 2+ , and another between the positively charged Arg203 residue of the enzyme and the negatively charged -S group are shown.Additionally, an acceptor hydrogen bond can be seen with His231; two donor hydrogen bonds between -SH/Ala113 and -OH/ Glu143, and the non-covalent interaction of π -π stacking type between the π-bonds of the aromatic rings of the molecule with Phe114.The DB3018 compound shows coordination with the metal, and two salt bridges with the Zn 2+ and the His231 residue of the enzyme, respectively.Hydrogen bonds can also be seen with Asn227 and Trp115.
As can be seen, the interaction with the most frequent catalytic domain residues is between a proton acceptor group of the inhibitors and His231.According to the mechanism of action proposed for the enzyme [52], this residue plays a very important role in catalysis, which suggests that these compounds are indeed good candidates as metalloprotease inhibitors.

Safety toxicological profile of the drug candidate
An initial assessment of the toxic properties of the identified molecules is very important in the field of drug discovery, since the possible side effects of the proposed products must be taken into account.In this sense, if the risk is greater than the benefit, the use in medical practice is not recommended.In this work, the preliminary toxicological evaluation of the previously identified phosphorus-containing compounds was obtained, using the ProTox-II webserver.Table 4 shows the results of the prediction of the oral toxicity of the compounds, and Table 5 shows the reports of organ toxicity and endpoints.Other predicted toxicity targets can be consulted in Tables S2-S6 and the radar chart in Figures S1-S5 of the Supplementary Material.
According to the results obtained, the DB2085 compound belongs to class-2 toxicity, so it can be considered fatal if swallowed.This compound shows a very small predicted LD 50 of 13 mg/kg, which is why its use as an oral drug would be dangerous.Furthermore, it is predicted to have carcinogenic and mutagenic potential.The DB2792 compound is predicted to be in class 4 (harmful if swallowed) and, although doses used in pharmaceutical products are usually less than 1400 mg/kg, it shows a high mutagenic potential.Taking these aspects into account, they are not recommended as drug candidates.

Analysis of molecular dynamics simulations
Molecular dynamics is well-established as a vital tool, for generating trajectories that provide extensive structural insights into the stability and significance of molecular interactions, during the formation of protein-ligand complexes.In our research, we employed this method to validate docking poses, and to investigate the dynamics and stability of the interactions within the TLN/DB2824, TLN/DB2854, and TLN/DB3018 complexes.A crucial criterion for the validity of our complexes was the maintenance of a key inhibitory interaction mechanism (between the O and S atoms of the ligands within the Zn 2+ coordination sphere, 2.0 to 2.5 Å) for over 30% of the simulation duration.Notably, among the three complexes analysed via MD simulations, valid trajectories were obtained for only TLN/DB2854 and TLN/DB3018.Regrettably, for TLN/DB2824, no reliable trajectories were acquired as, early in the simulation, the ligand was observed dissociating from the binding site, resulting in the loss of the described interaction.

RMSD and RMSF analysis
The temporal evolutionary analysis over 120 ns for TLN/DB2854 and TLN/DB3018, utilizing standard metrics like RMSD and RMSF, is crucial for understanding the behaviour of ligands in biological environments and their potential therapeutic implications.Figure 8(a,b) illustrate the temporal RMSD values for the Cα atoms of TLN/DB2854 and TLN/DB3018, referenced against each initial structure of the complex (Frame zero).Both complexes showed stability patterns within the functional thresholds for biomolecular systems (<2.5 Å).Specifically, the TLN/DB2854 complex exhibited an average protein RMSD of 0.7 Å and an average ligand RMSD of 5.5 Å, indicating a stable protein and a dynamically active ligand (Figure 8(a)).Conversely, the TLN/DB3018 complex showed an average RMSD of 1.0 Å for the protein and 4.5 Å for the ligand, suggesting enhanced structural flexibility of the protein and slightly decreased ligand mobility (Figure 8(b)).
Figure 8 shows the complexes through molecular dynamics simulations.Figure 8(a,b) illustrate the evolution of RMSD for the protein (blue) and the ligand (red) throughout the 120 ns simulation, indicating structural stability.Figure 8(c,d) detail the Root Mean Square Fluctuation (RMSF) of the ligands, highlighting their flexibility during the simulation.The inserted structures represent the conformation of the ligand.The RMSF of the protein residues are shown in Figure 8(e,f), represented with light blue and red areas indicating αhelices and β-sheets, respectively.Peaks in the RMSF identify residues with increased mobility within the protein, potentially impacting the affinity and specificity of ligandprotein binding.

Further analysis of the root mean square fluctuation of the ligand
The analysis of the RMSF of the ligand (broken down by atoms) offered a detailed insight into the atomic mobility within the structure of the ligands, when bound to the active site of TLN.For the TLN/DB2854 complex (Figure 8(c)), certain atoms, particularly those labelled as 13-16 and 18, exhibited significant mobility with RMSF values exceeding 4 Å.This pattern of atomic fluctuation suggests that there are specific structural regions of the ligand that are crucial for its ability to accommodate and adapt to the binding site topology of TLN.An example of such a region is the single-bond connection between C5 -N11, which facilitates rotation of the molecule around it.In contrast, the atomic fluctuation of the ligand in the TLN/DB3018 complex (Figure 8(d)) is generally lower, except for notable peaks at atoms labelled as 1, 2, 10, and 11.Thus, a lower overall atomic mobility in this complex suggests a more rigid interaction and, possibly, a higher ligand-protein binding affinity.
Analysing the 'Fit Ligand on Protein' line, which represents the fluctuations of the ligand in relation to the protein, a notable correlation is highlighted in both complexes.In TLN/DB2854 (Figure 8(d)), the alignment of RMSF peaks of the ligand with the 'Fit Ligand on Protein' line indicates that the fluctuations of the ligand are highly influenced by its interaction with the protein.This may reflect a dynamic interaction where conformation of the ligand adjusts in response to protein movements.Conversely, for TLN/DB3018 (Figure 8(d)), although there is a general correlation, the ligand fluctuation peaks do not align as closely with the 'Fit Ligandon Protein' line, suggesting a more independent ligand-protein interaction from the protein conformational fluctuations.
For both the TLN/DB2854 (Figure 8(e)) and TLN/DB3018 (Figure 8(f)) complexes, an analysis of protein residue flexibility was performed.In both cases, a notably highflexibility peak was observed at residue 200, with an RMSF > 2.8 Å, indicating significant mobility in this specific region of the protein Figure 8(e,f).This peak marks a potential region of important functional adaptability crucial for biological activity.
Moreover, contrasting the RMSF profiles of both complexes revealed that TLN/DB3018 possesses higher point flexibility compared to TLN/DB2854.This could suggest that the protein in the TLN/DB3018 complex is capable of undergoing larger conformational changes, which might be critical for molecular recognition mechanisms and ligand binding.

Protein-ligand interaction analysis
From the detailed analysis of molecular dynamics simulations for the TLN/DB2854 Figure 9(a,c) and TLN/DB3018 (Figure 9(b,d)) complexes, interaction patterns between the ligands and the active site residues of TLN were identified.These patterns are key to understanding differences in binding affinity and the stability of the formed complexes.
Figure 9(a,b) represent the 2D diagrams of protein-inhibitor contacts, indicating the residues that interacted for more than 30% of the simulation time.Furthemore, Figure 9  (c,d) show the stacked bar charts representing normalized protein-inhibitor contacts throughout the trajectory.
Values above 1.0 are possible as a single protein residue can make multiple contacts.All contacts, including hydrogen bonds (backbone, side chain), metal coordination (ionic), hydrophobic (π-π stacking, cation-π, others), and water bridges, are identified in the legend with symbols and colours.
In the TLN/DB2854 complex (Figure 9(a)), significant and consistent interactions were observed with key residues within the cavity, notably the coordination of residues His142, Glu143, His146, and Glu166 with the Zn 2+ ion throughout 100% of the simulation time.Furthermore, Zn 2+ also remained stably coordinated to the -O atom for 100% of the simulated trajectory.Additionally, residues His231 (41%) and Arg203 (65%) maintained the ligand DB2854 anchored to the coordination sphere through H-bonds and electrostatic interactions with the -O and -S atoms, respectively.This indicates that these interactions have a key role in stabilizing the conformation of the ligand near the active site, thus favouring the pharmacological inhibition mechanism.Figure 9(c) shows other residues that contributed to the complex stabilization through interactions like Hydrogen Bonds, Water Bridges, and Hydrophobic contacts (π-π Stacking, other: interactions involving a hydrophobic amino acid and an aromatic or aliphatic group on the ligand).
Conversely, the analysis of the TLN/DB3018 complex (Figure 9(b)) revealed a similar interaction profile with strong coordination between Zn 2+ , and residues His142, Glu143, His146, and Glu166 throughout the entire simulation time (100%).However, a hydrophobic interaction with the side chain of Tyr157, although less frequent (32% of the time), might be relevant for the orientation and stability of the ligand within the binding site.Additionally, Asn227 participated in watermediated hydrogen-bond interactions 37% of the simulation time, implying a subtle but significant contribution to the solvation dynamics and, potentially, the stability of ligand-protein interactions Figure 9(b,d).
Comparing the molecular interaction analyses for TLN/DB2854 and TLN/DB3018, the similarity in interaction with the Zn 2+ ion stands out, indicative of a conserved anchoring between both complexes.However, the fundamental difference lies in the pattern of secondary interactions: while TLN/DB2854 seems to favour a more polar and electrostatic environment around the ligand, TLN/DB3018 suggests the contribution of hydrophobic interactions as a potentially stabilizing and differentiating factor.
These detailed interaction patterns not only shed light on the inhibitory binding mechanism of these compounds, but also provide a useful tool for the rational design of new inhibitors.As previously mentioned, in TLN/DB2854, the synergy between metallic and electrostatic interactions could be exploited to develop inhibitors with enhanced binding affinity.Conversely, in TLN/DB3018, the presence of hydrophobic interactions suggests synthesizing compounds with functional groups that favour this type of interaction, potentially increasing the potency, affinity, and specificity of the ligand.

Conclusion
There are two major approaches in Computer-Aided Drug Design studies: the first one, Ligand-Based Drug Design (QSAR studies) and the second, Structure-Based Drug Design (docking studies and molecular dynamics simulation).They have proved to be powerful tools when used individually, but the combination of both approaches could provide a wider picture of the problem.Herein we present a combination of QSAR, docking and molecular dynamic simulation methodologies to identify compounds with potential metalloprotease inhibitory activity.The QSAR model allows us to identify five successful phosphorus-containing compounds.These showed high binding energies in the active site of thermolysin, evidencing the coordination with the metal according to the docking study.However, only three can be considered non-toxic, one of them, the DB2824 compound that leaves the active site of TLN has the worst binding energy and a lower probability of belonging to the active class; while the other two compounds (compounds DB2854 and DB3018) maintained stable interactions in the catalytic site of the enzyme and metal coordination, according to molecular dynamics simulation.The combination of the three developed methodologies have shown good agreement in the results, demonstrating the potentialities of the combination of these tools.

Figure 1 .
Figure 1.Training and prediction set selection using cluster analysis.

Figure 2 .
Figure 2. Receptor grids used to dock the ligands into the TLN active site.

Figure 3 .
Figure 3. Results of the internal validation exercises performed a) Leave-Group-Out (LGO) b) y-scrambling.

Figure 4 .
Figure 4. Model applicability domain according to Williams graph for training and prediction sets.

Figure 5 .
Figure 5.Chemical structure of the compounds identified in the virtual screening.

Figure 6 .
Figure 6.Superposition of the conformations of the crystallized ligand.

Figure 7 .
Figure 7. Diagram of protein-ligand interactions for each complex obtained by docking.

Figure 8 .
Figure 8. Comparative analysis of molecular dynamics of TLN/DB2854 (left block) and TLN/DB3018 (right block) (see text for significance a to f).

Figure 9 .
Figure 9. Summary of molecular interactions for TLN/DB2854 and TLN/DB3018 complexes (see text for significance a to d).

Table 1 .
ADL model performance parameters.

Table 2 .
Results of virtual screening.
ΔP = (Probability of belonging to the active class) -(Probability of belonging to the inactive class).h:leverage.LAI: Lipinski Alert Index descriptor.

Table 5 .
Organ toxicity and toxicological endpoints predicted.Refers to the predicted active class, (−) refers to the predicted inactive class.