Combined machine learning and pharmacophore based virtual screening approaches to screen for antibiofilm inhibitors targeting LasR of Pseudomonas aeruginosa

Abstract Pseudomonas aeruginosa, a virulent pathogen affects patients with cystic fibrosis and nosocomial infections. Quorum sensing (QS) mechanism plays a crucial role in causing these ailments by mediating biofilm formation and expressing virulent genes. A novel approach to circumvent this bacterial infection is by hindering its QS network. Targeting LasR of las system serves beneficial as it holds the top position in QS system cascade. Here, we have integrated machine learning, pharmacophore based virtual screening, molecular docking and simulation studies to look for new leads as inhibitors for LasR. Support vector machine (SVM) learning algorithm was used to generate QSAR models from 66 antagonist dataset. The top three models resulted in correlation coefficient (R2) values of 0.67, 0.86, and 0.91, respectively. The correlation coefficient (R2 test) values on external test set were found to be 0.62, 0.57, and 0.55, respectively. A four-point pharmacophore model was developed. The pharmacophore hypothesis AAAD_1 was used to screen for potential leads against MolPort database in ZincPharmer. The leads which showed predicted pIC50 value of >8.00 by SVM models were subjected to docking analysis that reranked the compounds based on docking scores. Four top leads namely ZINC3851967 N-[3,5-bis(trifluoromethyl)phenyl]-5-tert-butyl-6-chloropyrazine-2-carboxamide, ZINC4024175 4-Amino-1-[(2R,3S,4S,5S)-3,4-dihydroxy-5-(hydroxymethyl)oxolan-2-yl]-2-oxopyrimidine-5-carbonitrile, ZINC2125703 N-[(5-Methoxy-4,7-dimethyl-2-oxo-2H-chromen-3-yl)acetyl]-beta-alanine, and ZINC3851966 N-[3,5-Bis(trifluoromethyl)phenyl]5-tert-butylpyrazine-2-carboxamide were selected. These compounds were checked for its stability by performing a molecular dynamics simulation for a period of 100 ns. The ADME properties of the leads were also determined. Hence, the compounds identified in this study can be used as possible leads for developing a novel inhibitor for LasR. Communicated by Ramaswamy H. Sarma


Introduction
Globally, with an increase in the development of multidrug resistant (MDR) pathogenic bacteria, around 700,000 people die every year due to its infections. It is assumed that it would raise to 10 million deaths per year by 2050 (Gupta & Singhal, 2018;Spaulding et al., 2018). The misuse and overuse of conventional antibiotics mainly target fundamental mechanism of bacteria and leads to creation of various multi-drug resistant variants (Durão et al., 2018). A review report commissioned by UK government in 2014 reported death rate due to antimicrobial resistance (AMR) will lead cancer death rate by the year 2050 (Ahmed et al., 2019). In particular, infections caused by bacteria remains as the major cause for morbidity worldwide that results from various acute and chronic infections. The untreatable nature of bacterial infections is at rise due to an enormous growth of antibiotic-resistant bacterial strains (Khatoon et al., 2018).
According to National Institute of Health (NIH), bacteria which reside within biofilms contribute to about 80% of all microbial infections in humans (Jamal et al., 2018). Biofilms are microbial consortia attached to an inert or living surface that entrap themselves in a self-secreted matrix referred to as extracellular polysaccharide matrix (Banerjee et al., 2020). Biofilms function as reservoirs for pathogenic organisms and a source of outbreak for infectious diseases such as cystic fibrosis, bacterial endocarditis, osteomyelitis, otolaryngologic infections and non-healing wounds (Krithiga & Jayachitra, 2018). Due to their inherent tolerance to antibiotics, microbial biofilms contribute to various acute as well as chronic infections in immunocompromised patients (Rajkumari et al., 2017).
Pseudomonas aeruginosa, a notorious human pathogen is a virulent Gram-negative bacterium responsible for causing fatal infections in immunocompromised individuals, and nosocomial infections (Thi et al., 2020). The adaptability and intrinsic resistant nature of P. aeruginosa to conventional antibiotics has led to an increase in the rate of mortality caused by its infections (Pang et al., 2019). According to the reports by World Health Organization (WHO, 2017), P. aeruginosa was listed as the most life-threatening and priority bacteria for which research and development of new antibiotics is necessary.
The pathogenesis of P. aeruginosa mainly owes to the biofilms whose formation is initiated by an interconnected signalling network known as quorum sensing (QS). QS is a cell-to-cell communication mechanism that regulates collective behaviour in response to environmental stresses or differences in cell density (Mukherjee & Bassler, 2019). QS relies on secreted signalling molecules referred to as autoinducers whose concentration and specificity are recognized by certain transcriptional regulators (Thi et al., 2020).
Though all three QS systems are unique, they are linked in a hierarchical pattern, with las system regulating the other systems and its corresponding receptor protein LasR is regarded as a principal protein intricated in controlling various virulence associated phenotypes (Asfahl & Schuster, 2017;Vetrivel et al., 2021). Researchers have also evidenced that diverse array of physiological activities are controlled by QS circuit and in particular targeting the las system, might be a useful approach to prevent P. aeruginosa pathogenesis (Sarkar et al., 2018).
In the current era, chemoinformatics approaches have been evolved from forward pharmacology with the emergence in computational techniques. Computer Aided Drug Discovery (CADD) is gaining popularity in the area of drug design by identifying compounds as hit-to-lead process. Computational methods like quantitative structure-activity relationship (QSAR), pharmacological analysis, and structure related techniques like dynamic modelling and molecular docking has revolutionized the drug design platform (Ahamed et al., 2019).
QSAR is a ligand-based computational technique, employed to build mathematical models that can predict structural requirements for unknown biological activity and optimize new lead compounds (Qin et al., 2019). Several QSAR works have been preferred to conventional methods over the past 15 years to develop potent quorum sensing inhibitors (QSIs) against LasR protein to reduce the time, effort, and cost (Luise & Robaa, 2018). Though many antagonists targeting LasR have been known so far, X-ray crystallographic studies and molecular docking studies have shown that structurally unrelated inhibitors and inducers binds to the same active LasR protein site thereby imitating the action of inhibitor or inducer (Choi et al., 2017;McCready et al., 2019;Nain et al., 2020;Sadiq et al., 2020;Vetrivel et al., 2021).
Machine learning process, one of the subsets of artificial intelligence has been used in drug discovery processes, since it facilitates data driven decision making that speed up the process and minimize failure rates (Vamathevan et al., 2019). In this note, substructure mining, graph-based methods, molecular fingerprints, and molecular descriptors are some of the important machine learning methods that obtain chemical information from small molecule structures and analyse it computationally (Jabeen & Ranganathan, 2019).
Among the various supervised machine learning algorithms employed for recognition of essential molecular descriptors in lead compounds, support vector machine (SVM) is one of the most powerful technique widely used (Vamathevan et al., 2019). SVM was initially reported by Vapnik (1998) to handle nonlinear problems of classification and regression. It utilizes a kernel function to map the input features for constructing a hyperplane separation using high dimensional feature space (Qin et al., 2019).
In addition, techniques like pharmacophore modelling, virtual screening, molecular docking and simulation methods can analyse drugs and similar active biological molecules. Structure and ligand based pharmacophore model identify active molecules by using known pharmacophoric features which can further be assessed for its interaction with the target by in silico docking and simulation process. Pharmacological properties that include Absorption, Distribution, Metabolism, and Excretion (ADME) of a compound can also be predicted by CADD process (Opo et al., 2021). To our knowledge this is the first kind of report in which QSAR model generated by machine learning algorithm is combined with pharmacophore model based virtual screening to identify possible antagonists for LasR protein.
In the present study, SVM machine learning algorithm was employed to develop QSAR models based on the molecular descriptors of LasR antagonists; the performance of the generated QSAR models were evaluated in terms of correlation factor (R 2 ) and root mean square error (RMSE); a common pharmacophoric features were generated and pharmacophore hypothesis was used to screen the potential lead compounds against LasR from MolPort database in ZincPharmer; Finally, the SVM models were utilized to predict the activity of the top screened hits; Lead compounds with predicted activity greater than 8.0 were further validated for its interaction with LasR by molecular docking and simulation studies; The pharmacokinetic profile of the top lead compounds were also evaluated.

Dataset
A total of 66 dataset that are already reported as potential antagonists of LasR is used in the present study. Of which 36 LasR antagonists were obatined from ChEMBL database (https://www.ebi.ac.uk/chembl/) and 30 from literatures (Borlee et al., 2010;Bottomley et al., 2007;Smith et al., 2003;Choi et al., 2017;Geske et al., 2008;Ni et al., 2009;O'Brien et al., 2015;Zou & Nair, 2009) and their corresponding IC50 values were also retrieved. Since the data points of the IC50 values may not be equally distributed, we converted all the IC50 values into pIC50 (-log IC50). The pIC50 values of the inhibitors ranged from 4.0 lM to 9.0 lM. The SMILES of the compounds chosen for the study are given in Table S1.

Pre-processing of data and classification of compounds
Initially, the dataset of compounds were converted to SMILE or 3D SDF format for molecular descriptors calculation. Then the dataset was partitioned into training and test sets at a ratio of 1:3. The training set consisting of 49 compounds was used to build the predictive QSAR model, and the test set consisting of 17 compounds was used to assess the predictive ability of the developed model.

Molecular descriptors calculation
To develop a correlation between the structure and its activity or function, molecular descriptors are essential (Kumari et al., 2017). Molecular descriptors were calculated using ChemDes (http://www.scbdd.com/chemdes/), a web-based molecular descriptors computing platform (Dong et al., 2015).

Feature selection
Feature selection was done to get rid of highly correlated descriptors, multicollinearity and to eliminate useless descriptors resulting in a set of highly significant descriptors for QSAR model development (Chauhan et al., 2014). The total number of descriptors computed was decreased by excluding descriptors with at least one missing value, descriptors with constant or almost constant values, descriptors showing a standard deviation of less than 0.0001 and descriptors whose Pearson correlation coefficient with activity was less than 0.1 (Hdoufane et al., 2019). The most relevant descriptors for model development were selected using CfsSubset variable selection and BestFirst evaluator module implemented in Weka.

QSAR model
For the present study, we applied support vector machine (SVM) based machine learning algorithm for QSAR model development using the Weka software (https://sourceforge. net/projects/weka/). Weka is an online machine learning tool consisting of various machine learning algorithms for data mining which includes features like data pre-processing, attribute selection, regression, classification, association rules, clustering, and visualization (Wang et al., 2012).

Support vector machine
Weka software has a module for SVM that solves support vector regression problems using a SMO-type of the algorithm (Fan et al., 2005). In this study, we utilized SMOreg (weka.classifiers.functions.SMOreg) function for generating regression model based on the dataset. This module of Weka use support vector machine with arbitrary kernel functions for regression analysis.

Optimization of kernel functions and parameters
Three most commonly employed kernel functions are linear kernel, Gaussian (RBF) kernel and polynomial kernel functions (Niu et al., 2012). The SVM model performance is dependent on combination of various parameters namely capacity C parameter, the kernel function K (x,y) along with their corresponding parameters and epsilon (e)-insensitive loss function. (Hdoufane et al., 2019). The values for C, c, and e parameters were optimized through a grid search process utilizing Radial Basis Function (RBF) kernel function. In the grid search, a sequence of C values in the range from 5 to 20 with incremental step of 5, c values ranging from 0.05 to 0.5 with incremental step of 0.05 and e from 0.001 to 0.009 with an increase in step of 0.001 were exploited. Selection of these parameters is the most pivotal step in SVM calculations to achieve a good model performance.

Model performance
The quality and performance of the predictive model was validated by internal and external methods. Internal validation was done by employing a 10-fold cross validation approach to determine the fitness of the model. External validation was done on the test set (Kumari et al., 2018). The overall robustness and reliability of the developed SVM model was evaluated in terms of correlation coefficient (R 2 ), root mean square error (RMSE) and mean absolute error (MAE) parameters (Chauhan et al., 2014).

Generation of pharmacophore hypothesis
The pharmacophore model was developed using Phase module of Schr€ odinger suite 2019-1. The pharmacophore model was developed using certain essential features to generate sites for all the compounds used in the study (Misra et al., 2017). Phase provides six in-built pharmacophoric features, i.e. hydrogen bond acceptor (A), hydrogen bond donor (D), hydrophobic group (H), negative ionizable (N), positive ionizable (P), and aromatic ring (R) to define the features of ligands. Pharmacophore model was generated using systematic variation of number of sites and features that are common to most active compounds. The quality of alignment was evaluated based on the survival score and the default values have been used for hypothesis generation (Dixon et al., 2006).

Virtual screening
The best pharmacophore hypothesis was used to screen a library of compounds that match its pharmacophoric features. Virtual screening was carried out using ZincPharmer (http://zincpharmer.csb.pitt.edu/) which makes use of best pharmacophore hypothesis as query to screen the MolPort database. For screening purpose, we used constraints that included a molecular weight range of 180-500 Dalton, 10 rotatable bond cut-off and maximum of 0.7 Root Mean Square Deviation (RMSD) to get the top hits from MolPort database (Rajeswari et al., 2014). The lead compounds that best matched with the pharmacophoric features were then predicted using the developed SVM regression model to identify compounds that can act as possible LasR inhibitors.

Molecular docking
The top lead compounds with a predicted pIC50 value of greater than 8.0 were subjected to Glide v9.2 docking program implemented in Maestro v12.9 of Schr€ odinger suite 2021-3. For this purpose, the 3D crystal structure of LasR protein with PDB ID: 3IX4 was obtained from protein data bank (PDB) (https://www.rcsb.org/) (Zou & Nair, 2009). Initially the 3D structures of all ligands were prepared using LigPrep v5.9 module of Schr€ odinger software where all the ligands were minimized using the OPLS-2005 force field. The protein structure was prepared using Protein Preparation Wizard module of Schr€ odinger suite 2021-3 which involved the addition of hydrogens, assignment of hydrogen bonds, bond orders, minimization, optimization of the proteins, and deletion of waters. The Receptor Grid Generation Panel of the Glide v9.2 application was employed to generate a grid box with a dimension of 27 Â 27 Â 27 Å at the centroid of the co-crystallized ligand in such a way it allows docking of ligands into the active site. Finally, the ligands were docked in Glide's Extra Precision (XP) docking mode.

In silico drug likeliness and ADME prediction
The top hit compounds identified as potential leads were further assessed for its drug-like behavior and pharmacokinetic profile using online SwissADME (http://www.swissadme. ch/) server to evaluate ADME properties such as Gastro intestinal (GI) absorption, solubility profile and bioavailability of the compounds (Opo et al., 2021).

Molecular dynamics simulation
The molecular dynamics of the top leads in the LasR active site were determined using the OPLS-AA force field in an explicit solvent with the single point charge (SPC) model intricated in the Desmond package of Schr€ odinger suite 2020-4. The SPC water molecules were added, and the orthorhombic water box were then allowed to ensure that the entire surface of the complex is covered by the solvent model. The system was neutralized by adding counterions which balance the net charges of the system. After the generation of the solvent environment, the complex system was minimized and equilibrated using the default relaxation routine implemented in Desmond. The relaxed model system was subjected to 300 K and 1.01325 bar pressure, for a total simulation time of 100 ns at NPT ensemble of MD simulation with a recording interval of 100 ps for total energy (kcal/ mol). The real time behavior of the complex structure and structural modifications during the course of 100 ns simulation were assessed by means of root mean square deviation (RMSD), root mean square fluctuation (RMSF), and various other parameters. The overall workflow is given in Figure 1.

Molecular descriptors computed by ChemDes
Various PaDEL type of descriptors namely 1D, 2D and 3D descriptors were computed by ChemDes. Each descriptor distinguishes the molecules and applies a mathematical formalism to render possible correlation between the calculated descriptors and their biological activity (Hdoufane et al., 2019). The total number of descriptors generated was 1875. The number of descriptors was reduced to 565 when they were pre-processed to remove irrelevant features. The summary of the different PaDEL 1D, 2D and 3D descriptors calculated is shown in Table 1.

Feature selection
In order to develop an efficient and robust QSAR model for predicting LasR inhibition activity, a set of descriptors that can represent total set of descriptors should be identified. The model performance is highly dependent on what type of descriptors are used to predict its activity. The most relevant attributes among the 565 PaDEL descriptors were preselected by employing the CfsSubset variable selection and BestFirst attribute evaluator of Weka. Six descriptors, namely ATSC6e, MATS6v, RotBtFrac, VE3_D, RPCS and LOBMAX have been selected as the most significant attributes that described the inhibitory activity of LasR. These descriptors were taken as the input candidates for constructing the support vector regression (SVR) model. The chemical sense and form of the descriptors selected from the whole set are listed in Table 2. ATSC6e and MATS6v are two parameters which denote a heterogenous group of molecular descriptors. These are autocorrelation descriptors that determine the spatial distribution of a generic molecular property on the molecular structure space and measure the firmness between atoms at a predefined distance. RotBtFrac is one of the constitutional descriptor that represents a molecular structure by taking behind only the chemical composition and does not provide any information related to its topology and geometry. VE3_D is a two-dimensional topological descriptor which encodes topological representation of molecular structures by means of molecular graphs, matrix-based representations and generalized topological indices. RPCS is a charged partial surface area (CPSA) type of descriptor used in structure-physical relationship studies to represent molecular features in charge of polar intermolecular interactions. Finally, LOBMAX belongs to geometrical class of descriptors which considers the threedimensional structure of molecule and provide higher information than two-dimensional descriptors. This type of descriptor provides information on the distance relationship between all atom pairs in 3D space (Mauri et al., 2016).

Optimization of kernel functions and parameters
To search for a better SVM regression model, essential parameters namely kernel function, kernel coefficient and penalty parameter were determined. R 2 and RMSE were evaluated to select the best model through 10-fold cross validation method using various kernel functions and support vector regression parameters. The choice of kernel function is most important to deal with regression problems of nonlinear data effectively and efficiently. Since the data points distribution are unknown and the methods for choosing kernel functions are still uncertain, selection of kernel function is largely based on the method of evaluation namely cross validation (Niu et al., 2012). As a result, in terms of kernel function, radial basis function (RBF) kernel function was chosen as the optimum kernel function for building the SVR model as it performed best compared to other kernel functions.
In case of parameter values, the optimum parameters that resulted from the best 10-fold cross validated grid search models were used for modeling. The optimized parameters and statistical measures of the top three SVM regression models are shown in Table 3. Here, the C parameter controls the trade-off between the complexity in maximisation of margin and minimisation of training error. The c parameter regulates the amplitude of RBF and generalization potential of the SVM. Finally, the epsilon (e) is an insensitive parameter that protects the training set from reaching the boundary

Performance of predictive models
The optimized values of C, c, and e shown in Table 3 were then considered to construct predictive models. The established models were evaluated for their predictive performance by 10-fold cross validation method and also used to predict a test dataset (external validation). The overall performance of the models were assessed based on their statistical measures obtained by popular sequential minimization optimization algorithm shown in Table 3. The parameter, coefficient of correlation (R 2 ) statistically determines how well the regression line fits the real data points. From Table  3, the R 2 values obtained infer that the regression line fits the data and are in positive correlation relationship. On the other hand, root mean square error (RMSE) is a statistical measure which evaluates the fitting ability of the model and MAE estimates the error between predicted and actual efficacy of compounds (Chauhan et al., 2014). In such a case, the generated models have shown a quality values of RMSE and MAE in case of both sets of training and test (Table 3). The experimental and predicted pIC50 values of the SVM models obtained for training and test datasets are given in Table 4. The scatter plots of experimental versus predicted pIC50 values of the SVM models are presented in Figure 2a, b, and c, respectively. Thus, the generated models could be considered good and reliable as they exhibited a better range of predicted activities for both training and test datasets (Table  4) used for validation. From these observations, it can be reported that SVM algorithm is suitable for both internal and external prediction. In addition, Figure 2, subsequently support the finding that developed SVM models can inevitably predict the biological activity of new LasR inhibitors.

Generation of pharmacophore hypothesis
Structurally diverse 66 set of compounds were partitioned into active (pIC50 ! 6.0), and inactive (pIC50 5.9) to identify the common pharmacophore which resulted in 25 active compounds and 40 inactive compounds. Common pharmacophores were determined from a group of variants in Phase. The variants were defined by selecting the number of maximum sites to include in the hypothesis as 5 and minimum number to 4. With the provided criteria, out of 25 active compounds, 15 compounds were perfectly matched, and a range of pharmacophore hypothesis were generated. Four featured pharmacophore hypotheses were obtained and further applied to scoring function analysis.
The scoring procedure ranked different hypotheses based on the superimposition of the volume overlap, site points, vector alignment, selectivity, number of ligands matched, relative conformational energy and activity. This type of scoring algorithm helps in making a better choice for choosing which hypothesis is more suited for further investigation (Dixon et al., 2006). A total of 3 hypotheses namely AAAD_1, AAAD_2, and AAAD_3 survived the scoring process. The various scores of the generated pharmacophore hypotheses are provided in Table 5. The pharmacophore hypothesis, AAAD_1 which exhibited highest values of survival score (4.437), and site score (0.938) was selected for further investigation. The enrichment viewer parameters of the pharmacophore hypotheses also suggested the selective character of the pharmacophore hypothesis AAAD_1 (Table 6). The common pharmacophore feature and the receiver operating characteristic (ROC) plot of AAAD_1 are presented in Figure 3a and b, respectively.
From Figure 3a, the selected pharmacophore hypothesis AAAD_1 contained features namely three hydrogen bond acceptors (spheres with arrows A1, A2, and A3 indicated in red) and one hydrogen bond donor (sphere with arrow D1 indicated in blue). The alignment generated by the best pharmacophore hypothesis, AAAD_1, with the most active and inactive compounds mapped to the pharmacophore model are depicted in Figure 4.

Virtual screening
The pharmacophore hypothesis AAAD_1 was considered to search the MolPort database in ZincPharmer to obtain hits that best matched the pharmacophoric features of the pharmacophore model. A total of 1,82,892 compounds were fetched as hits from ZincPharmer. To keep up the degree of consistency with the pharmacophore model, the hit compounds were ranked based on the RMSD values. To reduce the number of hits, we eventually used 0.5 A˚of the maximum RMSD value to prune the hit list which resulted in 1666 compounds.

Predicting pIC50 values using SVM models
Here, the predicted pIC50 values of the retrieved hit compounds were identified by screening against the developed   Table S2. Finally, top leads with predicted pIC50 value greater than 8.0 were subjected to molecular docking analysis to confirm its interaction with LasR.
Analysis of the docked complex suggested that nonbonding interactions namely hydrogen bonds, hydrophobic contacts, p-p interactions, polar, charged positive and negative interactions contributed to the binding of lead compounds into the active site. The protein-ligand interaction profile (Figure 6a) revealed that ZINC3851967 formed hydrogen bond with amino acid residue Trp60 alone and p-p contact was found with amino acid Tyr56. Hydrophobic interactions were observed with amino acid residues including Tyr47, Ala50, Ile52, Leu40, Leu39, Leu36, Tyr64, Leu110, Ala105, Trp88, Tyr93, Phe101, Val76, Cys79, Leu125, and Ala127. Also, residues namely Thr75, Ser129, Thr115 were engaged in polar interactions and Asp73 formed charged negative interaction with the protein. ZINC4024175 (Figure 6b) formed hydrophobic contacts through residues Ile52, Tyr64, Leu36, Tyr56, Leu110, Ala105, Ile92, Tyr93, Trp88, Val76, and Ala127. It exhibited hydrogen bond interactions with amino acid residues Trp60, Thr75, Ser129 and p-p interaction with Phe101. The polar and charged negative interactions of the lead with LasR were observed with residues Thr 115 and Asp73, respectively.
Analysis of ligand binding pocket of native ligand (OdDHL) suggested the involvement of hydrogen bond interactions with residues including Tyr56, Trp60, Asp 73, and Ser129 (Figure 7). In particular, the native ligand forms  hydrogen bond interactions through its polar head group amino acids namely Tyr56, Trp60, Asp73, Ser 129 in the ligand binding pocket. On the other side of the active site, the nonpolar tail group are involved in hydrophobic interactions involving residues Leu36, Leu40, Ile52, Val76, and Leu75. These interactions shield the ligand binding pocket of LasR and sequester the residues near the ligand binding pocket into a favorable environment and activates LasR (Zou & Nair, 2009). The analysis of interaction of cocrystallized ligand molecule (activator TP-1) with LasR (PDB ID: 3IX4), revealed that TP-1 possessed stable interaction with residues, viz., Tyr56, Trp60, Arg61, Asp73, Thr75 and Ser129. In connection to our research findings, Sadiq et al. (2020) identified six hit molecules from DrugBank database against LasR of Pseudomonas aeruginosa and reported that the screened inhibitors exhibited hydrogen bond contacts without sharing all the four active residues namely Tyr56, Trp60, Asp73, Ser129 and p-p stacking was majorly attributed with residues Tyr64, Trp88, and Phe101. A study by Kalia et al. (2017) reported seven potential inhibitors against LasR from preliminary database in PyRX software and found that none of the compounds were engaged in hydrogen bond interaction with all the four active site residues. In addition, it is already well known that binding of native ligand to active site of LasR promotes dimerization which eventually influence the transcription of genes regulated by LasR. Hence, the possible mechanism of QS inhibition by these molecules would be to prevent dimerization by competing with the native ligand (Kalia et al., 2017).
Another study by Nain et al. (2020) also witnessed seven potential compounds against LasR through high throughput virtual screening (HTVS) approach and suggested that the compounds did not establish nonbonding interaction patterns similar to that of the native ligand. Particularly, majority of the compounds shared amino acid residues Tyr56, Asp73, Ser129 only or with either single or no or more than four hydrogen bonds. A study by Sharma et al. (2019) has suggested that hydrocinnamic acid from Enterobacter xianfangensis are effective against LasR of P. aeruginosa and it showed hydrogen bond contacts involving residues Tyr56, Asp73, Ser129 and p-p interactions was observed with Phe101. Moreover, a study by Bottomley et al. (2007) has also stated that a good agonist of LasR should be able to form hydrogen bond interaction with residues involving Tyr56, Trp60, and Ser129 and could fill the entire ligand binding pocket of LasR. Lack in the ability to correctly pack into the entire ligand binding pocket of the protein core involving the four major amino acid residues namely Tyr56, Trp60, Asp73, and Ser129 would explain the quorum sensing inhibitor (QSI) activity of the lead compounds.

In silico ADME and drug likeliness prediction
Poor pharmacokinetic profile is often considered as major drawback for the lead molecules during clinical trials. Thus, it would be beneficial if these issues are addressed at early stages. Application of in silico methodology for prediction of possible pharmacokinetic parameters and drug likeliness profile of hit compounds will be more useful for identification of best lead molecules (Pal et al., 2019). Keeping this in view, the four compounds selected as top leads after various screening namely ZINC3851967, ZINC4024175, ZINC2125703, and ZINC3851966 were subjected to ADME and drug likeliness prediction.
Here, the ADME properties of the lead compounds were analyzed using SwissADME tool to check their pharmacokinetic parameters namely water-solubility, lipophilicity, medicinal chemistry, and drug-likeness of the compounds. The ADME properties of the four lead compounds are listed in Table 8. The efficacy and safety of a drug largely depends on the bioavailability of a drug (Opo et al., 2021). In our study, we could find all the four top lead compounds were found to exhibit a good bioavailability score of 0.55-0.56. All the selected lead compounds passed the pharmacokinetic requirements successfully and were found to be within the permissible range (Table 8). Additionally, drug like features which include molecular weight, H-bond acceptors, H-bond donors, and logP based on Lipinski's rule of five were confirmed by drug likeness property analysis (Jana et al., 2018). None of the leads violated Lipinski's rule of five which further confirms the drug-like behavior of the lead compounds.
Overall as per the results of SwissADME analysis, we could suggest that these selected compounds may be considered as leads for developing a LasR inhibitor as they obey all the rules of pharamacokinetics and drug likeliness.

Molecular dynamics simulation
To study the structural stability of docked complexes, molecular dynamics simulation was performed for 100 ns to explore the binding mode of the top lead compounds. Molecular dynamics simulations was done using Desmond module included in the Schr€ odinger software. The stability of the complexes LasR-ZINC3851967, LasR-ZINC4024175, LasR-ZINC2125703, and LasR-ZINC3851966 were assessed based on root mean square deviation (RMSD), root mean square fluctuation (RMSF), intramolecular hydrogen bonds, and radius of gyration (Rg). Additionally, to understand the conformational changes in the protein over a period of 100 ns, the simulated complex of LasR in its apo state and in the presence of lead compounds at specific trajectory frames namely frame 1, 200, 400, 600, 800, and 1000 were analysed and are shown in Figure 8 (a-d).

Root mean square deviation (RMSD) analysis
To understand the conformational changes of the protein in presence of ligand and in its apo state, RMSD was  determined. Initially, the apo form of LasR protein was simulated for 100 ns to assess the stable and equilibrated state of protein (Figure 9). The stability of the protein is evaluated by RMSD of the backbone atoms related to the initial docking structure. The RMSD of the apo form of protein was stabilized around 9.0 Å which suggested that the protein has undergone a large conformational changes to maintain a thermal average structure towards the end of the simulation.
The RMSD of LasR-ZINC3851967 complex was initially found to be fluctuating upto 60 ns. Thereafter the RMSD value of the complex reached a maximum to 9.0 Å and was maintained around this value upto the end of simulation which proved the stability of ligand in its binding pocket. In this connection, it could be inferred that the ligand has not diffused away from its initial binding site till the end of simulation. Though the simulation has converged, the RMSD of protein with respect to its backbone atoms was however stabilized around a fixed value of 12 Å after 50 ns. These changes indicated that the protein has undergone a large conformational change in its structure during the simulation process (Figure 10a). In case of LasR-ZINC4024175, the RMSD of holoprotein was constant around 3.0 Å untill a simulation period of 60 ns but after 60 ns, there was a limited fluctuation in the RMSD value. However, it became constant around 7.5 Å after 90 ns which suggested that the ligand has not diffused away from the binding pocket of LasR protein. In case of protein RMSD, the protein has underwent a rigorous change after 60 ns to keep the system equilibrated and to maintain the structural conformation throughout the simulation process (Figure 10b).
On the other hand, the RMSD of LasR-ZINC2125703 complex was observed around 7.5 Å upto 75 ns followed by there was a small leap in the RMSD at a time interval of 75-85ns. But eventually it was again maintained around 7.5 Å till the stimulated period of 100 ns. From this observation, it was evident that the protein-ligand complex was perfectly aligned on the protein backbone of the reference and it was likely that the ligand has not diffused away from its binding pocket (Figure 10c). Finally, the RMSD of LasR-ZINC3851966 complex was around 6.0 Å upto 75 ns. Afterwards, there was a slight leap in the RMSD value which in due course became stable around 7.5 Å till the end of the simulation. The protein became increasingly rigid on binding of ligand and hence the complex was stable upto 100 ns. Though certain deviations were observed, however as the ligand RMSD value was not significantly larger than the protein RMSD it could be concluded that the lead compound did not diffuse away from the binding site throughout the simulation period of 100 ns (Figure 10d).

Root mean square fluctuation (RMSF) analysis
The average fluctuation of the docked protein residues was evaluated by the RMSF of all residues during the simulation process. Fluctuation in residues demonstrates the binding poses of the leads and its interactions as they are dependent on residual fluctuation RMSF values (Nazar et al., 2020). Hence, to characterize the conformational and local changes in the amino acid residues of LasR on binding of lead compounds, root mean square fluctuation (RMSF) of protein was monitored. Initially, the local changes along the protein chain in its apo form was determined to identify the area of protein that fluctuate more during the simulation (Figure 11). The highest RMSF value was around 9.0 Å for the apoprotein during 100 ns simulation.
In case of binding of leads to the protein, there was limited fluctuations in the amino acid residues of the protein as witnessed from the interaction diagram throughout the entire simulation process (Figure 12). These fluctuations were observed as a result of stable binding of the ligands to protein. Typically, the areas of the protein that fluctuate the most during the simulation process are depicted as sharp peaks in the plots (Figure 12 (a-d)). In general, the N-and Cterminal tails of a protein fluctuated more than any other area of the protein. Here the ratio of fluctuation at residual level are depicted by the RMSF plots. The RMSF result of ZINC2125703 (Figure 12c) reflected the highest loop fluctuations followed by ZINC3851967 (Figure 12a) compared to other lead compounds throughout the simulation process. Thus, this comparative analysis revealed the significance of ZINC4024175 and ZINC3851966 over the other two leads as they fluctuated little and are much better during the entire simulation period. In particular, the highest average RMSF value of 16.0 Å was also exhibited by ZINC4024175 and ZINC3851966 which indicated the greater flexibility of the complexes during simulation.
Additionally, the RMSF profiles revealed that the interaction of lead compounds with specific amino acid residues of LasR protein lead to primary changes in the crystal structure of the protein. The crystal structure of protein comprises of 34.94% of helices and 15.56% of beta strands ( Figure S1). In this context, these secondary structure elements namely alpha helices and beta strands are usually more rigid compared to unstructured part of the protein, and hence fluctuate less than the regions of loop. On the other side, ligand RMSF evaluated the identity of fluctuations of atom positions in the ligand on binding of protein. The fluctuations in the internal atoms of the lead compounds are evident from the ligand RMSF plots ( Figure S2 (a-d)).
Ligand RMSF plots provide information on how different fragments of leads interact with the protein and their entropic role during the event of binding. In general, greater fluctuations are observed in the leads upon exposure of atoms to the solvent molecules and lesser or no fluctuations are seen when there is high force of attraction at the proximity of the binding site of protein (Manoharan et al., 2021). From our results, it was observed that lead compounds namely ZINC3851967 ( Figure S2a) and ZINC2125703 ( Figure S2c) were initially around an RMSF value of 2.5 Å which gradually increased to 4.0 Å during the course of time due to fluctuations and eventually decreased to a value around 2.0 Å at the end of simulation. The other two leads ZINC4024175 ( Figure S2b) and ZINC3851966 ( Figure S2d) were maintained around an RMSF value of 3.0 Å with respect to protein throughout the simulation process.
The lead compound ZINC2125703 did not show any significant hydrogen bond or hydrophobic interactions with   LasR. Amino acid residues Gln45 and Tyr47 showed an initial hydrogen bond contact only over less than 30% of the simulation whereas hydrophobic contacts was observed with Leu36, Leu40, Tyr47, Ala50, Ile52, Trp60, Arg61, Tyr64, Val76, Cys79, Leu125, and Ala127 but those were also not significant ones (Figure 13c). In case of ZINC3851966, the lead made hydrogen bond with residues Tyr56, Trp60, and Thr115, of which only Trp60 was engaged in the interaction around 90% of the simulation time. Hydrophobic interactions were made with amino acids namely Leu36, Tyr47, Tyr56, Trp60, Tyr64, Pro74, Trp88, Tyr93, Phe191, Phe102, Ala105, Leu110, and Ala127 during the period of simulation (Figure 13d). In addition, amino acid residues with multiple contacts were also observed. It was represented as a timeline plot that revealed the overall specific interactions the protein made with the lead compounds during the course of simulation and also exploited about which amino acid residue interact with the leads in each frame of trajectory ( Figure S3 (a-d)).

Ligand torsion profile
Here, the torsional rotatable bonds are depicted by a dial and a bar plot. These plots give an insight into the conformational changes of rotatable bonds in the lead compounds throughout the period of simulation. Information about these torsional rotations are necessary to understand the conformational strain all the leads withstand to maintain protein-bound conformation. The ligand torsion profiles obtained for the four lead compounds during the period of 100 ns simulation are presented in Figure 14. Our results revealed that there are six rotatable bonds in ZINC3851967 ( Figure 14a) and ZINC4024175 (Figure 14b). In case of ZINC2125703 ( Figure 14c) and ZINC3851966 (Figure 14d),  there were seven and six rotatable bonds respectively. Furthermore, in case of all the four leads the rotatable bonds were found between À180 and 180 . The dial plot described the torsional conformation seen during the simulation process. It illustrated that initially the simulation was at the center of the dial plot and as the time evolves it was plotted radially outwards. The bar plot showed the probability density of each torsional rotation by summation of potential of the related torsions.

Ligand properties
The precise stability of the lead compounds in the binding pocket of LasR during the simulation process was explored by examining various properties of the leads including RMSD, radius of gyration (Rg), number of intramolecular hydrogen bonds, molecular surface area (MSA), solvent accessible surface area (SASA), and polar surface area (PSA). Rg determines the extendedness of the lead compounds which is an indicator system that demonstrates the overall shape by variation in expansion or compactness of the system. MSA is considered equivalent to van der Waals surface area with 1.4 Å probe radius calculation. SASA defines the surface area of the lead compounds accessed by water molecules. PSA illustrated the SASA in the leads that is contributed by nitrogen and oxygen atoms.
Our findings demonstrated that in case of ZINC3851967, the Rg value was stable around 4.90 Å which defines the tightly packed nature of the lead into the protein. No intramolecular hydrogen bonds were detected and values of SASA fluctuated upto 60 ns and eventually became stable around 50 Å 2 untill the end of simulation. The polar and molecular surface area were maintained at a constant value of 60 Å 2 and 344 Å 2 , respectively ( Figure S4a). The other lead compound ZINC4024175 possessed intramolecular hydrogen bonds and exhibited an Rg value of 3.30 Å throughout the simulation. There was an initial fluctuation in SASA alone up to 15 ns but eventually it became consistent around the value of 30 Å 2 . All the other properties including PSA (296 Å 2 ) and MSA (231 Å 2 ) were stabilized around a fixed value till the end of simulation ( Figure S4b).
On the other hand, ZINC2125703 also showed intramolecular hydrogen bonds. In terms of other parameters, an Rg within the range of 4.0 À 4.4 Å was exhibited which evidently proves the tight packing of the lead and SASA alone portrayed a small leap for an interval of 10 ns and eventually became stabilized around a fixed value of 140 Å 2 ( Figure  S4c). Finally, ZINC3851966 due to the rigid packing into the binding pocket of LasR showed an Rg value of 4.80 Å. Fluctuations on properties namely SASA, MSA, and PSA were observed for a small interval of time and later on remained stable with an value of 50 Å 2 , 332 Å 2 , and 64 Å 2 respectively upto the end of simulation process and without intramolecular hydrogen bonds detected ( Figure S4d).
Among these, Rg and SASA are significant parameters which explains the conformational changes of the lead compounds within the binding pocket of the protein. Hence we have analysed the measures of Rg and SASA of protein in its unbound and bound states. The Rg determines the protein's compactness at various conditions during the simulation analysis. In general, higher values of Rg illustrates less tight packing of atoms, whereas tight packing of atoms is depicted by lower values of Rg. From our results (Figure 15), it was observed that the overall protein structure dimensions on binding of leads were identical to that of the protein in its apo form, but one of the lead ZINC4024175 alone, showed an increased compactness after 70 ns with an highest value of 38.94 Å and eventually reached 38.4 Å towards the end of simulation. On the other side, ZINC3851966 was found to be more comparable to apoprotein. The other two leads ZINC3851967 and ZINC2125703 exhibited lower values of Rg which suggested the tight packing of the atoms of these leads into the binding pocket of LasR. These findings suggested that the docked complexes had underwent only small deviations from their native conformations to maintain its stability.
The parameter SASA was examined to understand the conformational changes and to explore the information on exposure of protein to solvent molecules. From our observation (Figure 16), the average SASA values for apo-LasR, LasR-ZINC3851967, LasR-ZINC4024175, LasR-ZINC2125703, and LasR-ZINC3851966 were found to be 62254.77, 63399.61, 63947.03, 63695.87, and 64832.97 Å 2 , respectively. Here, it can be seen that ZINC4024175 and ZINC3851966 showed the highest value for SASA compared to the other two leads namely ZINC3851967 and ZINC2125703. This result inferred that the leads ZINC3851967 and ZINC2125703 possessed more complex stability with stable lower peaks till the end of the simulation process.

Conclusion
P. aeruginosa, a Gram-negative bacterium are becoming ineffective to conventional antibiotic therapy because of their extensive colonizing ability and multiple antibiotic resistance. QS is an important mechanism which regulates the pathogenesis of P.aeruginosa. Hence attenuation of virulence by inhibiting QS network could serve as a challenging approach to prevent its infections. As QS system in P. aeruginosa is interconnected in a hierarchical pattern, inhibiting las system can destroy the whole QS network and its functions. In this study, we have used chemoinformatics approach to discover new lead compounds against LasR protein of P. aeruginosa. SVM based predictive QSAR regression models were developed by prioritizing molecular descriptors utilizing 66 reported LasR antagonists by Weka software. The generated models met the validation criteria for their use in predicting the LasR inhibitory activity of new lead compounds. A ligand-based pharmacophore model was developed which comprises of three hydrogen bond acceptors and one hydrogen bond donor. The best hypothesis AAAD_1 was taken as 3D query to search for LasR inhibitors from MolPort database in ZincPharmer. The compounds obtained as hits from the database were evaluated against the SVM models. Molecular docking study was then carried out to rerank the lead compounds and ADME properties were determined. Finally, the structural stability of the identified lead compounds with LasR was confirmed by molecular dynamics simulation analysis. At the end, four lead compounds namely ZINC3851967, ZINC4024175, ZINC2125703, and ZINC3851966 were identified as drug-like molecules that can potentially inhibit LasR of P. aeruginosa. In conclusion, these integrated approaches of machine learning, pharmacophore based virtual screening, molecular docking and molecular simulation studies may provide an insight and will aid in drug designing and discovery for treating P. aeruginosa biofilm infections in the future. Here, the black line represents the apoprotein whereas red, blue, green, and violet lines represent the lead compounds ZINC3851967, ZINC4024175, ZINC2125703, and ZINC3851966, respectively. Figure 16. Conformational changes in the total solvent accessible surface areas (SASA) of the lead compounds compared to apo form of LasR where the black line indicates the apoprotein and the red, blue, green, and violet lines indicates ZINC3851967, ZINC4024175, ZINC2125703, and ZINC3851966, respectively.