Machine learning and traditional QSAR modeling methods: a case study of known PXR activators

Abstract Pregnane X receptor (PXR), extensively expressed in human tissues related to digestion and metabolism, is responsible for recognizing and detoxifying diverse xenobiotics encountered by humans. To comprehend the promiscuous nature of PXR and its ability to bind a variety of ligands, computational approaches, viz., quantitative structure–activity relationship (QSAR) models, aid in the rapid dereplication of potential toxicological agents and mitigate the number of animals used to establish a meaningful regulatory decision. Recent advancements in machine learning techniques accommodating larger datasets are expected to aid in developing effective predictive models for complex mixtures (viz., dietary supplements) before undertaking in-depth experiments. Five hundred structurally diverse PXR ligands were used to develop traditional two-dimensional (2D) QSAR, machine-learning-based 2D-QSAR, field-based three-dimensional (3D) QSAR, and machine-learning-based 3D-QSAR models to establish the utility of predictive machine learning methods. Additionally, the applicability domain of the agonists was established to ensure the generation of robust QSAR models. A prediction set of dietary PXR agonists was used to externally-validate generated QSAR models. QSAR data analysis revealed that machine-learning 3D-QSAR techniques were more accurate in predicting the activity of external terpenes with an external validation squared correlation coefficient (R2) of 0.70 versus an R2 of 0.52 in machine-learning 2D-QSAR. Additionally, a visual summary of the binding pocket of PXR was assembled from the field 3D-QSAR models. By developing multiple QSAR models in this study, a robust groundwork for assessing PXR agonism from various chemical backbones has been established in anticipation of the identification of potential causative agents in complex mixtures. Communicated by Ramaswamy H. Sarma


Introduction
Humans encounter multiple foreign compounds, or xenobiotics, daily in the modern world.To detoxify and eliminate these compounds from the human body, xenobiotics must first be identified, and the appropriate metabolizing response must be initiated.Pregnane X receptor (PXR) is a nuclear receptor considered a master xenobiotic receptor that coordinately regulates the expression of genes encoding drug-metabolizing enzymes and drug transporters, which are responsible for the detoxification and excretion of xenobiotics (Oladimeji & Chen, 2018).Activation of PXR leads to upregulation of the PXR target genes, induction of various cytochrome P450 isozymes, and multidrug resistance proteins, including most chemoresistance-related enzymes, which may contribute to detoxification and the acquired resistance to chemotherapy (Qiao et al., 2013).Indeed, as xenobiotics occur in various chemical scaffolds, PXR responds to distinct subsets of xenobiotics, resulting in directed promiscuity versus random promiscuity (Watkins et al., 2002).Therefore, developing computational approaches to explore and establish effective predictive models to predict PXR's activation by xenobiotics before conducting the time-consuming cell-and animal-based experiments would add tremendous scientific value.The continuing growth of cheminformatics, high-throughput data in the molecular sciences, and the trend of minimizing animal use in biomedical research suggest the importance of the quantitative structure-activity relationship (QSAR) approach in expediting the discovery process, reducing the number of candidates to be tested experimentally and to rationalize their choice (Cherkasov et al., 2014).As PXR is a prominent transcription factor capable of being activated by a diverse range of chemical structures, its role in human carcinogenesis and toxicology is widely touted.However, previous computational studies involving PXR have encountered shortcomings due to a limited structural diversity in QSAR training and test sets, together with the promiscuity of PXR's ability to bind a variety of substrates.
Few attempts have been made to date to construct ligand-based models using a subset of compounds that encompass the broad structural diversity of those known to modulate PXR.The traditional structure-based screening method of docking is confounded by PXR's large binding pocket volume (1150 Å 3 ), which allows the binding of small molecules in multiple orientations (Watkins et al., 2001).Indeed, a study involving docking a small set of compounds to multiple crystal structures of human PXR yielded a range of docking scores (Kortagere et al., 2010), highlighting some limitations associated with conventional methods and their partial utility in predicting the ligand-protein interactions, particularly receptors with large ligand-binding domains.
Additionally, producing a consensus model for the promiscuity of PXR is not easily obtainable and often requires large data sets; therefore, laborious data curation efforts need to be employed.Kortagere et al. (2010) attempted to generate predictive PXR models on a small subset of steroidal agonists and an 11-compound set from the ToxCast database.Yet, results showed that a simplistic approach yielded poor predictability across diverse molecular scaffolds (Kortagere et al., 2010).With a large data set to describe PXR's promiscuity, QSAR can be utilized as an alternative ligand-based screening method to assess the array of structures from the series of molecules in the data set as they correlate to the associated PXR activity.
Previous publications have related PXR agonism to ligands through classification-based methods, simplifying the experimental data into a binary output of either positive or negative agonism (Dybdahl et al., 2012;Rosenberg et al., 2017).However, PXR agonists occur in various scaffolds with molecular weights ranging from 172.2 to 853.9 Da and have been reported to display a wide range of PXR inductive potentials.Since these activities (EC 50 data) are experimentally derived, regression-based QSAR models are anticipated to predict continuous output variables with greater certainty rather than simplifying the assessment.Additionally, machine learning techniques, which are more efficient and can easily be scaled to large datasets, are also expected to improve the activity predictions, as these techniques involve pattern recognition algorithms to discern mathematical relationships between empirical observations of small molecules and extrapolate them to predict chemical, biological and physical properties of test compounds (Lo et al., 2018).In traditional 2D-QSAR modeling, only parameters such as geometry, topology, polar surface area and molecular fingerprints are considered of assessed compounds; whereas, 3D-QSAR deepens the assessment of compounds by including steric properties (Peter et al., 2019).By applying machine learning models to QSAR modeling, increasingly larger data sets can be processed compared to traditional QSAR modeling, leading to higher efficiency in learning from massive datasets for modern drug discovery.Machine learning techniques also grant increased data interpretation of descriptor importance and attempt to mitigate the overfitting prevalent in QSAR modeling (Lo et al., 2018).This study presents comprehensive regression-based QSAR approaches, focusing on predicting a substantial collection of 500 PXR agonists for ligand binding activity towards human PXR using traditional and machine learning methods applied to 2D-and 3D-QSAR modeling.This study was further extended by validating the constructed QSAR models and investigating the applicability domain.Also, an Activity Atlas that visualized modeled interactions of PXR agonists, such as electrostatics, hydrophobics and shape within the ligand-receptor binding site, was compiled from the field points of agonists in the field-based 3D-QSAR model (Activity Atlas TM ; in Forge, 10.6, CressetV R ; Cheeseright et al. 2006).

2D-QSAR (AutoQSAR)
AutoQSAR is a fully automated QSAR model generator available in the Schr€ odinger Maestro software suite, capable of descriptor generation, feature selection and creation of multiple QSAR models from several algorithms, including kernelbased partial least squares, naive Bayes, and ensemble-based recursive partitioning with different training/test set splits (An et al., 2013;Duan et al., 2010;Schr€ odinger Release 2021-4, 2021a).Using a diverse set of 500 PXR agonists, a total of ten 2D-QSAR models were generated using Schr€ odinger AutoQSAR.The best model from Table 1 was selected based on correlation coefficient (R 2 ) and square coefficient of crossvalidation (Q 2 ) values.The leave-one-out method of cross validation was implemented in the AutoQSAR model construction.All QSAR models in this experiment were assembled from the same set of 500 agonists at a ratio of 3 agonists to the training sets and 1 agonist to the test sets.For the input of 2D structures in this experiment, the ten returned QSAR models incorporated descriptors by a correlation coefficient of 0.80 and three fingerprint types, as indicated in the Model Code of Table 1 and detailed in Table S1, Supplementary Information, for instance, radial fingerprints in Kpls_radial_43.
Data analysis revealed the Kpls regression method, which uses both descriptors and fingerprints, had generated a high-scoring model.All models built on the 500 PXR agonists returned R 2 values equal to or greater than 0.6 and Q 2 values greater than 0.5, which meet the thresholds recommended for robust QSAR models (Golbraikh & Tropsha, 2000).By ranking models based on their R 2 values, 'Kpls_radial_43' was selected as the best 2D-QSAR model.Kpls_radial_43 maintains additional model statistics that indicate it is an acceptable representation, such as the lowest standard deviation and root-mean-square deviation (RMSE) scores amongst all ten AutoQSAR models.The scatter plot of the prioritized model, Kpls_radial_43, is shown in Figure 1.
To assess the adequate number of Canvas-generated descriptors necessary to describe the variance of the 500 PXR agonists involved in the study, a principal component analysis (PCA) was computed in Orange data mining software.As found by Yoo and Shahlaei (Yoo & Shahlaei, 2018), PCA can be easily applied to the pool of calculated structural descriptors for complex datasets in QSAR studies and will yield extracted information that can be used for further appropriate analysis.From Figure 2, it was observed that more than 75% of the variance within the 500 PXR agonist dataset could be explained by 16 Canvas-generated descriptors.
It was also noted that increasing beyond 16 components approaches an asymptote in the PCA, indicating that 16 descriptors is a sufficient amount for a successive step in dimensionality reduction, i.e. t-distributed stochastic neighbor embedding (t-SNE).
The t-SNE algorithm was selected to visualize the diverse dataset of PXR agonists due to its known ability as a powerful tool to visualize complex high-dimensional data sets in diverse experimental settings (Abdelmoula et al., 2016;Amir et al., 2013;Bushati et al., 2011;Mahfouz et al., 2015;Taskesen & Reinders, 2016).t-SNE can generate a visualization of molecular and biological similarity by mapping the chemical space of dataset compounds through recognizing the molecular similarity in a broad set of diverse molecules (Janssen et al., 2019).Again, utilizing the Orange data mining suite, a representation of the chemical space of the 500 PXR agonists was generated via the t-SNE algorithm and is shown in Figure 3.
Distinct clusters of agonists were apparent in the t-SNE map, indicating that the descriptors calculated by Canvas appropriately describe the compounds within the dataset.Moreover, the compactness of the entirety of the t-SNE map illustrates the retained global association between all 500 agonists following computation of the various Canvas descriptors.

2D-QSAR (DeepChem/AutoQSAR)
In the next step, an augmented AutoQSAR approach involving DeepChem machine learning that is implemented in Schr€ odinger Maestro was explored to evaluate the utility of machine learning on PXR agonism assessment versus QSAR models generated through traditional AutoQSAR.The Deep learning method (DeepChem Project; DeepAutoQSAR) is recommended for efficient training of predictive models on very large datasets in drug discovery, quantum chemistry, materials science and biology by discovering patterns in the input data via deep neural networks (Golbraikh & Tropsha, 2000).Since the set of PXR agonists used for QSAR model generation contains less than 5000 molecules, it was expected that DeepChem/AutoQSAR (DeepAutoQSAR) would yield similar results to traditional AutoQSAR.When comparing the best model created by AutoQSAR to the DeepChem random   forest (RF) regression model from Table 2, it can be seen that the neural networks implemented in DeepChem did not produce similar or superior statistics to traditional AutoQSAR for the training set R 2 .
Furthermore, the DeepChem-augmented AutoQSAR model did not predict a significantly different R 2 value for the external prediction set of natural products (Chang, 2009;Seow & Lau, 2017;Vyhl� ıdalov� a et al., 2020;Table S3, Supplementary Information), which is not a part of the 500 training and test set agonists, when compared to traditional AutoQSAR (R 2 ¼ 0.50).This indicated that the neural networks in the RF regression model assessed the PXR agonism activity of the natural product external validation set to the same degree as the traditional AutoQSAR method, as shown in Table 2 and depicted in Figure 4. Overall, the    RMSE was also similar between traditional AutoQSAR and DeepAutoQSAR.

Field-based 3D-QSAR
To gain a deeper insight into the relationship between PXR agonist structure and activity, the protein crystal structure (PDB ID: 2O9I) (Xue et al., 2007) of PXR cocrystallized with agonist T0901317, was implemented in field-based 3D-QSAR construction.This would allow molecular alignment of the electrostatics and hydrophobics of the 500 agonists to be conformationally constrained to the experimentally determined pose of the co-crystallized ligand (T0901317) within the ligand-binding site of PXR.The model with eight components was considered the most representative of PXR agonism due to possessing the highest square coefficient of cross-validation (Q 2 ) for the agonist predictability.This model also exhibited a training set R 2 that can be considered significant, as it is above the threshold of 0.67 (Golbraikh & Tropsha, 2000), which indicated acceptable predictivity.
To fully assess the validity of the eight-component fieldbased 3D-QSAR model, a natural product-derived set of monomethylated indoles and terpenes was used as an external test set in a similar approach to all other QSAR models.Because the alignment of bioactive molecular structures is crucial in 3D-QSAR, it was expected that the diverse terpene structures, such as carnosic acid, carnosol and ursolic acid derived from Rosmarinus officinalis, schisandrins A and B, and schisandrol B derived from Schisandra chinensis, and E-and Z-guggulsterone derived from Commiphora mukul, would be difficult to assess accurately.Indeed, the monomethylated indoles, sourced from human intestinal microbiota or from dietary intake, were likewise considered difficult structures to predict accurately, owing to their relatively small size compared to the larger PXR agonists contained within this study.Nevertheless, the field-based 3D-QSAR model yielded an R 2 value of 0.68 after the prediction of the external terpene set.This result is an improvement when compared to the R 2 value of the DeepChem/AutoQSAR prediction, and the resulting graph is included in Figure 5.

Machine learning regression 3D-QSAR
As mentioned earlier, Cresset V R Forge V10.6 (Cresset V R Inc., UK) includes four regression algorithms within the QSAR model building software.Support Vector Machine (SVM), k-Nearest  Neighbor (kNN), RF, and Relevance Vector Machine (RVM) algorithms can be utilized simultaneously by Cresset V R Forge to generate a regression model containing the best statistics, such as R 2 and Q 2 , which is retained as most robust.Of the four available regression algorithms in Cresset V R Forge, Support Vector Machine (SVM) was prioritized for this study, since this method has been successfully used to solve classification and correlation problems in bioinformatics (Xiang-Min et al., 2007).According to Yao et al., SVM also possesses comparative advantages to traditional regression algorithms and neural network methods, such as those used in DeepAutoQSAR (Yao et al., 2004).These advantages include a global optimum, an acceptable generalization ability, simple implementation, few free parameters and dimensional independence (Ciura et al., 2020;Lashin et al., 2020;Suzuki & Katouda, 2020).
Generation of an SVM model of PXR agonists produced the best-fitting training set R 2 amongst all machine learning regression 3D-QSAR models with a value of 0.96, as seen in Table 4.Moreover, the SVM model retained the highest test set correlation (R 2 ¼ 0.67) and showed the lowest error in training set prediction.
Furthermore, the SVM regression 3D-QSAR model was checked for overfitting by principal components analysis (PCA) of the same descriptors used for building the SVM model, since PCA can extract signals from the noise to a moderate degree and is not sensitive to the type of noise (Razifar et al., 2009).As shown in Figure 6, most compounds used in the construction of the SVM model show correlation of similarity due to their location within the highlighted regions, as depicted in orange.The selected components in Figure 6 are most attributed to the variance of the agonists within the model.Additionally, Figure 7 graphically depicts statistical details associated with building the SVM model in graphical representation.
The external validation set consisting of naturally-occurring terpenes (Chang, 2009;Seow & Lau, 2017;Vyhl� ıdalov� a et al., 2020) was again used to assess the validity and robustness of the SVM regression 3D-QSAR model.As 2D-QSAR machine learning techniques produced comparable predictive R 2 results to traditional 2D-QSAR results (R 2 ¼ 0.52 and 0.50, respectively), the machine learning regression-based 3D-QSAR model was expected to yield similar results to the external validation R 2 value the field-based 3D-QSAR model.As illustrated in Figure 7c and elaborated in Table 5, the calculated R 2 of the prediction set for the SVM model (R 2 ¼ 0.70) is comparable to the prediction set R 2 for the field-based 3D-QSAR model (R 2 ¼ 0.68).
Nevertheless, the SVM regression 3D-QSAR model was more accurate in the categories of training set R 2 and test set R 2 than any other QSAR model, indicating superior robustness of the model.Additionally, the error associated with the SVM 3D-QSAR model (RMSE ¼ 0.15) was significantly lower versus the 2D machine learning DeepAutoQSAR model (RMSE ¼ 0.51).A full comparison of the statistics from the best-constructed QSAR models employing the four major techniques discussed in this study is arranged in Table 6.

Activity Atlas
A visual representation of the field points associated with the 500 agonists for PXR in the field-based 3D-QSAR model was produced through the Activity Atlas TM (Activity Atlas TM ; in Forge, 10.6, Cresset V R ; Cheeseright et al. 2006) feature in Cresset V R Forge V10.6 (Cresset V R Inc., UK).The Activity Atlas TM allows the user to assess individual molecules for key features in the chemical scaffold that correspond to an overall summary of the alignment of the training and test set compounds.Figures 8 and 9 show the summary of the Activity Atlas generated for PXR, as well as the individual fields associated with the most and least active agonists.In shape summary (Figures 8a and 9a), the presence of the chemical moiety within the green areas indicated favorable or desired substituents.In contrast, the presence of the chemical moiety within the magenta portions indicated a negative interaction.The atlas of hydrophobic regions followed a similar trend, with green regions indicating spatial areas desiring more hydrophobic portions of molecules versus the magenta regions, which preferred hydrophilic substituents (Figures 8b  and 9b).Electrostatics (Figures 8c and 9c) were summarized in different colors, with favorable positive charge density regions indicated by red and regions that are favorable for negative charge density indicated by blue colors.Since descriptors for individual agonists were generated from fields surrounding the molecules, a summary of the electrostatics and sterics defining each agonist was generated (Figures 8d

Table 5. Statistical comparison of two methods of 3D-QSAR modeling of PXR agonism using Cresset
V R Forge V10.6.and 9d), with field point colors emulating the same colors as the activity cliff summaries.The degree to which compounds shift alignment within the active site of PXR was also explored as electrostatic and steric variance (Figures 8e and  9e), shown in gray and green, respectively.As expected, the most active agonist of PXR (MN425) better aligns with the areas of favorable shape, hydrophobics, and electrostatics compared to the least active (MN190) PXR agonist.

Applicability domain
To perceive the certainty surrounding the predictions of the models generated in this study, the applicability domain of the agonists in the training, test and external prediction sets was investigated.As it applies to QSAR modeling, the applicability domain determines the similarity of a particular query molecule to the compounds used to construct the QSAR  models (Weaver & Gleeson, 2008).Structurally similar compounds should be evaluated against the training set molecules so that the final QSAR model performs well due to the principles of similarity, as the inclusion of highly different compounds to the training set compounds would yield poor predictions (Roy et al., 2015).The Model Disturbance Index (MDI) versus Prediction Error (PE) method (Yan et al., 2014) and the standardization approach (Roy et al., 2015) were utilized to measure the similarity of PXR agonists.
As discussed by Yan et al. (2014), the applicability domain assesses the MDI of query compounds in terms of similarity to the training set compounds and errors in the prediction of the activity of the external or test compounds simultaneously.In this case, a linear relationship cannot be established between the MDI and PE; therefore, the prediction of individual compounds is assigned into four quadrants, 110 true positives (A in Figure 10), 2 true negatives (B in Figure 10), 5 false positives (C in Figure 10), and 8 false negatives (D in Figure 10).True positive predictions are compounds within the applicability domain, whereas true negatives are outside the applicability domain.False-positive and falsenegative compounds were considered foreseeable due to the breadth of compounds included in this study.The external set of query terpene compounds, although not shown, were calculated to be within the applicability domain of the QSAR approaches.
Further exploration of the PXR applicability domain was carried out through the standardization approach to label discrete compounds from the test and prediction sets as outliers of the applicability domain.According to Roy et al. (2015), the applicability domain zone should represent a region where most of the training set compounds belong based on a mean of ±3 standard deviations of descriptor values.The Applicability Domain using the standardization approach identified only compound MN557 as an outlier of the QSAR training set and did not identify any compounds within the terpene prediction set as outside the applicability domain.
All agonists selected for this study were considered active to varying degrees.Each agonist was classified by increasing order of activity as developed by Schuster and Langer (2005), with 89 agonists of (þ) or low activity: EC 50 >10,000 nM or pEC 50 <5; 270 agonists of (þþ) or moderate activity: EC 50 >1000 À 10000 nM or pEC 50 5-6; and 141 agonists of (þþþ) or high activity: EC 50 <1000 nM or pEC 50 >6.

Molecular preparation and descriptors
Following compilation of 500 PXR agonists from the literature, inorganic and organometallic compounds and duplicate structures were carefully removed as part of data curation.The set of PXR agonists was prepared and 3D energy-minimized using the LigPrep (Schr€ odinger Release 2021-4, 2021b) module implemented in the Schr€ odinger Maestro software (Schr€ odinger Release 2021-4, 2021a).Optimized states of the agonists were created at physiological pH of 7.4, and curation of tautomers was subsequently performed.The bias of the QSAR models to either stereoisomer of a tertiary amine, which becomes protonated under these conditions, was eliminated by removing the extraneous proton and reoptimization of an unprotonated amine.Schr€ odinger's AutoQSAR module was used to populate a multitude of 2D molecular, topological, and feature count descriptors of the 500 agonists for QSAR construction through four methods, linear, radial (circular), dendritic, and molprint2D, in addition to the attached experimental pEC 50 data (Duan et al., 2010;Sastry et al., 2010; Schr€ odinger Release 2021-4, 2021a).AutoQSAR automatically detected and removed descriptors considered to be duplicated.The protein crystal structure (PDB ID: 2O9I) was retrieved from the Protein Data Bank (Xue et al., 2007).The protein receptor was prepared and minimized using the Schr€ odinger Protein Preparation Wizard (Madhavi Sastry et al., 2013;Schr€ odinger Release 2021-4, 2021c).Additionally, selecting the bound 2O9I agonist, N-(2,2,2-trifluoroethyl)-n-f4-[2,2,2-trifluoro-1hydroxy-1-(trifluoromethyl)ethyl]phenylgbenzenesulfonamide, also known as T0901317, directed Cresset V R Forge V10.6 to align the prepared agonists on this bioactive reference confirmation as a reference.2O9I was selected for 3D-QSAR model generation, as the co-crystallized ligand T0901317 has been referenced throughout PXR literature and was among the set of 500 agonists with one of the highest pEC 50 values.

2D dimensionality reduction
The data mining software, Orange 3.32.0,was utilized throughout the dimensionality reduction experimental steps (Demsar et al., 2013).Orange is a hierarchically organized toolbox of data mining components with a broad range of features suitable for curation of datasets (Demsar et al., 2013).Orange is considered an appropriate choice due to its maturity, widespread active use, straightforward visual interactivity and extensive supporting documentation.Additionally, Orange is considered comparable to the accessible tools for predictive data analysis contained within scikitlearn (Pedregosa et al., 2011).
To perform a PCA plot, the descriptors calculated for all 500 PXR agonists by Schr€ odinger Canvas were imported to a widget in Orange.Upon connecting the data input from the descriptor widget to a PCA widget, computation of the PCA was begun with normalized variables.
The same Canvas descriptor widget was connected to an unsupervised t-SNE widget to plot the agonists in 2D by their probability distribution.To obtain a balanced attention between local and global aspects of the PXR agonists, the perplexity was lowered to 25.The global structure was not preserved, so as to avoid interfering with the perplexity setting at 25. Two exaggerations were allowed to maintain an intermediate compactness of the resulting clusters.As noted earlier, 16 components were the maximum number of components deemed necessary to adequately describe the PXR agonists within the 2D AutoQSAR experiment; therefore, only 16 PCA components were incorporated in the t-SNE analysis.Finally, the data was also normalized for t-SNE.

2D model selection
AutoQSAR, through Schrӧdinger Canvas, utilizes Kpls, Naïve Bayes, and ensemble-based recursive partitioning for QSAR model generation; however, only models constructed in the Kpls algorithm were returned in this 2D-QSAR model report.An et al. annotate the algorithm implemented in their work with AutoQSAR as Kpls even though the direct formalism, direct kernel-based partial least squares (DKPLS), is implied (An et al., 2013).DKPLS approximates a low-rank of the kernel matrix K and in order to compute a final regression function similar to eigenvector methods, which suits regression and tends to yield more robust models (An et al., 2013).DKPLS has been shown to be superior to classic algorithms, such as traditional partial least squares (PLS), by outperforming PLS in predictive datasets (Wang et al., 2007).
AutoQSAR settings (Duan et al., 2010; Schr€ odinger Release 2021-4, 2021a) ensured a 3:1 ratio of the training set to test set compounds for model construction, as used by Racz et al. and Mandal and Roy to produce satisfactory QSAR models (R� acz et al., 2021;Roy & Mandal, 2009).The integrated algorithms generate ten numeric models to describe agonist activity based on multiple linear regression (MLR), partial least-squares regression (PLS), kernel-based partial least-squares regression (KPLS) and principal components regression (PCR) when a continuous set of values, in this case, pEC 50 , is applied.In this QSAR experiment, only models built upon the Kpls algorithm were returned, despite the ability of AutoQSAR to generate models on multiple algorithms, and their corresponding squared correlation coefficient (R 2 ), square coefficient of cross-validation (Q 2 ), and root mean square error (RMSE) are shown in Table 1.
Following traditional AutoQSAR 2D-QSAR model generation, Schr€ odinger's DeepAutoQSAR further analyzed a similar ratio of PXR agonists in training and test sets through the creation of a convolutional neural network.To create the convolution, DeepChem treats small molecules as a graph for machine learning, where nodes are atoms with attached chemical features such as atom type, valence, charge, etc., and edges are bonds and multiply a filter matrix across all atoms in the input data (Wang et al., 2007).For modeling of PXR agonism, DeepAutoQSAR may utilize dense neural network (DNN), Random Forest (RF), Torch Graph Convolution and eXtreme Gradient Boosting (XGBoost) as machine learning models, yet the resulting returned best model constituted only the RF regression algorithm.According to information provided to DeepAutoQSAR users by Schr€ odinger, DeepAutoQSAR has been shown to perform similarly to traditional AutoQSAR on smaller training sets, that is, those training sets with fewer than 5000 molecules (DeepAutoQSAR).The training time allowed for DeepChem/AutoQSAR was 8 h.

3D field model
The eXtended Electron Distribution (XED) force field implemented in Cresset V R Forge V10.6 (Activity Atlas; in Forge, 10.6, Cresset V R ; Cheeseright et al., 2006) was used to model charge delocalized from atomic centers to accurately describe the binding site electrostatics (XedeX, 10.6, Cresset V R , Litlington, Cambridgeshire, UK) when applied to all small molecules involved in the QSAR study.Conformation hunting for alignment of the 500 LigPrep-prepared agonists was subsequently carried out using a custom calculation method allowing a maximum number of 300 conformations to be generated for each molecule.The number of high-T dynamics runs for flexible rings was set to 20, and the gradient cutoff for conformer minimization remained at 0.100 kcal/mol/A.Duplicate conformers were filtered at an RMS of 0.50 A, and the energy window value was reset to 5.00 kcal/mol.The input amide geometry was utilized for the acyclic secondary amide handling value, and Coulombic and attractive van der Waals forces were turned off.Boats and twist-boats, however, were not removed.Manual analysis of all automatically generated alignments resulted in the most robust final 3D-QSAR models.
The calculation method utilized to align the training set to the reference compound was also customized under the Normal (element þ hybridization) rules.All achiral compounds were inverted in conformation, and the hardness of the protein excluded volume was stated as soft.Next, a weighted average was used as a scoring method for multiple references.The fraction of score from shape similarity and reference weight into db field points was set to 0.50.Field similarity weighting was standardized to 1.00 for all parameters.Lastly, the scoring of alignments was calculated with the Tanimoto metric.
For the field-QSAR models, a normal calculation method was utilized.Herein, a maximum number of 20 components, or latent variables, were kept for the models with a sample point minimum distance of 1.0 Å and 50 Y scrambles.Both electrostatic and volumetric fields were used to construct the models, yet molecules were not to be weighted by similarity.The field QSAR models were cross-validated using the leaveone-out method with 20% of the training set randomly selected as validation data over 1000 repeats.

3D machine learning model
An automatic calculation method generated 3D-QSAR models through RF, SVM, Relevance Vector Machine (RVM) and k-Nearest Neighbor (kNN) but retained and displayed only the model deemed to be most robust by R 2 and Q 2 .SVM calculations were made through electrostatic and volumetric fields with a sample point distance of 1.0 Å. Global parameters were optimized to a maximum of 50 optimizer iterations, a limit of 3600 s for global optimization time, and a Gamma range between 1.0 � 10 À 5 and 1.0 � 10 À 1 .Furthermore, the C range was allowed to be between 1.0 � 10 À 1 and 1.0 � 10 3 , and the Epsilon range was determined between 1.0 � 10 À 4 and 1.0 � 10 0 .
From above, the 3D RVM model was built with a minimum sample point distance of 1.0 Å using electrostatic and volumetric fields, and 5 folds were used per k-fold cross-validation.In this method, global parameters were optimized to a maximum of 50 optimizer iterations, a limit of 3600 s for global optimization time, and a Gamma range between 1.0 � 10 À 5 and 1.0 � 10 À 1 .
As with the RVM and SVM calculation, for the RF calculation, electrostatic and volumetric fields were used with a sample point minimum distance of 1.0 Å. 1000 trees were to be built with a feature subsampling fraction of 0.33.A minimum of 5 samples were to be allocated to each leaf.
To construct the kNN model, a maximum number of 20 neighbors (k) were allowed with similarity type based on a field.Pairwise alignments were not optimized, and the fraction of similarity from shape was centered at 0.50.The weighting scheme was chosen to be automatic.

Applicability domain generation
The MDI versus PE method (Yan et al., 2014), calculated by the Java-based Applicability Domain-Model Disturbance Index (AD-MDI) (http://nanobridges.eu/software/), and the standardization approach, calculated by the Java-based Applicability Domain (using standardization approach; Roy et al., 2015), was utilized to measure the similarity of the PXR agonists.The metrics generated from both methods were employed to reveal the applicability domain of compounds within this study.
Through the AD-MDI program, external and test set compounds were assessed by MDI in terms of similarity to the training set compounds, whereby those compounds that are more similar to the training set compounds yielded lower MDI disturbance values.Simultaneously, errors in predicting the activity of the external or test compounds were correlated alongside the MDI to generate a linear relationship under ideal conditions (Yan et al., 2014).Most often, and in this case, a linear relationship could not be established between the MDI and PE; therefore, the prediction of individual compounds was assigned into four quadrants, true positive, true negative, false positive, and false negative.True positive predictions are compounds within the applicability domain, whereas true negative predictions are outside the applicability domain.False-positive and false-negative compounds were considered foreseeable due to the amount and diversity of the compounds included in this study.
Under the standardization approach, carried out through the Applicability Domain (using standardization approach) Java-based software (http://nanobridges.eu/software/), the standard deviations for all individual compounds with respect to a particular descriptor are calculated to generate multiple S i values that represent the similarity or dissimilarity of the entire set.Since some compounds may be considered similar to the training set when viewing a particular subset of S i s for descriptors and may be considered dissimilar to the training set when viewing another particular subset of S i s for descriptors, a corresponding representative standard deviation, S new , is generated to summarize all of the descriptors.Because of this, compounds are considered outside the applicability domain if they fall outside of more than 3 standard deviations of S new .

Model validation
For 2D-QSAR, a null hypothesis was used to internally validate of constructed models.As implemented in Schr€ odinger AutoQSAR, the null hypothesis assumes activities of the PXR agonists to be directly correlated to only the molecular weight of each corresponding agonist (Dixon et al., 2016).AutoQSAR models considered to be robust and validated through the molecular weight null hypothesis should yield poor R 2 values, and those models with seemingly acceptable R 2 values were disregarded.
Similarly, Cresset V R Forge V10.6 field-based 3D-QSAR models were internally validated by Y scrambling, which involves building new models with randomly reordered responses, the experimental pEC 50 , using the same descriptors of the selected model (Gramatica, 2020).A field-based 3D-QSAR model was considered valid if the Q 2 values for the scrambled models were significantly lower than those of the original 3D-QSAR model and if the RMSE values for the scrambled models were significantly increased compared to the original 3D-QSAR model.
To externally validate the predictability of the field and SVM QSAR models, a set of 17 natural products comprised of terpenes and monomethylated indoles known to experimentally activate PXR (Table S3, Supplementary Information) were used for Cresset V R Forge V10.6 to predict their pEC 50 values based on either QSAR model (Chang, 2009;Seow & Lau, 2017;Vyhl� ıdalov� a et al., 2020).The external set chosen for this experiment was considered adequate, as none of the compounds were determined to be outside of the applicability domain.Again, each agonist was classified by increasing order of activity, with 1 terpene and 7 monomethylated indoles of (þ) or low activity: EC 50 >10000 nM or pEC 50 <5; 7 terpenes of (þþ) or moderate activity: EC 50 >1000 À 10000 nM or pEC 50 5-6; and 2 terpenes of (þþþ) or high activity: EC 50 <1000 nM or pEC 50 >6 (Schuster & Langer, 2005).Furthermore, terpene agonists were chosen for external validation due to structural similarity to pregnane and associated steroid molecules that activate PXR.In the same way, monomethylated indoles were added to the external validation set due to the presence of indole moieties in the QSAR training and test sets of PXR agonists.Subsequently, these compounds were 3D-optimized at pH 7.4 ± 0.0 with desalting and tautomerization using LigPrep (Schr€ odinger Release 2021-4, 2021b).The prepared terpenes were then imported into the Cresset V R Forge V10.6 project using the Fit Molecules to an Activity Model function.Likewise, as some of the training and test set compounds were reevaluated for better conformer alignment towards the reference compounds, some of the prediction set conformers (3-methylindole, carnosol, and ursolic acid) were also reassessed for better alignment of predicted conformers.

Activity Atlas visualization
The Activity Atlas TM module of Cresset V R Forge V10.6 was utilized to create a visual representation of the PXR four-component-based field 3D-QSAR model.Since Cresset's XED force field generates shape, hydrophobic, positive electrostatic and negative electrostatic fields, distinct field points of the agonists were overlaid and compared.Using the existing PXR agonist field points defined in the field-based 3D-QSAR model, the Activity Atlas TM summarized the alignments of the large data set of prepared agonists (Activity Atlas; in Forge, 10.6, Cresset V R ; Cheeseright et al., 2006).This visual representation was generated using the Normal calculation method with predetermined parameters, such as a weighted sum algorithm, 1.0 Å grid spacing and automatic calculation of the disparity range.

Conclusion
This study attempted to better explain PXR agonism's promiscuity by developing various QSAR models using a large and diverse molecular dataset of 500 agonists.Verification of usefulness towards the study of the entirety of the agonists through the applicability domain ensured the generation of robust QSAR models.2D and 3D evaluation of the chemical scaffolds of the PXR agonists has allowed for deeper insight into traditional QSAR techniques.In the 2D-QSAR analysis of PXR agonists, both traditional Schrӧdinger AutoQSAR and machine learning DeepAutoQSAR methods performed similarly in predicting the correlation between PXR agonist structures and their respective experimental pEC 50 s.However, Cresset V R Forge (V10.6)3D-QSAR models were shown to better predict the correlation between the PXR agonist structures and their respective experimental activities when machine learning algorithms were implemented.The results suggest that, based on the software utilized in this study, 3D-QSAR modeling produces significantly better predictive and reproducible QSAR models, especially when machine learning techniques are considered in terms of internal validation correlation coefficients.
Furthermore, visual analysis of the alignment of agonists within the PXR ligand-binding site was provided through a detailed Activity Atlas based on a robust field-based 3D-QSAR model that verified the presence of particular functional groups/moieties within the agonists contributing to tighter binding and, therefore, greater agonism of PXR.The most robust QSAR models from this work could improve the detection of potentially undiscovered xenobiotic agonists of PXR.Additionally, the results of the implementation of machine learning offer more significant evidence for the use of similar methods in future work.

Figure 1 .
Figure 1.Predicted activity versus observed (experimental) activity in pEC 50 for 500 PXR agonists as generated by the top-scored AutoQSAR model (Kpls_radial_43).

a
Indicates the use of radial fingerprints.b Indicates the use of Canvas-calculated descriptors (auto-generated by the Canvas suite within Schrӧdinger Maestro).c Indicates the use of MolPrint2D fingerprints.

Figure 2 .
Figure 2. Proportion of variance versus number of principal components analyzed from the 2D Canvas-generated descriptors.
-0.5098 0.5200 a R 2 values for external prediction set.
Field-based 3D-QSAR model construction yielded 21 models, sequentially incorporating one additional component through the partial least squares (PLS) algorithm.Each model's statistics are shown in Table 3.Additionally, Y-scrambling, incorporated into the construction of each field-based model in Cresset V R Forge V10.6 (Cresset V R Inc., UK), ensures that the final generated models genuinely describe PXR agonism by scrambling random agonist activities within each model.If the scrambled models return lowered Q 2 and increased root mean squared error (RMSE) values compared to the generated models, then the generated field-based 3D-QSAR models are robust.As all Y-scrambling returned Q 2 values less than or equal to 0.001 and RMSE values between 0.801 and 0.804, the field-based 3D-QSAR models were accepted.

Figure 5 .
Figure 5. Predicted activity vs. observed (experimental) activity in pEC 50 for PXR agonists in the field-based 3D-QSAR model.(a) Training Set, (b) Test Set and (c) External Prediction Set (separate from training and test sets).

Figure 6 .
Figure 6.Principal components analysis (PCA) of descriptors from PXR agonists used in SVM Machine Learning Regression 3D-QSAR model.The number in parentheses is the fraction of the total variance explained by that component.

Figure 7 .
Figure 7. Predicted activity vs. observed (experimental) activity in pEC50 for PXR agonists in the SVM regression machine learning 3D-QSAR model.(a) Training Set, (b) Test Set, (c) External Prediction Set (separate from training and test sets).
for external prediction set.

Figure 10 .
Figure 10.Model Disturbance Index (MDI) vs. Prediction Error (PE) to determine applicability domain for PXR test set agonists by (a) true positive, (b) true negative, (c) false positive, (d) false-negative through Applicability Domain-Model Disturbance Index (AD-MDI).

Table 2 .
Statistical comparison of two methods of 2D-QSAR modeling of PXR agonism using Schr€ odinger Maestro.

Table 3 .
Statistical parameters of field 3D-QSAR models generated by Cresset V R software.
�Best predicted QSAR model as determined by Cresset V R Forge. a RMSE predicted parameter value.

Table 4 .
Statistics for Cresset auto-generated machine learning 3D-QSAR regression models.

Table 6 .
Comparison of statistics for the best models of PXR agonism from four QSAR techniques.ModelTraining R 2 Cross-validationQ 2 Test R 2 RMSE Extnl.R 2