Identification of promising compounds from curry tree with cyclooxygenase inhibitory potential using a combination of machine learning, molecular docking, dynamics simulations and binding free energy calculations

ABSTRACT Murraya koenigii (Linn.) Spreng, commonly known as curry leaf tree, is popular as a condiment and spice among Asian countries. Ethnobotanical studies suggest that certain phytochemicals in different parts of the plant are involved in anti-inflammatory, anti-oxidant and anti-cancer activities. There is a knowledge gap in connecting the ethnobotanical applications of M. koenigii, identification of chief phytochemicals through computational approaches and and evaluation of phytochemical’s effect on cancer cell lines. We present here a comprehensive study to identify the phytochemicals responsible for anti-inflammation using random forest (RF) models, the interactions with COX-1 and COX-2 enzymes using molecular docking, dynamics simulation and free energy calculations. The RF models prioritised four phytochemicals viz. girinimbine, murrayanine, murrastinine-B and mukolidine with COX-1 and COX-2 binding potential. These phytochemicals developed key contacts with COX targets which were largely retained in the trajectories of the dynamics simulations. Phytochemicals ranking based on the binding free energy suggested that girinimbine (a carbazole alkaloid) is selective towards COX-2 supported by the experimental studies on COX anti-inflammation. Cytotoxicity assessment on the breast cancer cell line MDA-MB-231 illustrated that the presence of this phytochemical in root, stem and leaf (IC50 value: 0.006 µg/ml) parts highlight its role as a COX-2 inhibitor.


Introduction
Inflammation forms one of the prominent hallmarks of cancer and plays an inevitable role in cancer pathogenesis. The supply of a variety of biologically active molecules during inflammation ensures the survival of cancerous cells, a crucial event between diverse cancer hallmarks. The active molecules including growth factors and enzymes are needed for the growth and sustenance of cancer cells [1]. Over many decades of intensive research, the role of cyclooxygenase (COX) enzymes has been associated with inflammation and cancer. COXs are the ratelimiting enzymes in the prostaglandin synthesis pathway which synthesises prostaglandin (PG) from arachidonic acid tissues [6,7]. COX-2 is one such enzyme implicated in inflammation and associated pathways related to cancer development. Epidemiological research showed the inhibition of PGE2 synthesis via COX-2 inhibitors can reduce the risk of various cancer types including prostate cancer, pancreatic cancer, breast cancer, osteosarcoma, etc [8]. Substantial COX-2 expression is also noted in cancer stem cells promoting proliferation and radioresistance which established a link between increased COX-2 level, carcinogenesis and poor prognosis [9,10]. Recent studies unravelled the potential role of PGE2 in 'Phoenix Rising cascade' in which the tissue damage initiates the stimulation of wound healing, regeneration and repair [9,11,12]. The impact of upstream modulators of COX-2 in cancer cells which mediates metastasis viz. epidermal growth factor receptor, family of mitogen-activated kinases and nuclear factor-κβ, could be reduced by COX-2 inhibition [11]. COX-1 is also pursued as a potential target for prevention and treatment of various cancers as its expression is noted in an array of different cancer types viz. breast cancer, cervical cancer, haematological cancers, oral cancer, ovarian cancer among others [5,6].
Ethnobotany is a study of documenting traditional knowledge, practical applications and customs about indigenous plants in a particular geographical region. Ethnobotanical studies are the starting point to exploit the medicinal properties of plants which in turn aids in uncovering the underlying phytochemicals responsible for eliciting pharmacological effects through scientific advanced approaches and techniques [13]. Phytochemicals act as a reservoir of lead compounds for drug discovery purposes which possesses multiple advantages over synthetic molecules such as reduced cost for screening and toxicity [14]. Certain plants are shown to constitute phytochemicals with anti-inflammatory properties including Momordica charantia (bitter melon), Adhatoda vasica Nees. (Malabar nut), Calophyllum inophyllum L. (Alexandrian laurel), Aloe vera L., Terminalia chebula Retz. (myrobalan), Plumbago zeylanica (leadwort), etc [15][16][17]. Selected molecules are known to be natural COX inhibitors belonging to alkaloids, flavonoids, phenols, stilbenes and terpenoids classes of phytochemicals [18]. Certain natural molecules inhibit COX-2 activity such as curcumin, epigallocatechin gallate, fisetin, genistein, kaempferol and resveratrol [19].
We had demonstrated earlier that statistical models could be used to identify principal compounds from the hit list of virtual screening exercise and predict the biological activity of natural molecules [20][21][22]. Machine learning (ML) approaches possess the efficiency to train thousands of descriptors to predict the biological activity from large-scale datasets and identify promising lead compounds [23]. For example, potential COX-2 inhibitors were identified by the random forest (RF) algorithm [24]. Comprehensive classification models for COX-1 and COX-2 were also developed using RF classifier and support vector machine [25].
Murraya koenigii (Linn.) Spreng (abbreviated as MK hereafter) is commonly known as curry leaf tree and belongs to the family Rutaceae. It is popularly used as a condiment and spice among Asian countries. The plant extracts are reported to constitute anti-inflammatory, anti-oxidant and anti-cancer activities [26]. Furthermore, the aqueous extract of MK leaves had shown immunostimulatory effect and inhibits tumour progression on 4T1 mice breast cancer cell line [27]. The ethanolic extract of MK leaves also exhibited cytotoxic effect on colon cancer cell line HT-29 [28]. Few comprehensive studies linking phytochemistry of extracts, recognition of the principal phytochemicals and evaluation of its effect in cell lines were performed for MK plants (e.g. murrayanol of MK plant) [29] or related species (e.g. girinimbine of M. euchrestifolia) [30]. Phytoconstituents such as mahanine, mahanimbine, isomahanimbine and girinimbine have been associated with antiinflammatory activity possibly through COX inhibition [31,32]. MK carbazole alkaloids such as murrayakonine A-D, murrayanol, murrayanine and O-methylmurrayamine-A extracted from the stem and leaf crude extracts also displayed anti-inflammatory activity [33]. Research efforts to identify active constituents responsible for anti-inflammatory activity from MK extracts are still progressive.
To the best of our knowledge, this is the first report on computational models which connect ethnobotany of MK plants and study the affinity and interactions of phyoconstituents with COX enzymes. The objective of the present study is to identify promising MK phytochemicals using RF algorithm trained on the large dataset of COX-1 and COX-2 inhibitors, extend the model to identify the chief phytochemicals with high predicted COX-2 activity and study its interaction with COX-2 protein structure using molecular docking, dynamics simulations and free energy calculations.

Dataset collection and preparation
COX-1 (989 molecules) and COX-2 (990 molecules) inhibitors were retrieved from ChEMBL database [34] and regarded as actives or positive instances for the machine learning model. Similarly, decoys of COX-1 (2343 molecules) and COX-2 (2343 molecules) targets were downloaded from DUD-E (Database of Useful Decoys -Enhanced) [35] which were considered as negative instances and seeded to develop databases for COX-1 (total 3332 molecules) and COX-2 (total 3333) targets.
The phytochemicals of Murraya koenigii L. (Spreng) were collected from literature [36][37][38]. The natural molecules were downloaded from the PubChem database [39] and unavailable molecules were drawn using Marvin Sketch utility v6.3 (Che-mAxon, LLC) [40] and exported into database SDF file format. The COX datasets and MK phytochemical database were prepared by adding hydrogen atoms and assigning atomic charges.

Generation of homology model and protein preparation
The Human COX-2 protein structure (PDB entry: 5IKR) bound to mefenamic acid with a resolution of 2.34Å was chosen as the receptor for docking [41]. The homology model of Human COX-1 protein was built using Modeller program using the primary sequence of Human COX-1 protein (Uni-ProtKB entry: P23219) [42,43]. We specified no restriction to choose templates in the modeling process. The generated model was subsequently prepared using Clean utility and energy-minimised using YAMBER force field in YASARA Structure (version 19.7.20) [44,45]. COX-1 homology model was assessed for its quality using the Ramachandran plot of RAMPAGE server [46,47] and ProSA (Protein Structure Analysis) program [48].

Molecular docking of COX-1 and COX-2 enzymes
Molecular docking of actives and decoys to COX-1 and COX-2 prepared structures were carried out using PLANTS (Protein-Ligand ANT system) docking program [50]. PLANTS utilise ant colony optimisation (ANT) to search for the best dock poses in the ligand-binding sites. Similarly, MK phytochemicals were then docked into respective ligand-binding sites of COX-1 and COX-2 structures. Prior to docking, proteins and ligands were pre-processed using SPORES (Structure PrOtonation and REcognition System) tool [51] suitable for docking in PLANTS docking environment.

Generation of structural interaction fingerprints
The dock poses of actives and decoys with both COX enzymes securing the best PLANTS score were selected to derive protein-ligand interaction fingerprints using PyPLIF (pythonbased protein-ligand interaction fingerprinting tool; http:// code.google.com/p/pyplif) [52]. The reference ligands (co-crystal ligands of selected COX structures), configuration file containing the grid dimensions and Cartesian coordinates of ligand-binding sites, ligand-binding site files (.mol2) were also prepared accordingly to run PyPLIF. PyPLIF records the 3D intermolecular interactions into 1D bit string. A 7-bit interaction profile is generated for each amino acid representing the presence (written as 1) or absence (0) of seven different interaction types [52]. Similarly, the best dock poses of MK phytochemicals were encoded in fingerprint. Four vector files were then created separately for each docking exercise as follows: COX-1 actives + decoys, COX-2 actives + decoys, COX-1 MK phytochemicals and COX-2 MK phytochemicals. The efficiency of 'PLANTS score' to score actives better than decoys was assessed using receiver operating characteristics (ROC) curve computed using easyROC web-tool [53] (Figure 1).

Generation of RF model
Each vector file was divided into 70% training, 20% test and 10% external sets with an equal share of positive (COX natural inhibitors) and negative (decoys) instances, accomplished using Kennard Stone feature selection method [54] in Dataset Division v1.2 [55]. The training, test and external sets were prepared in arff (attribute-relation file format, Weka compliant format).
The training set in arff format was supplied using Explorer module of Weka v3 package [56] Cross-validation was performed for 10 folds on training data using Random Forest utility. The stochastic nature of RF algorithm is defined with the following settings: number of iterations to construct RF tree (100), disabled number of attributes for random investigation (0) to allow systematic scanning, minimum number of instances per leaf (1), minimum class variance for training set splitting (0.001), seed for random number generator (1), unlimited depth for tree construction. The selection of the RF model was made on the basis of correlation coefficient, mean absolute error (MAE) and root mean squared error (RMSE). The generated RF model was saved in Weka model format and re-evaluated using test and external test sets set by submitting in arff format. Finally, the external set containing 132 MK phytochemicals in arff format was specified to apply on the saved Weka RF model to identify COX-1 and COX-2 inhibitors.

Filtering of MK phytochemicals
The drug-likeness of 132 MK phytochemicals was assessed using DruLito tool [57]. MK phytochemicals abiding the Lipinski's rule of five were selected for further studies.

Molecular dynamic simulations of selected proteinligand complexes
Protein-ligand complexes were selected for molecular dynamic simulations on the basis of better RF model score and proven in vitro anti-inflammatory activity of the MK phytochemicals. The best dock poses of the selected MK phytochemicals in complex with COX-1 and COX-2 receptors were chosen and were initially energy-minimised to remove inconsistency in the starting structures for dynamics simulations for 30 ns. We employed AMBER14 force field [58] and opted TIP3P (transferable intermolecular potential with three points) water model for simulations using YASARA Structure [44,45]. The NPT (number of atoms, pressure and temperature were kept constant) ensemble was chosen and the protein-ligand system was neutralised by adding counter-ions (0.9% NaCl) and setting solvent density of 0.997g/l, pH to 7.4, temperature to 298 K (Berendsen thermostat) and pressure to 1 bar (Berendsen barostat). Periodic boundaries of the simulation cell were enabled and subjected to 100 cycles of steepest gradient energy minimisation. Harmonic restraints were fine-tuned during the equilibration phase and long-range Coulomb electrostatics was invoked by Particle mesh Ewald method [59]. We exported the frame snapshot at regular intervals of 30 ns to analyze the underlying intermolecular interactions of the COX-1 and COX-2 with the MK phytochemical complexes. We examined various performance quantities in the form of graphic plots such as time (ps) vs. total energy of the system, time (ps) vs. root mean squared deviation (RMSD) and time vs. Radius of gyration (R g ). The MK phytochemical interactions with COX-1 and COX-2 captured by trajectory snapshots were plotted using LigPlot + v1.4 (academic license) program [60,61].

Binding free energy calculations
Free energy calculations of MK phytochemicals to COX-1 and COX-2 targets were performed using YASARA binding energy macro. Protein-ligand binding energy was enumerated using AMBER14 force field [60] and utilised APBS (Adaptive Poisson-Boltzmann Solver) for solvation energy and electrostatic treatments [62,63]. The free energy of binding is calculated using the following equation.
Binding energy = (G potential energy of receptor + G potential energy of ligand ) + (G solvation energy of receptor + G solvation energy of ligand ) − (G potential energy of complex + G solvation energy of complex ) The first two energy terms are about the potential energy of individual components of binding, receptor and ligand in isolation. The next two terms deals with the solvation energy of individual components computed using APBS approach and the last terms are related to potential energy and solvation energy of the protein-ligand complex [64,65].

Preparation of COX structures and quality check
The identification of active phytochemicals responsible for COX inhibitory activity is pursued using a combination of machine learning and molecular interaction approaches. The objective of employing RF model was to identify phytochemicals with COX inhibitory potential. Energetic calculations with COX-1 and COX-2 enzymes were then carried out to examine the possible selectivity using molecular docking, dynamics simulations and free energy calculations, detailed in later sections. In addition, the inherent properties of known COX inhibitors captured by structural interaction fingerprints were extended to match with MK phytochemicals. The Human COX-2 protein structure complexed with mefenamic acid (PDB entry: 5IKR, resolution 2.34Å) was chosen as the receptor for docking. The protein structure of Human COX-1 was modelled using Modeller program [42], due to the unavailability of Human COX-1 protein structure in the PDB [41]. The structure of Ovis aries COX-1 complexed with flurbiprofen (PDB entry: 2AYL) was chosen as the template. The primary sequence of Human COX-1 bears 94% identities with the PDB template ( Figure S1). The modelled COX-1 and COX-2 protein structures were energy-minimised using YAMBER force field in YASARA Structure program [44,45]. Obtained model score was 1.0 indicating a very reliable model for structure-based computations. In addition, GA341 (fold assessment score) score of 1.00, protein quality score of 2.079 and Z-DOPE (Z-statistic based on discrete optimised potential energy) of −1.25, suggests a reliable structure model.
Analysis of modelled COX-1 protein structure showed that 98.4% amino acid residues lied in the favoured region and 1.3% residues in the allowed region with 0.4% outliers in Ramachandran plot [46,47]. The z-score of −8.76 computed by ProSA [48] program indicated that the model COX-1 protein structure follows X-ray native structure parameters ( Figure S2). Superposition of modeled Human COX-1 structure on PDB template returned RMSD of 0.469 (550 aligned amino acids) Å. Similar efforts on the homology model development of Human COX-1 model were also reported. Khan et al., 2014 built the homology model using Ovis aries COX-1 structure (PDB entry: 1Q4G) as the template which constituted 91% sequence identities [64].

Docking of ligands with COX-1 and COX-2 targets and generation of structural interaction fingerprints
The actives and decoys enriched datasets of COX-1 and COX-2 targets were docked with respective COX-1 and COX-2 protein targets using PLANTS docking program. The ligand-binding sites of flurbiprofen and mefenamic acid were defined as the docking site of COX-1 and COX-2 receptors, respectively ( Figure 2). We analysed the efficiency of PLANTS docking program to recover actives from the database seeded with decoys by lowest PLANTS score. ROC analysis upon COX-1 docking exercise exhibited AUC (area under the curve) of 0.55 whereas COX-2 showed AUC of 0.54 ( Figure S3). The ROC-AUC value of 0.5 demonstrates that PLANTS score is a bad classifier for distinguishing COX isoform specific inhibitors. We recently showed that PLANTS genetic algorithm constitutes the potential to enrich databases of various targets when structural interaction fingerprints are derived from the optimal dock poses corresponding to lowest PLANTS score [65].
We generated protein-ligand interaction fingerprints of the best dock poses of COX ligand datasets using PyPLIF code. PyPLIF creates a fingerprint by switching on 1 or 0 bits for the presence or absence of particular interaction type for each amino acid residue of the COX protein pockets. The amino acids residues were located in and around 8Å of the ligandbinding site with respect to its centroid. Further, we utilised the attribute selection filter of Weka v3 machine learning package to ascertain which set of top intermolecular interactions distinguishes the selectivity pattern between COX-1 and COX-2 targets. The attribute importance computed using Weka mean decrease impurity method and the biochemical roles of pocket residues are given in Tables S1 and S2. Table 1 shows that the COX-1 interaction pattern is enriched with hydrogen bonds and apolar contacts while COX-2 pattern is dominant with aromatic interactions with equal share of hydrogen bonds and apolar contacts. The crucial amino acid which differs COX-1 (Ile522) from COX-2 (Val523) isoforms is also captured by the bit strings down in the attribute lists of COX interaction fingerprints.

Generation of RF model
Well-known for its high accuracy and robustness, RF algorithm is an advanced extension of classifiers and regressors to facilitate unbiased predictions of dependent variables by estimating probabilities from individual trees [24,25]. We had chosen RF algorithm to train the structural interaction fingerprints of COX-1 and COX-2 targets and predict the inhibitory potential of MK phytochemicals using Weka v3 package [56]. The active and decoy molecules of Human COX-1 and COX-2 inhibitors were collected from ChEMBL and DUD-E databases [34,35]. Simultaneously, 132 known MK compounds reported in the literature were obtained from PubChem database [39] if available and drawn otherwise. Further, we selected molecular interaction fingerprints as descriptors for RF modelling obtained from docking actives and decoys with their respective COX-1 and COX-2 targets using PLANTS docking program. Subsequently, MK phytochemicals were also docked to both COX structures. These fingerprints encoded from physicochemical properties were successfully utilised as the descriptors for RF classification of COX-1 and COX-2 inhibitors [23]. Initially, vector files containing the interaction fingerprints of actives and decoys were partitioned into training (70%) test (20%) and external (10%) sets by adopting Kennard Stone feature selection method [54] so that an equal share of positive and negative instances in each set is distributed. The training set was subjected to 10-fold cross-validation to internally validate the RF model upon the out-of-bag set and obtained a Pearson's correlation coefficient of ∼0.98 with RMSE of ∼0.08 in COX-1 and COX-2 systems (Tables 2 and 3). Furthermore, the RF model was validated using an internal test set and an external set. It was ensured that the internal, as well as external test sets, did not play any role in descriptors selection and fine-tuning of the RF model. The applicability domain built upon the distributions of top 8 interaction descriptors of both COX-1 and COX-2 structures (training set) indicated that the extent of intermolecular interactions of MK phytochemicals corresponding to these top 8 descriptors is below the range delineated by training set ( Figure S4). Statistically significant correlation coefficients of 0.991 and 0.996 for COX-1 and COX-2 RF models were noted with relatively less MAE and RMSE metrics. The overall characteristics of the RF model suggest that it can be extended to identify COX-1 and COX-2 inhibitors from the MK phytochemical collection effectively.

Prediction of COX inhibitory potential of MK phytochemicals using RF models
The MK dataset comprised of 132 MK phytochemicals was submitted to RF model to classify COX-1 and COX-2 inhibition (Table S2). Since our target was COX-2, we chose to examine the selectivity of phytochemical to COX-2 in relation to COX-1 with labels close or equal to 1 (i.e. classified as inhibitors). Moreover, the MK dataset was reduced to 76 phytochemicals with drug-like characteristics when Lipinski's rule of five was applied as a filter [57]. Thirteen compounds exhibited the possibility of selective affinity towards COX-1 than COX-2. Remaining 63 phytochemicals showed the selective binding pattern to COX-2 than COX-1 in the classifier label ranging from 0 to 0.28 (the difference between COX-2 and COX-1 predicted activities). The minimum activity difference between COX-2 and COX-1 isoforms suggests that MK phytochemicals possesses affinity to both COX targets regardless of the selectivity proposed by RF models. The primary reason for the narrow activity difference is due to the RF classification scheme (0 as non-inhibitors and 1 as inhibitors). It is anticipated that comprehensive analysis of intermolecular contacts with COX targets together with inhibition activity-based QSAR (quantitative structure-activity relationship) models will prioritise MK phytochemicals with COX-2 selectivity. Notably, RF classifier was not used to select these four MK phytochemicals but to distinguish MK phytochemicals with COX-2 selectivity. The labeling by RF classifiers are given in Table 4. Most of the filtered phytochemicals contained fused heterocycles with variable attachments of carbonyl, hydroxyl and methyl groups. To further refine these filtered phytochemicals, we sought experimental evidence to identify the promising MK phytochemicals responsible for COX-2 inhibition. Two phytochemicals viz. girinimbine and murrayanine possess COX inhibitory potential and anti-inflammatory properties as demonstrated experimentally [30][31][32][33]. NP-Scout web program [66] was used to assess whether the top four MK compounds lie in the drug-likeness space of natural molecules. Girinimbine and murrayanine secured a probability score of 0.95 and 0.98 indicating that these molecules bear likeliness of drug molecules and can be promoted for further study. Murrastinine-B and mukolidine were shapely similar to girinimbine and murrayanine (Table S3) and obtained 0.79 and 0.98 as NP-Scout probability score [66], respectively. These two phytochemicals were also selected for further calculations. Aromatic edge-to-face interaction 3 Leu352 H-bond interaction from protein donor 4 Tyr355 Aromatic edge-to-face interaction 5 Phe381 Aromatic face-to-face interaction 6 Tyr385 H-bond interaction from protein donor 7 Phe518 Apolar interaction 8 Phe518 Aromatic face-to-face interaction  We then extended the shape similarity analysis of MK phytochemicals to approved drugs of COX-1 (#21) and COX-2 (#22) extracted from Drug Bank [67] using SHaEP (Reminiscent of Shape and Electrostatic Potential) tool [68]. The selected four phytochemicals exhibited 69-74% similarities to COX-1 drugs and 65-69% similarities to COX-2 drugs, on average (Tables S3 and S4). Crystallographic and biochemical studies revealed that the ligand-binding site of COX-1 is narrow and hydrophobic while COX-2 site is wider and less hydrophobic comparatively [69,70]. Hence, the top 8 interactions captured in the RF models of COX-1 and COX-2 targets is in good agreement with its pocket environments. Visual inspection of shape similarity analysis of approved drugs depicted the volume overlap of COX-1 drugs is superior to MK phytochemicals selected in this study and unveiled relatively less similarity to COX-2 drugs. The chemical compositions of girinimbine and murrayanine in the crude extracts were found to be 0.015-0.162% and 0.05-1.78%, respectively. Similarly, the composition of mukolidine is in the range of 0.23-1.98% [31]. It is anticipated that these MK phytochemicals will act as potent inhibitors even though the composition is less. Phytochemicals with a higher percentage in composition did not predict as COX inhibitors by RF model and were poorly overlaid with the known COX drugs. In addition, these phytochemicals did not achieve optimal fit in the COX-1 and COX-2 ligand binding sites.

Molecular docking of MK phytochemicals with COX-1 and COX-2 protein targets
Molecular docking results produced by PLANTS docking program [50] shows that the four selected phytochemicals ( Figure S5 and S6) interacted better with COX-2 enzyme than COX-1 based on the lowest PLANTS score (Table 4). Girinimbine secured PLANTS score of −46.54 kJ/mol and developed H-bond with Ser529 amino acid residue, one among the top 8 key interactions of RF COX-1 classifier. Numerous apolar interactions are also observed in the dock pose with these residues: Val115, Arg119, Ile344, Leu351, Ser352, Phe517, Ile522, Gly525, Ala526, Ser529 and Leu530. PyPLIF captured an electrostatic weak H-bond (> 3.2Å distance) of Tyr354 residue with girinimbine in which one protein atom of the residue acted as H-bond acceptor (Figure 3). Contrastingly, girinimbine docking with COX-2 target obtained a better PLANT score of −76.83 kJ/mol. Although Tyr355 residue is listed in the top 8 attributes for aromatic (edge-toface) interactions, girinimbine established an H-bond and close inspection revealed that one protein atom of Tyr355 played the role of H-bond donor. Apolar contacts with Val523, Ser530 and Leu531 were also noted.
Similarly, murrayanine developed an H-bond with Ser529 of COX-1. Apolar interactions with residues Leu351, Tyr384, Trp386, Phe517, Ile522, Gly525, Ala526, Ser529, Leu530 and Leu533 along with top interactions of RF model viz. Ile344, Leu383 and Met521 were also developed. Intriguingly, murrayanine formed two aromatic systems-based interactions: edge-to-face (Trp386) and face-to-face (Phe517). It is worth noting that the top 8 interactions of RF COX-1 classifier did not list aromatic contacts as important distinguishing features between COX isoforms. The dock pose of murrayanine with COX-2 target showed an H-bond with Tyr355 and Ser530 residues by acting as protein donor atoms. Residues such as Leu384, Tyr385, Phe518, Val523, Ser530 and Leu531 created apolar contacts with murrayanine. The dock pose mapped the crucial aromatic face-to-face interactions with Phe518 residue in addition to new edge-to-face interactions of Trp387.
The dock pose of murrastinine-B contained apolar contacts with 3 top-listed residues of RF COX-1 model and 11 other amino acid residues. A new H-bond with Ile522 residue is observed. Fingerprint analysis revealed the presence of aromatic face-to-face interactions of Phe517 and a long-range weak H-bond of Ser529 (protein atom as donor; > 3.2Å distance). Two edge-to-face interactions by Tyr384 and Trp386 residues were also noted. Similar H-bonding of Ser530 is observed in the murrastinine-B dock pose with COX-2 target. Apolar contacts with four residues and aromatic edge-to-face interactions of Trp387 are also found in the dock pose.
Mukolidine developed a similar H-bonding with Ser529 residue of COX-1 enzyme along with new Tyr354 (weak) and Arg119 residue-based H-bond ( Figure 3). Twelve new apolar contacts were identified in the dock pose of mukolidine with residues including Val115, Val343, Tyr347, Leu351, Tyr384, Trp386, Phe517, Ile522, Gly525, Ala526, Ser529 and Leu530, with an exception of top apolar interactions of Met521 residue. Aromatic edge-to-face contacts of mukolidine with Trp386 residue were also detected. Mukolidine dock pose with COX-2 mapped two new H-bonds with Tyr355 and Arg120 residues ( Figure 4). Tyr355 residue is listed in the top 8 interactions of COX-2 RF model for its aromatic edge-to-face contacts. With an exception of top Phe518 apolar interactions with mukolidine, apolar contacts were identified with five new residues of the COX-2 ligand-binding site. Single aromatic edge-to-face interactions with Trp387 is also detected in the dock pose.
3.6. Dynamics simulations of MK phytochemicals with COX-1 and COX-2 targets Molecular dynamics simulations of girinimbine, murrayanine, murrastinine-B and mukolidine in complex with both COX-1 and COX-2 targets obtained from docking exercise were performed using YASARA Structure [44,45]. Simulations were carried out till 30 ns (production run) and manual analysis of structural parameters such as total energy of the system, RMSD and R g confirmed the binding stability of the COX-phytochemical complexes (Tables S5 and S6). All the resultant trajectories obtained stability in movements (less than RMSD of 1.3Å) after 0.9 ns of both COX systems. RMSD analysis of COX-1-phytochemical trajectories ( Figure S7) showed that girinimbine and mukolidine experienced fewer fluctuations compared to murrayanine and murrastinine-B complexes with peaks at multiple intervals. R g parameter indicates that the compactness of all four phytochemicals falls in the range of 24-25Å, on average. The 2D interaction plots of COX-phytochemical complexes depicted that girinimbine retained the H-bonding ability with Ser529 throughout the production run from the starting structure (dock pose with COX-1) together with a wide range of apolar contacts. Murrayanine was found to be enclosed with hydrophobic contacts of narrow COX-1 ligand binding site although H-bond interactions with Tyr354 (one of top 8 crucial contacts of RF model) with its carbonyl group at 20 ns was detected. The H-bond of murrastinine-B with Ser529 amino acid in the initial frames (till 10 ns) was lost afterwards. Strikingly, mukolidine maintained the H-bond with Tyr354 at most frames of the trajectory ( Figure S8). The total energy of the girinimbine-and murrayanine-COX2 systems was relatively better than murrastinine-B and mukolidine. Girinimbine fluctuated only around 0.4Å of RMSD and the other 3 phytochemicals yielded more than 0.8Å in the COX-2 RMSD profile. This observation reconfirmed the binding stability of girinimbine in the wide ligand-binding site of COX-2 whereas the remaining phytochemicals did not achieve optimal fit and therefore, were experiencing more RMSD fluctuations ( Figure S9). This was also mirrored in R g plot which provides evidence about changes in the compactness of the COX2-girinimbine complex and fewer changes in the other phytochemical complexes. Collectively, dynamics simulations suggested girinimbine can act as a potential COX-2 inhibitor.
Biochemical characterisations of COX2-ligand complexes showed that Ser530 residue undergoes acetylation by COX-2 inhibitors to covalently modify the enzyme activity [71]. All the four MK phytochemicals were found to interact with Ser530 at different intervals of the trajectories which are in good agreement with the H-bonding pattern of the equivalent amino acid Ser529 in COX-1 complexes. The starting structure of girinimbine contained H-bond of the oxygen atom of pyran with Tyr355 which eventually lost due to movements in the COX-2 pocket and established H-bond with Ser530 by nitrogen atom of indole group at 30 ns. Murrayanine developed H-bond with Ser530 after 20 ns while murrastinine-B conserved Hbond of Ser530 in most of the frames. Similar to girinimbine, the exchange in H-bonding pattern of mukolidine at regular intervals was detected ( Figure S10).

Free energy calculations of MK phytochemicals with COX-1 and COX-2 targets
YASARA Structure was used to compute the free energy of binding using the simulations trajectory of MK phytochemicals with COX-1 and COX-2 targets [44,45] and average free energies were interpreted for ranking phytochemicals ( Figure 5). Overall, girinimbine demonstrated the selectivity towards COX-2 over COX-1 on the basis of free energy of binding. The other 3 phytochemicals conferred selectivity of COX-1.
The distribution of free energy computed with both COX systems showed a minimum and maximum difference of −84 and −161 kcal/mol when a phytochemical binds both COX-1 and COX-2 ( Figure 5). This observation is partly influenced by the similar binding pattern conferred by four phytochemicals with COX enzymes (Tyr354 and Ser529 in COX-1; Tyr355 and Ser530 in COX-2). The interplay between H-bonds between Tyr and Ser residues in equivalent positions of COX enzymes contribute to phytochemicals binding to both COX enzymes, delineated from simulation trajectories. Murrayanine and mukolidine interact better with COX-1 than COX-2. In addition, murrastinine-B selectively binds COX-1 than COX-2. Finally, free energy calculations also provide insights into the selectivity of girinimbine with COX-2 target. As a first step towards lead identification, we carried out cytotoxicity assay of crude extracts from different parts of MK plant and the results are described in Supplementary Material.

Conclusion
We showed here that the machine learning approach trained on a large collection of actives and decoys using RF classifier is best-suited for the identification of MK phytochemicals with COX inhibitory potential. The rigorously-validated RF model identified four MK phytochemicals with high COX activity among the complete set of MK phytochemicals present in different parts of the MK plant. The selected four phytochemicals viz. girinimbine, murrayanine, murrastinine-B and mukolidine displayed better interactions with key residues of both COX systems. Girinimbine, a carbazole alkaloid present in the root, stem and leaf parts of MK plant developed crucial contacts with Ser530 and Tyr355 (crucial amino acids for COX-2 substrate binding). Further, the binding free energy calculations also revealed that the compound girinimbine selectively binds to COX-2 over COX-1. Other compounds murrayanine, murrastinine-B and mukolidine are also worth exploring as their natural product drug likeliness was computed to be closer to 1 and are selective towards COX-1 target.

Disclosure statement
No potential conflict of interest was reported by the author(s).