Virtual screening and multilevel precision-based prioritisation of natural inhibitors targeting the ATPase domain of human DNA topoisomerase II alpha

Abstract Human DNA topoisomerase II alpha (hTopIIα) is a classic chemotherapeutic drug target. The existing hTopIIα poisons cause numerous side effects such as the development of cardiotoxicity, secondary malignancies, and multidrug resistance. The use of catalytic inhibitors targeting the ATP-binding cavity of the enzyme is considered a safer alternative due to the less deleterious mechanism of action. Hence, in this study, we carried out high throughput structure-based virtual screening of the NPASS natural product database against the ATPase domain of hTopIIα and identified the five best ligand hits. This was followed by comprehensive validation through molecular dynamics simulations, binding free energy calculation and ADMET analysis. On stringent multilevel prioritization, we identified promising natural product catalytic inhibitors that showed high binding affinity and stability within the ligand-binding cavity and may serve as ideal hits for anticancer drug development. Communicated by Ramaswamy H. Sarma


Introduction
Topoisomerases are DNA-binding enzymes that are ubiquitous in maintaining genome topology in all living organisms.They bring about the alteration of DNA topology, through active site tyrosine-mediated, transient, single or double-stranded cleavage to fulfil the numerous demands of cellular metabolism (Nitiss, 2009a).The indispensable nature of these enzymes has long been recognised as evidenced by the vast number of studies detailing their classification, distribution, and repertoire of functions.Four structurally and functionally distinct subfamilies of topoisomerases have been elucidated, namely types IA, IB, IIA, and IIB.Type II topoisomerase enzymes catalyze ATP-dependent double-stranded breakage of DNA as opposed to the ATP-independent, single-stranded cleavage moderated by type I enzymes.The type IIA subfamily is of clinical significance, as it encompasses the bacterial enzymes, DNA gyrase, and topoisomerase IV, along with the eukaryotic DNA topoisomerases (Champoux, 2001;McClendon & Osheroff, 2007;J. C. Wang, 2002).Consequently, the critical role played by the aforementioned enzymes in microbial pathogenesis and cancer metabolism has established them as classical drug targets, with several antibiotics and anticancer agents in clinical use (Bax et al., 2010;Delgado et al., 2018;Liang et al., 2019;Nitiss, 2009b;Pommier et al., 2010).
Eukaryotic DNA topoisomerase II (TopII) is a homodimer, delineated into three domains (Figure 1)-the N-terminal ATPase domain, the central, DNA-binding cleavage-religation domain, and the non-conserved C-terminal domain.The ATPase domain binds one ATP molecule per chain, both of which are hydrolysed to supply the energy required to power the enzyme.This domain is further divided into two regions, namely, the nucleotide-binding domain (NBD) and the transducer domain (TD).The residues of the ligand-binding cavity of NBD, along with the Gln-Thr-Lys (QTK) loop from the TD, are highly conserved which denotes their significant role in ATP-binding and catalysis.The cleavage-religation domain binds with the DNA substrate and modulates double-stranded cleavage, along with the C-terminal domain.Together, the different domains work in synergy through the coordinated opening and closing of the dimer interface regions known as 'gates' to transport an intact DNA segment (T-segment) through the enzyme-linked, cleaved segment.This process is carried out iteratively and is driven by the hydrolysis of two ATPs per enzymatic cycle.In this way, the interconversion of different topological states of DNA takes place whenever required during processes such as DNA replication, chromosome regulation and dynamics, and transcription (Champoux, 2001;Pommier et al., 2016;Schoeffler & Berger, 2008;J. C. Wang, 2002;Wei et al., 2005).
In vertebrates, the two distinct type IIA isoenzymes are TopII alpha and beta, which are encoded by different genes and characterised by well-defined cellular roles and expression levels.Of the two, TopII alpha is the predominant isoform essential for cellular viability and its expression is elevated at the S and G2/M phases of the cell cycle.It plays a significant role in the progression of DNA replication as well as the condensation, segregation and maintenance of chromosomes.Its activity has also proven to be vital for developing embryos and proliferating cells.The beta isoform, however, is not crucial for cell survival and shows basal expression throughout the cell cycle.It is found in all tissues but is elevated in the heart, lungs, and quiescent cells.Though it plays an accessory role during mitosis, it is known to be highly effective in DNA repair and transcription.It has also proven to be essential for neural development and spermatogenesis (Akimitsu et al., 2003;Austin et al., 2018;Bollimpelli et al., 2017;Capranico et al., 1992;Lee & Berger, 2019;Turley et al., 1997;Woessner et al., 1991).Both isoforms are observed to be significantly upregulated in various cancer conditions, thereby placing them among the principal antineoplastic targets in humans.Higher expression of human TopII alpha is associated with several aggressive cancers such as breast, prostate, lung and colorectal cancers, as well as gynaecological malignancies (Coss et al., 2009;Depowski et al., 2000;Hou et al., 2017;Schaefer-Klein et al., 2015;van der Zee et al., 1994;Zhao et al., 2020).Whereas, overexpression of beta enzyme is seen in aggressive glioblastoma, slow-growing, and solid tumours in humans.This isoform has also been identified as the leading cause of the development of secondary malignancies (Austin et al., 2018;Azarova et al., 2007;Gao et al., 1999;Kenig et al., 2016;V� avrov� a & � Simůnek, 2012).
The wide range of chemotherapeutic drugs targeting the different stages of the TopII enzymatic cycle is categorised into poisons and catalytic inhibitors (CIs), based on the mechanism of action (Figure 1).TopII poisons operate through multiple modes, such as the prevention of DNA re-ligation by cleavage complex trapping (podophyllotoxins like etoposide, and teniposide) or DNA intercalation (anthracyclines like doxorubicin and daunorubicin).The resulting genotoxic stress due to the accumulation of DNA breaks and trapped cleavage complexes triggers apoptosis of both malignant and healthy cells expressing TopII.Therefore, although TopII poisons are potent anticancer drugs, the chief drawback is their detrimental mechanism of inhibition besides other deleterious side effects such as the development of cardiomyopathy, drug resistance, and therapy-induced metastasis (Hevener et al., 2018;Nitiss, 2009b;Pommier et al., 2010).
On the other hand, CIs interfere with TopII action without causing genotoxicity.CIs targeting the ATPase domain affect enzymatic activity in two primary ways.The first group consists of dexrazoxane (ICRF-187) and related molecules that arrest the ATPase domain in closed-clamp conformation and effectively prevent the passage of the T-segment of DNA.However, dexrazoxane is currently withdrawn from clinical use owing to poor efficacy.The other group is the ATP-site inhibitors that target the nucleotide-binding cavity to prevent ATP binding and subsequent hydrolysis, causing a decrease in enzyme turnover.Over the years, several classes of purine analogues, small molecules, and natural products have been predicted to exhibit ATP-site enzymatic inhibition (Bergant et al., 2018).Nevertheless, none of the ATP-site inhibitors have been approved for therapeutic use at present.
Therefore, due to the unfulfilled requirement for effective CIs targeting the ATPase domain, we have undertaken an in silico study for identifying novel natural product (NP) hits against human TopII alpha (hTopIIa).We carried out high throughput virtual screening (HTVS) of the NPASS database and potential CIs of hTopIIa were meticulously prioritised based on energetic parameters and significant interactions.Furthermore, molecular dynamics simulations of the selected receptor-ligand complexes were performed for understanding the stability of the hits followed by a detailed ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) analysis.The methodology of the study and rationale for the identification of putative natural product CIs of hTopIIa are discussed in the following sections.

Structure refinement of the ATPase domain of hTopIIa
The structure of the ATPase domain of hTopIIa (PDB ID: 1ZXM), co-crystallized with the non-hydrolyzable ATP analogue AMP-PNP (adenylyl-imidodiphosphate), was retrieved from RCSB Protein Data Bank (X-ray crystallographic resolution ¼ 1.87 Å) (Wei et al., 2005).The Protein Preparation Wizard of the Schr€ odinger suite v.2019-3 (Schr€ odinger LLC, 2019) was utilised for optimising the receptor by generating bond orders, adding hydrogen atoms, and deleting water molecules.Missing atoms and residues were modelled using the Prime module.Residue protonation states and heteroatom ionization states at physiological pH were generated through PROPKA and Epik modules (Shelley et al., 2007), respectively.Finally, energy minimization using the OPLS_2005 force field was carried out for obtaining a refined receptor at a minimum energy conformation.Additionally, the residue interactions of AMP-PNP in chain A of the prepared protein were analysed through visual inspection and using the Structural Interaction Fingerprints (SIFt) panel (Deng et al., 2004).

Preparation of NPASS ligand dataset
In our quest to identify novel ATP-site CIs of hTopIIa, we turned towards Natural Product Activity and Species Source (NPASS) v.1.0,a curated, open-source NP database (Zeng et al., 2018).
The entire structure dataset (in SDF format) was downloaded and prepared for docking using the LigPrep panel of the Schr€ odinger suite.LigPrep offers a variety of parameters for generating high-quality, geometry-optimised structures of small molecules.Initially, all-atom three-dimensional structures of the ligands were prepared from the two-dimensional input, following which ionization states were generated at pH 7.0 ± 2.0 using the Epik module.The ligands were then desalted, and tautomers and stereoisomers were generated.Finally, the OPLS_ 2005 force field was used for energy minimization of the entire dataset of optimised ligands.

Preparation of reference catalytic inhibitor salvicine
At present, there is a lack of CI-bound crystallographic structures of the ATPase domain of hTopIIa.Therefore, in this study, the compound salvicine was used as a reference CI for docking and comparing binding affinities of screened ligands (Hu et al., 2006).The structure of salvicine was retrieved from the PubChem database (PubChem CID 10359290) in SDF format and optimised for docking through the LigPrep panel.

Virtual screening of NPASS ligand dataset and salvicine against ATPase domain of hTopIIa
Structure-based virtual screening (SBVS) is a powerful in silico method for rapidly predicting the binding affinity of small molecule libraries with a selected target.It has aided in significantly reducing the time, expenditure and human effort involved in the identification of potential lead molecules for drug design based on parameters such as docking score and binding mode (Maia et al., 2020).In this study, the GLIDE (Grid-based Ligand Docking with Energetics) program of the Schr€ odinger suite was utilised for the HTVS of the NPASS ligand dataset against the NBD in chain A of ATPase domain of hTopIIa (Halgren et al., 2004).GLIDE efficiently employs hierarchical filters such as HTVS, SP, and XP for the successive filtration of docked ligands resulting in a refined pool of molecules with high binding affinity and optimal binding conformation.Prior to the screening, all heteroatoms were stripped from the receptor.

Receptor grid generation
SIFt analysis and visual inspection revealed critical interactions between the co-crystallized ligand and catalytic site residues in chain A of 1ZXM.These residues were selected for constructing a receptor grid for performing docking calculations.The van der Waals scaling and partial charge cut-off for non-polar atoms of the receptor were set to 1.0 and 0.25 respectively during grid generation.

Virtual screening protocol
For the screening of the NPASS dataset, the potential for nonpolar atoms of ligands was softened by setting the van der Waals scaling factor to 0.80 with a partial charge cut-off of 0.15.The ligands were flexibly docked and nitrogen inversions and ring conformations were sampled.Virtual screening was first carried out in HTVS mode following which the top 50% of unique hits were passed on to Standard Precision (SP) docking.The top 25% of unique hits obtained from this step were carried forward to Extra Precision (XP) mode, for which XP descriptor information was recorded.Per-residue interaction scores were written for all residues interacting with the ligands during docking.The top hundred unique hits shortlisted based on XP GScore were carried over to the next step.Finally, the reference inhibitor, salvicine, was also docked in XP mode.

Molecular Mechanics/Generalized Born Surface Area calculations
Molecular Mechanics-Generalized Born Surface Area (MM/GBSA) is a parameter calculated for assessing the relative binding free energies of a set of ligands with respect to the receptor (E.Wang et al., 2019).The DG bind scores for the top hundred hits as well as the hTopIIa-salvicine docked complex from the previous step were calculated using the Prime MM-GBSA panel, as follows: This equation can be decomposed into individual energy terms using the formula: where DE MM represents the gas-phase interaction energy given as a sum of internal, electrostatic, and van der Waals energies; DG SOL represents the free energy of solvation given as a sum of polar and non-polar energy terms and TDS represents the entropy change.
The VSGB solvation model and OPLS_2005 force field were used in the MM/GBSA calculations.

Selection of best five NP hits
The top hundred NP hits obtained after SBVS and MM/GBSA scoring were manually inspected and endogenous molecules found in the human body (such as ATP analogues, metabolic precursors and metabolites), as well as non-endogenous aliphatic molecules, were removed.The remaining ligands were then stringently scrutinised and reranked based on XP GScore, HBond score, binding free energy, penalties, and critical residue interactions.Out of these, only the best five complexes were selected for further steps such as the assessment of physicochemical properties and stability analysis through MD simulations.
The protein-ligand interactions present in the selected docked complexes were examined using Protein-Ligand Interaction Profiler (PLIP) (Salentin et al., 2015) and the structures were rendered using Pymol v.2.3.1.

Evaluation of physicochemical properties, pharmacokinetics, and toxicity parameters of salvicine and selected NP hits
The physicochemical and pharmacokinetic properties of ligands together with the toxicity profiles are important criteria that aid in determining the suitability of screened hits to be carried forward for in vitro testing and subsequent clinical development.These parameters are crucial for predicting the behaviour of a ligand within a biological system, such as bioavailability, potency toward the desired target and induction of undesirable reactions such as toxicity or promiscuity.Salvicine and the selected NP hits were assessed using various cheminformatic tools.The parent chemical groups of the ligands were identified using Classyfire, an online tool that computes chemical taxonomy based on structural features (Djoumbou Feunang et al., 2016).The physicochemical and pharmacokinetic properties were analysed by the Qikprop module of the Schr€ odinger suite.Furthermore, the webserver SwissADME was utilised for computing additional pharmacokinetic and medicinal chemistry parameters (Daina et al., 2017).The ProTox-II server, an open-source in silico toxicity prediction tool, was used for computing the acute toxicity class, various toxicological endpoints and toxicological pathway reactivities based on the structures of the ligands (Banerjee et al., 2018).

Molecular dynamics simulations of six docked complexes
Molecular dynamics (MD) simulations serve as a robust tool for studying protein-ligand interactions of complexes obtained from SBVS and aid in observing local fluctuations and deviations in the ligand-binding cavity (Alonso et al., 2006).The stability and interaction network of salvicine and the five NP hits within the binding cavity of hTopIIa were evaluated through 100 ns MD simulations.Condensed phase all-atom MD simulations of the hTopIIa-ligand complexes were carried out using the Desmond package, using identical protocols (Bowers et al., 2006).Each simulation system was set up through the System Builder panel by placing a docked complex in an orthorhombic box at a buffer distance of 10 Å from the box boundary.The system was then solvated by the TIP3P water model and neutralised by the addition of an appropriate number of Na þ /Cl -ions.Additional ions were added for simulating physiological salt concentrations of 0.15 M. The electroneutral system was brought to a relaxed state by energy minimization using the LBFGS algorithm, for a maximum of 2000 iterations until convergence with a threshold of 1.0 kcal/mol/Å, followed by an additional extensive default relaxation protocol.Short-range Coulomb interactions were treated by the cut-off method with a radius of 9 Å.Finally, an NPT ensemble was used for the production simulation run of 100 ns timescale.The simulation was carried out at a temperature of 300 K and pressure of 1.0 bar using Nose-Hoover chain thermostat (Tuckerman et al., 1992) and the Martyna-Tobias-Klein barostat (Martyna et al., 1994) respectively.Simulation trajectory data was recorded at an interval of 4.8 ps.
On completion, the MD simulation trajectories were subjected to exhaustive analyses using the Simulation Event Analysis and Simulation Interaction Diagram tools.Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuation (RMSF) of receptors and ligands, the radius of gyration, and interaction network parameters were computed.The Simulation Quality Analysis panel was used to assess the overall quality of the simulations.
The methodology of the entire study has been summarised in Figure 2.

Key interactions and structural insights on the ATPase domain of hTopIIa
The target region of hTopIIa, the ATPase domain, was a homodimer composed of two chains A and B of 400 amino acids each, ranging from residues 29 to 428.In each chain, the NBD comprised residues 29 to 264, followed by the TD, from residues 265 to 428.The prominent characteristic of the NBD was the Bergerat motif, which was a conserved structural feature common to ATP-binding proteins of the GHKL superfamily (Bergerat et al., 1997;Wei et al., 2005) Critical inspection and SIFt analysis of the chain A catalytic site of 1ZXM revealed crucial protein-ligand interactions which were grouped based on the three distinct structural moieties of the co-crystallized ATP analogue, namely the adenine moiety, ribose moiety, and triphosphate tail which was co-ordinately bonded to Mg 2þ .Multiple water molecules were also present within this site and hence it was surmised that water bridges contributed significantly to ligand binding.The adenine moiety was strongly stabilised by a hydrogen bond (H-bond) with ASN 120, which along with THR 215 and TYR 34 (from chain B) participated in water-mediated interactions as well.Moreover, ILE 88, ASN 91, ASN 95, and LYS 123 and the side chain of ILE 125 were capable of water-mediated and hydrophobic interactions respectively.
The major interactions of the ribose moiety were through SER 149 which was within H-bond distance of both ribose hydroxy groups.Additionally, it was noted that both SER 149 and TYR 34 (B) might interact through water bridges and the side-chains of ILE 141 and PHE 142 were close enough to engage in non-polar interactions with ribose (Jane� zi� c et al., 2017;Pogorel� cnik et al., 2015).
The H-bond network of the triphosphate groups was identified to be most crucial for enzymatic function.The triphosphate moiety interacted extensively with a pocket of the catalytic site formed by a glycine-rich sequence given by GXXGXG and comprised residues 161-166.Known as the Walker A motif, it is a greatly conserved sequence found in ATP-binding enzymes.In addition to this motif, ASN 91 formed an H-bond with the a-phosphate, whereas SER 148 and ASN 150 formed H-bonds with the b-phosphate.The residue LYS 168 which flanked the motif, was hydrogenbonded to the O5' atom of the ligand and also formed a stable salt bridge with the a-phosphate.Most importantly, an Mg 2þ cation formed an octahedral coordination complex with the oxygen atoms of the three phosphates, the carbonyl oxygen of ASN 91, and two intimately placed water molecules.One of these conserved water molecules also showed a water-mediated interaction with GLU 87 and this residue was believed to hold the water molecule in place to promote nucleophilic attack during ATP hydrolysis.
Furthermore, it has been established that the QTK loop from the TD, which consisted of GLN 376, THR 377, and LYS 378 here, played a critical role in enzyme turnover (Bendsen et al., 2009).The strong salt bridge formed between the sidechain amino nitrogen of LYS 378 and c-phosphate of ATP will be hydrolysed to release the energy required for the catalysis besides the opening of the N-gate.GLN 376 was also situated near enough to be hydrogen-bonded to the c-phosphate.

Rationale behind SBVS in the absence of Mg 2þ cation
Based on the assumption that Mg 2þ was essential for ligand binding, earlier studies had docked potential CIs of hTopIIa in the presence of the cation (Bergant et al., 2018).However, we carried out high throughput SBVS after removing the Mg 2þ cation from the catalytic site.The rationale behind this step is detailed below.The catalytic cycle of hTopII utilises four Mg 2þ cations, out of which two are involved in the two-metal-ion DNA cleavage mechanism at the cleavage-religation domain and the remaining two at the ATPase domain.As described earlier, each of the latter two Mg 2þ cations binds an ATP in either chain of the ATPase domain.The triphosphate moiety of ATP exhibited strong tridentate coordination with the Mg 2þ cation which held the c-phosphate in place to facilitate hydrolysis.Though this illustrated the crucial role of the cofactor, the question arose whether a cation was required for the stabilisation of the CIs competing for the ATP-binding site.One of the primary reasons to support a cofactor-free docking environment was that there existed a greater probability of the native ligand binding to the NBD as an ATP-Mg substrate rather than a cation-free entity.This was experimentally demonstrated by Osheroff.N where it was observed that the rate of ATP-Mg complex formation was directly proportional to ATPase activity, showing that the ligand binds as an ATP-Mg substrate in the catalytic site (Deweese & Osheroff, 2010;Osheroff, 1987).Moreover, it is known that over 75% of cellular ATP exists in the form of Mg-complexed ATP and attained a biologically functional conformation by binding to the cofactor (Gout et al., 2014;Weston, 2009;Wolf & Cittadini, 2003).This meant that the cofactor in the catalytic site was not a component of the enzyme but rather came along with the nucleotide.Lastly, we sought to identify ligands that exhibited a strong affinity towards the binding site without the energetic contributions or stabilisation of a cofactor, which may prove to be more desirable.

Five NP hits selected from screened NPASS database
There are several open-source NP databases available now but the NPASS database was chosen for this study due to its many merits.Other repositories mostly contained a limited number of entries from specific biological domains, whereas, NPASS v.1.0comprised 30,926 NPs (at the time of the study) with diverse sources of origin across all domains of life from micro-organisms to plants and animals.Moreover, the curated, richly annotated entries are hosted on a well-maintained server, with a readily available structure dataset and metadata for download.The NPASS v.1.0ligand dataset of 30,926 ligands was prepared to yield 46,458 structures in total, and subjected to HTVS.Following this, 8,726 ligands (top 50% of unique HTVS output) were carried forward to SP docking, from which 2,179 (top 25% of unique SP output) were selected for XP docking.Subsequently, the top hundred hits ranked based on XP docking score were subjected to MM/GBSA analysis.These hundred were manually inspected, undesirable ligands were removed and the remaining ones were reranked to give five top hits namely, NPC29607, NPC470359, NPC43211, NPC186418 and NPC132921.The chemical structures of these five hits along with salvicine are illustrated in Supplementary Figure 1(a)-(f).The binding modes of salvicine, the chosen NP ligands and ATP (modelled from PDB ID: 1ZXM as a reference for binding orientation) within the nucleotide-binding site are depicted in Figure 3(a)-(g).
The empirical formula, NPASS and PubChem database identifiers, chemical class and natural sources of origin of all six small molecules are given in Table 1.The Classyfire tool computed the chemical taxonomy and the parent chemical group of each of the six ligands.The server output showing the hierarchical taxonomies of the ligands is displayed in Supplementary Table 1.
The reference compound salvicine is a semi-synthetic diterpene quinone of the naphthoquinone class, derived from a phytocompound found in Salvia prionitis, a Chinese medicinal herb.It was seen to exhibit ATP-competitive antitumor activity against hTopIIa and has currently advanced to clinical trials in China (Hu et al., 2006;Xu et al., 2012;Zhang et al., 1999).Hence, for this reason, it was chosen as the reference inhibitor for this virtual screening study.
NPC29607 was identified as longumoside B, which is a naturally occurring glycoside present in the Indian long pepper plant, Piper longum (Jiang et al., 2013).NPC470359 was a fungal phthalide named penicacid C, belonging to a class of mycophenolic acid derivatives, which was isolated from the marine Penicillium species SOF07 (Chen et al., 2012).NPC43211, NPC186418 and NPC132921 were identified as scutellarin, caffeic acid 4-O-glucoside, also known as linocaffein, and caffeoylmalic acid respectively, which have widespread occurrence in the plant kingdom (Budzianowski, 1990;Klosterman & Muggli, 1959;L. Wang & Ma, 2018).Scutellarin is a well-known flavonoid widely studied for its therapeutic properties.Linocaffein is a phenolic glycoside and caffeoylmalic acid belongs to the coumaric acid derivatives class.Both these ligands contain a hydroxycinnamic acid moiety and are being investigated for their nutraceutical effects on human ailments.It should also be noted that salvicine and all selected compounds contained aromatic substructures, with all but caffeoylmalic acid containing an additional heterocyclic or carbocyclic moiety as well.
The results from virtual screening, binding free energy analysis and PLIP analysis of the six docked complexes are listed in Table 2.The docking scores and binding free energies revealed that all five selected NP ligands scored significantly better than salvicine.The docking score of salvicine was only À 6.732 kcal/mol whereas the values for the NP hits ranged from À 17.072 to À 14.484 kcal/mol.Longumoside B notably exhibited the highest docking score of À 17.072 kcal/mol.It was followed by penicacid C (À 14.924), scutellarin (À 14.818), linocaffein (À 14.623) and caffeoylmalic acid (À 14.484).The dG bind of salvicine was À 46.786 kcal/mol whereas, for the NP ligands, the binding free energies ranged between À 81.777 and À 62.640 kcal/mol.The contribution of the HBond component of the GLIDE docking score, which is a measure of the energetic contribution of H-bond formation, was also lower for salvicine than the NP hits.These results are indicative of relatively stronger binding affinities exhibited by the NP ligands within the catalytic site than salvicine.
As for the binding mode of salvicine, the exact pose has not been determined by crystallographic methods and previous studies have proposed different modes using a variety of tools (Radaeva et al., 2020).From these studies, we observed that some residues were repeatedly shown to be involved in the recognition of salvicine within the NBD, which correlated with the interacting residues in our docked complex as well.In the hTopIIa-salvicine complex, the naphthalene group of salvicine partially coincided with the region occupied by the ribose of ATP and a terminal isopropyl group extended towards the QTK loop.H-bond formation with ASN 91, SER 148, ASN 150, ARG 162 and GLY 164 was detected.
On the other hand, the formation of extensive H-bond networks was observed in all the hTopIIa-NP complexes.Unlike salvicine, all NP complexes showed the presence of at least one salt bridge as well.In this scenario, salt bridges have been defined as ionic bonding formed between the proton donated by LYS 168 or LYS 378 and negatively charged moieties from the NP ligands.
The binding mode of longumoside B was such that the trans-cinnamic acid group, b-D-glucopyranose ring and methyl pentanedioic acid group aligned with the positions of the adenine, ribose and triphosphate moieties of ATP respectively.The hydroxy and carbonyl moieties showed extensive H-bond interaction with the Walker A motif and salt bridges were formed with both LYS 168 and LYS 378.
In the case of penicacid C, the isobenzofuranone (phthalide) group fell roughly between the region occupied by the adenine and ribose moieties in the native/co-crystallized ligand and the position of the carboxylic acid group coincided with the phosphate tail.Scutellarin was the largest among the selected NP hits and its binding region extended well beyond that of the native ligand.The flavonoid fragment extended beyond the adenine binding region and the glucuronic acid moiety coincided with the position of the triphosphate tail.Both formed H-bonds with the Walker A motif and salt bridge formation was observed between penicacid C and LYS 378 as well as scutellarin with LYS 168.In linocaffein, the carboxyl moiety of the hydroxycinnamic acid fragment was the only region that coincided with the binding orientation of the native ligand.Caffeoylmalic acid was oriented such that the double-ring moiety coincided with the position of the adenine of the native ligand.

Analysis of molecular dynamics simulations of the six docked complexes
The MD simulations yielded interesting results, where the formation and stability of various interactions exhibited by all six ligands were meticulously evaluated (Figures 4-9a-c and Supplementary Figure 2a-f).
The RMSD of the backbone atoms of the receptor protein indicated each structure's stability under simulated conditions whereas the radius of gyration indicated the degree of compactness of the receptor.For all complexes, the RMSD of protein backbones was around 3 Å and they also exhibited a stable Rg in the range of 2.7-2.8Å throughout the simulation timescales.The RMSF of the protein backbone atoms were computed to examine the local shifts in the receptor.While the residues in the loop and terminal regions showed large fluctuations as expected, the rest remained relatively stable, especially the catalytic site residues showed minimal fluctuations (<1.5 Å) in all complexes.The internal RMSD of all six ligands was maintained around 2 Å or under, which showed that they underwent mild internal fluctuations with respect to the input binding mode.
In the case of salvicine, visual inspection of the simulation trajectory showed that the ligand remained within the cavity throughout and did not undergo major internal conformational changes (as seen from ligand internal RMSD).However, the Lig fit Prot RMSD (which indicated the RMSD of the ligand from complex superimposed on protein backbone RMSD) fluctuated throughout the simulation timescale which indicated poor stability as a whole within the binding site.This was supported by the Protein-Ligand contacts histogram and timeline where it was observed that salvicine failed to maintain all of its strong H-bonds with ASN 91, SER 148 and ASN 150 from the docked complex.Moreover, hydrophobic contacts and water bridges were the primary interactions between the ligand and protein, with negligible H-bond formation.The number and nature of interactions as well as interacting residues across different time points in the simulation were observed to vary throughout.It can be concluded that a combination of multiple transient and mostly weak interactions kept salvicine from diffusing away from the NBD.The top-scoring NP hit longumoside B exhibited the best interaction profile and a high degree of stability when compared to salvicine and other NPs.Trajectory inspection showed that it was tightly bound to the protein predominantly by numerous stable H-bonds followed by limited hydrophobic interactions and water bridges.The Lig fit Lig RMSD showed two distinct shifts which were mirrored by the Lig fit Prot RMSD.That is, the ligand seemed to have readjusted its binding orientation such that the trans-cinnamic acid moiety moved from its original position to form a stable p-p stacking with PHE 142.This change was reflected in the RMSD increase from �20 ns to 60 ns.Around 60 ns the Lig fit Lig and in turn, Lig fit Prot had further increased showing more changes in the ligand binding mode, which remained stable till the end of the simulation.This was primarily due to the movement of the glucopyranoside ring.All H-bonds but one, observed in the docked complex were maintained extremely well throughout the simulation timescale.Instead of SER 149, an H-bond was formed with SER 148 during the simulation.Most importantly, the two salt bridges with LYS 168 and LYS 378 were present for over 95% of the simulation timescale.The crucial QTK loop was strongly bound by forming an H-bond with GLN 376 as well.
Trajectory analysis of penicacid C revealed it was similar to salvicine such that the ligand did not undergo internal conformation changes but exhibited a poor interaction profile (as evidenced by Lig fit Lig, Lig fit Prot RMSD and PL contact graphs).While the H-bonds with SER 148, SER 149 and Walker motif residues were relatively well-maintained, the weak H-bond with GLN 376 and salt bridge with LYS 378 were reduced to water-mediated interactions.Moreover, recurring intramolecular H-bonds formed between the ligand hydroxy groups were identified as major deterrents of intramolecular H-bond formation.On the whole, the ligand seemed to be stabilised by five H-bonds on average and transient water bridges.
Scutellarin did neither undergo internal conformational changes nor did move much from its initial binding orientation (as evidenced by Lig fit Lig, Lig fit Prot RMSD graphs).An exceptionally strong H-bond as well as water bridges with ASN 91, and an extremely stable salt bridge with LYS 168 were observed.A moderately stable H-bond with SER 149 was seen and none of the others from the docked complex was maintained well.Surprisingly, a highly secure Hbond was introduced during the simulation between the phenolic hydroxy group and ASN 120.Also, ARG 62 established a couple of relatively stable H-bonds, which were not observed in the docked complex.On the whole, scutellarin remained firmly bound within the cavity through ASN 91, ASN 120, and LYS 168 and a mixture of fairly strong H-bonds and shifting water bridges.
Linocaffein formed stable intermolecular interactions mainly in the form of H-bonds followed by water-mediated interactions.Lig fit Prot shows a mild disturbance between around 65 to 73 ns, when the glucopyranoside hydroxyl group was temporarily attracted by the LYS 157 side chain, but the ligand remained otherwise stable.Compared to the docked complex, the ASN 91 and Walker motif H-bonds were conserved and supported by water bridges.Most importantly, the LYS 378 salt bridge was exceptionally strong along with the GLN 376 H-bond, which held the QTK loop in place.On average, about 6 H-bonds were present at any point in the simulated duration.
Finally, caffeoylmalic acid was noted to be relatively stable (Lig fit Prot RMSD) and like linocaffein, was tightly bound through H-bonds and several water bridges during the simulation.The Walker motif H-bonds as seen in the docked complex, with residues ARG 162, ASN 163, TYR 165 and GLY 166 were preserved while the others mostly existed as transient water-mediated interactions.The QTK loop was well held steady by a robust GLN 376 H-bond and LYS 378 salt bridge.

Physicochemical properties and ADMET analysis of salvicine and five NP hits
The relevant physicochemical and ADMET properties computed for all six small molecules by Qikprop and SwissADME are presented in Tables 3 and 4. QPlogPo/w and QPlogS were within the preferred range for all six ligands, which indicated good lipophilicity and solubility.QPPCaco indicated passive entry across Caco-2 cells which represented the gut-blood barrier, for which salvicine exhibited high GI absorption whereas the rest showed low GI absorption.SwissADME qualitatively indicated high GI absorption for salvicine, borderline absorption for caffeoylmalic acid and low absorption for the other NPs.Salvicine also showed high oral absorption followed by penicacid C and longumoside B which were predicted to have moderate absorption.The other NP hits exhibited poor oral absorption.
SwissADME computed longumoside B, penicacid C and scutellarin as potential P-glycoprotein substrates.All NP hits were observed to be non-inhibitors of cytochrome P450 isoforms whereas salvicine was predicted as a CYP2C19 inhibitor.
QPlogHERG predicted the likeliness of HERG K þ channel inhibition which is a marker of cardiotoxicity and other adverse reactions.While all ligands scored in the recommended range, the predicted value for salvicine (À 4.228) was close to the cut-off value (À 5).
Only salvicine, longumoside B and penicacid C fell within the suggested range for QPlogBB.QPPMDCK, which indicated oral absorption as well as passive diffusion across the bloodbrain barrier (BBB) was predicted to be excellent for salvicine while the NP hits showed low MDCK cell permeability.The SwissADME prediction also supported the Qikprop results and indicated that only salvicine was BBB permeant.Moreover, salvicine was more likely to express CNS activity as well whereas the NP hits were predicted as inactive.All six ligands scored within the prescribed range for QPlogKhsa, which was indicative of human serum albumin binding which affects the pharmacokinetics.
Lipinski's rule of five (RO5) and Jorgensen's rule of three (RO3) define optimal physicochemical properties for orally bioavailable drugs.Similarly, the leadlikeness property predicted by SwissADME provides parameters for small molecules which can be optimised to become oral drugs.Salvicine, longumoside B, penicacid C and caffeoylmalic acid qualified the RO5 parameters.Salvicine was the only molecule which qualified RO3 and showed lead-like behaviour as well.The synthetic accessibility of all six ligands was predicted to be easy to moderate.It is a quantitative parameter supplied by SwissADME generated based on the principle that if fragments present in the screened molecules are frequently found in synthetic libraries, then they must be easier to synthesise.Though all five NP hits are naturally occurring, synthetic accessibility is a marker for prioritising molecules as it is indicative of future synthetic potential if they will be considered for downstream biomedical applications.
Pan-assay interference compounds (PAINS) and Brenk's structural alert filters are used to identify promiscuous substructures which demonstrate non-specific activity and produce undesirable side effects.These filters are critical for screening and eliminating molecules with unwanted fragments from a pool of potential leads before beginning in vitro testing.Salvicine showed two PAINS alerts, namely imine_one_A, quinone_D and one Brenk alert, which was a diketo_group.Scutellarin contained a catechol_A, linocaffein contained a michael_acceptor_1 and caffeoylmalic acid had one each of both fragments.Longumoside B and penicacid C were free from undesirable fragments.
The acute oral toxicity module of ProTox-II (Table 5) predicted all ligands as less toxic, with the predicted classes being 4 for salvicine and penicacid C, and 5 for the other NP hits (the prediction accuracy for these calculations was estimated to be between 67.38% to 69.26%).The median lethal dose (LD 50 ) for the class 4 ligands was significantly lower than the rest, predicted as 1600 mg/kg for salvicine and 352 mg/kg for penicacid C (lowest of all ligands).The LD 50 for the rest ranged from 3800 mg/kg to 5000 mg/kg.Salvicine was predicted to be a potential carcinogen and active for the MMP pathway, with possible mutagenic behaviour.Amongst the NP ligands, longumoside B and penicacid C were found to be inactive for all the toxicity endpoints, with only the former receiving high confidence scores for all parameters.Scutellarin was also predicted to be a potential carcinogen (low confidence score).Of all the ligands, both linocaffein and caffeoylmalic acid were classified as immunotoxic (high confidence scores).

Discussion
Catalytic inhibition of hTopIIa has long been proposed as a viable alternative to the enzymatic poisoning caused by neoplastic drugs targeting the DNA-binding region.Though such inhibitors are currently unavailable for medical use, several promising molecules are under evaluation across various stages of pre-clinical and clinical trials, some of which are NPs (Hevener et al., 2018;Liang et al., 2019).NPs have been used as novel precursors for drug development for decades and continue to inspire newer therapeutic scaffolds (Atanasov et al., 2015, Atanasov et al., 2021;Newman & Cragg, 2020).In recent times, several studies have utilised in silico tools for the screening of large NP databases in the hopes of identifying such novel lead molecules (Chikhale et al., 2021;Naik et al., 2020;Radaeva et al., 2020;Umashankar et al., 2021).
In this study, the ATPase domain of hTopIIa (PDB ID: 1ZXM) was our chosen receptor, in which the NBDs of both protomers exhibited a similar ligand binding pattern.We targeted only chain A of the receptor and screened the NP database, NPASS.This was done due to the similarity in interaction fingerprints of both chains and most importantly it has been proven that the binding of a single ATP to the NBD of chain A was sufficient for inducing enzymatic cycling, by triggering dimerization and capture of the T-segment (Skouboe et al., 2003;Wei et al., 2005).
HTVS was carried out after removing the cofactor Mg2 þ since it was most likely to be brought in by the native ligand ATP.We supposed that strong NP binders which functioned independently of metal coordination stabilisation contributed by the Mg2 þ cation would exhibit better inhibitor potential.Therefore, we selected salvicine as a reference inhibitor for a fair comparison of binding affinity and binding mode, rather than the native ligand ATP.
Following HTVS, the top hundred hits were subjected to MM/GBSA analysis.As an important step in the stringent selection of best hits, critical residues involved in the recognition of ATP and especially the crucial LYS 378 (chosen through binding site analysis) were used for hit prioritisation.The LYS 378 was given importance since an ideal catalytic lead molecule should form a strong non-hydrolyzable bond with this residue and hold the QTK loop in place, and limit the movement of the TD.However, it should be noted that, though this trait was preferable, any strong binder which remained tightly bound to the NBD and blocked the cavity from subsequent ATP binding was also suitable as this would .005 a Predicted octanol/water partition coefficient: -2.0-6.5.b Predicted aqueous solubility: -6.5-0.5.c Predicted apparent Caco-2 cell permeability in nm/sec: <25 poor, >500 great.d Predicted qualitative human oral absorption: 1, 2, or 3 for low, medium, or high.e Predicted human oral absorption on a 0 to 100% scale: >80% is high, <25% is poor.f No. of violations of Lipinski's rule of five, given by: Molecular weight < 500 Da, QPlogPo/w < 5, H-bond donors � 5, H-bond acceptors � 10. g No. of violations of Jorgensen's rule of three, given by: QPlogS > À 5.7, QPPCaco > 22 nm/s, No. of Primary Metabolites < 7. h Predicted IC 50 value for blockage of HERG K þ channel: concern below -5.i Predicted brain/blood partition coefficient: -3.0-1.2.j Predicted apparent MDCK cell permeability in nm/sec: <25 poor, >500 great.k Predicted central nervous system activity: -2 (inactive), þ2 (active).l Prediction of binding to human serum albumin: -1.5-1.5.
effectively prevent enzymatic cycling as well.Moreover, the slow turnover would in turn influence the DNA-dependent metabolic processes thus affecting the proliferation rate of the cells.In this regard, five NP hits-longumoside B, penicacid C, scutellarin, linocaffein and caffeoylmalic acid, were selected for subsequent analysis.
The XP docking scores, binding free energy results and binding mode analysis indicated salvicine having the lowest binding affinity and forming the least number of H-bonds, thus making the NPs better candidates.100 ns MD simulations of the five NP and salvicine docked complexes were carried out for gaining insights into the stability and interaction pattern exhibited by them within the NBD.MD trajectory analysis confirmed that salvicine was indeed the weakest in forming stable bonds in the NBD.In the case of interactions as well as overall structural parameters during the simulations, longumoside B and scutellarin fared best among the NP hits.
ADMET analysis of all six small molecules revealed that salvicine had higher potency of oral bioavailability and absorption through the GI tract.But it was also predicted to be a CYP2C19 inhibitor, which was undesirable behaviour in GI-absorbent leads.On the other hand, NPs often fall beyond (Lipinski's) rule of five space (bRO5) and break conventional rules of bioavailability (Doak et al., 2014).Despite this, it was observed that longumoside B and penicacid C, the more desirable NPs, fulfilled RO5 criteria.Salvicine was also predicted to have possible CNS activity and was predicted as BBB permeant, which are undesirable features in non-neurologic drug leads.Such qualities were however absent in the NPs.Most importantly, salvicine possessed unsafe and unwanted substructures, indicated by the PAINS and Brenk filters, thus making it a less-than-satisfactory drug precursor.
Once again longumoside B and penicacid C surpassed this quality of salvicine and turned out to be free from promiscuous fragments.Longumoside B also qualified all toxicological endpoints with high confidence scores, closely followed by penicacid C. Most tools predicting drug-likeness and leadlikeness prioritise orally bioavailable drugs.However, such small molecules are not always desirable owing to the other complications arising from small size, as exhibited by salvicine.NPs, such as the screened NPASS molecules, though unappealing from an oral bioavailability perspective are nonetheless more stable binders as evidenced by the virtual screening and MD simulation results.In recent times, the bRO5 space has gained prominence since an unexplored space of diverse NPs is found there.The bioavailability of such bRO5 NPs can be enhanced by improving formulations, using targeted delivery mechanisms or using other modes of administration.Though such small molecules bring along many complexities, it is worth exploring such a chemical space due to the increasing requirement for novel therapeutic precursor molecules (Caron et al., 2021;Doak et al., 2014).
At the end of this study, the multilevel analysis of the various characteristics of the chosen natural inhibitors revealed that longumoside B and penicacid C proved to be the best hits against the NBD of hTopIIa.Both these NPs showed an easy to moderate range of synthetic accessibility, which is suitable for any eventual downstream clinical applications.However, in terms of natural occurrence, penicacid C is currently only found in a rare species of fungi available in the South China Sea.Whereas, longumoside B is readily available for isolation from an abundantly found and commonly cultivated plant, Piper longum, making it a more suitable drug development candidate.Therefore, we conclude that longumoside B is a novel CI candidate for the ATPase domain of hTopIIa.We suggest further in vitro validation to be carried out for understanding the putative therapeutic role of longumoside B as an anticancer precursor molecule targeting hTopIIa.

Conclusion
hTopIIa is a well-known, primary anticancer target for which several inhibitors have already been and continue to be developed for the sake of improving therapeutic and prognostic outcomes.Some of the widely used inhibitors of hTopIIa have NP origins, for example, etoposide and teniposide are plant-derived podophyllotoxin analogues whereas daunorubicin isolated from Streptomyces species and its derivative doxorubicin are anthracyclines.Due to this reason, a natural product database was screened in this study for the identification of novel NP precursors that will serve as starting points for developing new catalytic inhibitors of hTopIIa.Following multilevel prioritisation of HTVS results and MD simulations, Longumoside B, a fatty acyl glycoside found in Piper longum, was selected as a novel NP catalytic inhibitor of the ATPase domain of hTopIIa.Experimental verification of this study outcome is necessary to support results observed in silico.This would corroborate the potential of Longumoside B as a prospective inhibitor of hTopIIa and also pave the way for the development of a new generation of NP-derived anticancer agents.

Figure 2 .
Figure 2. Schematic representation of study methodology with details of software and tools used.

Table 1 .
Characteristic features of the reference inhibitor salvicine and five chosen NP ligands.

Table 2 .
Virtual screening and MM/GBSA results along with residue interactions identified in the six docked complexes using the PLIP server.

Table 3 .
Physicochemical and ADMET properties of salvicine and five NP ligands computed by Qikprop.

Table 4 .
ADMET properties of salvicine and five NP ligands computed by SwissADME.Colour coding of predicted properties is given as green for inactive, red for active, light colours for low confidence score and dark colours for high confidence score.
a Based on GHS categories for the median lethal dose.b Organ toxicity.c Toxicity endpoints.d Tox21-Nuclear receptor signalling pathways.e Tox21-Stress response pathways.