Computational predictions of the site of metabolism of cytochrome P450 2D6 substrates: comparative analysis, molecular docking, bioactivation and toxicological implications.

Abstract Cytochrome P450 2D6 (CYP2D6) is a polymorphic enzyme responsible for metabolizing approximately 25% of all drugs. CYP2D6 is highly expressed in the brain and plays a role as the major CYP in the metabolism of numerous brain-penetrant drugs, including antipsychotics and antidepressants. CYP2D6 activity and inhibition have been associated with numerous undesirable effects in patients, such as bioactivation, drug-associated suicidality and prolongation of the QTc interval. Several in silico tools have been developed in recent years to assist safety assessment scientists in predicting the structural identity of CYP2D6-derived metabolites. The first goal of this study was to perform a comparative evaluation on the ability of four commonly used in silico tools (MetaSite, StarDrop, SMARTCyp and RS-WebPredictor) to correctly predict the CYP2D6-derived site of metabolism (SOM) for 141 compounds, including 10 derived from the Genentech small molecule library. The second goal was to evaluate if a bioactivation prediction model, based on an indicator of chemical reactivity (ELUMO–EHOMO) and electrostatic potential, could correctly predict five representative compounds known to be bioactivated by CYP2D6. Such a model would be of great utility in safety assessment since unforeseen toxicities of CYP2D6 substrates may in part be due to bioactivation mechanisms. The third and final goal was to investigate whether molecular docking, using the crystal structure of human CYP2D6, had the potential to compliment or improve the results obtained from the four SOM in silico programs.


Introduction
The cytochrome P450 (CYP) family is the major phase I metabolizing group of heme-containing enzymes, accounting for the mixed-function oxidation, reduction and dealkylation of more than 90% of pharmaceutical compounds into more hydrophilic metabolites. The six major CYPs responsible for metabolism of drugs in human liver are: CYP1A1/2, CYP2C9, CYP2C19, CYP2D6, CYP2E1 and CYP3A4 (Uttamsingh et al., 2005). Although CYP2D6 accounts for only a small percentage of all human hepatic CYPs (52%; approximately 13 pmol/mg microsomal protein) (Madani et al., 1999;Shimada et al., 1994;Zuber et al., 2002), it metabolizes about 25% of all currently marketed drugs (Stringer et al., 2009), and 41% of the 200 most-prescribed drugs in the USA (Yury & Roland, 2014).
The human CYP2D locus is located on chromosome 22q13 and consists of nine exons with an open-reading frame of 1491 base pairs. The 479 amino acid-containing CYP2D6 structure has a well-defined active-site cavity located above the heme group with a volume of $540 Å 3 , which is much smaller than that of CYP3A4 ($1385 Å 3 ) (Wang et al., 2009). While the larger active site of CYP3A4 permits access to a wide array of substrates, including relatively large small molecules like cyclosporine and erythromycin (molar masses of 1202.61 and 733.93 g/mol, respectively), the smaller active site of CYP2D6 is generally more restrictive, which contributes in part to its high degree of substrate specificity. Noteworthy exceptions to the restrictive metabolism of CYP2D6 are the free maytansinoid species N 20 -deacetyl-N 20 -(3-mercapto-1-oxopropyl)-maytansine (DM1) and (S)-methyl-DM1 which have molar masses 4700 g/mol and are components of several antibody drug conjugates (ADC) oncology drugs (Davis et al., 2012). The crystal structure of CYP2D6 reveals two negatively charged amino acids in the upper part of the binding cavity (Glu216 and Asp301), which facilitate binding of the positively charged regions of substrates (Rowland et al., 2006). A key phenylalanine, Phe120, controls the orientation of the aromatic ring found in most CYP2D6 substrates with respect to the heme (Wang et al., 2009). Typical CYP2D6 substrates are lipophilic bases with a planar, hydrophobic aromatic rings and a nitrogen atom which can be protonated at physiological pH, but several atypical substrates, such as spirosulfonamide and pactimibe, do not contain a basic nitrogen atom (Zhou et al., 2009).

CYP2D6 in the brain
CYP2D6 is involved in the biotransformation of many compounds which act predominantly in the central nervous system (CNS), including antidepressants, antipsychotics and neurotoxicants. It is estimated that 31% of currently approved CNS-acting drugs are primarily metabolized by CYP2D6 (Wishart et al., 2008). CYP2D6 is the main metabolizing enzyme for a wide variety of important endogenous substances in the brain, such as: 5-methoxytryptamine (melatonin and serotonin precursor) (Yu et al., 2003a), p-and m-tyramine (dopamine precursor) (Bromek et al., 2010), codeine (morphine precursor) (Poeaknapo et al., 2004), progesterone (Hiroi et al., 2001), the GABA-modulator allopregnanolone (Niwa et al., 2008), octopamine (norepinephrine precursor) (Funae et al., 2003) and the cannabinoid neurotransmitter anandamide (Snider et al., 2008). CYP2D6 participates in the biotransformation of many endogenous steroids, including progesterone. Progesterone is a neurosteroid which reduces brain excitability and elicits sedative and anticonvulsant effects. It is hydroxylated by CYP2D6 in brain, leading to the formation of four metabolites which are involved in the formation of other steroids: 2b-OH, 6b-OH, 16a-OH and 21-OH (Hiroi et al., 2001). Progesterone is involved in many important functions in the brain including increasing expression of myelin-specific proteins in oligodendrocytes and enhancing g-amino butyric acid-induced chloride channels (Djebaili et al., 2005).
CYP2D6 also plays an important neuro-protective role in the detoxification of several toxicants in the brain including 1,2,3-tetrahydroisoquinolines (Suzuki et al., 1992) and b-carbolines (Yu et al., 2003c). In light of this, the expression of CYP2D6 in brain has been an area of intense research over the past decade. CYP2D6 (mRNA and protein) has been found to be widely expressed in brain, including pyramidal cells of the cortex and hippocampus, the amygdala, the Purkinje cells of the cerebellum, cerebellar vermis, layers III and V of the neocortex and substantia nigra, which has the highest CYP2D6 activity in the human brain (Bromek et al., 2010;Siegle et al., 2001). The CYP2D6 expression level in human brain varies greatly throughout the life, being very low at birth and gradually increasing with age until it reaches its highest peak in adults over age 65 (Mann et al., 2012). This in part may account for the reduced response to several CNSacting drugs, such as the CYP2D6-substrate desipramine, in older adults (Nelson et al., 1995) and an increased sensitivity in infants to the CYP2D6-substrate codeine (Voronov et al., 2007). CYP2D6 mRNA and protein have been shown to be present in the substania nigra and cerebellum of the human brain at levels exceeding those observed in liver (Siegle et al., 2001). The regional expression and activity of CYP2D6 has been shown to be significantly increased in brain, but not in liver, in the presence of certain brain-penetrant agents, such as nicotine and ethanol in monkeys (Miksys & Tyndale, 2006;Miller et al., 2014;Yue et al., 2008); suggesting it is possible that CYP2D6-derived metabolites of drugs could be uniquely present, or formed in greater amounts, in brain compared to liver. Furthermore these findings suggest that smokers may respond differently than non-smokers to a CNS-acting drug or neurotoxicant that is a CYP2D6 substrate, even with similar plasma levels between the two groups. Organ-specific metabolite profiles are of particular interest to toxicologists due to the potential of unforeseen drug-associated toxicities, and therefore a thorough evaluation of CYP2D6-mediated metabolism during the non-clinical phase of drug discovery is advisable, especially when a drug has physicochemical properties that are consistent with brain-penetrant compounds: molecular weight (5250), cLogP (0-3), hydrogen bond donors ( 3), hydrogen bond acceptors ( 6), number of rotatable bonds ( 5), water solubility (450 mg/mL), high membrane permeability (Pm), low plasma protein binding and low polar surface area (PSA) ( 70); (Nau et al., 2010).
Drug-induced inhibition of CYP2D6 in the brain has been shown to cause individuals to become more susceptible to the effects of neurotoxicants, such as pesticides, solvents and 1-methyl-4-phenyl-1,2,3,6-tetrahydro-pyridine (MPTP). In addition, high, low and a lack of CYP2D6 activity in the brain has been associated with susceptibility to numerous neurological disorders including neuroleptic malignant syndrome (Kato et al., 2005;Kawanishi et al., 1998;Zivkovic et al., 2010), Parkinson's disease (Armstrong et al., 1992;Bon et al., 1999;Singh et al., 2010;Tsuneoka et al., 1993) and Lewy body disease (Kosel et al., 1996;Woo et al., 1999), suggesting an important role of CYP2D6 in brain health and disease.

Drug-associated suicidality
The role of brain-penetrant compounds in drug-associated suicidality (suicidal ideation and behavior) is an area of active research. In the past decade, two major classes of medications that are extensively metabolized by CYP2D6, antidepressants and anticonvulsants, have been associated with an increased risk of suicidality in safety analyses conducted by the FDA (US FDA, 2009). Several drugs that are extensively or exclusively metabolized by CYP2D6 in the brain, such as codeine, risperidone and duloxetine, carry black box warnings for suicidality risk. Numerous studies have demonstrated a clear association between drug-associated suicidality and CYP2D6 activity. Suicidal behavior and substance abuse are frequent phenomena observed among patients with mental diseases, such as depression and schizophrenia, and may be attributable in part to antipsychotic treatment failure. Individuals who carry functional variants of CYP2D6 are at an increased risk of toxic accumulation or rapid elimination of these drugs, leading to treatment failure (Kawanishi et al., 2002). A cohort study involving 716 patients in Sweden demonstrated a statistically significant relationship between patients who died of drug-associated suicide and CYP2D6 activity (Zackrisson et al., 2010). The study found that suicide victims were statistically more likely to have multiple copies of CYP2D6 (so called ultra-metabolizers) compared to those who died of natural death (p ¼ 0.007), and consequently suffered from either insufficient drug plasma concentration of their antidepressant medications (resulting in failure of treatment) or had particularly high concentrations of bioactive/toxic metabolites. Similar findings were described in a study involving 285 patients experiencing depressive episodes in a psychiatric hospital in Germany (Stingl & Viviani, 2011). Patients who fail to respond to antidepressant drugs which are substrates of CYP2D6 have been shown to have a frequency of CYP2D6 gene duplication that is up to 10-fold higher than healthy patients. Furthermore, a relationship between CYP2D6 polymorphisms and early discontinuation of antidepressants has also been observed (i.e. 73-100% typically drop-out by Day 28 of treatment, especially patients exhibiting the CYP2D6*2 allelic variant, compared to 10% drop-out for patients with the normal allelic variant, CYP2D6*1), which may increase the risk of drug-associated suicidality due to a lack of effective antidepressant therapy (Penas-Lledo et al., 2013). Screening for CYP2D6 polymorphisms prior to the dosing patients have been seen to increase the therapeutic benefit of psychiatric drugs (Chen et al., 1996).
CYP2D6 plays an important role in the biosynthesis and maintenance of serotonin (Yu et al., 2003b(Yu et al., , 2004, endogenous morphine (Poeaknapo et al., 2004), dopamine (Bromek et al., 2010) and anandamide (Snider et al., 2008) pools in the brain, and consequently its inhibition has been linked to depression. CYP2D6 plays an important role in the biosynthesis of serotonin and its major metabolite 5-hydroxyindoleacetic acid (5-HIAA) by O-demethylating 5-methoxytryptamine (5-MT). Investigations have shown that the concentration of 5-HIAA is significantly decreased in the cerebrospinal fluid of suicide victims at autopsy, suggesting roles for CYP2D6 and 5-MT in suicidal behavior (Cooper et al., 1992). Furthermore, several studies have found an association between low and high CYP2D6 activity and adverse personality disorders, such as anxiety, antisocial personality disorder, phobias, impulsivity and aggression (Bertilsson et al., 1989;Gonzalez et al., 2008;Llerena et al., 1989Llerena et al., , 1993Penas-Lledo et al., 2012), all of which are considered contributing factors for suicide risk (CDC, 1999).
A general lack of understanding of the key role of CYP2D6 in depression and drug-associated suicidality in the clinical setting is evident from a recent multi-center, observational study of severely depressed patients through a retrospective review of prescribed antidepressants in Spain. Of the patients analyzed (n ¼ 5630), 9% (n ¼ 507) were prescribed a combination of two antidepressants, one of which was a substrate of CYP2D6, while the other was a potent inhibitor of CYP2D6 (substrate-inhibitor group). These patients responded poorly to treatment and had an increased co-morbidity rate and suicide risk compared to patients prescribed a single antidepressant. Furthermore, 15.4% of patients (n ¼ 867) were prescribed two antidepressants that were both substrates of CYP2D6 (substrate-substrate group). Due to the saturable kinetics of CYP2D6, patients in this group experienced longer residence time of both drugs compared to patients taking one of these antidepressants alone, which contributed in part to increased dementia, neuropathy and stroke (Sicras-Mainar et al., 2014).

Polymorphisms
Due to the high degree of polymorphism in the human population that influences the expression and activity of CYP2D6, CYP2D6 inhibition is a potential show-stopper in drug development, and therefore screening for timedependent inhibition (TDI) of CYP2D6 is performed routinely by pharmaceutical companies. Polymorphic variants typically affect pharmacokinetic parameters, the extent of metabolism and metabolite profile in patients, and also influence the potential for drug-drug interactions. The importance of early investigation on the role of CYP2D6 polymorphisms in drug metabolism is underscored by the issuing of several guidelines from regulatory agencies on pharmacogenetic methodologies, such as the European Medicines Agency's guideline (EMA, 2011). The CYP2D6 gene cluster on chromosome 22 comprises the functional CYP2D6 and two putative pseudogenes CYP2D7 and CYP2D8 (Heim & Meyer, 1992). Unlike polymorphisms in other cytochromes, polymorphisms in CYP2D6 result not only from single-nucleotide polymorphisms but also from insertion/deletions of nucleotide bases, in addition to whole-gene deletions, duplications and multiplications. An excess of 90 allelic variants of CYP2D6 have been described to date, with the five most frequently occurring in the general population being: CYP2D6*2xn (increased enzyme activity due to gene duplication; $2% of general population), CYP2D6*4 (inactive enzyme due to splicing defect; $20% of Caucasians), CYP2D6*5 (no enzyme expression due to 11.5 kb deletion of CYP2D6; $7% of general population), CYP2D6*10 (unstable enzyme leading to intermediate metabolism; $50% of Asians) and CYP2D6*17 (altered affinity for substrates due to single nucleotide polymorphisms; $35% of Black Africans) (Ingelman-Sundberg, 2005). CYP2D6 ''poor-metabolizers'' (PMs) are often at increased risk of an exaggerated pharmacological response due to high steady-state plasma concentrations, such as the prolonged uterine contractions caused by sparteine (DeVoe et al., 1969) or the prolonged hypotension caused by debrisoquine (Idle et al., 1978); or a toxicological response, such as the hepatotoxic effects of perhexiline (Long et al., 1980). In one case, the half-life of the antidepressant paroxetine was found to be 195 h, instead of the usual 16 h, in a PM patient which was proposed to have contributed, in part, to a suicide attempt (Hilleret et al., 2002). Some drugs, such as tamoxifen, are bioactivated to pharmacologically active metabolites by CYP2D6 (Goetz et al., 2007), and consequently, CYP2D6 PMs derive less therapeutic benefit than CYP2D6 ''ultra-metabolizers'' (UMs) for these compounds. In light of this, clinicians and safety assessment scientists should be aware that the efficacy of CYP2D6 prodrugs is greatly reduced by concomitant therapy with CYP2D6inhibiting drugs, such as hydroxyzine, quinidine and terbinafine. On the other hand, in the case of the analgesic codeine (which is bioactivated to morphine via O-demethylation by CYP2D6), CYP2D6 UMs are at increased risk from morphine toxicity, which may lead to death (Kirchheiner et al., 2007). The US FDA (2013) has issued drug safety communications regarding the important role of CYP2D6 polymorphisms in codeine metabolism. Several studies have demonstrated clear benefits of dose-adjusting doses of brain-penetrant drugs, such as the antidepressant imipramine, depending on the CYP2D6 polymorphism of patients, with PMs typically requiring a much lower dose than UMs (Kirchheiner et al., 2004;Schenk et al., 2007).

Bioactivation
Although pharmaceutical companies routinely screen for CYP2D6 TDI, a thorough assessment of whether a drug is a specific substrate for CYP2D6, particularly in brain tissue, is rarely performed or reported in the literature. Lack of information regarding CYP2D6 substrate-specificity should be of concern to toxicologists due to the established role of CYP2D6 in the bioactivation of a wide-array of structurally diverse compounds (Table 1). An example of CYP2D6induced bioactivation involves the endogenous compound, 5methoxy-N,N-dimethyltryptamine (5-MDMT), which is bioactivated, via O-demethylation by CYP2D6 in the pineal gland to the potent hallucinogen, bufotenine. Bufotenine is significantly elevated in brains of schizophrenic patients and is hypothesized to play a role in the hallucinations and psychotic episodes experienced by these patients (Emanuele et al., 2010). Some possible reasons to explain the lack of routine screening for CYP2D6 substrate specificity include: (1) a specific method is must be developed to determine the product of the enzymatic reaction, which can be a time-consuming process, (2) a highly purified enzyme preparation and test molecule are required, (3) the procedure is time-consuming, (4) brain CYP2D6 activity is very sensitive to freezing ($40-80% activity is lost when stored at À80 C for 7 days; Tyndale et al., 1999) and (5) there is often an unawareness of the unique role played by CYP2D6 in metabolism, especially for brain-penetrant drugs.

Immunomodulatory effects
CYP2D6 is expressed in polymorphonuclear and mononuclear white blood cells, both of which play an important role in the immune response. CYP2D6 plays important roles in the biosynthesis of morphine in these cells: first, CYP2D6 hydroxylates tyramine to form dopamine, which combines with 3,4-dihydroxyphenylacetaldehyde to ultimately form codeine from neopinone (Zhu et al., 2005). Second, CYP2D6 bioactivates codeine, via O-demethylation, to form morphine (Stefano et al., 2008). Morphine is present constitutively in human plasma ($80 pg/ml) (Liu et al., 1997) and its levels significantly increase during trauma (Brix-Christensen et al., 1997). Human monocytes and granulocytes express a l opiate receptor subtype, l3, which is morphine selective but opiate peptide insensitive (Cadet et al., 2003). The role of endogenous morphine in white blood cells is believed to be the l3-mediated down-regulation of inflammatory cytokines and inhibition of T-lymphocyte multiplication via the MAP kinase signaling pathway, which leads to local immunosuppression in order to limit the inflammatory immune response (Chuang et al., 1997;Mellon & Bayer, 1998;Stefano, 1998;Stefano et al., 1996;Szabo, 1995). Based on the role of CYP2D6 in the biosynthesis of morphine in white blood cells, it is hypothesized that druginduced inhibition of the enzyme would prevent the synthesis of endogenous morphine, which may lead to increased production of inflammatory cytokines and consequently an increased inflammatory immune response. Based on the role of CYP2D6 in drug-associated suicidality, bioactivation, prolongation of the QTc interval and immunomodulatory responses, it is critical that drug developers have a clear understanding of the identity and potential off-target toxicities of CYP2D6-derived metabolites, as well as an awareness of the consequences of drug-induced inhibition of the enzyme. One approach to aid in the assessment of whether a compound is likely to be a CYP2D6 substrate is to use one of the numerous free or commercially available in silico metabolism prediction tools. In addition to being able to predict whether a compound is a CYP2D6 substrate, certain in silico tools can also predict the CYP2D6 site of metabolism (SOM) of molecules. SOM data are very useful as they enable chemical toxicologists to assess if a test molecule presents a risk for xenobiotic bioactivation, which may lead to the formation of toxic metabolites. In addition, since SOM data identifies the toxicophore regions of molecules, it is often possible for medicinal chemists to make structural modifications to prevent bioactivation.
The first goal of this investigation was to perform a comparative evaluation of the predictions provided by four commonly used in silico programs (MetaSite, StarDrop, SMARTCyp and RC-WebPredictor) for the SOM of CYP2D6-mediated metabolism for 141 compounds (121 known CYP2D6 substrates and 10 control compounds derived from literature sources, as well as 10 CYP2D6 substrates derived from the Genentech small molecule library). Predictions from each of the in silico programs are based on data derived from studies involving human CYP2D6 since rare, but important, species differences have been reported; for instance quinidine inhibits CYP2D6 in human, dog and monkey, but not in rat or mouse. In contrast, the stereoisomer of quinidine, quinine, has an inhibitory effect towards CYP2D6 in rat, dog and monkey, but not in human (Bogaards et al., 2000).
The second goal was to evaluate if a bioactivation prediction model, based on an indicator of chemical reactivity (E LUMO -E HOMO ), could predict if the biotransformation products of five known CYP2D6 substrates are bioactivated metabolites. There is a great need for such a model in drug metabolism and discovery toxicology since unforeseen toxicities of CYP2D6 substrates may in part be due to bioactivation mechanisms.
The third goal was to investigate whether molecular docking, using the crystal structure of human CYP2D6, had the potential to compliment or improve the results obtained from the four SOM programs. Molecular docking simulations offer a unique and rapid approach to the prediction of SOM of CYP2D6 substrates.

Test compounds
One hundred and twenty-one structurally diverse CYP2D6 substrates were randomly selected from drug metabolism, toxicology and drug safety literature, and from the ChEMBL, SciFinder and Pharmapendium databases ( Table 3). The substrates can be broadly assigned to five main groups based on pharmacological activity: antidepressants, antipsychotics, analgesics, antiarrhythmic agents and pesticides. Emphasis was placed on compounds that are known specific substrates for CYP2D6. In the case of CYP2D6 playing a multi-role in a metabolism pathway of a test compounds, its role in the direct-metabolism of the parent, rather than further metabolism of a metabolite, was considered in the assignment of the major metabolite. For instance, in the case of tamoxifen, CYP2D6 can directly metabolize the parent molecule (to 4-hydroxytamoxifen) and metabolize the CYP3A4-mediated metabolite, N-desmethyltamoxifen (to endoxifen) (Borges et al., 2006). In such a case 4-hydroxytamoxifen is considered the major CYP2D6-mediated metabolite of tamoxifen since the biotransformation involves direct metabolism of the parent rather than a metabolite. It is possible that the test compounds were part of the training set of the in silico programs but we were not aware of it due to the nondisclosure policy of the software developers.
Ten additional proprietary compounds (GNE1-10), derived from the Genentech small molecule library were included in this investigation for three reasons: (1) they were structurally and functionally diverse compounds, (2) they were not found in any of the training sets and (3) they have experimentally verified SOMs. Seven of the 10 compounds were CYP2D6 substrates, while the remaining three (which are CYP3A4 substrates) were used as negative control compounds. All compounds were not CYP2D6 inhibitors, further ensuring they do not bind within the CYP2D6 active site. While the structures of the molecules are not disclosed, molecular weights (MW) and cLogP values (calculated using ChemDraw vers. 12, Waltham, MA) are provided ( Table 3). The major CYP2D6-mediated metabolites of the proprietary molecules involved either oxidation or N-dealkylation of the parent molecule.

Control compounds
Ten compounds derived from literature sources were used as controls to evaluate if the CYP2D6 in silico programs correctly selected the CYP2D6-derived metabolite as compared to the metabolite formed by another CYP isoform (Supplemental Table 2). Only 10 compounds were used in the control group due to the lack of literature data which explicitly described the quantitative and qualitative comparative metabolism between CYP2D6 and another CYP. Since each of the four programs predict CYP2D6-derived SOMs, even if experimental data confirms that a compound is not a CYP2D6 substrate (at least within the limits of detection of existing analytical methods), it was necessary to choose control compounds which were metabolized by CYP2D6 and another CYP isoform. This resulted in the formation of two distinct metabolites. To pressure test the programs, priority was given to metabolic pathways where the CYP2D6-derived metabolite was relatively minor compared to the metabolite formed by another CYP isoform. This approach increased the likelihood that the SOM for the alternative CYP would be incorrectly predicted as the SOM for CYP2D6. Care was also taken to ensure that no reports describing a role of CYP2D6 in the formation of the major metabolite were found in the literature, and that the control compounds were not contained in the test set.

Scoring of predictions for CYP2D6 substrates and control compounds
Only the top three SOM predictions from each program were considered when determining the correct predictions for each molecule. A score of 3 (S3) was awarded when an in silico program correctly predicted the SOM that would give rise to the major CYP2D6-mediated metabolite as its number one prediction; a score of 2 (S2) signified that an in silico program predicted the SOM that would give rise to the major CYP2D6-mediated metabolite as its number two prediction, and a score of 1 (S1) meant that the major CYP2D6-mediated metabolite was predicted as the third most likely metabolite to be formed. A score of 0 (S0) corresponded to a failing of an in silico program to correctly predict the SOM for the major CYP2D6-mediated metabolite as one of its top three predictions.
For the control compounds, a binary score (0 or 1) was used. If a program predicted the SOM for the CYP2D6derived metabolite at a higher score than the SOM for the alternative, more dominant CYP, then it received a score of 1 for that substrate. Otherwise, the algorithm was assigned a value of 0.
Three negative control compounds from the Genentech internal library were also used. These compounds were incubated with CYP2D6 (as described in the in vitro assay procedure below). No metabolites were formed for any of these compounds following incubation with recombinant human CYP2D6. Subsequently, in a follow-up investigation, it was determined that these compounds were all CYP3A4 substrates. A binary score (0 or 1) was used to score these compounds. A score of 0 was awarded if a SOM was predicted for CYP2D6 (since it should not have been), and a score of 1 signified that a SOM for CYP2D6 was not predicted.

Statistics
A comparison of the in silico programs for the 121 test structures was performed using Friedman's Test (Hollander & Wolfe, 1973), a non-parametric statistical test used to determine differences in outcome across multiple test attempts. Wilcoxon Signed Rank tests (Hollander & Wolfe, Uses protein structural information, GRID-derived MIFs of protein and ligand, as well as molecular orbital calculations to estimate the likelihood of a metabolic reaction at a certain atom position (Cruciani et al., 2005).

StarDrop Optibrium
Combines quantum chemical analysis and a ligand-based model of CYP substrates to highlight potential SOMs. Takes into account calculated log P values (Optibrium, 2014).
Yes RS-WebPredictor RECCR a Utilizes a set of 148 topological and 392 quantum chemical atom-specific descriptors in combination with a SVM-like ranking and a multiple instance learning method to identify potential SOMs (Zaretzki et al., 2013).

SMARTCyp
University of Copenhagen Utilizes a set of pre-calculated DFT activation energies in combination with topological accessibility descriptors for prognosis of potential SOMs (Liu et al., 2012).  Analgesic N-acetyl-p-benzoquinone imine (NABQI) (Chen et al., 1998;Dong et al., 2000;Zhou et al., 1997)  Hydroxylation, C4-(aromatic ring) (Bach et al., 1999(Bach et al., , 2000 3 3 1 2 + Aprindine Antiarrhythmic 5-Hydroxyaprindine (major) (Ebner & Eichelbaum, 1993;Murphy, 1974;Shimizu et al., 1998)    1973) were used to perform the six post-hoc pairwise comparisons to identify any pairwise differences between groups and a Bonferroni multiple comparison adjustment (Shaffer, 1995) was used to control the overall alpha level at 0.05. Non-parametric bootstrap confidence intervals were also calculated for the proportion of compounds predicted at a specific score as well as the proportion of compounds deemed ''correct'' (as described in the ''Statistical evaluation of the in silico SOM programs'' section). A comparison of the in silico programs for the 10 control structures was also performed using Cochran's Q test which tests for differences between several treatments when the outcome is binary. All the methods were performed using the statistical software R (R, 2014). were pre-incubated for 5 min at 37 C. Reactions were initiated by the addition of compound (1 mM final concentration). The incubation volume was 400 mL and the final solvent concentration was 0.1% DMSO. At eight time-points (0, 2, 5, 10, 15, 30, 45 and 60 min), 40 mL of incubation solution was removed and quenched in 80 mL of 0.1% formic acid in acetonitrile, containing an internal standard Samples were centrifuged (2000 Â g for 10 min) and supernatants (60 mL) were removed then diluted with 0.1% formic acid in water (120 mL). Samples were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) for the disappearance of parent compound (M) and formation of M + 16 (oxidative) and M À 14 (demethylated) metabolites.

In vitro CYP2D6 metabolite identification (SOM determination)
Of the 24 internal compounds included in the reaction phenotyping study, seven internal compounds were identified as CYP2D6 substrates. Metabolite identification studies were performed in rCYP2D6 for these compounds to identify SOM mediated by CYP2D6. rCYP2D6 (85 pmol/mL) and NADPH (1 mM) in KPi (100 mM; pH 7.4) were pre-incubated for 5 min at 37 C. Reactions (1 mL) were initiated by addition of internal compound (10 mM final concentration). Two incubations for each compound were performed; each terminated by the addition of 1 mL ACN at either 0 or 60 min. Samples were centrifuged (2000 Â g for 10 min), supernatants (1.5 mL) were removed, blown to dryness with nitrogen gas, and reconstituted in 300 mL 20:80 ACN:H 2 O. Samples were analyzed by LC-UV-MS/MS.

Chromatographic and mass spectromic methods
LC-UV-MS/MS analysis was conducted using an AB Sciex Triple Quad 5500 coupled to an Ultra High Pressure Liquid Chromatography (UHPLC) pump equipped with a 1290 Infinity UV diode array detector from Agilent Technologies (Santa Clara, CA) and a CTC PAL autosampler from LEAP Technologies (Carrboro, NC). Positive ion mode Q1 MS with mass range of 300-500 Da was used to detect ions of the parent compound and oxidative metabolites, primarily +16 and À14 amu metabolites. The UV detector was operated in spectral mode, with a range of 250-280 nm. The HPLC column was a Thermo Hypersil Gold C18 (100 Â 2.1 mm, 1.9 mM, Thermo Fisher Scientific) with mobile phases consisting of solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in acetonitrile). The flow rate was 0.55 mL/min and the injection volume was 20 mL. The gradient started with 1% solvent B for 0.4 min, ramped up to 40% solvent B in 2.3 min, ramped up to 95% solvent B in 0.67 min and held for 0.5 min, and then stepped down to initial conditions of 1% solvent B and held for 1 min to equilibrate the column prior to the next injection.

Molecular docking
Four human CYP2D6 crystal structures obtained from the RCSB website (PBD: 2F9Q, 3QM4, 3TDA and 3TGB) were co-aligned along their heme moieties using the structural alignment function, MOE-Align, of MOE, version. 2013.08 according to the software developer's instructions. Root-mean square deviations (RMSD) relative to 2F9Q were all51 Å indicating high structural homology between each crystal structure. Each of the 121 test compounds, along with the 10 compounds from the Genentech internal small molecule library, were docked to the resulting co-aligned CYP2D6 model to determine if molecular docking could compliment and improve the predictivity of the in silico programs. Each ligand was positioned into the active site of CYP2D6 using PyMOL Molecular Graphics System, version 1.6 (Schrödinger, LLC, New York, NY). Force field parameters and electrostatics for each resulting ligand-receptor complex were obtained from the CHARMM 27 biomolecular force field (MacKerell et al., 2000). Severe interatomic clashes were alleviated in NAMD (Humphrey et al., 1996) via 100 steps of molecular mechanics optimization. Further structural relaxation was effected in NAMD via 1000 steps of molecular dynamics simulation at 298 K via a constant temperature and pressure ensemble. Electrostatic energies were calculated using Gaussian 09 (Gaussian, Inc., Wallingford, CT). The most favorable docking results were further optimized by molecular dynamics (MD) simulations using Gromacs, version 5.0 using the Gromacs96 (43a2) force fields. Force field parameters and charges for the 41 molecules were generated by PRODRG (Schuttelkopf & van Aalten, 2004). Each simulation was subjected to a steepest descent energy minimization until a tolerance of 100 kJ/mol. The solvent molecules were equilibrated with fixed protein at 310 K for several picoseconds. The models were relaxed gradually and heated up to 310 K. Finally, 600 ps MD simulations were performed under normal temperature and normal pressure with coupling time constant of 0.1 ps and 1.0 ps. The isothermal compressibility was set to 4.5 Â 10 À5 /bar for water simulations. Each MD simulation was performed with a time step of 2 fs, and coordinates were saved every 1 ps. To assess the key amino acids involved in the interaction of CYP2D6 substrates, ligand interaction diagrams for the most favorable docking poses were constructed for each ligand using MOE, version 2013.08.

Scoring of molecular docking simulations
Molecular docking simulations for each test molecule were scored as follows: a ''+'' was earned when the molecular docking simulation correctly predicted the major CYP2D6derived SOM, as supported by experimental data from literature sources, and a ''À'' was earned if any site, other than the major SOM, was predicted. If a positive free energy of binding for a ligand (i.e. + kcal/mol) was generated by the molecular docking simulations then docking of that ligand to CYP2D6 was considered highly unlikely to occur (or negative).

Prediction of bioactivation
A modified procedure, based on a method to predict bioactivation described elsewhere, was used (Ford, 2013). Briefly, two parameters were calculated for five test compounds, and their major CYP2D6-mediated metabolites: (i) electrostatic potential (ESP), for the identification of metabolic ''hot spots'' and (ii) E LUMO ÀE HOMO , also known as the band-gap, as a measure of chemical reactivity. A decrease in the band-gap of the metabolite compared to its parent is suggestive of bioactivation. Geometries of the compounds were fully optimized using density functional theory (DFT) with Becke's three-parameter hybrid exchange function and the Lee-Yang-Parr correlation function (B3LYP) in combination with the 6-31+G(d) basis set using Gaussian '09. Energies of the lowest unoccupied molecular orbital (E LUMO ) and highest occupied molecular orbitals (E HOMO ) were subsequently calculated using these settings. Electrostatic potential maps were constructed using Spartan '10. The bandgap values and ESP maps for the metabolites were compared to those of parent molecules. A weight-of-evidence approach was used to predict if the metabolites presented a risk of bioactivation relative to the parent molecules. Scores were assigned as follows: if the band-gap value was lower and the ESP map showed the appearance of electrophilic or nucleophilic hot-spots then a high risk of bioactivation was predicted. If only one of the aforementioned situations were present then a medium risk of bioactivation was assigned; and if the band-gap for the metabolite was higher than the value for the parent and the ESP map did not show the appearance of new metabolic hotspots when a prediction of low risk of bioactivation was assigned. Five molecules known to be bioactivated by CYP2D6 with important toxicological significance were chosen to evaluate the ability of the model to correctly predict bioactivation: (i) MPTP (and its metabolite MPP+), (ii) acetaminophen (and its metabolite N-acetyl-pbenzoquinoneimine, NABQI), (iii) chlorpyrifos (and its metabolite chlorpyrifos-oxon), (iv) trazodone (and its quinone imine metabolite, 2-(3-(4-(3-chloro-4-oxocyclohexa-2,5-dien-1-yl)piperazin-1-yl)propyl)-[1,2,4]-triazolo[4,3-a]pyridin-3(2H)-one) and (v) clozapine (and its nitrenium ion metabolite). The structures of the five parent compounds, and their metabolites of interest, are shown in Figure 1.

SOM predictions
In general, the in silico programs can vary widely in their SOM predictions for CYP2D6 substrates, as illustrated by the example of chlorpromazine (Figure 2). MetaSite predicted five CYP2D6 SOMs for chlorpromazine, with the most favored SOM being the carbon attached to the nitrogen of the thiomorpholine ring (C3). StarDrop also predicted five SOMs, but predicted the sulfur in the thioether group of the thiomorpholine ring (S12) to be the most likely SOM for CYP2D6 with 50% probability. SMARTCyp predicted three SOMs, with the 2 N-methyl groups (C5 and C6) being equally likely to be the top SOMs for CYP2D6. Similar to SMARTCyp, RS-WebPredictor also predicted three SOMs, Figure 1. CYP2D6-mediated bioactivation of (i) acetaminophen, (ii) chlorpyrifos, (iii) clozapine, (iv) MPTP and (v) trazodone. but predicted identically to StarDrop for the top SOM (i.e. S12). The high degree of variability amongst the in silico programs suggest that using one single program in isolation might be misleading in predicting the correct SOMs, and it underscores the importance of a comparative analysis of the programs, such as was performed in this investigation.
Considering point-estimates only, MetaSite predicted the correct CYP2D6 SOM as its top prediction more frequently than the other programs (52.4%), followed closely by StarDrop (48.4%), SMARTCyp (43.5%) and finally, RS-WebPredictor (22.6%). When the top first and second predictions were combined, StarDrop performed best (79%), followed jointly by MetaSite and SMARTCyp (75.8%) and finally RS-WebPredictor (58.9%). When the top three prediction sites were considered, StarDrop also had the highest prediction success (96%), followed by MetaSite (91.9%), SMARTCyp (90.3%) and RS-WebPredictor (84.7%; Figure 3 and Table 4). An analysis of the confidence intervals of the prediction success for these programs is described in the ''Statistical evaluation of the in silico SOM programs'' section.
Statistical evaluation of the in silico SOM programs RS-WebPredictor underperformed in predicting the SOM leading to a major CYP2D6-mediated metabolite as compared to the other programs (Figure 3 and Supplemental Table 3). While it appeared that RS-WebPredictor outperformed the other programs based on scores S1 and S2, the 95% nonparametric confidence intervals overlap indicating that the differences were not statistically significant (Supplemental Figure 2A-D). RS-WebPredictor also has the highest number of structures with score S0 (i.e. 19). Friedman's test was performed to ascertain if the results from each program were statistically different from each other over all the scores combined. The test yielded a 2 value of 43.8 on 3 degrees of freedom (p value: 1.634 Â 10 À9 ) indicating that the four methods are in fact statistically different from each other when accounting for all scores. To identify the specific differences between methods, a post-hoc analysis was performed wherein every pair of algorithms was evaluated. A Bonferroni adjustment was then performed to account for the multiple comparisons (Supplemental Table 4). The following algorithms were found to be statistically different from each other when accounting for score: (a) RS-WebPredictor versus MetaSite (p value: 1.73EÀ05), (b) RS-WebPredictor versus SMARTCyp (p value: 9.17EÀ05) and (c) RS-WebPredictor versus StarDrop (p value: 1.83EÀ06). In addition to comparing the methods over all the scores, the methods on a percentage ''correct'' basis for each score were compared. In all, three definitions of ''correct'' were considered: (A) the algorithm predicted the major metabolite as its first prediction (score ¼ 3), (B) the algorithm predicted the major metabolite as its first or second prediction (score ¼ 2 or 3) and (C) the algorithm predicted the major metabolite as its first, second or third prediction (score ¼ 1, 2 or 3). The percentages correct, along with 95% confidence intervals, are shown in Table 4 and Supplemental Figure 3A-C, respectively. Due to RS-WebPredictor's comparatively poorer performance when attaining the highest score, the disparity between it and the other three methods is highest when score ¼ 3 and decreases as more scores are included in the ''correct'' measure.

Control compounds
Each of the in silico programs performed well and with a high degree of specificity at predicting the SOM for CYP2D6 compared to the SOM of the dominant CYP isoform (Supplemental Table 2). MetaSite and StarDrop correctly predicted the CYP2D6 SOM for 9 out of 10 of the control compounds, with the outlier being N-desmethyltamoxifen for both programs. Both of the programs incorrectly predicted the CYP3A4/5 SOM that leads to the formation of N,N-didesmethyl-tamoxifen, as the SOM for CYP2D6. SMARTCyp correctly predicted 7 out of 10 of the control compounds, with the incorrect predictions being MDMA (the CYP3A4-derived N-desmethylation metabolite was predicted), clomipramine (the CYP2C19-derived desmethylclomipramine was predicted) and propafenone (the CYP1A2-derived N-desalkylpropafenone was predicted). RS-WebPredictor correctly predicted 6 out of 10 of the control compounds, with clomipramine (the CYP2C19derived desmethylclomipramine was predicted), E/Z-doxepin (the CYP3A4/5-derived E/Z-desmethyldoxepsin was predicted), N-desmethyltamoxifen (the CYP3A4/5-derived N,Ndidesmethyl-tamoxifen, was predicted) and nevirapine (the CYP3A4/5-derived 2-hydroxynevirapine was predicted) being incorrectly predicted. Cochran's Q test (Cochran, 1950) was used to perform an omnibus test of difference between the four programs for the control compounds. Cochran's Q test resulted in a chi-square statistic of 4.2632 on 3 degrees of freedom corresponding to a p value of 0.2344. A statistically Table 4. Percentage of compounds correctly predicted for (i) score 3 (S3) alone, (ii) combined scores 3 and 2 (S3 + S2) and (iii) combined scores 3, 2 and 1 (S3 + S2 + S1). significant difference between the program rankings was not detected when identifying the CYP2D6-derived metabolite, signifying that each program performed well at correctly identifying the CYP2D6 SOM, even for compounds where CYP2D6 plays a minor role in the metabolic pathway.

Bioactivation
Results from the predictions of bioactivation were all in full agreement with literature sources (Figure 4 and Table 5).
(1) Acetaminophen: the E LUMO À E HOMO values of acetaminophen versus NABQI (5.19 and 3.61 eV) support that NABQI is a bioactivation metabolite of acetaminophen. ESP maps (Figure 4) show the presence of numerous electrophilic sites in NABQI (as indicated by the blue regions; V max is 119.945 kJ/mol, versus 8.1 kJ/mol for acetaminophen), which are prone to nucleophilic attack, indicating that the quinoneimine is more reactive than acetaminophen.
(2) Chlorpyrifos: the E LUMO À E HOMO values of chlorpyrifos versus its oxon metabolite (11.35 and 5.27 eV) support that the oxon is a bioactivation metabolite of chlorpyrifos. ESP maps (Figure 4) show the presence of a reactive site over the oxygen atom (as indicated by the electrophilic red region; V max is À187.6 kJ/ mol versus À103.8 kJ/mol for chlorpyrifos). Desulfuration of the relatively soft electrophilic chlorpyrifos parent to the highly reactive hard electrophilic oxon metabolite indicates that the oxon is more reactive than the parent molecule.
(3) MPTP: the E LUMO À E HOMO values of MPTP versus MPP + (5.04 and 3.98 eV) supporting that MPP + is a bioactivation metabolite of MPTP. ESP maps (Figure 4) show the presence of a reactive site over the phenyl ring (as indicated by the change from a blue region to a red region; V max is 231.6 kJ/mol versus À84.4 kJ/ mol for MPTP) indicating that metabolite is more reactive than the parent molecule. (4) Trazodone: the E LUMO À E HOMO values of trazodone versus its quinone imine metabolite (3.87 and 3.4 eV) support that the quinone imine is a  bioactivation metabolite. ESP maps (Figure 4) clearly show the presence of a reactive site over the oxygen atom (as indicated by the red region; V max is À176.1 kJ/mol versus À49 kJ/mol for chlorpyrifos) indicating that the metabolite is more reactive than the parent molecule. (5) Clozapine: the E LUMO À E HOMO values of clozapine versus its nitrenium ion metabolite (4.05 and 1.95 eV) demonstrate that the nitrenium ion is a bioactivation metabolite. ESP maps (Figure 4) clearly show the presence of a reactive site over the diazepine core (as indicated by the change from a red region to a blue region; V max is 414 kJ/mol versus À103.2 kJ/mol for clozapine) indicating that metabolite is more reactive than the parent molecule.

Molecular docking
Molecular docking results for six representative molecules, chosen to represent a broad chemical space (codeine, desipramine, dextromethorphan, MDMA, thioridazine and tramadol) are shown in Figure 5. In each case, the molecular docking simulations correctly predicted the SOM for these compounds. These results are of particular interest since the four SOM programs incorrectly predicted the CYP2D6derived SOM for five of these molecules, with the exception being dextromethorphan. Dextromethorphan was included since it is a widely used probe drug for human CYP2D6 activity both in vitro and in vivo . For codeine, the O-methyl group is facing the iron of the heme of CYP2D6, which subsequently demethylates this group and gives rise to morphine, in agreement with experimental data ( Figure 5A) (Williams et al., 2002). Carbon-2 of the saturated dibenzazepine group of desipramine correctly faces the heme of CYP2D6, which gives rise to 2-hydroxydesipramine (Rendic, 2002) ( Figure 5B). Results from the molecular docking simulations are described in Table 3. The O-methyl group of dextromethorphan is predicted to face the iron of the heme, which would give rise to the desmethyl metabolite ( Figure 5C). This prediction has been confirmed experimentally (Kerry et al., 1994). MDMA is metabolized to 3,4dihydroxymethamphetamine by CYP2D6 via O-demethylenation of the dioxolane group (Fukuto et al., 1991). The molecular docking simulation correctly predicted the CYP2D6-mediated SOM ( Figure 5D). Thioridazine is metabolized to mesoridazine via sulfoxidation (Wojcikowski et al., 2006). As shown in Figure 5(E), the S-methyl group of thioridazine faces the iron of the heme, which suggests that sulfoxidation is the most likely CYP2D6-mediated metabolic reaction to be observed, in agreement with the literature. The major CYP2D6-derived metabolite of tramadol is formed via O-demethylation (Olah et al., 2011;Paar et al., 1997). The molecular docking simulation correctly places the O-methyl group in close proximity to the iron of the heme, suggesting the formation of a dimethyl metabolite as the major CYP2D6derived metabolite ( Figure 5F). As illustrated in the ligand interaction diagrams (Supplemental Figure 4), four key amino acid residues were identified as being important in the interaction of substrates to the active site of CYP2D6: Phe120, Glu216, Asp301 and Ala305. Phe120 engages in p-p stacking interactions with the substrates (such as with the morphinan group of dextromethorphan), and has been shown to play a key role in the regioselectivity of the enzyme (Flanagan et al., 2004). The p-p stacking interactions distances ranged from 2.29 to 3.88 Å (Supplemental Table 5). Glu216, Asp301 and Ala305 form one or more hydrogen bonds with the substrates, especially the backbone oxygen atoms of Glu216 and Asp301, in agreement with previous investigations (Olah et al., 2011;Paine et al., 2003); this suggests that a correct docking pose was obtained for each ligand. Lengths and angles for the hydrogen bonds formed by the drug molecules and Glu216, Asp301 and Ala305 and interaction energies (kcal/mol) for each of the six ligands are described in Supplemental Tables 5  and 6, respectively.

Discussion
A thorough evaluation of the pharmacokinetics (PK) of a drug candidate is of critical importance in drug development in order to ensure favorable absorption, distribution, metabolism and excretion (ADME) properties. One of the major influencing factors in PK is the role played by CYP P450 family of enzymes. CYPs are responsible for the majority of the reactive metabolites, drug-drug interactions and bioactivation-related toxicities encountered in the clinic, as well as numerous idiosyncratic responses. Furthermore CYP 450s play a major role in drug half-life and clearance. Identification of CYP-derived metabolites is important as it allows for structure activity relationship (SAR) efforts to improve the metabolic profiles by modifying metabolic susceptibility. Consequently, being able to predict the sites of metabolism (SOM) in a molecule is of great interest to medicinal chemists, drug metabolism scientists and chemical toxicologists.
CYP2D6 is an important xenobiotic-metabolizing enzyme, responsible for the metabolism of approximately 25% of known drugs. CYP2D6 has been described to play important roles in xenobiotic bioactivation, drug-associated suicidality, QTc prolongation, biosynthesis of several neurotransmitters and neurosteroids, and immunomodulation. Despite its importance, the unique functions of CYP2D6 in drug metabolism and toxicology are often over-looked. Due to the liver's central role in drug metabolism, most investigators typically identify and quantify drug metabolites using immortalized hepatic cell lines, microsomes and primary hepatocyte cultures (Dambach et al., 2005). However, there are several shortcomings with these approaches: (1) CYP2D6 is present at a very low level in hepatic cells ( 2%) so its distinct contributions to drug metabolism, including formation of unique metabolites, are generally over-shadowed by those of CYP3A4/5 (40%), CYP2C (25%), CYP1A2 (18%) or CYP2E1 (9%) (Shimada et al., 1994), (2) hepatocytes are typically supplied frozen at À80 C, and it is known that the activity of CYP2D6 is significantly reduced following periods of storage at such low temperatures, which is different from other CYP isoforms such as CYP3A4 (Tyndale et al., 1999), (3) testing in hepatocytes does not provide information on brain-specific metabolites, where CYP2D6 is present in significantly amounts, (4) unlike other CYPs, CYP2D6 is not readily induced in culture so once its level of activity decreases its levels cannot be easily increased, (5) CYP2D6 activity decreases over time in cell culture, so the ability to perform repeated dosing may be limited and (6) since CYP2D6 is polymorphic its activity can vary widely in different cell cultures depending on the CYP2D6 polymorphisms of the tissue donors.
Several approaches have been proposed in recent years to help overcome the limitations of cell culture in predicting CYP2D6-derived metabolites, including the use of transgenic C57BL/6 mice expressing human CYP2D6 (Cheng et al., 2013) and CYP2D6 knockout mice (Scheer et al., 2012). An additional approach to help predict CYP2D6-derived metabolites is the use of in silico methodologies in the early stages of drug discovery, which can be conveniently used to shorten the discovery cycle and enhance late-stage success. In silico tools offer many advantages to drug development teams: (1) purified enzymes and test materials are not required, (2) rapid delivery of results, (3) ease of use, (4) cost-effectiveness, (5) they allow for a reduction in the use of test animals, (6) provide significant reduction in experimental error and (7) they can be used to predict metabolism of insoluble compounds, reactive intermediates, transition states and CYP2D6 time-dependent inhibitors. The optimization of compounds in early development using in silico tools typically shortens the discovery cycle and enhances the latestage drug survival rate.
Site of metabolism (SOM) prediction tools are used throughout the drug discovery process by medicinal chemists, chemical toxicologists and drug metabolism scientists. Four commonly used in silico programs for predicting the SOM of CYP2D6 substrates were evaluated and compared in this study. Each program predicts SOM using a unique computational algorithm: MetaSite (combines structure similarity indices with ligand reactivity), StarDrop (combines quantum chemical analysis and a ligand-based model of CYP substrates), SMARTCyp (utilizes a set of pre-calculated DFT activation energies in combination with topological accessibility descriptors) and RS-WebPredictor (utilizes a set of 148 topological and 392 quantum chemical atom-specific descriptors). In the early stages of drug discovery, these programs can be used to predict whether metabolism occurs on the core structure or on functional groups, which can potentially be substituted without altering the pharmacological activity. In later stages of drug discovery, SOM tools can be used to predict whether metabolism remains the same or if it shifts to a different region of a molecule due to the effect of a bioisoster, an enantiomer or a protective group.
Several reports describing the importance of SOM prediction tools in the drug development space have been outlined in the literature. MetaSite-derived SOM predictions have been widely employed in the pharmaceutical industry for various purposes, like predicting metabolic stability of a series of COX-2 inhibitors at AstraZeneca (Ahlstrom et al., 2007), predicting metabolic soft-spots for structural analogues of indomethacin at Pfizer (Boyer et al., 2009) and for improving in vivo clearance of pyrazolopyridazine a-2-d-1 ligands at GlaxoSmithKline (Myatt et al., 2010). StarDrop SOM predictions to improve metabolic stability we have been employed by researchers at Anacor Pharmaceuticals to predict the SOM for a series of boron-containing PDE4 inhibitors (Bu et al., 2012) and by researchers at OSI Pharmaceuticals to predict the SOM for a series of 7-aminofuro[2,3-c]pyridine inhibitors of TAK1 (Hornberger et al., 2013). Although fewer reports on the use of SMARTCyp and RS-WebPredictor in drug development were found in the literature, these programs have also been used to verify experimental data, including the SOM of acetaminophen by SMARTCyp (Kirchmair et al., 2014) and the SOM of cinnarizine by RS-WebPredictor (Zaretzki et al., 2013).
Overall, the in silico programs evaluated here proved to be reliable in the prediction of the CYP2D6 SOM when the top three predictions were taken into consideration (Scores of 3, 2 or 1). The order of correct predictivity, ranging from best to worst, was StarDrop (96%)4MetaSite (91.9%)4SMARTCyp (90.3%)4RS-WebPredictor (84.7%), which are in general agreement with a previous comparative study (Shin et al., 2011). This high level of predictivity for these in silico programs suggests that these programs can greatly simplify and accelerate the task of soft spot localization compared with manual data review. While RS-WebPredictor did not perform as well as the other programs, its level of performance was still high overall, and it has the advantage of being particularly easy to use (plug and play) and free of charge. Two main conclusions were drawn from the statistical analysis: (i) RS-WebPredictor performs less successfully in predicting the major metabolite as its first prediction and does not provide a significant improvement for the other scores compared to the other programs, and (ii) the remaining three programs are approximately comparable to each other based upon overlapping confidence intervals when the top three predictions are considered.
Bioactivation of drugs is a cause of concern since the resulting metabolites can be associated with undesirable adverse effects. The bioactivation model described in this investigation successfully predicted the increased reactivity potential for major metabolites of 5 structurally and functionally diverse CYP2D6 substrates: acetaminophen, MPTP, chlorpyrifos, clozapine and trazadone. An in silico-based paradigm to help predict bioactivation of compounds by CYP2D6 in the CNS is described in Figure 6. This paradigm consists of four parts: first physico-chemical properties associated with CNS-penetrant compounds should be calculated for the test compound of interest. If the compound is determined to have properties which will likely enable CNSpenetration, it could then be screened, by molecular docking, for favorable binding properties with CYP2D6. If the in silico evaluation raises a cause for concern, follow-up screening in a CYP2D6 SOM prediction program is warranted, and the resulting metabolites could then be assessed for bioactivation potential. In vitro methods, such as rCYP2D9 or brain microsomes can be used to confirm these predictions.
In conclusion, although further research and validation are required, in silico tools offer new, exciting and generally reliable tools to safety assessment scientists for the determination of serious toxicities associated with CYP2D6 metabolic processes.