Structure-based virtual screening and molecular dynamics simulations for detecting novel candidates for allosteric inhibition of EGFRT790M

Abstract Structure-based virtual screening (SBVS) was applied to predict lead compounds for the allosteric inhibition of epidermal growth factor receptor (EGFR) by screening the library of chemical compounds prepared from the e-molecules chemical database. The library of chemical compounds consisting of 133,083 ligands was composed by evaluating the chemical and physical properties of e-molecules chemicals. The prepared library was screened by CCDC Gold software in the allosteric binding site of EGFRT790M using the library and virtual screening default parameters to filter out, respectively. The GOLD fitness scores 75 and 80 were selected as threshold values for the library and virtual screening processes, respectively. After the docking study, molecular dynamics simulations (MDS) of the top 25 compounds were built for calculating binding free energies from their MDS trajectories. MM-GBSA binding free energies for the compounds were computed from 20 ns MDS, 50 ns MDS and 200 ns MDS trajectories to filter out the candidates. Following MM-GBSA/MM-PBSA binding free energy calculations, six compounds were detected as the most promising candidates for allosteric inhibition of EGFRT790M. The dynamic behaviors of final compounds inside EGFR T790M were searched using structure stability, binding modes and energy decomposition analysis. Besides, the estimated inhibitors were exposed to docking study and MM-GBSA/MM-PBSA binding free energy calculations inside wild-type EGFR, respectively, to be determined their selectivity towards mutant form. Five of the estimated inhibitors displayed estimated selectivity towards EGFRT790M. Besides the ADMET properties of the estimated inhibitors were predicted by PreAdmet tools. Communicated by Ramaswamy H. Sarma


Introduction
Epidermal growth factor receptor (EGFR) is a member of the receptor kinase (RK) family that exists widely in every part of the human body.EGFR consists of an extracellular binding site for EGFs or other ligands, a trans-membrane region, and an intracellular tyrosine kinase domain.As with all tyrosine kinases, kinase domain of EGFR consists of two lobes, the N-terminal and the C-terminal lobe, which share a catalytic domain.This domain has a catalytic cleft divided into the front and the back.The ATP-binding site (orthosteric site) and the non-ATP contact region (allosteric site) constitute the front and the back of the catalytic cleft, respectively.In addition, there are several structures in EGFR catalytic domain, namely, a-loop, aC-helix, P-loop and A-loop (Figure 1; Li & Guo, 2021).In the normal cell, EGFR plays a crucial role in sustaining downstream signaling pathway and regulating the mTOR-serine/threonine protein kinase pathway (Liu et al., 2017).Any malicious transformation in EGFR contributes to starting signalization for Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC), which can lead to the propagation of these cancers (Carcereny et al., 2015;Marmor et al., 2004).The NSCLC constitutes an 80-85% rate of lung cancers and has a 5-year survival rate of under 15% (Bray et al., 2018;Pallis et al., 2009).EGFRs are crucial therapeutic targets for NSCLC and contribute survival, proliferation, migration, differentiation and adhesion of cancer cells (Sharma et al., 2007).The gene amplification, protein overexpression, mutations, or in-frame deletions observed in EGFR, often alter the EGFR signaling in several human cancers (Roskoski, 2014).In addition, EGFR is one of the most frequently changed oncogenes in solid cancers.The mutation that activates kinase in EGFR and over-expressed EGFR protein are the most seen pathological alterations in EGFR-related cancers.Major mutations that activate kinase in EGFR confer in NSCLC and glioblastoma, but they could rarely occur in other types of cancers (Thomas & Weihua, 2019).The most frequent mutations in NSCLC contribute to increased signaling and tumor development and seem in the intracellular domain of EGFR (Bhatia et al., 2020;Sigismund et al., 2018).
Various mutations exist in the tyrosine kinase domain of EFGR (Sigismund et al., 2018).In-frame deletion in exon 19 (delE746-A750) and L858R substitution in exon 21 of EGFR activate the EGFR overexpression leading to uncontrolled proliferation of cells.Therefore, the activations related to these most common mutations, accounting for around 90% of all EGFR mutations, cause the formation of tumors in NSCLC patients (Kawahara et al., 2010;Ladanyi & Pao, 2008;Lynch et al., 2004).These point out the importance of inhibition of mutated EGFR.Related to this, EGFR tyrosine kinase inhibitors (TKI) targeted to the ATP binding site were reported effective against EGFR-mutant advanced NSCLC in the clinical therapy of cancer patients.Besides, measuring response rate (RR) and progression-free survival (PFS) displayed that EGFR TKIs are more effective than classical chemotherapy of EGFR-mutants related NSCLC in randomized clinical trials.(Ercan et al., 2015).
Gefitinib and erlotinib are well-known EGFR inhibitors that are developed to target EGFR mutants, and these inhibitors display efficiency in NSCLC treatment.On the other hand, the disease progression could develop in most patients following victorious therapy with an EGFR inhibitor (Figure 2).The secondary EGFRT790M mutation detected in 60% of these patients is the major mechanism of drug resistance (Yu et al., 2013).The affinity of ATP for the binding site increases owing to this secondary mutation; therefore, it becomes inconvenient binding of the competitive inhibitors to ATP binding site (Yun et al., 2008).
To overcome this drug resistance, second-generation EGFR inhibitors, namely afatinib and dacomitinib, were developed to inhibit EGFRT790M in mutant NSCLC (Figure 2).They display their inhibitory activity by binding irreversibly to the kinase domain of EGFRT790M (Roskoski, 2016).On the other hand, second-generation inhibitors have significant limitations in the treatment of EGFRT790M activated NSCLC patients.These inhibitors target the inactive form of the kinase domain due to recognizing solely inactive mutant EGFR, and they fail to inhibit the monomeric state of the kinase domain.The inactive site only recognizes the kinase dimer complex of the kinase, and the dimer state is comprised after the autophosphorylation of the TK domain.Therefore, these groups of drugs are partially appropriate for the inhibition of NSCLC (Maity et al., 2020).Besides, afatinib inhibits both mutant and wild-type EGFRs powerfully.Owing to the inhibition of wildtype EGFR (wt-EGFR), side effects, including skin rash and diarrhea, have been observed that restricts to increase the inhibitor doses of afatinib in the treatment of NSCLC patients associated with EGFRT790M (Ercan et al., 2015).
EGFR-T790M sourced drug resistance and toxicity issues make limitations using the first and second-generation EGFR inhibitors in the cure of NSCLC patients.Third-generation EGFR inhibitors have shined out for the therapy of NSCLC patients in clinics to overcome the specified limitations.These drugs have a covalently binding capacity.Osimertinib, a member of this class, has been approved by USFDA for EGFRT790M-related metastatic NSCLC treatment in patients (Patel et al., 2017).This inhibitor covalently interacts with Cys797 and diminishes toxicity (Figure 2).But the therapeutic benefits of this inhibitor have diminished due to Cys797 mutation to serine (C797S; Fassunke et al., 2018).This C797S mutation induces resistance to irreversible EGFR inhibitors.Drug resistance caused by EGFR mutations towards the current reversible and irreversible EGFR inhibitors, makes harder to regulate protein kinase (PK) activation for NSCLC therapy (Du & Lovely, 2018).Therefore, this situation forces the necessity for the determination of new drug targets to get over the issues sourced by EGFR mutations in clinics.
The binding of ligands or effectors to the allosteric site for various protein kinases assists in conformational changes at the active or substrate binding site, which contribute to regulating protein activity.As a result of this, the concept of targeting the allosteric site of EGFRs has started improving to avoid drug resistance to EGFR mutations.Supporting this concept, the allosteric drugs, which regulate protein activity are mostly non-covalent binders and affect with a low dose for allosteric changes.Thus, these drugs can cause low toxicity in patients (Ling et al., 2015).The fourth-generation compounds, targeting the allosteric site developed instead of the previous generations' lack of therapeutic benefits due to EGFR mutations, were reported to hinder the ATP activity and stop autophosphorylation by inhibiting EGFR T790M, L858R/T790M and T790M/C797S/V948R mutations (Figure 2).EAI001 and EAI045 are the important members of the fourthgeneration compounds.These compounds have been pointed out as mutant sensitive for EGFR by analyzing their sensitivity against 250 types of protein kinase (Jia et al., 2016, Zhao et al., 2018).On the other hand, these inhibitors display their inhibitory potency with a monoclonal antibody known as cetuximab.Cetuximab drops the tendency for asymmetrical dimerization of EGFR mutants (Zhang et al., 2006).Owing to asymmetrical dimerization was occurred by the binding of a monomer's C-lobe to another monomer's Nlobe, the inhibitor does not bind to the allosteric site in the active dimer.Cetuximab breaks EGFR dimer formation, following, EAI045 binds all allosteric sites and successfully inhibits EGFR.JBJ-04-125-02 identified as an allosteric EGFR inhibitor, has shown inhibitory efficiency for in vivo and in vitro models of mutant EGFR (Figure 2).Besides, it has better inhibitory potency against EGFRL858R/T790M compared with EAI045 (To et al., 2019).JBJ-04-125-02 displays selectivity for mutant EGFR by inhibiting the cell proliferation in the L858R/T790M EGFR Ba/F3 cell line and exhibits this potency remarkably more effective than in wild-type-EGFR (To et al., 2019).Although the promising progressions for developing mutant selective allosteric EGFR inhibitors, there are no approved drugs by any organizations of drug administration for the treatment of NSCLC patients.
Herein, the current study aims to find the lead compounds, which have allosteric inhibitory potency against EGFR T790M mutations by avoiding drug resistance and toxicity using computational methods theoretically For this purpose, over 6 million compounds were yielded from the e-molecules database, prior to initiating the calculation processes (https://search.emolecules.com/info/products-datadownloads.html).The obtained compounds were filtered by the physicochemical descriptors to prepare a chemical library.Then, the chemical library was exposed to computational methods such as virtual screening (VS), molecular dynamics simulations (MDS) and MM-GBSA/MM-PBSA calculations to detect the potential allosteric inhibitors.Followingly, the interactions formed between allosteric site residues and final compounds were determined by the binding mode and energy decomposition analysis from the MDS trajectories of promising inhibitors.Finally, ADMET properties of potential inhibitors were predicted.

Library preparation
The library of chemical compounds used for screening and docking studies, that obtained from e-molecules chemical database was generated evaluating the chemical and physical properties of chemicals belong to this database.The criteria for the chemical and physical properties are based on Lipinski's 'rule of five' (Ro5).On the other hand, this rule is not exactly fulfilled for the FDA-approved kinase inhibitors (Roskoski, 2021).According to Ro5, ideal oral effectiveness could decrease when the calculated Log P (cLogP) is greater than 5, the number of hydrogen-bond donors is greater than 5, the number of hydrogen-bond acceptors is greater than 10 (5 � 2), and the molecular weight is greater than 500 (5 � 100).On the contrary, it was reported that the molecular weight (MW) of the orally effective FDA-approved kinase inhibitors is ranging from 306 to 615 (Da), they are bearing hydrogenbond acceptors ranging from 4 to 15, and their cLogP values are ranging from 0.3 to 6.0 (Roskoski, 2021).Thus, the library was prepared that MW is ranging from 250 to 650 (Da), number of hydrogen-bond acceptors is ranging from 4 to 15 and cLogP values are ranging from 0.3 to 6.0 and number of hydrogen-bond donors is 5 or less.In addition, four chemical properties namely, polar surface area, number of heavy atoms, number of rotatable bonds and ring count, were also used to prepare the library of chemicals.For the FDA-approved kinase inhibitors, it was reported that polar surface area is ranging from 59.5 to 149, number of heavy atoms is ranging from 23 to 43, number of rotatable bonds is less than 12 and ring count is ranging from 3 to 6 (Roskoski, 2021).Therefore, the studied library was generated with chemicals that polar surface area is ranging from 50 to 160, number of heavy atoms is less than 45, number of rotatable bonds is less than 12 and ring count is less than 7 in addition to previously mentioned properties.The chemical and physical properties of database compounds were calculated by MOE2020.09 program (Molecular Operating Environment (MOE 2020.09)Chemical Computing Group Incorporation, 2021).

Library and virtual screening, and docking studies
Calculations of partial atomic charges, the parametrization and energy minimization for the studied compounds were executed with MOE.2020.09,using MMFF94x force field (Molecular Operating Environment (MOE 2020.09) Chemical Computing Group Incorporation, 2021;Halgren, 1996).The crystal structure of EGFRT790M (PDB ID: 6DUK resolved at 2.20 Å) was taken from the RCSB Protein Data Bank (http://www.rcsb.org/pdb).Chain A of EGFRT790M, ANP (phosphoaminophosphonic acid-adenylate ester) and magnesium were preserved, and other chains, heteroatoms and water molecules were deleted from pdb file.The partial atomic charges of ANP were calculated by the antechamber module with the AM1-BCC charge model (Case et al., 2021;Jakalian et al., 2000).The system was prepared using xleap module of AmberTools 20 with AMBER14SB force field for protein and GAFF2 for ANP (He et al., 2020;Maier et al., 2015).The prepared system was solvated in an octahedral box with TIP3P water molecules with 10 Å distance between the protein surface and the box boundary, respectively (Jorgensen et al., 1983).Then, the systems were neutralized with appropriate number of sodium counter ions and the energy minimizations were executed for the system using Sander.MPI module of Amber suite 20 (Case et al., 2021).Following the energy minimization, the generated sodium counter ions and water molecules (excluded the water molecules included in the binding site) were removed from the systems for library and virtual screening and docking studies.
GOLD 5.2.1 software was used to calculate fitness function scores of the ligands inside EGFRT790M with library and virtual screening, and default generic algorithm parameters, respectively (Jones et al., 1997).Search efficiency was set at 10, 30 and 100% for the library and virtual screening, and default generic algorithm parameters, respectively.GOLD attempts search efficiency to apply optimal settings and calculates an optimal number of operations for each ligand.The title parameters executed these utilizations with the mentioned ratios above.GoldScore fitness function is calculated according to the equations given below.
In Equation 1, Fitness represented GoldScore fitness function that consists of the external hydrogen bond energy (S(hb ext)), external van der Waals energy (S(vdw ext), internal hydrogen bond energy (S(hb int)) and internal energy (S(int)) scoring contributions.In Equation 2, S(vdw int) and S(tors) terms have symbolized the internal van der Waals energy and internal torsion energy scoring contributions, respectively.
The compounds were docked in an area within a radius of 15 Å centering the carboxylate oxygen (OD1) of Asp855 of EGFRT790M.GoldScore fitness function was used to generate hundred conformations per ligand (Jones et al., 1997).In order to determine the accuracy of the prepared protein model, EAI045 and JBJ-04-125-02 were docked inside the allosteric site of model using GOLD 5.2.1 and AutoDock Tools 1.5.6 (Morris et al., 2009).Docking methodology of AutoDock Tools 1.5.6 for reference compounds was given in supplementary materials.The obtained docking poses of these inhibitors were compared to the crystal structure of them by superposing the generated and crystal structure of the enzyme-ligand complexes.The RMSD values for docking poses were calculated by Dock RMSD Calculator plugin for MOE2020.09.Superpositions of the complexes and the calculated RMSD values are given in Figures S1 and S2.
The crystal structure of wt-EGFR has no allosteric pocket, so wild-type was generated from EGFRT790M (PDB ID: 6DUK resolved at 2.20 Å) to use in the docking study, MDS and binding free energy calculations (Li & Guo, 2021).Met790-Thr790 exchange was performed with MOE.2020.09program and the rest of the preparation process was executed same as the preparation of EGFRT790M (Molecular Operating Environment (MOE 2020.09)Chemical Computing Group Incorporation).

Molecular dynamics simulations
In this study, AMBER20 was used to build MDS of apo EGFRT790M and MDS of EGFRT790M-allosteric ligand complexes .The docking poses of compounds D1-D25 inside the allosteric site of EGFRT790M were used to prepare the initial ligand-protein systems.The partial atomic charges of title compounds were calculated by the antechamber module with the AM1-BCC charge model (Case et al., 2021;Jakalian et al., 2000).The apo EGFRT790M and EGFRT790Mligand complexes were prepared by the xleap module for energy minimizations and MDS (Case et al., 2021).AMBER ff14SB force field and General AMBER force field 2 (gaff2) were used to parameterize proteins and ligands in the complex systems, respectively (He et al., 2020;Maier et al., 2015).All systems were solvated with TIP3P water at 10 Å distance between box boundary and protein surface, in an octahedral box (Jorgensen et al., 1983).The systems were neutralized by appropriate number of sodium counter ions.All systems were exposed to energy minimizations using Sander.MPI, and then MDS of the systems was carried out using pmemd.cuda (Case et al., 2021).The initial systems were exposed to an energy minimization in two steps to avoid bad steric contacts.In the first step, the steepest descent algorithm and conjugate gradient methods were applied to the restrained initial structures at 1000 iterations.Then, these methods were applied to the unrestrained systems at 2500 iterations in the second step.The heating (0.1 ns), equilibration (1 ns) and production steps (20 ns) were implemented for the MDS of the systems.The production steps have been extended to at 50 ns for nine compounds, and then, at 200 ns for seven of these nine compounds.In the heating step of MDS, protein and ligands were restrained using 10 kcal/mol/Å restraint force during the temperature of systems from 0 to 300 K was increased, allowing water molecules and ions to move without constraint.Then, the temperature of the entire systems was equilibrated at 300 K using the Langevin dynamics with a collision frequency of 1.0 ps À 1 in a constant volume periodic boundary.The pressure of the systems was equilibrated in periodic boundary conditions with constant pressure at 1 bar using the isotropic position scaling method at 300 K.Then, positional constraints were gradually cleared out preserving the system temperature at 300 K and system pressure at 1 bar.The constrain band vibrations including hydrogen atoms in the equilibration and production steps were constrained with SHAKE algorithm (Ryckaert et al., 1977).The Particle Mesh Ewald (PME) method was performed for longrange electrostatic interactions (Essmann et al., 1995).MDS were recorded by 2 fs time step, and nonbonded interactions were shortened using a cutoff of 10 Å. Trajectories were visualized by Xmgrace program (Grace Development Team, http://plasma-gate.weizmann.ac.il/Grace/; accessed on 21 May 2016).Hydrogen bonding networks for the systems were determined by cpptraj module using default parameters (Roe & Cheatham III, 2013).The unrestrained MDS, which is the 200 ns trajectory, was divided into 4 segments of 50 ns length.Binding free energy calculations were carried out with MMPBSA.py.MPI module, using the Generalized-Born (GB) and Poisson Boltzmann (PB) models from 100 spaced snapshots of each segment of the unrestrained MDS (Miller III et al., 2012).The final MM-GBSA and MM-PBSA binding free energies are the mean of binding free energy calculated from all 4 segments (Singh et al., 2022).MM-GBSA and MM-PBSA calculations were executed to the unrestrained simulations with 0.1 M salt concentration.The energy decomposition analysis was carried out with MMPBSA.py.MPI module, using the Generalized-Born (GB) model from 100 spaced snapshots of the unrestrained MDS (Miller III et al., 2012).Average structures of complexes were generated by UCSF Chimera package (Pettersen et al., 2004).5.11.4., 2012).

ADMET calculations
Absorption, distribution, metabolism, excretion and toxicity (ADME-Tox) properties of final compounds were calculated using ADME and Toxicity tools of The PreADMET website (Lee et al.,2004).

Structure-based virtual screening
In the current study, 64136313 compounds were obtained from e-molecules chemical database and these compounds were filtered to generate a library of chemical compounds that are used for screening and docking studies by evaluating the chemical and physical properties of them according to the criteria mentioned at library preparation section.The generated library was evaluated with a virtual screening approach to determine the estimated allosteric inhibitors of EGFRT790M.The workflow of high-throughput virtual screening is shown in Figure 3. Firstly, the fitness scores were calculated inside the allosteric binding site of EGFRT790M using the library screening and virtual screening protocols of CCDC Gold to filter out low-scoring ligands (Jones et al., 1997).The threshold value was accepted to GoldScore fitness as 75 because of the GoldScore fitness value for EIA045, the allosteric inhibitor of EGFR, was calculated as 76.37 as the result of library screening (Table S1).The low-scoring compounds were eliminated according to accepted GoldScore threshold values.The filtered compounds were screened by CCDC Gold/virtual screening protocol and filtered out with a threshold fitness value of 80.After the mentioned process, the remained compounds were screened with CCDC Gold/default docking protocol (Jones et al., 1997).Then, the first-ranked docking poses of the top 25 compounds inside the allosteric site of EGFRT790M were chosen as initial structures for MM-GBSA calculations from MDS trajectories.Although the compounds were docked using GOLD 5.2.1, the top 25 compounds of GoldScore solutions were docked in the EGFRT790M by AutoDock Tools 1.5.6 in order to improve the docking solutions (Morris et al., 2009).Docking methodology of AutoDock Tools 1.5.6 for the title compounds was given in supplementary materials.Chemical structures, chemical and physical properties, and the first-ranked docking scores of the estimated inhibitors are given in Figure S3, Tables 1, 2 and S2, respectively.

Molecular dynamics simulations and MM-GBSA/MM-PBSA binding free energy calculations
MDS is an essential tool that allows the investigation of the dynamic behaviors of ligands inside the target structures.MM- GBSA and MM-PBSA calculations are employed to evaluate the binding affinity of hits to their targets using the generated MD trajectories.These are not true binding free energy due to the lack of entropy contribution and therefore are specified as relative free energies.If the systems are related, the solute entropy is counted the same for each system being compared.Hence, there is no need to calculate solute entropy in relative free binding calculations (Miller III et al., 2012).In this study, MD simulations were generated from the first-ranked conformations of filtered ligands inside the allosteric pocket of EGFRT790M.The conformations were obtained from the docking study of title compounds using GOLD 5.2.1 software.The preparation method for the systems is defined in the experimental section.After the equilibrium step, 20, 50 and 200 ns the unrestrained MDS were constituted for the protein-ligand complexes, and 200 ns the unrestrained MDS were built for apo form.
In the first place, the MM-GBSA binding free energy calculations were executed to 20 ns MDS trajectories of the ligand-EGFRT790M complexes of compounds D1-D25 to determine their estimated affinities to the target protein.Besides, these calculations were executed for the EFGRT790M inhibitors named EAI045 and JBJ-04-125-02, and the results are given in Table S1.The compounds that have lower estimated binding affinity than EAI045 were eliminated (Tables 2, S1).For the selected candidates, 50 ns MD simulations were generated and a threshold of MM-GBSA value as À 60 kcal/mol was accepted.Seven compounds were filtered and then exposed to 200 ns MDS.Final compounds were investigated for defining the dynamic behaviors inside the target protein by binding free energy calculations, binding mode and structural stability analysis.In addition, binding free energy calculations obtained from the MD simulation trajectories of these compounds inside the wild type of EGFR were used to evaluate their selectivity of them towards mutant EFGR.

Initial structures
Given the estimated MM-GBSA binding free energies of the screening compounds, compounds D7, D8, D13, D14, D16, D18 and D25 were detected as the most potent compounds for allosteric inhibition of EGFRT790M (Figure S3).When the first-ranked docking poses used in the MDS study of the filtered compounds inside mutant EGFR, were evaluated, it seemed that a hydrophobic pocket formed by Leu747, Met766, Leu777, Met790, Asp855, Phe856 and Leu861 locked in by all candidates like the allosteric inhibitor JBJ-04-125-02.Trident form of JBJ-04-125-02 consists of 5-fluoro-2-hydroxyphenyl, N-(thiazol-2-yl)ethylamide and 3-oxoisoindolin groups, has occupied this hydrophobic pocket.This trident form was mimicked by the filtered candidates, which have different chemical structures (Figures S3-S10).Besides, it was observed that the alkyl chains and/or ring systems with terminal groups on them of the candidates linked to trident form or trident mimic form have extended to gate of allosteric pocket of mutant protein like piperazinophenyl substituted oxoisoindole group of JBJ-04-125-02 (Figures S4-S10).

Structural stability analysis
The root-mean-square deviation (RMSD) belonging to the backbone atoms is usually known to indicate system stability.
Herein, the conformational stability of the ligand-protein and apo form systems has been described by calculating the RMSD values of the backbone atoms of EGFRT790M according to their initial positions.The average RMSD values of the ligand-protein system simulations have been plotted as smaller than 2.5 Å, excluding the compound D8-protein system, after the equilibration step, and also the average RMSD values have demonstrated the systems are stable.This has indicated the high stability of EGFRT790M systems.In addition, the RMSD value of the compound D8-protein system is around 2.5 Å during the whole simulation.And also, the RMSD values of complex systems have been determined as smaller than the apo forms.The RMSD plots of EGFRT790Mligand complex (mut-holo) and EGFRT790M apo form (mutapo) systems were given in Figure 4.
The RMSDs of ligand-protein and apo form systems for the wild type (wt) of EGFR besides EGFRT790M were evaluated from their MD simulations.Contrary to mut-holo systems, the RMSDs of wild type EGFR-ligand complex (wt-holo) system simulations were not consistent with each other.The RMSD values of wt-holo systems for compounds D7, D16 and D18 were observed bigger than the RMSD value of the apo form.Besides, the RMSD values of compound D14-wt EFGR complex system were seen slightly bigger than the RMSD value of the wt-EGFR (wt-apo) form.For the other compounds, the RMSD values of wt-holo systems were detected that are similar to the RMSD value of apo form or smaller than it.The RMSD values of compound D16-protein complex have approached the RMSD values of apo form after around 85 ns of MD simulations.The RMSD values of the apo form were observed around 2.5 Å during the entire MD simulation.The RMSD plots for the wt-holo and apo form systems were given in Figure 5.The average RMSD values of all systems were reported in Table 3.
Root-mean-square fluctuation (RMSF) plays an essential role in investigating the dynamics of molecular systems.The RMSFs of backbone atoms were computed from the free MD simulations (200 ns) to evaluate the conformational fluctuation of the proteins.As pictured in Figure 6, the average RMSF values of most regions were observed as smaller than 1.5 Å.This refers to the stability of the protein structures on the entire simulations.The C-terminus of EGFRT790M formed by the residues named Leu979-Asp1006 interacts with the remained structures weakly and is high exposure to solvent, so it seems very flexible in all systems.Besides the C-terminus, functional regions named the aC-helix, A-loop, P-loop and a-loop, relatively display flexibility in comparison to the rest of the protein.While evaluating the RMSF profile of mut-holo systems, it was observed that the protein skeleton fluctuation is remarkably low compared to the mut-apo, especially for  apo state of the mutant.The RMSF values of mut-holo and mut-apo complexes have illustrated that the stability of mutant protein was so impressed by the binding of inhibitors.
In addition to mut-holo and mut-apo systems, the RMSF values of wt-holo and wt-apo systems were calculated from their free MD simulations.The RMSF values of mut-apo and wt-apo systems are near values for functional regions excluding aC-helix region.In this region, it was observed that the RMSF values of mut-apo system are bigger than the RMSF values of wt-apo system (Figure 7).Regarding RMSF values in aC-helix region, it was seen when the mut-EGFR and wt-EGFR have been exposed to the inhibitors, that the RMSF values of backbone atoms were reduced in comparison to both apo states in aC-helix region in general.On the other hand, the RMSF values of wt-apo state are smaller than the RMSF values of wt-holo system for compound D18 for this region.The binding of inhibitors inside the allosteric pocket of mut-EGFR and wt-EGFR has reduced the conformational changes for aC-helix region.Besides, the binding of some candidates to their allosteric targets in wt-EGFR caused conformational changes in the other functional regions.The binding of compound D7 to the allosteric pocket of wt-EGFR reduced the conformational stability of P-loop and A-loop of protein.Moreover, it seemed that the RMSF values of wt-holo system of this candidate are bigger than the RMSF values of both the apo systems and mut-holo systems for the title regions.Different from compound D7, the binding of compound D16 had reduced the conformational stability of a-loop besides of A-loop of wt-EGFR in comparison to both apo systems and mut-holo system.Finally, the binding of compound D18 inside the allosteric pocket of wt-EGFR reduced the conformational stability of P-loop and aC-helix of protein.Radius of gyration (Rg) relies on the interrelation between the mass of certain atoms and the center of gravity of the molecule.This may utilize to characterize the compactness of the proteins binding with ligands here.For the current study, the calculated average Rg value of both apo systems and mut-holo wt-holo systems were tabulated in Table 3 and presented in Figures 10 and 11.The mut-apo system has the smallest average Rg value among all systems with an average of 19.96 Å.This refers that mut-apo form prefer less dynamic conformation than the mut-holo systems, because of the fact that mut-holo systems have so closed average Rg values to average Rg value of mut-apo system.In the presence of ligands inside the allosteric pocket of EGFRT790M, protein conformations slightly have changed.For the wt-apo and wt-holo systems, average Rg values of wt-EGFR in the presence of compound D8 and D14 inside the allosteric pocket are slightly smaller than the average Rg value of wt-apo form (Table 3).On the other hand, average Rg values of other wt-holo systems are slightly bigger than the average Rg value of wt-apo form.In short, all systems have close average Rg values, and for mut-holo systems are similar compactness and stability as the mut-apo form and this can be mentioned for the wt-apo and wt-holo systems too.
Solvent accessible surface area (SASA) is the surface area of a biomolecule that is accessible to a solvent.This refers that SASA calculations may indicate changes in protein folding.Average SASA values of all systems were tabulated in Table 3 and presented in Figures 12 and 13.For the mut-apo and mut-holo systems, mut-apo form has a little bit bigger average SASA value than the mut-holo forms.The lowest average SASA value of 14343.1 Å 2 was calculated for the compound D18-EGFRT790M complex.This can be considered that the presence of the ligand decreases the SASA of protein and may causes the conformational changes for the mutant protein.This has been realized for the complexes of the other studied compounds with EGFRT790M too.Besides, the average SASA value for wt-apo system was calculated bigger value than all systems (Table 3).As is mut-holo systems, the presence of ligands also increases the SASA of wt-apo form, which means that this may make conformational changes for wt-EFGR.

Binding mode analysis
Hydrogen bonding is an essential interaction for ligand binding and affinity to target proteins.In the current study, these interaction networks formed between ligands and EGFRT790M were detected from MDS trajectories of mut-holo systems using Cpptraj.
Evaluating the outputs of hydrogen bonding analysis, it was detected that bonds were formed by ligands and the residues of aC-helix and A-loop regions, in general (Table 4).Besides, compound D7 and a P-loop residue (Phe723) have formed a hydrogen bond.Even though compounds D13 and D14 have formed direct hydrogen bonds with allosteric pocket residues in low frequency, these have mainly formed hydrogen bonds with water molecules and water-mediated hydrogen bonding with aC-helix and A-loop residues (Tables 4-6).In addition to compounds D13 and D14, the rest of the candidates have formed hydrogen bonds with water molecules and water-mediated hydrogen bonding (Tables 5  and 6).Mostly, Asp855 as an A-loop residue has formed water-mediated hydrogen bonding with the studied compounds, especially as compounds D7, D8, D14, D16 and D25.Other key residues that contribute to forming watermediated hydrogen bonding are Lys745, Thr751, Glu762 and Gly857.Lys745 as b-3 strand residue has formed this bonding with compounds D8 and D16.Thr751 and Glu762 as a-loop and aC-helix residues have formed water bridges with compounds D13 and D14, and compounds D13, D16 and D25, respectively.Another A-loop residue as Gly857 has formed water-mediated hydrogen bonding with compounds D13 and D14.In a particular way, water-mediated hydrogen bonds play key role binding of compounds D13 and D14 in comparison to direct hydrogen bonds formed with allosteric pocket residues (Tables 4 and 6).For compound D18, it was observed that water-mediated hydrogen bonding does not contribute to the binding of this compound to the allosteric pocket of EGFRT790M than the direct hydrogen bonds.
The salt bridge is another crucial interaction between some of the studied compounds and EGFRT790M.It has been observed that salt bridges have formed between compound D7 and Glu749, compound D16 and Glu762, and compounds D13 and D18 with Asp855.These salt bridges have been created by the positively charged tertiary amine groups of title compounds and the carboxylate group of Glu749, Glu762 and Asp855 (Figures S19-S26).Besides the salt bridges, other polar interactions were observed between the candidates and receptor residues.Lys745 has contributed to the constitution of these interactions with the help of the positively charged amino group that has borne.In addition, the positively charged tertiary amine groups of compounds D13, D14, D18 and D25 have constituted polar interactions with Phe856.
The observed main interactions between the studied compounds and allosteric pocket residues due to the hydrophobic character of pocket are non-polar interactions such as CH-p, p-p, van der Waals and hydrophobic interactions.Mainly, these were formed between the candidates and the residues of the functional regions namely a-loop, aC-helix and A-loop.In addition, b-4 and b-5 strand residues had composed non-polar interactions with the candidates.Detection of polar and non-polar interactions was supported by binding free energy calculations and energy decomposition analysis, and it was mentioned in the next section.

MM-GBSA/MM-PBSA binding free energy calculations and energy decomposition analysis
MM-GBSA/MM-PBSA binding free energy calculations were executed for the studied compounds using 200 ns MDS of their ligand-protein complexes, results were tabulated and given in Table 7. Amber MM-GBSA/MM-PBSA energy values (DG bind ) of each ligand have been calculated by MMPBSA.py.MPI module according to the equations given below.
In Equation 3, E MM represented the molecular mechanics contribution that consists of the electrostatic, internal and van der Waals contributions to binding in vacuum.In the same equation, G solv represented the solvation-free energy contribution to the binding that consists of polar and nonpolar solvation-free energies.In Equations 4 and 5, rec, com and lig have symbolized the receptor, complex and ligand, respectively.
In the current study, for all studied compounds in EGFRT790M, electrostatic energy (in vacuum) is the primary energy contribution to the binding free energy using both MM-GBSA and MM-PBSA calculations.However, the total electrostatic energy contribution, consisting of electrostatic energy and polar solvation energy calculated by both methods, is an unfavorable contribution to the binding of all candidates.The polar solvation energy contributions of binding free energy for all systems were calculated high by MM-PBSA calculation method compared to MM-GBSA calculation method.Similarly, the non-polar solvation energy contributions of binding free energy were calculated high by MM-PBSA calculation method for all systems compared to MM-GBSA method too.On the other hand, it was determined that non-polar contributions, consisting of van der Waals and non-polar solvation energy, are the most preferred energy contribution for binding free energy of the studied compounds.Although the total electrostatic energy for all systems is an unfavorable contribution for the binding free energy, non-polar energy is the main contributor for the total binding free energy.In addition, MM-PBSA binding free energies of all ligand-EGFRT790M complexes were calculated bigger than the MM-GBSA binding free energies of them because of the solvation free energy contribution (Table 7).
Energy decomposition analysis was executed to the unrestrained MDS of EGFRT790M-ligand complexes by MM-GBSA calculation method.According to analysis results, non-polar energy contributions for all candidates were provided by a P-loop residue (Phe723), b-3 strand residue (Lys745), a-loop and aC-helix residues (Leu747, Ile759 and Met766), b-5 strand residue (Leu788) and A-loop residues (Asp855, Phe856, Leu858 and Leu861) (Figure 9, Tables S3-S9).At the same time, Glu762 (aC-helix), Leu777 (b-4 strand), Met790 (b-5 strand) and Thr854 (A-loop) contributed non-polar components for the binding free energy of the studied candidates in the allosteric pocket of EGFRT790M.Glu762 has not provided non-polar contribution to the binding energy of compound D14, but Leu777, Met790 and Thr854 have not contributed the same component for the binding energy of compound D25.The contribution of total electrostatic energy plays an important role in the free-binding energy of candidates.This contribution was commonly provided by several residues such as Phe723, Lys745, Leu747, Glu749, Glu762, Met766, Thr854, Asp855 and Phe856.The total electrostatic energy contributed by Glu749, Glu762 and Asp855 to binding free energy of compounds D7, D16, D13 and D18, respectively, have supported the salt bridges between the candidates and the allosteric residues of EGFRT790M.The other serious interaction that was supported by the contribution of total electrostatic energy, is hydrogen bonding.The main contributors of the electrostatic energy for hydrogen bonding are Phe723, Lys745, Glu749, Glu762, Thr854, Asp855 and Phe856.The contribution of these residues to total electrostatic energy has provided the formation of hydrogen bonds with compound D7 (Phe723 and Glu749), compound D8 (Thr854 and Asp855), compound D16 (Glu762 and Phe856), compound D18 (Asp855) and compound D25 (Asp855).In addition to hydrogen bonding, it was observed that the electrostatic energy contributions provided by Phe856 to binding free energy are crucial for the binding of compounds D13, D14, D18 and D25.The energy composition analysis results were given in Figure 9 and tabulated in Tables S3-S9.MM-GBSA/MM-PBSA binding free energies have been calculated from the unrestrained MDS trajectories of the ligandwt-EGFR complexes of the potential inhibitors for the indicating estimated selectivity against wt-EGFR.The docking scores and MM-GBSA/MM-PBSA binding free energy calculation results for the studied compounds inside wt-EGFR were reported in Table S2 and Table 8, respectively.According to the MM/GBSA binding free energy calculations, the estimated inhibitors, excluding compound D8, have shown selectivity to EGFRT790M from low to mild (Tables 7 and 8).Increasing of the MM-GBSA binding free energies for the estimated inhibitors in wt-EGFR have been provided by van der Waals and non-polar solvation energy components for compounds D7, D16 and D18, and total electrostatic energy components for compounds D13, D14 and D25.On the other hand, MM-PBSA binding free energy calculations executed to MDS trajectories of ligand-wt-EGFR complexes have displayed that compounds D13, D14, D16, D18 and D25 have selectivity to mutant form.Even though compound D7 displayed low selectivity towards EGFRT790M according to MM-GBSA binding free energy, it has selectivity to wt-EGFR because of the solvation free energy contribution to MM-PBSA binding free energy.Besides, compound D25 is the most selective estimated inhibitor as the results of MM-PBSA binding free energy calculations.
The energy decomposition analysis was executed to the unrestrained MDS of the ligand-protein complexes in order to determine the contributions of residues to binding free energy.The results are tabulated and given in Tables S10-S16 and Figures S27-S28, respectively.The energy contribution reduction sourced from A-loop residues plays a crucial role in the estimated enzyme selectivity for compounds D7 and D18.Another significant case that has impacted the selectivity is the Met790-Thr790 exchange between the mutated form and the wild type.It has reduced the estimated affinity to wt-EGFR for compounds D13, D14, D16 and D18.The changes of energy contributions sourced by aC-helix, b4 and b5-strand residues of wild type, excluding Lys745 and Met766, have supported the estimated selectivity of compound D25 towards mut-EGFR.Besides, the energy contributions of wt-EGFR catalytic domain residues have reduced the enzyme affinity over the corresponding contributions of EGFR T790M for compound D16.

The prediction of the ADMET properties
The ADMET properties of potential inhibitors may give an opportunity to evaluate the possibility of becoming drugs to use for treating the target sickness.For this purpose, PreAdmet tool was used to predict to the ADMET properties of potential inhibitors in the current study.Toxicity data in PreAdmet tool Algae at, Ames test, carcino mouse, carcino rat, Daphnia at, hERG-inhibition, medaka at, minnow at, TA100-10RLI, TA100-NA, TA1535-10RLI and TA1535-NA.Carsino mouse and carsino rat properties were mostly predicted as negative for the estimated inhibitors.Compound D25 for carsino mouse test and compounds D16 and D18 for carsino rat test were predicted as positive.Positive prediction means no evidence of carcinogenic activity.TA100-10RLI, TA100-NA, TA1535-10RLI and TA1535-NA test results were found as negative which means no change of population versus blank plate for all estimated inhibitors.According to hERG inhibition test, they were predicted to display highrisk to medium-risk profiles.Besides it was found by Ames test hat compounds D13 and D14 are non-mutagen, but the rest of estimated inhibitors are mutagen.Caco2 cell permeability test for the estimated inhibitors displayed that they have middle permeability.Human intestinal absorption predicted that the estimated inhibitors are well-absorbed compounds.They have high absorption to CNS and are active compounds in CNS, according to blood-brain barrier (BBB) penetration test.These compounds were also predicted to bind to plasma protein weakly.The ADMET properties of the estimated EGFRT790M inhibitors are reported in Table 9.

Conclusion
In summary, 64,136,313 compounds were yielded from the e-molecules database.They were filtered out to 133,083 compounds using physical and chemical descriptors.For filtering out, the prepared library was exposed to library screening, virtual screening and docking studies, respectively.Then, the generated 20 ns MDSs for twenty-five compounds have been used for MM-GBSA free energy calculations.After these calculations, MDSs were extended to 50 ns for nine compounds.It was found that seven compounds have the estimated inhibitory potency according to binding free energy calculated from 50 ns MDS trajectories.For compounds D7, D8, D13, D14, D16, D18 and D25, MDSs were extended to 200 ns.MM-GBSA/MM-PBSA binding free energy calculations and structural stability, binding mode and energy decomposition analyses were performed to the 200 ns MDS trajectories of ligand-EGFRT790M complexes to indicate the estimated enzyme affinity and binding motif of the estimated inhibitors inside EGFRT790M.It has known that the JBJ-04-125-02 has a crystal form in the allosteric site of EGFRT790M.This allosteric inhibitor has a trident form that consists of 5-fluoro-2-hydroxyphenyl, N-(thiazol-2-yl)ethylamide, and 3-oxoisoindoline groups, and this form has occupied the hydrophobic pocket formed by Leu747, Met766, Leu777, Met790, Asp855, Phe856 and Leu861.According to docking study and binding mode analysis, it was seen that these filtered candidates bearing trident form or trident form mimicked structures have occupied the hydrophobic pocket of an allosteric site like JBJ-04-125-02.The analysis outputs suggest the compounds that have the trident form or mimic the trident form, can affect the allosteric site of EGFR T790M such as JBJ-04-125-02.The results of binding mode and energy decomposition analyses displayed that hydrogen bonding and/or watermediated hydrogen bonds play an important role for the binding of these compounds inside the allosteric site of EGFRT790M.aC-helix residues namely Glu749 and Glu762, and A-loop residues namely Thr854, Asp855 and Phe856 are the key residues that formed hydrogen bonds with the different candidates.For compounds D13 and D14, watermediated hydrogen bonds are especially important for the binding of these compounds inside the allosteric site of EGFRT790M in comparison to direct hydrogen bonds.In addition to hydrogen bonding, non-polar interactions have contributed binding of the studied compounds to the related protein.Especially, Leu747, Met766 and Leu858 are the main energy contributors for non-polar interactions associated with the binding of the final candidates.Lys745 and Phe856 are also crucial amino acid residues with polar and/or nonpolar energy contributions to the binding motifs of the studied candidates.
On the other hand, MM-GBSA/MM-PBSA binding free energy calculations were executed to the whole unrestrained MDS trajectories of the ligand-wt-EGFR and the ligand-mut-EGFR complexes in order to predict their estimated affinity and selectivity towards EGFRT790M.The estimated EGFRT790M inhibitors as compounds D13, D14, D16, D18 and D25 have shown selectivity to EGFRT790M according to both calculation methods.It was observed that A-loop residues contribute to the estimated enzyme selectivity for D18, and the Met790 increases the estimated affinity to EGFRT790M for compounds D13, D14, D16 and D18.Lastly, the energy contributions provided by aC-helix, b4 and b5strand residues of mutant EGFR, excluding Lys745 and Met766, affect the estimated selectivity for compound D25.According to both MM-GBSA and MM-PBSA binding free energy calculations, it could be considered theoretically that these six compounds are the estimated allosteric inhibitor for EGFRT790M, and five of them are selective allosteric inhibitors for EGFRT790M.Besides, the ADMET properties of the proposed EGFRT790M inhibitors were predicted PreAdmet tool.Even though they didn't give the same results each all toxicity tests, this proposes to compound D13, D14, D16, D18 and D25 may be potential drug candidates.

Figure 1 .
Figure 1.A cartoon representation of EGFR (PDB ID: 6DUK).The hinge-region, the aC-helix, the P-loop, a-loop, A-loop, b 2 -sheet, b 3 -sheet, b 4 -sheet and b 5 -sheet are marked in cyan, light brown, green, dusty rose and orange, yellow, red, blue and dark green, respectively.JBJ-04-125-02 is displayed in olive sticks and ANP is displayed in light pink sticks.The orthosteric and allosteric sites are highlighted by dark magenta and olive circles, respectively.

Figure 3 .
Figure 3. Numbers of used compounds in hierarchal screening approach; Library and virtual screening, docking and MMGBSA/MMPBSA binding free energy calculations (10, 50 and 200 ns, respectively).

Table 1 .
Properties of the studied compounds for allosteric inhibition of EGFRT790M.
a_don, Number of hydrogen bond donors; a_acc, Number of hydrogen bond acceptors; logP (o/w), Calculated Log10 of the partition coefficient; ASA_P, Total polar surface area; A_heavy, Number of heavy atoms; b_rotN, Number of rotatable bonds.

Table 3 .
The average RMSD, radius of gyration and SASA values of the mut-apo, mut-holo, wt-apo and wt-holo systems.

Table 4 .
Hydrogen bonds are formed between the studied compounds and EGFRT790M.Cpptraj module of Ambertools 20 was used to analyze hydrogen bonding using default criteria of cpptraj module for hydrogen bonds (A cutoff of 3.0 Å for distance and 135 � for angle).

Table 5 .
Hydrogen bonds are formed between the studied compounds and water molecules.Cpptraj module of Ambertools 20 was used to analyze hydrogen bonding using default criteria of cpptraj module for hydrogen bonds (A cutoff of 3.0 Å for distance and 135 � for angle).

Table 6 .
Hydrogen bonds are formed between the studied compounds and water-mediated bridging residues of EGFRT790M.

Table 8 .
The calculated MM-GBSA/MM-PBSA binding free energies (Delta Total Energy) and their components of the studied compounds inside wild type of EGFR modified from EGFRT790M (PDB ID:6DUK) with the standard deviation.

Table 7 .
The calculated MM-GBSA/MM-PBSA binding free energies (Delta Total Energy) and their components of the studied compounds inside EGFRT790M (PDB ID:6DUK) with the standard deviation.

Table 9 .
ADMET properties of the estimated EGFRT790M inhibitors.