Fragment-based investigation of thiourea derivatives as VEGFR-2 inhibitors: a cross-validated approach of ligand-based and structure-based molecular modeling studies

Abstract Angiogenesis is mediated by the vascular endothelial growth factor (VEGF) that plays a key role in the modulation of progression, invasion and metastasis, related to solid tumors and hematological malignancies. Several small-molecule VEGFR-2 inhibitors are marketed, but their usage is restricted to specific cancers due to severe toxicities. Therefore, cost-effective novel small molecule VEGFR-2 inhibitors may be an alternative to overcome these adverse effects. Here, a set of thiourea-based VEGFR-2 inhibitors were considered for a combined fragment-based QSAR technique, structure-based molecular docking followed by molecular dynamics simulation studies to acquire insights into the key structural attributes and the binding pattern of enzyme-ligand interactions. Noticeably, amine-substituted quinazoline phenyl ring and a higher number of nitrogen atoms, and the hydrazide function in the molecular structure are crucial for VEGFR-2 inhibition whereas methoxy groups are detrimental to VEGFR-2 inhibition. The MD simulation study of sorafenib and thiourea derivatives explored the significance of urea and thiourea moiety binding at VEGFR-2 active site that can be utilized further in the future to design molecules for greater binding stability and better VEGFR-2 selectivity. Therefore, such findings can be beneficial for the development of newer VEGFR-2 inhibitors for further refinement to acquire better therapeutic efficacy. Communicated by Ramaswamy H. Sarma


Introduction
Angiogenesis refers to the generation of new blood vessels for the supply of nutrients and oxygen required by a tumor to grow rapidly (Carmeliet, 2005).Angiogenesis is triggered by pro-angiogenic mediators at high levels by tumor cells (Viallard & Larriv� ee, 2017).This phenomenon contributes to the growth and development of cancer as well as metastasis (Carmeliet, 2005;Frezzetti et al., 2017).Again, the irregularly shaped, unorganized blood vessels generated during angiogenesis are hemorrhagic, tortuous and leaky which may result in high interstitial pressure (Carmeliet, 2005;Ferrara, 2005).A prime mediator of this angiogenesis process is the vascular endothelial growth factor (VEGF) which is a homodimeric glycoprotein (Carmeliet, 2005).VEGF is induced by several characteristics of tumors, most importantly hypoxia (Ferrara, 2005).The production of this key mediator along with certain growth factors produces an 'angiogenic switch' that leads to the development of a newer vasculature around and within the tumor.Therefore, the growth rate becomes exponential and the tumor vasculature possesses functional and structural abnormalities (Ferrara, 2005).VEGF has been found to give rise to several other abnormalities and diseases as well.Overexpression of VEGF has been found to play a key role in the progression of certain malignancies and several types of major solid tumors including ovarian cancer (Chen et al., 2018), breast carcinoma (Bender & Mac Gabhann, 2013), lung cancer (Chen et al., 2014), gastric carcinoma (Partyka et al., 2012), cervical cancer (Meng et al., 2016), colon cancer (Lee et al., 2010), pancreatic cancer (Shi et al., 2016) and head and neck cancer (Formento et al., 2009).The overexpression or high serum level of VEGF has often been found in lung cancer conditions (Viallard & Larriv� ee, 2017).VEGF is produced both by macrophages as well as cancer cells during breast carcinoma, therefore participating in breast cancer progression (Song et al., 2018).Nevertheless, it is a major player in leukemia and lymphoma (de Bont et al., 2002;Yang et al., 2015).Overexpression of VEGF in connection to angiogenic events such as enhanced bone marrow vascularization has a strong correlation with acute myeloid leukemia (Amin et al., 2017;de Bont et al., 2001).For such crucial contributions, VEGF acts as a rational therapeutic target to treat and prevent various cancers.
The human VEGF/VEGFR system comprises five growth factors [namely VEGF-A, VEGF-B, VEGF-C, VEGF-D, as well as the placental growth factor (PGF)], three related VEGF receptors (i.e., VEGFR-1, VEGFR-2 and VEGFR-3) along with two non-protein kinase co-receptors, namely neuropilin-1 (NRP-1) and neuropilin-2 (NRP-2) (Wang et al., 2020).In this entire system, it is mainly the two prime receptors, namely VEGFR-1 and VEGFR-2 which are found to control angiogenesis as well as vascular permeability.However, VEGFR-2 serves as the prime signal transducer during angiogenesis (Wang et al., 2020).When the VEGF binds to the two Ig-like subdomains (i.e., IgD2 and IgD3) of VEGFR-2, dimerization of the receptor is induced.This leads to the activation of the tyrosine kinase and subsequently, becomes trans-autophosphorylated and finally initiates the cascades of intracellular signaling which ultimately triggers and modulates various aspects of angiogenesis (Wang et al., 2020).Human VEGFR-2 is a kinase domain receptor (KDR) gene or a kinase insert domain-containing receptor gene.It is mainly expressed in the vascular endothelial cells and encodes a total of 1356 amino acids (Wang et al., 2020).Certain anti-VEGF drugs were developed and approved for treating some advanced cancers, either alone or combined with chemotherapy currently.Again, several anti-angiogenic drugs including axitinib, sunitinib and sorafenib approved by the FDA have delivered good effectivity against VEGFR-2.However, these small molecules possess some serious drawbacks (https://www.cancerresearchuk.org;Ramjiawan et al., 2017).Apart from that, anticancer therapy by these existing VEGFR-2 inhibitors is so costly that is beyond the limit of common people throughout the world.Thus, there is a tremendous requirement for alternative inhibitors of VEGFR-2 for safe and effective therapeutic development.
Acknowledging the fact that molecular fragments exhibit immense control over the biological activities of drug molecules, it is not surprising that the fragment-based drug designing approaches for designing and discovery of drugs have met with tremendous success over time (Murray et al., 2012;Romasanta et al., 2018).Therefore, decoding the VEGFR-2 inhibitors by such fragment-based approaches is a necessary step to identify the crucial structural attributes responsible for effective ligand-receptor interactions for potent and selective VEGFR-2 inhibition.In this context, the quantitative structure-activity relationship (QSAR) is a useful approach that allows the correlation of the quantifiable chemical properties of bioactive molecules with their biological activities (Banerjee et al., 2020a;2020b, Adhikari et al., 2010)).Therefore, in this current study, a series of thioureabased VEGFR-2 inhibitors have been explored using a combined cross-validating fragment-based molecular modeling studies followed by molecular docking, and molecular dynamic (MD) simulation study to identify the important structural contributors and their significance in the VEGFR-2 inhibition that will allow further development of potent and selective VEGFR-2 inhibitors for effective therapeutic development.

Dataset curation and preparation
A set of 98 thiourea-based VEGFR-2 inhibitors were collected (https://www.bindingdb.org/;Supplementary Table S1).The 'Prepare Ligands for QSAR' module of the Discovery studio (DS) 3.0 software was used to prepare the molecules for the QSAR study (Discovery Studio 3.0, 2011, available at http://www.accelrys.com).The VEGFR-2 inhibitory activities (IC 50 in nM) of these molecules were transformed into their respective negative logarithmic values (pIC 50 ) that are required as the dependent variable for the molecular modeling studies.

Calculation of molecular features and pre-treatment
The PaDEL descriptor (Yap, 2011) software was utilized to extract a total of 8272 molecular descriptors including 1444 1D and 2D structural properties along with the PubChem (881 bits), Substructure (307 bits), Klekota-Roth (4860 bits) and atom-pair (780 bits) fingerprint features for these compounds.
Next, a dataset pretreatment was carried out to eliminate the highly correlated and no-variance independent variables.This process yielded a total of 952 descriptors with a correlation > 0.90 and a co-variance < 0.001 (Small QSAR Tools of DTC Lab, 2021, available at http://teqip.jdvu.ac.in/QSAR_Tools).

Division of dataset
The dataset division is an essential process for both internal and external validation purposes required to establish the reliability and robustness of the QSAR model (Amin et al., 2019).
Here, the k-means cluster analysis (k-MCA) method was used to club these compounds into five clusters.The test set compounds from each cluster were then randomly chosen.During the selection of test set compounds, a 3:1 ratio was preserved for the training set and the test set populations.Finally, 74 molecules were populated in the training set (N Train ¼ 74) and the remaining 24 compounds were kept in the test set (N Test ¼24).

Partial least square (PLS) analysis
The partial least square (PLS) is a regression-dependent method that follows the function of the linear straight line to correlate multiple independent variables with a single dependent variable (Adhikari et al., 2010).Here, the genetic function algorithm (GFA)-guided best subset selection technique (Banerjee et al., 2022a) was used on the pre-treated training set to develop PLS  models that were validated both internally as well as externally.
The best PLS model was considered depending on predictive ability (R 2 Pred ) and explanatory capacity (R 2 and Q 2 ) for the biological variance.Additionally, other statistical parameters including the r 2 m metrics and concordance correlation coefficient (CCC) (Small QSAR Tools of DTC Lab, 2021, available at http://teqip.jdvu.ac.in/QSAR_Tools) were also evaluated for model acceptability, robustness and reliability.

Hologram QSAR (HQSAR) study
HQSAR adopts specified molecular fingerprints of varying lengths (also known as molecular holograms) as independent variables to equate them with the respective biological property of compounds with the help of the partial least square (PLS) method (Yu et al., 2015) (Equation 1).
In Equation 1, BA i represents the biological response of the i th molecule, L indicates the length of the hologram, X ij symbolizes the occupancy value of the molecular hologram of the i th molecule at the position j, whereas C j is the coefficient for the bin j derived from the PLS analysis and Z is the constant (Myint & Xie, 2010).Here, the HQSAR study was conducted in SYBYL-X 2.0 software (SYBYL-X 2.0, 2012, available at http://www.certara.com)with different associations of fragment distinction and fragment generation properties on the training set molecules.A fragment distinction parameter value between 1 and 10 along with the aggregation of fragment distinction parameters such as atom (A), bond type (B), connections (C), donor-acceptor (DA) features and the hydrogen (H) (Banerjee et al., 2022b) were used.The constructed HQSAR models were validated both internally on the training set as well as externally on the test set.Depending on the highest Q 2 , the best model was selected (Yu et al., 2015).
The HQSAR models are visualized as color-coded fragmentbased diagrams where each color encodes a specific contribution range of the fragment to the biological activity of the compound.

Classification-based molecular modeling studies
The classification-based QSAR studies are conducted to correlate the structural attributes with different groups/classes to predict each class instance correctly (Adhikari et al., 2021).Before conducting the class-based QSAR study, the transformation of the numerical dependent variable data into a binarized manner was done for model development and validation.Here, a classification threshold of 30 nM was set for the binarization of the dependent variable.Compounds with an IC 50 value � 30 nM were considered as higher active, whereas molecules with IC 50 value > 30 nM were marked as lower active VEGFR-2 inhibitors.

Potential structural alert analysis
The particular molecular substructures/fragments of a bioactive molecule that are responsible for any biological response of that compound can be referred to as a potential structural alert (Ferrari et al., 2013).The potential Active structural alerts for these thiourea derivatives are curated using the Python-based SARpy (Structure-Activity Relationships in Python) tool (Ferrari et al., 2013;Tang et al., 2021).This process uses a recursive algorithm on the training set compounds to generate structural fragments of a given length that provides potential structural alerts in the form of SMILES notation depending on their likelihood ratio (LR) (Equation 2; Ferrari et al., 2013;Yang et al., 2017).LR values range from 1 to infinity (inf).The LR assumes a value of inf when the structural alerts are noticed only in the positive inspections (Yang et al., 2017).
For the structural alert analysis, various atom fragments were fixed between 2 and 10 with 3 minimum occurrences.An 'Optimal' precision was conducted, and simultaneously the target activity class was denoted as 'Active'.A potential 'Ruleset' was thus created.The curated structural alerts were validated on the test set molecules externally.The predictive ability and reliability of this 'Ruleset' were assessed by inspecting the sensitivity (Se), specificity (Sp), precision (Pr) and accuracy (Acc) scores for both the training and test sets.

Bayesian classification study
The Bayesian classification modeling approach is generated on the application of Bayes' theorem (Equation 3) of predicting probabilities with given data (Chen et al., 2011).
In above Equation 3, P(AjB) denotes posterior probability, P(BjA) denotes likelihood, A represents the model or hypothesis, B represents the experimental data, P(A) indicates the prior belief and P(B) indicates the evidenced data.
The 'Create Bayesian Model' protocol in DS 3.0 software (Discovery Studio 3.0, 2011, available at http://www.accelrys.com) was used for the construction of the Bayesian classification model as well as internal and external cross-validation performance.Several foundational molecular properties such as lipophilicity (ALogP), number of rotatable bonds (nRB), molecular weight (MW), number of rings (nR), number of aromatic rings (nAR), number of hydrogen bond donor groups (nHBD), number of hydrogen bond acceptor groups (nHBA) and molecular fractional polar surface area (MFPSA) combined with extended class fingerprint feature (ECFP_6) were calculated for the Bayesian classification model construction.To validate the quality and to assess the predictability and performance of the model, receiver operating characteristics (ROC) statistics along with the sensitivity (Se), specificity (Sp), precision (Pr) and accuracy (Acc) scores were estimated during both internal and external cross-validation.

Molecular docking study
For molecular docking analysis, the sorafenib-bound crystal structure of VEGFR-2 (PDB ID: 4ASD) was used.The preprocessing of the protein molecules including the addition of hydrogen atoms and crystalline water molecule removal was performed in the CCDC GOLD (Genetically Optimized Ligand Docking) suite (Jones et al., 1997;Tejera et al., 2020).Default settings for the active cavity search and the 'ChemScore kinase' template were used with the ChemPLP scoring function for the molecular docking of the ligand.The scoring function helps to make the rank of the various binding modes (Equation 4).

GOLD FITNESS
In Equation 4, S hb-ext indicates the score of the protein-ligand hydrogen bonding, S vdw-ext represents the score of the protein-ligand van der Waals interaction, S hb-int refers to the intramolecular hydrogen bond contribution related to the Fitness in the ligand, and S vdw-int is the intramolecular strain contribution in the ligand (Verdonk et al., 2003).Finally, the genetic algorithm (GA) search parameter was set to be slow (most accurate) which equates to nearly 100,000 GA operations (Jones et al., 1997).

Molecular dynamics (MD) simulation study
The molecular dynamic (MD) simulations for the VEGFR-2-sorafenib and the VEGFR-2-compound 1 were performed by the help of Desmond molecular dynamic simulation package of Schrodinger Suite (Schrodinger Suite, 2019).The protein for the complexes were prepared using the Protein Preparation protocol from the Maestro v12.1 software of the Schrodinger suite (Schrodinger Suite, 2019).The System Builder module was used to develop the system and the complex was centered in an orthorhombic box.The TIP3P water model was used to fill up the system with a periodic boundary conditions at a distance of 10 Å distance between the box edge and the protein.The size and volume of the system were adjusted depending upon the size of the complex and was neutralized by introducing Na þ randomly.Beside water molecules, Na þ and Cl À ions were also introduced in a 0.15 M concentration to simulate the complexes in a buffer system (Gopinath & Kathiravan, 2021;Mohankumar et al., 2020).The OPLS_2005 forcefield was used to minimize the solvated system followed by the relaxation using the default protocol.Finally, 100 ns MD simulation studies were carried out at 100 ps time step at 1 atm pressure and a temperature of 300K using the Nose-Hoover (Gopinath & Kathiravan, 2021;Martyna et al., 1992) and NPT-ensemble method (Gopinath & Kathiravan, 2021;Shinoda & Mikami, 2003).Furthermore, the MM/GBSA analysis (Kumari et al., 2014) for the simulated complexes was also performed to evaluate the binding stability of the complexes during the MD simulation.

Partial least square (PLS) model
To develop the PLS model for these thiourea derivatives, a two-step feature selection process was used to identify significant molecular features.Primarily, the pre-treated training set compounds were used for GFA runs followed by using the best subset selection process with the help of the features obtained from the genetic algorithm.The best set of features that were considered to build the final PLS model (Equation 5) was selected based on their Q 2 and R 2 pred values shown during the best subset selection process.5indicates the positive influence of the molecular properties such as the nN, SHBint3, MATS2s and GATS6s as well as the fragment KRFP504 while suggesting the negative influence of nF9Ring and PubchemFP12 features on the activity of the dataset molecules.The observed versus predicted biological response of the training and test sets for the PLS model (Equation 5) is provided in Figure 2A.Also, the observed and predicted activity and the final model are provided in Supplementary Table S2-S4.

Hologram QSAR (HQSAR) model
To build the Hologram QSAR (HQSAR) model, different combinations of the key fragment distinction parameters (A, B, C, H, Ch and DA) were investigated which led to the 50 different HQSAR models, as shown in Supplementary Table S5.Among these HQSAR models, 5 models (HQSAR model 8, 12, 26, 27 and 41) were found to fulfill the internal cross-validation criteria (Q 2 > 0.50).Out of these 5 HQSAR models, model-8, developed using the Atoms (A) and connections (C) with the highest Q 2 value of 0.537 is preferred to be the best model.HQSAR Model 8 was further optimized using different atom counts (Table 2) to check if a model with an even higher Q 2  value could be obtained or not.Among these, model-8d with 6 components, an atom count of 4 to 7, and a length of 257 was found to possess the highest Q 2 value (Table 2).Model-8d also showed an R 2 of 0.866 for the training set and an R 2 pred of 0.562 for the test set compounds.The observed as well as predicted activities of the molecules according to the best HQSAR model (Model-8d) are shown in Figure 2B and Supplementary Table S4.
In an HQSAR model, the color coding of fragments is very relevant as it helps to determine the significant contribution of the definitive fragment to the VEGFR-2 inhibition of the compounds.Here, in this study, the bad contributions are indicated by red (< À 0.277), red-orange (À 0.277 to À 0.166) and orange (À 0.166 to À 0.111) fragments while white-colored fragments signify neutral contributions (À 0.111 to 0.119) to the potency of the compounds.Moderate contributions are indicated by yellow (0.119-0.179), whereas green-blue fragments (0.179-0.298) and green fragments (�0.298) signify good contributions.
The fragment contribution contour for the most active compounds 1 and 2 depict the positive contribution of the quinazoline moiety (Figure 3A and B, respectively) where one quinazoline hydrogen and cyclic carbon atom are shown in green color while suggesting the nearby carbon and hydrogen atoms as moderately positive contributors for VEGFR-2 inhibition (Figure 3A).Also, three of the aromatic carbons of the unsubstituted phenyl ring that connect the thiourea and the quinazoline groups are found to have a good contribution to VEGFR-2 inhibition.Two of the hydrogen atoms of the connecting phenyl ring are suggested to be good and moderate contributors to biological potency.Additionally, the oxygen atom from the terminal trifluoromethoxy group of compound 1 has been identified as a moderately negative contributor to VEGFR-2 inhibition (Figure 3A).
Interestingly, for the second most active molecule (compound 2), the 4-methyl phenyl-substituted quinazoline moiety is suggested to deliver a good to moderately positive contribution to the activity of the compound.The carbon atom of the 4-methyl phenyl connecting with the quinazoline nitrogen atom is suggested as a good contributor to VEGFR-2 inhibition (Figure 3B).Also, both the ureido oxygen and thioureido sulfur atoms have been identified as important contributors to the activity.Additionally, the carbon and the methyl group of the ethylimine function between the thiourea and the phenyl ring of compound 2 are indicated as a good contributor to the VEGFR-2 inhibitory activity of the molecule.Therefore, substitutions at the green fragments of these two highly potent inhibitors can produce a detrimental effect on their VEGFR-2 inhibition.Interestingly, no significant good or bad influencers are found in the structure of the least active compound 83 except one of the quinazoline heterocyclic nitrogen atoms that have been identified as a moderately negative contributor to the activity of the compound (Figure 3C).Furthermore, regarding one of the second least active compounds (Compound 81), one of the isoindolinone and phenyl ring hydrogen atoms have been identified as moderately harmful and bad contributors, respectively, while indicating the highly negative influence of the thiourea carbon atom for the activity of the compound (Figure 3D).

Structural alert analysis
The SARpy-mediated structural alert analysis for the active dataset compounds can be able to identify four sub-structural features as the Active Ruleset.12-13 and 23-24 containing such a feature in their structures are potent inhibitors of VEGFR-2 (IC 50 ranges between 3.5 and 24 nM) (Figure 4).

Bayesian classification model
The constructed Bayesian classification model was statistically validated as disclosed by both the internal and external validation metrics.This model delivered a 5-fold ROC score (ROC 5-CV ) of 0.855 and a leave-one-out ROC (ROC LOO ) of 0.848 for the training set compounds including a ROC score of 0.891 for the test set molecules.Also, the constructed Bayesian classification model has yielded 85.7% Se, 90.6% Sp along with 78.3% Pr and 89.2% Acc for the training set, whereas an 85.7% Se, 70.6% Sp with 54.5% Pr and 75% Acc for the test set population.Additionally, the estimated MCC values for the Bayesian classification model are found to be 0.743 and 0.514 for the training and the test set molecules, respectively.The ROC plots for the training and the test set molecules have been disclosed in Supplementary Figure S1.
Regarding the Bayesian classification model-mediated molecular fragments with positive and negative influence on the VEGFR-2 inhibition of these thiourea derivatives, it is observable that the 20 good and 20 bad structural attributes (Supplementary Figures S2 and S3) can be clustered into a few specific groups of sub-structures.The top 5 good and top 5 bad Bayesian classification model-generated molecular fragments are provided in Figure 5.

Analysis of the VEGFR-2 inhibitor binding site
Regarding the inhibitor binding site of the VEGFR-2 (PDB ID: 4ASD, Figure 6A-D), it is noticeable that the ligand binding site is similar to a tunnel-like pocket highly hydrophobic in nature as well as suitable for hydrophobic contact with amino acid residues such as Phe1047 for stable binding interaction (Figure 6B).Hydrogen bond acceptor regions are also observed near the amino acid residues Glu885, Val848, Phe918, Cys919 and Lys920 where electronegative charges are noticeable in a small amount.This suggests the possibility of hydrogen bond formation near those active site amino acid residues (Figure 6A, C).Additionally, solvent exposure is found between the Leu840, Phe918, Cys919 and Lys920 residues at the active site (Figure 6D).

Molecular docking study
The molecular docking analysis was conducted using the sorafenib-bound VEGFR-2 (PDB ID: 4ASD) (RCSB Protein Data Bank, https://www.rcsb.org,accessed on 5 November 2022) by using the CCDC GOLD suite (Jones et al., 1997).The alignment of the crystal structure-bound and redocked sorafenib along with the interactions of sorafenib at the VEGFR-2 active site are provided in Figure 7A and B, respectively.
From the VEGFR-2-bound sorafenib (Figure 7B), it is noticed that the phenyl ring present between the ureido and pyridine groups forms a p-p stacking interaction with Phe1047 residue where several stable hydrogen bond interactions are produced by the ureido function.The carbonyl oxygen atom forms a hydrogen bond interaction with the amino acid residue Asp1046, whereas the two amide hydrogen atoms interact with the Glu885 residue.Additional hydrogen bonds are also noticed between the terminal methyl amino group and Cys919 as well as the pyridine heterocyclic nitrogen atom and Phe918 amino acid residues.Moreover, a halogenic interaction is also formed between the terminal trifluoromethyl fluorine atom and the carboxamide oxygen that is present between Cys1045 and Ile1044 residues.A similar interaction is noticed for the thiourea function of the most active compound 1 (GOLD PLP.PLP Score ¼ À 76.5624) (Figure 7C) where the sulfur atom has interacted with Asp1046, and the amide functions contact with Glu885 residues by forming hydrogen bond interaction.The structural alert c1n(ccn1)C (LR ¼ 5.05) (Figure 4) also indicates the importance of the ureido group in the structural alert analysis.Moreover, in this molecular docking interaction study, the thiourea function is found crucial for interacting at the VEGFR-2 binding site.The 7-phenylquinazolin-2-amine moiety is also suggested to be a potential structural alert for potent VEGFR-2 inhibition in the structural alert analysis.Again, the 7-phenylquinazolin-2-amine moiety is noticed to have multiple interactions at the binding site of VEGFR-2.The amine nitrogen attached to the quinazoline moiety forms a hydrogen bond interaction with Leu840 residue while the phenyl ring of the 7-phenylquinazolin-2-amine moiety forms a p-p stacking interaction with the side chain phenyl ring of Phe1047.Furthermore, similar to sorafenib (Figure 7B), the terminal trifluoromethyl fluorine atom is also found to interact with His1026 residue at the binding site (Figure 7C).It is interesting to note that the amine-containing quinazoline ring may be suggested as a positive influencer, whereas the methoxy-substituted quinazoline ring may have a detrimental effect on VEGFR-2 inhibition through the Bayesian classification study.For the least active compound 83 (GOLD PLP.PLP Score ¼ À 76.6616), it is noticed that the binding of the molecule is opposite to the binding of sorafenib as well as compound 1, which also contains a quinazoline ring (Figure 7B, C vs D) and thus, may form a very less contact at the binding site between the heterocyclic oxygen atom of compound 83 and Phe918 residue.This also validates the findings of molecular docking as well as fragment-   based molecular modeling studies and vice versa.Additionally, it is visible that the amine group attached to the quinazoline ring of compound 1 which has a higher number of nitrogen in its structure interacts at the binding site, whereas such interaction is found absent for compound 83 which possesses a lower number of nitrogen atoms in its structure.This justifies the positive correlation of descriptor nN with the VEGFR-2 that has been adjudicated by the PLS model (Equation 5).

Analysis of the molecular dynamic (MD) simulation study
The molecular dynamics (MD) simulation study for a 100 ns run has been performed for both the VEGFR-2-bound sorafenib and the best-docked pose of the most active compound of this dataset (Figure 8) to understand the binding patterns along with the dynamic behavior of these compounds at the binding site.The RMSD values for the protein, and ligands are shown in Figure 8A, D.
Both the VEGFR-2 inbound Sorafenib (PDB ID: 4ASD) and compound 1 showed low RMSD values (< 3 Å) compared to the protein, whereas the sorafenib was found to exhibit a comparatively lower RMSD compared to the VEGFR-2 protein (Figure 8A), whereas the most active compound 1 displayed an RMSD value similar to the receptor (PDB ID: 4ASD) (Figure 8B), suggesting a stable binding pattern not only for the sorafenib but also for the most active compound 1.The RMSF plots for the VEGFR-2 also showed a lower fluctuation of the amino acid residues (Figure 8C, D).However, the calculated radius of gyration (R g ) for the VEGFR-2-sorafenib and VEGFR-2-compound 1 complexes (Figure 8E, F, respectively) showed a comparatively higher gyration for the protein during the simulation of the VEGFR-2-compound 1 complex.For the VEGFR-2 and compound 1 complex, it was also noticed that compound 1 showed a better stability at the VEGFR-2 binding site after the 40 ns timeframe (Figure 8B) which was also noticed in the R g analysis where the protein was found to have less gyration after 50 ns timeframe (Figure 8F).This backends the high inhibitory activity of the compound 1 against VEGFR-2.Moreover, the MM/GBSA analysis also delivered quite stable binding of both sorafenib (DG binding ¼ À 102.27 kcal/mol) and for compound 1 (DG binding ¼ À 60.01 kcal/mol).Regarding the hydrogen bond (H-bond) and interactions at the active site of VEGFR-2 (PDB ID:4ASD), it was noticed that the ureido carbonyl oxygen atom of sorafenib showed a 97% H-bond occupancy with Asp1046 residue, whereas the sulfur atom from the thiourea moiety of compound 1 was able to occupy Asp1046 with a 33% of the time throughout the MD simulation (Supplementary Figure S4).On the other hand, the ureido amide functions were found to have 92-99% H-bond occupancy with Glu885 amino acid residue, whereas the thiourea amide groups were able to occupy 98-99% H-bond occupancy with Glu885 residue.Interestingly, the thioureido function including the sulfur atom was suggested as a positive contributor for the VEGFR-2 inhibition of these compounds by the SARpy analysis (Figure 4).The slightly higher H-bond occupancy of the thiourea amide functions compared to ureido amide groups of sorafenib may be due to a lower occupancy of the Asp1046 residue by the thiourea sulfur atom compared to the ureido oxygen atom that allowed the thiourea amide groups to form more stable interaction with Glu885 residue.
During the analysis of possible interactions of the simulated compounds at VEGFR-2 (PDB ID: 4ASD) active site, it was also noticeable that the compound 1 interacted with a higher number of amino acid residues (Figure 9A) compared to sorafenib (Figure 9B) though having less contacts with most of those residues (Figure 9C vs D).Both sorafenib and compound 1 showed higher frequency of contacts with Glu885, Ala866, Val916, Cys919, Leu840, Val848, Val916 and Asp1046 residues (Figure 9C, D).The most intriguing fact is that though sorafenib showed a greater number of interaction frequency with Asp1046 and Val916 residues, the compound 1 showed interactions with His1046 amino acid residue that showed a highly frequent contact after 40 ns timestamp.This may indicate that the greater binding stability of the compound 1 at the binding site of VEGFR-2 (PDB ID: 4ASD) might be the effect of the Hbond and hydrophobic interactions occurred between His1026 and compound 1 (Figure 9B, D).
While analyzing the binding mode of both sorafenib and compound 1 at the active site of VEGFR-2 (PDB ID: 4ASD) (Figure 10), it was noticed that one of the phenyl rings of sorafenib interacted with Asp1046 residue at the 0 ns (Figure 10A) and 100 ns (Figure 10C) time stamp, where such bond was not observable at 50 ns (Figure 10B).
On the other hand, despite having similar hydrophobic contact of compound 1 with Asp1046 residue at 50 ns (Figure 10E), no such interaction was seen at 0 ns and 100 ns (Figure 10D and F,respectively).Instead of that, the bromine and trifluoromethoxy substituted phenyl ring of compound 1 were found to interact with the previously mentioned His1026 residue via p-p stacking interaction at the 100 ns time (Figure 10F).This signifies the importance of the thiourea group and the bromine and trifluoromethoxy substitutions.At the bromine and trifluoromethoxy group substitutions of compound 1, due to their comparatively higher fluctuation at the binding site during the simulation (Supplementary Figure S5A) compared to the trifluoromethyl and the chlorine group substitutions of sorafenib (Supplementary Figure S5B) may have aided the movement of the phenyl ring which led to the formation of such p-p stacking interaction with His1026.This contribution of such bromine and trifluoromethoxy group substitutions led to a more stable binding of compound 1 at VEGFR-2 binding site.

Conclusion
The importance of the VEGF/VEGFR systems has been established as a prime mediator of angiogenesis associated with a wide variety of solid tumors and hematological malignancies.Therefore, over time a list of VEGF and/or VEGFR inhibitors has been approved.However, their use is restricted only to treat specific cancer conditions because of the long list of associated adverse effects that may be the major problem apart from cost-effectiveness.Thus, this landscape urges the development of newer VEGF/VEGFR inhibitors devoid of a high number of therapy-related adverse effects.This study has shown several crucial structural aspects of these inhibitors along with their binding pattern with VEGFR-2.Interestingly, it has been observed that the amine-substituted quinazoline phenyl ring is a positive regulator for good VEGFR-2 binding, required for the high inhibitory potency of these compounds.Similarly, the presence of a higher number of nitrogen atoms can also produce better interactions at the active site of VEGFR-2, and therefore, can pertain to specific and potent VEGFR-2 inhibition.On the other hand, methoxy groups were suggested as a detrimental feature of these compounds for VEGFR-2 inhibition.The thiourea function was found to have similar functions to that of the urea moiety of sorafenib for effective VEGFR-2 inhibition.Additionally, the hydrazide function has been suggested as a potential factor for VEGFR-2 inhibition.The MD simulation study of sorafenib and thiourea derivatives has also described the significance of urea and thiourea moiety binding at VEGFR-2 active site that can be utilized further in the future to design molecules with greater binding stability and better VEGFR-2 selectivity.Therefore, the findings of this current study of these thiourea-based VEGFR-2 inhibitors can be useful for the identification, design and development of newer VEGFR-2 inhibitors for further refinement of VEGFR-2 inhibitors for better therapeutic efficacy.

Figure 1 .
Figure 1.VEGF inhibitors as drug candidates available in market.

Figure 2 .
Figure 2. The observed versus predicted VEGFR-2 inhibitory activity for the training set and the test set as obtained from the (A) PLS, (B) HQSAR model.

Figure 4 .
Figure 4. Higher active dataset compounds with the potential structural alerts obtained from the Active Ruleset of the SARpy analysis.

Figure 6 .
Figure 6.The binding site of VEGFR-2 inhibitors (PDB ID: 4ASD) in term of (A) hydrogen bond donor-acceptor, (B) hydrophobicity, (C) atomic charge and (D) solvent accessibility with important amino acid residues.

Table 1 .
Marketed VEGF inhibitors and their common, less common and rare side effects.

Table 2 .
The summary of the HQSAR models with different atom count.