Identification of structural requirements and prediction of inhibitory activity of natural flavonoids against Zika virus through molecular docking and Monte Carlo based QSAR Simulation

Abstract There has been growing interest in the research of flavonoids due to their potential antiviral activities. Recently, some natural flavonoids have shown potential inhibitory activity against zika virus NS3–NS2B protease. In order to accelerate the drug discovery efforts using flavonoids, a Monte Carlo simulation-based QSAR method has been applied to find out the structural requirements for the inhibitory activity. The best QSAR model was obtained using SMILES descriptors and HSG descriptors with 1EC connectivity with the following statistical parameters: R 2 = 0.9569 and Q 2 = 0.9050 for the test set. The best model was further utilised for the prediction of inhibitory activity of some other natural flavonoids. Four flavonoids (amentoflavone, fisetin, isorhamnetin and theaflavin-3-gallate) have shown higher predicted inhibitory activity and further validated by performing docking analysis. This study may help in understanding and performing natural flavonoids-based drug discovery against zika virus.


Introduction
Zika virus (ZIKV) comprises genus flavivirus, a mosquito-borne single-stranded positive sense RNA similar to arboviruses. Zika virus including dengue virus, yellow fever virus, west nile virus and Japanese encephalitis virus belong to the family flaviviridae. It causes defects in brain and other severe neurological disorders like meningitis, meiningoencephilitis and Gillain-Barre syndrome. According to WHO, zika virus is the second-ranked serious and explosive viral disease (Qin and Shi 2014). Some studies demonstrate that zika virus infections can affect the development of neural precursor cell in animal studies (Fauci and Morens 2016). There is no anti-viral drug or any vaccine against zika virus available till date. As development of vaccine against ZIKV is a difficult task, the development of inhibitors of zika virus is important (Zhang et al. 2016).
The genome of zika virus is firstly translated into single-stranded RNA polyprotein precursor and it is divided in three structural proteins, i.e. C, M and E proteins and seven non-structural (NS) proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5) by viral proteases and host (Stephen et al. 2016). The NS1 protein of flavivirus is responsible for pathogenesis by interacting with immune proteins and thus, it is an efficient target for antiviral drugs. NS2A is a trans-membrane protein which inhibits the induction of interferon (Qamar et al. 2014;Leung et al. 2008), whereas NS2B is a cofactor of NS3 viral protease. NS3 protein is multi-domain protein and it is secured at C-terminal binding through membrane NS4A protein. NS5 protein is known to be a viral RNA-dependent RNA polymerase in replication of genome (Luo et al. 2015). The association of NS3 protein with NS2B integral membrane protein plays multiple roles in the virus life cycle; hence, it could be a good target for the discovery of antiviral drug. In our current studies, some flavonoids having zika inhibitory activities, published by Lim et al. (2017), were taken as a data-set for the Monte Carlo-based QSAR simulation for identification of structural requirements for zika inhibitory activity. The Monte Carlo QSAR simulation is an attractive approach in drug design (Toropov et al. 2005;Toropova et al. 2011) and applied successfully in modelling of HIV-1 integrase inhibitors (Veselinovic et al. 2014), HIV-1 protease inhibitor ), hepatitis C virus NS5B polymerase inhibitors (Worachartcheewan et al. 2015), phosphodiesterase 1 inhibitors , etc.

Results and discussion
The data-set comprising of 22 compounds (Supplementary Figure S1) was taken for analysis by Monte Carlo method and biological activity (logarithm of % inhibition) taken as endpoint in this process (Supplementary Table S1). Whole of the data-set divided into three types of sets, i.e. test, training and calibration sets containing 6, 10 and 6 compounds, respectively. Models were prepared with three random splits containing same number of test, training and calibration compounds as mentioned above which are shown in Table S2.

Model development
The QSAR models were first generated on training set for all the splits using different combination of SMILES-and GRAPH-based descriptors (HFG: hydrogen filled graph; HSG: hydrogen suppressed graph; GAO: graph of atomic orbitals). The most predictive combinations of threshold (T) and Nepoch (N) for the splits were performed from values 0-5 (for threshold); 0-40 (for Nepoch). Seven types of models for each split were generated with different combination of descriptors and total 21 models were generated using the three splits. Table S3 indicates the various models produced with different statistical qualities and the hybrid models were found to be better as compare to SMILES-based descriptors. The best model for each split was selected on the basis of internal predictability, i.e. higher value of R 2 and Q 2 for test set (Amin et al. 2016b).
The best model (SMILES descriptors and HSG descriptors with EC1 connectivity) with split1 is represented as follows: (training set: R 2 = 0.9503, Q 2 = 0.9186, s = 0.074, MAE = 0.055, F = 153, calibration set: In order to check the lucky and unlucky splits, two another splits were also generated as discussed above and models were developed with the same process used for split1. The best model (SMILES descriptors and HFG descriptors with EC0 connectivity) obtained with split2 is as following: where, R 2 is the correlation coefficient, Q 2 is the leave-one-out cross-validated correlation coefficient; s is the standard error of estimation; MAE is mean absolute error and F is Fischer ratio.
Nevertheless, model of split1 developed with SMILES descriptors and HSG descriptors with 1 EC connectivity was selected as a best model among all the models by considering different statistical parameters. This model showed the highest value of internal parameters (R 2 = 0.9569 and Q 2 = 0.9050) and also the value of Rm 2 av was found to be 0.7471 indicating its statistical robustness. Therefore, split1 model was found to be the best model and a detailed discussion about the interpretation of the model has been below.

Interpretation of the model and identification of structural features
The correlation weights (CW) of structural features/attributes (SAk) calculated using CORAL both for SMILES-and graph-based descriptors can be used for classification of these features according to their values from three probes of best and appropriate Monte Carlo model. These features could be divided into three categories: promoters of zika inhibitory activity (features with stable positive values), promoters of decrease or hinderers of zika inhibitory activity (features with stable negative values) and unstable features which are having both the positive and negative values for correlation weights and it has undefined role. The list of all structural attributes (SAks) with correlation weights in the three runs is shown in Table  S5 for the best model which is for split1.
According to presented results for split1 (the best among all the models) graph-based SAk which having stable positive values and therefore classified as promoters of zika inhibitory activity are EC1-C…5…, E C1-C…6…, EC1-C…9…, EC1-o…6… and EC1-O…3…. These SAks are showing the presence of sp 2 and sp 3 oxygen bonded with carbon. Some of the SMILES-based descriptors have stable positive values and therefore considered as promoters of inhibitory activity are c……….. (presence of sp 2 carbon), =………. (presence of double bond), =…2……. (presence of double bond with two rings), c…2…=… (presence of sp 2 carbon within a ring system), c…c…(…, c…(…c…, (…c…(… (presence of sp 2 carbons with branching), c…1…c…, c…c…1… (presence of sp 2 carbon with ring system) etc. The structural feature, o……….. (Figure 1) is very crucial in enhancing the zika inhibitory activity as the higher active compounds 1, 3, 11, 12 have this feature in their structures, whereas the compounds 19, 20, 21 and 22 are less active due to the absence of the feature. The higher active compounds 1, 3, 6 and 7 contain structural attribute, O…=…2…(oxygen doubly bonded to ring) and thus, O…=…2… is another important feature for the zika inhibitory activity in case of flavonoids. Higher occurrence of structural attributes like O…(…1…, (…O…(…, O…(…O… (presence of oxygen with branching) are also crucial for the activity. The global SMILES attribute like NOSP01000000 index (the presence of only oxygen atom, while nitrogen, sulphur and phosphorus are absent) is considered also promoter of zika inhibitory activity. This is evidenced by the least active compound 21 which has nitrogen in its structure. HALO00000000 index showing that presence of halogen is not necessary for required activity. Another important SAk, ATOMPAIR index, ++++O -B2== is showing the pairing of oxygen atom with double bond is important for the activity. On the other side, there are some SAks with stable negative values called as hinderers of zika inhibitory activity are c…1…(…, c…(…1. These attributes indicate that the presence of sp 2 carbon in a one ring system with branching is detrimental for the activity. This is indicated by the less active compounds 19, 20, 22 which have only phenyl ring in their structures. The glycone moiety is detrimental for the zika inhibitory activity as o…c…(… has stable negative values in the three runs of Monte Carlo optimization. Thus, the above-mentioned structural features as promoters/hinderers are important determinants for the zika inhibitory activity in case of flavonoids ( Figure 1).

Prediction of activity of some other natural flavonoids
Another set of 22 natural flavonoids ( Figure S3) was selected for the prediction of their activity against zika virus using the Monte Carlo-based QSAR model. Those selected structures were converted into SMILES format. We have used the best three models for the prediction of inhibitory activity of selected compounds. The predicted activities of those compounds are shown in Table S6.
From the presented data it can be observed that there is slight deviation in the predicted activities of those compounds with respect to three best models. Therefore, the average of all the three activities was taken for further processing. Among the 22 compounds the four compounds were selected on the basis of higher predicted activity from the models (predicted activity > 1.5 in Table S6) and these compounds were amentoflavone (cpd 2), fisetin (cpd 14), isorhamnetin (cpd 15) and theaflavin-3-gallate (cpd 22). The compound theaflavin-3-gallate (cpd 22) is the best predicted flavonoid against zika virus (predicted activity 2.045) as a number of structural features/promoters of anti zika activity are observed in its structure ( Figure S4). The four flavonoids were used for the docking analysis for further validation in the next step.

Docking simulation
The results from the docking runs of four compounds are summarised in Figure S5. To validate the docking analysis, the co-crystalised ligand was redocked and compared with its inbound form with protein (Amin et al. 2016a). The redocked form of the ligand is overlapped with the inbound form ( Figure S6). Among the selected four compounds, compound 2 (Amentoflavone) and 22 (Theaflavin-3-gallate) have shown higher binding affinities from docking analysis. It is interesting to note that all the selected compounds have better binding affinities than redocked co-crystallised ligand and the important binding residues are shown in Table S7. The compound 2 has highest binding affinity −9.0 kcal/mol. The TYR161 residue of NS3 protein forms three stable π-π interactions with the chromene and phenyl part of the compound. Compound 2 also makes hydrogen bonds with LYS54 and SER135 residues of NS3 protein. The ALA132 residue of NS3 protein constitutes sigma-π interaction with the phenyl ring of the compound. These observations also suggest that presence of hydroxyl groups and ring system may increase significant binding of the molecule with the NS2B-NS3 protease of the zika virus. Compounds 14 and 15 showed binding affinities −7.7 and 7.4 kcal/ mol, respectively. SER135 and TYR161 of NS3 protein are the important binding residues for the H-bond and π-π interactions, respectively, for the compound 14. HIS51 and GLY151 are forming H-bond interactions in case of compound 15. Compound 22 has significant higher binding affinity −8.4 kcal/mol and the 2D visualisation of the drug receptor interaction ( Figure S7) suggests hydrogen bond interactions with ASP129 (NS3 protein), ASP83 (NS2B protein), ASN152 (NS3 protein), GLY153 (NS3 protein), PHE84 (NS2B protein) and VAL36 (NS3 protein) amino acid residues. These observations suggest that the presence of hydroxyl groups is suitable on these sites for better binding affinity. The chromene parts of the compound also make important cation-π as well as π-π interaction with LYS54 and TYR161, respectively, of NS3 protein.

Conclusions
Molecular modelling can play an important role in drug discovery from natural products (Jasamai et al. 2015;Sosa et al. 2016;Ahmed et al. 2017). Moreover, natural compounds have immense potential to provide useful lead information for the drug discovery. Therefore, in the present communication we have applied Monte Carlo-based QSAR simulation on some natural flavonoids for their zika virus inhibitory activities. The obtained models were statistically significant, well validated and can be used to predict zika virus inhibitory activity of other flavonoids. Natural flavonoids like amentoflavone, fisetin, isorhamnetin and theaflavin-3-gallate showed interesting virus inhibition from the modelling study. Further, the compounds also showed good binding affinities against the NS2B-NS3 protease protein in docking simulation. This information could be used to design new compounds with better inhibitory activity against zika virus in future.