Virtual screening of epalrestat mimicking selective ALR2 inhibitors from natural product database: auto pharmacophore, ADMET prediction and molecular dynamics approach

Abstract Epalrestat is the only effective aldose reductase (ALR2) inhibitor available in the market for the treatment of diabetic neuropathy. Clinical effectiveness of epalrestat in diabetic neuropathy encouraged us to develop some more ALR2 inhibitors with a better therapeutic profile. Herein, we utilized the pharmacophoric features of epalrestat to search some novel ALR2 inhibitors from an InterBioScreen database of natural compounds. ADME and PAINS filters were applied to provide drug-likeness and to remove toxicophores from the screened hits. The pharmacophoric features of 4-hydroxy-2-nonenal (HNE), a well-known substrate of ALR1, were also explored to identify selective ALR2 inhibitors. The structure-based analysis was then adopted to find out the molecules showing interactions with ALR2 which are crucial for their therapeutic activity. These interaction patterns and binding modes were compared with that of epalrestat. Molecular dynamics (MD) analysis was also carried out to get more insight into the interactions of screened hits in the catalytic domain of ALR2. Additionally, the top hits were docked and simulated with aldehyde reductase (ALR1) to determine their selectivity for ALR2 over ALR1. Overall, five hits including STOCKIN-44771, STOCKIN-46041, STOCKIN-59369, STOCKIN-69620 and STOCKIN-88220 were found to possess a good therapeutic profile in terms of key interactions, binding energies and drug-likeness. Two hits, STOCKIN-46041 and STOCKIN-59369, were identified as the most selective ALR2 inhibitors when assessed their selectivity profile. Communicated by Ramaswamy H. Sarma


Introduction
Diabetes mellitus (DM) is an extremely prevalent endocrine disease that is associated with various microvascular and macrovascular complications (Association Diabetes Association, 2010, 2018UK Prospective Diabetes Study Group, 1998). Among these complications, retinopathy, neuropathy, and nephropathy have affected the patient's quality of life to a major extent (Seferovic et al., 2018;Volpe et al., 2018). These complications have complex etiology and progress through multiple pathways. Polyol pathway is one of these pathways in which ALR2 is a rate-limiting enzyme, belongs to the oxidoreductase family and is the most extensively explored therapeutic target for the treatment of diabetes-associated secondary complications (Brownlee, 2005;Costantino et al., 2000). Epalrestat is the only effective ALR2 inhibitor available in the market for the treatment of diabetic neuropathy (Choudhary & Silakari, 2019;Miyamoto, 2002;Vyas et al., 2018). It is an acetic acid derivative which under hyperglycemic conditions reduces the accumulation of intracellular sorbitol and also reported to improve the nerve conduction velocity in patients with diabetic neuropathy (Steele et al., 1993). Most of the ALR2 inhibitors reported in the literature are associated with some serious side effects. Although epalrestat has also manifested some side effects including liver dysfunction in clinical trials, it is regarded as the most effective and safe drug as compared to other ALR2 inhibitors (Gabbay, 2004;Hotta et al., 1996;Ramirez & Borja, 2008). Clinical effectiveness and safety of epalrestat over other ALR2 inhibitors were the reason that it was given marketing approval in Japan, India, and China (https://cdscoonline.gov.in/CDSCO/Drugs). Since epalrestat is a potent drug, it is pivotal to understand the structural basis of ALR2 inhibition. The exemplary therapeutic profile of epalrestat encouraged us to find out novel drug molecules mimicking the structural features of eplarestat and having a better pharmacokinetic and pharmacodynamic profile.
The over-activation of ALR2 enzyme in the polyol pathway has also been correlated with an imbalance between NADPH/NADP þ and NADH/NAD þ ratio that subsequently increases the oxidative stress by decreasing the level of reduced glutathione (GSH) (Morsi et al., 2019;Yan, 2018). Therefore, compounds with ALR2 inhibitory activity and antioxidant behavior would be more effective than ALR2 inhibition alone. As evident from the literature, natural compounds have long been reported to have ALR2 inhibitory activity in vitro and most of them are also good antioxidants (de la Fuente et al., 2003;de la Fuente & Manzanaro, 2003;Manzanaro et al., 2006;Quattrini & La Motta, 2019;Wang et al., 2009). Thus, exploring the natural database in search of the novel ALR2 inhibitors may be a fascinating approach that will impart an add on advantage of reducing oxidative stress for the treatment of diabetic complications (DC).
Database mining is one of the important steps in the drug designing process which is adopted by the researchers to find out the information about novel molecules of interest (Cosconati et al., 2009;Rastelli et al., 2002). There are several molecular databases available on the Internet which are easily accessible to researchers (Oloff et al., 2005;Shemetulskis et al., 1995). InterBioScreen database is one the most reliable database of natural compounds which has been used by various research groups (Annapoorani et al., 2012;Bologa et al., 2006;F€ ullbeck et al., 2006;Ma et al., 2011;Steindl & Langer, 2004). In this study, this natural product database was utilized to search for new candidates that are expected to act as potent and selective ALR2 inhibitors with a better therapeutic index. For screening a database, pharmacophoric features of a reference drug need to be identified to include in new hits. Since epalrestat is effective in diabetic neuropathy in clinical studies, this work intended to search for small molecules that demonstrate the pharmacophoric features of epalrestat. Pharmacophore modeling is the most suitable method to execute this plan. Thus, for the logical screening, single ligand-based pharmacophore modeling was adopted for the hierarchical virtual screening followed by ADMET properties calculations. The pharmacophoric features of HNE were also utilized to attain the selectivity of screened molecules for ALR2 over ALR1. Moreover, the top hits were screened through pan assay interference compounds (PAINS) filter to omit the toxicophoric features that may be responsible for the off-target interactions and toxicity. The top hits were then subjected to molecular docking analysis, MD simulations, and free binding energy calculation. The hierarchical virtual screening protocol used for the current study has been represented in Figure 1. This ligand and a structurebased hybrid approach of virtual screening come up with the best hits which may serve as an excellent template for the designing of a series of novel ALR2 inhibitors. These hits can further be experimentally validated for their therapeutic profile using in-vitro and in-vivo models using epalrestat as reference.

Auto pharmacophore generation
The structural features of epalrestat and HNE were exploited using an auto-pharmacophore protocol available with Discovery Studio (DS). The auto-pharmacophore generation method considers hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), hydrophobic (H), negative ionizable (NI), positive ionizable (PI) and aromatic ring (RA) features of the bioactive conformation of a drug to generate a selective pharmacophore model from a single ligand. It enumerates all pharmacophoric features then score all possible combinations of pharmacophores and eventually find out the top pharmacophore. A set of 10 candidate pharmacophore models were generated from the above-mentioned features. The pharmacophore model with the highest selectivity, which was predicted by Genetic Function Approximation (GFA) model, was used for further mapping the ligands from the database using the "ligand pharmacophore mapping" module of DS. The GFA model of selectivity was built from a training set of 500 pharmacophore models (Rogers & Hopfinger, 1994). The highly selective pharmacophore model, which is used for screening, was validated for its ability to pick actives from the database while leaving behind the inactives. Two well-established statistical parameters, i.e. Enrichment factor (EF) and G€ uner Henry (GH) score were calculated (V. K. Vyas et al., 2013). EF measures the ability of pharmacophore to distinguish actives from inactives. If the value of EF is 5, that means it is five times more probable to pick active compounds from the database than the inactive one. EF of less than 1 means the scoring function is worse than random (Bender & Glen, 2005). On the other hand, GH score corresponds to the goodness of the hits by taking into account the 'yield and percentage of the actives' retrieved from the database. Its value lies within 0.1 to 1. The GH score of 1 corresponds to the ideal model (Kurogi & Guner, 2001). The EF and GH scores can be calculated using the following formulas: where H t represents the number of hit molecules screened out of the selected dataset of decoys, H a is the number of active molecules in the screened hit list, D corresponds to the number of total molecules in the dataset and A is the number of actives molecules spiked in the dataset. The autopharmacophore models were further validated by a receiver operator characteristic curve (ROC) analysis. For this purpose, a data set including active and inactive compounds was prepared. The ROC curve analysis describes the sensitivity (true positive rate, Se) for any possible change in the number of selected compounds as 1-Sp (specificity, which is defined as the true negative rate) (Triballeau et al., 2005).
In the ROC analysis, true positive (TP) refers to the active compounds which are truly identified as active in the data set. Similarly, true negative (TN) refers to the inactive compounds which are detected as inactive by the pharmacophore. On the other hand, false-positive (FP) are those inactive molecules that are captured as active while false negative (FN) are those active molecules that are detected as inactive by the pharmacophore. A model that detects less FP and FN is considered an ideal model, and selectivity and specificity are the ratios of these.

Pharmacophore mapping and screening of InterBioScreen database
The present study involves the single ligand-based pharmacophore guided screening for the identification of new ALR2 inhibitors from the InterBioScreen database. The InterBioScreen database (www.ibscreen.com) is a big database of 68,373 molecules from natural origin and one of the most popular libraries used by pharmaceutical companies and academics for high throughput screening (Bologa et al., 2006;Ma et al., 2011;Steindl & Langer, 2004;Whittle et al., 2003). This database was screened by ligand-pharmacophore mapping option available in DS which considers the spatial distribution of all pharmacophoric sites of the submitted pharmacophore model and overlapping of the important structural features present in the hits required for the activity.

ADMET predictions
Pharmacokinetic properties including Absorption, Distribution, Metabolism and Excretion (ADME) of drug molecule can be determined in terms of a number of physicochemical parameters such as molecular weight (MW), molecular polar surface area (MPSA), aqueous solubility (logS), lipophilicity (log P), distribution coefficient (LogD), number of hydrogen bond donor (HBD), hydrogen bond acceptor (HBA), rotatable bonds (RB), blood-brain barrier (BBB) penetration, CYPP2D6 binding, hepatotoxicity, human intestinal absorption (HIA) and plasma protein binding (PPB). These parameters are crucial in determining the capability of a drug molecule to reach the therapeutic concentration at the site of action.
Besides, some of the properties of a drug molecule may also be responsible for their toxicity. Therefore, ADME properties in combination with in-silico models for toxicity prediction were also employed in order to assess the overall ADMET profile of filtered hits. ADMET profiling of a total of 1297 screened hits was done using ADMET descriptors and toxicity prediction extensible algorithms available with DS. Different in-silico models such as Ames mutagenicity, FDA rodent carcinogenicity, carcinogenic potency TD50, rat maximum tolerated dose, skin irritancy, ocular irritancy model were used to predict the toxicity profile of these hits. The top hits were also screened through the PAINS filter to omit such substructures that may be responsible for the off-target interactions. PAINS are the compounds that give false-positive results in the virtual screening. Such false-positive molecules were removed by using freely available webserver PAINS-Remover (Baell & Holloway, 2010).

Molecular docking
Molecular docking is an integral part of the drug designing process. Hence, the docking analysis was carried out to find out the best orientation and key interactions between ligand and receptor. The hits obtained after applying various ADMET filtering rules were subjected to structure-based drug designing (SBDD) such as molecular docking, in order to find out the important interactions of obtained hits with the catalytic domain of ALR2. A grid-based molecular docking study was performed using CDOCKER, MD simulated-annealingbased algorithm, available with Discovery Studio (DS) that operates on Chemistry at Harvard Macromolecular Mechanics (CHARMM) force-field (Gagnon et al., 2016;Wu et al., 2003). Since the selectivity of ALR2 inhibitors over ALR1 remains one of the daunting hurdles, the selectivity profile of the best hits was also assessed by docking them with ALR1. To carry out docking analysis, the protein structures were downloaded from the Protein Data Bank (PDB). In the case of ALR2, the crystal structure of human ALR2 cocrystallized with LDT (PDB id: 1US0, resolution 0.66 Å) (Howard et al., 2004) was selected. However, the human ALR1 crystal structure complexed with ligand and co-factor NADP þ was not available, therefore porcine ALR1 (share 94.15% sequence identity with homo sapiens) complexed with inhibitor fidarestat and cofactor NADP þ (PDB id: 3H4G, at 1.85 Å) was selected (El-Kabbani et al., 2005). The 3D crystal structure of porcine ALR1 (PDB id: 3H4G) aligned well over the Homo sapiens ALR1 (PDB id: 2ALR) with an RMSD of 2.12 Å and displayed 94.2% identity. The similarity search data for the same is provided as Figure S1, see supplementary data). For molecular docking, the receptor was kept rigid while the ligands were held flexible in the refinement process. The prior knowledge of binding sites was extracted from the interactions of co-crystallized ligands (LDT in ALR2 and fidarestat in ALR1). Accordingly, the binding sites were defined using define and edit binding site option available in macromolecules window of DS by keeping the SBD site sphere radius of 10 Å. All unnecessary ligands and het atoms were deleted after defining the binding site. Random conformations of ligands were generated from the initial ligand structures by high-temperature MD at 1000 K, followed by random rotations. The random conformations were refined by grid-based simulated annealing and a final full force field minimization. The molecular docking simulations were executed opting the simulated annealing as True while keeping the other options as default. The top 10 poses were saved for comparison and knowledge-based analysis. Finally, the poses with the lowest CDOCKER interaction energy were used for the subsequent MD simulations.

Molecular dynamics (MD) simulations and MM-GBSA calculation
The best hits retrieved after molecular docking were submitted to MD simulations for a period of 50 ns each using the "Dynamics (NAMD)" protocol, an extension of Nanoscale Molecular Dynamics (NAMD), available with DS (Phillips et al., 2005). NAMD is a CHARMM force-field based parallel molecular program designed for high performance simulation of large biomolecular systems. This protocol is typically run using an equilibrated input structure, therefore, the protein-ligand complexes were minimized and equilibrated using 'Standard Dynamics Cascade' protocol of DS. Solvated system was generated with TIP3P (Transferable intermolecular potential with three points) water model with an octahedral water box and equilibrated for 450 ps using CHARMM at physiological pH. The pH of the system was adjusted by adding Na þ ions and concentration of the salt was fixed to 0.15 M. The MD simulation was carried out using the NPT (constant particle number, pressure, and temperature) ensemble at a time step of 2-fs together with shake algorithm. The temperature was maintained at 310 K by using Langevin dynamics and the pressure was maintained at 1.01325 bar using Nose-Hoover Langevin piston (Hoover, 1985;Nos e & Klein, 1983). The CHARMM restart files generated from standard dynamics cascade were used for actual simulation. Simulated trajectories were investigated for the residue wise root mean square deviation (RMSD), the radius of gyration (Rg), potential energies (PE) and root mean square fluctuation (RMSF), etc., in order to analyze the changes in the structural and dynamic behavior of the protein.
Further, the ligand-binding energies were calculated by using MM-GBSA algorithm. MM-GBSA is a method to calculate the free binding energy of a ligand to its protein and is calculated in terms of the MM-GBSA score (Genheden & Ryde, 2015). The MM-GBSA binding energy is calculated in kcal/mol by using the equation: where ER: EL, EL, and ER are the prime energies of the optimized complex, free ligand and free receptor, respectively (Genheden & Ryde, 2015).

Results and discussion
Any drug molecule shows its biological response in two phases; pharmacokinetic (PK) and pharmacodynamic (PD) phase. This whole study includes in-silico analysis of epalrestat based putative ALR2 inhibitors by considering various molecular events involved in both the PK and PD phases. Insilico PD analysis includes pharmacophore modeling followed by virtual screening for lead identification and docking followed by MD to understand the molecular interactions of the hits with ALR2 to determine the potency as well as with ALR1 to assess the selectivity. In-silico PK analysis includes detailed ADMET profiling of retrieved hits considering various elementary rules and tools available in DS and freely available web-server ADMETlab.

Pharmacophore based virtual screening
The biological activity of a compound not only depends upon the physical properties but also on the 3D conformation of a molecule. Therefore, the bioactive conformation of epalrestat was extracted from X-ray crystal structure available in the PDB database with entry code 4JIR (Zhang et al., 2013) and pharmacophore models were generated based on its main pharmacophoric features using the auto pharmacophore generation method. For this purpose, the minimum feature distance was kept as 2.5, maximum features as 6 and maximum pharmacophores as 10. The autopharmacophore generation protocol recognized total 14 pharmacophoric features (9 Hydrogen bond acceptor, 2 Hydrophobic, 1 Negative ionizable, 2 Aromatic rings) from MiniMaybridge database that mapped well with the pharmacophoric features of epalrestat resulted in the 10 best pharmacophore models from overall 256 candidate hypotheses (Table S1, see supplementary data). Among them, a highly selective model (pharmacophore 1) that aligned well with epalrestat ( Figure 2) and demonstrated 3 Hydrogen bond acceptor (HBA), 1 Aromatic ring (RA), 1 Hydrophobic (H), and 1 Negative ionizable (NI) was selected and validated to determine its ability to pick actives from a given database of unknown molecules, in terms of GH and EF scores. For this purpose, a dataset of known actives and decoys was obtained from DUD.E (A database of useful decoys: Enhanced) database. This database of decoys was utilized to validate the pharmacophore model and consisted of a total of 9356 molecules (D), out of which, 220 molecules (A) were known actives against ALR2 while rest of the molecules were either moderately active or inactive. The virtual screening of this database using the pharmacophore 1 model resulted in 272 molecules as a hit (H t ) out of which, 45 molecules (H a ) were known actives. The EF and GH scores were found to be 7 and 0.85, respectively, that signifies the good quality of the generated model and its capability to preferentially pull out the active molecules from a database. The total 1297 natural product based putative ALR2 inhibitors were retrieved from the InterBioScreen database (containing 68373 molecules) after screening with the duly validated pharmacophore model. These molecules were further analyzed to redeem potential lead molecules. Mapped poses of the five most active hits obtained from the InterBioScreen database using pharmacophore model 1 are shown in Figure 3.
On the other hand, to screen out the selective ALR2 inhibitors, a separate auto-pharmacophore model of ALR1 was generated by using the bioactive conformation of its wellknown substrate HNE. The coordinates of bioactive conformation of the HNE were extracted from the docked complex of HNE with 3H4G. This auto-pharmacophore generation protocol recognized a total of 17 pharmacophoric features (9HBA, 6HBD, 2H) from MiniMaybridge database that mapped well with the pharmacophoric features of HNE and resulted in the four best pharmacophore models from overall 326 candidate hypotheses (Table S1, see supplementary data). Among total four hypotheses generated for HNE, topranked pharmacophore model ( Figure S2) that displayed 1HBA, 1HBD and 1H was selected and validated to determine its ability to pick actives from a given database of unknown molecules, in terms of GH and EF scores. The virtual screening of this database using this pharmacophore model resulted in 105 molecules as a hit (H t ) out of which, 81 molecules (H a ) were known actives. The EF and GH scores were found to be 2.33 and 0.69, respectively, that signifies the good quality of the generated model and its capability to preferentially pull out the active molecules from a database.
Moreover, ROC curve was also generated for both the autopharmacophore models of epalrestat as well as HNE ( Figure S3). ROC analysis shows how accurately a pharmacophore model picks true positive from a pool of true positive and false positive. Sensitivity (also called true positive rate) and specificity (also called true negative rate) are general indices to show the predictive power of a validated model. The ROC curve is a graphical representation of the false positive rate (1-specificity) and the true positive rate (sensitivity). Top-ranked autopharmacophores of both epalrestat and HNE showed an accuracy rate of 0.836 and 0.784, respectively, with good sensitivity and specificity.

ADMET predictions
The oral bioavailability is one of the most important parameters in the PK phase of drug action. It determines the capability of a drug molecule to reach into the systemic circulation after oral administration. For any new chemical entity (NCE) to be a drug, its PK profile should be promising. The physicochemical properties that govern various PK parameters and the values of different PK parameters must be within an acceptable limit. A total of 1297 hits obtained after pharmacophore-based screening were initially submitted to determine their drug-likeness by following various rules including Lipinski's rules, Veber's rules, Oprea's rules, Ghose's rules and Varma's rules. Among a total of 1297 molecules, only 924 molecules could make it to pass these filters and considered as good drug candidates while the remaining 374 molecules failed in this rule-based filtering process.
After this initial evaluation, 924 molecules that passed these filters were further subjected to in-silico prediction of the complete ADMET profile. The toxicity parameters including human ether-a-go-go-related gene (hERG) blockage, human hepatotoxicity, Ames mutagenicity, skin sensitization, LD 50 of acute toxicity, drug-induced liver injury, etc., were predicted (see supplementary data) through different in-silico models available with DS and ADMETlab. Some common ADMET properties of the top 30 hits are given in Table 1. Almost all the ADMET properties of these hits were found to be under acceptable range and some were even better than the reference drug epalrestat. As mentioned earlier, the short biological half-life (T1/2) of epalrestat is one of the hurdles in its journey to FDA approval. In our obtained hits, we also observed an improvement in the predicted T 1/2 values when compared with epalrestat.
A total of 924 molecules that demonstrated good results in in-silico ADMET analysis were further considered for molecular docking studies followed by MD, to understand crucial interactions of obtained hits with the catalytic domain of ALR2 that are required for its inhibition.

Docking analysis, selectivity profile and knowledgebased virtual screening
The hits obtained after the ADMET predictions were upgraded to molecular docking analysis in order to redeem the active molecules having suitable orientations and key interactions with the catalytic site amino acids of ALR2 enzyme. In this study, virtually screened 924 molecules along with reference drug epalrestat were docked against the ALR2 protein using -CDOCKER module of DS, adopting the protocol provided in the material and method section. Among those 924 molecules, only 801 molecules successfully  docked, and correspondingly 7158 refined poses were generated. The molecules displayed CDOCKER energy and CDOCKER interaction energy comparable or better than epalrestat, i.e. À6.94 kcal/mol and À37.90 kcal/mol, respectively, were considered to select effective hits. As a result of this cut-off, a total of 30 hits that might be superior or equivalent to epalrestat were identified (Table 2). Several studies demonstrate that the scoring functions used in the docking programs are not as good as the MM-GBSA model (Chen et al., 2020;Greenidge et al., 2014). Accordingly, MM-GBSA method can help to identify the best pose. This rescoring method fills a gap in structure-based drug discovery when the best binding pose is not known (Chen et al., 2020). Therefore, MM-GBSA method was used to accurately rank the docked ligands. Furthermore, to avoid the substructures that may be responsible for the off-target interactions, top 30 hits were also screened through PAINS filter by using PAINS-remover webserver. Consequently, a total of 25 molecules could successfully pass this toxicophore filter while the rest of five failed to do so. Among the failed molecules, rhodanine moiety was identified as a common toxicophore that was defined as PAINS by the webserver.
Since selectivity of ALR2 inhibitors over ALR1 remains the main issue, the selectivity profile of all the 30 hits, obtained after docking based screening with ALR2, was also assessed. For this purpose, initially, an auto-pharmacophore of ALR1 that was generated using the bioactive conformation of HNE (ALR1 substrate) was utilized to screen top 30 hits. This screening yielded a total of 12 molecules that were not aligned well over the auto-pharmacophore model of HNE while the remaining 18 mapped well over it. The molecules which did not align well over the pharmacophoric features of HNE were considered as selective ALR2 inhibitor. To confirm their selectivity over ALR1, these 12 hits were docked into the catalytic domain of ALR1 (share 65% homology with ALR2). The same docking protocol was adopted for ALR1 as we adopted in the case of ALR2. Out of 12 hits, 11 molecules approached the active site of ALR1 (Table S2, see supplementary data), whereas, none of the molecule shown interaction with the amino acid residues which are involved in the catalysis process, i.e. Tyr50, His113. The -CDOCKER interaction energy (À46.55 kcal/mol to À28.45 kcal/mol) and MM-GBSA scores (À59.89 kcal/mol to À30.99 kcal/mol) of all the molecules was also found to be poor than the -CDOCKER interaction energy (À47.90 kcal/mol) and MM-GBSA (À65.13 kcal/mol) of epalrestat signifying the poor affinity of these hits for ALR1 enzyme as compared to epalrestat.
Collectively, on the basis of ideal binding mode, highest dock score, key residue interactions, MM-GBSA scores and PAINS filter results, top five hits (STOCKIN-44771, STOCKIN-46041, STOCKIN-59369, STOCKIN-69620, STOCKIN-88220), were further considered for MD analysis. The docked poses of these five hits are provided as supplementary data in Figures S4-S9.
Importantly, the active site of ALR2 is divided into two regions; first is catalytic anion-binding site comprising of Tyr48 and His110 and responsible for normal detoxification of aldehydes; second is the induced cavity region which adopts various conformations according to the nature of the interacting ligand (Kumar et al., 2012). The induced cavity region is highly hydrophobic accommodating nicotinamide ring of NADP þ coenzyme in the center and is responsible for the glucose metabolism. The overall binding domain of ALR2 is composed of aromatic residues (Trp20, Tyr48, Trp79, Trp111, Phe121, Phe122 and Trp219), polar residues (Gln49, His110 and Cys298) and nonpolar residues (Val47, Pro218, Leu300 and Leu301). The specificity pocket of ALR2 is composed of Trp111, Phe122, Leu300, Thr113, Phe115, Val130, Ser302 and Cys303. The residues Trp111, Phe122 and Leu300 are shared by both the active site and specificity pocket of ALR2 (Kumar et al., 2020). Since, ALR2 shows the induced fit phenomenon, essentially, the resulting induced cavity region is selective for ALR2 over its close analogs. To differentially inhibit the catalytic activity of ALR2 for glucose and leaving unaltered the detoxification site of toxic aldehydes, the inhibitor should interact with the induced fit cavity. For this purpose, molecules should have a hetero-aryl ring system with optimum linker via carbonyl linkage and should not interact with the catalytic residues (Tyr48, His110) involved in the detoxification process (Kumar et al., 2012). These types of structural features were also found in our top 30 hits. Interestingly, features such as chromenone-7-one/furochromenone-7one, carboxylate group, and cyclic imide were common in four of the total five top hits, while, thiazolidinedione and indole rings were observed in the fifth molecule (Figure 4).
Since epalrestat binds to the catalytic residues (His110, Tyr48) of ALR2, it hampers the detoxification process that further leads to side effects. However, our top five hits did not show any interaction with these catalytic residues and therefore it can be inferred that they should not interfere with the detoxification process. Rather, they were found to interact with the induced fit cavity (specificity pocket) which suggests the differential inhibitor potential of these molecules.
The docking interactions pattern of top hits exhibited that the carboxylate group present in almost all molecules act as a donor group and form hydrogen bond with Val47 (apolar residue) and Thr113 (polar residue) of specificity pocket of ALR2. On the other hand, the phenyl ring and chromenone-7-one/furochromenone-7one ring present in obtained hits interacts via p-p stacking with Trp111and Phe122 (aromatic residues) of hydrophobic pocket. Moreover, the oxygen of chromenone-7-one serves as an acceptor group and forms a hydrogen bond with Trp20. Few of these molecules were found to have a high affinity toward the induced cavity region without any interaction with the residues Tyr48 and His110 which are involved in detoxification process. For these hits, the CDOCKER interaction energy (À50.64 kcal/mol to À42.47 kcal mol À1 ) and binding affinity (À85.79 to À65.14 kcal mol À1 ) were in good range and even better than the CDOCKER interaction energy (À37.90 kcal/mol) and binding affinity (À33.95 kcal/mol) of epalrestat.

Molecular dynamics (MD) simulations
MD simulation is an important tool for understanding the internal motion and resulting conformational changes in the   protein-ligand complex, that play an important role in eliciting biological activity (Karplus & McCammon, 2002). Additionally, the stability of the docked complex can also be assessed by running MD for a particular simulation period. In the present study, MD based stability analysis of most active hits in catalytic domains of both ALR1 and ALR2 proteins (docked protein-ligand complexes) was done for a period of 50 ns and compared with reference drug epalrestat.  On the basis of RMSD profile obtained after 50 ns simulation period (Figure 5), it was observed that the average RMSD of final hits were comparable to the reference drug epalrestat (1.66 Å) and recorded as 1.48 Å, 1.63 Å, 1.29 Å, 1.29 Å and 1.58 Å corresponding to STOCKIN-44771, STOCKIN-46041, STOCKIN-59369, STOCKIN-69620 and STOCKIN-88220, respectively.
As it is well known that the ALR2 has induced-fit functionality; the active site is flexible and able to accommodate substrates with diverse structures. Therefore, in order to get an idea about the compactness of the ALR2 protein, Rg was also determined.
The Rg of a group of atoms can be defined as the root mean square distance of each atom from their centroid and can be calculated as follow: where ri and rj are the position vectors of atoms i and j and N is the number of atoms. Rg is an indicator of compactness of protein and concerned with whether the secondary structures are compactly packed into a 3D structure of a protein or expand during MD simulation (Huang & Powers, 2001). The lowest value of Rg corresponds to the tightly-packed complex and vice-versa.
The Rg plot obtained after MD simulations demonstrated that the ALR2 backbone was highly compact and stable throughout the simulation period, displaying no change in the folding of protein while complexes with obtained hits ( Figure 5). These complexes were also found to be stable when investigated in terms of their potential energies that lied within the range of À106336.372 to À106553.480 kcal/ mol and was comparable to epalrestat (À106,530.455 kcal/ mol). Overall, MD results demonstrated that the protein-ligand complexes were stable throughout the simulation period with little aberrations. Finally, the MM-GBSA scores were calculated again for the simulated complexes to see any changes in the binding affinities after simulation. We observed an improvement in the binding affinities of these complex, i.e. À65.14 kcal/mol to À85.79 kcal/mol (after docking) and À78.41 kcal/mol to À94.82 kcal/mol (after dynamics). The RMSD, Rg, PE and MM/GBSA scores obtained after the MD simulation of top hits are given in Table 3.

Insights into the binding mode of epalrestat and final hits with ALR2
Further, in order to get insight into the binding mode of final hits, the last 2 ns trajectory was extracted which suggests that the final hits retained crucial interactions with some additional interactions with the specificity pocket of ALR2. The histogram representing interaction fractions of the top two hits along with reference drug epalrestat are shown in Figure S10 (see supplementary data). The simulation interaction diagram suggested that the reference drug Epalrestat interacts with ALR2 protein active site amino acid residues including His110 and Trp111 through hydrogen bonding. However, no contacts were observed with the "specificity" pocket's amino acid residues ( Figure 6). The top hit STOCKIN-44771 retained interaction with Phe122 and achieved maximum interactions with the specificity pocket residues (Trp111, Leu301, Ser302). In this molecule, the chromenone7-one ring formed hydrophobic interaction with Phe122 while the carbonyl group present in this ring interacted through H-bond with Leu301 and Ser302. The carboxylate group of STOCKIN-44771 acts as acceptor group and displayed H-bonding with Trp111 (Figure 7). Another top hit, STOCKIN-46041, shown p-p stacking with Trp111 and H-bond contacts with Trp20 (hydrophobic pocket) and Ser302 (specificity pocket) as shown in Figure 8. On the other hand, STOCKIN-59369 displayed some H-bond     and p-p stacking interaction with Trp20 ( Figure 9). In case of STOCKIN-69620, the furan ring of furochromenone-7one formed H-bond and p-cation interactions with Ser302. The phenyl ring of chromenone-7one displayed p-p stacking with hydrophobic pocket (Trp79). Two H-bond interactions were also observed between carbonyl groups and Gln49 ( Figure 10). Finally, in STOCKIN-88220, the carbonyl group of thiazolidinedione ring bound through H-bond with two of the specificity pocket residues, i.e. Thr113 and Ala299 ( Figure  11). The contact summary and RMSD plots of epalrestat and top hits are shown in Figures 6-11.

Insights into the binding mode of epalrestat and final hits with ALR1
Similarly, to determine the stability of docked complexes with ALR1 protein, the top five molecules were subjected to MD simulation for a period of 50 ns (see supplementary data, Figures S11-S15). Molecular docking and MD results suggested that none of the selected hits shows interactions with the conserved amino acids of ALR1 and complexes were stable throughout the simulation period of 50 ns. Further, the MM-GBSA score signifies that all the hits have a lower binding affinity (À56.02 to À39.62 kcal/mol) with ALR1 enzyme when compared with epalrestat (À56.89 kcal/mol). The simulation events and quality analysis results are summarized in Table 4. Simulation interaction diagram of these hits demonstrated the H-bond (Trp22, Trp114, Lys127, Arg312, Met302) and hydrophobic interactions (Trp22, Trp220) with the amino acid residues of ALR1. In a study performed by Kabbani et al., it was reported that the non-conserved C-terminal loop amino acid residue Arg312 significantly reduces the binding of inhibitors to ALR1 while the mutation in this amino acid increases the ALR1 inhibitory activity. In this context, the H-bond interactions of STOCKIN-46041 and STOCKIN-88220 with Arg312 may improve the selectivity of these molecules for ALR2. Moreover, the carboxylate function that interacts with the anionic binding site of both ALR1/ ALR2 enzymes is a common trait required for their inhibition. Accordingly, the carboxylate group of only STOCKIN-46041  was not found to interact with any of the anionic binding site amino acids of ALR1 even after the MD simulation of 50 ns. Contrarily, in case of ALR2, the carboxylate group of this molecule displayed interactions with the anionic binding site amino acid residue Trp20. Essentially, the probable reason for different binding modes of the same molecule might be different orientations of these molecules within the active sites of ALR1 and ALR2. Although ALR1 and ALR2 share 65% homology, they are not identical and which is why these molecules may have attained different orientations. Since STOCKIN-46041 displayed crucial interactions required for the selective inhibition and STOCKIN-59369 could not successfully docked within the active site of ALR2, these molecules may be regarded as selective inhibitors of ALR2. These observations indicate that STOCKIN-46041 and STOCKIN-59369 may serve as selective ALR2 inhibitors without hampering the detoxification process mediated by ALR1 and hence can be considered as safer therapy for DC.
Lee and colleagues reported that for a compound to be selective ALR2 inhibitor, it should have certain structural requirements, i.e. a primary lipophilic region, a carbonyl group and an acetamide group coplanar to the primary lipophilic moiety that can interact with specificity pocket. The hits obtained in this study possess all these features as described in Figure 12 and hence, can be expected to be selective inhibitors of ALR2.

Conclusion
The present investigation is simply an epalrestat guided search for selective ALR2 inhibitors from a database of natural compounds, by utilizing in-silico tools including pharmacophore-based screening, ADMET profiling, molecular docking, MD, etc. These tools were successfully implemented for the identification of the ideal binding mode of novel molecules which are required for the ALR2 inhibitory activity of new molecules and their selectivity over ALR1. Structural features like thiazolidinedione, chromenone-7-one/furochromenone-7one, proline, indole, carboxylate group and cyclic imide were found common in the top hits. On the flip side, the rhodanine ring present in epalrestat was identified as a toxicophore. For the selective inhibition of ALR2 enzyme, features like carbonyl group, lipophilic group and acetamide group were found crucial. These structural features displayed key interactions with the non-conserved C-terminal loop amino acid residues of the ALR2 enzyme. Briefly, the results of present investigation indicate that the molecules from a natural database, which were retaining the pharmacophoric features of epalrestat but not toxicophores, have the potential to specifically inhibit the ALR2 enzyme. Overall, based on ideal binding modes, binding affinities and key interactions, five molecules were identified as potential ALR2 inhibitors and were having drug-like properties. Among these, STOCKIN-46041 and STOCKIN-59369 demonstrated a good selectivity profile that can further be optimized for the treatment of DC such as neuropathy, nephropathy and retinopathy. This study may open the avenue for the researchers to develop selective and safer ALR2 inhibitors with a better therapeutic profile than the currently available therapies of DC.