An in silico molecular modeling approach of halolactone derivatives as potential inhibitors for human immunodeficiency virus type-1 reverse transcriptase enzyme

Abstract Acquired Immune Deficiency Syndrome (AIDS) is an infectious disease caused by Human Immunodeficiency Virus (HIV) infection and its replication requires the Reverse Transcriptase (RT) enzyme. RT plays a key role in the HIV life cycle, making it one of the most important targets for designing new drugs. Thus, in order to increase therapeutic options against AIDS, halolactone derivatives (D-halolactone) that have been showed as potential non-nucleoside inhibitors of the RT enzyme were studied. In the present work, a series of D-halolactone were investigated by molecular modeling studies, combining Three-dimensional Quantitative Structure-Activity Relationship (3 D-QSAR), molecular docking and Molecular Dynamics (MD) techniques, to understand the molecular characteristics that promote biological activity. The internal and external validation parameters indicated that the 3 D-QSAR model has good predictive capacity and statistical significance. Contour maps provided useful information on the structural characteristics of compounds for anti-HIV-1 activity. The docking results showed that D-halolactone present good complementarity by the RT allosteric site. In MD simulations it was observed that the formation of enzyme-ligand complexes were favorable, and from the free energy decomposition it was found that Leu100, Val106, Tyr181, Try188, and Trp229 are key residues for stabilization in the enzymatic site. Thus, the results showed that the proposed models can be used to design promising HIV-1 RT inhibitors. Communicated by Ramaswamy H. Sarma


Introduction
Acquired Immune Deficiency Syndrome (AIDS) is a high risk disease for health, resulting from Human Immunodeficiency Virus (HIV) infection, which is responsible for the progressive decline of immune function and the consequent low immunity allowing the appearance of opportunistic diseases such as tuberculosis, pneumonia and various forms of cancer (Shi et al., 2019). There are two types of the virus, HIV-1 and HIV-2, wherein HIV-1 is the main causative agent of worldwide spread and has the most aggressive form of the disease (Guo et al., 2018;Mahdi et al., 2018). According to the data reported by UNAIDS, it is estimated that more than 38 million people are infected with HIV, associated with 1.5 million new cases and 690 thousands AIDS-related deaths in 2019 (UNAIDS., 2021).
Replication of the HIV-1 virus requires catalysis of the Reverse Transcriptase (RT) enzyme in the step of reverse transcription of the RNA viral genome into a double copy of DNA, which is then integrated into the host cell chromosome to form a new virus (Tekeste et al., 2015). Thus, the key role of the enzyme in the virus life cycle makes RT a promising target for designing new drugs with anti-HIV-1 activity . RT is characterized by being an asymmetric heterodimer provided with two subunits 66 kDa (p66) and 51 kDa (p51), in which p66 subunit is constituted by the polymerase (440 amino acids) and RNaseH (120 amino acids) domains, while the subunit p51 is identical to the first 440 residues of the p66 region (Sarafianos et al., 2009). Study with the crystallographic structure of HIV-1 RT revealed that the p66 subunit resembles a right hand provided with the subdomains fingers, palm, thumb and connection (Kohlstaedt et al., 1992). Despite the domains similarity in p66 and p51, the spatial arrangements differ, as the palm of the p66 subunit contains the polymerase active site and the allosteric site, while p51 mainly performs the structural function (Jacobo-Molina et al., 1993;Sarafianos et al., 2004).
Therapeutic agents in clinical use against RT are divided into Nucleoside Reverse Transcriptase Inhibitors (NRTIs) and Non-Nucleoside Reverse Transcriptase Inhibitors (NNRTIs) classes (Luo et al., 2019). NRTIs are competitive inhibitors, which in their triphosphate form are capable of preventing viral DNA synthesis at the polymerase site. NNRTIs are hydrophobic compounds that do not require intracellular metabolism to exert anti-HIV-1 activity, performing non-competitive inhibition at the lipophilic allosteric site (Dodda et al., 2019;Li et al., 2012). Bindings with NNRTIs restrict the flexibility of the enzyme and modify the geometry of the active site, which interferes with the interaction affinity of RT by the nucleotide substrate, thus rendering impossible the chemical stage of DNA polymerization (Chiang et al., 2018;Wang et al., 2019).
Among NNRTI inhibitors, the Efavirenz (EFV) is used as the first-line anti-HIV-1 drug most commonly recommended by the World Health Organization (WHO), due to its potency against HIV-1 and greater resistance to mutations when compared to first-generation drugs such as Nevirapine and Delavirdine (Namasivayam et al., 2019). Despite its action against the disease, antiretroviral therapy used against HIV has several disadvantages such as toxicity, drug interactions and long-term therapy (Soto et al., 2019). The emergence of spontaneous mutations in RT genome results in the development of new resistant variants, representing a major obstacle to the effective action of inhibitors (Kamil et al., 2018). Therefore, it is essential to develop new NNRTIs with anti-HIV-1 activity (Yang et al., 2016).
Lactones consist of a compound group that are found in marine organisms such as algae, sponges and corals, in addition to plants (Adekenov, 2017). Compounds comprising a lactone ring exhibit diverse biological properties such as anti-inflammatory (Abe et al., 2015), antimicrobial (Flamm et al., 2017), anticancer (Wyre R bska et al., 2013) and antiviral activities . In the search for new anti-HIV-1 agents, lactones demonstrate a potent inhibitory effect against the enzymes responsible for the replication of the HIV-1 virus (Trova et al., 1994). Molecules comprising halide groups in their structure are of great pharmaceutical interest, especially halogenated lactones (Grabarczyk et al., 2019). According to Han et al. (Han et al., 2015) lactones have a molecular structure similar to EFV and halolactone derivatives (D-halolactone) have shown activity against HIV-1.
Computer-Aided Drug Design (CADD) includes a variety of different techniques, such as Quantitative Structure-Activity Relationship (QSAR), molecular docking and Molecular Dynamics (MD), which are applied in medicinal chemistry, as they allow the successful identification of promising new drugs, combined with cost reduction (Poli et al., 2020). QSAR study explores the correlation between biological activity and the molecular structure of bioactive compounds and molecular docking is one of the most used methods for virtual screening based on receptor structure, due to its ability to predict the conformation of a molecule in the active binding site (Honarparvar et al., 2014). MD is the method that simulates the dynamic behavior of a molecule or a system of molecules over time, commonly used to estimate the stability of proposed ligands in the docking step (Liu et al., 2018).
In this study, a set of 26 D-halolactone was selected for Three-dimensional Quantitative Structure-Activity Relationship (3 D-QSAR) study, molecular docking and MD, in order to understand the three-dimensional structure-activity and binding mechanism relationship of inhibitors in the active site of the enzyme, and thus contribute to the development of new derivatives of the halolactone class with potential anti-HIV-1 activity.

Construction and division of the set of molecules
For the study, 26 D-halolactone compounds synthesized by Han et al. (Han et al., 2015) were selected, which determined the activities expressed in the compound concentration necessary to promote 50% of biological activity (EC 50 ). EC 50 values of molecules set were converted into values corresponding to the logarithmic scale (pEC 50 ¼ -logEC 50 ), as shown in Table 1. The two-dimensional structures of the compounds were built in the MarvinSketch program (Chemaxon, 2020), and later optimized in the Avogadro program (Hanwell et al., 2012) with the molecular mechanics force field MMFF94 (Halgren, 1996). Hierarchical Cluster Analysis (HCA) was used to divide the compounds into training and test sets, determining the Euclidean distance and the Ward Linkage method (Ward, 1963), using the MINITAB 14 software.

Alignment of the set of molecules and calculation of the interaction fields
In the 3 D-QSAR method, Molecular Interaction Fields (MIF's) were analyzed as independent variables to calculate the relationship between tridimensional structure and biological activity, while pEC 50 was used as a dependent variable in the model, wherein the study was carried out in the Open3DQSAR program (Tosco & Balle, 2011). Open3DQSAR is intended for the exploration of pharmacophores by highthroughput chemometric analysis of MIF's, which has shown results comparable to Comparative Molecular Field Analysis/ Comparative Molecular Similarity Indices Analysis (CoMFA/ CoMSIA) (Ghasemi & Shiri, 2012). Alignments between D-halolactone were performed using the Open3DALIGN program, which applied method consisted of multiple overlaps of three-dimensional structures in order to define each molecule of the set as a possible reference, capable of combining the pharmacophoric elements of the others (Leach et al., 2010). The overlapped molecules were arranged in a 3 D grid box, with regular grid points distributed at a distance of 1 Å from each other, 5 Å apart from the compounds. Thus, their respective scores were calculated to assess the quality of the pharmacophores overlapping in common. Subsequently, the MIF's were obtained by calculating the interaction between molecules and virtual probes. The stereochemical field was based on the AMBERFF99 van der Waals parameters , and calculated by the Lennard-Jones potential between the n molecule atoms and a carbon probe sp 3 , while the electrostatic field was based on the point charge model, and was calculated by Coulomb interactions between a positively charged probe and the n atoms (Tosco & Balle, 2011). In order to exclude non-informative variables, which only add noise and negatively condition the model, the elimination of non-correlated variables with biological activity was applied. Data removal occurred in cases where grid values were close to zero, when grid points were above or below the established threshold , grid points that exceeded the cutoff point, points with a standard deviation greater than 0.1, and finally, the N-level variables capable of adding noise to the model were eliminated.

3D-QSAR studies
The Partial Least-Squares (PLS) method was used in order to generate reliable 3 D-QSAR models, and thus correlate pEC 50 values with the corresponding MIF's. PLS is efficient for dealing with experimental noise and collinearities, in addition to being robust due to its parameters that practically do not change with the inclusion of new samples in the calibration set (Ferreira et al., 1999). The Fractional Factorial Design (FFD) and Uninformative Variable Elimination by PLS (UVE-PLS) were applied by the Open3DQSAR program, associated with PLS, to obtain more robust regressions. FFD method has as its main advantage the selection of informative variables, in addition to the ability to operate with individual variables or groups; while the UVE-PLS removes the least informative variables called PLS pseudo-coefficients (Sepehri et al., 2018).

Validation of 3D-QSAR models
The validation process is a critical task of any QSAR model development protocol, as it is necessary to describe its explanatory and predictive capacity and the internal and external validation methods can help to validate the proposed QSAR model (Abdolmaleki et al., 2017). The internal validation of the model was statistically based on parameters such as the coefficient of determination (R 2 ), which demonstrates the explanatory capacity and the F test at the 95% confidence level, which is the ratio of the variance explained by the model and the variance due to the error in the regression. High values of F indicate that the mathematical model has statistical significance (Souza et al., 2021). The predictive potential of the QSAR model was analyzed using cross-validation with the techniques, represented by the respective Q 2 LOO and Q 2 LMO coefficients (Arba et al., 2018). External validation was used in order to assess the predictive capacity of the QSAR model developed for applying prediction of activity values of the test set, being characterized by the correlation coefficient R 2 pred (Muhammad et al., 2018) .

Molecular docking
Molecular docking simulations were carried out using the Genetic Optimization for Ligand Docking (GOLD) program version 5.5 (Jones et al., 1997), based on the crystallographic model of the HIV-1 RT enzyme, obtained from Protein Data Bank (PDB ID: 3DRP), with a resolution of 2.6 Å (Tucker et al., 2008). GOLD 5.5 takes into account the flexibility of the molecule and amino acid residues at the receptor site and shows to be a successful tool in the virtual screening, optimization and identification of the compounds binding mode (Atanasova et al., 2015). To carry out the docking, the protocol of removing water molecules and adding hydrogen atoms in the structure of the RT enzyme was followed. Then, the binding site in the region comprising the amino acid residues at a distance of 10 Å from the crystallographic inhibitor (3-f5-[(6-amino-1H-pyrazolo[3,4-b]pyridin-3-yl)methoxy]-2-chlorophenoxyg-5 chlorobenzonitrile) (R8E) was defined (Tucker et al., 2008). In re-docking, the Root Mean Table 1. Two-dimensional representation of the halolactone scaffold composed of R 1 , R 2 and R 3 radicals and the anti-HIV-1 activity values expressed as EC 50 and pEC 50 for the 26 D-halolactone. Square Deviation (RMSD) was calculated using Fconv program (Neudert & Klebe, 2011). Subsequently, 26 D-halolactone were docked in the allosteric cavity and their conformations were classified using the GoldScore scoring function, which has shown better fit compared to the other GOLD functions (Verdonk et al., 2003). Furthermore, the Poseview online server (Stierand & Rarey, 2014) was used to verify the interactions between the enzyme and the ligands.

Molecular dynamics
The selected compounds from the docking had their charges calculated by the Gaussian 03 program (Frisch et al., 2003), applying Restricted Electrostatic Potential (RESP) (Bayly et al., 1993) and the Hartree-Fock method together with the base (6-31 G(d,p)) (Hariharan & Pople, 1973). Taking into account that pKa values of ionizable groups can be changed by the protein environment, an assignment of the protonation states of the amino acid residues under pH 7 condition by the Propka program was performed (Li et al., 2005). MD simulations were performed in the AMBER 18 program, implemented by the pmemd.CUDA module (Salomon-Ferrer et al., 2013). In the AMBER 18 package, GAFF (Wang et al., 2004) and MMFF99SB (Hornak et al., 2006) force fields were used for the parameterization of ligands and amino acid residues of enzyme, respectively. The complexes were prepared using the tleap program included in the AMBER program package. A cubic box with an edge equal to 12 Å with TIP3P water molecules (Jorgensen et al., 1983) was created for each system, ensuring that all atoms are within the established limit, with inclusion of Clcounter ions to make the system electrically neutral.
The systems were submitted to four energy minimization steps, in the first step 25000 minimization steps were carried out, divided into 10000 steps performed with the steepest descent method and the remaining 15000 with the conjugate gradient method. The next three steps were performed in 10000 steps of each minimization, using both mentioned methods. After the minimizations, systems were gradually heated using the Langevin algorithm (Loncharich et al., 1992) with atoms constrained by a constant force equal to 25 kcal/mol.Å 2 , to raise the temperature from 0 K to 298 K in the NVT set. Minimizations and heating were carried out with SANDER module (Case et al., 2005). Periodic boundary conditions using the Particle Mesh Ewald (PME) method (Darden et al., 1993) were employed for long-range electrostatic interactions. The cutoff distances for the long-range and van der Waals interactions were set to 9 Å. The systems were balanced in the NPT set at 1 atm, at a temperature of 298 K. During the MD simulations, all bonds with hydrogen atoms were constrained using the SHAKE algorithm (Ryckaert et al., 1977), and the motion equations were integrated every 2 fs using the Verlet algorithm (Verlet, 1968). Thus, the simulation of the enzyme-binding complexes was carried out in a time of 100 ns using the methods performed in the NPT set at a temperature of 298 K. Furthermore, the RMSD values were used to determine the average deviation resulting from the enzyme conformational variations in the apo and holo form. The B-factor was applied to identify which amino acid residues are more flexible in the enzyme, and to observe how the addition of D-halolactone influences its fluctuation. Both RMSD and B-factor structural analyzes were performed by the CPPTRAJ (Roe & Cheatham, 2013) module of AMBERTOOLS 18.

Binding free energy calculation
Binding free energy is an important factor for evaluating the binding affinity attributed to the biological activity exerted by a molecule in a dynamical system (Kostjukov & Evstigneev, 2019). In total, 1200 frames of the last 10 ns of the MD path were extracted to calculate the binding free energy (DG bind ), using the CPPTRAJ and MMPBSA.py (Miller et al., 2012) modules of AMBERTOOLS 18. Thus, the calculations were performed using the Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA)  and Mechanics Generalized Born Surface Area (MMGBSA) methods , contained in the AMBERTOOLS 18 package. The binding free energy of the complexes was determined according to the equations: In Equation 1, DH is the enthalpy term, while the terms T.DS represent the product between the absolute temperature and the entropy resulting from the conformations obtained in the dynamic simulation, DE MM consists of the energy obtained by molecular mechanics and DG bind, solv is the free energy from solvation. Entropy contributions to energy results from changes in translation, rotation, and vibration. In Equation 2, the mechanical energy comprising the individual contributions of internal energy (DE int ), electrostatic (E ele ) and van der Waals (DE vdW ) is calculated. Equation 3 contains the sum that makes up the internal energy, being the contributions of the bond length (DE bond ), bond angles (DE angle ) and the torsion angles (DE torsion ). In Equation 4, the solvation free energy (DG bind, solv ) results from the sum of the polar (DG PB/GB ) and non-polar (DG n-polar ) contributions. The polar electrostatic contribution to the solvation free energy can be calculated by the Poisson-Boltzmann (PB) method or by the generalized Born (GB) approximation methods, while the non-polar energy is estimated by the product of the surface tension c with value equal to 0.0072 kcal/mol.Å 2 , and solvent accessible surface area (SASA), according to Equation 5.
The interaction free energy obtained by the MMGBSA method in the last 10 ns of simulation was decomposed according to Equation 6, and it allows determining the individual contribution of all residues to the free energy of the complex and elucidating their importance in the active site of the enzyme (Alves et al., 2020). The main residues that contribute to the total energy were visualized using the CHEWD plugin (Raza et al., 2019) by the Chimera program (Pettersen et al., 2004). The van der Waals interactions (DG vdW ) and electrostatic (DG ele ) between the D-halolactone and the amino acid residues of the RT enzyme were determined by the SANDER module present in the AMBER 18 software. Both DG PB and DG GB were calculated using the generalized Born model, igb ¼ 2 (Onufriev et al., 2000) and in the MMPBSA method the dielectric constants of solute and solvent were considered 1 and 80, respectively (Sun et al., 2014).

Data set and alignment procedure
From the HCA method, the test set (30% of the compounds), selected randomly, was formed by compounds 3, 8, 9, 12, 18, 20 and 25 (Table 1), while the other 19 were used in the training set (70% of the compounds). In the alignment, the best result was obtained when the other 25 derivatives were aligned with reference to compound 17 (Figure 1), which enabled the three-dimensional arrangement for combining the pharmacophoric points, necessary for performing MIF's calculations.

Molecular modeling and evaluation for 3D-QSAR
3D-QSAR models were built with 1 to 5 PCs using the PLS and linear regression combined with the two variable selection methods FFD and UVE-PLS. The results of the statistical parameters (Table 2) show that the QSAR model with the best predictive capacity was obtained with the regression associated with the FFD method with one principal component, which coefficient of determination R 2 was 0.76 and the calculated test F (51.45) was five times greater than the tabulated critical value F tab(5.13) ¼3.03, F calc(5.13) /F tab(5.13) ¼ 16.9. These results characterize that the developed model is acceptable and has statistical significance (Frimayanti et al., 2011).
The cross-validation coefficients using the Leave-One-Out Q 2 LOO ¼0.57 and Leave-Many-Out Q 2 LMO ¼0.55 methods,  indicate that the QSAR model can be considered reliable and with good predictive capacity for biological activity, as its values are higher than those recommended in the literature (Golbraikh & Tropsha, 2002). For the external validation performed with the test set, it is observed that the coefficient of determination value of the cross validation R 2 pred (Table 2) is greater than 0.6, which demonstrates that the model has good predictive ability (Zhao et al., 2020). The developed model did not present overfitting of the data, as the difference between the R 2 e Q 2 LOO (0.198) coefficients is within the range suggested in the literature (Kiralj & Ferreira, 2009).
In the linear scatter plot between values of experimental pEC 50 as a function of the predicted (Figure 2A), it is highlighted that the points are well distributed in relation to the adjusted line, with some of them being located on or close to the line, demonstrating that the predicted activity presents good agreement with the experimental values. Another highlight is that all points are within the bounded area of the prediction range (dashed green lines). Furthermore, through Figure 2B it is observed that the model presented low values of residual activity. Thus, it is verified that the results of the 3 D-QSAR model fits properly to the data set and has the ability to predict the anti-HIV-1 activity of D-halolactone compounds. Figure 3 shows the two derivatives that were selected, the most active (Cp1) and the least active (Cp21) compound, in order to demonstrate the structure-activity relationship of Dhalolactone based on the analysis of the contour maps. The maps were built from the best predictive model, where the stereochemically favorable regions are represented by the color green, and the unfavorable ones in yellow. In Figure 3A, it is verified that the group o-Cl-Ph present in the R 2 substituent of Cp1 (Table 1) is inserted in the green region, which suggests that the addition of bulky groups in this position induces an increase in the value of anti-HIV-1 activity. Likewise, Cp2-Cp5 derivatives have a phenyl group in R 2 and have high pEC 50 values against the RT enzyme (Table 1). Cp21 derivative has only the ethyl group as a substituent in R 2 ( Figure 3B), which does not fit into this region and therefore does not contribute to biological activity, being the same observed for other molecules, such as Cp6-Cp7, Cp9-Cp19 and Cp22-Cp26 that present intermediate and low activity values, as they exhibit small bulky radicals in this position.

3D-QSAR contour map analysis
In addition, it appears that the hydrogen atoms in the R3 position of the compound Cp1 is not close to the yellow outline ( Figure 3C). Furthermore, it is observed that the compounds Cp2-Cp5 have only hydrogen atoms in R 3 and likewise are not subject to unfavorable effects. In turn, the Cp21 derivative ( Figure 3D) presents p-F-Ph of the R 3 radical inserted in the yellow region, which demonstrates that the addition of groups in R 3 should be avoided when designing new D-halolactone, since bulky substituents in this position cause a decrease in biological activity. It is important to highlight that Cp9-Cp26 derivatives show less inhibitory potential against HIV-1 RT due to the presence of the groups p-Me-Ph, p-F-Ph, p-Cl-Ph and p-Br-Ph em R 3 . In addition, the yellow outline near to R 1 (benzene ring of the halolactone scaffold) indicates that the chemical group substitutions in this position are sterically unfavorable for biological activity. Therefore, it is noted that the results obtained by the stereochemical maps are in accordance with the experimental studies of Han et al (Han et al., 2015), in which it was verified that the substitution of phenyl groups in the R 2 position promotes a better inhibition of RT than its analogues, and defines R 1 and R 3 as positions that influence the decrease ofthe anti-HIV-1 activity.
The blue outline of the electrostatic map represents the positions for replacement of positively charged groups or hydrogen binding donors, while the red maps indicate regions for replacement of negatively charged groups or hydrogen binding acceptors, both contours being favorable to biological activity (Figure 4). The blue outline in contact with meta position of the R 2 radical in Cp1 ( Figure 4A) suggests that the positive substitution could favor the formation of hydrogen bonds with the allosteric residues. The red outline of the R 2 substituent of compound Cp1 ( Figure 4C) indicated that the addition of hydrogen bond acceptor groups is favorable to raising the anti-HIV-1 inhibitory potential. Contributions to the compound Cp21 are linked to the R 3 radical ( Figure 4B and D), however, the possible additions to the phenyl ring of this radical are unfavorable according to the stereochemical maps.

Molecular docking
To assess whether the docking parameters of the GOLD 5.5 program are satisfactory to describe the macromolecular target and the ligand conformation, re-docking was performed using the R8E ligand crystallographic structure of the HIV-1 RT enzyme (PDB code: 3DRP). The RMSD value between the experimental and docked R8E using the Fconv program  (Neudert & Klebe, 2011) was 0.318 Å. According to Ram ırez and Caballero, RMSD values less than or equal to 2 Å ensure that the chosen scoring function adequately reproduces the ligand orientation (Ram ırez & Caballero, 2018). Thus, the GOLD program showed good ability to predict the orientation of the crystallographic ligand in the active site of RT. Table 3 shows the docking results with the interactions (hydrophobic and p-p) and the GoldScore function values for the compounds Cp1 and Cp21 that were selected in the analysis of the 3 D-QSAR model. It is observed that the Cp1 derivative had a higher GoldScore value (57.76) when compared to Cp21 (53.92), which is in accordance with the experimental data obtained by Han et al. (Han et al., 2015).
When analyzing the results obtained by compound Cp1 (Table 3) it is observed that the substituent o-Cl-Ph at position R 2 performed p-p interaction with amino acid residues Try188 and Trp229, in addition to hydrophobic contacts with residues Val106, Try181, Try188 and Leu234, while the benzene ring present in the halolactone scaffold structure interacted via p-p interaction with Try181. Regarding compound Cp21, the replacement of the p-F-Ph group in the R 3 radical obtained hydrophobic interactions with residues Val106, Try188, Phe227 and Leu234, while the interaction via p-p of the molecule occurred due to the halolactone scaffold. The residues identified in the docking are located in the RT allosteric site, constituted by Leu100, Lys101, Lys103, Val106, Thr107, Val108, Val179, Tyr181, Tyr188, Val189, Gly190, Phe227, Trp229, Leu234, Pro236, His235 and Tyr318 (Namasivayam et al., 2019). Thus, it finds that this site is predominantly hydrophobic, and the substituents that promote interactions of this nature contribute to the molecular recognition process (Mao et al., 2012). Furthermore, Cp1 derivative stands out for having a greater number of p-p aromatic interactions, which play a crucial role in the binding of NNRTIs to the RT enzyme site through residues Tyr181, Tyr188 and Trp229 (   When analyzing the results of the docking with those obtained by the 3 D-QSAR model, it is verified that the green region ( Figure 3A) for the compound Cp1 consists of enzyme lipophilic site, where interactions of the substituent R 2 with Val106, Try181, Try188, Thr229 and Leu234 were identified. For Cp21 ( Figure 3D) it is highlighted that the yellow outline indicates that the bulky group in R 3 is unfavorable, as it explores a region farther away from the site, which made p-p interactions with the allosteric amino acid residues impossible. In Figure 4 (A and C) it is observed that the addition of groups in the R 2 position improves the electrostatic properties (positive and negative) of D-halolactone, which can make possible the formation of hydrogen bonds with Tyr188 (Singh et al., 2021), and thus result in improved biological activity due to the increased bindings strength between the compounds and the RT enzyme. Therefore, molecular docking confirmed the predictive capacity of the 3 D-QSAR model and contour maps based on pharmacophoric points of D-halolactone.

Conformational stability (RMSD)
The docked structures of the Cp1 and Cp21 compounds were selected for MD simulations over 100 ns, and the RMSD values were calculated in order to determine the deviation of the main protein chain atoms in relation to the initial enzyme structure, in order to investigate the structural stability of enzyme-binding complexes. As shown in Figure 5, it was observed that after the start of the simulation process, the RMSD values of the free RT (apo enzyme) were fluctuated (1 to 6 Å) up to equilibration. After equilibration (60 ns), the apo enzyme gets stabilized, and this trend continued up to 100 ns ( Figure S1 -supplementary material). Thus, it is observed that the results as a function of RMSD values suggest that large deviations are characteristic for this target (Ivetac & McCammon, 2009).
Compared to the deviation observed for the apo enzyme, the mean variation of RMSD in the RT-Cp1 system was more stable, this improvement being also reported in simulations with NNRTIs inhibitors (Ivetac & McCammon, 2009). As illustrated in Figure S2 (supplementary material), the RMSD of the ligand Cp1 (highlighted in green) shows a conformational switching during the simulation, where the values are are less than 2.0 Å. Thus, this results indicated that the molecule did not make a great rotational change.
For the RT-Cp21 complex there was a relatively larger deviation with a gradual increase up to 3 Å in 4 ns, with successive variations from 3 Å to 7 Å up to 10 ns, followed by the maximum peak of 7.9 Å in the range of 19 ns to 20 ns, but after 60 ns the complex maintaining at a level of 4.6 Å. However, the Cp21 ligand reaching RMSD values lower than 2 Å, suggesting that ones remained stably bound to the protein throughout 100 ns MD simulation process. Thus, MD simulation showed that the performance of this compound occurs as a result of the greater instability of the RT-Cp21 complex, which is directly related to the absence of bulky substituent groups in R 2 capable of conferring p-p interactions in the lipophilic region, as demonstrated by 3 D-QSAR model and molecular docking.

Residue flexibility analysis (B-factor)
In order to evaluate the effects of D-halolactone on the enzyme the calculation of the RT B-factor in the holo (RT-Cp1 and RT-Cp21) form was performed, to analyze the structural conformation changes and flexibility of each residue along the protein chain. The polymerase domain of the p66 subunit is provided with four subdomains: fingers (residues 1-85 and 118-155), palm (86-117 and 156-236), thumb (237-318) and connection (319-426); while the RNaseH domain comprises residue 427 to position 560 (Sarafianos et al., 2009). As seen in Figure 6A, the amino acid sequences in the polymerase domain that showed the highest B-factor values in the study systems were Thr7-Gly18 (Loop A), Lys22-Lys32  The results indicated that the fingers and thumb subdomains are the most flexible in the polymerase, where A to F Loops make up the fingers region (except for the Loop E located at the end of fingers and beginning of the palm), while the G and H Loops are located on the thumb. In accordance with the studies by Hosseini et al. (2016), the data obtained confirm that the fingers region is more flexible than the thumb region.
The highly conserved amino acid regions in the finger and palm subdomains, together with the two thumb a-helices, act as a clamp for positioning the primer in the primer grip next to the polymerase active site (Gutierrez-Rivas & Menendez-Arias, 2001). The terminal 3 0 -OH group of the primer is inserted close to the catalytic triad, and then positioned for nucleophilic attack to the a-phosphate of a nucleoside triphosphate, which results in the incorporation of nucleotides into the growing strand of DNA in the polymerization (Jacobo-Molina et al., 1993;Yang et al., 2017). Its placement is also important for RNA-DNA substrate cleavage in the RNaseH domain of RT (Sarafianos et al., 2004). Furthermore, the nucleotide binding at the active site also induces conformational changes in the enzyme, especially in the finger and palm regions (Li et al., 2000). To evaluate the flexibility of the complexes, fluctuations of each amino acid residue over ther simulation time in the RT allosteric site are show in Figure 6B. From this, it is observed that Lys103, Val106, Thr107, Val108, Val179, Try188, Val189, Gly190, Pro236 and Tyr318 residues showed lower values to RT-Cp1, while Leu100, Lys101, Phe227, Trp229, Leu234 are lower to Cp21, besides the Tyr181 and His235 exhibited similar fluctuations for both compounds, where a small value indicates stability and also the presence of secondary structures such as sheets and helices. Thus, the fluctuation data corroborate with the analyses of the RMSD where the RT-Cp1 is more stable than RT-Cp21 complexes.

Binding free energy calculation (DG bind )
The results of Table 4 show that the DG bind values for the RT-Cp1 and RT-Cp21 complexes were À36.04 kcal/mol and À32.64 kcal/mol, respectively, by the MMGBSA method, while for MMPBSA the values were À30.20 kcal/mol and À29.0 kcal/ mol for the respective systems complexed by Cp1 and Cp21, which demonstrates that the two methods predict that the formation of enzyme-binding complexes are favorable and stable. In both systems, the largest contributions to total free energy were the van der Waals energy (DE vdW ), electrostatic energy (DE ele ) and non-polar solvation energy (DE n-polar, GB/ DE n-polar, PB ), while the polar solvation energy (DE GB/ DE PB ) and total solvation free energy (DG solv, GB /DG solv, PB ) showed to be unfavorable. These results indicate that non-polar solvation energy together with van der Waals and electrostatic interactions act as determining factors for the affinity of Cp1 and Cp21 compounds for the active site of the RT enzyme. It is noteworthy that the van der Waals energy (DE vdW ) has a dominant effect in the binding process of compounds at the active site of RT, demonstrating that hydrophobic interactions play a key role in the binding affinity and stabilization of CP1 and Cp21 ( Figure S3 -supplementary material). It indicates that the optimization of van der Waals interactions between D-halolactone and the allosteric site may help to develop more potent inhibitors against the RT enzyme.

Decomposition of the binding free energy and analysis of the contributions
The residue decomposition method was used to identify the RT enzyme residues involved in the interactions with the inhibitors, as well as to quantify the contribution of each amino acid to the total free energy value obtained by the MMGBSA method. Here, any residue that contributes to the binding free energy values below À1.0 kcal/mol À1 was considered an important residue to the binding process. The interaction plots of the RT-Cp1 and RT-Cp21 complexes are shown in Figure 7. The residue contributions to binding energies using the plugin CHEWD (Raza et al., 2019) with Chimera software (Pettersen et al., 2004) are shown on the right side of Figure 7. From this, the favourable interactions are at the blue of the colour scale, that is, those that contribute to the stabilization of ligand into complex, while the red colour represents interaction by residue with positive values and corresponds to unfavourable interaction values. It is noted in Figure 7A and B that residues Leu100, Val106, Tyr181, Tyr188 and Trp229 contributed favorably to the total binding energy in both systems, and the most of these residues had already been found in molecular docking results. Furthermore, there were no significant repulsive forces in the complexes. The R2 radical resulted in a greater number of interactions for the Cp1 compound, which stands out for its attractive contributions with His235 and Try318, while the substituent in R3 reduced the possibility of binding to the Dhalolactone Cp21. The interactions occurred entirely at the enzyme's allosteric site (Costa et al., 2019), and the key residues Leu100, Val106, Tyr188, Phe227, Trp229, His235 and Ty318 were important in stabilizing the Cp1 ligand and its consequent biological activity. Thus, the combined study of different CADD techniques provided important information for the development of Dhalolactone with greater inhibitory potential against the HIV-1 RT enzyme.

Conclusion
3D-QSAR study for a set of 26 D-halolactone as HIV-1 RT inhibitors was performed using a PLS regression analysis, in which internal and external validations demonstrated that the 3 D-QSAR model is reliable, statistically significant and has good predictive ability. The contour maps of the stereochemical and electrostatic fields showed that the modifications in the R 2 radical of the compounds are favorable for increasing the anti-HIV-1 activity.
The binding mode between selected inhibitors and the RT enzyme was determined by docking, where amino acid residues Val106, Tyr181, Tyr188, Trp229 and Leu234 were identified to be involved in the molecular recognition of Cp1 and Cp21 compounds at the allosteric site, where Cp1 is highlighted by the largest number of p-p interactions by the R 2 radical. Furthermore, the molecular docking corroborated the results obtained in the contour maps, thus confirming the predictive capacity of the 3 D-QSAR model.
Subsequently, MD simulations demonstrated that Cp1 compound is more stable than Cp21 from RMSD values, as well as exhibited smaller fluctuations in the fingers and thumb subdomains and in the RNaseH domain by the B-factor calculations, that is due to structural movements characteristic of this class of enzyme. The binding free energy confirmed the favorable formation of the complexes, with the van der Waals energy playing a crucial role in binding affinity and should be exploited to enable the improvement of the anti-HIV-1 activity. The free energy decomposition revealed that Leu100, Val106, Try188, Phe227, Trp229, His235 and Try318 are key residues for stabilization of the more active derivative Cp1. Overall, the results obtained provided useful information for understanding the mechanisms of action and can be used in designing new HIV-1 RT inhibitors. On the right are shown in blue the residues that contributed the most to the binding free energy, visualized by the CHEWD plugin in the Chimera program.