Ligand-based virtual screening, molecular dynamics, and biological evaluation of repurposed drugs as inhibitors of Trypanosoma cruzi proteasome

Abstract Chagas disease is a well-known Neglected Tropical Disease, mostly endemic in continental Latin America, but that has spread to North America and Europe. Unfortunately, current treatments against such disease are ineffective and produce known and undesirable side effects. To find novel effective drug candidates to treat Chagas disease, we uniquely explore the Trypanosoma cruzi proteasome as a recent biological target and, also, apply drug repurposing through different computational methodologies. For this, we initially applied protein homology modeling to build a robust model of proteasome β4/β5 subunits, since there is no crystallographic structure of this target. Then, we used it on a drug repurposing via a virtual screening campaign starting with more than 8,000 drugs and including the methodologies: ligand-based similarity, toxicity predictions, and molecular docking. Three drugs were selected concerning their favorable interactions at the protein binding site and subsequently submitted to molecular dynamics simulations, which allowed us to elucidate their behavior and compare such theoretical results with experimental ones, obtained in biological assays also described in this paper. Communicated by Ramaswamy H. Sarma


Introduction
Chagas disease (CD), also known as American trypanosomiasis, is an infection caused by the protozoan parasite Trypanosoma cruzi.This disease is classified as a Neglected Tropical Disease and, as its name suggests, it is mostly endemic in continental Latin American countries.But this scenario has been changing over the last decade and nowadays affects populations all over the world, especially in the USA, Canada, many countries in Europe, and some in Africa, Eastern Mediterranean, and Western Pacific (World Health Organization, 2022)-a fact that encourages more research for new effective treatments against this disease.
Currently, the drugs used in the treatment of CD are benznidazole and nifurtimox.Benznidazole had its use approved by the FDA only in 2017, although it was already available in the centers for the control and treatment of diseases in the United States due to the increased number of CD cases in the last decade (Drugs for Neglected Diseases initiative -DNDi., 2017).Nifurtimox was discovered in the 1960s, nevertheless, its effectiveness in different stages of CD is still under discussion.Although benznidazole and nifurtimox are the effective onset of the acute phase, tolerance and serious adverse effects of these drugs are still a problem during respective treatments (Patterson & Wyllie, 2014).For instance: prohibited use by pregnant women and people with kidney or liver failure (World Health Organization, n.d).Particularly, nifurtimox is poorly tolerated among adults in the chronic stage that has a significant risk of serious adverse events, besides being contraindicated for people with a background of neurological or psychiatric disorders (World Health Organization, 2022;Jackson et al., 2010).
In the search for new and selective targets of Trypanosoma cruzi and knowing that it belongs to the kinetoplastid protozoa along with Trypanosoma brucei spp.and Leishmania spp., which have similar biology to each other, but are distinct from other eukaryotic mammalian lineages (El-Sayed et al., 2005).Kinetoplastid proteasome has become an interesting target, especially because that proteasome is a well-known therapeutic target for humans (Adams, 2003).
The proteasome is a multi-subunit complex composed of a catalytic center of cylindrical shape (20S) composed of an outer ring containing seven homologous a subunits, and an inner ring with seven homologous b subunits, where the 20S can be associated with one or two regulatory particles (19S) (Adams, 2003;Tanaka, 2009).The catalytic site of the proteasome is composed of the inner ring b1, b2, and b5 subunits responsible for caspase-like, trypsin-like, and chymotrypsinlike proteolytic activities, respectively, which act in a synergistic way to degrade acid, basic and hydrophobic residues (Tanaka, 2009).When the proteasome is inhibited, its proteolytic activities are disrupted resulting in cell-cycle arrest and apoptosis (Adams, 2003;Tanaka, 2009).
The first study involving kinetoplastid proteasome as a target was published in 2016 by the Genomics Institute of the Novartis Research Foundation (GNF) (Khare et al., 2016).They identified the compound GNF6702 that showed a potency of inhibiting the protozoan chymotrypsin-like activity, without affecting the activity of the human proteasome.In 2019, Wyllie et al. published the first crystallographic structure of Leishmania tarentolae 20S proteasome complexed with compound GSK3494245, which inhibits the chymotrypsin-like activity and presents an IC 50 value of 0.16 lM (Wyllie et al., 2019).In 2020 Nagle et al. published a study with the LXE408 inhibitor, the optimized structure of GNF6702, also complexed with the Leishmania tarentoale proteasome (Nagle et al., 2020).
Based on the promising kinetoplastid proteasome target, we propose the application of the drug repositioning strategy to search new treatments for Chagas disease.For this purpose, different computational strategies were applied, such as protein homology modeling, ligand-based virtual screening, analysis of toxicity predictions, molecular docking, and molecular dynamics.Best virtual hits selected by such workflow were obtained and submitted to experimental validation, i.e. in vitro assays consisting of evaluation of trypanocidal activity against amastigotes forms by IC 50 .Finally, we compared the inhibition potential of the selected drugs with benznidazole and evaluated their possible efficacy in the treatment of Chagas disease.

Proteins homology modeling
For the selection of the sequence of amino acid residues for the target protein, we searched for specific nucleotide sequences of b4/b5 subunits of Trypanosoma cruzi proteasome 20S, considering those available in FASTA format.The b4 subunit of the Trypanosoma cruzi proteasome sequence was retrieved from the UniProt Knowledgebase (Bateman et al., 2021) (UniProtKB-Q4CU77_TRYCC), and the b5 subunit of Trypanosoma cruzi proteasome sequence was retrieved from the National Center for Biotechnology Information (NCBI) (Agarwala et al., 2016) (GenBank: RNC60113.1).
Therefore, we search for homologous protein sequences with experimentally resolved 3D structures using the BLAST (BLAST (Basic Local Alignment Search Tool), n.d) software, which can recognize similarity regions between protein sequences and rank them according to their percentage of similarity.More specifically, we used the BLASTp algorithm (BLAST (Basic Local Alignment Search Tool), 2022) that allows one to choose proteins with maximum sequence identity in comparison to the template protein sequence.The selected protein sequences were submitted to multiple sequence alignment using the Clustal Omega software (Clustal Omega, 2022;Madeira et al., 2022).Results obtained using Clustal Omega were manually refined by structural superposition of the crystallographic templates (deposited in the Protein Data Bank) and used to construct b4/b5 subunits of the proteasome, with a subsequent observation of respective insertion and deletion positions of amino acid residues throughout the sequences (see Scheme 1 and 2 in SI).
For modeling the 3D structures of the proteasome b4/b5 subunits we used the MODELLER software ( � Sali & Blundell, 1993), based on the structures of the template proteins.It optimizes the spatial restraints derived from the alignment of the protein sequences, expressing these results in probability density functions for the restrained characteristics.These restraints are applied to distances between a-carbons, N-O distances from the main chain, and dihedral angles of the main and side chains.In this way, the 3D model of the protein is built with the minimum amount of structure violations regarding these restraints that come from the template proteins ( � Sali & Blundell, 1993).Finally, hydrogen atoms are not present in the final homologous models, so, it was necessary to add them to the structures.For this purpose, we used the Protein Preparation Wizard tool implemented in Maestro software (Madhavi Sastry et al., 2013).

Model validation
First, we used the webserver Saves (UCLA-DOE LAB -SAVES v6.0., 2022) to run the PROCHECK, Verify 3D, and ERRAT software, which is commonly used to validate protein structures.Then, we used the 'Coarse Packing Quality Control' from the What IF (Vriend, 1990) software to evaluate the general quality of models according to the atomic contacts.
PROCHECK is a set/package of software that provides a detailed assessment of the stereochemistry of a protein structure (UCLA- DOE Institute, 2019).We analyzed the Ramachandran plot, which provides measures the percentage of residues that are within favored regions in the plot, besides the five main chain properties of the structure: the planarity of peptide bonds (torsion angles x), the tetrahedral distortion of Ca (torsional angles f, zeta), and the standard deviation of hydrogen bond energies of the main chain (Kabsch & Sander, 1983).The Verify 3D was used to identify the compatibility between the protein sequence and its generated 3D structure.The study of the compatibility between the protein sequence and its 3D structure is based on the separation of amino acid residues in three different environments: total area of the side chain present in the hydrophobic cavities of the protein, fraction of the area of the side chain exposed to solvents (or other polar molecules), and the local secondary structure (Bowie et al., 1991;L€ uthy et al., 1992).ERRAT evaluates the chemical environment of the protein through the combination of three different atoms that represent non-covalent interactions with each other: (CC), (CN), (CO), (NN), (NO), (OO).The interactions between these pairs of atoms should obey the following restrictions: atoms must be at a distance smaller than 3.5 Å, and atoms covalently bonded to each other or belonging to the same residue are not considered.In this way, the software calculates the fractions of interactions, f (interaction), based on the number of atoms present in the protein.The result of the average fractions is normalized according to the relative abundance of C, N, and O atoms, then the non-random frequencies of different types of interaction are indicated (Colovos & Yeates, 1993).Finally, What IF was used to check the structures by evaluation of atomic contacts, i.e. assumes the distribution of different types of atoms is determined around the amino acid fragments (Vriend & Sander, 1993).As a result, it generates the quality index used in the evaluation of the general quality of the model, which refers to the agreement between the average distributions obtained by a database and the contacts observed in the analyzed protein.For example, in residue analysis, if it has a score of À 5.0 or less, it indicates three possibilities: the residue makes symmetry contacts, or it is in contact with a ligand or ion, or there is some contact problem (Vriend & Sander, 1993).

Databases processing
Since there is no single-official database of drugs approved by the US FDA, our group compiled the drugs approved by different agencies or organizations with available structures in only one database.This drug library was compiled from three main databases -DrugBank, FDA, and ZincWorld, comprising approximately 8,000 molecules.More specifically, we used: DrugBank (Release Version 5.1.5)(Wishart et al., 2006), Zinc15 (subset: world not FDA) (Sterling & Irwin, 2015), Selleckchem subset FDA-approved Drug Library (https:// www.selleckchem.com/),Enamine subset approved Drugs Collection (https://www.enaminestore.com/platedsets/fdaapproved)e-Drug3D (Douguet, 2018), and the BindingDB subset of FDA approved drugs (Liu et al., 2007).Beforehand, all drugs present in each database were prepared as follows: removal of duplicate/redundant molecules with the software OpenBabel (O'Boyle et al., 2011) according to their InChI (IUPAC InchI identifier); addition of hydrogens; assignment of bond order and partial charges; and saving final molecules in mol2 extension.Drugs were also processed with FILTER (OpenEye.Scientific Software., 2020) using its default functions of canonicalization, pKa normalization, and salt removal.Afterward, the database was processed by the OMEGA (Hawkins et al., 2010;OpenEye. Scientific Software. OMEGA 4.0.0.4., 2020), software in order to generate conformers, with the flipper flag enabled (for specifying unassigned stereocenters), a strain energy (energy above the global minimum) lower than 9.0 kcal/mol, and a mean square deviation (RMSD) of 0.6 Å as a cutoff for conformer identity.
For the Ligand-Based Virtual Screening (LBVS) approach here performed, up to 300 conformers were generated per molecule.

Virtual screening
The protein models generated by homology modeling were used as a target, i.e. those regarding b4/b5 subunits of Trypanosoma cruzi proteasome 20S.The ligand selected as reference for the LBVS was the compound GSK3494245, whose crystallographic pose was obtained through the protein-ligand complex PDB ID 6QM7 (Leishmania tarentolae proteasome 20S subunit complexed with GSK3494245, resolution 2.80 Å) (Wyllie et al., 2019).The crystallographic pose of the reference compound (GSK3494245) was kept unchanged and it was used as a query in subsequent ligandbased strategies.
Our virtual screening workflow was carried out with emphasis on LBVS, considering the query of ligand GSK3494245 as well as shape and electrostatic potential similarities.For this, we used the ROCS (OpenEye Scientific Software.ROCS 3.2.1.4.) and EON (OpenEye Scientific Software.EON 2.2.0.5.) software, respectively, for each database, separately.We selected the 1,000 top-ranked molecules according to their ROCS Tanimoto Combo values, followed by the selection of the 500 top-ranked molecules, according to their EON Tanimoto Combo values.Reminiscent compounds were then submitted for prediction of relevant toxicity endpoints by using DEREK (Deductive Estimate of Risk from Existing Know-edge) (Greene et al., 1999).Drugs that showed any alert fired for carcinogenicity, mutagenicity, or teratogenicity were discarded, while alerts such as 'certain' and 'plausible' for any other endpoints were also considered to filter molecules out.Finally, the 'survivor' compounds were submitted to docking simulations using GOLD (Verdonk et al., 2003) to inspect their interactions within the T. cruzi proteasome binding site.

Validation of methodologies
Validation of the LBVS strategy, more specifically of the query (reference molecule) used with the ROCS software, was carried out by analyzing ROC (Receiver Operating Characteristic) curves and corresponding AUC (Area Under the Curve) values.ROC curves may be defined as probability curves and the corresponding values of AUC indicate the degree of separability or classification of the model under evaluation -so that the higher the AUC value, the better the recovery of active ligands by the model.To do so, we created a database consisting of 13 active ligands, plus 13 inactive ligands, and decoys generated on the web server DUD-E (Mysinger et al., 2012).DUD-E generates 50 decoys for each active compound based on their physicochemical properties, but with different 2-D topologies.Moreover, the ROCS capacity of recovering active ligands within such a database was analyzed in terms of respective ROCS Tanimoto Combo values (calculated in terms of shape and color).

Docking simulations
Docking parameters were also validated by ROC curves analysis, using the same database aforementioned.So, we performed a cross-docking using the crystallographic ligand GSK3494245 and the b4/b4 subunit of proteasome T. cruzi modeled and then we analyzed the capacity of the docking parameters to reproduce the crystallographic pose of the inhibitor and to recover the active ligands within a database in terms of respective score values, increasing the chances to select a bioactive molecule.
The docking solutions were obtained using the GOLD (Verdonk et al., 2003) software and the respective CHEMPLP score function.We used the modeled protein of the Trypanosoma cruzi proteasome, a sphere centered at the coordinates X:146.1752;Y: À 15.1932 and Y: À 8.3950, and 11 Å radius, setting PHE24 as flexible residue.It is worth mentioning that such docking coordinates are very close to that used for docking compounds with the Leishmania tarentolae proteasome (X:140.1824;Y: À 15.8925 and Z: À 8.3427, R:11.5 Å).

Molecular dynamics
For the conduction of molecular dynamics studies, we used the GROMACS 2019.3 (Ch� avez Thielemann et al., 2019) software, with the Charmm36 force field (Huang et al., 2017).Moreover, we used the modeled protein of b4/b5 subunits of the Trypanosoma cruzi proteasome and three ligands (topranked hits obtained from virtual screening), which were parameterized by CGenFF (Vanommeslaeghe & MacKerell, 2012).For Leishmania tarentolae was used the crystallographic proteasome b4/b5 subunits complexed with GSK3494245.
The initial coordinates used in the molecular dynamics simulations were extracted from the lowest energy conformation obtained by molecular docking of each one of the three ligands and we applied the following conditions: the Trypanosoma cruzi proteasome b4/b5 subunits were placed into a cubic box with dimensions of 10.84 � 10.84 � 10.84 nm, and solvated with � 39,770 water molecules of TIP3P, using the editconf and solvate tools (Jorgensen et al., 1983).To neutralize the charges in this system we used NaCl.In the case of Leishmania tarentolae, the initial coordinates used in the molecular dynamics simulations were extracted from the crystallographic pose and applied the following conditions: Leishmania tarentolae proteasome b4/b5 subunits were placed into a cubic box with dimensions of 10.75 � 10.75 � 10.75 nm, and solvated with � 38,770 water molecules of TIP3P, using the editconf and solvate tools and the charges was neutralized using NaCl.
All the systems were minimized using the steepest descent method.This method consists of a first-order descent method, which utilizes the gradient of the potential-energy surface search path towards the nearest energy minimum, avoiding unfavorable contacts between atoms, so that we archived the convergence at potential energy below 500 kJ/mol.nm.We used the canonical ensemble (NVT) and the isobaric-isothermal ensemble (NPT) simulations to reach the equilibrium of each system, keeping the pressure constant at 1 bar through the Berendsen barostat and temperature of 300 K, with time couplings of 2.0 and 0.1 ps, respectively.The short-range nonbonded cutoff is 1.2 nm, and the van der Waals forces are truncated at 1.2 nm with an 'inner' cutoff of 1.0 nm while electrostatic forces are calculated with PME and a real-space cutoff of 1.2 nm (Lemkul, 2019).For all three systems consisting of ligand-protein complexes, the total time of simulation was 150 ns, with an integration time of 2 fs.The analysis of number of hydrogen bonds in function of time was compute using gmx hbond function.
To calculate the binding free energy for the three modeled systems were applied the MM/PBSA (Molecular Mechanics/Poisson Boltzmann Surface Area) method.In this method, only the initial and final states of the systems are evaluated to estimate a free energy change (Kollman et al., 2000) according to the equation: The term G complex corresponds to the Gibbs free energy of the ligand-protein complex, while the terms G protein and G ligand are the total free energies corresponding to the protein and the ligand, respectively, in the presence of a solvent.
For that, the individual free energies must be calculated for each term according to the equation: where x can be the protein, the ligand, or the complex, the E MM term is the average potential energy in a vacuum calculated using molecular mechanics and G solvation is the free energy of solvation.
The term E MM includes the energy of bonded interactions such as bonding, angle, dihedral, and improper interactions, in addition to non-bonded interactions such as electrostatic and Van der Waals energy calculated using the Coulomb and Lennard-Jones (LJ) potential function, respectively.When the approach occurs in a single path, the conformation of the protein and ligand in its bound and unbound forms are supposedly identical, thus, DE binding is considered zero, leaving only the unbound energies.
Finally, the free energy of Gibbs solvation (G solvation ) is calculated from two new terms: G polar and G nonpolar .The G polar term is the electrostatic contribution to the solvation free energy, calculated using the Poisson-Boltzmann96 equation.The G nonpolar term consists of the non-electrostatic contribution to the free energy of solvation, divided between the work done by the solute to create a cavity in the solvent and the van der Waals attractive energy between solvent and solute.These G polar and G nonpolar energy components are calculated through the APBS program, using the g_mmpbsa subroutine implemented in the GROMACS package by Kumari et al. (2014)

In vitro Trypanocidal activity assay
Trypanocidal activity of the drug repurposing and benznidazole (BZN) was evaluated against amastigotes forms of the Tulahuen strain expressed beta-galactosidase.Briefly, LLCMK-2 cells (2.5 � 10 4 cells.mL À 1 ) were resuspended in RPMI-1640 medium without phenol red (Sigma-Aldrich), supplemented with 5% bovine fetal serum (GIBCO, Grand Island, NY, USA), 100 IU mL À 1 penicillin G, and 100 mg mL À 1 streptomycin (Gibco-BRL, Grand Island, NY, USA) and seeded in 96-well microplates.After 3 h, the cells were infected with 5.0 � 10 5 trypomastigotes forms of T. cruzi Tulahuen strain stably expressing the b-galactosidase gene from Escherichia coli and incubated for 48 h.After the infection period, the plates were washed to remove the free trypomastigotes forms of T. cruzi.The infected cells with the intracellular amastigotes forms were incubated with the compounds or benznidazole (BZN), in serial concentrations between 500 to 3.9 mM.After 72 h, 50 mL of PBS (phosphate buffered saline) containing 0.3% of Triton X-100 and 400 mM chlorophenol Red-b-D-galactoside (CPRG) were added.Plates were incubated at 37 � C for 4 h and the absorbance was read at 570 nm.The BZN was used as positive control and the culture media as the negative control.The concentration of the compound corresponding to 50% trypanocidal activity in amastigote forms was expressed as the IC 50amastigote .

In vitro cytotoxicity assay
The LLCMK2 cell viability was assessed using the classical [3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide] (MTT) colorimetric assay to determine selectivity against T cruzi.For this purpose, the same protocol established for trypanocidal activity was maintained.Briefly, 2.5 � 10 4 cells were incubated for 48 h in a 96-well microplate for cell culture.After this incubation period, the compounds or BZN were added (concentrations 500 to 3.9 mM in serial dilution/ compounds solubilized in DMSO 0.5%) in a final volume of 200 mL.Cells were incubated for 72 hours at 37 � C.After incubation with the ligands and their complexes, the medium was removed and added 50 mL of MTT (5.0 mg mL À 1 ) was diluted in phosphate-buffered saline (PBS).The precipitated blue MTT formazan was then dissolved in 50 mL of DMSO, and the absorbance was measured at 570 nm in a VARIAN CARY-50 plate reader MPR multiwell.Cell viability was expressed as the percentage of absorption values in treated cells compared with untreated (control) cells.EC 50 (cytotoxic concentration of 50% of the cells) was also calculated.Thus, SI is defined by the ratio of CC 50 to IC 50 .
The effectiveness of the compounds was evaluated by the value of IC 50 (concentration (mM) that inhibits the growth of the parasite by 50%), determined by the dose-response curve using the statistical software GraphPad Prism 5 for Windows (GraphPad Software, San Diego, CA, USA)

Results and discussion
In this work, we chose to use the proteasome as a biological target for new putative inhibitors, with trypanocidal properties.According to Khare et al. (2016)-also later confirmed by Wyllie et al. ( 2019) -, proteasome inhibitors interact at the interface of the b4/b5 subunits.Therefore, we decided that it would not be necessary to consider all of the seven b subunits of the proteasome (or even the entire proteasome) so we only considered the b4/b5 subunits.Worth reminding that the species Trypanosoma cruzi does not present crystallographic structures of its proteasome elucidated, making it necessary to build these protein models, employing homology modeling.

Proteins homology modeling
Protein homology modeling (also known as comparative modeling) is a method used to build 3D protein structures (Franc¸a, 2015), from one or more sequences of homologous proteins that already have their structures experimentally elucidated (Sliwoski et al., 2014;Xiang, 2006).This method is based on the fact that proteins that are evolutionarily related share three-dimensional structure features that have been conserved during evolution and thus can be classified as belonging to the same family (homologous proteins) (Liu et al., 2004).Hence, the available structure of only one protein could allow one to describe the three-dimensional structure of all proteins that belong to the same family (Xiang, 2006).
In this work, we applied homology modeling intending to build a model for the Trypanosoma cruzi proteasome (20S) and use it on structure-based computational experiments such as docking and molecular dynamics simulations.This should help us to elucidate possible chemical interactions between the proteasome ligand site and potential ligands, increasing the probability of selecting the most promising compounds as proteasome inhibitors.
We used BLAST (Basic Local Alignment Search Tool) (BLAST (Basic Local Alignment Search Tool), 2022) to select template proteins and the best results obtained for the Trypanosoma cruzi b4 subunit include Leishmania tarentolae 20S proteasome (PDB code 6QM7), Bos taurus 20S proteasome (PDB code 1IRU), and Mus musculus 20S proteasome (PDB code 3UNB).Likewise, the best results obtained using BLASTp for the b5 subunit include the 20S proteasome from Leishmania tarentolae (PDB code 6QM7), the 20S proteasome from Bos taurus (PDB code 1IRU), and the 20S proteasome from Mus musculus (PDB code 3UNB).
After selecting the sequences of template and target proteins (see more details on Material and Methods), we performed multiple alignments of these sequences using Clustal Omega (Multiple Sequence Alignment Tool) (Clustal Omega, 2022).Multiple alignments of protein sequences can be used to identify structurally conserved regions of proteins as well as to find the best local and global alignment between sequences that will be used to build a comparative model (Cavasotto & Phatak, 2009).All the sequence alignment software always directly compares the sequences, for example, Clustal Omega first aligns all pairs of sequences separately, then produces guide trees that were used to align larger groups of sequences.
Multiple alignments obtained for the Trypanosoma cruzi proteasome b4 subunit showed identity matrix values of 69.90, 37.95 and 36.92% for Leishmania tarentolae, Bos taurus, and Mus musculus, respectively; while b5 subunit showed identity matrix values of 73.67, 61.19 and 62.19% for the same respective species.The b4 subunit does not exert any known catalytic function resulting in low conservation of its amino acid residues between these different species.In contrast, the b5 subunit, also known for its chymotrypsin-like activity, maintains a high degree of conservation between species.Furthermore, when comparing the b4 and b5 subunit's identity values between Leishmania tarentolae and Trypanosoma cruzi, one may see that the latter presents an interesting identity value between residues, even belonging to the same family The second step was to analyze the alignment obtained by CLUSTAL Omega according to the number of gaps present in the sequence as well as the location of these gaps in the alpha helices and beta sheets since insertion-deletion mutations (indel) are rare process among homologous sequence (Chowdhury & Garai, 2017).It is expected that the occurrence of gaps/insertions of amino acid residues in the protein sequences usually occurs in regions outside the secondary structures of the proteins that remain more conserved during evolution (Vyas et al., 2012).To such issues along the alignment, we refined the multiple alignments by visual inspection, based on the superposition of the crystallographic protein structures here used as templates and comparison of insertion and deletion regions in the alignment with their corresponding positions in the secondary structures of proteins (see Scheme 1 and 2 in SI).
After aligning and refining the alignment including the sequences of the target as well as the template proteins, we were able to proceed to build protein models.Among the different types of methods used to build proteins threedimensional structures based on structural homology, in this work we chose the MODELLER software, which is based on the satisfaction of spatial restraint ( � Sali & Blundell, 1993).Thus, the 5 top-ranked models obtained were subsequently validated in order to check the one with fewer errors and deviations from normality concerning protein structures ( � Sali & Blundell, 1993).

Validation of proteins obtained by homology modeling
Homologous models can be used for both docking and virtual screening as long as their structures are properly validated (Ibrahim Uba & Yelekc ¸i, 2019; Uba & Yelekc ¸i, 2020).The structure validation can be done through the stereochemical properties of the model, such as bond angles, and bond lengths, as well as the evaluation of spatial properties.In this case, the PROCHECK programs (see Materials and Methods) were used.Furthermore, the normality of the atomic contacts of the amino acid residues of the homologous model was evaluated using What IF.
For the five proteins generated for the b4 and b5 subunits of the Trypanosoma cruzi proteasome using MODELLER, the stereochemical quality evaluated from the Ramachandran plot indicated that model 2 presented the best results.According to the Ramachandran plot, 334 residues (92%) are found in the most favored region, 28 of the residues (7.7%) are found in the generously allowed region and only 1 residue (0.3%) is found in the region disallowed (see Figure 1 in SI).Using Verify 3D, the result obtained for model 2 was 89.93% and for model 3 90.66%,values within the recommended minimum of at least 80% compatibility.The quality of the chemical environments for the b4 and b5 subunits was evaluated using ERRAT (Colovos & Yeates, 1993), with values of 90.3553 and 79.7927 for model 2, and 91.9192 and 73.057 for model 3.Both models 2 and 3 presented their values close to and above the accepted value, but the main amino acid residues of the binding site are found in the b5 subunit.Normality of the atomic contacts of the amino acid residues and the quality of the structure according to the average distribution of the contacts observed in the protein analyzed by WHAT IF -Coarse Packing Quality Control (Vriend & Sander, 1993), have the lowest values for model 2.
Values obtained for models 2 and 3 are very close to each other, in special when comparing the values of Verify 3D and ERRAT.However, considering the general average of values, model 2 was chosen to be used in the next stages of the in silico study.Validation values obtained for each protein model can be found in Table 1 in SI.

Drug repurposing using LBVS and analysis of drugprotein intermolecular interactions
As previously mentioned, LBVS was conducted using three drug databases (DrugBank, FDA, and ZincWorld) as well as the compound GSK3494245 as a query (see Figure 1).Validation of such query was carried out in advance of applying it to similarity search by ROCS, to assess the performance of such strategy for screening the most promising drugs.Therefore, we checked if this query was able to recover active ligands out of a database containing 13 active compounds, 13 inactive compounds, and 650 decoys generated by the DUD-E webserver (Mysinger et al., 2012).Ranking their obtained ROCS Tanimoto Combo values allowed us to build a ROC curve and obtain respective AUC values.The AUC value obtained was 0.856 (see Graphic 1 in SI), which indicates that the protocol using a such query can distinguish active from non-active compounds with a success rate of 86%.
In this manner, we were able to use the compound GSK3494245 as a query to perform LBVS using our compiled drug database of � 8,000 molecules.As described in Methods, the first screening methodology consisted of shape-based similarity using ROCS, and the second one consisted of electrostatic potential similarity using EON.The output of 500 compounds, from each database, with the highest Tanimoto Combo values, were selected to proceed to toxicity predictions.
Since this compounds database consists of already approved drugs, we infer that their pharmacokinetic (ADME) properties are suitable for oral use in humans and, therefore, we did not conduct predictions of such parameters.However, we thought it would be prudent to double-check the toxicological profile of such compounds, so we have performed toxicological predictions for them, using DEREK software (Greene et al., 1999).In this case, all drugs for which an alert has been triggered for carcinogenicity, mutagenicity, and teratogenicity were discarded.Likewise, drugs for which some structural alert was fired with a prediction level higher than 'equivocal' were also discarded, once at this level 'there is already an equal weight of evidence for and against the proposition, according to the software user's guide.Finally, a total of 423 drugs survived these filters.
The last stage of the screening workflow consisted of performing docking simulations with the remaining drugs, using the same region of the binding site where the compound GSK3494245 is located.After validating the docking parameters by assessment of the obtained ROC curve (see Graphic 2 in SI), the idea here was to evaluate the binding modes as well as the drug-proteasome interactions to select those making similar interactions to the observed between GSK3494245 and the protein, thus increasing the chance of selecting an active drug.
Finally, using the combined protocol of LBVS that filtered 423 different drugs with the docking simulations, were selected 150 drugs based on the best score function, then were filtered drugs with similar docking pose with the crystallographic ligand and those who were capable to make similar chemical interactions crystallographic ligand.The most promising drugs selected as virtual hits were: Bepotastine besilate, Famotidine, and Sorafenib (Figure 2) to be tested as potential proteasome inhibitors.Bepotastine besilate is used for the treatment of allergic conjunctivitis, rhinitis, and urticaria, all associated with pruritus; famotidine is used in the treatment of gastric or duodenal ulcers; and sorafenib is used for the treatment of hepatocellular carcinoma.
The docking solutions are shown in Figure 3.It is possible to observe the similarity of the crystallographic pose of GSK3494245 with the drugs here selected as promising proteasome inhibitors, according to their superposition.Also, bepotastine besilate, famotidine, and sorafenib present some structural differences that enable to explore of the chemical space of scaffolds capable of interacting at the considered binding site of b4/b5 subunits of Trypanosoma cruzi proteasome.
The top-ranked docking pose obtained for the drug Bepotastine besilate shows potential interactions with the residues PHE24, ILE27, THR207, SER227, ALA252, ASP321, VAL334, GLY335, and SER336 (Figure 4, 1).PHE24 can form a p-stacking interaction, a T-shaped p-p interaction, and a p-alkyl interaction.The residues SER336, THR207, and SER227 may form conventional hydrogen bonds with the drug, whereas ILE27, ALA252, and VAL334 may form hydrophobic interactions.Lastly, GLY335 can interact via a p-donor hydrogen bond and ASP321 via a carbon-hydrogen interaction.
At last, the drug Sorafenib showed potential docking interactions with PHE24, GLY229, GLN230, ALA252, MET303, TYR319, ASP321, SER333, VAL334, GLY335, SER336, and ALA 375 (Figure 4, 3).PHE24 establishes a p-stacking interaction as well as a p-alkyl interaction, which are expected for such residue.The residues TYR319, GLY335, and SER336 can form conventional hydrogen bonds with the drug, which are also expected interactions with the ligand.ALA252, VAL334, and ALA375 are involved in hydrophobic interactions.ASP321 and SER333 make van der Waals as well as hydrogen interactions, while residue GLN230 makes a halogen-like interaction, whereas GLY229 and MET303 make a p-amide stacking interaction.
The three selected drugs may present interactions in common with the residues PHE24, ALA252, VAL334, GLY335, and SER336.From these, PHE24 is responsible for p-stacking-type interactions, GLY335 and SER336 are responsible for conventional hydrogen bond interactions, and ALA252 and VAL334 are involved in hydrophobic interactions.Finally, only the type of carbon-hydrogen interaction was not observed for the drug Famotidine.

Molecular dynamics
Since b4/b5 subunits of the T. cruzi proteasome have been obtained by homology modeling, we initially analyzed such model by molecular dynamics (MD).So, the b4/b5 subunits were placed in a cubic box and solvated with water, and the charges of the system were neutralized by the addition of NaCl.It has been analyzed during a trajectory of 150 ns and the following information was obtained: Solvent-Accessible Surface Area (SASA), total Radius of gyrate of the protein in space (Rg), Root Mean Square Deviation (RMSD), and Root Mean Square Fluctuation (RMSF), as depicted in Figure 5a-d Furthermore, we carried out molecular dynamics simulations considering the complexes formed between T. cruzi proteasome b4/b5 subunits and the three selected drugs.MD parameters of each system were similar as described earlier and the same metrics have been generated to analyze results.Also, we should mention that, for each ligand, we considered the best docking pose with the highest score among those with consensual docking poses.The behavior of the three complexes was analyzed during a trajectory of 150 ns, allowing the analysis of their stability, and the flexibility of the residues present in the b4/b5 subunits of the proteasome.The SASA and Rg measurements were evaluated determine the stability and integrity of the complexes for each trajectory according to their compaction degrees, and the values obtained for SASA and Rg were extracted every 2 fs and thus plotted.The SASA graph (Figure 5c -red, green, and blue lines) shows the degree of compaction of each complex, where the solvent-accessible surface area of protein kept constant during the trajectory, with no significant variations, indicating that the secondary and tertiary structures of complexes were maintained during the entire simulation, without unfolding.The Rg measurements (Figure 5d -red, green, and blue lines) confirm the SASA analysis since there were no large variations in Rg values during the simulations.
In addition, the RMSD of the complex was also measured along the trajectory to evaluate the balance and stability of the complex's formation, permitting it to observe whether the structure of the complex converges on its most stable conformation.According to the graphs presented in Figure 5a, we can observe that such three complexes behave in different ways to achieve stable conformation.For the complex with bepotastine besilate (red line) and sorafenib (blue line), we observe that is necessary �30 ns for the complex to achieve stability, with a variation of RMSD around 0.3 nm, with no variations during the rest of the trajectory.In the case of the complex with famotidine (green line), the complex needed �30 ns to achieve the first stability, however, it changes the conformation around �75 ns of trajectory, where it reaches the final stability with no large variations until the final 150 ns of trajectory, with none of its conformation changes exceeding 0.4 nm.The occurrence of plateaus in the RMSD graphs for the complexes indicates that the dynamics of the system can contribute to the protein assuming new conformations, requiring low energy.This can be attributed to the fact that we are working not with the hole proteasome 20S complex, but only with the b4/b5 subunits of the Trypanosoma cruzi proteasome.
Interestingly, the RMSD graphs for ligands only (Figure 5e -red, green, and blue lines) show that there were no large variations in the RMSD values in none of the simulations, so, the ligands seem to remain in their initial docking poses.
The analysis of the RMSF graphs for the complexes (Figure 5b -red, green, and blue lines) indicates a large fluctuation in the amino acids that make up these regions.All the large fluctuations (peaks above 0.3 nm) observed are outside the ligand site and are regions that are not being influenced by the ligand, moreover, all the fluctuations follow the pattern of the apo T. cruzi proteasome (b4/b5 subunits).
In order to compare the results obtained by MD simulations of the modeled proteasome subunits complexes we run the MD simulations of crystallographic b4/b5 proteasome subunits of L. tarentolae complexed with GSK3494245 (see Figure 5 in SI).As expected, results from SASA and Rg were similar comparing the complexes of T. cruzi (Figure 5c,d) and L. tarentolae (Figure S5e,f).RMSD of ligands sorafenib (Figure 5e) and the crystallographic (Figure S5d) are very similar with a level around 0,1nm.The RMSD of the protein of L. tarentolae achieves stability around 0.5 nm (Figure S5c), so, compared with the results from complexes of T. cruzi (Figure 5a), bepotastine besilate and sorafenib show very similar behavior.Finally, comparing the RMSF of L. tarentolae with those from T. cruzi, there is a similar behavior, except for famotidine.These results suggest that the MD simulation of modeled proteasome subunits complexes of T. cruzi showed a robust result when compared with the crystallographic complex of L. tarentolae.
Finally, we calculated the number of hydrogens bonds during the MD trajectory between the drugs and the amino acid residues in the binding site.Figure 6 shows that famotidine (green) presents the high and constant number of H bonds during the trajectory in agreement with the docking analysis, while Sorafenib presents a high number of H bonds in some points of the trajectory, and bepotastine besilate presents at least one H bond during the trajectory.
To complete the analysis of complexes, we determine the binding free energy using the MM/PBSA method.The free binding energies calculated for the three drugs complexed with the T. cruzi model were all negative, indicating that there is an energetic favor in the formation of the complex, although all complexes showed very similar values (Table 1).The drug bepotastine besilate presented the lowest value of binding energy (-105 KJ/mol), although it did not present the lowest value of Van der Waals energy.Interestingly, sorafenib showed the lowest Van der Waals energy value with À 179 KJ/mol, responsible for the greatest contribution to its binding energy.Finally, the drug famotidine had the lowest electrostatic energy value of À 200 kJ/mol, a much lower value when compared to bepotastine and sorafenib with À 63 and À 36 KJ/mol, respectively, which can be attributed to the ability of famotidine to form seven conventional hydrogen bonds with the amino acid residues at the binding site.So, according to its highest absolute value of biding energy, it is expected that bepotastine besilate shows the best inhibition of growth result followed by sorafenib and famotidine (Table 1).However, the high value of the Van der Waals energy released by sorafenib indicates that it forms a stable complex with the proteasome and, consequently, a relevant inhibition of parasite growth despite not having the highest absolute free energy value.

In vitro assays of compounds selected by LBVS
For the in vitro assays of the parasite Trypanosoma cruzi, cells of the LLCMK2 lineage were used, containing only the intracellular amastigote forms, whose procedure is described in the materials and methods section.None of the selected compounds was effective in eliminating the intracellular form of the parasite.
Although Sorafenib showed an IC 50 below 10 mM and similar to the activity of BZN, the cytotoxicity of this drug was equivalent to its trypanocidal activity demonstrating that this compound was not specific to the parasite (Table 2).Famotidine and Bepotastine besilate were not toxic to epithelial cells and were not selective for parasites (Table 2).
It is worth mentioning that Sorafenib most common side effect profile is hand-foot skin reaction (Keating, 2017), which explains the high toxicity to epithelial cells used in the assays and, although BZN is less cytotoxic to epithelial cells (IC 50 ¼ 219.8 mM), it also causes side effects probably due to reactive oxygen species generated by the reduction of the nitro to an amino group by type II nitroreductases (NTRs) present in the mammalian organism (Patterson & Wyllie, 2014)

Conclusions
In the search for more effective treatments against the Chagas disease, the exploration, as well as continuity of the study of new biological targets and their inhibitors, is essential to achieve the desired results.Thus, in our computational studies, we used the Trypanosoma cruzi proteasome as a biological target to explore new inhibitors through drug repurposing using LBVS approaches.
The validation of the LBVS strategy together with the prediction of the toxicity of the compounds allowed a more accurate repositioning of drugs to act as potential inhibitors of the Trypanosoma cruzi proteasome.Moreover, the access to the active site structure of the T. cruzi proteasome through homology modeling was successful, as well as its use in docking simulations, which enabled the analysis of the intermolecular interactions of the structure-ligand complex to select three different drugs.From in vitro experiments, the drug sorafenib showed the most interesting IC 50 , but with a low selectivity index for the epithelial lineage used in the assay, so that becoming interesting to test this drug in a different cell to confirm if the cytotoxic effects are specific to the line cells used in the assay.This finding could be supported by the fact that, in the molecular dynamics simulations of sorafenib, the values of SASA, Rg, and RMSD of both complex and ligand were the most consistent among the three drugs, with no large variations in the values along the 150 ns MD trajectory.

Figure 1 .
Figure 1.Workflow of drug repurposing by LBVS to obtain potential hits with interest in the Chagas disease treatment.

Figure 2 .
Figure 2. Drugs selected as virtual hits after application of LBVS workflow.

Figure 3 .
Figure 3. Superposition of the GSK3494245 crystallographic pose (in stick representation, with carbon atoms in black) with repurposed drugs: in (1), Bepotastine besilate (with carbon atoms in green), (2) Famotidine (with carbon atoms in purple), and (3) Sorafenib (with carbon atoms in cyan).At the center, the superposition of the GSK3494245 crystallographic pose and the three drugs is represented within the binding site of the T. cruzi proteasome b4/b5 subunits (in Ribbons representation).
, black line).According to the results of SASA and Rg (Figure5c,dblack line), the b4/b5 subunits of the T. cruzi proteasome remained compact during the trajectory.The SASA of the protein model kept constant with values � 200 nm 2 , i.e. showed no significant variations indicating that the secondary and tertiary structures of the protein model maintained without unfolding during the entire simulation.Rg measurements (Figure5d-black line) also show no large variations during the simulation, allowing us to infer that the structural dimension of the b4/b5 subunits of the T. cruzi proteasome remained compact and stable throughout the trajectory.The RMSD shown in Figure5a, black line, indicates that the protein model is stable during the first � 75 ns; however, it changes its conformation when it reaches final stability and shows some variations until the final 150 ns of the trajectory -without exceeding 0.3 nm.Moreover, the RMSF graph (Figure5b-black line) shows no large fluctuations.

Table 2 .
Trypanocidal activity and cytotoxicity of the tested compounds.IC 50 (the ability of the test compound to eliminate 50% of intracellular amastigotes of the Tulahuen strain of T. cruzi), EC 50 (the ability of the test compound to eliminate 50% of epithelial cells, LLCMK2), and SI (selectivity index).