Structural characterization of cassava linamarase-linamarin enzyme complex: an integrated computational approach

Abstract Cassava linamarase is a hydrolyzing enzyme that belongs to a glycoside hydrolase family 1 (GH1). It is responsible for breaking down linamarin to toxic cyanide. The enzyme provides a defensive mechanism for plants against herbivores and has various applications in many fields. Understanding the structure of linamarase at the molecular level is a key to avail its reaction mechanism. In this study, the three-dimensional (3D) structure of linamarase was built for the first time using homology modelling and used to study its interaction with linamarin. Molecular docking calculations established the binding and orientation nature of linamarin, while molecular dynamics (MD) simulation established protein-ligand complexes' stability. Binding-free energy based on MM/PBSA was further used to rescore the docking results. An ensemble structure was found to be relatively stable compared to the modelled structure. This study sheds light on the exploration of linamarase towards understanding its reaction mechanisms. Communicated by Ramaswamy H. Sarma


Introduction
Linamarase (EC 3.2.1.21) occurs naturally with its two substrates: linamarin (1) and lotaustralin (2). It works by hydrolysing the glycolytic bond of the substrates into b-D-glucopyranose and a-Hydroxynitrile, which under favourable conditions, spontaneously degrade to propanone/butanone and hydrogen cyanide as indicated in Figure 1 (Murugan et al., 2012;Nartey, 1968;Rolle, 1998). Linamarase is present in various plants, including cassava, lima beans and lax (Fadahunsi et al., 2020). It can also be extracted from eukaryotes, bacteria, archaea and phytopathogenic fungi (Fusarium solani) that are capable of secreting extracellular enzymes (Huertas et al., 2006). The enzyme plays a significant role in the cyanogenesis of cassava products, by degrading the cyanogenic to free cyanide (Figure 1), which either evaporates or dissolves readily in water (Fadahunsi et al., 2020). Linamarin is known to break down at high pressure, temperature, and in the presence of acidic minerals, where, in the latter, linamarin breaks easier (Huertas et al., 2006).
Cyanide is known to inhibit cellular respiration, thereby preventing cellular energy generation (Williams et al., 2006). Cyanide binds into mitochondria specifically to cytochrome oxidase, thereby preventing electrons movement to an oxygen molecule and therefore, blocking the adenosine triphosphate (ATP) generation. The World Health Organization has set the minimum cyanide level in cassava and related products to be less or equal to 10 ppm (Cardoso et al., 2005). A higher intake of a product with cyanide up to 100 ppm is lethal to humans. Furthermore, long-term ingestion of trace amounts has been associated with fibrocalculous pancreatic diabetes, iodine deficiency, and cretinism (Mathangi et al., 2000).
Understanding the molecular structure and interaction of the linamarase-linamarin complex is important to both structural chemists and biochemists towards establishing the related stability and reaction mechanisms at the molecular level. However, the lack of crystal structure of linamarase has hindered its exploration.
In this work, for the first time, we modelled and characterized the structure of linamarase using an integrated computational approach viz; homology modelling, molecular docking, molecular dynamics and binding-free energy. Findings presented in this work provide a basis for understanding the structure and interaction of the linamarase-linamarin complex and unveils molecular insights into the conformational stability of the linamarase-linamarin complex.

Homology modelling of linamarase
Since the experimental 3D crystal structure of linamarase is not yet available in tFihe protein data bank (PDB) (Burley et al., 2019), homology modelling was performed to obtain linamarase structure. Rice (Oryza sativa L.) Os4BGlu12 (PDB ID: 3PKT) from the National Center for Biotechnology Information (NCBI Resource Coordinators, 2018) was used as the initial template after BLAST run (Altschul et al., 1990) to obtain sequence similarity. The linamarase sequence obtained from FAST in NCBI was submitted to SWISS-MODEL (Waterhouse et al., 2018), where the model was developed with enough query sequence coverage and sequence identity. The 3D structure was screened based on the Global Model Quality Estimation (GMQE) with values ranging from 0 to 1 and Quality Model Energy Analysis (QMEAN) (Sainy & Sharma, 2017) which provides the degree of nativeness and agreement between the model and the experimental value (Benkert et al., 2010). To improve the model's reliability MODELLER (Sali & Blundell, 1993) software and I-TASSER web server (Roy et al., 2010) were used to generate the linamarase molecular model. The query sequences (linamarase) and template sequences from rice, maize and sorghum were aligned by CLUSTALW program (Thompson et al., 1994) and displayed using ESPript 3 (Robert & Gouet, 2014). The modeled proteins were refined using Coot software (Emsley & Cowtan, 2004) and quality was further assessed by SAVES 6.0 (Colovos & Yeates, 1993) to obtain the Ramachandran plots.

Protein preparation
Initially, the modelled protein was prepared for docking calculation in chimera (Pettersen et al., 2004), where Gasteiger charges (Hou et al., 2013) and hydrogen at pH 7.4 were added. The structure was then converted to pdbqt format using the AMDock tool (Vald es-Tresanco et al., 2020). To investigate the effects of structural and conformational changes between the ligand and modelled protein, we employed a relaxed complex scheme (RCS) to perform full flexible docking. In RCS, a long molecular dynamics simulation (200 ns) of the modelled protein was run, and 50 receptor ensemble structures were generated from the equilibrium MD simulation and similarly prepared for docking.

Ligand preparation
The 3D structure of the natural ligand was obtained from PubChem database (CID: 11128) (Kim et al., 2021) and saved in PDB format. The structure was prepared in chimera (Pettersen et al., 2004), where Gasteiger charges (Hou et al., 2013) were added, and the structure was energy minimized and converted to pdbqt.

Docking bench-making and validation
Before performing the actual docking calculations, the protocol was validated by comparing a set of experimental binding-free energy obtained from (Baum et al., 2009) and the calculated binding-free energy. To determine calculated binding-free energy uncertainty, the calculations were repeated four times. Docking calculation was performed using Autodock Vina (Trott & Olson, 2009) with the default setting. Experimental and calculated binding-free energy values (Table S1, Supplementary Data) gave a reasonable agreement with a correlation value of 0.69; this allowed us to continue with the docking experiments.
After validating the docking protocol, the modelled protein active site was obtained through two approaches; first, blind docking was performed to the modelled structure to allow the natural ligand to explore the favourable binding pose. Secondly, an online server CASTp (Tian et al., 2018) was used to predict the binding site. The favourable binding poses and the predicted active site suggested that residues presented in Table S2 (Supplementary Data) formed the protein's active site. For the ensemble structures, the binding site was defined for each receptor. The natural ligand was docked to all 50 snapshots to establish the protein conformational changes on binding affinity. All docking calculations were done using the Autodock Vina tool (Trott & Olson, 2009).

Molecular dynamics (MD) simulation
The structural stability and dynamical behaviour of the apo linamarase (modelled) and holo linamarase (complex) were investigated by classical MD simulation using Gromacs version 2018 (Abraham et al., 2015). Protein topology was obtained using the OPLS-AA force field (Robertson et al., 2015), while the ligand topology parameters were obtained using LigParGen server (Dodda et al., 2017a, b;Jorgensen & Tirado-Rives, 2005). The system was explicitly solvated with SPC/E (Mark & Nilsson, 2001) water model in a cubic box. The system was neutralized by adding 5 Na þ , and then energy minimized for 5000 steps using the steepest descent algorithm to remove any crushes and constraints from the modelled and docked systems (Fletcher & Powell, 1963). The energy minimized system was subsequently subjected to two equilibration steps. First, the system was allowed to equilibrate and relax at NVT (constant number of particles, Volume and Temperature) ensemble for 100 ps, then followed by NPT (constant number of particles, Pressure and Temperature) ensemble for 100 ps. During equilibration, temperature and pressure were maintained by the Berendsen method (Berendsen et al., 1984) at 300 K and 1 bar, respectively. A production run was performed at NTP ensemble for 200 ns using Perrinelo-Rahman barostat (Parrinello & Rahman, 1981) for pressure coupling at 1 bar and v-rescale (Bussi et al., 2007) for temperature coupling at 300 K. The Particle-Mesh Eward (PME) (Cheatham et al., 1995;Ewald, 1921) method was used to treat long-range electrostatic interactions, while LINCS (Fadrn a et al., 2005) algorithm was used to constrain covalent bonds, in all calculations a time step of 2 fs was used. Energetics, pressure and density were monitored during the entire simulation time. From the equilibrium MD trajectory of the apo protein, a total of 50 configurations were sampled every 4 ns and subjected for relaxed complex scheme (RCS) docking calculations. To assess the stability of the apo protein and the complex, we used root mean square deviation (RMSD), root mean square fluctuations (RMSF), the radius of gyration (R g ) and solvent accessible surface area (SASA).

Molecular mechanics Poisson-Boltzman surface area (MM/PBSA) analysis
Since docking calculations are poor in establishing the stability and binding affinity of the complex, end-point free energy calculations based on MM/PBSA were performed to rescore docking binding-free energy. In this work, g_mmpbsa (Kumari et al., 2014) tool was used to calculate the bindingfree energy values, as described in Equations 1 to 4.
Where G complex represents the total free energy of the proteinligand complex, G protein, and G ligand for individual protein and ligand, respectively. In Equation 2 x represents either individual protein, ligand or the complex, E MM expresses the average molecular mechanic's energy in vacuum, TDS is the entropic contribution where T is temperature and DS entropy, in g_mmpbsa, this energy term is not considered, G solvation is the free energy of solvation. The molecular mechanic's energy is determined in the gas phase, as shown in Equation 3, where, E bonded describes the bonded interactions, E nonbonded represents non-bonded interactions including the electrostatic and van der Waals. The G solvation was calculated using an implicit solvent model, which consist of the polar and nonpolar terms. In Equation 4 c represent the surface tension proportionality constant set at 0.0072 kcal mol À1 Å À2 while b is the nonpolar solvation-free energy for solute set at 0.00 kcal mol À1 . The solvents dielectric constant (e) was set to 80, and for solute was 6, it should be noted that different solute dielectric constants i.e. 2,4 were also tested. A total of 200 snapshots were evenly sampled and used to calculate the binding-free energy using a single trajectory approach.

Results and discussion
Homology modelling of linamarase (b-glucosidase) using (Oryza sativa L.) target template sequence alignment Rice (Oryza sativa L.) bound with 2,4-dinitrophenyl-2-deoxyl-2-fluoroglucoside (DNP2FG) (PDB ID: 3PTK) (Sansenya et al., 2011) was used as the template for the modelling. The protein has a sequence similarity of 0.47 and similarity identity of 53.59%. The modelled 3D structure of linamarase shows a GMQE and QMEAN values of 0.76 and À1.04, respectively. GMQE estimates the target-template and target-structure quality by combining these two properties and has a score range between 0 and 1. A higher number indicates the reliability of the model. Our modelled structure showed an acceptable value of 0.76, suggesting a quality target-structure (J. M. S. Cardoso et al., 2018;Gu & Bourne, 2009). The QMEAN scoring function (with a value ranging from À4 to 0) derives the quality of global and local modelled protein; zero indicates the quality protein and a good score between the modelled and experimental structure. Our results showed a QMEAN value of À1.04 close to zero, presenting a good score with experimental structure. Figure 2, shows the sequence alignment between modelled linamarase, the rice as template, sorghum and maize which were produced using CLUSTALW (Thompson et al., 1994) program and visualized by ESPript software (Robert & Gouet, 2014). After structural refinement in Coot software, SAVES 6.0 web server provided an evaluation of the modeled models through PROCHEK (Laskowski et al., 1993). The main-chain and side-chain parameters allowed us to make comparison of the stereochemical quality and accuracy of all models from SWISS-MODEL, MODELLER, and I-TASSER. From PROCHEK we were able to obtain Ramachandran analysis, which is summarized in Table 1 and stereochemical parameters summarized in (Tables SA 1-3 and Tables SB 1-3, Supplementary Data). From these summaries it can be seen clearly SWISS-MODEL provides a high quality model with 89.6%, 9.6%, 0.7% and 0.0% of its residues in the most favoured, generously allowed and disallowed regions respectively. We presented (Figure 3) from the best (SWISS-MODEL), other models from MODELLER and I-TASSER are available in supplementary materials ( Figures S3 A and B), respectively.

Molecular docking calculations
Molecular docking calculations of the ligand ( Figure S1, Supplementary Data) were performed to both modelled and ensemble structures obtained from MD simulation to establish the ligand-binding nature. The binding-free energy based on the modelled protein was À6.9 kcal/mol, similar to the inhibition constant of 8.39 lM. The effect of protein conformational Figure 2. Multiple sequence alignment for modelled linamarase, template from rice, sorghum and maize. The red highlighted represents identical, the secondary structures are presented on the top while the residues' surface accessibility to solvent is displayed at the bottom (the blue for surface accessible, cyan for partially accessible and white for buried residues. changes on the binding-free energy of the natural ligand to the protein was investigated by docking it to the ensemble structures extracted from the equilibrium MD. As presented in ( Figure S1, Supplementary Data), one can see that there is the sensitivity of docking results to ensemble structures. The best pose from the ensemble structure exhibited binding-free energy of À7.2 kcal/mol, and the low pose had À4.9 kcal/mol. The wide distribution of binding-free energy observed in the ensemble structure ( Figure S1, Supplementary Data) suggests the importance of incorporating ensemble structure. It should be noted that the difference of 2.3 kcal/mol represents %500fold dissociation constant. The docking results ascertained that the natural ligand has binding stability in the enzyme. The binding modes of linamarin in the modelled and ensemble structure presented in Figure 4 show different binding modes by interaction with different residues. However, it was interesting to observe that the CN group from ligand interacted with Trp385 forming hydrogen bonds in all structures. Another important interaction observed was the formation of hydrogen bonds of Glu413 with OH from the ligand. The observed result suggests that the ligand bound to the active site; however, this does not necessarily imply stability of the complex. To establish the complex stability and protein fluctuations induced by the ligand MD simulation was performed for two complexes: modelled and ensemble structures.

Molecular dynamics simulation
The modelled structure and the two complexes were subjected to MD simulation to establish the stability and induced   [a,b,l,p] 9.6% 9.7% 20.6% Residues in generously allowed regions [$a,$b,$l,$p] 0.7% 1.1% 4.7% Residues in disallowed regions 0.0% 0.2% 1.5% Figure 4. The 2D diagram of the interaction of (A) holo protein (complex 1) and (B) ensemble structure (complex 2) (for proper interpretation of the references to color in the figure legend, the reader is referred to the Web version of this article).
conformational changes over 200 ns. RMSD provides important insight on the structural stability of the modelled protein and the complexes. An average RMSD value of 3 Å indicates the stability of the system (Wu et al., 2007). The RMSD time-dependent and probability values of apo and the two complexes are shown in Figures 5a and 5b, complex 2, and apo structure equilibrated during the first 50 ns, and remained stable with some fluctuations. Complex 1 equilibrated around 75 ns and had more fluctuations throughout the simulation. The apo protein had a maximum RMSD value of 0.215 nm. However, as expected, it was interesting to note that, complex 1 with the low binding-free energy of À6.9 kcal/mol showed a larger RMSD Figure 5. Stabilities of the apo and holo proteins (a) time-dependent RMSD values for apo and holo proteins; (b) probability distribution for the RMSD, (c) timedependent R g for the apo and hopo proteins, (d) probability distribution of the R g (e) time-dependent for SASA, (f) probability distribution for the SASA between the apo and holo proteins. value with two maxima at 0.255 and 0.310 nm compared to complex 2 with best binding-free energy of À7.2 kcal/mol, whose RMSD value shows the maxima at 0.19 nm. The lower RMSD value of complex 2 indicates the stability of the system as compared to complex 1. Based on the RMSD values for apo protein and the two complexes 1 and 2, one can conclude that the binding of linamarin induces some conformational changes and fluctuation within 0.33 nm, suggesting stability of the complexes.
The R g was analysed to provide information on the protein compactness and size before and after ligand bound. As observed for the RMSD value, the R g value follows the same trend; the R g value for the apo protein shows a maximum value at 2.17 nm; however, the complex 1 with binding-free energy of À6.9 kcal/mol indicates an increase in R g value 2.18 nm while the complex 2 shows the maximum R g value of 2.165 nm (Figures 5c and 5d). A similar observation is made for SASA, where the complex with best binding-free energy showed a lower SASA value with a maximum at 196 nm 2 compared to complex 1 with the least binding energy whose maximum SASA value is 205 nm 2 (Figures 5e  and 5f).
The induced conformational and plasticity of the protein amino acids was further investigated by computing the RMSF difference between the bound complex1 and free protein. Figure 6 shows that the binding of the protein induced some noticeable residues fluctuations in some regions. The observed fluctuations are 64-84, 270-300 and 380-385 amino acids that form the active site. The observed fluctuations at the active site suggested that the ligand interacted with the residues and to some extent caused fluctuations. Other regions of the protein are observed to have small RMSF differences, suggesting that residues that were least involved in the interaction had little fluctuation.

MM/PBSA binding-free energy analysis
To better understand the amino acid contribution to the binding-free energy of complex 1 and 2, we performed a binding-free energy calculation based on MM/PBSA. Table 2 clearly shows that complex 1 has lower binding-free energy compared to complex 2. However, the MM/PBSA results show higher binding-free energy values than docking results but have the same trend as observed from docking calculations. The difference in results for the two complex structures is explained by the differences in van der Waals (vdW), electrostatic and polar energy contributions. In complex 1 the electrostatic energy terms strongly contributed to the interaction between linamarin and linamarase. However, to some extent, the vdW energy terms contributed to the interaction, while the polar energy terms were observed to oppose the interaction.   Contrary to complex 1, in complex 2, the vdW energy terms highly contributed to the interaction, followed by electrostatic energy terms (Table 2). Furthermore, in complex 2 the opposing effect of polar energy terms is least observed as compared to complex 1. Interestingly, in all complexes, the nonpolar (SASA) energy terms were observed to contribute equally (Table 1). The observed binding-free energy in complex 2 is nearly 2-folds higher than complex 1. Such difference is attributed to the structural changes in the two complexes, where the ensemble structure from equilibrium MD was observed to have the favourable interaction with linamarin. This observation is expected as the protein is not static but flexible when linamarin is bound to it.
The observed differences in binding-free energy further motivated us to explore the effects of protein structural changes and residue contributions to binding-free energy. As shown in Figure 7, linamarin adopted some orientation and/ or conformational changes and interacted with different amino acids at the pocket. Interestingly, the interaction of linamarin with residues His152, Phe212, Glu413 was observed in all complexes, although with different energetic contributions. Surprisingly, in complex 1, residue Glu413 is observed to have an opposing contribution effect to the binding-free energy by contributing nearly 5 kJ/mol while in complex 2 it contributes the opposite energy, i.e. À5 kJ/mol. In addition, the contribution of His152 in complex 2 is nearly 2-folds higher to that in complex 1. One can conclude that a structural adaptation of the equilibrium MD structure has a role to stabilize the observed interaction in the linamarase-linamarin complex.

Concluding remarks
In this study, a 3D structure of linamarase was developed by homology modelling. Molecular docking was used to investigate the binding mechanism of modelled receptor and linamarin substrate. To ascertain the stability of the modelled structure and the protein-ligand complex MD simulation was performed. The results from docking allowed us to propose to this receptor the binding mode and the residues involved. The MD simulation provided important information about the conformational stability of the modelled protein and the complexes. MM/PBSA provided a detailed analysis of the amino acid contribution to the binding-free energy. After explorative MD and MM/PBSA simulation on the protein-ligand complexes, it was revealed that residues Try47, Phe205, Phe212, Trp385, Trp463, Trp471, Trp153, Ala201, Phe205, Trp385, Trp463, Phe479 had strong hydrophobic interaction with linamarin and retained the stability of the complex. Moreover, residues His152, Pro197, and Try340 have the highest contribution to binding-free energy. Therefore, for the first time, this study provided information on linamaraselinamarin structural complex interaction and paved the way for future research on linamarase enzymes and their application in other fields. The structural mechanisms of linamarase hydrolyzing linamarin at the atomic level are highly suggested for further studies.