Targeting the EGFR in cancer cells by fusion protein consisting of arazyme and third loop of TGF-alpha: an in silico study

Abstract The anticancer effects of arazyme, a bacterial metalloprotease, have been revealed in previous studies. Because of the overexpression of epidermal growth factor receptor (EGFR) in tumor cells, targeting this receptor is one of the approaches to cancer therapy. In the present study, we designed fusion protein by using a non-mitogenic binding sequence of TGFα, arazyme, and a suitable linker. The I-TASSER and Robetta web servers were employed to predict the territory structures of the Arazyme-linker-TGFαL3, and TGFαL3-linker-Arazyme. Then, models were validated by using PROCHECK, ERRAT, ProSA, and MolProbity web servers. After docking to EGFR, Arazyme-linker-TGFαL3 showed a higher binding affinity and was selected to be optimized through 100 ns Molecular dynamic (MD) simulation. Next, the stability of ligand-bound receptor was examined utilizing MD simulation for 100 ns. Furthermore, the binding free energy calculation and free energy decomposition were carried out employing MM-PBSA and MM-GBSA methods, respectively. The root mean square deviation and fluctuation (RMSD, RMSF), the radius of gyration, H-bond, and binding free energy analysis revealed the stability of the complex during simulation time. Finally, linear and conformational epitopes of B cells, as well as MHC class I and MHC class II were predicted by using different web servers. Meanwhile, the potential B cell and T cell epitopes were identified throughout arazyme protein sequence. Collectively, this study suggests a novel chimera protein candidate prevent cancer cells potentially by inducing an immune response and inhibiting cell proliferation. Communicated by Ramaswamy H. Sarma


Introduction
Targeted drug delivery which is a new aspect of anti-cancer drug design shows an optimized therapeutic index owing to the focused drug activity on the site of tumor cells. By selecting a suitable ligand as a specific cancer cell binder, the targeted receptor on the only cancer cell surface is recognized and the carried drug is delivered to the cells. As a result, the normal cells are not recognized .
Several ligands such as TGFa and EGF can activate the epidermal growth factor receptor (EGFR). EGFR induces the important cell process, cell proliferation, differentiation, survival, and motility which are related to the activation of different signaling cascades, including Ras/Raf/MAPK, PI3K/Akt, and phospholipase Cc pathways (Appert-Collin et al., 2015). High density and overexpression of epidermal growth factor receptor (EGFR) and its related ligands were demonstrated in tumor cells, as well (Wykosky et al., 2011;Yotsumoto et al., 2009). Therefore, EGFR-targeted therapeutics can overcome tumor cell growth and its resistance to apoptosis (Coleman & Tsongalis, 2016;Wykosky et al., 2011). EGFR (HER1 or ErbB1) which belongs to the ErbB family of tyrosine kinase receptors comprises L1, L2, and CR1 as three structural domains. A back-to-back dimer interaction by the CR1 domain of each receptor was suggested. The mature structure of TGFa consists of 50 amino acids, from which the residues 15-22 and 31-48 make the conserved core of the TGFa structure. Two residues, including Gly19 and Gly40 with three conserved disulfide bonds (among cysteine residues 8-21, 16-32, and 34-43) are required to stabilize the structure (Garrett et al., 2002).
In each receptor, the large ß-sheets of L1 and L2 domain interact with TGFa. Twenty-five residues of TGFa make contact with L1, L2, or both domains of EGFR (Garrett et al., 2002). Among them, 17 interacting C-terminal residues, in the region of the third disulfide bond (Loop three) of TGFa, show binding ability without the mitogenic induction. Meanwhile, this fragment competes with EGF and TGFa to bind with receptors and block cell proliferation induction (Nestor et al., 1985).
Some bacterial secondary metabolites such as toxins and enzymes are used as an important source to design Ligandbased targeted anticancer drugs owing to their antitumor effects and immunomodulating capacity Felgner et al., 2016). Arazyme, a proteolytic enzyme produced by Serratia proteamaculans, shows anticancer potency against different cancer cells (Amjadi et al., 2020;Bersanetti et al., 2005). Arazyme can stimulate the antitumor response by the induction of CD4 and CD8 T lymphocytes, as well as B lymphocytes, macrophages, and dendritic cells. TLR4 activation by arazyme, initiates TLR4-MyD88-TRIF-and MAPK-dependent signaling pathway, which results in both macrophages and dendritic cells (DCs) activation (Li et al., 2017;Pereira et al. 2016).
The computational approaches to the prediction of ligand-receptor complex interactions are a key part of drug design studies (Naqvi, 2018;Lokhande et al., 2020b). Molecular docking predicts the structure of a complex by estimating the binding energy of its components based on ab initio or data-driven approaches. The latter such as HADDOCK possesses higher accuracy since it merges the data derived from experimental and bioinformatics methods (De Vries et al., 2010;Lokhande et al., 2020). In addition, molecular dynamic simulations allow the search for motions and interactions in atomic levels and provide insights into the effects of structural changes on the function of biomolecules as well as their binding affinity to the related receptors (Lokhande et al., 2019;. The immunogenicity assessment by particular identification of the B-and T-cell epitopes is a determining factor of the safety and efficacy of protein therapeutics. In silico epitope mapping facilitates the complexity of in vitro or in vivo methods although its prediction accuracy must be more developed (Bryson et al., 2010;Potocnakova et al., 2016).
In the current study, we designed fusion proteins, Arazyme-linker-TGFaL3, and TGFaL3-linker-Arazyme to target cancer cells by considering the potential anti-cancer activity of arazyme and binding affinity of the third disulfide loop of TGFa to EGFR. The three-dimensional structure of fusion proteins was predicted and validated. Then, the chimeric proteins were docked to the receptor and their binding affinities were compared. Afterward, MD simulation was employed to explore interactions in atomic details. Finally, the immunological properties were evaluated.
Then, by using UCSF Chimera and Discovery Studio Visualizer softwares, the best-predicted models were aligned to both sequence and secondary structure of the template used in homology modeling. In addition to RMSD, UCSF Chimera calculates Structural Distance Measure (SDM) and the Q-score to yield more accurate sequence alignment. The SDM value equal to one indicates the identical structures and the higher values show the lower similarity, while the higher Q-scores (ranged between zero and one) values mean the stronger similarity between structures (Johnson et al., 1990; Krissinel & Henrick, 2004). Furthermore, the similarity between predicted structures was evaluated using the IPBA web server (https://www.dsimb.inserm.fr/dsimb_tools/ipba/). This server compares protein structures by the improved version of Protein Blocks alignment and provides RMSD whose lowest value indicates greater conformational similarity between the proteins (Gelly et al., 2011).

Molecular docking
At first, to compare the binding energy of different regions of TGFa with EGFR, protein structures were prepared to employ the software MGL Tools 1.5.6 and Discovery Studio Visualizer. To prepare the ligand structure and EGFR, polar hydrogen and Kolleman charges were added and then all heteroatoms and water molecules were removed. The grid box was set to cover the entire ligand surface. Then docking was carried out by autodock vina software (Trott & Olson, 2010). Afterwards, the interaction of the 3 D structure model of chimera protein with EGFR was docked using HADDOCK 2.4 (https://wenmr.science.uu.nl/haddock2.4/), and ClusPro 2.0 (https://cluspro.bu.edu/login.php) servers.
ClusPro web server performs rigid docking using a Fast Fourier Transform (FFT) correlation method and then, finds the most likely models of ligand-receptor interactions that are finally refined by energy minimization. This server yields the top ten largest docked structures with the lowest docking energy clustered by Weighted Score based on interface RMSD between the number of neighbors within 9-Å radios and three parameters, including native complex, the average number of docked structures within less than 10-Å radius and the average of lowest server archived IRMSD. Therefore, the best cluster has the highest Weighted Score (Kozakov et al., 2013Vajda et al., 2017). The focused docking approach was conducted by HADDOCK 2.4 to obtain more accurate results followed by three stages of energy minimizing and final refinements. The top largest clusters were sorted based on haddock score which is calculated by the weighted sum of individual energy terms, including electrostatic, van der Waals, desolvation, AIR energies, and a buried surface area (BSA) (van Zundert et al., 2016). The active residues of ligand and receptor were defined based on the crystal structure of Egfr-TGF alpha complex while the other surface residues with 6.5 Å distance from active residues were automatically mentioned as passive residues by the HADDOCK protocol (Garrett et al., 2002;van Zundert et al., 2016). Then, the binding affinity of x-ray crystallography complex of TGFa and EGFR was evaluated.
Afterward, a model of the ligand with the highest score of binding to the receptor was selected to be optimized via MD simulation. Next, the average structures relevant to the stable part of the trajectory were subjected to MD simulation of ligand-receptor complex.

MD simulations of ligand and ligandreceptor complex
The GROMACSV R 2019.6 software was used to perform MD simulations with the amber 99SB þ ILDN force field. The protein was solvated in TIP3P explicit water model in a cubic box and the distance between the solute and each side of the box was kept at 0.8 nm. The sodium (30 Na þ ) counter ions were added to maintain the neutrality of the system. Energy minimization was performed for 50,000 steps, using the steepest descent method, to prevent inappropriate geometry and steric clashes. Then, the system was equilibrated under NVT ensemble at 310 K for 200 ps using Vrescale method followed by NPT ensemble at 1 bar for 200 ps using Parrinello-Rahman barostat (Bussi et al., 2007;Parrinello & Rahman, 1981). In both phases of system equilibration, the positions of all heavy atoms were restrained by a force constant of 1000 kJ/mol/nm. Ultimately, the full dynamic production was performed for 100 ns under NPT condition (T ¼ 310 K, p ¼ 1 bar). The LINCS algorithm was used to constrain all covalent bond lengths allowing for a 2.0 fs time step (Hess et al., 1997). The output of the MD simulation was analyzed using tools that were included in the Gromacs package. The changes in the ligand secondary structure during the simulation were estimated by DSSP software (Kabsch & Sander, 1983). To assess the stability of the fusion protein (ligand) and ligandreceptor complex, multiple parameters, including root-meansquare deviation (RMSD), root-mean-square fluctuation (RMSF), and radius of gyration (Rg) were determined. The number of hydrogen bonds at the ligand-receptor interface was analyzed. Furthermore, every 10 ps of the stable time of simulation was extracted to calculate the end-point binding free energy and perform the energy decomposition analysis using Molecular Mechanic/Poisson-Boltzmann Surface Area (MMPBSA) and Molecular Mechanics Generalized Born Surface Area (MM-GBSA) methods, respectively (Miller et al., 2012;Tresanco, 2021). Based on the MM-PBSA or MM-GBSA methodology, the binding free energy was calculated using the following equations (Chen et al., 2016;Massova & Kollman, 1999): The binding free energies (DG bind ) was determined by the total free energy of complex (DG complex ), the energy of unbound ligand (DG ligand ), and receptor (DG receptor ). The Molecular Mechanics Energies in the gas-phase (DE MM ) contain electrostatic (DE elec ) and the van der Waals energies (DE vdW ). The solvation free energies (DG sol ) include polar solvation (DG polar) and nonpolar solvation (DG nonpolar ) energies. The contribution of conformational entropy (-TDS) was not considered in the present study (Chen et al., 2016;Massova & Kollman, 1999).
The Xmgrace tool was used to plot all graphs and the UCSF Chimera program was employed to visualize the results. Furthermore, the Ligplus (https://www.ebi.ac.uk/ thornton-srv/software/LigPlus/) was employed to depict the H-bonding and hydrophobic interactions (Laskowski & Swindells, 2011).
The distinct discontinuous epitope defined by ElliPro is three repeated clustering steps based on the protrusion index S and the distance R between each residue's centers of mass. The minimum score and maximum distance (Angstrom) are set to 0.5 and 6 Å, respectively (Ponomarenko et al., 2008). To get the highest sensitivity and specificity values for epitope identification via DiscoTope the threshold was set to À3.7. Surface accessibility of epitopes was evaluated by the Emini surface accessibility scale server (Emini et al., 1985). The flexibility and hydrophilicity were evaluated by Karplus and Schulz and Parker methods, respectively (http://tools.iedb.org/bcell/) (Karplus & Schulz, 1985;Parker et al., 1986).

Structure analysis
The structure of arazyme is composed of N-terminal zincdependent metalloprotease and C-terminal Peptidase_M10_C domains. The two designed fusion protein constructs have the molecular weight of 59,955.38 Da containing the 504 amino acid fragments of arazyme, a fragment of 17 amino acid fragments of TGFa (third loop), and a rigid linker (A(EAAAK)4ALEA(EAAAK)4A). The physicochemical properties are presented in Table 1.

3D structure prediction and qualification
Five models were built by each of the I-TASSER and Robetta servers. Among templates used by Robetta and I-TASSER to construct the models, the crystal structure of metalloprotease from S. Marcescens (PDB id: 1af0) showed the greatest similarity based on the comparison of RMSD, Q and SDM values (Table  S1). The multiple sequence and structural alignment between Robetta models and the metalloprotease of Serratia are shown in Figure 2. To find the number of residues in favored, allowed, and disallowed regions, the Ramachandran plot analysis was performed by Prochek and MolProbity. For each chimera, the model with the better score and the most residues in the favored region was subjected to Ligand-receptor docking. The three-dimensional structure of Arazyme-linker-TGFaL3, and TGFaL3-linker-Arazyme, predicted by Robetta was selected as the best model because of the highest residues percentage in the favored regions of PROCHECK and MolProbity Ramachandran distributions (Table 2). Also, the models were qualified by Qmean, ProSA, and ERRAT. The Robetta models show the highest ERRAT percentage quality factor. Based on ERRAT, a quality factor higher than 95% has a good high resolution (not higher than 3 Å). Also, a higher ProSA and Qmean score revealed that all Robetta models are closer to X-ray structures. The alignment made by the IPBA webserver between the arazyme predicted model and each one of the fusion proteins shows that Robetta models obtain lower RMSD compared to I-TASSER models for the two fusion proteins (Table S2).

Ligand-receptor docking
The interaction energy between different regions of TGFa with the receptor was calculated by autodock vina software

Figure 2.
Sequence alignment of (a) Arazyme-linker-TGFaL3 and (b) TGFaL3-linker-Arazyme with the sequence of the metalloprotease from S. marcescens. The dark blue indicates the identical residues and light blue shows similar residues. The structural alignment between the crystal structure of the metalloprotease from S. Marcescens (red) and Robetta predicted models (blue), (c) Arazyme-linker-TGFaL3, and (d) TGFaL3-linker-Arazyme. The linker and TGFaL3 fragments are represented in green and pink, respectively. (Table S3). Meanwhile, the two fusion proteins were docked to the EGFR by using ClusPro and haddock servers. Haddock and ClusPro results showed a higher affinity of the fusion protein, Arazyme-linker-TGFaL3 to EGFR (Table 3).

MD trajectory analysis of Arazyme-linker-TGFaL3
The trajectories generated from the 100 ns MD were used to evaluate the stability of Arazyme-linker-TGFaL3. As shown in Figure 3a, the DSSP analysis of Arazyme-linker-TGFaL3 revealed that the secondary structure of amino acids in most areas of protein remains conserved during the simulation period and b-sheets are the most stable structures, although changes have occurred as follows: The residues of 21-32 showed a conformational change from a-helix to bend and coil conformation; the residues of 438-443 undergoes a conformational change from a-helix to the coil; and the residues of 520-540 exhibited a conformational change from a-helix to bend conformation. Totally, the content of a-helix declined from 24.6% to 11.1% while the content of coil and bend increased from 41.9% to 52%.
The RMSD plot (Figure 3b) for protein revealed a quick increase in RMSD value up to 0.4 nm at the initial 4 ps of simulation. Afterward, a gradual increase in RMSD value was observed that reached up to 0.5 nm after 35,000 ps, and a sudden increase was seen up to 0.7 nm at 48,000 ps. Then, the value showed a decreasing trend to reach 0.55 nm at 65,000 ps which fluctuating around this value during simulation time. Then, from 85,000 ps to the end, the value slightly increased to reach 0.6 nm. Therefore, the stable state was considered to be between 70,000 and 100,000 ps.
The RMSF plot of the Ca atoms against residues number of protein was generated from the stable state of RMSD. As shown in Figure 3c, the terminal residues in N-and C-terminal have higher flexibility compared to other regions of the protein. The radius of the Gyration value of protein indicates fluctuations in compactness during the initial 70,000 ps, which then remain relatively constant till the end of the simulation (Figure 3d).

MD simulation of ligand-receptor interactions
The average MD structure of Arazyme-linker-TGFaL3 obtained from 100 ns simulation was docked against EGFR using HADDOCK 2.4. The best cluster with the largest cluster size (62 structures with a similar binding site) and the highest HADDOCK score (À111.2) was subjected to MD simulation (Table S4). To investigate the stability and dynamic, MD simulation for 100 ns was performed. Figure 4 shows the superimposition of the binding pose of Arazyme-linker-TGFaL3 in docking with EGFR at 10 ns intervals.
The changes of RMSD value of ligand and receptor and their interaction were shown in Figure 5a. The complex revealed a sharp increase in RMSD value at the first 25,000 ps to about 0.6 nm which fluctuated slightly around this value until the end of the simulation. The stable state of the ligand-receptor complex was considered 70,000 ps, according to the RMSD plot.
The RMSF of the ligand with the time before and after docking with the receptor was measured based on 70 ns of trajectory data. As shown in Figure 5b, the docked ligand showed less fluctuations in residues 510-567 compared to ligand alone while the RMSF value of other residues is similar between docked and unbounded ligand. The compactness of the complex was measured using the Radius of gyration (Figure 5c). At first 4000 ps, the Rg increased to 3.725 nm, thereafter it dropped to 3.57 nm at about 6000 ps which then continuously fluctuated around 3.6 nm till the end of the simulation.
The change in the number of hydrogen bonds was measured during MD simulation. As shown in Figure 5d, four Hbond was formed by the complex at initial of MD simulation which was increased to about 11 hydrogen bond at 2000 ps and then dropped to about 3 at 4000 ps. Afterward, the number of formed H-bond fluctuated between about 3 to 13 which continued up to 45,000 ps and then increased to 15 at 60,000 ps. After this time, the hydrogen bond decreased to reach 10 at end of simulation time ( Figure 6).
The contribution of van der Waals, electrostatic, and the polar and non-polar solvations to the MMPBSA binding free energies was presented in Table 4. Also, Figure S1 shows the MMGBSA binding free energy decomposition analysis.

T cell epitope prediction
Netmhcpan 4.1 presents a rank to choose the best epitopes by the comparison of random natural peptides. The threshold of less than 0.5 was expressed as a strong binder. The epitope sequence, which has the highest affinity and the  Table 3. The docking of TGFa, Arazyme-linker-TGFaL3, and TGFaL3-linker-Arazyme with EGFR using Haddock and Cluspro servers.

Docking server TGFa
Arazyme-linker-TGFaL3 TGFaL3-Linker-Arazyme Haddock score À166.4 þ/À 1.9 À71.0 þ/À 1.8 À42.0 þ/À 3.2 Cluspro score À975.3 À678.5 À668 lowest rank in each allele serotype was selected based on Netmhcpan prediction. The IC50 value less than 50 nm was mentioned as a strong binding affinity by PickPocket and SMM servers but SYFPEITHI gives a score to each residue of epitope oligomer based on the frequency of amino acids in natural ligands. For MHCI, the best five predicted epitopes from different alleles serotypes were shown in Table S5 and the related sequences were highlighted in Figure 7. For MHCII, the five epitopes identified by NetMHCIIpan and SYFPEITHI were displayed in Table S6 and Figure 7.

B cell epitope prediction and allergenicity
The five high-ranked linear epitopes are predicted by ellipro, as well as Kolaskar and Tongaonkar method whose physicochemical properties are shown in Table 5. Based on the ElliPro, Kolaskar, and Tongaonkar method, eight linear B cell epitopes with the length ranged from 11 to 17 residues and 17 epitopes with the length ranged from 6 to 20 residues were identified, respectively. All the above three tools identify the residues 476-481 (EALLSY) and 487-494 (VSDLALNI). The probability average of EALLSY and VSDLALNI obtained from BepiPred-2.0 were 0.561 and 0.509, respectively. Furthermore, the VSDLALNI sequence obtains the highest Vaxijen score (1.1432) among all the best-predicted B cell epitopes. Also, 89 and 326 residues were defined as conformational Bcell epitopes by DiscoTope and ellipro, respectively (Figure 8).   SDAP and algpred servers showed no potential allergen similar to any allergens region in their library.

Discussion
Ligand-based therapy by targeting the epidermal growth factor receptor (EGFR) is an approach to design anti-cancer drugs. Increased level of EGFR in various tumor cells makes it a candidate in the subject of designing anticancer drugs (Lim et al., 2017;Yousefi et al., 2016). In the current study, we designed two fusion proteins Arazyme-linker-TGFaL3 and TGFaL3-linker-Arazyme using arazyme as an effective anticancer protein and non-mitogenic fragment (loop three) of TGFa as a ligand of EGFR (Coleman & Tsongalis, 2016). A rigid 46 amino acid linker (A(EAAAK)4ALEA(EAAAK)4A) which includes eight helix-forming motifs was selected to increase stability as well as improving the biological activity of fusion protein (Chen et al., 2013). The increased expression level of the fusion protein with helical linkers is due to its secondary structure that is resistant to enzymatic cleavage. Also, the true folding of fusion proteins is enhanced by helical linkers resulting in higher stability and expression level (Amet et al., 2009).
The physicochemical analysis of two chimera proteins is similar to that of arazyme. Since different residues of the Nterminal of protein determine the stability of protein, ProtParam estimates the stability of 30 h for arazyme and Arazyme-linker-TGFaL3 based on the N-end rule. But TGFaL3linker-Arazyme showed stability of 100 h owing to Amino-terminal valine which is responsible for the highest stability among other amino acids (Gonda et al., 1989). Moreover, the instability index of all proteins is <40, and the high aliphatic index, which is the relative volume of aliphatic side chains of protein, indicates that the stability of designed chimera proteins has not changed. The negative value of GRAVY (Grand Average of Hydropathy) implies that proteins are hydrophilic and have a better reaction with water.
The tertiary structure of arazyme, TGFaL3, and fusion proteins were modeled by I-TASSER and Robetta web servers. Then, all predicted models were evaluated by ERRAT, PROCHECK Ramachandran, MolProbity, and QMEAN value. The quality assessment analysis by all web servers showed that predicted models by Robetta obtained a better score than the I-TASSER server. In addition, the folding of the Robetta predicted models showed higher similarity to the arazyme territory structure due to their lower RMSD calculated by IPBA. Molecular docking of two fusion proteins to calculate binding energy and strength affinity between ligand and receptor was conducted by blind, rigid, and focused methods. By considering ß-sheets position, the binding affinity of the different sequence lengths of TGFa was evaluated. Interestingly, loop three of TGFa, which includes half of one large beta-sheet, has an energy that is more negative and therefore was selected to make fusion protein. Then the affinity of the two constructs was compared by HADDOCK 2.4, and Cluspro 2.0 and the results showed that the construct, Arazyme-linker-TGFaL3 was the best because of more negative scores, although the entire sequence of TGFa was the best. Therefore, Arazyme-linker-TGFaL3 was subjected to MD simulations and epitope investigations. The radius of gyration and RMSD of the optimized structure of Arazymelinker-TGFaL3 revealed the relative conformational equilibrium of fusion protein, although RMSF analysis showed that both Cand N-terminal are the most flexible region. Meanwhile, the secondary structure of fusion proteins remains relatively stable after simulation. Afterward, the interactions of the average structure of Arazyme-linker-TGFaL3 with EGFR were evaluated by 100 ns of MD simulation. The RMSD values indicated the overall stability of the complex. Also, the changes in the compactness of the complex were not significant during simulation time. The comparison of RMSF values between bound and unbound ligand indicated that the region comprising linker and TGFaL3 has the lower flexibility in the ligand-receptor complex which may be due to interaction with the receptor. MM/PBSA and MM/GBSA are the most popular methods to estimate the free binding energy of ligand to receptor although the physicochemical properties of molecules such as entropy and atomic charges as well as simulation time and sampling protocols can affect performance, accuracy, and differences in the calculation of binding energy (Genheden & Ryde, 2015). In the current study, the negative values of MM/PBSA indicated the contribution of nonbonded interactions in the binding affinity of Arazyme-linker-TGFaL3 to EGFR. In addition, the highest favorable contribution was related to van der Waals forces. Also, non-polar solvation energy and electrostatic interactions are favorable for ligandreceptor binding. However, the polar contributions were unfavorable for binding to EGFR.
Based on the crystal structure analysis of the complex of EGFR external domain and intact TGFa, LEU 566, HID 563, CYS 561, and VAL 551 from TGF-a, and PHE 357, SER 356 and GLN 16 from EGFR were displayed as active residues with non-bonded contacts with the receptor (Garrett et al., 2002). These residues possessed the highest energy contribution according to MM/GBSA free energy decomposition. Among different intermolecular interactions, hydrogen bonds play a key role in ligand-receptor binding (Patil et al., 2010). The analysis of residues involved in hydrogen bonds with EGFR revealed five residues, including LEU 566, HID 563, CYS 561, VAL 551, and ASP 565 are at the TGFaL3 region.
Different approaches including BepiPred-2.0, Kolaskar, and Tongaonkar, ElliPro, and DiscoTope were adopted to identify linear and conformational B-cell epitopes. BepiPred-2.0 is the server, which uses a random forest algorithm to derive epitope data from crystal structures and obtain higher quality results (Jespersen et al., 2017). By utilizing the physicochemical properties of the known epitopes, the Kolaskar, and Tongaonkar method showed 75% accuracy in epitope prediction (Kolaskar & Tongaonkar, 1990). Two algorithms for continuous epitope prediction by ElliPro are the approximation of protein surface as an ellipsoid and the protrusion index (PI) which is the percentage of outside residues in an ellipsoid (Ponomarenko et al., 2008). Two common predicted B cell epitopes, namely EALLSY and VSDLALNI were positioned near C-terminal peptidase of arazyme.
If a residue appears to be greater than two nm of the water-accessible surface, it is considered a surface residue. By considering this scale, the fractional surface probability of amino acids in a hexapeptide sequence was defined by Emini server formulae (Emini et al., 1985). The epitope, ANITFTEVGAGQKANITFGNYSQDRPGHYDYDTQ which is near the N-terminal of the Zinc-dependent metalloprotease sequence of arazyme obtains the highest Hydrophilicity, flexibility, and accessibility scores.
The DiscoTope combines log-odds ratio score and surface measurement calculated from half-sphere exposure to Table 5. The physicochemical properties and Vaxijen score of the best high-ranked linear epitope predicted by Ellipro server as well as Kolaskar, and Tongaonkar's method.

Web servers
The sequence of predicted epitopes  improve the performance of epitope prediction (Kringelum et al., 2012). Based on the DiscoTope score, the residues with a score of higher than 1.9 have 95% Specificity. In this manner, all predicted epitope residues of 154-165 (SQDRPGHYDYDTQAYAFLPNTI) show the highest specificity which was included in the discontinuous epitopes obtaining the highest score from ElliPro.
Our results indicate that the fusion protein, Arazymelinker-TGFaL3 shows affinity to the EGFR receptor which is expressed at a high level in the tumor cell. Subsequently, blocking of EGFR and elimination of proliferative signals are expected. Also, because of numerous predicted B cell and T cell epitopes, the immune responses to cancer cells can be elevated.