Molecular cloning, expression, mRNA secondary structure and immunological characterization of mussel foot proteins (Mfps) (Mollusca: Bivalvia)

Abstract The macroscale production of mussel foot proteins (Mfps) in expression systems has not succeeded to date. The principal reasons are low expression levels and yields of Mfps, the lack of post-translational modifications (PTMs), and immunological toxic effects on the host system. Identifying post-translational modification sites, suitable expression hosts, and immunological responses experimentally is very costly and time-consuming. In the present study, the post-translational modifications, antigenicity, allergenicity, and immunological reactions of all available Mfps were therefore characterized in silico. Furthermore, all Mfps were codon optimized for three different expression systems to determine the best expression host. Finally, we performed in silico cloning of all codon-optimized Mfps in a suitable host (E. coli K12, pET28a(+) vector) and analyzed the secondary structure of their mRNA and its structural stability. Among the 78 Mfps, six fps are considered potentially allergenic proteins, six fps are considered non-allergenic proteins, and all other fps are probably allergenic. Higher antigenicity was observed in bacterial cells than in yeast and tumor cells. Nevertheless, the predicted expression of Mfps in a bacterial host is higher than in the other expression hosts. Notably, all Mfps showed significant immunological activity in the human system, and we concluded that these antigenic, allergenic, and immunological properties are directly correlated with their amino acid composition. The study's major goal is to provide a comprehensive understanding of Mfps and to aid the future genetic engineering and expression of Mfps and their diverse applications in different fields. Communicated by Ramaswamy H. Sarma


Introduction
Mussels lead a sessile life in an environment where waves, rocks, and predators pose continuous threats, and they rely on a secreted adhesive apparatus called the byssus for anchorage. The byssus thread is composed primarily of six distinct types of mussel foot proteins (Mfps), Mfp1, Mfp2, Mfp3, Mfp4, Mfp5, and Mfp6, known as the major Mfps; other Mfps are categorized as variant proteins (Forooshani & Lee, 2017; Anand & Shibu Vardhanan, 2021). Mfp3, 5, and 6 are mainly present in the plaque region of the byssus thread, Mfp1 and 2 in its cuticular outer layer, and Mfp4 in its core region. These proteins are characterized by their isoelectric point (pI), owing to their high cationic amino acid content (Anand & Shibu Vardhanan, 2020, 2021). The adaptive attachment of marine mussels to a wide variety of substrates has been extensively studied and is a key element of bioinspired wet adhesion research (DeMartini et al., 2017; Anand & Shibu Vardhanan, 2020). Studies on biologically inspired synthetic adhesives and on adhesives containing Mfps open new research areas and promote the formulation of next-generation underwater adhesives (Wei et al., 2015). A thorough understanding of the molecular interactions underlying wet adhesion is required, yet many of these interactions occur at length and time scales that are generally too small to be reliably described by experiments. Studies that combine theoretical modeling with state-of-the-art experiments are therefore required to advance the development of new underwater adhesives (Levine et al., 2016).
Exploring the functional properties of each Mfp in a wet environment requires large quantities of protein for multiple laboratory experiments. Biomimetic adhesive molecules have been produced with recombinant technology in suitable expression systems, which avoids extracting adhesive proteins directly from animals. This technique has permitted investigation of the essential features of adhesive proteins, including the adsorption and adhesive properties of Mfps at the microscopic and macroscopic levels (Hwang et al., 2005; Cha et al., 2009; Hennebert et al., 2015). Approximately ten thousand mussels are required to isolate 1 mg of protein by the conventional extraction method (Morgan, 1990). Large-scale production of Mfps requires prokaryotic, eukaryotic, or cell- or tissue-culture-based expression systems. Commonly used expression hosts are bacteria (Escherichia coli), yeast (Saccharomyces cerevisiae), plants (tobacco), and mammals (goat, rabbit, and mouse) (Silverman & Roberto, 2007; Hennebert et al., 2015; Wang & Scheibel, 2018). Every expression host has benefits and drawbacks, which need thorough review before expensive wet-lab trials. Critical features of a successful Mfp expression system include a promoter, transcriptional and translational regulators, compatibility with the host's codon usage, suitable processing and protein purification methods, the ability to perform PTMs, high yields of Mfps, and a simple scale-up process (Silverman & Roberto, 2007; Hennebert et al., 2015; Wang & Scheibel, 2018). For microscale adhesion tests, recombinant Mfps were successfully produced and performed well, but macroscale testing and large-scale applications are still not feasible, because Mfps are toxic to the host and both the expression levels and the solubility of purified Mfps are low (Cha et al., 2008; Stewart, 2011).
To enable profitable production of Mfps, however, we need to understand their structural and functional properties. Our previous paper discussed the physicochemical, structural, and functional properties, ion-ligand binding ability, amino acid composition, signal-peptide regions, structural stability, and evolutionary divergence of Mfps (see Anand & Shibu Vardhanan, 2020, for review). The present study uses in silico tools to examine the post-translational modifications, antigenicity, allergenicity, and immunogenicity of all available Mfps, their codon optimization for cloning and expression (a comparative analysis of bacteria, yeast, and human cells as expression systems), and their mRNA secondary structure (Figure 1). This study will provide a comprehensive understanding of Mfps and facilitate their future genetic engineering, expression, and diverse applications in various fields.

Datasets
Bivalve Mfp (mussel foot protein) sequences in FASTA format were retrieved from the NCBI Protein database (August 2019) (https://www.ncbi.nlm.nih.gov/protein). Selection was based mainly on Mfp-producing bivalves for which the complete sequence of at least one adhesive foot protein (fp) has been identified (see Anand & Shibu Vardhanan, 2020).

Post-translational modification of Mfps
The post-translational modifications (PTMs) of Mfps were predicted using the ModPred server (Pejaver et al., 2014). ModPred consists of 34 ensembles of logistic regression models trained separately on a combined collection of 126,036 non-redundant, experimentally verified sites of 23 different modifications collected from public databases and an ad hoc literature search (Zhang et al., 2020). The FASTA sequences of Mfps were used as input to predict the following PTM sites: acetylation, farnesylation, N-linked glycosylation, pupylation, ADP-ribosylation, geranylgeranylation, N-terminal acetylation, pyrrolidone carboxylic acid, amidation, GPI-anchor amidation, O-linked glycosylation, sulfation, C-linked glycosylation, hydroxylation, palmitoylation, SUMOylation, carboxylation, methylation, phosphorylation, ubiquitination, disulfide linkage, myristoylation, and proteolytic cleavage. The server reports scores at three confidence levels (low, medium, and high). We recorded only high-confidence PTM sites (score cutoff 0.85) for this study, because these have the highest probability of agreeing with wet-lab results.
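The high-confidence filtering step can be sketched in Python. This is a minimal illustration that assumes the ModPred results have already been parsed into simple records. The `PTMPrediction` class, its field names, and the example scores are hypothetical and do not reflect ModPred's actual output format. Treating a score exactly equal to 0.85 as high confidence is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical record for one predicted site; ModPred's real output format differs.
@dataclass(frozen=True)
class PTMPrediction:
    protein: str       # e.g. "Mefp1"
    residue: str       # one-letter amino acid code
    position: int      # 1-based position in the sequence
    modification: str  # e.g. "Hydroxylation"
    score: float       # prediction score reported by the server

HIGH_CONFIDENCE_CUTOFF = 0.85  # cutoff stated in the text


def high_confidence_sites(predictions, cutoff=HIGH_CONFIDENCE_CUTOFF):
    """Group predictions scoring at or above `cutoff` by protein name."""
    kept = {}
    for pred in predictions:
        if pred.score >= cutoff:  # assumption: the cutoff itself counts as high confidence
            kept.setdefault(pred.protein, []).append(pred)
    return kept


# Illustrative, made-up predictions (not results from this study).
example = [
    PTMPrediction("Mefp1", "P", 3, "Hydroxylation", 0.91),
    PTMPrediction("Mefp1", "S", 10, "Phosphorylation", 0.62),
    PTMPrediction("Pvfp1", "K", 5, "Hydroxylation", 0.85),
]
for protein, sites in high_confidence_sites(example).items():
    print(protein, [(s.residue, s.position, s.modification) for s in sites])
```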

Allergenicity prediction of Mfps
Three algorithms, AlgPred, AllerTOP v.2.0, and AllergenFP v.1.0, were used to predict the allergenicity of Mfps. (1) AlgPred (https://webs.iiitd.edu.in/raghava/algpred/index.html) predicts allergens based on the similarity of known epitopes to any region of the protein (Saha & Raghava, 2006). We used several of its approaches (i.e. SVM, IgE epitope, ARPs BLAST, and MEME/MAST) to predict allergenicity with high accuracy.
(2) AllerTOP v.2.0 (https://www.ddg-pharmfac.net/AllerTOP/) predicts allergenicity by auto cross-covariance (ACC) transformation of protein sequences into uniform, equal-length vectors (Wold et al., 1993), an approach applied in quantitative structure-activity relationship (QSAR) studies of peptides of different lengths. Proteins are classified by the k-nearest neighbor algorithm (kNN, k = 1) using a training set of 2427 known allergens from different species and 2427 non-allergens (Venkatarajan & Braun, 2001; Dimitrov et al., 2013). AllerTOP is the first alignment-free server for in silico prediction of allergens based on proteins' physicochemical properties. In addition to allergenicity, AllerTOP can predict the route of allergen exposure: food, inhalant, or toxin (Dimitrov et al., 2013). (3) In AllergenFP v.1.0 (https://ddg-pharmfac.net/AllergenFP/), each amino acid in the protein sequences of the data sets was described by five E-descriptors (Venkatarajan & Braun, 2001), and the resulting strings were transformed into uniform vectors by ACC transformation. The E-descriptors for the 20 naturally occurring amino acids, defined by Venkatarajan and Braun (2001), were derived by principal component analysis of a data matrix of 237 physicochemical properties. The first principal component (E1) reflects the hydrophobicity of amino acids; the second (E2), their size; the third (E3), their helix-forming propensity; the fourth (E4) correlates with the relative abundance of amino acids; and the fifth (E5) is dominated by β-strand-forming propensity. ACC transformation was then used to make the vector length independent of protein length. The subsets of allergens and non-allergens were transformed into matrices with 25 × 15 variables each. The derived matrix consisted of 4854 rows (2427 allergens and 2427 non-allergens) and 25 × 15 columns. Each column was divided into 11 intervals, generating a 25 × 15 × 11-digit binary fingerprint for each protein. A digit in the fingerprint equals 1 if the ACC value falls into the corresponding interval; otherwise, it equals 0. Thus, each protein has a unique binary fingerprint consisting of 25 × 15 ones and (25 × 15 × 11 − 25 × 15) zeros. Tanimoto coefficients were calculated for all protein pairs in the set, and each protein was classified as an allergen or non-allergen according to the class of the protein with which it had the highest Tanimoto coefficient (Tanimoto, 1958; Dimitrov et al., 2014).
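The ACC transformation and the fingerprint/Tanimoto nearest-neighbour rule can be sketched in plain Python. This is a hedged illustration, not the AllerTOP or AllergenFP code. The two-value descriptor table and the training sequences below are made up; AllergenFP uses five E-descriptors and 2427 + 2427 training proteins. Defining the 11 intervals as equal-width bins between each column's minimum and maximum is an assumption. Each ACC term for descriptors j and k at lag l is the mean of E_j(i)·E_k(i+l) over all positions i. This turns sequences of any length into vectors of one fixed length.

```python
# Toy descriptor table: two made-up values per residue (AllergenFP uses five E-descriptors).
TOY_DESCRIPTORS = {
    "A": (0.0, -0.7), "G": (-0.5, -1.0), "K": (-1.2, 0.4),
    "P": (0.3, -0.2), "Y": (1.1, 0.9),
}


def acc_transform(seq, descriptors, max_lag):
    """Auto/cross-covariance terms for lags 1..max_lag -> fixed-length vector."""
    vals = [descriptors[aa] for aa in seq]
    n = len(vals)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    d = len(vals[0])
    out = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                s = sum(vals[i][j] * vals[i + lag][k] for i in range(n - lag))
                out.append(s / (n - lag))
    return out  # length d * d * max_lag, independent of len(seq)


def column_ranges(vectors):
    """Min and max of each column over the training vectors."""
    return [(min(col), max(col)) for col in zip(*vectors)]


def fingerprint(vector, ranges, n_bins=11):
    """Set of 'on' bit indices: one bit per column, marking the interval the value falls in."""
    bits = set()
    for c, (x, (lo, hi)) in enumerate(zip(vector, ranges)):
        width = (hi - lo) / n_bins or 1.0  # guard against constant columns
        b = min(max(int((x - lo) / width), 0), n_bins - 1)  # clamp out-of-range values
        bits.add(c * n_bins + b)
    return frozenset(bits)


def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def classify(query_fp, training):
    """Label of the training fingerprint with the highest Tanimoto coefficient (1-NN)."""
    _, label = max(training, key=lambda item: tanimoto(query_fp, item[0]))
    return label


train = [("GGKYKGGKY", "allergen"), ("APAPAPAPA", "non-allergen")]  # toy data
vectors = [acc_transform(s, TOY_DESCRIPTORS, max_lag=3) for s, _ in train]
ranges = column_ranges(vectors)
training = [(fingerprint(v, ranges), lab) for v, (_, lab) in zip(vectors, train)]
query = fingerprint(acc_transform("GGKYKGGKYK", TOY_DESCRIPTORS, 3), ranges)
print(classify(query, training))
```

Representing a fingerprint as the set of its "on" bits keeps exactly one bit per column, mirroring the one-interval-per-column rule in the text, and makes the Tanimoto coefficient a simple set calculation.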
The allergenicity of the Mfps was further checked with the AllerCatPro server (https://allercatpro.bii.astar.edu.sg/). AllerCatPro predicts potentially allergenic proteins from the similarity of their 3D structure and amino acid sequence to a dataset of 4180 unique allergenic protein sequences. This dataset is derived from the union of the Food Allergy Research and Resource Program, the Comprehensive Protein Allergen Resource, the WHO/International Union of Immunological Societies, UniProtKB, and Allergome (Maurer-Stroh et al., 2019).

Antigenicity prediction of Mfps
The antigenicity of the Mfps was evaluated with ANTIGENpro (http://scratch.proteomics.ics.uci.edu/) and VaxiJen v.2.0 (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html). ANTIGENpro predicts antigenicity from protein antigenicity microarray data. It is an alignment-free, pathogen-independent server whose accuracy on the combined datasets was estimated at 76% in validation experiments (Magnan et al., 2010). VaxiJen v.2.0 is an alignment-free approach to antigen prediction based on ACC transformation of protein sequences into uniform vectors of principal amino acid properties. It provides models derived from bacterial, viral, and tumor protein datasets for predicting whole-protein antigenicity, each set consisting of 100 known antigens and 100 non-antigens; we used all three models. The models were tested by internal leave-one-out cross-validation and by external validation on test sets. They performed well in both, with prediction accuracies of 70% to 89%. VaxiJen thus classifies antigens solely on the basis of proteins' physicochemical properties, without recourse to sequence alignment (Doytchinova & Flower, 2007a, 2007b, 2008). A threshold of 0.4 was used to distinguish antigenic from non-antigenic proteins.
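The threshold rules can be written as small Python helpers. This is only a sketch. VaxiJen and ANTIGENpro are web servers, so the scores are assumed to have been copied from their output, and the example values are made up. Treating a VaxiJen score exactly equal to 0.4 as antigenic is an assumption. The 0.80 ANTIGENpro cutoff is the one given with Table 3.

```python
VAXIJEN_THRESHOLD = 0.4    # threshold stated in the text
ANTIGENPRO_CUTOFF = 0.80   # "potentially antigenic" cutoff given with Table 3


def vaxijen_calls(scores, threshold=VAXIJEN_THRESHOLD):
    """Map {model: VaxiJen score} to {model: True if predicted antigenic}."""
    return {model: score >= threshold for model, score in scores.items()}


def antigenic_in_all_models(scores, threshold=VAXIJEN_THRESHOLD):
    """True only if every model (e.g. bacteria, virus, tumor) calls the protein antigenic."""
    calls = vaxijen_calls(scores, threshold)
    return bool(calls) and all(calls.values())


def antigenpro_potentially_antigenic(probability, cutoff=ANTIGENPRO_CUTOFF):
    """ANTIGENpro probability strictly above the cutoff -> potentially antigenic."""
    return probability > cutoff


# Made-up scores for one hypothetical protein.
scores = {"bacteria": 0.55, "virus": 0.47, "tumor": 0.38}
print(vaxijen_calls(scores))
print(antigenic_in_all_models(scores))
print(antigenpro_potentially_antigenic(0.90))
```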

Immune simulation
In silico immune simulations were conducted on the C-ImmSim server (Rapin et al., 2010) to characterize the immunogenicity and immune responses of each Mfp. C-ImmSim is an agent-based model that uses position-specific scoring matrices (PSSM) for immune epitope prediction and machine learning techniques to predict immune interactions. It simultaneously simulates three compartments representing three separate anatomical regions in mammals: (1) the bone marrow, where hematopoietic stem cells are simulated and produce new lymphoid and myeloid cells; (2) the thymus, where naive T cells are selected to avoid autoimmunity; and (3) a tertiary lymphatic organ, such as a lymph node. We also measured the Simpson index (D) to follow the emergence of different epitope-specific dominant T-cell clones; the smaller the D value, the lower the diversity (Rapin et al., 2010; Shey et al., 2019; Kathwate, 2020). All simulation parameters were left at their default values.

In silico cloning and validation
Before in silico cloning and expression of the Mfps in a bacterium (E. coli K12 strain), yeast (Saccharomyces cerevisiae), and a mammalian system, the cDNA sequences of all Mfps were retrieved from the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/home). Codon optimization was then carried out with the Java Codon Adaptation Tool (JCat; http://www.jcat.de/) (Grote et al., 2005), and the codon adaptation index (CAI) and GC content were analyzed. CAI measures codon usage bias; the ideal CAI is 1.0, but a score above 0.8 can be considered good (Sharp & Li, 1987; Morla et al., 2016). The GC content of a sequence should lie between 30% and 70%; values outside this range adversely affect transcriptional and translational efficiency (Ali et al., 2017). The optimized sequence was then analyzed with NEBcutter, and restriction sites were added to its N-terminal (5′) and C-terminal (3′) ends (Kalita et al., 2020). Finally, SnapGene was used to ligate the optimized DNA sequence, including the added restriction sites, into a suitable plasmid vector for expression of the Mfps. The inserted fragments were verified by digesting the cloned vector with the selected restriction enzymes and separating the products on a 1% agarose gel (simulated in SnapGene).
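The two quality metrics can be computed as shown below. This Python sketch is a simplified illustration, not JCat's implementation. CAI is taken as the geometric mean of each codon's relative adaptiveness w, defined as its usage divided by that of the most-used synonymous codon, following Sharp and Li (1987). However, the codon counts below are made up, and the pseudocount for unseen codons is an assumption.

```python
import math

# Hypothetical codon counts for two amino acids only; a real table would come from
# highly expressed genes of the host (e.g. E. coli K12). Single-codon amino acids
# (Met, Trp) are left out because their w is always 1.
TOY_COUNTS = {"AAA": 80, "AAG": 20, "GGT": 30, "GGC": 40, "GGA": 10, "GGG": 20}
TOY_GROUPS = {"K": ["AAA", "AAG"], "G": ["GGT", "GGC", "GGA", "GGG"]}


def gc_content(seq):
    """Percentage of G + C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def relative_adaptiveness(counts, groups, pseudocount=0.5):
    """w = count(codon) / count(most used synonymous codon).

    Unseen codons get `pseudocount` so that log(w) stays finite (an assumption;
    tools handle zero counts differently).
    """
    w = {}
    for codons in groups.values():
        adjusted = {c: counts.get(c, 0) or pseudocount for c in codons}
        top = max(adjusted.values())
        for c, n in adjusted.items():
            w[c] = n / top
    return w


def cai(seq, w):
    """Geometric mean of w over the complete codons of `seq` that appear in `w`."""
    seq = seq.upper()
    logs = [math.log(w[seq[i:i + 3]])
            for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(TOY_COUNTS, TOY_GROUPS)
orf = "AAAGGCAAG"
gc = gc_content(orf)
print(f"GC = {gc:.1f}%, within 30-70%: {30 <= gc <= 70}")
print(f"CAI = {cai(orf, w):.3f}")
```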

mRNA secondary structure and stability prediction
The secondary structure of Mfp mRNA was predicted with the Mfold web server (http://unafold.rna.albany.edu/?q=mfold) (Zuker, 2003), both before and after codon optimization. The predicted structures were used to assess mRNA stability and to evaluate the translation efficiency of Mfp mRNA in an expression system.
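Mfold predicts structures by free-energy minimization with nearest-neighbour thermodynamic parameters (Zuker, 2003), which is not reproduced here. The Python sketch below swaps in a much simpler technique that shows the same dynamic-programming idea: the classic Nussinov algorithm, which maximizes the number of nested base pairs. It yields no free energies (ΔG) and is not a substitute for the Mfold predictions used in this study. The minimum hairpin loop of three bases and the inclusion of G-U wobble pairs are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Watson-Crick + G-U wobble).

    dp[i][j] holds the best count for seq[i..j]; a hairpin loop must contain
    at least `min_loop` unpaired bases.
    """
    rna = seq.upper().replace("T", "U")  # accept cDNA as input
    n = len(rna)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov_max_pairs("GGGAAACCC"))
print(nussinov_max_pairs("ATGGGAAACCCTAA"))  # cDNA input is converted to RNA
```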

Molecular dynamics (MD) simulations
No crystal structures are available for the Mfps, so 3D structures had to be generated for the simulation-based analyses. Homology modelling of all known Mfps and their structural validation were described in our previous paper (see Anand & Shibu Vardhanan, 2020, for review). MD simulations were performed on the post-translationally modified Mfps. PTMs were incorporated with the Vienna-PTM 2.0 server (http://vienna-ptm.univie.ac.at/), which supports 260 different enzymatic and non-enzymatic modifications. The server introduces modifications at the sites of interest with realistic geometry and then performs energy minimization and force-field parameter optimization (GROMOS 54A8, 54A7, and 45A3 force fields) (Margreitter et al., 2013; Petrov et al., 2013; Margreitter et al., 2017). The post-translationally modified, energy-minimized, GROMOS 54A8-parameterized PDB structures of the Mfps were used for MD simulations in GROMACS. Each Mfp was subjected to the following protocol: solvent, water (SPC model); box type, octahedron; salt, Na+ and Cl− (0.15 M, with net charge neutralization); equilibration, NVT (constant number of particles, volume, and temperature) and NPT (constant number of particles, pressure, and temperature); temperature, 300 K; pressure, 1 bar; integrator, leap-frog; simulation time, 10 ns; approximately 1000 frames per simulation. Finally, the RMSD, RMSF, radius of gyration (Rg), SASA, and hydrogen bonds of each Mfp were calculated.
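The Python sketch below shows how two of the analysed quantities, RMSD and radius of gyration (Rg), are computed from coordinates. It is an illustration under stated assumptions, not the GROMACS analysis used here. The coordinates and masses are made up, and RMSD is computed without the least-squares fit that GROMACS normally applies first. The frame-interval helper restates the arithmetic of 10 ns sampled as about 1000 frames.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized sets of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass."""
    m_tot = sum(masses)
    centre = [sum(m * c[axis] for m, c in zip(masses, coords)) / m_tot
              for axis in range(3)]
    sq = sum(m * sum((c[axis] - centre[axis]) ** 2 for axis in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / m_tot)


def frame_interval_ps(sim_time_ns, n_frames):
    """Time between saved frames in picoseconds."""
    return sim_time_ns * 1000.0 / n_frames


ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]    # toy coordinates (nm)
frame = [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0), (1.6, 1.5, 0.0)]
print(rmsd(ref, frame), radius_of_gyration(ref, [12.0, 14.0, 12.0]))
print(frame_interval_ps(10, 1000), "ps between frames")
```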

Results and discussion
Mussel foot proteins (Mfps) are considered promising biomaterials for medical, environmental, and industrial applications because of their exceptional adhesive properties, including flexible and robust adhesion, toughness, biodegradability, and biocompatibility (Hammer & Tirrell, 1996; Silverman & Roberto, 2007). Several previous studies have confirmed the

Post-translational modification of Mfps
The physicochemical properties of Mfps are directly correlated with their PTMs (Sagert et al., 2006), so understanding the PTMs of each Mfp is crucial for designing chimeric Mfps and expressing them. A total of 14 types of PTMs were observed in Mfps: sulfation, amidation, acetylation, methylation, ubiquitination, phosphorylation, hydroxylation, SUMOylation, pyrrolidone carboxylic acid, O-linked glycosylation, N-linked glycosylation, palmitoylation, carboxylation, and ADP-ribosylation. Disulfide linkage and proteolytic cleavage do not fit the conventional definition of PTMs (Pejaver et al., 2014), but we also list them for Mfps in this paper (Figure 2 and Supplementary file S1). Advanced, sensitive techniques and in-depth analysis of proteomic data continue to provide new insights into protein PTMs, and the number of newly identified PTM sites has increased exponentially (Farley & Link, 2009; Chuh & Pratt, 2015). In Mfps, three types of PTMs are commonly studied: glycosylation, hydroxylation, and phosphorylation (Hennebert et al., 2015). In our study (Figure 3), we identified two types of glycosylation in Mfps: O-linked glycosylation (serine and threonine) and N-linked glycosylation (asparagine). O-linked glycosylation is found most prominently in fp1 of M. californianus and M. edulis, and N-linked glycosylation in M. yessoensis fp1. On this basis, we concluded that glycosylation is more prominent in fp1 than in other fps. These findings are consistent with previous PTM analyses of Dpfp1 (Rzepecki & Waite, 1993) and Pvfp1 (Ohkawa et al., 2004; Zhao et al., 2009). The fundamental role of glycosylation in Mfps has not been properly assessed, but it is currently believed to increase conformational stability and enhance protein binding capability (Roth et al., 2012).
Hydroxylation is a relatively uncommon PTM compared with others such as glycosylation or phosphorylation (Chopra & Ananthanarayanan, 1982; Kaelin, 2005). However, protein hydroxylation is a critical process in Mfps: it makes the protein more competitive with water by forming hydrogen bonds with surfaces (Papov et al., 1995; Burzio et al., 1997; Lee et al., 2011). In our study, three amino acids were found to be hydroxylated: tyrosine, lysine, and proline. Of these, hydroxylation of tyrosine residues by polyphenol oxidase produces DOPA (3,4-dihydroxyphenylalanine), a key constituent of Mfps (Waite, 1991; Lee et al., 2011; Wilker, 2011; Maier et al., 2015). Proline hydroxylation was most evident in Mefp and Pvfp, consistent with the previous findings of Waite (1983), Taylor and Waite (1994), and Zhao et al. (2009). These authors reported two main hydroxylated proline derivatives in Mfps, 4-hydroxyproline and 3,4-dihydroxyproline, which were detected in the decapeptide region of Mefp1. Whereas tryptophan hydroxylation has previously been reported in Pvfp1 (Zhao et al., 2009), our analysis found that only proline and lysine hydroxylation had high confidence scores in Pvfp1. Except for both variants of Pvfp1, all other PTMs identified in our work are novel predictions (not listed here; see Supplementary file S1 for a detailed report). According to Hennebert et al. (2015, a review article), only trace levels of DOPA were found in Pvfp1 relative to Mefp1, suggesting that DOPA may be functionally replaced by hydroxytryptophan in Pvfp1 (Zhao et al., 2009). Of the four types of phosphorylation (O-, N-, S-, and acyl phosphorylation) (Reinders & Sickmann, 2005; Jia et al., 2012), O-phosphorylation is commonly observed in Mfps (Sagert et al., 2006; Flammang et al., 2009). In our analysis, however, only lysine phosphorylation showed a high confidence score, whereas previous studies reported serine phosphorylation (Sagert et al., 2006; Flammang et al., 2009). The role of phosphorylation in Mfps has not yet been analyzed; it is suspected to improve the metal ligand-binding ability of Mfps and to help them bind to calcareous materials (Zhao & Waite, 2006).
Only a few studies have addressed the PTMs of Mfps, and most of them concentrate on tyrosine modifications (Lim et al., 2011; Choi et al., 2012). It is reasonable to assume that each PTM plays a specific role in Mfp wet adhesion. A proper understanding of the functional aspects of PTMs will help in designing Mfps for genetic engineering and expression and in their diverse applications in different fields (the basic functional properties of Mfps have been reviewed by Anand and Shibu Vardhanan, 2020). Importantly, the reported PTMs are only predictions; to maximize accuracy, we used only the high-scoring PTM sites of Mfps. The functional significance of Mfp PTM sites may also depend on the expression system, especially if the expression host is only distantly related to the source organism (Chen et al., 2016). This work is the first attempt to evaluate Mfp PTMs; further in vitro and in vivo studies are required to evaluate these PTMs and their functional importance during wet adhesion.

Allergenicity of Mfps
To predict the allergenicity of Mfps, we used three in silico tools; an Mfp was considered a potentially allergenic protein only if all three tools converged on an allergenic prediction. Among the 78 Mfps, six fps (Mcfp2, Mcfp10, Mefp2, Mufp2, Mufp6 v2, and Mufp6 v7) … (Saha & Raghava, 2006). All Mfps were predicted to be non-allergenic by the IgE epitope, MEME/MAST motif, and ARP-based approaches. Therefore, in this study, we focused mainly on the SVM approach based on amino acid composition to determine the allergenicity of Mfps, with a threshold of 0.4 (Table 2).
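The consensus rule can be expressed as a short Python function. This is a sketch, not the authors' code. The tool outputs are assumed to have been reduced to allergen/non-allergen calls, and the scores are made up. Mapping discordant predictions to "probably allergenic" is an assumption about how the three categories reported in the Abstract relate to the tools' votes. Counting an SVM score exactly equal to 0.4 as allergenic is also an assumption.

```python
ALGPRED_SVM_THRESHOLD = 0.4  # threshold stated in the text


def algpred_svm_call(score, threshold=ALGPRED_SVM_THRESHOLD):
    """True if the AlgPred SVM (amino acid composition) score predicts an allergen."""
    return score >= threshold  # assumption: a score at the threshold counts as allergen


def consensus_label(calls):
    """Combine per-tool calls {tool: True if allergen} into one label.

    Hypothetical mapping: all tools agree -> "potentially allergenic" or
    "non-allergenic"; any disagreement -> "probably allergenic".
    """
    if not calls:
        raise ValueError("need at least one prediction")
    votes = list(calls.values())
    if all(votes):
        return "potentially allergenic"
    if not any(votes):
        return "non-allergenic"
    return "probably allergenic"


# Made-up predictions for one protein.
calls = {
    "AlgPred": algpred_svm_call(0.62),
    "AllerTOP": True,
    "AllergenFP": False,
}
print(consensus_label(calls))
```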
For cross-validation, we used the AllerCatPro server to search for allergenic Mfps in public databases (i.e. the Food Allergy Research and Resource Program, the Comprehensive Protein Allergen Resource, the WHO/International Union of Immunological Societies, UniProtKB, and Allergome) (Maurer-Stroh et al., 2019). No records of Mfp allergenicity are currently available in these databases, which could be the main reason why AllerCatPro returned no significant allergenic hit (E-value threshold 0.001) (Supplementary file S2).

Antigenicity of Mfps
Various prokaryotic, eukaryotic, and cell- or tissue-culture-based expression systems have been widely used for mass production of Mfps (Hennebert et al., 2015). Before expressing a protein in a host, we need to understand its antigenic properties in that expression system, and the advantages and disadvantages of the host should be considered. In this study, we used three host models (bacteria, virus, and tumor cells) to assess the antigenic properties of Mfps. Most Mfps were non-antigenic in the tumor model, followed by the bacterial and viral models (Figure 4). … Mcfp6 v1, Mcfp6 v2, Mcfp6 v3, Mcfp7 v1, Mcfp11, Mcfp12, Mcfp14, Mcfp15, Mcfp17, Apfp1, Mufp2, Mufp6 v1, Mufp6 v3, Mufp6 v4, Mufp6 v5, Mufp6 v6, Mufp6 v7, Mufp6 v8, Mufp6 v9, Pvfp5, and Pvfp6 (Table 3).

Immune simulation
The wet adhesive properties of Mfps are highly promising and offer many applications, such as tissue engineering, surgery, bone regeneration, and dental surgery (Grande & Pitman, 1988; Hong et al., 2012; Mehdizadeh et al., 2012; Song et al., 2018). The toxicity and immunogenicity of Mfps must be analyzed before any biomedical application. Preliminary immunogenicity tests have shown that Mfps are non-antigenic and immunologically neutral (Waite, 1987; Saez et al., 1991). Mfps isolated from M. edulis bond strongly to various substrates such as stainless steel, pig duodenal mucosa, porcine small intestine submucosa, and porcine skin (Hansen et al., 1994; Schnurrer & Lehr, 1996; Ninan et al., 2003, 2007). However, previous research has not adequately addressed the immunological responses to Mfps in vivo; most studies concentrated on in vitro, cell-line-based analyses such as cell adhesion, cytotoxicity, and cell proliferation (Bhagat & Becker, 2017).
Immunogenicity assessment is an integral part of developing therapeutic Mfps. The C-ImmSim server predicts the immune response to Mfps in the mammalian system. Our results show that all Mfps except Myfp1 v1 (ANTIGENpro antigenicity score 0.135) elicit immunological reactions (Figure 5). This immune simulation analysis is based mainly on the amino acid composition and the identified epitopes of each Mfp, together with the various immune responses (immune simulations of 78 Mfps from nine species are provided in Supplementary file S4). The primary immune response was characterized by high levels of IgM. An increasing B-cell count and high levels of IgM, IgG1 + IgG2, and IgM + IgG with decreasing antigen concentration indicate the secondary and tertiary immune responses (Shey et al., 2019; Kathwate, 2020). Formation of different B-cell isotypes (isotype switching) and of B memory cells indicates the development of immune memory. In addition to memory cell development, the TH and TC populations also increased. IFN-γ, TGF-β, IL-10, and IL-12 increased, whereas IL-4, IL-6, IL-18, IL-23, TNF-α, and IFN-β decreased. High levels of dendritic cells (cell-mediated immunity) and of IFN-γ and IL-2 (humoral immunity), together with a low Simpson index (D), indicate sufficient immunoglobulin production. A smaller D value indicates lower diversity of T-cell clones (Figure 5).

Table 3. Antigenicity of Mfps. In ANTIGENpro, Mfps with a probability score >0.80 are potentially antigenic. In VaxiJen v.2.0, Mfps that are antigenic in all three expression systems are highlighted in red and non-antigenic Mfps in blue (antigenicity threshold 0.4).

The immune simulation study was conducted to understand the antigenicity of Mfps and to reveal the generation of adaptive immunity and immune interactions. The immune simulation analysis (Figure 5) shows that Mfps act as prominent antigens, because they can induce the production of antibodies. Notably, certain Mfps, such as Mcfp3 v3, Mcfp3 v4, Mcfp3 v5, Mcfp3 v7, Mcfp3 v8, Mcfp3 v9, Mcfp3 v10, Mcfp3 v11, Mcfp8, Mcfp16, Mufp3, Mufp3 v1, Mufp3 v2, Mufp3 v4, Mufp3 v5, Mufp3 v6, Mufp3 v7, Mufp3 v8, Mufp3 v9, Mufp3 v10, Mufp3 v11, and Mufp3 v12, only induced antigen production and did not lead to the production of immunoglobulins (such as IgM, IgM + IgG, IgG1 + IgG2, IgG1, and IgG2). In contrast, Mgfp3 v1, Mgfp3 v2, Mufp3 v3, Mufp6 v3, Mufp6 v6, and Mufp6 v7 only induced IgM production with decreasing antigen concentration (Supplementary file S4) and did not lead to cytokine production. From the overall analysis of the immunological responses of 78 Mfps from nine species, we conclude that the human immune system recognizes the Mfps from the first day onwards and that antibody production begins from about day 4 (Figure 5(a)). The analysis of cytokine and interleukin production is crucial for studying both innate and adaptive immunity. Compared with other cytokines, IFN-γ production is high from day 2 onwards; IFN-γ plays an important role in macrophage activation and in stimulating natural killer cells and neutrophils (Figure 5(b)). As seen in the B-cell population analysis (cells per mm³; Figure 5(c)), Mfps do not activate the IgG1 and IgG2 B isotypes, but they do activate B memory (y2) cells and the IgM B isotype. In the Ig population analysis, IgM + IgG immunocomplex formation (~16 cells per mm³) is much higher than that of IgM (~11 cells per mm³), IgG1 (~4 cells per mm³), and IgG2 (~2 cells per mm³). Interestingly, IgM + IgG complex formation was observed from day 2 onwards, whereas the other Ig types appeared only from day 4 onwards (Figure 5(d)). TC cell activity is very low, with no TC memory (y2) cell formation, or these cells remain in a resting state (Figure 5(e,f)). The formation of TH cells is also important during Ig formation: from day 2 onwards, the TH memory (y2) cell population increased until it reached approximately 350 memory cells per mm³, a level that lasted for about 30 days. However, TH cell activation was observed only from day 5 onwards (Figure 5(h)). The NK cell population fluctuated strongly, with high NK cell populations between days 9 and 11 (Figure 5(i)), while the MHC-2 cell population was seen only up to day 7 (Figure 5(j)) (immune simulations of 78 Mfps from nine species are provided in Supplementary file S4). The purpose of this immunoinformatic analysis is to determine the immunogenicity of Mfps; previous research has suggested that Mfps are immunologically neutral (Rathi et al., 2018; Anand & Shibu Vardhanan, 2020; Pandey et al., 2020).
According to Choi et al. (2014), the purity of Mfps is essential for in vivo applications: residual impurities in Mfp preparations (e.g. lipopolysaccharide (LPS) carried over because of improper purification of the expressed Mfps) can activate an innate immune response, i.e. the release of pro-inflammatory cytokines such as TNF-α and IL-6. Supporting our findings, Saez et al. (1991) studied the immunological reactivity of Mfps from three species of Chilean mussels (Choromytilus chorus, Mytilus chilensis, and Aulacomya ater). C. chorus and M. chilensis exhibited a higher level of immunological reactivity than A. ater, and the study concluded that the different degrees of reactivity are directly correlated with the different amino acid compositions of the Mfps. Saez et al. (1991) also suggested that proline-rich Mfps elicit stronger immunological reactions than glycine-rich Mfps. Our observation agrees with this finding: most of the selected Mfps are richer in proline than in glycine and other amino acids (Anand & Shibu Vardhanan, 2020). Most studies suggest that Mfps have promising applications in many fields, particularly biomedical research. However, it is interesting to note that no Mfp or mussel-inspired bioadhesive has yet been approved by the FDA or reported in clinical trials (see Rathi et al., 2018; Pandey et al., 2020 for review).
4). For cloning purposes, we selected E. coli (strain K12) based on the CAI and GC content analysis. Even though Mfps are predicted to be antigenic in E. coli, their predicted expression is higher in E. coli than in yeast (Saccharomyces cerevisiae) and the human cell system. The pET series of expression plasmids is widely used for recombinant protein production in E. coli (Shilling et al., 2020). pET28a is the most popular of these plasmids; it contains the T7 promoter and an adjacent lac operator sequence that suppresses uninduced expression (Dubendorf & Studier, 1991; Shilling et al., 2020). Translation initiation is mediated by a Shine-Dalgarno (SD) sequence originating from the major capsid protein of T7 (gene 10 protein). Because the vector encodes a poly-histidine tag (His6) and a thrombin protease recognition site (TPS), the expressed protein can be purified by standard His-tag affinity purification, and the tag can subsequently be removed with thrombin (Shilling et al., 2020). The pET28a(+) vector system was selected for the in silico cloning analysis based on these features. Various restriction sites were present in the optimized cDNA of the Mfps (optimized for E. coli); restriction sites and cloning details for all 78 Mfps are included in Supplementary file S5. Another essential feature of pET28a is the lacI gene, which encodes the lac repressor; adding isopropyl-β-D-thiogalactopyranoside (IPTG) relieves this repression and allows transcription of the downstream gene (i.e. Mfps) (Hwang et al., 2005; Gim et al., 2008; Jiang et al., 2012). After the initial screening of restriction sites in all Mfp cDNAs, XhoI and BamHI restriction sites were added, for uniformity, at the N-terminal and C-terminal ends of the optimized Mfp cDNAs, respectively. The resulting chimeric Mfp constructs were then cloned into the pET28a(+) vector using the SnapGene tool for effective expression in the E. coli K12 strain (Figure 6). The size and position of each cloned insert were confirmed by double digestion with XhoI and BamHI followed by electrophoresis on a 1% agarose gel (simulated in SnapGene) (Figure 7). This in silico cloning and electrophoresis confirmed that pET28a(+) can serve as a universal plasmid for Mfp expression and that XhoI and BamHI can be used for cloning, because their recognition sites were not found within the Mfp sequences.
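The restriction-site screening and the XhoI/BamHI double-digest check described above reduce to simple string operations on the insert and the circular construct. The following is a minimal Python sketch, not the SnapGene workflow used in this study: it assumes plain DNA strings, a circular construct, and the textbook recognition and top-strand cut positions for XhoI (C^TCGAG) and BamHI (G^GATCC). The helper names (`screen_insert`, `circular_cuts`, `fragment_sizes`) and the toy sequences are hypothetical illustrations, not real Mfp or pET28a(+) data.

```python
# Minimal sketch (not the SnapGene workflow): screen an insert for internal
# XhoI/BamHI sites and predict double-digest fragment sizes of a circular
# construct. Enzyme sites are textbook values; all sequences are hypothetical.

ENZYMES = {
    # name: (recognition site, top-strand cut offset within the site)
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match of `site`."""
    seq = seq.upper()
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def screen_insert(cds: str) -> dict[str, list[int]]:
    """Internal sites per enzyme; an empty list means the enzyme is safe to use.

    Both recognition sites are palindromic, so scanning the top strand suffices.
    """
    return {name: find_sites(cds, site) for name, (site, _) in ENZYMES.items()}


def circular_cuts(plasmid: str) -> list[int]:
    """Sorted top-strand cut coordinates for all enzymes on a circular sequence."""
    n = len(plasmid)
    cuts = set()
    for site, offset in ENZYMES.values():
        # Append the first len(site)-1 bases so sites spanning the origin are found.
        wrapped = plasmid + plasmid[: len(site) - 1]
        for pos in find_sites(wrapped, site):
            cuts.add((pos + offset) % n)
    return sorted(cuts)


def fragment_sizes(plasmid: str) -> list[int]:
    """Fragment sizes (bp, largest first) after complete digestion with all enzymes."""
    n = len(plasmid)
    cuts = circular_cuts(plasmid)
    if len(cuts) < 2:
        return [n]  # uncut or linearised: a single full-length molecule
    # Distance from each cut to the next one, wrapping around the circle.
    sizes = [(cuts[(i + 1) % len(cuts)] - cuts[i]) % n for i in range(len(cuts))]
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    insert = "ATGGCTAAACCGTACGGTTAA"  # hypothetical optimised CDS
    print(screen_insert(insert))      # {'XhoI': [], 'BamHI': []}
    construct = "A" * 50 + "GGATCC" + "T" * 30 + "CTCGAG" + "A" * 40
    print(fragment_sizes(construct))  # [96, 36]
```

In this toy construct, the smaller fragment corresponds to the region released between the two sites, which is the band a simulated gel would use to confirm insert size.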
According to Hwang et al. (2004, 2005), expression of Mgfp3-A and Mgfp-5 in E. coli inhibits cell growth and has a toxic impact on E. coli cells; moreover, the expression level and yield of these proteins are low. The authors also reported that, several hours after IPTG induction, E. coli stopped expressing the recombinant Mfps and began a slight proteolytic degradation of them. Our findings are consistent with these results, i.e. the fps were predicted to be antigenic and allergenic in E. coli. The antigenic properties of Mfps may activate E. coli cellular defense mechanisms, which might in turn explain the proteolytic degradation of the expressed recombinant Mfps. In contrast, Pvfp1 and Pvfp5b were predicted to be neither antigenic nor allergenic and showed a high expression probability in E. coli. Jiang et al. (2012) showed that the repeating and non-repeating regions of Pvfp1 are non-toxic to mouse osteoblast MC3T3-E1 cells and that their expression level in bacteria is high. Santonocito et al. (2019) showed that expression of Pvfp5b had no toxic impact on the E. coli expression system and no cytotoxic effect on NIH-3T3 and HeLa cell lines; Pvfp5b also showed strong cell-adhesive properties on both glass and plastic plates. This appears to contradict our results, because Pvfp5 was predicted to have potential antigenic properties (Table 3). However, in our study the antigenicity, allergenicity, and immunogenicity predictions were based on the amino acid composition of Mfps, and the amino acid compositions of Pvfp5 and Pvfp5b are not identical; this difference may be a key explanation for the non-toxicity of Pvfp5b.
Several previous reports (Hwang et al., 2007; Gim et al., 2008; Choi et al., 2012; Jiang et al., 2012; Santonocito et al., 2019) indicate that expression of a single Mfp is toxic to the expression system (i.e. E. coli), whereas expression of a combination of Mfp genes (hybrid Mfps) or of a partial region of an Mfp does not have any toxic impact on E. coli. According to Gim et al. (2008), expression of Mgfp-353 in E. coli gave a high yield and was non-toxic to the expression system. In concordance with these findings, Hwang et al. (2007) showed that Mgfp-151 was produced at a high yield with improved solubility, showed strong potential as a practical bioadhesive, and had no toxic impact on the E. coli expression system. Thus, we conclude that even though Mfps are predicted to be highly antigenic and allergenic in E. coli, E. coli is the best expression system compared with yeast and human cells. Based on this, we suggest that, rather than a single Mfp, a combination of Mfps (hybrid Mfps) could be used in the expression system; based on previous findings (Hwang et al., 2007; Gim et al., 2008; Cha et al., 2009; Choi et al., 2012; Santonocito et al., 2019), this should avoid toxic impacts on the host system and give high solubility and yield.

Secondary structure of Mfps mRNA and their stability
The secondary structure of mRNA is an essential factor in protein expression. The mRNA secondary structure and free energy of each Mfp before and after codon optimization (optimized for E. coli) were determined using the Mfold server. Generally, the structure with the lowest free energy is considered the most stable and well-structured mRNA and is suggested to be highly efficient for translation; higher stability would therefore result in higher expression rates. The mRNA structure was adjusted based on a low ΔG and start-codon energy, which supports ribosome binding and translation initiation (Dana et al., 2020). The data showed that the mRNA was stable enough for efficient translation in the new host (i.e. E. coli strain K12) (Table 5). Structural analysis of the mRNA of all Mfps showed only marginal structural variation before and after optimization (Figure 8); the mRNA structures and stabilities of all 78 Mfps before and after optimization are included in Supplementary files S6 and S7. We also compared the amino acid composition before and after mRNA optimization and found no substantial difference, as expected for synonymous codon substitution. We can therefore infer that the optimization process does not alter the amino acid composition of Mfps, so the expressed Mfps may mimic the native bioadhesive structural organization and functional features of Mfps (see Anand & Shibu Vardhanan, 2020).
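The check that codon optimization leaves the amino acid composition unchanged can be reproduced by translating the original and optimized sequences and comparing the results. The following is a minimal Python sketch, assuming in-frame coding sequences and the standard genetic code (NCBI translation table 1); the function names and the two toy sequences are hypothetical illustrations, not Mfp data. It does not reproduce the Mfold free-energy calculation.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG
# order with the third base varying fastest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame CDS (DNA or mRNA) up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept mRNA as well as cDNA
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def composition(cds: str) -> Counter:
    """Amino acid counts of the encoded protein."""
    return Counter(translate(cds))


def same_protein(original: str, optimised: str) -> bool:
    """True if codon optimisation left the encoded amino acid sequence unchanged."""
    return translate(original) == translate(optimised)


if __name__ == "__main__":
    # Hypothetical toy sequences: every codon in `optimised` is a synonym.
    original = "ATGCCTAAATACGGATAA"   # M P K Y G stop
    optimised = "ATGCCGAAATATGGCTAA"  # M P K Y G stop
    print(translate(original), translate(optimised))  # MPKYG MPKYG
    print(same_protein(original, optimised))           # True
```

Comparing full translated sequences is a stricter test than comparing compositions, since two different sequences can share the same amino acid counts.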

Molecular dynamic simulation
Currently, few studies are available on MD simulation of fps, and all of them focused on partial Mfp sequences rather than full-length Mfps (Petrone et al., 2015; Levine et al., 2016; Zhao et al., 2020; Shahryarimorad et al., 2022). During the MD simulation, we did not find any significant changes in the Mfps. Overall changes in the stability of each Mfp were investigated by RMSD calculation: Pvfp5 (0.47) had the lowest RMSD value, and Mcfp1 v1 (0.78) had the highest. A low RMSD indicates that a structure deviates little from its starting conformation, which supports a stable conformation of the Mfps. The flexibility of different protein segments was revealed by the per-residue RMSF. Interestingly, some fluctuations were observed in the signal peptide regions of the Mfps (the first 10-20 amino acid residues), whereas the other residues were stable and fluctuated less. Overall, some level of fluctuation was observed in the loop and sheet regions of the Mfps. The compactness of the protein structures was measured using the Rg, which remained steady in all Mfps (the highest Rg was observed in Mcfp1 v1 and the lowest in Pvfp5). Exposure of the hydrophobic core of the Mfps was assessed from changes in the solvent-accessible surface area (SASA); the fp1 proteins showed the highest SASA, followed by the plaque-region fps. Hydrogen bond (HB) analysis is essential for understanding protein stability and flexibility. As with SASA, the highest flexibility and stability were observed in the thread-region proteins, followed by the plaque-region fps.
From the overall MD analysis, we infer that the cuticular and thread-core proteins (i.e. fp1, fp2, and fp4) are more flexible than the plaque proteins (i.e. fp3, fp5, and fp6) (Table 6). Conversely, a high level of compactness was observed in the plaque fps.
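The RMSD, RMSF, and Rg values summarized in Table 6 have simple definitions that can be computed directly from a coordinate array. The following is a minimal NumPy sketch, assuming the trajectory has already been extracted as a frames × atoms × 3 array (e.g. backbone atoms only); it uses the Kabsch algorithm for the superposition step and is not the analysis tool used in this study. The function names and the random test coordinates are hypothetical. SASA and hydrogen-bond counts require atom types and geometric criteria and are omitted.

```python
import numpy as np


def superimpose(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Kabsch least-squares fit of `mobile` (N x 3) onto `reference` (N x 3)."""
    mob_centroid = mobile.mean(axis=0)
    ref_centroid = reference.mean(axis=0)
    p = mobile - mob_centroid
    q = reference - ref_centroid
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the 3 x 3 covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + ref_centroid


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square distance between matched atoms of two N x 3 arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def rmsd_series(traj: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """RMSD of each frame (T x N x 3) to the reference after superposition."""
    return np.array([rmsd(superimpose(f, reference), reference) for f in traj])


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom fluctuation about the mean position; frames must be pre-aligned."""
    deviation = traj - traj.mean(axis=0)
    return np.sqrt((deviation ** 2).sum(axis=2).mean(axis=0))


def radius_of_gyration(coords: np.ndarray, masses=None) -> float:
    """Mass-weighted Rg of a single frame (equal masses if none are given)."""
    m = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    centre = (coords * m[:, None]).sum(axis=0) / m.sum()
    return float(np.sqrt((m * ((coords - centre) ** 2).sum(axis=1)).sum() / m.sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(20, 3))  # hypothetical backbone coordinates
    theta = 0.7
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    # A rigidly rotated and shifted copy should give RMSD ~ 0 after fitting.
    traj = np.stack([ref, ref @ rot_z.T + 1.5])
    print(rmsd_series(traj, ref))
```

Superposition before computing RMSD removes rigid-body rotation and translation, so the remaining deviation reflects internal conformational change, which is what the RMSD column in Table 6 is meant to capture.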

Limitations of the study
Mussel foot proteins (Mfps) have broad biotechnological applications; however, macroscale production of Mfps is very challenging. A variety of in silico tools were used here to understand the physicochemical characteristics of Mfps, and we concentrated on the immunological properties and expression efficacy of all known Mfps. For the commercial manufacture and formulation of Mfps for diverse industrial and biomedical applications, a more in-depth understanding is needed. In the MD simulation analysis, the effects of temperature, pH, and solvent on Mfp structure and function remain to be studied. Mfps interact with one another to produce the hierarchically organized byssus thread, so analysis of Mfp binding pockets and interaction networks is essential for understanding this hierarchical organization. Binding free energy calculations (MM-PBSA and MM-GBSA) would help to characterize the interactions between Mfps and ligands. Our future research will concentrate on the impact of pH and temperature on the structure and function of Mfps, as well as on Mfp protein-protein interaction networks and binding free energies.

Conclusion
This is the first study to use in silico methods to evaluate the allergenicity, antigenicity, and immunogenicity of all known Mfps and to perform their codon optimization, cloning, and mRNA structural analysis. In silico approaches could substantially reduce the time, cost, and effort of cumbersome experimental trials. From this study, we conclude that the majority of Mfps show antigenic and allergenic properties. They are predicted to be more antigenic in bacteria than in yeast and tumor cells. Nevertheless, the predicted expression of Mfps in a bacterial host is higher than in other expression hosts. It is important to note that all Mfps showed significant immunological activity in the human system, and we conclude that these antigenic, allergenic, and immunological properties are directly correlated with the amino acid composition of Mfps. We believe that the findings of the current study will be useful for the cloning and expression of Mfps, structural studies, immunological analysis, and drug-protein and protein-protein interaction research.

Figure 1. Flowchart of the step-wise methodology followed in the present study.

Figure 2. Diversity and complexity of post-translational modifications in Mfps. The amino acid involved in each PTM is shown in a small coloured circle (single-letter code).

Figure 6. In silico cloning analysis. (a) Cloning procedure for Apfp1; (b) Apfp1 cloned into pET28a(+). Restriction sites and cloning details for all Mfps are provided in Supplementary file S4. Performed in SnapGene (https://www.snapgene.com/).

Figure 8. mRNA stability and structural characterization of Apfp1 before and after optimization. (a) Energy dot plot of the original Apfp1 mRNA; (b) energy dot plot of the optimized Apfp1 mRNA; (c) mRNA structure of the original Apfp1; (d) mRNA structure of the optimized Apfp1. mRNA secondary structures and stabilities of all Mfps are included in Supplementary files S5 and S6. Generated with the Mfold web server (http://unafold.rna.albany.edu/?q=mfold).
As a result, a total of 78 Mfps are available in the NCBI Protein database. Among these, 34 Mfps in Mytilus californianus

Table 6.
The calculated parameters for all systems obtained from 10-ns MD simulations. Root mean square deviation (RMSD): the root-mean-square distance between the backbone atoms of the starting (reference) structure and each simulated frame after superposition; root mean square fluctuation (RMSF): the fluctuation (standard deviation) of the atomic positions of each residue over the trajectory; radius of gyration (Rg): the radius of gyration (structural compactness) of a molecule and the radii of gyration about the x, y, and z axes as a function of time; solvent-accessible surface area (SASA): the approximate surface area of a biomolecule that is accessible to solvent, as a function of simulation time; HBONDS: hydrogen bonds. Mc - Mytilus californianus, Ap - Atrina pectinata, Dp - Dreissena polymorpha, Me - Mytilus edulis, Mg - Mytilus galloprovincialis, Mu - Mytilus unguiculatus, My - Mizuhopecten yessoensis, Pc - Perna canaliculus, Pv - Perna viridis; Fp - foot protein, v - variant.