Immunoinformatics analysis to design novel epitope based vaccine candidate targeting the glycoprotein and nucleoprotein of Lassa mammarenavirus (LASMV) using strains from Nigeria

Abstract Lassa mammarenavirus (LASMV) is responsible for a specific type of acute viral hemorrhagic fever known as Lassa fever. Lack of effective treatments and counter-measures against the virus has resulted in a high mortality rate in its endemic regions. Therefore, in this study, a novel epitope-based vaccine has been designed using the methods of immunoinformatics targeting the glycoprotein and nucleoprotein of the virus. After numerous robust analyses, two CTL epitopes, eight HTL epitopes and seven B-cell epitopes were finally selected for constructing the vaccine. All these most promising epitopes were found to be antigenic, non-allergenic, nontoxic and non-human homolog, which made them suitable for designing the subunit vaccine. Furthermore, the selected T-cell epitopes which were found to be fully conserved across different isolates of the virus, were also considered for final vaccine construction. After that, numerous validation experiments, i.e. molecular docking, molecular dynamics simulation and immune simulation were conducted, which predicted that our designed vaccine should be stable within the biological environment and effective in combating the LASMV infection. In the end, codon adaptation and in silico cloning studies were performed to design a recombinant plasmid for producing the vaccine industrially. However, further in vitro and in vivo assessments should be done on the constructed vaccine to finally confirm its safety and efficacy. Communicated by Ramaswamy H. Sarma

In its most severe form, however, Lassa fever results in dysfunction of multiple organ systems and coagulopathy leading to hemorrhage, resulting in high fatality rates (Andersen et al., 2015). LASMV infection can be acquired directly by exposure to infectious body fluids or indirectly through tainted foods and surfaces from rodent or human origins (Ogbu et al., 2007). While nosocomial infection was the earliest recorded LASMV outbreaks, direct evidence of substantial human-to-human transmission has since been unusual and most cases are thought to result from exposure to infected rodents (McCormick & Fisher-Hoch, 2002). In view of the risks of increased human-to-human transmission, continuous surveillance of LASMV transmission patterns is crucial for informing the control and prevention of outbreaks, particularly when peaks of Lassa fever activity are detected (Kafetzopoulou et al., 2019;Siddle et al., 2018). Molecular evidence suggests that LASMV began about 1000 years ago (Heinrich, 2020). However, the first case in Nigeria was isolated from a nurse working in a missionary hospital in a small town called 'Lassa' of Northeastern Nigeria in 1969 (Lukashevich et al., 2019). This country is one of the major hotspots of LASMV infection in West Africa (Ibukun, 2020;Olayemi & Fichet-Calvet, 2020). In Nigeria, Lassa fever is endemic and the number of yearly human infection cases peaks during the dry season (December-April) following the Mastromy rat reproductive cycle in the wet season (May-June). A molecular diagnostic laboratory for Lassa fever has been in operation at the Irrua Specialist Teaching Hospital (ISTH), Irrua, in Edo State, Nigeria, since 2008 (Asogun et al., 2012). It has become a reference center for Lassa fever in the country as it is one of a few centers in West Africa providing this service. With an average confirmation rate of about 10%, about 1000 to 4000 suspected cases are tested annually. While the enormous majority of confirmed Lassa fever cases originate from Edo State, significant number of specimens also originate from patients attending hospitals in other parts of the country . There are 472 laboratory confirmed cases, including 70 deaths, from 1 January to 9 February 2020, having case fatality ratio (CFR) of 14.8% (World Health Organization (WHO), 2020). Last dated report filed by the Nigeria Centre for Disease Control (NCDC) on November 18, 2020, showed that there were 98 suspected cases on that particular date alone and one death was recorded from two LGAs in two States (Nigeria Centre for Disease Control (NCDC), 2020). Apart from Nigeria, cases of Lassa fever have also been reported from Burkina Faso, Ghana, Senegal, Gambia, Congo, etc. (Gonzalez et al., 2007;Safronetz et al., 2010).
LASMV is a single-stranded arenavirus with a small RNA segment of 3.4 kb (S-RNA) and a large RNA segment of 7.2 kb (L-RNA) (Lenz et al., 2001). The S-RNA encodes the nucleoprotein (NP, 64 kDa) and the envelope glycoprotein (GP) complex, which is cleaved into two viral glycoproteins GP1 (42 kDa) and GP2 (38 kDa) by the cellular SKI-1/S1P subtilase (Lenz et al., 2001;Xing et al., 2015). The NP encoded by the S-RNA is the major structural protein and is comprised of nucleoplasid proteins (Zinzula & Tramontano, 2013). The main function of these proteins is to interact with the viral RNA forming a ribonucleoprotein (RNP) complex, which is needed for the viral RNA transcription and replication (Mahanty et al., 2003). The NP is useful in the classification of different strains of LASMV because it is the most appropriately sequenced part of the LASMV genome. Hastie et al. (2011) revealed that the NP contains 3 0 -5 0 exoribonuclease function that helps in degrading potential pathogen-associated molecular pattern (PAMP) RNAs which leads to the suppression of the host's innate immunity. The GP also plays significant roles in mediating cellular functions (Watanabe et al., 2018). The GP produces the envelope protein that mediates the attachment of the virus and its entry into the host cell (Maniatis et al., 1982). The large segment (L-RNA) encodes the RNA polymerase (L, 200 kDa) and a Zinc-binding protein (11 kDa) (Xing et al., 2015).
The clinical disease begins as a flu-like illness characterized by fever, general weakness, and malaise, which can be accompanied by cough, sore throat, and severe headache. Gastrointestinal manifestations such as nausea, vomiting and diarrhea are also common. In severe cases, the condition of the patients deteriorates with severe pulmonary edema, acute respiratory distress and clinical signs of encephalopathy, sometimes with coma and seizures, and terminal shock (Ogbu et al., 2007). Bleeding from mucosal surfaces is often observed, however; it is usually not of a magnitude to produce shock by itself. Sensorineural deafness is commonly observed in patients in the late stages of the disease or in early convalescence in survivors (Yun & Walker, 2012).
The major impediment to the development of Lassa fever vaccine is the uncertainty of various lineages to induce immune response against one another (Heinrich, 2020), this might be attributed to the differences in B-and T-cell glycoprotein epitopes (Ibukun, 2020). Studying the structure of the LASMV will significantly help in the development of prophylactic strategies that can target the mechanisms used by this deadly virus in gaining entry into the host's cell. This study is therefore aimed at using Bioinformatics and Immunoinformatics approaches to design novel epitopebased vaccine candidate targeting the two mentioned proteins i.e. GP and NP of LASMV from Nigerian strain.

Protein retrieval and antigenicity prediction
In this study, we targeted two structural proteins i.e. NP and GP of LASMV, each of a particular strain from Nigeria, using the Virus Pathogen Database and Analysis Resource (ViPR) (https://www.viprbrc.org/), an integrated, powerful resource for several virus families and their respective species (Pickett et al., 2012). We have also targeted the same proteins from strains found Sierra Leone, Liberia and Guinea for conservancy analysis. The proteins sequences were extracted from National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/), an important resource for bioinformatics tools and services, providing access to biomedical and genomic information (Benson et al., 2013), using their respective GenBank accession ID and stored in FASTA format for further analysis. As a measure of an immune response, the structural proteins were then applied for antigenicity prediction using the Vaxijen v2.0 server (https:// www.ddg-pharmfac.net/vaxijen/) with a threshold value of 0.4 since the pathogen is a virus (Doytchinova & Flower, 2007). This server uses sequence output and auto crosscovariance (ACC) transformation method to maintain 70-87% prediction accuracy. Auto cross-covariance is a statistical process that provides the prediction of covariance of one process with the other process at a certain time point (Dong et al., 2009). This method is widely used by statisticians to do different types of analyses.

Linear B-lymphocyte epitopes prediction
The linear B-lymphocyte (LBL) epitopes were determined using the BepiPred Linear Epitope Prediction 2.0 method from IEDB server, available at http://tools.iedb.org/bcell/ (Vita et al., 2015). BepiPred 2.0 is based on a random forest algorithm trained on epitopes annotated from antibody-antigen protein structures (Peters et al., 2005). Therefore, we selected peptides with lengths ranging between 10 and 40 amino acid long for final vaccine construction. The predicted LBL epitopes were then further assessed to check their antigenicity, allergenicity and toxicity with VaxiJen v2.0, AllerTop v2.0 and ToxinPred server, respectively.

Transmembrane topology determination and human homology prediction
The transmembrane topology of the selected epitopes was determined using the transmembrane topology of protein helices determinant, TMHMM v2.0 server (http://www.cbs. dtu.dk/services/TMHMM/). The server predicts whether the epitope would be a transmembrane peptide or it would remain inside or outside of the membrane (M€ oller et al., 2001). The human homology of the epitopes was predicted using BLASTp server at https://blast.ncbi.nlm.nih.gov/ (Altschul et al., 2005). The main goal of vaccines is to provide life-long immunity to the diseases. And if the epitopes are found to be homolog to the human proteome, then this incident might lead to autoimmune diseases in the body which is why in this study the homology of the epitopes was determined. During the prediction in BLASTp server, all the parameters were kept at their default values and Homo sapiens [taxid: 9606] was used as the comparing organism. An Evalue cutoff of 0.05 was set in the experiment and the epitopes that had no hits below the e-value inclusion threshold were selected as non-homologous peptides (Mehla & Ramana, 2016).

Conservancy analysis and multiple sequence alignment
To determine the conservancy of the selected T-cell epitopes, the multiple sequence alignment (MSA) of the epitopes with the proteins from different isolates, was carried out by the ClustalX 2.1 tool (Larkin et al., 2007). The tool compiles different statistical methods within a single module which can be used to generate the MSA profile of target peptides or proteins. In the conservancy analysis, four GP sequences with GenBank IDs: AMZ00373.1, AZI95971.1, AZI96406.1, AVO03655.1 and four NP sequences with GenBank IDs: AMZ00372.1, AZI95972.1, AZI96407.1, ADU56611.1 were used. The T-cell epitopes that were found to be fully conserved among the sequences of the selected isolates, were considered for vaccine construction (along with some other mentioned criteria) because this will ensure the broad-spectrum efficacy of the designed vaccine.

Population coverage analysis
The distribution of specific HLA alleles among different ethnicities and population is an essential prerequisite for designing an epitope-based vaccine. The expression of different HLA alleles may vary from one population to another. The population coverage analysis was carried out taking into account different regions of Africa where Lassa virus infection is prevalent. The IEDB population coverage tool (http://tools.iedb.org/population/) was used to determine the population coverage of the best-selected epitopes across multiple HLA alleles in different regions of the world where the virus has high infection rate (Adhikari & Rahman, 2017;Bui et al., 2007).

Construction of the multi-epitope vaccine
From the previous steps, the epitopes that were found to be highly antigenic, non-allergenic, nontoxic, and non-homolog, were considered as the most promising epitopes and finally selected for constructing the vaccine. And also, the T-cell epitopes which were found to be fully conserved among the selected sequences from different isolates were also considered for vaccine construction along with the previously mentioned criteria. The cytokine production ability of the HTL epitopes was also considered when selecting the most promising HTL epitopes. During the construction of the multi-epitope vaccine, the EAAAK linker was added at the both ends of the vaccine protein, as it is needed for the effective functioning of epitopes (Nezafat et al., 2014). Peptides used for the vaccine construction are generally poorly immunogenic when used alone, therefore, requires adjuvants to boost up the immune response (Li et al., 2014). The human b-defensin-3 (UniProt accession number: Q5U7J2) was chosen as an adjuvant in our study, because it is widely expressed in the oral cavity and exerts strong antibacterial and immunomodulatory activities (Brancatisano et al., 2011). This adjuvant is also one of the widely accepted adjuvant for vaccine formulations. The final CTL, HTL, and LBL epitopes were selected and linked together with the help of Ala-Ala-Tyr (AAY), Gly-Pro-Gly-Pro-Gly (GPGPG), Lys-Lys (KK) linkers, respectively (Dorosti et al., 2019;Gu et al., 2017).

Analysis of the physicochemical properties, antigenicity and allergenicity
To indicate the basic properties of a protein, there is need for the prediction of physicochemical properties. In this study the ProtParam web-server (http://web.expasy.org/protparam/) was used for this prediction using our primary protein sequence (Gasteiger et al., 2005). Also, Antigenicity (VaxiJen v2.0), Allergenicity (AllerTop v2.0) and solubility (SOLpro, available at http://scratch.proteomics.ics.uci.edu/) of the vaccine protein upon overexpression in E. coli (Magnan et al., 2009) were also predicted 2.9. Secondary structure prediction of the designed vaccine The two-dimensional (2D) structural features such as a-helix, b-turn and random coils of the construct were determined by the online server, PSIPRED v4.0 (http://bioinf.cs.ucl.ac.uk/ psipred/). This server is one of the most widely used servers to predict the secondary structures of proteins. This server utilizes two feed-forward neural networks to perform an analysis on the output obtained from PSI-BLAST. The PSI-BLAST output is generated to predict the secondary structure of the query protein with high accuracy (Buchan & Jones, 2019). While analyzing the secondary structure of the designed vaccine, all the parameters were kept at their default values.
2.10. Tertiary structure: prediction, refinement and validation of the designed vaccine

Prediction
The tertiary structure of the final subunit vaccine was predicted using the RaptorX server at http://raptorx.uchicago.edu/, which operates based on three steps i.e. single-template threading, multiple-template threading, and alignment quality prediction. The server predicts 3D protein model and provides some confidence scores to evaluate the quality of predicted models.
These confidence scores that provide a clear indication to the relatively good model (K€ allberg et al., 2017).

Refinement
To further improve vaccine tertiary structure, the generated PDB file from the RaptorX server was then submitted to the Galaxy refine server (http://galaxy.seoklab.org/cgi-bin/submit. cgi?type=REFINE) for refinement (Heo et al., 2013). This server initially rebuilds sidechains, then sidechain repacking and consequently uses the molecular dynamics (MD) simulation to implement repeated cycles of structural perturbation with subsequent relaxation (Lee et al., 2016). The GalaxyRefine website provides the RMSD, Ramachandran favored score, and other scores. The refined structure was downloaded and the selected structure was identified by analyzing the energy score of the highest RMSD and Ramachandran favored value. The refined and identified structure was visualized using the PyMOL v2.3.4 software (DeLano, 2002).

Validation
Structural validation is a process to identify potential errors in the predicted tertiary structure (Khatoon et al., 2017). The refined vaccine protein was validated using four different validating tools i.e. Ramachandran plot, ERRAT and VERIFY 3D available at https://saves.mbi.ucla.edu/, and QMEAN analyzer (https://swissmodel.expasy.org/qmean/). At first, the refined structure was submitted as PDB file to PROCHECK server for Ramachandran plot analysis. The Ramachandran plot is an easy way to visualize energetically allowed and disallowed dihedral angles psi (w) and phi (/), i.e. torsional angles of amino acid and is calculated based on van der Waal radius of the side chain. The result from Ramachandran plot includes the percentage and number of residues in most favored, additional allowed, generously allowed, and disallowed region, which defines the quality of modeled structure (Laskowski et al., 1993). Thereafter, the distribution pattern of the different atoms in the peptide vaccine was evaluated by using ERRAT. This web-based tool can statistically analyze the non-bonded atomatom interactions and provides overall quality factor for a query protein (Colovos & Yeates, 1993). After that, VERIFY3D server, available at the same web module, was used to determine the compatibility of the 3D structure of the protein with its own amino acid sequence (1D) by analyzing its features such as loop structure, beta-sheet, coil structure, polar, non-polar regions etc. If 80% of the amino acids of a query protein has the average 3D-1D score > ¼ 0.2, then this value is considered to be a desired value and represents compatibility of the 3D structure with its 1D structure (Bowie et al., 1991). Finally, QMEAN tool was utilized to further evaluate the quality of the vaccine protein. QMEAN determines the quality of a protein by analyzing its different geometrical aspects (Benkert et al., 2011;Muhammad et al., 2020). All these tools were used to analyze the protein quality of the designed vaccine two times, once after the protein refinement and secondly, after simulating only the vaccine protein. The validation was done two times to find out and compare the improvement in the protein quality.

Conformational B-cell epitope prediction of the constructed vaccine
The conformational (discontinuous) B-cell was predicted using ElliPro tool of IEDB server at http://tools.immuneepitope.org/toolsElliPro/. The server uses three diverse algorithms i.e. residues protrusion index (PI), adjoining clustering residues liable upon PI, and approximation of protein shape to predict the possible regions of the protein where the conformational B-cell epitopes can be found (Ponomarenko et al., 2008).

Multi-epitope vaccine protein disulfide engineering
In protein disulfide engineering, disulfide bonds were generated with disulfide by design (v2.13) web tool (http://cptweb. cpt.wayne.edu/DbD2), using the refined 3D structures of the constructed vaccine to identify the residue pairs capable of undergoing disulfide bond formation. When engineering the disulfide bonds, the intra-chain, inter-chain and C b for glycine residue, were selected and the v 3 and C a -C b -S c angle were kept at À87 or þ97 ± 30 and 114.6 ± 10, respectively, while potential residue pairs were selected for mutation at the benchmark of <2.20 kcal/mol (Craig & Dombkowski, 2013).

Molecular docking between the vaccine and tolllike receptor (TLR)
We docked our refined vaccine (as ligand) with toll-like receptors (as receptor). Our selected TLR is TLR-4 (PDB ID: 4G8A) as it recognizes viral single-stranded RNAs. Finally, the binding affinity between the multi-epitope vaccine and TLR-4 receptor was calculated through the use of ClusPro v2.0 (https://cluspro.bu.edu/login.php). This server completed the task in three consecutive steps i.e. rigid body docking, clustering of lowest energy structure, and structural refinement by energy minimization. The best-docked complex was selected based on the lowest energy scoring and docking efficiency (Kozakov et al., 2013Vajda et al., 2017). The ClusPro server calculates the energy score based on the following equation: Thereafter, the docking was again performed by PatchDock (https://bioinfo3d.cs.tau.ac.il/PatchDock/) and later refined and re-scored by FireDock server (http://bioinfo3d.cs. tau.ac.il/FireDock/). The FireDock server ranks the docked complexes based on their global energies and the lower score represents the better result (Atapour et al., 2019;Duhovny et al., 2002;Schneidman-Duhovny et al., 2005). Finally, the docking was performed using the HawkDock server (http://cadd.zju.edu.cn/hawkdock) and the MM-GBSA analysis was conducted. The docked vaccine-TLR-4 was visualized by the Discovery Studio Visualizer (BIOVIA, 2015).

Molecular dynamics simulation
Molecular Dynamics (MD) simulation was carried out for the vaccine-TLR-4 docked complex using the GROMACS version 2020.3 dynamics simulation package (Abraham et al., 2015). AMBER99SB-ILDN force field (Lindorff- Larsen et al., 2010) was used with the TIP3P water model and 0.15 M KCl was added to the system. Some additional ions were also added to the system to neutralize the total system charge. To relax the structure and avoid steric clashes in further simulations, potential energy minimization was performed with the step size 1 fs to the maximum force 1000.0 kJ/mol/nm. Thereafter, the pressure and the temperature of the system were equilibrated to 1 atm and 310 K by running NVT and NPT simulations (100-ps each), respectively. Furthermore, the pressure and the temperature of the system were controlled with modified Berendsen thermostat (Berendsen et al., 1984) and Parinello-Rahman barostat (Parrinello & Rahman, 1982) with time constant tau_t ¼ 0.1 ps and tau_p ¼ 2 ps, respectively. Productive 200 ns MD simulation was carried out in the isothermal-isobaric ensemble with 2 fs time-step. LINCS (B. Hess et al., 1997) algorithm was used to constrain the bonds involving hydrogen atoms. And finally, the long-range electrostatic interactions were calculated using Particle-Mesh Ewald summation scheme (Darden et al., 1993). After the simulation of the vaccine-TLR-4 complex, the 200 ns simulation was further conducted (using the same principle) on the vaccine protein alone to compare the stability between the single vaccine structure and the docked complex. Thereafter, the tools from 2.10 subsection were again used to validate the simulated vaccine protein to find out any noticeable improvement in the protein quality.

In silico immune simulation of the designed vaccine
Profiling the immune response of the multi-epitope vaccine designed, in silico tool, C-ImmSim (http://150.146.2.1/C-IMMSIM/index.php?page=1) was employed (Rapin et al., 2011). This immune simulator uses a machine-learning technique and position specific scoring matrix (PSSM) for the prediction of epitope prediction and immune interactions, respectively (Rapin et al., 2010). The default constraints for simulation were employed during conducting the prediction except for one parameter, the simulation step which was set to 1050. One step of the simulation is equivalent to eight hours (8 h) of realtime, allowing immune response modeling for about 350 days (i.e. (1050 Â 8 h)/(24 h)). The minimum suggested interval between dose 1 and dose 2 is 4 weeks, according to most vaccines in current use (Castiglione et al., 2012). Our prediction is based on three injections which might be given four weeks apart to generate efficient and long-lasting immune response.

Codon adaptation and in silico cloning
Codon optimization which is essential as un-adapted codon may lead to the minor expression rate in the host. Codon optimization was conducted for the maximum expression of the vaccine in the host using the Java Codon Adaptation Tool (JCat), available at http://www.jcat.de/, with the aim of boosting the vaccine translational rate in E. coli K12 strain and the whole operation is carried out by avoiding the following three criteria: (1) rho-independent termination of transcription, (2) binding sites of the prokaryotic ribosome and (3) restriction enzymes cleavage sites (Grote et al., 2005). Codon adaptation index (CAI) value and Guanine-Cytosine (GC) content of the adapted sequence was obtained and compared with the ideal range (Sharp & Li, 1987). Finally, the obtained refined nucleotide sequence was used for in silico cloning into the pET28a (þ) expression vector. The whole in silico cloning operation was executed in SnapGene v5.2 software, available at https:// www.snapgene.com/try-snapgene/ (Goldberg et al., 2018).

Analysis of the vaccine mRNA
Prediction of the secondary structure of mRNA was made using RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebS uite/RNAfold.cgi) webserver. This tool predicts thermodynamically generated minimal free energies of the query mRNA structures (Lorenz et al., 2011). To analyze the mRNA folding and the secondary structure of the vaccine, the improved DNA sequence from the JCat server was first converted into a potential DNA sequence by DNA<->RNA->Protein at http://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm. The RNA sequence was then pasted in the RNAfold servers for a lower minimum free energy calculation using the default parameters.

Protein retrieval and antigenic prediction
The retrieved sequences of GP and NP are 490 and 569 amino acids in length, respectively. Also, Vaxijen v2.0 server revealed that the structural proteins are highly antigenic with values of 0.4933 and 0.4610, respectively which are above the threshold set at 0.4 (Table 1).

Potential CTL epitopes
A total of 20 unique CTL epitopes, each with a length of 9 amino acids, were predicted using the proteins, exploiting the NetCTL v1.2 server. Assessment revealed that only one CTL epitope each from both structural proteins followed the selection criteria of being antigenic, non-allergenic, nontoxic, non-human homolog (e-value ! 0.05) and 100% conserved (Supplementary Table S1) and therefore, these two epitopes were selected for the final vaccine construction based on the antigenicity score (Table 2).

Potential HTL epitopes
We selected a total of 20 HTL epitopes from both proteins, with a length of 15 amino acids (Supplementary Table S2). Among them, only 4 HTL epitopes (glycoprotein) and 4 HTL (nucleoprotein) were able to induce the evaluated three types of cytokines, such as IFNc, IL4 and IL10 and also met the selection criteria of antigenicity, non-allergenicity, nontoxicity, non-human homology and 100% conservancy which are incorporated into the final vaccine construction (Table 3).

Linear B-lymphocyte epitopes prediction
Preliminary analysis revealed a total of 19 LBL epitopes and 7 of those epitopes (Supplementary Table S3) met the criteria such as antigenic, non-allergenic, nontoxic and nonhuman homology. Therefore, the final vaccine was constructed using only these seven (7) B-cell epitopes (Table 4). Supplementary Figure S1 shows the visual representation of the 100% conservancy of the finally selected epitopes (most promising epitopes) among the protein sequences of the selected isolates.

Prediction of transmembrane topology and human homology
The topology prediction of the selected epitopes showed that more of the epitopes might reside in the interior of the cell membrane and the human homology shows that the selected epitopes are non-homolog (Supplementary Table S4).

Population coverage analysis
The population coverage analysis has revealed that a very good portion of the population of the LASMV prevalent African regions possesses the HLA alleles, so the epitopes should be effective in controlling the infection of the virus. Nigeria had the highest MHC-I coverage (88.73%), whereas Senegal had the highest MHC-II coverage of 85.23%. Nigeria was also found to have the highest coverage of MHC-I and MHC-II combined coverage of 91.23% (Supplementary Figure S2).

Vaccine construction
The vaccine construct was formulated using the previously selected 17 epitopes belonging to three different classes (2CTL, 8HTL and 7LBL). The epitopes were merged by AAY, GPGPG and KK linkers, respectively (Figure 1). The human beta defensin-3 adjuvant was added ahead of the construct to improve the immunogenicity, linked by EAAAK linker. The final vaccine constructed was found to be 403 amino acids long.

Physicochemical properties and antigenicity, allergenicity, solubility assessment
The analysis showed that the molecular weight of the vaccine protein was 44,155.09 Da (44 KDa), with a predicted isoelectric point (pI) of 10.48. The total negative amino acid residues were of 29 (Asp þ Glu) and the total positive amino acid residues were 80 (Arg þ Lys). At the same time, other properties such as aliphatic index was 77.47 and grand average of hydropathicity (GRAVY) was À0.671. In addition, physicochemical features and the immunological potency of the construct were evaluated. Also, the antigenicity of the construct was found to be 0.6423 (antigenic) with no allergenicity and a solubility score of 0.984499 (Table 5).

Secondary structure prediction
The secondary structural features i.e. a-helix, b-strand and random coil of the final vaccine construct were analyzed by PRISPRED server. According to the server, the vaccine had 28.28% of the amino acids in the a-helix conformation and 17.86% and 53.84% of the amino acids in b-strand and coil structure conformations, respectively. The percentages of amino acids in a-helix, b-strand and random coils are listed in Supplementary Table S5 and illustrated in Supplementary Figure S3.

Tertiary (3D) structure assessment
From RaptorX server, 100% amino acid residues were modeled as five domains. The protein structure was used as the best template for modeling, as model 3 with a high estimated RMSD of 16.385 Å was selected. In refinement, there was an increase in the number of residues in the favored region, generating five refined models. The best refined model (1) showed the GDT-HA score of 0.8722, RMSD value of 0.610, MolProbity score of 3.049, and rotamers score 2.1, which indicated that the quality of the refined model was high compared to the raw structure ( Figure 2). In validation, the Ramachandran plot analysis by PROCHECK server revealed that 89.7%, 9.7%, 0.0% and 0.6% of the structure was located in the most favored region, additional allowed regions, generously allowed regions and disallowed regions, respectively (Supplementary Figure S4), confirming that the predicted model is of good quality. The protein generated an overall quality factor of 82.4607 by the ERRAT tool and according to VERIFY3D, 64.36% of the amino acids were found to have the averaged 3D-1D score > ¼ 0.2. Finally, QMEAN4 score, generated by QMEAN server was predicted to be À1.38 of the refined vaccine. The QMEAN tool also showed that the quality of the refined vaccine might be within the range of non-redundant set of PDB structures which was desired for a protein with acceptable quality.

Multi-epitope vaccine conformational B-cell epitope prediction
The conformational B-cell epitopes as predicted with ElliPro on the refined tertiary structure shows four potential regions of the vaccine with a total of 210 residues with scores varying from 0.612 to 0.808 were predicted (Supplementary Table S6 and Supplementary Figure S5).

Protein disulphide bridging for vaccine stability
The DbD2 server identifies the pairs that have the capability to form disulfide bonds based on the given selection criteria. However, in this experiment, total of 41 pairs of residues were generated and we selected only those pairs whose bond energy value were less than 2.2 kcal/mol. Therefore, only one pair of amino acids were found to generate a disulfide bond within the mentioned criteria: 191 ILE-206 SER (1.86 Kcal/mol) (Supplementary Figure S6).

Molecular docking between immune receptor and vaccine
Docking is the in-silico process of simulating the binding of the vaccine with TLRs to predict how the vaccine would bind to TLRs via in vivo process. The designed vaccine generated À1035.6 Kcal/mol with TLR-4 when docked using the ClusPro v2.0 server which indicates a very good binding affinity. On the other hand, according to the PatchDock and FireDock servers, the model having the global energy score of À40.57 Kcal/mol. And the MMGBSA study by the HawkDock server generated a good MM-GBSA score of À88.37 Kcal/mol. The 3D and 2D of the docked protein-protein interaction is illustrated in Figure 3.

Molecular dynamics simulation for TLRsvaccine complex
The MD simulation was conducted on the designed vaccine and TLR-4 docked complex to analyze its stability within a biological environment. In the simulation, the trajectory simulation was performed for 200 ns. Since the periodic boundary conditions were met during the simulation, the recentering of protein molecules in the complex and their return to the simulation cell was carried out. After that, an initial analysis of the trajectories was performed, including the calculation of RMSD, radius of gyration and RMSF. The corresponding graphs are shown in Figure 4. The RMSD changes are depicted in Figure 4(a), where it can be found that the RMSD stabilizes around 0.8 to 0.9 nm. Considering the size of the simulated system, this indicates the very high stability of the docked complex. Figure 4(b) depicts the radius of gyration tends of the docked complex which is found to be stable between 4.5 and 4.6 nm. This phenomenon indicates that the compactness of the system does not change significantly during simulation. The RMS fluctuations for the C-alpha atoms of the complexes are shown in Figure  4(c). The graph is divided into two parts. The blue line shows the RMSF for the receptor atoms, the red line for the vaccine atoms. The graphs show that the mobility of the receptor atoms is lower than the mobility of the vaccine atoms. Although the vaccine molecules showed less stability than the receptor atoms, however, both of these structures might be sufficiently stable to generate substantially feasible response within the biological environment. Thereafter, when the simulation was carried out only on the vaccine protein for 200 ns, it was predicted that the vaccine might be less stable than the docked complex. The RMSD of the single vaccine protein stabilized around 1.4 to 1.6 nm and the radius of gyration was predicted to be stable around 2.50 nm. And the vaccine protein showed greater fluctuation in the RMSF analysis than the docked protein complex which was also reported in the graph of RMSF simulation of the docked vaccine-TLR-4 complex (Figure 4(c)). So, from the simulation study, it can be concluded that the vaccine-TLR-4 docked complex structure might be more stable within the biological environment than the original vaccine structure itself. After the simulation, the quality of the simulated vaccine protein was again determined and compared with the refined vaccine protein. From Supplementary Table S7, it is clear that the structure had improved significantly as predicted by all the tools. According to the Ramachandran plot analysis, the percentage of amino acids in the most favored region was found to increase from 89.7% to 93.3%. Again, the overall quality factor of the protein was also predicted to reach 83.5777 from 82.4607, as determined by the ERRAT tool. Furthermore, the percentage of the amino acids having averaged 3D-1D score > ¼ 0.2 also increased up to 85.11%. Finally, the QMEAN server also pointed to the fact that the simulated protein had better protein quality than the refined vaccine protein, with relatively lower negative QMEAN4 score of À1.46. So, it can be concluded that according to all the quality checking tools, the simulated vaccine protein had better quality than the refined vaccine protein.

In silico immune response simulation
The simulated immune response was compatible with actual immune responses ( Figure 5). According to Figure 5(a), high immunoglobulins titer was generated after the first and second vaccine injections. The rise in the concentration of IgM was characterized at the primary response. The titer of various immunoglobulin activities (i.e. IgG1 þ IgG2, IgM and IgG þ IgM antibodies) was kept within a significant level during simulation period even after when the vaccine level declined. Following vaccine administration, there was a sharp increase in B-cells population including memory B-cells, suggesting the potential for isotype switching and memory formation ( Figure 5(b)). Based on Figure 5(c), the B-cells were active during simulation period with an increase in cells duplication and antigen presentation following vaccine injections. The level of TH (helper) and TC (cytotoxic) cell populations with the respective memory development, was also found to raise significantly as presented in Figure 5(d-f). As seen in Figure 5(g), there was an increase in macrophages activity and antigen presentation at vaccine injection timesteps. Lastly, there was a remarkable rise in the level of interferon gamma (INF-c), and moderate increase in interleukin-2 (IL-2) as observed in Figure 5(h). The moderate increase in IL-2 level was observed after the third vaccine injection. Besides, a lower Simpson index (D) was also found which indicates the greater diversity of the designed vaccine. Figure 3. The molecular docking of the designed vaccine with TLR-4 with their interacting amino acids. Here, the golden colored ribbon structure represents the designed vaccine and the variable colored ribbon structure represents the TLR-4 receptor structure.

Vaccine codon optimization and in silico cloning
In this step, the adapted nucleotide sequence of the vaccine construct was found to have the codon adaptation index (CAI) value of 0.966, falling within the range of 0.8-1.0, indicating a high level of expression in the K12 strain of E. coli. Meanwhile the Guanine-Cytosine (GC) content was recorded as 51.48%, which lies within the acceptable range of 30-70%, for effective translational efficiency. The predicted DNA sequences of the vaccine were cloned into pET28a (þ) cloning vector plasmids using the SnapGene software, using the EcoRI and BamHI restriction sites, as the start and end cut points, respectively. The newly constructed cloned plasmid was found to be 6585 base pairs long, including the constructed DNA sequence of the vaccines. The multi-epitope vaccine sequence is represented in red with the restriction sites. The plasmid contains 6x-histidine residue and thrombin sequences, which will aid in the final purification and downstream processing of the designed vaccine ( Figure 6).

MRNA prediction of the designed vaccine
Finally, after transformation of the optimized DNA sequence from the JCat server to the mRNA sequence, the secondary structure of the mRNA sequence of the vaccine was predicted by the RNAfold server, which generated a minimum free energy score of À393.20 kcal/mol. The interactive drawing of the minimum free energy structure is presented in Supplementary Figure S7.

Discussion
Countries mostly in West Africa like Nigeria, have always recorded increased cases of LASMV, as observed during a season known as 'Harmattan' in Nigeria. The development of a potential vaccine is needed to curb the spread of this virus. Vaccines are one of the most important pharmaceutical products, but their development and production processes are costly and sometimes it takes many years to achieve a proper vaccine candidate against a particular pathogen (Sarkar et al., 2020b). Again, sometimes it takes a lot of time to develop an efficient attenuated vaccine (Li et al., 2014). For this reason, scientists are now working on subunit vaccines with the help of in silico approaches. Immunoinformatics has been shown to be useful in predicting antigenic peptide B-cell and T-cell epitopes for peptide vaccine development (Jabbar et al., 2018). There are numerous limitations in the context of appropriate candidate antigens, their immunodominant epitopes and experimental methods, which include the development of an effective delivery system (Channappanavar & Perlman, 2020;Yin et al., 2016). Investigation of the whole spectrum of probable antigens is achievable through immunoinformatics and with the aid of molecular modeling to analyze the potential binding with host proteins (de Oliveira et al., 2020;Jabbar et al., 2018;Mirza et al., 2016;Shahid et al., 2020).
The structural proteins help viruses to invade and assemble viral particles in the host (Vogel et al., 2019). It has been shown that the combination of a long list of antigens can increase the effectiveness of vaccines (Rossi et al., 2004). Malfertheiner et al. (2008) showed that if a vaccine is highly antigenic, then it can induce long-lasting humoral and cellular responses to the target antigens. Our selected structural proteins i.e. GP and NP were found to be antigenic, since their antigenicity values were above the threshold of 0.4, making them capable to induce strong immune response within our body. Thereafter, the potential T-cells (CTL/HTL) and linear B-cell epitopes were identified, as they are required in the construction of multi-epitope vaccine, which can induce both humoral and cellular immune responses, simultaneously (Chauhan et al., 2019). The HTL (also known as CD4 þ , since it found on its surface) activates the B-cells, causing the production of memory B-cell and antibody producing plasma B-cell. The HTL epitopes activate macrophage and CD8 þ CTL which destroy the target antigen or the target infectious pathogen (Goerdt & Orfanos, 1999;Pavli et al., 1993;Tanchot & Rocha, 2003). CTLs represent one of several types of cells of the immune system that have the capacity to kill other infectious cells directly (Al-Shura, 2020). They go right away inside the virus and play an important role in the host defense mechanism. Several authors reported that in order to produce a desired immune response, the epitopes must be accessible to both MHC I and MHC II molecules along with the B-cell (Patra et al., 2019). B-cell epitopes are essential to induce humoral or antibody mediated immunity. B-cells consist of groups of amino acids that interact with the secreted antibodies and activate the immune system to destroy the pathogens (Sarkar et al., 2020a). With the help of immunoinformatics approach, we were able to identify total 2 CTL, 8 HTL and 7 LBL epitopes from both structural proteins. These epitopes were the most promising epitopes and they were selected for the final vaccine construction. Also, all of the HTL epitopes were found to be strong cytokine inducers. The non-human homology of the epitopes was determined to find out the non-homolog epitopes, as the homolog epitopes might not be able to induce potent immune response. The homolog epitopes might also induce autoimmune diseases (Sarkar et al., 2020b). The population coverage analysis showed that a good portion of the African population possessed the selected MHC alleles and therefore, the population coverage analysis of the MHC alleles was also quite satisfactory among the target population.
Our multi-epitope vaccine was designed by connecting all the selected CTL, HTL and LBL epitopes in a successive manner with the help of suitable linkers (AAY, GPGPG and KK linkers). To enhance the expression, folding and stabilization, linkers were implemented as an indispensable element in the development of vaccine protein (Shamriz et al., 2016). The AAY linker was used in conjugating the CTL epitopes to construct the vaccine because this linker influences protein stability, reduce immunogenicity and enhance epitope presentation (Abdellrazeq et al., 2020;Borthwick et al., 2020). On the other hand, the GPGPG linker was used to conjugate HTL epitopes since it prevents the formation of 'junctional epitopes' and facilitates the immune processing, where the bilysine KK linker helps to preserve their independent immunogenic activities of the vaccine construct (Khan et al., 2019). When epitopes of T-and B-cells are used alone, multiepitope-based vaccines are poorly immunogenic and require coupling to adjuvants (Meza et al., 2017). Adjuvants are ingredients added to vaccine formulations that enhance immune response to antigens and affect their development, stability, and longevity (Lee & Nguyen, 2015). Therefore, our chosen adjuvant, human b-defensins-3, was added at N-terminus of the vaccine using EAAAK linker, as this particular linker has a rigid nature and can connect the adjuvant to the peptide vaccine with adequate separation and minimal intervention (Chen et al., 2013). It has been established that human b-defensins have an important role in presenting the microbial peptides to antigen presenting cells and the inflammatory response, thus enhancing the immunogenicity of the bound antigen; therefore, b-defensins can be used as adjuvants (Kohlgraf et al., 2010;Park et al., 2018;Weinberg et al., 2012). Finally, the designed vaccine sequence was found to be 403 amino acids long, which is appropriate for such a vaccine. Although, several studies had been done where the vaccine length is even longer but this size of our designed vaccine can be easily synthesized and purified during mass production (Chatterjee et al., 2018;Kalita et al., 2019;Rahmani et al., 2019).
The evaluation of various physicochemical properties is essential for determining the safety and efficacy of the candidate vaccine (Dey et al., 2014). Our designed vaccine has high molecular weight, as some substances like viruses and certain synthetic polymers should have very high molecular weights. In fact, the molecular weight of protein plays a vital role in the prediction of protein structure (Reji, 2013). The theoretical pI exhibits its alkaline nature (pI > 7) due to the presence of basic amino acids i.e. lysine and arginine. These basic amino acids are involved in membrane protein activity and recognition of membrane voltages (Li et al., 2013). An isoelectric point value greater than 7 indicates that the overall charge of the peptide is positive. The isoelectric point is defined as the pH value of a solution at which the protein possesses zero net charge (Bunkute et al., 2015). The aliphatic index which is the relative volume occupied by aliphatic side chains such as valine, isoleucine, alanine and leucine (Chukwudozie et al., 2020) is high (>70), suggesting that the vaccine is thermostable in nature, as higher the aliphatic index of a protein, greater is its thermostability (Gasteiger et al., 2005). The Grand Average hydropathy (GRAVY) of the structural proteins are calculated as the sum of hydropathy values of all amino acids, divided by the number of residues in the sequence. The GRAVY value of the designed vaccine protein was predicted to be negative, which pointed to the fact that the vaccine is hydrophilic in nature (Gasteiger et al., 2005). Solubility is counted as a vital characteristic of any recombinant vaccine (Khatoon et al., 2017). Hence, the solubility of our vaccine construct as predicted was 0.984499 (<1.0) showing a very good solvability upon over-expression in E. coli by the SolPro server. In the antigenicity and allergenicity analyses, the construct vaccine was found to be antigenic as well as non-allergenic. Hence, the vaccine might generate robust immune response as well as cause no allergic reaction within the body.
In the 3D structure, for prediction, the model with higher RMSD was selected indicating that the vaccine has better quality 3D structure from RaptorX server. Structural prediction expresses the structure of vaccine candidate, but the system requires refinement and validation to predict a structure which closely resembles the native structure of the vaccine protein (Pandey et al., 2018). The refinement of 3D models of proteins has emerged as the last milestone of the structure prediction journey to reach parity with experimental accuracy (Heo et al., 2013;Khoury et al., 2017). For validation, about 90% of the amino acid residues were in the favorable region of the Ramachandran plot analysis by the PROCHECK server. The ERRAT tool predicted the structure to be of good quality with the overall quality score of more than 80 as the generally accepted range of the overall quality score generated by ERRAT is >50 for a high quality model (Messaoudi et al., 2013). With the negative value, QMEAN4 also generated fairly acceptable result. However, according to VERIFY3D, the quality of the protein structure was not so good. After the simulation, the quality of vaccine protein improved quite significantly as determined by all the quality checking tools.
After validating the 3D structure of the vaccine, the conformational B-cell epitopes of the vaccine were determined. From our prediction, we discovered four protruding regions in antigen surfaces of the designed vaccine, with good scores ranging from 0.612 to 0.808 (Ponomarenko et al., 2008). Thereafter, the probable disulfide bonds of the designed vaccine were searched using the refined 3D vaccine structure. The vaccine gave a prediction of disulfide bond with only one possible mutant. Disulfide bonds are covalent interactions that stabilizes molecular interactions and provide considerable stability by confirming precise geometric conformations (Craig & Dombkowski, 2013).
Molecular docking is a computational method which involves the interaction between a ligand molecule and the receptor molecule to provide a stable adduct (Lengauer & Rarey, 1996). Toll-Like Receptors (TLRs) aids in antiviral responses by triggering the production of antiviral cytokines such as type-I interferons (Boehme & Compton, 2004). Our multi-epitope vaccine was docked with TLR-4 which is one of the commonly found TLRs on the surface of the cells. Several online servers, ClusPro 2.0, PatchDock & FireDock and HawkDock servers, were used for the docking analysis to boost the precision of our forecast. All the servers pointed toward a strong interaction between TLR-4 and the designed vaccine. The MD simulation was conducted for vaccine-TLR-4 complex to simulate a biological environment for the complex and analyze the physical orientations of the atom for a fixed length of time to analyze the view of dynamic evolution of the system. The simulation was run for 200 ns and it generated satisfactory results. In the RMSF study, the complex structure was found to be stable around 0.8 to 0.9 nm which confirms fair stability of the complex. This result is in agreement with the radius of gyration study, where the structure was predicted to remain stable between 4.5 and 4.6 nm. And in the RMSF study, although the vaccine structure showed more fluctuations than the receptor, these fluctuations don't significantly contribute to the instability of the docked complex within the biological environment. The locations of most of the hinges in the structure were not so significant and therefore, it might be stable with lower degree of deformation at each individual amino acid. And the 200 ns simulation study of the vaccine protein alone showed that it might be less stable with greater fluctuation within the natural, biological environment than the vaccine-TLR-4 docked complex. So, it can be deducted that the designed vaccine might work effectively when interacting with the immune molecules.
An immune simulation was performed to observe the optimal behavior and cell density parameters for successful target clearance and find the best immunological response against the pathogen. The augmentation of memory B-cells and T-cells were visible. Also, after repeated antigen exposure, helper T-cell concentrations were found to elevate and therefore, this phenomenon causes effective Ig production that support a humoral response. The activity of macrophages was also satisfactory, as there was an increase in the macrophage concentration, indicating a very good antigen presentation (Sarkar et al., 2020b). Sustained generation of INF-c and IL-2 are seen after immunization because of expanded aide T-cell initiation. In this manner, the vaccine effectively simulated a humoral immunological response to increasing immunoglobulin creation (Almofti et al., 2018;Carvalho et al., 2002). Moreover, the augmentation in the cytokine profile was also reported which represents the possibility of the vaccine to produce the broad-spectrum immunity against the viral invasions (Hoque et al., 2019;Kambayashi & Laufer, 2014;Shey et al., 2019). The Simpson index represents the diversity of something, taking into account its total amount or number and also its abundance. A lower Simpson index describes greater diversity and thus higher amount and abundance of a query entity or material. In our experiment, a negligible Simpson index (D) was achieved which pointed to the fact that the vaccine would be able to activate and produce a wide number of immune molecules such as IL-6, IL-10, TGF-alpha, IL-23, etc. with high abundance (Figure 5(h)) (Rapin et al., 2010). In summary, the best booster doses effect was seen after the third dose.
The immunoreactivity testing through serological assessment is one of the first steps in validating a candidate vaccine (Gori et al., 2013). The expression of the recombinant protein in a suitable host is required, E. coli expression systems are determined for recombinant protein manufacturing (Chen, 2012;Rosano & Ceccarelli, 2014). In the codon adaptation, the obtained results were found to be satisfactory with the CAI value (0.966) and GC content (51.48%) meeting the criteria such as determining CAI over 0.80 and GC content to be within 30-70%, therefore; these scores were considered to be good scores in this study (Khatoon et al., 2017;Morla et al., 2016;Sarkar et al., 2020a;Shey et al., 2019). Furthermore, the optimized vaccine DNA sequence from JCAT was cloned into E. coli pET28a (þ) vector EcoRI and BamHI restriction sites with a total length of the clone was 6.5Kbp. The newly constructed recombinant plasmid contained the sequences for 6x His tag, so the vaccine protein is expected to be expressed in fusion with this tag, which might facilitate its purification purpose in the later stages. In addition, during the stability prediction of the mRNA secondary structure of the vaccine, the RNAfold server produced a negative and lower minimum free energy, so it can be claimed that the anticipated vaccine could be very stable after transcription in vivo (Hamasaki-Katagiri et al., 2017).
Since the immunoinformatics approach was utilized in this study, therefore, the basic steps of this novel method were exploited to determine the vaccine construct. There are also other studies, where these steps have been used. Tosta et al. designed an epitope-based vaccine against the Yellow fever virus where they have used this in silico method to design their vaccine construct. However, in our study, we have applied some more parameters to generate more robust predictions. For example, Tosta et al. only predicted the IFN-c producing ability of their HTL epitopes, whereas we have predicted IL-4 and IL-10 inducing capability along with the IFN-c producing ability. Again, our vaccine construct was simulated for 200 ns, whereas Tosta et al. used a 10 ns simulation to determine the stability of their designed vaccine. In this case, our stability validation could be declared as more reliable than the previously mentioned study (Tosta et al., 2021). Although there are some differences, however, the fundamentals of immunoinformatics have been maintained properly in both studies. Overall, all the experiments employed in this study pointed toward the fact that our designed vaccine should be an effective option to combat the LASMV infection.

Conclusion
Lassa virus, 'a silent but potent killer' has caused series of death in Africa and other parts of the world. The development of vaccine to tackle this has been of concern over the years. In this research, our potential vaccine was designed via in silico methods with bioinformatics and immunoinformatics approaches on two structural proteins coding for multiple T-cell (HTL and CTL) and B-cell epitopes and also mediate entering of the virus into the host cell. The designed vaccine seems to be efficient, stable and safe as indicated by docking analysis, simulation analyses, immune response, codon optimization and in silico cloning. With the promising computational results obtained, notwithstanding, experimental validation of the designed multi-epitope vaccine is required to further verify the effectiveness of this potential vaccine candidate which can be achieved via wet lab analysis (in vitro and in vitro design). The application of computational tools in vaccine designing can significantly enhance the process of vaccine discovery and accomplish the goal of research and development in less time and with much less costs.