Designing of cytotoxic T lymphocyte-based multi-epitope vaccine against SARS-CoV2: a reverse vaccinology approach

Abstract SARS-CoV2 is a single-stranded RNA virus, gaining much attention after it out broke in China in December 2019. The virus rapidly spread to several countries around the world and caused severe respiratory illness to humans. Since the outbreak, researchers around the world have devoted maximum resources and effort to develop a potent vaccine that would offer protection to uninfected individuals against SARS-CoV2. Reverse vaccinology is a relatively new approach that thrives faster in vaccine research. In this study, we constructed Cytotoxic T Lymphocytes (CTL)-based multi-epitope vaccine using hybrid epitope prediction methods. A total of 121 immunogenic CTL epitopes were screened by various sequence-based prediction methods and docked with their respective HLA alleles using the AutoDock Vina v1.1.2. In all, 17 epitopes were selected based on their binding affinity, followed by the construction of multi-epitope vaccine by placing the appropriate linkers between the epitopes and tuberculosis heparin-binding hemagglutinin (HBHA) adjuvant. The final vaccine construct was modeled by the I-TASSER server and the best model was further validated by ERRAT, ProSA, and PROCHECK servers. Furthermore, the molecular interaction of the constructed vaccine with TLR4 was assessed by ClusPro 2.0 and PROtein binDIng enerGY prediction (PRODIGY) server. The immune simulation analysis confirms that the constructed vaccine was capable of inducing long-lasting memory T helper (Th) and CTL responses. Finally, the nucleotide sequence was codon-optimized by the JCAT tool and cloned into the pET21a (+) vector. The current results reveal that the candidate vaccine is capable of provoking robust CTL response against the SARS-CoV2. Communicated by Ramaswamy H. Sarma


Introduction
SARS-CoV2 or 2019-nCoV, officially known as coronavirus disease of 2019  was first observed in animals and individuals from the Wuhan seafood market in China has spread globally affecting millions of people Wang et al., 2020). Coronaviruses (CoVs) belong to the Coronoviridae family and were initially reported in the early 1930s. SARS-CoV and SARS-CoV2 belong to the beta coronavirus class and primarily cause respiratory infections in humans and various animals (Cheng et al., 2007). Initially, there are four major SARS-CoV2 strains under circulation -HKU1, NL63, OC43, and E229 (Roussel et al., 2020) The origins of SARS-CoV2 is unclear at the moment. However, it shares a high sequence similarity with the Bat-CoV-RaTG13 strain that infects bats in Southeast provinces of China Zhou et al., 2020;Zhu et al., 2020). Moreover, rats, pangolins, and raccoons are suspected intermediate hosts for human transmission (Wong et al., 2020;Zhang et al., 2020). SARS-Cov2 infected individuals exhibit symptoms like common cold, flu, headache, respiratory ailments, while pneumonia and ultimately fatal infections have been reported in severe cases . SARS-Cov2 has been found to rapidly spread among infants, children, pregnant women, elderly, immunesuppressed, and immune-compromised patients with ease, implying high rate of infections (Shereen et al., 2020). Majority of infected individuals show little or no symptoms, with characteristic manifestations of the disease appearing after progression to a severe state. As a result, asymptomatic spreads play a major role in the COVID-19 pandemic (Dai et al., 2020).
SARS-CoV2 is a single-stranded, positive-sense RNA virus that encodes seven accessory proteins (ORF1ab, ORF3a, ORF6, ORF7a, ORF7b, ORF8 and ORF10) ( Islam et al., 2020;Yang et al., 2020 ) and four structural proteins, namely Envelope (E), Membrane (M), Spike (S), and Nucleocapsid protein (Nguyen et al., 2021). The spike protein plays a central role in the viral cell entry and replication. Both humoral and cellular immune responses towards the viral spike protein play a major role in controlling SARS-CoV2 infection. Specific antibodies (IgG and IgM) are evident approximately two weeks post-infection (Long et al., 2020;Wu et al., 2020). Meanwhile, the increased CTL responses are also monitored during the first week of infection (Thevarajan et al., 2020). A key focus for majority of vaccines lies in inducing the humoral immune responses towards the viral glycoproteins. On the other hand, CTL immunity towards the SARS-CoV2 also plays an essential role viral clearance (Peng et al., 2020). However, the CTL mediated immunity is activated only after binding of short (9mer to 11mer) peptides to class I MHC and its subsequent interaction with T cell receptor (TCR) (Trolle et al., 2016). Hence, it is critical to reliably identify class I MHC binders that form the basis for CTL epitope vaccine design.
In the recent times, owing to the progress in computational power and techniques, vaccine design concepts can rapidly be translated into practical examples (He & Zhu, 2015). In contrast, traditional vaccine development is a time consuming and labor-intensive process. Modern immunoinformatics methods can be used to design the vaccine effectively and in a cost-efficient manner (Patronov & Doytchinova, 2013). The past three decades have witnessed the development of a wide range of algorithms for T cell epitope prediction. Early algorithms were primarily sequence-based, while modern ones powered by matrix-driven methods, motifs binding, QSAR approach, molecular docking, homology modeling, and machine-learning algorithms have a muchimproved predictive power (Tomar & De, 2010). Modern immunoinformatics provides varied resources that aid in the design of potent peptide-based vaccine (Dhanda et al., 2017). In this present study, we utilize the sequence-and structure-based approach to predict potent CTL epitopes. Moreover, these epitopes were used to construct a multi-epitope vaccine targeting SARS-CoV2.  (HLA-B Ã 58:01), and B62 (HLA-B Ã 15:01) were selected to predict all possible 8-mer, 9-mer, 10-mer, and 11-mer length epitopes using Immune Epitope Database (IEDB), respectively (Kim et al., 2012). The class I MHC binders were selected based on the percentile rank 1 with the default prediction methods.

Class I MHC peptide processing prediction
The CTL peptides undergo multiple processing steps prior to TCR presentation. The peptides cleaved by the proteasomes need to bypass multiple cytosolic proteases prior to their transport into the endoplasmic reticulum through TAP (Peters et al., 2003). Moreover, identification and mapping of a CTL epitope is critical step prior to its incorporation in a vaccine (Mohan et al., 2018). In this study, all class I MHC binders were first identified and further processed using the MHC-I processing prediction tool from IEDB to predict proteasomal processing peptides (http://tools.iedb.org/processing/) (Peters et al., 2003;Tenzer et al., 2005). The default prediction method was used for the processing, with positively predicted peptides being considered as plausible potent epitopes.

Class I MHC peptide immunogenicity prediction
The prediction of immunogenic epitopes is an essential step for CTL based epitope vaccine development. Hence, the class I Immunogenicity prediction tool provided by IEDB (http:// tools.iedb.org/immunogenicity/) was used to differentiate in between immunogenic and non-immunogenic peptides that were assigned positive values by the MHC-I processing prediction tool (Calis et al., 2013). Likewise, the prior step, default settings were utilized, while the peptides with positive predictions being assumed as potent epitopes.
2.1.5. Antigenicity, toxicity, non-allergic, non-homologous and conservancy analysis It is critical to assess the antigenicity of a predicted immunogenic epitope prior to its in vitro/in vivo assessment. Here, the plausible antigenic nature of the positive peptides were assessed using the VaxiJen v2.0 tool (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) (Doytchinova & Flower, 2007). This step was followed by a toxicity analysis using the ToxinPred tool (http://crdd.osdd.net/raghava/toxinpred/). The epitopes deemed non-toxic by ToxinPred were further assessed for their allergenicity by the AllerTop v.2.0 tool (Dimitrov et al., 2014) (https://www.ddg-pharmfac.net/ AllerTOP/). The default prediction value was set by all three prediction tools to characterize the immunogenic epitopes. In order to avoid any autoimmune responses, the epitopes were subjected to a protein BLAST analysis with the human proteomes that yielded non-homologous peptides (e value < 0.05) (Mehla & Ramana, 2016). Finally, the epitope conservancy among other SARS-CoV2 variants was studied using the IEDB conservancy analysis tool (Bui et al., 2007) (http:// tools.iedb.org/conservancy/). A sequence identity cutoff of 100% was used in the identification of highly conserved epitopes.  (Laskowski et al., 1993) were used to validate the quality of the modeled HLA-A Ã 26:01 structure.

Molecular docking of MHC peptide
The Chimera tool (Pettersen et al., 2004) was utilized to build the peptide 3 D structure and used as a ligand; whereas, the AutoDock Vina tool was used to carry out a peptide-MHC docking study (Trott & Olson, 2010). Prior to docking, the native peptide, b 2 microglobulin, co-crystallized solvents and heteroatoms were removed from all of the class HLA crystal structures using the Discovery studio 4.1 (Jain & Baranwal, 2019). The HLA allele was further processed by adding hydrogen atoms and Kollman partial charges using AutoDock tool Meanwhile, the all-rotatable bonds except the backbone amide bonds were set to be mobile for the peptides. Finally, the HLA alleles and peptides ligands were converted into PDBQT format prior to docking. The docking grid was set to an appropriate size that accommodated the entire receptor, while default settings were used for the docking (Mohammed et al., 2018). The default output (containing 10 models) depicting various interactions between the HLA allele and their respective peptides, were sorted based on their binding affinity. The output models were retrieved in the PDBQT file format and visualization in Discovery Studio viewer (https://discover.3ds.com/discovery-studio-visualizer-download).

Theoretical validation of T cell epitopes
The IEDB database provides the option to analyze experimentally validated epitopes from various organisms (Mohan et al., 2020). In this instance, putative MHC binders were searched in the IEDB database to identify the experimentally proven T cell epitopes. The query epitopes were subjected to a BLAST analysis with the exact match option to predict the experimentally validated SARS specific epitopes.

Construction of multi-epitope CTL vaccine
The CTL epitopes restricted by various HLA supertypes were selected for multi-epitope CTL vaccine construct. The potent immunogenic epitope was selected on the basis of higher binding affinity and higher number of hydrogen bonds involved in the interactions between the peptide and HLA allele which describes more stable epitopes (Chen et al., 2016). The N terminal region of the multi-epitope CTL vaccine construct was embedded by Heparin Binding Hemagglutinin Adjuvant (HBHA) and linked with CTL epitopes using EAAAK linker. Furthermore, AAY and GPGPG linkers were placed between the CTL epitopes and PADRE peptide (AKFVAAWTLKAAA) respectively, for better CTL activation.

Antigenicity, toxicity, allergenicity, and physiochemical properties assessment
The antigenicity, toxicity, and allergenicity of the vaccine construct were evaluated by VaxiJen v2.0, ToxinPred, and AllerTop v.2.0 tools, respectively. The ExPASy ProtParam tool (https://web.expasy.org/protparam/) was used to determine the physicochemical property of vaccine construct such as molecular weight, half-life, instability index, theoretical isoelectric point, charge and hydropathicity (Gasteiger et al., 2005). Finally, the SOLpro server (Magnan et al., 2009) was used to study the soluble nature of the final vaccine construct over the expression in E.coli (http://scratch.proteomics. ics.uci.edu/).

Secondary and tertiary structure prediction
SOPMA server was used to investigate the secondary structure of the vaccine construct (https://npsa-prabi.ibcp.fr/cgibin/npsa_automat.pl?page=/NPSA/npsa_sopma.html). Based on the amino acid sequence, the improved SOPMA server jointly predicts the secondary structure with neural network method and provides 82.2% accuracy (Geourjon & Deleage, 1995). The I-TASSER tool (Roy et al., 2010) was used to build the tertiary structure of vaccine construct. The final structure was selected based on the C score (range between À5 to 2), with a higher positive score implying a higher reliability of the vaccine construct's structure (https://zhanglab.ccmb.med. umich.edu/I-TASSER/).

Tertiary structure refinement and validation
The tertiary structure refinement and validation of the vaccine construct was performed in an identical fashion to HLA-A Ã 26:01 model and using the same tools (GalaxyRefine, ProSA, ERRAT and PROCHECK).

Molecular docking and simulation of vaccine construct with TLR4
It is crucial to assess the molecular interactions between vaccine construct and toll-like receptor (TLR) for determining the potent immune response. Briefly, TLR4 (PDB: 3FXI) crystallographic structure was retrieved from the PDB database and used as a receptor for docking studies. The final receptor was prepared by removing the other ligand and chemical molecules from the TLR4 crystal structure, while 3 D vaccine model used as a ligand. The complete vaccine construct and the TLR immune receptor interactions were investigated with the help of ClusPro 2.0 server (https://cluspro.bu.edu/queue. php) (Desta et al., 2020). The best model was selected based on the cluster size, the PIPER energy of the cluster center and the lowest energy structure in the cluster. Moreover, the binding affinity (in kcal/mol) of the vaccine to the TLR4 receptor was identified using the PRODIGY server (https:// wenmr.science.uu.nl/prodigy/) (Vangone et al., 2019). The interactions between the TLR and vaccine construct were analyzed using PDBsum server (Laskowski et al., 2018) (http://www. ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html). As a final step, the dynamic interactions in between TLR4 and the vaccine construct were studied with the aid of the iMODS (http://imods.chaconlab.org/) server (L opez- Blanco et al., 2014).
The dynamic parameters such as deformability values, Eigen values, B-factor, variance, and co-variance map were investigated for the candidate vaccine.

Codon optimization and in silico cloning of vaccine construct
The amino acid sequence of the vaccine construct was backtranslated using EMBOSS Backtranseq (https://www.ebi.ac.uk/ Tools/st/emboss_backtranseq/) (Rice et al., 2000). To improve the translation efficiency, the Java Codon Adaptation Tool (JCAT) (Grote et al., 2005) was utilized to optimize the codon usage to the K12 Escherichia coli strain. Additionally, the NdeI and NheI restriction sites were added to the 5 0 and 3 0 end of the optimized sequence respectively. The SnapGene tool was utilized to clone the optimized sequence into the pET28a (þ) 6x-His-TEV vector.

Antigenicity, toxicity, non-allergic, non-homologous and conservancy analysis
The immunogenic epitopes are further analyzed in VaxiJen v2.0, ToxinPred, and AllerTOP tools to predict the antigenicity, toxicity, and allergic nature of the epitopes, respectively. In total, 222 epitopes were predicted as antigenic from amongst 463 immunogenic epitopes. Any epitope with a threshold value of ! 0.4 is considered to be antigenic. Interestingly, 20/222 epitopes show > 1 as the probability of being antigenic. The ORF 8 epitope IQYIDIGNY restricted by B62 supertype exhibits the highest score (Score: 2.10). The antigenic epitopes were subjected to a toxicity analysis in ToxinPred. Three toxic epitopes (A24: TYKPNTWCI, B58: NTWCIRCLW, and B58: KPNTWCIRCLW) from the ORF1ab protein were excluded from the dataset, with the rest 219 being subjected to allergenicity analysis in AllerTOP server. In all, 121/219 epitopes were deemed non-allergenic, while simultaneously being predicted as potent immunogens and lacking any sequence homology to human genome (Supplementary Table 4 The immunogenic epitopes predicted in sequence-based methods were further used to identify the interactions with their respective HLA alleles. The HLA-A Ã 26:01 allele structure was not found to be available in the PDB that necessitated its sequence retrieval from the NCBI database and subsequent homology modeling using the SWISS-MODEL server The modeled structures were then refined giving a final output of 5 models. The top-ranking model from amongst the 5 GDT-HA: 0.99; RMSD: 0.33; MolProbity: 1.58; Class score: 11.70; Poor rotamers: 0.4; Rama favored: 98.50) was selected for further validations. The overall quality of the model 1 was found to be 96.17 while the Z-score according to ProSA was À8.92. Finally, the PROCHECK analyses incorporating the Ramachandran plot revealed that 93% residues were situated in the most favored region; whereas, 6.60% and 0.40% residues were found in additionally allowed and generously allowed regions (Figure 3).

Theoretical validation of T cell epitopes
A total of 28 from 81 epitopes have been proven to be immunogenic from experiments have shown in Supplementary Table 7. From theses, 18 epitopes are potent MHC binders as confirmed from MHC ligand elution assays. Interestingly, the HLA-A Ã 02:01 restricted epitope VMVELVAEL

Antigenicity, toxicity, allergenicity, and physiochemical properties assessment
The vaccine construct was found to be immunogenic when subjected to immunogenicity analysis in the VaxiJen v2.0 server (Probability score: 0.86). Moreover, toxicity and allergenicity analyses using ToxinPred (Score: 0.50) and AllerTOP v.2.0 server deemed the construct non-toxic and non-allergic. The physicochemical properties of the vaccine construct like molecular weight, theoretical pI, Grand average of hydropathicity (GRAVY), instability index, and aliphatic index were found to be 47920.

Secondary and tertiary structure prediction
Secondary structure analyses of the vaccine construct ( Figure  5) using the SOPMA server predicted 72.44% for alpha helix, Overall quality factor was determined by ERRAT server. X-axis and Y-axis denotes the residues and error value, respectively. Two lines in the Y-axis represent the confidence and region above that error value considered to be lower quality.
15.20% for random coil, 8.20% for extended strand, and 4.10% for beta-turn. The tertiary structure validation using the I-TASSER server, revealed the model 3 (C score À1.48) to be the best and was used for further validation.

Tertiary structure refinement and validation
The selected 3 D structure of the vaccine construct was further refined using Galaxy Refine tools as mentioned earlier. The mild perturbation model 1 was selected based on the scores generated by I-TASSER before and after refinement. Finally, the refined vaccine construct was analyzed by the ERRAT server and the overall quality factor score is observed as 95.94, while the ProSA Z-score was À4.89. Apart from this, the Ramachandran plot is generated by PROCHECK server and 87.40% residues are in the most favored region; whereas, 9%, 1.30%, and 2.30% residues are found in additionally allowed, generously allowed, and disallowed regions, respectively ( Figure 6).

Molecular docking and simulation of vaccine construct with TLR4
The interaction between refined vaccine construct with TLR4 was carried out with the aid of ClusPro 2.0 server. Based on the largest cluster size and the PIPER energy, cluster 0 (44 members) was selected out of 30 clusters ranked by the server. The centroid and lowest energy representative structures were found to have weighted scores of À1181 and À1289.1, respectively. Additionally, PRODIGY analysis revealed the binding affinity (DG) of the docked complex to be À14.10 kcal/mol. The intramolecular contacts were analyzed with the aid of PDBSum server. The vaccine construct formed 7 and 5 hydrogen bonds with B and D subunits of TLR4, Figure 4. Schematic diagram of multi-epitope vaccine designed together with HBHA adjuvant (Red). The N terminal HBHA adjutant linked with EAAK linker followed by structural proteins CTL epitopes (yellow), PADRE epitope (Sky blue) and non-structural proteins CTL epitopes (Green). All CTL epitopes linked with AAY linker and PADRE epitope linked with GPGPG linker. respectively ( Figure 7). Finally, interactions of the vaccine construct and TLRs are analyzed in PDBsum server. The molecular simulation study revealed that the Eigen values of the vaccine construct-TLRV4 complex were measured to be 1.82 Â 10 À6 (Figure 8(d)). However, the Eigen value of the protein is directly proportional to the deformability viz. The lower Eigen values are a representation of easy deformability or distortion of the candidate vaccine which ultimately depends on the distortion of each residue in the protein.
The peaks corresponding to the regions of the proteins with increased deformability have been illustrated in Figure 8(d).
The temperature factor or B-factor of the complex was displayed by their respective PDB field (Figure 8(c)). The normal mode analysis (NMA) and B-factors suggested that the vaccine construct is thermostable in nature (Figure 8(a, c)). The deformability value of approximately 1 was seen around the 1600 th residue for the vaccine TLRV4 protein complex which is computed based on the data shown in 8 b. Individual variance is indicated by red bars, while cumulative variance is indicated by green bars in the variance graph as shown in 8e. The coupling of docked complex was analyzed by covariance matrix analysis represents uncorrelated, correlated and anti-correlated motions have presented in order of white, red and blue colors respectively have shown in Figure 8(f). The stiffness and cumulative and individual variances of the complex has determined that have shown in Figure 8(g).

Codon optimization and in silico cloning of vaccine construct
The primary protein sequence was back-translated using the EMBOSS Backtranseq server and optimized by the JCAT server. Initially, the codon usage of the vaccine construct was optimized based on the E. coli strain K12. The CAI value subsequently improved from 0.55 to 0.98; whereas, the GC content of the improved sequence was found to be 53.61. After optimization, the NdeI and NheI restriction sites were added to the 5 0 and 3 0 end of the optimized sequence, respectively and cloned into the pET28a (þ) 6x-His-TEV vector using the SnapGene tool (Figure 9).

Immune simulation efficiency of vaccine constructs
Activation of viral-specific CTL response is an important event that would aids in clearing intracellular infections (Woolard & Kumaraguru, 2010). Hence, the CTL effect of the multi-epitope vaccine was assessed by C-ImmSim 10.1 server. Different combinations of HLA alleles (Test set 1-Test set 4) were entered in the C-ImmSim server and details of MHC Figure 6. The tertiary structure of multi-epitope vaccine designed by I-TASSAR server and validated in different bioinformatics tools. A) The 3 D structure of final multi-epitope vaccine. B) Best refined model analyzed by ProSA server. The plot represents the Z-score of all experimentally validated similar size proteins structure from PDB database (dark blue dots: NMR spectroscopy and light blue dots: X-ray crystallography). The black spot in the plot indicates the validated model with Zscore of -4.89. C) Ramachandran plot generated by PROCHECK server and results showing the amino acid distribution in different regions. D) Overall quality factor was determined by ERRAT server. X-axis and Y-axis denotes the residues and error value, respectively. Two lines in the Y-axis represent the confidence and region above that error value considered to be lower quality.
test sets have been mentioned in the methodology section. The immune simulation revealed long-lasting period of active CTL and TH populations for all four test sets. Additionally, increased level of Dendritic cell (DC), Natural killer cells (NK), and Macrophage (MA) presentation was observed after exposure of vaccine Supplementary File 8. Importantly, the robust level of IFN-g and IL-2 elevation appears during secondary and tertiary vaccination periods (Figure 10), which confirms that the constructed multi-epitope vaccine is capable of eliciting a robust CTL response.

Discussion
Humoral and cell-mediated immune responses are considered as two arms of the immune system. Obviously, antibody response is mainly involved in neutralizing the viral infections. Several SARS-Cov2 vaccines are available in the market under emergency use authorizations and other vaccines are under the various stage of clinical trials. Most of the current vaccines were solely focused on the pre-fusion form of the SARS-CoV-2 Spike protein and their neutralizing antibody response (Kyriakidis et al., 2021). However, long-lived antibody response was observed in patients who have recovered from SARS-CoV2 infection. But, the emerging data suggest that the antibody response may be compromised due to severe viral infection. On the other hand, the Covid patients are displaying the robust CTL and Th response to both structural and non-structural viral proteins even after several weeks of disease onset (Jarjour et al., 2021). Hence, T cell response plays a vital role in viral clearances during the infection period (Karlsson et al., 2020). The Cytotoxic T Lymphocytes (CTL) recognizes the Major Histocompatibility Complex (MHC) bounded peptides otherwise called epitopes and provokes the effector functions against the viral infected cells. Identification of such potent T cell epitopes through the traditional methods is time consuming and expensive process (Parvizpour et al., 2020). To overcome this problem, in silico prediction methods are widely employed to analyze the potent T cell epitopes. In this study, we first carried out different immunoinformatics analyses to predict the immunogenic CTL epitopes using a sequence-based method. Besides, class I MHC binders were predicted using the MHC-I binding predictions tool from Immune Epitope Database (IEDB). Based on the previous literature, 12 high prevalent HLA super-type alleles were selected for analyzing class I MHC binders. The selected HLA super-type almost covers 99% world population. A total of 2240 class I MHC binders were predicted that are restricted by high-frequency HLA supertypes. From amongst these, 80.31% (Total: 1799) of epitopes were predicted in non-structural proteins of SARS-CoV2 virus; whereas, 19.69% (Total: 441) of epitopes were found in structural proteins. Interestingly, 245 epitopes (10.98%) are located in the SGP protein. Hence, the SGP-based epitopes predicted in this study could be the potent immunogenic target for vaccine development. Intracellular proteasomal cleavage is an important event for CTL activation (D€ onnes & Kohlbacher, 2005). However, the predicted MHC binders should undergo processing by the proteasomes and transported through TAP for the formation of a peptide-MHC complex. Identification of proteasomal processing is thus a critical step prior to epitope incorporation in vaccine. The class I MHC binders using the MHC-I processing prediction tool from IEDB 876/2240 (39.11%) epitopes received positive processing scores. From these 876 epitopes, 697 and 179 positive processing peptides are found in the non-structural and structural proteins of the SARS-CoV2 virus, respectively. However, the predicted MHC binders should be processed Moreover, association of the peptide-MHC complex with the T Cell Receptor (TCR) is a another key step in activation of effector CTL functions (Calis et al., 2013). Predicting the association capabilities of immunogenic epitopes would be very useful for multi-epitope vaccine development. A total of 462/ 876 epitopes with positive processing were found to have favorable association propensities. From a structural perspective, 22.51% (104 out of 462) epitopes were predicted as immunogenic in structural proteins; whereas, 77.49% (358 out of 462) epitopes were predicted in non-structural proteins. A potent CTL epitope must exhibit antigenicity and the ability to trigger the cell-mediated immunity. The antigenicity prediction using the VaxiJen server revealed 222/462 positive immunogenic epitopes exhibiting optimal antigenicity, with the degree of antigenicity ranging from 0.40 to 2.10. Moreover, the desired vaccine candidate must be non-toxic and non-allergenic (Mohan et al., 2020). The toxicity and allergenicity analyses using ToxinPred followed by AllerTop predicted 219/222 epitopes to be nontoxic, while 121 epitopes were predicted to be non-allergic. A BLASTp query against human proteomes revealed all epitopes to have e values below 0.05, implying their inability to provoke an   autoimmune response. Moreover, a conservancy analyses showed that 67 CTL cell epitopes with 100% conservancy with other SARS-CoV2 variants. However, the conserved epitopes predicted in this present study could provoke the CTL immune response against multiple SARS-CoV2 variants. Interestingly, 8 epitopes (YIWLGFIAGL, KLNDLCFTNV, PYRVVVLSF, RFPNITNLCPF, FPNITNLCPF, YQPYRVVVLSF, YQPYRVVVL, AEIRASANL) predicted in the SGP protein exhibited 100% conservancy with other SARS-CoV2 variants. Among this, KLNDLCFTNV epitopes was experimentally confirmed to bind with HLA-A Ã 02:01 (Poran et al., 2020). whereas, HLA-A Ã 24:02 allele restricted epitopes RFPNITNLCPF provoke specific CD8 T cell activity (Tarke et al., 2021). Moreover, experimental evidence showed that PYRVVVLSF epitope binds with HLA-A Ã 01:01, HLA-A Ã 23:01, HLA-A Ã 24:02 and HLA-A Ã 26:01 alleles and FPNITNLCPF epitope binds with HLA-B Ã 07:02, HLA-B Ã 35:01, HLA-B Ã 51:01, HLA-B Ã 53:01 and HLA-B Ã 54:01alleles (IEDB submission). Generally, four methods -sequence, structural, hybrid, and consensus approaches have been extensively used to predict the immunogenic epitopes. Comparatively, the combination of sequence-and structure-based (hybrid) methods were found to exhibit higher predictive accuracy than other methods (X. Yang & Yu, 2009). Hence, the epitopes were predicted by sequencebased methods and their interaction with either experimentally validated MHC or high-quality MHC homology models were extensively analyzed using molecular docking. From amongst 81 epitopes binding to their respective alleles, 13 epitopes (ORF1ab: ILHCANFNV (A02); ORF8: YIDIGNYTV (A02); MGP: SYFIASFRL (A24); EP: QWNLVIGFLF (A24); MGP: LLWPVTLACF (B62); ORF1ab: FMGRIRSVY (B62); ORF1ab: KQLIKVTLVF (B62); ORF1ab: LAYILFTRF (B62); ORF1ab: GQSTQLGIEF (B62); ORF1ab: LQAVGACVL (B62); ORF8: FLGIITTVAAF (B62); SGP: YQPYRVVVLSF (B62); and NP: MEVTPSGTWL (B40) exhibiting the highest binding affinity (!-11 kal/mol). A majority of these were restricted with the B62 supertype. In this study, we considered the binding affinity and hydrogen bonding while constructing a multi-CTL epitope vaccine. A high number of hydrogen bond interaction between the peptide and MHC were indicators of stable and high affinity binding (Hossain et al., 2021). Thus, these metrics were used to evaluate the docked peptides. In comparison to previous studies using limited HLA alleles for epitope prediction, we have for the first time utilized all 12 high prevalent HLA supertype to not only predict potent CTL epitopes but also their interaction with all high prevalence alleles. To the best of our knowledge, this is the first study to utilize the structure-based epitopes prediction using all 12 high prevalent HLA supertypes.
A comparison of the predicted epitopes against the experimentally proven ones in the IEDB database revealed a 34.57% (Total: 28/81) match. Moreover, the HLA-A Ã 02:01 restricted ORF1ab epitope VMVELVAEL has been shown to elicit potent T cell response to HLA-A Ã 0201 Tg mice (Takagi & Matsui, 2020). Intriguingly, 4/28 epitopes (MGP Protein-FVLAAVYRI, ORF1ab Protein-KLNVGDYFV, ORF1ab Protein-ILHCANFNV, and EP Protein-LLFLAFVVFL) have multi-allele binding potential (Ishizuka et al., 2009). A total of five epitopes (ORF1ab Protein-HLYLQYIRK, EP Protein-LWPVTLACF, EP Protein-QWNLVIGFLF, SGP Protein-FPNITNLCPF, and MGP Protein-SELVIGAVI) exhibited high binding affinity to various class I MHC molecules (IEDB submission). Moreover, 18/81 epitopes matched those peptides which have been experimentally proven to bind MHC1 molecules as deduced from ligand elution assays (Supplementary  Table 7). These results strongly suggest that hybrid-based epitope prediction methods provide higher prediction efficiency over sequence-based prediction methods. The aforementioned top-ranking epitopes (N ¼ 17) were selected for the vaccine construct restricted by 11 HLA supertypes. The IEDB conservancy analyses revealed that 11 epitopes of vaccine construct have 100% conservancy, 3 epitopes have 90% and 3 epitopes have 80% conservancy with other SARS-CoV2 variants. Importantly, in the SGP protein, epitopes RFPNITNLCPF and YQPYRVVVLSF were identified with 100% conservancy. Similar values were observed for FLYIIKLIFL and LWPVTLACF (MGP), NSVLLFLAF (EP), MEVTPSGTWL (NP), WLLLTILTSL, AIVVTCLAY, TRFFYVLGL, DVSFLAHIQW (ORF 1AB), and APFLYLYAL (ORF 3 A) epitopes. Lower conservancy values of 90% and 80% were observed in the case of VYSVIYLYLTF, DVVAIDYKHY, SEVGPEHSL (ORF 3 A) and WTAGAAAYY (SGP), and HLYLQYIRK, LQAVGACVL (ORF 1AB) peptides. In addition, incorporation of the universal PADRE epitope in the vaccine ensured activation of the CD4 T helper cells during the immune response, while ensuring robust binding to 16 high-frequency HLA-DR alleles (Alexander et al., 2000). The addition of PADRE epitope and the HBHA adjuvant at the N terminal of the vaccine construct might lead to a robust immunogenic response. The antigenicity, toxicity and allergenicity analyses of the vaccine construct showed potent antigenic properties whilst exhibiting low toxicity and hypo-allergenicity. It is known that the molecular weight of the vaccine is inversely related to the lymph node exposure. Moreover, proteins with molecular weight > 16 kDa play a major role in lymph node absorption (Majee et al., 2020;Wu et al., 2020). Thus, the vaccine construct with a weight of 47.92 kDa is expected to be absorbed in the lymph nodes and exhibit good antigenicity. Based on the instability index score of 39.04 and an aliphatic index of 98.79, the vaccine construct was found to be thermostable (Instability index < 40 ¼ stable protein) (Jaydari et al., 2020). The vaccine was found to linger around for extended periods in mammalian cells (30 h), yeast (> 20 h), and E. coli cells (> 20 h). Moreover, the predicted half-life of our vaccine construct was also found to be higher (Akhand et al., 2020;Dong et al., 2020;Rahman et al., 2020) and equal in comparison to the previously reported in silico designed vaccines candidate (Abraham Peele et al., 2021;Ahmad et al., 2020;Ismail et al., 2020;Kar et al., 2020;Ojha et al., 2020;Saha & Prasad, 2020;Tahir Ul Qamar et al., 2020). The secondary and tertiary structure analyses of the vaccine construct revealed its high quality. The modeling of the interactions between TLR4 and the vaccine construct using the ClusPro server and the iMODS server, revealed that the construct bound with a high affinity (-14.1 kcal/mol) and established stable hydrogen bonds with TLR4. Moreover, the MD simulation of the TLR4-vaccine complex was found to have low deformity ( Figure  8(a)) and Eigen values (Figure 8(a)). The TLR and vaccine complex was also stable as evident from the B-factor analysis (Figure 8(b)). All these evidences suggest that the designed vaccine is very stables in vivo condition. Our vaccine construction procedure also considered the CAI value (codon optimization) in order to improve its expression in numerous expression systems. A higher CAI value is associated with a corresponding shift in the gene expression in the host system (Tosta et al., 2021). Hence, the vaccine construct exhibited the ability to be expressed in E. coli cells (CAI: 0.98). Moreover, cloning of the construct was performed in pET28a (þ) 6x-His-TEV vector (Figure 9), that contains 6X His tag sequence and a TEV protease cleavage site conferring ease of recombinant vaccine product purification in the downstream process.
In contrast to other studies focusing on the spike glycoprotein, we utilized the entire SARS-CoV2 proteome to designed the CTL based vaccine using robust epitope prediction methods. The immune system simulation profile from C-ImmSim online server reveals that the constructed vaccine is capable of stimulating many immune cells especially CD8þ and CD4þ T cells population after vaccination. Also, increased DC and macrophage levels are observed at primary and booster dose of vaccine administration. Increased IFN-g and IL-2 elevation is observed after immunization. To the best of our knowledge, this is the first study to determine the immune simulation profile using all 12 MHC supertypes. The present in silico results suggested that the vaccine construct is capable of provoking a potent CTL response that would help to combat SARS-CoV2 infection.

Conclusion
SARS-CoV2 is highly contagious pathogen that has resulted in a global pandemic unseen since the influenza pandemic of 1918. In this present study, potent immunogenic epitopes are identified in entire proteome of SARS-CoV2 using combination of sequence-and structure-based methods, wherein many peptides have been already experimentally validated as potent CTL epitopes. A broad gamut of prediction methods has revealed the predicted epitopes to be safe, effective and hypoallergic. With several potent CTL epitopes being used to construct the vaccine, a broad immune response is expected. This was seen in an in silico immune simulation study that revealed that our designed vaccine could provoke adequate level of CTL immunity to different sets of allele population covering wide swathes of global population. Further, in vitro and in vivo studies are needed to determine whether the designed vaccine could offer the protective T cell immunity to SARS-CoV2.

Acknowledgements
Authors thanks the researchers worldwide for sharing the sequential information of SARS-CoV2. This helps us for effective construction of potent vaccine candidate against the virus using various bioinformatics tools.

Disclosure statement
Authors declare no conflict of interest.

Funding
The author(s) reported there is no funding associated with the work featured in this article.