Deciphering the immunogenic T-cell epitopes from spike protein of SARS-CoV-2 concerning the diverse population of India

Abstract Scientists are rigorously searching for an efficient vaccine against the current pandemic caused by the SARS-CoV-2 virus. The reverse vaccinology approach may provide significant therapeutic leads in this direction and support further determination of the T-cell/B-cell response to an antigen. In the present study, we conducted a population coverage analysis for the diverse Indian population. Using the Immune Epitope Database (IEDB), HLA distribution analysis was performed to find the most promiscuous T-cell epitopes among the in silico determined epitopes of the spike protein of SARS-CoV-2. Epitopes were selected based on their binding affinity with the maximum number of HLA alleles having the highest population coverage values for the chosen geographical area, India. NetChop proteasomal cleavage prediction identified 404 cleavage sites within the 1288-amino-acid sequence of the spike glycoprotein, suggesting that the protein sequence contains adequate sites for cleavage into appropriate epitopes. In the population coverage analysis, the 179 selected epitopes gave a projected population coverage of up to 97.45%, with an average hit of 56.16 and a pc90 of 15.07. Fifty-four epitopes were found to have the highest coverage among the Indian population and to be highly conserved within the given spike RBD sequence. Among all the predicted epitopes, the 9-mers TRFASVYAW and RFDNPVLPF and the 12-mers LLAGTITSGWTF and VSQPFLMDLEGK were observed to be the best, owing to their favourable docking scores and their binding affinity to the corresponding HLA alleles during MD simulations. Outcomes from this study could be critical for designing a vaccine against SARS-CoV-2 for different sets of populations within the country. Communicated by Ramaswamy H. Sarma


Introduction
The 2019 novel coronavirus (2019-nCoV) infection emerged in Wuhan, China, in December 2019 and spread rapidly throughout China and several other countries (Zhao et al., 2020). The virus has a positive-sense single-stranded RNA genome. It belongs to the family Coronaviridae of the order Nidovirales and is a β-coronavirus (Mousavizadeh & Ghasemi, 2021). After entry into a suitable host, the first stage in the viral replication cycle is the binding of the virion to receptors on target cells in the human respiratory system (Millet & Whittaker, 2015). SARS-CoV-2 binds to the ACE-2 receptor, which is found in the lungs, heart, kidney, small intestine, and other tissues (McMillan & Uhal, 2020). After entry, the viral genomic RNA is released into the cytoplasm and translated into the polyproteins pp1a and pp1ab (Janeway et al., 2005). Non-structural proteins are synthesised during translation and form the replication-transcription complex (RTC) in double-membrane vesicles (Brant et al., 2021). The RTC replicates continuously and produces a nested series of sub-genomic RNAs that encode accessory and structural proteins (V'kovski et al., 2021). Viral particle buds are generated when newly formed genomic RNA, nucleocapsid proteins, and envelope glycoproteins combine (Luteijn et al., 2020). Finally, the virus is released when the virion-containing vesicles merge with the plasma membrane (Caobi et al., 2020). The epidemiology of coronavirus demonstrates that human-to-human transmission of the virus happens through various routes such as sneezes, coughs, and respiratory droplets (Li et al., 2020). SARS-CoV spread widely only after several days of illness, which is associated with modest viral loads in the respiratory tract during the initial stage of the illness; in later stages, viral loads increase approximately 10 days after the onset of symptoms (Karimzadeh et al., 2021).
As the outbreak progressed, a large number of confirmed cases were reported all over the world. As a result, the disease was declared a pandemic and given the name COVID-19 (Khan et al., 2020). The pathogenicity of this novel virus is very high, and Toll-like receptors might have major implications in it (Choudhury & Mukherjee, 2020; Choudhury et al., 2021).
Vaccination is an extremely successful approach to disease control in human and veterinary health care (Vrba et al., 2020). The major advantage of immunoinformatics is that it can reduce the time and cost required for laboratory analysis of pathogenic gene products (Oli et al., 2020). This information enables an immunologist to explore potential binding sites, which, in turn, guides the development of novel vaccines. This methodology is called reverse vaccinology (Madlala et al., 2021). Previously, immunoinformatic studies have been applied to predict a novel multi-epitope vaccine construct against typhoidal Salmonella serovars that can stimulate both T-cell and B-cell immune responses and could be proposed for therapeutic applications. Moreover, in another study, various robust in silico approaches were employed to find an efficient vaccine candidate and confirm its successful expression as a highly antigenic molecule. Additionally, these immunoinformatic approaches have been employed to design a multi-peptide vaccine against K. pneumoniae (Dey et al., 2022). The immune system can be regarded as a system of thousands of molecules that drives several intertwined responses. It is structurally and functionally diverse, and this diversity differs both between individuals and over time within individuals; as a result, it can generate huge amounts of data (Rudolph et al., 2006). T-cell epitopes are short linear peptides that are either cleaved from antigenic proteins or generated by protein splicing (Ma et al., 2020). T-cell epitopes are presented on the surface of Major Histocompatibility Complex (MHC) proteins, which in humans are the class I or class II Human Leukocyte Antigen (HLA) molecules (Hammer et al., 2020). Epitope presentation depends on both MHC-peptide binding and T-cell receptor (TCR) interactions (Springer et al., 2021). MHC proteins are highly polymorphic, and each binds to a restricted set of peptides (Dendrou et al., 2018). Consequently, the specific combination of MHC alleles present in a host restricts the choice of potential epitopes identified throughout an infection (Juanes-Velasco et al., 2021; Hwang et al., 2021).
COVID-19 in India began in Kerala (a state of India) with three students who had returned from Wuhan, China. As per the data of 25th October 2021, the total number of coronavirus cases reported in India is 34,200,957, with 455,093 total deaths (https://www.worldometers.info/coronavirus/country/india/) (Figure 1). As the disease was spreading uncontrollably all over India, the need to develop an effective peptide vaccine component against SARS-CoV-2 was rising (Kaur & Gupta, 2020).
Using immunoinformatic techniques, we could identify and characterize potential T-cell epitopes for the development of an epitope vaccine against SARS-CoV-2 (Sohail et al., 2021). The spike glycoprotein of SARS-CoV-2 is chosen as the target because it forms the distinctive crown that projects from the viral envelope and helps in viral attachment to the host receptor (Santopolo et al., 2021). The IEDB (Immune Epitope Database) server was the main resource used to explore the spike glycoprotein and recognize epitopes for an effective vaccine (Ghorbani et al., 2020). To identify the T-cell epitopes that are effective in producing an immune response among the Indian population, we carried out a population coverage analysis within the Indian population for the given MHC alleles and obtained a set of epitopes that are estimated to provide broad coverage within the population. Immunoinformatic methods are used to identify immunodominant T-cell epitopes that can elicit specific immune responses. The goal of this study is to use computational and immunoinformatic tools to find probable T-cell epitopes with the maximum coverage in the Indian population that might be utilised to build an in silico peptide-based vaccine from the SARS-CoV-2 virus's genomic data, as India is the third most COVID-affected country owing to its large population (Lim et al., 2021; Crooke et al., 2020). The flowchart of methods applied in this study is illustrated in Figure 2.

Sequence retrieval
The protein sequence of the prefusion SARS-CoV-2 spike glycoprotein with an RBD (NCBI ID: 6VSB_A) is retrieved in FASTA format from the NCBI database (https://www.ncbi.nlm.nih.gov/protein/6vsb_A) (National Center for Biotechnology Information (NCBI), Bethesda (MD): National Library of Medicine (US), 1988). NCBI's Protein resources comprise protein sequences and structures, as well as databases and tools to predict and analyse functional domains. Among the different spike glycoprotein sequences available in the database, chain A of the spike glycoprotein with the RBD is chosen because it plays a major role in interacting with the human ACE2 receptor (Zhang et al., 2021). The FASTA format of the spike glycoprotein sequence is employed for the analysis.

Antigenicity prediction
Vaxijen v2.0 server (Flower et al., 2010) (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) is used to predict antigenicity, as it is the first server for alignment-independent prediction of protective antigens. It is an alignment-independent predictor based on auto-cross covariance (ACC) transformation of the submitted sequences into uniform vectors of principal amino acid properties. The accuracy of this server varies between 70 and 89% depending on the targeted organism (Dorosti et al., 2019). The server takes amino acid sequences submitted in FASTA format and was established to classify antigens solely on the basis of the physicochemical properties of proteins, without recourse to sequence alignment. The default parameters (threshold = 0.4, ACC output) were used, with virus selected as the target organism, to predict the antigenicity of the full-length protein sequence (Flower et al., 2017).
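The ACC transformation mentioned above can be sketched in a few lines. This is a minimal illustration and not the VaxiJen implementation: the function name `acc_transform`, the three-letter toy alphabet and its two-component descriptor values are hypothetical placeholders (a real predictor would use published amino acid property scales), and the normalisation by (n - lag) is the commonly cited form of the formula, assumed here. The sketch only shows why sequences of different lengths end up as vectors of the same length.

```python
# Hypothetical two-component descriptors for a toy alphabet (NOT real property scales).
TOY_DESCRIPTORS = {
    "A": (1.0, 0.0),
    "K": (-1.0, 1.0),
    "F": (0.5, -1.0),
}


def acc_transform(sequence, descriptors, max_lag):
    """Return the auto-cross covariance vector of a sequence.

    For every lag 1..max_lag and every ordered descriptor pair (j, k), the term is
    sum_i z[i][j] * z[i + lag][k] / (n - lag), so the output length is
    max_lag * d * d regardless of the sequence length n.
    """
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    z = [descriptors[residue] for residue in sequence]
    d = len(z[0])
    vector = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                vector.append(total / (n - lag))
    return vector


short_vec = acc_transform("AKFAKF", TOY_DESCRIPTORS, max_lag=2)
long_vec = acc_transform("AKFAKFKKA", TOY_DESCRIPTORS, max_lag=2)
print(len(short_vec), len(long_vec))  # both vectors have the same length
```

Because the vector length depends only on the number of descriptors and the maximum lag, proteins of any length can be fed to the same classifier, which is what makes the approach alignment-independent.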

Super secondary protein prediction
Super secondary protein prediction is used to identify the domains present in the protein sequence, as a larger number of domains is favourable for an antigen and for designing an epitope-based vaccine. The NCBI Conserved Domain Database (CDD) (Lu et al., 2020) (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) is used to predict the domains present in the protein sequence, as it is a database of well-annotated multiple sequence alignment models and derived database search models. Functionally important sites are defined as those residues that interact with a ligand or a macromolecule. CDD alignments represent alignments of conserved core structures formed by presumably homologous sites, and positions outside the conserved cores are removed from the alignment. The protein sequence in FASTA format is given as input, and the search is run against CDD v3.18, a superset that includes NCBI-curated domains and data imported from Pfam (El-Gebali et al., 2019), SMART (Letunic & Bork, 2018), COG (Tatusov et al., 2001), PRK, and TIGRFAM (Haft et al., 2013). All other parameters are left at their defaults.

Proteasomal cleavage prediction
The generation of cytotoxic T lymphocyte (CTL) epitopes from an antigenic sequence comprises several intracellular processes, including the production of peptide fragments by the proteasome (proteasomal cleavage) and the transport of peptides to the endoplasmic reticulum through the transporter associated with antigen processing (TAP). Efficient prediction of proteasome cleavage site specificity is one of the major components in the design of therapeutics based on CTL responses. The NetChop 3.1 server (http://www.cbs.dtu.dk/services/NetChop/) (Nielsen et al., 2005), which produces neural network predictions for cleavage sites of the human proteasome, is used to predict the proteasomal cleavage of the peptide sequence in the host. The protein sequence in FASTA format is given as input, with the threshold set to 0.5 and the method set to C term 3.0.

Identification of HLA alleles
The Allele Frequency Net Database (AFND) (http://www.allelefrequencies.net/hla6006a.asp) (Gonzalez-Galarza et al., 2020) is used to identify the HLA allele frequencies of the population. This database gives the scientific community a freely accessible repository for the storage of frequency data (alleles, genes, haplotypes and genotypes) related to human leukocyte antigens (HLA), killer-cell immunoglobulin-like receptors (KIR), major histocompatibility complex class I/II chain-related genes (MIC) and numerous cytokine gene polymorphisms in worldwide populations. At present, AFND contains >1600 populations from >10 million healthy individuals, making it a valuable source for the analysis of some of the most polymorphic regions in the human genome. HLA classical allele frequencies for all loci were identified by selecting India as the country across all of its geographical regions, and the results were sorted by allele frequency from highest to lowest. Alleles were obtained from all sources, covering literature, Proceedings of IHWs and unpublished data.

MHC-T-cell binder prediction
The primary step in applying bioinformatics to epitope-based vaccine development comprises predicting potential CTL and HTL epitopes from the target protein. We used NetMHCpan 4.1 (Jurtz et al., 2017) to predict the 12-mer and 9-mer CTL epitopes within the surface glycoprotein. Using artificial neural networks (ANNs), the NetMHCpan 4.1 server predicts peptide binding to every known MHC molecule. Over 850,000 quantitative Binding Affinity (BA) and Mass-Spectrometry Eluted Ligand (EL) peptides were used to train the approach. We predicted class I 12-mer epitopes for the following MHC class I supertypes: A1, A2, A3, A11, A32, A68, B18, B35, B40, B44, B49, B50, B51, B52, B55, B57, B5, C1, C3, C4, C6 and C7, and 9-mer epitopes for the A1, A3, B7 and B58 supertypes. For class II T-cell epitope prediction, the IEDB Analysis Resource (Vita et al., 2019) is used. The prediction method chosen was NetMHCIIpan 4.0, for the prediction of epitopes for the MHC class II supertype DRB1. The sequence is supplied in FASTA format and all other parameters are kept at their defaults for the prediction of MHC epitopes.
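Before any binding predictor scores a protein, the sequence is broken into overlapping fixed-length peptides. The sketch below shows that enumeration step only; it does not call NetMHCpan or score anything. The function name `enumerate_peptides` is hypothetical, and the 15-residue fragment is an illustrative string built around RFDNPVLPF (a 9-mer named in this study) with placeholder flanking residues.

```python
def enumerate_peptides(sequence, length):
    """Return (start, end, peptide) for every overlapping window.

    Positions are 1-based and inclusive, the convention used when epitope
    start and end positions are reported.
    """
    sequence = sequence.strip().upper()
    return [
        (i + 1, i + length, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]


# Illustrative fragment; the residues flanking RFDNPVLPF are placeholders.
fragment = "GTKRFDNPVLPFNDG"

nine_mers = enumerate_peptides(fragment, 9)
twelve_mers = enumerate_peptides(fragment, 12)

for start, end, peptide in nine_mers:
    print(start, end, peptide)
```

A sequence of n residues yields n - k + 1 candidate k-mers, so the 9-mer and 12-mer candidate sets for a full-length protein are each on the order of the protein length.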

Epitope conservancy and immunogenicity prediction
In an epitope-based vaccine design, the use of conserved epitopes would likely provide wider protection across multiple strains, or even species, than epitopes obtained from extremely variable genome regions. The conservancy tool was designed to analyse the variability or conservation of epitopes. The Epitope Conservancy Analysis tool of the IEDB Analysis Resource (http://tools.iedb.org/conservancy/) (Bui et al., 2007) computes the degree of conservancy of an epitope within a given protein sequence set at a given identity level. The 179 selected 12-mer class I epitopes, 54 9-mer epitopes and 144 class II epitopes with good peptide scores are given as plain-text input. For immunogenicity prediction, the immunogenicity tool of the IEDB Analysis Resource (Calis et al., 2013) is employed. This tool uses amino acid properties as well as their position within the peptide to predict the immunogenicity of a peptide-MHC (pMHC) complex. The selected class I and class II epitopes are given as input and other parameters are left at their defaults.
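The idea of "conservancy within a sequence set at a given identity level" can be illustrated with a short sketch. This is not the IEDB tool: the function names are hypothetical, the comparison is ungapped and position-by-position (an assumption), and the four toy sequences are invented variants built around TRFASVYAW rather than real spike sequences.

```python
def best_identity(epitope, protein):
    """Highest fraction of identical residues over all ungapped windows."""
    n = len(epitope)
    best = 0.0
    for i in range(len(protein) - n + 1):
        window = protein[i:i + n]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / n)
    return best


def conservancy(epitope, proteins, identity_threshold=1.0):
    """Fraction of sequences containing the epitope at or above the threshold."""
    if not proteins:
        raise ValueError("at least one protein sequence is required")
    hits = sum(best_identity(epitope, p) >= identity_threshold for p in proteins)
    return hits / len(proteins)


# Invented toy sequences: two exact copies, one single-residue variant, one unrelated.
toy_set = [
    "AATRFASVYAWNR",
    "AATRFASVYAWNR",
    "AATRFASVFAWNR",
    "GGGGGGGGGGGGG",
]

print(conservancy("TRFASVYAW", toy_set, identity_threshold=1.0))
print(conservancy("TRFASVYAW", toy_set, identity_threshold=0.8))
```

Lowering the identity threshold admits near-matches, which is why an epitope can be reported as conserved at 90% or 80% identity even when it is not perfectly conserved.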

Allergy and antigenicity prediction
AlgPred 2.0 (https://webs.iiitd.edu.in/raghava/algpred2/), a bioinformatics tool for allergenicity prediction, is used to anticipate whether the determined epitopes from the viral protein sequence would produce an allergic response in the human body. AllerTOP is the first alignment-free server for in silico prediction of allergens built on the key physicochemical properties of proteins. Epitopes in plain text are given as input to compute the allergenicity of each sequence. AllerTOP is reported to outperform other allergen prediction servers, with 94% sensitivity. Vaxijen v2.0 server (Flower et al., 2017) (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) is used to predict the antigenicity of all the selected epitopes.

Toxicity extrapolation and analysis of IFN-γ inducing epitopes
The toxicity of the selected epitopes is predicted using the ToxinPred web server (https://webs.iiitd.edu.in/raghava/toxinpred/multi_submit.php) (Gupta et al., 2013), keeping all parameters at their defaults. This tool allows users to identify highly toxic or non-toxic peptides among a large number of submitted peptides. It predicts their toxicity along with important physicochemical properties of the peptides, such as hydrophobicity, charge and pI. The method was developed using machine learning and a quantitative matrix based on different properties of peptides. The ability of the selected epitopes to induce IFN-γ production was determined using the IFNepitope server (http://crdd.osdd.net/raghava/ifnepitope) (Dhanda et al., 2013). Several methods, including motif-based search, machine learning, and a hybrid approach, are used to classify MHC binder epitopes into IFN-γ inducing (positive scores) and non-inducing (negative scores). The best reported prediction accuracy of this server, obtained with the hybrid approach, is 82.10%.

Population coverage
The Population Coverage tool of IEDB (http://tools.iedb.org/population/) (Bui et al., 2006) is used, as it calculates the fraction of individuals predicted to respond to a given set of epitopes with known MHC restrictions. This calculation is based on HLA genotypic frequencies, assuming no linkage disequilibrium between HLA loci. The most prominent computationally validated epitopes predicted by the epitope prediction tools were taken into consideration to predict the response of individuals from the Indian population towards the predicted epitopes. The 12-mer and 9-mer class I epitopes and the class II T-cell epitopes identified by the epitope prediction tools, along with their corresponding class I and class II HLA alleles, are submitted to the tool to identify the coverage of the epitopes within the Indian population, keeping the other parameters at their defaults.
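The principle behind such a coverage estimate can be sketched as follows. This is a simplified illustration and not the IEDB implementation: it assumes Hardy-Weinberg proportions within each locus and independence between loci, it only estimates the fraction of individuals with at least one epitope-binding allele (it does not compute average hit or pc90), and the function name and allele frequencies are hypothetical.

```python
def population_coverage(covered_freqs_by_locus):
    """Estimate the fraction of individuals carrying >= 1 epitope-binding allele.

    covered_freqs_by_locus maps a locus name to the population frequencies of
    the alleles at that locus that bind at least one epitope in the set.
    A diploid individual lacks all of them at a locus with probability
    (1 - p) ** 2, where p is the summed frequency; loci are treated as independent.
    """
    uncovered = 1.0
    for locus, freqs in covered_freqs_by_locus.items():
        p = sum(freqs)
        if p < 0.0 or p > 1.0 + 1e-9:
            raise ValueError(f"summed allele frequency out of range for {locus}")
        uncovered *= (1.0 - min(p, 1.0)) ** 2
    return 1.0 - uncovered


# Hypothetical frequencies, for illustration only.
one_locus = population_coverage({"HLA-A": [0.3, 0.2]})
two_loci = population_coverage({"HLA-A": [0.3, 0.2], "HLA-B": [0.5]})
print(one_locus, two_loci)
```

Adding epitopes restricted by alleles at a further locus can only raise the estimate, which is one reason combined class I and class II sets reach higher coverage than either class alone.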

TAP binding, interleukin-6 inducing epitope predictions and cross-reactivity analysis
TAP, the transporter associated with antigen processing, carries peptides to the endoplasmic reticulum. TAP prediction is essential to identify whether the predicted epitopes will be successfully transported to the endoplasmic reticulum, as this transport is significant for the epitope to produce a response in the human body. Epitopes with suitably low IC50 values can be chosen for epitope-based vaccine design. An understanding of the selectivity and specificity of TAP may contribute significantly to the prediction of MHC class I and class II restricted T-cell epitopes. For predicting the binding affinities of peptides to TAP, the online tool TAPREG (http://imed.med.ucm.es/Tools/tapreg/) (Diez-Rivero et al., 2010) from the Immunomedicine Group is used. It predicts the binding affinity using a Support Vector Machine model. Selected epitopes in FASTA format are given as input, keeping all other parameters at their defaults. To analyse whether the predicted epitopes are interleukin-6 inducing peptides, IL-6Pred is used (https://webs.iiitd.edu.in/raghava/il6pred/disp3.php?ran=92007). Both class I and class II epitopes in FASTA format are given as input, keeping the default e-value of 10e-3 for prediction. To eliminate cross-reactivity, the predicted epitopes were searched against the UniProt database of human proteomes using the Protein Information Resource's Multiple Peptide Match tool (https://research.bioinformatics.udel.edu/peptidematch/batchpeptidematch.jsp).

Epitope 3-D structure prediction and structure refinement
For modelling the 3D structure of the predicted epitopes, the I-TASSER server (Yang et al., 2015) (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) was employed. This server provides protein structure and function predictions using state-of-the-art algorithms. The models with the highest confidence score (C-score) were chosen for refinement analysis. The server was used to predict the 3D structures of three 12-mer class I epitopes (LLAGTITSGWTF, VSQPFLMDLEGK, and SETKCTLKSFTV), three 9-mer class I epitopes (ASFSTFKCY, RFDNPVLPF and TRFASVYAW) and one class II epitope (SNVTWFHAIHVS). Predicted tertiary structures were refined using UCSF Chimera (Pettersen et al., 2004). UCSF Chimera is a tool that allows interaction with molecular structures and data, such as density maps, trajectories, and sequence alignments, to visualise and analyse them in real time. ProSA (Wiederstein & Sippl, 2007), together with ERRAT (Colovos & Yeates, 1993) and PROVE (Pontius et al., 1996) from the SAVES v6.0 server (https://servicesn.mbi.ucla.edu/SAVES/), were used to validate the tertiary structures of the final refined models; these tools have also been used in previous studies.

Molecular dynamic simulation studies on the interaction between MHC allele and corresponding predicted epitopes
Molecular dynamics modelling is one of the most straightforward approaches for theoretically analysing the behaviour of molecular complex systems using empirical force fields (Raut et al., 2005). Molecular dynamics simulations were set up using the Amber 20 package, and runs were conducted for more than 100 ns. This analysis may reveal the stability and dynamics of the interactions over time, which are crucial for drug discovery and vaccine development.
System preparation and setup. Complexes 1 and 2 were composed of 283 amino acids (residues 275-283 formed the peptide chain). Similarly, complexes 3 and 4 were composed of 284 amino acids (peptide chain: residues 276-284), whereas complex 5 consisted of a total of 379 residues (peptide chain: residues 368-379), complex 6 consisted of 391 residues (peptide chain: residues 380-391) and complex 7 contained 287 residues (peptide chain: residues 276-287). Missing hydrogen atoms were added using the leap module of the Amber 20 package (Wang et al., 2019). A few Na+/Cl- ions were added to the protein surface to neutralize the total charge of the system. The resulting system was then solvated in a truncated octahedron box using the TIP3P water model, extending up to a minimum cutoff of 10 Å from the protein boundary (Jorgensen et al., 1983). The Amber FF19SB force field was employed for all complexes containing standard protein molecules. The peptide chains were docked inside the active site region of the protein system using AutoDock Vina. The docked complexes were then subjected to MD simulation studies followed by MM-GBSA calculations.
MD Simulation. After proper parameterization and setup, the geometries of the resulting systems were minimized (5000 steps of steepest descent and 10000 steps of conjugate gradient) to remove poor contacts and relax the system. The systems were then gently annealed from 10 to 300 K under the NVT ensemble for 50 ps with a weak restraint of 5 kcal/mol/Å². Subsequently, the systems were maintained for 1 ns of density equilibration in the NPT ensemble at a temperature of 300 K and a target pressure of 1.0 atm, using the Langevin thermostat (Izaguirre et al., 2001) and the Berendsen barostat (Berendsen et al., 1984) with a collision frequency of 2 ps⁻¹ and a pressure relaxation time of 1 ps, with a weak restraint of 1 kcal/mol/Å². This 1 ns of density equilibration is not a conformational equilibration, but rather a weakly restrained MD run in which the system is slowly relaxed to achieve a uniform density after heating under periodic boundary conditions. Thereafter, we removed all restraints applied during the heating and density equilibration and further equilibrated the systems for ~3 ns to obtain well-settled pressure and temperature for conformational and chemical analyses. This was followed by a production MD run of 100 ns for all systems. During all MD simulations, the covalent bonds containing hydrogen were constrained using SHAKE (Ryckaert et al., 1977), and particle mesh Ewald (PME) was used to treat long-range electrostatic interactions (Darden et al., 1993). All MD simulations were performed with the GPU version of the Amber 20 package. All trajectory analysis was done with the Cpptraj module of Amber 20.
The hydrogen bonds and their occupancies were calculated with VMD (Humphrey et al., 1996) for the production trajectories, with a donor-acceptor distance cut-off of 3.0 Å and an angle cut-off of 20°.
MM-PB/SA calculations. For the binding free energy calculations, we used the standard MM-PB/SA method (Gohlke & Case, 2004; Fogolari et al., 2003). Before the MM-PB/SA analysis, all water molecules and sodium ions were removed from the trajectory. For the MM-PB/SA analysis, snapshots were extracted from the trajectory at intervals of 50 ps.
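The bookkeeping behind such an end-point estimate can be sketched as follows. This is an illustration only, not the Amber workflow: it assumes a single-trajectory protocol in which complex, receptor and ligand energies are evaluated on the same snapshots, it omits any entropy term, and the function names and per-snapshot energies are hypothetical. The snapshot count assumes that the whole 100 ns production run is sampled at the stated 50 ps interval.

```python
from statistics import mean


def snapshot_count(production_ns, interval_ps):
    """Number of snapshots when a trajectory is sampled at a fixed interval."""
    return int(round(production_ns * 1000.0 / interval_ps))


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Average of G_complex - G_receptor - G_ligand over matching snapshots."""
    if not (len(g_complex) == len(g_receptor) == len(g_ligand)) or not g_complex:
        raise ValueError("need equal-length, non-empty per-snapshot energy lists")
    return mean(c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand))


n_snapshots = snapshot_count(production_ns=100, interval_ps=50)

# Hypothetical per-snapshot free energies in kcal/mol, for illustration only.
dg = binding_free_energy(
    g_complex=[-110.0, -112.0],
    g_receptor=[-60.0, -61.0],
    g_ligand=[-20.0, -21.0],
)
print(n_snapshots, dg)
```

A more negative average indicates more favourable predicted binding; comparing values across complexes is generally more reliable than interpreting any single value as an absolute affinity.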

Antigenicity prediction
Vaxijen v2.0 analysis of the prefusion spike glycoprotein with RBD at a threshold of 0.4 (70% accuracy at a threshold of 0.4 is suggested for viral antigens) (Ahmad et al., 2021) gave an antigenicity score of 0.4512, which indicates that the selected sequence is a probable antigen. This confirms that the protein sequence is suitable for epitope prediction.

Physicochemical parameter prediction of sequence
Physicochemical parameter prediction for the full-length protein sequence using the ProtParam online tool helped to identify characteristics important for epitope prediction from a protein sequence. The molecular weight is computed as 142274.61 Da, with a theoretical pI of 6.14. The estimated half-life is 30 hours in mammalian reticulocytes (in vitro). The instability index (II) is computed to be 31.58, which classifies the protein as stable. The aliphatic index is computed to be 81.58, with a grand average of hydropathicity (GRAVY) of -0.163 (Figure S1).
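For readers unfamiliar with GRAVY, the quantity is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length. The sketch below reproduces that definition; the function name `gravy` is hypothetical, and the short inputs are toy strings rather than the spike sequence analysed here.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("sequence is empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


print(gravy("AAAA"))       # purely hydrophobic toy peptide, positive value
print(gravy("RK"))         # purely charged toy peptide, negative value
print(gravy("RFDNPVLPF"))  # one of the 9-mers named in this study
```

A negative GRAVY, as reported for the full-length protein, indicates that the sequence is on average slightly hydrophilic.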

Super secondary protein prediction
Super secondary protein prediction with the NCBI Conserved Domain Database (CDD) indicates that three domains are present in the protein sequence. The first is the Corona_S2 superfamily (Coronavirus S2 glycoprotein), of length 601, found in the sequence interval 662-1208 with a bit score of 794.41 and an E-value of 0e+00. The second is SARS-CoV-2_Spike_S1_RBD (receptor-binding domain of the S1 subunit of the severe acute respiratory syndrome coronavirus 2 Spike (S) protein), present in the sequence interval 319-541, with a length of 223, a bit score of 492.30 and an E-value of 9.12e-166. The third domain identified is Spike-COV-like_S1_NTD (N-terminal domain of the S1 subunit of the spike (S) protein from severe acute respiratory syndrome coronavirus and related betacoronaviruses in the B lineage), of length 280, present in the sequence interval 13-304 with a bit score of 458.72 and an E-value of 4.53e-152. Among the three domains recognized, SARS-CoV-2_Spike_S1_RBD is significant, as it plays a key role in the binding of SARS-CoV-2 to the ACE2 receptor of the human host. The full sequences of the respective domains were also identified, as this helps to determine whether the predicted epitopes lie in the major domains (Figure S2).

Proteasomal cleavage prediction
MHC class I binding predictions are very accurate for most of the identified MHC alleles. However, these estimates can be further improved by integrating proteasomal cleavage. NetChop proteasomal cleavage prediction indicated that the spike glycoprotein sequence has 404 cleavage sites within its 1288 amino acids. This suggests that there are adequate sites in the protein sequence for cleavage into appropriate epitopes.

Identification of HLA alleles
Identification of HLA alleles with the Allele Frequency Net Database (AFND) helped to recognize the class I alleles commonly seen in the Indian population, based on their allele frequency and the percentage of individuals that have the allele. Alleles were identified from all regions of India. The results list each allele and the population in which it was identified, along with the percentage of individuals that have the allele, the sample size and the location, sorted by allele frequency. The results are summarized in Table S1.

T-Cell epitope prediction
Class I epitope prediction using NetMHCpan 4.1 tool aligns predicted results based on peptide score and provides information on the start and end of 12-mer and 9-mer epitopes.

Epitope conservancy and immunogenicity prediction
Epitope conservancy analysis of both classes of predicted epitopes against 10 distinct SARS-CoV-2 spike proteins showed that, among the class I 12-mer epitopes, 143 of the 179 epitopes had 100% conservancy with the provided spike protein sequences. Eighteen had a conservancy of 90%, whereas 4 had a conservancy of 80%. Thirteen epitopes had a conservancy of less than 80%. For the 9-mer class I epitopes, 32 of the 54 epitopes showed 100% conservancy. Among the 144 class II epitopes, 107 had 100% conservancy, 21 and 8 epitopes showed 90% and 80% conservancy respectively, and 8 epitopes had a conservancy of less than 80%. The immunogenicity scores of both classes of epitopes were predicted using the class I and class II immunogenicity prediction tools in the IEDB Analysis Resource, and the output was sorted in decreasing order of immunogenicity score.

Allergy and antigenicity predictions of epitopes
The allergenicity and antigenicity of the predicted epitopes were predicted using AlgPred 2.0 and Vaxijen 2.0 respectively. Both analyses were performed for all the class I and class II epitopes. Among the 179 class I 12-mer epitopes, 32 were predicted to be non-allergens and 104 to be antigens, whereas among the 54 class I 9-mer epitopes, 32 had antigenic properties and 12 were non-allergens. Among the 144 class II epitopes, 42 were predicted to be non-allergens and 66 to be antigens.

Toxicity extrapolation and analysis of IFN-γ inducing epitopes
Toxicity prediction is also important to identify whether the predicted epitopes are toxic to the human body. All the predicted class I 12-mer and class II epitopes were predicted to be non-toxic. Among the class I 9-mer epitopes, 53 of the 54 were non-toxic. Among the 179 class I 12-mer epitopes, 15 were IFN-γ inducing epitopes, whereas among the 9-mer epitopes there were 13. Among the 144 class II epitopes, 11 were IFN-γ inducing epitopes. All the IFN-γ inducing epitopes had positive scores.

Population coverage
Population coverage analysis for class I and class II epitopes within the Indian population, performed with the IEDB Population Coverage Analysis Tool, showed that for the 179 selected class I epitopes the projected population coverage is 94.7%, with an average hit of 41.03 (average number of epitope hits/HLA combinations recognized by the population) and a pc90 of 3.59 (minimum number of epitope hits/HLA combinations recognized by 90% of the population). For the class II epitopes, a coverage of 51.77% with an average hit of 15.13 and a pc90 of 4.15 was predicted. The tool also predicted the overall coverage of class I and class II epitopes combined as 97.45%, with an average hit of 56.16 and a pc90 of 15.07. A graphical representation of the percentage of individuals against the number of epitope hits/HLA combinations recognized, together with the cumulative percentage of population coverage, is shown in Figure 3 and Table 1. The data of Figure 3 are also presented in tabulated form in supplementary Table S3. Tabulated information on the coverage of individual epitopes in the Indian population is provided in Table S4, which shows the genotype frequency of various alleles for the predicted epitopes.

TAP binding predictions and cross-reactivity analysis
TAP binding was predicted with TAPREG for a dataset comprising 3 class I and 1 class II 12-mer peptides (DS613) that showed the highest population coverage along with high epitope conservancy and immunogenicity. The results are sorted by increasing TAP affinity IC50 (nM) values, where the peptides with the lowest IC50 values are considered to have the highest affinity for TAP proteins. All the selected epitopes had acceptable IC50 values, which indicates appropriate affinity towards TAP. The results for the selected 6 class I and 1 class II epitopes having high epitope conservancy and immunogenicity scores along with acceptable TAP affinity are summarized in Table 2. Cross-reactivity analysis against the human proteome revealed that none of the six RBD epitopes tested has a human counterpart, implying that no cross-reactivity should occur in normal human cells.

Epitope 3-D structure prediction and refinement
After evaluating all of the epitope analysis results, class I epitopes (three 9-mers and three 12-mers) and one class II epitope that are non-toxic, non-allergenic, antigenic and interferon-gamma inducing, with high immunogenicity, 100% epitope conservation within 10 different SARS-CoV-2 spike glycoprotein sequences, and low IC50 TAP affinity values, were chosen for 3-D structure prediction using I-TASSER for further interaction analysis. These epitopes also have a high percentage of coverage among the Indian population. The three class I 9-mer epitopes were ASFSTFKCY, RFDNPVLPF and TRFASVYAW. The three class I 12-mer epitopes selected were LLAGTITSGWTF (between 37 to 48 amino acid residues), VSQPFLMDLEGK (within 31 to 42 residues) and SETKCTLKSFTV (within the 31 to 42 residues), whereas the one class II epitope was SNVTWFHAIHVS. Structural analysis with ERRAT showed that, among the class I 12-mer epitopes, LLAGTITSGWTF had a quality factor of 94.382, VSQPFLMDLEGK 80% and SETKCTLKSFTV 100%. Among the 9-mer epitopes, ASFSTFKCY showed 74%, RFDNPVLPF 82% and TRFASVYAW 80%. ProSA analysis showed that the epitope LLAGTITSGWTF has a Z-score of -6.79, VSQPFLMDLEGK -0.64 and SETKCTLKSFTV -0.41 (Figure 4). PROVE analysis showed a Z-score RMS of 1.901 for the class I 12-mer epitope LLAGTITSGWTF, 2.309 for VSQPFLMDLEGK and 2.097 for SETKCTLKSFTV. For the class I 9-mer epitopes, ASFSTFKCY showed a Z-score RMS of 1.051, RFDNPVLPF 2.007 and TRFASVYAW 1.198.

Molecular docking analysis
Molecular docking is a widely used method for predicting non-covalent interactions between biological molecules. According to molecular docking using AutoDock Vina (Trott & Olson, 2010), the epitope LLAGTITSGWTF with HLA-B*57:01 (complex 1) has the lowest energy score of -8.4 kcal/mol, followed by the epitopes VSQPFLMDLEGK with HLA-A*03:01 (complex 2) and SETKCTLKSFTV with HLA-B*49:01 (complex 3), which have energy scores of -7.6 and -7.4 kcal/mol, respectively. The complex with the lowest energy score has the highest predicted binding affinity between the epitope and its HLA allele. The docked complexes were visualized using PyMOL and the receptor-ligand interactions were analysed using Discovery Studio. The docking results and surface structures of the 7 epitope-MHC complexes with their respective alleles are presented in Table 3 and Figure 5, respectively.
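The ranking rule stated above (lowest energy score, highest predicted affinity) can be made explicit with a short sketch. It uses only the three 12-mer scores quoted in this section. The dictionary layout and variable names are illustrative assumptions and are not part of the docking workflow itself.

```python
# Energy scores (kcal/mol) for the three 12-mer complexes quoted in the text.
docking_scores = {
    "complex 1: LLAGTITSGWTF / HLA-B*57:01": -8.4,
    "complex 2: VSQPFLMDLEGK / HLA-A*03:01": -7.6,
    "complex 3: SETKCTLKSFTV / HLA-B*49:01": -7.4,
}

# Sort ascending: the most negative score is ranked first.
ranked = sorted(docking_scores.items(), key=lambda item: item[1])
best_complex, best_score = ranked[0]

for name, score in ranked:
    print(f"{score:6.1f} kcal/mol  {name}")
```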

Molecular dynamic simulation studies on the interaction between MHC allele and corresponding predicted epitopes
The receptor major histocompatibility complex (MHC) class I proteins, including HLA-C*07:01, HLA-C*04:01, HLA-A*01:01 and HLA-A*11:01, were docked with their corresponding predicted peptide epitopes.

RMSD analysis for the complexes
The root-mean-square deviation (RMSD) over the course of a simulation can be used as a measure of the conformational stability of a structure or model. The RMS deviations of all residues are shown in Figures 6(A) and 7(A) for the 9-mer and 12-mer epitope interactions, respectively, whereas the RMSD plots of the backbone atoms only are shown in Figures 6(B) and 7(B) for the 9-mer and 12-mer epitopes, respectively. A jump in the RMSD is observed within the first nanosecond in all cases, which is due to the relaxation of the starting model. From the RMSD plots for the 9-mer epitopes (Figure 6), it can be seen that the RMS deviations of complex-1 (ASFSTFKCY-HLA-A*11:01) and complex-2 (ASFSTFKCY-HLA-A*01:01) start converging after around 70 ns. Before 70 ns, the deviations are considerably larger in complex-2 than in complex-1, which has an RMSD of around 3.0 Å. The RMSD of RFDNPVLPF-HLA-C*04:01 (complex-3) remains almost constant after 20 ns, which shows that the simulation is well converged in this case. The RMSD of TRFASVYAW-HLA-C*07:01 (complex-4) starts converging after around 80 ns. The RMSD of complex-3 is around 3.5 Å, which is the lowest among all complexes and indicates that this complex is the most stable.
For the 12-mer epitopes (Figure 7), among the three complexes, complex 5 shows the smallest deviation from the starting model, whereas complex 7 deviates to a large extent, which indicates the instability of this third 12-mer complex. The backbone-atom RMSD is very similar to the all-atom RMSD in all the complexes.
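As a minimal illustration of the quantity plotted in Figures 6 and 7, the sketch below computes the RMSD of each frame relative to the starting model. It assumes that the frames have already been superimposed on the reference and that coordinates are plain (x, y, z) tuples in Å. The function names and the toy trajectory are hypothetical and are not the analysis code used for the simulations.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("frames must be non-empty and contain the same atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def rmsd_series(trajectory, reference=None):
    """RMSD of every frame relative to the reference (default: first frame)."""
    reference = trajectory[0] if reference is None else reference
    return [rmsd(frame, reference) for frame in trajectory]


# Toy two-atom trajectory (hypothetical coordinates in Angstrom).
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 4.0), (1.0, 3.0, 4.0)],
]
print(rmsd_series(toy_trajectory))
```

Restricting the coordinate lists to backbone atoms gives the backbone RMSD, and a series that levels off after some time corresponds to the convergence described above.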

RMSF analysis for the complexes
The root-mean-square fluctuation (RMSF) measures the average deviation of a protein residue over time from a reference position. The RMSF of all the complexes is shown in Figure 8A and B for the 9-mer and 12-mer epitope complexes, respectively. For the 9-mer epitopes, all four complexes show a similar pattern of atomic fluctuations, and ASFSTFKCY-HLA-A*11:01 (complex-1) shows smaller fluctuations in its RMSF values than the other complexes. From the graph, the RMSF of RFDNPVLPF-HLA-C*04:01 (complex-3) is stable after around residue 25, whereas elsewhere it fluctuates from residue to residue. The 12-mer epitope interaction studies are shown in Figure 8B as complexes 5, 6 and 7. Owing to the instability of complex 7 (shown by the RMSD in Figure 7), its residues fluctuate more than those of complexes 1 and 2. The residues SER276 and GLU277 of complex 7 show the largest fluctuations, of 11 and 10.6 Å, respectively.
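The per-residue fluctuation described above can be sketched as follows. The example assumes one representative atom per residue (for instance the C-alpha atom), pre-aligned frames, and the time-averaged position as the reference. These choices, the function name `rmsf` and the toy coordinates are assumptions made for illustration only.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF around the time-averaged position.

    trajectory is a list of frames; each frame is a list of (x, y, z) tuples,
    one per atom, in the same atom order in every frame.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean_pos = [
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        ]
        # Mean squared displacement from that mean position.
        msf = sum(
            sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msf))
    return values


# Toy data: atom 0 is rigid, atom 1 moves by 2 Angstrom along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy_trajectory))
```

Residues with large values, such as the two complex 7 residues singled out above, are the flexible regions, while a flat profile indicates a rigid stretch.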

Hydrogen bond analysis of complexes
The hydrogen bond analysis of the four 9-mer complexes was carried out by plotting the number of hydrogen bonds against the frame number (Figure 9). The graph shows that complex 1 and complex 2 both reach approximately 10 hydrogen bonds. However, these complexes are not stable in their number of hydrogen bonds: the value of 10 is reached only at a single peak, and the count fluctuates strongly across the frames. Complex 3, in contrast, shows a nearly constant number of approximately 7 hydrogen bonds throughout the frames. Complex 4 shows fluctuations similar to those of complexes 1 and 2. We further calculated the hydrogen bonds between the residues of the peptide chain and the protein in the 12-mer epitope complexes (Figure 9E-G). In complexes 5 and 7, 4 to 5 hydrogen bonds are present throughout the simulation, whereas in complex 6 the number of hydrogen bonds is comparatively lower.
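The distinction drawn above between a steady hydrogen-bond count and a count that only peaks occasionally can be quantified with simple summary statistics. The sketch below assumes that a per-frame hydrogen-bond count has already been extracted from the trajectory. The function name `hbond_stability` and both count series are hypothetical examples, not values from Figure 9.

```python
from statistics import mean, pstdev


def hbond_stability(counts_per_frame):
    """Summarise a per-frame hydrogen-bond count series."""
    if not counts_per_frame:
        raise ValueError("need at least one frame")
    return {
        "mean": mean(counts_per_frame),
        "std": pstdev(counts_per_frame),  # spread across frames
        "max": max(counts_per_frame),     # highest single-frame count
    }


# Hypothetical series: one steady profile and one with a single high peak.
steady = hbond_stability([7, 7, 6, 7, 7, 8, 7, 7])
spiky = hbond_stability([2, 3, 10, 1, 4, 2, 5, 3])
print(steady)
print(spiky)
```

A high maximum combined with a low mean and a large standard deviation corresponds to the behaviour described for complexes 1 and 2, whereas a small standard deviation corresponds to the steady profile described for complex 3.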

MM-PB/GB-SA calculation
Molecular dynamics simulations of each of the four proteins complexed with their respective peptide chains were used separately to study their stability and binding affinity. MM-PB/GB-SA (Molecular Mechanics Poisson-Boltzmann or Generalized Born Surface Area) calculations are used to estimate the free energy change between two states (typically the bound and free states of a receptor and ligand). This method uses MD simulations of the free ligand, the free protein and their complex as a basis for calculating their free energies. The total binding free energy is expressed in terms of a gas-phase contribution, a solvation energy and an entropic contribution. The total gas-phase energy of the solute, E_GAS, includes the electrostatic energy, the van der Waals energy derived from a Lennard-Jones potential, and the internal energy. For the MM-PBSA calculations, we selected the ranges of frames that deviate least. We calculated the binding free energies of all seven complexes (Table 4). Each complex contains a small peptide chain of nine or twelve amino acids, which differs from complex to complex.
From the above binding free energy values, it can be seen that complex-1 has the least favourable and complex-4 the most favourable binding free energy. The binding energy calculations for all four complexes suggest a favourable contribution from the electrostatic energy, E_EEL. Evidence of the electrostatic interactions can be obtained from the residues that interact most strongly with the protein molecule. In complex-1, the major contributions to the total binding free energy come from the polar residues ARG6, ASP29, ASP30, ASP102, THR233 and ARG234 and the non-polar residue LEU179. It can be inferred that these are the key residues of the protein. In complex-2, the major contributing residues are PHE8, ASP29, ASP30, ARG35, THR233, ARG234 and GLN96. In complex-3, TYR27, ARG35, GLU212 and GLU232 are the key residues interacting with the peptide chain bound inside the protein. In complex-4, mainly polar residues of the protein interact with the amino acids of the peptide chain: ASP30, GLU32, ARG35, ARG48 and GLU212 (Figure 10).
From the observed binding free energy values of the 12-mer epitope complexes, we can propose that complex-5 has the most favourable and complex-7 the least favourable binding free energy. The binding energy calculations for all three complexes suggest a favourable contribution from the electrostatic energy, E_EEL. Evidence of the electrostatic interactions can be obtained from the residues that interact most strongly with the protein molecule.
In complex-5, the major contributions to the total binding free energy come from the polar residues GLU32, ASP30, ARG47, GLU52, ARG178 and SER321 and the non-polar residues PRO46, LEU179, GLY234 and LEU334. It can be inferred that these are the key residues of the protein. In complex-6, the major contributing polar residues are HID189, THR191, GLU230, ARG235, GLN243, LYS286 and GLN288, and the non-polar residues include TRP205, VAL232 and VAL289. In complex-7, THR69, GLN72, THR73, GLU76, GLU152, GLN155 and LEU156 are the key residues interacting with the peptide chain bound inside the protein.
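The energy decomposition described in this paragraph can be written compactly. The equations below give the commonly used general form of the MM-PB/GB-SA end-point estimate and are offered only as a sketch. The symbols for the internal and van der Waals terms and for the polar and non-polar parts of the solvation energy are notation assumed here for illustration, and the exact terms included in the calculations for Table 4 may differ.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - G_{\mathrm{receptor}} - G_{\mathrm{ligand}} \\
G &= E_{\mathrm{GAS}} + G_{\mathrm{solv}} - T S \\
E_{\mathrm{GAS}} &= E_{\mathrm{int}} + E_{\mathrm{EEL}} + E_{\mathrm{vdW}} \\
G_{\mathrm{solv}} &= G_{\mathrm{PB/GB}} + G_{\mathrm{SA}}
\end{align}
```

Each free energy is averaged over the selected simulation frames, and a more negative binding free energy corresponds to more favourable binding.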

Discussion
In the past few years, many emergent pathogenic diseases have been identified, most of which have involved zoonotic or species-jumping infectious agents. Among the various emergent viral infections, the novel coronavirus (nCoV or SARS-CoV-2) is considered the third CoV outbreak among humans (Rehman et al., 2020). COVID-19, which emerged in Wuhan, China, at the end of 2019, has been recognized to cause respiratory, digestive and systemic manifestations that harm human health (Briguglio et al., 2020). This class of virus, which affects type 3 pneumocytes and ciliated bronchial epithelial cells using ACE2 receptors, comprises RNA viruses that can be transmitted from person to person through airborne particles and droplets (Contini et al., 2020). As the intensity of these diseases is increasing, vaccine development within a short period is critical to protect people from the expanding viral attacks (Torreele & Amon, 2021). Vaccination is the administration of agent-specific, yet harmless, antigenic particles, which induce protective immunity against the respective infectious agent in vaccinated individuals. However, the development and production of an effective vaccine can take many years and can be costly (Attia et al., 2021). Therefore, designing strategies to minimize the cost and time of vaccine development becomes important (Ghaebi et al., 2020). In that scenario, various bioinformatics approaches can be very beneficial for designing new-generation safe vaccines within a short period (Bahrami et al., 2019; Jose et al., 2022). The emergence of technologies such as next-generation sequencing and advanced genomics and proteomics has brought a great transformation in computational immunology (Gauthier et al., 2019).
The new field of bioinformatics known as immunoinformatics aims to develop vaccines or vaccine candidates within a short time by understanding the immune response of the human body against an organism. Immunoinformatics is a branch whose core objective is to translate extensive immunological data using computational and mathematical methods and to organize these data to obtain immunologically meaningful interpretations (Ahammad & Lira, 2020). This field uses statistical and machine-learning-based tools and can be used for studying and modelling molecular interactions. In the development of CoV vaccines, numerous approaches have been implemented, and most of them target the surface spike (S) glycoprotein, or S protein, as it is the chief inducer of neutralizing antibodies (Inchingolo et al., 2021). Spike protein-based approaches to CoV vaccine development make use of either the full-length spike protein or the S1 receptor-binding domain (RBD) (Ita, 2021). The spike protein comprises two subunits, S1 and S2. The RBD present in the S1 subunit interacts with its host cell receptor, angiotensin-converting enzyme 2 (ACE2) (Dehury et al., 2021). Hence, spike protein-based vaccines are considered vital, as they induce antibodies that block not only the virus-receptor interaction but also virus genome uncoating. In this immunoinformatic work, we attempted to identify class I epitopes from the SARS-CoV-2 spike protein with its RBD. After retrieval of the sequence from NCBI with accession ID 6VSB_A, which is chain A of the spike RBD protein, it was subjected to several in silico approaches to ensure that the epitopes are predicted with high accuracy. It is vital to determine whether the chosen protein sequence is an antigen, as only an antigenic or foreign substance can induce an immune response in the host body.
The given protein sequence was predicted to be a probable antigen with a score of 0.4152. Predicting the physicochemical parameters of the primary protein sequence helps to assess the stability of the protein and thereby points to the stability of the epitopes. It also characterizes its thermostability, hydrophilicity and theoretical pI. The molecular weight of the protein sequence was computed to be 142274.61, which indicates a good immunogenic sequence, and the instability index of 31.58 is appropriate for a stable protein. Three functional domains, namely Corona_S2 superfamily (coronavirus S2 glycoprotein) of length 601, SARS-CoV-2_Spike_S1_RBD (receptor-binding domain of the S1 subunit of the severe acute respiratory syndrome coronavirus 2 spike (S) protein) of length 223, and Spike-COV-like_S1_NTD (N-terminal domain of the S1 subunit of the spike (S) protein) of length 280, were identified from the super-secondary structure prediction. Identification of domains is important for analysing functionality and for recognizing whether the studied protein sequence will be an effective antigen for epitope prediction. SARS-CoV-2_Spike_S1_RBD is vital as it interacts with the ACE2 receptor in the host. The loop (amino acids 424-494 of the RBD) that makes full contact with the ACE2 receptor is named the receptor-binding motif (RBM) (Trott & Olson, 2010). The role of the spike protein, especially the RBD, in receptor binding and membrane fusion shows that vaccines targeting the spike protein could induce antibodies and T-cell responses that prevent virus binding and fusion or neutralize virus infection. Epitopes are primarily generated by a multi-subunit protease called the proteasome, which carries out the majority of intracellular protein degradation. The proteasome can generate the exact C-terminus of CTL epitopes and the N-terminus with a possible extension. If the epitopes are destroyed by the proteasome, CTL responses may diminish (Kasahara & Flajnik, 2019).
Therefore, estimation of the proteasomal cleavage sites is important for identifying possible immunogenic regions in the proteomes of pathogenic microorganisms. The spike protein sequence with RBD was predicted to contain 404 cleavage sites within its 1288-amino-acid length. This indicates that the protein sequence has enough cleavage sites, and hence adequate immunogenic regions, to generate a large number of epitopes. In some other infectious diseases, such as human filariasis, human Toll-like receptors (TLRs) have been proposed as good therapeutic targets, where the extracellular domain of TLR4 interacts with cystatin. Further chimeric vaccine prediction and development against SARS-CoV-2 can also be carried out using immunoinformatic tools, as has been done in the case of Staphylococcus aureus.
Identification of the HLA alleles prominent in the Indian population is important for predicting the class I T-cell epitopes. Human leukocyte antigen (HLA) loci appear to be a principal genetic candidate for infectious disease susceptibility (Debnath et al., 2020). HLAs are categorized as major histocompatibility complexes (MHCs) owing to their significant role in allowing the immune system to distinguish between self and non-self antigens, and are of two types, class I and class II (Al Naqbi et al., 2021). The three chief class I MHC genes are HLA-A, HLA-B and HLA-C. HLA alleles were selected from the AFND (Allele Frequency Net Database) based on their allele frequency and the percentage of individuals that carry the allele. The MHC class I supertypes A1, A2, A3, A11, A32, A68, B18, B35, B40, B44, B49, B50, B51, B52, B55, B57, B5, C1, C3, C4, C6 and C7 were used to predict class I epitopes, while the MHC class II supertype DRB1 was used to predict class II T-cell epitopes. Within the Indian population, all of these MHC supertypes had high expression. The server reports HLA alleles covering all regions of India and also gives the name of the population in which each allele is present at a higher percentage. Epitopes, or antigenic determinants, are defined as short amino acid sequences of a protein that can induce a more direct and potent immune response than the response induced by the whole cognate protein (Berti & Adamo, 2018). The class I and class II epitopes predicted against the chosen MHC supertypes were sorted by their peptide scores, and the best ones were subjected to population coverage analysis.
Population coverage calculates the fraction of individuals who respond to a given set of epitopes within the given MHC restrictions. For a valuable epitope-based vaccine prediction, population coverage analysis is very important. It gives the coverage of all epitopes together as well as of individual epitopes among the population (Supplementary Table S3). As we are interested in the Indian population, 179 12-mer and 54 9-mer epitopes were given as input within class I MHC restrictions, along with 144 12-mer epitopes within MHC II restrictions. The combined results show a total coverage of 97.45% with an average hit of 56.16 and a pc90 of 15.07. Individually, among the Indian population, the class I epitopes have a population coverage of 94.7% with an average hit of 41.03 and a pc90 of 3.59, and the class II epitopes have 51.77% with an average hit of 15.13 and a pc90 of 4.15. In an epitope-based vaccine prediction, conserved epitopes would be expected to offer wider protection across multiple strains, or even species, than epitopes derived from highly variable genome regions (Yarmarkovich et al., 2020). The epitope conservancy analysis of the given peptide sequences showed that all the selected epitopes are 100% conserved within the protein sequence. Similarly, immunogenicity prediction is also vital, as it reflects the ability of the epitopes to induce an immune response in the host. The immune response to the foreign protein leads to the production of neutralizing antibodies. This immune response is facilitated by T cells and occurs as a rapid reaction after the first encounter with the antigen (Sant et al., 2018). The prediction of the TAP binding affinity of peptides can contribute to subunit vaccine development.
The transporter associated with antigen processing (TAP) transports the peptide fragments of proteolyzed antigenic or self-altered proteins to the endoplasmic reticulum, where these peptides bind to the major histocompatibility complex (MHC) (Geng et al., 2018). TAP affinity is measured in terms of the IC50 score, where a lower IC50 score indicates a higher binding affinity of the epitope for the TAP proteins. Allergenicity prediction is vital to detect whether the epitopes can trigger IgE antibodies in the human host (Ehlers et al., 2019). Allergenicity prediction showed that all the 3 selected epitopes are non-allergenic and appropriate for developing T-cell epitope-based vaccines. Similarly, toxicity prediction showed that all the given epitopes are non-toxic to the host. By comparing the results of all the analyses, the best 6 epitopes from class I and 1 epitope from class II were selected.
The 3 selected class I 12-mer epitopes LLAGTITSGWTF, VSQPFLMDLEGK and SETKCTLKSFTV and the 9-mer epitopes ASFSTFKCY, RFDNPVLPF and TRFASVYAW were submitted for 3-D structure prediction, which is necessary for molecular docking and other structural analyses. Molecular docking analysis of the epitopes with their respective HLA alleles shows that all the epitopes interact with their alleles with reasonable binding affinity values. Among the 12-mer class I epitopes, the best binding score was shown by LLAGTITSGWTF (-8.4), followed by VSQPFLMDLEGK (-7.6) and SETKCTLKSFTV (-7.4).
For the class I 9-mer epitopes, the best binding energy was -7.5, shown by RFDNPVLPF, followed by TRFASVYAW (-7.3) and ASFSTFKCY (-7.0 with both of its HLA alleles). The docked complexes were visualized and the interactions were analysed to identify the amino acids interacting with the epitope. Of the 6 identified class I epitopes, one 9-mer epitope, ASFSTFKCY (within 372-380), and one 12-mer epitope, LLAGTITSGWTF (within 877-888), lie within the predicted proteasomal cleavage sites. Two class I epitopes, ASFSTFKCY and TRFASVYAW, lie within the receptor-binding domain of the spike protein, whereas three epitopes, RFDNPVLPF, VSQPFLMDLEGK and SETKCTLKSFTV, lie within the N-terminal domain of the S1 subunit of the spike protein.
The molecular dynamics simulation analysis also showed that, in all three of the given complexes, the peptides interact with the protein with reasonable binding energies and hydrogen bonds. However, combining all the aspects of the molecular dynamics simulations, namely RMSD, RMSF, hydrogen bonds and MM/PBSA calculations, the interaction of the epitope LLAGTITSGWTF with its corresponding HLA allele is the best among the 12-mer class I epitopes, followed by VSQPFLMDLEGK. Among the 9-mer class I epitopes, the best one observed was TRFASVYAW, followed by RFDNPVLPF. It was also observed that some of these epitopes have only average TAP affinity and immunogenicity scores compared with the others. Further in vitro studies are therefore needed to establish the full efficacy of the epitopes. From the in silico studies, it can be interpreted that the epitopes predicted from the spike protein of SARS-CoV-2 for the Indian population are likely to be accurate. To generate a multi-epitope-based vaccine, however, all the epitopes can be used. Recently, such multi-epitope vaccine candidate prediction, including wet-lab experimentation, has been reported for pneumonia infection.

Conclusion
The SARS-CoV-2 virus has become a key health concern in many countries, including India, and has led to the death of many people worldwide. Although vaccine development is in progress and some vaccines are already in use, treatment in this difficult situation depends only on antiretroviral therapy (Busquet et al., 2020). Therefore, this study might contribute to the path of multi-epitope-based vaccine development against SARS-CoV-2. Among the different proteins of SARS-CoV-2, we chose the RBD of the spike glycoprotein, retrieved in FASTA format, as it is crucial for viral attachment to the host receptor. The conserved sequence of the spike glycoprotein with the RBD of the infectious SARS-CoV-2 was evaluated using advanced immunoinformatic methods.
Immunoinformatics makes effective use of computational techniques in the exploration of novel vaccines. It is believed to contribute to vaccine design as computational chemistry contributes to drug design. Immunoinformatics-based vaccine design can achieve worthwhile and cost-efficient advances in the design of vaccines or vaccine components. It has been reported that a CTL epitope-based vaccine, which is a realistic option in vaccine design, can complement convalescent plasma therapy and can produce multiple serotype-specific immune responses. The present study concludes that the high-potential epitope vaccine construct has the potential to elicit robust immune responses. The designed epitopes were assessed over several immunological parameters. The predicted epitopes exhibited high antigenicity, immunogenicity, C-terminal proteasomal affinity and strong TAP affinity. They were also identified as non-allergenic and highly antigenic, with stable physicochemical characteristics. Finally, molecular docking and dynamics studies were performed to check the binding affinity of the predicted epitopes with their respective HLA alleles. Here we considered the Indian population, which is vastly diverse and comprises people of several ethnicities, cultures, traditions and lifestyles from different geographical locations. Because of this diversity, there is a probability that vaccine effectiveness is compromised under certain conditions. The current immunoinformatic analysis and MD simulation approaches have pointed out the best 6 MHC epitopes within the spike glycoprotein RBD of SARS-CoV-2 that can be used for designing a multi-epitope vaccine for the Indian population.
The population coverage analysis of the given epitopes within the Indian population helped to identify the best 5 epitopes that respond most within the population and might provide insight for a multi-epitope-based vaccine, as the disease has had a major impact on the country. Designing a vaccine specifically for a population is helpful, as different populations may respond differently to the epitopes and, as a result, a general vaccine may not give the expected results. However, these immunoinformatic analyses need to be followed by several in vitro and in vivo studies before a vaccine against COVID-19 can be formulated. The in silico studies of the predicted epitopes indicated reasonable elicitation of an anti-coronavirus immune response. This study might provide a route to a potential CTL-based vaccine construct against SARS-CoV-2 through detailed theoretical analysis.