Computational designing of a novel subunit vaccine for human cytomegalovirus by employing the immunoinformatics framework

Abstract Human cytomegalovirus (HCMV) is a widespread virus that can cause serious and irreversible neurological damage in newborns and even death in children who do not have the access to much-needed medications. While some vaccines and drugs are found to be effective against HCMV, their extended use has given rise to dose-limiting toxicities and the development of drug-resistant mutants among patients. Despite half a century’s worth of research, the lack of a licensed HCMV vaccine heightens the need to develop newer antiviral therapies and vaccine candidates with improved effectiveness and reduced side effects. In this study, the immunoinformatics approach was utilized to design a potential polyvalent epitope-based vaccine effective against the four virulent strains of HCMV. The vaccine was constructed using seven CD8+ cytotoxic T lymphocytes epitopes, nine CD4+ helper T lymphocyte epitopes, and twelve linear B-cell lymphocyte epitopes that were predicted to be antigenic, non-allergenic, non-toxic, fully conserved, and non-human homologous. Subsequently, molecular docking study, protein-protein interaction analysis, molecular dynamics simulation (including the root mean square fluctuation (RMSF) and root mean square deviation (RMSD)), and immune simulation study rendered promising results assuring the vaccine to be stable, safe, and effective. Finally, in silico cloning was conducted to develop an efficient mass production strategy of the vaccine. However, further in vitro and in vivo research studies on the proposed vaccine are required to confirm its safety and efficacy. Communicated by Ramaswamy H. Sarma


Introduction
Human cytomegalovirus (HCMV) is a life-threatening, opportunistic human pathogen belonging to the family, Herpesviridae. Studies have found that it has already infected more than 50% of the human population, causing severe and permanent neurological injury in newborns. Although HCMV infection is usually asymptomatic in most individuals, it is associated with significant morbidity (Nogalski et al., 2014). HCMV can also cause disease in solid organ transplant and hematopoietic stem cell transplant recipients (Anderholm et al., 2016). Infection of HCMV can have more serious symptoms affecting the eyes, lungs, liver, esophagus, stomach, and intestines in the case of people with weakened immune system or patients with acquired immunodeficiency syndrome. Infants born with HCMV infection can have brain, liver, spleen, lung, and growth complications. The most common long-term health problem in infants born with congenital HCMV infection is hearing loss. HCMV has the largest genome of any known human virus, with doublestranded DNA of 236 kbp in size (Dolan et al., 2004). The genome is linear, enclosed by an icosahedral protein capsid. A lipid bilayer envelope consisting of two glycoprotein complexes encompasses the capsid of the virus (Vadlapudi et al., 2012). In the United States alone, nearly one in three children are already infected with HCMV by the age of five. Over half of adults have been found to be infected with HCMV by the age of 40. Once HCMV infects a person's body, it stays there for life and has the capability to be reactivated over time (https://www.cdc.gov/ cmv/overview.html). Moreover, sensorineural hearing loss (SNHL) is found in 35% of children with HCMV infection, cognitive deficits have been found in 66% of children with HCMV infection, and death occurs in 4% of the infected children (Dobbie, 2017). It can be inferred that HCMV is one of the most frequently encountered viruses which is the most common cause of congenital viral infection, brain damage, hearing loss in children, blindness, and responsible for a wide range of neurodevelopmental disabilities (Schleiss, 2010;Vadlapudi et al., 2012).
Developing a vaccine to prevent HCMV infection or its diseases has been regarded as one of the highest priorities. Many researchers and institutes have been working on developing vaccines effective against HCMV. Works on such vaccines began in the 1970s, soon after the toll of the virus on infants in utero and transplant recipients became apparent. However, even after 50 years of intensive study, no licensed HCMV vaccine is currently accessible in the global market (Schleiss, 2010;Zhong & Khanna, 2007;Plotkin & Boppana, 2019). AD169 was the first HCMV vaccine studied in humans (Arvin et al., 2007). While a 96% conversion rate was achieved in the 10,000-test group, only half of the volunteers had detectable HCMV antibody responses in the post-vaccination evaluation after eight years (Elek & Stern, 1974). GCV was the first antiviral drug approved to treat HCMV infections and another intravenous formulation, Cytovene-IVV R , was approved for HCMV retinitis in 1989. But those drugs caused hematologic abnormalities in patients (neutropenia, anemia, and thrombocytopenia) and long-term reproductive toxicity (Vadlapudi et al., 2012). Some recent drugs such as ganciclovir, valganciclovir, cidofovir, and foscarnet have shown to be effective against HCMV, although prolonged therapy with these approved drugs has been found to be associated with dose-limiting toxicities. Furthermore, drug-resistant mutants have also been observed, particularly in patients with acquired immunodeficiency syndrome (Vadlapudi et al., 2012). As a result, research is underway to develop newer vaccine candidates with improved antiviral effectiveness and reduced side effects; still, no effective HCMV vaccine seems to be on the verge of approval (Schleiss, 2008).
In this study, an effective, suitable polyvalent vaccine against the four different virulent strains (i.e. AD169, Merlin, Toledo, and Towne) of HCMV was designed exploiting the methods of immunoinformatics. Immunoinformatics identifies the novel antigens of a pathogen or virus by analyzing and examining its genome and facilitates the vaccine designing process with the aid of various tools from in silico biology and bioinformatics fields (Chong & Khan, 2019;Rappuoli, 2000). Previously, polyvalent vaccines have been designed insilico against SARS-CoV-2, Type-1 and Type-2 herpes simplex virus, human papillomavirus, and dengue Anderholm et al., 2016;Hasan et al., 2020;Krishnan G et al., 2021;Sarkar et al., 2020;Yazdani et al., 2020). The designed HCMV vaccine of this study was assumed to produce a functional immune response against the glycoprotein B (gB) and H (gH) of all the four selected strains of HCMV. To construct the vaccine, the epitopes with high antigenicity, non-allergenicity, non-toxicity, and non-homology to the human proteome were selected for vaccine construction. The selected epitopes were tested for their high conservancy as well, which ensured the activity of the designed vaccine against all the selected viral strains. Glycoprotein gB plays a role in host cell entry, cell-to-cell virus transmission, and fusion of infected cells. It is also involved in the initial attachment of the virus via binding to heparan sulfate together with the gM/gN complex (Halary et al., 2002). Glycoprotein gB is being utilized in recombinant vaccines and DNA vaccines for HCMV infections (Anderholm et al., 2016). Glycoprotein gH is required for the fusion of viral and plasma membranes leading to virus entry into the host cell. Therefore, this study targets these two proteins to design the vaccine that are supposed to block the viral entry and thus hamper the viral life cycle. A flowchart of the methods, and a list of the methods used in this vaccine design analysis, are shown in Figure 1.

Identification of the strains and retrieval of the protein sequences
The four strains of HCMV (i.e. AD169, Merlin, Toledo, and Towne) were selected for this experiment from the National Center for Biotechnology Information or NCBI (https://www. ncbi.nlm.nih.gov/) database. Then the target proteins (gB and gH) were retrieved from the UniProt (https://www.uniprot.org/) database. UniProt is a publicly available database that organizes biological knowledge of proteins from literature and provides comprehensive, high-quality resource of protein sequences, and functional information. Users can identify the assembly of a species and the provenance of sequences owing to the presence of proteome ('UniProt: A Hub for Protein Information.,' 2015). The inclusion of annotation scores in UniProt helps users to perform comparative protein sequence analysis.

Analyzing the physiochemical properties of the proteins and antigenicity prediction
An online antigenicity predictionserver, VaxiJen v2.0 (http:// www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html), was used to determine the antigenicity of the protein sequences. This server predicts whether peptides are antigenic or not by analyzing solely the physiochemical characteristics of the proteins without requiring their sequence alignments (Doytchinova & Flower, 2007). The 0.4 prediction accuracy threshold was chosen for this approach as it has been shown to improve prediction accuracy (Doytchinova & Flower, 2007). This server's algorithm, which is based on the auto cross covariance (ACC) transformation mechanism, determines the antigenicity of query peptides or proteins. After that, the online tool from ExPASy, ProtParam (https://web. expasy.org/protparam/) was used to analyze the physicochemical properties of the retrieved proteins, keeping all the parameters default (Gasteiger et al., 2005). The physicochemical properties of any protein sequence entered as a Swiss-Prot/TrEMBL accession number or ID, or as a user-entered raw protein sequence can be computed by ProtParam (Gasteiger et al., 2005).

T-cell and B-cell epitope prediction
For determining the epitopes of the two selected antigenic proteins i.e. gB and gH of the strain AD169 (considered as the model strain), the Immune Epitope Database or IEDB (https://www.iedb.org/) server was used, with all parameters kept at their default values (Vita et al., 2019).
The selection of the MHC class-I restricted CD8þ cytotoxic T-lymphocyte (CTL) epitopes was made with the NetMHCpan EL 4.0 prediction method (http://www.cbs.dtu.dk/services/ NetMHCpan/). NetMHCpan-4.0 provides improved predictive performance for the identification of naturally processed ligands, cancer neoantigens, and T-cell epitopes by integrating information from two data sources: mass spectrometry ligand data and peptide binding affinity data (Jurtz et al., 2017). Along with this, the IEDB recommended 2.22 prediction method (http://tools.iedb.org/mhcii/) was used to determine the CD4þ helper T lymphocyte (HTL) epitopes (15-mer epitopes) that were MHC class-II restricted (Vita et al., 2019;Wang et al., 2010). The IEDB server is a publicly accessible database that allows easy identification of experimental data on antibody and T-cell epitopes experimented in humans, non-human primates, and other animal species in terms of infectious disease, allergy, auto-immunity and transplantation (Vita et al., 2019;Wang et al., 2010). Through the BepiPred linear epitope prediction method (http://tools.iedb.org/bcell/ ), the linear B-cell lymphocyte epitopes (LBL) were predicted keeping all the parameters default . The BepiPred method employs a combination technique that combines the predictions of a hidden Markov model with a propensity scale, allowing it to outperform all other methods evaluated on the validation data set and to provide the greatest prediction accuracy (Larsen et al., 2006). To design the vaccine, only LBLs were used; however, the conformational B-cell epitopes were later predicted from the tertiary structure of the designed vaccine. The presence of the effective conformational B-cell epitopes ensures stronger and long-lasting immunity.

Antigenicity, allergenicity, toxicity, and transmembrane topology prediction
Physiochemical properties of the selected epitopes include the antigenicity, allergenicity, toxicity, and transmembrane topology of the epitopes. Only highly antigenic, non-allergenic, non-toxic, completely conserved, and non-homologous to the host proteome epitopes were considered to be used in the vaccine construction method. The epitopes were run through various physicochemical property prediction tests, including firstly the antigenicity prediction test using VaxiJen v2.0 (http://www.ddgpharmfac.net/vaxijen/VaxiJen/VaxiJen.htm) server, keeping the prediction accuracy threshold at 0.4 (Doytchinova & Flower, 2007). In the next step, the allergenicity test was conducted using AllerTOP v2.0 (https://www.ddg-pharmfac.net/AllerTOP/) as well as AllergenFP v1.0 (http://ddg-pharmfac.net/AllergenFP/), of which the generated results by AllerTOP v2.0 were given priority by virtue of the server having about 0.8% more accuracy than AllergenFP (88.7% vs. 87.9%) (Dimitrov et al., 2013, Dimitrov et al., 2014. AllerTop v.2 uses k-nearest neighbor algorithm (kNN) with 85.3% accuracy at 5-fold, allowing it to provide a realistic, robust, and highly complementary method for allergy prediction (Dimitrov et al., 2014). While both the servers are based on a training set of 2427 known allergens and 2427 known non-allergens from various species and use ACC transformation method to predict, the AllerTop 2.0 server uses kNN, k ¼ 1 to classify the query proteins and the AllergenFP server uses the Tanimoto coefficient to decide if the query proteins are allergenic or non-allergenic. Afterward, the toxicity of the selected epitopes was predicted using the support vector machine (SVM) method through the ToxinPred server (http://crdd.osdd.net/raghava/toxinpred/) whose parameters were maintained at the default value. This server is built based on a positively trained data-set of 1805 sequences and two negatively trained data sets: one with 3593 negative SwissProt sequences and the other with 12,541 negative TrEMBL sequences. Reliable differentiation of the toxic and non-toxic epitopes measured by the Matthew's correlation coefficient (MCC) is expected as the SVM method is quite an acclaimed machine learning technique for toxicity prediction . When the performance of ToxinPred was assessed on separate datasets, the server was found to have an accuracy of about 90% . Finally, the transmembrane topology of all the epitopes were predicted using the TMHMM v2.0 server (https://services.healthtech.dtu.dk/service.php?TMHMM-2.0) to determine whether the epitopes are exposed outside or inside. Based on the Hidden Markov Model (HMM), the TMHMM v2.0 server predicts the transmembrane topology profile of the protein helices with acceptable accuracy (Krogh et al., 2001).

Determination of the cytokine inducing ability of the epitopes
The cytokine-inducing ability of the HTL epitopes is required to be tested as cytokines have a role in inducing cytotoxic T-cells, B-cells, and macrophages, among other immune cells (Luckheeram et al., 2012). The helper T-cells produce various cytokines, such as IFN-gamma, IL-4, and IL-10, and the ability of the selected HTL epitopes to induce these cytokines are determined by a bunch of highly accurate and recommended servers, i.e. IFNepitope (http://crdd.osdd.net/raghava/ifnepitope/) for IFNgamma, IL4Pred (http://crdd.osdd.net/raghava/il4pred/) for IL-4, and IL10Pred (http://crdd.osdd.net/raghava/IL-10pred/) for IL-10 (Dhanda et al., 2013a;Dhanda et al., 2013b;Nagpal et al., 2017). Two different methods were used in these predictions: the Hybrid (Motif þ SVM) prediction for the INF-gamma inducing capability prediction and the SVM prediction method in the case of the IL4pred and IL10pred predictions. All predictions were conducted with threshold parameters set to default values.

Conservancy and human homology determination of the epitopes
Selecting the epitopes that show high conservancy was one of the objectives of this study since completely conserved epitopes consolidate the wide-ranged activity of any polyvalent vaccine. To perform the conservancy analysis of the epitopes, the selected protein sequences of gB and gH were run through multiple sequence alignment in the CLC Drug Discovery Workbench software version 3.0 (https://digitalinsights.qiagen.com/). Furthermore, the comprehensive effectiveness of the vaccine constructed with conserved epitopes depends on the non-native nature of the vaccine to the cell. Therefore, the epitopes must be non-homologous to the human proteome and the homology test of the epitopes was performed by the protein BLAST (blastP) module of the BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) online tool. BLAST scans the target genome or proteome to find any similarity between the query sequence and target sequence. Usually, homology to human proteome has the potential to reduce the effectiveness of the epitopes to induce immune responses. Furthermore, homologous vaccine might induce inflammation and autoimmune disorders . For running the homology test with the online tool, all parameters were set to default values, and for comparison of homology, Homo sapiens (taxid:9606) was used. A cut-off value of 0.05 was set to screen peptides having no hits below the e-value threshold as non-homologous peptides (Mehla & Ramana, 2016).

Cluster analysis of the MHC alleles
To acknowledge the alleles of the MHC class-I and class-II molecules with indistinguishable specificities that exhibit the correlation of the clusters of the alleles phylogenetically, cluster analysis of the analyzed MHC alleles from the IEDB server was done by utilizing MHCcluster 2.0 (http://www.cbs.dtu.dk/services/MHCcluster/) (Thomsen et al., 2013). During the experiment, 50,000 peptides and 100 bootstrap calculations were maintained, and all the HLA super-type representatives (MHC class-I) and HLA-DR representatives (MHC class-II) were chosen. Outcomes generated by the server are represented as MHC specificity tree and MHC specificity heat-map.

Vaccine construction
Based on the previously mentioned selection criteria such as antigenicity, non-allergenicity, non-toxicity, and so on, the most promising epitopes were selected to construct a polyvalent vaccine against HCMV, which mainly included CTL, HTL, and LBL epitopes. The epitopes were conjugated using an adjuvant, PADRE sequence, and different linkers for constructing the vaccine. Adjuvant, a major component for vaccine construction, is responsible for intensifying antigenicity, immunogenicity, stability, and longevity of the vaccine (Lee & Nguyen, 2015;Meza et al., 2017). The selected adjuvant sequence, human-beta-defensin-3, prompts the stimulation of the toll-like receptors (TLRs): 1, 2, and 4 after its administration inside the human body (Solanki & Tiwari, 2018). After the adjuvants are joined to the epitopes by the EAAAK linkers, the PADRE sequence was also associated with the adjuvant and the epitopes. By boosting the capacity of the CTL epitopes, the PADRE sequence improves the immunogenicity of the vaccine (Wu et al., 2010;Pandey et al., 2016). Subsequently, the AAY, GPGPG, and KK linkers were utilized for vaccine construction, where the epitopes CTL, HTL, and LBL were linked in a continuous sequence (Solanki & Tiwari, 2018;Ullah et al., 2020). For a bifunctional fusion protein, the EAAAK linkers add a successful separation of the protein domains (Arai et al., 2001). Since the AAY linker conjugates the epitopes with high specificity, it is often used in in silico vaccine design studies. The GPGPG linkers has the ability to block junctional epitope generation promote immune processing and presentation (Saadi et al., 2017). Additionally, the independent immunological activities of the epitopes of a vaccine are maintained by the KK linkers, also known as the bi-lysin linkers (Gu et al., 2017). Hence, for the vaccine construction, these linkers were favored to conjugate the epitopes. The schematic representation of the designed vaccine is shown in Figure 2 as a diagram.

Antigenicity, allergenicity, and physicochemical property analyses
It is an important factor for the vaccine to be highly antigenic and at the same time to be safe and effective. The online tool VaxiJen v2.0 (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.htm) was used again to determine the antigenicity of the constructed vaccine (Doytchinova & Flower, 2007). Simultaneously, the AlgPred (http://crdd.osdd. net/raghava/algpred/) and AllerTop v2.0 (https://www.ddgpharmfac.net/AllerTOP/) servers were employed to run the allergenicity of the vaccine construct. AlgPred (http://crdd. osdd.net/raghava/algpred/) predicts the allergenicity of a protein based on the similarity of the known epitope of any of the known regions of a protein (Saha & Raghava, 2006;Dimitrov et al., 2013Dimitrov et al., , 2014. In the AlgPred server, the Multiple Expression motifs for Motif Elicitation (MEME)/Motif Alignment & Search Tool (MAST) prediction approach (MEME/MAST) was used to determine the vaccine's allergenicity. Afterward, another online tool, ProtParam (https://web. expasy.org/protparam/), regulated multiple other physicochemical properties of the vaccines, such as pI, half-life, GRAVY value, etc. (Gasteiger et al., 2005). The Protein-Sol server (https://protein-sol.manchester.ac.uk/) was used to predict the solubility of the vaccine construct, in addition to physicochemical property analysis. The server can accurately assess the solubility of a query protein and determines the results using a fast sequence-based method (Hebditch et al., 2017). All of the servers' parameters were left at their default values during the solubility analysis. 2.10. Secondary structure prediction of the vaccine construct Following the vaccine's physicochemical property analysis, it was subjected to secondary structure estimation using the online server, PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/) (PSIPRED 4.0 prediction method). The PSIPRED tool can generate the secondary structures of the proteins as well as make a prediction of the transmembrane topology, fold and domain recognition, and transmembrane helix (Jones, 1999;Buchan & Jones, 2019). PSIPRED 4.0 employs a deep neural network architecture to carry out predictions and has an 84.2% Q3 secondary structure prediction accuracy (Buchan & Jones, 2019). Keeping all values of this server default, the secondary structure of the vaccine construct was determined. After that, several other online resources, such as GOR IV (https://npsa-prabi.ibcp.fr/NPSA/npsa_gor4.html), SOPMA (https://npsa-prabi.ibcp.fr/NPSA/npsa_sopma.html), and SIMPA96 (https://npsa-prabi.ibcp.fr/NPSA/npsa_simpa96.html), were used to run the prediction with all parameters set to default (Jones, 1999;Buchan & Jones, 2019). Furthermore, to determine the solvent accessibility (ACC) and disorder areas, the RaptorX Property web server (http://raptorx.uchicago.edu/ StructurePropertyPred/predict/) (DISO) was used. RaptorX predicts secondary structure (SS), solvent accessibility (ACC), and disorder regions (DISO) concurrently using an emerging machine learning algorithm known as DeepCNF (Deep Convolutional Neural Fields) (Wang et al., 2016). RaptorX is comparatively superior to other servers and can obtain up to 66% Q3 accuracy for 3-state solvent accessibility and 0.89 area under the receiver operating characteristic (ROC) the curve for disorder region prediction (Wang et al., 2016).

Protein 3D structure prediction, refinement, and validation
We generated the tertiary or 3D structure of the constructed vaccine with RaptorX (http://raptorx.uchicago.edu/) server. The server predicts the tertiary structure of the proteins using the widely used template-based method Wang et al., 2016). Following the prediction of the 3D structure, the refinement process is normally conducted for converting the structures with low resolution to native, high resolution containing structure. The GalaxyRefine module of the GalaxyWEB server (http://galaxy.seoklab.org/) was used for the protein structure refinement process (Ko et al., 2012;Nugent et al., 2014). GalaxyWEB is a web server for protein structure prediction and refinement that offers precise refining by specifying the portions of the protein that need to be refined (Ko et al., 2012). The vaccine construct, thereafter, was validated using the PROCHECK (https://servicesn.mbi. ucla.edu/PROCHECK/) server to generate the Ramachandran plot for analysis (Morris et al., 1992;Laskowski et al., 2006). PROCHECK generates stereochemical parameter values for a given protein model and compares them to 'ideal' values taken from high-resolution protein structures in the Protein Data Bank (PDB). It can assess the quality of any protein model whether produced by crystallography or NMR or constructed by homology modelling (Laskowski et al., 2006).
Thereafter, the z-score, which designates the quality of the query protein structure, was generated using the ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) server. A query protein with a z-score that falls within the range of the z-scores of all the experimentally defined protein chains in the PDB database, is considered to have a higher quality (Wiederstein & Sippl, 2007).

Vaccine protein disulfide engineering
There are a few sites within a protein structure that are more prone to undergo a disulfide bond formation. For predicting these sites and design the disulfide bonds within the selected vaccine proteins, the Disulfide by Design 2 v12.2 (http://cptweb.cpt.wayne.edu/DbD2/) online tool was used. Disulfide by Design 2.0 is an independent online tool that outperforms the original program, Disulfide by Design in terms of functionality, visualization, and analysis capabilities. The software can assess the B-factor of regions involved in disulfide bonds as well as can identify possible disulfides that are unlikely to form and identify potential disulfide bonds that improve the protein's thermal stability (Craig & Dombkowski, 2013). The formation of proper disulfide bonds within the vaccine construct is crucial not only for proper protein folding but also for the molecule's stability. Disulfide bonds causes the conformational entropy to fall and free energy of the denatured state to rise, thus boosting its stability (Craig & Dombkowski, 2013). In some cases, disulfide bonds allows vaccine candidates to assume their native conformation that is essential for driving the proper antibody response (Lee et al., 2018). It is reported that almost 90% of native disulfide bonds have an energy value less than 2.2 kcal/mol (Craig & Dombkowski, 2013). Therefore, the selected amino acids for the test had bond energy of less than 2.2 kcal/mol (threshold bond energy for disulfide bonds) while the v3 angle and Ca-Cb-Sc angle were both kept at 10 (Peterson et al., 1999;Craig & Dombkowski, 2013).

Protein-protein docking study
Molecular docking studies of the vaccine construct were performed to analyze the effectiveness of the designed vaccine construct. The vaccine proteins were docked against multiple Toll-like Receptors (TLRs) in the first step as one of the major roles of TLRs is to mediate the immune responses of B-cells, T-cells, macrophages, dendritic cells, etc. The TLRs used to dock against the proteins are namely: TLR1 (PDB ID: 6NIH), TLR2 (PDB ID: 3A7C), TLR3 (PDB ID: 2A0Z), TLR4 (PDB ID: 4G8A), TLR8 (PDB ID: 3W3M), and TLR9 (PDB ID: 4QDH). The vaccine proteins underwent experimentation through four highly recommended online servers and the lowest energy value was taken as the criteria of a better vaccine construct.
In the first stage, the molecular docking was conducted with ClusPro 2.0 (https://cluspro.bu.edu/login.php) server, keeping all default values fixed. This widely used protein-protein docking tool has been the best automated server in the Critical Assessment of Predicted Interactions (CAPRI) challenge and its performance has been improved over time . This server provides a rank-wise arrangement of the clusters of docked complexes based on their lowest energy scores (Kozakov et al., 2013Vajda et al., 2017). For calculating the energy score, the server uses the following equation: Kozakov et al., 2013).
Here, E rep and E attr designate the repulsions and attraction energies due to van der Waals interactions while E elec denotes the electrostatic energy term. E DARS represents the pairwise structure-based potential constructed by the Decoys as the Reference State (DARS) approach.
The docking was then conducted again using the PatchDock server (https://bioinfo3d.cs.tau.ac.il/PatchDock/php.php) and the findings were optimized using the FireDock server (http://bioin-fo3d.cs.tau.ac.il/FireDock/php.php) (Andrusier et al., 2007). Using complex algorithm, the PatchDock server analyzes the Connolly dot surface representations and root mean square deviation (RMSD) clustering scores of candidate compounds using complex algorithms. This server also arranges the docked complex clusters in a rank-wise manner according to their global energy scores and allows users to perform rigid docking that generates the near-native result (Schneidman-Duhovny et al., 2005). The FireDock web server allows scoring of protein-protein docking solutions with flexible and high-throughput refinement and has proven to be successful in refining and selecting near-native solution from thousands of candidates (Mashiach et al., 2008). Finally, the Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) study was conducted with the HawkDock server (http://cadd.zju.edu.cn/hawkdock/) while all the parameters were at default values (Hou et al., 2011;Feng et al., 2017;Weng et al., 2019). The HawkDock server employs the ATTRACT docking algorithm, the HawkRank scoring function and the MM/GBSA free energy decomposition analysis and can predict key residues in protein-protein interactions in the top 10 residues for $81.4% predicted models and $95.4% crystal structures (Weng et al., 2019) (Biovia et al., 2000).

Protein-protein interaction analysis
To predict the binding affinity of the protein-protein complexes formed between the designed vaccine molecule and the TLRs, the complexes were uploaded on PROtein binDIng enerGY prediction (PRODIGY) (Vangone & Bonvin, 2015;Xue et al., 2016) webserver that utilizes the information derived from the molecular contacts and non-interface surface to calculate the binding affinity (delG) and the dissociation constant (kd) between the protein-protein complexes. The physiological reference temperature of 37 C was used for the calculation. The prediction made by the server are shown to be accurate with the value of Pearson's correlation coefficient between the predicted and measured binding affinity on the benchmark (P-value <0.0001) to be 0.73 with the Root Mean Square Error (RMSE) of 1.89 kcal/mol.
The detailed analysis of the type and nature of interface residues that are interacting between different epitopes of the vaccine with their corresponding TLRs were determined by using PDBSum web server (Laskowski et al., 2018). The interaction plot is generated between the two interacting protein molecules using the HBPLUS program (McDonald & Thornton, 1994). The server predicts different types of interactions including H-bonds, disulphide bonds, salt bridges as well as non-bonded interactions.
In order to assess for the contribution of each inter-surface residue between the vaccine molecule and the TLRs, the complexes were subjected to computational alanine scanning analysis by uploading them on DrugScorePPI webserver (Kr€ uger & Gohlke, 2010). The chains for all the interacting partners in the vaccine-TLR complexes were mentioned and ran separately to get the hotspot residues of both the interacting proteins as this server calculates the hotspot residues for only one interacting chain/partner at a time. The server provides the result in the form of DDG, which is essentially the difference in the binding free energy between the wildtype and the alanine mutant protein calculated based on the knowledge-based weighted distance-dependent pair potentials generated for scoring protein-protein interactions. Along with the DDG, the server provides the degree of buriedness for each interface residues by mapping the surrounding of these residue sidechains considering the number of atoms within 4 Å to other nearby residues as well as the salt-bridges between Asp or Glu within 4 Å distance to Arg or Lys.

Conformational B-cell epitope prediction
B-cell epitopes play important roles as the modulators of humoral immunity by producing antibodies against antigens. The presence of effective conformational B-cell epitopes ensures stronger and long-lasting immunity to the human body. Therefore, the selected epitopes were investigated of these conformational B-cell epitopes with the IEDB ElliPro tool (http://tools.iedb.org/ellipro/) while keeping default parameters (Ponomarenko et al., 2008).

Molecular dynamics simulation
The molecular dynamics (MD) simulation study was carried out using the vaccine-TLR docked complexes. MD simulations were performed with the GROMACS 2020.3 version simulation package (Abraham et al., 2015) AMBER99SB-ILDN force field (Lindorff- Larsen et al., 2010) was used with the TIP3P water model for each system. 0.15 M KCl was added to each system. To relax the structure and avoid steric clashes in further simulations, potential energy minimization was performed with the step size 1 fs to the maximum force of 1000.0 kJ/mol/nm. Pressure and temperature of the system were equilibrated to 1 atm and 310 K by running NVT and NPT simulations (100-ps each), respectively. The system's pressure and temperature were controlled with a modified Berendsen thermostat (Berendsen et al., 1984) and Parinello-Rahman barostat (Parrinello & Rahman, 1982) with time constant tau_t ¼ 0.1 ps and tau_p ¼ 2 ps, respectively. Productive 200 ns MD simulations were carried out in the isothermal-isobaric ensemble with 2 fs time-step for all six systems and the Root-mean-square deviation of atomic positions (RMSD), RMS fluctuations (RMSF) and Radius of Gyration (RG) calculations were performed. LINCS (Hess et al., 1997) algorithm was used to constrain the bonds involving hydrogen atoms. Long-range electrostatic interactions were calculated using the Particle-Mesh Ewald summation scheme (Darden et al., 1993).
The essential dynamics analyses of the trajectories were done using geo_measures (v.0.9) PyMol plugin (Kagami et al., 2020). The Free Energy Landscape (FEL), Principal Component Analysis (PCA), Probability Density Function (PDF) and mode vector (porcupine plots) analysis were done over the 500 equally spaced frames extracted from 200ns trajectories to assess the different conformations and motions mapped by the complexes.

Immune simulation
The immune simulation of the vaccine was conducted with C-IMMSIMM server (https://kraken.iac.rm.cnr.it/C-IMMSIM/ index.php) server. The server is one of the most accurate servers for immune simulation study and the prediction of adaptive immunity follows machine learning methods, similar to predicting epitopes-specific targets immune interactions (Luckheeram et al., 2012). Along with the machine learning methods, the position-specific scoring matrix (PSSM) technique is also applied (Rapin et al., 2010). The number of simulation steps of the C-IMMSIMM server (https://kraken.iac. rm.cnr.it/C-IMMSIM/index.php) was set at 1050 and the time steps were changed to 1, 84, and 170. In our study, the interval between vaccine administration was set to four weeks as the ideal interval between dosages for commercial vaccines was found to be four weeks (Castiglione et al., 2012).

Codon adaptation, in silico cloning, and analysis of the vaccine mRNA structure
An organism's cellular structure can vary drastically from that of another, and therefore, an amino acid can be encoded by several codons in various species (codon bias). As a result, codon adaptation is used to determine the best codon for efficiently encoding a single amino acid in a given organism. The vaccine protein was reverse translated to a potential DNA sequence that would encode the amino acids of the engineered vaccine protein in the codon adaptation analysis (Castiglione et al., 2012;Khatoon et al., 2017). The constructed vaccine's codon adaptation analysis was conducted using the JCat server (http:// www.jcat.de/) (Grote et al., 2005). The target organism which will be used to mass-produce the vaccine was chosen to be the prokaryotic E. coli strain K12 in the server, and rho-independent transcription terminators, prokaryotic ribosome binding sites, and restriction enzyme cleavage sites EaeI and StyI were all avoided during codon adaptation (Solanki & Tiwari, 2018). Lastly, the newly adapted DNA sequence was incorporated between the pETite vector's Eael and StyI restriction sites using the SnapGene restriction cloning software (https://www.snapgene.com/) (Solanki & Tiwari, 2018). The presence of sequences for small ubiquitin-like modifiers or the SUMO tag and 6X-His tags on the pETite vector plasmid means that the expressed vaccine protein will possess these tags, which assure increased solubilization and the affinity purification of the recombinant protein (Biswal et al., 2015). Furthermore, Mfold (http://unafold.rna.albany.edu/?q=mfold) and RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/ RNAfold.cgi) servers were used to predict the mRNA secondary structure of the designed vaccine with their minimum free energy (DG Kcal/mol). The lower the minimum free energy is, the more stable is the mRNA structure (Mathews et al., 1999(Mathews et al., , 2007Zuker, 2003;Gruber et al., 2008). From the JCat server, the mRNA secondary structure of the vaccine was predicted by first converting the optimized DNA sequence to RNA sequence. This was done by the DNA<->RNA->Protein tool (http://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm). The RNA sequence was then collected from the tool and used in the Mfold and RNAfold servers for prediction while keeping all the parameters default.

Identification and retrieval of the target proteins
The UniProt accession numbers of the two chosen protein sequences (i.e. gB and gH) that were collected from the AD169, Merlin, Toledo, and Towne strains of HCMV from the NCBI database, are addressed in Table 1.

Antigenicity and determination of physicochemical properties of the target proteins
Among the query proteins, all eight proteins were found to be antigenic and afterward, were run through various physiochemical testing. All proteins showed a pI value below 7 except forthe glycoprotein gB of the Toledo strain (pI 7.25), indicating that most proteins reach the isoelectric point in an acidic environment. All the proteins possessed an estimated half-life (in mammalian reticulocytes) of 30 hours, which is quite a satisfactory duration. All the proteins were predicted to be stable as proved by their instability indexes (II) below 40 and the most stable protein was the glycoprotein gB of the strain Towne (II: 35.45). All proteins had a very high aliphatic index (AI) of >80, signifying a more thermostable state. The AI of a protein is a measurement of the relative number of amino acids in the protein's side chain as occupied by aliphatic amino acids such as leucine, proline, and valine (Ikai, 1980). The GRAVY value is used to decide whether a compound is hydrophilic or hydrophobic. The negative GRAVY value denotes a hydrophilic property, implying that compounds with a negative GRAVY value    are water-soluble, and a positive GRAVY value represents hydrophobicity or the hydrophobic nature of a compound. (Chang & Yang, 2013;Kyte & Doolittle, 1982;Panda & Chandra, 2012). In our analysis, six of the eight proteins were found to be hydrophilic. The Supplementary Table S1 tabulates quite exquisite results for further inquiry about the proteins.

T-cell and B-cell epitope prediction
The efficacy of the selected antigenic proteins and vaccines depends on how well the immune system recognizes them and generates the required and sufficient response. Epitopes are the part of an antigen molecule with which an antibody can attach itself. For the proper effectiveness of the vaccine, CTL, HTL, and LBL epitopes must be present within the vaccine construct (Zhang, 2018). The T-cell and B-cell epitopes were determined for constructing the subunit vaccine. The cell-mediated immune response provides robust immunity against different types of pathogens, which lasts for a lifetime (due to pathogen-specific memory). The cell-mediated immunity mediates its effects by the secretion of antiviral cytokines, especially through identification, followed by the destruction of the infected cells (Cano & Lopera, 2013;Garcia et al., 1999). The antigens are recognized by the cytotoxic T-cells, while helper Tcells aid in activation of the cytotoxic T-cells, B-cells, as well as macrophages (Chaudhri et al., 2009;Zhu & Paul, 2008). B-cells additionally moderate the less significant and less vigorous humoral immune response (compared to cell mediated immune response) through the production of antibodies (Bacchetta et al., 2005;Cooper & Nemerow, 1984). A total of 62 CTL, HTL, and LBL epitopes were initially selected for further analyses according to their percentile rank (lower the rank higher the significance) and corresponding scores (Elamin Elhasan et al., 2021). Supplementary Table S2 lists the potential T-cell and B-cell epitopes of gB protein from all the selected strains and Supplementary Table S3 lists the potential T-cell and B-cell epitopes of gH protein from the selected strains, with their respective antigenicity, allergenicity, toxicity, cytokine inducing ability (for HTL epitopes) topology, and homology to the human proteome.

Antigenicity, allergenicity, toxicity, conservancy, and human homology analyses
In the antigenicity, allergenicity, and toxicity analyses, screening of the highly antigenic, non-allergenic, and non-toxic epitopes was performed. Such selection was in resonance with high antigenic response simulation within the body, preventing allergic reactions and maintaining cellular safety from toxic elements. The HTL epitopes were mostly antigenic as well as non-allergenic. The epitopes enlisted within the table are the most promising epitopes conforming to all three criteria of this test. Following that, in the conservancy and human homology test the selected epitopes emerged as highly conserved among the selected strains (Supplementary Figure S1, S2) and non-homologous to the human proteome . Therefore, these epitopes should provide immunity to all the target viral strains and the non-homology of the epitopes ensured that these epitopes would not be able to cause any autoimmune diseases within the body.

Cluster analysis of the MHC alleles
The online tool MHCcluster 2.0 was used to perform a cluster analysis of the possible MHC class-I and MHC class-II alleles that may interfere with the predicted epitopes of the selected proteins. Phylogenetically, the method shows the relationship between the groups of alleles. The results of the experiment are illustrated in Supplementary Figure S3, with the red zone suggesting a strong interaction and the yellow zone indicating a weaker interaction. All the analyzed MHC alleles showed satisfactory relationship among themselves.

Vaccine construction
The vaccine was designed with the most promising epitopes that could be used to provide immunity to the selected viral strains successfully with a few selection criteria, i.e. high antigenicity, non-toxicity, non-allergenicity, high conservancy across the selected strains, and non- homology to the human proteome. 28 of the 62 epitopes were finally considered for the vaccine construction according to the mentioned criteria (Supplementary Table  S4). The human-beta-defensin-3, an adjuvant, and the PADRE sequence were conjugated along with the chosen epitopes for the vaccine construction. The linkers, i.e. EAAAK, AAY, GPGPG, and KK were utilized at particular sites, which are proved to be beneficial for the vaccine construction and its increased productivity. Finally, the constructed vaccine was designated as the 'HCMV' vaccine (Supplementary Table S5).

Antigenicity, allergenicity, and physicochemical property analyses
The results of the antigenicity and allergenicity test of the HCMV vaccine were exceptionally satisfactory. Due to being highly antigenic and non-allergenic, the vaccine was predicted to generate the desired immune response in human body, without provoking any allergenic reactions. A further prediction of the additional physicochemical properties of the HCMV vaccine confirmed its capacities. The vaccine demonstrated an isoelectric point (pI) value of 9.40, suggesting the vaccine to be basic. Moreover, the stability of the vaccine was ensured by an II value of 37.58, which is less than 40, proving its stability (Guruprasad et al., 1990). Again, the high AI value (79.98) of the vaccine states that it should remain stable within the body temperature (Chaudhri et al., 2009). The negative GRAVY value (-0.336) indicated that the vaccine might be be hydrophilic. The vaccine also demonstrated, in mammalian reticulocytes, that it has a half-life of 1 h and >10 h in E. coli cell culture system, which means that mass production and purification of the vaccines should run efficiently in the E. coli cell culture system. Furthermore, since the higher solubility of a protein indicates improved purification during downstream processing, solubility is a major factor in post-production vaccine studies. The HCMV vaccine showed satisfactory results to be soluble upon over-expression in E. coli cell culture system. This is suggestive of an easy and efficient post-production downstream processing (Magnan et al., 2009). All the results found in (Table 2) point towards the suitability of the predicted vaccine as an effective preventive measure against HCMV.

Secondary structure prediction of the vaccine construct
From the secondary structure prediction of the vaccine constructs, it was found that the protein mainly consists of coil structure (39.97% as observed in PSIPRED, 41.60% as observed in GOR IV, 31.16% as observed in SOPMA, and 48.86% as observed in SIMPA97). After the coil structure, the alpha structure contained the second majority of amino acids (PSIPRED ¼ 35.39%, GOR IV ¼ 36.54%, SOPMA ¼ 41.44%, SIMPA96 ¼ 33.88%), and the b-strand contained the least amino acid percentage (PSIPRED ¼ 24.63%, GOR IV ¼ 21.86%, SOPMA ¼ 18.11%, SIMPA96 ¼ 17.10%) (Supplementary Figure S4 and Supplementary Table S6). RaptorX Property server predicted solvent accessibility (ACC) and disorder regions (DISO) (Supplementary Figure S5) (Wang et al., 2016;Young et al., 2011). We predicted 40% of the 613 amino acid residues in our final vaccine to be properly exposed, 27% to be moderately exposed, and 32% to be buried. In Supplementary Figure S5, the peptides labeled with black boxes are B-cell epitopes with strong surface accessibility. In disordered regions, a total of 24 residues (3%) were found to reside.
3.9. Protein 3D structure prediction, refinement, and validation The 3D structure of the HCMV vaccine was generated by the RaptorX server and then refined in the Galaxy server. The refined tertiary structure of the vaccine is illustrated in Supplementary Figure S6. Following that, validation process of the refined structure was carried out in the PROCHECK as well as the ProSA-web servers. The percentage of amino acid residues in the most favored regions of the HCMV vaccine protein was found to be 86.4%, as shown from Ramachandran plots. Additionally, allowed regions contained about 11.2% amino acids, while generously allowed regions and disallowed regions contained 1.1% and 1.3%, respectively. Moreover, the z-score of the vaccine protein structure was À3.47, which, as it is known, is a pretty good score (low scores denote better quality of the query protein). Therefore, the Ramachandran plots and z-score analysis, after refinement of the vaccine protein structure , presented satisfactory results regarding the HCMV vaccine, including a high percentage of amino acids in the most favored region, along with a low z-score of À3.47 (Supplementary Figure S7).

Vaccine protein disulfide engineering
Only amino acid pairs with a bond energy value of less than 2.2 kcal/mol were chosen for protein disulfide engineering.  Figure S8). The vaccine protein might be stable enough to generate a good immune response with these disulfide bond forming six amino acid pairs.

Protein-protein docking study
The vaccine was subjected to a protein-protein docking study to evaluate its capability to interact with TLRs efficiently during an immune response. When docked by ClusPro 2.0, the HCMV vaccine revealed a high binding affinity. It was also docked with two other servers for improved prediction, namely PatchDock and HawkDock. In the ClusterPro analysis, the vaccine protein-TLR1 docked complex generated the lowest score (À1201.8 kcal/mol) among the other complexes. The PatchDock server also represented the high binding affinity of vaccine protein, with TLR8 having a score of À19.22 kcal/mol. The MM-GBSA study by the HawkDock server predicted the vaccine protein's higher binding affinity with TLR4 (-91.25 kcal/mol) (Table 3). Finally, the interaction of the vaccine protein-TLRs docked complexes were subjected to visualization (Figure 3).

Protein vaccine-TLRs interaction analysis and hotspot identification
The vaccine protein-TLRs complexes were stabilized by different kind of interactions between them including H-bonds, disulphide bonds, salt bridges and hydrophobic/non-bonded contacts. As evident from the figures (Supplementary Figure  S9-S14), the major type of interactions between vaccine protein and TLRs were non-bonded contacts in all the complexes followed by multiple hydrogen bonds and few salt bridges as in the case of TLR1, 2, 8, and 9. The binding affinity of the vaccine complex with the TLRs ranged from À6.3 to À12.5 kcal/ mol with the highest affinity of the vaccine for TLR1, TLR8, TLR2, and TLR9 followed by TLR4 and TLR3 (Supplementary  Table S7 and Table 3). The PRODIGY server predicted the binding affinity (in Kd (M) at 37.0 C) of the vaccine complex with the TLRs ranging between 8.4 E-09 and 3.8 E-05. The hot-spot residues critical for the interaction between the vaccine protein and TLRs were identified by DrugScorePPI webserver. The server identified the hot-spot residues by performing the alanine scanning of the interface residues, thereby providing the DDG energy values between the wild-type to mutant residue in protein and it also provides the degree of buriedness of the residue chains at the interface of interacting partners. Higher the value of DDG and degree of buriedness more is the probability of finding the hotspot residue. The identified hotspot residues in all the complexes (Supplementary Figure S15-S20) and their corresponding residue interaction data (Supplementary Figure S21-S26) are well in agreement i.e. the vaccine residues identified as hot-spot residues by DrugScorePPI server are the residues which are essentially interacting with the TLRs and mutating them to alanine gave substantial difference in DDG up to 1.76 kcal/mol.

Conformational B-cell epitope prediction
The CMV vaccine predicted six potential regions of conformational B-cell epitopes, with scores ranging from 0.582 to 0.785. These scores can be considered as satisfactory scores as higher scores of regions represent better probability of acting as conformational B-cell epitopes. These six regions covered a total of 339 amino acids, as listed in  Table S8 . Therefore, these six regions of the vaccine might act as B-cell epitopes and thus would be effective in generating potential antibodies (Figure 4).

Molecular dynamics simulation
The MD simulation was carried out for 200 ns simulations to find out the probable stability of our designed vaccine structure. During the MD simulation, the vaccine-TLR1, vaccine-TLR2, vaccine-TLR3, vaccine-TLR4, vaccine-TLR8, and vaccine-TLR9 complexes were designated as CMv1, CMv2, CMv3, CMv4, CMv5, and CMv6, respectively. Since the periodic boundary conditions were met during the simulation, the re-centering of protein molecules in the complexes and their return to the simulation cell was performed. Thereafter, an initial analysis of the trajectories, including the calculation of RMSD (Figure 5a), RMSF (Figure 5b), and RG (Figure 5c) was conducted. The RMSD changes are depicted in Figure  5a, from which it can be seen the RMSD stabilizes around 0.8 nm for CMv1 and CMv2, around 0.7 nm for CMv3, and between 0.5 and 0.6 nm for CMv4, CMv5, and CMv6. When the size of the simulated systems is considered, these calculations represent the stable vaccine-TLR docked complexes. The RMSF values for the C-alpha atoms of the complexes are shown in Figure 5b. The graph shows that the mobility of the TLR atoms in every case is significantly lower than the mobility of the vaccine atoms. And Figure 5c shows that the RG values of the studied complexes tend to remain relatively stable for each complex, which indicates that the compactness of complexes does not change significantly during simulation. All these satisfactory findings suggested that our designed HCMV vaccine construct might remain quite stable within the biological environment and thus would be able to generate a substantially effective immune response.

Essential dynamics analysis of the vaccine protein-TLR complexes
The molecular dynamic simulations provide the atomistic details of movement of atoms with respect to the time scale of the simulation. The complex macromolecules with different domains or interacting partners are dynamic in nature and adapts to different conformational changes in them to acquire the energetically most favorable state. Such kind of motions and energy state are studied by the essential dynamics which extract the principal components and the dimensional subspace spanned by the macromolecule to acquire the stable state. FEL represents the energy minima which contains the most native like structural states at their corresponding energy level while the PDF denotes the probability of finding that native state in the conformational subspace spanned by the molecule. The mode vectors denote the direction of principal motions compassed by the molecule over the simulation time. The essential dynamics analysis of the vaccine protein-TLRs complexes were done to look for the different conformational changes adapted by the interacting proteins after their complex formation over the simulation period. As evident from the figures ( Figure  6-11), the vaccine-TLR4 showed a single energy minima which contains an ensemble of the most stable states while vaccine-TLR9 showed 2 energy minimas, vaccine-TLR1 and TLR3 showed three energy minimas. However, in the case of vaccine-TLR2 and TLR8, multiple energy minimas were observed indicative of the different conformational movements acquired by the loop and linker regions of the vaccine protein in complex with the TLRs (Figures 7, 10). Moreover all the systems were converged to the stable states as evident from the FEL plots with the lowest energy basins. All the complexes spanned diverse conformational subspaces as evident from the PCA analysis and the movements of the TLRs and vaccine molecules were towards each other to form a tighter complex as observed in the mode vector/porcupine plots. In conclusion, all the vaccine protein-TLRs complexes acquired the stable conformations over the simulation time scale.

Immune simulation
Robust primary immune response was found to be generated along with secondary immune response by the HCMV Figure 11. The metadynamic analysis of TLR9-vaccine complex trajectory. (a) Gibbs free energy plot for TLR9-vaccine trajectory using Rg by RMSD. The blue region denotes the energetically stable conformations while the red regions denote the energetically less stable conformations. (b) Probability density function for TLR9vaccine trajectory using Rg by RMSD. The dark red area corresponds to the most populated populations while the blue region represents the least populated. (c) Projection of the TLR9-vaccine complex trajectory using the first and second principal components (PC1 and PC2). (d)TLR9-vaccine complex transition movement porcupine plot. The length of the arrows denotes the extent and direction of the movement in the protein structure. vaccine. The three dosages showed a natural response against a typical vaccine. To further confirm the potential of the vaccine, the immune simulation results predicted increased levels of active B cells (Figure 12b, c), plasma B-cell (Figure 12d), helper T-cell (Figure 12e, f), and cytotoxic T-cell (Figure 12g-h) along with gradually increased levels of different immunoglobulins (Figure 12a). Concurrently, the vaccine stimulates helper T-cells pointing towards better adaptive immunity (Almofti et al., 2018;Carvalho et al., 2002). Along with that, the results suggest better immune memory generation, higher antigen clearance rate, increased dendritic cells and macrophages that point towards an outstanding antigen presentation by the APCs (Antigen Presenting Cells) ( Figure  12i, j). IFN-gamma, IL-23, IL-10, and IFNbeta, a range of diverse cytokines that generate an immune response and defending the body against viruses, were predicted to be produced by the vaccine (Hoque et al., 2019;Kambayashi & Laufer, 2014;Shey et al., 2019) (Figure 12k). An insignificant amount of Simpson's Index (D) indicating a more diversified vaccine effect was presented in the experimentations (Kaminski et al., 2001). Such promising attributes, including the elevation of immunoglobulin, cytokines, APCs, active Bcells, and T-cells indicate highly promising and revolutionary immunogenic responses generated by the HCMV vaccine.
3.17. Codon adaptation, in silico cloning, and analysis of the vaccine mRNA structure The vaccine proteins are estimated to be effectively produced by the E. coli strain K12 with the presence of the most likely codons (Mathews et al., 1999;Zuker, 2003). The CAI value of the HCMV was found 0.94, indicating the presence of the most suitable codons for use in cellular machinery of the E. coli strain K12 within the DNA sequences, as a value above 0.80 is considered good potential (Supplementary Figure S27). While the GC content between 30% À 70% is regarded as a good score, the codon adaptation test for HCMV presented a GC content of 51.22%, indicating the improved sequence of the vaccine protein. After the codon adaptation, the predicted DNA sequence was incorporated into the pETite vector plasmid between the restriction sites EaeI and StyI. During the downstream process, it was expected that the vaccine protein would be smoothly purified due to the presence of SUMO and 6X His tag sequences in the plasmid. Such cloned vector plasmid was designated afterward as 'CMV' (Figure 13). As predicted respectively by the RNAfold and Mfold servers, the maximum and minimal free energies, are À547.97 kcal/mol and À545.00 kcal/mol, respectively. The HCMV vaccine can be established stable in terms of in vivo expression due to the lower maximum and minimal free energies (Hamasaki-Katagiri et al., 2017). The mRNA secondary structure of the vaccine predicted by the RNA fold server is illustrated in Supplementary Figure S28.
In a nutshell, the vaccine designed in this study might be an effective option to combat HCMV infections worldwide. However, further experiments using various protein expression systems such as human embryonic kidney and Chinese hamster ovary cell lines should be carried out to validate the computationally generated results of this research.

Conclusion
HCMV is one of the most transmitted viruses and it is the leading cause of congenital viral infection, brain damage, and accountable for several neurodevelopmental abnormalities. Researchers have been attempting to develop the best vaccine for half a century and while some have shown promising outcomes, none have yet received approval. This study utilizes immunoinformatic applications and in silico biology Figure 13. The constructed recombinant pETite plasmid designed for mass production of the HCMV vaccine. Here, the HCMV vaccine insert has been marked in red color.
to construct a suitable and effective polyvalent vaccine against the four different strains (i.e. AD169, Merlin, Toledo, and Towne) of HCMV. After a thorough and rigorous analysis, the most promising epitopes for final vaccine construction were chosen based on the selection criteria of high antigenicity, non-allergenicity, non-toxicity, conservancy, cytokine inducing capacity (for HTL epitopes), and non-homology (to the human proteome). The high conservancy of T-cell and Bcell epitopes studied suggests that the vaccine would be effective against all the selected strains of the virus. Moreover, the allergenicity and non-toxicity studies imply that the vaccine would be able to elicit an excellent immunogenic response without causing toxicity. Several in silico validations performed in the study also indicated that the vaccine would be safe, effective, and responsive to the human body. Due to the high costs and limitations of developing a live, attenuated, or inactivated vaccine for such a highly infectious agent, peptide-based vaccine candidates, such as the one developed in this study, is a more costeffective and efficient way of designing potential blueprints of vaccines. If wet laboratory-based studies produce satisfactory findings, then this epitope-based vaccine candidate could give a relatively inexpensive and effective approach for combating HCMV infections worldwide.