In silico analysis of glycosylation pattern in 5th-6th repeat sequence of reelin glycoprotein

Abstract Reelin is an extracellular matrix glycoprotein that plays a key role in cortical development, maturation, synaptic plasticity, and memory formation in the adult mammalian brain. Glycosylation is a significant post- and co-translational modification of proteins. Although glycosylation shapes the characteristics of proteins from their production to their molecular interactions, knowledge about the glycosylation pattern of reelin is very limited. In this study, we aimed to predict the potential glycosylation pattern of the 5th-6th repeat of the central reelin fragment, which is responsible for reelin signaling, by using in silico methods. We found that the predicted glycosylation pattern of the 5th-6th repeat of human reelin was highly conserved among vertebrate species. However, this conservation was not observed in the analyzed invertebrates. For the first time, we described the glycosylation sites on a three-dimensional protein structure of human reelin. Because the sites were very close to EGF-like repeats and receptor-binding sites, they could contribute to the interaction of reelin with its partners, in addition to their effect on the thermostability of the protein. Many of the residues related to glycosylation were also conserved in the analyzed species. These findings may guide further biochemical, genetic, and glycobiology-based experiments on reelin glycosylation. Understanding reelin glycosylation might change the view of treatment for many pathological conditions in neurodegenerative diseases such as Alzheimer's disease. Communicated by Ramaswamy H. Sarma


Introduction
Glycosylation is one of the most widespread co- and post-translational modifications of proteins (Schauer, 2004), and it affects protein folding, localization and trafficking, solubility, antigenicity, biological activity, and half-life, as well as molecular interactions (Hounsell et al., 1996; Varki, 1993). The most common mechanisms by which glycans are linked to proteins are N-linked and O-linked glycosylation. The first, N-glycosylation, is the attachment of a precursor oligosaccharide chain, GlcNAc2-Man9-Glc3, from the dolichol-linked pyrophosphate donor to the amide side chain of an asparagine (Asn) within the consensus sequence Asn-Xaa-Ser/Thr (Xaa is any amino acid except Pro). The second, O-glycosylation, is the attachment of monosaccharide or glycan units to the hydroxyl side chain of serine (Ser) or threonine (Thr) residues (Taylor & Drickamer, 2011).
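The N-glycosylation consensus sequence described above can be illustrated with a short motif scan. The following is a minimal sketch in Python, assuming a plain one-letter amino acid string; the function name `find_sequons` and the toy peptide are hypothetical and are not part of this study, which relied on prediction servers rather than on a motif scan alone. A match marks only a candidate sequon and does not imply that the asparagine is actually glycosylated.

```python
import re

# Asn-Xaa-Ser/Thr, where Xaa is any residue except Pro.
# The lookahead lets overlapping sequons (e.g. in "NNTS") both be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence, first_residue_number=1):
    """Return (residue_number, motif) for each candidate N-glycosylation sequon."""
    sequence = sequence.upper()
    return [
        (match.start() + first_residue_number, match.group(1))
        for match in SEQUON.finditer(sequence)
    ]


toy_peptide = "MNGSANPTNNTS"  # hypothetical sequence, not reelin
print(find_sequons(toy_peptide))
```

The `first_residue_number` argument is there so that positions in an excised fragment can be reported in full-length numbering, as is done for the reelin repeats in this paper.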
Various types of O-glycosylation have been documented, such as O-GalNAc, β-linked O-GlcNAc, and α-linked O-GlcNAc (Blom et al., 2004; Steen et al., 1998). The mucin-type form (O-GalNAc) is the best-known form of O-glycosylation in mammals (Verma & Davidson, 1994). In contrast to N-glycosylation, there is no defined amino acid motif for the transfer of O-glycans to the protein backbone. Following the transfer of the glycan to the polypeptide chain, N- and O-linked glycans are processed by glycosidase and glycosyltransferase enzymes to build the mature glycan structure (Lehle & Bause, 1984). All of these complex glycosylation processes influence the biochemical, physicochemical, and structural characteristics of glycoproteins (Sinclair & Elliott, 2005).
Reelin (400 kDa), an extracellular glycoprotein, is secreted by Cajal-Retzius cells in the marginal zone of the developing cortex and by a subpopulation of GABAergic interneurons in the mature brain (Rice & Curran, 2001). It acts as a key regulator of several biological processes in laminar formation, including neuronal migration, cell aggregation, and dendrite formation (Sekine et al., 2014). From the N- to the C-terminus, reelin is composed of a signal peptide, an F-spondin-like domain, a unique region called 'H' that is subdivided into three subdomains termed X, Y, and Z, and eight tandem reelin repeats followed by a basic C-terminal domain (Figure 1A). Each of these repeats includes an epidermal growth factor (EGF)-like motif at its center, which links two sub-repeats (D'Arcangelo et al., 1995; Ichihara et al., 2001). In the reelin signaling cascade, a central fragment of reelin (RR3-6) induces layer formation in cortical slice culture, whereas the N- and C-terminal fragments appear to be inactive (Jossin et al., 2004, 2007). Physiologically, the 5th-6th repeats of reelin within the central fragment are responsible for the activation of tyrosine phosphorylation-dependent signal transduction (Yasui et al., 2007, 2010). In the signaling pathway, reelin binds to its receptors, very-low-density lipoprotein receptor (VLDLR) and apolipoprotein E receptor 2 (ApoER2) (D'Arcangelo et al., 1999; Hiesberger et al., 1999; Trommsdorff et al., 1999). Reelin then induces phosphorylation of the adaptor protein disabled-1 (Dab1) through the Src family tyrosine kinases (SFKs) Fyn and Src (Howell et al., 1997; Kuo et al., 2005). Phosphorylated Dab1, in turn, recruits several signaling molecules, including the Crk family adaptor proteins Crk and CrkL (Park & Curran, 2008).
Reelin contains a large amount of N-linked glycan and a small amount of O-linked glycan (D'Arcangelo et al., 1997). Reelin glycosylation is also known to be altered in the cerebrospinal fluid (CSF) and plasma in Alzheimer's disease (Botella-López et al., 2006). Glycosylation of reelin changes upon β-amyloid 1-42 (Aβ42) induction (Botella-López et al., 2010), and this change in glycosylation affects reelin signaling through its receptor interaction and increases Tau phosphorylation (Cuchillo-Ibáñez et al., 2013). In addition, recent data have shown that reelin glycosylation could be important in non-neuronal diseases and some types of cancer (Khialeeva & Carpenter, 2017).
Although reelin is known to be a glycoprotein, its glycosylation pattern, the conserved sequences of its active sites, and the bioinformatic comparison of reelin between species remain poorly understood. Because bioinformatics is a practical way to predict glycosylation patterns (Gupta & Brunak, 2001), in this study we aimed to identify the potential glycosylation sites and the conserved sequences of the active site in the 5th-6th repeats of reelin glycoprotein, and to compare the protein sequences of 23 species, including vertebrates and invertebrates, by using in silico methods that are commonly used and have high output accuracy.

Selection of sequences
To predict potential glycosylation sites of reelin glycoprotein, the amino acid sequence of Homo sapiens reelin was retrieved from the NCBI protein database, and a homology search was performed with protein BLAST (blastp) against the NCBI non-redundant protein database (Pruitt et al., 2007). Reelin protein sequences were selected from the database according to a high bit score and an E-value of zero (Table 1). CLUSTALW was used as the protein sequence alignment tool to perform multiple alignments (Larkin et al., 2007) of the selected FASTA sequences of reelin glycoprotein. Based on the multiple alignments, the conserved 5th-6th repeats of reelin were determined across these species.

Construction of phylogenetic tree with multiple sequence alignments
The phylogenetic tree data were obtained from CLUSTALW, based on the multiple sequence alignments, and were visualized with the phylogenetic tree (Newick) viewer of the Environment for Tree Exploration (ETE) toolkit, a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees together with multiple sequence alignments (Huerta-Cepas et al., 2016). In addition, the multiple sequence alignments of reelin glycoproteins were combined with the phylogenetic tree (Thompson et al., 1994). The supermatrix-based construction mode of ETE makes it possible to build and concatenate multiple sequence alignments with a phylogenetic tree of species.

Prediction of glycosylation sites
N-glycosylation sites were predicted with the NetNGlyc 1.0 server, which is based on artificial neural networks (Gupta & Brunak, 2001). The protein sequences of reelin were uploaded to the server in FASTA format. After the prediction had been calculated, sites with a score above the threshold value of 0.5 were considered possible glycosylation sites. In the same way, O-glycosylation sites were predicted with three servers, NetOGlyc 4.0, DictyOGlyc, and YinOYang, for GalNAc, α-GlcNAc, and β-GlcNAc linked glycosylation, respectively (Gupta & Brunak, 2001; Steentoft et al., 2013). The threshold value was 0.5 for the NetOGlyc 4.0 server, whereas DictyOGlyc and YinOYang do not use a constant threshold value. In addition, the isoform-specific O-glycosylation prediction (ISOGlyP) server was used to predict the activities of the N-acetylgalactosaminyltransferases responsible for the transfer of GalNAc to serine/threonine residues in mucin-type glycosylation (Gerken et al., 2011).
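The fixed-cutoff step used for NetNGlyc 1.0 and NetOGlyc 4.0 amounts to a simple filter over per-site scores. The sketch below assumes the scores have already been collected into a dictionary keyed by residue position; the function name and the example scores are hypothetical and are not output from any server. It does not apply to DictyOGlyc or YinOYang, which, as noted above, do not use a constant threshold.

```python
def sites_above_threshold(scores, threshold=0.5):
    """Return the positions whose prediction score is strictly above the threshold."""
    return sorted(site for site, score in scores.items() if score > threshold)


# Hypothetical scores keyed by residue position; not real server output.
example_scores = {101: 0.72, 245: 0.50, 310: 0.31, 402: 0.55}
print(sites_above_threshold(example_scores))
```

The comparison is strict because the text counts only scores above 0.5; a site scoring exactly 0.5 is therefore excluded in this sketch.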

Protein modelling and visualization
The three-dimensional (3D) structure of the 5th-6th repeats of human reelin was constructed with the iterative threading assembly refinement (I-TASSER) server. The server predicts the protein structure from the amino acid sequence in several steps: threading, structural assembly, model selection and refinement, and structure-based function annotation (Roy et al., 2010; Yang et al., 2015; Zhang, 2008). The amino acid sequence of human reelin (NCBI code: AAC51105.1) was used to construct the model. The constructed 3D structure was visualized, and the vacuum electrostatics were calculated, with PyMOL, an open-source molecular visualization software (Schrodinger, 2015). The topological structure and the surface accessibility of the amino acids in the 3D model were analyzed with the ESPript 3 server (Robert & Gouet, 2014). The Ramachandran plot server was used to produce a visual representation of the main-chain conformational tendencies of the amino acids in the reelin structure (Anderson et al., 2005). The Chiron server was used to resolve steric clashes that arise from the unnatural overlap of any two non-bonding atoms in the reelin structure (Ramachandran et al., 2011).

Results
The multiple sequence alignment of the 5th-6th reelin repeats showed that this fragment of human reelin glycoprotein was highly conserved among the 18 selected vertebrate species (Figure 1, Supplement 1). According to the phylogenetic tree, the sequence difference among these vertebrate species was less than 12% (except for Danio rerio) (Figure 1B). The lengths of these repeat sequences ranged from 703 to 797 amino acids. In addition, to elucidate the effects of glycosylation sites on reelin, we determined the conserved amino acids related to glycosylation (Asn, Ser, Thr) together with their surface accessibility rates (Figure 1C, Supplement 1). However, we could not find any glycosylation-related amino acids that were conserved between vertebrate and invertebrate species. The potential N-glycosylation sites of the human 5th-6th reelin repeat were obtained from the NetNGlyc 1.0 server (Table 2). The repeat of Homo sapiens included 39 asparagine (Asn) residues, four of which were found within the Asn-Xaa-Ser/Thr sequence (Xaa is any amino acid except Pro). Two of these glycosylation sites (Asn2268, Asn2568) were conserved in the analyzed vertebrates. The N-glycosylation sites of vertebrates were also more similar to each other (Figure 2A, Supplement 2).
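Checking whether a sequon found in one species is conserved in others requires mapping the residue to its column in the multiple sequence alignment and then testing the motif in every aligned sequence. The following is a minimal sketch of that idea, assuming gapped sequences of equal length with "-" as the gap character; the function names, species labels, and toy alignment are hypothetical and do not reproduce the alignment used in this study.

```python
def ungapped_to_column(aligned_seq, residue_index):
    """Map a 0-based residue index in the ungapped sequence to its alignment column."""
    seen = -1
    for column, char in enumerate(aligned_seq):
        if char != "-":
            seen += 1
            if seen == residue_index:
                return column
    raise IndexError("residue index lies outside the sequence")


def is_sequon_at(aligned_seq, column):
    """True if an Asn-Xaa-Ser/Thr sequon (Xaa != Pro) starts at this alignment column."""
    if aligned_seq[column] == "-":
        return False
    # Gaps after the start column are skipped so the next two residues are compared.
    tail = aligned_seq[column:].replace("-", "")
    return len(tail) >= 3 and tail[0] == "N" and tail[1] != "P" and tail[2] in "ST"


# Hypothetical toy alignment; labels and sequences are illustrative only.
alignment = {
    "species_a": "AC-NGTLK",
    "species_b": "ACDNGSLK",
    "species_c": "AC-NPTLK",
}
column = ungapped_to_column(alignment["species_a"], 2)  # Asn is residue index 2 in species_a
per_species = {name: is_sequon_at(seq, column) for name, seq in alignment.items()}
print(column, per_species, all(per_species.values()))
```

In this toy case the sequon is present in two sequences but is disrupted by Pro in the third, so the site would not be counted as conserved.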
The potential mucin-type O-glycosylation (O-GalNAc) sites of the human 5th-6th reelin repeats were obtained from the NetOGlyc 4.0 server (Table 2). All of the analyzed species contained mucin-type O-glycosylation sites (Figure 2B, Supplement 2). There were four possible mucin-type glycosylation sites in Homo sapiens, two of which (Ser2070, Thr2071) were conserved in the analyzed vertebrates. Because these two mucin-type O-glycosylation sites were adjacent to each other, they can be regarded as overlapping sites (Figure 2B). In addition, the potential β- and α-linked GlcNAc glycosylation sites of human reelin were obtained from the YinOYang and DictyOGlyc 1.1 servers, respectively (Table 2). Although β-linked glycosylation sites were detected in the 5th-6th repeats of reelin, no α-linked sites were detected. There were fifteen possible β-linked glycosylation sites in reelin (Figure 2C, Supplement 2), nine of which were conserved in the analyzed vertebrates (Table 2).
The N-acetylgalactosaminyltransferase (ppGalNAc-T) activities were analyzed for the mucin-type glycosylation sites of the 5th-6th reelin repeats with the ISOGlyP server (Figure 3). According to the enzyme activities, the T5 isoform had the highest Enhancement Value Product (EVP) for Thr2163. In addition to T5, several other isoforms, such as T3, T11, and T12, had EVP values close to each other for Thr2163 (Figure 3). However, two sites (Ser2070 and Ser2162) had low EVP values for these enzymes.
The three-dimensional (3D) structure of the 5th-6th reelin repeats was constructed with the I-TASSER server, which gave a C-score of 2, an estimated TM-score of 0.99 ± 0.04, and an estimated RMSD of 3.5 ± 2.4 Å (Figure 4). The C-score is typically in the range of [-5, 2], and a higher C-score signifies a model of the reelin structure with higher confidence. The topological structure of the 3D model contained eight α-helices, fifty-five β-strands, 29 strict β-turns, and nine disulfide bonds (Supplement 1). Potential N-glycosylation sites were located between two closely spaced β-strands in the 3D structure of reelin. In addition, all of the glycosylation sites (N- and O-linked) were close to cysteine bridges (Supplement 1). To examine the effect of the glycans on the structure, the 3D model of reelin with its glycosylation sites was visualized with PyMOL (Figure 4A). According to the vacuum electrostatics of the model, reelin contained many regions with a neutral charge (Figure 4B).
According to the Ramachandran plot, the conformational tendencies of the reelin structure fall into three categories: highly preferred observations (89.6%, 543 amino acids), preferred observations (7.4%, 45 amino acids), and questionable observations (3.0%, 18 amino acids) (Supplement 3). For the energy minimization of reelin, the total number of contacts (11685), the number of clashes (354), the van der Waals (VDW) repulsion energy (215.264 kcal/mol), and the clash ratio (0.018) were obtained from the Chiron server (Supplement 4).
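The reported percentages and clash ratio can be checked with a few lines of arithmetic. The sketch below uses only the values quoted above; it assumes that the clash ratio is the VDW repulsion energy divided by the number of contacts, which is consistent with the reported figures and with the kcal/mol/contact unit quoted in the Discussion, but that definition is an assumption here rather than something stated in this section.

```python
# Values reported in the text for the human model.
ramachandran_counts = {"highly preferred": 543, "preferred": 45, "questionable": 18}
total_residues = sum(ramachandran_counts.values())
percentages = {
    category: round(100 * count / total_residues, 1)
    for category, count in ramachandran_counts.items()
}

vdw_repulsion_kcal_mol = 215.264
contacts = 11685
# Assumed definition: repulsion energy per contact (kcal/mol/contact).
clash_ratio = vdw_repulsion_kcal_mol / contacts

print(total_residues, percentages, round(clash_ratio, 3))
```

Under that assumption the three categories account for 606 residues, the percentages round to the reported 89.6%, 7.4%, and 3.0%, and the clash ratio rounds to the reported 0.018.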

Discussion
Most of the identified reelin sequences belong to the phylum Chordata, although a few homologs could be identified in Mollusca and Arthropoda, including Saccoglossus kowalevskii, Crassostrea gigas, and Aplysia californica (Manoharan et al., 2015). In addition to these, we compared two invertebrate sequences, from Strongylocentrotus purpuratus and Ciona intestinalis, with the vertebrate sequences at the protein level. The identified reelin sequences share common reelin repeats, especially in the 5th-6th reelin sequence (Figure 1C). Interestingly, S. purpuratus is distinguished from the other reelin homologs because it is the species closest to vertebrates in terms of sequence similarity (Figure 1B). Because the 5th-6th reelin repeats are required for reelin signaling (Yasui et al., 2010), this could reflect an evolutionary process that affects the signaling activity of reelin. On the other hand, Echinodermata, the group to which S. purpuratus belongs, is more similar to vertebrates than the other invertebrate groups are, and the short identical reelin domains it shares with vertebrate species can be useful for understanding the reelin protein structure. Although the overall structure of reelin has not been reported because of its unusually large molecular mass and proteolytic cleavage at multiple sites, the structure of the 5th-6th reelin repeats of mouse (Mus musculus) was solved by X-ray crystallography (Yasui et al., 2007, 2010). However, the reelin glycoproteins of human and mouse differed by 5% (Table 1). According to the Ramachandran plot and the energy minimization of the structure, most amino acids adopt highly preferred conformations, and the clash score (0.018) is suitable for the human model (Figure 4A), because the acceptable clash score is calculated as 0.02 kcal/mol/contact from the distribution of clash scores of structures in a high-resolution dataset (Ramachandran et al., 2011). The crystal structure of these repeats revealed that the 5th-6th reelin repeats of mouse carry four N-glycan chains (Yasui et al., 2007). In the present study, we identified four possible N-glycosylation sites in human, as in mouse (Table 2). We also showed that N-glycosylation sites and motifs were conserved across vertebrate species, and that some species may carry species-specific glycosylation sites, including O-glycosylation sites (Figure 2).
Figure 4. Three-dimensional (3D) structure of the 5th-6th repeat of reelin glycoprotein. A, constructed cartoon structure of the 5th-6th repeat of human reelin showing the possible glycosylation sites (spheres) and the receptor-interacting region (dashed spheres); B, the calculated vacuum electrostatic potential of the model (white, neutral; grey, charged region); C, the molecular face of the 5th repeat with the status of the Cys2101 and Asn2144 sites; D, the localization of Asn2268 with the active Lys2360 and Lys2467 sites; E, the EGF-like repeat of the 5th repeat and its status with Thr2163 and Asn2144; F, the localization of the EGF-like repeat with Asn2316 and Asn2568 in the 6th repeat of reelin. The spheres show the particular sites in the 3D model. Colors: hot pink, mucin-type O-glycosylation; blue, N-glycosylation; yellow, active site of reelin signaling; red, Cys2101; orange, EGF-like repeats; magenta, O-β-linked glycosylation.
The functional form of reelin was found to be a multimer held together by inter-chain disulfide bonds and non-covalent interactions (Yasui et al., 2011). Accordingly, Cys2101 in the 5th-6th reelin repeats of mouse is responsible for protein dimerization, and this residue is conserved across species from sea urchin to human. A mutation of this residue (C2101A) led to a great reduction in signaling activity. However, the multimeric form alone was insufficient, and another specific structure maintained by covalent and non-covalent intermolecular interactions is therefore required for reelin signaling (Yasui et al., 2011). Reelin glycosylation changes upon β-amyloid 1-42 (Aβ42) induction, and abnormal dimerization of reelin occurs in the Alzheimer's disease (AD) brain, which suggests altered signaling in vivo (Cuchillo-Ibáñez et al., 2013). Like Cys2101, Asn2144 was located on the molecular face opposite the receptor-binding site (Figure 4C). It may therefore allow dimer formation or act as a stabilizer of the interaction within reelin homodimers. In addition, Asn2568 was located in the 6th repeat and had a position similar to that of Asn2144. On the other hand, Asn2268 was located on the side opposite the receptor-binding site of the 5th-6th reelin repeats, and it could help to stabilize the active Lys2467 and Lys2360 residues that were sufficient for the receptor interaction of reelin (Figure 4D). Two sites (Asn2316 and Asn2568) were located beside the EGF-like domain in the 6th repeat, and they may contribute to the possible molecular interaction of the EGF-like repeat with its partners (Figure 4F).
GalNAc-type glycosylation directed by distinct polypeptide GalNAc-transferase isoenzymes (GalNAc-Ts) is emerging as an important regulator of protein function as insight into the O-glycoproteome grows (Katrine & Clausen, 2012). In this study, the predicted GalNAc-type glycosylation sites were located in the vicinity of the EGF-like repeats of reelin, and most of the sites were conserved across vertebrate species (Figure 2B). Because the function of EGF-like repeats involves mediating protein-protein interaction and trafficking (Haltom & Jafar-Nejad, 2015), these conserved glycosylation sites (such as Thr2163) may be an important factor for the interaction partners of reelin (Figure 4E). In addition, because GalNAc-T enzymes could be responsible for different types of cellular events and may be uniquely sensitive to the peptide sequence and the overall charge of the protein, characterizing the possible enzymes is important for understanding glycosylation sites (Bennett et al., 2012). According to our EVP results, Thr2163 had a high glycosylation potential through the activity of the T5 and T11 GalNAc-T isoforms, so this site could be a good candidate for O-GalNAc (mucin-type) glycosylation.
O-linked GlcNAc glycosylation was long studied exclusively on a large collection of cytosolic and nuclear proteins (Hart et al., 2011). However, recent studies have shown that an EGF domain-specific O-GlcNAc modification also occurs on extracellular proteins such as the Dumpy protein (Sakaidani et al., 2011). Although EGF domains are well known, EGF domain-containing proteins and the glycosylation of these signaling molecules are not yet understood well enough. We identified two critical sites for O-β-linked GlcNAc glycosylation. One of them, Ser2155, was located in the 5th reelin repeat, and the other, Thr2512, was located in the 6th reelin repeat. Because these sites were in close vicinity to EGF-like repeats, they could regulate the molecular interaction of reelin with its partners. According to our prediction results, the reelin glycoprotein of Homo sapiens contained more candidate sites for O-β-linked GlcNAc than for the other types of glycosylation (Figure 2). It is also known that the transcripts of both enzymes responsible for O-linked GlcNAc glycosylation, O-GlcNAc transferase and O-GlcNAcase, are very abundant in brain tissue (Wani et al., 2017). Because reelin is of neuronal origin (Botella-López et al., 2006), it may contain a high number of possible glycosylation sites for O-linked GlcNAc.
In summary, the present findings show that the glycosylation pattern of the 5th-6th reelin repeats was very similar among vertebrate species, and that many glycosylation sites on these repeats had high surface accessibility. Reelin contained possible N- and O-glycosylation sites that could be important in the molecular interaction with its partner molecules, because of its neutral vacuum electrostatics in many regions. Glycosylation is also crucial for protein secretion, because glycans influence protein folding, provide ligands for lectin chaperones, contribute to quality control surveillance in the ER, and mediate transit and selective protein targeting throughout the secretory pathway (Hoseki et al., 2010; Moremen et al., 2012). These modulations help to determine the overall energy of a glycoprotein (thermostability and protein stabilization), because of the complexity and diversity of the glycans present on the protein structure. These bioinformatics data may help to clarify open questions, and the results may guide further biochemical, genetic, and glycobiology-based experiments on reelin glycosylation.

Disclosure statement
No potential conflict of interest was reported by the authors.