Cell communication using intrinsically disordered proteins: what can syndecans say?

Because intrinsically disordered proteins are incapable of forming unique tertiary structures in isolation, their interaction with partner structures enables them to play important roles in many different biological functions. Therefore, such proteins are usually multifunctional, and their ability to perform their major function, as well as accessory functions, depends on the characteristics of a given interaction. The present paper demonstrates, using predictions from two programs, that the transmembrane proteoglycans syndecans are natively disordered because of their diverse functions and large number of interaction partners. Syndecans perform multiple functions during development, damage repair, tumor growth, angiogenesis, and neurogenesis. By mediating the binding of a large number of extracellular ligands to their receptors, these proteoglycans trigger a cascade of reactions that subsequently regulate various cell processes: cytoskeleton formation, proliferation, differentiation, adhesion, and migration. The occurrences of 20 amino acids in syndecans 1–4 from 25 animals were compared with those in 17 animal proteomes. Gly + Ala, Thr, Glu, and Pro were observed to predominate in the syndecans, contributing to the lack of an ordered structure. In contrast, there were many fewer amino acids in syndecans that promote an ordered structure, such as Cys, Trp, Asn, and His. In addition, a region rich in Asp has been identified between two heparan sulfate-binding sites in the ectodomains, and a region rich in Lys has been identified in the conserved C1 site of the cytoplasmic domain. These particular regions play an essential role in the various functions of syndecans due to their lack of structure.


Introduction
The traditional protein-structure paradigm holds that a protein folds into a single functional three-dimensional structure and that its amino acid sequence determines its unique native conformation and unique function (Anfinsen, 1973). Intrinsically disordered proteins (IDPs) are a notable exception because they are unable to achieve a fixed, unique tertiary structure on their own, yet they can form such structures after binding to various partners. Therefore, the structures and functions of such proteins depend on the characteristics of the macromolecules to which they bind. Many IDPs that are known to be essential for diverse cellular functions have been discovered individually (Dyson & Wright, 2005; Tompa, 2012). Interestingly, many genes in many different organisms encode IDPs. For example, a large fraction of RNA-binding proteins consist of domains that lack a three-dimensional structure, and they are predicted to have prion-like domains that play a crucial role in RNP assembly but can also cause pathological aggregation (Alberti, Halfmann, King, Kapila, & Lindquist, 2009; Gitler & Shorter, 2011; Halfmann et al., 2011). For certain proteins, the presence of natively unstructured regions is suggested to be of functional importance. Syndecans, which are transmembrane proteoglycans, can be expected to be natively disordered given their diverse functions and large number of interaction partners.
Syndecans (meaning "to link" in Greek) were first discovered in 1989 by the group of M. Bernfield in California (Stanford University School of Medicine) during a study of cell-surface-associated proteins that contain heparan sulfate and chondroitin sulfate chains (Saunders, Jalkanen, O'Farrell, & Bernfield, 1989).
It has been noted that syndecans are synthesized by most animal cells. They consist of a core protein with covalently linked glycosaminoglycan chains (Kokenyesi & Bernfield, 1994; Weinbaum, Tarbell, & Damiano, 2007). The synthesis of the polypeptide chain of the core protein begins on membrane-bound ribosomes and proceeds in the lumen of the endoplasmic reticulum. Attachment of the glycosaminoglycan chains takes place in the Golgi apparatus, and the proteoglycans are then delivered to the cell surface by exocytosis.
Additionally, the polysaccharide chains in the Golgi apparatus are modified by epimerization and the addition of sulfate groups. There are four types of syndecans in vertebrates, all of which are composed of a core protein and covalently linked glycosaminoglycan chains (Alberts et al., 2008). Glycosaminoglycan chains may carry 50-200 negatively charged disaccharide units due to the attached sulfate groups. Therefore, they can hold a large number of positively charged molecules. Moreover, because of the accumulation of charge, glycosaminoglycan chains are repelled from each other and extend into space to increase the area of their interaction (Manon-Jensen, Itoh, & Couchman, 2010).
The sulfation of glycosaminoglycans is not homogeneous; there are sites with a small number of sulfate groups that are distributed among other sites that have a large content of sulfate groups. Frequently, slightly sulfated sites are concentrated closer to the core protein of a syndecan (Manon-Jensen et al., 2010). The core protein, in turn, contains three types of domains: extracellular, transmembrane, and cytoplasmic (see Figure 1).
The expression of the genes that encode syndecans depends on the tissue type: the syndecan-1 gene is expressed predominantly in epithelial tissues, the syndecan-2 gene in mesenchymal tissues, the syndecan-3 gene in nervous tissues, and the syndecan-4 gene in almost all tissues (Kim, Goldberger, Gallo, & Bernfield, 1994). As a rule, cells express a limited set of syndecan types, and the expression varies depending on the developmental stage. The intensive synthesis of syndecan-2 in the mesenchymal tissues proceeds in the late stages of morphogenesis, whereas it is almost entirely unobserved in the early stages. In contrast, the synthesis of syndecan-1 occurs consistently in the early stages and is almost absent in the later stages of morphogenesis (David et al., 1993). The expression of the syndecan-1 gene is increased in the later stages of differentiation of immune system cells, both in plasma cells (Kim et al., 1994) and in tissue macrophages (Wang, Haller, Wen, Wang, & Chaikof, 2008).
The various syndecans of vertebrates originated from only one type of syndecan of invertebrates and, after gene duplication, formed two groups of syndecans; one of them was an ancestor of syndecans 1 and 3, and the other was an ancestor of syndecans 2 and 4 (Chakravarti & Adams, 2006; Chen, Couchman, Smith, & Woods, 2002). This explains the high homology within each group. Syndecans 2 and 4 are shorter than the others and carry only heparan sulfate chains, whereas syndecans 1 and 3 are longer and carry both heparan sulfate and chondroitin sulfate chains (Figure 1).
Figure 1. Notes: Syndecans 1 and 3 are longer than syndecans 2 and 4. The glycosaminoglycan chains are covalently linked to a core protein: the serine is linked to xylose, which is successively attached to two galactose residues and one glucuronic acid residue. Heparan sulfate consists of N-acetylglucosamine with glucuronic acid or N-acetylglucosamine with iduronic acid. The chondroitin sulfate consists of N-acetylgalactosamine with glucuronic acid. The cytoplasmic domain has two conserved C1 and C2 sites and one variable V site that is unique for each type of syndecan. The TM (transmembrane) domains have GxxxG motifs that are conserved for each type of syndecan. For syndecan-1, it is GGLVG; for syndecan-2, it is GGVIG; for syndecan-3, it is GGVVG; and for syndecan-4, it is GGxVG.
Many receptors and co-receptors, including syndecans, are known to undergo dimerization (Heldin, 1995). Usually, this occurs in lipid rafts, the parts of a cell membrane rich in cholesterol and sphingolipids (Simons & Ikonen, 1997). Lipid rafts are crucial for receptor binding and signal transduction from the cell surface into the cell. Lipid rafts often float on the cell surface and are stabilized by the cooperation of specific receptor proteins in the cell membrane. The transmembrane domain of all syndecans includes a conserved motif GxxxG that is necessary for the dimerization of syndecans and for retaining cholesterol in the membrane (Barrett et al., 2012). Decreasing the cholesterol level in the lipid rafts is known to lead to their degradation and to disruption of the signaling cascade (Li, Park, Ye, Kim, & Kim, 2006). Syndecans vary in their tendency to form homodimers: syndecan-1 dimerizes only weakly, syndecans 3 and 4 dimerize strongly, and syndecan-2 dimerizes very strongly (Dews & MacKenzie, 2007). Moreover, certain types of syndecans can form heterodimers and even heterotrimers. The exceptions are syndecans 1 and 4, which cannot form heterodimers with each other. Ligands that bind to syndecans are able to promote their dimerization and, thus, may drive a few signaling cascades (Dews & MacKenzie, 2007).
In organisms, syndecans can be found in two forms: membrane-incorporated and "soluble" (Bernfield et al., 1999; Wang, Götte, Bernfield, & Reizes, 2005). The "soluble" form is the ectodomain that has been shed from the cell surface by a process governed by extracellular, zinc-dependent endopeptidases and matrix metalloproteinases (MMPs) (Figure 2). This shedding is regulated by a large number of extracellular stimulating agents: chemokines, growth factors, trypsin, heparanase, virulent components of bacteria, insulin, and cellular stress. In general, ectodomain shedding increases in response to inflammation and other destructive processes that proceed in an organism (Li, Park, Wilson, & Parks, 2002; Manon-Jensen et al., 2010). The "soluble" syndecans 1 and 4 are concentrated in large amounts in the fluid that accumulates around wounds. The shedding of the extracellular domain is increased in response to the thrombin and epidermal growth factors that are especially active during wound healing (Subramanian, Fitzgerald, & Bernfield, 1997). Certain signaling transducers, such as protein kinase C and the nuclear transcription factor NF-κB, also influence the shedding of ectodomains. In addition, the phosphorylation of tyrosines in the conserved sites of the cytoplasmic domain initiates the shedding of ectodomains (Manon-Jensen et al., 2010). Interestingly, tyrosine residues are absent in the conserved C1 and C2 sites in frog syndecan-4. At present, the detailed mechanism by which various stimulants activate ectodomain shedding remains obscure.
Syndecan-3 is very important for the normal functioning of the brain, appetite regulation, memory strengthening, and skeletal musculature development (Kaksonen et al., 2002; Karlsson-Lindahl et al., 2012; Pisconti, Cornelison, Olguin, Antwine, & Olwin, 2010). After knockout of the gene encoding syndecan-3, mice displayed a thin and inactive phenotype (Zheng et al., 2010). Understanding how "soluble" syndecan-3 regulates the functioning of the hippocampus and the development of the brain cortex requires additional investigation.
In this paper, we demonstrate that the extracellular and cytoplasmic domains of syndecans are intrinsically disordered regions, and we present a sequence analysis of the syndecans from 25 animal organisms.

Search for disordered residues
Disordered residues were predicted using the IsUnstruct program, which is based on the Ising model (Lobanov & Galzitskaya, 2011). The parameters of the program were determined and optimized on the basis of protein structure statistics. Testing demonstrated that the program yields reliable predictions. The program is available at http://bioinfo.protres.ru/IsUnstruct (Lobanov, Sokolovskiy, & Galzitskaya, 2013). It should be noted that the Ising model has been used successfully to describe the two states related to the helix-coil transition for homopolypeptide chains. The PONDR-FIT method was used to check the reliability of the predictions; this metaserver yields a consensus prediction from 10 programs (Xue, Dunbrack, Williams, Dunker, & Uversky, 2010).

Results and discussion

Syndecan structure and comparative characteristics of the amino acid sequences
The diversity of the functions of syndecans depends on the syndecan structures. As mentioned above, there is a common pattern (typical of all syndecans) of attachment of glycosaminoglycans to a core protein. Both the heparan sulfate and chondroitin sulfate chains are linked covalently to serine residues. In all syndecans, this serine residue is immediately followed in the sequence by a glycine. The attachment is mediated by a "special tetrasaccharide" that consists of xylose successively attached to two galactose residues and one glucuronic acid residue. This structural unit acts as an "original" primer that triggers the growth of the polysaccharide chain. Heparan sulfate is composed of two repeating disaccharide units, N-acetylglucosamine-glucuronic acid and N-acetylglucosamine-iduronic acid, whereas chondroitin sulfate consists of a repeating disaccharide unit that contains N-acetylgalactosamine-glucuronic acid (see Figure 1).
Sequence analyses of all types of syndecans from 25 organisms have shown that there is a region rich in aspartic acid (D) that is located between two heparan sulfate-binding sites in the ectodomains and promotes the lack of structure (Table 1). In spite of a significant difference in the amino acid sequence of the ectodomains, these regions remain conserved and are typical of all types of syndecans. The absence of these regions in syndecan-3 of certain organisms, such as Pan troglodytes (common chimpanzee), Saimiri boliviensis (squirrel monkey), Ovis aries (sheep), Felis catus (domestic cat), and Mustela putorius furo (ferret), indicates alternative splicing (Table 1).
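A search for Asp-rich stretches of this kind can be illustrated with a simple sliding window. The following is only a minimal sketch: the function name, the window length, the threshold, and the toy sequence are illustrative assumptions and are not taken from the analysis described in this paper.

```python
def asp_rich_windows(sequence, window=10, min_fraction=0.5):
    """Return (start, end, fraction) for every window enriched in Asp (D).

    Positions are 1-based and inclusive. The window length and the
    threshold are illustrative assumptions, not values from the paper.
    """
    hits = []
    for i in range(len(sequence) - window + 1):
        segment = sequence[i:i + window]
        fraction = segment.count("D") / window
        if fraction >= min_fraction:
            hits.append((i + 1, i + window, fraction))
    return hits


# Hypothetical toy sequence, not a real syndecan ectodomain.
toy_sequence = "MSGSAEDDDDLDDSGSAT"
asp_hits = asp_rich_windows(toy_sequence, window=6, min_fraction=0.8)
for start, end, fraction in asp_hits:
    print(f"{start}-{end}: {fraction:.2f} Asp")
```

Overlapping windows that pass the threshold can be merged afterwards to delimit a single Asp-rich region.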
It has been proposed that the genes encoding natively disordered proteins undergo alternative splicing more frequently than the genes encoding ordered proteins (Romero et al., 2006). According to the Consensus CDS (CCDS) protein set database (http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi), the genes that encode syndecans have five exons: the first exon encodes a signal peptide; the second exon encodes the attachment sites for heparan sulfate; the third and fourth exons encode the sites of chondroitin sulfate binding in syndecans 1 and 3 only; and the fifth exon encodes the transmembrane and cytoplasmic domains. The second and fourth exons are assumed to undergo alternative splicing. As a result, during mRNA "maturation," they may be spliced out of the pre-mRNA along with the introns in various combinations, creating four forms of mRNA. Each type of syndecan may therefore be found in four isoforms (Leonova & Galzitskaya, 2013b). There is as yet no experimental confirmation of the existence of syndecan isoforms; however, in some sequences of syndecan-3, we have found the absence of a region corresponding to the second exon. Therefore, taking into account the fact that the second alternative exon encodes three sites for the attachment of heparan sulfate in syndecans 1, 2, and 4 and four sites in syndecan-3, it is possible to suggest the existence of isoforms of syndecans that lack heparan sulfate. In addition, the fourth alternative exon (in syndecans 1 and 3) contains a site for the attachment of chondroitin sulfate chains (Figure 3), and the deletion of these sites can lead to a lack of chondroitin sulfate chains in particular isoforms. The presence of such isoforms may seriously affect the functions of syndecans and lead to various diseases, including atherosclerosis, hypertriglyceridemia, or, as in anorexia, a reduction in neurotransmitter secretion by nerve cells. Additional investigation is required to confirm the existence of isoforms of syndecans.
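The count of four mRNA forms follows from two alternative exons that can each be either retained or skipped. This can be made explicit with a short enumeration; the sketch below assumes that exons 1, 3, and 5 are always retained, and the names used are hypothetical.

```python
from itertools import product

# Assumption for illustration: exons 1, 3 and 5 are always retained,
# while exons 2 and 4 are the alternatively spliced ones (as in the text).
CONSTITUTIVE_EXONS = (1, 3, 5)
ALTERNATIVE_EXONS = (2, 4)


def enumerate_isoforms():
    """List every exon combination obtained by keeping or skipping exons 2 and 4."""
    isoforms = []
    for keep_flags in product((True, False), repeat=len(ALTERNATIVE_EXONS)):
        exons = set(CONSTITUTIVE_EXONS)
        exons.update(e for e, keep in zip(ALTERNATIVE_EXONS, keep_flags) if keep)
        isoforms.append(tuple(sorted(exons)))
    return isoforms


isoforms = enumerate_isoforms()
for exons in isoforms:
    lacks_exon2 = 2 not in exons  # would lack the heparan sulfate attachment sites
    lacks_exon4 = 4 not in exons  # would lack the exon-4 chondroitin sulfate site
    print(exons, "no exon 2" if lacks_exon2 else "", "no exon 4" if lacks_exon4 else "")
```

With two alternative exons the enumeration yields 2 × 2 = 4 combinations, one of which retains all five exons.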
The cytoplasmic domain of syndecans has two conserved sites, C1 and C2, that flank the variable region V. The conserved site C2 (EFYA) of the cytoplasmic domain binds to PDZ domain-containing proteins, such as synbindin, synectin, CASK/LIN-2, and syntenin, that play an important role in vesicular transport, synaptic signaling, neuronal migration, and cancer metastasis (Beauvais & Rapraeger, 2004). The conserved C1 site interacts with the actin-binding proteins ezrin, radixin, and moesin, which regulate the organization of the actin cytoskeleton. It should be noted that the presence of Lys in region C1 can contribute to the lack of a three-dimensional structure. The exception is syndecan-4 in frogs (Table 2), in which lysine is absent from region C1. The variable region V determines the specificity and uniqueness of each type of syndecan. The conserved sites C1 and C2 are typical not only of vertebrates but also of invertebrates (Brown, 2011). The transmembrane domain of all syndecans contains the conserved motif GxxxG (Table 2) that is necessary for the dimerization of syndecans and for the retention of cholesterol in the membrane (Barrett et al., 2012).
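The GxxxG motif can be located in a sequence with a regular expression. The sketch below is illustrative only: the function name and the test fragments are hypothetical, and a lookahead is used so that overlapping occurrences are also reported.

```python
import re

# G, any three residues, G. The lookahead makes overlapping matches visible.
GXXXG_PATTERN = re.compile(r"(?=(G...G))")


def find_gxxxg(sequence):
    """Return (1-based position, matched pentapeptide) for every GxxxG occurrence."""
    return [(m.start() + 1, m.group(1)) for m in GXXXG_PATTERN.finditer(sequence)]


# Hypothetical fragment containing the syndecan-1 variant of the motif.
print(find_gxxxg("AAGGLVGAA"))  # [(3, 'GGLVG')]
```

The same function reports the syndecan-2 and syndecan-3 variants (GGVIG and GGVVG) without modification, because only the two flanking glycines are fixed in the pattern.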

Comparison of the amino acid compositions of the syndecans of various animals
The occurrences of 20 amino acid residues in 17 animal proteomes are summarized in Table 3 according to our recent estimates (Lobanov & Galzitskaya, 2012). The occurrences established for animal syndecan proteins are also summarized in Table 3. In spite of the wide variation observed for the ectodomain, certain regularities persisted, i.e. Cys residues were almost absent, whereas Pro and Thr were abundant (except in syndecan-4).
Based on the occurrences of 20 amino acid residues in the syndecan proteins of various organisms, it can be observed that syndecan-1 contains higher levels of proline, glutamic acid, alanine, glycine, and threonine and lower levels of cysteine, phenylalanine, isoleucine, asparagine, tyrosine, arginine, and lysine than the average levels of these amino acids in the 17 proteomes. Syndecan-2 contains higher levels of alanine, threonine, serine, lysine, and aspartic and glutamic acids, and lower levels of cysteine, asparagine, tryptophan, glutamine, arginine, and histidine. Syndecan-3 typically has higher levels of alanine, valine, threonine, proline, and glutamic acid and lower levels of cysteine, methionine, phenylalanine, isoleucine, tryptophan, tyrosine, glutamine, asparagine, histidine, and arginine. Syndecan-4 contains higher levels of isoleucine, leucine, valine, glycine, proline, and aspartic and glutamic acids, and lower levels of cysteine, tryptophan, threonine, asparagine, glutamine, histidine, and arginine. It should be noted that glycine, threonine, glutamine, glutamic acid, and proline are among the residues that favor a disordered arrangement in proteins (Campen et al., 2008; Dunker et al., 2001; Garbuzynskiy, Lobanov, & Galzitskaya, 2004; Lobanov, Garbuzynskiy, & Galzitskaya, 2010; Romero et al., 2001; Williams et al., 2001). In addition, the ectodomains of certain animals include disordered motifs that we previously observed in the protein structure database (Chen et al., 2004; Lee et al., 2011), such as polyhistidine and polythreonine repeats in frog syndecan-1; EPTNSS in Gallus gallus syndecan-1; APEDPE in Bos taurus syndecan-1; PSPPP in syndecan-3 of certain animals, such as Callithrix jacchus, Pongo abelii, and Saimiri boliviensis; and others. The motif SGDLDD is present in most animal syndecan-4 sequences, and a poly-D motif is present in frogs. Additionally, the poly-D motif was observed in all of the syndecan proteins from our database. This motif is located between two heparan sulfate-binding sites in the ectodomains (Table 1).
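The comparison of residue occurrences can be expressed in a few lines of code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the sequences are toy strings, and the uniform 5% background is a placeholder that stands in for the proteome averages of Table 3, which are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Percentage occurrence of each of the 20 residues over a set of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acid residues found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def difference_from_background(sample, background):
    """Positive values mean the residue is more frequent in the sample."""
    return {aa: sample[aa] - background[aa] for aa in AMINO_ACIDS}


# Toy input and a placeholder uniform background (NOT the values of Table 3).
toy_composition = composition(["GGAP", "TTEE"])
uniform_background = {aa: 5.0 for aa in AMINO_ACIDS}
toy_difference = difference_from_background(toy_composition, uniform_background)
print(toy_difference["G"], toy_difference["C"])
```

In a real analysis the background dictionary would hold the measured proteome percentages, and the sample would be the set of syndecan sequences of one type.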
For the reliable prediction of the location of disordered regions, it is advantageous to use several programs employing different principles (Ferron, Longhi, Canard, & Karlin, 2006). Here, for the analysis of the syndecan sequences, we utilize the program IsUnstruct (Lobanov & Galzitskaya, 2011), which is based on the Ising model, and the metaserver PONDR-FIT (Xue et al., 2010), which combines the results of multiple disorder predictors. The profiles of the disordered regions in human syndecans are shown in Figure 3. The two prediction programs report very similar order/disorder trends over the entire sequence (see Figure 4 and the Supplementary material). The largest differences between the two programs are localized, for syndecan-2, at positions 70-90, which are predicted to be unfolded by IsUnstruct and folded by PONDR-FIT, and, for syndecan-3, at positions 200-220, which are predicted to be unfolded by PONDR-FIT and folded by IsUnstruct. In general (see http://bioinfo.protres.ru/misc/syndecan_25.xlsx), the syndecan 1-4 proteins are characterized by a disordered structure, with the exception of the transmembrane domain and the signal peptide.
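Regions where two predictors disagree can be found by comparing their per-residue scores against a cutoff. The sketch below assumes that each program returns one disorder score per residue and that 0.5 separates order from disorder; the function name, the cutoff, and the toy scores are illustrative assumptions rather than the actual output of IsUnstruct or PONDR-FIT.

```python
def disagreement_regions(scores_a, scores_b, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) ranges where the two calls differ.

    A residue is called disordered when its score is >= threshold
    (an assumed cutoff); shorter runs than min_length are ignored.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score profiles must have the same length")
    regions = []
    start = None
    for i, (a, b) in enumerate(zip(scores_a, scores_b)):
        differs = (a >= threshold) != (b >= threshold)
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    if start is not None and len(scores_a) - start >= min_length:
        regions.append((start + 1, len(scores_a)))
    return regions


# Hypothetical per-residue scores for a five-residue toy sequence.
profile_a = [0.9, 0.9, 0.8, 0.2, 0.1]
profile_b = [0.9, 0.3, 0.2, 0.2, 0.7]
print(disagreement_regions(profile_a, profile_b))  # [(2, 3), (5, 5)]
```

Raising `min_length` suppresses isolated single-residue disagreements, so that only extended stretches are reported.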
Because of their diverse functions and numerous interaction partners, syndecans can be expected to be natively disordered. Natively disordered proteins lack a unique tertiary structure in isolation and acquire it only when interacting with a partner (Dyson & Wright, 2005; Tompa, 2012). The conformation of such a protein in a complex is determined to a certain extent by the interaction partner, rather than depending solely on the amino acid sequence, as is characteristic of structured proteins.
Table 3. Comparative occurrences (%) of 20 amino acid residues in 17 animal proteomes and in each type of syndecan.
In conclusion, syndecans as transmembrane proteoglycans are able to perform a wide range of functions that are determined by their structure, which is expected to be natively disordered. Natively disordered proteins lack a unique tertiary structure in isolation and acquire it only when interacting with a partner. The conformation of such a protein in a complex is determined to a certain extent by the interaction partner, rather than depending solely on the amino acid sequence, as is characteristic of structured proteins. The occurrences of 20 amino acids in syndecans 1-4 from 25 animals differ from those in 17 animal proteomes. The occurrences of Gly + Ala, Thr (except in syndecan-4), Glu, and Pro were higher in all syndecans. These amino acid residues are known to favor the lack of an ordered structure. In contrast, lower occurrences were observed for amino acids that promote an ordered structure, such as Cys, Trp, Asn, and His (except in syndecan-1). In addition, all types of syndecans contain the poly-D motif in a region between two heparan sulfate-binding sites in the ectodomains and a region rich in Lys in the conserved C1 site of the cytoplasmic domain. The amino acid compositions of syndecans play an essential role in the formation of the disordered regions and govern their functions.

Supplementary material
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2014.926256.