Possible molecular mechanisms of species recognition by barnacle larvae inferred from multi-specific sequencing analysis of proteinaceous settlement-inducing pheromone

Gregarious settlement is essential for the reproduction and survival of many barnacles. The glycoprotein settlement-inducing protein complex (SIPC) has been recognized as a settlement signal and is expressed in both conspecific adults and larvae. Although the settlement-inducing activity of SIPC is species-specific, the molecular mechanism by which larvae distinguish conspecific SIPC from the SIPC of other species is still unknown. Here, the complete primary structure of the SIPC of Megabalanus coccopoma and the partial structures of the SIPCs of Balanus improvisus, Megabalanus rosa, and Elminius modestus are reported. These SIPCs contain highly variable regions that possibly modulate the affinity for the receptor, resulting in the species specificity of SIPC. In addition, the distribution patterns of potential N-glycosylation sites differed among the species. Differences in such post-translational modifications may contribute to the species specificity of SIPC.


Introduction
Some barnacle species settle on artificial objects such as ships' hulls, aquaculture net cages and pipes in coastal plants, resulting in huge economic losses and an increase in the emission of environmentally harmful gases (e.g. Callow and Callow 2011; Schultz et al. 2011). Many studies have investigated the development of environmentally friendly antifouling (AF) techniques (for reviews see Chambers et al. 2006; Maréchal and Hellio 2009; Magin et al. 2010; Callow and Callow 2011; Scardino and De Nys 2011). However, many of these techniques can be problematic, particularly in terms of specificity for particular organisms, cost and long-term performance, so understanding the settlement mechanisms of fouling organisms is important for improving non-biocidal AF technologies (e.g. Aldred and Clare 2008; Callow and Callow 2011).
For many barnacle species, gregarious larval settlement is critical for successful reproduction because the adults, although sessile, frequently reproduce by cross-fertilization. During the settlement stage, the barnacle cypris larva (cyprid) explores the substratum with highly specialized antennules, which carry chemosensory and/or mechanosensory setae, in order to select its lifetime habitat (Walker 1987; Bielecki et al. 2009). Recently, several studies have focused on the cyprid settlement process. For example, video observation has demonstrated the action of the setae during exploration of the substratum (Maruzzo et al. 2011), while another study revealed that cyprids preferentially selected surface textures that increased attachment strength (Aldred et al. 2010). In addition, transcriptomic analyses using 454 sequencing (Chen et al. 2011; De Gregoris et al. 2011) and proteomic approaches (Thiyagarajan and Qian 2008) have identified genes and proteins involved in larval settlement.
Both chemical and physical cues have long been known to be involved in the settlement of cyprids (Finelli and Wethey 1999), and evidence of conspecific chemical cues as settlement-inducing factors has accumulated. A proteinaceous settlement-inducing pheromone called the settlement-inducing protein complex (SIPC) was purified from Balanus amphitrite Darwin 1854 (Matsumura et al. 1998a), and its cDNA was cloned and sequenced (Dreanno et al. 2006a). SIPC is a glycoprotein comprising three major subunits (76, 88, and 98 kDa), and each subunit independently has settlement-inducing activity as high as that of intact SIPC (Matsumura et al. 1998a). SIPC is expressed in the cuticle of adults (Dreanno et al. 2006b), which supports the original hypothesis that the gregarious settlement signal is a cuticular protein (Crisp and Meadows 1962). Cyprids may sense SIPC in the epicuticular layer on the outer surface of the shell (Clare 2011).
Immunological studies have indicated that thoracic barnacles commonly have SIPC (Kato-Yoshinaga et al. 2000). However, the molecular weight of SIPC differs slightly among species, suggesting species-specific structures (Kato-Yoshinaga et al. 2000). Dreanno et al. (2007) demonstrated that the SIPC of B. amphitrite was more active toward conspecific larvae than were the SIPCs of B. improvisus Darwin 1854, Megabalanus rosa (Pilsbry 1916), and Elminius modestus Darwin 1854; this finding was consistent with the function of SIPC as a conspecific settlement cue.
Although several studies on SIPC have been published (Clare 2011), most have focused on B. amphitrite.
Although a great deal of research on barnacle larval settlement has been performed, the molecular mechanism by which cyprid larvae distinguish conspecifics from other species remains undetermined, and a comparison of SIPC structures across multiple species is required. In this study, the full-length primary structure of SIPC from M. coccopoma (Darwin 1854) and the partial structures of SIPC from B. improvisus, M. rosa, and E. modestus were determined. The molecular mechanism of barnacle gregarious settlement is discussed in the light of the results from sequence analysis.
Purification and amino acid sequencing of SIPC from M. coccopoma
The soft bodies of M. coccopoma were crushed and homogenized in 1.5 volumes of 50 mM Tris-HCl (pH 7.5), then filtered through gauze. The homogenate was centrifuged at 15,000 × g for 30 min and filtered (Whatman No. 3 paper). Proteins were precipitated overnight with 70% ammonium sulfate, and the slurry was centrifuged at 15,000 × g for 30 min. The pellet was suspended in 50 mM Tris-HCl (pH 7.5) and dialyzed against the same buffer. After centrifugation at 15,000 × g for 60 min, the supernatant was filtered through a 0.2-µm membrane filter. The filtrate was applied to a Resource Q column (GE Healthcare, Piscataway, NJ, USA) equilibrated with 50 mM Tris-HCl (pH 7.5). The adsorbed proteins were eluted using a 0.1-M stepwise gradient of 0-0.5 M NaCl. The active fractions were determined using an Immuno-Tek ELISA Construction System (ZeptoMetrix, Buffalo, NY, USA) with anti-76 kDa SIPC of B. amphitrite antibodies (Matsumura et al. 1998c) and peroxidase-conjugated anti-rabbit IgG goat antibodies (A-0545; Sigma-Aldrich Corp., St Louis, USA) as primary and secondary antibodies, respectively. The active fraction was concentrated by ultrafiltration and dialyzed against 0.5 M NaCl and 50 mM Tris-HCl (pH 7.5). The concentrated active fraction was applied to a lentil lectin (LCA)-Sepharose column (Pharmacia Biotech, Uppsala, Sweden) equilibrated with 0.5 M NaCl and 25 mM Tris-HCl (pH 7.5). The column was washed using the same buffer, and the adsorbed proteins were eluted with 50 ml of 0.2 M methyl α-D-mannopyranoside (MMP) in the same buffer. The fractions eluted by 0.2 M MMP were pooled, concentrated by ultrafiltration and dialyzed against 50 mM Tris-HCl (pH 7.5). All procedures were performed at 4 °C. The band stained with Coomassie brilliant blue was excised and treated with 0.2 mg of Achromobacter protease I (provided by Dr Masaki, Ibaraki University; Masaki et al. 1981) at 37 °C for 12 h in 0.1 M Tris-HCl (pH 9.0) containing 0.1% sodium dodecyl sulfate.
For sequencing, the peptides generated were extracted from the gel and separated on columns of DEAE-5PW (1 × 20 mm; Tosoh, Tokyo) and Inertsil ODS-3 (1 × 100 mm; GL Sciences Inc., Tokyo, Japan) connected in series with an Agilent 1100 Series liquid chromatography system (Agilent Technologies, Waldbronn, Germany). The peptides were eluted at a flow rate of 20 µl min⁻¹ using a linear gradient of 0-60% solvent B, where solvents A and B were 0.09% (v/v) aqueous trifluoroacetic acid and 0.075% (v/v) trifluoroacetic acid in 80% (v/v) acetonitrile, respectively. The selected peptides were subjected to Edman degradation using a Procise cLC protein sequencing system (Applied Biosystems, Foster City, CA).

RNA extraction and first-strand cDNA synthesis
Total RNA was extracted using Isogen (Nippon Gene, Toyama, Japan), according to the manufacturer's protocol. Extracted RNA was treated with DNase (Invitrogen, Carlsbad, CA, USA). First-strand cDNA was synthesized using the SuperScript III First-Strand Synthesis System (Invitrogen).

Cloning and sequencing of SIPC cDNAs
A partial cDNA sequence of M. coccopoma SIPC was amplified using polymerase chain reaction (PCR) with degenerate primers designed according to the amino acid sequences listed in Table 1. The DNA band was extracted from the agarose gel using a QIAquick Gel Extraction Kit (Qiagen, Chatsworth, CA). This partial cDNA fragment was cloned in E. coli HST08 premium competent cells (Takara Bio, Shiga, Japan) and sequenced using an ABI3130 genetic analyzer (Applied Biosystems, Carlsbad, CA, USA). Then, the 5′- and 3′-UTRs were determined using the SMARTer RACE method (Clontech, Cambridge, UK). The full-length SIPC open reading frame (ORF) was amplified by end-to-end PCR with primers located in the 5′- and 3′-UTRs and Pfu DNA polymerase (Promega, Fitchburg, WI, USA). The full-length SIPC ORF was cloned and sequenced using an ABI3130 genetic analyzer.
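The choice of peptide stretch for a degenerate primer can be illustrated with a small calculation. The following is a minimal sketch, not the procedure used in this study: it counts how many distinct nucleotide sequences could encode a peptide under the standard genetic code and picks the least degenerate window. The function names and the example peptide are hypothetical, and practical primer design usually applies further adjustments (for example, omitting the third base of the last codon) that are not modelled here.

```python
# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Return the number of nucleotide sequences that can encode `peptide`."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]
    return total


def least_degenerate_window(peptide: str, width: int):
    """Return (start index, window, degeneracy) of the best window.

    `start index` is 0-based; ties are resolved in favour of the first window.
    """
    best = None
    for start in range(len(peptide) - width + 1):
        window = peptide[start:start + width]
        score = primer_degeneracy(window)
        if best is None or score < best[2]:
            best = (start, window, score)
    return best


if __name__ == "__main__":
    # Hypothetical peptide, NOT one of the sequences in Table 1.
    print(least_degenerate_window("LSRMWKDE", 5))
```

Windows rich in Met and Trp, which have a single codon each, give the lowest degeneracy, whereas Leu, Ser, and Arg (six codons each) inflate it quickly.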
Approximately 2.7 kbp of partial SIPC cDNA sequences on the 3′ side of B. improvisus, M. rosa, and E. modestus were amplified by PCR using Pfu DNA polymerase (Promega) and primers designed on the basis of amino acid sequences common to B. amphitrite and M. coccopoma. Each DNA band was extracted from the agarose gel using a QIAquick Gel Extraction Kit. These partial cDNA fragments were cloned and sequenced using an ABI3130 genetic analyzer. The primer sequences for PCR are indicated in Table 2. The sequence data have been deposited in the DDBJ database (accession numbers: AB695617, AB695618, AB695619, and AB695620 for M. coccopoma, B. improvisus, M. rosa, and E. modestus, respectively).

Sequence analysis
A signal peptide search was performed using the SOSUIsignal program (http://bp.nuap.nagoya-u.ac.jp/sosui/sosuisignal/sosuisignal_submit.html). N-glycosylation sites were predicted using the NetNGlyc 1.0 program (http://www.cbs.dtu.dk/services/NetNGlyc/). The amino acid sequences of SIPCs were aligned to those of α2-macroglobulins (A2M) using ClustalW (Thompson et al. 1994). A phylogenetic tree was constructed on the basis of the amino acid sequences using the neighbor-joining method and assuming gamma distances (α = 1.1) with MEGA 5.05 (Tamura et al. 2011).
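Potential N-glycosylation sites are defined by the Asn-Xaa-Ser/Thr sequon. As a minimal sketch of how such sites can be listed from a protein sequence, the following scans for the motif with a regular expression. It is not a substitute for NetNGlyc, which additionally scores each sequon; the exclusion of proline at the Xaa position is a conventional assumption rather than something stated in the text, and the function name and test sequence are hypothetical.

```python
import re

# Asn, then any residue except Pro (assumed convention), then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. N-N-S-S) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical test sequence, not a SIPC fragment.
    print(find_sequons("MNGTANPSNNSS"))
```

Comparing the position lists obtained for aligned sequences of different species gives the kind of site-distribution pattern discussed for Figure 1.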

SIPC sequences
Three peptide sequences were determined from purified SIPC of M. coccopoma (Table 1), and they showed significant homology with those detected from SIPC of B. amphitrite. The full-length cDNA sequence of SIPC from M. coccopoma was 5249 bp, comprising a 5′-UTR of 34 bp, an ORF of 4605 bp, and a 3′-UTR of 610 bp. The full-length ORF encoded 1534 amino acid residues. All the peptide sequences determined by Edman degradation were present in this sequence (Table 1). A BLAST (National Library of Medicine, Bethesda, MD, USA) search indicated that the predicted amino acid sequence showed 63% homology with the SIPC of B. amphitrite. The sequence also showed a close relationship with sequences of the thioester-containing protein (TEP) family, as did the SIPC of B. amphitrite, and showed the highest homology with the A2M of a tick, Ixodes ricinus (30%). The signal peptide analysis indicated a 19-residue signal peptide at the N-terminal segment that did not show significant homology to that of SIPC from B. amphitrite. SIPC from M. coccopoma contained eight potential N-glycosylation sites (Asn-Xaa-Ser/Thr) (Figure 1).
Partial cDNA sequences of 2656, 2641, and 2658 bp were determined for SIPC from B. improvisus, M. rosa, and E. modestus, respectively. These sequences contained four potential N-glycosylation sites (Figure 1). The predicted amino acid sequence of SIPC from B. improvisus showed 76% homology with SIPC of B. amphitrite and 30% homology with the A2M of a horseshoe crab, Limulus sp. The predicted amino acid sequence of SIPC from M. rosa showed 67% homology with SIPC from B. amphitrite and 33% homology with the A2M of a tick, Ornithodoros moubata. The predicted amino acid sequence of SIPC from E. modestus showed 70% homology with SIPC from B. amphitrite and 33% homology with the A2M of an ant, Acromyrmex echinatior.
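The lengths reported for the M. coccopoma cDNA can be cross-checked with simple arithmetic. The sketch below assumes that the stated ORF length includes the stop codon, which the text does not say explicitly; under that assumption the three segments sum to the full cDNA length and the ORF corresponds to the stated number of residues. The function name is hypothetical.

```python
def check_cdna_lengths(utr5: int, orf: int, utr3: int) -> tuple[int, int]:
    """Return (total cDNA length in bp, number of encoded residues).

    Assumes the ORF length counts the stop codon, so the protein has
    one residue fewer than the number of codons.
    """
    codons, remainder = divmod(orf, 3)
    if remainder != 0:
        raise ValueError("ORF length is not a multiple of three")
    return utr5 + orf + utr3, codons - 1


if __name__ == "__main__":
    total_bp, residues = check_cdna_lengths(utr5=34, orf=4605, utr3=610)
    print(total_bp, residues)  # 5249 1534
```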
Two highly variable regions were detected among the species (Figure S1, Supplementary material) [Supplementary material is available via a multimedia link on the online article webpage]. When the SIPC sequences were aligned with the A2M sequence of a horseshoe crab, Limulus sp., the variable regions were not detected in the A2M sequence (Figure S1, Supplementary material).

Phylogenetic analysis
The SIPCs of the four species sequenced in this study and that of B. amphitrite formed their own clade with a high bootstrap value (Figure 2). SIPCs of the same genus showed monophyly, with high bootstrap values. The tree was constructed without variable regions because the A2M sequences did not contain any sequences that were homologous with the variable regions.
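For readers unfamiliar with gamma distances, the following minimal sketch shows one common form for amino acid sequences, the gamma-corrected Poisson distance d = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing sites and a is the gamma shape parameter. Whether MEGA applied exactly this model in the present analysis is an assumption, and the p value in the example simply treats the reported 63% homology as identity; because the tree was built without the variable regions, the result illustrates the formula rather than reproducing any distance from Figure 2.

```python
import math


def gamma_distance(p: float, shape: float = 1.1) -> float:
    """Gamma-corrected Poisson distance for a proportion of differing sites p.

    `shape` is the gamma shape parameter (1.1 in the text). Smaller shape
    values mean stronger rate variation among sites and larger corrections.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if shape <= 0.0:
        raise ValueError("shape must be positive")
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance, the limit of the gamma distance as shape grows."""
    return -math.log(1.0 - p)


if __name__ == "__main__":
    p = 1.0 - 0.63  # illustrative only: 63% homology treated as identity
    print(round(gamma_distance(p), 3), round(poisson_distance(p), 3))
```

The gamma distance always exceeds the Poisson distance for the same p, reflecting the extra hidden substitutions expected when rates vary among sites.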

Discussion
The SIPC sequence of B. amphitrite showed approximately 30% homology with those of A2M, complement factor, and the TEP family (Dreanno et al. 2006a). The amino acid sequences of the SIPCs determined in this study shared 63-76% homology with the SIPC sequence of B. amphitrite, while homology with the A2M sequences was 30-33%. Phylogenetic analysis indicated that the SIPC sequences analyzed in this study clustered with the SIPC sequence of B. amphitrite with a high bootstrap value (Figure 2). In addition, all the peptide sequences of the SIPC of M. coccopoma determined by Edman degradation occurred in the predicted amino acid sequence. These findings indicate that the sequences analyzed in this study are likely to be the SIPCs of each species. The SIPC of M. coccopoma has a 19-residue signal peptide in the N-terminal segment, as shown for the SIPC of B. amphitrite (Dreanno et al. 2006a). Like that of B. amphitrite, the complete SIPC sequence of M. coccopoma contained the QTD motif, the FXVXXYVLPXFE region, and the STQDT region, but did not contain the GCGEQ region, which is functionally important in the TEP family (Dreanno et al. 2006a). Likewise, the partial SIPC sequences of B. improvisus, M. rosa, and E. modestus contained the STQDT region, but not the GCGEQ region. These structural signatures, similar to those of SIPC from B. amphitrite, support the hypothesis that the SIPCs of the species analyzed in this study function as settlement-inducing pheromones. However, settlement assays of these SIPCs need to be performed to confirm this hypothesis.
The SIPC sequences had highly variable regions among the species (Figure 1). A2Ms did not contain any sequences homologous with the variable regions; therefore, the phylogenetic tree (Figure 2) was constructed without these variable regions. The settlement-inducing activity of SIPC from E. modestus on cyprids of B. amphitrite is significantly weaker than that of the SIPCs of B. improvisus and M. rosa and of conspecific SIPC (Dreanno et al. 2007). Dreanno et al. (2007) stated that such a result was consistent with the suggestion that the activity of the barnacle settlement pheromones reflects their phylogenetic relationships (Knight-Jones 1955; Crisp 1990; Willis et al. 1990; Dineen and Hines 1994; Kato-Yoshinaga et al. 2000). The phylogenetic tree presented here indicates that SIPC from E. modestus is the most primitive (Figure 2), and that the conserved regions of SIPC from B. amphitrite are more closely related to those of the SIPC from B. improvisus than to those of either Megabalanus or E. modestus. However, the settlement-inducing ability of SIPC from M. rosa on cyprids of B. amphitrite is as high as that of SIPC from B. improvisus (Dreanno et al. 2007). This suggests that the conserved regions alone do not account for the species specificity of SIPC.
SIPC has been shown to consist of three major subunits of approximately 76, 88, and 98 kDa in B. amphitrite, B. eburneus and M. rosa (Matsumura et al. 1998a; Kato-Yoshinaga et al. 2000). In the SIPC of B. amphitrite, each subunit has been shown to have settlement-inducing ability (Matsumura et al. 1998a). Dreanno et al. (2006a) indicated that the subunits are coded on a single cDNA sequence. The 76-kDa subunit is coded on the 3′ part of the ORF, while the 88- and 98-kDa subunits are coded on the 5′ part of the ORF (Dreanno et al. 2006a). The variable regions indicated in the current study are located in the central part of the ORF (Figure 1). Therefore, variable regions are possibly included in all subunits. Interestingly, the A2M of the primitive arthropod, Limulus sp., does not contain the variable regions (see Figure S1, Supplementary material), suggesting that these regions were acquired after duplication of the common ancestral molecule of SIPC and A2M. These variable regions may correlate with the species specificity of SIPC. The following hypotheses for the species specificity of SIPC are proposed: (1) the variable regions act as receptor-binding domains, or (2) the conserved regions carry the settlement-inducing activity common to all species, while the variable regions modulate the affinity for the receptors of cyprid larvae, resulting in the species specificity of SIPC. Higher-order structural analyses of SIPC and identification of the receptors are required to test these hypotheses.
The lentil lectin-binding sugar chain has been indicated to be involved in the settlement-inducing ability in B. amphitrite (Matsumura et al. 1998b). The distribution patterns of the N-glycosylation sites are different among species (Figure 1). Such differences in the distribution patterns or moiety of sugar chains may be involved in species specificity. A recent study revealed that SIPC of B. amphitrite contains high levels of mannose glycans (Pagett et al. 2012). Advanced studies that focus on sugar moieties and their effect on settlement-inducing activity should be performed in the future.