Modelling and analysis of early aggregation events of BMHP1-derived self-assembling peptides

Despite the increasing use and development of peptide-based scaffolds in different fields including that of regenerative medicine, the understanding of the factors governing the self-assembly process and the relationship between sequence and properties have not yet been fully understood. BMHP1-derived self-assembling peptides (SAPs) have been developed and characterized showing that biotinylation at the N-terminal cap corresponds to better performing assembly and scaffold biomechanics. In this study, the effects of biotinylation on the self-assembly dynamics of seven BMHP1-derived SAPs have been investigated by molecular dynamics simulations. We confirmed that these SAPs self-assemble into β-structures and that proline acts as a β-breaker of the assembled aggregates. In biotinylated peptides, the formation of ordered β-structured aggregates is triggered by both the establishment of a dense and dynamic H-bonds network and the formation of a ‘hydrophobic wall’ available to interact with other peptides. Such conditions result from the peculiar chemical composition of the biotinyl-cap, given by the synergic cooperation of the uracil function of the ureido ring with the high hydrophobic portion consisting of the thiophenyl ring and valeryl chain. The inbuilt propensity of biotinylated peptides towards the formation of ordered small aggregates makes them ideal precursors of higher hierarchically organized self-assembled nanostructures as experimentally observed.


Introduction
In supramolecular chemistry, the self-assembly phenomenon (Halley & Winkler, 2008), refers to the spontaneous process, tending towards equilibrium and without external direction, through which a system composed by numerous molecular units leads to the formation of an ordered pattern (Lehn, 1995). The resulting pattern is encoded in the nature of the molecular units and involves weak and dynamic noncovalent interactions (Gazit, 2002;Gsponer & Vendruscolo, 2006;Senguen, Lee, Gu, Ryan, & Doran, 2011) such as hydrogen bonds, electrostatic, hydrophobic (Bowerman, Ryan, Nissan, & Nilsson, 2009;Hills & Brooks, 2007) and π-π interactions (Gazit, 2002;Hunter & Sanders, 1990;Waters, 2002). On the reversible nature of non-covalent interactions between components is based the inbuilt capability of selfassembled systems for error correction (self-healing), not available to covalently bonded structures.
The ability of many biomolecules (lipids, DNA, proteins and peptides) to self-assemble into biologically functional hierarchical structures inspired researchers to borrow this bottom-up approach by nature to design and build new materials with targeted functionality. In this perspective, and because of the easiness of synthesis combined with the broad spectrum of available functionality, self-assembling peptides (SAPs) are widely used as building blocks of nanostructured materials (Liu, Busuttil, Zhang, Yang, & Wang, 2011) in different fields such as drug delivery, electronics, optics and regenerative medicine (Matson & Stupp, 2012). The development of peptide-based scaffolds in regenerative medicine over the past decade is largely increased, this leading to an even more conscious need to gain knowledge and control on the assembly mechanism to obtain scaffolds ad hoc for the target cell and tissue. To mimic the extra cellular matrix (ECM), the scaffolds have to meet various requirements dictating their chemical composition, struc-tural morphology, mechanical properties, porosity and functionalization to support cell survival, proliferation, as well as cell differentiation (in case of transplanted stem cells) (Gelain, Horii, & Zhang, 2007;Matson, 2012).
Among the promising bio-functional motifs, PFSSTKT amino acid sequence, belonging to the family of the Bone Marrow Homing Peptide (BMHP1), was individuated by phage display technique applied to bone marrow and stem cells (Nowakowski, Dooner, Valinski, Mihaliak, & Quesenberry, 2004), resulting able to selectively home to bone marrow and, additionally, to bind to primitive hematopoietic progenitor cells. The addition of such functional sequence to RADA16 (Taraballi, Natalello, Campione, Villa, & Doglia, 2010;Gelain et al. 2011a), a bio-re-absorbable SAP widely used to support cell growth and tissue regeneration (Ellis-Behnke Liang, You, Tay, & Zhang, 2006;Gelain et al. 2007;Ye et al. 2008), showed to promote considerably nervous tissue regrowth in chronic SCI in rats (Gelain et al. 2011a). The BMHP1 linked to RADA16 via an oligo-glycine spacer can also stabilize the β-sheet structure and stimulate neural stem cells differentiation (Taraballi et al. 2010). To explore self-assembling and biomechanical properties only inherent to BMHP1 sequence, a whole family of BMHP1derived SAPs, N-terminally biotinylated peptides, has been developed and studied (Gelain et al. 2011b) highlighting hierarchical self-organization, a broad range of storage moduli, self-healing propensity and an intrinsic 'pro-differentiative effect' over human neural stem cells. These peptides exhibit the ability to form nanostructured scaffolds whose kinetics, morphologies and mechanical properties depend upon the amino acid composition and the presence of biotin at the N-terminal. Therefore, an efficiently planned scaffold design depends on a deep understanding of the relationship between sequence, structure and properties. We empirically demonstrated that the presence of a biotinyl N-terminus cap, with respect to its absence, provides BMHP1-derived SAPs, with superior self-assembling propensity and mechanic properties of the final scaffolds.
To gain deep insights about the contribution made by the biotinylation to the self-assembling ability of the our peptides (Gelain et al. 2011b), we used a molecular dynamics approach on a pool of seven BMHP1-derived peptides (see Table 1), already synthetized and characterized in our laboratories, selected as showing significant differences in their self-assembling behaviour depending on the presence or absence of the biotinyl N-terminal cap (Gelain et al. 2011b). Starting by simulations of the monomers, we selected representative conformations to build, for each sequence, four systems containing eight peptides. The eight peptides are embedded in explicit water cubic box whose dimensions fulfil the condition of 3% w/v for peptide concentration to reproduce conditions favourable to assembly, as suggested by experiments (details about the number of atoms forming each simulation are reported in table S1 of Supplementary material). Simulation analysis evidenced differences, correlated with the presence or absence of biotin, in the modalities of self-aggregation and propensity to form β-sheets and ordered structures. The peculiar chemical structure of biotin indeed conjugates the ability to form H-bonds, mainly by means of the ureido ring, and to establish hydrophobic contacts, involving the thiophenyl ring and the alkyl chain. We demonstrated that the cooperation of these features lead to the formation of a dense and dynamic H-bonds network and to the building of a 'hydrophobic wall', both synergically triggering a disorder-to-order transition. Within the used time scale simulations, the oligomers of biotinylated peptides quickly evolved towards structured parallel β-sheets against the globular aggregates formed by the unbiotinylated peptides, despite their occasionally higher propensity to form βstructures. As experimentally observed, this last point is in agreement with the role played by biotin in triggering a disorder-to-order transition in biotin-protein complexes (Bagautdinov et al. 2005;Chapman-Smith & Cronan, 1999;Freitag et al. 1997;Wilson et al. 1992;Wood et al. 2006) and supports the significantly higher capacity of biotinylated peptides in self-assembling into hierarchically ordered nanostructures. Length of the post-aggregation phase starting from the end from the pre-aggregation phase until the end of the simulation at 250 ns.

MD simulation setup
In Table 1, the sequences of modelled peptides and details of the performed simulations are listed. The monomers have the C-terminus amidated and the N-terminus biotinylated or acetylated. The lysine residues are in the protonated state. The extended conformations of monomers are built with PyMOL (http://www.pymol.org) by imposing an all-trans geometry on the backbone dihedrals. For each sequence, molecular dynamic simulations have been conducted on systems containing one or eight monomers (also simulations on systems containing two or four peptides have been executed, data not showed) using the version 4.5.3 of the GROMACS simulation package (Van der Spoel, Lindahl, Hess, Groenhof, & Mark, 2005) and the GROMOS 53a6 force field (Oostenbrink, Villa, Mark, & Van Gunsteren, 2004) implemented inside. The biotinyltermination structure has been obtained optimizing a N-terminal biotinylated and C-terminal amidated alanine at HF level with 6-31G ⁄ base functions with GAUSS-IAN quantum chemistry program (Frisch, Trucks, Schlegel, Scuseria, & Robb, 2004). Atom types of GROMOS 53a6 force field have been used also for biotin atoms and the bond between biotinyl-group and the aminoacidic sequence has been considered as an amide bond. All simulations have been performed in explicit SPC (Berendsen, Postma, van Gunsteren, & Hermans, 1989) water molecules and Na + and Cl À ions have been added in random positions to simply neutralize the systems or to reach the salt concentration of 0.015 M to stabilize the system. Before running the production simulations, all systems have been submitted to an equilibration procedure consisting of the following steps: steepest descendent minimization first in vacuum and then in water and ions, a brief simulation with 0.001 ps time step and position restrained peptides, a last brief simulation in the NVT ensemble to bring the system at 300 K. All unconstrained production simulations were carried out in the NPT ensemble keeping the temperature at 300 K, by the v-rescale temperature coupling (Bussi, Donadio, & Parrinello, 2007) (τ T = 0.1 ps), and the pressure at 1 atm, by the Berendsen pressure coupling (Berendsen, Postma, Vangunsteren, Dinola, & Haak, 1984) (τ p = 0.1 ps). An integration time step of 2 fs is used and snapshots of the trajectories are saved every 5 ps. The electrostatic interactions are modelled with the particle mesh Ewald (Darden, York, & Pedersen, 1993) theory with a cut-off of 0.9 nm, a Fourier spacing of 0.12 nm and a fourth polynomial order. A cut-off of 0.9 nm has been used also for the van der Waals interactions. The protein bond lengths and angles were constrained with the LINCS algorithm (Hess, Bekker, Berendsen, & Fraaije, 1997) while the water bond lengths were constrained with the SETTLE algorithm (Miyamoto & Kollman, 1992).

Systems preparation
All the monomers in extended conformation have been embedded in a periodic dodecahedral box of explicit water and submitted to three simulations of 40 ns/each with different initial velocity distributions. A distance of 10 Å has been left between the peptide and the box edges to preserve the minimum image convention. On the obtained conformational samplings, discarded the first three ns considered as a part of the system equilibration, a cluster analysis has been performed by means of the gromos algorithm of Daura (Daura et al. 1999). The centrotypes of the first clusters are used to prepare systems containing eight monomers: each system is composed by centrotypes picked up in relative amount needed to achieve 70-80% representation degree of the monomeric sampling. The initial configurations of the multipeptide systems were prepared by insertion of the selected centrotypes in random orientations and positions in cubic boxes filled by explicit water and so that atoms belonging to different peptides were at least 6 Å away from each other. Inspired by empiric results (Gelain et al. 2011b), the size of the water boxes including octamers simulated a 3% w/v peptide solution: a concentration demonstrated to favour the aggregation of all sequences and to allow the observation of the different aggregation behaviours. Periodic boundary conditions are used to minimize the edge effects. Simulations have been monitored to check that the minimum image convention was not violated.

Analysis
Dissecting the contacts involved in the formation and structural organization of aggregates by nature and amounts, together with secondary structures and order parameters, provides a rational picture of what happens during the oligomer formation. Analysis has been performed by GROMACS implemented tools if not stated otherwise. Graphic representations of peptides have been realized by the Visual Molecular Dynamics (VMD) program (Humphrey, Dalke, & Schulten, 1996).

Contacts and H-bonds
Aromatic contacts, intended as due to π-π interactions, are considered as existing when the centre of mass of aromatic side chains falls within a 4.5-7 Å distance (Burley & Petsko, 1985). Also contacts between biotincondensed rings and the side chains of aromatic residues, intended as due to π-σ interactions (Li, Gusarov, Evoy, & Kovalenko, 2009), are monitored in the same distance range. Hydrophobic contacts are considered as existing if the centres of mass of the hydrophobic residues fall within a distance less than 6.5 Å (Zhao, Liu, Liu, Lin, & Fang, 2009). Hydrophobic residues considered are tryptophan, glycine, proline, alanine and phenylalanine. Biotin has been considered in hydrophobic contacts as composed by two parts: the condensed ring and the alkyl chain. Both aromatic and hydrophobic interactions contribute to the formation of biotins/aromatic clusters. So, to monitor the formation of such clusters, we combined the distance criterion introduced for aromatic interaction with non-specific nature of hydrophobic interactions, thus neglecting the orientation condition typical of aromatic contacts. Hydrogen bonds are formed if the donor-acceptor distance falls within a 3.5 Å cut-off radius and the acceptor-donor-hydrogen angle falls within a 30°cut-off angle. For each single H-bond, the existence percentage (EP) has been calculated as the percentage of frames in which a given H-bond exists. The averaged existence percentage (AEP) of a group of H-bond is obtained by summing the EPs of the considered group and by dividing the number of the members of the group. The multiple atom-atom H-bond interactions contributing to the residue-residue H-interactions are considered as a group of which we calculated the average EP.

Secondary structure
Secondary structure has been monitored by means of the DSSP algorithm of Kabsch and Sander (1983), based on hydrogen bond patterns and on geometrical characteristics along the amino acid chain.

Aggregation
During the aggregation process of proteins in water a first order transition takes place, due to the demixing of the protein-water solution in a liquid-water-rich phase and a solid-like protein-rich phase (which can be amorphous or ordered). We followed this transition phase monitoring the participation of the peptides into the aggregation and the aggregates gyration radius to detect both the time needed to form the aggregates and their stability. Particularly, two or more monomers are considered aggregated when the distance between the centres of masses of two (or more) peptides lays within 10-11 Å, corresponding to the inter-sheet distance range in amyloid fibrils (Serpell, 2000).
Note that the distance used to judge the formation of aggregates (see Section 2.2) implies that the aggregation fraction of some oligomers can be greater than zero at the beginning of the simulation.

Nematic order parameters
During fibril formation, a second-order transition from an isotropic phase, in which particles are randomly oriented, to a nematic phase, in which orientation of particles is distributed along preferred directions, takes place. To measure and compare the structural order of aggregated systems, the nematic order parameter (P 2 ) has been calculated by the Wordom package (Seeber, Cecchini, Rao, Settanni, & Caflisch, 2007). The nematic order parameter is defined as follows: whered is the unit vector pointing in the preferred direction of alignment,ẑ is a suitably defined molecular vector, and N is the number of peptides.

Results and discussion
Seven BMHP1-derived peptides have been selected to investigate their dependence of their aggregation ability on their sequence and, particularly, on the presence of a biotinyl N-terminal cap. Sequences and nomenclature are reported in Table 1. Along the sequences (see Scheme 1) the N-terminal cap, the three-glycine spacer, the BMHP1-derived functional segment and the C-terminal cap are distinguishable (see Scheme 1). All the peptides, C-terminally amidated, differ among them both for mutations in the functional sequence and for the N-terminus cap that can be an acetyl-, an acetyltriptophyl-or a biotinyl-group. The experimentally observed scaffold formation, corresponding to the different self-assembling abilities of these peptides, ranges from absent (SAPs 2 and B26), to a medium (soft scaffold for SAPs 4 and 30) and then to an high efficiency (SAPs B3, B24 and 31) (Gelain et al. 2011b): this last is mainly associated with the presence of the biotinyl-N-terminal cap.
Scheme 1. Schematization of the studied SAPs sequences.

The octameric systems
As there are no NMR solution structures of the studied SAPs and taking into account that in the self-assembly, the conformational change of the monomer is a necessary step towards aggregation (Massi & Straub, 2001;Thirumalai, Klimov, & Dima, 2003), we prepared the octameric systems sampling the configurations generated by monomer simulations. It has been showed that the self-assembly easiness is directly linked to the population of aggregation-prone conformations and that such conformations have a consistent degree of overlap with the conformations of the monomer in the fibres (Nam, Kouza, Zung, & Li, 2010;Straub & Thirumalai, 2010). As observed by X-ray diffraction (XRD), circular dichroism (CD) and Fourier Transform Infrared Spectroscopy (FTIR) experiments (Gelain et al. 2011b), the high βsheet content of the fibres formed by BMHP1-derived SAPs is compatible with an essentially extended conformation of the monomer inside the fibres. On the other hand, multimeric systems simulations showed as it is not necessary to start from monomers extended conformations (Colombo, Daidone, Gazit, Amadei, & Di Nola, 2005;Zanuy, Ma, & Nussinov, 2003;, indeed, random coil conformations gave rise to spontaneous formation of aggregates with a more or less degree of order (Lu, Derreumaux, Guo, Mousseau, & Wei, 2009;Mu & Gao, 2009;Nguyen & Hall, 2004). Since the spontaneous formation and evolution of aggregates is what we want to observe, we prepared the octameric systems picking up the centrotypes obtained via a cluster analysis of the monomer conformational samplings (Dolker, Zachariae, & Grubmuller, 2010): each selected centrotype has a multiplicity proportional to its representation percentage in the monomer sampling. In each combination of centrotypes selection, the multiplicity is such as to reach the 70-80% representation of the monomer conformational sampling (see the 5th column of Table 1). The selected centrotypes structures are reported in Figure S1 of Supplementary material.

Aggregates formation
Critical nucleus size is the number of chains below of which there is a free energy barrier to associate into ordered oligomers: for short amyloid peptides 6-12 residues long critical nucleus size ranging from three to twelve peptides have been suggested (Auer, Dobson, & Vendruscolo, 2007;Ma & Nussinov, 2002;Nelson, Sawaya, Balbirnie, Madsen, & Riekel, 2005;Nguyen, Li, Stock, Straub, & Thirumalai, 2007;Song, Wei, Mousseau, & Derreumaux, 2008;Tsemekhman, Goldschmidt, Eisenberg, & Baker, 2007). Far from the attempt to make a detailed comparison of different methods available to identify the critical nucleus size (Nguyen Truong & Li, 2012), a system of eight monomers could be reasonably considered as representative to simulate early aggregation events as it is consistent with the critical nucleus size reported in literature for the aggregation of amyloid peptides. Therefore, to probe the aggregation propensity of the selected peptides, we generated for each different sequence, four simulations (250 ns long) of systems composed by eight molecules.
To follow the evolution of peptide aggregation, we consider that two peptides belongs to the same aggregate if the distance between their centres of mass does not exceed the maximum inter-sheet value of 11 Å, a critical distance of amyloid fibres (see Section 2.3.3). The analysis of the XRD pattern of the BMHP1-SAPs, indeed, revealed the presence of peaks compatible with the cross-β structure typically showed by amyloid fibrils.
In the XRD pattern of the cross-β structure, in which β-strands and β-sheets are, respectively, perpendicular and parallel to the fibril axis, there are typical reflections at ≈5 Å (inter-strand distance) at 8-11 Å (inter-sheet distances) (Serpell, 2000). The simulated systems, always neutralized, can be divided in two sets, each of which composed by two 250 ns long simulations for each sequence. The two simulations are performed on systems containing the same number of centrotypes structures but are characterized by different initial reciprocal positions and orientation of the peptides inside the box. The two sets differ in the modalities used to fulfil the neutralization condition: in the set (a) eight Cl À atoms have been added, while in the set (b) Na + and Cl À atoms, up to 0.015 M, have been added. The simple neutralization (set a) is usually used to simulate charged systems without overcharging the calculation of non-bonding interactions.
A 0.150 M salt concentration has been experimentally shown to trigger solid scaffold formation of SAPs previously dissolved in pure water (Gelain et al. 2011b). However, in case of BMHP1-SAPs, we empirically observed that dissolving peptides directly in solvents with such ionic force, discourages the formation of ordered fibres during self-assembling, while favours random aggregates formation. On the other hand, a salt concentration 0.015 M in experiments still supports the self-assembly of fibres but these are shorter and thicker respect to the case without salt addition (unpublished data). Being the formation of a fibre off target in this work, we considered NaCl 0.015 M as compatible with the small size of our systems: it could allow the stabilization of the expected small aggregates and highlight further differences in aggregation propensities among the different sequences. The fraction of peptides forming an n-mer (n = 1-8) has been calculated every 20 ps and the obtained trends were made more readable by cubic regression as showed in Figure 1. The system gyration radius, the root mean square deviation (RMSD) of the backbone and the time required to the formation of the greater aggregate along the simulation were used to split the simulations in pre-aggregation and post-aggregation phases (read columns 7, 8 and 10, 11 of Table 1).
Spontaneous oligomers formation starts with a rapid clustering in all simulations, going from monomers through common small order aggregation states toward the octamer. While multiple assembly pathways are suggested by the different trends of aggregation showed during the simulations, the oligomerization processes can be schematized in the following three phases: (1) very fast formation, in few ns, of small order aggregates (up Figure 1. Aggregation fraction of the eight fragments composing each system: 8-mers in red, 7-mers in green, 6-mers in blue, 5mers in brown, 4-mers in violet, 3-mers in orange, 2-mers in maroon and 1-mers in turquoise. Panel (a) refers to systems neutralized with Cl À ions. Panel (b) refers to systems containing NaCl 0.015 M. The two columns in each panel refer to the two simulations corresponding to two different starting structures of the octameric system. Oligomerization mechanism can be schematized through three phases: (1) very fast formation of small oligomers, (2) increase of oligomerization order, (3) structural reorganization of oligomers.A cubic regression has been used to obtain the aggregation fraction plots (see Section 3.2).
to 3-mers) certainly favoured by the high peptide concentration used; (2) increase in the oligomerization order up to 7/8-mers in a dynamic fashion characterized by less or more frequent association/dissociation events and a low degree of structural order; (3) change of the order content by internal structural reorganization inside the aggregates, essentially without needs of monomers detachments. The octamer formation, with different extent and stability, as final product is observed in all simulations. Duration and efficiency of all the three phases appear directly linked to the flexibility of the monomers as dictated by their inborn structural constrains. In the analysis of the degrees of involvement of the various n-mers during the aggregation (Figure 1), some apparently conflicting behaviours of the same peptide can be traced: indeed, some limitations associated with methodological choices may emerge. Firstly, the cut-off criteria used to recognize aggregates (see Section 2.3.3) does not allow to discern the complete loss of contacts (see Section 2.3.1) between two peptides from a system whose centres of masses are at distances greater that the cut-off. Secondly, different starting structures used for each dynamic may cause different approaching modalities and times of the peptides within the system, but still allowing the extrapolation of features typical of each sequence. An example of such limitations, mainly due to the aggregation cut-off criteria, is showed by the apparently remarkable different aggregation behaviour of B3 in neutralized systems, where, on the contrary a molecular clusterization of all the 8-mer system is still evident in the second dynamic too (unpublished data), but a lower degree of packing keep some monomers beyond the chosen cut-off distance. Some evidence of the fragmented self-assembly process for peptides containing proline rings (SAPs 2, 4, B3 and B26) emerges looking at the lines associated with 1-to 3-mers in Figure 1 revealing a certain resistance to aggregation in respect to the correspondent sequences without proline. From Figure 1, it is possible to distinguish SAPs with rapidly increasing contribution of 8-mers before of 50 ns (SAPs 2,B3, B24, B26 and B24, B26, 30, 31 in set (a) and (b), respectively) and SAPs showing a considerable contribution of smaller n-mers, within 50 ns (SAPs B3, 4, B26, 30, 31 and 2, 4 in set (a) and (b), respectively) or until the end of the simulation (SAPs B3, 4, B26, 30, 31 and 2, 4, 30 in set (a) and (b), respectively). In the presence of NaCl 0.015 M the systems aggregate quickly in the octameric form, the contribution of the other n-mers become less important and a more marked difference between unbiotinylated and biotinylated peptides emerges. Purely amino acid peptides show a lesser stability of the aggregated forms along the simulations: being the worst peptide 2 followed by peptides 4 and 30 and in lesser extent by 31. This peptide group enlarges to B3 and B26 in simply neutralized simulations in accordance with an expected discouraged aggregation due to the presence of proline.

Secondary structure of the aggregates
The secondary structure analysis shows that the peptide aggregation was accompanied by a spontaneous and gradual conversion from random coil to β-sheet (column a in Figure 2) structure.
Although β-sheet conversion appears as a general feature of the studied sequences, differences in the βsheet content were observed for the different peptides. The percentages of frames spent in β-sheet structure during the aggregation phase by the eight central residues (see Scheme 1) in all the sequences have been compared (see Figure 3).
In both simulation sets, in addition to the relatively low β-sheet content in correspondence of the proline residue, are distinguishable two groups: the first is positioned in the upper part of the graph and is composed by B24 and 31 peptides, the second positioned in the lower part of the graph is composed by B3, 4 and B26 peptides. The two remaining sequences, 2 and 30, are positioned in the central part of the graph in the simply neutralized systems, while in the NaCl 0.015 M share the upper part of the graph with B24 and 31 peptides. Both these pictures are in agreement with experimental observations (i.e. B24 and 31self-assemble while 4 and B26 do not in experiments performed by Gelain et al. 2011b): the not-assembling ability and the correspondingly low βsheet content of 4 and B26; the assembling propensity and the significant β-sheet content exhibited by B24 and 31, even if the discrepancy between them grows in higher salt concentration conditions in favour of the biotinylated sequence. These observations are supported by the aggregate formation analysis (see Section 3.2) where we highlighted the instability of aggregates of SAP 2 and B26 against the stability of aggregates of B24 and 31.
Comparing the two simulation sets, the enhancement of β-sheet content observed for peptide 2 in set (b) results particularly evident. Peptide 2 has the shortest sequence of the overall peptide pool and, presumably, can react more quickly to environmental changes if compared to the other SAPs. In this perspective, the used time scale could be sufficient for peptide 2 to respond to the increased salt concentration, so that the consequent reduction of intermolecular electrostatic repulsions can partially compensate the reorganization difficulties structurally inherent in its sequence. Despite this not-negligible increase of the β-sheet content, the peptide 2 did not show propensity to form high order and stable aggregates during the simulations (see Sections 3.2 and 3.4). This is in agreement with the inability to self-assemble and form fibrous structures at the nano and microscale as showed by the peptide 2 in rheological tests and in Atomic Force Microscopy (AFM) experiments (see Table 1 and Figure 2 of Gelain et al. 2011b).
Looking at the changes of β-sheet content with the time (see Figure 3) only the three peptides B24, 30 and 31 exhibit a rather regularly increasing trend until the end of the simulations: in these cases, the percentage of residues involved in the β-sheet structure reaches about the 40% for the peptide B24 and the 60% for the peptides 30 and 31. The trend of β-sheet percentage reaches about 40% also for the peptide 2 that however does not exhibit a regularly increasing trend. The value of the β-sheet percentage reached by peptides B3, B26 and 4 is generally below the 20% instead. Regarding the peptide B3, then, simulations seem in disagreement with experiments, where it is able to form highly β-structured and stable fibres. In fact in we previously demonstrated (Gelain et al. 2011b) that peptide B3 hierarchically self-assembles into twisted protofibrils and form a solid scaffold after PBS addition. Preliminary simulations of dimers, starting from two monomers spaced about 20 Å apart, showed that peptide B3 required more time to form the dimer respect the other sequences but, once formed, the dimer was a stable and in register β-sheet until the end of the simulation (data not showed). These findings suggest a lag phase towards the β-organization to overcome the structural constraints due to the presence of proline. Increasing the concentration in octameric systems, this lag phase could be too much long on the used time scale length and the system could be trapped in an energy hole, a common drawback for the used simulation protocol.

Order content of the aggregates
To gain a more complete view of the morphologies of the aggregates, and particularly to gain insights about the alignment order inside the aggregates, also the nematic order parameter (P 2 ) has been calculated for all the octameric systems. Values above 0.5 indicate the propensity of peptides to mutually align and then to ordered aggregation states. Notably, the highest degree of order is showed by the peptide B24 which exhibits a rather regularly increasing trend of P 2 up to a value of 0.8 in Figure 2. Percentage of residues assuming a β-sheet secondary structure (a column) and nematic order parameter P 2 (b column) during the simulations of B24 (row 1) and B30 (row 2) octamers. Each graph shows two simulations referring to two different starting structures of the octameric system. The mismatch between the β-sheet percentage amount and the orientational order degree suggests different aggregation mechanisms between biotinylated and unbiotinylated peptides. Percentage of residues assuming a βsheet secondary structure and nematic order parameter P 2 of all the simulated peptides are available in figures S2 and S3 of supplementary materials. the simulations with NaCl 0.015 M (see graph (1b) of Figure 2). P 2 values up to 0.7 are taken by B24 also in simply neutralized conditions. Peptides that in addition to B24 exhibit an increasing trend of P 2 and P 2 values above 0.5 are the 30 (see graph 2b in Figure 2) and 31. Also the peptides B3 and B26, which show a very low propensity to form β-sheet structures, assume P 2 values comparable with those assumed by peptides to which corresponds an higher β-sheet propensity such as 30 and 31. Thus, interestingly, the order degree and the β-sheet percentage seem not to run parallel in early aggregation events: the exhibition of a high content of β-sheet percentage does not necessarily correspond to an high order degree and vice versa. A similar behaviour has been observed by de Groot et al. (Matthes, Gapsys, Daebel, & de Groot, 2011) during the simulation of 10-mers of six residues peptides: despite the high amount of β-sheet structure a low averaged orientational order was observed. They attributed their findings to fluctuations due to conformational reorganizations of the aggregates and to the disorder degree caused by lateral stacking and twisting in the larges β-sheet assemblies.
The conformations corresponding to the highest P 2 value in the two simulations (NaCl0.015 M) of B24 octamer are showed in Figure 4.
In the conformation extracted from the first simulation peptides are roughly arranged in a unique sheet with a β-sheet core formed by four parallel and out-of-register peptides. Biotins and phenylalanines appear mainly grouped on one side of the sheet so forming a particularly hydrophobic face available to interact with other peptides or with another preformed sheet. The conformation extracted from the second simulation, instead, shows the peptides roughly arranged on two parallel β-sheets each involving four peptides in both parallel and antiparallel mutual orientations. Looking at the evolution of this simulation a four parallel out-of-register β-strands clearly evolves from the two β-sheets structure, the arrangement of remaining peptides is not well defined and still compatible with a two β-sheets structure but also with a precursor of a unique β-sheet.
The lower P 2 values combined with a high β-sheet content observed in octamers of 30 and 31 peptides corresponds instead to a more globular arrangement of aggregates in which, even if the β-strands interact and form β-sheets composed by two or three peptides, their spatial distribution, similar to a spherical symmetry, cannot give rise to alignments displaying a high order degree, as it is calculated in following the parameter P 2 definition (see Section 2.3.4). A substantial difference emerged by comparing the trend of secondary structures percentage and of nematic order parameter among biotinylated and unbiotinylated peptides: unbiotinylated peptides show a larger β-sheet propensity and give rise to globular aggregates characterized by P 2 values below 0.5, biotinylated peptides show a lower β-sheet propensity and self-assemble into more aligned aggregates characterized by P 2 values above 0.5.

Formation of H-bonds
To better understand the role played by biotin in the early events of self-assembly, we need to define the contacts framework acting during our simulations. Both hydrogen bonds and hydrophobic contacts have been monitored during the pre-aggregation and postaggregation phase, as defined in Table 1.
As already stated, hydrogen bonds formed by the ureido group of the biotin are fundamental to ensure the high-binding affinity of the biotin-streptavidin complex. In this perspective, we analysed and compared all hydrogen bonds formed from biotinyl-and acetyl-tryptophyl caps within the octameric systems (see Figure 5). The comparison has been made for the two couples of sequences, B3/4 and B24/30, formed by the same Figure 3. The β-sheet percentage per residue during the postaggregation phase in systems neutralized with NaCl (a) or NaCl 0.015 M (b). (colour legend: 2 as blue squares, B3 as orange diamonds, 4 as yellow upside down triangles, B24 as green triangles, B26 as red triangles pointing to the right, 30 as light blue triangles pointing to the left and 31 dark green horizontal hourglasses). The low β-sheet content in proximity of the Pro residues, as well as the localization of B3, 4 and B26 in the bottom of the graphs, show as the Pro affects the βsheet propensity of the sequences more than the presence of the biotinyl-cap. residues except for the presence of biotinyl-and acetyltryptophyl group as N-terminal cap. We found that the number of H-bonds formed by biotinyl-group exceeds the number of H-bonds formed by the acetyl-tryptophyl group of 31% for the first couple and 44% for the second couple during the post-aggregation phase (orange columns in Figure 5), a similar trend is observed during the pre-aggregation phase (transparent blue column in Figure 5).
The percentage of these H-bonds formed among biotinyl-groups and among acetyl-tryptophyl-groups are, respectively, of 30-35 and 25-27% during the post-aggregation phase, these percentages were 35-37 and 18% during the pre-aggregation phase. These values show that the biotinyl-groups are more involved than the acetyl-tryptophyl-groups into the H-bond network formation in the octamers. Moreover, the significant gap between the percentages of H-bonds formed by the two N-terminations among themselves during the pre-aggregation phase is greatly reduced during the post-aggregation phase, suggesting the importance of the H-bond network between biotinyl-groups already during the pre-aggregation phase.
Looking at Figure 5 further features of the influence of the biotinyl-group and acetyl-tryptophyl group on the aggregation can be highlighted. As expected, both Nterminations form a higher number of H-bonds with the hydrophilic residues that are, sorted by increasing number of H-bonds, Lys-Ser-Thr during the pre-aggregation phase, and Lys-Thr-Ser during the post-aggregation phase.
A relevant difference between H-bonds formed by biotinyl-group and those formed by acetyl-tryptophyl group concerns their EP. During the post-aggregation phase, the AEP of the H-bonds formed by the biotinylgroup is lower than that of H-bonds formed by the acetyl-tryptophyl group. This observation has to be  (Cross, Kuttel, Stone, & Gain, 2009) In the first simulation of B24, the 'hydrophobic wall' made of Btn and Phe residues is particularly visible in the conformation corresponding to the highest P 2 value. In the second simulation, in correspondence of the highest P 2 value, part of the Phe and Btn residues dispose respectively in the centre and on the edges of a reduced layer interposed between two partially structured β-sheets. Figure 5. Percentage of H-bonds formed between biotinyl (left column) or acetyl tryptophyl (right column) peptides and all other residue types present in each sequence. Percentages referred to pre-and post-aggregation phases are coloured in transparent blue and in orange respectively. Residues repeated within the same sequence are labelled by an asterisk and the relative percentage is obtained by dividing the total percentage associated with that residue type by its multiplicity in the sequence. When marked, multiplicity of glycines is three, while for the others is two. Percentage of H-bonds formed between biotinyl or acetyl tryptophyl peptides explicitly associated with all the other residues of each sequence are available in figure S4 of supplementary materials. considered together with the fact that a higher number of H-bonds is formed by the biotinyl-group with respect to acetyl-tryptophyl group, during both the pre-and post-aggregation phase. Moreover, during the post-aggregation phase, the H-bonds formed among the biotinylgroups exhibit an AEP around 0.5%, while the H-bonds formed by acetyl-tryptophyl group among themselves exhibit an AEP around 1%.
A closer look to AEP formed by the N-terminal groups with other residues during the post-aggregation phase highlights other interesting aspects. The residues forming H-bonds with biotinyl-groups with the highest AEP are Pro, where it is present, and Phe followed by Gly and Ser: without averaging the longest H-bond existences are instead Ser and Lys. The residues forming H-bonds with acetyl-tryptophyl group corresponding to the highest AEP are Phe and Ser together with Gly and Ala. Without averaging the longest H-bond EP are instead shown by Thr and Ser followed by Gly and Ala. These observations in addition to the larger number of H-bonds formed by Btn-with respect to Ac-Trp-suggest different typologies of H-bond network linked to presence or absence of Btn-cap: in the first case, the H-bond network is supported by an higher differentiated pool of H-bonds with a shorter average existence length respect to the second case. Moreover, the H-bond network formed by Btn-appears to be on one side more diffused and on the other side more dynamically responsive to conformational changes.

Hydrophobic contacts
The binding pockets of biotin complexes with streptavidin/avidin and with BirA show an extended participation of aromatic residues in the binding interaction. Inside the pockets, indeed, both the valeryl chain and the thiophenyl ring of biotin interact with a sort of 'hydrophobic wall' mainly consisting of the aromatic residues Trp and Phe. This inspired us to monitor the formation of hydrophobic/ aromatic clusters (see Figure 6), described as interactions (see Section 2.3.1) among Btn-rings and Phe side chains Figure 6. Percentage of phenylalanines, biotins or tryptophans organized in clusters: the occupation degree of the clusters is indicated by the colour legend. Clusters formed by Phe (a), Btn or Trp (b) and by Btn/Phe or by Trp/Phe (c) are reported for all the analysed peptides. In (d) is showed an enlargement of the clusters formed by Btn/Phe or by Trp/Phe in peptides differing only by the substitution of the Btn-with Ac-Trp at the N-termination, that is, the couples B3/4 and B24/30. The enlargement evidences the high cluster multiplicity reached in B24, corresponding to the formation of the 'hydrophobic wall' and given by the cooperative interactions between Btn and Phe.
in biotinylated peptides as well as due to purely aromatic interactions among Trp and Phe side chains during the aggregation phases. The Phe cluster multiplicity (panel (a) in Figure 6) reaches the widest spread for peptides B24 and 31(clusters up to 7 Phe) followed by the peptide 2 (cluster up to 6 Phe). The same check, done on clusters formed only by Btn rings or only by Trp rings (panel (b) in Figure 6), shows clusters up to 5-6 rings in both cases, revealing similar propensities in forming clusters of the two different types of rings.
Differences start to emerge when considering clusters of Btn/Phe rings and Trp/Phe rings (panel (c) in Figure 6). Peptide B24, among the biotinylated peptides, exhibits the lower content of Btn or Phe rings in isolated state (blue column in Figure 6) and the highest cluster multiplicity, climbing up to 16 residues in the same cluster. A similar tendency is showed by peptide 31 in respect to the others unbiotinylated peptides. The comparison between peptides B24 and 30 (panel (d) in Figure 6), two iso-sequences except for the N-terminal cap, is intriguing: Btn/Phe rings in B24 give rise more frequently (lower isolated states level) to more populated clusters respect to Trp/Phe rings for peptide 30. Also for the couple of peptides B3 and 4, the more populated clusters and the most recurring clusters are formed by the biotinylated one (B3). Figure 6 shows two different types of cluster formation mechanisms that populate the Btn-Phe and Trp-Phe clusters: the composition of Trp-Phe clusters is mainly additive, as their size is very close to the sum of the separated Trp and Phe clusters, while the size of Btn-Phe clusters is likely given by a cooperative effect as it is greater than the sum of the separated Btn and Phe clusters.
In the simulations of the peptide B24, this cooperative effect likely rely on the formation of the Btn-Phe 'hydrophobic wall' as clearly observable in Figure 4. This behaviour is in agreement with the presence of the uracil moiety of the biotin ring as suggested by literature about the denaturant properties of urea. Urea in fact is able to form large clusters interacting specifically and preferentially with hydrophobic/aromatic residues resulting in the formation of 'local urea clouds' near the first solvation shell of the these residues (Xia, Das, Shakhnovich, & Zhou, 2012). Such a propensity deviates the hydrophobic/aromatic collapse from the inner of the globular oligomers, as observed in the unbiotinylated peptides, to the water interface, as observed in the biotinylated peptides, with which uracil groups form H-bonds. The monitoring of the hydrophobic contacts during the simulation (see Figure 7) clearly shows as their contribution to the assembly process is greater in biotinylated peptides with respect the unbiotinylated. This is particularly evident for B24 with respect to peptide 30. Comparing the behaviour of B3 respect to the peptide 4 instead the difference is not so highly pronounced: the rigidity introduced by the proline residue reduce the 'benefits' given by the biotinyl-group.

Conclusions
Seven BMHP1-derived SAPs sequences were subjected to molecular dynamic simulations in explicit water and salt ions with the aim to identify the role played by the biotinyl-N-termination and other residues in the selfassembly dynamics. Our simulations confirmed the formation of β-sheets, even if out-of-register, as typical feature in the ordered state of the biotinylated peptide B24, according to experimental evidences. Indeed, the βsheet packing ability of peptide B24 has been confirmed by XRD, CD and FTIR experiments while AFM experiments showed its self-assembling into tabular long fibres (Gelain et al. 2011b). Furthermore, the presence of proline, both in biotinylated and unbiotinylated peptides, discourages the aggregation and the formation of β-sheets, confirming its known role as β-sheet breaker (Viet, Ngo, Lam, & Li, 2011;Wood, Wetzel, Martin, & Hurle, 1995). The analysis showed that a larger β-sheet propensity corresponds to the unbiotinylated sequences with respect to the biotinylated ones, but higher mutual orientational orders are reached by the these last peptides, as stated by the values of the nematic order parameter. This order corresponds to the formation of β-structures composed by parallel out-of-register β-strands. Going beyond the backbone H-bonds, the overall H-bond network appears more dense and dynamic in biotinylated peptides, already in the pre-aggregation phase. Likely the denser hydrogen bond network supported by the presence of the biotinyl cap, increases the degree of spatial confinement of the peptides enhancing the propulsive self-assembly effect of the already high peptide concentration. At the same time, the flexibility of such hydrogen bond network, balancing the 'sticky' effect associated with an excess of hydrophobic interactions that could lead to the formation of amorphous oligomers (as observable in the octamers formed by unbiotinylated peptides), favours a process of conformational reorganization toward highly ordered aggregates. The formation of a 'hydrophobic wall' among biotinyl-termination and phenylalanines is observed in correspondence of the 'high-order state'. These results picture the biotinylation effects in the selfassembly dynamics: the presence of the biotinyl-caps leads to the establishment of a dynamic H-bonds network, due to the uracil groups chemistry, and of a 'hydrophobic wall', due to the synergic effect of its hydrophobic (thiophenyl ring and alkyl chain) and hydrophilic (uracil ring) parts able to interact with other hydrophobic residues, mainly aromatic, at the interface with water. The 'hydrophobic wall' acts as a physical support on which β-structures can grow triggering, together with the spatial confinement due to the diffuse H-bonds network, the transition toward an ordered state. Our findings suggest that the ordered self-assembly of small aggregates is quickly activated by the biotinyl Nterminal cap: however, knowing that small ordered aggregates may be ideal precursors of the hierarchically ordered nano-structures seen with BMHP1-derived SAPs, investigations by means of coarse grained techniques are required to extend our observations to larger systems. Finally this work proposes a biotin-activated mechanism of ordered self-assembly that could be useful to obtain high performing nano-structured materials for applications in fields other than regenerative medicine.

Supplementary material
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2013.790848