Large-scale periodicity of nucleosome positioning signal in pericentric regions of chromosomes (Drosophila melanogaster)

Nucleosome positioning signal (NPS) in heterochromatin is not uniform. We propose to analyze its heterogeneity by correlating it with a periodic function (an analog of Fourier analysis). We established that large nucleosome clusters recur periodically, in a discontinuous manner, in pericentric regions. In the 3L pericentric region, the correlation coefficient profile is dominated by 78–85 kbp wavelengths, and a 50 kbp signal is also strongly represented. Further towards the centromere, the 69 kbp wavelength strongly dominates, and the 50 kbp wavelength dominates in the closest proximity to the centromere. In addition to these long-wavelength signals, there are many short-wavelength signals, especially in the immediate vicinity of the centromere. At some positions in the pericentric region of the 2L chromosome, correlation signals of two repeat sizes (50 and 75 kbp) are intermingled, with 75 kbp dominating near the centromere and 50 kbp dominating farther from it; the situation for 2R is analogous. Some genes with long introns support these quantitative characteristics of NPSs and, to some extent, their dominance in each region. The characteristic repeat periods for the 3L pericentric region coincide with the distances between clusters of heterochromatin epigenetic marks, and with their distribution throughout this region, in fly embryos, larvae, and some cell lines.


Introduction
Previous studies have shown that pericentric heterochromatin is not uniform in several respects: genes inserted into its regions show variegated expression (Schotta, Ebert, Dorn, & Reuter, 2003), and expression levels are influenced by surrounding sequences or their features (Howe, Dimitri, Berloco, & Wakimoto, 1995). The coexistence of active genes (light, concertina, AGO3, and others) with silenced regions (Sun, Cuaycong, & Elgin, 2001), together with an abundance of simple repeats and repeats of other classes, also attests to the complexity of heterochromatin. Characteristic features of pericentric chromatin are the specific histone modification H3K9me2 (Noma, Allis, & Grewal, 2001; Stewart, Li, & Wong, 2005; Verschure et al., 2005) and the interaction of nucleosomes with heterochromatin proteins (HP1 and its variants) that recognize this epigenetic signature (Jacobs & Khorasanizadeh, 2002; Nielsen et al., 2002). Heterogeneous profiles of HP1 binding within some regions (De Wit, Greil, & van Steensel, 2007) and of histone modification (dimethylation of lysine 9 of histone H3; Riddle et al., 2011) are evidence of heterochromatin discontinuity. Induction of silencing by small RNAs cognate to retrotransposons, acting with a multi-protein complex (members of the Argonaute family, HP1, histone H3-specific methyltransferases, and others), has been shown for heterochromatin regions (Fagegaltier et al., 2009; Pal-Bhadra et al., 2004). However, the spread of silencing from the initiation point is limited to 10 kbp, at least for chromosome 4 (Haynes, Gracheva, & Elgin, 2006). In pericentric heterochromatin, this limitation is probably weakened by the high density of cognate small RNAs (Aravin et al., 2003; Reinhart & Bartel, 2002). It remains unclear how dense heterochromatin silences transposable elements yet permits expression of the genes embedded in it.
In pericentric heterochromatin, micrococcal nuclease (MNase) digestion revealed 10-nucleosome arrays, compared with 5- or 6-nucleosome arrays as the largest fragments observed for the genome as a whole (Sun et al., 2001). Knowledge of precise nucleosome locations in this region is essential for elucidating the context of the various features of pericentric heterochromatin and its influence on chromosome architecture. We are able to map the nucleosome occupancy signal along the DNA on a large scale and to analyze its features. We used a previously published algorithm that calculates nucleosome positioning on the basis of two hierarchically ordered trinucleotide matrices, applied adaptively to AT-rich or GC-rich fragments (Fedoseyeva & Alexandrov, 2007).

Materials and methods
Nucleotide sequences were taken from GenBank (Release 5): for the 2L chromosome, from 21,800 kbp up to the satellite repeats (1200 kbp in length); for 2R, from 1 to 1200 kbp; for the 3L chromosome, from 22,730 kbp up to the centromere (about 1800 kbp in length); and for 3R, from the first nucleotide up to 600 kbp. The nucleosome positioning signal (NPS), in conventional units, was calculated according to the method of Fedoseyeva and Alexandrov (2007) using a program written in Turbo Pascal. The initial value is the maximum NPS value, reported approximately every 10–11 bp; the final NPS value is the average of the initial values in a 400 bp window. The analysis consisted of calculating the correlation coefficient between the NPS and a sinusoidal function whose wavelength and phase were varied (an analog of Fourier coefficients). We chose the sine function among the various periodic functions because it gave the most distinct spectra. Each point in a spectral profile is the maximum value over all tested phases (phase increment π/6 rad, phase interval [0, 2π]), so each spectral profile is the dependence of the correlation coefficient signal (CCS) on the wavelength of the sine function with which the NPS is compared. NPS/CCS was monitored in sliding windows from 200 to 600 kbp long, and 1000 kbp long in special cases. Unless stated otherwise, the window was displaced in steps of 50 kbp for 200 and 300 kbp windows and 100 kbp for 400, 500, and 600 kbp windows. The scale interval in all CCS spectrum figures equals 0.3. This analysis was implemented in Excel (Microsoft Office package).
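The procedure above can be re-expressed as a minimal Python/NumPy sketch. It is not the authors' code (which used Turbo Pascal and Excel), and it rests on stated assumptions: the input is already one NPS maximum per ~10 bp step (the trinucleotide-matrix scoring itself is not reproduced), the 400 bp averaging is applied as non-overlapping bins, and the phase grid is the 12 values kπ/6 (2π coincides with 0). The function names `bin_average`, `ccs_spectrum`, and `sliding_spectra` are hypothetical.

```python
import numpy as np

# Phase grid: steps of pi/6 over [0, 2*pi); 2*pi is the same phase as 0.
PHASES = np.arange(12) * np.pi / 6


def bin_average(raw_nps, step_bp=10, bin_bp=400):
    """Average per-step NPS maxima (one value per ~step_bp) in non-overlapping bins.

    Assumption: the 400 bp averaging window is applied as non-overlapping bins.
    """
    per_bin = max(1, int(round(bin_bp / step_bp)))
    n = len(raw_nps) // per_bin * per_bin  # drop an incomplete trailing bin
    return np.asarray(raw_nps[:n], dtype=float).reshape(-1, per_bin).mean(axis=1)


def ccs_spectrum(nps, bin_bp, wavelengths_bp):
    """CCS profile: for each wavelength, the maximum Pearson r over all phases.

    Wavelengths should be much longer than bin_bp so the sine is well sampled.
    """
    x = np.arange(len(nps)) * bin_bp  # bin positions in bp, relative to window start
    spectrum = np.empty(len(wavelengths_bp))
    for i, lam in enumerate(wavelengths_bp):
        spectrum[i] = max(
            np.corrcoef(nps, np.sin(2 * np.pi * x / lam + phi))[0, 1]
            for phi in PHASES
        )
    return spectrum


def sliding_spectra(nps, bin_bp, window_bp, shift_bp, wavelengths_bp):
    """CCS profile for every window position; returns (start_bp, spectrum) pairs."""
    w = window_bp // bin_bp
    s = shift_bp // bin_bp
    return [
        (start * bin_bp, ccs_spectrum(nps[start:start + w], bin_bp, wavelengths_bp))
        for start in range(0, len(nps) - w + 1, s)
    ]


# Usage on a synthetic track: an 80 kbp periodicity plus noise, 1200 kbp long.
rng = np.random.default_rng(0)
x_bp = np.arange(3000) * 400
track = np.sin(2 * np.pi * x_bp / 80_000) + 0.5 * rng.normal(size=x_bp.size)
wavelengths = np.arange(20_000, 150_001, 2_000)
for start, spec in sliding_spectra(track, 400, 300_000, 50_000, wavelengths)[:3]:
    print(start, wavelengths[np.argmax(spec)])
```

Because the phase is sampled only every π/6, the peak can land one grid step away from the true wavelength when a window's phase falls between grid points; a finer phase grid narrows that error at extra cost.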

Results and discussion
Discontinuous periodicity of signals in the pericentric heterochromatin region
Figure 1 presents the profiles of the correlation coefficient signal (CCS) between the NPS and a sinusoidal function as a function of wavelength. The size of a nucleosome occupancy cluster is estimated as half of the sinusoidal wavelength with the maximal correlation value. The scan was performed for different positions of sliding windows (200, 300, 400, and 500 kbp in length; Figure 1(A) a–u, (B) a–l, (C) a–i, (D) a–h), arranged from bottom to top, with 50 or 100 kbp window displacements throughout the total nucleotide fragment (1200 kbp). The initial nucleotide corresponds to position 21,800,001 of 2L (D. melanogaster). Some regions of the pericentric heterochromatin area correspond to long wavelengths with significant correlation coefficient values. Based on our observations, it is useful to distinguish the closest-proximity, proximal, intermediate, and distant regions of this heterochromatin zone relative to the centromere. The portion nearest to the centromere is characterized by 38 kbp (Figure 1(A), u, peak III) and 60 kbp (Figure 1(A), t, peak IV) peaks, and the proximal portion by 45–50 kbp (peak II) peaks together with a dominating 80 kbp (peak I) peak. In the distant portion, a wavelength of approximately 50 kbp dominates (Figure 1(B), c, e), and there is also a fuzzy peak V (approximately 60 kbp in wavelength; Figure 1). The intermediate region has weakly expressed spectral characteristics. Notably, the CG40006 gene (~150 kbp; Figure 1(E)), with short exons and long introns enriched in numerous sequence repeats, contributes significantly to the dominance of the 80 kbp signal. As shown in Figure 1(F), comparing the distant and proximal regions (each 600 kbp in length) reveals a distinct dominance of the 80 kbp signal in proximity to the centromere.
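Stated compactly, in our own notation (N for the smoothed NPS track, x for position in bp, λ for the sine wavelength), the CCS profile defined in Materials and methods and the cluster-size estimate used here are:

```latex
r(\lambda) = \max_{\varphi \in \{k\pi/6 \,:\, k = 0,\dots,11\}}
  \operatorname{corr}\!\left(N(x),\ \sin\!\left(\frac{2\pi x}{\lambda} + \varphi\right)\right),
\qquad
s = \frac{\lambda^{*}}{2},
```

where λ* is the wavelength of a peak (local maximum) of r(λ). For example, peak I at λ* ≈ 80 kbp gives s ≈ 40 kbp, i.e., occupancy clusters of about 40 kbp recurring every 80 kbp.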
As shown in Figure 1(D), there are some fuzzy peaks (peak VI and similar) in the long-wavelength region, corresponding to the distances between regions of higher NPS. The histogram of the average NPS level (Figure 1(G)) also shows a non-uniform profile, in which at least two regions of higher NPS can be distinguished.
The 3L chromosome portion from 22,730 kbp up to the centromere (1800 kbp in total) contains a euchromatic region in addition to the heterochromatic one. Euchromatic features (low density of various repeats, dense presence of short and middle-length genes, short intergenic spaces) are clearly discernible in the first 200 kbp, as shown in Figure 3(A) (a–d), Figure 3(B) (a–b), and Figure 4(A). According to the CCS profiles, this part has multiple insignificant peaks at wavelengths up to 32 kbp and minimal, if any, signal at longer wavelengths. The boundary of this spectrally distinct interval approximately coincides with the epigenetic border, i.e., an abrupt, strong enrichment of H3K9me2 marks together with an increase in repeat density that persists further towards the centromere (Riddle et al., 2011).
Strong dominating signals corresponding to 78–85 kbp wavelengths (Figure 3, peak I) and significant CCSs at approximately 50 kbp (Figure 3, peak II) are present from 200 kbp up to 1200 kbp of the current scale, in this order towards the centromere (Figure 3(A) f–t and Figure 3(B) c–k; informative features in Figure 4(A–D)). The dominance of peak I over the others was confirmed by averaging the spectral profiles from passages (f) to (t) (Figure 3(A), 300 kbp window length) throughout the 1000 kbp fragment. This generalized CCS profile is shown in Figure 3(E). Figure 3(F) shows the CCS profile for a 1000 kbp window with the same boundary points. The dominance of peak I even for a 1000 kbp window indicates a considerable quality of NPS phasing. Further towards the centromere, there is a distinct strong signal with a 69 kbp wavelength (Figure 3(D), peak III, in the 1050–1500 kbp range of the current scale), and, close to the centromere, a distinct peak corresponding to 50 kbp (Figure 3(A)–(C), peak IV, in the 1550–1750 kbp interval of the current scale).
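The generalized profile of Figure 3(E) is described as an average of the per-window spectra. A minimal sketch of that step, assuming all spectra share one wavelength grid and are averaged with equal weight (the weighting is not stated in the text); the helper names and toy numbers are hypothetical.

```python
import numpy as np


def generalized_profile(spectra):
    """Equal-weight mean CCS across window positions (rows) for each wavelength (columns)."""
    return np.asarray(spectra, dtype=float).mean(axis=0)


def dominant_wavelength(wavelengths_bp, profile, lo_bp=None, hi_bp=None):
    """Wavelength with the highest CCS, optionally restricted to [lo_bp, hi_bp]."""
    wl = np.asarray(wavelengths_bp)
    mask = np.ones(wl.size, dtype=bool)
    if lo_bp is not None:
        mask &= wl >= lo_bp
    if hi_bp is not None:
        mask &= wl <= hi_bp
    idx = np.flatnonzero(mask)
    return int(wl[idx[np.argmax(np.asarray(profile)[idx])]])


# Toy spectra for three window positions on a three-point wavelength grid.
wl = [50_000, 69_000, 82_000]
spectra = [[0.2, 0.1, 0.5], [0.4, 0.3, 0.3], [0.1, 0.2, 0.6]]
profile = generalized_profile(spectra)
print(dominant_wavelength(wl, profile))                # 82000
print(dominant_wavelength(wl, profile, hi_bp=70_000))  # 50000
```

Averaging suppresses peaks that appear in only a few windows, which is why a peak that survives it, as peak I does here, can be read as dominant across the whole fragment.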
The histogram in Figure 3(G) presents the average NPS values, in conventional units, calculated for each sliding-window location in the passage from (a) to (b1) (Figure 3(A)). Its crests and troughs reflect the non-uniformity of heterochromatin, influenced by specific types of transposable elements as well as by gene locations.
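A histogram of this kind plots one average per window location. A minimal sketch of that computation, assuming a binned NPS track at 400 bp resolution as described in Materials and methods; `window_means` is a hypothetical helper and the toy track is synthetic.

```python
import numpy as np


def window_means(nps, bin_bp, window_bp, shift_bp):
    """Mean NPS for each sliding-window position; returns (start_bp, mean) pairs."""
    nps = np.asarray(nps, dtype=float)
    w = window_bp // bin_bp
    s = shift_bp // bin_bp
    return [(start * bin_bp, float(nps[start:start + w].mean()))
            for start in range(0, len(nps) - w + 1, s)]


# Toy track at 400 bp resolution: 200 kbp at level 1.0, then 200 kbp at level 3.0.
track = np.repeat([1.0, 3.0], 500)
for start, mean in window_means(track, 400, 200_000, 100_000):
    print(start, mean)  # prints 0 1.0, then 100000 2.0, then 200000 3.0
```

Overlapping windows smooth the transition between levels, so a sharp change in the underlying track appears as an intermediate bar in the histogram.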
Figure 4(A)–(F) depicts the structural features of the total fragment, from 22,730 kbp up to 24,543 kbp of the 3L arm, in linear form. Viewing the NPSs from the beginning to the end of the fragment makes it possible to verify the details of the periodic properties detected above and to compare nucleosome occupancy in the genes encountered. The gene scheme shows that almost all genes with long introns are transcribed towards the centromere; CG40470 and jim are the exceptions, with the opposite direction. This observation is relevant because, in fission yeast (Lengronne et al., 2004), transcriptional direction is tightly connected with the site occupancy of one of the cohesin proteins.
Among the repeats represented at various levels of the scheme, two main types are abundant and to some extent interchange: retrotransposon elements (the Gypsy family flanked by long terminal repeats, Copia, Pao, and others) and the LINE class (Jockey, Penelope, CR1, and R1 families); DNA-class elements and short or long simple and AT-rich low-complexity repeats are also present. Regions of high NPS value correspond to high repeat density. In detail, however, no strong correlation can be detected between the density of each repeat type and the NPS level, except for short and long AT-rich repeats, which have a positive and a negative impact on NPS level, respectively.
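The per-type comparison above could be made quantitative by correlating, window by window, the fraction of sequence covered by one repeat type with the mean NPS. This is a sketch of one way to do that, not the analysis performed here (the text reports an inspection of the scheme); interval coordinates would come from a repeat annotation, and the helper names and toy intervals are hypothetical. Running it separately for each repeat type gives one correlation coefficient per type.

```python
import numpy as np


def repeat_density(intervals, region_start, n_windows, window_bp):
    """Fraction of each non-overlapping window covered by [start, end) repeat intervals.

    Assumes the intervals of one repeat type do not overlap each other.
    """
    covered = np.zeros(n_windows)
    for start, end in intervals:
        for i in range(n_windows):
            w0 = region_start + i * window_bp
            w1 = w0 + window_bp
            covered[i] += max(0, min(end, w1) - max(start, w0))
    return covered / window_bp


def density_nps_correlation(intervals, nps_window_means, region_start, window_bp):
    """Pearson r between per-window repeat density and per-window mean NPS."""
    density = repeat_density(intervals, region_start, len(nps_window_means), window_bp)
    return float(np.corrcoef(density, nps_window_means)[0, 1])


# Toy example: three 10 kbp windows and two hypothetical AT-rich repeat intervals.
at_rich = [(0, 5_000), (10_000, 20_000)]
print(repeat_density(at_rich, 0, 3, 10_000))
print(density_nps_correlation(at_rich, [2.0, 3.0, 1.0], 0, 10_000))
```

A Pearson coefficient captures only linear association; a rank correlation would be a reasonable alternative if the density–NPS relation is monotonic but curved.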
If this information is transferred to the geometric proportions of chromosome architecture, then for 3L the dominating wavelengths (78–85 kbp) in the distant heterochromatin area most probably correspond to the circumference of a cylinder used as a chromosome model, in the simple case of a single locally dominant peak. This representation probably concerns the G2 and prometaphase stages of the cell cycle. Half of each of these wavelengths approximately corresponds to the nucleosome occupancy cluster size, and the dominating wavelengths themselves correspond to the repetition period of the clusters. Moving towards the centromere, the circumference gradually diminishes from 78–85 kbp (Figure 3(B) j–k, peak I, throughout the 500 kbp fragment; we chose the peak of maximum wavelength in this range) to 69 kbp (Figure 3(B) m, Figure 3(D), peak III, throughout the 300 kbp fragment) and finally to 50 kbp (Figure 3(B) o, Figure 3(C), peak IV, throughout the 200 kbp fragment). This gradual decrease of the dominating wavelengths may indicate a tendency for the chromosome cylinder circumference to change towards the centromere. Immediately adjacent to the centromere, there is a CCS peak corresponding to a wavelength of approximately 8 kbp, i.e., nucleosome occupancy clusters about 4 kbp in size occurring periodically at 8 kbp intervals (in the 1740–1775 kbp region of the current linear scale). These nucleosome occupancy repeats, corresponding to short-wavelength signals, may be connected with kinetochore function.
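The trend described above (78–85, then 69, then 50 kbp) amounts to tracking a band-limited dominant wavelength per window, with cluster size taken as half of it. A minimal sketch, assuming per-window spectra on a common wavelength grid; it picks the highest CCS inside a band, whereas the text sometimes chooses the peak of maximum wavelength within a range, so results can differ. Restricting the band is also how the short 8 kbp signal can be separated from the long-wavelength trend. The helper name, band limits, and toy spectra are hypothetical.

```python
import numpy as np


def dominant_trace(window_spectra, wavelengths_bp, lo_bp, hi_bp):
    """For each (start_bp, spectrum) pair, return (start_bp, peak wavelength, cluster size).

    The peak is the highest CCS inside [lo_bp, hi_bp]; cluster size is half of it.
    """
    wl = np.asarray(wavelengths_bp)
    band = (wl >= lo_bp) & (wl <= hi_bp)
    trace = []
    for start, spectrum in window_spectra:
        spectrum = np.asarray(spectrum, dtype=float)
        lam = int(wl[band][np.argmax(spectrum[band])])
        trace.append((start, lam, lam // 2))
    return trace


# Toy spectra mimicking a shift from ~82 to 69 to 50 kbp towards the centromere.
wl = [8_000, 50_000, 69_000, 82_000]
spectra = [
    (0, [0.1, 0.2, 0.3, 0.6]),
    (500_000, [0.1, 0.2, 0.5, 0.3]),
    (1_000_000, [0.7, 0.5, 0.2, 0.1]),  # the 8 kbp peak lies outside the band
]
for row in dominant_trace(spectra, wl, 40_000, 90_000):
    print(row)
```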

Influence of genes with long introns
Figure 5(A) presents the CCS profiles for individual genes with long introns located in the different heterochromatin regions of the 3L chromosome arm discussed above. In each location, these genes strongly influence the whole profile, especially through their long introns full of various repeats. Notably, upon elimination of two genes, AGO3 and CG70470, from the 1000 kbp fragment (the central portion of pericentric heterochromatin), peak I (82 kbp) loses its dominance in the spectral profile to some extent, as shown in Figure 5.
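The text does not state how the two genes were eliminated before the spectrum was recomputed. One simple option, sketched here as an assumption, is to drop the NPS bins inside the gene coordinates and join the flanks before running the same CCS spectrum calculation; an alternative is to replace those bins with the track mean, which keeps positions but flattens the signal there. The helper name and coordinates are hypothetical.

```python
import numpy as np


def excise_intervals(nps, bin_bp, region_start, intervals):
    """Drop NPS bins whose center lies in any [start, end) interval; join the rest.

    Assumption: removed genes are excised and the flanking bins concatenated.
    """
    nps = np.asarray(nps, dtype=float)
    centers = region_start + np.arange(nps.size) * bin_bp + bin_bp / 2
    keep = np.ones(nps.size, dtype=bool)
    for start, end in intervals:
        keep &= ~((centers >= start) & (centers < end))
    return nps[keep]


# Toy track of ten 400 bp bins; a hypothetical gene spans 800-2000 bp.
print(excise_intervals(np.arange(10.0), 400, 0, [(800, 2_000)]))
```

Concatenation creates a junction that can shift the phase of periodic components downstream of it, so a drop in peak dominance after excision may partly reflect that junction rather than the removed sequence alone.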

Comparison with epigenetic marker and HP1 localization data
It is instructive to compare our NPS profile with the high-resolution H3K9me2 enrichment data for constitutive heterochromatin and euchromatin–heterochromatin junctions (Yasuhara & Wakimoto, 2008). A detailed comparison showed that the two profiles (NPS and H3K9me2 signature) coincide to a significant extent for distinct genes such as CG40006, light, CG17018, and Nipped-A and their surroundings, as well as for the euchromatin–heterochromatin junctions of the 2L and 3L chromosomes. Another possibility is to compare the NPS data with the detailed H3K9me2 profile of the 3L pericentric region for different fly developmental stages and cell lines (Riddle et al., 2011). Especially for embryos and larvae, and partially for fly heads, these mark profiles are characterized by clusters of signal enrichment whose inter-cluster distances resemble the periodicity features of the NPSs. For cell lines, the cluster repetitions are in some cases less distinct. The repeat density profile also has a somewhat wavy character (Riddle et al., 2011), but without strong coherence with the epigenetic marks or with the NPS/CCS periodicity according to the data presented here.
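The resemblance between inter-cluster distances of H3K9me2 enrichment and NPS periodicity could be checked numerically by calling clusters in a binned mark profile, measuring the spacing between their centers, and comparing those spacings with the dominant CCS wavelengths. A minimal sketch, assuming a binned enrichment track and a fixed threshold (both choices are ours, not from the cited studies); the helper names are hypothetical.

```python
import numpy as np


def cluster_centers(signal, bin_bp, threshold):
    """Centers (bp) of contiguous runs of bins whose signal exceeds threshold."""
    above = np.asarray(signal) > threshold
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    starts, ends = edges[0::2], edges[1::2]  # each run covers bins [start, end)
    return (starts + ends) / 2 * bin_bp


def inter_cluster_distances(signal, bin_bp, threshold):
    """Distances (bp) between consecutive cluster centers."""
    return np.diff(cluster_centers(signal, bin_bp, threshold))


# Toy mark profile at 400 bp resolution with two enriched runs.
profile = [0, 1, 1, 0, 0, 1, 0]
print(cluster_centers(profile, 400, 0.5))
print(inter_cluster_distances(profile, 400, 0.5))
```

The resulting distances depend on the threshold, so in practice one would check that the spacing distribution is stable over a range of thresholds before comparing it with the 50–85 kbp wavelengths.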
Because bipolar properties are now ascribed to heterochromatin protein 1 (HP1), which participates in silencing and at the same time affects gene expression depending on gene length (De Wit et al., 2007; Lundberg, Stenberg, & Larsson, 2013), it was of interest to compare NPS and HP1 occupancy levels. High-resolution HP1 occupancy data were available only for restricted regions in the 2L pericentromere, and they preferentially coincide with the NPS data (lt and cta genes; the 21,800–22,200 kbp portion of 2L; 0–1200 kbp of the 2R arm).
Higher nucleosome occupancy may create conditions or prerequisites for transition to the silenced state, supporting the initiation stages of nucleosome modification by complexes of small RNAs and protein modifiers, followed by recruitment of HP1 or HP-type proteins. One of these stages is the recruitment of cohesion-protecting proteins to HP1 in pericentric heterochromatin. This view relies on data on the recruitment of cohesion-protecting proteins by HP1 (Kiburz et al., 2005; Sakuno & Watanabe, 2009); in this view, it is essential that NPS clusters, and perhaps HP1, show some preference for localizing on approximately one side of a half-pipe (the half-pipe model).
Summarizing these data, we emphasize that these spectra reflect the variegated character of heterochromatin in the pericentromeric area. These features correlate with the variable locations of short and long genes as well as with repeat density. The consistently recurring values in the 50–85 kbp wavelength range, with some interruptions, for each chromosome argue in favor of a non-constant chromosome diameter. The half-pipe preference in nucleosome positioning can be interpreted within the cylinder model in connection with the functions of protective proteins which, by their location, protect the cohesion complex of sister chromatids from inappropriate destruction (Kiburz et al., 2005; Sakuno & Watanabe, 2009).
Comparing the 2nd and 3rd chromosomes, we emphasize that the constitutive heterochromatin area of the 2nd chromosome is distributed over the right and left arms adjacent to the centromere, whereas for the 3rd chromosome the analogous area is located predominantly on the left arm. The part of the right arm adjacent to the centromere is enriched in densely packed short genes with short intergenic distances, and heterochromatic properties appear only more distantly (>700 kbp), where large intergenic regions are present. Thus, constitutive heterochromatin properties are more concentrated on the left arm of the 3rd chromosome, whereas on the 2nd chromosome they are distributed between the two arms.