Trinucleotide’s quadruplet symmetries and natural symmetry law of DNA creation ensuing Chargaff’s second parity rule

For almost 50 years, no conclusive explanation has been found for Chargaff’s second parity rule (CSPR): the equality of the frequencies of nucleotides A=T and C=G, or of direct and reverse complement trinucleotides, within the same DNA strand. Here, we relate CSPR to the interstrand mirror symmetry in 20 symbolic quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped to the double-stranded genome. The symmetries of the Q-box corresponding to quadruplets can be obtained as a consequence of Watson–Crick base pairing and CSPR acting together. Alternatively, the assumption of a natural symmetry law of DNA creation, namely that each trinucleotide in one strand of DNA must simultaneously appear also in the opposite strand, automatically leads to Q-box direct-reverse mirror symmetry, which in conjunction with Watson–Crick base pairing generates CSPR. We demonstrate quadruplet symmetries in chromosomes of a wide range of organisms, from Escherichia coli to Neanderthal and human genomes, introducing novel quadruplet-frequency histograms and 3D-diagrams with combined interstrand frequencies. These “landscapes” are mutually similar in all mammals, including extinct Neanderthals, and somewhat different in most older species. In human chromosomes 1–12, X, and Y, the “landscapes” are almost identical, and they differ slightly in the remaining smaller and telocentric chromosomes. Quadruplet frequencies could provide a new robust tool for the characterization and classification of genomes and their evolutionary trajectories.


Introduction
Quantitative comparative genomic analysis has revealed several universals of genome evolution that come in the form of distinct distributions of certain quantities or specific dependencies between them. But what is the nature of the genomic universals? Do they reflect fundamental "laws" of genome evolution, or are they "just" pervasive statistical patterns that do not really help us understand biology? (Koonin, 2011). The role of symmetry in natural laws is part of the general, and not yet fully understood, symbiosis of mathematics and natural laws (Gross, 1996; Wigner, 1969a, 1969b). The concept of symmetry has been used to propose models for genomics (e.g., Afreixo, Rodrigues, & Bastos, 2015; Afreixo et al., 2013; Bashford, Tsohantjis, & Jarvis, 1998; Glazebrook & Wallace, 2012; Innocentini, Forger, Ramos, & Hornos, 2010; Kong et al., 2009; Nikolajewa, Friedel, Beyer, & Wilhelm, 2006; Rosandić & Paar, 2014; Yamagishi & Herai, 2011).
Recently, it was pointed out that the CSPR may be maintained in nature by alternating sequence segments with opposite signs of deviation from parity (Rapoport & Trifonov, 2013). The net result was that, irrespective of the presence or absence of the alternation pressure, the CSPR holds for sufficiently long sequences.
Alternatively, it was suggested that the CSPR probably existed from the very beginning of genome evolution, and that the information revealed by modern genome structures in terms of small oligonucleotide frequencies could help reconstruct the primordial genome and further our understanding of the pattern of genome evolution. Symmetries arising from primitive genomes could shed light on the origin of genomes, and even on the origin of life (Sobottka & Hart, 2011; Zhang & Huang, 2008, 2010; Zhang et al., 2013). It was noted that the CSPR can reveal general species-independent properties and has remarkable implications, pointing to some unknown mechanism that seems to be at work (Albrecht-Buehler, 2007a, 2007b; Rapoport & Trifonov, 2013).
In previous investigations of frequency profiles, the trinucleotides were ordered alphabetically, and thus the four trinucleotides corresponding to each quadruplet, which have nearly the same combined double-strand frequency, were scattered and not easily recognizable.
Here we map the concatenated nucleotides of a given genomic sequence into a schematic representation of weighted trinucleotides (embodying their frequencies of appearance), organized into 20 trinucleotide's quadruplets. This is convenient for surveying and displaying frequency profiles and mirror symmetries. The purine-pyrimidine (Pu-Py) symmetry is also clearly seen in the quadruplet representation. These quadruplets embody information about the nucleotide content of the genomic sequence, but are disentangled from information about how these nucleotides are distributed throughout the sequence. This representation is visualized in terms of 20 Q-diboxes, each consisting of two concatenated 2 × 2 Q-boxes (direct-reverse complement and complement-reverse), corresponding to double-stranded DNA.
We study possible causes of symmetries of trinucleotides within double-stranded DNA by introducing the concept of a fundamental law of DNA creation. This concept leads to a regularity of trinucleotide quadruplets between both strands, as the fundamental quadruplet structure of DNA, together with the associated interstrand diagonal crosswise direct-reverse mirror symmetry and the intrastrand higher order mirror symmetry based on the purine-pyrimidine relationship. A consequence of quadruplet symmetries between both strands of DNA is the CSPR within the same strand. The novel concept of trinucleotide's quadruplet frequency profiles may provide a clue to a more profound understanding of genome evolution.

2. Mapping of genomic sequence into 20 double-strand trinucleotide's quadruplets
2.1. Start/stop codon-like trinucleotide matrix and trinucleotide's quadruplets (equivalence class quadruplets)
Here we investigate trinucleotide's quadruplets, each consisting of a trinucleotide, its reverse complement (RC), complement (C), and reverse (R), together with the frequencies of the constituent trinucleotides. Every quadruplet represents an equivalence class (Supplementary material). The choice of representative trinucleotides for the 20 quadruplets is guided by the start/stop codon-like trinucleotide classification scheme (Table 1), based on . In this way, we obtain two quadruplet groups: 10 A+T rich and 10 C+G rich trinucleotide's quadruplets. The first (representative) trinucleotide in each quadruplet is referred to as the direct trinucleotide and denoted D. Note that the choice of representative is a convention: each quadruplet is uniquely determined by any of its four trinucleotides, so any of them could serve as the representative.
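As a minimal sketch (our own Python illustration, not code from this study), the equivalence classes can be generated by applying the four operations to every trinucleotide and collecting the orbits. All function names are hypothetical. The count of 20 follows from a short argument: the 16 trinucleotides of the form xyx satisfy D = R(D) and fall into 8 two-member classes, while the remaining 48 fall into 12 four-member classes. A class is counted as A+T rich when its members carry at least two A/T bases (all members of a class carry the same number).

```python
from itertools import product

# Watson-Crick complement table (A<->T, C<->G)
_COMP = str.maketrans("ACGT", "TGCA")

def complement(t: str) -> str:
    """C(D): complement every base, keep the order."""
    return t.translate(_COMP)

def reverse(t: str) -> str:
    """R(D): reverse the order of the bases."""
    return t[::-1]

def reverse_complement(t: str) -> str:
    """RC(D): complement every base and reverse the order."""
    return complement(t)[::-1]

def quadruplet(d: str) -> tuple:
    """Return (D, RC(D), C(D), R(D)) for a trinucleotide D."""
    return (d, reverse_complement(d), complement(d), reverse(d))

def equivalence_classes() -> list:
    """Partition the 64 trinucleotides into classes closed under C, R and RC."""
    seen = set()
    classes = []
    for bases in product("ACGT", repeat=3):
        d = "".join(bases)
        if d not in seen:
            members = frozenset(quadruplet(d))
            seen |= members
            classes.append(members)
    return classes

classes = equivalence_classes()
sizes = [len(c) for c in classes]
# A class is A+T rich if its trinucleotides carry at least two A/T bases
at_rich = [c for c in classes if sum(b in "AT" for b in next(iter(c))) >= 2]

print(len(classes), sizes.count(4), sizes.count(2))   # 20 12 8
print(len(at_rich), len(classes) - len(at_rich))      # 10 10
print(quadruplet("ATG"))                              # ('ATG', 'CAT', 'TAC', 'GTA')
```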

Correlation between quadruplets from novel trinucleotide classification and Chargaff's second parity rule
In quadruplets starting with trinucleotides D = TAA, ATA, and AAA, the constituent trinucleotides D+RC(D) and C(D) + R(D) have the same nucleotide composition 3A+3T (Table 2). Thus each of these three quadruplets contains 6A+6T nucleotides. For each of the remaining seven A+T rich quadruplets, starting with ATG, TGA, TAG, AAC, AAG, ACA, and AGA, the constituent trinucleotides D+RC(D) and C(D)+R(D) have the same nucleotide composition 2A+2T+1C+1G. Thus, each of these seven quadruplets contains 4A+4T+2C+2G nucleotides.
Analogously, in the C+G rich group of trinucleotides, each of the three quadruplets starting with GCC, CGC, and CCC contains 6C+6G nucleotides. Each of the seven remaining C+G rich quadruplets, starting with CGT, GTC, GCT, CCA, CCT, CAC, and CTC, contains 4C+4G+2A+2T nucleotides. The exact nucleotide frequency equalities A=T and C=G within each quadruplet are also satisfied within its D-RC and C-R segments.
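The composition counts in Table 2 can be checked mechanically. The Python sketch below is an illustration (`quadruplet` and `composition` are hypothetical helper names) that uses the 20 representatives listed above. The equalities are not specific to these representatives: RC(D) carries the Watson-Crick complement of every base of D, and R(D) that of every base of C(D), so A=T and C=G hold in the D-RC and C-R segments for any D.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")

# The 20 direct (representative) trinucleotides named in the text
REPRESENTATIVES = [
    "TAA", "ATA", "AAA", "ATG", "TGA", "TAG", "AAC", "AAG", "ACA", "AGA",  # A+T rich
    "GCC", "CGC", "CCC", "CGT", "GTC", "GCT", "CCA", "CCT", "CAC", "CTC",  # C+G rich
]

def quadruplet(d: str) -> tuple:
    """(D, RC(D), C(D), R(D)) for a trinucleotide D."""
    c = d.translate(_COMP)
    return (d, c[::-1], c, d[::-1])

def composition(*trinucleotides: str) -> Counter:
    """Total base counts over the given trinucleotides."""
    return Counter("".join(trinucleotides))

for d in REPRESENTATIVES:
    q = quadruplet(d)
    segments = (composition(*q), composition(q[0], q[1]), composition(q[2], q[3]))
    # A = T and C = G in the whole quadruplet and in its D-RC and C-R segments
    for seg in segments:
        assert seg["A"] == seg["T"] and seg["C"] == seg["G"]

print(dict(sorted(composition(*quadruplet("TAA")).items())))  # {'A': 6, 'T': 6}
print(dict(sorted(composition(*quadruplet("ATG")).items())))  # {'A': 4, 'C': 2, 'G': 2, 'T': 4}
```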
However many times the quadruplet structure is multiplied, the intrinsic quotient of nucleotides remains the same according to the above scheme. This quotient remains the same even if the D-RC doublets are reproduced with one frequency and the C-R doublets with another, which is the case, as seen from the empirical CSPR: in all species with double-stranded DNA, both prokaryotes and eukaryotes, the CSPR relations f(D) = f(RC(D)) and f(C(D)) = f(R(D)), where f(E) denotes the frequency of a trinucleotide E, are approximately satisfied within the same strand (deviations at most .01-.02).
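These relations can be checked on any single strand by counting overlapping trinucleotides. The Python sketch below works under stated assumptions: the strand contains only the letters A, C, G, T; frequencies are relative to the number of overlapping trinucleotides; and "deviation" is taken as the largest absolute difference |f(D) - f(RC(D))|, which may not be the measure behind the values quoted above. Because RC(C(D)) = R(D), maximizing over all D covers both CSPR relations. Function names are hypothetical.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")

def reverse_complement(s: str) -> str:
    return s.translate(_COMP)[::-1]

def trinucleotide_frequencies(strand: str) -> dict:
    """Relative frequencies of overlapping trinucleotides in one strand."""
    counts = Counter(strand[i:i + 3] for i in range(len(strand) - 2))
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}

def max_cspr_deviation(strand: str) -> float:
    """Max over D of |f(D) - f(RC(D))| within the same strand (assumed measure)."""
    f = trinucleotide_frequencies(strand)
    # Iterating over observed D also covers pairs where only one member occurs
    return max(abs(f.get(t, 0.0) - f.get(reverse_complement(t), 0.0)) for t in f)

print(max_cspr_deviation("ATGCAT"))  # 0.0: the sequence equals its own reverse complement
print(max_cspr_deviation("AAAAAA"))  # 1.0: only AAA occurs, TTT never does
```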
As the length of the DNA sequence considered is reduced, deviations from the CSPR become more and more pronounced, because the intrinsic balance of trinucleotides in the 20 quadruplets is disturbed.
If a sequence is too short (below about 100 kb), the numbers of mononucleotides or oligonucleotides are not statistically relevant, and the rule gradually disappears. This indicates that the absence of the CSPR pattern in shorter sequences is possibly not a feature of DNA itself but a limitation of statistics.
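The statistical side of this argument can be illustrated with a synthetic sequence that satisfies parity by construction: bases drawn independently with equal probabilities, so that f(D) = f(RC(D)) holds in expectation. In the Python sketch below, the sequence, seed, window lengths, and mean-deviation measure are all our assumptions, not data from this study. The measured deviation shrinks roughly as one over the square root of the window length. The sketch does not reproduce the 100 kb threshold, which depends on real genomic data.

```python
import random
from collections import Counter
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]

def mean_cspr_deviation(strand: str) -> float:
    """Mean of |f(D) - f(RC(D))| over all 64 trinucleotides D (relative frequencies)."""
    counts = Counter(strand[i:i + 3] for i in range(len(strand) - 2))
    total = len(strand) - 2
    diffs = (abs(counts[t] - counts[t.translate(_COMP)[::-1]]) for t in TRINUCLEOTIDES)
    return sum(diffs) / (64 * total)

rng = random.Random(1)                                 # fixed seed for reproducibility
genome = "".join(rng.choices("ACGT", k=300_000))       # hypothetical i.i.d. sequence

for length in (300, 3_000, 30_000, 300_000):
    print(f"{length:>7} nt: mean deviation {mean_cspr_deviation(genome[:length]):.5f}")
```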

Interstrand direct-reverse crosswise diagonal symmetry of $Q_{\text{D-RC}}$-box and $Q_{\text{C-R}}$-box
We map a given genomic sequence into a symbolic representation of quadruplets. As an example, consider the quadruplet generated by the trinucleotide D = ATG. It is represented symbolically by the quadruplet boxes $Q_{\text{D-RC}}$-box(D) and $Q_{\text{C-R}}$-box(D), which together form the Q-dibox(D) shown in Figure 1. The $Q_{\text{D-RC}}$-box is a 2 × 2 matrix: the first row corresponds to the trinucleotides D, RC(D) (ATG, CAT) from the top strand, and the second row to their complements C(D), R(D) (TAC, GTA) from the bottom strand (both rows read from left to right).
Analogously, in the $Q_{\text{C-R}}$-box the first row contains C(D), R(D) (TAC, GTA), and the second row contains D, RC(D) (ATG, CAT), read from left to right as in the first row. The concatenated $Q_{\text{D-RC}}$-box and $Q_{\text{C-R}}$-box form the Q-dibox(D).
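The symbolic layout of the Q-dibox can be written down directly from these definitions. In this Python sketch, the function name `q_dibox` and the 2 × 4 list layout are our own choices: columns 0-1 form the $Q_{\text{D-RC}}$-box and columns 2-3 the $Q_{\text{C-R}}$-box. A useful check is that the bottom row is the base-by-base Watson-Crick complement of the top row, as expected for two paired strands read in the same direction.

```python
_COMP = str.maketrans("ACGT", "TGCA")

def q_dibox(d: str) -> list:
    """Symbolic 2 x 4 Q-dibox for a trinucleotide D.

    Columns 0-1: Q_{D-RC}-box; columns 2-3: Q_{C-R}-box.
    Row 0: top strand; row 1: bottom strand; both read left to right.
    """
    c = d.translate(_COMP)        # C(D)
    rc, r = c[::-1], d[::-1]      # RC(D), R(D)
    return [[d, rc, c, r],
            [c, r, d, rc]]

box = q_dibox("ATG")
for row in box:
    print(" ".join(row))
# ATG CAT TAC GTA
# TAC GTA ATG CAT

# Bottom row is the base-by-base complement of the top row (Watson-Crick pairing)
assert " ".join(box[1]) == " ".join(box[0]).translate(_COMP)
# Diagonal mirror symmetry inside the Q_{D-RC}-box: ATG <-> GTA and CAT <-> TAC
assert box[0][0] == box[1][1][::-1] and box[0][1] == box[1][0][::-1]
```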
In fact, the proposed quadruplets allow one to observe the simultaneous influence of Chargaff's first and second parity rules. A clearer explanation of Figure 1 is given by the Q-dibox matrix representation in Figure 2. Given a trinucleotide D, its corresponding Q-dibox is the matrix in Figure 2(A).
Due to Watson-Crick base pairing, each time we see a D in the top strand, we see a C(D) in the bottom strand; each time we see an RC(D) in the top strand, we see an R(D) in the bottom strand; each time we see a C(D) in the top strand, we see a D in the bottom strand; and each time we see an R(D) in the top strand, we see an RC(D) in the bottom strand. Hence (using colors to indicate equal frequencies), the Q-dibox has the frequency pattern shown in Figure 2(B).
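This bookkeeping can be reproduced numerically. In the Python sketch below (an illustration; the toy top strand and function names are hypothetical), the bottom strand is written position by position under the top strand and read left to right, i.e., it is the base-wise complement of the top strand. Each column of the resulting count matrix is then constant, which is exactly the Watson-Crick pattern, whatever the sequence.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")

def trinucleotide_counts(strand: str) -> Counter:
    """Overlapping trinucleotide counts, reading the strand left to right."""
    return Counter(strand[i:i + 3] for i in range(len(strand) - 2))

def q_dibox_counts(top: str, d: str) -> list:
    """2 x 4 count matrix laid out like the Q-dibox of D (rows: top, bottom strand)."""
    bottom = top.translate(_COMP)        # partner strand, aligned base by base, read left to right
    n_top, n_bot = trinucleotide_counts(top), trinucleotide_counts(bottom)
    c = d.translate(_COMP)               # C(D)
    rc, r = c[::-1], d[::-1]             # RC(D), R(D)
    return [[n_top[d], n_top[rc], n_top[c], n_top[r]],
            [n_bot[c], n_bot[r], n_bot[d], n_bot[rc]]]

box = q_dibox_counts("ATGCATTACGTAATGAAAGTA", "ATG")   # hypothetical toy strand
print(box)                     # [[2, 1, 1, 2], [2, 1, 1, 2]]
# Watson-Crick pairing: every column is constant, for any top strand
assert box[0] == box[1]
```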
Next, if the CSPR holds for this genome, then in each strand the frequency of D will be almost the same as that of RC(D), and the frequency of C(D) almost the same as that of R(D). In this case (using colors to indicate that frequencies are almost equal), the Q-dibox matrix becomes the two-frequency pattern shown in Figure 2(C). In both Q-boxes, the frequency of each of 16 symmetric trinucleotides is the same.
Our alternative approach is as follows: due to the crosswise diagonal direct-reverse mirror symmetry between both strands, the frequency of D equals that of R(D) (both denoted $f_1$) in the left half of the Q-dibox matrix, and thus the frequency of D is almost the same as that of RC(D). In this way, the CSPR would arise as a consequence of this symmetry. Analogously, the frequencies in the right half of the Q-dibox are mutually equal (denoted $f_2$). This explains the symmetries displayed schematically in Figure 1. Arguments and mechanisms that may lead to the two-frequency pattern of Figure 2(C) will be discussed in Section 4.
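The step from the interstrand symmetry to the CSPR can be written out explicitly. Let $f_t(X)$ and $f_b(X)$ denote the frequencies of a trinucleotide X in the top and bottom strands, both read left to right as in the Q-dibox; this notation is introduced here for the derivation only. Watson-Crick pairing gives $f_b(X) = f_t(C(X))$, and complement and reversal commute, so $C(R(D)) = RC(D)$:

```latex
\begin{aligned}
f_t(D) &= f_b\big(R(D)\big)     && \text{(interstrand direct-reverse symmetry)}\\
       &= f_t\big(C(R(D))\big)  && \text{(Watson-Crick pairing)}\\
       &= f_t\big(RC(D)\big)    && \text{(since } C(R(D)) = RC(D)\text{)}
\end{aligned}
```

Applying the same chain to C(D), and using $RC(C(D)) = R(D)$, gives $f_t(C(D)) = f_t(R(D))$, the second CSPR relation. Read backwards, the chain gives the converse: if the CSPR holds, then $f_b(R(D)) = f_t(RC(D)) = f_t(D)$.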
The above explanation works not only for trinucleotides but also when D is any mononucleotide or oligonucleotide.
This new method could be useful for encoding the information and visualizing symmetries in double-stranded DNA: the first row of the quadruplet arises from the top strand, and its second row from the bottom strand.
The symbolic quadruplet pattern within the $Q_{\text{D-RC}}$-box has interstrand symmetry with respect to the diagonal. Because of the antiparallel structure of DNA, every trinucleotide in the first row (ATG) has its reverse (GTA) at the diagonally symmetric position in the opposite row of the Q-box, with the same frequency, denoted $f_1$, as shown in Figure 1: ATG-GTA is a mirror-symmetric construction. Their Watson-Crick complements TAC-CAT have the same frequency $f_1$ and also show mirror symmetry (for left-to-right reading). The $Q_{\text{C-R}}$-box is also characterized by interstrand diagonal symmetry, but with another trinucleotide frequency, denoted $f_2$.
In a genomic sequence, the frequency $f_1$ of D and RC(D) generally differs from the frequency $f_2$ of C(D) and R(D) in the same strand when D is nonsymmetrical (for example, ATG). The $Q_{\text{D-RC}}$-box and $Q_{\text{C-R}}$-box then have different frequencies, $f_1$ and $f_2$, respectively. Only when D is symmetrical (for example, ATA) are the frequencies $f_1$ and $f_2$ equal, because then R(D) = D and RC(D) = C(D), so the same frequency appears in both the $Q_{\text{D-RC}}$-box and the $Q_{\text{C-R}}$-box.
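The statement about symmetrical D can be checked exhaustively over all 64 trinucleotides. This Python sketch (illustrative; names are hypothetical) compares the top-row set {D, RC(D)} of the $Q_{\text{D-RC}}$-box with the top-row set {C(D), R(D)} of the $Q_{\text{C-R}}$-box. The two coincide exactly for the 16 trinucleotides of the form xyx, which forces $f_1 = f_2$; for every other D the two Q-boxes hold different trinucleotides, whose frequencies are free to differ.

```python
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")

def q_box_rows(d: str) -> tuple:
    """Top-row trinucleotide sets: {D, RC(D)} and {C(D), R(D)}."""
    c = d.translate(_COMP)
    return {d, c[::-1]}, {c, d[::-1]}

symmetric = []
for bases in product("ACGT", repeat=3):
    d = "".join(bases)
    d_rc, c_r = q_box_rows(d)
    # The two Q-boxes hold the same trinucleotides exactly when D == R(D)
    assert (d_rc == c_r) == (d == d[::-1])
    if d_rc == c_r:
        symmetric.append(d)

print(len(symmetric))     # 16
print(symmetric[:4])      # ['AAA', 'ACA', 'AGA', 'ATA']
```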
Because of the reverse symmetry between the first and second rows within both Q-boxes, each entry of a trinucleotide into the top strand, besides creating its Watson-Crick pair in the bottom strand, also leads to the simultaneous entry of its reverse form (for left-to-right reading) into the bottom strand.
The interstrand reverse symmetry implies only a reversed ordering of nucleotides within a trinucleotide, with no need to substitute one nucleotide within the trinucleotide by another. In this sense, the interstrand direct-reverse symmetry between trinucleotides within a quadruplet is "privileged" over the intrastrand direct-reverse complement symmetry. Namely, the priority of the direct-reverse (D-R(D)) symmetry of quadruplet trinucleotides between the two rows is supported by the fact that, having the same frequency, they consist of the same purines and pyrimidines.
It should be noted that the Q-dibox symmetries can also arise from Watson-Crick base pairing and the CSPR acting together. In particular, the priority of the interstrand direct-reverse symmetry is exactly the Watson-Crick base pairing, which implies that they have exactly the same frequency. We discuss and compare both possible approaches.
An analogous situation appears for mononucleotides (Figure 3). With right-to-left reading in the bottom strand, the entry of one nucleotide into the top strand leads, according to the interstrand crosswise diagonal direct-reverse mirror symmetry, to the entry of an identical nucleotide at some location within the bottom strand (and vice versa). Owing to this principle, DNA can be enlarged during evolution without violating its symmetry pattern.
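A toy version of this growth rule can be simulated. The Python sketch below is a hypothetical model, neither the authors' procedure nor the stochastic model of Sobottka & Hart (2011): at each step a random nucleotide X enters the top strand at a random position, and the same X enters the bottom strand at an independent random position. Seen from the top strand, the bottom-strand X appears as its Watson-Crick partner, so the top strand gains one X and one C(X) per step, and mononucleotide parity (A = T, C = G within one strand) holds exactly, regardless of the insertion positions. The sketch does not address trinucleotide-level symmetry.

```python
import random
from collections import Counter

_PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}

def grow_symmetrically(steps: int, seed: int = 0) -> str:
    """Hypothetical growth rule; returns the top strand (5'->3').

    At each step a nucleotide X enters the top strand at a random position and the same
    X enters the bottom strand at an independent random position. The bottom-strand X is
    recorded in the top strand as its Watson-Crick partner.
    """
    rng = random.Random(seed)
    top = []
    for _ in range(steps):
        x = rng.choice("ACGT")
        top.insert(rng.randint(0, len(top)), x)             # X enters the top strand
        top.insert(rng.randint(0, len(top)), _PARTNER[x])   # X enters the bottom strand
    return "".join(top)

top = grow_symmetrically(5_000)
n = Counter(top)
print(len(top), n["A"] == n["T"], n["C"] == n["G"])   # 10000 True True
```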

Q-dibox and higher order mirror symmetry in the same strand
Within every Q-dibox, there is also a higher order mirror symmetry between the constituent $Q_{\text{D-RC}}$-box and $Q_{\text{C-R}}$-box in the same strand, as shown in Figure 1. Diagonal crosswise direct-reverse mirror symmetry between both strands and higher order mirror symmetry in the same strand are characteristics of all quadruplets and their trinucleotides in every DNA.
Figure 3. Illustration of insertion of mononucleotide A into the symbolic frequency-weighted Q-box. Mononucleotide A is inserted into the top strand and simultaneously, at some displaced position, into the bottom strand. Due to Watson-Crick pairing, each of them binds its complement T. C or G nucleotides enter analogously. Thus a single nucleotide is not built in alone, but always as a member of a quadruplet in both strands. The result of this process is the CSPR. This mononucleotide pattern of quadruplet symmetry leads to the trinucleotide's quadruplet symmetry in Figure 1.
Analyzing the symmetries, we see that each Q-dibox contains one quadruplet in each constituent Q-box. Also, two quadruplets are present in the top and bottom strands of the Q-dibox. By additional combinations between both Q-boxes, we recognize six quadruplets in total. In the whole quadruplet construction of eight trinucleotides encompassing both strands, only two trinucleotides are different: in our example, the direct ATG in the top strand and its complement TAC in the bottom strand. The remaining six trinucleotides are repetitions of the same trinucleotides in the opposite strand or their mirror forms in the same strand, as shown in Figure 1.

Purine/pyrimidine relationship: essential symmetry in double-stranded DNA
Symmetries in double-stranded DNA are based on (A,T) and (C,G) Watson-Crick pairing between strands, i.e., pairing of purine/pyrimidine type. Here, the symmetry between quadruplets of the A+T rich or C+G rich group is also reflected in purine/pyrimidine balance: each quadruplet consists of six purines and six pyrimidines (Table 2). In this way, the purine/pyrimidine symmetry of quadruplets characterizes the whole DNA. In our illustration of the Q-dibox, ATG (D)-CAT (RC(D)), TAC (C(D))-GTA (R(D)), shown in Figure 1, the purine/pyrimidine ratio 6/6 also appears within each Q-box. The Q-dibox of each quadruplet has purine/pyrimidine symmetry based on the diagonal crosswise direct-reverse mirror symmetry within the same Q-box between both strands, and on the higher order mirror symmetry in the same strand between the $Q_{\text{D-RC}}$-box and the $Q_{\text{C-R}}$-box. Both symmetries appear simultaneously between the same purines or pyrimidines (A-A, G-G, etc.). This differs fundamentally from Watson-Crick pairing, which is always of purine-pyrimidine type.
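These purine/pyrimidine statements can be verified for all 64 trinucleotides with a short Python sketch (illustrative; helper names are hypothetical). It checks the 6/6 balance of every quadruplet, that the mirror partners D and R(D) consist of the same bases, and that the Watson-Crick partners D and C(D) pair a purine with a pyrimidine at every position.

```python
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")
PURINES = set("AG")

def quadruplet(d: str) -> tuple:
    """(D, RC(D), C(D), R(D))."""
    c = d.translate(_COMP)
    return (d, c[::-1], c, d[::-1])

def purines_pyrimidines(trinucleotides) -> tuple:
    """(number of purines, number of pyrimidines) over all bases."""
    bases = "".join(trinucleotides)
    n_pu = sum(b in PURINES for b in bases)
    return n_pu, len(bases) - n_pu

for bases in product("ACGT", repeat=3):
    d = "".join(bases)
    q = quadruplet(d)
    _, _, c, r = q
    assert purines_pyrimidines(q) == (6, 6)                              # 6 purines, 6 pyrimidines
    assert sorted(d) == sorted(r)                                        # D and R(D): same bases
    assert all((x in PURINES) != (y in PURINES) for x, y in zip(d, c))   # D and C(D): Pu-Py pairs

print(purines_pyrimidines(quadruplet("ATG")))   # (6, 6)
```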
To make the A+T rich and C+G rich groups comparable in our new classification scheme, they are related on the basis of purine/pyrimidine symmetry, as shown in Table 1.
Crosswise reverse diagonal symmetry within each Q-box creates symmetry between purines or pyrimidines in both strands and simultaneously leads to Chargaff's second parity rule (CSPR) in each strand.
In the ideal genetic code composed of leading and non-leading groups of trinucleotides (Rosandić & Paar, 2014), each consisting of 32 codons, the symmetry between groups is also based on a strong purine/pyrimidine relation, which is not the case in the standard genetic code. We confront both groups of codons by purine-purine or pyrimidine-pyrimidine coupling. In this way we obtain a structure of the genetic code with strong purine-pyrimidine symmetries.
Through purine-pyrimidine organization, we encompass all structures within double-stranded DNA, showing that symmetries lie at the core of DNA creation.

Quadruplet histograms and 3D landscapes of different species
To illustrate the quadruplet symmetries of DNA in practice, we construct novel histograms of trinucleotide's quadruplet frequencies for A+T rich and C+G rich trinucleotides of different species. As an illustration, the frequency distribution is determined for three evolutionarily distant genomic sequences: Escherichia coli (whole genome), Saccharomyces cerevisiae chromosome 4 (the largest chromosome in S. cerevisiae), and human chromosome 1 (Figure 4).
As a consequence of the symmetry f(D, top strand) ↔ f(R(D), bottom strand), the combined frequency $f_1 + f_2$ is approximately equal for all four trinucleotides within each quadruplet, in line with the natural law of DNA creation (see also the enlarged segment in Figure 5). The natural symmetry law of DNA creation is discussed in Section 4.
Owing to the interstrand crosswise diagonal direct-reverse mirror symmetry, double-stranded DNA is more robust, with smaller fluctuations of trinucleotide frequencies than a single strand. A decrease in the frequency of a trinucleotide from a quadruplet in the top strand leads to an increase of its frequency in the bottom strand, or vice versa, while the combined frequency of the trinucleotide stays nearly the same. This balance is particularly pronounced for human C+G rich quadruplets (Figure 4(F)). The A+T rich histograms for S. cerevisiae and for the human genome are rather similar (Figure 4(C) and (E)) despite the large evolutionary distance between these organisms, in accordance with the concept of an ab ovo natural symmetry law of DNA creation.
These results are compared with frequencies for the corresponding random sequences (dashed line in Figure 4). As seen, the frequency histogram of the prokaryote E. coli (Figure 4(A) and (B)) is closest to that of a random sequence. In E. coli the ratio of A+T to C+G occupancy is about 50:50, whereas eukaryotes have an asymmetrical distribution of about 60:40. For eukaryotes, the deviation from the frequencies characterizing random sequences is larger (Figure 4(C)-(F)).
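The text does not specify how the random reference sequences were generated. Assuming independently drawn bases with p(A) = p(T) = (1 - GC)/2 and p(C) = p(G) = GC/2, the expected frequency of a trinucleotide is the product of its base probabilities, as in this Python sketch (function name hypothetical). With 50% GC every trinucleotide has expected frequency 1/64 ≈ 0.0156; with 40% GC, roughly the eukaryotic 60:40 split, AAA rises to 0.3³ = 0.027 and CGC falls to 0.2³ = 0.008.

```python
from itertools import product
from math import isclose, prod

def random_sequence_frequency(trinucleotide: str, gc: float) -> float:
    """Expected frequency of a trinucleotide in an i.i.d. random sequence with GC content gc.

    Assumption: p(C) = p(G) = gc / 2 and p(A) = p(T) = (1 - gc) / 2.
    """
    p = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "C": gc / 2, "G": gc / 2}
    return prod(p[b] for b in trinucleotide)

all_tri = ["".join(t) for t in product("ACGT", repeat=3)]
for gc in (0.5, 0.4):
    freqs = {t: random_sequence_frequency(t, gc) for t in all_tri}
    assert isclose(sum(freqs.values()), 1.0)   # a proper distribution over 64 trinucleotides
    print(gc, round(freqs["AAA"], 4), round(freqs["CGC"], 4))
```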
Larger frequencies in the A+T rich histograms are associated with smaller frequencies in the C+G rich histograms and vice versa, in accordance with the concept of Natural symmetry law of DNA creation.
Figure 4. [...] (Table 1). Vertical axis: trinucleotide frequencies. Heavy shaded segments: top strand frequencies ($f_1$ for D and RC(D), $f_2$ for C(D) and R(D)); light shaded segments: bottom strand frequencies ($f_2$ for D and RC(D), $f_1$ for C(D) and R(D)). Combined frequencies: heavy plus light shaded segments of the columns. The combined frequency $f_1 + f_2$ is nearly equal for all four trinucleotides within each Q-dibox. Dashed horizontal line: frequencies for random sequences.
For symmetrical trinucleotides, the frequencies of the direct and reverse trinucleotides (for example, direct ATA and its reverse ATA) are the same, and the computation gives their sum f(ATA) + f(ATA). Therefore, the computed frequencies for symmetrical trinucleotides are divided by two to obtain the frequency of each trinucleotide in its quadruplet.
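One column group of these histograms can be computed from a single top strand. In the Python sketch below (illustrative; the function name and toy sequence are hypothetical), the bottom strand is read left to right as in the Q-dibox, so an occurrence of X in the bottom strand lies opposite an occurrence of C(X) in the top strand. Because each of the four slots is looked up separately, a symmetrical D such as ATA simply fills two slots with the same value; the division by two described above applies when occurrences of D and R(D) are summed together. By construction, the combined frequency of D equals that of C(D), and that of RC(D) equals that of R(D); equality across all four follows from the CSPR.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")

def quadruplet_histogram(top: str, d: str) -> list:
    """Rows (trinucleotide, top, bottom, combined) for the quadruplet of D.

    Frequencies are relative to the number of overlapping trinucleotides.
    The bottom strand is read left to right: X there sits opposite C(X) in the top strand.
    """
    counts = Counter(top[i:i + 3] for i in range(len(top) - 2))
    total = len(top) - 2
    c = d.translate(_COMP)
    rows = []
    for x in (d, c[::-1], c, d[::-1]):                 # D, RC(D), C(D), R(D)
        f_top = counts[x] / total
        f_bottom = counts[x.translate(_COMP)] / total
        rows.append((x, f_top, f_bottom, f_top + f_bottom))
    return rows

for name, f_top, f_bottom, combined in quadruplet_histogram("ATGCATTACGTAATGAAAGTA", "ATG"):
    print(f"{name}: top {f_top:.3f}  bottom {f_bottom:.3f}  combined {combined:.3f}")
```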
We also perform computations for a broader set of species along the evolutionary chain, from E. coli to Neanderthal and human genomes; their combined double-strand quadruplet frequencies are displayed in a 3D representation (Figure 6).
The combined double-strand quadruplet frequencies are nearly identical for the trinucleotides within each quadruplet across a broad range of organisms. Differences between A+T rich quadruplets in mammals are below .5%, while those for C+G rich quadruplets are slightly higher. In prokaryotes, these differences increase up to 2%. There is an approximately inverse relation between A+T rich and C+G rich quadruplets in all species: when the A+T rich frequencies increase, the C+G rich ones decrease, and vice versa, providing some internal balance between trinucleotides. This is consistent with the symmetry law of DNA creation.
We also compute quadruplet frequency distributions for each human chromosome (Figure 7). Although each human chromosome has its own characteristics, their quadruplet frequency profiles are nearly identical, with pronounced "valleys" and "ridges" in the quadruplet frequency 3D-landscapes. The largest differences from other species are located along the "ridge quadruplets": the nonsymmetrical A+T rich quadruplet TAA, TTA, ATT, AAT (about 2.5%) and the symmetrical quadruplets ATA, TAT, TAT, ATA and AAA, TTT, TTT, AAA (about 1.5%).
In general, the smallest frequency, below 1%, especially in human, is in the C+G rich "valley" of the symmetrical quadruplet CGC-GCG-GCG-CGC. Somewhat more pronounced differences in the C+G rich group occur for E. coli and maize (Zea mays). Relative frequencies of trinucleotides in all 20 quadruplets are almost identical in human and Neanderthal chromosome 1.
Figure 6. [...] (Kelso, 2014; Pruefer et al., 2014) as the only extinct species included in this analysis. Combined two-strand frequencies within the same quadruplet are nearly the same. Differences between mammals are small. There are no differences between quadruplet frequencies for human and Neanderthal chromosome 1.
We find almost identical quadruplet frequency distributions for human chromosomes 1-12, which are larger and none of which is telocentric. The remaining 12 chromosomes, which are phenotypically more different (smaller, and some of them telocentric), show slightly larger differences in quadruplet frequency distributions (Figure 7). This is also in accordance with the natural symmetry law of DNA creation.

The natural symmetry law of DNA creation
We argue that the creation of the quadruplet structure of double-stranded DNA genomes along the evolutionary chain, from prokaryotes to eukaryotes, may be an interstrand phenomenon directed by a natural symmetry law of DNA creation. In this model, each entrance of a mono/oligonucleotide into one strand is automatically accompanied by the entrance of an identical mono/oligonucleotide into the other strand (reading in the 5′ → 3′ direction). Thus, the integrity of double-stranded DNA is preserved. In this way, double-stranded DNA might have been created in evolution ab ovo.
From the point of view of the underlying processes, if the interstrand natural symmetry law holds, then the CSPR holds, and vice versa: if the CSPR holds, then the interstrand symmetry holds. However, in the first situation the primary step preserves the types of the added nucleotides in both strands, while the second situation involves a change of nucleotide type. In this sense, the first situation might be the more significant one.
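The equivalence in both directions can also be seen numerically. With the bottom strand read left to right as the base-wise complement of the top strand, the count of R(D) in the bottom strand equals the count of C(R(D)) = RC(D) in the top strand, so any violation of the interstrand symmetry is, term by term, the same number as a violation of the CSPR. The Python sketch below (illustrative; names and the random test strands are hypothetical) measures both and finds them identical, including for sequences that violate both.

```python
import random
from collections import Counter
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]

def violations(top: str) -> tuple:
    """(max_D |n_top(D) - n_bottom(R(D))|, max_D |n_top(D) - n_top(RC(D))|)."""
    bottom = top.translate(_COMP)          # bottom strand read left to right
    n_top = Counter(top[i:i + 3] for i in range(len(top) - 2))
    n_bot = Counter(bottom[i:i + 3] for i in range(len(bottom) - 2))
    interstrand = max(abs(n_top[d] - n_bot[d[::-1]]) for d in TRINUCLEOTIDES)
    cspr = max(abs(n_top[d] - n_top[d.translate(_COMP)[::-1]]) for d in TRINUCLEOTIDES)
    return interstrand, cspr

rng = random.Random(7)
for _ in range(5):
    top = "".join(rng.choices("ACGT", weights=[3, 2, 2, 3], k=2_000))
    a, b = violations(top)
    assert a == b            # the two measures always coincide
print(violations("AAAAA"))   # (3, 3): neither symmetry holds
print(violations("ATGCAT"))  # (0, 0): both hold
```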
It should be stressed that the interstrand direct-reverse symmetry was already proposed previously as the result of a stochastic process simulating the creation of double-stranded DNA (Sobottka & Hart, 2011). Among other assumptions, it was assumed that a nucleotide type has the same probability (50%) of being added to one strand or to the other. This assumption means that, for large DNA sequences, in accordance with probability theory, the same nucleotide type is added to each strand in approximately equal numbers. Thus, it was proposed that this interstrand symmetry has a purely stochastic mechanism.
Several possible explanations of the CSPR have been presented (for example, Albrecht-Buehler, 2006, 2007b; Fickett et al., 1992; Forsdyke & Bell, 2004; Frenkel & Trifonov, 2012; Kong et al., 2009; Lobry & Lobry, 1999; Sueoka, 1995). An evolutionary mechanism leading to the CSPR was suggested recently (Frenkel & Trifonov, 2012), based on tandem repeat expansion as a major vehicle of genome evolution. Some explanations (Zhang & Huang, 2010) proposed the CSPR as an original trait and relic of the primordial genome, i.e., the most primitive nucleic acid genome of Earth's life would already have possessed the CSPR feature (Zhang & Huang, 2008). One could ask whether a primordial genome characterized by such symmetry might have been related to a kind of natural symmetry law already present in the very process of DNA creation.
In the present framework of trinucleotide's quadruplets, the focus is on interstrand direct-reverse symmetry, without proposing a specific mechanism for this law. The previously proposed stochastic mechanisms can lead to the CSPR as an asymptotic stationary equilibrium state (Sobottka & Hart, 2011), but they might be time consuming. The fact that there is practically no exception to the CSPR among the genomes of thousands of different species across a wide evolutionary range (see also Figure 6) might be considered a hint that the CSPR pattern is achieved on a rather short time scale. This could point to a fundamental symmetry law leading to the immediate establishment of direct-reverse interstrand symmetry during DNA creation.
Figure 7. Quadruplet frequency 3D-diagrams for human chromosomes. There is a high degree of similarity between chromosomes, in particular for chromosomes 1-12 (blue scale), which have rather regular shapes (metacentric, submetacentric, and acrocentric). The smaller and more irregular (telocentric) chromosomes 13-22 differ more from one another. The X and Y chromosomes, although they differ considerably phenotypically, have similar quadruplet frequencies.
On the other hand, we note that a similar line of thought is familiar in the physical sciences, where fundamental natural laws appear as the realization of certain symmetry requirements, without explanation in terms of a specific mechanism (Gross, 1996; Wigner, 1969a, 1969b). A classic example is the emergence of the fundamental law of conservation of energy as a consequence of time-translation symmetry, as proved mathematically by Emmy Noether in 1915 (Byers, 1999; Gross, 1996).
Another possible origin of such symmetry forcing might lie in the general principle of minimum potential energy (Dill, Phillips, & Rosen, 1997; Dinner, Šali, Smith, Dobson, & Karplus, 2000; Doye & Wales, 1996; Hart et al., 2012) applied to the large DNA molecule as a holistic quantum-mechanical system, which is, however, far beyond current computational capabilities.
In general, one might ask whether some universals of genome evolution qualify as "laws of evolutionary genomics" in the sense in which "law" is understood in modern physics (Koonin, 2011). As a less ambitious statement, in the absence of a specific mechanism underlying interstrand symmetry, one could strictly argue at most that if a mechanism existed by which mononucleotides or oligonucleotides are added to each strand with the same frequency, such a mechanism could explain the CSPR. In this way, it is possible to speculate about some unknown symmetry law of DNA creation. We also note that the earlier idea that some features of DNA could be inherited from an ancient genome (Zhang & Huang, 2010) might be related to the present natural symmetry law of ab ovo DNA creation.
We propose that DNA growth might be viewed as programmed from the start by a nonlocal natural symmetry law of DNA creation. In this approach, we map the whole DNA to a set of 20 frequency-weighted symbolic quadruplets of trinucleotides characterized by pronounced symmetries. The CSPR might then emerge automatically within the same strand as an energetically favored solution. An indication in favor of this interpretation might be the previously mentioned empirical finding that the CSPR is present in the double-stranded DNA of thousands of organisms along the evolutionary chain, while it is absent in single-stranded DNA.
The present approach may resemble the well-established role of symmetries in some other fields of science. Here, we consider the interplay of DNA language and symmetry forcing as a possible simple but magnificent aspect of the code of life.

Supplementary material
The supplementary material for this paper is available online at http://dx.doi.org/10.1080/07391102.2015.1080628