A comprehensive analysis of NAC gene family in Oryza sativa japonica: a structural and functional genomics approach

Abstract NAC gene family regulates diverse aspects of plant growth and developmental processes. The NAC DNA binding domains together with cis-acting elements play inter-related roles in regulating gene expression. In this study, an in silico approach for genome wide analysis of NAC gene in Oryza sativa japonica lead to an identification of 11 NAC genes, distributed over 12 chromosomes. A detailed analysis of phylogenetic relationship, motifs, gene structure, duplication patterns, positive-selection pressure and cis-elements of 11 OsNAC genes were performed. Three pairs of NAC genes with a high degree of homology in terminal nodes were observed and were inferred to be paralogous pairs. One conserved NAC domain was analyzed in all the NAC proteins. Only one gene was studied to be intronless and the majority had 2 introns. Segmental gene duplication pattern was predominant in 11 NAC genes. Ka/Ks ratio of 3 pairs of segmentally duplicated gene was substantially lower than 1, suggesting that the OsNAC sequences are under strong purifying selection pressure. NAC74 and NAC71 gene showed the maximum responsiveness for several factors. The paralogous genes, NAC2 and NAC67 were found to have maximum mya values, respectively. They showed maximum difference amongst themselves in all the categories of responsiveness. Responsiveness towards abscisic acid was observed to be absent in NAC67, but present in NAC2, while responsiveness to meristem inducibility was observed to remain absent in NAC2 but present in NAC67. These results would provide a platform for the future identification and analysis of NAC genes in Oryza sativa japonica. Communicated by Ramaswamy H. Sarma


Introduction
Rice (Oryza sativa) is the staple food crop for 2.5 billion people worldwide and have the second highest production after Zea mays (Fairhurst & Dobermann, 2002). Oryza sativa japonica is one of the major subspecies of rice other than Indica and javanica. Oryza sativa genome has a haploid chromosome number (n ¼ 12) of 370 Mb size that encodes 36,000 protein coding genes. In most eukaryotic genes, the expression is dependent on different transcription factors (TFs) which binds to the cis regulatory region of the genes upstream of the promoter and modulate transcription initiation by RNA polymerases. The specificity of these TFs is governed by the presence of their characteristic functional domains (Agarwal et al., 2011). The NAC genes are among the largest family of plant-specific transcription factors with more than 50 transcription factor groups in rice. (Kawahara et al., 2013). NAC gene family originated from three major transcription factors: ATAF1-2 (Arabidopsis thaliana activating factor), NAM (No apical meristem, Petunia) and CUC2 (cupshaped cotyledon), which share a common highly conserved N-terminal DNA binding domain with a more variable C-terminal domain (Aida et al., 1997;Hu et al. 2006;Olsen et al., 2005). The NAC transcription factors function by modulating its target genes via the 'CATGTG' motif (Nakashima et al., 2007) where it initiates the transcription in the promoter region of the target gene.
Alongside cis-acting elements, NAC transcription factors play a significant role in gene expression regulation through the modulation of the genes in nearly all areas of cellular activity. (Xiong et al., 2005). NAC gene family have shown to play a vital role to regulate diverse aspects of plant growth and developmental events like seed development, flower formation, embryo development, shoot apical meristem formation, hormone signaling, fiber development, leaf senescence and cell division (Duval et al., 2002;Greve et al., 2003;Kim et al., 2006;Ko et al., 2007;Liu et al., 2009;Souer et al., 1996;Sperotto et al., 2009). NAC transcription factors have been widely explored in the involvement of abiotic stress response and several studies have previously reported that the over expression of OsNAC5 (Jeong et al., 2013;Takasaki et al., 2010), OsNAC6 (Kikuchi et al., 2000;Nakashima et al., 2007), OsNAC9 (Redillas et al., 2012), OsNAC10 improves the crop's resistance towards drought and salinity (Jeong et al., 2010;Jiang et al., 2019). Additionally, higher expression of OsNAC2 demonstrated an increase in tiller angle of rice height (Chen et al., 2015), increased shoot branching (Mao et al., 2007), enhanced leaf senescence (Mao et al., 2017) and higher crop yield (Jiang et al., 2018).
The availability of several complete plant genomic sequences led to the identification of 151 NAC genes (Nuruzzaman et al., 2010), and of them, only a few have been characterized. Therefore, there is a need for further exploration and detailed comprehension of the specific biological function of various NAC transcription factors to understand how they play role in the regulation of various processes of plant physiology and abiotic stress response such as drought, salinity, cold and heat. Nuruzzamen et al. (2010) demonstrated an evolutionary relationship of NAC in rice (OsNAC) proteins based on their domain's structure (Nuruzzaman et al., 2010). They exhibited through phylogenetic analysis that this large family of transcription factors consists of groups that are closely related to each other (Nuruzzaman et al., 2010). They also performed a comparative analysis of rice NAC proteins with that of Arabidopsis and demonstrated the divergence between NACs of monocot and dicot plants (Nuruzzaman et al., 2010). Furthermore, they experimentally demonstated involvement of NAC genes in abiotic and biotic stresses (Nuruzzaman et al. 2013). Lin et al. (2007) also demonstrated that proteins with similar domains have a similar biological function. NAC48 were reported to target the genes involving in drought stress tolerance (Takasaki et al., 2010). It has also been documented to activate genes like LEA19 that are involved in membrane modification, nicotianamine biosynthesis, glutathione relocation, accumulation of phosphoadenosine phosphosulfate and glycosylation in roots (Lee et al., 2017). NAC2 transcription factor was reported to possess transactivation activity (Hu et al., 2008) and it plays a positive role during dehydration and salt stress by binding specifically to the 5 0 -CATGTG-3 0 motif found in promoters of stressresponsive genes (Hu et al., 2008). NAC22 transcription activator that binds sequence-specific DNA motifs has also been reported to involve in stress response and play a positive role in drought and salt stress tolerance through the modulation of abscisic acid-mediated signaling (Hong et al., 2016). The transcription factor, NAC10 was documented to be associated with male fertility and participates in anther development, but play no role in senescence (Distelfeld et al., 2012). NAC45 transcription activator is also involved in responses to drought stress and salinity stress by trans-activating the stress response genes LEA19 and PM19L (Zheng et al., 2009). Detailed investigation on the evolution of NAC genes, its structure, gene duplication and analysis of cis-regulatory elements is yet to be done and therefore serves as an interesting element of the current study.
The focus of this study is to perform a systematic genome-wide identification and analysis of NAC genes in Oryza sativa japonica, in terms of gene structure, exon-intron map, their phylogenetic relationship, position on the chromosome, conserved motif region, tissue specific expression, cis elements in the promoter, their duplication and evolutionary relationships. The detailed genome-wide identification and analysis of expression and phylogeny of NAC genes would provide a basis for future analysis of NAC genes and to understand their roles in plant development and during the stress response. Analysis of integrated synteny and phylogeny would pave the way to study the biological functions unknown genes using information from their better annotated orthologues and paralogues. These results would provide a platform for the future identification and analysis of NAC genes with their biological functions in other crops species.

Materials and methods
2.1. NAC gene family identification in rice (Oryza sativa subspecies japonica) The total set of genes in the NAC gene family of Oryza sativa japonica was identified from the NCBI database. A total of 110 genes were retrieved from NCBI. These genes were then validated by comparing with the UniProt database (The UniProt Consortium, 2019) to search for the annotated and reviewed gene. After the comparison eleven reviewed NAC genes (i.e. NAC48, NAC2, NAC71, NAC22, NAC10, NAC45, NAC77, NAC68, NAC76, NAC74 and NAC67) were identified. The sequences for genomic DNA and CDS of these genes were retrieved.

Analysis of gene position and structure prediction of NAC gene family in Oryza sativa japonica
The position of each NAC gene was shown in their respective chromosome and their chromosome maps were constructed by Chromosome Map Tools at Oryzabase server (http:// viewer.shigen.info/oryzavw/maptool/MapTool.do) (Kurata & Yamazaki, 2006) where locus ID of each gene extracted from the NCBI database was used as a query for the server. For the analysis of gene structure, Gene Structure Display Server (Hu et al., 2015) was used to construct and find out the exon, intron and UTR within the gene by using CDS and genomic sequence of respective NAC gene.

Physiochemical characterization and subcellular localization analysis of OsNAC gene family
Expasy's ProtParam server (Gasteiger et al., 2005) was used to characterize the physiochemical properties of NAC genes in terms of isoelectric point (pI), molecular weight, grand average hydropathy (GRAVY) values and other properties. Isoelectric point (pI) corresponds to a pH value at which a net protein charge is zero, while GRAVY indicates the extent of hydrophobicity and solubility of the protein. Subcellular localization of 11 NAC proteins were analyzed using PSORTb programme (Yu et al., 2010).

Identification of conserved domains and motif
Pfam database (Punta et al., 2012) was used to identify the NAC specific domain using protein sequence of respective 11 NAC genes as input. Pfam works based on the Hidden Markov Models (HMMs) to identify protein domains from a protein family database (Punta et al., 2012). The ExtractAlign utility of EMBOSS Explorer (http://www.bioinformatics.nl/ emboss-explorer/) (Itaya et al., 2013;Rice et al., 2000) was then used to extract the domain specific sequences using the results obtained from the Pfam. The domain sequences these of eleven NAC proteins were then subjected to multiple sequence alignment utilizing ClustalOmega (Thompson et al., 1994) and the conserved amino acid sequence of NAC genes were then analyzed. The motifs present in the 11 reviewed NAC genes were identified using the MEME (Multiple EM for Motif Elicitation) program (Bailey et al., 2006). To identify conserved motifs in these sequences, the selection of maximum number of motif was set to 10 with minimum and maximum width of amino acid was 6 and 55 respectively. The distribution of one single motif was considered as any number of repetitions while the other factors were of default selections. The conserved regions within NAC genes were executed through MEME (Bailey et al., 2006) and were screened through the Pfam database (Punta et al., 2012).

Cis-regulatory element analysis
PLANTCARE Program was utilized for analyzing plant cisregulatory DNA elements for 11 NAC genes. The sequences of NAC genes were used as input in PLANTCARE Program (Lescot et al., 2002).

Tissue specific expression of NAC gene in Oryza sativa japonica
To get the information on gene expression patterns of various tissues, RNA-seq (RNA sequencing) data was utilized. RNA-seq data is available from Rice Expression Database (RED). It comprises high-quality RNA-Seq data having essential criteria like sequencing reads should be ! 50 bp. Rice Expression Database (RED) (Xia et al., 2017) was utilized to understand the expression of 11 OsNAC genes in different tissues (aleurone, anther, callus, leaf, panicle, pistil, root, seed and shoot). For the different tissues, respective box plots and heat maps were simultaneously plotted using the loci ID and also using the 'number of fragments aligned per kilobases of the transcript per million mappable fragments' (FPKM) expression. The heat map was mean centered by genes using the scale À4 for minimum expression (blue shade) and þ4 (red color) for maximum expression.
2.7. Evolutionary analysis of NAC genes in Oryza sativa japonica 2.7.1. Phylogenetic analysis and paralogous identification of OsNAC gene MEGA10 (Kumar et al., 2018) was used to construct the phylogenetic tree and find out the paralogous genes among eleven NAC genes. Protein sequences of 11 NAC genes were used as input and the tree was constructed using Maximum Likelihood method. JTT substitution model were used, considering uniform substitution rates for each residues and the 1000 bootstrap replications were selected for final construction. The paralogous genes were respectively identified using the phylogenetic tree where the internal node of the tree indicates a gene duplication event and hence paralogous (Xiong, 2008). This result obtained from from phlogenyetic tree was further validated by performing pairwise alignment of each pair of proteins that constituted as internal node in the phylogenetic tree.

Classification of paralogous gene, calculation of Ka/ Ks and duplication time:
Duplication such as tandem or segmental may occur when two paralogous NAC genes are located on the same chromosome or separate chromosomes, respectively. The synonymous (Ks) and non-synonymous substitution rate (Ka) of each NAC paralogous gene and their ratio were calculated to evaluate the selection mode (Siltberg & Liberles, 2002). Ka/Ks calculation server from CBU -Computational Biology Unit was utilized for the purpose (Siltberg & Liberles, 2002). The Ks value calculated for each of the gene-pair was then used to calculate the approximate date of the duplication event 2000). The selection pressure on NAC duplicated genes were evaluated through Ka/Ks ratio and considered purifying, positive and neutral or rare selection when the value of Ka/Ks ratio< 1, > 1 or ¼ 1, respectively (Juretic et al., 2005;Siltberg & Liberles, 2002).

Positive selection analysis
The Selecton 2.2 (http://selection.tau.ac.al) was used to investigate the amino acid under selection pressure within the NAC proteins where CDS sequences of three paralogous NAC genes were given as an input file and one of the gene sequences was used as a reference on which the result was being displayed. Different parameters were of default selection only evolutionary model from the M8 model was changed to the MEC model Stern et al., 2007) in advanced option. Maximum-likelihood test through Bayesian inference method was used to measure the shifted omega ratio between codons within the aligned sequences (Yang, 1997). Various types of selections (positive, negative and neutral) were appeared in the selecton results and evaluated through different scales of color.

Identification and protein characterization of NAC gene family in Oryza sativa japonica
A total of 110 NAC genes from Oryza sativa japonica were extracted from the NCBI database (Suppl. Table S1). Out of which 11 NAC genes were selected from UniProt and NCBI.
The selection was made based on experimental validation (reviewed) at the gene level. In the course of an extensive search for NAC genes, it was found that the 11 genes were non-redundant. These protocols were followed to maintain the quality of the extracted data. After the identification of the reviewed genes, UniProt database was used to identify the protein encoded by a particular gene using the Gene ID extracted from the NCBI database. A total of 11 NAC proteins were identified namely, NAC48, NAC2, NAC71, NAC22, NAC10, NAC45, NAC77, NAC68, NAC76, NAC74 and NAC67. Characteristics of 11 NAC genes of Oryza sativa japonica retrieved from NCBI and UniProt databases are given in Table 1.
The protein sequence of NAC genes ranged from 276 (OsNAC76) to 489 (OsNAC74) amino acid residues with a molecular weight (MW) range of 30.86-54.69 kDa. The average length of all 11 NAC members was 345 amino acids with an average MW of 38.13 kDa. Moreover, 27.27% (3 genes out of 11) OsNAC genes shared comparatively high pI values (more than eight) whereas 18.18% (2 genes out of 11) OsNAC has lowest pI value (less than six). The predicted subcellular location indicates that 5 proteins mainly occupy the nucleus; another 5 occupy the cytoplasm while the remaining one occupies mainly mitochondrion (Table 1). NAC74 and NAC48 hold the lowest and highest pI values of 5.19 and 8.95, respectively.
It was found that 11 of the NAC proteins had a common family of no apical meristem (NAM) domain. It is an N-terminal module of $160 amino acids and is found in proteins of the NAC family of plant-specific transcriptional regulators (Aida et al., 1997). NAC proteins are involved in developmental processes, including the formation of the shoot apical meristem, floral organs and lateral shoots as well as in plant hormonal control and defense (Duval et al., 2002;Xie et al., 2000). The average length of all the domains had 129 amino acids (approximately). The detailed analysis for domain lengths has been tabulated in Table 1. A total of 10 conserved motifs were analyzed using the 11 protein sequences of NAC genes. The schematic distribution of these 10 motifs among NAC proteins are shown in Figure 1. The multilevel consensus sequence for the motifs is shown in Suppl. Table S2. The closely related NAC members as represented in the phylogenetic tree revealed common motif distribution, suggesting functional similarities among the NAC proteins. Some similar proteins consisted of common motif profiles (e.g. NAC045/77, NAC048/71 and NAC067/2) but differences are also observed in motif composition in other proteins. 2 NAC genes i.e. NAC45 and NAC7 had all the nine motifs leaving the tenth motif, while NAC22, NAC10, NAC76 lacked motif 7 that is common to all the 11 NAC genes with motif 1, 2, 3 were common to all the genes. It was also observed that the motif distribution of the paralogous gene pairs as identified in Figure 7 had maximum  similarity. The motif distribution also highlights that the NAC genes in Oryza sativa japonica are supposed to be conserved during the course of evolution.

Gene expression analysis of 11 NAC genes from Oryza sativa japonica
The Rice Expression Database (RED) revealed FPKM (fragments per kilobase per million) expression of 11 OsNAC genes and the log2 transformed heat-map of FPKM values and loci ID were presented Figure 2. It was observed that LOC_Os06g33940 and LOC_Os07g37920 had minimum expression in almost all the tissues, while LOC_Os01g66120 showed the maximum expression level for the tissues, especially for the leaf. LOC_Os01g66120 and LOC_Os11g08210 being paralogous (have been studied in the next subsection) to each other showed the maximum expression amongst all the 11 NAC genes, with the former being the most expressive one. Overall, leaf tissues were identified to have highest expression of NAC genes. Among the 11 identified NAC genes 72.7% (8 out of 11) have been identified to have higher expression in leaves compared to other tissues indicating their role in the physiological and developmental processes in leaves.
To corroborate the results of the heatmap, furthermore, the box-plot has been constructed for each of the OsNAC genes (Suppl. Figures 1-6) . Table S3). However, unlike NAC48, NAC71 demonstrated higher expression in root as well (maximum, minimum and median FPKM was 183.364, 18.6923 and 67.1224 respectively). Panicle tissue, shoot and anther also showed lower expression of NAC71 with lower FPKM values. The striking similarities between NAC48 and NAC71 might be because they are identified as paralogous genes (Figure 7), with common node in phylogenetic tree ( Figure  3) and had similar intron/exon structure (Figure 4), presented later in this study. This observation, thus extablishes a relationship between tissue-specific expression of genes and it's evolutionary relatedness and intron-exon structure.
LOC_Os11g03370 i.e. NAC45 has comparatively lower tissue-specific expression than other NAC proteins (less than 10 FPKM). It demonstrated higher FPKM values in leaf (maximum FPKM: 8.25702), anther (maximum FPKM: 3.08442), root (maximum FPKM: 5.51044) and callus (maximum FPKM: 9.79774) suggesting that this gene is over-expressed in these tissues (Suppl . Table S3). Its expression is minimal in panicle, pistil, aleurone and seed. LOC_Os12g03050 i.e. NAC77, which was identified as the paralogue of NAC45 (Figure 7) in later section of this study has much higher tissue level expression than NAC45. It showed higher FPKM values in aleurone (maximum FPKM: 24.391) and leaf (maximum FPKM: 78.0282) suggesting that this gene is expressed in these tissues (Suppl .  Table S3). Unlike NAC45, anther and callus also showed lower FPKM values, where NAC77 is minimally expressed. In addition like NAC45, NAC77 showed minimal expression in panicle, pistil and seed. Although being paralogous genes, NAC45 and NAC77 showed difference in intron-exon structure (Figure 4). These changes in the gene arcitechture might have a relation with differences in tissue level expression of NAC45 and NAC77.
LOC_Os03g60080 i.e. NAC2 has been over-expressed in leaf (maximum FPKM: 149.982), callus (maximum FPKM: 191.213) and aleurone (maximum FPKM: 304.861) (Suppl .  Table S3). Like NAC45, NAC2 too has the highest expression in callus. The box plot also suggests that root has outliers Figure 2. The heat-map of 11 OsNAC genes was prepared. The FPKM (fragments per kilobase per million) values were log2 transformed and mean centered by genes using the scale -4 for minimum expression (blue shade) and þ4 (red color) for maximum expression to make the heat-map visible for each gene expression.
Color key represents the relative transcript abundance of the NAC genes.
having FPKM values at 1615.14 suggesting that NAC2 gene has fluctuating expression in roots are higly expressed in roots in some instances. Pistil, panicle and anther tissues showed lower expression of NAC2.
Similar to NAC2, LOC_Os07g12340 i.e. NAC67 shows higher FPKM values in aleurone (maximum FPKM: 17.139), leaf (maximum FPKM: 12.8402) and callus (maximum FPKM: 23.6697) (Suppl . Table S3). However, the net expression level of NAC67 is lower than that of NAC2. Like NAC2, the expression of NAC67 in root has significant outliers having FPKM values as high as at 424.223. Also like NAC2, anther, seed, pistil, panicle and shoot showed lower FPKM values where it suggests this gene is minimally expressed in these tissues. The similarities of the trend of expression profiles of NAC67 and NAC2 might be because these two genes were identified as paralogous genes (Figure 7) and evolutionary related ( Figure  3) later in this study.      .  Table S3). Aleurone, pistil root and shoot had minimal expression with very low FPKM values. Another NAC gene with lowest overall expression is LOC_Os06g33940 i.e. NAC76. It demonstrated highest expression in callus (maximum FPKM: 6.34176) followed by slight expression in root and panicle. Aleurone, anther, pistil and seed showed negligible FPKM values where it suggests this gene is minimally expressed in these tissues.
These NAC genes has been known to have diverse roles in plant growth and developmental processes. Thus the tissue-specific expression analysis might be helpful to explore the biological functions of NAC genes towards the growth and development of different tissues in Oryza sativa japonica.

Phylogenetic relationship and structural investigation of NAC gene family in Oryza sativa japonica
Inspection of the phylogenetic tree topology revealed several pairs of NAC gene with a high degree of homology in the terminal nodes, suggesting that they are paralogous pairs. A total of 3 pairs (NAC45/77, NAC48/71 and NAC67/2) of paralogous NAC genes were identified as shown in Figure 3 Hence, these pairs of paralogous genes supported the hypothesis that they might have been evolved from a genome duplication event.
The evolutionary analysis using gene structural diversity of multi-gene families is an essential component. So, a complete structural study of OsNAC genes was performed through the construction of exon-intron map. The difference in the composition of exon number was observed amongst 11 OsNAC genes. It was found that the exon-intron organization was used as supporting evidence for evolutionary relationships among genes or organisms (Koralewski & Krutovsky, 2011). The position of exons/introns and intronic Figure 5. Multiple sequence alignment of the NAC domain among 11OsNAC proteins. Strongly conserved residues are indicated above the alignment with asterisks, colons indicate the residues variation occurring within the strongly conserved groups and dots indicating the residues variation occurring within weaker conserved residue groups. phase distribution are important characteristics for gene structure analysis. To get an insight into the possible mechanisms of structural evolution of NAC genes in Oryza sativa japonica, comparison of the exon-intron structures of 11 reviewed NAC genes was performed. The distribution and position of introns within each of the NAC gene family is shown in Figure 4 and Table 2. The number of introns ranged from 0 to 3 in NAC genes. NAC74 has a maximum of three introns. Two of the NAC genes (i.e. NAC2 and NAC77) have one intron, while a total of 7 NAC genes possess two introns. The paralogous gene set, NAC71 and NAC48 have similar gene structure with 2 introns and 3 CDS, each. On the contrary, other two pairs of paralogous genes, i.e, NAC45 and NAC77 as well as NAC2 and NAC67 have a different type of gene structure. In the previous studies by Xu et al. (2012) and Guo et al. (2013), it was studied that the differences or gain-loss of introns or exons generally occur due to chromosomal rearrangement. In this study, most importantly, NAC67 was observed to have no introns while its paralogous NAC2 was found to have an intron. This might have occurred due to the rearrangement of their chromosomes. However, variation in the introns and exons were also seen to have occurred in the majority of the NAC genes.
It is well known that the evolutionary closer members of a gene family sharing common node in phylogenetic tree revealed usually have similar coding sequences and exon-intron structures (Xu et al., 2012). In our study, NAC 48 and NAC 71 shared common node in phylogenetic tree ( Figure 3) and had similar intron/exon structure (Figure 4). This pair was also identified as a paralogous pair in later analysis. However, many other evolutionary closer genes were observed to have differences in their intron-exon topology. This might be due to the accumulation of mutations in the gene sequences over the evolutionary period that altered the intron-exon structures.
Introns are considered to be important and necessary subassemblies of eukaryotic genes as their loss or gain generates structural diversity and complexity which leads to the evolution of multiple gene families. Intron's potential functions serve as an origin for regulatory elements, role in exon shuffing and alternative splicing, and also serve as a source for the signals to export mRNA from the nucleus (Fedorova & Fedorov, 2003) (Le et al., 2003). It has also been studied that introns not only tend to increase the intragenic recombination but also alter the evolutionary rate of the genes. (Roy & Gilbert, 2006). In addition, growing evidence has depicted a functional connection between gene expression and introns (Castillo-Davis et al., 2002).
To check the evolutionary conservation among the OsNAC domains, multiple sequence alignment of the domains was done. The multiple sequence alignment revealed an overall high conservation in the NAC gene sequences, among which 39 amino acids were 100% conserved in all the sequences. The conserved amino acids include several basic amino acids like lysine, arginine and histidine ( Figure 5), which supported with the previous documentation, which stated that NAC domains were rich in basic amino acids (R, K and H) as a whole (Kikuchi et al., 2000). The high level of conservation also supported the observation that proteins have an overall conserved motif architechture. Apart from fully conserved regions, there were many instances were an amino acid residue is substituted with another residue of similar properties. Some of such substitutions observed include D!E, I!V, L!V, I!L and K!R. These substitutions preserve the overall properties of protein and therefore might not disrupt protein's structure or function. On the other hand, some amino acid substitutions were also evident in NAC domains, which has potential to alter overall protein's physicochemical properties. Notable substitutions include K !.T (at alignment position 9), A!C (at alignment position 49), K/R!N/Q (at alignment position 57), N!T (at alignment position 71) and G!A (at alignment position 119), which can alter overall charge and its corresponding electrostatic interactions in the domain and A ! C substitution (at alignment position 49) having capability to induce new disulfide bridges in the protein domain. Future experimental studies could help to comprehend the role of these substitutions in bringing structural and functional diversity in NAC proteins. Takada et al. (2001) showed that NAC domain comprises a missense mutation which leads to K ! T substitution. This K ! T substitution generally affects the nuclear localization, binding of DNA or the structural integrity of the NAC domain by removing a conserved salt bridge (Takada et al., 2001). They also showed that Leucine to Phenylalanine substitution leads to conformational stress and instability of the NAC domain by adding a bulkier residue in a firm hydrophobic cavity (Takada et al., 2001). Similar K ! T substitution was also studied in this present work which may also alter the structural integrity of the NAC domain

Osnac gene distribution on chromosome and their evolutionary analysis
To determine the genomic distribution of the OsNAC genes, the chromosomal locations of these 11 reviewed NAC genes were mapped using Chromosome Viewer (Oryzabase tool) (Kurata & Yamazaki, 2006) and crosschecked using Chromosome Viewer (Rice Expression Database tool). The result showed that the distribution of these OsNAC genes among most of the twelve chromosomes except chromosome number 2, 4, 5, 8, 9 and 10. Chromosome 1 has three genes NAC74, NAC68 and NAC48 and chromosome 3 has 2 genes i.e. NAC22 and NAC2. NAC67, NAC10 are present in chromosome 7, while NAC45, NAC71 are present in chromosome 11. Chromosome 6 and 12 have NAC76 and NAC77 respectively as shown in Figure 6. Gene duplication is mainly responsible for the development of differences in the functionality in the gene family (Kong et al., 2007). Based on the phylogenetic tree analysis (as discussed in subsection 3.3), a total of 3 pairs of paralogous genes were identified in the terminal nodes of phylogenetic tree topology constructed using MEGA10 . The location of genes is generally assumed to be either on the same chromosome or on different chromosomes, suggesting that tandem duplication and segmental duplication together might have contributed to  the expansion of a particular gene family. As shown in Figure 6 three pairs of the paralogous genes were randomly scattered throughout the genome and were supposed to result from the segmental duplication event. The genes duplication pattern revealed segmental duplication to be predominant in 11 NAC genes of Oryza sativa japonica as shown in Table 3. In this study, a total of more than 54% of the genes are paralogous and all the genes are segmentally duplicated.
The selection process history of the coding sequences can be predicted through Ka/Ks ratio (Li et al., 1981). So to understand how the deviation of duplication event had occurred, the Ka, Ks and Ka/Ks ratios were determined for the paralogous gene. The date of duplication events were calculated using Ks. Now that Ks values are used in the formula of duplication event (T ¼ Ks/2k), where k ¼ 6.161029 Â 10 À 9 (Lynch & Conery, 2000). As two sets of Ks values were found for each paralogous gene so two duplication event (T) were found as shown in the Figure 7. The segmental duplications of the OsNAC genes for node 1 are 10.58 Mya (Ks ¼ 0.1304) and 12.05 Mya (Ks¼ 0.1485), node 2 are 20.72Mya (Ks ¼ 0.2553) and 21.70 Mya (Ks ¼ 0.2674) and node 3 are 11.5 Mya (Ks ¼ 0.1417) and 15.81 Mya (Ks ¼ 0.1948). The mean mya for node1, node 2 and node 3 are 11.315, 13.655 and 21.21, respectively as shown in Table 3. The selection pressure can also be determined through Ka/Ks ratio where the value of Ka/Ks < 1, > 1 or ¼ 1 indicates that the paralogous pair were under purifying selection, positive selection, and neutral or rare selection, respectively (Juretic et al., 2005). The result revealed from Ka/Ks ratios were substantially lower than 1, suggesting that the OsNAC sequences within paralogous genes are under strong purifying election pressure and that positive selection might have acted only on a few sites during the evolutionary process.
The conservation in the evolution of amino acid position is very important to maintain the protein structure and function. The recognition of selected amino acids may help us to understand the selection pressure which would be easy to comprehend the importance of selected amino acid for NAC protein interaction. Thus, the aim of Selecton server is to calculate the synonymous rate (Ks) and the non-synonymous rate (Ka), at each codon site., These evolutionary rate of genes are normalized in a such a condition that in the instance of no selection, Ka/Ks ¼ 1. In the case of purifying selection, Ka decreases, and thus, Ka/Ks < 1 is indicative of purifying selection. The positive selection test on NAC proteins through MEC model estimated the selection pressure at the specific coding region where codon-based alignment for the sequences is performed, x values at every site is calculated, these ratios are translated into selection scores and are projected onto the primary or tertiary sequence of the protein. Thus, visual analysis of blocks or patches of sites having same x values is allowed. It is important to note here that here, x does not stand for positive selection, and it only stands for the intensity of selection (purifying or positive). Ka/Ks may be computed from the values of x (Doron-Faigenboim & Pupko, 2007; Goldman & Yang, 1994). The difference in the rate of substitution for the coding regions was taken and their selection pressure was determined at various regions of OsNAC protein which indicates the positive selection ( Figure 8). About 10.63% amino acids showed positive selection while the rest of the amino acid was under purifying process as shown in Figure 8. The result clarified the evolution of all gene pairs under the effect of purifying selection.

Cis-Regulatory region analysis in NAC genes
Cis-regulatory elements are regulatory non-coding DNA sequences which used to regulate neighboring gene sequences. Cis-regulatory elements control different biological processes. To explore the function of 11 NAC genes from Oryza sativa japonica cis-regulatory elements of each NAC gene was analyzed using PlantCARE database. The cis-regulatory elements of NAC genes from Oryza sativa japonica were mainly associated with light, hormone, other environmental stresses and seed specific gene expression. A variety of possible Cis acting regions were identified in NAC genes in this study. The 11 NAC cis-regulatory regions contain elements show a majority, that is, 58% responsiveness to other environmental stresses, out of which, again 75% belongs only to light responsiveness (Figure 9). Next to environmental stresses, follows the hormone responsiveness with 32% elements, where almost half, viz., 45% is due to MeJA. MeJA is known to be responsible for flowering, coiling of tendrils, seed and fruit maturation. Other elements occupied physiological responsiveness and responsive to protein binding ( Figure 9). The dominance of the light responsiveness strongly affirms the plausibility of the genes to be associated with the photoperiodic control for flowering. NAC74 gene shows the maximum responsiveness for hormone, protein binding and environmental factors amongst the 11 NAC genes identified in this study, while NAC71 showed the maximum responsiveness for physiological and hormonal stress factors (Suppl. Table S4). The paralogous genes, NAC2 and NAC67 having the maximum value for their respective mya values (Table 3) showed maximum difference amongst themselves in all the four major categories of responsiveness (Suppl . Table S2)., NAC67 didn't demonstrated the responsiveness to abscissic acid, while it was observed in 3 units in NAC2. On the contrary, light responsiveness was observed to get increased from 3 in NAC2 to 11 in NAC67 and responsiveness to meristem inducibility was observed to remain absent in NAC2 while present in NAC67. This, therefore, infers that with the evolution (in terms of mya), there was a significant fluctuation in the diversity of the gene regulatory proteins.

Conclusion and future scope
In the present study, efforts have been made to characterize the OsNAC using high-throughput genome-wide bioinformatics tools revealing the putative functional diversity. The comprehensive genome-wide analysis of the OsNAC gene family in reviewed OsNAC genes was performed to reveal the identification of 11 reviewed OsNAC genes, gene location, gene structure, gene phylogeny, conserved motifs, stress-related cis-element, gene duplication, divergence time and positive selection analysis. The multiple sequence alignment of these OsNAC proteins revealed a highly conserved residue of proline, glycine, phenylalanine, histidine, tyrosine, leucine, isoleucine, tryptophan, aspartic acid, lysine, arginine, asparagine, threonine, valine, methionine and glutamic acid associated with specific domain NAC genes. Further, in silico characterization of these 11 OsNAC genes provides insight to understand the structure and biological function of OsNAC genes and the diversity of gene structure in terms of numbers and positions of exon, introns and UTR revealing functional diversity. The gene evolutionary and selection analysis helped to clarify the diversity and extension within the OsNAC gene family. The OsNAC gene regulatory system is poorly understood in Oryza sativa japonica. Further, the study indicates the possibility of segmental duplications of the Oryza sativa subspecies japonica genome. The tissue-level expression analysis might be helpful to explore the tissues in which the NAC genes are over-expressed and under-expressed, which provided cues to the biological role of NAC genes towards the growth and development of different tissues in Oryza sativa japonica. The gene expression analysis identified highest overall expression of NAC genes in leaf and similarities of tissue level expression profile in two of the three identified paragous gene pairs: NAC48-NAC71 and NAC67-NAC2. The phylogenetic analysis using the tree topology of OsNAC genes of Oryza sativa japonica has led to the identification of several paralogs and also gives an insight into the functions based on the close similarity within the genes. Ka/Ks ratio of 3 pairs of the segmentally duplicated gene being lower than 1, suggests that the OsNAC sequences are under strong purifying selection pressure. The cis-regulatory analysis for different stress responsiveness was performed and was seen that NAC74 and NAC71 showed maximum responsiveness. Few features were absent in some, while present in its paralogous pair. Evolutionary and through mya values, the huge chromosome rearrangement in NAC was observed. However, our results regarding genome-wide analysis of OsNAC genes provides a novel outlook about the family for future use. In future, further studies associated with the NAC genes, their pattern of interactions can be instigated.