Assessment of genetic diversity among low-nitrogen-tolerant early generation maize inbred lines using SNP markers

Low soil nitrogen (low-N) level is responsible for yield reduction in maize (Zea mays L.) fields in sub-Saharan Africa. A clear understanding of the genetic diversity among early generation inbred lines selected from various elite low-N-tolerant populations offers an opportunity to obtain lines that could be used in parental combinations to develop high-yielding low-N-tolerant maize hybrids. A total of 115 S3 lines derived from four low-N-tolerant populations were assessed for genetic diversity using 15 670 single nucleotide polymorphism (SNP) markers. The SNP markers were highly polymorphic with polymorphic information content ranging from 0.0 to 0.38. The genetic diversity among the inbred lines ranged from 0.0 to 0.50 and thus indicated the high level of dissimilarity among the inbred lines. The neighbour-joining clustering algorithm and model-based population structure classified the 115 lines into four distinct groups that were generally consistent with the genetic backgrounds of the inbred lines. The information obtained from this study revealed genetic diversity among the inbred lines and may guide the selection of potential parents for detailed combining ability studies and eventual use in hybrid combinations. The selected inbred lines would be invaluable in the development of low-N-tolerant hybrids.

Maize is a staple crop in several African countries, yet most countries in the region produce less than what is consumed (Cassman 2007). Average yields of maize in Africa are low (1.5 t ha⁻¹) in comparison with the global average (4.9 t ha⁻¹) due to a combination of factors, including high pest and disease pressure, and low input use, especially of fertiliser (Shiferaw et al. 2011). Fertiliser use in Africa is low, with an average of 8 kg ha⁻¹ of nutrients (IFDC 2006; Heisey et al. 2007). In Nigeria, for example, less than 3.3 kg of nitrogen fertiliser is applied per hectare (FAO 2015). Low soil nitrogen (low-N) could account for a 50% yield reduction in maize (Wolfe et al. 1988; Meseka et al. 2006) and is common in farmers' fields because of the total removal of crop residues for feed (Zambezi and Mwambula 1997), coupled with the high cost of fertilisers, which restricts farmers to little (Mosier et al. 2005) or no use of fertilisers. Therefore, developing and promoting maize cultivars that perform well under a low-N environment is desirable.
Several low-N-tolerant maize cultivars have been developed and released in Africa (Bänziger et al. 2000; Ajala et al. 2012; Badu-Apraku et al. 2012). Further progress has been achieved by subjecting the populations to various recurrent selection schemes (Bänziger et al. 2000; Ajala et al. 2010, 2012) and inbred lines have been extracted from the improved populations (Semagn et al. 2012). These low-N-tolerant inbred lines can serve as parents for the development of hybrids and synthetics with superior performance under low-N conditions in sub-Saharan Africa.
Understanding genetic diversity and relatedness among germplasm is important to the success of any breeding programme (Nyombayire et al. 2016), because the information will assist in the selection of parents for crossing and aid broadening of the genetic base of adapted maize germplasm (Laborda et al. 2005). However, most breeding programmes in sub-Saharan Africa depend on phenotyping to determine genetic diversity. Furthermore, approaches such as topcross evaluations to preselect desirable lines and line × tester analysis to separate maize inbred lines into heterotic group pairs are often used before subjecting selected lines to detailed combining ability studies. However, including a large number of lines in combining ability studies is cumbersome. Leal et al. (2010) also questioned the reliability of such data, mainly owing to environmental effects, and provided evidence on the superiority of molecular markers that not only provide better details on genetic differences without environmental interference, but also provide faster results. Markers will, in addition to clustering lines into different groups, also suggest useful inter se combinations among lines between groups. DNA markers have thus been used to classify large numbers of lines into similarity groups (Nyombayire et al. 2016; Mengesha et al. 2017) to help eliminate less desirable lines and enable only those expressing the desirable traits that meet the breeding objectives to be evaluated in detailed combining ability studies for eventual use in hybrid combinations (Semagn et al. 2012; Wende et al. 2013).

Introduction
Molecular markers, including random amplified polymorphic DNA, amplified fragment length polymorphism, restriction fragment length polymorphism and simple sequence repeats, have been used to assess genetic distance among genotypes (Garcia et al. 2004; Menkir et al. 2007; Semagn et al. 2012). However, single nucleotide polymorphism (SNP) markers, such as those on the Illumina® MaizeSNP50 BeadChip and the 55K SNP array (Xu et al. 2017), are now being used efficiently because they are highly polymorphic, locus-specific and have potential for high-throughput analysis as well as lower genotyping error rates (Nyombayire et al. 2016; Wu et al. 2016). The objectives of the present study were therefore to use SNP markers to (1) assess the magnitude of genetic diversity among low-N-tolerant early generation inbred lines selected from low-N-tolerant populations and (2) classify them into groups based on genetic relatedness, to aid in effective selection of usable lines for hybrid maize development.

Germplasm
A total of 115 low-N-tolerant S3 lines derived from four low-N-tolerant populations were selected for the diversity study. Thirty-three of the lines were from Acr9931DMRSRLNSyn, whereas 27, 52 and three lines were developed from 2000SynEE-W, TZPBPROLC3LNSyn and 99TZEE-YSTR, respectively. TZPBProlC3LNSyn was derived from TZPB ProlC3 developed at the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria from the TZPB (Tropical Zea Planta Baja) population of the Tuxpeno race introduced from the International Maize and Wheat Improvement Center (CIMMYT). Acr9931DMRSR was formed from Pop31, a yellow endosperm population from CIMMYT but converted for resistance to Maize streak virus and downy mildew at IITA. Both TZPB ProlC3 and Acr9931DMRSR populations were improved for tolerance to low-N through two cycles of recurrent selection under low-N to form TZPBProlC3LNSyn and Acr9931DMRSRLNSyn, respectively. 2000SynEE-W and 99TZEE-YSTR are extra-early-maturing white and yellow grained maize cultivars, respectively, developed also at IITA for resistance to Striga hermonthica and tolerance to drought and low-N. The two cultivars were released as 'SAMMAZ 29' and 'SAMMAZ 28', respectively (NACGRAB 2014).
In each population, S2 lines were selected based on their performance under low-N in replicated trials conducted in 2015 at the developed low-N screening sites in Mokwa and Zaria in the Southern and Northern Guinea Savanna ecologies of Nigeria, respectively, under both low (30 kg N ha⁻¹) and high (90 kg N ha⁻¹) nitrogen applications. All trials were replicated twice. Two fields were demarcated at each of the two locations for the two N levels as previously described (Ajala et al. 2010). Soil samples taken from the 0-20 cm horizon before land preparation at each site revealed total N values of 0.43-9.2 kg ha⁻¹ and 0.5-11 kg ha⁻¹ for the low-N and high-N fields, respectively. Consequently, only the balance of N to make up the required 30 and 90 kg N ha⁻¹ was applied to the low-N and high-N fields, respectively. The best-performing lines from each population were selected using stay green, plant aspect, ear aspect and ears per plant under low-N and grain yield under both low-N and high-N as selection criteria for recombination to improve the population. At the recombination stage of each population, the selected S2 lines were selfed to generate S3 lines that were used for the SNP analyses. The designations of the 115 S3 inbred lines sampled for DNA analyses are presented in Table 1.

Extraction of genomic DNA
Each of the 115 S3 inbred lines was planted in the field in single row plots of 5 m length. Leaf samples were collected from 10 plants of each line in the row from three-week-old maize plants, bulked and then lyophilised for extraction of genomic DNA from the dried tissues using a cetyltrimethylammonium bromide (CTAB)-based extraction protocol modified by Doyle and Doyle (1990). Quality and quantity analyses of the samples were performed using agarose gel electrophoresis and a spectrophotometric method, respectively.

Procedure for DNA extraction
Leaf tissue (about 0.1 g) of each genotype was ground into fine powder by shaking for 2 min at a speed of 1 000 strokes min⁻¹ using a GenoGrinder-2000 homogeniser. Freshly prepared modified CTAB extraction buffer (800 µL; 200 mM Tris, pH 7.5; 50 mM EDTA, pH 8.0; 2 M NaCl; 2% CTAB; 1% β-mercaptoethanol) was added to the ground powder in a 1.5 mL extraction tube and incubated at 65 °C in a water bath for 30 min with continuous gentle rocking. The tubes were removed from the water bath, gently tapped and centrifuged at 2 312 ×g for 10 min. About 500 µL of the aqueous phase from each tube was transferred to a new tube, and 500-600 µL chloroform:isoamyl alcohol (24:1) was added. The resulting solution was gently mixed and centrifuged at 2 312 ×g for 10 min. The upper aqueous layer from each tube was transferred to a fresh strip tube and the process repeated. Next, 400 µL of the upper aqueous layer from each tube was transferred to new fresh strip tubes and 600 µL of 100% ice-cold isopropanol (2-propanol) was added to each tube. The tubes were gently inverted about 50 times and then placed in a freezer (−20 °C) for 60 min.
Table 1: Sources of the 115 S3 low-N-tolerant maize inbred lines used in the study
The tubes were then centrifuged at 2 312 ×g for 20 min to form a pellet and the supernatant was discarded. Each pellet was washed with 400 µL of 70% ethanol, centrifuged for 15 min and the ethanol decanted. The process was repeated and the pellets were air dried and reconstituted with double distilled water. The dried pellets were dissolved in 100 µL ultra-pure water. All five reagents that made up the buffer (1.0 M Tris-HCl (Sigma Aldrich), 0.5 M EDTA (Sigma Aldrich), 5.0 M NaCl (AppliChem GmbH), mercaptoethanol (AppliChem GmbH) and CTAB (Sigma Aldrich)), together with chloroform:isoamyl alcohol (24:1; Sigma Aldrich), 70% ethanol (Sigma Aldrich), isopropanol (Sigma Aldrich) and RNase (Sigma Aldrich), were obtained from Sigma-Aldrich (now Merck KGaA) or AppliChem GmbH, both in Darmstadt, Germany. RNase used was reconstituted by adding the entire content of the bottle, made up of 10 µL RNase in 90 µL water, to 250 µL TRIS buffer and 275 µL of 5 M NaCl, then made up to 25 mL. The solution was then heated at 65 °C for 5 min. The quality of DNA was checked in 1% agarose gel and quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc., Denver, CO, USA). The quality of the DNA for genotyping by sequencing (GBS) was ascertained by digesting the DNA with the restriction enzyme HindIII and samples were sent to the Genomic Diversity Facility of Cornell University for GBS.

Procedure for genotyping by sequencing
Genotyping by sequencing was used to generate genotype data at the Genomic Diversity Facility of Cornell University. The GBS libraries were prepared and analysed as described by Elshire et al. (2011), using the restriction enzyme ApeKI for digestion and creation of a library with a unique barcode for each genotype. Raw reads from the sequenced GBS library were called in the GBS analysis pipeline TASSEL 3.0.147, an extension to the Java program TASSEL (Bradbury et al. 2007), and the filtered sequences were aligned to the maize reference genome B73 REFGEN v1 (Schnable et al. 2009) using the Burrows-Wheeler alignment tool (BWA). A genomic DNA sample of 10 ng µL⁻¹ concentration and 10 µL quantity was dispensed into separate wells of 96-well plates already treated with the barcoded adapter and common adapter. DNA samples were digested with the ApeKI restriction enzyme, a type II restriction endonuclease that recognises the degenerate 5 bp sequence GCWGC, where W can be A or T, and creates a 3 bp overhang. It also has relatively few recognition sites in the major classes of maize retrotransposons and is partially methylation-sensitive. The digested DNA was then ligated to the barcoded adapter and common adapter that were designed to ligate to the sticky ends of the ApeKI cut site. The barcoded adapters carry unique 4- to 8-bp sequences that are used to identify individual samples. The adapter-modified DNA samples were then pooled in a single tube by taking 5 µL of each DNA sample and purified for PCR using a Qiagen PCR cleanup kit. PCR was performed using primers that are complementary to the adapter sequences. The PCR-amplified sample pools constitute the sequencing library. Only fragments with the combination of a barcoded and a common adapter at either end of the fragment were amplified. PCR products were purified with the QIAquick® PCR Purification Kit and sequenced using an Illumina next-generation sequencing platform.
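The recognition rule and the barcode-based sample identification described above can be illustrated with a minimal sketch. It is not the facility's pipeline: the function names, the toy sequences and the barcode table are hypothetical, and the only facts taken from the text are the GCWGC site (W = A or T) and the 4- to 8-bp barcode length.

```python
import re

# ApeKI recognition sequence GCWGC, with W = A or T (as stated in the text).
# A lookahead is used so that overlapping sites are also reported.
APEKI_SITE = re.compile(r"(?=(GC[AT]GC))")


def find_apeki_sites(seq):
    """Return 0-based start positions of every GCWGC site in seq."""
    return [m.start() for m in APEKI_SITE.finditer(seq.upper())]


def demultiplex(read, barcodes):
    """Assign a read to a sample by its leading barcode.

    barcodes maps barcode sequence -> sample name (hypothetical table).
    Longer barcodes are tried first so that a 4-bp barcode that happens
    to be a prefix of an 8-bp barcode does not shadow it.
    Returns (sample, read_without_barcode), or None if nothing matches.
    """
    for bc in sorted(barcodes, key=len, reverse=True):
        if read.startswith(bc):
            return barcodes[bc], read[len(bc):]
    return None


if __name__ == "__main__":
    toy_sequence = "AAGCAGCTTGCTGCAGC"          # invented sequence
    print(find_apeki_sites(toy_sequence))

    toy_barcodes = {"ACGT": "line_1", "ACGTTGCA": "line_2"}  # invented
    print(demultiplex("ACGTTGCACAGCTT", toy_barcodes))
    print(demultiplex("ACGTCAGCTT", toy_barcodes))
```

Production GBS pipelines also tolerate sequencing errors in the barcode and check for the cut-site remnant after it; both are left out here for brevity.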
The sequence data were subjected to the GBS reference genome pipeline where raw sequencing reads were called and filtered. Filtered SNPs were then aligned to the maize reference genome, thereby permitting data interpretation to provide the SNP positions and their corresponding polymorphisms.
The GBS returned a data set of 1 123 196 SNPs covering all 10 chromosomes of the maize genome. Data filtering was performed using TASSEL 5.0.8 software. Stringent filtering was performed based on the minimum count of 115 out of 115 sequences, and minimum and maximum allele frequencies of 6% and 94%, respectively, to remove monomorphic sites and those with missing data. After filtering, 15 670 markers that met the criteria were used for the diversity analysis.

Statistical analysis
Summary statistics were obtained using PowerMarker 3.25 software (Liu and Muse 2005). For each marker, the allele number, observed heterozygosity, polymorphic information content (PIC) and gene diversity were calculated. Genetic distance estimates were computed using Rogers' genetic distance (Rogers 1972), from which a dendrogram was constructed using the neighbour-joining algorithm (Nei 1991). An admixture model-based clustering method was used to validate the population structure of the 115 inbred lines using Admixture 1.3 software (Alexander et al. 2009).
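For readers unfamiliar with these statistics, the sketch below shows the standard textbook forms of gene diversity (1 − Σp²), PIC (gene diversity minus the Σ2p²q² term over allele pairs) and Rogers' distance (the per-locus value √(½Σ(p − q)²) averaged over loci). It is assumed, not confirmed by the source, that PowerMarker uses these forms; function names and inputs are hypothetical.

```python
from math import sqrt


def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content at one locus.

    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


def rogers_distance(x, y):
    """Rogers' distance between two entries.

    x, y : lists with one item per locus; each item is the list of allele
           frequencies at that locus (for a single line these are 0, 0.5 or 1).
    """
    total = 0.0
    for fx, fy in zip(x, y):
        total += sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(fx, fy)))
    return total / len(x)


if __name__ == "__main__":
    # A biallelic SNP with both alleles at frequency 0.5 gives the maxima.
    print(gene_diversity([0.5, 0.5]), pic([0.5, 0.5]))

    # Two invented lines scored at three biallelic loci.
    line_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    line_b = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    print(rogers_distance(line_a, line_b))
```

For a biallelic marker these formulas cap gene diversity at 0.5 and PIC at 0.375, which agrees with the upper limits of 0.50 and 0.38 reported for this marker set.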

SNP characteristics and genetic diversity
The distribution of the 15 670 SNP loci on the 10 maize chromosomes is presented in Table 2. Chromosome (ch) 1 had the largest number with 2 474 SNP loci and ch 10 the least with 1 080 loci. Additional information obtained from the SNP markers revealed broad genetic diversity among the inbred lines (Table 3). The minimum, maximum and mean genetic distances for all lines combined and for each population are presented in Table 4. The highest mean genetic distance was obtained among lines derived from 2000SynEE-W, whereas the lowest was observed among 99TZEE-YSTR derived lines. The mean genetic distance among lines developed from Acr9931DMRSRLNSyn was 0.20, higher than the mean value of 0.18 obtained for TZPBPROLC3LNSyn.

Cluster analysis
The dendrogram generated using the neighbour-joining clustering algorithm based on SNP data classified the 115 inbred lines into four major clusters (Figure 1). Generally, inbred lines from the same source population were grouped together, indicating that the marker-based grouping was consistent with pedigree-based grouping. All of the 52 lines extracted from TZPBPROLC3LNSyn were contained in Cluster I, whereas 31 of the 33 lines derived from Acr9931DMRSRLNSyn constituted Cluster II. Cluster III consisted of all 27 lines developed from 2000SynEE-W, whereas Cluster IV accommodated the three lines from 99TZEE-YSTR and two additional lines derived from Acr9931DMRSRLNSyn.

Population structure
The population structure of the 115 lines generated by the Admixture software is presented in Figure 2. Admixture allows for a cross-validation procedure in which the value of K at which the model has the best predictive accuracy can be identified. The ideal K value is the one associated with the lowest cross-validation (CV) error rate (Rabbi et al. 2015) and corresponds with the number of identified groups. The lowest CV error rate was obtained at K = 4, indicating that the 115 lines fell into four distinct groups based primarily on their pedigrees. All 52 lines derived from TZPBPROLC3LNSyn were in group 1, whereas group 2 comprised lines developed from 2000SynEE-W. The lines obtained from 99TZEE-YSTR were clustered in group 3 and those from Acr9931DMRSRLNSyn were assigned to group 4. Thus, the groupings were generally consistent with those of the neighbour-joining clustering algorithm and aligned with the pedigree information of the low-N-tolerant inbred lines.
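The choice of K by lowest CV error can be sketched in a few lines. The example assumes that each run's log contains a line of the form "CV error (K=4): 0.52", and the error values shown are invented for illustration; they are not the values obtained in this study.

```python
import re

# Assumed log-line format: "CV error (K=<int>): <float>"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def best_k(log_text):
    """Return (K with the lowest CV error, dict of K -> CV error)."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get), errors


if __name__ == "__main__":
    # Invented CV errors for K = 1..6, used only to exercise the parser.
    toy_log = """
    CV error (K=1): 0.62
    CV error (K=2): 0.57
    CV error (K=3): 0.54
    CV error (K=4): 0.52
    CV error (K=5): 0.53
    CV error (K=6): 0.55
    """
    k, errors = best_k(toy_log)
    print(k, errors[k])
```

When neighbouring K values give nearly identical CV errors, the minimum alone is weak evidence, so agreement with an independent method such as the neighbour-joining tree is a useful check.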

Discussion
The development and classification of stress-tolerant maize inbred lines into heterotic groups is crucial to the development of hybrids and synthetics with superior performance under stress conditions. To appropriately test inbred lines, relevant questions to answer include what level of inbreeding is necessary to stabilise expression of traits of interest and at what generation of inbreeding should lines be tested? It is hypothesised that lines attain their individuality early in the inbreeding process, but it is equally desirable to know whether the performance of lines in early generations of inbreeding is highly predictive of their performance in later generations as this will drastically reduce the workload and expense required during the inbreeding process, because lines that do not perform well early enough would have been eliminated. Simulated results of Obaidi et al. (1998) suggested that effective visual selection can be made for several traits during inbreeding, but the ultimate value of an inbred line is determined by its performance in hybrid combinations. Rapid elimination of lines at the S2 or S3 stages can be achieved with the topcross test resulting in only the selected lines being advanced in subsequent generations of inbreeding until they are near-homozygous at the S5 to S6 generation. Topcrossing early at the S2 or S3 stage is premised on the fact that the level of homozygosity at these stages should be 75% or 87.5%, respectively, stages considered adequate for expression of useful characters. However, topcross evaluations will not delineate the lines as would a line × tester analysis that separates lines into heterotic groups. Markers would not only cluster lines into different groups but will also suggest useful inter se heterotic combinations among lines between groups.
The diversity of the inbred lines was assessed using both neighbour-joining clustering and model-based population structure analyses for comparison because the two approaches provide a clearer picture of genetic relatedness among different lines in various crops (Kahn et al. 2015; Bedoya et al. 2017; Kaur et al. 2017). The two techniques revealed four distinct groups and placed each inbred line into these groups, thus confirming the superiority of markers in structuring the lines. That two of the lines from Acr9931DMRSRLNSyn were grouped with those from 99TZEE-YSTR suggests similar genetic descent of the five lines in the group and strengthens the use of markers to truly determine levels of relatedness of genotypes.
Figure 1: Neighbour-joining tree generated from the genetic data of the 115 S3 low-N-tolerant inbred lines developed from various source populations of maize
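The 75% and 87.5% figures follow from the halving of heterozygosity with each generation of selfing. The sketch below assumes a fully heterozygous S0 plant and no selection, so the expected homozygosity after n selfings is 1 − (1/2)^n; the function name is hypothetical.

```python
def expected_homozygosity(n_selfings):
    """Expected fraction of homozygous loci after n generations of selfing.

    Assumes a fully heterozygous S0 plant and no selection, so the
    heterozygous fraction halves each generation: 1 - (1/2)**n.
    """
    if n_selfings < 0:
        raise ValueError("number of selfing generations must be non-negative")
    return 1.0 - 0.5 ** n_selfings


if __name__ == "__main__":
    for gen in range(1, 7):
        print(f"S{gen}: {expected_homozygosity(gen):.4%}")
```

Under the same assumption the S5 and S6 generations reach about 96.9% and 98.4% homozygosity, which is why they are described as near-homozygous.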
While several interacting stress factors limit maize yields in West and Central Africa, the four populations from where the lines were extracted were known a priori to have come from different genetic backgrounds, but conversion of the populations for resistance to one or more of the prevalent stresses could have altered the genetic make-up of the populations, resulting in lines from 99TZEE-YSTR being grouped with two from Acr9931DMRSRLNSyn. Although all genotypes intended for use in the region and bred by IITA and its partners are resistant to the Maize streak virus, a biotic stress factor found in all agro-ecologies of the region, other important stresses, namely drought and low soil N, are equally endemic in all agro-ecologies. Striga infestation and downy mildew infection are more pronounced in the savanna and forest agro-ecologies, respectively. Therefore, a different combination of lines from the four populations with resistance to Striga, drought and low-N tolerance will ensure stability of yield in the savannas, whereas genotypes resistant to downy mildew, and tolerant of low-N and drought will be more important for the forests and forest transition zones.
Diversity analysis provides an additional approach to combining ability studies to allow selection of suitable parents for hybrid combinations. In the present study, low-N-tolerant S3 lines were grouped based on genetic relatedness and can thus set the limit for the total number of lines to select within each group for advancement to later stages of inbreeding and fixation as inbred lines. Lu et al. (2009) noted that lines within a group or subgroup show high genetic similarity, whereas crossing of lines between divergent groups or subgroups generally produces better-performing hybrids as opposed to crossing of lines from within groups. Visual elimination of lines during the inbreeding process in stress-prone environments will thus allow identification of more tolerant sublines from those selected for fixation to homozygosity.
Assessing genetic diversity without detailed studies of the combining ability as well as per se performance evaluations of the selected inbred lines limits the usefulness of the assessed lines as potential parents of hybrids and synthetics. Twenty-one of the inbred lines, consisting of eight from Acr9931DMRSRLNSyn, eight from TZPBPROLC3LNSyn, four from 2000SynEE-W and one from 99TZEE-YSTR selected through continuous selfing under low-N, are currently at the S7 stage of inbreeding and will be exploited as potential parents for the development of low-N-tolerant hybrids through the generation of crosses for detailed combining ability studies. Studies conducted by Badu-Apraku et al. (2009) suggest that selection for Striga tolerance improved tolerance to drought and low-N, whereas a spill-over effect was reported by Bänziger et al. (1999) and Meseka et al. (2006) for low-N environments with selection for drought stress. Therefore, although the lines were selected primarily for tolerance to low-N, evaluating the generated crosses in environments affected by drought will further provide opportunities for identifying multiple-stress-tolerant crosses. Integrating pedigree information with combining ability studies and diversity classification from SNP-based markers would assist in defining heterotic groupings for future maize breeding efforts in West and Central Africa.

Conclusion
In conclusion, observations reported in this study confirm the presence of high genetic diversity and distinct clustering of the early generation inbred lines of maize. The lines were grouped based on genetic relatedness and can thus set the limit for the total number of lines to select within each group for advancement to later stages of inbreeding and fixation to homozygosity based on their reactions to low-N stress. The selected lines can then be used in detailed combining ability tests aimed at generating maize hybrid combinations and synthetics for use in low-N stress environments. Integrating pedigree information with combining ability studies and diversity classification from SNP-based markers would assist in defining heterotic groupings of the selected maize inbred lines.