Genetics of seed protein and oil inherited from “BARC-7” soybean in two F2-derived mapping populations

ABSTRACT Soybean [Glycine max (L.) Merr.] seed protein inheritance has been extensively studied; however, the genetics of the high-protein “BARC-7” soybean are still unknown. In this study, we used 250 F2-derived lines from each of two soybean populations for quantitative trait loci (QTL) mapping. UA 5814HP, a high-protein cultivar tracing to BARC-7 as maternal grandfather, was the common parent. Field experiments were conducted using a randomized complete block design with one replication across four environments. Seed protein and oil were quantified using a near-infrared (NIR) instrument. Genetic linkage maps were constructed using the Infinium Soy6KSNP Beadchips. QTL analysis was performed using composite interval mapping. QTL for protein and oil were identified on chromosomes 6, 13, and 20. The known major QTL on chromosome 20 was not detected, but a novel QTL further downstream on chromosome 20 (detected only in population two) had high-protein alleles inherited from the BARC-7-derived parent. Fine mapping efforts are ongoing to confirm these results.

recently given significant attention to quality traits, such as high protein or modified oil content (Lee et al. 2019; Carneiro et al. 2020; Singh et al. 2020). Soybean seeds contain approximately 40% protein and 20% oil (Clemente and Cahoon 2009); however, there is a negative correlation between yield and protein content in soybean (Novikova et al. 2018; Lee et al. 2019; Sobko et al. 2020; Finoto et al. 2021), and between seed protein and oil content (Lee et al. 2019). Broad-sense heritability of protein and oil content in soybean is relatively high, ranging between 0.57 and 0.97 (Chung et al. 2003; Panthee et al. 2005; Jain et al. 2018; Tian et al. 2020; Jiang et al. 2020; Zhang et al. 2021; Arnold et al. 2021).
As many as 248 QTL associated with seed protein and 320 with seed oil have been reported in SoyBase (Grant et al. 2010). Some QTL for protein and oil content were detected at the same position, suggesting either closely linked QTL or QTL with pleiotropic effects on both traits. Nonetheless, according to the rules established by the Soybean Genetics Committee (error rate lower than 0.01 and a confirmation study showing that alleles at the same locus are segregating in all the test populations), only two of those marker associations are accepted (Fasoula, Harris, and Boerma 2004; Nichols et al. 2006). Two protein QTL, located on chromosomes (Chr.) 15 and 20, were commonly identified in several studies. Of these, the QTL on Chr. 20 was considered a major QTL with the highest proportion of phenotypic variance explained. The interval on Chr. 15 lies between 10 and 30 centimorgans (cM), whereas that on Chr. 20 lies between 20 and 40 cM (Grant et al. 2010). Warrington et al. (2015) used a recombinant inbred-line population derived from the cross Benning/Danbaekkong, and mapped a major protein QTL on Chr. 20 carrying the Danbaekkong allele that explained 55% of the phenotypic variation in protein content.
Leffel (1992) released a series of high-protein soybean germplasm lines, including BARC-6 (Reg. no. GP-127, PI 555396), PI 555397, PI 555398, and PI 555399. Among these, BARC-7 was derived from the cross CX797-21/D80-6931. D80-6931 is a high-protein maturity group (MG) VI BC3 line, in which PI 86490 was the high-protein donor parent and "Tracy" was the recurrent parent. BARC-7 is a MG IV germplasm line with purple flowers, tan and brown pods, and determinate stem growth habit. The mean seed protein of BARC-7 is 491 g kg⁻¹ (Leffel 1992). BARC-7 was used as a parent in the breeding program at the University of Arkansas System Division of Agriculture, and had progeny with high seed protein levels, including "UA 5814HP" and "R11-7999" (Florez-Palacios et al. 2020).
However, the genetic architecture of protein and oil content in many BARC-7-derived soybean elite lines is unknown. Therefore, the goal of this study was to perform QTL mapping for seed protein and oil content in two breeding populations that trace their high seed protein to the BARC-7 soybean germplasm line.

Plant materials and phenotyping
Four initial crosses (UA 5615C/UA 5814HP, UA 5115C/UA 5814HP, R13-359/UA 5814HP, R13-532/UA 5814HP) between high-yielding lines and the high-protein cultivar UA 5814HP were made at the Milo J. Shult Agricultural Research and Extension Center in Fayetteville, AR, in 2017. "UA 5814HP" is the progeny of "R95-1705" (high protein) and "S00-9980-22" (regular protein content); "R95-1705", in turn, is the progeny of "Hutcheson" (regular protein) and "BARC-7" (high protein) (Leffel 1992). In contrast, "UA 5615C", "UA 5115C" (Florez-Palacios et al. 2019), R13-359, and R13-532 are commodity MG V soybean varieties and lines with regular seed protein levels. A total of 13, 19, 20, and 26 F1 seeds of the four populations, respectively, were sent to Costa Rica during the winter of 2018 and bulk harvested by population. Approximately 800 F2 seeds of each population were then planted in 8 rows of 4.6 m length during the summer of 2018 in Fayetteville, AR. All parental lines were screened for marker polymorphisms using the Infinium Soy6KSNP Beadchips (Song et al. 2020) (data not shown). Based on parental polymorphism and agronomic field adaptation, two populations, UA 5115C/UA 5814HP (Pop1) and R13-532/UA 5814HP (Pop2), were selected for QTL analysis. A total of 250 F2 plants from each selected population were randomly chosen and individually harvested for generation advancement. A sample of 50 to 100 seeds per F2:3 line was sent to a winter nursery (Costa Rica) in 2019 to advance the population and increase the number of seeds via bulk harvesting. Two hundred fifty F2-derived lines from each population were planted in four environments (location-year combinations) for phenotyping, using a randomized complete block design (RCBD) with one replication.
Environments included Upala, Costa Rica (inceptisol soil order, series unknown) in 2018 (18CR); Portageville, MO (Tiptonville silt loam soil; 19MO) and Rohwer, AR (Sharkey and Desha silt loam soils; 19RO) in 2019; and Fayetteville, AR (Captina silt loam soil) in 2020 (20FA). Plots (single rows in 18CR, and two-row plots in all other environments) were 4.6 m long, with rows spaced 0.76 m apart and a 1.5 m alley. Entries within each population were randomly divided into four experiments, and each experiment had six checks (UA 5115C, R13-532, UA 5814HP, P53A67X, AG55X7, and AG56X8). A subsample of 50 seeds from each line was used for protein and oil estimation via near-infrared spectroscopy using a DA 7250 NIR analyzer (Perten, Sweden).

Genotyping and QTL mapping
DNA was extracted from fresh young leaves using the hexadecyltrimethylammonium bromide (CTAB) protocol. Genotyping was done using the Infinium Soy6KSNP Beadchips (Song et al. 2020) at the Soybean Improvement Laboratory, USDA-ARS, Beltsville.
Data were analyzed using analysis of variance (ANOVA), treating environments as replications. Genotypes were treated as a fixed effect in JMP 16.0. The statistical model for the analysis was:
$$y_{ij} = \mu + g_i + b_j + \varepsilon_{ij}$$
where $y_{ij}$ is the mean response (protein content or oil content) associated with the ith genotype in the jth environment, $\mu$ is the overall mean of protein or oil content, $g_i$ is the genotype effect (fixed effect), $b_j$ is the environmental effect (fixed effect), and $\varepsilon_{ij}$ is the experimental error associated with the ijth observation. Pearson correlation between protein and oil content across environments was computed using JMP 16.0. Broad-sense heritability was estimated using the following equation:
$$h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e / l}$$
where $h^2$ is the broad-sense heritability, $\sigma^2_g$ is the variance of the genotype, $\sigma^2_e$ is the variance of error, and $l$ is the number of environments. Genetic linkage maps were constructed using JoinMap v.4.1 (Kyazma B.V., Wageningen, Netherlands). Segregation distortion of single nucleotide polymorphisms (SNPs) was assessed with a chi-square test. A total of 1,423 and 1,115 polymorphic markers for Pop1 and Pop2, respectively, were used to construct the genetic maps. Genetic distance was estimated using the Kosambi mapping function to account for crossover interference. Based on the recombination frequencies, 24 linkage groups were created for Pop1 and Pop2, representing the 20 haploid chromosomes in the soybean genome.
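Two of the calculations described above can be illustrated with a minimal Python sketch. It is not the JMP or JoinMap procedure used in the study: the function names are hypothetical, and the variance components are obtained by a simple method-of-moments step from a two-way table with one observation per genotype-environment cell, which is an assumption made here for illustration.

```python
import math
import numpy as np


def broad_sense_heritability(y):
    """Entry-mean heritability h^2 = var_g / (var_g + var_e / l).

    y: 2-D array, rows = genotypes, columns = environments,
       one observation per cell (environments act as replications).
    Variance components come from ANOVA mean squares (an assumption
    of this sketch, not necessarily what the statistical package does).
    """
    y = np.asarray(y, dtype=float)
    n_gen, n_env = y.shape
    grand = y.mean()
    gen_means = y.mean(axis=1)
    env_means = y.mean(axis=0)

    # Mean square for genotypes and for the residual (error) term
    ms_gen = n_env * np.sum((gen_means - grand) ** 2) / (n_gen - 1)
    resid = y - gen_means[:, None] - env_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n_gen - 1) * (n_env - 1))

    var_g = max((ms_gen - ms_err) / n_env, 0.0)  # genotypic variance
    var_e = ms_err                               # error variance
    denom = var_g + var_e / n_env
    return var_g / denom if denom > 0 else float("nan")


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    # Hypothetical protein values (%): 3 genotypes x 4 environments
    demo = [[40, 41, 39, 40], [44, 45, 43, 44], [48, 49, 47, 48]]
    print(broad_sense_heritability(demo))
    print(kosambi_cm(0.10))
```

With this estimator, the entry-mean heritability reduces to 1 − MS_error/MS_genotype, so values near 1 arise whenever differences among line means are large relative to the residual variation.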
QTL analysis was conducted using WinQTL Cartographer v2.5 (Wang, Basten, and Zeng 2012). Composite interval mapping (CIM) was used to search for QTL and to estimate the magnitude of their effects and the phenotypic variance they explained. Cofactors were selected by backward regression to increase the power of QTL detection. Genomic regions with a LOD (logarithm of odds) score > 3 were considered significant QTL (Brody 2019).
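For readers less familiar with the LOD scale, the following Python sketch shows how a LOD score relates to a likelihood-ratio (LR) test statistic, which some QTL software reports instead. The function names and the default threshold argument are illustrative assumptions; only the threshold value of 3 comes from the text above.

```python
import math


def lod_from_lr(lr):
    """Convert a likelihood-ratio statistic LR = 2 * ln(L1 / L0) to a LOD score.

    LOD = log10(L1 / L0) = LR / (2 * ln 10).
    """
    return lr / (2.0 * math.log(10.0))


def lr_from_lod(lod):
    """Inverse conversion: LOD score to likelihood-ratio statistic."""
    return lod * 2.0 * math.log(10.0)


def is_significant(lod, threshold=3.0):
    """Apply the fixed threshold used in this study (LOD > 3)."""
    return lod > threshold


if __name__ == "__main__":
    print(lr_from_lod(3.0))     # LR value equivalent to the LOD 3 threshold
    print(is_significant(4.2))  # a hypothetical peak above the threshold
```

A LOD of 3 means the data are 1,000 times more likely under the model with a QTL at that position than under the model without one.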

Trait and environment correlation, and heritability
Pearson correlation analysis showed a highly significant negative correlation (p < 0.0001) between seed oil and protein content, varying from r = −0.545 to −0.775 for Pop1 and r = −0.587 to −0.655 for Pop2 (Figure 1). Additionally, the correlation of protein levels among environments was significant (p < 0.05) and moderately positive, ranging from r = 0.28 to 0.47 and r = 0.27 to 0.47 for Pop1 and Pop2, respectively (Figure 2). Similarly, the correlation of oil levels among environments was positive, ranging from r = 0.23 to 0.43 for Pop1 and from r = 0.30 to 0.54 for Pop2 (Figure 3).
The broad-sense heritability (h²) for protein and oil content was 91.24% and 90.33% for Pop1, and 93.44% and 93.91% for Pop2, respectively. These results indicate that soybean seed protein and oil contents were highly heritable and mainly influenced by genetic factors under our experimental conditions (Supplementary Tables S1 and S2).
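As a reminder of how the coefficients above are defined, a minimal Python sketch of the Pearson correlation follows. The study used JMP 16.0; the function name and the protein and oil values below are hypothetical and chosen only to show a strongly negative relationship.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by both spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centered values
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


if __name__ == "__main__":
    # Hypothetical line means (%), not data from this study
    protein = [40.0, 42.0, 44.0, 46.0]
    oil = [22.0, 21.0, 20.0, 19.0]
    print(pearson_r(protein, oil))  # close to -1 for this perfectly linear toy data
```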

QTL analysis of seed protein
Analysis of QTL associated with protein content in individual environments for Pop1 showed eight QTL detected on six chromosomes (Gm03, Gm06, Gm13, Gm15, Gm16, and Gm18). Of these eight QTL, one was identified in 19RO, one in 19MO, two in 20FA, and four in 18CR (Supplementary Table S3). These QTL had absolute additive effects that ranged from 0.15 to 0.95, and explained 4% to 15% of the phenotypic variation. Among them, a region on Chr. Gm03 covering a confidence interval of 28-49 cM was observed in both the 18CR and 20FA environments, although the SNP closest to the peak differed. In addition, two nearby regions on Chr. Gm13 were associated with protein content in 19RO and 18CR (Supplementary Table S3). All other QTL were not consistent across environments. In an across-environment analysis, three QTL were identified on Chr. Gm06, Gm13, and Gm18, with absolute additive effects of 0.42, 0.36, and 0.31, explaining 12%, 9%, and 7% of the phenotypic variation, respectively (Table 3). The negative additive effects in Table 3 (−0.36 on Chr. Gm13 for Pop1; −0.24, −0.34, −0.45, −0.11, and −0.50 on Chr. Gm04, Gm05, Gm13, Gm16, and Gm20, respectively, for Pop2) indicated that the favorable alleles for increasing protein were from UA 5814HP; the exceptions were the QTL on chromosomes Gm06 and Gm18.
Figure 2. Pearson correlation analysis of seed protein among 250 F2-derived lines from UA 5115C/UA 5814HP (a) and R13-532/UA 5814HP (b). The shaded area represents the 95% confidence interval for the correlation.
In Pop2, for protein content, individual-environment results showed 14 QTL located on 10 chromosomes. Two QTL each were found in environments 18CR and 20FA, whereas four QTL were identified in 19RO and six in 19MO (Supplementary Table S4). Absolute additive effects ranged from 0.03 to 0.81, with 2% to 13% of phenotypic variation explained. As in Pop1, negative additive effects indicated that the favorable allele for increasing protein came from the male parent UA 5814HP, here on chromosomes Gm04, Gm05, Gm06, Gm17, and Gm20. The across-environment QTL analysis for Pop2 showed six QTL on six chromosomes (Gm04, Gm05, Gm06, Gm13, Gm16, and Gm20), with absolute additive effects ranging from 0.11 to 0.51, and explaining 1% to 18% of the phenotypic variation (Table 3).
Comparing results across populations, we observed that QTL Gm06_46078974_G_A on chromosome Gm06 was consistently identified in both Pop1 and Pop2 in the across-environment analysis, with mean effects of 0.42 and 0.51 for Pop1 and Pop2, respectively (Table 3). However, the higher-protein allele was not inherited from BARC-7. Additionally, a QTL on chromosome Gm13 was identified in both populations and traced to BARC-7, with absolute allelic effects of 0.36 and 0.45; however, its location in the linkage map differed between Pop1 and Pop2, so it was treated as two different QTL (Table 3).

QTL analysis of seed oil
A total of nine QTL for seed oil content were mapped in Pop1 (UA 5115C/UA 5814HP) in single-environment analyses (Supplementary Table S5). Of these, three QTL were mapped in 18CR, and two QTL each in 19MO, 19RO, and 20FA. These QTL were identified on chromosomes Gm06 (3 QTL), Gm08 (1), Gm10 (1), Gm13 (2), Gm18 (1), and Gm20 (1). QTL on Gm06, Gm08, Gm10, and Gm18 showed negative additive effects (−0.38, −0.31, −0.22, and −0.02) and explained on average 8%, 4%, 10%, and 1% of the phenotypic variation, respectively, indicating a negative effect of the UA 5814HP alleles on oil content. The QTL on Gm06 (2 QTL), Gm13 (2), and Gm20 (1) had positive additive effects, ranging from 0.10 to 0.64 and explaining 5% to 33% of the phenotypic variation, which indicated that they traced to the normal-protein parent. An across-environment analysis revealed three QTL associated with oil content on chromosomes Gm06, Gm13, and Gm15, with additive effects of −0.22, +0.42, and +0.34, which explained an average of 9%, 15%, and 23% of the phenotypic variation, respectively (Table 4). In Pop2 (R13-532/UA 5814HP), in single-environment analyses, a total of 18 QTL on 10 chromosomes were associated with seed oil content. Seven QTL were observed in 18CR, four each in 19MO and 19RO, and three in 20FA. These QTL were found on Gm01 (1 QTL), Gm04 (3), Gm06 (3), Gm08 (2), Gm11 (1), Gm13 (3), Gm14 (1), Gm17 (1), Gm18 (1), and Gm20 (2), explaining between 3% and 22% of the phenotypic variation, with absolute additive effects from 0.07 to 0.52 (Supplementary Table S6).
Table 4. QTL analysis of seed oil content for two breeding populations, each consisting of 250 F2-derived breeding lines. Analysis conducted using composite interval mapping of phenotypic data consisting of least-square means from 4-environment 1-rep trials. Confidence intervals in centimorgans (cM) within each population. Previously reported QTL in the reported region are presented, as available on Soybase.org as of 10 October 2021.
There were 10 QTL with negative additive effects, indicating that the alleles were from the parent UA 5814HP. In the across-environment analysis, we observed four QTL, mapped on Chr. Gm04, Gm06, Gm13, and Gm20, with LOD > 3.0 (Table 4). Similar to the protein results, the QTL on Chr. Gm06 and Gm13 were detected in both populations and in most environments. Table 4 also shows that the additive effect for oil was negative on Gm06 (−0.22 for Pop1 and −0.29 for Pop2) and positive on Gm13 (0.42 for Pop1 and 0.28 for Pop2). These signs are opposite to the additive effects for protein on the same chromosomes, consistent with the strong negative correlation between protein and oil content.

Discussion
The present study investigated the genetic control of the high protein content inherited from the BARC-7 source through QTL mapping in two F2-derived populations from the crosses UA 5115C/UA 5814HP and R13-532/UA 5814HP, which were evaluated in four different environments. Both populations exhibited a typical normal distribution, with protein and oil ranges within expected values based on the parents. UA 5814HP is a line with high protein, averaging 45.5% protein, and moderate oil content (20.5%) on a dry weight basis. Although neither UA 5814HP nor the other parents were evaluated in the trials, the progeny showed a wide range of protein and oil values, as expected from transgressive segregation.
The average protein content of both populations was lower in environments 18CR and 20FA than in 19MO and 19RO. This is likely due to the higher temperature in 18CR and 20FA than in 19RO and 19MO during the growing season (data not shown). Conversely, higher temperature tends to increase oil content (Mourtzinis et al. 2017). That is the case in our study, where environments 18CR and 20FA had higher oil content than 19RO and 19MO. Similar observations on the effect of temperature on protein and oil were also reported in previous studies (Dornbos and Mullen 1992; Piper and Boote 1999; Specht et al. 2001; Mourtzinis et al. 2017; Novikova et al. 2018; Mertz-Henning et al. 2018; Lee et al. 2019; Sobko et al. 2020). Results also showed a negative correlation between protein and oil content. Highly negative phenotypic correlations between protein and oil are well documented in the literature (Cober and Voldeng 2000; Assefa et al. 2018; Mertz-Henning et al. 2018; Novikova et al. 2018; Lee et al. 2019; Kambhampati et al. 2020; Yao et al. 2020; Li et al. 2021). This indicates that increasing seed protein concentration using phenotypic selection may occur at the expense of oil concentration and vice versa (Chung et al. 2003). In addition, we found a significant correlation between environments for protein and oil content. The correlations were only moderate, which indicates that the environment affects protein and oil; therefore, it is crucial to evaluate such traits across different environments in a mapping study. However, our study also showed high heritability for protein and oil content, which indicates that the traits were under a high level of genetic control. The high heritability of protein and oil has also been previously reported (Jain et al. 2018; Tian et al. 2020; Jiang et al. 2020; Zhang et al. 2021; Arnold et al. 2021).
Previous studies reported many QTL for protein content on Chr. Gm06, Gm15, Gm18, and Gm20 (Diers et al. 1992; Brummer et al. 1997; Warrington et al. 2015). In our study, we identified eight QTL associated with seed protein on Chr. Gm04, Gm05, Gm06, Gm13, Gm16, Gm18, and Gm20. The proportion of the phenotypic variance explained by a given QTL (R² value) is a parameter in deciding whether marker-assisted selection can be more efficient than conventional phenotypic selection alone (Bernardo 2001; Bernardo and Charcosset 2006). The QTL with large effects (R² ≥ 10%) were present on Chr. Gm06, Gm13, and Gm20. It is important to note that the major QTL identified in this study on Chr. Gm06 from Pop1, and on Chr. Gm13 from Pop1 and Pop2, have been reported in many previous studies (Table 3). The QTL on Chr. Gm06 from Pop2 has been reported as "cqSeedProtein-012" (Pathan et al. 2013). The "cq" designation in SoyBase indicates a "confirmed QTL." There are only 16 QTL for protein listed as cq in SoyBase, and the remaining QTL have not been confirmed to date (Grant et al. 2010). The major QTL on Chr. Gm20 (118.7-166.6 cM) from Pop2 and on Chr. Gm06 (137-166.7 cM) from Pop1 have not been reported yet and could be novel QTL. In contrast, the known QTL on Chr. Gm20 (20-40 cM) has been reported in multiple studies (Diers et al. 1992; Chung et al. 2003; Nichols et al. 2006; Bandillo et al. 2015; Warrington et al. 2015), but was not found to be significant in our study. Previous studies revealed that high-protein alleles at that locus have historically been associated with decreases in both seed yield and oil content. Some sources of high-protein alleles include PI 437088A, PI 407780A, PI 468916, and Danbaekkong (PI 619083) (Diers et al. 1992; Chung et al. 2003; Nichols et al. 2006; Warrington et al. 2015; Kim et al. 2016).
The results of our study suggest that BARC-7 may carry alleles different from Danbaekkong; this could be useful for breeders to diversify sources of higher protein.
Apart from the major QTL, some QTL had relatively small effects and were environment-specific. The inconsistency of the protein QTL could be explained by either genotype specificity or sensitivity to environmental conditions (Patil et al. 2017). It could also be explained by the fact that protein content is a complex, quantitative, heritable trait controlled by multiple genes, each of which might be expressed differently in a given environment (Akond et al. 2014; Li et al. 2019). QTL that are inconsistent across environments may make it difficult for breeders to select with only a few markers in a breeding program.
A total of seven QTL associated with oil content were detected in this study across all environments, on chromosomes Gm04, Gm06, Gm13, Gm15, and Gm20 over both populations. Of these, the QTL on Chr. Gm13 (175-181 cM) for Pop1 and on Chr. Gm04 (77.4-87.4 cM) for Pop2 are novel QTL. As with protein, the inconsistency of the oil QTL can be explained by the specificity of the genotype or by its sensitivity to environmental conditions (Patil et al. 2017). The oil QTL with the largest effect is the one on Chr. Gm15 for Pop1 (R² = 23%). Mao et al. (2013) reported the same region on Chr. Gm15 (SeedOil 43-15) in SoyBase.
In our study, not all QTL associated with oil and protein content colocated in exactly the same regions, in agreement with studies of diverse genetic backgrounds (Feng, Shi, and Wang 2009; Pathan et al. 2013; Mao et al. 2013; Rossi et al. 2013). However, the oil QTL detected on Chr. Gm20 (R² = 22%) and on Chr. Gm13 (R² = 11%) did overlap with protein QTL in Pop2. Moreover, the sign of the additive effect for oil content is opposite to that of the overlapping protein QTL. This underscores the negative correlation between protein and oil, as has also been reported in many previous studies (Cober and Voldeng 2000; Assefa et al. 2018; Mertz-Henning et al. 2018; Novikova et al. 2018; Lee et al. 2019; Kambhampati et al. 2020; Sobko et al. 2020; Yao et al. 2020; Finoto et al. 2021; Li et al. 2021).

Conclusions
A preliminary mapping study using 250 F2-derived lines from each of two populations showed a QTL on Chr. Gm13, explaining approximately 10% of the variation in seed protein content, and one QTL further downstream on Chr. Gm20 (detected only in population two), explaining 18% of the protein variation. Ongoing fine mapping using an advanced inbred-line mapping approach will help confirm and narrow the regions associated with high protein and oil in the BARC-7 genetic background.

Funding
This work was supported by the United Soybean Board.