Selection signatures of Qinchuan cattle based on whole-genome sequences

Abstract Qinchuan cattle has gradually improved in body shape and growth rate in the long-term breeding process from the draft cattle to beef cattle. As the head of the five local yellow cattle in China, the Qinchuan cattle has been designated as a specialized beef cattle breed. We investigated the selection signatures using whole genome sequencing data in Qinchuan cattle. Based on Fst, we detected hundreds of candidate genes under selection across Qinchuan, Red Angus, and Japanese Black cattle. Through protein-protein interaction analysis and functional annotation of candidate genes, the results revealed that KMT2E, LTBP1 and NIPBL were related to brain size, body characteristics, and limb development, respectively, suggesting that these potential genes may affect the growth and development traits in Qinchuan cattle. ARIH2, DACT1 and DNM2, et al. are related to meat quality. Meanwhile, TBXA2R can be used as a gene associated with reproductive function, and USH2A affect coat color. This provided a glimpse into the formation of breeds and molecular genetic breeding. Our findings will promote genome-assisted breeding to improve animal production and health.


Introduction
Genetic resources of domesticated species, such as local breeds, are valuable sources of diversity for broadening the genetic base of elite germplasm. 1 The selection can leave signatures on the genome that provide clues about protein-coding regions can be detected. 2 The analyses of selective signatures have successfully revealed many selected genes during the domestication and selection in cattle. 3,4 By using the SNP data of the whole genome, the 'genome selection' based on genome prediction breeding value has triggered a revolution in animal and plant breeding. 5 It is, therefore, possible to differentiate between predictions of selection by evaluating signatures of selection on the genome, thus providing important insights into domestication.
Beef palatability is one of the most important economic goals of Japanese Black cattle breeding, and its beef has the unique characteristics of strong marbling. 6 For example, in Japanese Black cattle, a large amount of intramuscular fat (IMF) (23%) is deposited in the longissimus dorsi muscle (LM), while in European breeds, the content of IMF is less than 5%. 7 Angus has excellent carcass and meat quality, high fertility, and reach puberty early. 8 Meanwhile, it also has many advantages, such as low dystocia rate, mild temperament and precocity, and it has been widely introduced all over the world 9 .
In animals or plants, accurate phenotypic prediction using genetic markers can help select individuals with ideal breeding values and improve the efficiency of breeding programs. 10 The understanding of the genome is a key determinant of the success of genome-wide scanning selection. 11 The whole-genome sequences of the most economically or ecologically important species are usually available, which can be used to guide the estimation of genetic variation and past population dynamics and to provide information on breeding practices or management strategies, respectively. 12 Based on the whole genome resequencing data of Qinchuan cattle, Red Angus cattle and Japanese Black cattle, this study detected the information of genetic variation, screened the potential selected regions of the genome and analyzed the biological functions and regulatory pathways involved in the selected genes, to screen out candidate genes and regulatory pathways for economically important traits such as meat quality, reproduction and stress resistance, to provide basic information for subsequent genome selection or gene function discovery.  16 Finally, these variants were then annotated using SnpEff (version: 4.3t). 17 Phylogenetic and genetic diversity analyses Independent SNPs were identified using plink (version: 1.90b3.40), 18 with options -indep-pairwise 50 5 0.2 and -maf 0.01. Then, a phylogenetic tree was conducted in MEGA7 19

Genome-wide selective sweep test
To identify the selective sweep regions, the Fst was performed with '-weir-fst-pop group1 -weir-fst-pop group2 -fst-window-size 10000 -fst-window-step 2000' using in vcftools (version: 0.1.13), 22 and the top 1% of the Fst value is used as the candidate selection region. For QCC, RAN and JBC cattle were divided into QCC vs RAN group and QCC vs JBC group, which reflected the level of divergence in QCC cattle relative to RAN and JBC cattle.

Protein-Protein interaction network analysis of the selected genes
The identified selective sweeps regions were annotated to genomic regions based on the NCBI reference genome (Bos taurus UMD3.1). For better investigating the interactions between these selected genes, experimental protein-protein interaction (PPI) information had been retrieved from the STRING database (https://string-db.org/) and visualized in Cytoscape software (version: 3.7.1) 23 with default parameters. To identify highly connected subnetworks, the MCODE clustering algorithm 24 was applied to identify densely connected network modules with default parameters.
Finally, the hub genes of modules were calculated using DAVID (version: 6.8) (https://david.ncifcrf.gov/) to evaluate the statistical enrichment of functional gene sets in terms of molecular function, biological process categories and pathways, significant thresholds (p-value <0.05) were used to determine the functional enrichment.

Population structure
To evaluate the phylogenetic relationship among cattle breeds in this study, phylogenetic tree analysis showed different genetic clusters according to their types (Fig. 1A). The branches of the phylogenetic tree were grouped as expected and consistent with the results of PCA (Fig. 1B), which revealed that the phylogenetic tree was clustered into four different genetic groups. The results of PCA showed that there was a large genetic diversity among the cattle breeds, and the first three principal components explained 6.25%, 3.56% and 2.85% of the total genetic variance, respectively (Fig. 1B). PC1 separated QCC from (RAN and JBC) and India indicine cattle (BRM and GIR), and as well as PC3, suggesting that QCC have different genetic backgrounds from the other four breeds, while inner Eurasian taurine and India indicine cattle shared more similar genetic background than QCC (Fig. 1C, K ¼ 2; Additional file 2: Figure S1). This finding is further supported by the result of model-based analyses of population admixture. PC2 divided the Eurasian taurine cattle into RAN and JBC (Fig. 1B,C, K ¼ 4). Obviously, at K ¼ 3, the majority of interspecific-hybrid QCC received genetic contributions from Eurasian taurine and India indicine cattle.

Genetic diversity
The median values of nucleotide diversity (p) of QCC, RAN and JBC were 0.00263, 0.00101and 0.00093, respectively (Fig. 2). The average value of nucleotide diversity (mean-p) was one of the indexes to measure the nucleotide diversity of breeds, and the mean-p of QCC was the highest (0.00272 ± 0.00115), followed by RAN (0.00126 ± 0.00092) and JBC (0.00119 ± 0.00092).

Genetic signature of selection in qinchuan cattle
Due to the genetic separation among QCC, RAN and JBC cattle, we used the Fst to search for the genomic regions in the QCC population. Totally, we identified 840 and 857 candidate genes from QCC vs RAN group (Fst > 0.2535, top 1%) and QCC vs JBC group (Fst > 0.2570, top 1%), respectively ( Fig. 3; Additional file 3: Table S2). Finally, a total of 290 genes were identified as the candidate genes for QCC cattle.

PPI network construction
The analysis of protein-protein interaction and PPI network is important to reveal the functions of proteins. For each supercluster, the gene network was assembled, from which hub genes were selected and mapped into the hub gene network, linking these hub to 'direct' and 'indirect' gene/protein interactions (Fig. 4). To identify the 'real' hub genes, an MCODE clustering algorithm was applied, which showed that 47 genes were predicted to be the most central regulatory hubs influencing the largest number of genes (Table 1).

Functional annotation of candidate genes
To determine the biological processes, and the more relevant groups of hub genes regulated, the functional enrichment analysis was performed by DAVID using GO (gene ontology) and KEGG pathways database. Overall, 6 GO biological processes, 3 GO cellular components, 2 GO molecular functions and 1 KEGG pathway, 21 hub genes were significantly enriched for these genes (p < 0.05) ( Table 2; Additional file4: Table S3).

Discussion
Previous population analyses have confirmed that the genetic ancestry of Qinchuan cattle was characterized by Bos taurus and B. indicus ancestry. 25 Our genetic structure analysis showed that the ancestors of the five breeds could be divided into European taurine (Red Angus cattle), East Asian taurine (Japanese Black cattle), Indian indicine (Brahman and Gir cattle), and hybrid cattle (Qinchuan cattle). This is consistent with the  systematic analysis of the whole genome of 260 domestic cattle worldwide by Chen et al (2018), which found that the worldwide domestic cattle can be roughly divided into five ancestral groups. 26 Qinchuan cattle have two lineages of taurine and indicine, which reveals that a large number of gene flow take place during the formation of this breed, which may also be another important factor leading to the increase of genetic diversity.
Overall, hybridization is a creative force contributing to species diversity. 27,28 The reason for the high nucleotide diversity of Qinchuan cattle may be due to the hybridization and introgression of other subfamily species (such as B. javanicus) to Qinchuan cattle. The lower nucleotide diversity of Red Angus cattle and Japanese Black cattle relative to the Qinchuan cattle can be explained by a combination of intensive selection and genetic drift in specialized beef cattle breeds.
Demographic studies of domesticated species mainly focus on identifying ancestral populations and quantifying the effects of domestication bottlenecks on genetic diversity. 29 The nucleotide diversity analysis of high Fst region and low Fst region showed that the nucleotide diversity of most high Fst regions was very low in temperate and tropical materials, which supported the view that the genetic differentiation of high Fst regions was due to differential selection. 30 The reduction of genomic genetic diversity indicates the occurrence of domestication or selection, so genomewide selected signal detection based on genetic diversity can screen a large number of selected regions or genes.
Through the top 1% Fst value, a total of 290 candidate genes were screened, and then 11 key modules and 47 key genes were screened according to the protein interaction network. Finally, the functional enrichment and pathway analysis of the 47 key genes were carried out. These genes are significantly enriched in positive regulation of GO:0032436$positive regulation of proteasomal ubiquitin-dependent protein catabolic process, GO:0050953$sensory perception of light stimulus, GO:0000209$protein polyubiquitination, GO:0090090$ negative regulation of canonical Wnt signaling pathway, GO:0007605$sensory perception of sound, GO:0035904$aorta development, GO:0000151$ ubiquitin ligase complex, GO:0030136$clathrin-coated vesicle, GO:0005622$intracellular, GO:0061630$ubiquitin protein ligase activity, GO:0042800$histone methyltransferase activity (H3-K4 specific), bta04144:Endocytosis related pathways and functional classes. Finally, A total of 21 hub genes, revealed the potential genetic factors of specialized beef cattle breeding at the genomic level.
ARIH2 is involved in inflammation and skeletal muscle degeneration. 31 Compared with Angus, Brahmins express much less ARIH2 in key metabolic organs, such as skeletal muscle. 8 In Wagyu x Angus F 1 progeny, ASB3 is significantly associated with backfat (p ¼ 0.0392). 32 As a gene involved in DNA repair and radiation, ASH2L plays a key role in adapting to high altitude conditions. 33 DACT1 was expressed in both chicken and mouse myogenesis, and the expression of DACT1 was significantly up-regulated with cell terminal differentiation, cell cycle withdrawal and cell fusion. 34 DNM2 enhances the rigidity of actin bundles during the invasion, thus promoting the fusion and development of myoblasts into skeletal muscle. 35 DVL1 is one of the important components of the Wnt/b-catenin signaling pathway, which can  , UBE2J2, ARIH2, HIP1R, SGIP1, SH3GL2, SH3GL1, RNF123, PARK2, ASB3, UBR1  2  4  4  6  TBXA2R, GNB1, TRIO, TRHR  3  4  4  6  RPUSD2, RMDN3, DNAJC17, GCHFR  4  3.667  7  11  NIPBL, ATAD2B, KMT2E, hist2h3d, EZH1, MGA, ASH2L  5  3  3  3  PPP1R9A, PON3, SGCE  6  3  3  3  NUCB1, LTBP1, MXRA8  7  3  3  3  RASGRF2, RHOV, ARAP2  8  3  3  3  IMPDH2, FKBP1A, MMEL1  9  3  3  3  DACT1, NKD2, DVL1  10  3  3  3  USH2A, CDH23, DNAH9  11  3  3 3 MRPL19, C10H15orf57, MRPL54 regulate the differentiation of preadipocytes. 36,37 De novo missense mutations of GNB1 were identified as a genetic cause of developmental delay, 38,39 implicating GNB1 as a genomic disease-associated gene. As an important cell cycle regulator, KMT2E participates in the cell cycle regulatory network, 40 and its mutants affect the size of the brain. 41 Genome-wide association analysis shows that LTBP1 is a potential regulatory gene for the synthesis of unsaturated fatty acids. 42 Meanwhile, the insertion and deletion of the LTBP1 gene had significant effects on body length, body weight and other growth traits of sheep. 43 By combining zebrafish and mice studies, studies have shown that NIPBL levels are important for limb development and that NIPBL regulates the expression of specific genes in embryonic limbs, including many key developmental regulators that are conserved between fish and mice 44 and embryonic stem cells rapidly differentiate when NIPBL level is decreased. 45 NKD2 as a myofibroblast-specific target in human kidney fibrosis was identified by single-cell RNA sequencing. 46 PARK2 gene is associated with fatty acid synthesis 47 and can reveal the genetic control of temperament expression in cattle. 48 Genetic variation in SGIP1 affects body fat content and is considered to be a potential determinant of obesity. 49 New World Creole cattle shows that TBXA2R is related to reproductive function in tropical environment selection. 50 The diversity of coat color and patterns after domestication is a focus of biological research, and USH2A has been proved to be a shared target region of selection in the formation of Dalmatians. 51 Therefore, we can regard KMT2E, LTBP1 and NIPBL as potential key genes affecting the growth and development of Qinchuan cattle, while ARIH2, DACT1, DNM2, GNB1 and NKD2 regulate muscle development, and ASB3, DVL1, LTBP1, PARK2 and SGIP1 regulate fat deposition, respectively. In addition, TBXA2R can be used as the reproductive function-related gene. Particularly, USH2A can be used as a candidate gene to affect coat color.

Conclusion
Qinchuan cattle contains two ancestral lineages of taurine and indicine, and its genetic diversity is higher than that of specialized beef cattle breeds, Japanese Black and Red Angus cattle. Through genome-wide selective analysis, the selected genes are significantly enriched in lipid metabolism, growth and development, neuroregulation, reproduction and other important pathways and functional classification, which provides a scientific basis for molecular breeding of beef cattle.