Evolutionary analysis of buffalo sterol regulatory element-binding factor (SREBF) family genes and their affection on milk traits

Abstract The sterol regulatory element-binding factor (SREBF) genes are a vital group of proteins binding to the sterol regulatory element 1 (SRE-1) regulating the synthesis of fatty acid. Two potential candidate genes (SREBF1 and SREBF2) have been identified as affecting milk traits. This study aims to identify the SREBF family of genes and find candidate markers or SREBF genes influencing lactation production in buffalo. A genome-wide study was performed and identified seven SREBF genes randomly distributed on 7 chromosomes and 24 protein isoforms in buffalos. The SREBF family of genes were also characterized in cattle, goat, sheep and horse, and using these all-protein sequences, a phylogenetic tree was built. The SREBF family genes were homologous between each other in the five livestock. Eight single nucleotide polymorphisms (SNPs) within or near the SREBF genes in the buffalo genome were identified and at least one milk production trait was associated with three of the SNP. The expression of SREBF genes at different lactation stages in buffalo and cattle from published data were compared and the SREBF genes retained a high expression throughout lactation with the trend being the same for buffalo and cattle. These results provide valuable information for clarifying the evolutionary relationship of the SREBF family genes and determining the role of SREBF genes in the regulation of milk production in buffalo.


Introduction
Buffalo milk is the second most supplied milk for people around the world, and buffalo milk has a high milk fat content compared to milk from other production species. 1 Buffalo are known for their ability to tolerate heat and can adapt to roughage diets. 2 However, buffalo milk yield is much lower than dairy cow, 3 which greatly restricts the development of dairy buffalo industry.Milk production traits are complex quantitative traits regulated by many genes. 4,5Candidate genes involved in milk production traits and several necessary genes related to milk fat, protein, and lactose synthesis have been identified through research, including genes associated with sterol regulation.The sterol regulatory element-1 (SRE1), a motif in the promoter of the lowdensity lipoprotein receptor gene and other genes involved in sterol biosynthesis, is bound by sterol regulatory element-binding factors (SREBF). 6There are two distinct SREBF functional subfamilies, including SREBF1 and SREBF2, that have been found in different mammals.The SREBF family genes are associated with the synthesis of milk protein and milk fat.For example, the SNP mutation g.8402C > T and g.15369G > T in SREBF1 showed significant effects on milk yield and fat percentage in buffalo 7 and the SREBF1 haplotype showed an association with milk traits in dairy cows. 8REBF1 is a key transcription factor whose transcriptional activation coincides with greater milk fat synthesis at the onset and throughout lactation in dairy cows. 9,10n the Golgi apparatus, SREBF1 is hydrolyzed by proteases, and the transcriptional fragments of nuclear SREBF1 are released to activate lipogenic genes. 11In vitro work with the nonsecretory MAC-T cell line indicated that abundance of FASN could be regulated by SREBFs.SREBF1 has been identified as a positive regulator in milk fat synthesis 12 and SREBF2 regulates the expression of INSIG1 protein, the main protein in milk. 8An additional function of SREBF2 is inducing mRNA expression of genes involved in fatty acid (FA) synthesis. 13amily genes are a series of genes with similar protein structures, common evolutionary origins and similar functions. 2At present, it is a useful method to perform genomic evolutionary analysis of some gene subfamilies to identify novel candidate functional genes.For example, EBERL e D 2 identified 29 HSF family genes in walnut and found most of them were high expression under high temperature, high salt and drought stress, indicating that these family genes might be involved in plant stress response.Yang 14 identified ten METTL21 family genes in the chicken genome and discovered that METTL21C expression levels increased with muscle development, implying that the METTL21 genes were involved in chicken muscle development.In buffalo, Liu 2 identified eight DGAT family genes and several SNPs located in them were associated with milk traits, providing useful information to identify the role of the DGAT family genes in the regulation of milk production and quality.
Milk production is a complex trait regulated by many genes; it is hard to credit all the effects to one single gene.Recent completion of buffalo genome sequencing allows genomic evolutionary analysis of buffalo gene families. 15The objective of this research is to identify the role of SREBF genes in the regulation of milk synthesis by identifying all SREBF family genes, analyzing their sequence, structural features and tissue specific expression, and building a phylogenic tree.Association analysis of SNP located in SREBF family genes will be performed with milk production traits to identify valid markers related to buffalo milk production.

Identification of SREBF genes
For genome-wide analysis of the SREBF genes and proteins, an annotation GFF file of buffalo (assembly UOA_WB_1) 15 and cattle (assembly ARS-UCD1.2) were downloaded from NCBI website (https://www.ncbi.nlm.nih.gov/).A combination of Hidden Markov Model (HMM) and Basic Local Alignment Search Tool (BLAST) were used to identify all the possible SREBFs in this research. 16We obtained 28 SREBF protein sequences for Bos taurus (9), Bubalus bubalis (4), Ovis aries (4), Capra hircus (5) and Equus caballus (6) from UniProt database (https://www.uniprot.org/uniprot/) (Txt S1).These 28-protein sequences were used to construct an HMM profile for possible SREBF family protein identification in HMMER 3.3 (http:// hmmer.org/).They were also used as the query in a BLASTP search for potential candidates with an e-value threshold of 10-6.3.

Structural feature analysis
The candidate sequence identified from each method was regarded as an SREBF family protein.After that, to confirm the conserved protein domain, the SREBF protein sequence was submitted to NCBI-CDD (Conserved Domain Database, https://www.ncbi.nlm.nih.gov/cdd/ ). 17The identified SREBF sequences were named by the transcripts and neighboring relationships with known SREBF genes.To detect the conserve motifs for further analysis of the structure of buffalo SREBF proteins, we submitted them to the MEME 5.0 website (http:// meme-suite.org/tools/meme). 18Then we visualized the result from NCBI CDD-search and MEME analysis in TBTools. 19To further analyze the detail and function of Motifs from MEME analysis, we submitted the sequence of the Motif to Pfam database (https://pfam.xfam.org/) to analyze the sequence for the Pfam matches.To display the conserved domain from NCBI CDD-search more clearly, we visualized the result via TBTools.The molecular weight and isoelectric point of all the buffalo SREBF protein sequence were calculated by ExPASy (https://web.expasy.org/compute_pi/).

Phylogenetic analysis
We identified the family protein sequences in four other livestock in addition to buffalo, including Bos taurus (25), Capra hircus (23), Ovis aries and Equus caballus (21) using the same method as in buffalo.These sequences together with candidate buffalo SREBF family protein sequences, a total of 114 sequences, were aligned by muscle MEGA 7.0 with bootstrap 1000 replication (random seed). 20After that, the phylogenetic analysis was performed, we used MEGA 7.0 for Maximum Likelihood and phylogenic tree construction. 20

Chromosomal distribution and collinearity analysis
Chromosome position map of the SREBF family genes was constructed using the TB-tools v1.059 21 software with the annotation GFF3 file.Multiple Collinearity Scan Toolkit (MCScanX) 22 software was used to perform syntenic analysis of SREBF genes between buffalo and cattle and the results were visualized by using TB-tools. 21To exhibit the positive selection relationship of SREBF family genes between buffalo and cattle, their synonymous (Ks) and nonsynonymous (Ka) values of substitutions analysis were performed using TB-tools software. 21

Association analysis of SNP and buffalo milk traits
The phenotype resource consisted of data from 489 Italian Mediterranean buffalo born from 2000 to 2011 and reared in 4 herds in southern Italy, as described previously. 23A total of 1,424 lactation records, which were adjusted to 270-day records, including peak milk yield (PM270), total milk yield (MY270), fat yield (FY270), fat per-centage (FP270), protein yield (PY270), and protein percentage (PP270), were available.SNPs within and 2000 base pair upstream of buffalo SREBF family genes were obtained from the genotyping data conducted at Delta Genomics (Edmonton AB, Canada) using the 90 K Axiom Buffalo SNP Array (Affymetrix/Thermo Fisher Scientific, Santa Clara, CA), which identified effective 60,387 SNPs in total 23 (Table S1).We adjusted the six buffalo milk production traits to a 270-day lactation length to eliminate environmental factors such as lactation season parity, year, season, and herd on milk traits. 23The association analysis between each SNP and six 270-day adjusted buffalo milk production traits were carried out using SAS version 94 software (SAS Institute Inc., Cary, NC) and the PROC General Linear Model (GLM) procedure, 24 with the following model: where Yijklm ¼ trait observation, l ¼ overall mean, Yi ¼ fixed effect of the ith year, Pj ¼ fixed effect of the jth parity, Sk ¼ fixed effect of the kth season of calving, Fl ¼ fixed effect of the lth farm, Gm ¼ fixed effect of the mth genotype, and eijklm ¼ random residual, for detail see. 25 The genotype effects of SREBFs were presented as least square means with standard error (LSM ± SE), and pairwise comparisons among different levels of fixed effects included in the model were subjected to Bonferroni correction for multiple F-testing.

Gene expression analysis
To explore the expression of SREBF members in buffalo and cattle mammary epithelial cells during lactation, two published RNA-seq databases of buffalo (PRJNA453843) 26 and cattle (PRJNA419906) 27 with early, mid and late lactation milk samples was selected for further analysis.Briefly, the FastQ formats of raw reads were firstly processed by Trimmomatic 28 software to get clean reads and FastQC 29 software to check the quality.Then the clean reads were aligned to buffalo and cattle genome (buffalo: UOA_WB_1 and cattle: ARS-UCD1.2) respectively using HISAT2 30 software with default parameters.The count matrix of genes and transcript was obtained using StringTie software. 31The DESeq2 R-package was used to calculate the number of transcripts per million (TPM) for each gene. 32Then the TPM value of SREBF genes (Table S2) was selected and a line graph constructed by GraphPad Prism.

Identification of the members of the SREBF family
The HMMER and BLAST method identified 24 nonredundant protein sequences encoded by 7 SREBF genes, including SREBF1, SREBF2, TFE3/SREBFL2, MITF/SREBFL1, USF2/SREBFL3, USF1/SREBFL4, and MYCN/SREBFL5 in buffalo (Table 1).The length of the amino acid sequence of the 24 buffalo SREBF protein isoforms ranged from 252 (SREBF2L4) to 1182 (SREBF1), and their molecular weight was ranged from 27.33 (SREBF2L4) to 124.37 kDa (SREBF1), which is correlated with the protein length.The theoretical pI varied from 4.71 to 9.58, indicating that these SREBF proteins may exist and function in different regions of cells.
To further characterize the structural feature of the SREBF family genes, their gene structure was predicted (Fig. 1).Among them, SREBFL1 had the longest gene sequence reaching nearly 120 Kb.The SREBF1 family members showed similar coding areas.However, the SREBF2.3 has longer 3 0 untranslated region (UTR) than the others.Similar to SREBF1, SREBF2 family members also showed similar gene structures with each other, and the only difference was that SREBF2.3 had longer 3 0 UTR.As for the other subgroups, the situation was similar to that of the two subgroups.The genes in the same subgroup all had similar encoding regions but some of their 3 0 UTRs were different.
We submitted the sequence of the Motif to Pfam database to analyze the sequence for Pfam matches, results showing that the common Motifs of all the family members, Motif 1 and Motif 2, have the same Pfam-A matches: Helix-loop-helix DNA-binding domain (HLH) (Table S3).We visualized the result from NCBI CD search via TBTools (Fig. 2), it more clearly showed that all the family members contained the same class domain: bHLHzip.

Phylogenetic relation of SREBF protein across different organisms
There were 114 amino acid sequences were aligned by MUSCLE (MUltiple Sequence Comparison by Log-Expectation) and Neighbor-Joining phylogenic tree (Fig. 3A) was constructed.Then the result was confirmed by Maximum Likelihood (ML) phylogenic tree (Fig. 3B).Both phylogenetic trees showed similar topologies.And SREBF family genes could be classified into three groups.The first group included all the protein isoforms of SREBF1 and SREBF2; then the second group included all the protein isoforms of TFE3/SREBFL1 and MITF/SREBFL2; and the third group included all the protein isoforms of USF2/ SREBFL3 and USF1/SREBFL4.Only the MYCN/ SREBFL5 didn't belong to any group, indicating that more research was needed to determine whether the gene belongs to SREBF family.The same gene remained relatively consistent across different organisms, which indicated that these proteins had the same function in different species.

Gene location on chromosome
Based on genes mapping information of genomes, seven buffalo SREBF genes were located on seven chromosomes (Fig. 4A) and seven identified cattle SREBF family genes were located on six chromosomes (Fig. 4B).The results of collinearity analysis of SREBF family genes between buffalo and cattle showed that a total of seven orthologous gene pairs were identified (Fig. 4C).All of them were distributed in the different chromosomes of buffalo and cattle.Interestingly, buffalo SREBFL2 gene was collinear with both cattle SREBFL1 and SREBFL2 gene.Furthermore, Ka/Ks (non-synonymous/synonymous) ratios for each  SREBFs pairs between buffalo and cattle were ranged from 0 to 0.140 (Table 2), indicating that all SREBFs were under purifying selection.

Expression of SREBF family genes at different lactation stages
Based on the published RNA-seq database, all the identified SREBF family genes were found expressed in buffalo and cattle milk samples at different stages of lactation (Fig. 5).Furthermore, expect for MYCN, all the SREBF members kept relatively high expression in both buffalo and cattle at early, mid and late lactation.However, compared with cattle, all the SREBF family genes of buffalo showed lower expression levels.Five of them had the same trend between buffalo and cattle during the lactation.

Association of SNPs with milk production of buffalo traits
To perform association analysis between SNPs in SREBF family genes and buffalo milk production traits, all the SNPs within each SREBF family genes and their promoter from the genotyping data were scanned utilizing the 90 k Axiom V R Buffalo SNP Array (Table 3) and a total of eight objective SNPs were identified finally.Phenotypic, genotypic, allelic frequencies and p-value of Hardy-Weinberg Equilibrium (HWE) expected heterozygosity (He) value, polymorphism information content (PIC) value was also listed in the table.All the identified SNPs were by Hardy-Weinberg Equilibrium (HWE) that could be used for further association analysis in the population.Association analysis (Table 4) showed that AX-85042696 was associated with PP270 (p < 0.01).Within the SNP, the homozygote TT (4.67 ± 0.03) animals were found to have greater (p < 0.05) PP270 than heterozygote CC (4.58 ± 0.02).AX-85073202 was found to be significantly associated with PM270, MY270 (p < 0.05) and PY270 (p < 0.01).And AX-85063577 displayed to have significant effect on MY270 and PY270 (p < 0.05).

Discussion
Buffalo is an important livestock species for the agriculture economy, supplying milk, meat, and draught for plowing. 33Buffalo milk is considered to be of high quality with more milk fat, protein and dry matter than milk from other livestock species. 34However, low milk yield has seriously restricted the development of the dairy buffalo industry. 35SREBFs encode transcriptional activators required for lipid homeostasis and regulate transcription of the LDL receptor gene as well as the fatty acid. 36However, genomewide identification of buffalo SREBF family genes has been limited.

Structural feature of buffalo SREBF family protein and genes
In this study, we performed a genome-wide analysis and identified seven buffalo SREBF genes including 24 protein sequences.Based on the polygenetic analysis, these proteins were classified into seven clusters.Human SREBP1 had two isoforms including SREBP1a and SREBP1c, differing from each other at a few Nterminal amino acids encoded by different first  exons, 37,38 which was the same as our result with buffalo SREBF1 having three protein sequences.The SREBFs can bind to sterol regulatory element 1 (SRE-1), 38 so they are mostly dependent on their functional motif and domains in the protein.Among the protein sequence identified in this study, they all  shared the common conserved domain including Motif 1 and Motif 2. We found a Pfam-A matches to both the Motif 1 and Motif 2, helix-loop-helix DNAbinding domain (HLH).The bHLH-Zip transcription factors family members, whose second member was SREBF2 could recognize sterol regulatory element 1 (SRE-1). 39,40Almost all of the family members except SREBFL5 had at least 6 Motif, which suggested that they had similar structure.From the NCBI CDD search result, we could see that all the family members contain bHLH-Zip domain.In this study, we found that SREBF1, SREBF2, MITF, TFE3, USF1, USF2 and MYCN all be-longed to the same family, which was consistent with Fengmei Li's result that SREBF1, SREBF2, MITF and USF1 belonged to helix-loop-helix (bHLH) transcription factors (TFs) in cattle by genome-wide identification. 40

Phylogenetic relationship of SREBF family protein
We performed phylogenetic analysis and constructed a Neigbor-Joining tree and Maximum Likelihood tree, which provided an in-deep insight into their evolutionary relationship among species.Both phylogenetic trees were consistent with classification by structural feature, and the SREBF family members in five organisms could be clustered in seven different clades.The same genes in different species showed similar structure and could be clustered in the same clades, which was consistent with the other family genes such as buffalo DGAT family genes, 2 ETS transcription family, 41 and SLARR-B gene family in tomatoes. 42urthermore, we found the evolutionary relationship of SREBF family genes between buffalo and cattle was closer than with any other species indicating that the function of the family protein was similar for both buffalo and cattle.SREBF1 and SREBF2 regulate the synthesis of triacylglycerol (TAG) and affect milk production traits in cattle, 9 so the results of this study indicate that buffalo SREBF family proteins could affect milk production traits in buffalo.

Distribution and collinearity analysis of SREBFs
In genetics, the ratio of Ka and Ks is used to determine whether there is selection pressure acting on the protein-coding gene. 43The present study, it showed that ratios of Ka/Ks of all the SREBFs pairs between buffalo and cattle ranged from 0 to 0.140, indicating that these family genes were under selection. 44,45ctually, it was also found that most of the values of Ka/Ks for mouse-rat orthologous genes were much less than 1 because a protein mutation is much less likely to be different between two species than a silent mutation. 46

Candidate markers in SREBF family genes for buffalo milk traits
To search candidate markers and genes for milk production traits, we obtained SNP within SREBF family genes from genotyping data and performed association analysis between milk traits.Three SNPs located in SREBF1, SREBF2 were associated with buffalo milk traits.The SREBF1 polymorphism is associated with milk fat content in cattle 9 and milk yields in goats, 9 and SREBP1a and SREBP2 directly regulate INSIG1 gene expression, 47 and SREBF2 regulated mRNA expression involved in fatty acid synthesis. 9,13,48owever, the number of SNP obtained from genotyping database conducted using the 90 k Axiom V R Buffalo SNP Array is limited, so further research to validate the function and influence on milk traits of the SREBF family genes are necessary.

Expression of SREBF family genes at different lactation stages
SREBF members are regulators of lipid and cholesterol metabolism 49 with abundant expression of the protein in mammary tissues. 50In the present study, six of the seven identified SREBF family genes had a high expression in both buffalo and cattle milk samples.The orthologous pairs between buffalo and cattle showed similar expression trends during all stages of lactation, and the phenomenon also happens to collagens. 51Our findings suggested that, except for SREBF1 and SREBF2, the other identified SREBF members play a role in lipid and cholesterol metabolism and synthesis of milk.However, the findings need more research to be confirmed.

Conclusion
In this research, we performed a comprehensive genome-wide analysis of the SREBF family of genes in buffalo.Seven SREBF gene families and twenty-four amino acid sequences were identified.Analysis of structural features showed that genes belong to the same clade had similar encoding areas and different 3 0 UTRs.Our association analysis indicated that the SREBF family genes are candidate markers for milk production traits.Most of the SREBF family genes showed high expression levels during lactation.

Figure 1 .
Figure 1.Structural feature of the identified SREBF proteins and genes isoforms in buffalo.(A) Phylogenetic tree constructed by Neighbor-Joining method; (B) Conserved motif of buffalo SREBF protein by MEM analysis; (C) Structure of buffalo SREBF family genes.

Figure 2 .
Figure 2. Prediction of conserved domains of buffalo SREBF protein sequences.Boxes with different colors indicate different conserved domains.

Figure 4 .
Figure 4. SREBFs in buffalo and cattle: chromosomal distribution and collinearity analysis.(A) Chromosomal distribution map of buffalo SREBF family genes; (B) Chromosomal distribution map of cattle SREBF family genes; (C) Collinear block linkages between buffalo and cattle.The collinear blocks are indicated by gray lines in the background between buffalo and cattle genomes, and the orthologous SREBF family genes are highlighted in red.

Figure 5 .
Figure 5. Buffalo SREBF family genes expression profile calculated by TPM in mammary gland tissues at four lactation points.

Table 1 .
Details of information of genome-wide identified SREBF family proteins in buffalo.

Table 2 .
Non-synonymous (Ka) to synonymous (Ks) substitution rates of SREBFs among buffalo and cattle.

Table 3 .
Detail information of SNPs within the SREBF family gene and their promoters.

Table 4 .
SNPs association analysis for six buffalo milk production traits.