Bioinformatics analysis of extracellular subtilisin E from Bacillus subtilis

Abstract Bacillus spp. are the main sources of subtilisin E, which has several applications in biotechnology. The 3D structure of subtilisin E has a significant impact on its efficacy. In this study, we evaluated subtilisin E from Bacillus subtilis subsp. subtilis str. 168 by bioinformatic methods. The results revealed that the subtilisin E sequence from B. subtilis contains highly conserved amino acids, including histidine (H), aspartic acid (D) and serine (S). Subtilisin E cleaves the bonds between hydrophobic and polar amino acids in keratin-associated proteins. The effects of point mutations on the crystal structure of subtilisin E (PDB ID: 1SCJ) showed that changes of asparagine 123 (N123) to valine (V) and serine 331 (S331) to leucine (L) respectively, were the most stabilizing. Genomic analysis of the subtilisin E-coding gene (aprE) indicated that this gene and the yhfN gene are expressed through a σA promoter. The analysis of TBFs revealed AbrB, ScoC, DegU, Hpr, σA, SinR, TenA, and DegU as relevant regulators of aprE expression. Phylogenetic analysis showed that subtilisin Es have highly conserved structures among Bacillus spp., sharing a common ancestor, where their coding genes were duplicated and evolved within the Bacillus spp. As the conclusion, our in silico study demonstrated that the overexpression of the aprE gene and stability of the produced subtilisin E can be improved though system biology methods such as point mutations and identifying the involved transcription factors (TFs) or/and TBFs. Communicated by Ramaswamy H. Sarma


Introduction
Subtilisin E, an alkaline serine protease belonging to the subtilase family of serine proteases, is produced by many prokaryotic and eukaryotic organisms as an extracellular enzyme with a molecular weight of 18 to 90 kDa (average % 40 kDa) (Bouacem et al., 2015). Currently, over 60% of the enzyme market belongs to proteases with annual sales of $1.5-1.8 billion (Goncalves et al., 2016;Hirata et al., 2013). Subtilisin E is an enzyme applicable to many fields of industry, such as production of detergents, wool and leather, pharmaceuticals, cosmetics, and food (de Souza et al., 2015). The major application of subtilisin E in the food industry belongs to dairy, meat, and fish products (Vojcic et al., 2015). Moreover, the main usage of this enzyme in the wool and leather industries is for soaking and de-hairing. In addition, wound-healing and removal of dead tissues are the main applications of subtilisin E in medicine and the pharmaceutical industries.
Bacterial species and specifically Bacillus spp., such as Bacillus subtilis (Wang et al., 2016), Bacillus cereus (Nilegaonkar et al., 2007), Bacillus licheniformis (Fakhfakh et al., 2009), and Bacillus pumilus ( € Ozc¸elik et al., 2014), have been recognized as the main sources of subtilisin E, which was produced for the first time in 1967 (Levisohn & Aronson, 1967). Subtilisin E exhibits a number of important features, such as retaining enzymatic activity upon changes of pH or temperature, stablility in organic solvents, and resistant to toxic compounds, thereby making it the highest selling industrial enzyme globally (Chittoor et al., 2016;Contesini et al., 2017). This secretory enzyme has the highest catalytic activity at neutral to alkaline pH range (pH 7.0-11.0) and contains conserved histidine (H), serine (S), and aspartic acid (D) amino acids at its active site (Tjalsma et al., 2000). This enzyme is stabilized by calcium (Ca 2þ ) ions in some thermophiles, while its decreased thermostability is associated with weak affinity for Ca 2þ ions (Almog et al., 2003).
Some studies have shown that there is a significant association between subtilisin E production and secondary metabolites during the sporulation phase (Paetzel et al., 1998). A combination of UV radiation and N-methyl-N'-nitro-N-nitrosoguanidine-yielded B. subtilis and B. pumilus mutants, which were able to secret larger amounts of subtilisin E with higher hydrolytic activity (Wang et al., 2007(Wang et al., , 2016. Furthermore, studies on genetic manipulation of B. pumilus revealed that the bacterium could produce 114% and 92% more subtilisin E after optimization of its growth conditions and the alkaline protease promoter, respectively (K€ uppers et al., 2014). In addition, there are eight non-polar amino acids, including cysteine (C), serine (S), threonine (T), glycine (G), alanine (A), valine (V), leucine (L), and phenyl alanine (P), in the proximity of aspartic acid (Asp) at the active site of subtilisin E. The point mutations in the listed amino acids, such as leucine resulted in the production of subtilisin E with higher activity in B. subtilis (Takagi et al., 1988). Moreover, site-directed mutagenesis of both G61 and S98 to cysteine (C) in B. subtilis led to the formation of disulfide bonds between the two cysteines, which consequently resulted in the secretion of thermostable subtilisin E with a half-life 2-3 times higher than that of the wild type at 63 C (Takagi et al., 1990). The next factor significantly affecting the yield of active enzyme is its signal peptide. In the previous study, 393 signal peptides (173 as the homologous signal peptides from B. subtilis and 220 as the heterologous signal peptides from B. licheniformis) were fused to the N-terminal of subtilisin E from B. amyloliquefaciens (Degering et al., 2010). This modification induced more efficient secretion of subtilisin E, although the prediction of the secretory strength of these signal peptides has still not been carried out.
Our previous study revealed that the mutation in the scoC (sporulation control through binding to upstream of the transcription start site) and sigma H-spo0B (transcription inducer through binding of the gene product to DNA) genes leads to higher extracellular production of subtilisin E in B. subtilis and B. cereus (Abe et al., 2009;Ahmadpour & Yakhchali, 2017). To produce more extracellular subtilisin Es via genetic manipulation of Bacillus sp., it is necessary to characterize the specifications of these enzymes and identify the relevant coding genes. In this study, we describe the genomic and proteomic features of subtilisin E from B. subtilis subsp. subtilis str. 168 through bioinformatic approaches. In addition, protease specificity function and cleavage site of the enzyme on keratin are predicted.

Structural analysis of the subtilisin E
A subtilisin E protein sequence (UniProt ID: ARW53276.1) from B. licheniformis (Ekici et al., 2008) was used for the initial UniProt BLAST protein homology search (http://www.uniprot. org/blast/) (UniProt, 2010). In the next step, 33 subtilisin E protein sequences (the first three amino acids prior to histidine (H) as the first highly conserved amino acid), including subtilisin E from B. subtilis subsp. subtilis str. 168 (UniProt ID: P04189), were aligned using the Clustal Omega algorithm to create a multiple sequence alignment (MSA) within the Jalview open source program (version 2.10.3) (http://www.jalview.org/) (Waterhouse et al., 2009;Zolfaghari Emameh et al., 2014). Subtilisin E from the fungus Dactylellina haptotyla (UniProt ID: B1NNU1) was applied as the query in the analysis.

Substrate cleavage sites of subtilisin E
Prediction of substrate cleavage sites of subtilisin E from B. subtilis subsp. subtilis str. 168 (UniProt ID: P04189) was conducted using the protease specificity prediction server (PROSPER), (https://prosper.erc.monash.edu.au/home.html) (Song et al., 2012). PROSPER is an integrated feature-based server for prediction of subtilisin Es' cleaving properties regarding various proteinaceous substrates. The prediction mechanism of PROSPER is based on multiple factors, such as the secondary structure of substrate and solvent accessibility. For this prediction, bi-profile Bayesian features were applied to enable PROSPER to predict subtilisin E cleavage sites through four different combinations, including bi-profile Bayesian amino acid, disordered, secondary structure, and solvent accessibility profiles.

Effect of point mutations on the stability of subtilisin E
Predicting the stability of the crystal and 3D-visualized structure of subtilisin E (PDB ID: 1SCJ) revealed that the subtilisin E-propeptide complex includes two chains or asymmetric units: subtilisin E domain (A) and E-propeptide (B), which are autoprocessed by the enzyme cleavage activity (Jain et al., 1998;Sehnal et al., 2017) (Figure S1, supplementary material). In the present analysis, the sensitivity of point mutations and their stabilization were predicted at pH 7.0 using the MAESTRO (https://biwww.che.sbg.ac.at/maestro/web) web tool (Laimer et al., 2015). We performed this prediction to evaluate the effect of point mutations on the stability of subtilisin E. MAESTRO, as a multi-agent machine learning system provides the following: (1) predicted DDG (DDG pred (kcal/ mol): values of changes in standard binding free energy upon mutations), where DDG pred < 0.0 and DDG pred > 0.0 reveal the stabilizing and destabilizing mutations, respectively; (2) confidence estimation (C pred ); and (3) disulfide bond score (S ss ).

Genomic location of subtilisin E (aprE) gene
The genomic location of B. subtilis subsp. subtilis str. 168 subtilisin E (aprE) gene was identified using the Jena Prokaryote Genome Viewer (JPGV) (http://jpgv.fli-leibniz.de/cgi/index.pl) (Romualdi et al., 2007). JPGV is a freely accessible web tool which provides various informative data on sequenced prokaryotic genomes and allows visualization of genome plots with the exact location of individual genes. Moreover, this web tool can be used in horizontal gene transfer (HGT) and evolutionary studies to identify the location of genes on mobile genetic elements (MGEs) (Zolfaghari Emameh et al., 2018;Zolfaghari et al., 2016).

Phylogenetic analysis
A total of 33 subtilisin E sequences were retrieved from the UniProt database (Zolfaghari Emameh et al., 2020). The phylogenetic tree was constructed by using the Molecular Evolutionary Genetics Analysis (MEGA) software (version 7.0) (http://www.megasoftware.net/) (Kumar et al., 2016) through the maximum likelihood method. This method applies Bayesian inference to predict relative branch-length differences and presents a high-quality tree. In the current analysis, the alignment transformation environment (ALTER) tool (http://sing.ei.uvigo.es/ALTER/) (Glez-Pena et al., 2010) was used to convert MSA of the coding DNA sequences (CDSs) into an aligned codon. Subtilisin E from D. haptotyla (UniProt ID: B1NNU1) was included in the analysis as an outgroup.

Prediction of transcription binding factors (TBFs)
The subtilisin E (aprE) gene from B. subtilis subsp. subtilis str. 168 was evaluated by DBTBS (http://dbtbs.hgc.jp/) (Tjalsma et al., 2000), which is a reference database used for the study of transcriptional regulation in B. subtilis based on characterized TBFs, targeted genes for regulation, and motifs recognized by transcription factors (TFs). The latest DBTBS release (release 5.0) is capable of providing some updated information, including number and name of TFs, position-specific scoring matrices, promoters, regulated operons, and terminators. The release 5.0 of the database predicted TBFs for the aprE gene among 44 binding factor candidates. In addition, predictions were evaluated by the Bacillus gene Regulatory Network (BacillusRegNet) (http://vm-bacillusregnet.bioswarm.net/v1/ index.html) (Misirli et al., 2014). The database is supported with a version containing experimental results, which provides a platform for validation of TBF predictions.

Structural analysis of the subtilisin E
The MSA analysis of 33 subtilisin E protein sequences, including that from B. subtilis subsp. subtilis str. 168, showed that all sequences contain three highly conserved amino acids: histidine (H125), aspartic acid (D146), and serine (S231) (Figure 1).

Cleavage sites in the substrates of subtilisin E
Prediction of cleavage sites of subtilisin E from B. subtilis subsp. subtilis str. 168 in keratin and keratin-associated proteins from different hosts are shown in Figure 2 and Table 1.

Effect of point mutations on the stability of subtilisin E
The stability prediction of the crystal structure of subtilisin E (PDB ID: 1SCJ) revealed that point mutations can increase the stability of subtilisin E protein structure. The stabilizing mutations (DDG pred < 0.0) and confidence estimation (C pred ) for both chains A and B of 1SCJ are shown in Supplementary data Tables S1 and S2, respectively. This evaluation revealed that the point mutations leading to changes of asparagine 123 (N123) to valine (V), isoleucine (I), and leucine (L) with DDGpred ¼ À3.059, À2.978, and À2.967, respectively, are the most stabilizing point mutations in chain A of subtilisin E. In addition, the point mutations causing the change of serine 331 (S331) to leucine (L), methionine (M), and isoleucine (I) with DDGpred ¼ À1.697, À1.607, and À1.505, respectively, are the most stabilizing point mutations in chain B of subtilisin E.

Genomic location of subtilisin E (aprE) gene
Analysis of the genomic location of the subtilisin E-coding gene (aprE) revealed that the aprE gene is located on the genome of B. subtilis subsp. subtilis str. 168 between 1,104,423 and 1,105,568 nucleotides (1,146 bp) (Figure 3). This genomic visualization showed that the aprE gene is expressed from a SigA (rA) promoter. Moreover, the sporulation genes, including yhfN and yhfM, are located upstream of the aprE gene, whereas yhfO, yhfP, and yhfQ are downstream of the aprE gene. Both aprE and yhfN are located in a single operon, so the yhfN gene is expressed through the rA promoter, similarly as the aprE gene.

Phylogenetic analysis
The phylogenetic results pf this study with 100% data coverage for internal nodes revealed that the structure of subtilisin E from B. subtilis is highly conserved among various Bacillus spp. It is most likely due to aprE gene duplications, which have evolved in Bacillus spp. during the evolution of subtilisin E. Therefore, a high degree of similarity is observed between subtilisin E of B. subtilis subsp. subtilis str. 168 and subtilisin E of Bacillus spp. from other branches of the phylogenetic tree. The analysis showed that aprE genes from B. subtilis subsp. subtilis str. 168 and B. mojavensis have a common ancestor (Figure 4).

Prediction of TBFs
Prediction of TBFs of the aprE gene identified five binding factors, namely AbrB, DegU, Hpr, rA, and SinR, which actively bind to the relevant positions and act as negative or positive regulators, or recognize the aprE gene promoter (Table 2). Among these TBFs, rA and DegU bind to the promoter regions as positive regulators, while phosphorylated Spo0A acts as a negative regulator and dephosphorylated DegU acts as a positive regulator of AbrB, ScoC, and SinR ( Figure 5). The validation of TBF predictions revealed that in addition to AbrB, ScoC, and SinR, other TBFs including TenA, rA, and DegU are involved in the regulation of the aprE gene (Table 3).

Discussion
Suitable production of recombinant subtilisin E needs identification of genomic and proteomic properties of the coding gene and amino acid structure of the protein through various omics approaches.
The MSA analysis of subtilisin E protein sequences revealed that three amino acids including histidine (H125), aspartic acid (D146), and serine (S231) are highly conserved in all Bacillus spp. as well as B. subtilis subsp. subtilis str. 168. Moreover, the results from the prediction of cleavage sites in the keratin and keratin-associated proteins indicated that this enzyme cleaves the binding sites between hydrophobic [valine (V), isoleucine (I), glycine (G), and proline (P)] and polar [glutamine (G), asparagine (N), serine (S), threonine (T), cysteine (C), and tryptophan (W)] amino acids. In addition, this analysis revealed that subtilisin E cleaves the hair keratins of Homo sapiens (human) and Bos taurus (bovine) as well as wool keratin of Capra hircus (goat) to further segments. Therefore, the hair and wool from human, bovine, and goat sources can be more suitable substrates for subtilisin E.
Furthermore, the present study on point mutations demonstrated that the most stabilizing mutations occurred by changing asparagine 123 (N123) to valine (V), lysine (L), and isoleucine (I) in chain A, and serine 331(S331) to lysine (L), methionine (M), and isoleucine (I) in chain B. This result indicated that the bond between two hydrophobic residues translocates the enzyme cleavage site from the surface to the core and generates a more stabilized protein structure. Moreover, the two chains of subtilisin E differ by biological function. Chain A of the subtilisin E protein exhibits enzymatic activity, whereas chain B functions as an intramolecular chaperon (Jain et al., 1998). Thus, we hypothesize that chain A could be the relevant target of further studies involving cleavage and site-directed mutagenesis leading to improved protein stability and higher yield.
Based on the genomic location of the sporulation genes, such as yhfN, yhfM, yhfO, yhfP, and yhfQ, downstream and upstream of the aprE gene, the rA promoter can be employed for both aprE and yhfN genes, and the intergenic region between the aprE and yfhN genes. Therefore, an association between the expression of the aprE gene and sporulation genes has been speculated, emphasizing the regulatory role of sporulation genes in the expression of the aprE gene (Hertel et al., 2017). This result is consistent with our previous experiments, where the expression of protease in an asporogenic B. cereus strain was activated by the induction of the sucrose promoter of the sacPA operon and expression of the sigma-H and spo0B gene products (Ahmadpour & Yakhchali, 2017). The present phylogenetic analysis showed the same origin for the coding genes of    provide an in vitro opportunity to overexpress the aprE gene through activation of DegU and rA, and thereafter increase the production yield of subtilisin E.

Conclusions
Several genomic and structural factors, including single mutations and TBFs, may have crucial roles in the expression and stability of the recombinant subtilisin E. Hence, the application of genetic and protein engineering as well as system biology techniques can potentially improve the stability of Bacillus spp. in both bench and large-scale production.