figshare
Proteome database of E. coli.xlsx (2.65 MB)

Proteome database of Escherichia coli K-12

dataset
posted on 2019-04-26, 06:29 authored by Wenfa Ng
The ensemble of proteins in a bacterial species is relevant to understanding the biochemical and metabolic activities of a cell at a global level. To this end, proteomics technologies have opened a window into the inner workings of cells, allowing rational design changes to be implemented at the cellular level to overproduce a specified metabolite through heterologous expression of particular genes or pathways. Such omics technologies generally generate large amounts of information that require bioinformatic approaches to infer biological meaning. For example, useful information such as protein names and amino acid sequences needs to be extracted automatically from a proteome data file with bioinformatic tools. Hence, this work uses an in-house MATLAB function to construct a proteome database of Escherichia coli K-12. The information in the proteome database includes, for each protein, the protein name, amino acid sequence, number of residues, molecular weight and nucleotide sequence. In particular, the protein name and amino acid sequence are extracted from the original FASTA proteome file, while the number of amino acid residues, molecular weight and nucleotide sequence of each protein are calculated using built-in functions in MATLAB. Collectively, the proteome database of E. coli K-12 should find use in diverse biology and biotechnology applications, ranging from understanding the molecular weight of individual proteins to synthesizing a gene in a molecular cloning workflow.
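
The workflow described above (read a FASTA proteome file, then derive the residue count, molecular weight and a nucleotide sequence for each protein) can be sketched roughly as follows. This is a minimal Python illustration, not the author's in-house MATLAB function, which is not shown on this page. The function names, the dictionary keys, the average residue masses and the one-codon-per-residue table are all assumptions made for the example. Back-translation is not unique, so the nucleotide sequence produced here is only one of many that encode the protein and need not match the native gene.

```python
# Average residue masses in Da for the 20 standard amino acids (assumed table).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain for the free N- and C-termini

# One arbitrarily chosen codon per amino acid (illustrative choice only).
CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "E": "GAA", "Q": "CAG", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line.upper())  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def molecular_weight(seq):
    """Average molecular weight in Da, or None if a residue is not in the table."""
    if any(aa not in RESIDUE_MASS for aa in seq):
        return None
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def back_translate(seq):
    """One possible coding sequence; residues outside the table become 'NNN'."""
    return "".join(CODON.get(aa, "NNN") for aa in seq)


def build_database(fasta_text):
    """Build one row per protein with the five fields named in the description."""
    return [
        {
            "name": name,
            "sequence": seq,
            "length": len(seq),
            "molecular_weight": molecular_weight(seq),
            "nucleotide_sequence": back_translate(seq),
        }
        for name, seq in parse_fasta(fasta_text)
    ]
```

Calling `build_database(open("proteome.fasta").read())` on a proteome file (the file name is hypothetical) returns one dictionary per protein, which could then be written to a spreadsheet. Non-standard residue codes such as `U` or `X` are left without a molecular weight rather than guessed.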

Funding

No funding was used in this work.

History