Proteins and sequences from the MicrobesOnline database
This figshare dataset includes scaffold sequences, genes, and protein sequences from MicrobesOnline, a database of (mostly) bacterial and archaeal genomes. The data is provided as a gzipped sqlite3 database. The tables included are:
Taxonomy -- 1 row per genome
Scaffold -- scaffolds for each genome
ScaffoldSeq -- sequences for each scaffold. (Sequences for Arabidopsis thaliana chromosomes were omitted due to their length.)
Locus -- genes for each scaffold
Position -- positions of genes on scaffolds
AASeq -- protein sequences
Synonym -- gene names
Description -- gene descriptions
TaxParentChild -- parent-child relationships between taxonomic groups
These are a subset of all the tables in MicrobesOnline. More detailed documentation of MicrobesOnline's schema is available at
http://www.microbesonline.org/programmers.html
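
As a starting point for exploring the database, here is a minimal Python sketch that decompresses the gzipped file and lists each table with its column names. It uses only the standard library. The helper names (decompress_gz, list_tables) and the file names in the usage comment are hypothetical and should be replaced with the actual name of the downloaded file. The sketch reads column names from the database itself rather than assuming them; the schema documentation linked above remains the authority on what each column means.

```python
import gzip
import shutil
import sqlite3


def decompress_gz(gz_path, db_path):
    """Decompress a gzipped file to db_path, streaming in chunks
    so a large database does not have to fit in memory."""
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in the database."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        conn.close()


# Example usage (file names are placeholders, not the actual figshare file names):
#   decompress_gz("microbesonline.db.gz", "microbesonline.db")
#   for table, columns in list_tables("microbesonline.db").items():
#       print(table, columns)
```

Once the column names are known, the tables can be queried with ordinary SQL through the same sqlite3 connection or the sqlite3 command-line shell.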