2020-08-07T09:50:09Z (GMT) by Chiara Vanni, Antonio Fernandez-Guerra
The agnostosDB (dbf02445-20200519) is a comprehensive dataset of microbial gene clusters (GCs) from genomes and metagenomes. It contains 5,287,759 GCs and more than 280 million genes. These come from the bacterial and archaeal genomes of the Genome Taxonomy Database (GTDB) and from five large-scale metagenomic projects: 583 marine metagenomes from the Tara Oceans expedition (TARA), the Malaspina expedition, Ocean Sampling Day (OSD), and the Global Ocean Sampling Expedition (GOS), complemented by 1,246 metagenomes from phases I and II of the Human Microbiome Project (HMP).
The dataset is described in Vanni et al. 2020.
More detailed information about how the dataset was created, and about some of its applications, can be found at
A companion to the agnostosDB is agnostos-wf, a Snakemake workflow stored in the GitHub repository
The agnostos-wf workflow lets users search the agnostosDB gene cluster HMM profiles, integrate new sequence data (genes or contigs) into the database, or both.