genomeRxiv is a newly-funded US-UK collaboration to provide a public,
web-accessible database of public genome sequences, accurately
catalogued and classified by whole-genome similarity independent of
their taxonomic affiliation. Our goal is to supply the basic and applied
research community with rapid, precise and accurate identification of
unknown isolates based on genome sequence alone, and with molecular
tools for environmental analysis.
The DNA sequencing revolution enabled the use of cultured and uncultured
microorganism genomes for fast and precise identification. However,
precise identification is impossible without:
1. reference databases that precisely circumscribe classes of
microorganisms, and label these with their uniquely-shared
characteristics; and
2. fast algorithms that can handle the volumes of genome data.
Our approach integrates the highly-resolved classification framework of
Life Identification Numbers (LINs) with the speed and computational
efficiency of sourmash and k-mer hashing algorithms, and the precision
and filtering of average nucleotide identity (ANI). We aim to construct a
single genome-based indexing scheme that extends from phylum to strain,
enabling the unique and consistent placement of any sequenced
prokaryote genome.
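The k-mer hashing component of this approach can be illustrated with a bottom-k MinHash sketch, the general family of techniques that tools such as sourmash build on. The following is a minimal sketch in plain Python under stated assumptions, not sourmash's API and not the genomeRxiv implementation: the function names, the default k-mer length of 21, the sketch size of 500, and the use of SHA-1 as the hash function are all illustrative choices.

```python
import hashlib

# Translation table used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k):
    """Yield every overlapping k-mer of a DNA string (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse
    complement, so that both strands give the same value."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k=21, size=500):
    """Bottom-k MinHash sketch: the `size` smallest distinct k-mer hashes."""
    hashes = {hash_kmer(canonical(km)) for km in kmers(seq, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=500):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches.

    Takes the `size` smallest hashes of the combined sketches and reports
    the fraction that occur in both.
    """
    smallest_union = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not smallest_union:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in smallest_union if h in shared) / len(smallest_union)
```

Because only a fixed number of hashes is kept per genome, two sketches can be compared in time independent of genome length. A fast estimate of this kind could serve as an initial screen before a slower, more precise ANI calculation is run on the closest candidates.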
genomeRxiv includes protocols for confidentiality, allowing groups to
identify and announce the identities of newly-sequenced organisms without
sharing genome data directly. This protects communities working with
commercially and ethically sensitive organisms (e.g. production
engineering strains and potential bioweapons), and enables benefit
sharing with indigenous communities.
genomeRxiv will also provide online capability to design molecular diagnostic
tools for metabarcoding and qPCR, to enable tracking of specific
groupings of bacteria directly in the environment.
Funding
19-BBSRC-NSF/BIO genomeRxiv: a microbial whole-genome database & diagnostic marker design resource for classification, identification & data sharing
Biotechnology and Biological Sciences Research Council
BBSRC-NSF/BIO:Collaborative Research: genomeRxiv: a microbial whole-genome database and diagnostic marker design resource for classification, identification, and data sharing