Database for the Kraken2 KpSC plasmid classifier
This dataset provides the plasmid-free chromosomal sequences and complete plasmid sequences used to construct a Kraken2 database for the Klebsiella pneumoniae species complex (KpSC). It contains 2,487 complete KpSC chromosomal sequences, representing all seven KpSC taxa, and 30,095 complete Enterobacteriaceae plasmid sequences. All sequences are provided in the kraken2-sequences.tar.gz file.
The original database was developed by Gomi et al. (2021) and has been expanded here to include more chromosomal and plasmid sequences using similar inclusion criteria. Accessions and metadata for all sequences in this database are available in the kraken2-classifier-metadata.xlsx file. The database is designed for use with the Kraken2 software developed by Wood et al. (2019); example code is provided in the build-kraken2-database.sh file.
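For orientation, the generic Kraken2 workflow for a custom database can be sketched as below. This is a minimal sketch and not the contents of build-kraken2-database.sh, which remains the authoritative procedure. The directory names (kpsc_plasmid_db, kraken2-sequences), the .fasta extension, the thread count and the helper functions run, build_db and classify are all assumptions made for illustration. It also assumes the sequence headers already carry taxonomy identifiers that Kraken2 can resolve; a custom taxonomy may be needed in place of the downloaded NCBI one.

```shell
#!/usr/bin/env bash
# Hypothetical locations; adjust to wherever kraken2-sequences.tar.gz
# was extracted (e.g. with: tar -xzf kraken2-sequences.tar.gz).
DB_DIR="${DB_DIR:-kpsc_plasmid_db}"
SEQ_DIR="${SEQ_DIR:-kraken2-sequences}"
THREADS="${THREADS:-4}"

# Helper (hypothetical): print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build a custom database: taxonomy, library sequences, then the index.
build_db() {
    # Assumption: the NCBI taxonomy is suitable; a custom taxonomy
    # placed in "$DB_DIR/taxonomy" could be used instead.
    run kraken2-build --download-taxonomy --db "$DB_DIR"
    for fasta in "$SEQ_DIR"/*.fasta; do
        [ -e "$fasta" ] || continue   # skip if the glob matched nothing
        run kraken2-build --add-to-library "$fasta" --db "$DB_DIR"
    done
    run kraken2-build --build --threads "$THREADS" --db "$DB_DIR"
}

# Classify the contigs of one assembly against the database.
# Usage: classify <assembly.fasta> <per-contig output> <summary report>
classify() {
    run kraken2 --db "$DB_DIR" --threads "$THREADS" \
        --output "$2" --report "$3" "$1"
}
```

After sourcing the script, running `DRY_RUN=1 build_db` prints the commands without executing them, which is a convenient way to check paths before starting a long build.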
References:
- Gomi, R., Wyres, K.L., and Holt, K.E. (2021). Detection of plasmid contigs in draft genome assemblies using customized Kraken databases. Microb. Genom. 7. doi:10.1099/mgen.0.000550.
- Wood, D.E., Lu, J., and Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257. doi:10.1186/s13059-019-1891-0.