Mash sketched databases for: Mash Sketched Reference Dataset for Genome-Based Taxonomy and Comparative Genomics
Accessible Reference Data for Genome-Based Taxonomy and Comparative Genomics is a set of Mash-sketched genomic datasets (*.msh).
To use it, download the dataset of your choice and run genomic comparisons against it with the Mash tool. Last update: September 2021.
Bacteria_Archaea_type_assembly_set.msh contains 17,442 type genomes from NCBI
Bacteria_Archaea_type_proteome_set.msh contains 12,767 type predicted proteomes from NCBI
GTDB_r202_assembly_set.msh contains 47,894 genomes from GTDB
Fungi_type_assembly_set.msh contains 801 type genomes from NCBI
Fungi_type_proteome_set.msh contains 248 type predicted proteomes from NCBI
Virus_Sept21_GenBank_assembly_set.msh contains 44,916 viral assemblies from NCBI
Soil_Metgenome_assembly_set.msh contains 479 soil metagenomes from NCBI
Freshwater_Metagenome_assembly_set.msh contains 611 freshwater metagenomes from NCBI
Fungal_Database.2022_genomic.fna.gz.msh contains 4,293 filamentous and yeast-like fungal genomes (*new*)
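
As an illustration of the download-and-compare workflow described above, here is a minimal Python sketch that calls `mash dist` on one of the sketch files and ranks the references by distance. It assumes that the `mash` executable is installed and on the PATH, and that `mash dist` prints five tab-separated columns (reference, query, distance, p-value, shared hashes). The query file name `my_genome.fna` and the helper names are hypothetical and not part of the dataset. Treat it as a starting point, not a definitive pipeline.

```python
import subprocess
from typing import List, NamedTuple


class MashHit(NamedTuple):
    """One row of `mash dist` output (assumed five tab-separated columns)."""
    reference: str
    query: str
    distance: float
    p_value: float
    shared_hashes: str


def build_dist_command(sketch: str, query: str, threads: int = 1) -> List[str]:
    """Build the argument list for `mash dist -p <threads> <sketch> <query>`."""
    return ["mash", "dist", "-p", str(threads), sketch, query]


def parse_dist_line(line: str) -> MashHit:
    """Parse a single tab-separated line of `mash dist` output."""
    ref, qry, dist, pval, shared = line.rstrip("\n").split("\t")
    return MashHit(ref, qry, float(dist), float(pval), shared)


def closest_references(sketch: str, query: str, top: int = 5,
                       threads: int = 1) -> List[MashHit]:
    """Run `mash dist` and return the `top` hits with the smallest distance."""
    result = subprocess.run(
        build_dist_command(sketch, query, threads),
        capture_output=True, text=True, check=True,
    )
    hits = [parse_dist_line(l) for l in result.stdout.splitlines() if l]
    return sorted(hits, key=lambda h: h.distance)[:top]


if __name__ == "__main__":
    # Hypothetical query genome; replace with your own FASTA file.
    for hit in closest_references("Bacteria_Archaea_type_assembly_set.msh",
                                  "my_genome.fna", top=5, threads=4):
        print(f"{hit.reference}\t{hit.distance:.4f}\t{hit.shared_hashes}")
```

The same pattern should apply to the other assembly sets by swapping the sketch file name. The proteome sets presumably require a protein query rather than a nucleotide one, which is worth checking with `mash info` on the downloaded file before running comparisons.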