Published on by Simon Rasmussen
Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains a challenging task. Here we use recent advances in deep learning to develop Variational Autoencoders for Metagenomic Binning (VAMB), a program that uses deep variational autoencoders to encode sequence co-abundance and k-mer distribution information prior to clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any prior knowledge of the datasets. VAMB outperforms existing state-of-the-art binners on contig datasets, reconstructing 29–98% more near-complete draft genomes. We employed VAMB in a novel multi-split workflow that enables assembly of 28–105% more strains compared to using VAMB with the commonly used single-sample binning strategy, and also enables direct high-resolution taxonomic profiling across samples. Finally, to demonstrate the scalability of our method, we bin a human gut microbiome dataset with 1,000 samples. We reconstruct 45% more near-complete bins compared to state-of-the-art methods, while consuming fewer computational resources. VAMB can be run on standard hardware and is freely available at Paper:
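
To make the two input data types concrete, the following is a minimal sketch of how a per-contig feature vector could be assembled from a k-mer distribution and a co-abundance profile before being passed to an encoder. It is an illustration only, not VAMB's actual preprocessing: the function names (`kmer_frequencies`, `abundance_profile`, `contig_features`), the simple sum-to-one normalization, and the plain concatenation are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Normalized k-mer frequencies over the ACGT alphabet, in a fixed order.

    Hypothetical helper: k-mers containing other characters (e.g. N) are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def abundance_profile(depths):
    """Scale per-sample read depths of one contig so that they sum to 1.

    Hypothetical helper: the normalization is an assumption for this sketch.
    """
    total = sum(depths)
    return [d / total if total else 0.0 for d in depths]


def contig_features(seq, depths, k=4):
    """Concatenate the co-abundance profile and the k-mer distribution."""
    return abundance_profile(depths) + kmer_frequencies(seq, k)


# One contig observed in two samples, using 2-mers to keep the vector short.
features = contig_features("ACGTACGTAC", [2.0, 6.0], k=2)
```

Here `features` has one entry per sample followed by one entry per possible k-mer, so contigs from the same genome would be expected to have similar vectors in both parts.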
