poster
posted on 2020-04-14, 21:07 authored by Alise Ponsero
Modern ‘omics approaches allow the in situ exploration of the relationships between phages and their hosts, and provide new insights into the impact of viral populations on numerous biological systems. Retrieving viral sequences from bacterial metagenomes is critical to understanding host-virus interactions in a given ecosystem. Homology to known viral genes is a primary method for retrieving viral contigs from complex metagenomes. However, homology-based approaches limit the discovery of novel viral sequences that have no similarity to previously known viruses. Recently, VirFinder, a novel tool that uses machine learning to detect viral sequences in bacterial metagenomes, was released [Ren et al. 2017]. This method distinguishes viral from bacterial sequences based on their k-mer signatures, rather than through homology-based searches against viral genes. However, because VirFinder relies on a model trained on viral and bacterial genomes from the RefSeq database, the tool is biased toward detecting the viral groups that are most abundant in reference databases [Ren et al. 2017].
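To make the idea of a k-mer signature concrete, the following is a minimal sketch of how a contig could be turned into a normalized k-mer frequency profile, the kind of feature vector a classifier can be trained on. It is an illustration only, not the implementation used by VirFinder or by the framework presented here: the function names, the choice of k = 4, and the merging of each k-mer with its reverse complement are assumptions made for the example.

```python
from collections import Counter
from itertools import product

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Merge a k-mer with its reverse complement (assumption for this sketch):
    the lexicographically smaller of the two represents both strands."""
    return min(kmer, reverse_complement(kmer))


def kmer_signature(sequence, k=4):
    """Return a dict mapping every canonical k-mer to its relative frequency.

    Hypothetical helper: k-mers containing ambiguous bases (e.g. N) are
    skipped, and an all-zero profile is returned if no valid k-mer is found.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    # Fixed, sorted key set so every contig yields a vector of the same shape.
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    return {key: (counts[key] / total if total else 0.0) for key in keys}
```

Because every contig is mapped to the same ordered set of k-mers, the resulting profiles can be stacked into a feature matrix and labeled (for example viral versus bacterial) to train a model. No homology search is involved at any point.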
Viromes represent a large collection of viral sequences that are unbiased by cultivation methods and cover a wide variety of ecosystems. These sequences are a vast and valuable source of information about viral k-mer signatures. We present a scalable computational framework to train machine learning models directly on curated, ecosystem-specific metagenomic contigs from aquatic environments. This novel tool aims to ensure reliable viral sequence detection even in less-studied ecosystems, where fewer phages have been isolated and sequenced, and to provide the user with ecosystem-specific predictions. Finally, our approach takes into account the possibility of eukaryotic contamination, which is an important factor in various environments.