full_col.fasta (11.62 MB)

colicin_database

dataset

posted on 2022-09-01, 14:40 authored by P. Malaka De Silva, Rebecca Bennett, Lauriane Kuhn, Patryk Ngondo, Brian Ho, Francois-Xavier Weill, Benoit Marteyn, Claire Jenkins, Kate Baker

A database of over 10,000 colicins from over 50 species of bacteria was collated from the European Nucleotide Archive (ENA), supplemented with some isolates from previously published sources. A multi-FASTA file containing the collated colicin sequences was used to generate a custom database with the prepareref command of ARIBA v2.14.6, which removes erroneous data and runs cd-hit to cluster the sequences at a user-defined similarity threshold (90% in our case). ARIBA was then run with the FASTQ files of each isolate against the colicin database to report which sequences were observed in each isolate.
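
The two ARIBA steps described above can be sketched roughly as follows. This is an illustrative sketch only, not the exact commands used to build this dataset: the output directory and read file names are hypothetical, paired-end reads are assumed, and `--all_coding yes` is an assumption (ARIBA expects either a metadata file or this flag when given a bare FASTA). The value 0.9 for `--cdhit_min_id` corresponds to the 90% threshold mentioned in the description. The script only builds and prints the command lines; it does not run ARIBA.

```python
import shlex


def prepareref_cmd(fasta, ref_dir, min_id=0.9):
    """Build an `ariba prepareref` command that clusters the FASTA with cd-hit."""
    return [
        "ariba", "prepareref",
        "-f", str(fasta),                 # multi-FASTA of colicin sequences
        "--all_coding", "yes",            # assumption: all sequences treated as coding
        "--cdhit_min_id", str(min_id),    # cd-hit identity threshold (0.9 = 90%)
        str(ref_dir),                     # output directory for the prepared database
    ]


def run_cmd(ref_dir, reads_1, reads_2, out_dir):
    """Build an `ariba run` command for one isolate (paired-end reads assumed)."""
    return ["ariba", "run", str(ref_dir), str(reads_1), str(reads_2), str(out_dir)]


# Hypothetical paths, for illustration only.
prep = prepareref_cmd("full_col.fasta", "colicin_db")
run = run_cmd("colicin_db", "isolate_1.fastq.gz", "isolate_2.fastq.gz", "isolate_out")

print(shlex.join(prep))
print(shlex.join(run))
```

To execute the steps rather than print them, each list could be passed to `subprocess.run(cmd, check=True)` on a machine with ARIBA installed, with the `run` command repeated once per isolate.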