A database of over 10,000 colicins from over 50 species of bacteria was collated from the ENA and supplemented with isolates from previously published sources. A multi-FASTA file containing the collated colicin sequences was used to generate a custom database with the prepareref command of ARIBA v2.14.6, which removes erroneous data and runs cd-hit to cluster the sequences at a user-defined similarity threshold (90% in our case). ARIBA was then run on the FASTQ files of all isolates against the colicin database to report which sequences were observed in each isolate.
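The two ARIBA steps described above can be sketched as a short Python wrapper. This is a minimal illustration rather than the exact commands used in the study: the file names, the `_ariba` output suffix, and the helper functions are hypothetical, paired-end reads are assumed, and `--all_coding yes` is an assumption about how the colicin sequences were treated. The 90% threshold is passed through `--cdhit_min_id`.

```python
import subprocess


def prepareref_cmd(fasta, ref_dir, min_id=0.9):
    """Build the `ariba prepareref` command for a multi-FASTA of colicins.

    Assumption: all sequences are treated as coding (--all_coding yes).
    min_id is the cd-hit identity threshold (0.9 = 90%).
    """
    return [
        "ariba", "prepareref",
        "-f", str(fasta),
        "--all_coding", "yes",
        "--cdhit_min_id", str(min_id),
        str(ref_dir),
    ]


def run_cmd(ref_dir, reads_1, reads_2, out_dir):
    """Build the `ariba run` command for one isolate (paired-end reads assumed)."""
    return ["ariba", "run", str(ref_dir), str(reads_1), str(reads_2), str(out_dir)]


def run_pipeline(fasta, ref_dir, samples, min_id=0.9, dry_run=True):
    """Prepare the reference once, then run ARIBA on every isolate.

    samples maps a (hypothetical) isolate name to its pair of FASTQ files.
    With dry_run=True the commands are only returned, not executed.
    """
    commands = [prepareref_cmd(fasta, ref_dir, min_id)]
    for name, (reads_1, reads_2) in samples.items():
        commands.append(run_cmd(ref_dir, reads_1, reads_2, f"{name}_ariba"))
    if not dry_run:
        for cmd in commands:
            # check=True stops the pipeline if any ARIBA step fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline("colicins.fa", "colicin_db", {"iso1": ("iso1_1.fastq.gz", "iso1_2.fastq.gz")})` returns the command lists for inspection; passing `dry_run=False` would execute them, which requires ARIBA to be installed and on the `PATH`.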
Funding
Convergent evolution of Enterobacteriaceae in epidemiological networks with high antimicrobial use
Biotechnology and Biological Sciences Research Council