Complete_Enterobacteriaceae_plasmids

Version 4 2017-02-23, 11:41

Version 3 2017-02-06, 21:20

Version 2 2017-02-02, 12:44

Version 1 2017-02-02, 12:24

dataset

posted on 2017-02-23, 11:41 authored by Alex Orlek

This dataset comprises the sequences of 2097 complete Enterobacteriaceae plasmids, curated following initial retrieval from the NCBI nucleotide database on 26 August 2016. The 2097 nucleotide sequences are provided as a FASTA file ('nucleotideseq.fa'). Corresponding protein sequences (n=12,582, i.e. 6 per plasmid), generated by translating each plasmid in all 6 reading frames, are also provided ('translatedproteinseq.fa'). In addition, two zipped GenBank files provide further information on the accessions: one contains the 2097 curated accessions; the other contains the 6952 accessions obtained initially, prior to curation.
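As an illustration of what translating a plasmid in all 6 frames means, here is a minimal pure-Python sketch. It is not necessarily the tool or procedure used to generate 'translatedproteinseq.fa'; the function names (`translate`, `six_frame`) are hypothetical, the sequence is treated as linear, and the codon assignments of the standard genetic code are assumed (the bacterial code, NCBI table 11, uses the same codon-to-amino-acid assignments).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI convention).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return the 6 translations: 3 forward frames, then 3 reverse-complement frames."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (seq, rev) for offset in range(3)]


# Toy example sequence (not taken from the dataset).
frames = six_frame("ATGGCCTAA")
print(frames)
```

Each input sequence yields exactly 6 protein sequences, which is consistent with 2097 plasmids giving 2097 × 6 = 12,582 translated sequences.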

The protein dataset ('translatedproteinseq.fa') is a useful resource for MOB typing of plasmids (a method of plasmid classification based on detection of relaxase proteins). To conduct MOB typing, download the protein dataset together with the scripts provided in a related Figshare code repository: https://figshare.com/s/3f8973dea1fe03c4f62f
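Before running any downstream analysis, it may be worth confirming that the downloaded FASTA files are complete. The sketch below counts FASTA header lines; the helper name `count_fasta_records` is hypothetical and is not part of the Figshare code repository.

```python
def count_fasta_records(path):
    """Count records in a FASTA file by counting lines that start with '>'."""
    with open(path, "r") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Assuming the files are saved in the working directory under the names given above, `count_fasta_records("nucleotideseq.fa")` would be expected to return 2097 and `count_fasta_records("translatedproteinseq.fa")` to return 12582.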

Further instructions can be found on the GitHub page referenced in the Description section of the Figshare code repository.

For more details about the dataset provided here, see the associated journal article: "A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database", Orlek et al., in press.