Supporting datasets for Comparison of Multi-locus Sequence Typing software for next generation sequencing data

dataset

posted on 2017-02-01, 14:12 authored by Andrew Page

To test the accuracy of MLST applications, we have constructed two datasets of simulated reads in FASTQ format. The first has perfect reads over the MLST genes, plus a flanking region (based on the Salmonella Typhi CT18 reference), at coverage levels varying from 1 to 30. This allows us to see at what level of coverage each software application can accurately detect an allele.
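As a rough illustration of how perfect reads at a chosen coverage could be produced, the sketch below samples error-free reads uniformly from a reference string and formats them as FASTQ. It is not the procedure used to build this dataset. The function names, the 100 bp read length, the uniform sampling of start positions, and the constant quality character are all assumptions made for the example.

```python
import math
import random


def simulate_perfect_reads(reference, coverage, read_length=100, seed=0):
    """Sample error-free reads uniformly from `reference` (hypothetical helper).

    The number of reads is chosen so that the average depth over the
    region is approximately `coverage`.
    """
    if read_length > len(reference):
        raise ValueError("read_length must not exceed the reference length")
    rng = random.Random(seed)
    # reads needed = coverage * region length / read length, rounded up
    n_reads = math.ceil(coverage * len(reference) / read_length)
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(reference) - read_length)
        reads.append(reference[start:start + read_length])
    return reads


def to_fastq(reads, prefix="read", quality_char="I"):
    """Format reads as FASTQ text with a constant quality string.

    "I" encodes Phred 40 in the Phred+33 scheme (an assumed choice).
    """
    records = []
    for i, seq in enumerate(reads, start=1):
        records.append(f"@{prefix}_{i}\n{seq}\n+\n{quality_char * len(seq)}\n")
    return "".join(records)


# Toy 1,000 bp region standing in for an MLST gene plus flanking sequence.
reference = "ACGT" * 250
reads_30x = simulate_perfect_reads(reference, coverage=30)
fastq_text = to_fastq(reads_30x)
```

Calling `simulate_perfect_reads` once for each coverage value from 1 to 30 would give one read set per coverage level.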

The second dataset is similar to the first, but contains two Salmonella samples, Salmonella Typhi CT18 and Salmonella Weltevreden, mixed in varying ratios. This allows us to see at what point software applications detect a mixed allele or contamination.
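One way to picture mixing two samples at a given ratio is to subsample reads from each and shuffle them together, as in the minimal sketch below. It is not the method used for this dataset. The function name `mix_reads`, its parameters, and the toy read labels are hypothetical.

```python
import random


def mix_reads(reads_a, reads_b, fraction_a, total, seed=0):
    """Return `total` reads, a `fraction_a` share drawn from `reads_a`
    and the remainder from `reads_b` (hypothetical helper)."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    rng = random.Random(seed)
    n_a = round(total * fraction_a)
    n_b = total - n_a
    if n_a > len(reads_a) or n_b > len(reads_b):
        raise ValueError("not enough reads to sample without replacement")
    # Sample without replacement from each source, then interleave randomly.
    mixed = rng.sample(reads_a, n_a) + rng.sample(reads_b, n_b)
    rng.shuffle(mixed)
    return mixed


# Toy labels standing in for reads from the two Salmonella samples.
typhi_reads = [f"typhi_{i}" for i in range(200)]
weltevreden_reads = [f"weltevreden_{i}" for i in range(200)]
mixed_90_10 = mix_reads(typhi_reads, weltevreden_reads, fraction_a=0.9, total=100)
```

Repeating the call with different `fraction_a` values would give a series of mixtures for checking when a tool starts to report a mixed allele.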

Funding

This work was supported by the Wellcome Trust (grant WT 098051).

Keywords

MLST Salmonella fastq files coverage contamination Bioinformatics

Licence

CC BY 4.0
