To test the accuracy of MLST applications, we constructed two datasets of simulated reads in FASTQ format. The first contains perfect reads covering the MLST genes plus a flanking region (based on the Salmonella Typhi CT18 reference) at coverage levels ranging from 1 to 30. This lets us determine the coverage at which each software application can accurately detect an allele.
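As an illustration only, the following Python sketch shows one way reads of this kind could be generated at a chosen coverage. It is not the procedure used to build the datasets: it assumes that "perfect" means error-free, that reads are single-end with a fixed length of 100 bases, and that start positions are drawn uniformly at random. The function name `simulate_perfect_reads`, its parameters, and the random template sequence are all hypothetical.

```python
import math
import random


def simulate_perfect_reads(sequence, coverage, read_length=100, seed=0):
    """Return FASTQ text of error-free reads sampled from `sequence`.

    Hypothetical helper: the number of reads is chosen so that
    n_reads * read_length / len(sequence) is at least `coverage`.
    """
    if len(sequence) < read_length:
        raise ValueError("sequence is shorter than the read length")
    rng = random.Random(seed)  # seeded so the output is reproducible
    n_reads = math.ceil(coverage * len(sequence) / read_length)
    last_start = len(sequence) - read_length
    records = []
    for i in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        read = sequence[start:start + read_length]
        quality = "I" * read_length  # Phred 40 in Sanger (offset 33) encoding
        records.append(f"@read_{i}_pos_{start}\n{read}\n+\n{quality}\n")
    return "".join(records)


# Assumed stand-in for an MLST gene plus flanking region (random 1 kb sequence).
template = "".join(random.Random(1).choice("ACGT") for _ in range(1000))
for depth in (1, 10, 30):
    fastq = simulate_perfect_reads(template, depth)
    print(depth, fastq.count("\n") // 4)  # coverage and number of reads written
```

With a 1,000-base template and 100-base reads, coverage levels of 1, 10 and 30 correspond to 10, 100 and 300 reads respectively. Uniform random sampling gives uneven depth along the template, and the template ends are covered less well, which is one reason to include a flanking region around each gene.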
The second dataset is similar to the first, but contains two Salmonella samples, Salmonella Typhi CT18 and Salmonella Weltevreden, mixed in varying ratios. This lets us determine the ratio at which each software application detects a mixed allele or contamination.
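A minimal sketch of how two read sets might be combined at a given ratio is shown below. It is an assumption about the general approach, not a description of how the dataset was built: the function name `mix_reads`, the ratios, the total read count, and the placeholder read identifiers are all hypothetical.

```python
import random


def mix_reads(primary_reads, secondary_reads, primary_fraction, total_reads, seed=0):
    """Draw `total_reads` reads, a given fraction of them from the primary sample.

    Hypothetical helper: reads are sampled without replacement, so each input
    list must hold at least as many reads as are requested from it.
    """
    if not 0.0 <= primary_fraction <= 1.0:
        raise ValueError("primary_fraction must be between 0 and 1")
    n_primary = round(total_reads * primary_fraction)
    n_secondary = total_reads - n_primary
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    mixed = rng.sample(primary_reads, n_primary) + rng.sample(secondary_reads, n_secondary)
    rng.shuffle(mixed)  # interleave the two samples
    return mixed


# Placeholder read identifiers standing in for FASTQ records from each sample.
typhi = [f"typhi_{i}" for i in range(1000)]
weltevreden = [f"weltevreden_{i}" for i in range(1000)]

for fraction in (0.5, 0.75, 0.9):  # assumed ratios, for illustration only
    mixed = mix_reads(typhi, weltevreden, fraction, total_reads=200)
    n_typhi = sum(read.startswith("typhi_") for read in mixed)
    print(fraction, n_typhi, len(mixed) - n_typhi)
```

Holding the total number of reads fixed while varying the fraction keeps overall coverage constant, so any change in a tool's result can be attributed to the mixing ratio rather than to depth.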
Funding
This work was supported by the Wellcome Trust (grant WT 098051).