Effect of the depth of coverage on the assembly efficiency measured by NGA50 sizes based on randomly sub-sampled E. coli Sakai data sets.

posted on 08.09.2014, 03:10 by Sebastian Jünemann, Karola Prior, Andreas Albersmeier, Stefan Albaum, Jörn Kalinowski, Alexander Goesmann, Jens Stoye, Dag Harmsen

Coverage here refers to the average depth at which each genomic position is covered by the sequencing reads, not to the average depth of coverage that the assemblies actually reach. For each data set, the fitted average is the mean of all NGA50 lengths at each coverage, fitted with a nonlinear local regression model. Sub-sampling was performed in steps defined as a percentage of the original full sample size; hence, the x-axis ranges of the four sub-plots differ. The dotted vertical lines mark the coverage limits finally used: 40-fold (PGM 200 bp) and 75-fold (PGM 400 bp, MiSeq 2×150 bp and MiSeq 2×250 bp).
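
The relationship between percentage-based sub-sampling and depth of coverage can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (depth_of_coverage, subsample, mean_nga50_by_coverage) and all toy values are assumptions made for illustration, and the nonlinear local regression step is not shown; only the per-coverage mean that would feed into such a fit is computed.

```python
import random
from collections import defaultdict
from statistics import mean


def depth_of_coverage(read_lengths, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return sum(read_lengths) / genome_size


def subsample(read_lengths, fraction, seed=0):
    """Randomly draw a fixed fraction of the reads, without replacement."""
    rng = random.Random(seed)
    k = round(len(read_lengths) * fraction)
    return rng.sample(read_lengths, k)


def mean_nga50_by_coverage(records):
    """Group (coverage, NGA50) pairs by coverage and average the NGA50 values."""
    grouped = defaultdict(list)
    for coverage, nga50 in records:
        grouped[coverage].append(nga50)
    return {cov: mean(vals) for cov, vals in sorted(grouped.items())}


# Toy values only (hypothetical, not taken from the study).
toy_genome_size = 1_000
toy_reads = [100] * 400  # 400 reads of 100 bp each

full = depth_of_coverage(toy_reads, toy_genome_size)
quarter = depth_of_coverage(subsample(toy_reads, 0.25), toy_genome_size)
```

Because the sub-sampling steps are fractions of each full data set, a data set with a higher full-sample coverage spans a wider coverage range at the same percentage steps, which is why the x-axis ranges differ between sub-plots.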