Zhang, Qingpeng; Pell, Jason; Canino-Koning, Rosangela; Chuang Howe, Adina; Brown, C. Titus
Low-memory digital normalization.
The results of digitally normalizing a 5-million-read E. coli data set (1.4 GB) to C = 20 with k = 20 under several memory usage/false positive rates. The false positive rate (column 1) is empirically determined. We measured the number of reads remaining, the number of “true” k-mers missing from the data at each step, and the total number of k-mers remaining. Note: at high false positive rates, reads are erroneously removed because k-mer counts are inflated.
Computational biology; genome analysis; Genomic databases; genetics; genomics; molecular biology; Molecular biology techniques; Sequencing techniques; Genome sequencing; Sequence analysis; Computing methods; cloud computing; software engineering; Software tools
2014-07-25
    https://plos.figshare.com/articles/dataset/_Low_memory_digital_normalization_/1118681
10.1371/journal.pone.0101271.t004