The impact of read filtering on the abundance profiling accuracy of Kraken2 and Bracken.
(a) Abundance profiling accuracy, measured by normalized Hellinger distance (lower means more accurate), for two ways of running Kraken2 and Bracken on Illumina and PacBio reads from three mock microbial communities (50 known, 100 mixed, and 50 novel genomes). Dashed lines correspond to filtered reads, and solid lines correspond to all (unfiltered) reads. (b) Scatter plot of species-specific abundance estimation errors (PacBio reads) against the corresponding genome sizes for the 50 known genomes, for Bracken and Kraken2 run on either filtered or all reads. The estimation error for each taxon (y-axis) is calculated as the fractional difference between its estimated abundance and its reference abundance. A robust linear model with Huber loss [44] was used to fit a regression line for each method. The shaded area around each fitted line represents the 95% confidence interval of the fit for the corresponding method.
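The two quantities plotted in this figure can be illustrated with a minimal sketch. It rests on assumptions the caption does not spell out: that "normalized" means each profile is rescaled to sum to 1 and the Hellinger distance carries its usual 1/sqrt(2) factor (so values lie in [0, 1]), and that the fractional difference is the signed error divided by the reference abundance. The function names and the toy profiles are hypothetical and are not taken from the study.

```python
import math


def normalized_hellinger(estimated, reference):
    """Hellinger distance between two abundance profiles, in [0, 1].

    Assumption: each profile is rescaled to sum to 1, and the distance
    uses the usual 1/sqrt(2) factor. Profiles map taxon name -> abundance.
    """
    est_total = sum(estimated.values())
    ref_total = sum(reference.values())
    squared_gap = 0.0
    # Taxa missing from one profile count as abundance 0 in that profile.
    for taxon in set(estimated) | set(reference):
        est = estimated.get(taxon, 0.0) / est_total
        ref = reference.get(taxon, 0.0) / ref_total
        squared_gap += (math.sqrt(est) - math.sqrt(ref)) ** 2
    return math.sqrt(squared_gap / 2.0)


def fractional_error(estimated, reference):
    """Per-taxon error as a fraction of the reference abundance.

    Assumption: the error is signed, so overestimates are positive.
    """
    return (estimated - reference) / reference


# Hypothetical toy profiles (not data from the study).
reference = {"species_A": 0.5, "species_B": 0.5}
estimated = {"species_A": 0.6, "species_B": 0.4}

distance = normalized_hellinger(estimated, reference)
errors = {t: fractional_error(estimated[t], reference[t]) for t in reference}
```

Under these assumptions, identical profiles give a distance of 0 and profiles with no taxa in common give a distance of 1, which matches the "lower means more accurate" reading in panel (a).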