Additional file 3: Figure S1. of Evaluating somatic tumor mutation detection without matched normal samples

journal contribution
posted on 04.09.2017, 05:00 authored by Jamie Teer, Yonghong Zhang, Lu Chen, Eric Welsh, W. Cress, Steven Eschrich, Anders Berglund
Figure S1: Large tumor dataset quality control metrics. A. Principal component analysis and B. loadings using sequencing metrics. Colors in A. represent the different tissue sites of origin. C. Ratio of sequence reads aligning to the X and Y chromosomes and the cutoffs used to infer gender. D. Histogram of average coverage over targeted bases (filtered, aligned reads).

Figure S2: VQSR filtering effects on tumor-only mutation detection. A. Fraction of total putative TGS mutations falling in each GATK VQSR tranche (PASS being the most specific, SNPto100 being the least specific). B. Fraction of TGS mutations seen in COSMIC more than five times falling into each VQSR tranche. C. Fraction of total putative WES mutations falling in each GATK VQSR tranche (PASS being the most specific, SNPto100 being the least specific). D. Fraction of WES mutations seen in COSMIC more than five times falling into each VQSR tranche.

Figure S3: Mutation counts after filtering with additional population databases. Boxplots showing numbers of mutations detected after filtering with KAVIAR, ExAC, or both (excluding AF ≥ 1%) in addition to 1000 Genomes and ESP. The rightmost columns show the minimal effect of filtering with KAVIAR and ExAC after the normal filter has been applied. A. TGS cohort, B. WES cohort. Median counts are indicated by the dark line in the middle of the box. The bottom and top of the box are the first and third quartiles, respectively. The whiskers represent the most extreme points within 1.5 times the interquartile range. The y-axes are in the log scale.

Figure S4: Normal pool features affect the ability to remove variants. Boxplots showing the putative mutation counts after filtering with titrated sample counts in the normal pool for A. TGS cohort, B. WES cohort.

Figure S5: Total nonref counts, precision, and recall with subsequent filters. Total nonref counts (left), precision compared to MuTect (middle), and recall compared to MuTect (right) for A. TGS and B. WES. All plots are in a linear scale.

Figure S6: Precision-recall curve. Plot showing approximate precision vs recall for A. TGS and B. WES. Data point circles are area-proportional to the number of putative mutations at each filter level. Note that the largest circle across the middle of the plots corresponds to precision = 0, recall = 1. Also note that data point circle sizes are scaled to fit, and the scaling factors are different for TGS and WES. The red line indicates the performance of a random classifier based on positives (median number of MuTect calls) / total positions (targeted bases).

Figure S7: Overlap between mutation calls across four methods. Non-area-proportional Venn diagram showing median mutation counts across samples called by each combination of methods. The bold underlined values are the intersection of all four methods. The underlined values are the counts unique to each method. A. TGS (TCC) cohort and B. WES (TCGA) cohort.

Figure S8: Overlap between mutation calls across three matched tumor/normal methods. Non-area-proportional Venn diagram showing median mutation counts across samples called by each combination of methods. The bold underlined values are the intersection of all three methods. The underlined values are the counts unique to each method. A. TGS (TCC) cohort and B. WES (TCGA) cohort. (PDF 945 kb)
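
The precision and recall values in Figures S5 and S6 are described as being computed against MuTect calls, and the red baseline in Figure S6 as positives divided by total positions. The following is a minimal sketch of those two quantities only, not the authors' pipeline. It assumes each call can be represented as a (chromosome, position, ref, alt) tuple; the function names and all example values are hypothetical.

```python
def precision_recall(candidate_calls, reference_calls):
    """Return (precision, recall) of candidate calls against a reference set.

    Both arguments are iterables of hashable variant keys, here assumed to be
    (chromosome, position, ref, alt) tuples.
    """
    candidate = set(candidate_calls)
    reference = set(reference_calls)
    true_positives = len(candidate & reference)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(candidate) if candidate else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def random_baseline_precision(n_reference_calls, n_targeted_bases):
    """Expected precision of a classifier that picks positions at random."""
    return n_reference_calls / n_targeted_bases


# Hypothetical example data, not taken from the study.
reference = {
    ("chr1", 100, "A", "T"),
    ("chr2", 200, "G", "C"),
    ("chr3", 300, "C", "T"),
    ("chr4", 400, "T", "G"),
}
tumor_only = {
    ("chr1", 100, "A", "T"),
    ("chr2", 200, "G", "C"),
    ("chr5", 500, "G", "A"),
    ("chr6", 600, "C", "A"),
    ("chr7", 700, "T", "C"),
}

precision, recall = precision_recall(tumor_only, reference)
print(precision, recall)  # 0.4 0.5
print(random_baseline_precision(len(reference), 1_000_000))  # 4e-06
```

Under this definition, a filter level that retains nearly every targeted position recovers every reference call (recall = 1) while its precision approaches the random baseline, which is effectively 0. This is consistent with the largest circle noted in Figure S6.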

Funding

National Cancer Institute

History