12711_2024_875_MOESM1_ESM.xlsx (67.07 kB)

Additional file 1 of A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy

dataset
posted on 2024-01-13, 04:40 authored by David Wragg, Wengang Zhang, Sarah Peterson, Murthy Yerramilli, Richard Mellanby, Jeffrey J. Schoenebeck, Dylan N. Clements
Additional file 1:

Table S1. Sequencing coverage summary statistics. Sample ID, gender, coat colour, and sequencing summary statistics for each library sequenced.

Table S2. Summary of variant records in VCF files following each filtering step described in Additional file 2: Figure S1. Breakdown of variants tagged with different filtering criteria, and subsequently remaining after each filtering step.

Table S3. Reference panels for phasing and imputation. List of dog ID and breeds comprising the full reference panel, with those included in panels 1 and 2 indicated, respectively.

Table S4. Summary of variant counts binned by allele frequency, those reported with different depths are derived from the full panel. Breakdown of variant counts per allele frequency bin for the full reference panel, and subset panels 1 and 2. Also provided are variant counts in the imputed data at each sequencing depth based on using the full reference panel for phasing and imputation.

Table S5. Linear model coefficient estimates after fitting imputed genotype r2 to MAF, with depth and panel as covariates: lm(r2 ~ MAF + depth + panel). Summary table of coefficient estimates after fitting a linear model of imputed genotype r2 to MAF, including depth and reference panel as covariates.

Table S6. Linear model coefficient estimates after fitting SNV imputed dosage r2 to depth, with panel as a covariate: lm(r2 ~ depth + panel). Summary table of coefficient estimates after fitting a linear model of SNV imputed dosage r2 to depth, including reference panel as a covariate.

Table S7. Summary of mismatch rates for SNVs arising from different reference panels at each depth of coverage tested. Table of mismatch rates for homozygous reference (RR), heterozygous reference (RA), and homozygous alternate (AA) genotypes of SNVs when comparing each depth of coverage to the genotypes from the 43.5X depth dataset, derived from imputation with each of the reference panels.

Table S8. Linear model coefficient estimates after fitting paired Wilcox test p values from comparing region and chromosome depth to region median haplotype count: lm(p ~ haps). Summary table of coefficient estimates after fitting linear model of paired Wilcox test p values from comparing region and chromosome depth to region median haplotype count.
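
For readers who want a concrete picture of the per-genotype mismatch rates summarised in Table S7, the following is a minimal sketch and not the authors' code. It assumes that a mismatch rate is the fraction of sites with a given genotype class in the high-depth (43.5X) call set whose imputed genotype differs; the exact definition used for the table may differ. The function name `mismatch_rates` and the toy genotype lists are hypothetical.

```python
from collections import Counter


def mismatch_rates(truth, imputed):
    """Return the mismatch rate per truth genotype class (RR, RA, AA).

    Assumption: the rate for a class is the share of sites carrying that
    genotype in the truth set whose imputed genotype is different.
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")

    totals = Counter()      # sites per truth genotype class
    mismatches = Counter()  # discordant sites per truth genotype class
    for true_gt, imputed_gt in zip(truth, imputed):
        totals[true_gt] += 1
        if imputed_gt != true_gt:
            mismatches[true_gt] += 1

    # Only report classes that actually occur in the truth set.
    return {
        cls: mismatches[cls] / totals[cls]
        for cls in ("RR", "RA", "AA")
        if totals[cls] > 0
    }


# Toy data for illustration only; these are not values from the dataset.
truth = ["RR", "RR", "RA", "RA", "AA", "AA", "AA", "RR"]
imputed = ["RR", "RA", "RA", "RR", "AA", "AA", "RA", "RR"]
rates = mismatch_rates(truth, imputed)
```

Keeping the classes separate matters because heterozygous sites are typically harder to impute than homozygous ones, so a single pooled rate can hide where the errors concentrate.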

Funding

IDEXX Laboratories Inc.
Biotechnology and Biological Sciences Research Council

History