12864_2023_9901_MOESM1_ESM.pdf (3.43 MB)

Additional file 1 of Inferring single-cell copy number profiles through cross-cell segmentation of read counts

journal contribution
posted on 2024-01-03, 05:08 authored by Furui Liu, Fangyuan Shi, Zhenhua Yu
Additional file 1:

Fig. S1. An example of segmentation results on simulated datasets. The top four subfigures depict LRC data of four tumor clones, and the bottom subfigure shows the learned latent sequence and segmentation results.

Fig. S2. Analysis of the relationship between the ground truth average copy number and the estimated baseline shift of the LRC signal. Data marked in blue, green and red are from tetraploid, triploid and diploid cells, respectively.

Fig. S3. Comparison of cell ploidy estimation results between different methods. Average copy number (ACN) is calculated for each cell based on the inferred copy number profiles, and the difference between the ground truth and inferred ACNs (denoted as ∆ACN) is analyzed for each method. Values of ∆ACN closer to 0 indicate better estimation results.

Fig. S4. Analysis of the effect of bin size on the copy number segment detection accuracy of DeepCNA. The values in {100kb, 200kb, 500kb, 1000kb} are tested for the bin size. Four performance metrics are calculated for comparison: mean of absolute distances (MAD), difference between the ground truth and inferred average copy numbers (denoted as ∆ACN), breakpoint distance, and the ratio between the number of inferred breakpoints and the number of real breakpoints (denoted as w).

Fig. S5. Analysis of the effect of latent dimensionality on breakpoint detection accuracy. The values in {1, 2, 3} are tested for the latent dimensionality d. Breakpoint distance and the metric w are calculated for comparison. Data points marked with an asterisk represent mean values.

Fig. S6. Segmentation results of DeepCNA on three simulated datasets when the latent dimension is set to 2. The first dimension captures most of the breakpoints well but sometimes underrepresents breakpoints that may be shared by only a few cells; these breakpoints are detectable from the second dimension. Some breakpoints are also detected from both dimensions simultaneously.

Fig. S7. Analysis of DeepCNA's performance on different-sized datasets. The numbers of cells in {100, 200, 300} are tested. Four performance metrics are calculated for comparison: mean of absolute distances (MAD), difference between the ground truth and inferred average copy numbers (denoted as ∆ACN), breakpoint distance, and the ratio between the number of inferred breakpoints and the number of real breakpoints (denoted as w).

Fig. S8. Analysis of DeepCNA's performance on datasets with varying data heterogeneity. The number of clones varies from 4 to 8. Four performance metrics are calculated for comparison: mean of absolute distances (MAD), difference between the ground truth and inferred average copy numbers (denoted as ∆ACN), breakpoint distance, and the ratio between the number of inferred breakpoints and the number of real breakpoints (denoted as w). C denotes the number of clones.

Fig. S9. Analysis of DeepCNA's performance on datasets with no normal cells mixed in the data. Four performance metrics are calculated for comparison: mean of absolute distances (MAD), difference between the ground truth and inferred average copy numbers (denoted as ∆ACN), breakpoint distance, and the ratio between the number of inferred breakpoints and the number of real breakpoints (denoted as w).

Fig. S10. Copy number estimation results on cell “SRR053675” from the breast cancer dataset. DeepCNA detects a copy number amplification on chromosome 7, while SCOPE, SCYN and rcCAE predict it as normal copy number, and their predictions deviate from the observed data. SCICoNE overestimates the cell ploidy. Copy number deletion, normal copy number and copy number amplification are marked in green, blue and red, respectively.

Fig. S11. Copy number estimation results on cell “SRR054618” from the breast cancer dataset. DeepCNA detects a small copy number segment on chromosome 7 that is missed by SCOPE, SCICoNE and rcCAE. Copy number deletion, normal copy number and copy number amplification are marked in green, blue and red, respectively.

Fig. S12. Copy number estimation results on cell “SRR089402” from the breast cancer dataset. Existing methods except SCYN underestimate the number of breakpoints on chromosome 2q and thus yield biased predictions of the copy numbers. SCICoNE overestimates the cell ploidy. Copy number deletion, normal copy number and copy number amplification are marked in green, blue and red, respectively.

Fig. S13. Copy number estimation results on cell “SRR054632” from the breast cancer dataset. SCOPE and SCICoNE predict a small segment on chromosome 6 as a copy number amplification. rcCAE fails to find a small segment on chromosome 7. Copy number deletion, normal copy number and copy number amplification are marked in green, blue and red, respectively.

Fig. S14. Copy number estimation results on the cell with barcode “AAGGCAGGTTCGCGTG” from the 10X Genomics dataset. rcCAE's predictions on chromosome 2 probably deviate from the ground truth. Copy number deletion, normal copy number and copy number amplification are marked in green, blue and red, respectively.

Fig. S15. Comparison of Pearson correlation coefficients on the 10X Genomics dataset. Using CHISEL results as the baseline, Pearson correlation coefficients are calculated based on the copy numbers inferred by each method.

Fig. S16. An example of copy number estimation results of DeepCNA on the BJ Fibroblast Euploid Cell Line dataset. DeepCNA predicts all cells as diploid.

Table S1. The parameters used to run each method on simulated datasets.

Table S2. Runtime performance comparison of all investigated methods. The results are obtained from simulated datasets with varying numbers of cells. All experiments were conducted on a computational server with 64 CPU cores, 128 GB RAM and one GeForce RTX 2080 Ti GPU.
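
The captions name several per-cell metrics (ACN, ∆ACN, the breakpoint ratio w, and the Pearson correlation) without giving formulas. The following is a minimal Python sketch of how such metrics could be computed. It is not the authors' implementation. It assumes that a copy number profile is a 1-D array of integer copy numbers over equally sized bins, that ACN is the unweighted mean over bins, that ∆ACN is ground truth minus inferred, and that a breakpoint is any position where the copy number changes between adjacent bins (chromosome boundaries are ignored). All function names are hypothetical.

```python
import numpy as np


def average_copy_number(profile):
    """ACN, assumed here to be the unweighted mean over equally sized bins."""
    return float(np.mean(profile))


def delta_acn(truth, inferred):
    """∆ACN, assumed sign convention: ground truth ACN minus inferred ACN."""
    return average_copy_number(truth) - average_copy_number(inferred)


def breakpoints(profile):
    """Indices of bins whose copy number differs from the previous bin."""
    profile = np.asarray(profile)
    return np.flatnonzero(np.diff(profile) != 0) + 1


def breakpoint_ratio(truth, inferred):
    """w: number of inferred breakpoints divided by number of real ones."""
    n_real = len(breakpoints(truth))
    n_inferred = len(breakpoints(inferred))
    if n_real == 0:
        return float("nan")  # ratio is undefined without real breakpoints
    return n_inferred / n_real


def pearson(a, b):
    """Pearson correlation coefficient between two copy number profiles."""
    return float(np.corrcoef(a, b)[0, 1])


# Toy profiles for one cell (made-up values, for illustration only).
truth = np.array([2, 2, 2, 3, 3, 3, 2, 2, 1, 1])
inferred = np.array([2, 2, 2, 3, 3, 2, 2, 2, 2, 2])

print("ΔACN:", delta_acn(truth, inferred))
print("w:", breakpoint_ratio(truth, inferred))
print("Pearson r:", pearson(truth, inferred))
```

Under these assumptions, a ∆ACN near 0 and a w near 1 would both indicate close agreement with the ground truth. MAD and breakpoint distance are omitted because the captions do not define them.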

Funding

Natural Science Foundation of Ningxia Province
Key Research and Development Program of Ningxia

Journal: BMC Genomics