
Single-cell transcriptional mapping reveals genetic and non-genetic determinants of aberrant differentiation in AML - Datasets

dataset
posted on 2025-10-26, 02:23 authored by Andy Zeng
<p dir="ltr">This record contains the datasets generated as part of this paper. The datasets are split into sections corresponding to the different analyses in the paper, each provided in its own directory, as outlined here: </p><h3><u>Establishing and Characterizing BoneMarrowMap</u></h3><p dir="ltr"><b>Folder: 0_Reference_atlas_setup</b></p><p dir="ltr">These datasets pertain to the characterized BoneMarrowMap resource:</p><ul><li>The annotated Seurat object is provided as [<u>BoneMarrowMap_Annotated_Dataset.rds</u>]</li><li>The reference object for query mapping using Symphony and the reference UMAP model are provided as [<u>BoneMarrowMap_SymphonyReference.rds</u>] and [<u>BoneMarrowMap_uwot_model.uwot</u>], respectively<br></li><li>The 55 cell types annotated in BoneMarrowMap are shown in a high-quality PNG in [<u>0_BoneMarrowMap_Annotated_cropped.png</u>], and their unabbreviated names are listed in [<u>1_BMRef_CellType_Annotations.csv</u>].</li><li>The top marker genes for each cell type are shared in [<u>2_BoneMarrowMap_CellType_MarkerGenes_Top100.csv</u>] and provided as a GMT file for GSEA and signature scoring in [<u>2_BoneMarrowMap_MarkerGenes_byCellType.gmt</u>].<br></li><li>The top TF regulons for each cell type inferred by pySCENIC are provided in [<u>2_CellType_Markers_TFregulon_AUC_Global.csv</u>]</li><li>Unsupervised gene expression programs identified by consensus NMF are provided in [<u>4_BMMap_cNMF_top100_genes_S7.csv</u>]</li></ul><h3><u>Normal Hematopoietic Projections</u></h3><p dir="ltr"><b>Folder: </b><b>1_Normal_projections</b></p><p dir="ltr">The FlowJo and FCS files for a CD38 index sorting experiment within immunophenotypic GMPs and MEPs are provided in [<u>CD38_IndexSort_Flowjo_FCS.zip</u>]</p><h3><u>Leukemia Projections</u></h3><p dir="ltr"><b>Folder: 2_Leukemia</b><b>_projections</b></p><p dir="ltr">These datasets pertain to the projected AML datasets:</p><ul><li>The annotated Seurat object of the newly generated MLL-SJCRH scRNA-seq 
dataset including AML and AEL samples is provided as [<u>MLL_SJCRH_scAML_AEL_processed.rds</u>]</li><li>Figures depicting projection results for 318 patient samples from 21 AML/MPAL cohorts are provided in [<u>scAML_21cohort_ProjectionResults_bySample.zip</u>]</li></ul><h3><u>AML Composition Analysis</u></h3><p dir="ltr"><b>Folder: 3_AML_composition_analysis</b></p><p dir="ltr">These datasets pertain to composition analysis of the projected AML datasets:</p><ul><li>Cell-level Annotations for >1.2 million cells projected across 318 patient samples, including cell state classification, pseudotime estimates, and LSC signature scores, are provided in [<u>scAML_21cohorts_Annotations_byCell.csv.zip</u>]</li><li>Composition data for 318 patient samples across 38 cell states (excluding mature lymphoid and stromal cells), depicted as the number of cells mapping to each state within each patient sample, are provided in [<u>scAML_patient_composition_filtered_noBTNKPlasmaStromal.csv</u>]</li><li>Integrated patient-level clinical annotations from the 21 studies, including our internal MLL-SJCRH cohort, along with standardized abundance values for each differentiation stage, are provided in [<u>scAML_21cohort_patient_Annotations_withDiffStage.csv</u>]<br></li><li>For each differentiation stage within each patient sample, single cells were pooled together into pseudo-bulks for easier characterization. 
These are provided as a Seurat object in [<u>scAML_Leukemia_318samples_DiffStagePseudobulks_rawcounts.rds</u>]</li><li>Results of differential expression analysis on these pseudobulk data, comparing each differentiation stage against all others with DESeq2 while controlling for originating study, are provided in [<u>scAML_DifferentiationStage_DE_results.csv</u>].</li><li>The top 100 marker genes for each of these AML Differentiation Stages, as well as marker genes for LSPC populations from Zeng 2022 and marker genes for functional LSC populations from Ng 2016 and Somervaille 2009, are provided as a GMT file for GSEA and signature scoring in [<u>scAML_Differentiation_Stage_MarkerGenes.gmt</u>]</li></ul><h3><u>AML Differentiation Stage Quantification</u></h3><p dir="ltr"><b>Folder: 4_AML_DiffStage_Quantification</b></p><p dir="ltr">These datasets pertain to quantification of AML Differentiation Stages in bulk RNA-seq:</p><ul><li>Linear equations for calculating AML Differentiation Stage abundances in normalized bulk RNA-seq data (e.g. 
logCPM) from a total of 400 genes are provided in [<u>scAML_DifferentiationStage_LASSO_ModelWeights.csv</u>]<br></li><li>Quantified AML Differentiation Stage abundances across five AML cohorts totalling 1224 patients compiled by Severens 2024, along with updated clinical annotations, are provided in [<u>BulkAML_FiveCohorts_Severens2024_DiffStageScored_annotated.csv</u>] and as a Seurat object with ComBat-seq-corrected gene expression counts in [<u>BulkAML_FiveCohorts_Severens2024_RNAseqCounts_DiffStageScored.rds</u>]</li><li>Results of genotype-to-phenotype mapping of genetic alterations and AML differentiation stage abundance from the 1224-patient cohort are provided in [<u>BulkAML_FiveCohorts_Genotype_to_Phenotype_Associations.csv</u>]<br></li><li>Quantified AML Differentiation Stage abundances within a 136-patient erythroleukemia cohort from Iacobucci 2019, along with clinical annotations, are provided in [<u>BulkAEL_Iacobucci2019_DiffStageScored_annotated.csv</u>] and as a Seurat object with gene expression counts in [<u>BulkAEL_Iacobucci2019_RNAseqCounts_DiffStageScored.rds</u>]</li></ul><h3><u>AML Subclone Analysis</u></h3><p dir="ltr"><b>Folder: 5_AML_subclone_analysis</b></p><p dir="ltr">These datasets pertain to sub-clonal analysis of MLL-SJCRH samples:</p><p dir="ltr">Single-cell genotyping and concurrent immunophenotyping with Tapestri was performed on most samples from the MLL-SJCRH cohort. 
The genotyping panel is outlined in [<u>Tapestri_CO131_Panel_designSummary.tsv</u>], and a spreadsheet with Tapestri results for each patient sample is provided in [<u>MLL_SJCRH_scAML_AEL_Tapestri_results.xlsx</u>].<br>Results from expressed variant analysis with cbsniffer are provided in [<u>MLL_SJCRH_scAML_AEL__expressed_variants.zip</u>]</p><h3><u>KMT2A-rearranged AML Sub-clustering</u></h3><p dir="ltr"><b>Folder: 6_KMT2A_subclustering</b></p><p dir="ltr">Annotations of KMT2A-r subgroups are provided in the updated clinical annotations of the 1224-patient bulk RNA-seq analysis outlined in section 4 (AML Differentiation Stage Quantification).</p><h3><u>Co-existing LSC Analysis</u></h3><p dir="ltr"><b>Folder: </b><b>7_coexisting_LSC_analysis</b></p><p dir="ltr">These datasets pertain to the co-existing LSC analysis:</p><ul><li>The FlowJo and FCS files from sample sorting on CD34 and CD38 are provided in [<u>Flowjo_FCS_coexistingLSC_Sort.zip</u>]</li><li>The clinical annotations for the two AML patients from Princess Margaret Hospital (PMH) in Toronto in this analysis are provided in [<u>PMH_coexisting_LSC_scAML_Annotations.csv</u>]<br></li><li>The annotated Seurat objects are also provided for each sample. These include:</li><li><ul><li>AML90240 primary [<u>AML90240_primary_annotated.rds</u>] and xenograft [<u>AML90240_xeno_annotated.rds</u>] </li><li>AML90394 primary [<u>AML90394_primary_annotated.rds</u>] and xenograft [<u>AML90394_xeno_annotated.rds</u>] </li><li>Control mobilized peripheral blood from healthy donors for allogeneic transplant [<u>Control_mPB_annotated.rds</u>]</li></ul></li></ul>
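<p dir="ltr">As an illustration of the linear equations mentioned under AML Differentiation Stage Quantification, the sketch below shows how a per-stage score could be computed from normalized expression values as an intercept plus a weighted sum over model genes. This is only a minimal sketch: the layout of the model weights CSV is not described here, so the gene names, weights, intercept, and the function name <code>score_stage</code> are all hypothetical placeholders, and whether the provided models include an intercept is an assumption.</p>

```python
# Minimal sketch of applying one linear model to one sample.
# ASSUMPTION: the real weights file layout is not documented in this record,
# so the toy gene names, weights, and intercept below are made up.

def score_stage(expr, weights, intercept=0.0):
    """Return intercept + sum(weight * expression) over the model genes.

    expr:    dict mapping gene name -> normalized expression (e.g. logCPM)
    weights: dict mapping gene name -> model coefficient
    """
    missing = [gene for gene in weights if gene not in expr]
    if missing:
        # Fail loudly rather than silently treating missing genes as zero.
        raise KeyError(f"genes missing from expression data: {missing}")
    return intercept + sum(w * expr[gene] for gene, w in weights.items())


# Hypothetical model for a single differentiation stage.
toy_weights = {"GENE_A": 0.5, "GENE_B": -0.25}
toy_intercept = 1.0

# Hypothetical normalized expression for one sample; genes that are not
# part of the model (GENE_C) are simply ignored.
toy_sample = {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 7.0}

toy_score = score_stage(toy_sample, toy_weights, toy_intercept)
print(toy_score)
```

<p dir="ltr">In practice the weights would be read from the provided CSV and the function applied once per differentiation stage and per sample. Raising an error on missing genes is a design choice in this sketch, not something the dataset prescribes.</p>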
