MCA DGE Data
MCA_500more_dge.rar: The raw digital expression matrix (dge) of more than 400,000 single cells, sorted by tissue. All cells have more than 500 transcripts. The batch genes were not removed.
MCA_BatchRemove_dge.zip: The batch-gene-removed dge of more than 200,000 primary single cells, sorted by tissue. Some tissues are not included because of relatively strong batch effects. This dataset can be used to make a global tissue t-SNE plot and to perform cross-tissue analysis.
MCA_CellAssignments.csv: The cell annotations, including cell names, cluster IDs, tissues of origin, experimental batches and cell barcodes.
MCA_Figure2-batch-removed.txt.tar.gz: The batch-removed dge of approximately 60,000 high-quality cells. From each of 43 tissues, 1,500 cells were sampled. This sampled data is used for Figure 2.
MCA_Figure2_Cell.info.xlsx: The annotations of cells used in Figure 2.
Sheet1: The annotation of each cell used in Figure 2, including cell name, cluster ID and tissue of origin.
Sheet2: The annotations of the 98 clusters in Figure 2.
Sheet3: The number of cells in each of the 98 clusters and 43 tissues.
MCA_Batch Information.xlsx: The batch information for the MCA data, including the age and gender of each mouse and the experimental batches.
MCA_BatchRemoved_Merge_dge.h5ad: The updated dge with batch genes removed. It can be read with the scanpy Python package. About 333,778 cells are included.
MCA_BatchRemoved_Merge_dge_cellinfo.csv: The cell information of MCA_BatchRemoved_Merge_dge.h5ad.
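As a minimal sketch of reading the .h5ad file listed above with scanpy: the helper name load_mca_h5ad and the default path (the file sitting in the working directory) are illustrative assumptions, not part of the dataset. scanpy is imported inside the function so that a missing file is reported before the package is needed.

```python
from pathlib import Path


def load_mca_h5ad(path="MCA_BatchRemoved_Merge_dge.h5ad"):
    """Load the merged batch-removed dge as an AnnData object.

    The default path assumes the file was downloaded into the current
    working directory (an assumption made for this sketch).
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"h5ad file not found: {path}")
    # Imported lazily so the existence check above runs without scanpy.
    import scanpy as sc
    return sc.read_h5ad(path)
```

Calling load_mca_h5ad() returns an AnnData object; printing it shows its dimensions and the annotation fields stored in the file, which can then be compared with MCA_BatchRemoved_Merge_dge_cellinfo.csv.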
Batch effect removal
For cross-tissue comparison, we removed the batch gene background to improve presentation. We assume that, for each experimental batch, cell barcodes with fewer than 500 UMIs correspond to empty beads exposed to free RNA during the cell lysis, RNA capture and washing steps. The batch gene background value is defined as the average gene detection across all cell barcodes with fewer than 500 UMIs, multiplied by a coefficient of 2 and then rounded to the nearest integer. Genes detected in less than 25% of all cells are removed from the batch gene background list. We subtract the batch gene background for each cell from the digital expression matrix before making the cross-tissue comparison figures.
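The procedure described above can be sketched in NumPy for a single batch. This is an illustration under stated assumptions, not the original pipeline: the function names are hypothetical, the matrix is assumed to be genes x barcodes, "all cells" is taken to mean barcodes at or above the UMI cutoff, and negative values after subtraction are clipped to zero (the text does not say how they are handled).

```python
import numpy as np


def batch_gene_background(counts, umi_cutoff=500, coef=2.0, min_cell_frac=0.25):
    """Estimate the per-gene background for one experimental batch.

    counts: genes x barcodes matrix of UMI counts (assumed orientation).
    Returns (background, is_cell): an integer vector with one value per
    gene, and a boolean mask of barcodes at or above the UMI cutoff.
    """
    counts = np.asarray(counts, dtype=np.int64)
    umi_per_barcode = counts.sum(axis=0)
    is_empty = umi_per_barcode < umi_cutoff   # presumed empty beads
    is_cell = ~is_empty

    # Average count per gene over the empty-bead barcodes.
    if is_empty.any():
        mean_empty = counts[:, is_empty].mean(axis=1)
    else:
        mean_empty = np.zeros(counts.shape[0])

    # Multiply by the coefficient and round to the nearest integer.
    # Note: np.rint rounds exact .5 ties to the nearest even integer.
    background = np.rint(coef * mean_empty).astype(np.int64)

    # Keep only genes detected in at least min_cell_frac of the cells
    # (assumption: "cells" means barcodes at or above the cutoff).
    if is_cell.any():
        detected_frac = (counts[:, is_cell] > 0).mean(axis=1)
    else:
        detected_frac = np.zeros(counts.shape[0])
    background[detected_frac < min_cell_frac] = 0
    return background, is_cell


def subtract_background(counts, background, is_cell):
    """Subtract the background from every cell, clipping at zero (assumption)."""
    counts = np.asarray(counts, dtype=np.int64)
    corrected = counts[:, is_cell] - background[:, None]
    return np.clip(corrected, 0, None)
```

Because the background is defined per batch, the two functions would be applied to each batch's matrix separately before the corrected matrices are combined for cross-tissue figures.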