Abstract: EpiSCORE is an R package for constructing a tissue-specific DNA methylation (DNAm) reference matrix that can subsequently be used with a reference-based cell-type deconvolution algorithm (i) to obtain cell-type fraction estimates in a corresponding bulk-tissue sample for which a genome-wide DNAm profile exists, and (ii) to infer cell-type specific differential DNA methylation signals in the context of a general Epigenome-Wide Association Study. EpiSCORE is aimed particularly at complex solid tissues, for which experimentally generating an appropriate DNAm reference matrix representing all the major cell-types within the tissue is not possible. EpiSCORE exploits tissue-specific single-cell RNA-Sequencing atlases to construct corresponding tissue-specific DNA methylation references. EpiSCORE has been published in Genome Biology (Teschendorff et al. 2020).

1 Motivation and Background

There are many thousands of genome-wide DNA methylation (DNAm) profiles in the public domain, with the overwhelming majority of them derived from bulk tissue. For instance, The Cancer Genome Atlas (TCGA) has generated thousands of such profiles in cancer tissue encompassing all major cancer types. Because DNAm is highly cell-type specific, the interpretation of such DNAm data is severely hampered by the underlying cell-type heterogeneity. There is thus a need to dissect such heterogeneity in order to infer the proportions of the underlying cell-types within the tissue, and to infer in which of these cell-types DNAm changes related to specific exposures and phenotypes are occurring.

In the case of an easily accessible tissue like blood, it has been possible to generate appropriate DNAm reference matrices, i.e. representative DNAm profiles for the main cell-types within the tissue. For blood, this is relatively easy to accomplish since cell surface markers characterising the major cell-types (e.g. neutrophils, monocytes, T-cells, B-cells) are known, and so these can be used via FACS sorting to create purified samples, for which DNAm profiles can then be generated. A DNAm reference for blood would then be constructed by identifying CpGs that are differentially methylated between the major cell-types, and building representative profiles for the cell-types over a relatively small number of such cell-type specific markers (typically on the order of a few hundred). Such a DNAm reference matrix can then be used in conjunction with a reference-based cell-type deconvolution algorithm (Houseman et al. 2012) to achieve the above-mentioned goals. For complex solid tissues, however, the main cell-types within a tissue are not always known, or they have not been fully characterized in terms of cell surface marker expression, making it difficult to generate DNAm profiles for purified samples. In principle, the same challenge applies to transcriptomic data, yet in the case of gene expression, single-cell technologies have allowed the construction of tissue-specific atlases, which encompass most major cell-types. Thus, in the case of gene expression it has been possible to construct tissue-specific mRNA expression reference matrices, which can subsequently be used for cell-type deconvolution. For DNAm, however, generating reliable and high-coverage single-cell methylomes for large numbers of cells and samples is not yet possible. Thus, for DNAm we require an orthogonal, computational approach in order to construct tissue-specific DNAm reference matrices.
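To make the idea of reference-based deconvolution concrete, the following is a minimal sketch on made-up data. It models a bulk profile as a weighted average of cell-type profiles and recovers the weights by regression. Ordinary least squares is used here purely as a stand-in, and the toy reference values and cell-type names (A, B, C) are assumptions for illustration, not part of any published reference.

```r
# Toy reference matrix: 6 marker CpGs (rows) x 3 cell-types (columns).
# All values are invented for illustration.
ref.m <- matrix(c(0.9, 0.1, 0.1,
                  0.8, 0.2, 0.1,
                  0.1, 0.9, 0.2,
                  0.2, 0.8, 0.1,
                  0.1, 0.1, 0.9,
                  0.2, 0.1, 0.8),
                ncol = 3, byrow = TRUE,
                dimnames = list(paste0("cg", 1:6), c("A", "B", "C")))

# A bulk sample is modelled as a weighted average of the cell-type profiles.
trueF.v <- c(A = 0.5, B = 0.3, C = 0.2)
bulk.v <- as.vector(ref.m %*% trueF.v)

# Regress the bulk profile on the reference profiles (no intercept),
# set negative coefficients to zero, and rescale so the fractions sum to one.
coef.v <- coef(lm(bulk.v ~ 0 + ref.m))
coef.v[coef.v < 0] <- 0
estF.v <- coef.v / sum(coef.v)
names(estF.v) <- colnames(ref.m)
print(round(estF.v, 3))
```

Because the toy bulk profile is noise-free, the estimated fractions coincide with the true ones; with real data a robust or constrained regression is generally preferable.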

EpiSCORE constructs a tissue-specific DNAm reference matrix by first building a corresponding tissue-specific expression reference matrix from the single-cell RNA-Seq atlases generated by the Human and Mouse Cell Atlas Projects (HCA/MCA), and subsequently imputing DNAm values into this reference matrix. EpiSCORE achieves the imputation by using matched expression and DNAm data from projects like the NIH Epigenomics Roadmap to identify marker genes in the expression reference matrix for which DNAm at their regulatory elements (we shall focus on gene promoters) is predictable from their expression value. EpiSCORE relies on two assumptions (which turn out to be valid, as shown in our manuscript). First, it assumes that there are genes in the genome for which promoter DNAm variation across different cell-types is predictable from corresponding variation in gene expression. We shall refer to these genes as “imputable genes”. Second, it assumes that marker genes that make up expression reference matrices will overlap sufficiently with such “imputable genes”. The subset of marker genes in the expression reference matrix that are imputable will make up the DNAm reference matrix, and EpiSCORE is able to weight these imputable marker genes according to how informative DNAm differences are of gene expression, thus allowing more informative genes to be more influential when subsequently inferring cell-type proportions and cell-type specific differential DNAm signal.

Thus, EpiSCORE consists of 4 steps:

  1. Construction of a tissue-specific mRNA expression reference matrix.
  2. Identification of imputable genes.
  3. Development of a probabilistic imputation model and construction of DNAm reference matrix.
  4. Application of DNAm reference matrix (e.g. inferring cell-type fractions or identifying differentially methylated cell-types (DMCTs)).
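The notion of an "imputable gene" in step 2 can be illustrated with a small sketch. Here a gene is flagged when its promoter DNAm is strongly anti-correlated with its expression across samples. The Pearson correlation and the cut-off of -0.7 are assumptions chosen for illustration only; the criterion actually used by EpiSCORE is described in the manuscript, and the gene names and values below are invented.

```r
# Toy matched data: 2 genes (rows) x 8 samples (columns).
expr.m <- rbind(geneA = c(0, 0, 0, 0, 5, 6, 7, 8),
                geneB = c(1, 2, 3, 4, 5, 6, 7, 8))
dnam.m <- rbind(geneA = c(0.90, 0.85, 0.95, 0.90, 0.10, 0.05, 0.10, 0.15),
                geneB = c(0.2, 0.8, 0.8, 0.2, 0.2, 0.8, 0.8, 0.2))

# Per-gene correlation between expression and promoter DNAm across samples.
cor.v <- sapply(rownames(expr.m), function(g) cor(expr.m[g, ], dnam.m[g, ]))

# Hypothetical rule: a gene is "imputable" if the correlation is below -0.7.
imputable.v <- names(cor.v)[cor.v < -0.7]
print(round(cor.v, 2))
print(imputable.v)
```

In this toy example geneA is unmethylated whenever it is expressed and is therefore flagged, whereas the DNAm of geneB carries no information about its expression.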

Because the purpose of this vignette is to show how to implement EpiSCORE, we shall focus on those tasks which need to be carried out whenever starting out with a new scRNA-Seq tissue atlas. That is, some of the steps in the algorithm are independent from the actual scRNA-Seq tissue atlas (e.g. step-2 and first part of step-3 above), and so these steps can be precomputed with the associated data loaded in whenever it is required. Thus, in this vignette we shall focus on the following steps:

  1. Construction and validation of a tissue-specific mRNA expression reference matrix.
  2. Imputation and validation of a corresponding tissue-specific DNAm reference matrix.
  3. Application to infer cell-type fractions and differentially methylated cell-types (DMCTs).

To illustrate the whole procedure, we shall focus on lung tissue, using the Smart-Seq2 scRNA-Seq data from the Tabula Muris to construct the expression reference matrix. We shall then validate the expression reference matrix using the 10X scRNA-Seq data, also from Tabula Muris. We next impute the DNAm reference matrix for lung tissue, and apply it to the lung squamous cell carcinoma (LUSC) Illumina 450k DNAm dataset from the TCGA.

2 Tutorial Example

In order to run the tutorial we must first load in the necessary library and data objects:

2.1 Loading in the scRNA-Seq data

library(EpiSCORE);
## 
data(lungSS2mca1); ##loads in scRNA-Seq SmartSeq2 lung atlas
data(lung10Xmca1); ##loads in a subset of the scRNA-Seq 10X lung atlas for validation purpose
ls();
## [1] "celltype10X.idx" "celltypeSS2.idx" "celltypeSS2.v"   "lung10Xmca1.m"  
## [5] "lungSS2mca1.m"

We shall use the lungSS2mca1.m data matrix object, which contains the scRNA-Seq lung atlas (SmartSeq2) from the Tabula Muris, to first construct the mRNA expression reference matrix. We shall do this for the four main cell-types in lung tissue: epithelial, endothelial, stromal (fibroblasts) and immune-cells. To see how many cells of each type there are:

ncpct.v <- summary(factor(celltypeSS2.idx));
names(ncpct.v) <- celltypeSS2.v;
print(ncpct.v);
##  Epi Endo  Fib   IC 
##  138  693  423  387
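The counting step above relies on the integer codes in the index vector appearing in the same order as the cell-type names. A slightly more defensive way of writing the same tabulation is sketched below on toy data; the objects toy.idx and toy.v are hypothetical stand-ins for celltypeSS2.idx and celltypeSS2.v.

```r
# Hypothetical cell-type codes for 7 cells and the matching cell-type names.
toy.idx <- c(1, 3, 2, 2, 4, 1, 2)
toy.v <- c("Epi", "Endo", "Fib", "IC")

# Fixing the factor levels explicitly guarantees that each count is attached
# to the right name, even if a cell-type has no cells at all.
n.v <- table(factor(toy.idx, levels = seq_along(toy.v), labels = toy.v))
print(n.v)
```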

2.2 Building the expression reference matrix

In order to construct the expression reference matrix, we use the ConstExpRef function as follows:

expref.o <- ConstExpRef(lungSS2mca1.m,celltypeSS2.idx,celltypeSS2.v,markspecTH=rep(3,4));
## [1] "Finding marker genes"
## $Epi
## [1] 7100
## 
## $Endo
## [1] 9043
## 
## $Fib
## [1] 7352
## 
## $IC
## [1] 7421
## 
## [1] "Now compute marker specificity scores and filter markers"
## $Epi
## [1] 715
## 
## $Endo
## [1] 131
## 
## $Fib
## [1] 274
## 
## $IC
## [1] 172
## 
## [1] "Now construct reference"
print(dim(expref.o$ref$med));
## [1] 1292    4
head(expref.o$ref$med);
##         Epi Endo Fib       IC
## CORO1A    0    0   0 13.80338
## LAPTM5    0    0   0 13.02622
## ARHGDIB   0    0   0 13.17141
## LSP1      0    0   0 13.40256
## PTPRC     0    0   0 11.68256
## RAC2      0    0   0 12.56245

We can see that the reference matrix contains 1292 unique marker genes and that the first genes are markers for immune-cells, as they are highly expressed in such cells but not in the other cell-types. Indeed, these marker genes all have the maximal specificity score of 3, as required, since the markspecTH argument was set to 3 for all cell-types. We can also see that for each cell-type there are at least 100 marker genes. We strongly recommend having on the order of this number of marker genes for each cell-type, since DNAm values can be imputed for only about \(20\%\) of the marker genes.
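As a rough illustration of what a marker specificity score of 3 means with four cell-types, the sketch below counts, for each gene, the number of cell-types in which its median expression is zero. That this is how the score is computed internally is an assumption, although it is consistent with the output shown above; the gene names and values are invented.

```r
# Toy matrix of median expression: 4 genes (rows) x 4 cell-types (columns).
med.m <- matrix(c(0.0, 0.0, 0, 13.8,
                  0.0, 2.1, 0, 12.5,
                  6.3, 0.0, 0,  0.0,
                  4.0, 3.5, 0,  1.2),
                ncol = 4, byrow = TRUE,
                dimnames = list(paste0("gene", 1:4), c("Epi", "Endo", "Fib", "IC")))

# Cell-type in which each gene is most highly expressed.
top.idx <- apply(med.m, 1, which.max)

# Assumed specificity score: number of cell-types with zero median expression.
score.v <- rowSums(med.m == 0)

# Keep genes that are silent in all 3 other cell-types (threshold of 3).
marker.df <- data.frame(celltype = colnames(med.m)[top.idx],
                        score = score.v,
                        row.names = rownames(med.m))
marker.df <- marker.df[marker.df$score >= 3, ]
print(marker.df)
```

Here gene1 and gene3 are retained as markers of immune-cells and epithelial cells respectively, while gene2 and gene4 are discarded because they are also expressed elsewhere.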

2.3 Validating the expression reference matrix

Let us now validate the constructed reference matrix, using the lung scRNA-Seq 10X dataset from Tabula Muris. Because of file size restrictions, we only use a subset of 49 cells from each of the same cell-types in the 10X set. We consider 49 cells of each cell-type because this was the smallest number assayed in the 10X experiment across the 4 cell-types considered here. For the validation strategy, we aim to estimate, for each single cell in the 10X dataset, the proportions of the underlying cell-types as given by our expression reference matrix. If our reference matrix is correct, we would expect cells annotated to a given cell-type by the Tabula Muris consortium to be predicted as that cell-type by our expression reference matrix. To make the prediction we consider the cell-type attaining the largest estimated proportion. To estimate cell-type fractions we can use our epidish function, which is part of the EpiDISH Bioconductor package (Teschendorff et al. 2017), in the robust partial correlation (RPC) mode. Thus, in what follows, we load in the 10X data and EpiDISH library, and subsequently estimate cell-type fractions:

data(lung10Xmca1);
library(EpiDISH);
estF.m <- epidish(lung10Xmca1.m,ref.m=expref.o$ref$med,method="RPC",maxit=200)$estF

To check that the majority of single cells have been correctly classified, we can generate a series of boxplots:

par(mfrow=c(1,4));
par(mar=c(4,4,2,1));
for(ct in 1:4){
  boxplot(estF.m[,ct] ~ celltypeSS2.v[celltype10X.idx],ylab="EstFrac",xlab="10X cell-type",
          main=paste("EstFrac-",celltypeSS2.v[ct],sep=""),pch=23,ylim=c(0,1));
}

To compute the classification accuracy, we could run

pred.idx <- apply(estF.m,1,which.max);
acc <- length(which(pred.idx==celltype10X.idx))/length(celltype10X.idx);
print(paste("Overall accuracy=",round(acc,2),sep=""));
## [1] "Overall accuracy=0.98"

That is, the overall accuracy is close to \(100\%\), and so we can be satisfied that the expression reference matrix is reasonably accurate. And so, we are now ready to create a corresponding DNAm reference matrix for lung.
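Beyond a single accuracy figure, a confusion table shows which cell-types are mistaken for which. The sketch below applies the same largest-fraction rule to a small, invented matrix of estimated fractions; toyF.m and true.idx are hypothetical stand-ins for estF.m and celltype10X.idx.

```r
celltype.v <- c("Epi", "Endo", "Fib", "IC")

# Hypothetical estimated fractions for 6 single cells (rows sum to one).
toyF.m <- matrix(c(0.90, 0.05, 0.03, 0.02,
                   0.10, 0.80, 0.05, 0.05,
                   0.05, 0.15, 0.70, 0.10,
                   0.00, 0.05, 0.05, 0.90,
                   0.40, 0.45, 0.10, 0.05,
                   0.02, 0.03, 0.05, 0.90),
                 ncol = 4, byrow = TRUE, dimnames = list(NULL, celltype.v))
true.idx <- c(1, 2, 3, 4, 1, 4)

# Predict the cell-type with the largest estimated fraction.
pred.idx <- apply(toyF.m, 1, which.max)

# Cross-tabulate true versus predicted labels, keeping all four levels.
conf.m <- table(True = factor(celltype.v[true.idx], levels = celltype.v),
                Predicted = factor(celltype.v[pred.idx], levels = celltype.v))
print(conf.m)

# Overall accuracy is the fraction of cells on the diagonal.
acc <- sum(diag(conf.m)) / sum(conf.m)
print(round(acc, 2))
```

In this toy example one epithelial cell is misclassified as endothelial, which appears as an off-diagonal entry in the table.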

2.4 Building the DNAm reference matrix

Now that we have validated the lung-specific expression reference matrix, the next step is to generate a corresponding tissue-specific DNAm reference matrix. The underlying hypothesis is that a significant proportion of the marker genes in the expression reference exhibit variation in gene expression between the cell-types that can be explained, i.e. predicted to some degree of accuracy, by variation in DNAm at regulatory elements associated with the gene. We refer to genes for which DNAm at their regulatory element (and we shall be focusing on the gene promoter) is predictable from its expression level, as “imputable genes”. To find imputable genes, we first perform a genome-wide scan using independent datasets that have profiled mRNA expression and DNAm in a genome-wide manner for a reasonable number of samples, ideally, purified or relatively pure samples representing many different cell-types. Two such independent datasets are available, one from the NIH Epigenomics Roadmap (RMAP) Bernstein et al. (2010), and another from the Stem-Cell-Matrix Compendium-2 (SCM2) Nazor et al. (2012). The former is sequencing based involving WGBS and RNA-Seq data, whereas the SCM2-dataset is Illumina beadarray based. We provide the two matched datasets within the EpiSCORE package, together with a list of the imputable genes from each set. Thus, when building the DNAm reference matrix, we first build two separate DNAm reference matrices, one for each of the two databases:

refMscm2.m <- ImputeDNAmRef(expref.o$ref$med,db="SCM2",geneID="SYMBOL");
## 'select()' returned 1:1 mapping between keys and columns
refMrmap.m <- ImputeDNAmRef(expref.o$ref$med,db="RMAP",geneID="SYMBOL");
## 'select()' returned 1:1 mapping between keys and columns

In the above, we also specified that the gene identifier in the expression reference matrix is the official gene symbol, which is then converted into Entrez Gene IDs. At present only gene symbols and Entrez Gene IDs are supported. One could in principle use these separate DNAm reference matrices, but we merge them: for the genes that overlap between the two references there is very strong correlation in DNAm patterns, so merging is a sensible way to increase marker gene coverage. To merge, we run:

refMmg.m <- ConstMergedDNAmRef(refMscm2.m,refMrmap.m);
print(dim(refMmg.m));
## [1] 258   5
head(refMmg.m);
##            Epi      Endo       Fib IC    weight
## 7805 0.8597258 0.8597258 0.8597258  0 0.8597258
## 397  0.9400000 0.9400000 0.9400000  0 0.9400000
## 5788 0.9416881 0.9416881 0.9416881  0 0.9416881
## 5880 0.2885714 0.2885714 0.2885714  0 0.2885714
## 9595 0.9589742 0.9589742 0.9589742  0 0.9589742
## 6404 0.8700614 0.8700614 0.8700614  0 0.8700614

From the output, we can see that the final DNAm reference matrix consists of 258 gene promoters. The DNAm values for some of the top genes in the matrix have been displayed (now with Entrez Gene IDs), showing how for these particular genes, the promoter DNAm value for the immune-cells is zero (because these genes are highly expressed in immune-cells), whilst the DNAm value for the other cell-types is close to 1. We note that these marker genes are not expressed in epithelial, endothelial and fibroblasts, so the imputation model assigns the same DNAm value for these cell-types. We also note that the weight is the average of these non-zero DNAm values. Thus, genes with weights close to 1 are most informative, whereas a gene with a weight close to 0 will not be very informative. Correspondingly, these weights will be used later when inferring cell-type fractions in bulk-tissue samples, to favour the genes with weights close to 1.
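The relationship between the DNAm values and the weight column can be checked with a short sketch. It recomputes the weight as the average of the non-zero DNAm values for three of the rows displayed above. The rule for rows with no non-zero value (weight set to zero) is an assumption made for this illustration.

```r
# Three rows copied from the displayed reference matrix (Entrez Gene IDs as row names).
ref.m <- matrix(c(0.8597258, 0.8597258, 0.8597258, 0,
                  0.9400000, 0.9400000, 0.9400000, 0,
                  0.2885714, 0.2885714, 0.2885714, 0),
                ncol = 4, byrow = TRUE,
                dimnames = list(c("7805", "397", "5880"),
                                c("Epi", "Endo", "Fib", "IC")))

# Weight = mean of the non-zero DNAm values in each row (zero if there are none).
weight.v <- apply(ref.m, 1, function(x) if (any(x > 0)) mean(x[x > 0]) else 0)
print(cbind(ref.m, weight = weight.v))
```

The recomputed weights agree with the weight column shown in the output, and make clear why gene 5880, with a weight of about 0.29, is much less informative than the other two.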

We note that some of the genes in the DNAm reference matrix have a weight of zero. These occur mainly because there were no samples in the database (SCM2/RMAP) in which the corresponding gene was not expressed, so that the DNAm value could not be imputed. A solution for these genes would be to fit logistic regressions with DNAm as the response variable and expression as the independent variable, but this is not supported in the current version of EpiSCORE.

2.5 Validating the imputed DNAm reference matrix

Note: In order to run through this section you will need to download the file dataExampleLung.Rd from http://github.com/aet21/EpiSCORE and load it in. The call below assumes the file sits one directory above your working directory; adjust the path to wherever you saved it:

load("../dataExampleLung.Rd")

Let us now validate the imputed DNAm reference matrix. Our strategy is to first demonstrate that it can predict reasonably accurate cell-type fractions in simulated in-silico mixtures, for which the underlying cell-type proportions are known. Ideally, these mixtures should be generated from purified samples representing the exact same cell-types as in the lung scRNA-Seq atlas. However, DNAm data of purified samples representing these exact same cell-types may not always be available. We therefore use the next best surrogates: for epithelial cells we use normal epithelial cell lines from ENCODE, for fibroblasts we use a normal adult fibroblast cell line from ENCODE, for endothelial cells we use a 450k dataset profiling pulmonary endothelial cells, and for immune-cells we use purified 450k profiles from Reinius et al. (2012). We have precomputed 100 in-silico mixtures over a common set of 483793 probes from the Illumina 450k platform, which due to size restrictions we cannot include here. Assume, however, that this matrix (called dataSIM.m) has been loaded. Since this DNAm data matrix is defined over CpGs and our reference matrix is defined for promoter regions, we need to collapse or summarise the DNAm data matrix at the level of gene promoters. We do this by averaging the DNAm values for probes mapping to within 200bp upstream of each gene’s transcription start site (TSS). If such probes are not available, we take the average over 1st Exon probes. If these are also not available, we discard the gene. This model of assigning unique DNAm values to genes was validated in our previous publication (Jiao, Widschwendter, and Teschendorff 2014). To perform this averaging, we would run the function constAvBetaTSS on dataSIM.m. Because dataSIM.m is not available here, the call is commented out below and we use the precomputed result avSIM.m instead:

#avSIM.m <- constAvBetaTSS(dataSIM.m,type="450k");
print(dim(avSIM.m));
## [1] 12602   100
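The promoter-averaging rule described above (TSS200 probes first, 1st Exon probes as a fallback, otherwise discard the gene) can be sketched as follows. This is not the implementation of constAvBetaTSS; the helper avgPromoter, the toy annotation table and its column names are all hypothetical and serve only to illustrate the logic.

```r
# Toy beta-value matrix: 5 probes (rows) x 2 samples (columns).
beta.m <- matrix(c(0.10, 0.20,
                   0.30, 0.40,
                   0.80, 0.90,
                   0.60, 0.70,
                   0.50, 0.50),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("cg", 1:5), c("S1", "S2")))

# Hypothetical probe annotation: gene and gene-region group of each probe.
annot.df <- data.frame(probe = paste0("cg", 1:5),
                       gene  = c("G1", "G1", "G1", "G2", "G3"),
                       group = c("TSS200", "TSS200", "1stExon", "1stExon", "Body"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: one averaged promoter value per gene and sample.
avgPromoter <- function(beta.m, annot.df) {
  res.l <- lapply(split(annot.df, annot.df$gene), function(g.df) {
    sel <- g.df$probe[g.df$group == "TSS200"]
    if (length(sel) == 0) sel <- g.df$probe[g.df$group == "1stExon"]  # fallback
    if (length(sel) == 0) return(NULL)                                # discard gene
    colMeans(beta.m[sel, , drop = FALSE])
  })
  do.call(rbind, res.l[!sapply(res.l, is.null)])
}

avToy.m <- avgPromoter(beta.m, annot.df)
print(avToy.m)
```

In this toy example G1 is averaged over its two TSS200 probes, G2 falls back to its single 1st Exon probe, and G3 is dropped because it only has a gene-body probe.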

Of note, the data file dataExampleLung.Rd also contains a number of other objects (trueW.m , avLUSCtss.m, phenoLUSC.lv, dmctLUSC.lv), which will be required later in this tutorial. Now, we are ready to estimate the proportions of epithelial, endothelial, fibroblast and immune-cells in our in-silico mixtures. Before deciding on whether to use all marker genes with non-zero weights, it is useful to generate a density plot of the weight distribution:

plot(density(refMmg.m[,5]),lwd=2,xlab="Weight",main="");
abline(v=0.4,lwd=2,col="red");

This reveals a bi-modality, suggesting that a binarisation into informative and non-informative genes is sensible. We thus select only genes with weights larger than the threshold \(w=0.4\) shown in the above figure, and use these to estimate cell-type fractions, weighting each selected gene according to its actual weight value. The threshold is specified through the wth argument:

print(paste("Number of selected genes=",length(which(refMmg.m[,5]>0.4)),sep=""));
## [1] "Number of selected genes=135"
estF.o <- wRPC(data=avSIM.m,ref=refMmg.m,useW=TRUE,wth=0.4,maxit=200);

Thus, the number of selected genes for the cell-type fraction estimation is 135, which is quite a reasonable number for 4 cell-types. If this number were much less than 100, the subsequent inference may not be meaningful. Let us now check how well the cell-type fractions correlate with the exact proportions given in the trueW.m matrix:

print(cor(estF.o$estF,trueW.m));
##             Epi         EC        Fib         IC
## Epi   0.9140326 -0.2832608 -0.1481061 -0.5317131
## Endo -0.6878753  0.6848686 -0.2953122  0.3272086
## Fib  -0.1502026  0.1047273  0.8505774 -0.7384849
## IC   -0.3139809 -0.3004083 -0.4044622  0.9882232
pcc.v <- diag(cor(estF.o$estF,trueW.m));

We can see that the Pearson Correlation Coefficients along the diagonal are fairly high, suggesting good correlative agreement. We can confirm this with some scatterplots:

par(mfrow=c(1,4));
par(mar=c(4,4,2,1));
for(ct in 1:4){
  plot(estF.o$estF[,ct],trueW.m[,ct],pch=23,xlim=c(0,1),ylim=c(0,1),main=colnames(trueW.m)[ct],xlab="fCT(Estimated)",ylab="fCT(True)",cex=0.5);
  abline(a=0,b=1,col="green",lty=2,lwd=2);
  text(x=.5,y=0.9,paste("PCC=",round(pcc.v[ct],2),sep=""),font=2,cex=1); 
}

This shows that it has been possible to reasonably accurately infer cell-type proportions in these in-silico mixtures. The validation of the whole procedure on real data mixtures is done in the next section.
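The combination of weight thresholding and weighted estimation used in this section can be sketched on toy data as follows. Weighted ordinary least squares via lm is used here as a simple stand-in and is not claimed to be what wRPC does internally; the reference values, weights, and the small perturbation added to the mixture are invented.

```r
# Toy DNAm reference: two cell-types plus a weight column (values are made up).
ref.m <- cbind(A = c(0.9, 0.8, 0.1, 0.1, 0.5, 0.9),
               B = c(0.1, 0.1, 0.9, 0.8, 0.5, 0.1),
               weight = c(0.9, 0.8, 0.9, 0.8, 0.1, 0.3))
rownames(ref.m) <- paste0("g", 1:6)

# Step 1: keep only genes whose weight exceeds the threshold.
wth <- 0.4
keep.idx <- which(ref.m[, "weight"] > wth)
X.m <- ref.m[keep.idx, c("A", "B")]
w.v <- ref.m[keep.idx, "weight"]

# Toy mixture: 70% A and 30% B, plus a small fixed perturbation.
trueF.v <- c(A = 0.7, B = 0.3)
bulk.v <- as.vector(X.m %*% trueF.v) + c(0.02, -0.01, 0.01, -0.02)

# Step 2: weighted regression on the selected genes, then clip and normalise.
fit.o <- lm(bulk.v ~ 0 + X.m, weights = w.v)
estF.v <- pmax(coef(fit.o), 0)
estF.v <- estF.v / sum(estF.v)
names(estF.v) <- colnames(X.m)
print(round(estF.v, 2))
```

Genes g5 and g6 fall below the threshold and do not enter the fit at all, while the remaining four genes contribute in proportion to their weights.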

3 Validation and application of EpiSCORE on real epigenome data

3.1 Estimation of cell-type fractions in lung cancer

Now, we validate EpiSCORE in the context of real DNAm mixture data. We consider the case of the lung squamous cell carcinoma (LUSC) Illumina 450k set from The Cancer Genome Atlas (TCGA). The full dataset consists of 316 samples (41 normal-adjacent and 275 cancers) and 395963 CpGs. Because of space restrictions, we don’t provide the full DNAm dataset, but only the promoter-DNAm averaged one (avLUSCtss.m), which is easily generated from the full data matrix by application of the constAvBetaTSS function (see previous sections for how this function is applied):

estF.o <- wRPC(avLUSCtss.m,ref=refMmg.m,useW=TRUE,wth=0.4,maxit=200);

The expected fraction of epithelial cells should be higher in cancer compared to normal, which we check as follows:

pv <- wilcox.test(estF.o$estF[,"Epi"] ~ phenoLUSC.lv$Cancer,alternative="less")$p.value;
print(pv)
## [1] 2.397205e-12

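We set the alternative hypothesis to less because in phenoLUSC.lv$Cancer normal samples are coded as 0 and cancers as 1, so the formula interface compares the first group (normal) against the second (cancer). The null hypothesis is therefore that the epithelial fraction in cancer is not higher than in normal tissue. The obtained P-value clearly indicates that this null can be rejected, demonstrating that the epithelial fraction is indeed much higher in cancer, as expected for a carcinoma, which is a tumour of epithelial origin.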
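The direction of a one-sided Wilcoxon test under the formula interface is easy to get wrong, so the following sketch checks it on invented data in which the group coded 1 has clearly higher values than the group coded 0. The vectors frac.v and group.v are hypothetical.

```r
# Toy fractions: group 0 (first five) is clearly lower than group 1 (last five).
frac.v  <- c(0.20, 0.25, 0.30, 0.22, 0.28, 0.55, 0.60, 0.65, 0.58, 0.70)
group.v <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# "less" asks whether the first group (0) is shifted below the second group (1).
pLess <- wilcox.test(frac.v ~ group.v, alternative = "less")$p.value

# "greater" asks the opposite question and should not be significant here.
pGreater <- wilcox.test(frac.v ~ group.v, alternative = "greater")$p.value

print(c(less = pLess, greater = pGreater))
```

Only the "less" alternative yields a small P-value, confirming that it is the appropriate choice when the group coded 1 is expected to have the higher values.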

3.2 Identification of cell-type specific differential DNAm in lung cancer

Having obtained the cell-type fractions for the main cell-types in all lung-tissue samples, we can now aim to identify differentially methylated cell-types (DMCTs) in cancer, i.e. cell-type specific differentially methylated cytosines. Since we have a DNAm reference matrix, we can apply a reference-based method like CellDMC (Zheng et al. 2018), which incorporates statistical interaction terms between phenotype (normal vs cancer) and estimated cell-type fractions. Because the input consists of the estimated cell-type fractions themselves, CellDMC is run over all CpGs. That is, while the inference of cell-type fractions was done by considering promoter DNAm levels, identifying DMCTs is done at single-cytosine resolution. CellDMC is part of the EpiDISH package loaded in earlier, and we provide a detailed vignette in that package to explain how it is run. Because the full LUSC dataset bmiqLSCCrmRS.m is too large for inclusion here, we just display the syntax:

library(EpiDISH);
cdmc.o <- CellDMC(bmiqLSCCrmRS.m,pheno.v=phenoLUSC.lv$Cancer+1, frac.m=estF.o$estF, adjPMethod = "fdr", adjPThresh = 0.05, cov.mod = NULL, sort = FALSE, mc.cores = 4);

The result of running CellDMC was loaded in earlier in the tutorial and can be found in the object dmctLUSC.lv which would have been generated from the following code:

dmctLUSC.lv <- list();
for(ct in 1:4){
 dmctLUSC.lv[[ct]] <- rownames(cdmc.o$coe[[ct]][which(cdmc.o$dmct[,1+ct]!=0),]);
}
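The idea of using interaction terms between phenotype and cell-type fractions can be illustrated for a single CpG with a plain linear model. This is a simplified sketch on noise-free, invented data with two cell-types, not the CellDMC implementation, which additionally handles covariates, significance testing and multiple-testing correction.

```r
# Toy design: 10 controls (pheno = 0) and 10 cases (pheno = 1),
# each with varying fractions of two cell-types A and B.
fA.v <- rep(seq(0.2, 0.8, length.out = 10), times = 2)
fB.v <- 1 - fA.v
pheno.v <- rep(c(0, 1), each = 10)

# Simulated CpG: methylation in cell-type A rises from 0.2 to 0.6 in cases,
# while cell-type B stays at 0.7 in both groups.
y.v <- fA.v * (0.2 + 0.4 * pheno.v) + fB.v * 0.7

# Fractions enter as main effects (no intercept); the fraction-by-phenotype
# interaction terms capture cell-type specific changes.
fit.o <- lm(y.v ~ 0 + fA.v + fB.v + fA.v:pheno.v + fB.v:pheno.v)
print(round(coef(fit.o), 3))
```

The interaction coefficient for cell-type A recovers the simulated change of 0.4, whereas the one for cell-type B is zero, which is how a change is attributed to a specific cell-type.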

To visualize the number of DMCTs for each cell-type and their overlaps, we can run:

library(SuperExactTest);
## Loading required package: grid
## 
## Attaching package: 'SuperExactTest'
## The following objects are masked from 'package:base':
## 
##     intersect, union
res.o <- supertest(dmctLUSC.lv,n=395963);
plot(res.o,"landscape",sort.by="size");

This demonstrates that most of the DMCTs occur in the epithelial compartment, which is as expected, and that there is relatively little overlap between DMCTs called in each cell-type, except between endothelial and epithelial cells, and endothelial and immune-cells.
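For a quick numerical look at such overlaps without a plot, pairwise intersection sizes and a hypergeometric test can be computed in base R. The sketch below uses small invented DMCT sets and a hypothetical universe of 20 CpGs; it is a simplified pairwise check, not a reproduction of the multi-set statistics computed by supertest.

```r
# Toy DMCT lists for the four cell-types (CpG identifiers are invented).
toy.lv <- list(Epi  = c("cg01", "cg02", "cg03", "cg04", "cg05"),
               Endo = c("cg04", "cg05", "cg06"),
               Fib  = c("cg07"),
               IC   = c("cg05", "cg06", "cg08"))

# Pairwise overlap sizes; base::intersect is called explicitly because
# SuperExactTest masks intersect() once it is attached.
ovl.m <- sapply(toy.lv, function(a)
  sapply(toy.lv, function(b) length(base::intersect(a, b))))
print(ovl.m)

# One-sided hypergeometric P-value for the Epi/Endo overlap,
# assuming a universe of nCpG CpGs.
nCpG <- 20
pEpiEndo <- phyper(ovl.m["Epi", "Endo"] - 1,
                   length(toy.lv$Epi), nCpG - length(toy.lv$Epi),
                   length(toy.lv$Endo), lower.tail = FALSE)
print(round(pEpiEndo, 3))
```

The diagonal of the overlap matrix gives the number of DMCTs per cell-type, and the off-diagonal entries give the pairwise overlaps.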

The figure below shows a follow-up analysis whose results are consistent with previous biological knowledge. Notably, HOX-genes, transcription factors (TF) and bivalent targets of the Polycomb Repressive Complex-2 (BIV/PRC2) are well-known to be hypermethylated in cancer epithelial cells, and CellDMC in conjunction with our DNAm reference has been able to retrieve this important result:

Figure: Scatterplots of -log10[P-value] vs -log10[OddsRatio] of enrichment among cell-type specific DMCTs

We further note that these biological terms were not as strongly enriched in the endothelial compartment, demonstrating the specificity of our result.

4 Session Info

sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936 
## [2] LC_CTYPE=Chinese (Simplified)_China.936   
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C                              
## [5] LC_TIME=Chinese (Simplified)_China.936    
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] SuperExactTest_1.0.7 EpiDISH_2.6.0        EpiSCORE_0.9.2      
## [4] BiocStyle_2.18.1    
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.0     xfun_0.22            purrr_0.3.4         
##  [4] splines_4.0.5        lattice_0.20-41      vctrs_0.3.7         
##  [7] generics_0.1.0       htmltools_0.5.1.1    stats4_4.0.5        
## [10] yaml_2.2.1           utf8_1.2.1           blob_1.2.1          
## [13] rlang_0.4.10         e1071_1.7-6          pillar_1.5.1        
## [16] glue_1.4.2           DBI_1.1.1            BiocGenerics_0.36.0 
## [19] bit64_4.0.5          matrixStats_0.58.0   lifecycle_1.0.0     
## [22] stringr_1.4.0        memoise_2.0.0        evaluate_0.14       
## [25] Biobase_2.50.0       knitr_1.31           IRanges_2.24.1      
## [28] fastmap_1.1.0        class_7.3-18         parallel_4.0.5      
## [31] AnnotationDbi_1.52.0 fansi_0.4.2          highr_0.8           
## [34] Rcpp_1.0.6           BiocManager_1.30.12  cachem_1.0.4        
## [37] org.Hs.eg.db_3.12.0  locfdr_1.1-8         S4Vectors_0.28.1    
## [40] magick_2.7.1         bit_4.0.4            presto_1.0.0        
## [43] digest_0.6.27        stringi_1.5.3        bookdown_0.21       
## [46] dplyr_1.0.5          quadprog_1.5-8       tools_4.0.5         
## [49] magrittr_2.0.1       proxy_0.4-25         RSQLite_2.2.5       
## [52] tibble_3.1.0         crayon_1.4.1         pkgconfig_2.0.3     
## [55] MASS_7.3-53.1        ellipsis_0.3.1       Matrix_1.3-2        
## [58] rmarkdown_2.7        R6_2.5.0             compiler_4.0.5

References

Bernstein, B. E., J. A. Stamatoyannopoulos, J. F. Costello, B. Ren, A. Milosavljevic, A. Meissner, M. Kellis, et al. 2010. “The NIH Roadmap Epigenomics Mapping Consortium.” Nat Biotechnol 28 (10): 1045–8.

Houseman, E. A., W. P. Accomando, D. C. Koestler, B. C. Christensen, C. J. Marsit, H. H. Nelson, J. K. Wiencke, and K. T. Kelsey. 2012. “DNA Methylation Arrays as Surrogate Measures of Cell Mixture Distribution.” BMC Bioinformatics 13: 86.

Jiao, Y., M. Widschwendter, and A. E. Teschendorff. 2014. “A Systems-Level Integrative Framework for Genome-Wide DNA Methylation and Gene Expression Data Identifies Differential Gene Expression Modules Under Epigenetic Control.” Bioinformatics 30 (16): 2360.

Nazor, K. L., G. Altun, C. Lynch, H. Tran, J. V. Harness, I. Slavin, I. Garitaonandia, et al. 2012. “Recurrent Variations in DNA Methylation in Human Pluripotent Stem Cells and Their Differentiated Derivatives.” Cell Stem Cell 10 (5): 620–34.

Reinius, L. E., N. Acevedo, M. Joerink, G. Pershagen, S. E. Dahlen, D. Greco, C. Soderhall, A. Scheynius, and J. Kere. 2012. “Differential DNA Methylation in Purified Human Blood Cells: Implications for Cell Lineage and Studies on Disease Susceptibility.” PLoS One 7 (7): e41361.

Teschendorff, A. E., C. E. Breeze, S. C. Zheng, and S. Beck. 2017. “A Comparison of Reference-Based Algorithms for Correcting Cell-Type Heterogeneity in Epigenome-Wide Association Studies.” BMC Bioinformatics 18 (1): 105.

Teschendorff, A. E., T. Zhu, C. E. Breeze, and S. Beck. 2020. “EPISCORE: Cell Type Deconvolution of Bulk Tissue DNA Methylomes from Single-Cell RNA-Seq Data.” Genome Biol 21 (1): 221.

Zheng, S. C., C. E. Breeze, S. Beck, and A. E. Teschendorff. 2018. “Identification of Differentially Methylated Cell Types in Epigenome-Wide Association Studies.” Nat Methods 15 (12): 1059–66.