Recall of 2conv2norm models for regulatory sequence prediction across different cell lines.
Ten CNN models of the 2conv2norm architecture were trained on each DHS dataset (positive set) paired with a corresponding negative set of k-mer shuffled sequences (k = 2 or k = 7) or genomic background sequences (tGC = 0.02) for A549 or MCF-7 cells. The A549 and MCF-7 cell lines are each represented in our data by two training datasets, labeled A and B. Model performance was evaluated as recall on hold-out sets (chromosome 8). The table reports the mean and standard deviation of recall across the ten trained models. Seven hold-out sets derived from different cell lines were used to assess model generalization across cell types. Datasets are named according to S1 Table. The corresponding results for the gkm-SVM models are available in Table 1; results for CNN models of the 4conv2pool4norm architecture are available in S6 Table.
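As an illustration of how the reported summary values can be derived, the following minimal sketch computes recall for each of several trained models on one hold-out set and then summarizes the values as mean and standard deviation. The function names `recall` and `summarize_recall` and the toy labels are hypothetical and not taken from the original analysis code. The use of the sample standard deviation (n - 1 denominator) is also an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def recall(y_true, y_pred):
    """Fraction of positive examples (label 1) that are predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)


def summarize_recall(y_true, predictions_per_model):
    """Return (mean, sample standard deviation) of recall across models.

    predictions_per_model holds one list of binary predictions per trained
    model, all made on the same hold-out set. The sample standard deviation
    is an assumption; the source does not specify the estimator.
    """
    recalls = [recall(y_true, y_pred) for y_pred in predictions_per_model]
    return mean(recalls), stdev(recalls)


# Toy example: two hypothetical models evaluated on six hold-out sequences.
y_true = [1, 1, 1, 1, 0, 0]
predictions = [
    [1, 1, 1, 1, 0, 0],  # finds all four positives
    [1, 1, 0, 0, 0, 1],  # finds two of four positives
]
mean_recall, sd_recall = summarize_recall(y_true, predictions)
```

In the setting described here, the list of predictions would contain ten entries, one per trained model, and the summary would be repeated for each of the seven hold-out sets.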