TY  - DATA
T1  - Additional file 1: of 2D–EM clustering approach for high-dimensional data through folding feature vectors
PY  - 2017/12/28
AU  - Alok Sharma
AU  - Piotr Kamola
AU  - Tatsuhiko Tsunoda
UR  - https://springernature.figshare.com/articles/journal_contribution/Additional_file_1_of_2D_EM_clustering_approach_for_high-dimensional_data_through_folding_feature_vectors/5742591
DO  - 10.6084/m9.figshare.5742591.v1
L4  - https://ndownloader.figshare.com/files/10110192
KW  - EM algorithm
KW  - Feature matrix
KW  - Small sample size
KW  - Transcriptome
KW  - Methylome
KW  - Cancer
KW  - Phenotype clustering
N2  - This file analyzes the bias introduced by the filtering process. We analyzed the effect of applying the filter (which was used for the 2D–EM algorithm) to other clustering algorithms. We preprocess the data to retain the top m² features. The m² values for all datasets at the 0.01 cut-off were as follows: 1156 (SRBCT), 529 (ALL), 6084 (MLL), 1444 (ALL subtype), 15,129 (GCM) and 5625 (Lung Cancer). The clustering algorithms are then applied to see the difference in performance (in both Rand score and adjusted Rand index). Table S1 and Table S2 show the Rand score and adjusted Rand index when the filtering step is applied. Table S3 and Table S4 show the variations in Rand score and adjusted Rand index after filtering compared to before filtering. (DOCX 25 kb)
ER  - 