
Additional file 1: Figure S1. of Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach

journal contribution
posted on 2017-12-01, 05:00 authored by Wei-Hung Weng, Kavishwar Wagholikar, Alexa McCray, Peter Szolovits, Henry Chueh
Figure S1 The final dataset selection process for the MGH dataset.

Figure S2 The performance of classifiers (measured by AUC) built from different combinations of clinical feature representation method, vector representation method, and supervised learning algorithm. In both datasets, the hybrid feature of bag-of-words + UMLS concepts restricted to five semantic groups, combined with tf-idf weighting and a linear SVM, yielded the best performance for classifying clinical notes by the medical subdomain of the document. (a) AUC of classifiers trained on the iDASH dataset, (b) AUC of classifiers trained on the MGH dataset. The lines connecting data points for different clinical feature representation methods only tie together the results of a specific algorithm across feature sets; they do not imply continuity along the horizontal axis.

Table S1 Representative medical subdomains in the iDASH and MGH datasets. We selected the top 24 medical subdomains from 105 medical specialties in the MGH dataset.

Table S2 Top-ranked important features after stemming (bag-of-words + UMLS concepts restricted to five semantic groups) for six medical subdomains, as identified by the iDASH and MGH classifiers. The phrases in parentheses are the UMLS descriptions of the corresponding UMLS CUIs.

Table S3 The confusion matrices of the classification tasks using (a) the baseline and (b) the best iDASH classifiers.

Table S4 Percentage of overlapping top-ranked features between the iDASH and MGH datasets.

(DOCX 555 kb)
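The description of Table S4 does not say how the overlap percentage is calculated. As a minimal sketch, assuming it is the share of the top-k features of one classifier that also appear in the top-k features of the other, the calculation could look like the following. The function name `top_feature_overlap` and the toy feature lists are hypothetical illustrations and are not taken from the supplementary file.

```python
def top_feature_overlap(ranked_a, ranked_b, k):
    """Return the percentage of features shared by the top-k of two ranked lists.

    Assumption: overlap is defined as |top_k(a) & top_k(b)| / k * 100.
    The source does not state the definition used for Table S4.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(ranked_a) < k or len(ranked_b) < k:
        raise ValueError("both lists need at least k features")
    shared = set(ranked_a[:k]) & set(ranked_b[:k])
    return 100.0 * len(shared) / k


# Toy stemmed features, invented for illustration only (not real results).
idash_top = ["seizur", "headach", "stroke", "mri", "tremor"]
mgh_top = ["headach", "seizur", "epilepsi", "mri", "gait"]

overlap_at_5 = top_feature_overlap(idash_top, mgh_top, 5)
overlap_at_2 = top_feature_overlap(idash_top, mgh_top, 2)
print(overlap_at_5, overlap_at_2)
```

Under this definition the measure is symmetric in the two lists and ignores the order of features within the top k; a rank-aware measure would be a different design choice.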

Funding

National Library of Medicine, National Institutes of Health
