Explainable machine learning prediction of ICU mortality

Chia, AHT; Khoo, MS; Lim, AZ; Ong, KE; Sun, Y; Nguyen, Binh; Chua, MCH; Pang, J

doi:10.25455/wgtn.19242666

File(s) stored somewhere else

https://www.sciencedirect.com/science/article/pii/S2352914821001593?via%3Dihub

Please note: Linked content is NOT stored on Open Access Te Herenga Waka-Victoria University of Wellington and we can't guarantee its availability, quality, security or accept any liability.

journal contribution

posted on 2022-02-26, 11:00 authored by AHT Chia, MS Khoo, AZ Lim, KE Ong, Y Sun, Binh Nguyen, MCH Chua, J Pang

Background: A variety of mortality prediction models exist for patients in intensive care units (ICUs) to guide appropriate clinical management. Machine learning approaches typically employ classifiers such as neural networks and random forests, which healthcare professionals often regard as black boxes because they do not provide clear links between the input features and the output clinical event. We investigate whether features identified by a Cox proportional hazards (CPH) model can be used for ICU mortality prediction.
Methods: We use the PhysioNet Challenge 2012 dataset, a subset of the MIMIC-II Clinical Database covering ICU patients admitted to Beth Israel Deaconess Medical Center, Boston, from 2001 to 2008. The dataset is split into training set A, test set B and unseen set C, with 4000 patients each. The analysis is implemented in Python using the scikit-learn and lifelines packages. Besides white-box feature selection methods (logistic regression and decision tree), we also explore using the CPH model for feature selection. We then train machine learning models using classifiers such as logistic regression and decision tree variants; extreme gradient boosted tree models perform better than the other classifiers. The model is validated using 5-fold cross-validation and evaluated on unseen set C. Model performance is assessed using the area under the precision-recall curve (AUC-PR) as the main metric.
Findings: Data from about 12,000 patients are used, providing a high degree of generalizability. The number of statistically significant features identified by CPH (n = 16) is markedly smaller than the number identified by logistic regression (n = 36) or decision tree (n = 26), and than the full feature set (n = 42). With only 16 features, the model achieves an AUC-PR of 0·438 on test set B, close to the models using features selected by decision tree (AUC-PR 0·442) and logistic regression (AUC-PR 0·446), and using all features (AUC-PR 0·446).
Interpretation: The markedly smaller feature set identified by CPH allows the building of a model that is easily interpretable by clinicians while still achieving results comparable to the other models. This finding allows clinicians to use CPH as an alternative method to determine, and act on, the features that need to be closely monitored for ICU patients.
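The abstract reports AUC-PR as the main metric. A minimal sketch of how that metric can be computed from predicted mortality risks is given below. It assumes the step-wise "average precision" definition, AP = Σ (R_k − R_{k−1}) · P_k over distinct score thresholds, which is the definition scikit-learn's `average_precision_score` uses. The abstract does not say which AUC-PR estimator the authors used, and the function name `average_precision` is hypothetical rather than taken from the paper.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Assumption: AP = sum_k (R_k - R_{k-1}) * P_k over distinct score thresholds,
    the definition used by scikit-learn's average_precision_score. The source
    does not state which AUC-PR estimator was used.
    y_true: 1 = in-hospital death, 0 = survived; y_score: predicted risk.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("AUC-PR is undefined without positive (deceased) cases")

    # Rank patients by predicted mortality risk, highest first.
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)

    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        # Consume all patients with the same score so each distinct
        # threshold contributes exactly one point on the PR curve.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / n_pos
        # Rectangle of width (recall gain) and height (precision at this threshold).
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy example with made-up labels and scores (not ICU data).
    print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # ≈ 0.833
```

AUC-PR is a natural choice when deaths are the minority class: a non-informative classifier scores roughly the positive-class prevalence, so a value such as 0·438 should be read against the mortality rate of the cohort rather than against 0·5.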

History

Preferred citation

Chia, A. H. T., Khoo, M. S., Lim, A. Z., Ong, K. E., Sun, Y., Nguyen, B. P., Chua, M. C. H. & Pang, J. (2021). Explainable machine learning prediction of ICU mortality. Informatics in Medicine Unlocked, 25, 100674-100674. https://doi.org/10.1016/j.imu.2021.100674

Publisher DOI

https://doi.org/10.1016/j.imu.2021.100674

Journal title

Informatics in Medicine Unlocked

Volume

25

Publication date

2021-01-01

Pagination

100674-100674

Publisher

Elsevier BV

Publication status

Published

ISSN

2352-9148

Article number

100674

Language

en

Keywords

Patient Safety; 3 Good Health and Well Being

Licence

CC BY-NC-ND 4.0

