A Modified Random Survival Forests Algorithm for High Dimensional Predictors and Self-Reported Outcomes

Version 2 2018-08-20, 18:19
Version 1 2018-05-18, 16:02
dataset
posted on 2018-08-20, 18:19 authored by Hui Xu, Xiangdong Gu, Mahlet G. Tadesse, Raji Balasubramanian

We present an ensemble tree-based algorithm for variable selection in high-dimensional datasets, in settings where a time-to-event outcome is observed with error. This work is motivated by self-reported outcomes collected in large-scale epidemiologic studies, such as the Women’s Health Initiative. The proposed methods apply equally to imperfect outcomes that arise in other settings, such as data extracted from electronic medical records. To evaluate the performance of the proposed algorithm, we present results from simulation studies that consider both continuous and categorical covariates. We illustrate the approach by using it to identify single nucleotide polymorphisms associated with incident Type 2 diabetes in the Women’s Health Initiative. A freely available R package, icRSF, has been developed to implement the proposed methods. Supplementary material for this article is available online.

Funding

This work was supported by National Institutes of Health 1R01HL122241-01A1 to RB. The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C.

History