000380850_sm_Figures.pdf (91.72 kB)

Supplementary Material for: A Bagged, Partially Linear, Tree-Based Regression Procedure for Prediction and Variable Selection

dataset
Posted on 2015-07-28. Authored by Mbogning C., Perdry H., Broët P.
Objectives: In genomics, variable selection and prediction that account for the complex interrelationships between explanatory variables are major challenges. Tree-based methods are powerful alternatives to classical regression models. We have recently proposed the generalized, partially linear, tree-based regression (GPLTR) procedure, which combines the advantages of generalized linear regression (allowing the incorporation of confounding variables) with those of tree-based models. In this work, we use bagging to address a classical concern with tree-based methods: their instability.

Methods: We present a bagged GPLTR procedure and three scores for variable importance. The prediction accuracy and the performance of the scores are assessed by simulation. The use of the procedure is illustrated by the analysis of a lung cancer data set. The aim is to predict epidermal growth factor receptor (EGFR) mutation from gene expression measurements while accounting for ethnicity (a confounding variable), and to perform variable selection.

Results: The procedure performs well in terms of prediction accuracy. The scores differentiate predictive variables from noise variables. On a lung adenocarcinoma data set, the procedure achieves good predictive performance for EGFR mutation and selects relevant genes.

Conclusion: The proposed bagged GPLTR procedure performs well for prediction and variable selection.
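As a rough illustration of the bagging idea mentioned in the abstract, the following Python sketch fits one-split trees (decision stumps) on bootstrap resamples, combines them by majority vote, and records how often each variable is chosen for the split. It is a generic bagging example, not the GPLTR procedure: it has no linear part for confounders, and the selection-frequency score is an assumption made here for illustration, not necessarily one of the three scores in the paper. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split tree: return (feature, threshold, left_label, right_label)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:
        # No valid split (all rows identical): always predict the majority class.
        lab = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), lab, lab)
    return best[1:]


def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def bagged_fit(X, y, n_bags=25, seed=0):
    """Fit one stump per bootstrap resample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def bagged_predict(stumps, row):
    """Aggregate the stumps by majority vote."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


def selection_frequency(stumps, n_features):
    """Illustrative importance score: share of stumps that split on each feature."""
    counts = Counter(s[0] for s in stumps)
    return [counts[j] / len(stumps) for j in range(n_features)]


# Toy data (made up): feature 0 determines the class, feature 1 is noise.
X = [[i, (2 * i) % 5] for i in range(20)]
y = [int(i >= 10) for i in range(20)]

stumps = bagged_fit(X, y)
freq = selection_frequency(stumps, n_features=2)
print(bagged_predict(stumps, [15, 0]), freq)
```

Averaging over bootstrap resamples is what reduces the instability of a single tree; in the toy data the predictive variable is selected far more often than the noise variable, which is the kind of separation the importance scores in the abstract are meant to provide.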

    Human Heredity