TY  - DATA
T1  - Using the EM algorithm for Bayesian variable selection in logistic regression models with related covariates
PY  - 2017/11/09
AU  - M. D. Koslovsky
AU  - M. D. Swartz
AU  - L. Leon-Novelo
AU  - W. Chan
AU  - A. V. Wilkinson
UR  - https://tandf.figshare.com/articles/dataset/Using_the_EM_algorithm_for_Bayesian_variable_selection_in_logistic_regression_models_with_related_covariates/5584183
DO  - 10.6084/m9.figshare.5584183
L4  - https://ndownloader.figshare.com/files/9708418
L4  - https://ndownloader.figshare.com/files/9708421
L4  - https://ndownloader.figshare.com/files/9708424
L4  - https://ndownloader.figshare.com/files/9708427
KW  - Bayesian inference
KW  - binary outcomes
KW  - deterministic annealing
KW  - expectation-maximization
KW  - grouped covariates
KW  - heredity constraint
KW  - inheritance property
KW  - variable selection
KW  - 62F15
KW  - 62J12
KW  - 68U20
N2  - We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents.
ER  - 