
Overview of the algorithmic approach and its place in the hypothesis generation toolkit.


Posted on 2021-09-23, 17:47; authored by Braden T. Tierney, Elizabeth Anderson, Yingxuan Tan, Kajal Claypool, Sivateja Tangirala, Aleksandar D. Kostic, Arjun K. Manrai, Chirag J. Patel

(A) The process/paradigm of “data-driven” discovery. First, high-dimensional data are collected on a set of individuals with a given phenotype (e.g., a disease) as well as on controls (individuals without the phenotype). The researcher selects a modeling strategy, most often just one, and computes associations between the phenotype of interest (a “Y”) and a particular feature (an “X,” like coffee in the example). The resulting associations yield associational “hypotheses” about the relationship between X and Y, but fitting only one model specification can yield nonrobust results. Quantvoe replaces this middle step of choosing one model and instead attempts to fit up to every possible model specification, thereby charting a course to robust hypothesis generation. (B) Quantvoe takes 3 types of input data, all in the form of pairs of data frames (tables), either at the command line or in an interactive R session: (1) a single dependent variable and multiple independent variables; (2) multiple dependent variables; or (3) multiple datasets. (C) There are 4 main steps: checking the input data, computing initial univariate associations, computing vibrations across possible adjusters, and quantifying how adjuster presence/absence correlates with changes in the primary association of interest. (D) After computing vibration of effects (VoE), we evaluate the results by measuring the Janus effect (the fraction of associations greater than 0) and estimating the impact of different adjusters on the change in correlation size.
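The vibration step in panel (C) and the two summaries in panel (D) can be illustrated with a short sketch. This is a minimal, hypothetical Python illustration, not quantvoe's R implementation or interface. The function names `vibrate`, `janus_fraction`, and `adjuster_impact`, the plain least-squares model, and the simulated data are all assumptions. The sketch fits `y ~ x + S` for every subset `S` of the candidate adjusters and records the coefficient on `x`. It then reports the fraction of coefficients above 0, following the Janus-effect definition given in this caption, and summarizes each adjuster's impact as the mean coefficient with that adjuster present minus the mean with it absent.

```python
from itertools import combinations
from random import Random


def ols(rows, y):
    """Least-squares coefficients solving (X^T X) b = X^T y by Gaussian
    elimination with partial pivoting (adequate for a few columns)."""
    p = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= f * a[col][c]
    # Back-substitution on the upper-triangular system.
    beta = [0.0] * p
    for i in reversed(range(p)):
        rest = sum(a[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (a[i][p] - rest) / a[i][i]
    return beta


def vibrate(y, x, adjusters):
    """Hypothetical helper: fit y ~ 1 + x + S for every subset S of the
    adjusters (2^k models) and return (subset, coefficient on x) pairs."""
    names = sorted(adjusters)
    results = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            rows = [[1.0, x[i]] + [adjusters[n][i] for n in subset]
                    for i in range(len(y))]
            results.append((subset, ols(rows, y)[1]))  # index 1 = x term
    return results


def janus_fraction(results):
    """Fraction of model specifications whose x coefficient is > 0."""
    return sum(1 for _, b in results if b > 0) / len(results)


def adjuster_impact(results, name):
    """Mean x coefficient with the adjuster present minus mean without it."""
    present = [b for s, b in results if name in s]
    absent = [b for s, b in results if name not in s]
    return sum(present) / len(present) - sum(absent) / len(absent)


# Simulated data (an assumption for illustration): z drives both x and y,
# x has no direct effect on y, and w is unrelated noise.
rng = Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

results = vibrate(y, x, {"z": z, "w": w})
for subset, beta in results:
    print(subset, round(beta, 3))
print("fraction > 0:", janus_fraction(results))
print("impact of z:", round(adjuster_impact(results, "z"), 3))
```

In the simulated data, `z` confounds `x` and `y`. The `x` coefficient is therefore near 1 in specifications without `z` and near 0 in those with it, so `adjuster_impact(results, "z")` is strongly negative. Full enumeration costs 2^k fits for k adjusters. The caption's "up to every possible model specification" suggests that a practical tool may fit fewer. Because full enumeration yields a balanced design, the present-minus-absent mean difference equals the slope from regressing the estimates on additive adjuster-presence indicators. Here it serves only as a simple stand-in for the quantification step.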