figshare

Derandomizing Knockoffs

Version 2 2021-09-14, 15:00
Version 1 2021-08-04, 13:00
journal contribution
posted on 2021-09-14, 15:00 authored by Zhimei Ren, Yuting Wei, Emmanuel Candès

Model-X knockoffs is a general procedure that can leverage any feature importance measure to produce a variable selection algorithm, which discovers true effects while rigorously controlling the number or fraction of false positives. It is, however, a randomized procedure that relies on the one-time construction of synthetic (random) variables, so repeated runs on the same data can return different selections. This article introduces a derandomization method that aggregates the selection results across multiple runs of the knockoffs algorithm. The derandomization step is designed to be flexible and can be adapted to any variable selection base procedure to yield stable decisions without compromising statistical power. When applied to the base procedure of Janson and Su, we prove that derandomized knockoffs controls both the per-family error rate (PFER) and the k-familywise error rate (k-FWER). Furthermore, we carry out extensive numerical studies demonstrating tight Type I error control and markedly enhanced power when compared with alternative variable selection algorithms. Finally, we apply our approach to multistage genome-wide association studies of prostate cancer and report locations on the genome that are significantly associated with the disease. When cross-referenced with other studies, we find that the reported associations have been replicated.
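
The aggregation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that aggregation means counting how often each variable is selected across runs and keeping those whose selection frequency reaches a cutoff. The names `derandomize`, `toy_base_select`, `n_runs`, and `threshold` are hypothetical, and the toy selector is only a stand-in for a single run of a knockoffs base procedure.

```python
import random
from collections import Counter
from typing import Callable, Set


def derandomize(base_select: Callable[[random.Random], Set[int]],
                n_runs: int = 31,
                threshold: float = 0.5,
                seed: int = 0) -> Set[int]:
    """Aggregate the selections of a randomized base procedure.

    Assumption (not stated in the abstract): a variable is kept when its
    selection frequency over n_runs runs is at least `threshold`.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_runs):
        # Each call is one independent run of the randomized selector;
        # updating with a set counts each selected index once per run.
        counts.update(base_select(rng))
    return {j for j, c in counts.items() if c / n_runs >= threshold}


def toy_base_select(rng: random.Random) -> Set[int]:
    """Hypothetical stand-in for one knockoffs run on 10 features.

    Features 0-2 play the role of true signals (selected with
    probability 0.9); features 3-9 are noise (probability 0.1).
    """
    selected = set()
    for j in range(10):
        p = 0.9 if j < 3 else 0.1
        if rng.random() < p:
            selected.add(j)
    return selected


stable = derandomize(toy_base_select, n_runs=200, threshold=0.5, seed=1)
print(sorted(stable))
```

A single call to `toy_base_select` can include noise features or miss signals, whereas the aggregated set depends on selection frequencies and therefore varies far less from one seed to another. In a real analysis the base procedure, the number of runs, and the cutoff would need to be chosen to match the error guarantee being targeted.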

Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

Funding

Z. R. is supported by the Math + X award from the Simons Foundation, the JHU project no. 2003514594, the ARO project no. W911NF-17-1-0304, the NSF grant no. DMS 1712800, and the Discovery Innovation Fund for Biomedical Data Sciences. Y. W. is partially supported by NSF grants nos. DMS 2147546 and DMS 2015447. E. C. is partially supported by NSF via grants nos. DMS 1712800 and DMS 1934578 and by the Office of Naval Research grant no. N00014-20-12157.

History