Supplementary Material for: The Robustness of Generalized Estimating Equations for Association Tests in Extended Family Data

Variance components analysis (VCA), the traditional method for handling within-family correlations in genetic association studies, is computationally intensive for genome-wide analyses, and its computational burden increases with family size and the number of genetic markers. Alternative approaches that do not require computing familial correlations are preferable, provided that they neither inflate type I error nor reduce power. We performed a simulation study to evaluate practical alternatives to VCA that use regression with generalized estimating equations (GEE) in extended family data. We compared linear regression with GEE applied to the entire extended family structure (GEE-EXT), and GEE applied to nuclear families split from these extended families (GEE-SPL), with a variance components likelihood-based method (FastAssoc). GEE-EXT was evaluated both with and without a robust variance estimator for the standard errors. Average type I error rates for GEE-EXT and FastAssoc were similar to those for GEE-SPL. Type I error rates for GEE-EXT with a robust variance estimator were marginally higher than the nominal rate when the minor allele frequency (MAF) was <0.1, but were close to the nominal rate when the MAF was ≥0.2. All methods gave consistent effect estimates and had similar power. In summary, the GEE framework with the robust variance estimator, which is the computationally fastest and least data management-intensive approach, appears to work well in extended families and thus provides a reasonable alternative to full variance components approaches for extended pedigrees in a genome-wide association study setting.
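
To make the distinction between the model-based and robust standard errors concrete, the following is a minimal sketch of a single-marker GEE fit, not the implementation used in the study. It assumes an independence working correlation (the abstract does not state which working correlation was used), a quantitative trait, and a genotype coded as minor-allele count. The function name gee_independence and the toy data are hypothetical. Each family is treated as one cluster, and the robust ("sandwich") variance is computed by summing score contributions within families.

```python
from collections import defaultdict
from math import sqrt


def gee_independence(y, g, family_ids):
    """Fit y = b0 + b1*g by GEE with an independence working correlation.

    Hypothetical helper for illustration. Families are the clusters.
    Returns (b0, b1, naive_se_b1, robust_se_b1).
    """
    n = len(y)
    # Bread: A = X'X for the design matrix X = [1, g]
    s1, sg, sgg = float(n), sum(g), sum(v * v for v in g)
    sy, sgy = sum(y), sum(a * b for a, b in zip(g, y))
    det = s1 * sgg - sg * sg
    inv = [[sgg / det, -sg / det], [-sg / det, s1 / det]]  # A^{-1}

    # Under independence, the GEE point estimate equals ordinary least squares
    b0 = inv[0][0] * sy + inv[0][1] * sgy
    b1 = inv[1][0] * sy + inv[1][1] * sgy
    resid = [yi - b0 - b1 * gi for yi, gi in zip(y, g)]

    # Naive (model-based) variance of the slope: sigma^2 * [A^{-1}]_{11}
    sigma2 = sum(r * r for r in resid) / (n - 2)
    naive_var = sigma2 * inv[1][1]

    # Meat: B = sum over families of (X_k' r_k)(X_k' r_k)'
    score = defaultdict(lambda: [0.0, 0.0])
    for r, gi, fam in zip(resid, g, family_ids):
        score[fam][0] += r
        score[fam][1] += r * gi
    B = [[0.0, 0.0], [0.0, 0.0]]
    for u in score.values():
        for i in range(2):
            for j in range(2):
                B[i][j] += u[i] * u[j]

    # Sandwich A^{-1} B A^{-1}; keep only the slope entry
    robust_var = sum(
        inv[1][i] * B[i][j] * inv[j][1] for i in range(2) for j in range(2)
    )
    return b0, b1, sqrt(max(naive_var, 0.0)), sqrt(max(robust_var, 0.0))


# Toy data (made up): three families of three, genotype shared within family
family_ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
genotype = [0, 0, 0, 1, 1, 1, 2, 2, 2]
trait = [2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 6.0, 6.0, 6.0]

b0, b1, naive_se, robust_se = gee_independence(trait, genotype, family_ids)
print(f"slope={b1:.3f} naive_se={naive_se:.3f} robust_se={robust_se:.3f}")
```

Because the point estimate under an independence working correlation is just the least-squares fit, no familial correlation matrix has to be computed for each marker; only the variance step uses the family labels. In practice an established GEE routine would be used rather than hand-rolled code like this.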