Dataset for: Spatial Regression with an Informatively-Missing Covariate

The US EPA has established a large network of stations to monitor fine particulate matter (PM2.5), which is known to be harmful to human health. Unfortunately, the network has limited spatial coverage, and stations often measure PM2.5 only every few days. Satellite-measured aerosol optical depth (AOD) is a low-cost surrogate with greater spatiotemporal coverage, and spatial regression models have shown that including AOD as a covariate improves spatial interpolation of PM2.5. However, AOD is often missing, and our analysis reveals that the conditions that lead to missing AOD are also conducive to high AOD. Therefore, naive interpolation that ignores this informative missingness may lead to bias. We propose a joint hierarchical model for PM2.5 and AOD that accounts for informatively missing AOD. We conduct a simulation study of the effects of ignoring informative missingness in the covariate and evaluate the performance of the proposed model. We apply the method to map daily PM2.5 in the Southeastern United States. Our analysis reveals statistically significant informative missingness and relationships between PM2.5 and AOD in many seasons after accounting for meteorological and land-use variables.
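The bias mechanism described above (AOD tends to be missing exactly when it is high) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the proposed joint hierarchical model: it assumes a hypothetical linear relationship between PM2.5 and a standardized AOD value, with coefficients `b0` and `b1`, and a hypothetical logistic missingness model whose slope `beta_miss` makes high-AOD days more likely to be missing. All names and values are illustrative. A naive approach fills in missing AOD with the mean of the observed AOD values and then predicts PM2.5; when `beta_miss > 0`, this underpredicts PM2.5 on the days with missing AOD.

```python
import math
import random


def simulate(n=20000, beta_miss=1.5, seed=1):
    """Toy simulation of informatively missing AOD (hypothetical parameters).

    Returns (naive PM2.5 prediction for missing-AOD days,
             true mean PM2.5 on missing-AOD days).
    """
    rng = random.Random(seed)
    b0, b1 = 10.0, 3.0  # hypothetical PM2.5 ~ AOD regression coefficients
    obs_aod, miss_pm = [], []
    for _ in range(n):
        aod = rng.gauss(0.0, 1.0)  # standardized latent AOD
        pm = b0 + b1 * aod + rng.gauss(0.0, 1.0)  # PM2.5 depends on true AOD
        # Assumed logistic missingness: higher AOD -> more likely missing.
        p_miss = 1.0 / (1.0 + math.exp(-(-0.5 + beta_miss * aod)))
        if rng.random() < p_miss:
            miss_pm.append(pm)
        else:
            obs_aod.append(aod)
    # Naive imputation: replace each missing AOD with the observed-AOD mean,
    # then predict PM2.5 using the (here, known) regression coefficients.
    naive_pred = b0 + b1 * (sum(obs_aod) / len(obs_aod))
    true_mean_missing = sum(miss_pm) / len(miss_pm)
    return naive_pred, true_mean_missing


if __name__ == "__main__":
    for beta in (0.0, 1.5):
        naive, truth = simulate(beta_miss=beta)
        print(f"beta_miss={beta}: naive={naive:.2f}, truth={truth:.2f}, "
              f"bias={naive - truth:.2f}")
```

With `beta_miss=0.0` the missingness is unrelated to AOD and the naive prediction is approximately unbiased; with `beta_miss=1.5` the observed AOD values are skewed low, so the naive prediction falls well below the true PM2.5 level on missing-AOD days.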