Bias and High-Dimensional Adjustment in Observational Studies of Peer Effects
Peer effects, in which an individual's behavior is affected by peers' behavior, are posited by multiple theories in the social sciences. Randomized field experiments that identify peer effects, however, are often expensive or infeasible, so many studies of peer effects use observational data, which is expected to suffer from confounding. Here we show, in the context of information and media diffusion, that high-dimensional adjustment of a nonexperimental control group (660 million observations) using propensity score models produces estimates of peer effects statistically indistinguishable from those using a large randomized experiment (215 million observations). Compared with the experiment, naive observational estimators overstate peer effects by over 300%, and commonly available variables (e.g., demographics) offer little bias reduction. Adjusting for a measure of prior behaviors closely related to the focal behavior reduces this bias by 91%, while models adjusting for over 3,700 past behaviors provide additional bias reduction, removing over 97% of the bias, a level statistically indistinguishable from zero bias. This demonstrates how detailed records of behavior can improve studies of social influence, information diffusion, and imitation; these results are encouraging for the credibility of some observational studies but also cautionary for studies of peer effects in rare or new behaviors. More generally, these results show how large, high-dimensional datasets and statistical learning can be used to improve causal inference. Supplementary materials for this article are available online.
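The contrast between the naive estimator and propensity score adjustment can be illustrated with a small simulation. The sketch below is not the paper's method or data; it is a minimal, self-contained toy in which a single "prior behavior" confounder drives both exposure to a peer's behavior and the focal outcome, so the naive difference in means overstates the true effect while an inverse-probability-weighted estimate (using a propensity model fit by logistic regression) largely removes the bias. All variable names, parameter values, and the data-generating process are illustrative assumptions.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data-generating process (assumed, for illustration only):
#   x = a prior behavior closely related to the focal behavior (confounder)
#   t = exposure to a peer's behavior; more likely for high-x individuals
#   y = focal behavior; depends on both exposure and the confounder
n = 5000
TRUE_EFFECT = 1.0
x = [random.gauss(0.0, 1.0) for _ in range(n)]
t = [1 if random.random() < sigmoid(1.5 * xi) else 0 for xi in x]
y = [TRUE_EFFECT * ti + 2.0 * xi + random.gauss(0.0, 0.5)
     for ti, xi in zip(t, x)]

# Naive observational estimator: difference in mean outcomes between
# exposed and unexposed units. Confounding by x inflates this estimate.
n1 = sum(t)
naive = (sum(yi for yi, ti in zip(y, t) if ti) / n1
         - sum(yi for yi, ti in zip(y, t) if not ti) / (n - n1))

# Propensity model P(t = 1 | x): logistic regression fit by gradient ascent.
b0, b1 = 0.0, 0.0
lr = 2.0
for _ in range(400):
    g0 = g1 = 0.0
    for xi, ti in zip(x, t):
        p = sigmoid(b0 + b1 * xi)
        g0 += ti - p
        g1 += (ti - p) * xi
    b0 += lr * g0 / n
    b1 += lr * g1 / n

# Inverse-probability-weighted (Hajek) estimate of the average effect,
# with propensity scores clipped away from 0 and 1 for stability.
num1 = den1 = num0 = den0 = 0.0
for xi, ti, yi in zip(x, t, y):
    p = min(max(sigmoid(b0 + b1 * xi), 1e-3), 1 - 1e-3)
    if ti:
        num1 += yi / p
        den1 += 1.0 / p
    else:
        num0 += yi / (1.0 - p)
        den0 += 1.0 / (1.0 - p)
ipw = num1 / den1 - num0 / den0
```

In this toy setup the naive estimate is substantially larger than the true effect, while the propensity-weighted estimate recovers it to within sampling noise, mirroring (in miniature) the bias-reduction pattern the abstract describes. The paper's actual models adjust for thousands of behaviors rather than one scalar confounder.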