
An Adaptive, Automatic Multiple-Case Deletion Technique for Detecting Influence in Regression

Version 3 2015-10-08, 14:46
Version 2 2015-10-08, 14:46
Version 1 2015-07-03, 00:00
dataset
posted on 2015-10-08, 14:46 authored by Steven Roberts, Michael A. Martin, Letian Zheng

Critical to any regression analysis is the identification of observations that exert a strong influence on the fitted regression model. Traditional regression influence statistics such as Cook's distance and DFFITS, each based on deleting single observations, can fail in the presence of multiple influential observations if these influential observations "mask" one another, or if other effects such as "swamping" occur. Masking refers to the situation where an observation reveals itself as influential only after one or more other observations are deleted. Swamping occurs when points that are not actually outliers or influential are declared to be so because of the effects of other unusual observations on the model. One computationally expensive solution to these problems is the use of influence statistics that delete multiple rather than single observations. In this article, we build on previous work to produce a computationally feasible algorithm for detecting an unknown number of influential observations in the presence of masking. An important difference between our proposed algorithm and existing methods is that we focus on the data that remain after observations are deleted, rather than on the deleted observations themselves. Further, our approach uses a novel confirmatory step designed to provide a secondary assessment of identified observations. Supplementary materials for this article are available online.
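To make the single-case deletion diagnostics mentioned above concrete, the sketch below (not the authors' proposed algorithm, and using assumed synthetic data) computes Cook's distance and DFFITS for an ordinary least squares fit in which two high-leverage points are placed close together, the kind of configuration in which masking can arise.

```python
# A minimal sketch (not the authors' algorithm): single-case deletion
# influence statistics -- Cook's distance and DFFITS -- for OLS,
# on synthetic data where two adjacent high-leverage points may mask
# one another.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a linear trend plus two influential points placed
# close together so each one's effect may be hidden while the other remains.
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)
x = np.append(x, [19.5, 20.0])          # two high-leverage x values
y = np.append(y, [25.0, 25.5])          # both pulled off the fitted trend
n = len(x)

X = np.column_stack([np.ones(n), x])    # design matrix with intercept
p = X.shape[1]

# OLS fit, residuals, and hat (leverage) values
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = resid @ resid / (n - p)            # residual variance estimate

# Cook's distance and DFFITS via the usual closed-form identities,
# equivalent to refitting with each observation deleted in turn.
cooks_d = (resid**2 / (p * s2)) * h / (1 - h) ** 2
s2_del = ((n - p) * s2 - resid**2 / (1 - h)) / (n - p - 1)  # leave-one-out variance
dffits = resid * np.sqrt(h) / (np.sqrt(s2_del) * (1 - h))

print("Cook's D for the two added points:", cooks_d[-2:])
print("DFFITS  for the two added points:", dffits[-2:])
# Because the two added points support each other, their single-deletion
# statistics can remain modest; only deleting both together exposes their
# joint influence -- the masking problem the article addresses with
# multiple-case deletion.
```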
