False Discovery Versus Familywise Error Rate Approaches to Outlier Detection

Version 2 2016-06-02, 19:17

Version 1 2015-12-23, 10:06

dataset

posted on 2016-06-02, 19:17 authored by Yihuan Xu, Boris Iglewicz

Outliers, in general, are observations that deviate sufficiently from a base distribution. This study deals with outlier detection approaches for large samples from continuous univariate distributions. The properties of a practical, newer outlier detection approach are investigated; it uses a false discovery rate (FDR) method in conjunction with a robustly estimated Tukey g-and-h base distribution. This FDR approach is compared with a boxplot-type outlier detection approach that controls the familywise error rate (FWER). The two options are compared in terms of error rates and of the effect of moving the outliers gradually further from the center of the base distribution, using FDR or FWER levels of 5% and 1%. Two microarray datasets serve as examples in which the assumed null distributions do not fit the data well. In such cases, the proposed estimated Tukey g-and-h null distribution approach leads to superior outlier detection performance. Supplementary materials for this article are available online.
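
To make the FDR versus FWER contrast concrete, the following is a minimal Python sketch and not the authors' procedure. It rests on several assumptions that do not come from the abstract: the null distribution is a standard normal applied to median/MAD-standardized data (a stand-in for the robustly estimated Tukey g-and-h distribution), FDR is controlled with the Benjamini-Hochberg step-up rule, and FWER is controlled with a Bonferroni cutoff (a stand-in for the boxplot-type rule). All function names are hypothetical.

```python
from statistics import NormalDist, median


def robust_z_scores(xs):
    """Standardize with the median and the MAD (assumed stand-in for a
    robustly estimated null distribution; not the g-and-h fit)."""
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    scale = 1.4826 * mad  # makes the MAD consistent with the normal SD
    return [(x - med) / scale for x in xs]


def two_sided_p_values(zs):
    """Two-sided p-values under an assumed standard normal null."""
    null = NormalDist()
    return [2.0 * (1.0 - null.cdf(abs(z))) for z in zs]


def bonferroni_flags(ps, alpha):
    """FWER control: flag observation i when p_i <= alpha / m."""
    m = len(ps)
    return [p <= alpha / m for p in ps]


def benjamini_hochberg_flags(ps, q):
    """FDR control: find the largest rank k with p_(k) <= k * q / m,
    then flag the k observations with the smallest p-values."""
    m = len(ps)
    order = sorted(range(m), key=lambda i: ps[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if ps[i] <= rank * q / m:
            k_max = rank
    flags = [False] * m
    for i in order[:k_max]:
        flags[i] = True
    return flags
```

Under these assumptions, a typical call would chain `robust_z_scores`, `two_sided_p_values`, and then one of the two flagging functions at a 5% or 1% level. Because the Benjamini-Hochberg cutoff grows with the rank while the Bonferroni cutoff stays at `alpha / m`, the FDR rule flags at least as many observations as the FWER rule at the same nominal level.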