Comparison of disease prevalence in two populations under double-sampling scheme with two fallible classifiers

Qiu, Shi-Fang; He, Jie; Tao, Ji-Ran; Tang, Man-Lai; Poon, Wai-Yin

doi:10.6084/m9.figshare.10000277.v1

cjas_a_1679727_sm8208.pdf (55.19 kB)

journal contribution

posted on 2019-10-18, 04:41 authored by Shi-Fang Qiu, Jie He, Ji-Ran Tao, Man-Lai Tang, Wai-Yin Poon

Disease prevalence can be estimated by classifying subjects according to whether they have the disease. When gold-standard tests are too expensive to apply to all subjects, partially validated data can be obtained by double sampling, in which all individuals are classified by a fallible classifier and some individuals are also validated by the gold-standard classifier. In practice, however, such an infallible classifier may not be available. In this article, we consider two models in which both classifiers are fallible and propose four asymptotic test procedures for comparing disease prevalence in two groups. Corresponding sample size formulae and validated ratios given the total sample sizes are also derived and evaluated. Simulation results show that (i) the score test performs well, and the corresponding sample size formula is accurate in terms of empirical power and size in both models; (ii) the Wald test based on the variance estimator with parameters estimated under the null hypothesis outperforms the others, even with small sample sizes, in Model II, and the sample size estimated by this test is also accurate; (iii) the estimated validated ratios based on all tests are accurate. Malaria data are used to illustrate the proposed methodologies.
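As a rough illustration of the double-sampling idea described above, the sketch below computes a prevalence estimate for the classical setting in which the validating classifier is a gold standard. It is not the article's Model I or Model II, which treat both classifiers as fallible and are not specified in this abstract. The function name, argument names, and the counts in the usage line are hypothetical. The estimate combines the fallible classifier's result frequencies in the full sample with the disease rates observed within each result group of the validated subsample, assuming the validated subjects are a random subsample.

```python
def double_sampling_prevalence(val_tp, val_fp, val_fn, val_tn,
                               unval_pos, unval_neg):
    """Prevalence estimate under gold-standard double sampling (illustrative).

    Validated subsample (gold standard and fallible classifier both observed):
      val_tp: diseased,     fallible classifier positive
      val_fp: not diseased, fallible classifier positive
      val_fn: diseased,     fallible classifier negative
      val_tn: not diseased, fallible classifier negative
    Unvalidated remainder (fallible classifier only):
      unval_pos, unval_neg: counts of positive and negative results
    """
    val_pos = val_tp + val_fp          # validated, fallible positive
    val_neg = val_fn + val_tn          # validated, fallible negative
    if val_pos == 0 or val_neg == 0:
        raise ValueError("validated subsample needs both fallible outcomes")

    total = val_pos + val_neg + unval_pos + unval_neg

    # P(fallible result) is estimated from every subject in the study.
    p_fallible_pos = (val_pos + unval_pos) / total
    p_fallible_neg = (val_neg + unval_neg) / total

    # P(disease | fallible result) is estimated from validated subjects only.
    p_dis_given_pos = val_tp / val_pos
    p_dis_given_neg = val_fn / val_neg

    # Law of total probability over the fallible classifier's result.
    return p_dis_given_pos * p_fallible_pos + p_dis_given_neg * p_fallible_neg


# Hypothetical counts: 100 validated subjects, 400 unvalidated.
estimate = double_sampling_prevalence(40, 10, 5, 45, 200, 200)
```

When both classifiers are fallible, as in the article, the validated subsample no longer reveals true disease status, so this simple plug-in estimate does not apply without further model assumptions.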