Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes

de Boves Harrington, Peter

doi:10.6084/m9.figshare.5278060

journal contribution

posted on 2017-08-04, 13:53 authored by Peter de Boves Harrington

Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is often neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to that single validation set. In addition, it yields no statistics regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
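
The idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation, and it rests on several assumptions that the abstract does not confirm. A "Latin partition" is taken to be a split that keeps class proportions roughly equal across partitions, and a "bootstrap" is taken to be one random repetition of that partitioning. The FOM is classification accuracy pooled over the partitions of each bootstrap, and the interval uses a normal approximation. The function names (latin_partitions, nearest_centroid_predict, blp_accuracy) and the nearest-centroid classifier are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import numpy as np


def latin_partitions(labels, n_parts, rng):
    """Assign every object to exactly one of n_parts partitions,
    spreading each class as evenly as possible across partitions."""
    labels = np.asarray(labels)
    part = np.empty(labels.shape[0], dtype=int)
    for cls in np.unique(labels):
        # Shuffle the members of this class, then deal them out in turn.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        part[idx] = np.arange(idx.size) % n_parts
    return part


def nearest_centroid_predict(X_train, y_train, X_test):
    """Placeholder model (an assumption): assign each test object to the
    class whose training mean is closest in Euclidean distance."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def blp_accuracy(X, y, n_boot=10, n_parts=4, seed=0):
    """Repeat the partitioning n_boot times. Within each repetition every
    object is predicted exactly once, by a model that never saw it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        part = latin_partitions(y, n_parts, rng)
        pred = np.empty_like(y)
        for p in range(n_parts):
            test = part == p
            pred[test] = nearest_centroid_predict(X[~test], y[~test], X[test])
        # One pooled FOM per bootstrap.
        scores[b] = np.mean(pred == y)
    mean = scores.mean()
    # Half-width of an approximate 95% interval (normal approximation,
    # an assumption of this sketch).
    half_width = 1.96 * scores.std(ddof=1) / np.sqrt(n_boot)
    return mean, half_width, scores


if __name__ == "__main__":
    # Synthetic two-class data, for demonstration only.
    demo_rng = np.random.default_rng(42)
    y_demo = np.repeat([0, 1], 30)
    X_demo = demo_rng.normal(size=(60, 5)) + 1.0 * y_demo[:, None]
    mean, half_width, _ = blp_accuracy(X_demo, y_demo)
    print(f"accuracy = {mean:.3f} +/- {half_width:.3f}")
```

Because the partitions are generated from a seed, two models can be evaluated on identical partitions by reusing the seed, and their per-bootstrap scores can then be compared with a paired test. A single external validation set, by contrast, gives one score and no estimate of its spread.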