Posted on 2015-12-15, 00:00. Authored by Jan Gerretzen, Ewa Szymańska, Jeroen
J. Jansen, Jacob Bart, Henk-Jan van Manen, Edwin R. van den Heuvel, Lutgarde M. C. Buydens
The selection of optimal preprocessing is among the main bottlenecks
in chemometric data analysis. Preprocessing is currently a burden,
since a multitude of different preprocessing methods is available
for, e.g., baseline correction, smoothing, and alignment, but it is
not clear beforehand which method(s) should be used for which data
set. The process of preprocessing selection is often limited to trial and error
and is therefore considered somewhat subjective. In this paper, we
present a novel, simple, and effective approach for preprocessing
selection. The defining feature of this approach is a design of experiments.
On the basis of the design, the model performance of a few well-chosen
preprocessing methods and combinations thereof (called strategies) is
evaluated. Interpretation of the main effects and interactions
subsequently enables the selection of an optimal preprocessing strategy.
The presented approach is applied to eight different spectroscopic
data sets, covering both calibration and classification challenges.
We show that the approach is able to select a preprocessing strategy
that improves model performance by at least 50% compared to the raw
data; in most cases, it leads to a strategy very close to the true
optimum. Our approach makes preprocessing selection fast, insightful,
and objective.
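
The idea of evaluating preprocessing strategies through a design of experiments and then reading off main effects and interactions can be illustrated with a minimal sketch. It assumes a two-level full factorial design over three preprocessing steps (each switched on or off); the abstract does not specify which design the authors use, so this choice is an assumption. The factor names, the helper functions, and in particular `toy_error` are hypothetical: `toy_error` is a made-up formula standing in for a cross-validated model error, not a result from the paper.

```python
from itertools import product

# Hypothetical preprocessing steps, each either off (-1) or on (+1).
FACTORS = ("baseline", "smoothing", "scaling")


def full_factorial(n_factors):
    """Return all on/off combinations, coded as -1 (off) and +1 (on)."""
    return list(product((-1, 1), repeat=n_factors))


def toy_error(run):
    """Made-up stand-in for a model error (lower is better).

    In practice this would preprocess the spectra according to `run`,
    fit a model, and return its cross-validated error.
    """
    b, s, c = run
    return 10.0 - 2.0 * b - 1.0 * s + 0.5 * c + 0.5 * b * s


def main_effect(design, responses, j):
    """Mean response with factor j on minus mean response with it off."""
    on = [y for run, y in zip(design, responses) if run[j] == 1]
    off = [y for run, y in zip(design, responses) if run[j] == -1]
    return sum(on) / len(on) - sum(off) / len(off)


def interaction_effect(design, responses, j, k):
    """Two-factor interaction: contrast on the product of coded columns j and k."""
    same = [y for run, y in zip(design, responses) if run[j] * run[k] == 1]
    diff = [y for run, y in zip(design, responses) if run[j] * run[k] == -1]
    return sum(same) / len(same) - sum(diff) / len(diff)


design = full_factorial(len(FACTORS))            # one run per strategy
responses = [toy_error(run) for run in design]   # one error value per run

effects = {name: main_effect(design, responses, j)
           for j, name in enumerate(FACTORS)}

# Strategy (combination of steps) with the lowest error in the design.
best_run = design[min(range(len(design)), key=responses.__getitem__)]

if __name__ == "__main__":
    for name, value in effects.items():
        print(f"main effect of {name}: {value:+.2f}")
    print("baseline x smoothing:",
          interaction_effect(design, responses, 0, 1))
    print("best strategy:", dict(zip(FACTORS, best_run)))
```

In this toy setting a negative main effect means that switching the step on lowers the error on average, and a nonzero interaction means that the benefit of one step depends on whether another is applied. With a real data set, `toy_error` would be replaced by an actual model evaluation.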