10.6084/m9.figshare.5314384.v1
David Wright
Mark Thyer
Seth Westra
Benjamin Renard
David McInerney
A generalised approach for identifying influential data in hydrological modelling
figshare
2017
hydrology
influence diagnostics
cook's distance
Water Resources Engineering
2017-08-16 04:52:31
Poster
https://figshare.com/articles/poster/A_generalised_approach_for_identifying_influential_data_in_hydrological_modelling/5314384
Influence diagnostics identify data points that have a disproportionate
impact on model calibration, and are therefore useful to identify
possible erroneous data points or scrutinise the sensitivity of the
model results to a small portion of the overall calibration dataset.
Case-deletion Cook's distance quantifies influence directly; however, it
is computationally demanding because the model parameters must be
recalibrated once for every data point in the calibration data.
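The following is a minimal sketch of why case deletion is expensive. The Python function computes case-deletion Cook's distance for an ordinary least-squares linear model by refitting the parameters n times, once with each observation removed. The linear model, the use of NumPy, the synthetic data and the function name `case_deletion_cooks_distance` are illustrative assumptions. For a hydrological model such as GR4J, each refit would instead be a full recalibration.

```python
import numpy as np


def case_deletion_cooks_distance(X, y):
    """Case-deletion Cook's distance for an ordinary least-squares fit.

    Illustrative sketch: the parameters are re-estimated n times, once with
    each observation removed, which is the source of the method's cost.
    """
    n, p = X.shape
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_full
    s2 = residuals @ residuals / (n - p)  # residual variance of the full fit
    xtx = X.T @ X
    distances = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # mask that drops observation i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        shift = beta_full - beta_i
        # Parameter shift measured in the metric of the full-data fit
        distances[i] = shift @ xtx @ shift / (p * s2)
    return distances


# Hypothetical synthetic data: a straight line with one corrupted observation
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, x.size)
y[5] += 4.0
D = case_deletion_cooks_distance(X, y)
print("most influential observation:", int(D.argmax()))
```

The loop makes the cost explicit: n calibration data points mean n refits. This is cheap for a linear model but prohibitive when each refit is a hydrological model calibration.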
Regression-based Cook's distance provides an approximation of
case-deletion Cook's distance by combining two regression components for
each observed data point: 1) the leverage, which is used to assess the
potential importance of individual observations, and 2) the standardised
residuals. By combining these two components, regression-based Cook's
distance requires only a relatively small number of additional model
runs and is therefore an attractive alternative to the computationally
demanding case-deletion Cook's distance. The objective of this study is
to develop generalised regression-based influence diagnostics that can
be applied across a wide range of models and objective functions in a
computationally efficient manner, overcoming the limitations of the
current suite of
influence diagnostics. For example, regression-based Cook's distance
relies on two assumptions that are not satisfied in hydrological
modelling: 1) the hydrological model is linear; and 2) the objective
function is standard least squares. Conversely, although case-deletion
diagnostics do not rely on these assumptions, they require
high-performance computing and are therefore not computationally
feasible for most hydrological model applications. In this study we
generalise
regression-based Cook's distance so that it applies beyond linear
models and to the vast majority of objective functions currently used
in hydrological model calibration. The improvements from the new
formulation are then examined by comparing generalised Cook's distance
with the computationally demanding case-deletion influence metric, which
serves as a baseline for assessing the performance of each influence
diagnostic. We consider three case studies: (1) a series of synthetic
regression models with varying degrees of nonlinearity in the model
response and heteroscedasticity in the residual error; (2) a conceptual
hydrological model (GR4J); and (3) a rating curve incorporating
discharge uncertainty and Bayesian parameter priors. The
generalisation of regression-based Cook's distance allows
computationally cheap influence analysis across the vast majority of
hydrological model structures and objective functions. The inclusion of
highly influential data in model calibration can have a substantial
impact on the predicted maximum and mean flows. Given the considerable
insight that influence diagnostics provide and their low computational
cost, we recommend that hydrological practitioners undertake influence
analysis as a step towards more robust model calibration.
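The regression-based Cook's distance described above, which combines leverage with standardised residuals, can be sketched in Python for an unweighted least-squares calibration. In this sketch the leverage is taken from the model linearised around the calibrated parameters. The linearisation uses a forward-difference Jacobian that costs one extra model run per parameter. This standard linearisation is shown only for illustration, under stated assumptions: a least-squares objective, homoscedastic errors, NumPy, and the hypothetical helper names `finite_difference_jacobian` and `regression_cooks_distance`. It is not the generalised formulation developed in this study, which extends the diagnostic to other objective functions.

```python
import numpy as np


def finite_difference_jacobian(model, theta, rel_step=1e-6):
    """Forward-difference Jacobian of simulated outputs w.r.t. parameters.

    Hypothetical helper: costs one additional model run per parameter.
    """
    theta = np.asarray(theta, dtype=float)
    base = np.asarray(model(theta), dtype=float)
    jac = np.empty((base.size, theta.size))
    for j in range(theta.size):
        step = rel_step * max(abs(theta[j]), 1.0)
        perturbed = theta.copy()
        perturbed[j] += step
        jac[:, j] = (np.asarray(model(perturbed), dtype=float) - base) / step
    return jac


def regression_cooks_distance(model, theta_hat, y):
    """Regression-based Cook's distance from leverage and standardised residuals.

    Assumes an unweighted least-squares calibration with parameters theta_hat.
    """
    y = np.asarray(y, dtype=float)
    residuals = y - np.asarray(model(theta_hat), dtype=float)
    jac = finite_difference_jacobian(model, theta_hat)
    n, p = jac.shape
    # Leverage = diagonal of the linearised hat matrix J (J^T J)^{-1} J^T
    leverage = np.einsum("ij,ji->i", jac, np.linalg.solve(jac.T @ jac, jac.T))
    s = np.sqrt(residuals @ residuals / (n - p))  # residual standard error
    standardised = residuals / (s * np.sqrt(1.0 - leverage))
    return standardised**2 * leverage / (p * (1.0 - leverage))


# Illustrative use with a linear model, where the linearisation is exact
x = np.linspace(0.0, 10.0, 30)
rng = np.random.default_rng(0)
y_obs = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, x.size)
y_obs[5] += 4.0


def line_model(theta):
    return theta[0] + theta[1] * x


theta_hat, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y_obs, rcond=None)
D_reg = regression_cooks_distance(line_model, theta_hat, y_obs)
print("most influential observation:", int(D_reg.argmax()))
```

For a linear model, this reproduces case-deletion Cook's distance exactly, apart from finite-difference rounding. For a nonlinear model such as GR4J, `model` would wrap a simulation run and `theta_hat` would come from the usual calibration. The diagnostic then needs p extra runs rather than n recalibrations, at the price of being an approximation.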