Uncertainty sources in flood projections over contrasting hydrometeorological regimes

ABSTRACT This study evaluates the uncertainty of four components of the hydroclimatic modelling chain on flood projections over 96 basins covering contrasting hydrometeorological regimes located in Canada and Mexico. Two ensembles of climate simulations are considered, a large ensemble of 22 global climate model simulations and a smaller ensemble of three high-resolution regional climate model simulations. The other components are two post-processing techniques, three lumped hydrological models and six probability distributions. These four sources are assessed through a method of variance decomposition applied to six flood indicators over a reference period and two future periods: 1976–2005, 2041–2070 and 2070–2099. Systematic differences are observed between basins with contrasting flood-generating processes. Snow-dominated basins consistently show larger variance contributions from hydrological models, while rain-dominated basins show climate simulations as their dominant source. These results underline the need to consider the variability of each component’s uncertainty contribution and its link to hydroclimatic conditions and dominant processes.


Introduction
The observed changes in the climate system have impacted flooding around the world, and it is expected that a changing climate will continue to impact their occurrence in the future (Vormoor et al. 2016, IPCC 2018. Hydrological projections have thus become a common approach to assess the potential impacts of climate change on floods, and to prepare adaptation and mitigation strategies for the future (Kundzewicz et al. 2017, Krysanova et al. 2018. These projections are often generated at the basin scale by coupling climate models' outputs and hydrological models under different emission scenarios (Minville et al. 2009). This process usually comprises multiple stages that create a hydroclimatic modelling chain that varies depending on the methodological choices of each study (Chen et al. 2011, Her et al. 2019. Frequently, the hydroclimatic modelling chain for flooding projections comprises one or multiple (1) emission scenarios; (2) global climate model (GCM) simulations; (3) statistical or dynamical downscaling approaches (i.e. statistical downscaling models [SDMs] and regional climate models [RCMs]); (4) biasadjustment methods; (5) hydrological models (HMs); and (6) statistical models for flood frequency analysis, among others. Numerous studies applying different variations of this modelling chain can be found in the literature (Hirabayashi et al. 2008, Taye et al. 2011, Bosshard et al. 2013, Thompson et al. 2014, Riboust and Brissette 2015, Teutschbein et al. 2015, Reszler et al. 2018, Chan et al. 2020, Lucas-Picher et al. 2020. However, the plethora of existing models and methods, with their own structures and limitations, adds uncertainty at each step of the hydroclimatic modelling chain (Beven 2016, Meresa and Romanowicz 2017, Kundzewicz et al. 2018. Thus, confidence in flooding projections remains low due to the "cascade of uncertainty" arising from their simulations (Schneider 1983, Henderson and Sellers 1993, Mareuil et al. 2007, Wilby and Dessai 2010, AghaKouchak et al. 2012, Hall et al. 2014, Kundzewicz et al. 2017.
A wide range of studies has assessed the modelling uncertainty in hydrological and flood projections over basins with different hydrometeorological regimes (e.g. rain-or snowdominated) around the world (Kay et al. 2009, Teng et al. 2012, Bosshard et al. 2013, Falloon et al. 2014, Vetter et al. 2015, Ho et al. 2016, Collet et al. 2017. In general, these studies evaluate the uncertainty related to each source of the study's modelling chain by using ensembles that allow the analysis of their influence on the resulting hydrological projections. Table 1 summarizes and briefly describes the main conclusions of some relevant uncertainty analyses on hydrological simulations.
This comparison reveals that the cited studies have some similarities and differences. For instance, various studies (e.g. Kay et al. 2009, Teng et al. 2012, Bosshard et al. 2013, Vetter et al. 2017, Giuntoli et al. 2018) have shown that climate simulations (i.e. GCMs and RCMs) are one of the main sources of uncertainty, particularly in mean-and peak-flow simulations. This is line with similar studies identifying climate simulations as the leading source of uncertainty (Wilby and Harris 2006, Graham et al. 2007, Prudhomme and Davies 2009, Gosling et al. 2011, Chan et al. 2020). Yet more recent studies have shown that natural variability can play an important role in climate simulations' uncertainty (Vidal et al. 2016, Hingray et al. 2019; this source of uncertainty is not always considered in the modelling chains. Other studies have given particular focus to hydrological modelling uncertainties (e.g. Bosshard et al. 2013, Giuntoli et al. 2018, Chegwidden et al. 2019, Lemaitre-Basset et al. 2021, showing that hydrological models' structures can be relevant contributors of uncertainty (Poulin et al. 2011, Velázquez et al. 2013, Vetter et al. 2015, Hattermann et al. 2018, but that this depends on the hydrological variable under consideration, on the time of year and on the hydrological regime of the catchment considered (e.g. low flows). It has thus often been recommended to consider hydrological model ensembles and more thorough model evaluations (Velázquez et al. 2013, Vetter et al. 2015. The uncertainty arising from the estimation of return levels in flood frequency analyses has also been investigated in different studies (Merz and Thieken 2005, Blöschl and Montanari 2010, Merz et al. 2012, Hailegeorgis andAlfredsen 2017), which have shown that the uncertainty associated with historical records (e.g. presence of non-stationary trends), the selection of the "best" probability distribution to model extreme values, and the statistical models' parameterizations can be major sources of uncertainty in the estimation of return levels. Few studies, however, have included these uncertainty sources in climate change impact studies (Zhang et al. 2014, Collet et al. 2017, Meresa and Romanowicz 2017, Das and Umamahesh 2018. Among them, it has been highlighted that uncertainty contributions from flood return level statistical models and their parameterizations increased with increasing return periods, underlining the potential impact of ignoring the uncertainty associated with flood frequency estimation (Merz and Thieken 2005, Collet et al. 2017, Meresa and Romanowicz 2017. All uncertainty studies cited above have provided relevant information and insights about the leading uncertainty sources in regional streamflow projections. However, important differences and limitations are observed. For instance, each study has constructed its own hydroclimatic modelling chain over the selected basin(s) of that study. The different types and number of models and methods among studies, in addition to the different climates and flood-generating processes of each Only one raindominated basin included Das and Umamahesh (2018) region, make it difficult to compare and extrapolate their conclusions. Regarding flood frequency analyses, studies have often relied on a single "best" probability distribution, and fewer studies have included uncertainty related to return level estimations and their interplay with other sources (e.g. GCMs and HMs). The regional differences, methodological choices, and limitations of each study make results hardly comparable for different contexts. Therefore, a study that not only includes various basins but also covers contrasting hydrometeorological regimes under a common methodology is still missing. Thus, this study aims to analyse the uncertainty arising from four sources -(1) climate models, (2) post-processing techniques, (3) hydrological models and (4) probability distributions -on flood simulations over 96 basins with contrasting hydroclimatic regimes.

Study regions
The study area consists of 96 basins located in two North American regions with contrasting climates and floodgenerating processes. Figure 1 shows the location and mean total annual precipitation of each basin over both regions. The upper right panel in Fig. 1 shows 50 snow-dominated basins located in the south of the Canadian province of Quebec.
According to the Köppen-Geiger climate classification (Kottek et al. 2006), these 50 basins are in sub-arctic and continental humid climates (i.e. Dfc and Dfb), snowy regions characterized by cold winters and cool summers. These 50 basins have mean annual temperatures ranging from −2.6 to 11.5°C and mean annual total precipitation varying from 900 to 1250 mm. In contrast, the lower right panel in Fig. 1 shows the other 46 basins, located in Mexico. These basins are located over 14 different Köppen-Geiger climate types (Kottek et al. 2006) including equatorial, arid, and warm temperate climates (e.g. Af, BSh and Cf). These 46 basins have mean annual temperatures ranging from 11.7 to 23.6°C and mean annual total precipitation ranging from 475 to 3110 mm. Regarding physical properties, the 96 catchments vary in area from 225 to 29 584 km 2 , covering a total of 419 039 km 2 , with topography ranging from coastal plains and flat areas (over both regions) to mountainous regions reaching up to 2971 m.a.s.l. (in Mexico).

Observed data
Daily historical records of minimum temperature, maximum temperature, precipitation and streamflow were used for each basin. These meteorological and hydrometric records were obtained from the Hydrometeorological Sandbox -École de technologie supérieure (HYSETS) database (Arsenault et al. 2020b), a large-scale hydrometeorological database of North American basins. The meteorological data consists of a grid at 1/16° (~6 km) resolution covering Mexico, the conterminous United States (CONUS) and southern Canada (Livneh et al. 2013(Livneh et al. , 2015. The meteorological data points inside each basin's area were averaged to obtain one time series per basin. Based on hydrometric data availability for each basin, the meteorological and hydrometric data consist of time series with a minimum length of 30 years between 1950 and 2013.

Climate simulations
The climate model outputs used in this study consisted of daily precipitation and minimum and maximum temperatures, from two ensembles of GCM and RCM simulations. The first ensemble comprises 22 GCMs issued from the Coupled Model Intercomparison Project Phase 5 (CMIP5) database (Taylor et al. 2011) described in Table 2. These simulations were selected to represent the CMIP5 ensemble diversity by including 14 different modelling centres, five of the more complex earth system models (ESMs), and various spatial resolutions (0.75-3.75°). The second ensemble comprises three RCM simulations issued from the Canadian Regional Climate Model version 5 (CRCM5; Separovic et al. 2013) driven by three GCMs (i.e. Canadian Earth System Model version 2 (CanESM2), Centre National de Recherches Météorologiques-Climate Model version 5 (CNRM-CM) and the Geophysical Fluid Dynamics Laboratory-Earth System Model 2M (GFDL-ESM2M)) at 0.22° spatial resolution. The CRCM5 simulations over both regions were produced and provided by the Ouranos Consortium on Regional Climatology and Adaptation. Over Mexico, the Ouranos Consortium produced new CRCM5 simulations over a modified Central American domain (CAM) in order to extend the northern border to completely cover the country of Mexico. Simulations over both regions are part of the Coordinated Regional Downscaling Experiment (CORDEX) database (Giorgi and Gutowski 2015). Both GCM and CRCM5 ensembles cover the reference period of 1976-2005 and two future horizons of 2041-2070 and 2070-2099 under Representative Concentration Pathway (RCP) 8.5. The use of these two ensembles enables an evaluation of the impacts of the choice of climate simulation ensemble (smaller-limited RCM ensemble vs. larger GCM ensemble) on the uncertainty contributions to flood simulations.

Hydroclimatic chain overview
This study was carried out in five main stages (see Fig. 2). The first stage consisted of extracting the GCM and CRCM5 outputs for each basin and period. The second stage consisted of post-processing all climate simulations using two biasadjustment techniques. The third stage comprised the calibration and validation of the hydrological models to then couple them with all climate simulations to obtain the GCM-and CRCM5-driven streamflow simulations. The fourth stage entailed performing the flood frequency analysis on each simulated streamflow series with six probability distributions to obtain six flood indicators from each analysis. The fifth stage consisted of the analysis of mean seasonal climate and streamflow simulations, and an uncertainty analysis of six flood indicators for each basin.

Climate outputs
The first stage consisted of extracting and preparing the GCM and CRCM5 outputs for each basin and period. Daily precipitation and minimum and maximum temperatures of all global and regional climate simulations were extracted in each region for the reference period of 1976-2005 and the two future periods of 2041-2070 and 2070-2099. For each basin, the grid points located within or surrounding their delimitation area were averaged using the Thiessen polygon method to obtain a single data series for each of the 25 climate simulations. A minimum of four grid points was averaged using this method for the smallest basins.

Post-processing methods
The second stage consisted of post-processing all climate simulations. Two post-processing techniques were included in the hydroclimatic modelling chain: (1) daily bias correction (DBC) and (2) quantile mapping (QM) methods.
The DBC is a post-processing method merging the daily translation method (DT; Mpelasoka and Chiew 2009) and the local intensity scaling method (LOCI; Schmidli et al. 2006). The DBC starts using the LOCI method to fit simulated precipitation occurrences to the occurrences of observed precipitation data to correct the simulated wet-day frequencies. Then, the DT method is used to correct the frequency distributions of precipitation amounts and temperature simulations through a quantile approach. The QM method (Themeßl et al. 2011, Maraun 2016) is a simple and robust post-processing method based on the transformation of distributions by quantile. First, the correction of temperature simulations is done by fitting the quantiles of simulated data to the quantiles of observations using normal distributions. In the case of precipitation, the QM method corrects the precipitation distribution based on daily-constructed point-wise empirical cumulative distribution functions, including both wet and dry days in the estimation of empirical cumulative distribution functions (Chen et al. 2013b). In this way, the frequency and amounts of precipitation occurrence are simultaneously corrected. Further details on these two methods are presented in the Supplementary material (S1). These two methods fall in the bias-adjusting approaches category, meaning that adjustment factors identified in reference periods are used to adjust the future simulations. Different studies have applied these methods for diverse applications, including comparisons of different bias-adjustment methods (e.g. Chen et al. 2013aChen et al. , 2013b, and studies analysing climate change impacts on hydrology (e.g. Chen et al. 2019, Zhao et al. 2020, endorsing their use in this study.

Hydrological modelling
The hydrological modelling of the third stage is carried out using three lumped conceptual hydrological models with varying structures and levels of complexity -(1) GR4J (Génie Rural à 4 paramètres Journalier), (2) MOHYSE (Modèle Hydrologique Simplifié à l'Extrême) and, (3) HMETS (Hydrological Model -École de technologie Supérieure)that have been evaluated in various regions, validating their use (Arsenault et al. 2019(Arsenault et al. , 2020a). The GR4J model is a parsimonious lumped and conceptual rainfall-runoff model developed by Perrin et al. (2003). In this hydrological model with four parameters, snow accumulation, snowmelt and evapotranspiration are not directly estimated. Thus, the snow module CemaNeige (Valéry et al. 2014) was added to allow for its application over the snow-dominated regions, adding two parameters to the model (six parameters in total). To estimate the potential evapotranspiration the Oudin evaporation formulation (Oudin et al. 2005) was used. This formulation uses extraterrestrial radiation, determined from the date and latitude, and mean temperature to estimate potential evapotranspiration. The inputs required for applying the GR4J-CemaNeige-Oudin model consist of continuous series of daily precipitation, mean temperature and the previously calculated potential evapotranspiration. GR4J coupled with CemaNeige has been widely used for different applications over snow-and rain-dominated basins (e.g. Coron et al. 2012, Troin et al. 2018).
The MOHYSE model is a simplified lumped conceptual model with 10 parameters, created by Fortin and Turcotte (2006). This model represents the main hydrological processes including potential evapotranspiration, snow melting and accumulation, vertical water budget and routing. The input data required for this hydrological model consists of mean daily temperatures, total daily rainfall and total daily snow accumulation. Some examples of recent MOHYSE applications are presented by Thiboult et al. (2016) and Arsenault et al. (2020a).
The HMETS model is a simple lumped conceptual model with 21 parameters, developed by Martel et al. (2017). This hydrological model accounts for the surface flow, delayed flow, hypodermic flow and groundwater flow. The snowmelt is simulated by the melting and refreezing process of the accumulated snowpack. For the potential evapotranspiration, the McGuinness and Bordne formulation was used (McGuinness and Bordne 1972). The required inputs are continuous daily precipitation, maximum and minimum temperatures, and the calculated potential evapotranspiration. Recent applications cover various studies such as streamflow forecasting (Arsenault et al. 2019) and hydrological modelling (Troin et al. 2018).
The hydrological model calibration was performed over the odd years of the period available for each of the 96 basins. Then, the validation was performed on the even (noncalibrated) years. The idea behind this approach is to account for the climate variability of the full time series by alternating the calibration/validation years. Based on recommendations from previous studies, the Covariance Matrix Adaptation Evolution Strategy (CMAES) was used to calibrate GR4J and MOHYSE (Hansen andOstermeier 1997, Arsenault et al. 2014). HMETS was calibrated using the dynamically dimensioned search algorithm (Asadzadeh andTolson 2009, Huot et al. 2019). The objective function used for the calibration/ validation was based on the modified Kling-Gupta efficiency criterion (KGE; Kling et al. 2012) over the odd/even years between observed and simulated streamflows, and is defined as follows: KGE ¼ 1 À ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where r, β and γ are the correlation coefficient, bias ratio and variability ratio, respectively. CV is the coefficient of variation, μ the mean, and σ is the standard deviation of the streamflow series. The "o" subscript indicates observed data and the "s" subscript indicates simulated data. KGE values range from −∞ to 1, where 1 indicates a perfect fit. According to Knoben et al. (2019), KGE values larger than ≈ −0.41 indicate that the simulation has higher skill than using the mean observations. KGE values above 0.5 were thus considered acceptable in this study. This metric was selected as it has been shown to overcome issues related to mean squared error functions (e.g. Nash-Sutcliffe efficiency), such as underestimating variability and prioritizing runoff peaks (Gupta et al. 2009, Castaneda-Gonzalez et al. 2019. After calibrating and validating the three hydrological models, they were all coupled with the post-processed GCM and CRCM5 ensembles. Thus, 150 streamflow simulations were produced in total (i.e. 25 climate simulations × 2 postprocessing methods × 3 hydrological models).

Flood frequency analysis
This analysis was performed over annual and seasonal daily maximum discharges. Four seasons were considered, winter (December, January and February -DJF), spring (March, April and May -MAM), summer (June, July and August -JJA) and autumn (September, October and November -SON). Each GCM-and CRCM5-driven streamflow simulation was fitted to six probability distributions, presented in Table 3. Different methods were used to estimate each probability model's parameters (see Table 3). These six probability distributions are commonly used in hydrological applications and often recommended in regional guidelines (Cunnane 1989, Castellarin et al. 2012, Escalante-Sandoval and García-Espinoza 2014. For each probability distribution, six return period flows were estimated and used as flood indicators: the (1) 2-year, (2) 5-year, (3) 10-year, (4) 20-year, (5) 50year and (6) 100-year return period flows. The performance of each statistical model was evaluated prior to their consideration in this study (results not shown).

Climate simulation ensembles
Mean annual cycles of precipitation, minimum temperature and maximum temperature were analysed to evaluate the uncertainty spread of all climate simulations used in this study. For each basin, the standard deviation of the 22 postprocessed GCM outputs and three post-processed CRCM5 outputs (25 climate simulations in total) was estimated to quantify the seasonal spread of each climate variable. The seasonal analysis was performed over the winter (DJF), spring (MAM) summer (JJA) and autumn (SON) seasons.

Simulated streamflow ensembles
As described in Section 3.4, all post-processed GCM and CRCM5 outputs were used as inputs for the hydrological models to obtain multiple series of streamflow simulations (150 streamflow simulations in total) for each basin. To measure the spread of seasonal streamflow simulations for each basin, the seasonal standard deviation of the mean annual hydrographs of all simulated streamflows was estimated for the reference and future periods. These seasonal standard deviations were then normalized by each basin's mean annual observed streamflow to obtain a customized CV. This Table 3. Description of probability distributions and their parameter estimation method used in this study.

Probability distribution
Parameter estimation method Log-normal with two parameters Maximum likelihood Log-normal with three parameters Maximum likelihood Gumbel Maximum likelihood Generalized extreme value (GEV) Method of moments Pearson type III Method of moments Log-Pearson type III Method of moments assessment was performed for the winter, spring, summer and autumn seasons.

Flood indicators
As described in Section 3.5, six flood indicators were estimated and used to evaluate the uncertainty contribution of four elements of the hydroclimatic modelling chain in flood simulations. The six flood indicators (2-, 5-, 10-, 20-, 50-and 100year return period flows) were estimated from each time series obtained from combining six probability distributions, three hydrological models, two post-processing methods and 25 climate simulations (i.e. 25 × 2 × 3 × 6 = 900 time series per basin). As performed for the climate and streamflow simulations, the seasonal uncertainty spread of each flood indicator was quantified through a custom CV estimated from a normalized standard deviation for each basin. The analysis was also carried out for the winter, spring, summer and autumn months.

Uncertainty analysis
The assessment of the uncertainty in the flood simulations consisted of decomposing the uncertainty contributions of (1) climate simulations, (2) post-processing methods, (3) hydrological models, and (4) probability distributions on six flood indicators using the variance decomposition method (Déqué et al. 2007, Troin et al. 2018). This method allows isolating and comparing the contribution of each element involved in the hydroclimatic chain and the interactions between them. The total variance associated to one component, such as CRCM5 simulations (referred to as as V(R)), can thus be expressed as follows: where R, P, H and D represent the variance related to the CRCM5 simulations, post-processing methods, hydrological models and probability distributions, respectively. Further details on the total variance calculation and interaction terms of the variance decomposition method are given in the Supplementary material (S2).
The assessment was divided into two main variants in terms of the climate simulations included in the hydroclimatic chain. The first configuration considers the GCM ensemble and the second configuration considers only the CRCM5 ensemble. The rationale behind these two configurations is to evaluate the impact of a small high-resolution ensemble and a larger GCM ensemble on the uncertainty contribution of each element. These two uncertainty evaluations were performed for the annual and seasonal (i.e. winter, spring, summer and autumn) flood indicators (i.e. 2-, 5, 10-, 20-, 50-, 100-year) for the reference and future periods (i.e. 1976-2005, 2041-2070 and 2070-2099).

Climate simulation analysis
As described in Section 3.6, the seasonal uncertainty spread is measured in terms of standard deviation on mean seasonal precipitation, maximum temperature and minimum temperature envelopes obtained from the GCM and CRCM5 outputs post-processed by the QM and DBC methods. Figure 3 presents the seasonal standard deviation of precipitation and maximum and minimum temperature envelopes for two groups of basins based on their main flood-generating process: (1) 50 Quebec snow-dominated basins (panels a) and (2) 46 Mexican rain-dominated basins (panels b). For both groups, the standard deviation of precipitation, maximum temperature and minimum temperature is presented in the upper, middle and lower panel, respectively. Each climate variable (each row) presents the seasonal results for the reference and future periods, from left to right. Overall, the standard deviation box plots of GCM (in grey) and CRCM5 (in blue) climate variables show increasing values when moving towards future periods over both groups of basins. Yet the magnitude of these increasing spreads varies among seasons, climate variables and basins. In terms of precipitation, both climate ensembles show larger standard deviations during the summer and autumn months, with the GCM ensemble consistently showing slightly larger values. Yet the difference between GCM and CRCM5 ensembles is more important over the rain-dominated basins. Regarding maximum temperature spread, the snowdominated basins present similar standard deviations in the four seasons during the reference period, with slightly higher values during the winter. However, over future periods the GCM ensemble presents larger standard deviations during spring and autumn months, while the CRCM5 ensemble shows larger spreads during the autumn. Some differences are also observed between ensembles over the future periods. During the 2041-2070 period, the GCM ensemble shows slightly larger standard deviations, while during the 2070-2099 period the CRCM5 ensemble displays larger standard deviations during spring, summer and autumn months. Over the rain-dominated basins, the maximum temperature spreads show small differences between seasons and climate ensembles. However, during the future periods larger spreads are observed during the summer. This is especially visible during the farthest period, where the CRCM5 ensemble shows larger standard deviations than the GCM ensemble. The minimum temperature standard deviations show very similar behaviours to those observed for maximum temperatures, except for the rain-dominated river basins where the GCM ensemble spread remains important over future periods. The few basins with KGE validation values lower than 0.6 were retained in the study as at least one of the three hydrological models showed higher skill over those basins. No clear difference was observed between the hydrological models' performance, yet slightly higher performances were observed from GR4J and MOHYSE over the snow-and rain-dominated basins, respectively (results not shown).

Streamflow projections
To present an overview of the hydrological diversity of the study area, Fig. 4 shows the mean annual hydrographs created using observed and simulated climate data for the three selected basins. The GCM-(grey) and CRCM5-driven (blue)  mean annual hydrographs are presented for a snowdominated basin (panels a), a dry rain-dominated basin (panels b) and a humid rain-dominated basin (panels c). The streamflow simulations obtained with all post-processing methods and hydrological models are included in the envelopes. Each basin (each row) displays the results for the reference and future periods from left to right. This figure shows that the observed mean annual streamflow is covered by the simulated envelopes among the three selected basins during the reference period. Overall, the GCM-driven streamflow ensemble shows wider spreads than the CRCM5-driven streamflow ensemble. It is also observed that the envelopes are wider during future periods over all basins. Looking at each basin, it is observed that the range of uncertainty spread clearly varies between them. The snow-dominated basin shows a smaller spread compared to the wider envelopes of the raindominated basins. During spring and autumn months of the reference period, it is observed that the seasonal peaks of the snow-dominated basin were not always covered by the simulated envelope. During the same seasons, the envelopes show particularly wider spreads during the future periods. This is especially observed for the GCM-driven spring peaks (in grey).
In terms of spread, the dry and humid rain-dominated basins show a clearly wider uncertainty envelope during the summer and autumn months (rain season). This is especially observed over the humid basin, with values varying from ≈0 to 500 m 3 s −1 during the reference period, and up to 1000 m 3 s −1 during the farthest horizon.
To quantify the uncertainty spread of mean seasonal streamflows of each basin, a customized seasonal CV was estimated from all GCM-and CRCM-driven mean annual hydrographs. The GCM-and CRCM5-driven mean annual hydrographs include streamflow simulations using all postprocessing techniques and hydrological models. As presented in Section 4.1, this analysis shows the results for the snow-and rain-dominated basins. Figure 5 shows the CV box plots of the mean annual streamflow envelopes simulated with the GCM (grey) and CRCM5 (blue) ensembles for the reference and future periods for the snow-dominated (panels a) and raindominated basins (panels b).
Looking at the results presented for the two groups of basins, three main differences between the groups are observed. First, the streamflow simulation spread using the GCM ensemble is always wider than that obtained using the CRCM5 ensemble. Second, the snow-dominated basins show larger spreads during the spring months. Third, rain- dominated basins show clearly larger spreads during the rainy season (summer and autumn months). Regarding temporal variations, both basin groups show generally higher CVs over the future horizons. Over the snow-dominated basins, both future periods show similar median seasonal CVs, except during spring where slightly higher values are observed in the 2041-2070 period. In contrast, the rain-dominated basins show slightly larger spreads during the summer over the 2041-2070 period, while during the 2070-2099 period larger CVs are presented over the autumn months. This behaviour is observed with both the GCM-and CRCM5 ensembles.

Flood indicators
To quantify the uncertainty spread of each flood indicator, the spread of seasonal return levels estimated with the GCM and CRCM5 ensemble, two post-processing methods, three hydrological models and six probability distributions was measured with a customized CV for each basin. This analysis is presented for the two groups of basins, as before. Figures 6 and 7 show the seasonal CV box plots for the six flood indicators estimated from the GCM-(grey) and CRCM5-driven (blue) streamflows for the snow-and raindominated basins, respectively. The CV box plots over the snow-dominated basins (Fig. 6) generally show larger uncertainty spreads of all flood indicators during the spring months, the season of annual maximum flood peaks. Over future periods, the box plots show generally higher medians and wider distributions. This behaviour is also observed with increasing return periods. In other words, the spread of return levels increases with an increased return period, over both the GCM and CRCM5 ensembles. Over the raindominated basins (Fig. 7), the CV box plots of seasonal return levels show larger values during summer and autumn, the seasons of annual maximum flood peaks. This behaviour is observed with the GCM and CRCM5 ensembles over all flood indicators. As observed over the snowdominated basins, the spread increases when moving to future periods and with increasing return periods. Overall, the GCM-and CRCM5 ensembles show similar temporal behaviour over the six flood indicators, with the GCMdriven flood indicators consistently showing larger CVs and wider box plots.

Variance decomposition
To quantify the uncertainty contribution of each element of the hydroclimatic modelling chain on flood simulations, two main variance decomposition analyses were performed on six flood indicators estimated from each streamflow projection. As described in Section 3.6, two variance decomposition configurations were considered, one using the GCM ensemble (22 simulations) and the other one using the CRCM5 ensemble (three simulations) as the climate simulations' ensemble element. Both analyses were performed for each season (DJF, MAM, JJA and SON) and period (1976-2005, 2041-2070 and 2070-2099) over all 96 basins.
Following the same structure as in the previous sections, the results are presented per group of basins. All variance decomposition results of this section are presented in terms of the total variance associated to each element of the hydroclimatic modelling chain. As described in Section 3.6.4, the total variance accounts for the variance contribution of the source alone (e.g. R for CRCM5 run) and its interactions with the other elements of the modelling chain (e.g. RP, for CRCM5 and post-processing interaction; see Equation 3). Therefore, the percentages of total variance associated to each uncertainty source presented in the following figures (Figs 8 and 9) will not add up to 100%. The rationale behind this is to present the total variance contributed by each of the four elements of the hydroclimatic chain to allow ranking them by their total contribution to uncertainty in this study's modelling chain. Results in this section are presented for the CRCM5 ensemble configuration only since the major conclusions for the GCM ensemble are the same (see Supplementary material, Figs S1-S4). A section of the discussion is nonetheless dedicated to using the CRCM5 ensemble vs. the GCM ensemble. Figures 8 and 9 show box plots of the seasonal percentages of total variance contributions considering the contribution of the CRCM5 ensemble for snow-and rain-dominated basins, respectively. Figure 8 shows that, as observed in the uncertainty assessment using the GCM ensemble configuration (see Supplementary material, Fig. S1), the variance decomposition over the snow-dominated basins shows hydrological modelling as one of the main uncertainty sources.

Uncertainty analysis with CRCM5 ensemble
Hydrological modelling uncertainty is consistently the first or second main contributor in most seasons, flood indicators and periods. The dominance of hydrological models on the overall variance is especially large during the spring. Overall, the CRCM5 ensemble shows the second largest uncertainty contribution over many cases. An increase in its contribution is observed when moving towards future periods, especially during the summer, where the CRCM5 ensemble shows larger contributions for return periods smaller than 50 years. The uncertainty contribution of probability distributions increases with increasing return periods over the three periods, as also observed over the variance decomposition analysis using the GCM ensemble (see Supplementary material,   S1). In some cases, probability distributions become the first or second main uncertainty source, especially for the 50and 100-year flood indicators. Post-processing methods show the smallest uncertainty contribution and small increases over the future horizons (the same is observed for the GCM ensemble configuration).
The uncertainty analysis over the rain-dominated basins, presented in Fig. 9, reveals that the main uncertainty contributors vary between flood indicators, seasons and periods. Overall, flood indicators smaller than the 50-year return period show the CRCM5 ensemble as the main source of uncertainty during summer and autumn seasons over the future periods. During the reference period, however, hydrological modelling presents slightly larger median percentages of variance contributions. During winter months, no clear behaviour is observed, with the CRCM5 ensemble, hydrological models, and probability distributions showing higher contributions over different cases. During the spring, probability distributions and hydrological models are observed to be the main uncertainty contributors, with probability distributions showing higher contributions in most flood indicators over reference and future periods. In addition, the uncertainty contributions of probability distributions become more important with increasing return period, as also observed for the GCM ensemble configuration (see Supplementary material, Fig. S2). In line with the GCM ensemble analysis, postprocessing methods are the smallest source of uncertainty. However, their contribution increases when moving towards future periods.
The results from the annual maximum flood analyses are presented in Figs 10 and 11 for the snow-and rain-dominated basins, respectively. Each figure presents maps of the highest variance source between the CRCM5 ensemble (R, in blue), post-processing methods (P, in yellow), hydrological models (H, in green), and probability distributions (D, in red) on six return periods, and for the six flood indicators (top to bottom).  The results for snow-dominated basins (see Fig. 10) show a larger number of basins displaying hydrological models as their major uncertainty source over most flood indicators. As observed in the seasonal analysis using the CRCM5 ensemble, hydrological modelling is consistently shown to be the main uncertainty source, with few exceptions over the largest return periods -especially over the 100-year indicator, with probability distributions identified as the largest uncertainty source over some basins. However, most of these basins show hydrological models are the main uncertainty source when moving towards future periods (as also seen with the GCM ensemble; see Supplementary material, Fig. S3).
Compared to the snow-dominated basins, the results over the rain-dominated basins (see Fig. 11) are more varied, especially over the reference period. During this period, CRCM outputs, hydrological models and probability distributions are among the main uncertainty sources for different basins. Yet the CRCM5 ensemble is consistently identified as the largest uncertainty source in the 2-, 5-and 10-year return periods during both future periods. As observed in previous analyses, the influence of probability distributions increases with an increasing return period. This is observed over the 20-, 50-, and 100-year return periods, where the number of basins showing probability distributions as their main uncertainty source increases over reference and future periods. The uncertainty contribution of post-processing methods, as observed in previous analyses, is the smallest compared to the other three sources. Thus, it is not observed as a dominant source over the rain-dominated basins. Similar results are also obtained for the GCM ensemble configuration, except with a clearer domination of the GCM ensemble as the main uncertainty source.

Impact of hydroclimatic conditions on uncertainty contribution
Our findings reveal that the leading sources of uncertainty in flood simulations are consistently related to the basin's flood-generating process. The hydrometeorological diversity among the 96 basins included in this study allowed the identification of two main groups of basins: snow-and raindominated basins. Over the snow-dominated basins, hydrological models were consistently found to be the largest contributors among six flood indicators. Figure 10 shows a clear picture of this finding, especially over the future periods. This is in line with other studies, where hydrological models were observed to be the largest contributors of uncertainty in snow-dominated regimes (Vetter et al. 2017, Giuntoli et al. 2018. The uncertainty brought by hydrological models to flood simulation can be influenced by different elements. Flooding over snow-dominated regions is mainly driven by snowmelt occurring with the increasing temperatures during spring (at these latitudes). Thus, one of the reasons for the larger uncertainty contributions of hydrological models in snow-dominated basins could be the inherent structural uncertainties of hydrological modelling associated with snowmelt. This well-known uncertainty might arise from the simplifications used to represent hydrological processes, as well as the parametric uncertainty related to hydrological model calibration (Kuczera et al. 2010). It is therefore likely that different formulations used to represent the snowpack accumulation, melting and refreezing processes could influence the different high flow simulations between hydrological models.
Another possibility could be associated with the impacts of climate change on seasonality. Some studies have suggested that climate change could influence future flood seasonality of snow-dominated catchments (Vormoor et al. 2015(Vormoor et al. , 2016. This issue could affect the "parameter transferability" of hydrological models. This means that calibrating hydrological models over certain reference conditions can affect hydrological models' performance over different future conditions than the ones used to calibrate them (Seiller et al. 2012, Zhu et al. 2016. It is therefore expected that a change in basins' seasonality might influence hydrological models' response in future periods. Thus, these results underline not only the importance of using hydrological model ensembles, but also the need to evaluate models' and methods' adequacy for more robust projections and improved transferability under a changing climate, particularly in light of the knowledge that removing low-performing models can impact the magnitude of the projected hydrological changes (Wen et al. 2020).
In contrast, over the rain-dominated basins, climate simulations were consistently found to be the highest contributor to uncertainty in flood simulations. This agrees with various studies evaluating uncertainty sources on streamflow simulations over watersheds with similar flood-generating processes (Kay et al. 2009, Gosling et al. 2011, Vormoor et al. 2015; it was thus expected that flood simulations over rain-dominated basins would be more sensitive to the climate simulationsespecially over this type of basins where floods are mainly driven by extreme rainfall events. Additionally, the analysis of the climatic uncertainty spread presented in Fig. 3 clearly shows the larger spreads of precipitation simulations over the rain-dominated basins than over snow-dominated basins. This result highlights the need for further studies on the uncertainty associated with the simulation of convective precipitation events, and on the potential added value of using a convectionpermitting climate (Reszler et al. 2018).

Temporal and seasonal differences in uncertainty contribution
The results indicate that the uncertainty contribution of the hydroclimatic chain elements was related not only to the basins' flood-generating process, but also to seasons and periods. Results over both basin groups showed that the leading uncertainty sources during the reference period sometimes changed when moving towards future periods. This temporal evolution of uncertainty has been evaluated and discussed in some studies that, in line with this study's results, have shown that the uncertainty related to a certain source is likely to vary over time (Giuntoli et al. 2018, Shen et al. 2018. Moreover, it is observed that the overall uncertainty increases with time (Shen et al. 2018). This is clearly observed in Fig. 5, where mean seasonal streamflows show increasing spreads over future periods. In terms of seasonality, the results show that the uncertainty contribution of each source varies depending on the season. This is observed over basin groups, especially during the future periods (see Figs 8 and 9). For instance, while the leading source of uncertainty on spring flooding over the snow-dominated basins was mainly hydrological modelling, climate simulations (i.e. GCM and CRCM5 ensembles) were consistently the leading source during other seasons (e.g. summer and autumn months). The rain-dominated basins also presented seasonal variations. During the summerautumn flooding season, climate simulations were consistently the dominant source of uncertainty. Yet during drier winterspring months, hydrological modelling, climate simulations and probability distributions all showed high uncertainty contributions. This seasonal difference could be explained by the different dominant hydrological processes during each season. For instance, during the summer all basins show climate simulations as the main uncertainty source -in a season where flooding over all basins is mainly driven by rainfall events. This can thus be explained by the larger uncertainty associated with the representation of convective events on GCMs and RCMs. At the same time, the post-processing methods may also be playing a role in precipitation simulation uncertainty. Studies have shown that if a given precipitation simulation is too biased, the bias-adjustment method may be unable to overcome it, and will consequently produce biased hydrology, particularly in extremes (Chen et al. 2013a). This can explain the generally larger hydrograph spreads in raindominated basins over the reference periods (see Fig. 4), as well as their slightly larger post-processing uncertainty contributions compared to snow-dominated basins (see Fig. 9). Furthermore, as presented in the description of postprocessing methods (see Supplementary material, S1), one of the main differences between the selected methods is the precipitation adjustment, particularly in their extremes, which can be thus linked to the observed impacts in rain-dominated basins.
Another example of seasonal variations is observed during the dry winter-spring season of the rain-dominated basins, where hydrological modelling is consistently the main source of uncertainty. During dry seasons, potential evapotranspiration estimation plays an important role in streamflow estimation. Thus, the dominance of hydrological models during dry months could be explained by the uncertainty related to the estimation of potential evapotranspiration, which has been identified as one of the main contributors to hydrological modelling uncertainty (Oudin et al. 2005, Dallaire et al. 2021.

Flood indicator impacts on uncertainty sources
This study evaluated the uncertainty contributions of four elements of the hydroclimatic chain on six flood indicators: the 2-, 5-, 10-, 20-, 50-, and 100-year return periods. Our results showed that the variance contribution of each source varied between flood indicators. This behaviour was especially observed for the uncertainty related to the choice of probability distribution for the return level estimations. It was generally observed that the uncertainty contributions of probability distributions increased when the return period increased. This is clearly displayed in the seasonal analyses of snow-and raindominated basins presented in Figs 8 and 9. This behaviour could be explained by the uncertainty associated with the length of the data series used for the flood frequency analysis. Different studies have shown that using short runs of data (e.g. 30 years or less) to estimate larger return periods (e.g. 100year) might result in larger uncertainties (Hu et al. 2019). In this study, three 30-year streamflow series were used: the 1976-2005, 2041-2070 and 2070-2099 periods. We thus expected to observe larger uncertainties related to their estimation on the largest return periods (i.e. 50-year and 100-year). It can therefore be recommended to further study the uncertainties associated with flood frequency analyses, such as using peaks-over-threshold approaches to increase the length of streamflow series.

GCM ensemble vs. CRCM5 ensemble
The present study evaluated the uncertainty contribution of climate simulations using two different ensembles, (1) an ensemble of 22 GCM simulations and (2) an ensemble of three high-resolution CRCM5 simulations. The aim of these comparisons was not to evaluate GCM ensembles vs. highresolution RCM simulations, but to compare their impact on the uncertainty analysis. It is important to recall that the size of the RCM ensemble was limited by the availability of simulations covering Mexico (which is not the case with the regular CORDEX CAM-domain simulations). Nonetheless, this only reflects the reality of working with generally more limited RCM ensembles with respect to the larger available GCM ensembles. The idea behind this was to evaluate the impact of the choice of climate simulation ensemble on the uncertainty contributions to flood simulations. Therefore, generalizations based on these results should be made cautiously, as a small ensemble -like the CRCM5 ensemble used in this study -is not robust enough compared to the more robust GCM ensemble results. Similarities were, however, observed between the two main variance decomposition configurations, especially over the snow-dominated basins. Both analyses showed hydrological models and climate simulations were the leading sources of uncertainty for snow-and raindominated basins, respectively. However, the impact of using the GCM ensemble against the CRCM5 ensemble on the uncertainty contributions was different between the two basin groups. The snow-dominated basins showed consistent dominance of hydrological models in both analyses (GCM or CRCM5), especially during spring floods. Over the raindominated basins, however, the dominance of climate simulations was not as clear when using the CRCM5 ensemble as when using the GCM ensemble (see Supplementary material, Fig. S4). The results using the larger GCM ensemble showed a clear dominance of climate simulation in contributing to uncertainty, while the CRCM5 ensemble results showed more varied results, especially for return periods larger than the 20-year period. These results were expected since the smaller CRCM5 ensemble presented narrower streamflow spreads. This is observed in Fig. 6, where the seasonal CVs of the mean annual hydrographs of both basin groups showed smaller spreads with the CRCM5 ensemble than with the GCM ensemble. This could possibly mean that the streamflow uncertainty was underestimated when using the small CRCM5 ensemble. This is especially important to highlight because it has become a common approach in the literature to use a couple of high-resolution RCM simulations to evaluate climate change impacts on future flooding. It is thus important to further evaluate whether a small RCM ensemble can be representative of the streamflow uncertainty or if it is only a subsample of a more representative uncertainty spread, particularly over regions showing a larger climatic uncertainty like that observed over the rain-dominated basins in this study.
In contrast, the comparisons between the GCM and CRCM5 outputs' uncertainty spreads (i.e. precipitation, maximum and minimum temperatures) showed that the small CRCM5 ensemble sometimes displayed larger spreads than the GCM ensemble. For instance, the seasonal uncertainty spreads of temperature envelopes over the snow-dominated basins (Fig. 3, panel a) show that in some cases the threemember CRCM ensemble covered a wider uncertainty envelope than the GCM ensemble. This is observed with maximum and minimum temperatures during the spring, summer and autumn months of the 2070-2099 period. Similarly, the CRCM5 ensemble for maximal temperature displayed slightly larger spreads for the summer months over the raindominated basins (Fig. 3, panel b). This behaviour could be explained by the improved representations of high-resolution RCM simulations, e.g. of topography, underlining the need to further study the representation of processes in climate simulations. On the other hand, this study did not account for the natural climate variability in climate simulations. And, as presented in recent studies (Chegwidden et al. 2019, Hingray et al. 2019, Deser et al. 2020, the GCM/CRCM5 spreads are likely to be a combination of their uncertainty and natural climate variability, particularly in extreme climate projections.

Limitations
This study is unavoidably limited by its various methodological choices. First, this study only covers some of the hydrometeorological regions of the world. Other hydrological regimes, such as those located in the equatorial hydrobelt (Meybeck et al. 2013) were not considered, thus different results can be expected. The hydroclimatic processes that drive streamflow generation and flooding are location-dependent. It can therefore be expected that the selected climate simulations, hydrological models, snow and potential evapotranspiration formulations will yield different results in different regions. Yet applying a common methodology over numerous regions allows more robust conclusions and the identification of local sensitivities.
The use of a subset of the GCM and RCM simulations available can also influence and limit the results and conclusions. For instance, high-resolution simulations issued from the latest CMIP6 were not included in this study. Different results could be expected in that case as high-resolution climate simulations have shown better process representation in smaller catchments (Castaneda-Gonzalez et al. 2019) and lower temperature spreads have been observed compared to the CMIP5 ensemble (Song et al. 2021).
A plethora of climate simulation post-processing methods have been and continue to be developed, yet only two methods were considered in this study. These methods were selected based on satisfactory performances obtained in previous studies. However, this choice can influence the presented results as some studies have shown that state-of-the-art methods still have difficulties in correcting variability and can misrepresent regional feedbacks (Maraun et al. 2017). This can be particularly relevant in correcting precipitation series, which -it has been shown -can have a greater impact on rain-driven regions and seasons (Chen et al. 2013a(Chen et al. , 2013b.
This study was also limited by only using lumped hydrological models. This limitation mainly resulted from the high demand in terms of computational power and data that would have been necessary to apply distributed/physically-based hydrological models over the numerous basins used. It is important to highlight that the conclusions thus obtained might be different, as distributed/physically-based hydrological models are expected to more reliably represent future climate conditions (with their own uncertainties), particularly on complex spatiotemporal events as floods (Ajami et al. 2004, Pechlivanidis et al. 2011. At the same time, using lumped hydrological models led to the need to average GCM and CRCM5 outputs. This process can impact the magnitude of the simulated climate extremes, but can also have the advantage of minimizing the impact of anomalous/biased simulated points.

Conclusions
This study investigated the contribution of four elements of the hydroclimatic modelling chain to uncertainty in flood simulation: (1) climate simulations, (2) post-processing techniques, (3) hydrological models and (4) probability distributions. Two main variants of the hydroclimatic modelling chain were evaluated in terms of climate simulations. One configuration used a large GCM ensemble (22 simulations) and another one used a smaller high-resolution CRCM5 ensemble (three simulations). Two post-processing methods, three lumped hydrological models and six probability distributions were included for both configurations of the hydroclimatic modelling chain. Moreover, to cover a wide range of hydroclimatic conditions and periods, this study was performed over 96 basins with contrasting hydroclimatic and flooding regimes, for a reference period  and two future periods (2041-2070 and 2070-2099). Using a variance decomposition approach, the annual and seasonal uncertainty contribution of each source was quantified for both configurations on six flood indicators (2-, 5-, 10-, 20-, 50-and 100-year return periods) and for each period. The results showed a relationship between each region's flood-generating process and the overall uncertainty contributions of each source. It was shown that hydrological models and climate simulations consistently contributed the most uncertainty to flood simulations for snow-and rain-dominated basins, respectively. This difference is most likely due to the different flood-generating processes of the basins. Therefore, the assumption that climate simulations are the main uncertainty source in hydrological studies should be reconsidered and further analysed. The temporal and seasonal analyses also revealed that the uncertainty contribution of each source varied depending on the season and period of analysis. The main uncertainty contributors during the reference period were consistently different over the future periods. Yet the uncertainty related to the choice of probability distribution consistently increased with an increased return period, being the first and second greatest contributors to uncertainty for the largest return periods (i.e. 50-year and 100-year). The clear variations of uncertainty contributions at the basin level discussed in this study highlight the need to further research the uncertainty contributions of various sources and possibly identify reducible uncertainties such as better process representations in climate and hydrological models, which might lead to increasing our confidence in and understanding of future flood projections.