Data assimilation into land surface models: the implications for climate feedbacks

Land surface models (LSMs) are integral components of general circulation models (GCMs), consisting of a complex framework of mathematical representations of coupled biophysical processes. Considerable variability exists between different models, with much uncertainty in their respective representations of processes and their sensitivity to changes in key variables. Data assimilation is a powerful tool that is increasingly being used to constrain LSM predictions with available observation data. The technique involves the adjustment of the model state at observation times with measurements of a predictable uncertainty, to minimize the uncertainties in the model simulations. By assimilating a single state variable into a sophisticated LSM, this article investigates the effect this has on terrestrial feedbacks to the climate system, thereby taking a wider view on the process of data assimilation and the implications for biogeochemical cycling, which is of considerable relevance to the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report.


Introduction
Pioneering work such as Charney et al. (1975) on the link between vegetation loss in sub-Saharan Africa and drought persistence highlighted the role that feedback mechanisms between the land surface and the atmosphere play in determining climate. Numerous studies (e.g. Zeng et al. 1999, Friedlingstein et al. 2001) have reinforced our knowledge of how land surface properties change in response to climatic forcing, the magnitude of which is itself influenced by the land surface changes. Vegetation change, for instance, is accompanied by soil moisture change, which can alter properties such as surface albedo and evaporation, resulting in precipitation changes through soil moisture feedback (Koster et al. 2004, Zhang et al. 2008, Liu et al. 2010). These complex feedbacks between the terrestrial ecosystem and climate have been studied extensively using land surface models (LSMs) but remain poorly understood.
LSMs calculate the surface-to-atmosphere fluxes of heat, water and carbon, and update the state variables of the surface and subsurface layers. They are crucial components of GCMs, influencing cloud cover, precipitation and atmospheric chemistry, and these coupled systems are key tools for predicting the likely future states of the Earth system under anthropogenic forcing (IPCC 2007). However, the representation of highly complex biophysical processes over highly heterogeneous land surfaces with a limited set of mathematical equations, together with the tendency towards overparameterization, introduces a degree of uncertainty into their predictions (Pipunic et al. 2008). A substantial portion of this uncertainty may be attributed to the representation of land surface feedbacks within coupled climate models (Notaro 2008).
Even if atmospheric greenhouse gas concentrations were stabilized, the long-memory effect associated with the climate system means that anthropogenic warming would continue through future decades and centuries. However, large uncertainties remain in our understanding of biogeochemical cycle feedbacks, diminishing our ability to model climate forcing accurately. Significant progress has been made in reducing uncertainties associated with atmospheric change, but further consideration of long-term changes in atmospheric chemistry and the consequences of the associated climate forcing remains a priority (Dameris et al. 2005, Cracknell and Varotsos 2007). To this end, improving LSM estimates of feedbacks to the climate system is a pertinent objective, and data assimilation offers a promising route to such improvements. Data assimilation is a method of reducing some of the uncertainties inherent in all LSMs owing to their approximation of the complexity of the terrestrial ecosystem. Observations, where available, from sources such as Earth Observation (EO) satellites can be integrated into the model to update a quantity simulated by the model, with the aim of reducing the error arising from the model formulation. The correction applied is derived from the relative weighting of the uncertainties of the model predictions and of the observations. Data assimilation into LSMs has been the focus of much research in recent years. Particular attention has been paid to the assimilation of land surface temperature (LST) to constrain simulations of soil moisture and surface heat fluxes. These studies include the use of variational schemes (Caparrini et al. 2003) and of variants of the sequential Kalman filter, such as the ensemble Kalman filter (EnKF; Crosson et al. 2002, Huang et al. 2008, Pipunic et al. 2008, Quaife et al. 2008), first proposed by Evensen (1994).
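The weighting of model and observation uncertainties can be illustrated for a single scalar variable. The Python sketch below is a minimal illustration, not the scheme used in this study: it assumes unbiased, independent Gaussian errors, a model value and an observation of the same quantity (e.g. LST in K), and hypothetical numbers.

```python
def variance_weighted_update(x_model, var_model, y_obs, var_obs):
    """Update a model value with an observation, weighting each by its error variance.

    Assumes unbiased, independent Gaussian errors (a simplifying assumption).
    Returns the updated value and its error variance.
    """
    gain = var_model / (var_model + var_obs)   # scalar Kalman gain, between 0 and 1
    x_updated = x_model + gain * (y_obs - x_model)
    var_updated = (1.0 - gain) * var_model     # never larger than either input variance
    return x_updated, var_updated


# Hypothetical example: modelled LST 300 K (std 2 K), observed LST 303 K (std 1.5 K)
x_a, var_a = variance_weighted_update(300.0, 2.0**2, 303.0, 1.5**2)
print(round(x_a, 2), round(var_a, 2))  # 301.92 1.44
```

The updated variance (1.44 K²) is smaller than both the model (4 K²) and the observation (2.25 K²) variances, which is the sense in which combining the two sources reduces uncertainty; the more uncertain source receives the smaller weight.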
Coupled GCM land-atmosphere models are important tools for climate change prediction and for assessing climate feedbacks over future decades and centuries. However, because of large uncertainties with respect to these feedbacks, cloud formation being one example, a concerted effort is required to improve the modelling of water, energy and carbon exchanges in these coupled systems by optimizing the prediction of key variables such as soil moisture. The assimilation of observations to improve the quantification of soil moisture has long been an objective of the hydrological community (Crosson et al. 2002, Crow and Wood 2003, Huang et al. 2008). Margulis and Entekhabi (2003), for instance, assimilated skin and air temperature, plus relative humidity, to optimize the water and energy budgets of a coupled land surface-atmospheric boundary layer model. Pipunic et al. (2008) also demonstrated enhanced model estimates, with improved predictions of latent and sensible heat fluxes, as a result of integrating EO observations into their land surface scheme. This focus on the moisture states of models reflects the importance attributed to the long-memory characteristics of coupled systems. Optimization through data assimilation thus presents an opportunity to improve our ability to predict water and energy fluxes from the land surface to the atmosphere, with the prospect of reducing climate feedback uncertainty. Moreover, the application of data assimilation to understanding and quantifying feedbacks in the climate system is not restricted to land-atmosphere interactions. The role of marine sediments and ocean biogeochemistry in the long-term regulation of atmospheric carbon has driven the development of data assimilation techniques for these systems, resulting in improved parameter estimation (Annan et al. 2005) and enhanced calibration of ocean-atmosphere models (Ridgwell et al. 2007) through, for example, the integration of phosphate and alkalinity observations.
However, as in any coupled chaotic system, minor changes in a single characteristic can have far-reaching effects.
This paper considers the sensitivity of related characteristics to the model update of a single variable through the process of data assimilation. In §2, LST over two regions of the African continent, an area of West Africa (17° W to 20° E longitude, 4° N to 20° N latitude) and an area of North Africa (10° W to 33° E longitude, 20° N to 30° N latitude), is integrated into the state-of-the-art LSM JULES (Joint UK Land Environment Simulator), developed by the UK Met Office, for the period 1 January to 31 May 2007. The effect on soil moisture is discussed in §3, where the model simulations are compared with top soil moisture observations from the European Remote Sensing Satellites (ERS-1 and ERS-2) scatterometer. Finally, in §4, the implications of the data assimilation exercise for surface energy, water and carbon fluxes are considered.

LST
LST is the radiative skin temperature of the land, with wide-ranging influences on several biophysical processes of the terrestrial biosphere: the partitioning of energy into ground, sensible and latent heat fluxes (Sellers et al. 1997, Huang et al. 2008), the emission of longwave radiation from the surface (Rhoads et al. 2001, Trigo et al. 2008), the physiological activities of leaves (Sims et al. 2008), surface dryness (Sandholt et al. 2002, Snyder et al. 2006) and stomatal conductance (Sellers et al. 1997). Its response to the El Niño Southern Oscillation (ENSO) has also been reported (Manzo-Delgado et al. 2004). Sensible heat flux (H) is a function of the difference between surface and air temperature (Rhoads et al. 2001), whereas latent heat flux (LE) depends on surface temperature because of the influence LST exerts on the vapour pressure deficit (Hashimoto et al. 2008). Within the surface energy balance equation, LE and H are tightly coupled, and an increase in one usually comes at the expense of the other.
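The coupling between H and LE can be sketched with a bulk aerodynamic formula for H and the surface energy balance, treating LE as the residual. This is a hedged illustration, not the JULES formulation: the air density, heat capacity and aerodynamic resistance are typical or hypothetical values, and net radiation and ground heat flux are held fixed.

```python
RHO_AIR = 1.2    # near-surface air density, kg m-3 (typical value, assumed)
CP_AIR = 1005.0  # specific heat of air at constant pressure, J kg-1 K-1


def sensible_heat_flux(t_surface, t_air, r_a):
    """Bulk aerodynamic H (W m-2); r_a is an aerodynamic resistance in s m-1."""
    return RHO_AIR * CP_AIR * (t_surface - t_air) / r_a


def latent_heat_residual(net_radiation, h, g0):
    """LE (W m-2) as the residual of net radiation = H + LE + G0."""
    return net_radiation - h - g0


# Hypothetical midday values: Rn = 500 W m-2, G0 = 50 W m-2, Ta = 300 K, r_a = 50 s m-1
for ts in (305.0, 306.0):
    h = sensible_heat_flux(ts, 300.0, 50.0)
    le = latent_heat_residual(500.0, h, 50.0)
    print(ts, round(h, 2), round(le, 2))
```

In this example a 1 K warmer surface shifts about 24 W m⁻² from LE to H. A full model would also let net radiation respond to the warmer surface, which this sketch ignores.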
LST is also relevant to fire modelling within LSMs. For example, it is related to fuel moisture content (Chuvieco et al. 2004) and, in combination with other environmental variables, can be applied to predicting fire occurrence and propagation (Manzo-Delgado et al. 2004). This is particularly important for Africa, where climate scenarios remain highly uncertain (Williams et al. 2007), most notably in the fire-dominated savannas. Here, cloud-free LST pixels from the Spinning Enhanced Visible and InfraRed Imager (SEVIRI) instrument on board the Meteosat Second Generation (MSG) geostationary satellites, positioned over the equator at an altitude of about 36 000 km, are integrated into the JULES model over two regions of Africa (West Africa and North Africa) for a 5-month period in 2007.

MSG-SEVIRI data
SEVIRI acquires an image every 15 min, at a spatial resolution of between 3 and 5 km over the African continent. LST is generated by the Satellite Application Facility on Land Surface Analysis (LandSAF) using a Generalized Split Window (GSW) algorithm (Madeira 2002) applied to channels IR10.8 and IR12.0, which expresses LST as a linear function of the clear-sky top-of-the-atmosphere (TOA) brightness temperatures. Within each scene, bare-ground and vegetation emissivities, previously assigned to land cover classes (Peres and DaCamara 2005), are averaged and weighted by the fraction of vegetation cover retrieved by the LandSAF (Garcia-Haro et al. 2005) to estimate the channel surface emissivity.
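The emissivity weighting and the linear dependence on the two brightness temperatures can be sketched as follows. This is an illustrative assumption: the GSW expression below is a commonly used generalized split-window form, and the text does not confirm that it matches Madeira (2002) in every detail; the coefficient tuple is hypothetical, since operational LandSAF coefficients vary with viewing angle and atmospheric water vapour.

```python
def channel_emissivity(eps_veg, eps_bare, fvc):
    """Channel emissivity weighted by the fraction of vegetation cover (fvc, 0-1)."""
    return fvc * eps_veg + (1.0 - fvc) * eps_bare


def gsw_lst(t108, t120, eps108, eps120, coeffs):
    """Generalized split-window LST (K), linear in the IR10.8 and IR12.0 brightness temperatures.

    coeffs = (C, A1, A2, A3, B1, B2, B3) are hypothetical placeholders.
    """
    c, a1, a2, a3, b1, b2, b3 = coeffs
    eps = 0.5 * (eps108 + eps120)   # mean channel emissivity
    d_eps = eps108 - eps120         # channel emissivity difference
    a = a1 + a2 * (1.0 - eps) / eps + a3 * d_eps / eps**2
    b = b1 + b2 * (1.0 - eps) / eps + b3 * d_eps / eps**2
    return c + a * (t108 + t120) / 2.0 + b * (t108 - t120) / 2.0


# Hypothetical pixel with 40% vegetation cover
e108 = channel_emissivity(0.985, 0.960, 0.4)
e120 = channel_emissivity(0.990, 0.970, 0.4)
print(gsw_lst(300.0, 298.0, e108, e120, (1.0, 1.0, 0.2, -0.5, 2.0, 1.0, 5.0)))
```

For fixed emissivities the retrieved LST is linear in the two brightness temperatures, as stated above; the emissivity terms only modulate the weights A and B.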
An independent assessment of the GSW algorithm against a set of radiative transfer simulations indicated that it is bias-free, with random errors increasing with viewing zenith angle (Trigo et al. 2008), and a reported accuracy of 1.5 K (Sobrino and Romaguera 2004) for most simulations between nadir and 50°. Because clouds scatter and absorb infrared radiation, LST retrieval requires the identification of cloudy and partly cloudy pixels. Clear-sky pixels are identified by the LandSAF through the application of a cloud mask, which makes use of software developed by the Satellite Application Facility in support of Nowcasting and Very Short-Range Forecasting (NWC SAF; http://www.nwcsaf.org), with this information recorded in quality control flags. A complete description of the LST retrieval algorithms can be found in the LandSAF product user manual (available at http://landsaf.meteo.pt/).
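Screening SEVIRI LST with quality information of this kind might look like the following sketch. The flag value and the 50° viewing zenith angle screen are assumptions for illustration: the real LandSAF quality flags are bit-encoded as documented in the product user manual, and the text does not state that a zenith-angle screen was applied in this study.

```python
import numpy as np

CLEAR_SKY = 0  # hypothetical flag value for a clear-sky pixel (not the LandSAF bit layout)


def usable_lst(lst, cloud_flag, vza_deg, max_vza=50.0):
    """Return LST (K) with cloudy or high-zenith-angle pixels set to NaN."""
    lst = np.asarray(lst, dtype=float)
    keep = (np.asarray(cloud_flag) == CLEAR_SKY) & (np.asarray(vza_deg) <= max_vza)
    return np.where(keep, lst, np.nan)


# Hypothetical pixels: clear, cloudy, and clear but viewed at a 62 degree zenith angle
lst_ok = usable_lst([300.2, 301.5, 302.1], [0, 1, 0], [12.0, 30.0, 62.0])
print(lst_ok)  # only the first pixel survives
```

Masked pixels are simply absent at that time-step, so no update is made there.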

Model description and data assimilation
The JULES land surface model, which has been described in detail elsewhere (e.g. Alton et al. 2007), is the community version of MOSES (Met Office Surface Exchange System). It is becoming increasingly important to the UK ecological modelling community because it can be coupled to the Hadley Centre GCM or driven by its output. In brief, JULES is a terrestrial gridbox model with a fine temporal resolution, in which each gridbox is composed of nine surface tiles: five plant functional types (PFTs; broadleaf trees, needleleaf trees, C3 grasses, C4 grasses and shrubs) and four non-vegetation types (urban, inland water, bare soil and ice). Each gridbox has four soil layers that are homogeneous over the gridbox, with soil thermal characteristics being functions of soil moisture. Prognostic soil fields are updated from their values at the previous time-step using the mean heat and water fluxes over the time-step: the total soil moisture content of each layer is incremented by the diffusive water flux flowing in from the layer above, and decremented by the diffusive flux flowing out to the layer below and by the evapotranspiration extracted directly from the layer by plant roots. The Clapp and Hornberger (1978) equations for hydraulic conductivity and soil water suction are applied in the model. The physical processes are driven by meteorological data, which update the state variables typically every 30 or 60 min, whereas the biophysical parameters remain constant over the duration of each model run. The output from JULES includes numerous variables describing the state of the land surface in terms of water, energy and carbon fluxes. At each time-step the gridbox LST is derived as the sum of the individual tile surface temperatures weighted by their respective fractional covers within the gridbox. The surface energy balance equation for each tile, defined by Cox et al. (1999), is given by equation (1):
$$SW_N + LW_{\downarrow} - \sigma T_s^{4} = H + LE + G_0 \qquad (1)$$
where $SW_N$ is the net downward shortwave radiation, which is derived from the surface albedo, $LW_{\downarrow}$ is the downward longwave radiation, $\sigma$ is the Stefan-Boltzmann constant, $T_s$ is the surface temperature, $H$ is the sensible heat flux, $LE$ is the latent heat flux, and $G_0$ is the heat flux into the ground.
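The tile aggregation of LST and the Clapp and Hornberger relations mentioned above can be sketched as follows. This is a minimal illustration, not JULES source code: the tile temperatures, fractions and soil parameters are hypothetical, and soil water suction is treated as a positive quantity.

```python
def gridbox_lst(tile_temps, tile_fracs):
    """Gridbox LST (K) as the fraction-weighted sum of tile surface temperatures."""
    if abs(sum(tile_fracs) - 1.0) > 1e-6:
        raise ValueError("tile fractions must sum to 1")
    return sum(t * f for t, f in zip(tile_temps, tile_fracs))


def clapp_hornberger(theta, theta_sat, k_sat, psi_sat, b):
    """Hydraulic conductivity and soil water suction after Clapp and Hornberger (1978).

    K = k_sat * (theta/theta_sat)**(2b + 3);  psi = psi_sat * (theta/theta_sat)**(-b)
    """
    s = theta / theta_sat  # relative saturation
    return k_sat * s ** (2 * b + 3), psi_sat * s ** (-b)


# Hypothetical gridbox: 25% bare soil at 310 K, 75% C4 grass at 300 K
print(gridbox_lst([310.0, 300.0], [0.25, 0.75]))  # 302.5

# Hypothetical soil at half saturation with b = 5
k, psi = clapp_hornberger(0.2, 0.4, 1e-5, 0.1, 5.0)
print(k, psi)
```

Halving the water content in this example reduces conductivity by a factor of 2^13 and raises suction by a factor of 32, which is why drying soils drain and supply roots so much more slowly.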
Here LST was assimilated into JULES for a 5-month period from 1 January to 31 May 2007 by applying the EnKF, a sequential data assimilation scheme that uses a Monte Carlo approach. The methodology, which has been applied previously by Ghent et al., is described comprehensively in Ghent et al. (2010), with the EnKF implemented according to Evensen (2003). In brief, at each time-step with an observation, model estimates are adjusted towards the observations according to the state and observation error covariance matrices, $\mathbf{P}$ and $\mathbf{R}$. The correction to the forecast state vector is determined by the Kalman gain matrix $\mathbf{K}$, defined by equation (2):
$$\mathbf{K} = \mathbf{P}\mathbf{H}^{T}\left(\mathbf{H}\mathbf{P}\mathbf{H}^{T} + \mathbf{R}\right)^{-1} \qquad (2)$$
where $\mathbf{H}$ is the observation operator relating the true model state to the observations (not to be confused with the sensible heat flux H). The Kalman gain matrix is applied to the difference between the observations and the model estimates according to equation (3):
$$\psi^{a} = \psi^{f} + \mathbf{K}\left(\mathbf{H}\psi^{t} + \varepsilon - \mathbf{H}\psi^{f}\right) \qquad (3)$$
where $\psi^{a}$ is the updated model estimate, $\psi^{f}$ is the forecast state vector, $\psi^{t}$ is the true model state, and $\varepsilon$ is the observation error, so that $\mathbf{H}\psi^{t} + \varepsilon$ represents the observations. The estimate of the model state following the update is taken as the mean of the ensemble members, with the uncertainty given by the variance around the mean. The observation error covariance matrix is computed from the ensemble spread of the observations, which are perturbed randomly using the SEVIRI LST observation uncertainty of 1.5 K (Sobrino and Romaguera 2004). The spread of the model ensemble, here of 50 members, determines the state error covariance matrix, thereby avoiding the computationally expensive propagation of the error covariance matrix required by the standard Kalman filter. In this study, only perturbations of the meteorological forcing data were considered; these were generated from normally distributed random numbers with zero mean and unit variance using the Box-Muller transform (Box and Muller 1958).
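The analysis step described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the implementation of Ghent et al. (2010): the two-variable state, the synthetic forecast ensemble and the linear observation operator are hypothetical, and the forcing perturbations that generate the forecast spread in the study are not modelled here. The Box-Muller transform is used for the observation perturbations to mirror the text.

```python
import numpy as np


def box_muller(rng, n):
    """n standard normal deviates from uniform pairs via the Box-Muller transform."""
    u1 = 1.0 - rng.random(n)  # uniform on (0, 1], so log(u1) is finite
    u2 = rng.random(n)
    return np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)


def enkf_analysis(psi_f, obs, obs_sigma, H, rng):
    """One EnKF analysis step with perturbed observations.

    psi_f     : (n_state, n_members) forecast ensemble
    obs       : (n_obs,) observations, e.g. LST in K
    obs_sigma : observation error standard deviation, e.g. 1.5 K
    H         : (n_obs, n_state) linear observation operator
    """
    n_obs, n_mem = obs.size, psi_f.shape[1]
    # Perturbed observations: each member sees d + eps_j
    eps = obs_sigma * box_muller(rng, n_obs * n_mem).reshape(n_obs, n_mem)
    d = obs[:, None] + eps
    # P and R estimated from the spread of the model and observation ensembles
    a = psi_f - psi_f.mean(axis=1, keepdims=True)
    e = eps - eps.mean(axis=1, keepdims=True)
    P = a @ a.T / (n_mem - 1)
    R = e @ e.T / (n_mem - 1)
    # K = P H^T (H P H^T + R)^-1, computed with a linear solve (P and R symmetric)
    K = np.linalg.solve(H @ P @ H.T + R, H @ P).T
    return psi_f + K @ (d - H @ psi_f)


# Hypothetical 50-member ensemble of [LST (K), top-layer soil moisture (m3 m-3)],
# in which warmer members are drier; only LST is observed.
rng = np.random.default_rng(0)
lst = 305.0 + 2.0 * rng.standard_normal(50)
sm = 0.25 - 0.01 * (lst - 305.0) + 0.002 * rng.standard_normal(50)
psi_f = np.vstack([lst, sm])
psi_a = enkf_analysis(psi_f, np.array([309.0]), 1.5, np.array([[1.0, 0.0]]), rng)
print(psi_f.mean(axis=1), psi_a.mean(axis=1))
```

Because the forecast LST is cooler than the observation and the ensemble links warmer LST to drier soil, the update raises mean LST and lowers mean soil moisture, even though soil moisture is never observed. This cross-variable adjustment through the ensemble covariance is how a single assimilated variable can alter others.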
Uncertainties in model parameterization or initial conditions were not taken into account. Meteorological forcing variables were taken from 6-hourly National Centers for Environmental Prediction (NCEP) reanalysis datasets (Kalnay et al. 1996), with precipitation calibrated against monthly Tropical Rainfall Measuring Mission (TRMM) precipitation data (Kummerow et al. 1998). The model itself was run at an hourly time-step over the 5-month assimilation period, with a spatial resolution of 1° × 1°. Land-cover change was not considered in this experiment; the fractional coverage of the surface tiles was derived from International Geosphere-Biosphere Programme (IGBP) land-cover classes and mapped onto JULES according to Dunderdale et al. (1999). Initial conditions were set from an equilibrium state following a 200-year spin-up, with soil parameters derived from the International Satellite Land-Surface Climatology Project (ISLSCP) II soil dataset (Global Soil Data Task 2000). To quantify the influence of LST assimilation on the state of the modelled land surface, changes in several variables were examined: soil moisture, evapotranspiration (ET) and net primary productivity (NPP).

Soil moisture
The partitioning of available energy into sensible heat (H) and latent heat (LE), driven by changes in surface temperature, is influenced by the vegetative cover and the available soil moisture (Smith et al. 2006). Temperature change in soil depends on thermal conductivity and heat capacity. A dry soil heats up more rapidly than a wet soil because the heat capacity of water is higher than that of air, which occupies a much greater fraction of the pore volume in dry soil. A wet soil surface loses more energy as LE, whereas a dry soil surface loses more as H.
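The heat capacity argument can be illustrated with a simple volume-weighted mixture of mineral solids, water and air. This is a minimal sketch, not the JULES soil thermal scheme; the mineral heat capacity, porosity, layer depth and heat input are assumed typical or hypothetical values.

```python
C_WATER = 4.18e6    # volumetric heat capacity of water, J m-3 K-1
C_AIR = 1.2e3       # volumetric heat capacity of air, J m-3 K-1
C_MINERAL = 2.0e6   # typical value for mineral solids (assumed), J m-3 K-1


def soil_heat_capacity(theta, porosity):
    """Volume-weighted heat capacity (J m-3 K-1) for water content theta (m3 m-3)."""
    return (1.0 - porosity) * C_MINERAL + theta * C_WATER + (porosity - theta) * C_AIR


def warming(heat_input, theta, porosity, depth):
    """Temperature rise (K) of a soil layer of given depth (m) for heat_input (J m-2)."""
    return heat_input / (soil_heat_capacity(theta, porosity) * depth)


# Hypothetical 10 cm layer, porosity 0.45, absorbing 1e5 J m-2
dT_dry = warming(1e5, 0.05, 0.45, 0.1)
dT_wet = warming(1e5, 0.40, 0.45, 0.1)
print(round(dT_dry, 2), round(dT_wet, 2))
```

With these values the dry layer warms roughly twice as much as the wet one for the same heat input, because replacing air with water in the pores more than doubles the layer's heat capacity.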
Soil moisture exhibits a significant memory that can persist for many months, prolonging and intensifying pluvial and drought events (Notaro 2008). Moreover, soil moisture feedbacks can regulate climate change and improve the predictability of seasonal climate, yet the strength and regional significance of these feedbacks remain poorly understood (Zhang et al. 2008). Evidence for soil moisture-climate feedbacks includes the relationships between soil moisture and precipitation, evaporation, air temperature and cloud cover (Findell and Eltahir 1997, Zhang et al. 2008).
The most extensive study of soil moisture effects, the Global Land-Atmosphere Coupling Experiment (GLACE; Koster et al. 2004, Guo et al. 2006), involved 12 atmospheric GCM (AGCM) simulations and showed that strong land-atmosphere coupling arises mainly from the ability of soil moisture to affect evaporation in the transition zones between dry and wet climates (Zhang et al. 2008). Identified hotspots include the Sahel, the northern USA and southern Europe. Furthermore, the feedback among IPCC Fourth Assessment Report (AR4) models was assessed over Europe (Seneviratne et al. 2006), revealing a positive correlation between soil moisture and precipitation. In other words, high soil moisture supports enhanced evaporation, increasing atmospheric water content and eventually leading to increased rainfall, although this response depends on subgrid condensation processes within global models and can therefore vary substantially (Koster et al. 2004). Moreover, the strength and impact of soil moisture feedbacks are likely to differ between El Niño and La Niña events (Seneviratne et al. 2006, Notaro 2008), with vegetation interactions also exerting a substantial influence (Sellers et al. 1997).
Future climate change, driven by increased greenhouse gas concentrations, is likely to enhance hydrological responses in these hotspots of strong positive soil moisture feedback (Notaro 2008). Accordingly, the importance of global soil moisture retrieval, and of its assimilation into hydrological and biophysical models, has received much recent recognition (Crow et al. 2005, Reichle and Koster 2005, Parajka et al. 2006). Here, modelled and assimilated soil moisture estimates are compared with ERS scatterometer top soil moisture observations.

ERS scatterometer data
The ERS-1 and ERS-2 scatterometers are active C-band (5.3 GHz) microwave instruments, providing backscatter measurements that are sensitive to the surface soil water content and unaffected by cloud cover. The surface soil moisture (SSM) data are retrieved on a discrete 12.5 km global grid from the radar backscattering coefficients using a change detection method developed at the Institute of Photogrammetry and Remote Sensing at the Vienna University of Technology. The scatterometer measurements themselves are used to model the incidence angle dependency of the radar backscattering signal, and backscattering coefficients are normalized to a reference incidence angle of 40°. These normalized coefficients are then scaled between the driest and wettest observations over the long term to produce relative SSM data ranging between 0% and 100%, with the uncertainty quantified by a soil moisture noise model.
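The normalization and change detection scaling can be sketched as follows. This is a simplified illustration, not the Vienna University of Technology processor: it assumes a single, hypothetical first-order incidence-angle slope (the operational method models the angle dependency in more detail), hypothetical dry and wet reference values, and clips the result to 0-100% for simplicity.

```python
def normalize_to_40deg(sigma0_db, incidence_deg, slope_db_per_deg):
    """First-order normalization of backscatter (dB) to a 40 degree reference angle."""
    return sigma0_db - slope_db_per_deg * (incidence_deg - 40.0)


def relative_ssm(sigma0_40, dry_ref, wet_ref):
    """Relative surface soil moisture (%) between dry and wet reference backscatter (dB)."""
    ms = 100.0 * (sigma0_40 - dry_ref) / (wet_ref - dry_ref)
    return min(100.0, max(0.0, ms))  # clipping to 0-100 % is a simplification here


# Hypothetical pixel: -12 dB at 50 degrees, slope -0.1 dB per degree,
# long-term dry and wet references of -15 dB and -8 dB
s40 = normalize_to_40deg(-12.0, 50.0, -0.1)
print(s40, relative_ssm(s40, -15.0, -8.0))
```

Because the result is relative to each pixel's own dry and wet extremes, it expresses a degree of saturation rather than an absolute volumetric water content.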
The ERS scatterometer (ESCAT) soil moisture dataset used here has been the subject of previous validation experiments. Wagner et al. (1999) tested the SSM dataset against gravimetric soil moisture measurements at field sites in Ukraine and found mean correlations of 0.45 (0-20 cm profile) and 0.41 (0-100 cm profile). Ceballos et al. (2005) performed a more extensive validation using a network of 20 soil moisture stations in western Spain, finding a correlation of 0.75 and a root mean square error (RMSE) of 2.2% between the scatterometer data and the average soil moisture of the 0-100 cm profile. However, this dataset comes with the caveat that biased estimates may be derived in extreme climates, such as desert regions, because azimuthal viewing geometry is not taken into account during retrieval (Bartalis et al. 2006).

Comparison of model and ESCAT soil moisture
In this study, modelled soil moisture from the JULES model is compared with SSM scatterometer values for the top 5 cm of the soil from two separate ERS receiving stations, which provide SSM 'observations' for northern hemisphere Africa: Maspalomas, covering West Africa, and Matera, covering North Africa. No coverage of southern hemisphere Africa was available between 2001 and mid-July 2008, so this region is not considered during our assimilation period. Figures 1(a) and 1(b) illustrate the comparison for both the modelled state and the assimilated state over the 5-month assimilation period. The SSM 'observations' derived from ESCAT for both West Africa and North Africa are lower than the equivalent values modelled by the JULES LSM, and following assimilation the updated model estimates are clearly closer to the 'observation' values. For West Africa, the assimilation process produced a 27.4% reduction in RMSE, from 16.8 to 12.2 vol%, between the model soil moisture estimates and the ESCAT SSM 'observations'. For North Africa, the corresponding reduction in RMSE was 32.2%, from 14.6 to 9.9 vol%. The modelled and assimilated runs were each repeated 50 times over each region, and paired t-tests on the mean RMSEs showed that these reductions in RMSE were significant at the 99% confidence level.
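The RMSE reductions quoted above, and the paired t statistic used to test them, can be reproduced with the Python standard library. This is a sketch, since the text does not state which software was used; the t statistic would be compared with Student's t distribution with n − 1 degrees of freedom (49 for 50 repetitions).

```python
import math
import statistics


def rmse(model, obs):
    """Root mean square error between paired model and observed values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def percent_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


def paired_t(a, b):
    """Paired t statistic for matched samples (e.g. per-repetition RMSEs of two runs)."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


print(round(percent_reduction(16.8, 12.2), 1))  # West Africa: 27.4
print(round(percent_reduction(14.6, 9.9), 1))   # North Africa: 32.2
```

A paired test is appropriate here because each modelled run and its assimilated counterpart share the same forcing perturbations, so the per-repetition differences remove that common variability.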
It is therefore evident that the process of data assimilation has produced a systematic reduction in the model predictions of soil moisture over both West Africa and North Africa for the period 1 January-31 May 2007. This reduction may in turn affect the predicted fluxes of heat, water and carbon from the land surface to the atmosphere. When JULES is coupled to the Hadley Centre GCM, the resulting change in the strength of the soil moisture-climate feedback could influence predictions of seasonal and interannual climate.

Biogeochemical cycles
The main aim of this investigation was to understand and quantify the impact that a change in LST has on the water, heat and carbon fluxes from the surface to the atmosphere. It has been shown that integrating SEVIRI LST into the JULES LSM for the first 5 months of 2007 over much of northern hemisphere Africa resulted in a mean reduction in surface soil moisture during this period. We now consider, taking West Africa as an example, the effect of this integration on further key fluxes of the water and carbon cycles: ET (figure 2) and NPP (figure 3), respectively. Clear mean reductions are observed in both fluxes over the assimilation period.
LST and the partitioning of surface energy into H and LE are functions of varying SSM and vegetation cover. Predominantly vegetated surfaces are associated with lower maximum LST values than bare soil (Weng et al. 2004), with surface roughness also a factor (Sandholt et al. 2002). This is because, in the surface energy balance equation, more energy is partitioned into LE where vegetation cover is higher, whereas higher H exchange, and hence higher surface temperature, is more typical of sparsely vegetated surfaces. LE increases with ET, which is controlled by stomatal conductance (Essery et al. 2003). Stomatal conductance is affected by the quantity of photosynthetically active radiation (PAR), but is also crucially linked to the availability of moisture in the soil. A reduction in soil moisture below a critical value causes partial closure of the stomata on the underside of leaves to reduce water loss. The resulting decrease in ET reduces LE: less moisture is supplied to the near-surface air, the humidity gradient between the surface and the atmosphere weakens, and evaporative cooling declines, causing an increase in H and thus in surface temperature (Crucifix et al. 2005).
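The effect of soil moisture stress on the energy partition can be sketched with a linear stress factor of the kind used in MOSES-type schemes; treat its exact form, the wilting and critical points and the energy values as assumptions. The sketch holds available energy fixed and ignores the feedback of a warmer surface on outgoing longwave radiation and ground heat flux.

```python
def soil_moisture_stress(theta, theta_wilt, theta_crit):
    """Linear soil moisture stress factor (0 = fully stressed, 1 = unstressed)."""
    if theta >= theta_crit:
        return 1.0
    if theta <= theta_wilt:
        return 0.0
    return (theta - theta_wilt) / (theta_crit - theta_wilt)


def partition(available_energy, le_unstressed, beta):
    """Scale unstressed LE by beta; the rest of the available energy becomes H (W m-2)."""
    le = beta * le_unstressed
    return le, available_energy - le


# Hypothetical: wilting point 0.10, critical point 0.30 m3 m-3; Rn - G0 = 400 W m-2
for theta in (0.35, 0.20, 0.08):
    beta = soil_moisture_stress(theta, 0.10, 0.30)
    le, h = partition(400.0, 300.0, beta)
    print(theta, round(beta, 2), round(le, 1), round(h, 1))
```

As soil moisture falls below the critical point, LE declines linearly and H takes up the difference, which in a full model is what raises the surface temperature.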
ET is an important climate system feedback between the land surface and the atmosphere, in that soil moisture anomalies can translate into precipitation anomalies through the ET rate (Shukla and Mintz 1982). This feedback on the precipitation regime could significantly influence the occurrence and persistence of pluvial and drought conditions, which in turn influence the distribution of vegetation and thus surface albedo, subsequent surface evaporation and terrestrial carbon stocks. The terrestrial carbon cycle feedback may be an important component of future climate change (Melillo et al. 2002), with experiments such as Cox et al. (2000) suggesting that these feedbacks could significantly influence climate change over the course of the next few decades. A reduction in soil moisture and the associated reduction in ET affect the carbon balance, leading to a reduction in NPP, as suggested by Rosenzweig (1968), who postulated a generally positive relationship between ET and NPP. Because the interannual variability of NPP over Africa is greater than that of heterotrophic respiration (Weber et al. 2009), a reduction in NPP over a region would imply a corresponding reduction in net ecosystem productivity, and hence an altered carbon balance. However, large uncertainties in both the sign and magnitude of carbon cycle feedbacks remain because of model simplification of the complex terrestrial system.
Data assimilation is an exciting field of research offering significant benefits to land surface modelling. The rationale behind the technique is that, although both sources of information, the model and EO, are associated with uncertainty, their combination is expected to reduce the resultant uncertainty. For variables that change rapidly in time, an LSM may provide more comprehensive coverage than an EO product, which can suffer from missing data or occasional instrument problems.
However, because validated EO products can be shown to represent ground measurements more realistically, integrating them into LSMs may provide the best possible compromise. Furthermore, data assimilation relies on accurate specification of the uncertainty in the observations. EO products are generated using implicit or explicit assumptions that may not be consistent with those made in an LSM, and biased observations will cause the model to depart from the correct state (Quaife et al. 2008). If remote sensing products are to be integrated more comprehensively into LSMs, further validation work needs to be undertaken, with the accurate reporting of measurement uncertainty a priority.
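The warning about biased observations can be made concrete with a scalar update: if an observation carries a systematic error b, an otherwise unbiased analysis inherits a bias of K·b, and understating the observation error enlarges the gain K. The numbers below are hypothetical, and the sketch assumes a single, directly observed variable.

```python
def kalman_gain(var_forecast, var_obs):
    """Scalar Kalman gain for a directly observed variable."""
    return var_forecast / (var_forecast + var_obs)


truth = 305.0             # hypothetical true LST, K
x_f, var_f = 305.0, 4.0   # unbiased forecast with a 2 K standard deviation
bias = 2.0                # hypothetical systematic observation error, K
errors = []
for var_obs in (2.25, 0.25):  # stated (1.5 K) versus underestimated observation error variance
    k = kalman_gain(var_f, var_obs)
    x_a = x_f + k * ((truth + bias) - x_f)
    errors.append(x_a - truth)
    print(var_obs, round(k, 3), round(x_a - truth, 3))
```

An initially correct forecast is pulled about 1.3 K away from the truth with the stated observation error, and nearly 1.9 K when that error is underestimated, which is why accurate uncertainty reporting matters as much as the observations themselves.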
As highlighted by Pinheiro et al. (2006), small errors in LST can have a large influence: Brutsaert et al. (1993) report a 10% error in sensible heat flux resulting from an error of 0.5 K in LST; Moran and Jackson (1991) report a 10% error in ET resulting from a 1 K error in LST; and Kustas and Norman (1996) suggest that an LST error of between 1 and 3 K can lead to errors of up to 100 W m⁻² in surface fluxes to the atmosphere. Because of the feedbacks between the land surface and the atmosphere, these comparatively minor uncertainties can produce significantly different simulated climatic conditions. Climate change can give rise to both positive and negative feedbacks to the climate system. It is therefore essential that these feedbacks are accurately represented in coupled LSM-GCM frameworks if we are to successfully predict future climate change.

Conclusions
These relationships, among others, suggest that LST has the potential to act as a surrogate for assimilating other state variables into a land surface scheme. Indeed, demand for LST observations is increasing because of their importance in regional and global ecosystem studies, and particularly their sensitivity to surface moisture conditions. Remotely sensed data from EO satellites offer the most feasible means of constraining and validating LSMs over large geographical regions, as they overcome the limitation of sparsely available ground measurements. The significance of model predictions as a resource in climate policy decision making makes the validation of increasingly employed data assimilation methods a priority. Moreover, care should be taken to quantify the changes in the dynamics of the entire ecosystem that result from updating key variables.
Although assimilation of EO data into LSMs offers the prospect of optimizing estimates of key biogeochemical states, herein lies the danger. Unless the model output is thoroughly understood and validated, there remains a distinct possibility that the model is improved in one respect, in terms of reduced RMSEs against validation observations, but degraded elsewhere. For predictions of biogeochemical fluxes, acknowledging the influence that assimilation of EO data has on the feedback from LSMs to AGCMs is of great relevance to the IPCC Fifth Assessment Report.