A spatiotemporal analysis of the relationship between near‐surface air temperature and satellite land surface temperatures using 17 years of data from the ATSR series

The relationship between satellite land surface temperature (LST) and ground-based observations of 2 m air temperature (T2m) is characterized in space and time using >17 years of data. The analysis uses a new monthly LST climate data record (CDR) based on the Along-Track Scanning Radiometer series, which has been produced within the European Space Agency GlobTemperature project (http://www.globtemperature.info/). Global LST-T2m differences are analyzed with respect to location, land cover, vegetation fraction, and elevation, all of which are found to be important influencing factors. LSTnight (~10 P.M. local solar time, clear-sky only) is found to be closely coupled with minimum T2m (Tmin, all-sky), with the two temperatures generally consistent to within ±5°C (global median LSTnight-Tmin = 1.8°C, interquartile range = 3.8°C). The variability in the difference between LSTday (~10 A.M. local solar time, clear-sky only) and maximum T2m (Tmax, all-sky) is higher (global median LSTday-Tmax = −0.1°C, interquartile range = 8.1°C) because LST is strongly influenced by insolation and surface regime. Correlations for both temperature pairs are typically >0.9 outside of the tropics. The monthly global and regional anomaly time series of LST and T2m, which are completely independent data sets, compare remarkably well. The correlation between the data sets is 0.9 for the globe, with 90% of the CDR anomalies falling within the T2m 95% confidence limits. The results of this study justify the increasing use of satellite LST data in climate and weather science, both as an independent variable and to augment T2m data acquired at meteorological stations.

The accuracy and spatial resolution of IR LSTs are superior to those of MW LSTs, making them more suitable for many applications.
The most accurate satellite LST data sets are derived from sensors that have two or more thermal IR channels; these channels typically include the 'split-window' channels, which are located at approximately 11 and 12 micrometres (µm). This enables an improved atmospheric correction to be made compared with single-channel IR retrievals, as atmospheric attenuation varies with wavelength [Dash et al., 2002;Li et al., 2013]. One such sensor that provided this capability is the Along-Track Scanning Radiometer (ATSR), which had unprecedented radiometric accuracy and stability (section 2.1.1) and a record length that exceeds 20 years. Together with a very stable orbit with little temporal drift (per sensor), these factors make the ATSR series a desirable target for generation of an LST CDR. The first ATSR LST CDR has been produced within the framework of the European Space Agency's (ESA) GlobTemperature project (http://www.globtemperature.info/) and it is version 1.0 of this data set that is analysed in this study.
The analyses presented here comprise two aspects. Firstly, the worldwide LST-T2m differences are characterised by comparing the ATSR CDR with in situ T2m observations, in both point-station and gridded form. Several studies now exist in the literature where the LST-T2m relationship is explored through the analysis of satellite LST observations and coincident ground-based observations of T2m [e.g. Hachem et al., 2012; Mildrexler et al., 2011; Sohrabinia et al., 2014; Urban et al., 2013; Vancutsem et al., 2010]. A few studies also examine this relationship using ground-based observations of both LST and T2m [e.g. Gallo et al., 2011; Good, 2016]. However, these studies have tended to be narrowly focused, for example on specific geographical regions [Hachem et al., 2012; Sohrabinia et al., 2014; Urban et al., 2013; Vancutsem et al., 2010] or stations [Gallo et al., 2011; Good, 2016], or on a particular aspect of the LST-T2m relationship. For example, Mildrexler et al. [2011] present an analysis of the global relationship between the annual maximum LST and T2m using seven years of data from MODIS/Aqua, which has a local solar overpass time of ~1:30 pm. More recently, Lian et al. [2017] used 12 years of MODIS/Aqua data to analyse the global relationship between maximum monthly T2m (Tmax) and monthly maximum LST.
This study complements previous studies by providing new information on the relationship between data sets for LST and T2m, including the relationship between night time LST (LSTnight) and Tmin, on a global scale as a function of land cover type, vegetation fraction and elevation. This study is based on ~17 years of data from the ATSR, which has not yet been used to study the LST-T2m relationship in detail.
The second part of this paper looks at the temporal evolution of the global mean LST compared to the equivalent T2m time series, which has not been addressed in the existing literature. Previous studies analysing LST time series are scarce, tend to be limited to specific geographical regions, and focus only on LST. For example, Jiménez-Muñoz et al. [2013] analyse 13 years of LST data from MODIS and ERA-Interim skin temperatures over the Amazon, while Oku et al. [2006] analyse seven years of LST acquisitions from the Geostationary Meteorological Satellite 5 (GMS-5) over the Tibetan Plateau.
LST is a challenging parameter to estimate from satellite observations, owing to the variation and uncertainty in surface emissivity and atmospheric attenuation, which must be known precisely to retrieve LST accurately [Dash et al., 2002; Li et al., 2013]. The accuracy of current operational IR-based data sets is typically 1-3 °C [Duguay-Tetzlaff et al., 2015; Freitas et al., 2013; Trigo et al., 2008; Wan, 2014], which is considerably poorer than for IR SST retrievals, which can achieve accuracies of close to 0.1 °C [Embury et al., 2012]. The spatial heterogeneity of LST and land T2m is high, particularly during the day, owing to differential solar heating, vegetation transpiration and surface turbulence. As a result, variations of several °C between neighbouring stations and across single satellite pixels are observed [Good, 2015; Yan et al., 2010]. These are important factors to consider when comparing satellite LSTs with station-based T2m estimates, as they will lead to inherent differences of up to a few °C.
Previous studies show that LST and T 2m are generally well coupled, with correlation coefficients that usually exceed 0.6 and very often 0.8. LST and T 2m are most tightly coupled over highly vegetated surfaces and when insolation is low, for example under full cloud cover, at night or at high latitudes during winter, spring and autumn months. In these cases, LST and T 2m may differ by only 1-2 degrees Celsius (°C). In contrast, LST can exceed T 2m by several °C when insolation is high and vegetation cover is low-to-moderate [Good, 2016;Hachem et al., 2012;Mildrexler et al., 2011;Sohrabinia et al., 2014;Urban et al., 2013;Vancutsem et al., 2010]. In extreme conditions, for example under clear skies during the middle part of the day at low latitudes over non-vegetated surfaces, the LST-T 2m temperature difference may approach or even exceed 20 °C [Good, 2016;Mildrexler et al., 2011]. Even in these cases, LST and T 2m remain coupled albeit with much greater changes in LST for a given change in T 2m . It is this coupled relationship that has led to the recent abundance of studies that attempt to use satellite LSTs to help infill gaps in current T 2m data sets [Benali et al., 2012;Chen et al., 2014;Good, 2015;Kilibarda et al., 2014;Janatian et al., 2016;Oyler et al., 2015;Parmentier et al., 2015;Zhang et al., 2011].
The focus of this study is on the spatial and temporal relationship between LST and T 2m on a global scale. The analysis of a >17-year satellite LST record presented here will demonstrate the potential for using LST in climate science, particularly in augmenting information from traditional meteorological T 2m observations. (Data from the first four years of the ATSR record are not analysed here owing to ongoing calibration issues.) Section 2 introduces the data sets used in the study, while section 3 summarises the methods used in analysing the data. The results of the study are presented in sections 4 to 6, and their implications discussed in section 7. The main conclusions of the study are summarised in section 8.

Data
A summary of the data sets used in this study is provided in Table 1. Further details are presented in the following sub-sections.

GlobTemperature CDR
The GlobTemperature CDR comprises observations from the Along-Track Scanning Radiometer (ATSR) series [Llewellyn-Jones et al., 2001; Smith et al., 2012]. The ATSR was designed to make accurate observations of Sea Surface Temperature (SST) with channels located within the visible to infrared part of the electromagnetic spectrum, including the split-window channels at approximately 11 and 12 µm for surface temperature retrieval. The instrument design benefits from an exceptionally stable on-board calibration system with two on-board black body targets, and Stirling-cycle cooled detectors, enabling radiometric accuracy of its infrared channels of better than 0.05 °C [Smith et al., 2012]. The ATSR was equipped with dual-viewing capability, allowing nominally the same point on the Earth's surface to be viewed through two different atmospheric path lengths, albeit with slightly differing spatial footprints. For SST retrieval, this is used to improve the correction for atmospheric effects [Zavody et al., 1995]. The ATSR had a swath width of approximately 500 km, achieving all-sky global coverage in three days.
ATSR-1 was launched onboard the European Space Agency's first European Remote Sensing (ERS-1) satellite in July 1991, with ATSR-2 following onboard ERS-2 in April 1995. The third ATSR, the Advanced ATSR (AATSR), was launched in March 2002 onboard ESA's Envisat satellite, which unfortunately ceased communications on 8 April 2012, bringing an end to the ATSR mission. The AATSR is succeeded by the Sea and Land Surface Temperature Radiometer (SLSTR), which was launched onboard ESA's Sentinel-3 satellite in February 2016. The overpass time of the ATSR series was approximately 10:00 am/pm (AATSR) to 10:30 am/pm (ATSR-1; ATSR-2) local solar time. SLSTR also has a local overpass time of 10:00 am/pm.
An operational LST retrieval scheme was first introduced for the AATSR and is described by Prata [2002] and references therein. The operational algorithm is a nadir-only split-window retrieval. The forward view (at ~55° from nadir) is not usually used for LST owing to difficulties in accounting for emissivity dependency on view angle and LST anisotropy (the observed LST for some surfaces depends on zenith and azimuth observation angles), as well as the collocation issues arising from the spatial mismatch between the forward and nadir footprints noted earlier. A modified version of this algorithm, described in Ghent [2012], with improved retrieval coefficients and auxiliary data sets, enhanced cloud masking, and a full uncertainty budget, has been implemented within ESA's GlobTemperature project to create a long-term LST data set based on the latter two ATSR instruments (http://www.globtemperature.info/). The CDR is a homogenised version of this data set, providing monthly average global fields of clear-sky LSTday and LSTnight in which a consistent algorithm and cloud detection method is applied to observations from both sensors.
Version 1 of the CDR includes only ATSR-2 and AATSR, owing to ongoing calibration issues with parts of the ATSR-1 record. Uncertainty information is provided within the CDR at 0.05 degree resolution, accounting for instrument noise, systematic retrieval uncertainties, and surface related components. Surface parameters in the LST retrieval are constrained using the Cooperative Institute for Meteorological Satellite Studies (CIMSS) emissivity dataset [Hulley et al., 2015], with coefficient fitting to an extended GlobCover land cover classification [Arino et al., 2007]. The uncertainty in the coefficient fitting due to emissivity is 0.01 °C, with an additional surface uncertainty component related to fractional vegetation auxiliary data. Emissivity and land cover classification monthly composites are compiled from high resolution satellite data at 100-300m resolution, providing global coverage, maximising the information on surface spatial variability whilst minimising data gaps due to persistent cloud.
Owing to the 30-min difference in overpass time between ATSR-2 and AATSR, ATSR-2 LSTs in the CDR are adjusted to account for this difference, as LST can change substantially in 30 minutes, particularly when insolation is high. For LSTday, this is essentially a cooling correction for ATSR-2, while for LSTnight, the correction will usually have a small warming effect. The corrections are derived empirically from the LST differences during the ATSR-2/AATSR overlap period (June 2002 to May 2003 inclusive) on a monthly basis for both LSTday and LSTnight on the output 0.05° grid. For 0.05° cells where this LST difference cannot be estimated directly (e.g. due to cloud), leaving a gap in the output grid, the correction is derived from cells with the same land cover within a 10° × 10° tile. The errors associated with implementing this correction are likely to be larger for day time (~0.5 °C) compared with night time (~0.2 °C) owing to the strong dependency of LST on insolation.
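The gap-filling logic of this correction can be illustrated with a minimal sketch. It is not the GlobTemperature implementation: the function name, the dictionary-based data layout and the example values are hypothetical, and the sketch assumes that all cells passed in belong to the same 10° × 10° tile and the same calendar month of the overlap period.

```python
from statistics import mean

def overlap_corrections(atsr2, aatsr, land_cover):
    """Per-cell correction (AATSR minus ATSR-2) for one calendar month.

    atsr2, aatsr: dicts mapping cell id -> LST in deg C, or None where missing
    land_cover:   dict mapping cell id -> land cover class
    All cells are assumed (hypothetically) to lie in one 10 x 10 degree tile.
    """
    # Direct estimate wherever both sensors observed the cell.
    direct = {}
    for cell in land_cover:
        a2, aa = atsr2.get(cell), aatsr.get(cell)
        if a2 is not None and aa is not None:
            direct[cell] = aa - a2

    # Mean correction per land cover class, from cells with a direct estimate.
    by_class = {}
    for cell, diff in direct.items():
        by_class.setdefault(land_cover[cell], []).append(diff)
    class_mean = {lc: mean(diffs) for lc, diffs in by_class.items()}

    # Fall back to the class mean where no direct estimate exists;
    # the result stays None if the class has no estimate in the tile.
    return {
        cell: direct[cell] if cell in direct else class_mean.get(lc)
        for cell, lc in land_cover.items()
    }

# Illustrative day-time values only (ATSR-2 observes 30 minutes later).
atsr2 = {"a": 25.0, "b": 26.0, "c": None, "d": None}
aatsr = {"a": 24.0, "b": 24.0, "c": 23.0, "d": 22.0}
land_cover = {"a": "grass", "b": "grass", "c": "grass", "d": "forest"}
corrections = overlap_corrections(atsr2, aatsr, land_cover)
```

In this example, cell "c" has no direct estimate and inherits the mean of the two grassland cells, while cell "d" remains unfilled because no forest cell in the tile has a direct estimate.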
AATSR is used in preference to ATSR-2 in this study where records from both sensors are present, owing to the lower radiometric noise levels associated with AATSR brightness temperatures (BTs) compared with those of ATSR-2 [Smith et al., 2012].

CRU TS Monthly Gridded T 2m
The Climatic Research Unit Time Series (CRU TS; v3.23) is a gridded station data set [Harris et al., 2014]. It consists of monthly time series of a number of variables, including monthly mean (T mean ), minimum (T min ) and maximum (T max ) temperatures at a spatial resolution of 0.5 degrees latitude/longitude. While CRU TS includes many homogenised station records from National Meteorological Services, it is not 'specifically homogeneous' and should therefore be treated with caution in time series analysis [Harris et al., 2014]. For this reason, CRU TS is only used for quantifying the spatial and seasonal relationship between LST and T 2m in this study. Unlike the CDR, the CRU TS data are 'all-sky' and include T 2m observations under both cloud and clear-sky.

GHCN-Monthly Station T 2m
CRU TS is the primary data set used in this study to characterise the spatial and seasonal LST-T2m relationship. However, this is an interpolated data set with larger uncertainties where station density is low (Figure 2). To verify the analysis, station observations from the Global Historical Climatology Network Monthly (GHCN-M) v3.3.0.20160130.qca data set are also used. GHCN-M is a collection of 7280 monthly station records produced by the National Centers for Environmental Information (NCEI) in the United States. Version 3 of the 'QCA' (quality controlled adjusted) data set is used in this study, which has undergone quality control and includes homogeneity adjustments to correct for non-climatic changes in the station time series [Lawrimore et al., 2011]. No additional screening of the data is carried out in this study since any data that have failed quality checks or have too many  [Jones et al., 2012]. The data set is produced through a collaborative effort between the Climatic Research Unit (CRU) at the University of East Anglia (UEA) and the Met Office, both in the United Kingdom. Like GHCN-M, CRUTEM4 has been used by the IPCC to assess global land air temperature changes [Hartman et al., 2013]. It is an anomaly data set and represents a time series of mean monthly temperature anomalies with respect to the 1961-1990 baseline period at a spatial resolution of 5 degrees latitude/longitude. CRUTEM4 data are used in the time series analysis presented in the second part of this study. The global time series of CRUTEM4 anomalies, which is also used in this part of the study, includes the 95% confidence intervals for uncertainty components that account for station and grid box sampling, coverage and bias uncertainties (http://www.metoffice.gov.uk/hadobs/crutem4/data/diagnostics/time-series.html).

Other data sets used in the study
The LST-T 2m relationship is characterised with respect to land use, vegetation and elevation.
For elevation, the Shuttle Radar Topography Mission (v1) data set was used [Farr and Kobrick, 2000;Rodriguez et al., 2005]. This provides near-global land elevation at 30 m spatial resolution.
Two additional LST data sets were used to verify the time series analysis in the second part of this study: MODIS monthly LSTday and LSTnight fields at 0.05 degrees latitude/longitude [Wan, 2013, 2014] and 6-hourly 'skin' temperatures from the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim Reanalysis (ERA-Interim) [Dee et al., 2011].

Analysis of the LST-T 2m variability
The variability in the global LST-T 2m relationship is assessed by comparing the ATSR CDR with T 2m data from CRU TS and GHCN-M. CRU TS is used in preference to CRUTEM4 owing to its higher spatial resolution (recall the ATSR CDR has spatial resolution of 0.05°) and availability of monthly averages of daily extreme temperatures (T min and T max ). For the CRU TS comparisons, the CDR is re-projected onto a regular 0.5 degree grid separately for day time and night time LSTs. For the GHCN-M comparisons, the CDR 0.05° cell nominally containing the station location is used. In both cases, LST day is compared directly with T max , and LST night with T min .
Both the elevation and land use data sets are also re-sampled to 0.05 and 0.5 degrees latitude/longitude. For elevation, the mean cell elevation is used. For land use, the dominant land cover class within each grid cell is assigned to the cell. Comparisons between the CDR and CRU TS are only performed for 0.5° cells where the percentage of LC CCI pixels within the cell matching the dominant land cover class is ≥80% and the fraction of water in the cell is ≤20%. Similarly, comparisons between the CDR and GHCN-M are only performed for 0.05° cells that also meet these criteria, and additionally, where the LC CCI classification of the GHCN-M station matches that of the CDR 0.05° cell. This is to ensure that the analysis is only carried out for cells and stations that truly represent the assigned land classification.
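The cell-screening rule described above can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the function name, the class labels and the use of a flat list of high-resolution land cover pixels per cell are assumptions made for the example; only the 80% and 20% thresholds come from the text.

```python
from collections import Counter

def dominant_class(pixels, water_class="water", min_purity=0.8, max_water=0.2):
    """Return the dominant land cover class of a grid cell, or None if rejected.

    pixels: list of land cover labels for the high-resolution pixels in the cell.
    A cell is kept only if the dominant class covers at least `min_purity` of
    the pixels and water covers at most `max_water` of them.
    """
    if not pixels:
        return None
    counts = Counter(pixels)
    n = len(pixels)
    cls, count = counts.most_common(1)[0]
    purity = count / n
    water_fraction = counts.get(water_class, 0) / n
    if purity >= min_purity and water_fraction <= max_water:
        return cls
    return None

# Hypothetical cells: one homogeneous, one mixed, one with too much water.
homogeneous = dominant_class(["forest"] * 90 + ["water"] * 10)
mixed = dominant_class(["forest"] * 70 + ["cropland"] * 30)
wet = dominant_class(["forest"] * 75 + ["water"] * 25)
```

Here only the first cell is retained (as "forest"); the other two return None and would be excluded from the comparison.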

Comparison of LST and T 2m time series
The comparison between LST and T2m time series is performed using the CRUTEM4 data set, which is widely used to study temporal changes in global T2m. CRUTEM4 is a 5 degree latitude/longitude monthly mean anomaly data set referenced to the 1961-1990 baseline period, which is before the beginning of the ATSR CDR. To facilitate like-with-like comparison, the ATSR CDR is first re-projected onto a 5-degree latitude/longitude grid before calculating a monthly mean LST from the average of the monthly LSTday and LSTnight fields. This is then converted to a time series of monthly anomalies by subtracting a monthly mean LST climatology calculated over the August 1995-March 2012 baseline period (excluding incomplete months in the CDR: see section 2.1). Monthly climatology values are only calculated for cells with at least 10 years of data. Since the mean T2m has changed substantially between the two reference periods (1961-1990 and August 1995-March 2012) [Hartman et al., 2013], an 'adjusted' version of the CRUTEM4 anomalies is also calculated by subtracting a monthly mean T2m climatology of CRUTEM4 anomalies using the same ATSR CDR baseline period. The result is a time series of both ATSR CDR monthly mean LST anomalies and CRUTEM4 monthly mean T2m anomalies, both referenced to the August 1995 to March 2012 baseline period. The global mean time series for each 5° anomaly data set is calculated to be consistent with the averaging in the CRUTEM4 time series presented by Jones et al. [2012]. Spatial averages are calculated separately for the northern (NH) and southern hemispheres (SH) by weighting each grid box by the cosine of its latitude [Jones, 1994]. The global mean value is then determined from (2NH+SH)/3, which approximately accounts for the higher proportion of land in the NH [Jones et al., 2012].
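The anomaly and averaging steps described above can be sketched in a few lines. This is an illustrative outline under simplifying assumptions, not the code used in the study: the function names, the dictionary and tuple data layouts and the example values are hypothetical, and cell latitudes are taken to be grid-box centres.

```python
import math

def monthly_anomalies(series, min_years=10):
    """Subtract a per-calendar-month climatology from one cell's time series.

    series: dict mapping (year, month) -> monthly mean temperature.
    Months with fewer than `min_years` values get no climatology and are dropped.
    """
    by_month = {}
    for (_, month), value in series.items():
        by_month.setdefault(month, []).append(value)
    clim = {m: sum(v) / len(v) for m, v in by_month.items() if len(v) >= min_years}
    return {(y, m): v - clim[m] for (y, m), v in series.items() if m in clim}

def hemispheric_mean(cells):
    """Cosine-of-latitude weighted mean of (latitude_deg, anomaly) pairs."""
    weights = [math.cos(math.radians(lat)) for lat, _ in cells]
    return sum(w * v for w, (_, v) in zip(weights, cells)) / sum(weights)

def global_mean(cells):
    """Combine hemispheric means as (2NH + SH) / 3.

    Assumes at least one cell is present in each hemisphere.
    """
    nh = hemispheric_mean([c for c in cells if c[0] >= 0])
    sh = hemispheric_mean([c for c in cells if c[0] < 0])
    return (2 * nh + sh) / 3

# Hypothetical cell: ten Januaries (values 0..9) and only nine Februaries.
series = {(1996 + i, 1): float(i) for i in range(10)}
series.update({(1996 + i, 2): 1.0 for i in range(9)})
anomalies = monthly_anomalies(series)

# Hypothetical anomaly field for one month: two NH cells and one SH cell.
field = [(62.5, 1.0), (12.5, 1.0), (-27.5, -0.5)]
global_anomaly = global_mean(field)
```

In the example, the February values are discarded because only nine years are available, and the global anomaly evaluates to (2 × 1.0 − 0.5)/3 = 0.5.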
Two sets of comparisons are presented: a global time series where all available data from each data set are used, and a version that uses space-time cells where both data sets are present ('spatially-matched'). The spatially-matched version is included because both data sets contain gaps, which may introduce uncertainty into the comparison.
The CRUTEM4 uncertainties (section 2.3) are included in the time series comparisons to indicate the likely range of monthly anomalies for this data set. The equivalent uncertainty envelope for the CDR is not presented as this cannot be determined from the uncertainty information provided in the version 1.0 data files. Retrieval uncertainties are propagated from the 1 km pixels through to the 0.05° CDR product, but further scaling is required to facilitate like-with-like comparison to CRUTEM4. Provision of independent uncertainty estimates in surface temperature retrieval from satellite data is the subject of active research [Bulgin et al., 2016] and, as such, a rigorous methodology for propagating these uncertainties to provide an uncertainty envelope equivalent to CRUTEM4 is presently unavailable.

Statistical parameters
The relationship between LST and T2m is often explored in this study through the use of Ordinary Least Squares (OLS) regression. For example, T2m (y-axis) is plotted against LST (x-axis) in a scatter plot, and the gradient of the linear regression line fitted to the data is reported in this study as the 'slope'. A slope of unity signifies that a 1 °C change in T2m equates to a 1 °C change in LST, which would indicate that the two temperatures are perfectly coupled. The response of the LST-T2m difference (y-axis) to other parameters, e.g. vegetation fraction (x-axis), is explored in the same way. Here, the reported 'slope' is the gradient of the linear regression line fitted to these data. P-values for reported correlations and slopes are calculated using a two-tailed Student's t-test; p-values above 0.05 are considered here to indicate a result that is likely to have occurred by chance, and is therefore not significant.
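As a minimal sketch of the two slope estimators used in this study, the example below computes the OLS slope, intercept and correlation coefficient described above, and the median of pairwise slopes [Sen, 1968] that is used for the trend analysis in the second part of the study. The function names and the example values are hypothetical, and neither the p-values nor the 95% confidence interval on the trend are computed here.

```python
import math
from itertools import combinations
from statistics import median

def ols_fit(x, y):
    """Ordinary least squares fit of y on x: returns (slope, intercept, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def median_pairwise_slope(t, y):
    """Median of the slopes between all pairs of points (Sen, 1968)."""
    slopes = [
        (y[j] - y[i]) / (t[j] - t[i])
        for i, j in combinations(range(len(t)), 2)
        if t[j] != t[i]
    ]
    return median(slopes)

# Hypothetical monthly values: T2m (y) changes by 0.8 degC per 1 degC of LST (x).
lst = [10.0, 15.0, 20.0, 25.0, 30.0]
t2m = [9.0, 13.0, 17.0, 21.0, 25.0]
slope, intercept, r = ols_fit(lst, t2m)

# Hypothetical difference time series with one outlier at the end.
time = [0, 1, 2, 3, 4]
diff = [0.0, 0.1, 0.2, 0.3, 5.0]
trend = median_pairwise_slope(time, diff)
```

In the first example the slope is below unity, so LST changes by more than T2m for a given change in temperature. In the second, the median of pairwise slopes stays close to 0.1 per time step despite the outlier, which illustrates why a rank-based estimator is a robust choice for trends.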
In the second part of this study, differences in the rate of change in LST and T2m over the 1995-2012 period are assessed by calculating the trend in the LST-T2m difference time series. For this analysis, the median of pairwise slopes [Sen, 1968] is used to calculate the trend; the 95% confidence interval on the trend is also given. Where this interval does not encompass zero, it is assumed there is high confidence that the calculated trend is non-zero.

Spatial variation in the relationship between LST and T2m
In general, the spatial patterns exhibited by both sets of data are very similar, with CRU TS and the CDR showing the same dominant features. LSTnight and Tmin are generally within ±5 °C, although LSTnight is typically warmer than Tmin, with a median difference for the globe for all seasons of 1.8 °C (Figure 6a). This is expected because the nominal night observation time for the CDR is 10:00 pm, when LST would still be warmer than Tmin, which typically occurs close to sunrise [Edwards et al., 2011; Good, 2016; Jin et al., 1997]. However, there are notable situations where LSTnight is cooler than Tmin, for example, in Europe and Russia in DJF and SON (Figure 5, left). For the mid-high latitude winter, colder LSTnight could be due to snow cover [Good, 2016], or to the clear-sky sampling bias of LSTnight on cold, clear winter nights, when the surface cools more efficiently than on cloudy nights. LSTnight and Tmin are most similar over tropical vegetated regions; the role of vegetation is explored further in the following sections.
The differences between LSTday and Tmax are larger in magnitude with a high degree of spatial variability, although the median global LSTday-Tmax difference for all seasons is -0.1 °C (Figure 6b). LSTday is typically cooler than Tmax at very high latitudes, over some equatorial regions in all seasons, and at middle latitudes during winter (Figure 5, right). These spatial patterns are very similar to those reported by Lian et al. [2017], who analysed differences between MODIS/Aqua maximum monthly LSTs and Tmax from CRU TS. The tendency of LSTday to fall below Tmax at high latitudes and during winter months can be explained by the lower insolation in these regimes resulting in cold LSTs, whereas T2m is higher because the air has passed over warmer SSTs. LSTs that are colder than T2m may also occur over snow-covered surfaces. Negative LST-T2m differences over equatorial regions have been reported previously by Jin et al. [1997], who analyse modelled LST and T2m; this is discussed further in section 4.2. Jin et al. [1997] also observed cooler LSTs compared with T2m in winter at middle-to-high latitudes in their simulations.
By contrast, LST day tends to be warmer than T max over the dry tropics in all seasons and at middle latitudes during the summer months. This positive difference occurs because at the 10:00 am nominal observation time of the CDR, the clear-sky insolation in these regimes is high enough to elevate LST above T 2m by several degrees, and even above T max [Edwards et al., 2011;Good, 2016;Jin et al., 1997]. The results for the same analysis using GHCN-M station data illustrate the same general features (not shown).
The distributions of differences shown in Figure 6 include comparisons between the CDR LSTs and T mean . This indicates that in both cases, closer agreement in magnitude is obtained between LST night and T min , and LST day and T max , than the equivalent comparisons with T mean .
Both the LST night -T min and LST night -T mean distributions are approximately Gaussian. Thus areas where both the correlation and slope are close to unity correspond to where LST and T 2m are generally well coupled. This is observed for LST night /T min for much of the middle-to-high latitudes, and for LST day /T max in parts of north-east Asia. Elsewhere, slopes are generally <0.8 although correlation coefficients still nearly always exceed 0.9 outside of the tropics. This indicates that LST becomes increasingly warmer than T 2m with increasing temperature. For both LST night /T min and LST day /T max , the correlations and slopes are substantially lower over the equatorial regions, with a marked latitudinal gradient in correlation towards the equator in both hemispheres. This is consistent with the results of Jin et al. [1997] who also found lower LST-T 2m correlations in model simulations at lower latitudes compared with middle-to-high latitudes.
Both the T min versus LST night and T max versus LST day slopes tend to be less than unity, indicating that the LST-T 2m difference becomes more positive with increasing temperatures. This pattern has been noted previously by Mildrexler et al. [2011], who reported an increasing difference between annual maximum LST and T 2m with increasing temperature.
While the deviation from unity for Tmin/LSTnight in this study is reasonably small (usually within the range 0.8-1.1), the Tmax/LSTday slope is typically less than 0.7. This is consistent with the more extreme range of LSTday-Tmax differences observed in Figure 5.

4.2 Variability in LST-T2m differences by land cover classification
Figure 8 shows the characteristics of the CDR-CRU TS relationship as a function of land cover classification. The positive LSTnight-Tmin difference and slope of slightly less than unity reported in section 4.1 seem reasonably consistent across all surface types (Figure 8a, b).
The proximity of the slopes to unity (all but one are between 0.88 and 1.1) and high correlation coefficients (Figure 8c) indicate a close coupling between LSTnight and Tmin for most land cover types, which is consistent with the maps in section 4.1 and with findings reported in previous studies examining the LST-T2m relationship [e.g. Good, 2016; Zhang et al., 2011].
As inferred from Figures 4-7, the relationship between LSTday and Tmax is more complex and dependent on surface regime. The variability in LSTday-Tmax, both within and between surface types, is much higher than for LSTnight-Tmin (Figure 8a), as also indicated by the slightly lower correlation coefficients (Figure 8c). The slope of the Tmax versus LSTday relationship is lower than for Tmin versus LSTnight (Figure 8b), reflecting the dependence of LST on insolation, which increases the LSTday-Tmax difference at higher solar elevations, and therefore at higher surface temperatures. A notable feature of Figure 8 is that the LSTday-Tmax difference tends to be negative over the forested land cover types (classes 50-80), which by definition represent some of the more vegetated surfaces. Healthy vegetation actively transpires, losing surface heat to the overlying atmosphere [Sun et al., 2015], thus reducing LST relative to T2m. Greater surface roughness over vegetation also increases turbulent mixing, which further aids transfer of heat from the surface to the overlying air. Cooler LSTs are generally associated with increased vegetation density, and LST and T2m are often close in areas of dense vegetation [Jin and Dickinson, 2010; Mildrexler et al., 2011]. Negative LSTday-Tmax differences are also characteristic of the lichens and mosses and permanent snow and ice classes, which occur at high latitudes and therefore at lower solar elevation and colder LSTs.
Positive LST day -T max differences, on the other hand, occur over shrubland, grassland and bare area classes. The bare area class in particular is associated with low latitudes, where the high insolation and lack of vegetation can result in extremely high LSTs that are well above T max , even at 10 am local time (e.g. see Good [2016] Figure 1).
The same analysis using GHCN-M stations in place of CRU TS presents very similar results (not shown), although the number of land cover types represented is lower. Usefully, this finer-scale analysis enables the comparison of LST and T2m over urban areas, which was not possible at the 0.5° spatial scale of CRU TS. For this surface type, based on 394 stations, a median difference of 2.1 °C (interquartile range: 4.1 °C) for LSTnight minus Tmin, and 3.0 °C (interquartile range: 6.7 °C) for LSTday minus Tmax is obtained. The nature of the relationship over this surface, and its similarity to some of the less vegetated classes, is expected given that urban areas are often very sparsely vegetated. However, it should be noted that turbulent fluxes are likely to be more efficient at coupling T2m and LST over urban areas owing to increased surface roughness in this regime compared with sparsely-vegetated surfaces [Stull, 2015, pg 700]. There is a weak dependence of the urban LST-T2m relationship on latitude (not shown), where the differences become slightly more negative, and the variability increases, towards the higher latitudes, probably reflecting the variation in insolation with latitude.
Both the GHCN-M and CRU TS analyses suggest substantially lower correlations and slopes between LST and T2m over the broadleaved evergreen tree cover class (class 50). This class represents the equatorial forests and is spatially consistent with the low-correlation/low-slope regions evident in Figure 7. Although LST and T2m tend to be close in these areas (Figure 5; see also Mildrexler et al. [2011]), the temperatures are poorly correlated because the diurnal range of both LSTs and T2m is typically small with little seasonality [Good, 2016; Jin and Dickinson, 2010]. The influence of vegetation on the LST-T2m relationship is discussed further in the following section.

Variability in LST-T2m differences with vegetation fraction
Figure 9 shows the variation in LST-T2m difference with vegetation fraction for different ranges of solar zenith angle (SZA) at solar noon. While the presentation of the results approximates high-to-low latitude from top-to-bottom, SZA was used to partition the results so that seasonal variability is also taken into account. The general pattern of results is consistent with e.g. Mildrexler et al. [2011], indicating that LST and T2m become increasingly close with increasing fractional vegetation cover (FVC).
Over full vegetation cover (FVC = 1) the difference between them tends to be a degree or so above zero for LST night /T min except at SZAs ≥65° and a few degrees below zero for LST day /T max . (The y-intercept for full vegetation cover is simply the sum of the intercept and 10 x slope since the maximum value of FVC is 1.) For both temperature pairs, the intercept becomes increasingly positive and the slope becomes increasingly negative (except for LST night /T min for SZA ≥65°) with decreasing SZA. This is more marked for the LST day /T max analysis compared with LST night /T min , again indicating the dependency of LST on insolation.
The results suggest that for well-vegetated surfaces (e.g. FVC>0.8), LST night /LST day may provide a reasonable proxy for T min /T max for all ranges of SZA for some applications. For regimes where the SZA is above 45°, T min /T max is reasonably well approximated by LST night / LST day except for sparsely-vegetated and bare surfaces (e.g. FVC<0.2).

Variability in LST-T 2m differences by elevation
Figure 10 illustrates the variation in LST-T 2m relationship with elevation, which has received little attention in previous studies. LST night -T min differences appear to be quite stable, while LST day -T max differences have a general tendency to increase with increasing elevation. For both temperature comparisons, there is a clear decrease in both the slope and correlation coefficient with increasing elevation indicating a de-coupling of the LST-T 2m relationship at altitude. The effect is more marked for the LST day -T max comparison. This may, at least in part, be due to the fact that all-sky T 2m are being compared with clear-sky LSTs. However, at high altitude, it is possible for LST to be elevated by heat from the sun, while T 2m may be cooler because of the temperature lapse rate and exchange with the surrounding free air. This was also noted by Good [2016], who analysed ground-based observations of all-sky LST and T 2m at 19 of the Atmospheric Radiation Measurement (ARM) program sites, including two at high elevations. The equivalent analysis using GHCN-D stations demonstrates very similar results (not shown), although the drop-off in correlation with increasing elevation is perhaps slightly less apparent, which probably reflects the lack of very high-altitude stations.

Spatially-averaged time series comparisons
The time series of CDR monthly mean anomalies is shown in Figure 11a.
Both data sets suffer from spatial gaps. For example, much of Africa and Antarctica are regularly missing from CRUTEM4, while the ATSR data set does not provide monthly observations under persistent cloud. With this in mind, Figure 11b shows the same time series but for global averages using only cells where both data sets have observations in that month. The correlation increases to 0.87 (p<0.01) for this spatially-matched time series.
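The spatial matching described above can be sketched as follows: for one month, only cells observed in both data sets contribute to each global average. The function name, the use of `None` for missing cells and the cos(latitude) area weighting are assumptions made for illustration; the weighting scheme used in the study is not stated in this section.

```python
import math


def matched_global_means(field_a, field_b, lats):
    """Area-weighted means of two gridded anomaly fields over shared cells only.

    field_a, field_b: lists of rows (one row per latitude band); None marks a gap.
    lats: latitude in degrees of each row centre.
    """
    sum_a = sum_b = sum_w = 0.0
    for row_a, row_b, lat in zip(field_a, field_b, lats):
        weight = math.cos(math.radians(lat))  # assumed area weight
        for a, b in zip(row_a, row_b):
            if a is None or b is None:
                continue  # keep only cells present in both data sets
            sum_a += weight * a
            sum_b += weight * b
            sum_w += weight
    if sum_w == 0.0:
        return None, None  # no overlapping cells in this month
    return sum_a / sum_w, sum_b / sum_w


# Hypothetical 2 x 2 anomaly grids (deg C) for a single month.
cdr = [[1.0, None], [3.0, 2.0]]
crutem4 = [[0.5, 0.2], [None, 1.0]]
print(matched_global_means(cdr, crutem4, lats=[0.0, 60.0]))
```

Repeating this for every month gives the two spatially matched series that are then correlated and differenced.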
The difference between the time series of anomalies is shown in Figure 11c. The linear trend of this differenced time series is negative (e.g. -0.17°C/decade for the spatially matched comparison, 95% confidence range -0.22 to -0.11) implying that the two data sets may not exhibit the same rate of temperature change with time. While this apparent difference might seem surprising, it should be regarded with caution given the short time series (<18 years). This analysis uses version 1.0 of the ATSR CDR and part of the motivation of this study is to assess the temporal consistency of the time series, particularly as there may be residual inhomogeneities caused by the transition from ATSR-2 to AATSR, for example due to the change in overpass time of the sensors (as discussed in later sections).  Figure 12 shows the median CDR minus CRUTEM4 anomaly differences, for the whole time series, by season and by sensor. Results are shown for the globe and for different geographical regions (Table 3). For nearly all regions, the CDR-CRUTEM4 differences are clearly more positive for ATSR-2 compared with AATSR; for the globe (spatially matched cells), the median CDR minus CRUTEM4 difference is 0.07 °C for ATSR-2 and -0.10 °C for AATSR. This could account for the overall negative trend in the CDR-CRUTEM4 time series, since ATSR-2 preceded AATSR. The median difference for the whole series for most regions is slightly negative, reflecting the longer AATSR record.
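The trend of the differenced (CDR minus CRUTEM4) time series and its confidence range can be illustrated with an ordinary least-squares fit. The sketch below is an assumption about the general approach, not the method of Section 3.3: it uses a normal-approximation 95% interval (z = 1.96) and ignores temporal autocorrelation, and the function name and sample series are hypothetical.

```python
import math


def trend_with_ci(times, values, z=1.96):
    """OLS slope of values against times, with an approximate 95% interval.

    Returns (slope, lower, upper). With times in decades the slope is per decade.
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    resid_ss = sum((v - (intercept + slope * t)) ** 2 for t, v in zip(times, values))
    stderr = math.sqrt(resid_ss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope - z * stderr, slope + z * stderr


# Hypothetical monthly difference series (deg C) with a built-in -0.17 degC/decade drift.
times = [month / 120.0 for month in range(216)]  # time in decades
noise = [0.05 if month % 2 else -0.05 for month in range(216)]
diffs = [0.1 - 0.17 * t + e for t, e in zip(times, noise)]
slope, low, high = trend_with_ci(times, diffs)
print(f"trend = {slope:.2f} degC/decade (95% range {low:.2f} to {high:.2f})")
```

A trend is treated here as distinguishable from zero only if the interval excludes zero; a test that accounts for autocorrelation would generally give wider intervals.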

Analysis of the CDR-CRUTEM4 differenced time series
There is some seasonal variation in the median differences, although no clear pattern is evident, other than perhaps an increased clustering of the differences around zero in DJF and SON. Figure 13 shows similar graphics but for the trends of the CDR minus CRUTEM4 time series; in this case, trends that are not statistically different from zero (see Section 3.3) are indicated by an unfilled symbol. Table 3 provides the numerical values of the trends for the whole time series, together with the correlation coefficients between the CDR and CRUTEM4 time series for each region. For the spatially-matched comparisons, all significant trends (i.e. those statistically different from zero) for mean LST are negative, although the magnitude is variable, ranging from -0.43 (S. Asia) to -0.17 (Globe) °C/decade. When considering the sensor-partitioned results, there are some small differences between the trends, with the ATSR-2 time series trends tending to be slightly more negative than AATSR. However, the confidence in these results is lower than for the full time series given the even shorter record length (<10 years), as emphasised by the large number of regional trends that are not statistically different from zero (unfilled symbols in Figure 13).
The evidence presented in Figures 11-13 suggests there may be some discrepancy between the ATSR-2 and AATSR portions of the CDR. A likely source of this discrepancy is the overpass time correction that is applied to the ATSR-2 CDR LSTs to align them with the AATSR overpass, which is 30 minutes earlier (section 2.1). LST can change by several °C in 30 minutes, particularly around the 10 am nominal overpass time of the CDR [Good, 2016;Jin and Dickinson, 2010], thus this correction is likely to introduce errors into the ATSR-2 LSTs. These errors could be the cause of the variation in the median CDR-CRUTEM4 anomaly difference between ATSR-2 and AATSR (Figure 12), and the apparently higher noise in the ATSR-2 portion of the time series (Figure 11c).
The uncertainty in the ATSR-2 CDR LST temporal correction will naturally be larger where insolation is higher, especially where vegetation cover is low, as this is where LST changes most rapidly. While there was no clear pattern in the seasonal median differences (Figure 12), there does appear to be a tendency of the JJA trends in the NH regions to be slightly more negative (Figure 13a-c). For the lower-latitude NH regions (Figure 13c), this tendency also appears in MAM. If these more negative trends correspond to more warm-biased ATSR-2 anomalies, this would support the hypothesis that the ATSR-2 LST temporal correction is at least partly responsible for the relative lack of warming in the CDR compared with CRUTEM4, as this is also consistent with regimes with higher insolation, and therefore potentially larger errors in the LST temporal correction.
To verify that the negative CDR minus CRUTEM4 trends are a result of inhomogeneities in the ATSR CDR rather than an actual difference in rate of temperature change in the two data sets, the analysis presented in Figures 12 and 13 has been repeated using ERA-Interim reanalysis skin temperatures in place of CRUTEM4. ERA-Interim does not assimilate satellite LST data so the CDR and ERA-Interim are independent. To do this, the ERA-Interim data were aggregated from 6-hourly instantaneous skin temperatures to monthly [...]
The results reported above analyse the mean monthly CDR LST, which is calculated from an average of LST day and LST night . A comparison with GHCN-M, which provides temporally homogeneous station-based observations of T min and T max (section 2.3), enables a separate assessment of the stability of the CDR LST day and LST night . The time series at each station is compared directly with the CDR time series for the 0.05° cell nominally containing the station location. The mean LST night -T min trend is found to be -0.05 °C/decade (n=2122 stations), while the mean LST day -T max trend is -0.36 °C/decade (n=2200 stations). The more strongly negative trend for the LST day -T max result indicates that most of the discrepancy between the CDR and other temperature time series is due to LST day . This is evidence that further supports the hypothesis that the overpass correction is introducing errors into the ATSR-2 CDR LSTs, since such errors would be more prevalent during the day, because of the dependency of LST on insolation. Figure 14 shows the CDR and CRUTEM4 time series, but this time excluding LST day from the analysis (i.e. CRUTEM4 mean monthly temperature anomalies compared with monthly CDR LST night anomalies). This time series is notably more stable and less noisy than the equivalent time series in Figure 11, which is based on the mean monthly LST and therefore includes LST day .
The percentage of CDR anomalies that fall within the CRUTEM4 uncertainties for the adjusted time series has risen to 90 %. A linear trend in the differenced time series (Figure 14c) is now undetectable using all data (-0.02 °C/decade, confidence interval: -0.06 to 0.02), and a much smaller negative trend is present only in the spatially matched data (-0.08 °C/decade, confidence interval: -0.11 to -0.04). The anomaly correlations with the CRUTEM4 adjusted time series have also increased to 0.83 and 0.90 for 'all' and 'spatially-matched' data, respectively. Table 3, which provides the correlation coefficients and differenced time series results for different geographical regions, indicates that the improved agreement between CRUTEM4 and LST night anomalies persists in all regions. Other than in the Arctic, the trends of the difference time series become less negative and/or not statistically different from zero, indicating that the agreement between the CDR and CRUTEM4 is stronger when LST day is not included in the analysis. Figure 15 shows the relationship between the CRUTEM4 and CDR LST night time series for each 5-degree grid cell. LST day has been excluded from this analysis given the results presented in Section 5.1. For most of the globe, the cell-based trends of the differenced time series (CDR minus CRUTEM4) are not statistically different from zero (indicated with an 'X' in Figure 15a). Cells with both positive and negative tendencies are present, reflecting the lack of any clear linear trend in the global differenced time series shown in Figure 14. Figure 15b implies there may be some regional variation in the median CDR minus CRUTEM4 anomalies. For example, the CDR anomalies tend to be warmer in Australia, while for much of Asia and North America, the CDR anomalies are cooler. However, the pattern is again generally heterogeneous.
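The percentage of CDR anomalies falling within the CRUTEM4 uncertainties, quoted at the start of this discussion, amounts to a simple count. A minimal sketch follows, assuming the uncertainty is supplied as a symmetric half-width for each month; the function name, the symmetric form and the sample values are illustrative assumptions.

```python
def percent_within_uncertainty(cdr, reference, half_width):
    """Percentage of CDR anomalies within reference +/- half_width, month by month."""
    inside = sum(
        1 for c, r, h in zip(cdr, reference, half_width) if abs(c - r) <= h
    )
    return 100.0 * inside / len(cdr)


# Hypothetical monthly anomalies (deg C) and uncertainty half-widths.
cdr_anom = [0.1, 0.5, -0.2, 0.0]
crutem4_anom = [0.0, 0.1, -0.1, 0.3]
crutem4_half_width = [0.2, 0.2, 0.2, 0.2]
print(percent_within_uncertainty(cdr_anom, crutem4_anom, crutem4_half_width))
```

If the published uncertainty bounds are asymmetric, the comparison would instead test each anomaly against separate lower and upper limits.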

Analysis of grid-cell time series
A more consistent pattern is observed in the CRUTEM4 versus CDR slopes (Figure 15c) and correlation coefficients (Figure 15d) that reassuringly bear close resemblance to Figure 7, which shows the equivalent comparisons with CRU TS. Both correlations and slopes approach unity at mid-to-high latitudes, and are significantly less than one over the tropics.
The correlation coefficients are slightly lower for the CRUTEM4 comparison. This is expected, as the data presented in Figure 15 are anomalies rather than actual temperatures (CRU TS: Figure 7), so they have a smaller range and are therefore more sensitive to small variations (noise). Nevertheless, the strength of the correlations and proximity of the slopes to unity outside of the tropics suggest that both LST actual temperatures and anomalies are well aligned with T 2m in these regions. Figure 16 shows the temperature anomalies from the CDR and CRU TS data sets during August 2003; the first half of this month is characterised by an extreme heat wave that affected much of Europe. The CRU TS data have been used in preference to CRUTEM4 here owing to its higher spatial resolution and availability of T min and T max data, which can be compared directly with the LST night and LST day , respectively. It should be noted that since only monthly data are analysed here, the results are not intended to provide full characterisation of the August 2003 heat wave event.
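The grid-cell slopes and correlation coefficients discussed for Figure 15 can be computed per cell from the two monthly series. The sketch below assumes a Pearson correlation and an ordinary least-squares regression of T 2m on LST (the direction implied by "T 2m versus LST" elsewhere in the paper); the function name and sample values are hypothetical.

```python
import math


def correlation_and_slope(lst, t2m):
    """Return (Pearson r, OLS slope of t2m regressed on lst) for one grid cell."""
    n = len(lst)
    mean_x = sum(lst) / n
    mean_y = sum(t2m) / n
    sxx = sum((x - mean_x) ** 2 for x in lst)
    syy = sum((y - mean_y) ** 2 for y in t2m)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lst, t2m))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical anomaly series (deg C) for a single 5-degree cell.
cdr_cell = [1.0, 2.0, 3.0, 4.0]
crutem4_cell = [0.5, 1.0, 1.5, 2.0]
r, slope = correlation_and_slope(cdr_cell, crutem4_cell)
print(f"r = {r:.2f}, slope = {slope:.2f}")
```

A slope below one with a high correlation, as in this toy example, would indicate that the T 2m anomalies vary less than the LST anomalies while remaining closely coupled to them.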

Case Study: Europe in August 2003
The two data sets share many similar features. Both data sets show warm anomalies over much of Europe, which are particularly strong for LST day /T max ; the presence of elevated daytime and nighttime temperatures in this month is consistent with previous studies on the August 2003 heat wave event [Dousset et al., 2011;García-Herrera et al., 2010]. The magnitude of the anomalies is more extreme for LST than for T 2m . For LST day this is likely to reflect the clear-sky-only data acquisition. For the LST night /T min comparison, this is expected because LST night is acquired at ~10 pm local solar time, when temperatures are still influenced by day time heating, whereas T min typically occurs just before dawn. Therefore, as evident in Figure 16, the LST night anomaly pattern shares features with both the T min and T max anomaly patterns.
The bottom two panels in Figure 16 show the CDR-CRU TS anomaly difference maps, with the locations of the CRU TS stations overlaid as black filled circles. CRU TS has larger uncertainties where station density is low, therefore one might anticipate larger CDR-CRU TS differences in these areas. While the general pattern does not fully support this, most of the largest CDR-CRU TS differences do occur in station voids, for example, South-West France (LST night -T min only), eastern France, Central Germany (LST day -T max only) and northern Scandinavia/North-West Russia (Murmansk province). It is also notable from Figure 16 that the satellite data present a great deal more spatial structure and detail in the temperature variability than CRU TS.

Discussion: Using satellite LSTs to augment T 2m observations
LST observed at IR wavelengths represents the temperature of the top few micrometres of the earth surface. From space, this corresponds to an 'ensemble directional radiometric temperature', which is the aggregate of all radiometric surface temperatures within the satellite field of view in the direction of observation [Dash et al., 2002;Norman & Becker, 1995;Li et al., 2013]. Over dense vegetation, a satellite-observed LST may approximate to the canopy temperature. This is not the same as the ambient air temperature measured at weather stations at ~2m above the Earth's surface, which has traditionally been used in climate and weather applications. In addition to this geophysical difference, satellite IR LST data are also limited to cloud-free scenes, whereas station-based T 2m estimates are all-sky.
The clear-sky bias of satellite IR data is known to affect long-term observations of upper tropospheric humidity, for example [John et al., 2011], so it is natural to anticipate this may also be an issue for IR LST.
Despite these fundamental differences, the results presented here and in other studies demonstrate that satellite LST and T 2m are strongly related, with LST and T 2m closest at night, or under cloud [Gallo et al., 2011;Good, 2015;Good, 2016;Mildrexler et al., 2011;Sohrabinia et al., 2014]. The relationship between LST night and T min , which are usually observed when solar heating is absent, should be less affected by clear-sky bias but not completely free from it: the surface cools more efficiently at night under clear skies than under cloudy skies, so the clear-sky satellite record preferentially samples the colder LSTs that occur in these conditions.
The results presented in this study support this and show that the monthly CDR LST night data are particularly well aligned with monthly T min and even T mean , both in actual temperatures and anomalies. For applications that can tolerate an uncertainty of up to 5 °C, LST night could provide a reasonable proxy for T min for locations without ground-based observations. Where more accurate T min data are required, estimates may be obtained through simple models that predict T min from satellite data and other parameters, such as those proposed by Benali et al. [2012] and Good [2015].
The comparison between LST day and T max presented in this study suggests LST day may also provide useful new temperature data. In areas of very dense vegetation, LST day and T max can be close (within a few °C). Over more sparsely vegetated and bare surfaces LST day can exceed T max by much more than this (up to >10 °C), such that LST day may not be a viable direct proxy for T max . Satellite T 2m models can also play a role here to provide more accurate estimates of T max . LST day may also be biased by the clear-sky sampling, as implied by the more extreme anomalies present during the August 2003 European case study introduced in section 6. However, the time series analysis of anomalies over different regions discussed in section 5.1 does not seem to show any clear-sky bias effects, which suggests that spatial averaging of anomalies may reduce the problem. Even so, the dependency of LST day on insolation clearly causes problems in generating a temporally homogeneous LST product from sensors with different overpass times.
Given that IR satellite data offer near-complete global coverage, particularly if composited in time, there is a clear role for the use of LST data in climate and weather applications. A further benefit of satellite data over ground-based T 2m observations is in instrumental and methodological consistency: a single instrument with a single retrieval methodology can potentially provide a globally consistent product, whereas in situ data are collected using different instruments at each site using different practices (e.g. observation times). A satellite data archive can also be reprocessed (for example, using an improved retrieval or calibration technique) in a consistent way, whereas in situ data often come with missing or erroneous metadata, so that applying retrospective corrections or improvements can be problematic.
Lastly, satellite data are often available in very-near real time, which enables a quick response time to monitoring events. For example, LST from SEVIRI is provided operationally by EUMETSAT within two hours of acquisition. Some international station T 2m data, on the other hand, can take several days to weeks to be received by data producers, delaying the output of gridded data sets for monitoring. A major limitation of satellite LST data, particularly from polar-orbiting sensors, is that they provide clear-sky 'snapshots' in time. For the ATSR CDR, this time is at 10:00 am/pm, which is a limiting factor for studies that require knowledge of maximum and minimum surface temperatures that usually occur at other times of the day. Nevertheless, this study suggests that the ATSR data can still provide useful information, particularly where the station network is sparse. It is highly unlikely that satellite LST data will ever replace conventional T 2m observations. However, the benefits of a synergistic approach seem clear, using multivariate station measurements as a complementary observing array that is essential to ensure adequate understanding of uncertainties in LST.

Conclusions and outlook
This paper presents a comparison between a new >17-year, monthly satellite LST data set derived from the ATSR series and ground-based observations of T 2m . The LST-T 2m difference is characterised in space, by season, land cover type, vegetation fraction and elevation. (Note: some of these influencing factors may co-vary but this is not addressed here and each influence is considered separately in this study.) LST night is typically warmer than T min (global median = 1.8 °C), as expected given the ~10 pm local solar time satellite overpass and typical near-dawn timing of T min . LST night is highly correlated (>0.9) and has a near one-to-one relationship with T min outside of the tropics. The LST night -T min interquartile range is 3.8 °C, indicating that LST night is often close in magnitude. This strong coupling means that for some applications, LST night may provide a reasonable proxy for T min . The LST day -T max variability is higher (median = -0.1 °C, interquartile range = 8.1 °C) and more extreme: LST day tends to be higher than T max when insolation is higher but can also be cooler, e.g. at high latitudes during winter months, or over snow or ice. LST day and T max are not as well coupled as LST night /T min , but actual temperature correlations are still typically >0.9 at mid-to-high latitudes.
The LST-T 2m difference depends strongly on vegetation fraction and land cover type, particularly for LST day /T max . The largest positive LST-T 2m differences occur over bare surfaces: both LST night and LST day tend to be warmer than T min and T max , respectively, and the difference increases with decreasing solar zenith angle (higher insolation). LST-T 2m differences approach zero with increasing vegetation fraction. LST day is typically cooler than T max over fully vegetated surfaces owing to surface cooling by evapotranspiration, with negative LST day -T max differences observed frequently for the forested land cover types. In contrast, LST night tends to be slightly warmer than T min for nearly all surface types; again, this is attributed to the 10 pm local solar time overpass of the ATSR.
LST night -T min differences are stable with varying elevation. However, the LST-T 2m coupling weakens with increasing elevation, evidenced by lower correlation coefficients and regression slopes (T 2m versus LST). This is particularly apparent for LST day -T max .
The CDR global time series shows remarkable agreement with CRUTEM4, with a correlation between the anomaly data sets of up to 0.9 for the globe, with up to 90% of the CDR anomalies falling within the CRUTEM4 T 2m uncertainties. This gives useful verification of the CRUTEM4 monthly anomalies since the CDR is a completely independent data set. However, the time series analysis presented here suggests that the CDR is not free from errors arising from non-climatic effects and there is a discrepancy between the ATSR-2 and AATSR portions of the CDR, resulting in an inhomogeneous time series. This is attributed to the overpass time correction applied by the data set providers to align the ATSR-2 data with the AATSR overpass time, which is 30 minutes earlier. The LST night time series appears more stable than the LST day time series, which is expected given the dependency of LST on insolation.
LST anomalies appear to be surprisingly well connected to T 2m anomalies in space and time.
Grid-box (5° lat/lon) correlations between the CDR and CRUTEM4 time series are typically >0.7 and very often >0.8 outside of the tropics. Analysis of the August 2003 European anomaly maps shows that LST anomalies quite closely resemble the equivalent T 2m anomalies and may add information where in situ observations are sparse. The LST maps also show more detail and structure, which will be useful where high resolution information is needed.
Although the ATSR ceased operations in 2012, the analysis presented here is relevant to the ATSR successor, SLSTR, which was launched in 2016, and other IR imagers such as MODIS and SEVIRI. It is hoped that this study will provide some of the foundation for use of LST data in climate applications. The results of this study suggest that the ATSR CDR LST night may be useful for time series analysis of LST, but that LST day is not temporally stable enough for this application, at least prior to the AATSR. However, the ATSR CDR LST day data are still useful for other applications where temporal stability is less critical, for example, where a climatology of LST is required for knowledge of the 'typical' (background) surface temperature for a particular scene, for informing gridded estimates of T 2m , or the study of surface fluxes through the analysis of LST-T 2m differences. The next release of the ATSR CDR will include the uncertainties associated with the temporal correction applied to the ATSR-2 LSTs to account for the difference in ATSR-2/AATSR observation time, which should enable users to make better use of these data. It is implausible that LST will replace T 2m as the surface temperature variable of choice over land for many applications, since it represents a different physical quantity and has a comparatively short record length. However, it seems clear that it offers benefits both where LST is the relevant variable and in augmenting T 2m data from meteorological stations, particularly in data-sparse regions or where a high level of spatial detail is required.

Acknowledgements
This study was carried out as a user case study within the framework of the European Space [...] Dakota. The SRTM elevation data were obtained from NASA JPL (http://www2.jpl.nasa.gov/srtm/). Please contact the author, Elizabeth Good (email elizabeth.good@metoffice.gov.uk), for the IDL code used to produce the analysis described in this article. The authors would like to thank the three anonymous reviewers, whose feedback has helped to improve this article.

Perm. snow/ice: Permanent snow and ice

Table 3: Relationships between time series of anomalies from CRUTEM4 (adjusted to 1995-2012 baseline) and the CDR. R is the correlation coefficient. 'Trend' indicates the trend of the differenced time series (CDR minus CRUTEM4) with time. Results are for spatially-averaged time series using all grid cells available for each data set ('all data') and only grid cells that are available in both data sets ('spatially matched'), and separately using the mean LST time series ('Mean'), and LST night only ('Night'). Results in brackets indicate trends that are not statistically different from zero (see Section 3.3). Antarctica is excluded from the analysis owing to the very small number of data points in this region.

The CDR anomalies are also referenced to this period. Panel (a) shows averages using all available data points for each data set, while (b) indicates averages using only cells where both data sets are present. Panel (c) shows the respective CDR minus CRUTEM4 differences where 'All' corresponds to the data shown in (a) and 'Match' to the data shown in (b). The 95% confidence intervals (CI) for the difference trends are indicated on the plot. Shading represents the total uncertainties associated with the CRUTEM4 time series (sourced from http://www.metoffice.gov.uk/hadobs/crutem4/data/diagnostics/index.html).