Application of knowledge for automated land cover change monitoring

This paper outlines an approach for updating baseline land cover datasets. Knowledge about land cover, as used during manual mapping, is combined with simple remote sensing analyses to determine land cover change direction. The philosophy is to treat reflectance data as one source of information about land cover features. Applying expert knowledge with reflectance and biogeographical data allows generic solutions to the problem. The approach is demonstrated in areas of semi-natural vegetation and shown to differentiate ecologically subtle but spectrally similar land cover classes. Further, the advantages of manual mapping techniques and of high resolution remotely sensed imagery are combined. This approach is suitable for incorporation into automated approaches: it makes no assumption about the distribution of land cover features, can be applied to different remotely sensed data and is not classification specific. It has been incorporated into SYMOLAC, an expert system for monitoring land cover change.


Introduction
Before satellite imagery became so freely available in the 1970s, aerial photography was commonly used to map land cover. During aerial photograph interpretation (API) land cover is mapped manually. The interpreter combines specific expertise, such as knowledge of the relations that land cover features have with various biogeographical gradients and of the landscape context in which different land covers are found, with the appearance of those features in the aerial photograph. By using contextual information much greater land cover detail is captured (Paine 1981, Lillesand and Kiefer 1987). However, photographic data are relatively expensive and require much human effort to extract thematic information. Now land cover is more usually mapped from remotely sensed imagery recorded by sensors mounted on satellites. Satellite imagery is cheap compared to alternative data sources such as aerial photography, covers large areas and has a high temporal frequency. However, the granularity of the land cover information derived from such imagery is limited by the spatial resolution of the data and the number of land cover feature classes that can be reliably identified by their reflectance properties alone. Typically, spectrally distinct cover types are easily classified, whilst other more spectrally heterogeneous land covers are less reliably identified. Improvements can be made by fine tuning the analysis, but the results are frequently instance specific and subjective.
The problem we addressed was how to use satellite imagery to update an ecologically detailed land cover dataset. The cost of a repeat aerial photograph survey with API is prohibitive, the extent of cloud free coverage provided by very high resolution (<5 m pixel) satellite data is poor, and the granularity of land cover information that can be extracted reliably from medium resolution satellite imagery such as Landsat Thematic Mapper (TM) is low.
In this paper we present an approach for determining land cover change direction that uses API knowledge of land cover biogeographical characteristics and class specific knowledge combined with simple remote sensing analyses. We show how this approach: (a) marries the benefits of API with those of satellite remotely sensed data; (b) avoids the specificity of many remote sensing analyses; (c) is generic in terms of its applicability to other change direction problems.

Land cover of Scotland 1988
The Land Cover of Scotland 1988 (LCS88) survey (MLURI 1993) provides a baseline census of land cover information. It was manually classified from an aerial photograph survey at 1:24 000 scale, before being digitized into a Geographic Information System (GIS). The objective of LCS88 was to record information specific to the Scottish landscape, particularly upland semi-natural vegetation and to this end it describes the distribution of 126 land cover classes.

Mapping semi-natural vegetation from satellite imagery
Mapping semi-natural land cover from remotely sensed imagery is difficult. A review of the remote sensing literature specific to upland semi-natural vegetation supports this statement. Belward et al. (1990), studying semi-natural vegetation using Landsat TM data, concluded that it would be inappropriate to try to match spectral classes with semi-natural land cover classes. Baker et al. (1991) found that spectral classification of SPOT HRV (Système Probatoire pour l'Observation de la Terre High Resolution Visible Image Instrument) data alone would not discriminate between semi-natural vegetation types. Whilst Weaver (1987), using simulated Landsat data, concluded that discrimination of moorland vegetation was possible, her conclusions have not been endorsed by more recent work that has examined the use of actual Landsat TM data with reference to semi-natural moorland vegetation, such as Wright and Morrice (1997), Gauld et al. (1997), Bird et al. (2000) and Taylor et al. (2000). Wright and Morrice (1997) found it difficult to match LCS88 land cover features to Landsat TM spectral capabilities. Gauld et al. (1997) concluded that the divisions produced by unsupervised segmentation of Landsat TM imagery bore little relation to the ecological classes on the ground. Work on monitoring landscape change in the UK National Parks has shown that current satellite data are not suitable for mapping land cover features and analysing land cover change in parks containing a large amount of semi-natural moorland and heath land covers.

The use of auxiliary data in remote sensing analyses
Consistent and explicit calls for remote sensing analyses to incorporate knowledge or ancillary data into the classification process have been made (e.g. Green et al. 1994, Mattikalli 1995, Foody and Hill 1996, Stuckens et al. 2000). Mapping of land cover features would be improved if other data were applied (e.g. Holmgren and Thuresson 1998). Due to LCS88 classification detail, this trend is reflected in work that has considered how LCS88 may be updated (Birnie 1996, Horgan et al. 1997, Wright and Morrice 1997). This makes sense for two reasons. First, subtle variations in land cover botany may be obscured by sensor specifications such as pixel size (Fisher 1997) and may be difficult to discern due to image specific characteristics (Verstraete et al. 1996) or the nature of the landscape under investigation. Secondly, land cover classes are commonly defined by their biophysical properties such as species composition, biogeographic position and landscape context (Comber et al. 2001).

Summary
The difficulties of identifying detailed semi-natural land cover features from data such as Landsat TM arise because they are: (a) spectrally indistinct (Wright and Morrice 1997); (b) not necessarily defined on their physical reflectance properties alone, rather by other objectives such as policy (e.g. MLURI 1993); (c) only subtly different in botanical terms and class identification may depend on biogeographic context (Comber et al. 2001).
Therefore in semi-natural environments, the advantages of using remotely sensed satellite data (speed of image capture, data cost, areal coverage, repeatability) are offset by difficulties in reliably identifying semi-natural land cover features. In these situations traditional data oriented change methodologies may be inappropriate. Typically, the conclusions of analyses that proceed in this way are that they work for some sets of classes and in some areas, and not in others (for example, Lyon et al. 1998, Macleod and Congalton 1998, Mas 1999). A further problem is that their specificity makes them difficult to incorporate into generic expert systems for monitoring land cover change, such as SYMOLAC (Skelsey 1997).

Materials and methods
In this section we describe how knowledge of land cover features from different sources can be identified and then combined. Necessarily this involves some data analysis. The data is described followed by descriptions of land cover knowledge and an outline of how all the information in this section was applied to the change direction problem.

Data
The area of analysis was a 40 km by 41 km area around Elgin in north eastern Scotland. This area contained some 3996 LCS88 polygons. Of these, the classes with populations of >20 polygons were used in the analyses described below: some 3465 polygons, or 91.4% of the test area in total.
A 20 m binary raster grid of each LCS88 polygon was generated using ArcInfo's POLYGRID command (ESRI 2001). Landsat TM data of the area from 1987 (the nearest date to the air photograph survey for which cloud free coverage could be obtained; see Wright and Morrice 1997) and Landsat Enhanced Thematic Mapper (ETM) data from 2000 were registered to the British National Grid from Ordnance Survey point map data and resampled to 20 m. The 20 m cell size was chosen as a compromise between minimizing information loss during LCS88 land cover parcel conversion to raster format and maximizing the information content of the Landsat imagery.
Soil Quality and Soil Wetness datasets were derived from the digital 'Quarter Million' soil series produced by the Macaulay Institute in 1984 (Macaulay Institute For Soil Research 1984). 1 km Mean Annual Rainfall data for the area was obtained. This data is described in Matthews et al. (1994). Ordnance Survey's 50 m DEM was used to generate a Slope dataset using ArcInfo's SLOPE command (ESRI 2001). All of these datasets were resampled using a cubic convolution to 20 m rasters from their original resolutions for ease of data overlay in the analysis using the RESAMPLE command in the GRID module of ArcInfo (ESRI 2001).

LCS88 land cover knowledge
Land cover knowledge is given in three parts. First, we describe the information used during API by experts. This includes the position of land cover features in various environmental gradients. Secondly, we detail how simple descriptions of land cover class reflectance properties can be derived from remotely sensed data. Thirdly, an approach for extracting information about an individual region of land cover change is given.

API
Air-photograph interpreters involved in the LCS88 project were interviewed. Knowledge of how they mapped different land cover classes and the nature of their expert knowledge was identified. This included land cover related facts or principles, rules and heuristics. They described their mapping processes (e.g. which features were identified first and why), class specific information about how they mapped and differentiated amongst each of the LCS88 land cover classes, and the class to class transitions that were possible and under which scenarios. The resulting information, specific to individual land cover classes, included descriptions of the feasible changes, the scenarios under which the changes might occur and information about the typical biogeographical position of each class in a range of dimensions. In API the interpreter identifies specific classes by bringing together all this information. An example of this knowledge for different LCS88 grassland classes is illustrated in table 1.

Reflectance
The objective was to assess the reflectance characteristics of LCS88 land cover classes to determine the extent to which they are separable using Landsat TM data. Each LCS88 polygon grid was used as a template to punch out the appropriate portions of the 1987 Landsat TM imagery in PVWAVE (Visual Numeric 2001). A histogram of the reflectance properties of each land cover polygon, excluding edge pixels, was generated for each band and for a standard Normalized Difference Vegetation Index (NDVI) value. For each polygon, the median value in each band was determined. The median values for the polygons of each class were placed in a histogram, and from that the median and inter-quartile range (IQR) of the class medians were extracted. The median and IQR give an indication of what the typical spectral characteristics were for all the polygons in a given class. The extent to which the reflectance values of the different land cover classes in Landsat TM band 2 were separable is shown in figure 1. Whilst only band 2 is illustrated, the same trends were shown in the other bands and the NDVI. Two clear patterns were evident: individual land cover class spectral overlap and the similarity of Summary class elements, as indicated by the IQRs and medians, respectively.
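The class characterization described above can be sketched in a few lines. This is a minimal illustration, not the PVWAVE procedure used in the study: the function names are hypothetical, each polygon is assumed to be a plain list of pixel values from which edge pixels have already been removed, and the quartiles use the "inclusive" method of Python's statistics module, which may differ slightly from the convention used originally. The NDVI helper assumes the standard red and near-infrared definition (Landsat TM bands 3 and 4).

```python
from statistics import median, quantiles


def ndvi(red, nir):
    """Standard NDVI from red and near-infrared reflectance values."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def class_summary(polygons):
    """Summarize one land cover class in one band.

    polygons: list of pixel-value lists, one list per polygon
    (edge pixels assumed already excluded; at least two polygons).
    Returns the median and IQR of the per-polygon medians.
    """
    polygon_medians = [median(pixels) for pixels in polygons]
    q1, _, q3 = quantiles(polygon_medians, n=4, method="inclusive")
    return {"median": median(polygon_medians), "iqr": q3 - q1}


# Hypothetical band 2 values for five polygons of one class
example_class = [
    [9, 10, 11],
    [20, 20, 20],
    [29, 30, 35],
    [40, 40, 41],
    [50, 50, 50],
]
summary = class_summary(example_class)
```

Using the median at both levels (pixels within a polygon, then polygons within a class) keeps the summary insensitive to a few unusual pixels or polygons and makes no assumption about the distribution of values.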

Generating change area information
An area of change was identified which in 1988 formed part of an LCS88 polygon of 'Dry Heather Moorland, no rocks, no scattered trees, no muirburn'. The change area location and context are shown in figure 2 and its spectral properties in figure 3.
The knowledge acquired from the air-photograph interpreters described the typical positions of different land cover classes in different environmental gradients: slope, soil wetness, soil quality and climate (rainfall). The different component soil types were allocated 'wetness' and 'quality' scores from 1 (driest and poorest) to 5 (wettest and richest) by one of the expert soil surveyors at the Macaulay Institute, Aberdeen. The slope values were allocated slope scores of 'very steep' (>25°), 'steep' (16-25°), 'tractor accessible' (9-15°), 'gentle' (3-8°) and 'flat' (0-2°). The mean annual rainfall values were allocated wetness scores of 'very wet' (>1600 mm year⁻¹), 'wet' (1200-1600 mm year⁻¹), 'average' (1000-1199 mm year⁻¹), 'dry' (800-999 mm year⁻¹) and 'very dry' (<800 mm year⁻¹). These ranges were identified from the API knowledge acquisition exercise (§ 3.2.1). The median position of the change area in each of these environmental gradients was determined in order that its characteristics could be compared to the API expert descriptions. The median and IQR positions of the change area in six bands and a standard NDVI were extracted from 2000 Landsat ETM data to determine the band in which the change area was least variable.
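The scoring of the slope and rainfall gradients can be illustrated as below. This is a sketch under stated assumptions: the function names are hypothetical, and because the published ranges are given in whole units, fractional values falling between two ranges (for example 15.5°) are assigned here to the lower category, which is a choice made for this example rather than one stated in the text.

```python
from statistics import median


def slope_class(degrees):
    """Map a slope in degrees to the API slope categories."""
    if degrees > 25:
        return "very steep"
    if degrees >= 16:
        return "steep"
    if degrees >= 9:
        return "tractor accessible"
    if degrees >= 3:
        return "gentle"
    return "flat"


def rainfall_class(mm_per_year):
    """Map mean annual rainfall (mm per year) to the API wetness categories."""
    if mm_per_year > 1600:
        return "very wet"
    if mm_per_year >= 1200:
        return "wet"
    if mm_per_year >= 1000:
        return "average"
    if mm_per_year >= 800:
        return "dry"
    return "very dry"


def change_area_position(cell_values, classify):
    """Median position of a change area in one gradient, as a category."""
    return classify(median(cell_values))


# Hypothetical 20 m cell values for one change area
slope_position = change_area_position([4, 6, 7, 10, 12], slope_class)
rain_position = change_area_position([950, 980, 1010], rainfall_class)
```

Expressing the change area as a category in each gradient allows it to be compared directly with the interpreters' verbal descriptions of where each class is typically found.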

Outline approach
The analysis and application of knowledge in the walkthrough was partitioned into three general stages as follows:
Stage 1: Generate a large set of all possible change hypotheses (SET 1). Reduce this set to a smaller set (SET 2) by relegating some of the possible land cover change directions. This stage used expert API knowledge to identify possible transitions.
Stage 2: Compare the reflectance characteristics of the change area with those of the remaining candidate land cover classes to narrow the set of candidate hypotheses down further (SET 3). This stage uses simple analysis of change area spectral properties to identify the change area summary class. The approach described in § 3.2.2 with 1987 Landsat TM data (to establish the difficulty in identifying LCS88 land cover classes from their spectral characteristics alone) is now applied to 2000 data.
Stage 3: Apply land cover class specific knowledge to differentiate amongst the hypotheses contained in SET 3. At this stage the expert knowledge was returned to in order to determine the land cover change direction.

Results
In this section we describe how the methods described in § 3 were applied to an actual change problem. After describing the 'Walkthrough' example, a series of other results are presented in tabular form. The change area was introduced in § 3.2.3 and is LCS88 class 'Dry Heather Moorland, no rocks, no scattered trees, no muirburn'.

Walkthrough example
Stage 1: Generate a large set of all possible change hypotheses (SET 1). Reduce this set to a smaller set (SET 2) by relegating some of the possible land cover change directions.
The API expert described the class to class land cover transitions that were possible and under which scenarios. For a polygon of 'Dry Heather Moorland, no rocks, no scattered trees, no muirburn' some 66 initial change directions are possible (SET 1). The set is reduced by applying some of the API knowledge, reducing the set to 12 competing hypotheses (SET 2). The rules, and the number of hypotheses they cause to be relegated are shown in table 2.
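The relegation step in Stage 1 can be pictured as applying a sequence of expert rules to a list of candidate changes and recording how many candidates each rule removes. The sketch below is illustrative only: the data structure, the function name and the way the two rules are encoded are assumptions made for this example, the peatland candidate is a placeholder rather than an LCS88 class name, and it is not the SYMOLAC implementation.

```python
def relegate(hypotheses, rules):
    """Apply rules in order; return surviving hypotheses and per-rule counts.

    hypotheses: list of dicts describing candidate 'change to' classes.
    rules: list of (name, predicate) pairs; predicate(h) is True when
    the rule makes hypothesis h infeasible.
    """
    remaining = list(hypotheses)
    relegated = {}
    for name, is_infeasible in rules:
        kept = [h for h in remaining if not is_infeasible(h)]
        relegated[name] = len(remaining) - len(kept)
        remaining = kept
    return remaining, relegated


# Hypothetical miniature SET 1 for a polygon with no scattered trees
set1 = [
    {"name": "Undifferentiated Smooth Grassland", "summary": "Grassland", "trees": False},
    {"name": "Undifferentiated Rough Grassland", "summary": "Grassland", "trees": False},
    {"name": "Peatland class (placeholder)", "summary": "Peatland", "trees": False},
    {"name": "Smooth Grassland, scattered trees", "summary": "Grassland", "trees": True},
]

# Two of the expert rules, encoded as predicates (encoding is an assumption)
rules = [
    ("no changes to Peatland vegetation", lambda h: h["summary"] == "Peatland"),
    ("no changes in scattered tree status", lambda h: h["trees"] is not False),
]

set2, counts = relegate(set1, rules)
```

Keeping the per-rule counts mirrors the way the rules and the number of hypotheses they relegate are reported together.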
Stage 2: Compare the reflectance characteristics of the change area with those of the remaining candidate land cover classes now to narrow the set of candidate hypotheses down further (SET 3).
The spectral characteristics of the change area were extracted and then compared with LCS88 land cover populations. The lowest IQR for the change area determined the Landsat TM band in which the change area showed the greatest homogeneity. Table 3 shows that the change area was least variable in band 2, with the lowest IQR. The median positions of the remaining hypothesized land cover change directions were compared with the position of the change area in Landsat TM band 2. Those that were closest form SET 3. Closeness was arbitrarily set at half the maximum distance to avoid specifying a numeric threshold. From table 4, SET 3 contains five elements.
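The Stage 2 selection can be sketched as follows. The band with the lowest change area IQR is chosen, and candidate classes are kept when the distance between their class median and the change area median is no more than half the largest such distance. The function names and all numbers are hypothetical, and treating a distance exactly equal to the cut-off as "close" is an assumption made for this example.

```python
def least_variable_band(iqr_by_band):
    """Return the band in which the change area has the lowest IQR."""
    return min(iqr_by_band, key=iqr_by_band.get)


def closest_hypotheses(change_median, class_medians):
    """Keep candidate classes within half the maximum distance.

    class_medians: dict of candidate class name -> class median in the
    chosen band. Returns the retained class names, sorted.
    """
    distances = {
        name: abs(value - change_median)
        for name, value in class_medians.items()
    }
    cutoff = max(distances.values()) / 2
    return sorted(name for name, d in distances.items() if d <= cutoff)


# Hypothetical change area IQRs and candidate class medians
band = least_variable_band({"band1": 4.0, "band2": 2.0, "band3": 5.0})
set3 = closest_hypotheses(31, {"A": 30, "B": 33, "C": 40, "D": 50})
```

Because the cut-off is derived from the candidates themselves, no absolute reflectance threshold has to be fixed in advance, which is the motivation given in the text for the half-maximum rule.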
Stage 3: Specific land cover knowledge is applied to differentiate amongst the hypotheses contained in SET 3.
Table 5 details the biophysical evidence about the change area and the remaining five hypotheses in SET 3, including expert knowledge about the origins of each of the five candidate changes. According to the experts, the likely transitions from Heather Moorland were to Undifferentiated Rough Grassland and Undifferentiated Smooth Grassland. Of these, the change hypothesis with the most support from all the different sources of evidence and land cover knowledge was a change to 'Undifferentiated Smooth Grassland: no rocks, no scattered trees'. Although formal methods for combining such evidence exist (for instance Dempster-Shafer, Bayesian Probabilities, Endorsement Theory), these are not within the scope of this work and are presented elsewhere (see Comber et al., in press).
Table 2 (partial). Rules used to relegate hypotheses: 'There will be no changes to Peatland vegetation'; 'Changes to bracken and agriculture are from adjacent areas'; 'No changes in scattered tree status in 20 years'; 'Forestry will not be planted and felled in <30 years'. Numbers of hypotheses relegated, in extracted order: 18, 7, 6, 6, 1.

Validation by field visit
A field visit to the change area was undertaken in June 2001 and the change area was photographed. The photographs were examined in the laboratory by an expert (familiar with the area, field mapping and LCS88 land cover classes) to identify the species composition and land cover present. The photographs of the change area are presented in figures 4(a) to (e). The ecologist considered the change area to have been over-burned in 1997 or 1998, both in terms of intensity and in too concentrated an area. As a result the heath is regenerating very slowly, and there is a grassier flush than would normally be expected in a post-burn environment of this age. What these images show is the extent to which the classic dwarf shrub heath found in Dry Heather Moorland (Calluna vulgaris) has been knocked out by the burn. In the ecologist's opinion the land cover of this area has changed to the single feature class of 'Undifferentiated Smooth Grassland: no rocks, no scattered trees' and may eventually go back to 'Dry Heather Moorland, no rocks, no scattered trees, no muirburn' provided that it is not overstocked.

Other results
The results of three further examples are described in table 6. In each case land cover knowledge is successfully applied to augment simple remote sensing analyses and identify land cover change direction.
Table 6 (partial): The change direction was correctly identified as one of the two hypotheses. The change was due to mis-management over two different land cover parcels.

Discussion

Discussion of results
There are two general problems with the approach. First, despite identifying the 'correct' land cover change direction, it is always possible that due to socio-cultural norms the land use gets mapped, not the land cover. All land cover classifications confuse the differences in ontology between land cover and land use. So in the walkthrough described in § 4.1, for instance, it is possible that the area of change may get re-mapped as moorland with muirburn, rather than the actual cover present on the ground. Secondly, land management has caused all of the changes considered in the walkthroughs. Whilst this has long been recognized, it presents problems when seeking to discern subtle shifts between semi-natural land cover classes. Dramatic changes in management, by design or by error, are by their nature difficult to predict and model. Yet despite these problems, the approach has shown that it is possible to separate spectrally similar land cover classes (for instance grassland) by applying some general common sense and some land cover class-specific ecological knowledge.

Remote sensing issues
Since satellite imagery first became available in the 1970s there have been considerable developments: increased data availability, more sensors, many resolutions and variable frequencies in the electromagnetic spectrum. However, these have not been matched by developments in image processing. Analyses remain specific to the data, the application or the area under investigation, with the result that algorithms developed for one application will produce different results with another image scene. This is part of the land cover mapping paradox that is avoided by the remote sensing community: land cover features identified in remote sensing analyses are commonly described in terms of their botanical, floristic, ecological, biogeographical or other biological characteristics, yet they are defined from the image data on their reflectance characteristics alone. The reason for this is the primacy given to the remotely sensed data itself. In this work we have taken steps to address this paradox by focusing on how best to achieve the aims of the analysis. This task oriented approach to land cover mapping, as introduced by Skelsey (1997), considers remotely sensed data as only one of a number of useful datasets to be used to solve the change direction problem. In one sense this is already implicitly acknowledged by many land cover mapping exercises that define the classes they identify in terms of their biology.

Generic solutions
The methodology presented here for determining change directions to LCS88 can be readily adapted to other baseline land cover surveys given some signal of change, an expert familiar with the land cover concepts and some environmental data. The stages in this are: (a) identify the land cover transition pairs that are possible; (b) identify the defining biogeographical characteristics of the land cover classes; (c) elicit some simple rules from experts that are familiar with the data and map concepts to eliminate some of the transitions; (d) use simple remote sensing analyses to characterize the land cover classes (we used medians and inter-quartile ranges) and to identify the general land cover change direction (in this case at a summary class level), eliminating some further candidate change directions; (e) compare the change area characteristics with those of the remaining possible change directions.
These steps make no assumptions about any underlying distributions of the data. The results presented here have been successfully implemented inside SYMOLAC, an automated land cover monitoring system developed by Skelsey (1997) and extended by Comber (2002).

Conclusions
The main finding of this work is that the integrated approach, combining API expert knowledge with simple reflectance characterizations of land cover classes from satellite imagery, allows ecologically and spectrally subtle shifts in land cover type to be identified. This method produces a solution that is both inexpensive and at a fine degree of thematic land cover detail, thereby maximizing the advantages of both types of mapping approach. It also suggests that more meaningful environmental monitoring is possible than current estimations of gross land cover stocks such as 'forest' and 'rangeland'. Of perhaps wider significance are, first, the applicability of this approach to automated and semi-automated land cover monitoring exercises and, secondly, the preservation of the value of original baselines such as the Land Cover of Scotland 1988 survey, whose value would otherwise be lost because they cannot be repeated.