An environmental assessment of land cover and land use change in Central Siberia using quantified conceptual overlaps to reconcile inconsistent data sets

Environmental monitoring and assessment frequently require remote sensing techniques to be deployed. The production of higher level spatial data sets from remote sensing has often been driven by short-term funding constraints and specific information requirements by the funding agencies. As a result, a wide variety of historic data sets exist that were generated using different atmospheric correction methods, classification algorithms, class labelling systems, training sites, map projections, input data and spatial resolutions. Because technology, science and policy objectives are continuously changing, repeated natural resource inventories rarely employ the same methods as in previous surveys and often use class definitions that are inconsistent with earlier data sets (Comber, Fisher and Wadsworth 2003). Since it is generally not economically feasible to recreate these historic land cover/land use data sets, often inconsistent data sets have to be compared. An environmental assessment of land cover and land use change in Central Siberia is presented. It utilises several different digital land cover maps generated from satellite data acquired in different years. The specific characteristics of different land cover maps create difficulties in interpreting change maps as either land cover/land use change or a pure data inconsistency. Many studies do not explicitly deal with these inconsistencies. It is argued that a rigorous treatment of multi-temporal data sets must include an explicit map of consistency between the multi-temporal land cover maps. A method utilising aspects of quantified conceptual overlaps (Ahlqvist 2004) and semantic-statistical approaches (Comber, Fisher and Wadsworth 2004a,b) is presented. The method is applied to reconcile three independent land cover maps of Siberia, which differ in the number and types of classes, spatial resolution, acquisition date, sensor used and purpose. 
A map of inconsistency scores is presented that identifies areas of most likely land cover change based on the maximum inconsistency between the maps. The method of quantified conceptual overlaps was used to identify regions where further investigations on the causes of the observed inconsistencies seem warranted. The method highlights the value of assessing change between inconsistent spatial data sets, provided that the inconsistency is adequately considered.


Introduction
A common problem in the environmental sciences is that a new survey frequently creates a new 'baseline' rather than faithfully repeating an established procedure. There are good reasons why this happens: scientific knowledge advances, policy objectives change and technology develops. Unfortunately, these developments can make it difficult to disentangle changes in the phenomena from changes in their representation in the data sets. Of course, for some subjects, e.g. studies at geological time scales, we can be sure that our understanding of the phenomena is changing faster than the phenomena themselves. With geological maps of the same area we know that the differences represent changes in knowledge, technology and objectives, not that the geology itself has changed (with very few exceptions). For phenomena such as land cover and land use, we are less certain whether an observed difference is 'real' (the land cover or land use has changed) or whether it is a different representation of the same entity.
In the past this problem was less acute because a given map was used to support a description of the phenomena; libraries are full of monographs running to hundreds of pages that describe the issues, the methods and the implications behind a particular survey. In these monographs the maps are usually attached at the back of the book and are used in conjunction with the report. With the development of GIS, web-based mapping and the Internet, the 'map' may be the only information that is available; the monograph may not be available to the user and may not even have been produced (Fisher 2003). Not only has the 'balance of power' between the graphical (map) and the textual (monograph) changed, but crucially the user may wrongly treat the map as measured data and not as information (an interpretation of the measurements). In the case of remote sensing data, the direct measurements are top-of-atmosphere radiances at different wavelengths, and their interpretation could be a classified land cover map.
The implication of not having the accompanying 'monograph' to a data set is that the user may assume that their 'mental image' of the world prompted by the class labels exactly matches that of the data producer. Everyone is familiar with land cover and land use, and hence there is an inbuilt assumption that the common terms or labels, such as forest, wetland and grassland, relate to a common physical reality. For example, if you say the word 'forest' to someone, that word will generate an image in their mind and they will have a clear idea of what they think you are talking about, but there is no guarantee that their image and your image are the same. Figure 1 shows some of the differences between official definitions of forest used by different countries (Gyde Lund 2005). In fact the situation is even more complex than indicated by Figure 1: some countries include land where there is an intention to plant trees (for example, the UK); other countries exclude land where the trees are not growing fast enough (e.g. Eire). Similarly, neighbouring countries can differ as to whether or not to include bamboo and palms as 'trees' for the purpose of defining a 'forest'.
The objective of this article is to introduce the approach and demonstrate the application of 'quantified conceptual overlaps' to a real study of land cover change in Siberia using inconsistent land cover data that were generated by different producers and using different class definitions. The general and widespread issue of reconciling inconsistent data sets is discussed.

Remote sensing data
All the presented maps are derived from satellite remote sensing and none of them has a significant amount of local validation. Land cover data are available from two global data products and one regional study. The International Geosphere-Biosphere Program (IGBP) data set is a global classification of 1 km Advanced Very High Resolution Radiometer (AVHRR) data collected between April 1992 and March 1993, with 17 classes in the study area. Global Land Cover 2000 (GLC 2000) is a global classification based on 1 km SPOT-VEGETATION (Satellite Pour l'Observation de la Terre), AVHRR and other data collected between November 1999 and December 2000, with 29 classes. SUC (University of Wales at Swansea) is based on Moderate Resolution Imaging Spectroradiometer (MODIS) 500 m imagery with 16 classes; two maps were available, corresponding to data acquisition dates in 2003 and 2004. Figure 2 shows a small area about 200 km across, centred on 55°20′N 92°20′E, in the Siberian data sets. Arbitrary colours are used because the intention is to illustrate and emphasise that, despite the different class systems used, some landscape features are apparent (some very high resolution images and a few photographs of these regions are available through Google Earth). There are some features that are clearly differentiated in some classification systems but not in others. Our approach is to try to maximise the use of the inconsistent information available from these different remote sensing surveys.

A novel approach to inconsistent data
The most common approach to inconsistent data is to thematically aggregate each data set into a reduced number of common classes. This relies on the often implicit assumption that the aggregations are significantly more similar than the initial classes and that each individual class of each data set falls naturally and discretely into an aggregate class. An extreme example is Wulder, Boots, Seemann and White (2004), who reduced the land cover of Canada to just two classes: 'forest' and 'not forest'. Even in cases where the aggregation assumption is warranted, aggregating classes reduces the thematic content of the data and hence the types of changes and comparisons that can be made. A less extreme example of aggregation is shown in Figure 3 (Flety, George and Balzter 2004), where two land cover schemes (the IGBP and GLC 2000 classifications) are aggregated into eight 'super' classes.
The aggregation approach is based on expert opinion and it can be seen that in this case for example, the expert considers that the 'forest/cropland complex' should contribute to the 'forest' class and not to the cropland class. Under any aggregation scheme a class can contribute to only one 'super' class. Although not strictly mandatory it is typical in an aggregation approach to degrade the spatial resolution of the information to the coarsest spatial scale.
To address the lack of consistency and comparability and to overcome the problems with aggregation, the FAO has developed a land cover classification system (LCCS) (Di Gregorio and Jansen 2000). LCCS describes each land cover class in terms of the biophysical properties of that class in a hierarchical tree structure. Studies adhering to this system will be simpler to compare with other LCCS-compatible projects, although there will still be issues of how similar two classes defined in the LCCS are to each other. However, a whole range of historical data sets still exists that could be analysed if a method for dealing with uncertainty and inconsistency were available.
Recently an alternative approach to aggregation has been proposed in a series of papers (Comber et al. 2004a,b) and termed a combined semantic-statistical procedure. In essence this extends the aggregation approach in two ways: from 'many-to-one' relationships to 'many-to-many' relationships, and from 'yes/no' to 'yes/don't know/no'. Expert opinion is sought on the relationship between individual classes in each data set; the expert expresses the relationship between all possible pairings as being 'expected' (+1 or yes), 'unexpected' (-1 or no) or 'uncertain' (0 or don't know). By 'expected', we mean that the presence of that class increases our confidence in a particular attribution or that the classes are very similar; by 'unexpected', we mean that it decreases our confidence or that they are very different; and by 'uncertain', we mean that it neither increases nor decreases our confidence in an attribution, or that the expert is not certain how similar the classes are. These relationships are set out in a series of look-up tables (LUT). One LUT is required between each pair of data sets; a LUT is also required to specify the similarity between classes within each data set (this LUT captures information on the ability to separate classes in theory and in practice). Table 1 shows part of a LUT from one expert comparing the SUC and the IGBP classifications. It can be seen that in the area of interest there are three cases where there is an exact one-to-one relationship: 'evergreen needle-leaf forest', 'deciduous broadleaf forest' and 'deciduous needle-leaf forest'. This is indicated by a single 'expected' (+1) relationship (note that only the relevant part of the LUT is shown). In the case of the IGBP 'mixed forest' class, the expert believes that it has a positive relationship with several of the SUC classes; conversely, the expert attributes no more than an uncertain relationship from the IGBP 'closed shrublands' class to any of the SUC classes.
All commonly used algorithms for classifying satellite remote sensing data estimate how similar each pixel is spectrally to each possible end point (class). This distance between the attributes of the pixel and the exemplary classes can be considered as a measure of the quality of the classification at that pixel. If the pixel is very similar to a desired class, it has a higher quality than if it is equidistant between two or more alternative class clusters in the spectral feature space. Although these distance or quality measures have to be calculated, they are very rarely provided to the user; the typical data producer provides only a global rather than spatially explicit estimate of quality. If the landscape can be thought of as a series of patches (objects, segments, polygons), each consisting of several pixels, then the diversity of pixels within the patch can be used as a surrogate for the consistency that is calculated by the producer but not given to the user. It is this diversity of pixels within the segment that is used with the LUT to estimate the extent to which the occurrence of classes in one data set is consistent with the distribution of classes from a second data set. The semantic-statistical approach can be considered in relation to a hypothetical segment shown in Figure 4 together with the associated LUT. Note that we are explicitly saying that within a classification some classes are more closely related to certain classes than to others and that the landscape can be characterised as a series of objects, each larger than a single pixel, which are represented by the segment structure.
The segment contains four different classes (types A-D); for each possible class we can calculate the expected, unexpected and uncertain scores from the LUT. For example, for class A the expected score is 18, the uncertain score is 7 (four class B pixels plus three class C pixels) and the unexpected score is 1 (the single pixel of class D). For class B the expected score is 4, the uncertain score is 21 (18 class A pixels plus three class C pixels) and the unexpected score is 1 (the single class D pixel); and so on for all other classes. These scores can be converted from a count to a proportion by dividing by the total number of pixels (26 in this segment).
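The scoring step described above can be sketched in a few lines of Python. This is only an illustration: the look-up table rows below are hypothetical stand-ins for the LUT in Figure 4, filled in so that they reproduce the counts quoted in the text for classes A and B, and the function names are not taken from the original work.

```python
from collections import Counter

# Hypothetical LUT rows (assumed, not copied from Figure 4):
# +1 = expected, 0 = uncertain, -1 = unexpected.
LUT = {
    "A": {"A": 1, "B": 0, "C": 0, "D": -1},
    "B": {"A": 0, "B": 1, "C": 0, "D": -1},
}


def segment_scores(pixel_counts, lut_row):
    """Return (expected, uncertain, unexpected) pixel counts for one candidate class."""
    scores = Counter()
    for cls, n_pixels in pixel_counts.items():
        scores[lut_row[cls]] += n_pixels
    return scores[1], scores[0], scores[-1]


def as_proportions(scores):
    """Convert counts to proportions of the segment."""
    total = sum(scores)
    return tuple(s / total for s in scores)


# Pixel counts per class in the hypothetical segment (26 pixels in total).
segment = {"A": 18, "B": 4, "C": 3, "D": 1}

scores_a = segment_scores(segment, LUT["A"])
scores_b = segment_scores(segment, LUT["B"])
belief_a, uncertainty_a, disbelief_a = as_proportions(scores_a)
```

Dividing the class A counts by the 26 pixels in the segment gives the proportion 18/26, which is the 0.692 used as the starting belief in the combination step.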
Suppose that we have a second map that uses similar but different classes, like those shown in Figure 5. Using the LUT between the first and second data set we can calculate the expected, unexpected and uncertain scores that the segment is class A from the second data set without having to aggregate any of the classes. In this case the expected score is 19 (class X), the uncertain score is 5 (class Z) and the unexpected score is 2 (class Y).
In cases where it is sensible to interpret proportions as probabilities, there are a number of techniques to combine the scores to determine whether the new evidence from the second map has increased or decreased our belief (the probability) in any particular attribution. As we have an explicit representation of uncertainty, the formalism of Dempster-Shafer (Shafer 1976) seems a natural choice. Dempster-Shafer can be considered as an extension of Bayesian statistics. It extends Bayes by introducing the concept of 'plausibility', which is belief plus uncertainty (where belief + uncertainty + disbelief = 1). What this means is that, unlike other schemes, a weak belief in a proposition does not have to imply a strong belief in its negation, nor is there a requirement to distribute the strong disbelief between specific alternatives.
Using the formalism developed by Tangestani and Moore (2002), combining beliefs can be carried out numerically using two equations. In our numerical example from the Siberia case study, Bel1 and Bel2 are the beliefs from the inconsistent land cover maps 1 and 2, Unc1 and Unc2 the uncertainties and Dis1 and Dis2 the disbeliefs. First, frequencies of scores are counted within a segment. Second, converting the score frequencies to proportions and interpreting them probabilistically as beliefs, uncertainties and disbeliefs, we have for the 'hypothesis' that the segment is class A: Bel1 = 18/26 = 0.692, Unc1 = 7/26 = 0.269 and Dis1 = 1/26 = 0.038 from the first map, and Bel2 = 19/26 = 0.731, Unc2 = 5/26 = 0.192 and Dis2 = 2/26 = 0.077 from the second. In this case our belief has increased (from 0.692 to 0.901) with the addition of the extra information; therefore, we consider that this segment is consistent.
This method is very effective in identifying consistent and inconsistent land parcels. In a test site in the UK (with two different land cover maps) (Comber et al. 2004a) it was estimated that only 2% of the consistent parcels had in fact changed; that is, 98% of the consistent parcels really were consistent. However, approximately 80% of the inconsistencies were caused by error (misclassification) rather than by real land cover change.
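The combination step can be illustrated with a short Python sketch. The equations themselves are not reproduced in the text, so the form used here is an assumption: the usual Dempster-Shafer rule for a single hypothesis, Bel = (Bel1 Bel2 + Unc1 Bel2 + Unc2 Bel1) / β with normalising factor β = 1 - Bel1 Dis2 - Bel2 Dis1. The function name is hypothetical. The inputs are the pixel counts of the worked example (18, 7, 1 and 19, 5, 2 out of 26 pixels), and the consistency test follows the definition used later in the article (belief increases or stays the same).

```python
def combine_beliefs(bel1, unc1, dis1, bel2, unc2, dis2):
    """Combine two (belief, uncertainty, disbelief) triples for one hypothesis.

    Assumed form of the combination rule; beta removes the mass assigned to
    directly conflicting evidence (belief in one source, disbelief in the other).
    """
    beta = 1.0 - bel1 * dis2 - bel2 * dis1
    if beta <= 0.0:
        raise ValueError("sources are in total conflict; combination is undefined")
    return (bel1 * bel2 + unc1 * bel2 + unc2 * bel1) / beta


n_pixels = 26
bel1, unc1, dis1 = 18 / n_pixels, 7 / n_pixels, 1 / n_pixels  # first map
bel2, unc2, dis2 = 19 / n_pixels, 5 / n_pixels, 2 / n_pixels  # second map

combined = combine_beliefs(bel1, unc1, dis1, bel2, unc2, dis2)

# A segment is treated as consistent if the belief does not decrease.
consistent = combined >= bel1
```

Under this assumed form the combined belief comes out at roughly 0.91, an increase over the initial 0.692, so the segment is flagged as consistent. The third decimal place may differ from the value quoted in the text depending on rounding or on the exact form of the equations used by the authors.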

Problems with the semantic-statistical approach
There are two problems with the semantic-statistical approach: the number of decisions that need to be made and the subtlety of the decisions. The first problem is that the method relies on the expert being able to consistently compare all possible pairs of classes from all data sets; as the number of data sets increases, the expert is required to understand the nuances between more and more categories. In the case of the presented Siberia study with three data sets having 16, 17 and 29 classes respectively, the expert is required to determine over 3500 relationships. Not only is this a large number of decisions, but the expert also needs to maintain consistency across and between classifications. In practice this is difficult if not impossible to achieve. The tendency is towards iterative approaches where the data start to influence the interpretation of meaning (semantics). The second problem is that the allocations are still very 'crisp', i.e. a class cannot partially support another class. Pairs of class definitions are either fully consistent, fully uncertain or fully inconsistent. Going back to the fragment shown in Figure 2, the IGBP 'closed shrubland' class is some type of woody vegetation, but the only choice the expert has is to say it fully supports a forest class attribution or that it does not provide any support but does not contradict the attribution either. If more gradations of relationship are added then it might be more realistic, but the expert is in even more difficulty in maintaining consistency. The large number of comparisons that the expert is required to make in the semantic-statistical approach is inconvenient and potentially error-prone. One of the reasons why it is difficult to maintain consistency across the decisions is that the choices are 'opaque'. It is not possible for an external scientist or data user to reconstruct why an expert decided that a class relationship was fully consistent rather than fully inconsistent or uncertain. 
In comparisons between experts (Comber, Fisher and Wadsworth 2005), some differences can be inferred from their relative familiarity with how the data were produced and their experience of using it within a particular domain, but this does not help reconstruct the basic assumptions and knowledge leading to the individual decisions.
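As a rough check on the scale of the expert's task, the number of class-to-class decisions can be counted directly. The sketch below shows one way to arrive at a figure of the size quoted above; it assumes that relationships are directional and that a look-up table is also needed within each data set, so every ordered pair of data sets (including a data set with itself) contributes one table. The variable names are illustrative only.

```python
from itertools import product

# Number of classes per data set, as given in the text.
class_counts = {"SUC": 16, "IGBP": 17, "GLC 2000": 29}

# One look-up table per ordered pair of data sets, including within-data-set tables.
lut_sizes = {
    (a, b): class_counts[a] * class_counts[b]
    for a, b in product(class_counts, repeat=2)
}

total_relationships = sum(lut_sizes.values())
```

With 62 classes in total this gives 62 × 62 = 3844 entries, in line with the 'over 3500' relationships mentioned above. Adding a fourth data set with a similar number of classes would add well over a thousand further decisions.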
A method has been sought to disaggregate the experts' decisions so that it becomes possible to 'audit' the decisions and understand why a particular choice was made. Ahlqvist (2004) discusses a number of different algorithms and weighting schemes to quantify the conceptual overlap and distance/similarity between land cover classes. Experience with the semantic-statistical approach shows that at least some experts consider the relationship between some land cover classes to be asymmetric; therefore, measures of overlap (which can be asymmetric) are preferable to measures of distance/similarity (which must be symmetric). The degree of overlap can be 'mapped' onto the expected, uncertain and unexpected classes used in the semantic-statistical approach or it can be used directly to estimate a measure of consistency.
Using six classes from a widely used land use/land cover classification scheme (Anderson, Hardy, Roach and Witmer 1976), Ahlqvist (2004) presents the degree of overlap and distance between the classes using four 'approximation spaces' or 'domains', namely: intensity of use, food and fibre production, crown closure and tree species (actually a binary choice between deciduous and evergreen). Each class can be assessed more or less independently within each domain; complete independence is not possible for a qualitative domain like 'intensity of use' because the allocated value is relative to the other classes. Measures from Bouchon-Meunier, Rifqi and Bothorel (1996) can be applied to both continuous domains (Equation (7)) and to non-ordered qualitative domains (Equation (8)).

\[ O_p(A, B) = \frac{\int \min\left(f_{pA}(x), f_{pB}(x)\right)\,dx}{\int f_{pB}(x)\,dx} \qquad (7) \]
\[ O_p(A, B) = \frac{\left|p_A \cap p_B\right|}{\left|p_B\right|} \qquad (8) \]
where f_{pA}(x) and f_{pB}(x) represent the values of concepts (classes) A and B at location x in domain p, and p_A and p_B are the properties of concepts A and B in domain p. The overlap metric can vary from 0 (no overlap) to 1 (class B is a subset of A). Classes will overlap to a different degree in each of the domains. Once the overlap has been calculated for each domain, the total overlap can be calculated as the weighted sum of all the overlaps. For example, 'deciduous broadleaf forest' is a subset of 'forested land'; therefore, the overlap between deciduous broadleaf forest and forested land is 1. On the other hand, forested land contains several different types of forest; therefore, the overlap in the reverse direction is less than 1. Equal 'salience weights' were used by Ahlqvist (2004) to generate an average overlap across all the domains; here we take the inverse of the breadth occupied within the domain as a measure of the diagnostic power.
For the Siberian data presented here, five domains were selected: photosynthetic activity/biomass accumulation; wetness; human disturbance; seasonality/phenology; and vegetation height.
In each domain, the possible range of values was divided into 10 classes and treated as if they were qualitative (i.e. Equation (8) is used). Within the domain an individual class is categorised as being present (1) or absent (0) from a particular division. One advantage of this method is that it allows an explicit representation of why the expert considers certain classes to be similar. Table 2 illustrates part of the table showing the relationship between classes in the 'human disturbance' domain.
Classes from the GLC data are prefixed 'g' (g27, etc.), from the IGBP with 'i' (i13, etc.) and the SUC classes with 's' (s3, etc.). From Table 2, it can be seen that in this domain the expert is assuming that the 'barren ground' and 'bare soil/rock' classes are a natural climatic feature and not the result of human disturbance (as they would be in some environments). The expert also believes that the various cropland type classes overlap with urban and grassland type classes, but that there is no overlap between urban/built classes and grassland classes. Using Equation (8) on this domain the class conceptual overlaps can be quantified as overlap (croplands, urban) = 1.0, that is urban is a subset of croplands; overlap (urban, croplands) = 0.5; overlap (cropland, grass) = 0.67; overlap (grass, cropland) = 0.5; overlap (urban, grassland) = 0.0; that is, they are disjoint concepts in this domain.
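The overlaps quoted above for the 'human disturbance' domain can be reproduced with a small Python sketch. Two things are assumed here. First, Equation (8) is taken to have the form |p_A ∩ p_B| / |p_B|, which matches the statement that the overlap is 1 when class B is a subset of class A. Second, the allocation of each class to the ten divisions is hypothetical (Table 2 is not reproduced here) and was chosen only so that the resulting values agree with those in the text.

```python
def overlap(a_divisions, b_divisions):
    """Overlap of class A with class B in one qualitative domain: |A & B| / |B|."""
    a, b = set(a_divisions), set(b_divisions)
    if not b:
        raise ValueError("class B occupies no divisions in this domain")
    return len(a & b) / len(b)


# Hypothetical allocations to the 10 divisions of the 'human disturbance' domain
# (1 = least disturbed, 10 = most disturbed); not the values from Table 2.
grass = {2, 3, 4}
croplands = {3, 4, 5, 6}
urban = {5, 6}

results = {
    ("croplands", "urban"): overlap(croplands, urban),
    ("urban", "croplands"): overlap(urban, croplands),
    ("croplands", "grass"): overlap(croplands, grass),
    ("grass", "croplands"): overlap(grass, croplands),
    ("urban", "grass"): overlap(urban, grass),
}
```

The measure is asymmetric by construction because the denominator depends only on the second class, which is why overlap (croplands, urban) and overlap (urban, croplands) differ.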
The overlap between all classes is calculated for all domains and then a 'weighted average' overlap is calculated. Weights for each class in each domain were derived from the specificity of the allocation; the broader the range of values in a particular domain, the lower the weight given to that domain. It could be possible to use the degree of overlap directly as an indication of the belief that two classes are consistent. However, using the overlap directly tends to generate an unrealistically high level of consistency between the maps; this is especially evident in the forest classes, where the subtlety of the distinctions made in all three systems (SUC, GLC and IGBP) is not adequately reflected in the five 'domains' selected to describe the totality of the variation in land cover. Using the overlaps directly leads to similar results to an aggressive aggregation approach. The overlaps were thus translated into 'expected', 'uncertain' and 'unexpected' using the arbitrarily selected thresholds ≥0.9 for 'expected', 0.6-0.9 for 'uncertain' and <0.6 for 'unexpected'. These thresholds were chosen so that the proportions of expected, uncertain and unexpected were similar to the proportions that experts use.
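A minimal Python sketch of the weighting and thresholding steps follows. Several details are assumptions rather than the authors' stated procedure: the per-domain overlap is taken as |p_A ∩ p_B| / |p_B|, the weight of a domain is taken as the inverse of the number of divisions occupied by class B, the weights are normalised to sum to one, and the 0.9 and 0.6 thresholds are treated as inclusive lower bounds. The class allocations and function names are hypothetical.

```python
def overlap(a, b):
    """Per-domain overlap of class A with class B (assumed form): |A & B| / |B|."""
    return len(a & b) / len(b)


def weighted_overlap(class_a, class_b):
    """Weighted average overlap across domains; narrow allocations of B weigh more."""
    weights = {domain: 1.0 / len(divs) for domain, divs in class_b.items()}
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weights[domain] * overlap(class_a[domain], class_b[domain])
        for domain in class_b
    )
    return weighted_sum / total_weight


def to_relationship(value, expected_min=0.9, uncertain_min=0.6):
    """Map an overlap to +1 (expected), 0 (uncertain) or -1 (unexpected)."""
    if value >= expected_min:
        return 1
    if value >= uncertain_min:
        return 0
    return -1


# Hypothetical allocations of two classes to divisions (1-10) in two domains.
class_a = {"vegetation height": {8, 9}, "wetness": {3, 4, 5, 6}}
class_b = {"vegetation height": {9}, "wetness": {5, 6, 7}}

a_with_b = weighted_overlap(class_a, class_b)
b_with_a = weighted_overlap(class_b, class_a)
relationship_ab = to_relationship(a_with_b)
relationship_ba = to_relationship(b_with_a)
```

In this example the narrowly defined vegetation height of class B dominates the first comparison, which is classed as 'expected', while the reverse comparison falls below 0.6 and is classed as 'unexpected'. This illustrates how the translated relationships can remain asymmetric.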

Results
The degree of overlap calculated using Equations (7) and (8) resulted in patterns of conceptual overlap similar to those produced by expert opinion following the Comber et al. (2004a,b) semantic-statistical methodology. Figure 6 shows a dendrogram based on the weighted average overlap values. Some apparent idiosyncrasies are evident; for example, two 'wetlands' classes are more closely related to 'grasslands' than to 'bogs', which are themselves closely related to 'tundra' classes. It is evident from the values entered in the domains that the expert visualises the wetlands as tall, lush fens at lower latitude, and the 'palsa bogs' and 'sedge tundra' as high latitude, sparse, stressed communities. The most distinct classes, with the lowest average overlap and the largest number of zero overlaps with any other classes, are found to be 'permanent snow and ice' (GLC class 25) and 'snow & ice' (IGBP class 15). These have an average overlap of 0.16 with all other classes and they have no overlap at all with half of the other classes. The next most distinct class is 'salt pans' (GLC class 29) with an average overlap of 0.24. The least distinct class is 'recent burns' (GLC class 19).
Eight test sites were selected corresponding to regions where Landsat data were available. As the Landsat data were unclassified, a visual assessment of the land cover in those regions was carried out. The eight test sites cover a north-south range of just under 1000 km, spanning environments from the 'pure' tundra in the far north through the forest/taiga and dense forest to the (slightly) more settled southern boundary.
To apply and test the proposed method, a single expert was asked to generate a LUT between the SUC, GLC and IGBP data sets. The consistency between the SUC04 land cover and the SUC03, GLC and IGBP data is shown in Table 3. Note that consistency is defined as parcels where the belief increases or remains the same after the addition of the extra information. This is a rather strict definition of consistency. Tables 4 and 5 summarise the main causes of the major inconsistencies between the SUC04 map and the GLC and IGBP data for each test site. A significant discrepancy is considered to be 500 km² or 5% of a study site. The 'dominant' class is defined as the most frequent class in the patch concerned. Because of the definition of consistency, it is possible to have a patch where the most frequent classes in the two data sets have a very high degree of overlap but where the patch is still considered to be inconsistent. This is usually the result of the patch being very heterogeneous in the earlier data set; for example, the discrepancy in five 'croplands' to 'croplands' patches in site 134023. The agreement between the SUC03 and SUC04 data sets is very high; as they are produced by the same team using the same algorithm with data from the same satellite sensor (MODIS) and the data are only 1 year apart, this is perhaps to be expected. Apparent disagreements between SUC03 and SUC04 are concentrated in areas where the cropland or cropland/forest complex has become more homogeneous over time.
Agreement and disagreement with the GLC and IGBP data indicate some general patterns. In the heavily forested mid-latitudes, which are dominated by evergreen needle-leaf forests, there is good agreement between all the data sets; the IGBP shows a little less consistency as it allocated more land to the 'mixed forest' class than either of the other data sets. In the south, the SUC 'cropland/forest complex' is mapped as a mixture of 'forest' and 'open' classes by GLC and as 'mixed forest' by IGBP. In the north there is confusion between 'tundra' and 'deciduous needle-leaf forest' at one site in the GLC and between 'tundra' and 'woody savannah' in the IGBP data sets. Several hypotheses can be suggested for the inconsistencies in the south of the study region: (1) there is more intensive exploitation of the landscape over time; (2) the landscape characterisation may be a function of the resolution as well as the 'definition' of the classes and the thresholds used; (3) inadequacies of the expert opinion, especially over the extent of forest heterogeneity, may confound the intercomparison.
Inconsistencies in the north of the region might represent a northward migration of the forest-tundra ecotone that has been reported for some boreal/sub-arctic regions as a result of increasing temperatures. As the data presented here only span a period of just over a decade, these inconsistencies are thought more likely to represent differences in how that ecotone is characterised in the different data sets. In particular, it is unclear how thin and scattered the trees are before the area appears as tundra and the extent to which the tree canopy influences reflectance with data at 500 m and 1 km resolution.

Conclusions
The standard way to manage inconsistent data is to aggregate. Aggregation is wasteful of information and, because the assumptions underlying the aggregation are rarely tested, there is no guarantee that the aggregation generates a more meaningful interpretation. If it is accepted that the conceptual overlap can be quantified, then aggregating classes must increase the range of any domain they occupy and hence potentially increase the conceptual overlap with other classes. The weighting system described above emphasises domains where the concept has a narrow range, therefore reducing some of the impact. Alternatively, the idea of a quantified conceptual overlap could be used to determine the optimal aggregation to any number of classes. Generally, data aggregation is a subjective process driven by expert opinion, but how this opinion was used is not always documented well. The semantic-overlap method appears successful in identifying inconsistent land parcels. In most cases, the inconsistency is more likely caused by misclassification in one or more of the maps than by a real land cover change. Applying this methodology to all four land cover maps means that parcels identified as being consistent are highly unlikely to have changed. When more than two data sets have to be reconciled, the semantic-overlap method reduces some of the complexity facing the expert and, by disaggregating their choices, makes it slightly easier to identify and describe their conception of what the classes mean.
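The point that aggregation widens the range a class occupies, and so can raise its overlap with other classes, can be illustrated with a short Python sketch. The overlap is again assumed to take the form |p_A ∩ p_B| / |p_B|, and the three classes and their allocations to a single ten-division domain are hypothetical.

```python
def overlap(a, b):
    """Overlap of class A with class B in one qualitative domain (assumed form)."""
    return len(a & b) / len(b)


# Hypothetical allocations in a single 10-division domain.
evergreen_forest = {8, 9}
deciduous_forest = {6, 7}
shrubland = {5, 6}

# Aggregating the two forest classes takes the union of the divisions they occupy.
aggregated_forest = evergreen_forest | deciduous_forest

before = overlap(evergreen_forest, shrubland)
after = overlap(aggregated_forest, shrubland)
```

Here the evergreen forest class has no overlap with shrubland, but the aggregated forest class overlaps half of the shrubland range. With the aggregated class in the first position the intersection can only grow or stay the same, so the overlap never decreases under aggregation in this form.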
Data producers routinely know much more about their data than they are willing to communicate to the user. Metadata standards do little to encourage the producer to communicate the meaning of class labels, how and why particular classes were chosen or the quality of the classified map in a spatially explicit way, e.g. through a spatial accuracy assessment. The semantic-statistical method is efficient at distinguishing between consistent and inconsistent parcels; over a short time period, misclassification is likely to be a more important component of inconsistency than change but the method does allow the user to produce a refinement of where change is possible and where it is unlikely. As the number of data sets and classes increases the semantic-statistical method becomes increasingly difficult to apply as the number of decisions increases rapidly with no relaxation in the need to maintain consistency. Simplifying the semantic-statistical method by adopting the ideas of conceptual overlap allows the decisions to be disaggregated into smaller, simpler choices that are also more explicit. Disaggregating the decision process makes it much easier for the expert to review the consistency of their decisions and for others to understand the choices that were made. It is not certain that the five domains used in this experiment are optimal. If an optimal set could be determined and tested at other locations, it may be possible to encourage data producers to communicate more of what they know about the data by describing their classes in terms of agreed universal domains.
The case study of Siberia that was discussed in this article has shown that quantified conceptual overlaps can provide a traceable method to overlay and analyse inconsistent spatial databases. It has demonstrated how experts inform the process and make their decisions transparent in the formalisation of inconsistencies between different classification keys.
Data inconsistency has become an increasing problem as it is becoming easier for data users to get access to spatial data that they are unfamiliar with. At the same time, many data producers document and communicate less about the data characteristics, despite metadata standards. The approach suggested here provides one way to reconcile inconsistent data that preserves the thematic content and spatial resolution of the data and allows a richer understanding of the complexity of a landscape to be developed.