Rapid method for yearly LULC classification using Random Forest and incorporating time-series NDVI and topography: a case study of Thanh Hoa province, Vietnam

Abstract Land-use and land-cover (LULC) mapping in the complex area is a challenging task due to the mixed vegetation patterns, and rough mountains with fast-flowing rivers. In Vietnam, LULC update is not frequently. In this study, we applied a supervised machine learning (Random forest—RF) approach to mapping LULC in Thanh Hoa province, Vietnam from 2011 to 2015 utilizing multi-temporal Normalized Difference Vegetation Index (NDVI) data from MODIS, combined with topographic features. Random forest classification (RFC) reached a total prediction accuracy of 91% and Kappa coefficient (K) of 0.89 across eight LULCs. Besides, the results showed that the features extracted from time-series NDVI comprising the mean of yearly NDVI, the sum of NDVI, and the topography were the important variables controlling the LULC classification. For similar studies on the distribution of LULC, the method proposed in this study could be helpful.


Introduction
The LULC represents a major factor in various analyses, from physical geography and environmental studies to land management, such as deforestation, agricultural planning, and urban growth (Foley et al. 2011;Bhanja et al. 2018;Sasmito et al. 2019;Zhang et al. 2019;Longato et al. 2019;Jiang et al. 2020). LULCs are highly dynamic due to the interaction between socio-economic activities and regional environmental changes. These dynamics necessitate a regular update of LULCs maps (Rujoiu-Mare and Mihai 2016; Bhanja et al. 2018;Zhang et al. 2021;Neinavaz et al. 2021). Traditionally, LULC mapping was conducted by field plotting (Zhang et al. 2002). However, these maps are only updated in a five-year term in developing countries like Vietnam (Nguyen et al. 2020) partly due to significant constraints such as cost and time-consuming, especially in multiple rugged mountains areas with fast-flowing rivers and large size areas (Kavzoglu and Colkesen 2009;Jin et al. 2018). Therefore, remote sensing in current years is applied to detect land cover due to the large geographic extents of covering and high temporal coverage of images. Besides, remote sensing supports the investigation of historical land cover and provides data over inaccessible areas (e.g., rugged mountain areas). Remote sensing data that is rapid, macroscopic, and synchronous are widely used in LULC classification studies (Fichera et al. 2012;Rujoiu-Mare and Mihai 2016;Keshtkar et al. 2017;Teluguntla et al. 2018;Mutanga and Kumar 2019;Liu et al. 2020;Li et al. 2022).
However, the existing LULC products still have several problems with complicated classifications, though remote sensing was applied, especially in complex landform areas. For instance, several areas of grassland, forest, and cropland are not easily distinguished only with the vegetation phenology information from a single satellite image. Therefore, NDVI time-series data, extracted from Landsat, MODIS, and Sentinel-2, is important for LULC mapping (Friedl et al. 2010;Fonji and Taff 2014;Schulz et al. 2021). Time series data-derived temporal features have been proven to improve the accuracy of LULC classification at the regional and national scales (Jia et al. 2014;Hermosilla et al. 2018). Long time-series MODIS data with large spatial scales and good quality is more applicable for setting large-scale maps of LULC classification than high spatial resolution data (e.g., Landsat, Sentinel-2) since MODIS can effectively avoid the cloud and calculation burdens (Zhou and Zhang 2016). Moreover, to generate the precise classification of LULC, not only the vegetation phenology index but also topographic features should be considered due to the effects of topographic features on the LULC pattern (Xiao et al. 2018;Jin et al. 2018). For example, forests located in mountainous and hilly areas limit the urbanization process. The topography of a region determines the flow of water and existing time, thus the shape of water bodies (Xiao et al. 2018).
Models combining remote sensing based on known drivers of the vegetable index, topography, and texture information could provide predictions of LULC needed for multiple purposes across regions. Such models can significantly enhance the accuracy and precision of LULC classification and change assessments. Random Forest (RF) are insensitive to overfitting and generally have high classification performance compared with other machine-learning methods (Breiman 2001;Hastie et al. 2009;Liaw and Wiener 2002). Furthermore, RF also can measure variable importance in LULC classification. Machine learning with satellite images derived many global and regional LULC data products such as the Atlanta, Georgia metropolitan area (Yang and Lo 2002); North-eastern Latvia (Fonji and Taff 2014); Prahova Subcarpathians, Romania (Rujoiu-Mare and Mihai 2016); four federal states including Brandenburg, Lower Saxony, North Rhine-Westphalia, and Rhineland Palatinate, in Germany (Schulz et al. 2021); and the small area mixed LULC located in the county of Uppsala in south-central Sweden (Abdi 2020).
Several studies have analyzed LULC dynamics using the two-phase approach with one image representing one phase (Chapman et al. 2009). Besides, most studies did not thoroughly analyze which auxiliary variable characteristics were important the most to classify complex land-cover patterns (Chan and Paelinckx 2008). Multi-temporal approach with a time-series imaginary combined machine learning approach is a more robust alternative that has been demonstrated to be effective in improving the accuracy of information about LULC dynamics at the regional and national scales (Jia et al. 2014;Hermosilla et al. 2018). Therefore, MODIS time-series data images were chosen in this study to classify land cover in Thanh Hoa province, Vietnam, from 2011 to 2015 using the RF model with auxiliary data like DEM-derived indices to improve the classification. Then, we identified the most important factors controlling LULC classification, and also clarified the sensitivity of the RF model to training sample reduction. Our main goals are (1) to develop a LULC categorization map and (2) to determine the relationship between the primary controlling factors and the LULC.

Study area
Thanh Hoa province, one of the largest provinces in Vietnam with an area of 11,106 km 2 , is located in North Central Vietnam (Latitude: 19 18 0 N-20 40 0 N, Longitude: 104 2 0 'E-106 05 0 E) ( Figure 1). The province is contiguous to the East Sea, has a mountainous terrain, and a dense network of rivers. Thanh Hoa has a tropical climate regime, with an average annual temperature of 26.6 C and annual rainfall of 1964 mm (Statistical Yearbook of Thanh Hoa, 2015). The fluctuation of monthly climate conditions is described in Supplementary Fig. S1. The Ma River Delta (also known as the Thanh Hoa Delta) is the third-largest delta in Vietnam, formed by the area along the Ma River. Although the basin has a small amount of agriculture land, 65% of which is used for rice cultivation (World Bank 2007), agricultural activities nonetheless account for 35% of GDP in the region.

Data and sample selection
2.2.1. Multi-temporal normalized difference vegetation index (NDVI) A total of 115-time series 16-day MOD13-Q1 images of Thanh Hoa province from January 2011 to December 2015 were downloaded from the web of https://lpdaacsvc.cr. usgs.gov/appeears/. We chose the period 2011-2015 for LULC mapping because of the best availability of reference data in this period. The NDVI is a widely used metric for LULC classification that can be used to capture vegetation characteristics (Pittman et al. 2010). NDVI ranges from À 1 to þ1, with a high value of þ1 indicating the highly vegetated and close to zero value or less indicating the surface without vegetation. Cloud pixels were removed from time-series images, and interpolation was conducted to fill null data. Table 1 shows temporal features of NDVI time series and explanations. Besides, the variables' selection was also referenced in Jin et al. (2018), Schulz et al. (2021) and Zeferino et al. (2020).

Topography data
The Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) of the study area was used to derive topographical indices. The DEM was downloaded at https:// earthexplorer.usgs.gov/. Then, topographic indices including aspect, slope, tangential curvature, and profile curvature were derived from DEM (Table 1).

Sample selection
Reference data is the core ingredient in machine learning, and most models require thousands of reference data samples. However, identification of reference data and collection over large, remote, or distant areas in the training process can be a challenging task (Chi et al. 2008). Existing LULC maps were used as reference data. For example, 1970s thematic maps were used as training data to classify Landsat-1 data from 1973 to investigate the change in LULC in Ca Mau province, Vietnam (Tran et al. 2015).
Existing LULC maps with appropriate overall accuracy, on the other hand, can efficiently generate a large number of training samples, allowing a wide range of features to be represented (Hermosilla et al. 2018). Furthermore, existing maps also were used to random check model accuracy to minimize misclassification. In this study, we used initial training and validation sample selection with the LULC maps developed by Environment and Natural Resources Department, Thanh Hoa province. These samples were then compared to RGB and false-color composites of the multi-temporal Landsat 8 scenes to discern any difference, such as bare-land or built-up areas. The sample labels modified similarly in both methods were selected as reference samples. The eight LULC categories (bareland, crops, forest, grassland, mangrove, rice paddy, urban and built-up, and water) and the total samples and number of samples for each are listed in Table 2.

Model
Random forest (RF) is a machine learning method introduced by Breiman (2001) that allows improving prediction and accuracy in classification without overfitting the data. The classification and regression trees (CART) are the foundations of RF. In this study, the reference dataset applied in RF includes LULC categories as response and 15 candidate predictive variables with high information rates for LULC classification: ten describing NDVI characteristics and five topographic indices that were already mentioned in detail above in Table 1. A total of 3710 reference samples with sample sizes for each LULC category were provided in Table 2. The reference dataset then was randomly split into training (70%) and validation (30%) datasets. The training dataset was for developing classifying models, while the validation dataset was used in accuracy assessment (Adam et al. 2014). The rationale for splitting by the 70/30 proportion is the common practice in machine learning (Dobbin and Simon 2011;Vrigazova 2021). In a high-dimensional dataset with 17 features, there is a risk of overfitting by constructing complex classifying models with too many parameters. Therefore, hyperparameter tuning with the number of trees (ntree) and the number of variables at each decision tree (mtry) was conducted to maximize the predictive performance of the RF model. The tuning of ntree decreased the correlation between individual trees (Rodriguez-Galiano et al. 2012), while the tuning of mtry attributes higher importance of fewer variables (Oliveira et al. 2012). We applied K-fold cross-validation with k of 10 and repeats of 3 to determine the optimal ntree and mtry that produce the highest average accuracy. In detail, the repeated cross-validation technique randomly split training data into ten groups, with group 1 as a validation set and groups 2-10 as the training set. The similar processes were repeated three times, each time using a different group as a validation set and the rest of the groups as training sets. We optimized the final RF based on the accuracy of each algorithm assessed by several metrics derived from an error matrix.

Model performance assessment
The final training model was validated using the validation data in a model accuracy assessment. The standard summaries should be reported for the accuracy assessment of the classification model comprising the error matrix, the overall accuracy (OA) (Equation 1), Producer's Accuracy (PA) (Equation 2), User Accuracy (UA) (Equation 3), and the Kappa coefficient (Congalton 1991;Equation 4). The similarities between the samples classified by model and validation data were quantitatively compared in the error matrices. The OA is the proportion of correctly classified samples according to their input labels. However, the Kappa coefficient is a more useful representation of performance as OA tends to overestimate the actual performance (Schulz et al. 2021). The PA is the proportion of correctly classified samples of a particular category, whereas UA is the proportion of the correctly classified samples, and the total number of samples classified into that category (Olofsson et al. 2013). The variable importance is a statistical measure of the weight of each predictor in the optimal RF model, which is used to define the most useful engineering characteristics for automated LULC detection by applying the mean decrease in accuracy (% IncMSE), which measures how much model error increases if information from that variable is removed by randomizing it. We used the R package caret for fitting RF. Figure 2 is the summary of LULC classification using RF. All calculations and visualization were performed in R version 4.1.2 (R Core Team 2021).
OA ¼ Number of correctly classified pixels Number of reference pixels Â 100 (3)

Engineering features
The dispersion of the engineering features of the LULC classes for the selected variables is described in Figure 3. The mean NDVI values of vegetative cover such as forests (0.82), annual crops (0.61), grassland (0.74), and mangrove (0.60) were much higher than the mean NDVI values of water (0.47), bareland (0.55), and built-up (0.54). Besides, forests and grass had much higher topographical indices values than other types of LULC, which implies that they were located in high elevated and high slope areas. Also, the large vertical extension of the boxplots observed in topographical indices values of grass and forest (Figure 3a, b) shows that these two LULC classes had wide landscape distribution ranging from mild hilly to mountainous areas.

Best tune and important variables
The original data was split into ten groups to construct each tree in the repeated crossvalidation progress with 10-fold and 3-repeats. The training observations used to construct trees have denoted the accuracy. Testing observations then can be predicted from the model to evaluate classification accuracy and out-of-bag error (OBB). The optimal number of the tree (ntree ¼ 2500) and the number of terminal nodes (mtry ¼ 5) at each node that produced the highest accuracy were determined (Figure 4a; Table S3). When the number of trees was small, the accuracy of RF was lower (higher OOB error). However, if we added an infinite number of trees, the generalization error did not increase or decrease much, and the larger the number of trees, the more stable classification (Figure 4b). A similar finding was also observed by Jin et al. (2018) and Tatsumi et al. (2015). Though all 15 variables are useful for LULC classification, few were highly important for classification ( Figure 5). From the mean decrease in accuracy (% IncMSE), we reduced three conclusions (1) most of the variables with mean importance of more than 25% in the training data are required for consistent performance of RF, (2) features related to terrain topography are important for accurate LULC classification, (3) the level shift variables and the number of shifts reflecting the dates of crops/rice paddy harvesting and the crops/rice paddy plowing date respectively, had unexpectedly low importance for the classification models that are similar to results found by Schulz et al. (2021).  Figure S4. Visualization of the RF classification results shows good performance and the general LULC structures of the study area. Table 3 shows the land cover distribution (in km 2 ) and the evolution of the area of each land cover class in the periods 2011 to 2015. The largest change can be observed in the land cover classes of rice paddy, water, urban and built-up, and bare land. Urban and built-up areas increased gradually over the  first half of the study period of 2011-2013 (approximately 96 km 2 to 208 km 2 ) but significantly for the latter (to 811.95 km 2 in 2015). Bare land decreased until 2014 (975.69 km 2 ) before a rapid increase in 2015 (1193.56 km 2 ).

Classification performance
Errors matrix (Table 4) shows the accuracy assessment for LULC mapping obtained by the optimal RF model using validation data of 1114 samples. The findings of the accuracy assessment through the RF model confirm the overall good performance of the classification groups considering the eight LULC classes. The OA and Kappa coefficients were 91% and 0.89, respectively. UA and PA of individual classes were approximately 90% on average, respectively. However, UA was lower for water (77.5%), rice paddy (80.4%), and bare land (83.3%). In other words, the classification map missed 22.5%, 19.6%, and 16.7% (class error) of the water, rice paddy, and bare-land, suggesting that the model had a proclivity to misclassify water as bare-land (10/120) and forest (11/120); rice paddy as urban and built-up (13/102); and bare-land as rice paddy (13/120). This high-class error was attributable mainly to their complex phenological characteristics. On the other hand, estimated LULC areas were the most accurate for forest, urban, and built-up due to their distinctive phenological characteristics. Besides, RF overestimated bare land area because other LULC types could be misclassified as "bare land" (Table 4). The accuracy of a LULC classification varied with methods, techniques (Rodriguez-Galiano et al. 2012;Thanh Noi and Kappas 2017;Maxwell et al. 2018;Camargo et al. 2019), LULC classes (Foody 2008;Rwanga and Ndambuki 2017;Leyk et al. 2018;Islam et al. 2018) due to the  influence of atmospheric, surface and illumination variations (Li et al. 2016;Yang et al. 2017). It also fluctuated with the training sample data size (Tatsumi et al. 2015;Jin et al. 2018) which will be discussed in more detail in section 3.4. The comparison of model performance under two classification models (with and without topographical features) is presented in Table 5. In general, RF incorporating NDVI time-series and topography (NDVI þ TOPs) was better than the model only with NDVI engineering features in both OA and Kappa. In detail, the OA and Kappa were 77% and 73%, respectively, if only NDVI engineering features were inputted into the models (NDVIs , Table S1), whereas the OA and Kappa were improved by more than 10% if we added topographical features into models (NDVI þ TOPs). Topographical indices improved the UA of all types of LULC, except for bare-land at least 4% when combined with NDVIs. Especially for mangroves, the UA improved by more than 40%. For urban and built-up lands, the UA of NDVI þ TOPs even increased by about 7.5%. Clearly, topographic data could increase the classification accuracy of LULC maps that are similar to results found by Cai et al. (2011). The geographic data was found to help increase classification accuracy, particularly in mountainous regions where the distribution of LULC types is significantly influenced by elevation, slope, and aspect. Similarly, Wang et al. (2020) found that 3.3% of OA increased when topographic characteristics and EVI were integrated for LULC classification in the Qilian Mountains in Northwest China. In addition, topographic indices exhibited varying effects on the various LULC classes. Wang et al. (2020) discovered that when TOPs were combined with EVIs, the UA of shrublands, grasslands, permanent wetlands, croplands, and permanent snow and ice all increased by at least 2% whereas the UA of permanent wetlands was improved by 16% in particular because these classes were located in particular topographic areas.  Many stakeholders and scientists in Vietnam require the creation of LULC maps that include the most recent acreage data on LULC patterns. Vietnamese Government statistics, however, lag behind by one year, and the official land use maps were only updated at 5-year intervals (Nguyen et al. 2020). Policymakers traditionally used time-consuming approaches that may not accurately reflect the actual situation on the ground by the time the data was issued, such as the census household-based survey or the cadastral ground survey, for determining LULC regions. As a result, the method suggested in this study may be useful for reducing the time and cost required for LULC classification in other regions (L€ ow et al. 2018). Effective LULC monitoring is essential for supporting policy makers with accurate information concerning LULC growing areas, and estimation of change patterns to achieve increased economic development.

Effect of sample size in the training data
Data collection on a large scale for the RF training process is time-consuming in classifying complex areas with many categories. Many previous studies found that the more training input samples, the better information contained in each category (Breiman 2001;Tatsumi et al. 2015;Jin et al. 2018), and the increase in classification accuracy as a result. However, the accuracy reduced in case of the training data were not representative (Ghimire et al. 2010). Therefore, a scheme for training sample selection should be designed to meet the feasibility in time, cost, and acceptable accuracy achievement (Rogan et al. 2008). Table 6 shows the sensitivity of RF performance due to the decrease in training data size. The training data decreased from 5% to 70%, while the OA only reduced by less than 5%. At the threshold of 70% training data reduction, the accuracy was reduced more suddenly to achieve an OA of approximately 70% when the training data was reduced by 95%.
The results show that the RF classification accuracy decreased with the deduction of the size of the training dataset, but not linearly. Since the OA was reduced suddenly after reaching a threshold of 70% in our study, we concluded that RF classification is not sensitive much to training size deduction.

Conclusion
The objective of this study was to combine both time-series NDVI and the topography variables applying RF to map the LULC in Thanh Hoa province, Vietnam. We achieved a total prediction accuracy of 91% for classification and a Kappa coefficient (K) of 0.89 across eight different LULCs. We observed topography that could enhance the LULC classification when it was integrated with NDVIs. In the RF model, the deduction in training data increased in the classification error overall, but not linearly. The reduction in the training data size has no significant effect on the accuracy of the classification until reaching the threshold of 70% data, which indicates that the multivariable RFC is not much sensitive to the decrease of the training set in our case. The model findings may not only directly classify the LULCs in Thanh Hoa provinces but also may be applied in other regions of Vietnam with similar terrain topography and farming patterns to launch into establishing a regional assessment of both current and future changes in LULC.