A 0.1° gridded dataset of forest biological hazards in China from 1900 to 2000 based on ensemble learning
We generated a 0.1° gridded dataset of forest biohazard occurrence area in China for 1900-2000, comprising six time slices at 20-year intervals, by combining stacking ensemble learning with long time-series geospatial data and Chinese municipal forest biohazard (FBH) statistics for 2003-2017. First, the municipal FBH statistics for 2003-2017 were collected as the dependent variable of the model, and 10 proxy variables were used as the independent variables. Next, three machine learning algorithms (random forest (RF), XGBoost and LightGBM) were selected as base models to build and train the stacking ensemble model. The 10 proxy variables for six years (1900, 1920, 1940, 1960, 1980 and 2000) were then fed into the trained ensemble model to produce a gridded dataset of forest biohazard occurrence area (0.1° × 0.1°) for China from 1900 to 2000. Finally, we collected forest biohazard record points from historical literature and reports for comparison: the grids of average occurrence area for 1900-1950 and 1950-2000 overlapped with the historical occurrence points at rates of 81.82% and 83.33%, respectively.
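The comparison against historical record points can be illustrated with a minimal sketch. It assumes that a point "overlaps" when the 0.1° cell containing it has a positive average occurrence area; this definition, the function names `cell_index` and `overlap_rate`, and all coordinates and area values below are hypothetical and are not taken from the dataset.

```python
import math

CELL = 0.1  # grid resolution in degrees, as stated in the text


def cell_index(lon, lat, cell=CELL):
    """Map a lon/lat point to the (column, row) index of its grid cell."""
    # A small epsilon guards against floating-point error at cell edges.
    return (math.floor(lon / cell + 1e-9), math.floor(lat / cell + 1e-9))


def overlap_rate(grid_area, points):
    """Percentage of record points that fall in cells with positive area.

    Assumption: a cell missing from grid_area counts as zero area.
    """
    hits = sum(
        1 for lon, lat in points
        if grid_area.get(cell_index(lon, lat), 0.0) > 0.0
    )
    return 100.0 * hits / len(points)


# Hypothetical average occurrence area per cell (made-up values).
grid_area = {
    cell_index(116.35, 39.95): 12.4,
    cell_index(104.05, 30.65): 3.1,
    cell_index(126.65, 45.75): 0.0,
}

# Hypothetical historical record points as (lon, lat).
points = [(116.38, 39.91), (104.01, 30.69), (126.61, 45.72), (91.15, 29.65)]

print(f"{overlap_rate(grid_area, points):.2f}%")  # prints 50.00%
```

Here two of the four made-up points fall in cells with positive area, so the rate is 50%. Other overlap criteria, such as a buffer around each point or a minimum area threshold, would be equally plausible and would give different rates.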