A 100-m gridded population dataset of China’s seventh census using ensemble learning and geospatial big data
A 100-m gridded population dataset of China’s seventh census in 2020 was generated using stacking ensemble learning and geospatial big data. The county-level and town-level data of China’s seventh census and ten related covariates at 100-m resolution were first collected as the input datasets. Three popular machine learning algorithms (random forest, XGBoost, and LightGBM) were then chosen as the base models of a stacking ensemble, which was trained to generate the gridded population dataset for China. When assessed against the town-level test census data, the estimated gridded population dataset (R²=0.8936) is more accurate than the existing WorldPop (R²=0.7427) and LandScan (R²=0.7165) products. The dataset accompanies the paper: Yuehong Chen, Congcong Xu, Yong Ge, Xiaoxiang Zhang and Ya'nan Zhou. "A 100-m gridded population dataset of China’s seventh census using ensemble learning and big geospatial data", 2024, published in the journal Earth System Science Data.
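The town-level assessment mentioned above can be illustrated with a minimal sketch: gridded estimates are summed within each town and compared with the census counts. This is not the authors' code. The function names, the town identifiers, and all numbers are hypothetical, and the sketch assumes that R² denotes the coefficient of determination (1 - SS_res / SS_tot), which the description does not state explicitly.

```python
from collections import defaultdict


def aggregate_to_towns(cell_population, cell_town_id):
    """Sum per-cell population estimates within each town."""
    totals = defaultdict(float)
    for pop, town in zip(cell_population, cell_town_id):
        totals[town] += pop
    return dict(totals)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values, invented for illustration only (not taken from the dataset).
cell_population = [120.0, 80.0, 40.0, 60.0, 300.0, 200.0]  # estimate per grid cell
cell_town_id = ["A", "A", "B", "B", "C", "C"]              # town containing each cell
census = {"A": 210.0, "B": 90.0, "C": 520.0}               # town-level census counts

estimated = aggregate_to_towns(cell_population, cell_town_id)
towns = sorted(census)
r2 = r_squared([census[t] for t in towns], [estimated[t] for t in towns])
print(f"Town-level R2 on toy data: {r2:.4f}")
```

With real data, the per-cell values would come from the 100-m raster and the town identifiers from a town boundary layer. The same two functions could be applied to the WorldPop and LandScan rasters to compare products on identical test towns.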