es7b01210_si_001.pdf (228.08 kB)
Estimating PM2.5 Concentrations in the Conterminous United States Using the Random Forest Approach
journal contribution
posted on 2017-05-23, 00:00 authored by Xuefei Hu, Jessica H. Belle, Xia Meng, Avani Wildani, Lance A. Waller, Matthew J. Strickland, Yang Liu
To estimate PM2.5 concentrations, many parametric regression
models have been developed, while nonparametric machine learning algorithms
are used less often and national-scale models are rare. In this paper,
we develop a random forest model incorporating aerosol optical depth
(AOD) data, meteorological fields, and land use variables to estimate
daily 24 h averaged ground-level PM2.5 concentrations over
the conterminous United States in 2011. Random forests are an ensemble
learning method that provides predictions with high accuracy and interpretability.
Our results achieve an overall cross-validation (CV) R2 value of 0.80. Mean prediction error (MPE) and root
mean squared prediction error (RMSPE) for daily predictions are 1.78
and 2.83 μg/m3, respectively, indicating good agreement
between CV predictions and observations. The prediction accuracy of
our model is similar to those reported in previous studies using neural
networks or regression models on both national and regional scales.
In addition, incorporating convolutional layers for land use
terms and nearby PM2.5 measurements increases CV R2 by ∼0.02 and ∼0.06, respectively,
indicating their significant contributions to prediction accuracy.
Two different variable importance measures both indicate that
the convolutional layer for nearby PM2.5 measurements and
AOD values are among the most important predictor variables for the
training process.
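The three agreement statistics quoted above (CV R2, MPE, RMSPE) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: R2 is computed here as the squared Pearson correlation between held-out predictions and observations, and MPE is taken as the mean absolute difference. The function name `cv_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def cv_metrics(observed, predicted):
    """Return (r2, mpe, rmspe) for paired observed and predicted values.

    Assumptions (not confirmed by the abstract):
      - r2 is the squared Pearson correlation between the two series;
      - mpe is the mean absolute prediction error.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Sums of squares and cross-products around the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("R2 is undefined when a series is constant")

    r2 = cov * cov / (var_o * var_p)
    mpe = sum(abs(p - o) for o, p in zip(observed, predicted)) / n
    rmspe = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return r2, mpe, rmspe


# Hypothetical daily PM2.5 values in μg/m3 (illustrative only).
obs = [5.0, 8.0, 12.0, 20.0]
pred = [6.0, 7.0, 13.0, 18.0]
r2, mpe, rmspe = cv_metrics(obs, pred)
print(f"R2={r2:.3f}  MPE={mpe:.2f}  RMSPE={rmspe:.2f}")
```

In a cross-validation setting, `predicted` would hold the predictions made for each held-out fold, so the statistics describe out-of-sample agreement rather than fit to the training data.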
Keywords
PM 2.5 Concentrations; 2011. Random forests; AOD; most-important predictor variables; R 2 value; PM 2.5 measurements; Conterminous United States; ground-level PM 2.5 concentrations; Random Forest Approach; prediction accuracy; RMSPE; MPE; land use terms; land use variables; regression models; estimate PM 2.5 concentrations; conterminous United States; prediction error; PM 2.5 measurements increase CV R 2