Appendix C. Visualization techniques for random forests.
This appendix contains some technical details concerning partial dependence plots and information about additional visualization techniques for random forests.
MULTIDIMENSIONAL SCALING PLOTS FOR CLASSIFICATION RESULTS
With regard to the data for the cavity nesting birds in the Uinta Mountains, Utah, USA, a natural question to ask is whether there are any differences in the measured stand characteristics among the nest sites for the three bird species. An RF classification of the nest sites by species produced correct classification rates at the chance level (i.e., very low), and we sought to understand why this was the case through graphical summaries. RF produces measures of similarity of data points called proximities (see Appendix A). The matrix of proximities is symmetric, with all entries taking values between 0 and 1. The value (1 – proximity between points j and k) is a measure of the distance between these points. A bivariate, metric, multidimensional scaling (MDS) plot is a scatter plot of the values of the data points on the first two principal components of the distance matrix (the matrix of 1 – proximities). Using the RF classification for the combined data on the cavity nesting birds, we constructed an MDS plot (Fig. C1). Note that the nest sites for the three species are completely intermingled, showing that it is not possible to separate the nest sites of the different species on the basis of the measured stand characteristics. The nest sites for all three species are fairly well separated from the non-nest sites, which explains why the classification accuracies for nest sites versus non-nest sites were high. Plots of pairs of measured stand characteristics, including the two that RF identifies as most important to the classification, do not show such a clear separation of the nest and non-nest sites.
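For readers who wish to construct such a display, the sketch below shows one way to obtain an MDS plot from RF proximities in R. The object names are hypothetical (a data frame birds with a factor column species is assumed); the MDSplot function in the randomForest package wraps the same computation.

library(randomForest)

## Fit the forest and ask it to retain the proximity matrix.
rf <- randomForest(species ~ ., data = birds, proximity = TRUE)

## MDSplot() scales the (1 - proximity) matrix and plots the first two
## coordinates, one symbol per class.
MDSplot(rf, birds$species, k = 2)

## Equivalently, by hand:
d  <- as.dist(1 - rf$proximity)            # distances from proximities
xy <- cmdscale(d, k = 2)                   # metric (classical) MDS
plot(xy, pch = as.numeric(birds$species))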
FIG. C1. Random forest-based multidimensional scaling plot of non-nest vs. nest sites for three species of cavity nesting birds in the Uinta Mountains, Utah, USA. Non-nest sites are labeled "N". Nest sites are coded "S" for Sphyrapicus nuchalis, "C" for Parus gambeli, and "F" for Colaptes auratus.
PARTIAL DEPENDENCE PLOTS
Partial dependence plots (Friedman 2001, Hastie et al. 2001) are tools for visualizing the effects of small numbers of variables on the predictions of “black-box” classification and regression tools, including RF, boosted trees, support vector machines, and artificial neural networks. In general, a regression or classification function, f, will depend on many predictor variables. We may write f(X) = f(X_1, X_2, …, X_s), where X = (X_1, X_2, …, X_s) are the predictor variables. The partial dependence of the function f on the variable X_j is the expectation of f with respect to all the variables except X_j. That is, if X_(-j) denotes all the variables except X_j, the partial dependence of f on X_j is given by f_j(X_j) = E_{X_(-j)}[f(X)]. In practice we estimate this expectation by fixing the value of X_j and averaging the prediction function over all combinations of observed values of the other predictors in the data set, which requires predictions for the entire data set at each value of X_j in the training data. In the R implementation of partial dependence plots for RF (Liaw and Wiener 2002), instead of using the values of the variable X_j in the training data set, the partialPlot function uses an equally spaced grid of values over the range of X_j, and the user specifies the number of points in the grid. This feature can be very helpful with large data sets, in which the number of distinct values of X_j may be large.
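As a concrete illustration, a minimal call to partialPlot is sketched below. The data frame nest and its two-level response present (levels "0"/"1") are assumed names; NumTree3to6in is a predictor from the main article.

library(randomForest)

## 'nest' is an assumed data frame with a two-level factor response
## 'present' and numeric stand characteristics as predictors.
rf <- randomForest(present ~ ., data = nest)

## n.pt sets the size of the equally spaced grid over the range of the
## predictor; which.class selects the class whose response function is plotted.
partialPlot(rf, pred.data = nest, x.var = "NumTree3to6in",
            which.class = "1", n.pt = 50)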
The partial dependence for two variables, say X_j and X_l, is defined analogously as the expectation of the function f(X) with respect to all variables except X_j and X_l. Partial dependence plots for two predictor variables are perspective (three-dimensional) plots (see Fig. 4 in the main article). Even with moderate sample sizes (5,000–10,000), such as the Lava Beds NM invasive plants data, bivariate partial dependence plots can be very computationally intensive.
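To make the computational burden concrete, the brute-force sketch below (reusing the hypothetical rf and nest objects from the previous example) evaluates the bivariate partial dependence for the two predictors of Fig. 4 on a 25 × 25 grid. Each grid cell requires predictions for every row of the data set, which is why these plots become expensive for large n.

## Equally spaced grids over the ranges of the two predictors.
grid1 <- seq(min(nest$NumTree3to6in),  max(nest$NumTree3to6in),  length.out = 25)
grid2 <- seq(min(nest$NumTree9to15in), max(nest$NumTree9to15in), length.out = 25)
pd <- matrix(NA, 25, 25)
for (i in seq_along(grid1)) {
  for (j in seq_along(grid2)) {
    tmp <- nest
    tmp$NumTree3to6in  <- grid1[i]                # fix both predictors ...
    tmp$NumTree9to15in <- grid2[j]
    pr <- predict(rf, tmp, type = "prob")[, "1"]  # assumed class label "1"
    pr <- pmin(pmax(pr, 1e-6), 1 - 1e-6)          # guard against logit(0) or logit(1)
    pd[i, j] <- mean(0.5 * log(pr / (1 - pr)))    # ... and average the response function (K = 2; see below)
  }
}
persp(grid1, grid2, pd, theta = 40, phi = 25,
      xlab = "NumTree3to6in", ylab = "NumTree9to15in",
      zlab = "partial dependence")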
In classification problems with, say, K classes, there is a separate response function for each class. Letting p_k(X) be the probability of membership in the kth class given the predictors X = (X_1, X_2, …, X_s), the kth response function is given by

f_k(X) = log p_k(X) – (1/K) Σ_{j=1}^{K} log p_j(X)

(Hastie et al. 2001, Liaw and Wiener 2002). For the case when K = 2, if p denotes the probability of “success” (i.e., presence, in species distribution models), the above expression reduces to

f(X) = 0.5 log( p(X) / (1 – p(X)) ) = 0.5 logit( p(X) ).
Thus, the scale on the vertical axes of Figs. 2–4 in the main article is one-half the logit of the probability of presence.
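A small helper function (a sketch; the name is ours) makes the back-transformation explicit: if f is a value read off the vertical axis, then f = 0.5 logit(p) implies p = 1 / (1 + exp(–2f)).

## Convert a half-logit axis value back to a probability of presence.
half_logit_to_prob <- function(f) 1 / (1 + exp(-2 * f))
half_logit_to_prob(0)     # 0.50: an axis value of 0 corresponds to p = 0.5
half_logit_to_prob(0.55)  # approximately 0.75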
REAL-TIME 3D GRAPHICS WITH rgl
Bivariate partial dependence plots are an excellent way to visualize interactions between two predictor variables, but choosing the viewing angle that best reveals the interaction can be quite an art. The rgl real-time 3D graphics driver in R (Adler and Murdoch 2007) allows one to take a 3D plot and spin it in three dimensions using the computer mouse. In a matter of seconds one can view a three-dimensional plot from hundreds of angles, so finding the best perspective on the interaction between two variables is quick and easy. Figure C2 is a screen snapshot of an rgl 3D plot for the cavity nesting birds data, using the same variables as in Fig. 4 of the main article.
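A sketch of such a plot, reusing the hypothetical grid1, grid2, and pd objects from the bivariate partial dependence example above (persp3d is the rgl analogue of the base persp function):

library(rgl)

## Open an interactive window; drag with the mouse to rotate, scroll to zoom.
persp3d(grid1, grid2, pd, col = "lightblue",
        xlab = "NumTree3to6in", ylab = "NumTree9to15in",
        zlab = "partial dependence")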
FIG. C2. Screen snapshot of 3D rgl partial dependence plot for variables NumTree3to6in and NumTree9to15in. Nest site data for three species of cavity nesting birds collected in the Uinta Mountains, Utah, USA.
LITERATURE CITED
Adler, D., and D. Murdoch. 2007. rgl: 3D visualization device system (OpenGL). R package version 0.71. http://rgl.neoscientists.org
Friedman, J. H. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 29:1189–1232.
Hastie, T. J., R. J. Tibshirani, and J. H. Friedman. 2001. The elements of statistical learning: data mining, inference, and prediction. Springer, New York, New York, USA.
Liaw, A., and M. Wiener. 2002. Classification and regression by randomForest. R News 2(3):18–22. http://cran.r-project.org/doc/Rnews/