Modeling the risk of spread and establishment for Asian longhorned beetle (Anoplophora glabripennis) in Massachusetts from 2008-2009

Abstract Land managers responsible for invasive species removal in the USA require tools to prevent the Asian longhorned beetle (Anoplophora glabripennis) (ALB) from decimating the maple-dominant hardwood forests of Massachusetts and New England. Species distribution models (SDMs) and spread models have been applied individually to predict the invasion distribution and rate of spread, but the combination of both models can increase the accuracy of predictions of species spread over time when habitat suitability is heterogeneous across landscapes. First, a SDM was fit to 2008 ALB presence-only locations. Then, a stratified spread model was generated to measure the probability of spread due to natural and human causes. Finally, the SDM and spread models were combined to evaluate the risk of ALB spread in Central Massachusetts in 2008–2009. The SDM predicted many urban locations in Central Massachusetts as having suitable environments for species establishment. The combined model shows the greatest risk of spread and establishment in suitable locations immediately surrounding the epicentre of the ALB outbreak in Northern Worcester with lower risk areas in suitable locations only accessible through long-range dispersal from access to human transportation networks. The risk map achieved an accuracy of 67% using 2009 ALB locations for model validation. This model framework can effectively provide risk managers with valuable information concerning the timing and spatial extent of spread/establishment risk of ALB and potential strategies needed for effective future risk management efforts.

of invasive forest insects, including emerald ash borer (Anoplophora planipennis), Asian longhorned beetle (ALB) (Anoplophora glabripennis), hemlock woolly adelgid (Adelgis tsugae) and gypsy moth (Lymantria dyspar), are established in the USA and cost as much as $5 billion per year in federal and local government expenditures, household expenditures and property value loss and timber loss (Aukema et al. 2011). Most economic estimates do not account for potential impacts to non-market ecosystem services (Holmes et al. 2009), and impacts of invasive insect spread in forest environments include the reduction of biodiversity (Chornesky et al. 2005), alteration of future species composition (Lovett et al. 2006), changes in microclimate, light conditions and chemical/nutrient cycling (Gandhi & Herms 2010) and the potential susceptibility to future insect invasion Jactel et al. 2005).
ALB is a polyphagous insect native to China and Korea, and is considered one of the 100 worst invasive species in the world by the International Union of Conservation (IUCN) due to its potential to cause serious damage to its host species. While ALB can infest any of over 100 host tree species, Acer, Salix and Ulmus species are particularly vulnerable in introduced ranges, such as in the US North-east (Meng et al. 2015). Of these, the Acer genus is of particular concern for the study area due to both its prevalence throughout the urban and wildland forests of the region and its importance for the timber, maple syrup and tourism industries. Many other locations throughout North America and Europe are also vulnerable to ALB infestation since host species form important, often dominant, parts of the forest ecosystem in these regions (Lowe et al. 2000;MacLeod et al. 2002;Haack et al. 2010). Likely transported to the USA via solid wood packing materials (Bartell & Nair 2003;Haack et al. 2010;Dodds & Orwig 2011), the ALB was first discovered in New York City in 1996, and has subsequently been found in five other states, most likely due to independent introductions in each case. ALB was detected in Worcester, Massachusetts, in 2008Shatz et al. 2013), although due to the size of this infestation, it is likely that the pest had been established in the location for approximately a decade. The United States Department of Agriculture -Animal and Plant Health Inspection Service (USDA-APHIS) has established an 285-km 2 quarantine zone within Central Worcester county and has removed over 34,000 infested and susceptible host trees, the most removed during any ALB outbreak in the USA to date (Dodds & Orwig 2011;USDA 2014). While previous ALB outbreaks have been concentrated primarily in trees along streets in urban areas (Haack et al. 2006;Colunga-Garcia et al. 2010), ALB outbreak in Worcester has spread into closed canopy urban forest stands (Dodds & Orwig 2011), suggesting the susceptibility of mixed-hardwood forests of New England.
Tools are needed to predict the establishment and spread of ALB in Massachusetts to guide risk management objectives of early detection/eradication and educational outreach programmes (Andersen et al., 2004;Liebhold & Tobin 2008;Baxter & Possingham 2011;Shatz et al. 2013). Spread models (also referred to as mechanistic models, see Gallien et al. 2010) and species distribution models (SDMs) are commonly used to understand the ecology and processes of insect invasions at multiple spatial and temporal scales (Hastings et al. 2005;Liebhold & Tobin 2008;Gallien et al. 2010). Spread models generalize and predict dynamics of invasive species dispersal over space and time by fitting empirical data to mathematical functions or by simulating population dynamics collected from field or laboratory experiments (Kot et al. 1996;Gilbert et al. 2004;BenDor et al. 2006;Roques et al. 2008). Spread models are highly data intensive and require extensive species-specific knowledge of dispersal capabilities from either laboratory or field observations (Bancroft & Smith 2005;Hastings et al. 2005). Additionally, mathematical functions of species spread do not explicitly account for landscape structure, and may not predict invasive spread adequately when spatial heterogeneity may influence the spread, establishment and growth of invasive species Hastings et al. 2005).
SDMs derive species-environment relationships from species location data and biophysically relevant predictor variables using a multivariate algorithm in order to project species habitats from environmental space into geographic space (Austin 2002;Franklin 1995;Guisan & Zimmermann 2000;Elith & Leathwick 2009;Miller 2010). SDMs have been employed at regional and global scales to identify suitable habitat for potential insect invaders (Peterson & Vieglais 2001;Roura-Pascual et al. 2004) and at state and local scales to identify potential sites for early detection and eradication efforts Shatz et al. 2013). SDMs require less information than spread models, but are static by nature and do not account for the movement of invasive propagules across the landscape and the timing for successful incidents of establishment (Pitt et al. 2009;Franklin 2010).
Hybrid models that incorporate elements of both spread models and SDMs provide a means to account for spatio-temporal dynamics of invasive species spread with heterogeneous arrangement of suitable habitat in invaded locales (Gallien et al. 2010;Smolik et al. 2010). To date, numerous studies have employed both spread models and SDMs using a hybrid model framework to model the distributions of invasive insect and pathogen species Meentemeyer et al. 2008;Roura-Pascual et al. 2009;Pitt et al. 2009;Václavík & Meentemeyer 2009). Pitt et al. (2009) found that a hybrid model of Linepithema humile in New Zealand, incorporating a spread model parameterized using a mean rate of invasive spread derived from numerous field observation studies coupled with a habitat suitability map, produced more accurate results and better accounted for landscape heterogeneity after initial establishment than spread models alone. Roura-Pascual et al. (2009) used hybrid models to successfully identify both the vector of introduction and rate of spread of L. humile across the Iberian Peninsula using probability of presence outputs from a generalized linear model (GLM) as an input for a metapopulation dispersal model, and a spread model using rates of spread derived from previously conducted field studies of dispersal. Meentemeyer et al. (2008) found that models of Phytophthora ramorum dispersal in California produced by incorporating a dispersal kernel derived from distances between inoculum sites as an input variable in a GLM were more accurate than models using only GLMs or dispersal kernels alone. Vaclavik and Meentemeyer (2009) also found that hybrid models incorporating parametric and non-parametric SDMs (GLMs and classification trees) and dispersal kernels produced more accurate models of P. ramorum than SDMs alone. In an ALB specific study, Peterson et al. (2004) modelled the interaction of dispersal and climate suitability of ALB in the continental USA using an SDM trained from native species presence and a dispersal model, and found that ALB could potentially spread to suitable north-eastern forest habitats within the next 120 years.
While numerous studies have employed hybrid models to better understand the spatio-temporal dynamics of invasive insect and pathogen spread over heterogeneous landscapes (Meentemeyer et al. 2008;Roura-Pascual et al. 2009;Meentemeyer et al. 2008), no study has employed hybrid models to predict the risk of ALB spread and establishment in Massachusetts. This study approaches this gap by: (1) modelling the risk of ALB spread and establishment in Massachusetts after its initial discovery in 2008-2009 using a hybrid model and (2) identifying locations within Massachusetts in need of resources for early detection/eradication and educational outreach to prevent the establishment of satellite ALB populations.

Study area
The study area consists of the state of Massachusetts (Figure 1) which contains 351 townships within 14 counties and encompasses 20,961 km 2 . Average annual temperature is 8 °C and ranges between 4 and 12 °C. The landscape consists of predominantly second-growth mixed temperate forests, which cover over 60% of the state (Foster et al. 2010;MAFoMP unpublished data). Massachusetts forests comprise a variety of hardwood and conifer species, some of which are potential ALB host species. The following genera are common in the study area: oak (Quercus), maple (Acer), poplar (Populus), birch (Betula), ash (Fraxinus), beech (Fagus) and hickory (Carya) genera and conifers in the pine (Pinus), hemlock (Tsuga) and spruce (Picea) genera (Foster 1988).
The ALB infestation within Massachusetts originated in the town of Worcester, and was likely brought via solid wood packing materials from China (Bartell & Nair, 2003;Dodds & Orwig 2011). ALB infestation was originally discovered in neighbourhood street trees in the north of Worcester in August 2008 (Baca et al. 2009). Prior to the discovery of ALB, the city government of Worcester reforested urban areas with susceptible Norway maples (Acer platanoides) following numerous natural and anthropogenic disturbances events in central Massachusetts, including the spread of the chestnut blight in 1912, an ice storm in 1923, a hurricane in 1938, the spread of Dutch elm disease in 1951 and a tornado in 1953 (Herwitz 2001;Harrington et al. 2003). Additionally, forest fragmentation from conversion of forest to urban and sub-urban development has increased the susceptibility of Acer hosts due to increased stresses from edge effects and improved accessibility (With 2004;Hu et al. 2009;Wright et al. 2011).

Species presence data
Species data consisted of 516 presence locations representing host trees with positive ALB infestations identified in 2008. Locations of ALB observations are collected jointly by the USDA-APHIS and the Massachusetts Department of Conservation and Recreation (DCR) through multiple ground surveys using binoculars, tree climbers and aerial lifts USDA-APHIS 2012). ALB locations used for this study were digitized and georeferenced from publically available maps of the Worcester County eradication programme published by the USDA-APHIS. However, official ALB location records containing time of infestation and the number of infested sites were not available from the USDA-APHIS, DCR or the City of Worcester for this study due to issues of privacy with residents affected by the outbreak.

Environmental variables
Environmental variables were chosen based on known biophysical relevance to ALB establishment, reproduction and spread Elith & Leathwick 2009) (Table 1). Variables included a host preferability map, enhanced vegetation index (EVI), distance-to-forest (m) and mean annual temperature (°C) from 2000 to 2012. This study used a host preferability map incorporating a combined measure from 0 to 1 of the abundance and observed preference of host tree species as a measure for assessing site attractiveness and potential carrying capacity of sites for establishment (Prasad et al. 2010). The host abundance maps were produced using the random forest method (Breiman 2001) with the US Forest Service Forest Inventory Analysis (FIA) data and topo-climatic and remotely sensed environmental variables (Prasad et al. 2006). All potential host species were included in this process; however, a threshold of ≥10% prevalence was applied in order to exclude sparsely distributed or otherwise rare host species observations. Therefore, the Aesculus, Salix and Ulmus host genera were excluded. Abundance models were linearly combined using a multi-criteria evaluation using weights determined by host preference inferred from laboratory observations (Bancroft et al. 2002;Ludwig et al. 2002;Jactel et al. 2005). For additional information on the development of host abundance and preference models, see Supplementary Material Sections S1-S4.
EVI is a measurement of vegetation abundance and vigour, calculated using the blue (450-520 nm), red (630-690 nm) and near infrared (760-900 nm) bands from Landsat-7 Enhanced Thematic Mapper Plus data captured September 2000 using the following equation: where ρ are the radiometrically calibrated surface reflectance values, and the coefficients are L = 1, C 1 = 6, C 2 = 7.5 and G = 2.5 (Huete et al. 2002). Vegetation indices such as EVI and the Normalized Difference Vegetation Index have been used for ecological studies and SDMs as surrogate measurements for the presence and abundance of host vegetation (Pettorelli et al. 2011;Bradley et al. 2012), but EVI was chosen because it accounts for canopy structure and tree diversity and is less prone to saturation and background soil effects (Huete 1988;sensu Roura-Pascual et al. 2006;Nightingale et al. 2008).
A distance-to-forest map was produced by calculating the Euclidean distance between study area locations and deciduous and mixed forests derived from land cover maps produced by the Massachusetts Forest Monitoring Programme. Distance-to-forest is included to delineate edge habitats from core forest habitats, where ALB outbreaks are more likely to occur due to the presence of abundant hosts (Williams et al. 2004;Hu et al. 2009), and to account for initial population establishment in more urban locales due to chance introduction from anthropogenic activities (Colunga-Garcia et al., 2010).
A layer of mean annual temperature from 2000 to 2012 was produced using NOAA weather station data (https://www.ncdc.noaa.gov/cdo-web/) and factors that influence temperature at the state-scale including elevation, potential solar insolation, continentality (or distance-to-coasts) and latitude. The final mean temperature map was derived at 30 m resolution using a regression-kriging method as summarized by Ninyerola et al. (2007) (see Vanwalleghem & Meentemeyer 2009). The mean annual temperature variable is employed to delineate locations within acceptable development thresholds for ALB larvae and pupae (10-30 °C) (Keena & Moore 2010). Additionally, mean annual temperature may also affect the dispersal characteristics of ALB, with more ALB dispersing short distances at cooler temperatures and fewer ALB flying greater distances at warmer temperatures (Bancroft & Smith 2005).

Producing SDMs using maximum entropy
A distribution model of ALB was produced with presence locations from 2008 and the environmental variables using a Maximum Entropy (Maxent) model. Maxent has been used widely to predict species distributions for both native and invasive species for numerous conservation and risk management objectives Elith et al. 2011;Merow et al. 2013). Maxent estimates the relative occurrence of a species in space by finding the probability distribution of maximum entropy, or one that is the closest to the study area background, with respect to constrains imposed by species location data (Phillips et al. 2006;Phillips & Dudík 2008). The probability distribution of maximum entropy is defined using pseudo-absences drawn from background locations and relating the probability densities of the underlying covariates with those drawn from species locations. Maxent was employed for this study because it not only relies upon presence-only data (Elith et al. 2011), but it is considered more accurate than the majority of presence-only SDMs currently in use (Elith et al. 2006). The potential for spatial and environmental bias in the sampling of species locations was accounted for by restricting the sampling of pseudo-absence locations within a buffer around all ALB presence locations (Phillips & Dudík 2008). The buffer distance was identified as the distance at which the most significant spatial clustering of ALB locations is present as defined by a Ripley's K analysis with 100 m intervals. While Maxent provides a number of feature transformations of the species response curves, including linear, quadratic, product, threshold and hinge features, to calculate species response curves, only hinge features were employed for this study because they are best at fitting non-linear relationships and producing generalized smooth responses (Elith et al. 2010;Merow et al. 2013). A regularization multiplier of two was used to further prevent over-fitting species response curves. Species-environment relationships derived from presence locations and background samples were projected to the remainder of the study area to assess habitat suitability on a state-wide scale.

Maxent model validation
Prior to model validation, the ALB presence data were partitioned into 10 training/validation data-sets using k-fold cross-validation. The area under the curve (AUC) of the receiver operating characteristic plot was used to measure the discrimination capacity for all 10 data-sets (Pearce and Ferrier, 2000). The AUC is a threshold independent measure of accuracy that compares the rate of true and false positives of validation data across all available habitat suitability thresholds, where values close to 1 indicate perfect discrimination capacity, values close to 0 indicate poor discrimination capacity and values close to 0.5 indicate discrimination capacity no greater than random (Fielding & Bell 1997). The AUC calculation is built into the Maxent software and was thus performed as a procedural component of SDM production (Phillips et al. 2006;Raes and ter Steege 2007). SDM model assessment was evaluated using a Boyce continuous index (BCI) (see Boyce et al. 2002). The BCI first bins the range of SDM model values into b equal interval classes of width w and identifies a window of a given width W (for this study, b = 100, w = SDM MAX /b and W = 10 × b where SDM MAX is the highest suitability value predicted). BCI is a moving window algorithm and calculates the predicted expected (P/E) ratio. The P/E ratio for the window is calculated as the quotient of the proportion of omitted validation locations and the proportion of the background study area, for all classes falling within the window (Hirzel et al. 2006). The window is then shifted (shift = b/100), and the P/E ratio is recalculated for each window until the last possible range of SDM model values is reached (SDM MAX − W/SDM MAX ). The BCI is the Spearman's rank correlation between the mean SDM value of a given class b and the average P/E ratio for all windows containing class b and ranges from −1 to 1. P/E values should ideally be monotonically increasing as suitability increases; thus, significantly high BCI values indicate ideal models with high calibration accuracy, significantly low values indicate models with poor calibration accuracy and BCI values equivalent to 0 indicate calibration accuracy no greater than random (Boyce et al. 2002;Hirzel et al. 2006). The BCI was performed using a custom Python 2.7 script (https://github.com/Ashatz/bcindex_idrpy).

Leptokurtic dispersal model
This study utilized a stratified dispersal model to account for natural and long-distance ALB dispersal. Numerous studies of ALB in both native and invaded habitats fit observed distances travelled by ALB using mark-recapture methods to spread functions and found that the mean dispersal distance per year for ALB individuals is approximately 129 m, with most beetles moving less than 300-500 m from their tree of origin (Bancroft & Smith 2005;Lu 2005). However, ALB individuals can travel farther than 2500 m within a year on rare occasions of long-distance dispersal (Smith et al. 2004). ALB dispersal patterns are characteristic of a fat-tailed leptokurtic dispersal function (Kot et al. 1996;Gilbert et al. 2004) defined as: where x is the distance between a potential infestation site and the nearest infested tree and α is a parameter that affects the steepness of the function. The parameter α is estimated by either finding the best-fit dispersal models to species location data for multiple values of α, or estimating α by fitting dispersal and population records collected from field studies to mathematically based spread models (Smith et al. 2001;Gilbert et al. 2004;Mercader et al. 2009). This leptokurtic spread model is commonly used when modelling invasive species, but effects of long-distance dispersal from rare dispersal events and anthropogenic agents may result in parameter estimates that overestimate the rate of natural species spread (Hastings et al. 2005). To find the value of α such that the mean distance of dispersal for ALB individuals is equivalent to field observations, the dispersal function was converted into a cumulative probability function defined as: The value for α was determined by solving the equation above for α such that F(129) = 0.5. Once parameterized, the leptokurtic dispersal function employed ALB presence locations from 2008 to produce a natural spread model. The probability of long-distance spread was determined by employing the leptokurtic dispersal function on a cost distance image created using ALB presence locations as a source and a Massachusetts 2010 roadways layer as a force allowing for increased dispersal capabilities (Eastman 1989). The spread models were used to determine sites where ALB individuals could potentially move to from their original locations, given the heterogeneity of patches suitable for establishment across the landscape.

Modelling risk of spread and establishment of ALB
The SDM and stratified dispersal models from 2008 ALB locations were combined to produce the final risk model for 2009 spread and establishment. The probability of an ALB individual dispersing from its location of origin and establishing a new population was defined as: where P(t + 1) is the probability of spread and establishment at time t + 1, Psdm(t) is the SDM output at a given location for time t, Pnat(t) is the probability of natural dispersal from a location of origin at time t and Pld(t) is the probability of long-distance dispersal from a location of origin at time t. The hybrid model was designed such that ALB individuals travel via natural or long-distance dispersal, but establishment is constricted by the quality of underlying habitat (Roura-Pascual et al. 2009;Gallien et al. 2010). Potential events of spread and establishment were modelled by subtracting a random number between 0 and 1 from locations in P(t + 1), with modelled events at locations with a value greater than 0. The risk of spread and establishment in 2009 was calculated as the average number of modelled dispersal/establishment events per location after 100 iterations. Risk categories were determined through user-defined intervals based on the classification scheme of Prasad et al. (2010) and (2) include lowest risk (0-0.01), low risk (0.01-0.25), medium risk (0.25-0.5), high risk (0.5-0.75) and extreme risk (0.75-1). Risk models were qualitatively analysed by extracting the mean and standard deviation of suitability, natural spread probability and long-distance spread probability from each risk category. Additionally, the quality of the risk models was assessed by calculating the proportions of 2009 ALB locations (n = 705) predicted within each risk category.

Predicted ALB distribution
The ALB distribution (habitat suitability >0) predicted using 516 presence locations from surveys conducted in 2008 ( Figure 2) encompassed 5617 km 2 or 26% of the state of Massachusetts. Within the quarantine zone, the towns of Worcester, Shrewsbury, Auburn and West Boylston contained areas of high suitability (>0.75). The counties of Worcester, Middlesex, Essex, Hampden and Hampshire contained suitable ALB habitat (>0), while Norfolk, Bristol and Franklin counties contained isolated patches of suitable ALB habitat.
Mean annual temperature and distance-to-forest were the most influential variables to the model (Figure 3). Mean annual temperature and distance-to-forest both return the highest gain when used individually and produce the lowest gain when omitted from model construction for both training and validation data. This indicates that both mean annual temperature and distance-to-forest best fit the training data and are better able to generalize omitted validation data. Host preferability and EVI display the opposite trend and contain both the lowest model gains when used individually and the highest model gains when omitted for both training and validation data. In some cases, Maxent models produced higher testing gains when either host preferability or EVI was omitted compared to the overall model gain employing all variables, suggesting that these two variables may worsen the overall model generalizability. Response curves of the 2008 ALB presence locations are shown in Figure 4(a)-(d). Response curves show high probability of presence (>0.5) at low values of host preferability (0-0.1). Conversely, high habitat suitability was detected for nearly all positive EVI values (0.05-0.99) and increased with increase in distance-to-forest (125-740 m). Conversely, high habitat suitability was detected within a narrow range of 282.2-282.5 °Kelvin (9-9.3 °C, respectively).

Maxent model evaluation
The AUC measured an average value of 0.87 (± 0.025) for all 10 k-fold replications of the Maxent model, indicating good discrimination capacity and generalizability of model predictions to validation locations omitted from model training ( Table 2).
P/E ratio curves ( Figure 5) show the mean and median values and 95% confidence intervals with respect to the average habitat suitability value across all 10 k-fold replications. The overall BCI value as derived from Spearman's rank correlation was 0.97 (± 0.013) and indicates excellent model calibration. A curve that shows a monotonically increasing P/E ratio with habitat suitability indicates the consistent prediction of omitted validation locations within highly suitable areas. The P/E ratio increases sharply from 0 to 11.5 between the suitability classes of 0 and 80 (Suitability = 0-0.68), but plateaus at P/E = 11.5 from Boyce classes 80-100 (Suitability = 0.65-0.85). Narrow confidence intervals for mean and median P/E < 11.5 indicate little model variability between k-fold partitions for locations with modelled suitability less than 0.68; however, confidence intervals are widest at the highest Boyce class of 100, indicating greater model variability at locations of high suitability.

Dispersal model
The cumulative probability function of Equation (3) calculates the area under the probability density function of the leptokurtic dispersal curve for all values α from 0 to 1 at a set distance x = 129 m. The value of α at which F(129) = 0.5 is 0.1478. This value of α was employed to calculate the probability of ALB dispersal for both natural and long-distance spread models (Figure 6(a) and (b)). Natural spread models show only local probability of spread from 2008 ALB locations, with little to no chance of dispersal beyond the quarantine zone within a year. However, the long-distance dispersal model, created via cost-distance analysis of ALB 2008 locations using roadways as a source of friction, indicated that areas along major interstates including I-195 heading north to the towns of Leominister and Fitchburg, I-290 and I-495 heading northeast to Marlborough and Lowell, I-390 heading south towards Connecticut and I-90 heading east to Boston and west to Springfield show moderate probabilities of spread (0.3-0.5).

Risk model production and evaluation
The ALB habitat suitability map was combined with the stratified dispersal model using Equation (4) to produce a model of 2009 spread and establishment risk for ALB (Figure 7). Locations of medium, high and extreme risk, where high habitat suitability values coincided with locations of high dispersal probability, are confined within the quarantine zone and encompass 3.07, 1.54, and 0.89 km 2 , respectively (0.05, 0.03 and 0.02% of suitable habitat). Locations with low risk of dispersal and establishment cover 22.5% of suitable habitat and extend beyond the quarantine zone south into Worcester county following I-390, north into Worcester and Middlesex counties following I-190, I-290 and I-495, and west into Hampden and Hampshire counties following I-90. Areas of least risk are the most extensive and cover the remaining 74% of suitable ALB habitat.
Mean and standard deviation values for habitat suitability, natural spread probability and long-distance spread probability with respect to risk categories are displayed in Table 3. Extreme risk areas had the highest mean suitability (0.70 ± 0.11) with decrease in mean suitability approaching lowest risk areas (0.08 ± 0.1). The mean and standard deviation of natural spread probabilities were greatest at extreme risk areas (0.64 ± 0.29) and reached zero in lower risk areas. The mean and standard deviation of long-distance spread probabilities were also greatest at extreme risk areas (0.83 ± 0.16) and decreased approaching lowest risk areas (0.03 ± 0.03). The standard deviation of suitability increased Table 2. roc and Boyce continuous index values derived from Spearman's rank (r s ) across all 10 k-fold model partitions. the average of 10 partitions for both validation metrics was calculated as the overall roc and r s for the model.  from extreme to lowest risk (0.24 ± 0.17), but both the mean and standard deviation values of natural and long-distance dispersal decreased substantially from medium to low risk areas.
Of the 705 ALB presence locations from 2009, 92 (13%) were located within Extreme risk areas, 191 (27%) were located within high risk areas, 203 (29%) were located within medium risk areas, 194 (28%) were located within low risk areas and 25 (3%) were located within lowest risk areas (Figure 8). Sixty-nine per cent of 2009 reported presence locations were predicted within the medium to extreme risk categories, which encompass a total of 1% of suitable habitat within the study area.

Discussion and conclusion
This study demonstrated the potential for producing models of ALB spread and establishment risk within the state of Massachusetts using a hybrid modelling approach that employs both a SDM and a stratified dispersal model. The SDM produced using Maxent predicted eight counties containing suitable habitat for ALB, five of which contain highly suitable habitat. Mean annual temperature and distance-to-forests were the most important variables for model construction due to good fit between with the training data and good generalizability with validation data omitted from model construction. Species response curves indicated that these highly suitable habitats were located primarily within sub-urban residential environments and contained few suitable hosts native to Massachusetts (host suitability = 0-0.1), wide range of vegetation cover (EVI = 0.05-0.99) and were predominantly greater than 125 m from forested locations. A mean AUC value of 0.87 and a BCI value of 0.97 indicate both good model discrimination and excellent calibration accuracy between model predictions and validation locations omitted from model construction.
The importance of mean annual temperature and distance-to-forests is likely due to the spatial arrangement of the ALB presence locations, which is likely due to the uncertainties of chance introduction and dispersal limitations after initial colonization in addition to potential biases towards urban forests in the sampling effort (Soberón & Peterson 2005;Václavík & Meentemeyer 2011). Presence location data were collected soon after the initial discovery of ALB within Worcester County; thus, the distribution of ALB at this time was more likely influenced by chance introduction from anthropogenic activities as represented by distance-to-forests than the availability of host species Colunga-Garcia et al. 2010). The importance of mean annual temperature likely reflects the broadscale climatic effects that influence physiological thresholds of the ALB activity and reproduction, and determine suitable locations for ALB colonization and spread into novel locales (MacLeod et al. 2002;Peterson et al. 2004;Broennimann et al. 2007;Keena et al. 2010). High influence of temperature may also be due to the overall low variability of temperature at the scale of ALB presence and sampled background locations, allowing for easier model fit and increased generalizability (Weaver et al. 2012).
Both host suitability and EVI were not important and in some cases detrimental to model construction, despite known observations of variable importance from previous studies of ALB in Worcester County (Shmookler et al. 2008;Shatz et al. 2013). Such a result is likely due to the effects of chance introduction, in which the presence of suitable host species plays a diminished role in the establishment  (4), which incorporates natural and long-distance dispersal.
of invasive pest species in urban environments (Soberón & Peterson 2005;Colunga-Garcia et al. 2010;Václavík & Meentemeyer 2011). Another consideration is that the host suitability model only accounts for the abundances of ALB hosts that are native to Massachusetts and did not account for Norway Maple, the predominant tree infested by the ALB when the outbreak was discovered in 2008 (Herwitz 2001;USDA 2012).
The stratified dispersal model produced in this study highlights the potential use of the cumulative distribution function to efficiently parameterize dispersal functions such as the leptokurtic function. Previous studies often rely on species location data collected at later time steps and find the model with parameters that produce the best goodness-of-fit between predicted dispersal models and species data (Gilbert et al. 2004). The presence of long-distance dispersal events, however, may affect parameter estimation, resulting in overestimation of species spread (Hastings et al. 2005;Pitt et al. 2009). The use of the cumulative distribution function eliminates this problem, but the dispersal estimates are dependent upon accurate species dispersal estimates derived from high-quality field-collected data (Smith et al. 2001(Smith et al. , 2004. Additionally, the landscape structure in which field data were collected will have an impact on the dispersal capabilities of the species due to landscape connectivity and species physiology Ellis et al. 2010). Least-cost pathways are considered an effective tool to account for the effects of spatial pattern and heterogeneity on dispersal rate (Gonzales & Gergel 2007). This study demonstrates the use of least-cost pathways for modelling the long-distance dispersal along roadways, but future studies  should additionally employ least-cost pathways for natural dispersal to address the uncertainty of dispersal estimates of ALB over varying land cover types. In addition, this study assumes that natural ALB spread is isotropic; thus, future studies should account for anisotropic agents of ALB spread such as summer wind velocity and direction (Morin et al. 2009;Rogan, personal communication). Finally, future studies should explore the relationship between additional anthropogenic factors that may influence the long-distance dispersal of ALB including human population density, housing density, volume of traffic and the movement of firewood (Prasad et al. 2010;Tobin et al. 2010;Bigsby et al. 2011). The risk model produced in this study identifies medium-extreme risk areas localized to a small area around the site of initial introduction in Northern Worcester, but areas of low risk spread out well beyond the borders of the quarantine zone and Worcester County into Middlesex, Hampden and Hampshire Counties. Mean and standard deviation values of suitability, natural dispersal and long-distance dispersal suggest that medium-extreme risk locations contain suitable habitats that are accessible by both natural and long-distance dispersal, but low risk areas contain suitable locations that can only be accessed by long-distance dispersal. These results have implications on effective risk management strategies with respect to predicted level of risk, such that areas of medium-extreme risk at the infestation epicentre may be effectively managed through intensive early detection/eradication strategies (Prasad et al. 2010), while low risk areas containing potential satellite populations are likely better managed through educational awareness and volunteer/local search efforts (Liebhold & Tobin 2008). The majority of 2009 ALB presence locations were found within the medium-extreme risk areas consisting of suitable locations for population establishment within close travelling distance to previously infested sites. However, nearly one-third of ALB presence locations were found within low risk areas containing suitable locations, but displaced from previously infested sites, suggesting the possibility of population spread and establishment by human-aided long-distance dispersal. Such dispersal events have been confirmed with the discovery of the ALB in Shrewsbury to the east of Worcester in 2009 and the discovery of six ALB-infested trees in Boston in 2010 (DCR 2010). The USDA-APHIS and the Massachusetts Department of Conservation and Recreation have already begun the process of screening low risk areas immediately beyond the quarantine zone using baited traps, ground surveys and tree climbing surveys (USDA 2012), but results from this study suggest that such efforts should be employed beyond Worcester County for future risk management efforts.
Despite the limited time frame of this study from 2008 to 2009, the risk model produced provides a representation of spread dynamics of ALB soon after its initial introduction/discovery in Central Massachusetts. However, the distribution of ALB is likely to change over time as individuals discover new environmental conditions through dispersal events as the species adapts to new environments (Broennimann et al. 2007;Dodds & Orwig 2011;Václavík and Meentemeyer 2011), and as habitats that are environmentally suitable for ALB physiology evolve over time due to global climate change (Mika et al. 2008;Kearney & Porter 2009;Elith et al. 2010). Thus, future studies should examine how areas at risk evolve in tandem with ALB range expansion, using information from the spread of ALB from 2009 to present, which was not available at the moment of this study. Finally, methods from this study should be applied and expanded upon to determine current and future impacts of ALB spread and establishment on timber, maple syrup and ecosystem service resources to the known outbreak in Ohio and potential outbreaks in the New England region.

Disclosure statement
No potential conflict of interest was reported by the authors.