Analysis of the spatial distribution of amyotrophic lateral sclerosis in Virginia

Abstract
Objective: Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disorder that is usually fatal. Environmental exposures have been posited in the etiology of ALS, but few studies have modeled the spatial risk of ALS over large geographic areas. In this paper, our goal was to analyze the spatial distribution of ALS in Virginia and identify any areas with significantly elevated risk using Virginia ALS Association administrative data.
Methods: We used Bayesian hierarchical spatial regression models to estimate the relative risk for ALS in Virginia census tracts, adjusting for several covariates posited to be associated with the disease. We used an intrinsic conditional autoregressive prior to allow for spatial correlation in the risk estimates and stabilize estimates over space.
Results: Considerable variation in ALS risk existed across Virginia, with greater relative risk found in the central and western parts of the state. We identified significantly elevated relative risk in a number of census tracts. In particular, Henrico, Albemarle, and Botetourt counties all contained at least four census tracts with significantly elevated risk.
Conclusions: We identified several areas with significantly elevated ALS risk across Virginia census tracts. These results can inform future studies of potential environmental triggers for the disease, whose etiology is still being understood.


Introduction
Amyotrophic lateral sclerosis (ALS), a progressive neurodegenerative disorder of the motor neurons, has an estimated global incidence of approximately 1.7 cases per 100,000 person-years with global prevalence estimates of approximately 4.5 per 100,000 person-years, with substantial geographic variation (1,2). The disorder is generally fatal, as most people die within two to five years of diagnosis (3). Although scientific advancements such as the mapping of the human genome have led to some findings regarding the genetic characteristics of ALS heritability, only approximately 10% of cases are familial and another 10% of seemingly sporadic ALS cases have a genetic cause (4), leaving a considerable proportion of ALS etiology unexplained (5-7). Various risk factors for ALS have been posited, including pesticides (8,9), cyanobacterial toxins such as beta-methylamino-L-alanine (B-MAA) (10-13), heavy metals and organic solvents (14), military service (12,15-17), blood lead levels (18-20), oxidative stress (21), bodily trauma (22), and smoking (23,24), but findings from these studies have often been inconclusive and do not consistently point to clear drivers of ALS risk.
Given the incomplete understanding of the etiology of ALS, one method to characterize ALS risk and potentially determine environmental exposure factors is to model spatial risk and identify areas of significantly elevated risk. These studies evaluate whether the number of disease cases in certain areas is greater than what would be expected by chance, and the finding of significantly elevated risk areas can suggest the existence of a common source of exposure, risk, or genetic commonality that drives the observed cases of disease (25). Often such studies arise in response to an apparent large number of cases in an area, which may turn out not to be statistically significant in the context of a larger surrounding study area. For example, several studies have found no evidence for excess spatial risk for ALS in New Jersey and southern Massachusetts (26,27). In contrast, some studies have identified geographic clusters of elevated ALS incidence. One conducted in Wisconsin using a spatial modification of chi-squared tests of homogeneity and permutation tests found a cluster suspected to be related to cyanobacterial blooms (10) given the region's ecology (25). Another study conducted in the Piedmont region of Italy using Bayesian regression models found an area of elevated risk for ALS that suggested an association with agricultural chemical exposures (28). Additionally, an epidemiologic analysis of thirty years of ALS cases in Denmark using generalized additive models found elevated risk for ALS for both birth and diagnosis addresses near Copenhagen (29). Finally, an analysis of spatial autocorrelation in excess case counts of ALS in the northern New England states of Maine, New Hampshire, and Vermont identified four spatially distinct areas of high ALS incidence, all of which were in close proximity to bodies of water with methylmercury contamination and cyanobacteria blooms (10,30,31). To date, no large-scale epidemiologic study investigating spatial risk of ALS utilizing the ALS Association administrative data has been completed (32). Given concern of an apparent increase in rates of ALS cases in parts of Virginia, in this paper we evaluated the spatial distribution of ALS rates using the Virginia ALS Association (ALSA) administrative data over a 21-year period. We adjusted for several covariates suspected to be associated with increased risk for ALS and mapped the residual spatial variation in ALS rates to identify regions having greater ALS rates than expected.

Data
We obtained all incident cases of ALS diagnosed in the period 2000-2021 captured in the Virginia ALSA administrative data (n = 1276 cases). This database included a residential location for each diagnosed case. Additionally, we used data from the American Community Survey (ACS) for census-tract level covariates and population counts, taking the five-year estimates ending in 2019 (33). The set of adjustment covariates we primarily used included sex (proportion of males in a tract), median age, and proportion of the population aged 18 years or greater having ever served in the military, due to their hypothesized association with ALS in the literature (12,15,16,34-37). Additionally, we adjusted for several measures of pesticide exposure on the census tract level. We took data from the National Water-Quality Assessment (NAWQA) Project by the United States Geological Survey (38) and used the tables of estimated annual agricultural pesticide use. For a given census tract we summed the estimated pesticide use in the county containing that tract for three classes (Herbicides, comprising 2,4-D, Glyphosate, and Terbacil; Insecticides, comprising Carbaryl, Chlorpyrifos, and Permethrin; and Fungicides, comprising Mancozeb, Chlorothalonil, and Captan), drawing a random exposure from a uniform distribution with bounds of the lowest and highest estimates for a given pesticide. We chose the specific pesticides in each class from the list of pesticides having significant associations with ALS risk in a previous nationwide study (9), omitting three pesticides (MCPB, hymexazol, and paraquat) that were not reported in the dataset in Virginia during this time period. We summed these measures over all years available in the data (2000-2017) and converted these scores to quartiles to accommodate uncertainty in the pesticide estimates and manage collinearity between the pesticide classes. We also considered adjusting for county-level estimates of smoking prevalence from the Virginia Department of Health based on the Behavioral Risk Factor Surveillance System, although these were small-area estimates at a different spatial scale than the other data. This study was approved by the Virginia Commonwealth University Institutional Review Board.
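The construction of the pesticide indices described above (a uniform draw between the low and high use estimates, a sum over pesticides and years, then conversion to quartiles) can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `pesticide_quartiles`, the array layout, and the fixed random seed are assumptions made here for the example.

```python
import numpy as np

def pesticide_quartiles(low, high, seed=0):
    """Quartile score (1-4) of summed pesticide use for each area.

    low, high: arrays of shape (n_areas, n_years, n_pesticides) holding the
    lowest and highest annual use estimates (hypothetical layout).
    """
    rng = np.random.default_rng(seed)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # One uniform draw per area, year, and pesticide, bounded by the estimates.
    draws = rng.uniform(low, high)
    # Sum over years and over the pesticides in the class.
    total = draws.sum(axis=(1, 2))
    # Cut points at the 25th, 50th, and 75th percentiles of the summed scores.
    cuts = np.quantile(total, [0.25, 0.5, 0.75])
    # Scores at or below the first cut point map to quartile 1, and so on.
    return np.searchsorted(cuts, total, side="left") + 1
```

Calling the function once per pesticide class (herbicides, insecticides, fungicides) would give three quartile covariates per area under these assumptions.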

Statistical analysis
We used Bayesian hierarchical regression models to estimate the relative risk of ALS in Virginia census tracts. In the $i$th tract, we assumed that the count of ALS cases $Y_i$ was distributed as $\text{Poisson}(\theta_i E_i)$, with relative risk $\theta_i$ and expected count $E_i$, which was calculated as the overall rate of ALS cases in all tracts multiplied by the population of the $i$th tract. We fit the following four models using different definitions of the relative risk term. In Model 1, $\log(\theta_i) = \beta_0 + u_i$, which included an intercept term $\beta_0$, and we assumed that residual variation in ALS diagnoses was not spatially structured, using unstructured random effect $u_i$. Specifically, this model assumed that variation in ALS rates had no spatial structure across census tracts in Virginia. In Model 2, $\log(\theta_i) = \beta_0 + v_i$, where we assumed that residual variation in ALS diagnoses was only spatially structured, using spatially-structured random effect $v_i$. The spatially-structured random effects smooth the rates of ALS diagnoses across census tracts to produce more reliable ALS rates and identify tracts with rates of ALS that are significantly higher or lower than expected. In Model 3, $\log(\theta_i) = \beta_0 + u_i + v_i$, where we assumed that residual variation in ALS diagnoses was due to both spatially-structured and unstructured factors, using the Besag, York, and Mollie model (39). This model allows for smoothing of ALS rates over space due to an underlying spatial patterning as well as for additional variation unexplained by a spatial structure. In Model 4, we used the best-fitting of the previous three models and added the term $X_i^{T} \boldsymbol{\beta}$ to the expression for $\log(\theta_i)$ to adjust for a vector of tract-level covariates $X_i$ (median age, proportion of males, proportion with military service, and herbicides, insecticides, and fungicides indices) along with county-level smoking, with coefficient vector $\boldsymbol{\beta}$. By comparing this model with the best-fitting of the previous three, we were able to evaluate how adjusting for covariates changed the census tracts identified as having significant risk for ALS and see which covariates were associated with ALS incidence in Virginia during the study period.
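The expected counts that enter the Poisson model can be illustrated with a short sketch. It computes the expected count for each tract as the overall rate multiplied by the tract population, as described above, and also the crude ratio of observed to expected cases. The crude ratio is shown only for intuition about what the hierarchical models smooth; it is not an estimate reported in this paper, and the function names are hypothetical.

```python
import numpy as np

def expected_counts(cases, population):
    """Expected cases per tract: overall rate times tract population."""
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    overall_rate = cases.sum() / population.sum()
    return overall_rate * population

def crude_relative_risk(cases, population):
    """Unsmoothed observed/expected ratio (illustration only)."""
    return np.asarray(cases, dtype=float) / expected_counts(cases, population)
```

With small expected counts, the crude ratio is unstable (a tract with zero cases has a ratio of exactly zero), which is one motivation for borrowing strength across neighboring tracts through the random effects.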

Model fitting
We fitted all models in the R and WinBUGS software, using the R2WinBUGS package (40-42). We used Markov chain Monte Carlo (MCMC) methods to generate samples of the joint posterior distribution for parameter inference. We considered a model parameter to have converged if its Gelman-Rubin statistic (43) was less than 1.1, and we compared models using the Deviance Information Criterion (DIC) (44), which provides a value of model fit that penalizes model complexity. Lower DIC values indicate better-fitting models. Finally, we identified census tracts with significantly elevated risk for ALS using exceedance probabilities (45) for the estimated relative risks, which are defined for the $i$th tract as $\widehat{\Pr}(\theta_i > 1 \mid \text{data})$ and estimate the proportion of the posterior distribution for relative risk $\theta_i$ that exceeds the null value of 1. We considered census tracts with exceedance probabilities of 0.95 or greater to have significantly elevated risk for ALS. We give additional details regarding prior distributions and model fitting choices in the Supplemental Material.
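Given MCMC draws of the relative risks, the exceedance probability for a tract is simply the fraction of draws above 1. The sketch below assumes the draws are available as an array with one row per posterior sample and one column per tract; the function names and array layout are illustrative assumptions, not part of the R and WinBUGS workflow used in the study.

```python
import numpy as np

def exceedance_probability(theta_samples, threshold=1.0):
    """Share of posterior draws above the threshold, per tract.

    theta_samples: array of shape (n_draws, n_tracts) of relative risks.
    """
    samples = np.asarray(theta_samples, dtype=float)
    return (samples > threshold).mean(axis=0)

def flag_elevated(theta_samples, cutoff=0.95):
    """Boolean mask of tracts whose exceedance probability meets the cutoff."""
    return exceedance_probability(theta_samples) >= cutoff
```

The default cutoff of 0.95 mirrors the threshold stated in the text.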

Results
The majority of the cases were male (54.9%) and White (70.7%), and the median age was 64 years (Table 1). Additionally, the median proportion in a census tract having military service was 7.4 percent, and tracts were typically more exposed to herbicides than insecticides and fungicides, though this varied by tract and some tracts were not exposed to any of these three classes of pesticides. The distribution of many pesticides was relatively constant across the period 2000-2017, with slight increases over time for 2,4-D and Glyphosate, slight decreases over time for Carbaryl and Chlorpyrifos, and dramatic decreases for Terbacil in the last two years of the period. The model with spatially-structured random effects (Model 2, DIC 4008) fit the data better than both the model with only unstructured random effects (Model 1, DIC 4047) and the model with both spatially-structured and unstructured random effects (Model 3, DIC 4032, Table 2), implying that beyond the spatially-structured variation in ALS rates, there was little evidence for additional unstructured variation. The residual spatial variation in ALS rates from Model 2 shows elevated relative risks for ALS in the western and northwestern regions of the state (Figure 1), as well as 54 census tracts with significantly elevated risk, many of which are in these regions (Figure 2). These areas indicate spatial clusters of a greater number of ALS cases in an area than would be expected given the population. However, adding tract-level covariates (Model 4) substantially improved the fit of the model (DIC 3944), and therefore we based inference on the results from Model 4.
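As background for reading these comparisons, the DIC is conventionally defined as the posterior mean deviance plus an effective number of parameters. The definition and the differences below are stated for orientation only, using the DIC values reported in the text; the components $\bar{D}$ and $p_D$ for each model are not given in this section.

```latex
\begin{align*}
\mathrm{DIC} &= \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\theta}) \\
\mathrm{DIC}_{\text{Model 1}} - \mathrm{DIC}_{\text{Model 2}} &= 4047 - 4008 = 39 \\
\mathrm{DIC}_{\text{Model 3}} - \mathrm{DIC}_{\text{Model 2}} &= 4032 - 4008 = 24 \\
\mathrm{DIC}_{\text{Model 2}} - \mathrm{DIC}_{\text{Model 4}} &= 4008 - 3944 = 64
\end{align*}
```

Here $\bar{D}$ is the posterior mean of the deviance and $D(\bar{\theta})$ is the deviance evaluated at the posterior means of the parameters.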
In Model 4, an increase of one year in the median age in a census tract was associated with a 3.5% increase in relative risk of ALS, and a five percent increase in the proportion of a census tract population having served in the military was associated with a 39.8% increase in relative risk of ALS (Table 3). Additionally, an increase of one quartile in fungicide exposure (including Mancozeb, Chlorothalonil, and Captan) was associated with a 14.4% increase in relative risk of ALS. The age (95% CI (1.025, 1.047)), military service (95% CI (1.216, 1.590)), and fungicide terms (95% CI (1.033, 1.270)) were statistically significant, and the sex (male proportion), herbicides, and insecticides terms were not statistically significant.
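The percentage changes quoted above follow directly from the relative risk scale: a relative risk of 1.035 corresponds to a 3.5% increase. The sketch below shows that conversion, together with a rescaling to a different covariate increment that assumes the log-linear form of the model holds over that range. The function names are hypothetical, and the ten-year rescaling is an illustration rather than a result reported in the paper.

```python
def percent_change(relative_risk):
    """Percent change in risk implied by a relative risk."""
    return 100.0 * (relative_risk - 1.0)

def rescale_relative_risk(relative_risk, units):
    """Relative risk for a change of `units` increments.

    Assumes a log-linear model, so effects multiply across increments.
    """
    return relative_risk ** units
```

For example, `percent_change(1.398)` recovers the 39.8% figure for military service, and `rescale_relative_risk(1.035, 10)` gives roughly 1.41 for a ten-year difference in median age under the log-linear assumption.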
The estimated relative risks for ALS varied considerably across census tracts in Virginia based on Model 4 (Figure 3). Much of the southern part of the state and the Eastern Shore are characterized by relative risks very close to the null value of 1, indicating rates of ALS diagnoses very close to what would be expected if relative risk did not exhibit spatial variation. In contrast, a region in central Virginia surrounding the city of Richmond has elevated relative risk of 1.5-2.5 for several tracts in Henrico, Chesterfield, and Powhatan counties. Additionally, an area in southern Botetourt County near Roanoke has elevated relative risk of 3.75, and many tracts north and west of this area extending to the western border of the state have relative risks of 2-3.5. Finally, several tracts in the northwestern portion of the state as well as in King George County and the Virginia Beach metropolitan area have relative risks of 1.5-2.5. As with Model 2, these regions have greater ALS rates than would be expected given the population and covariates.
Forty-three census tracts in Virginia had significantly elevated spatial risk for ALS in Model 4. This is eleven fewer tracts than Model 2 identified as being significantly elevated (Figure 2), which suggests that adjusting for covariates explained ALS rates in some tracts, as the spatial patterning in significantly elevated areas is very similar between these two models. Henrico, Albemarle, Botetourt, and Rockingham counties had the greatest number of census tracts with significantly elevated relative risk for ALS (8, 8, 4, and 4 tracts, respectively, shown in Figure 4). Additionally, Charlottesville city and Chesterfield County had three tracts with significantly elevated relative risk. Three counties (Augusta, Fluvanna, and Warren) had two tracts with significantly elevated risk. Finally, seven counties/cities (Bedford County, King George County, Montgomery County, Powhatan County, Rockbridge County, Norfolk city, and Waynesboro city) had one tract with significantly elevated relative risk for ALS. Contiguous census tracts with significantly elevated risk in Albemarle, Botetourt, Rockingham, and Augusta counties constituted spatial clusters of elevated risk for ALS.

Discussion and conclusion
In this study, we estimated the relative risk for ALS across census tracts in Virginia using 21 years of ALS diagnoses from ALS Association administrative data with Bayesian hierarchical spatial regression models. We found considerable variation in relative risks for ALS across the state, with clusters of significantly elevated risk in Charlottesville/Albemarle County, Botetourt County, Rockingham County, and Augusta County. These elevated relative risks remained after adjusting for several census tract-level covariates including median age, proportion of males, proportion having military service, and indices of herbicides, insecticides, and fungicides exposures.
Our study is the first of its kind to utilize ALS Association administrative data, and its findings will guide further study of potential environmental triggers in the development of ALS including neurotoxins, heavy metals, and organic solvents.
The associations of several covariates in our study with ALS risk provide further evidence for these relationships. Notably, we identified a significant association between proportion in a census tract having military service and ALS risk. This supports a previous prospective study that assessed the association between military service and rates of death from ALS, finding significantly elevated risk for ALS mortality among those who served in the military (15). This trend held for service in many branches of the military, including Army, Navy, Air Force, and Coast Guard, as well as time period served, as the follow-up period for this study was approximately ten years. While the specific exposures experienced during military service that drive such an increased risk are more difficult to pinpoint, various exposures, including traumatic injury (46-48), intense physical exertion (49,50), and lead from firing weaponry (19,20,51), have been posited, and future research could investigate these exposure-specific hypotheses when analyzing the relationship between military service and ALS. Additionally, we found that an index of fungicides exposure, including Mancozeb, Chlorothalonil, and Captan, was significantly associated with ALS risk. In the United States, Mancozeb is commonly used in agricultural and other commercial applications (9), and has been shown to have neurotoxic effects and disrupt mitochondrial processes (52,53). Maps of county-level pesticide use in the time period 2002-2012 (9,38) illustrate elevated rates of Chlorothalonil exposure throughout the nation's agricultural "Black Belt" in the southeastern part of the country, extending into far southern Virginia, and elevated rates of Captan exposure in western Virginia in the Appalachian Mountains. Captan has been shown to induce chromosomal aberration and DNA adducts (54) and has been linked to breast cancer through occupational exposures (55,56). Finally, the significant association we identified between census tract median age and ALS rates supports the contention that ALS is rare in young populations and risk increases markedly with age after approximately 40 years (57).
Our study has several strengths, the first of which is its use of the ALS Association administrative data for ALS cases seen in Virginia over the past 21 years. This dataset from the ALS Association has been estimated to capture 95 percent of ALS cases (58), which is a much greater case ascertainment rate than that of the national ALS registry (59-61). This data source allowed us to draw upon many years of ALS diagnoses and overcome some of the issues associated with performing analyses of rare diseases such as this one. Further, the demographics of our sample approximated those found in a recent study of the prevalence of ALS in the United States (62). Secondly, we used a Bayesian spatial regression model with a prior for the random effects that allowed the borrowing of strength across neighboring census tracts and smoothing of risk estimates over space that more realistically depicts the smooth distribution of disease risk. By estimating the parameters within the random effects, we allowed the data to inform the degree of spatially-structured and unstructured variation present in ALS rates. Additionally, our use of a regression framework to estimate relative risk for ALS allowed for adjustment for tract-level covariates that are likely to be associated with risk for ALS, in contrast to some other methods that cannot directly accommodate adjustment covariates such as the local spatial scan (63) or chi-squared tests (25). In particular, the significance of census tract median age, proportion with military service, and fungicides exposure supports the adjustment for these covariates in the model. Thirdly, our adoption of the Bayesian framework allowed for straightforward identification of areas of significantly elevated risk through calculating exceedance probabilities, which provide an intuitive estimate of the probability of excess spatial risk associated with each census tract.
The strengths of our study should be considered alongside its limitations. First, it is possible that the address of ALS cases at diagnosis may not represent their true locations of exposure. In the absence of complete residential histories for each case, we assumed location at diagnosis to be a relevant proxy for exposure location. Future research that leverages residential histories for ALS cases may be able to identify more precise areas of elevated spatial risk where exposures occurred at an appropriate latency period (64). Second, in estimating the expected number of cases in each census tract, we used population counts derived from 2015-2019 population data. It is possible that for regions that have experienced considerable population growth or decline over the study period, their estimated expected number of cases may have been biased by using population data from this time range. Third, we did not adjust for other socio-economic factors that may be associated with risk for ALS. Fourth, we did not have access to genetic data for ALS cases in our study, which could possibly have indicated some genetic component or clustering of cases in our sample. Fifth, because ALS is a non-notifiable disease in the United States, it is possible that not all new ALS cases were reflected in our sample. However, our number of cases is reasonable given the population distribution and temporal extent of our study, and assuming that no geographic factors were associated with reporting to the Virginia ALS Association would imply little change in the spatial findings. Finally, it was not possible for our study to establish causality due to its retrospective and observational nature.
In conclusion, we have performed the first large-scale spatial analysis of ALS using ALS Association administrative data, focusing on the state of Virginia. We identified several clusters of significantly elevated risk for ALS that should inform future studies of their environmental causes. Identification of heightened exposure to environmental triggers in these areas can provide a better understanding of the etiology of ALS, the causes of which are still being understood. Additional research utilizing ALS registry data is needed to characterize the spatial distribution of this fatal neurodegenerative disease in other geographic settings and in populations with different demographic characteristics.

Figure 1. Relative risks for ALS for census tracts in Virginia based on Model 2.

Figure 2. Census tracts with significantly elevated risk for ALS in Virginia based on Model 2, with significance determined using 95 percent exceedance probabilities.

Figure 3. Relative risks for ALS for census tracts in Virginia based on Model 4.

Figure 4. Census tracts with significantly elevated risk for ALS in Virginia based on Model 4, with significance determined using 95 percent exceedance probabilities.

Table 1. Summary of ALS case demographics and census-tract covariates.

Table 2. Summary of Bayesian model fit.
Note: DIC stands for Deviance Information Criterion, where lower scores indicate a better-fitting model.

Table 3. Summary of estimated relative risks for covariates in final Model 4.
Note: Quantities in the table denote relative risk estimates. Asterisks in the table denote significance.