An Analysis of an Incomplete Marked Point Pattern of Heat-Related 911 Calls

We analyze an incomplete marked point pattern of heat-related 911 calls made between 2006 and 2010 in Houston, TX, primarily to investigate conditions that are associated with increased vulnerability to heat-related morbidity and, secondarily, to build a statistical model that can be used as a public health tool to predict the volume of 911 calls given a time frame and heat exposure. We model the calls as arising from a nonhomogeneous Cox process with unknown intensity measure. By using the kernel convolution construction of a Gaussian process, the intensity surface is modeled with a low-dimensional representation and properly adheres to circular domain constraints. We account for the incomplete observations by marginalizing the joint intensity measure over the domain of the missing marks and also demonstrate model-based imputation. We find that spatial regions of high risk for heat-related 911 calls are temporally dynamic, with the highest risk occurring in urban areas during the day. We also find that elderly populations have a higher probability of calling 911 with heat-related issues than younger populations. Finally, the age of individuals and the hour of the day with the highest intensity of heat-related 911 calls vary by race/ethnicity. Supplementary materials are included with this article.


Research Motivation
During the summer of 2011, Houston experienced one of the warmest and driest summers on record. The summer of 2011 in Houston saw the hottest day in city history (August 27), 27 daily record highs tied or broken, 33 daily high minimums, the highest temperature ever recorded for June, the earliest 38°C (100°F) day, and the most days of greater than 41°C (105°F) temperatures in a year, to name only a few (see http://www.srh.noaa.gov/hgx/?n=Summer2011Records). Extreme heat scenarios such as these pose a threat to public health. For example, various studies have correlated extreme heat to increases in mortality (Peng et al. 2011; Heaton et al. 2014), emergency room visits (Rusticucci and Bettolli 2002), and other diseases (Li et al. 2012).
Given the potential negative consequences of high heat, this study uses 911 call data in Houston to understand the demographics, locations, and temperatures that make individuals vulnerable to a heat-related illness. Ultimately, by understanding these conditions, appropriate intervention strategies (e.g., education programs, heat advisories, construction of cooling centers, etc.) can be implemented to prepare for such situations and, we hope, reduce the occurrence of potential negative outcomes.
This work is a collaborative part of the System for Integrated Modeling of Metropolitan Extreme Heat Risk (SIMMER) project at the National Center for Atmospheric Research (http://www.rap.ucar.edu/projects/simmer/). A primary goal of SIMMER is to determine the combined impact of extreme heat and the characteristics of urban environmental and social systems on human health. As part of this goal, this study characterizes the environmental and social impact of extreme heat on the volume of heat-related 911 calls.

911 Call and Meteorological Data
The data considered in this analysis consist of 2054 emergency calls to 911 classified under at least one of the terms "heat," "dehydration," "hyperthermia," "heat exhaustion," and "heat stroke." Of the 2054 calls, 1389 were complete cases where the Houston Fire Department recorded the date, exact latitude and longitude coordinates to where the ambulance was sent, time the ambulance departed to the scene, and the age, race/ethnicity, and gender of the caller. The remaining 665 calls had some missing data: 22% were missing the time of departure and 11%, 14%, and 13% were missing the age, race/ethnicity, and gender of the caller, respectively. All calls contained the spatial location and date of the call.
The primary goal of this analysis is to understand meteorological conditions that are associated with increased vulnerability to a heat-related illness requiring emergency medical attention. However, because observational temperature data at the specific time and location of a heat-related 911 call are not available, we obtained meteorology data for Houston using an offline version of the Noah Land Surface Model (Noah LSM), known as the High-Resolution Land Data Assimilation System (HRLDAS; Chen et al. 2007), constrained by observational data. The motivation for using the simulated weather data from HRLDAS rather than observed temperatures is that there are a total of 15,625 HRLDAS 1-km grid boxes over greater Houston (a 125 × 125 model domain), whereas there are only about 20-30 meteorological stations scattered throughout the spatial domain. HRLDAS uses reanalysis data as forcing conditions (see, e.g., Cosgrove et al. 2003) and is coupled to an Urban Canopy Model (UCM; Kusaka et al. 2001) for the main purpose of providing more accurate simulations over urban regions.
For validation purposes, the HRLDAS output was compared to meteorological data from a network of weather stations installed during the summers of 2005 and 2006 for the Second Texas Air Quality Experiment (Parrish et al. 2009) as well as to Moderate Resolution Imaging Spectroradiometer (MODIS) remote sensing data, and was found to accurately reproduce the diurnal cycle and spatial heterogeneity of near-surface air temperatures across Houston (see Hu et al. 2014; Monaghan et al. 2014, for details). Hourly temperatures simulated with HRLDAS were bilinearly interpolated to the latitude and longitude coordinates of each 911 call at the nearest hour. Because HRLDAS output was hourly, we rounded the time of ambulance departure to the nearest hour. Also, because 22% of the calls were missing the time of departure, we treat these 22% of observations as also missing the temperature at the time of emergency.
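The bilinear interpolation step can be sketched as follows. This is an illustrative implementation, not the HRLDAS processing code; the function name `bilinear_interp` and the regular ascending lat/lon grid are assumptions for the example.

```python
import numpy as np

def bilinear_interp(grid, lats, lons, lat, lon):
    """Bilinearly interpolate a 2-D field (indexed by lat, lon) to a point.

    grid : 2-D array of hourly temperatures on a regular lat/lon grid
    lats, lons : 1-D ascending coordinate vectors for the grid axes
    lat, lon : coordinates of the 911 call
    """
    i = np.searchsorted(lats, lat) - 1          # row of the cell's lower edge
    j = np.searchsorted(lons, lon) - 1          # column of the cell's left edge
    i = int(np.clip(i, 0, len(lats) - 2))
    j = int(np.clip(j, 0, len(lons) - 2))
    # fractional position of the point inside its grid cell
    u = (lat - lats[i]) / (lats[i + 1] - lats[i])
    v = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - u) * (1 - v) * grid[i, j] + (1 - u) * v * grid[i, j + 1]
            + u * (1 - v) * grid[i + 1, j] + u * v * grid[i + 1, j + 1])

# toy 2x2 patch of temperatures (assumed values); the query point sits at the
# center of the cell, so the result is the average of the four corner values
t = bilinear_interp(np.array([[28.0, 30.0], [29.0, 33.0]]),
                    np.array([29.5, 30.0]), np.array([-95.5, -95.0]),
                    29.75, -95.25)
```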

Methodological Overview and Contributions
Our primary objective in analyzing the described 911 call data is to investigate the trends in the 911 call data and identify weather conditions, spatial locations, times of day, etc. that are associated with heat-related 911 calls. We emphasize that we are primarily interested in investigating conditions (e.g., spatial locations, temperatures, etc.) that are associated with an increased likelihood of calling 911 for a heat-related health issue. That is, this analysis focuses on the relationship between, for example, temperature and heat-related 911 calls rather than call volume. In this respect, this analysis is similar to the vulnerability analyses of Dolney and Sheridan (2006), Golden et al. (2008), Bassil et al. (2011), and Uejio et al. (2011). A deeper understanding of trends in the 911 call data will, for example, allow officials to locate spatial regions and temperatures associated with an increased risk of a 911 call.
A secondary goal of this analysis is to develop a public health data-based tool to predict the number, spatial locations, and time of incidence of heat-related 911 calls. By building a prediction tool, we can predict 911 call volume given a weather forecast and appropriately give advisories or enact other intervention strategies. Broadly, both of these goals are considered to be under the umbrella of public health surveillance (Brookmeyer and Stroup 2003;Lawson and Kleinman 2005) for heat-related morbidity (Kovats and Hajat 2008;Hajat and Kosatky 2010;Wilhelmi, de Sherbinin, and Hayden 2012).
To accomplish these goals, our strategy is to develop a statistical model for 911 calls. One common approach to modeling such data is to aggregate (count) the number of calls within an areal unit and use a Poisson spatial autoregressive model, say, for the spatial structure (see, e.g., Besag 1974;Waller et al. 1997). However, given the intricacies of the 911 call data, such an approach would result in some information loss. For example, aggregating the 911 calls in this manner, while still likely to produce interesting inferences, would forfeit the information in the 911 call data at sub-areal unit spatial scales and, effectively, discretize a continuous spatial domain. Notably, the degree of information loss could be controlled based on the resolution of the areal units but our preference is to treat the spatial domain as continuous.
Our strategy is to statistically model the 911 calls described in Section 1.2 as a marked point pattern arising from a nonhomogeneous Poisson process (NHPP; Diggle 2003; Baddeley 2010). Let z ∈ Z denote a vector containing the spatio-temporal and marked location of a 911 call. The intensity surface of a NHPP, Λ(z), is a nonnegative function and determines the number and marked location of the 911 calls. That is, the number of 911 calls over the domain Z follows a Poisson distribution with mean ∫_Z Λ(z) dz, and the corresponding locations (or marks) are determined by the normalized intensity Λ(z)/∫_Z Λ(z) dz. The inherent advantage of our point process approach is that we appropriately treat the spatial domain as continuous over the city of Houston and, hence, avoid the issues related to aggregation mentioned above.
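The role of Λ(z) can be made concrete by simulating a point pattern from an NHPP with a bounded intensity via Lewis–Shedler thinning. The intensity function and one-dimensional window below are illustrative stand-ins, not the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_nhpp(intensity, lam_max, bounds, rng):
    """Simulate a 1-D NHPP on [a, b] by Lewis-Shedler thinning.

    intensity : callable, the intensity surface Lambda(z) (must be <= lam_max)
    lam_max   : an upper bound on the intensity over the window
    bounds    : (a, b) endpoints of the observation window
    """
    a, b = bounds
    # homogeneous proposal: Poisson number of candidate points at rate lam_max
    n = rng.poisson(lam_max * (b - a))
    z = rng.uniform(a, b, size=n)
    # thinning: keep each candidate with probability Lambda(z) / lam_max
    keep = rng.uniform(size=n) < intensity(z) / lam_max
    return np.sort(z[keep])

# toy intensity peaking mid-afternoon over a 24-hour window (calls per hour)
calls = simulate_nhpp(lambda h: 5 + 4 * np.sin(np.pi * h / 24), 9.0,
                      (0.0, 24.0), rng)
```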
Under our NHPP approach, the intensity surface Λ(z) is the primary parameter of interest that we wish to learn from the data. That is, by estimating Λ(z) we will be able to (i) perform inferences for locations and times that are associated with increased public health vulnerability and (ii) make predictions of heat-related 911 calls for a given temperature scenario. Specifying a valid model for Λ(z) over the domain Z is challenging for a few reasons. For example, (i) the day of the year and hour of the day are circular variables and (ii) the spatial features (e.g., urban canyons, rivers, etc.) of the city of Houston are intricate, requiring a high-dimensional spatial model. To deal with both of these issues, we induce a log-Gaussian process for Λ(z) using circular kernels in a discrete kernel convolution framework (Higdon 2002).
A particular challenge in analyzing the 911 calls is the presence of incomplete observations (i.e., missing marks). A complete case analysis would reduce the number of observations by 32%, which would severely affect our estimate of the total number of heat-related 911 calls (thereby biasing predictions). Rather than discarding the information in the incomplete cases, we include the incomplete cases by marginalizing Λ(z) over the domain of the missing marks and subsequently performing model-based imputation for the missing marks via compositional sampling.
This article is outlined as follows. Section 2 defines a hierarchical NHPP model for the 911 call data and details our statistical modeling strategy. Section 3 discusses the methods for estimating Λ(z), including imputation of missing marks. Section 4 discusses results, and Section 5 concludes. Supplementary material for this article is available online.

A NONHOMOGENEOUS POISSON PROCESS MODEL FOR 911 CALLS

Data Model
Let the points {z_i} denote a realization of a Poisson process on the space Z with an unknown and nonuniform intensity surface Λ(z). We factor Λ(z) as

Λ(z) = δ λ(z) w(z),    (1)

where w(·) is a known nonnegative weighting measure that captures features of the population (e.g., population density, age distribution, etc.); λ(·) is an unknown nonnegative density with respect to w(·), such that ∫_Z λ(z)w(z) dz = 1, that models deviations of the intensity surface Λ(z) from the known features of the population w(z); and δ is an unknown constant controlling the expected number of 911 calls. From (1), we emphasize the following. First, by construction, the expected number of 911 calls over the entire domain Z is given by ∫_Z Λ(z) dz = δ ∫_Z λ(z)w(z) dz = δ, where we use the integral notation loosely to denote integrating over continuous and discrete domains. In this way, δ controls the total number of 911 calls over Z. Second, λ(z) captures deviations of the intensity surface from the known population characteristics given by w(z). Notice that if λ(z) ≡ 1 for all z ∈ Z, then Λ(z) = δw(z), so that the intensity surface follows the characteristics of the population (e.g., the highest volume of calls will occur in areas of high population density). In this way, λ(z) represents the "relative risk" of z with respect to the population characteristics.
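On a discrete toy domain, the bookkeeping behind the factorization Λ(z) = δ λ(z) w(z) can be checked numerically: once λ is normalized to be a density with respect to w, the total mass of Λ equals δ. The weights and relative risks below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# a toy discrete domain Z with three "cells" and known population weights w(z)
w = np.array([0.5, 0.3, 0.2])
lam_raw = rng.gamma(1.0, 1.0, size=3)    # unnormalized relative risk
lam = lam_raw / np.sum(lam_raw * w)      # enforce sum_z lam(z) w(z) = 1
delta = 2054.0                           # expected total number of calls

# intensity Lambda(z) = delta * lam(z) * w(z); its total mass is delta, as in (1)
Lambda = delta * lam * w
```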
For the point pattern of heat-related 911 calls described above, let z_i = (s_i, t_i, x_i), where the vector s_i denotes the spatial location of the ith 911 call for i = 1, . . . , 2054; t_i = (y, d, h)_i denotes the temporal location (i.e., the year, y, day of the year, d, where leap days are excluded, and hour of the day, h) of the ith 911 call; and x_i = (a, g, e, C)_i denotes the attribute vector of the ith 911 call with attributes age (a), gender (g), race/ethnicity (e), and temperature given in degrees Celsius (C). We treat {s_i} as a point pattern over the spatial domain S, {t_i} as a point pattern over the temporal domain T, and {x_i} as a point pattern over the attribute space X. The spatial domain S is taken to be the Houston city limits displayed in Figure 1. The upper limit for the age set A and both the upper and lower limits for the temperature set C were chosen as reasonable end points. For example, the bounds chosen for C reflect record temperatures: the lowest temperature ever recorded in Houston is −15°C on January 23, 1930, and the highest temperature ever recorded was 43°C on August 27, 2011.
We define the weighting measure w(z) to reflect a "null hypothesis" of no change in 911 call volume from the population characteristics of Houston (Best, Ickstadt, and Wolpert 2000). Details on how we specified w(z) are given in the supplementary material, but we highlight the primary features here. First, we assign uniform weights to all years (y), days of the year (d), and hours of the day (h) to reflect a "null hypothesis" that 911 calls for heat-related illness can occur at any time. Because, at the time of the 2010 U.S. Census, Houston was 50.2% male and 49.8% female, we assume equal weights for male and female callers. We chose the spatial weights to vary by ethnicity, where the weight of location s is the number of people (in thousands) of race/ethnicity e located in the census block group in which s resides (population numbers were obtained from the 2010 U.S. Census). We chose weights for temperatures to reflect the seasonal temperature changes in Houston. Using data from the National Climatic Data Center (NCDC), the weight of temperature C on day d is set to be the height of a Gaussian density function with mean equal to the 30-year average temperature on day d and standard deviation equal to 1/2 the 30-year average diurnal temperature range. Finally, the weights for age and race/ethnicity were set according to the estimated proportions from the 2010 U.S. Census.
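The temperature weight construction can be sketched as follows; the climatological mean and diurnal range used here are assumed round numbers for a July day, not the NCDC figures used in the analysis.

```python
import numpy as np

def temp_weight(C, clim_mean, diurnal_range):
    """Weight for temperature C on a day with the given 30-year climatology:
    the height of a Gaussian density with mean clim_mean and standard
    deviation equal to half the average diurnal temperature range."""
    sd = 0.5 * diurnal_range
    return np.exp(-0.5 * ((C - clim_mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# illustrative July day in Houston (assumed climatology): a 38 C reading is
# down-weighted relative to the 29 C average, so exceeding it signals deviation
w_hot = temp_weight(38.0, clim_mean=29.0, diurnal_range=10.0)
w_avg = temp_weight(29.0, clim_mean=29.0, diurnal_range=10.0)
```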
For the deviation surface λ(z), we let

λ(z) = λ_HS(s, h | e) λ_ADC(a, d, C | e) λ_G(g | e) λ_E(e),    (2)

where each of λ_HS(s, h | e), λ_ADC(a, d, C | e), λ_G(g | e), and λ_E(e) describes deviations in Λ(z) from the respective population weights described in w(z). For example, λ_HS(s, h | e) > 1 indicates that the intensity of 911 calls at spatial location s and hour h for race/ethnicity e is higher than population characteristics dictate (i.e., more 911 calls occur at (s, h) than expected given the population demographics). The form for λ(z) in (2) was formulated primarily from the exploratory analysis of the 911 call data given in the supplementary materials. While we emphasize the following key features of λ(z) in (2) here, we refer to the supplementary materials for detailed justification. First, because year-to-year trends in the volume of 911 calls are not of primary interest for this study, we assumed λ(z) is not a function of y (i.e., Λ(z) is uniform with respect to year). Second, Equation (2) allows the intensity surface of every other mark to vary by race/ethnicity. This is reflected by the conditioning on e in (2) and, intuitively, means that race/ethnicity drives the spatial location and marks of the 911 call. Third, conditional on race/ethnicity, (h, s), (a, d, C), and g are modeled separately. In other words, we assume that there are no interactions between (h, s), (a, d, C), and g conditional on e in the intensity function.

Process Model
We desire a computationally tractable, yet flexible, modeling strategy for each of λ_HS and λ_ADC. Here, we chose to rely on the discrete process convolution (DPC) approach of Higdon (2002) to construct a log-Gaussian Cox process model (Brix and Diggle 2001; Isham 2010) for the 911 calls. The DPC approach of Higdon (2002) assumes the basis function expansions

log λ_HS(s, h | e) = Σ_{b=1}^{B_HS} η_{be} K_{HS,b}(h, s | e),    (3)

log λ_ADC(a, d, C | e) = Σ_{b=1}^{B_ADC} γ_{be} K_{ADC,b}(a, d, C | e),    (4)

where the K's are kernel basis functions centered at knot locations and the η_{be}'s and γ_{be}'s are unknown coefficients. Using the DPC approach, we can easily constrain λ_HS(s, h | e) and λ_ADC(a, d, C | e) to circular domains through proper choice of kernel functions. To do so, we take the kernel function K_{HS,b}(h, s | e) to be the product of a Gaussian and a von Mises kernel. Specifically, we assume

K_{HS,b}(h, s | e) = exp{−‖s − s_b‖² / (2σ²_Se)} exp{cos(2π(h − h_b)/24) / σ²_He},    (5)

where {s_b, h_b}_b are knot locations for the basis functions and σ²_Se and σ²_He are smoothing parameters estimated from the data. Note that (5) appropriately constrains the process with respect to the circular domain of h (hour of the day) through the use of the von Mises kernel (the second exponential term on the right-hand side of (5)). That is, (5) appropriately correlates the intensity surface at hour h = 0 with hour h = 23. The knot locations {s_b, h_b}_b were taken to be an equidistant grid over the spatial and hour-of-the-day product space S × H. We defined the kernel basis functions K_{ADC,b} in a similar fashion to Equation (5), where Gaussian kernels were used for the age and temperature domains and a von Mises kernel was used for the day-of-the-year domain. We refer to the supplementary materials for the technical details.
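A minimal sketch of one kernel evaluation and the resulting basis expansion, assuming the Gaussian-times-von-Mises product form described above; the knot locations, coefficients, and bandwidths below are placeholders, not fitted values.

```python
import numpy as np

def kernel_hs(s, h, s_b, h_b, sigma2_s, sigma2_h):
    """Product kernel for one (space, hour) basis function: a Gaussian kernel
    in space times a von Mises kernel in hour of the day (period 24), so the
    surface wraps smoothly from h = 23 back to h = 0."""
    gauss = np.exp(-np.sum((np.asarray(s) - np.asarray(s_b)) ** 2) / (2 * sigma2_s))
    von_mises = np.exp(np.cos(2 * np.pi * (h - h_b) / 24) / sigma2_h)
    return gauss * von_mises

def log_lambda_hs(s, h, knots, eta, sigma2_s, sigma2_h):
    """Discrete process convolution: log lambda_HS as a finite basis expansion."""
    return sum(eta_b * kernel_hs(s, h, s_b, h_b, sigma2_s, sigma2_h)
               for (s_b, h_b), eta_b in zip(knots, eta))

# circularity check: hour 0 and hour 24 give identical basis values
k0 = kernel_hs((0.5, 0.5), 0.0, (0.0, 0.0), 12.0, 1.0, 1.0)
k24 = kernel_hs((0.5, 0.5), 24.0, (0.0, 0.0), 12.0, 1.0, 1.0)
```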
The specification in (3) and (4) allows the surfaces λ_HS and λ_ADC to be completely determined by a finite set of coefficients. This feature allows tractable computations in likelihood evaluations. We chose to use B_HS = B_ADC = 1500, which resulted in a sufficiently flexible model that is computationally feasible to work with.

Hyperprior Layer
Define η and γ to be the (B_HS × 4) × 1 and (B_ADC × 4) × 1 vectors of all η and γ coefficients from (3) and (4). The hyperpriors for η and γ under the DPC construction assume η ∼ N(0, σ²_η (R_Eη ⊗ I_{B_HS})) and γ ∼ N(0, σ²_γ (R_Eγ ⊗ I_{B_ADC})), where I_n is the rank-n identity matrix and R_Eη and R_Eγ are correlation matrices. Notice that, under this hyperprior, the coefficients are allowed to be correlated so as to borrow information across race/ethnicity.
For {λ_G(g | e) : g ∈ G, e ∈ E}, we assume λ_G(M | e) ∼ U(0, 2) independently for all e so that, a priori, E(λ_G(M | e)) = 1, representing our null hypothesis that the intensity measure follows population characteristics. To induce a hyperprior for λ_E(e) with the constraint that Σ_e λ_E(e) w_E(e) = 1, where w_E(e) denotes the population weight for race/ethnicity e, we set

λ_E(e) = λ̃_E(e) / (w_E(e) Σ_{e′} λ̃_E(e′)),    (6)

where λ̃_E(e) ∼ Gam(w_E(e), 1) independently for each e and Gam(α, β) denotes the gamma distribution with shape α and rate parameter β. By construction, λ̃_E(e)/Σ_{e′} λ̃_E(e′) follows a Dirichlet distribution with E[λ̃_E(e)/Σ_{e′} λ̃_E(e′)] = w_E(e); hence, (6) appropriately corresponds to a "null hypothesis" hyperprior that the ethnicities of 911 callers should correspond to the population of Houston.
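The gamma construction of the race/ethnicity deviations can be checked by simulation: normalized independent gamma draws are Dirichlet with the population weights as prior mean, and the rescaling enforces the density constraint. The population weights below are illustrative, not the Census proportions.

```python
import numpy as np

rng = np.random.default_rng(2)

# illustrative population proportions for four race/ethnicity groups
w_E = np.array([0.45, 0.25, 0.20, 0.10])

# draw lambda~_E(e) ~ Gam(w_E(e), 1) independently; the normalized vector
# lambda~_E / sum_e lambda~_E is then Dirichlet(w_E) with mean w_E
draws = rng.gamma(shape=w_E, scale=1.0, size=(200_000, 4))
props = draws / draws.sum(axis=1, keepdims=True)

# rescaling one draw as in (6) enforces sum_e lambda_E(e) w_E(e) = 1
lam_E = props[0] / w_E
```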
Due to the equivalence of probability measures (Zhang 2004), we chose to use independent discrete hyperprior distributions for the bandwidth parameters in (5). As the hyperprior distribution for σ²_He, for example, we let σ_He ∼ DU{0.5Δ, Δ, . . . , 10Δ}, where DU{·} denotes the discrete uniform distribution and Δ = h_{b+1} − h_b is the common distance between knot locations. Jeffreys hyperprior distributions were used for the correlation matrices R_Eγ and R_Eη (see Zhang, Boscardin, and Belin 2006), but other choices include hierarchical and shrinkage hyperpriors such as those suggested by Daniels and Kass (1999, 2001). Informally comparing hyperpriors, we found that the choice had little impact on the posterior distribution of λ. We also used Jeffreys hyperpriors for δ, σ²_η, and σ²_γ.

PARAMETER ESTIMATION AND IMPUTATION
For each call, partition z_i = (z_i^O, z_i^M), where z_i^O corresponds to the observed part of z_i and z_i^M corresponds to the missing part of z_i, and let θ denote the vector of parameters for the model described in Section 2. The complete-case likelihood for θ is given by

L(θ | {z_i}) ∝ exp{−∫_Z Λ(z) dz} ∏_{i=1}^{2054} Λ(z_i),    (7)

where 2054 is the total number of 911 calls within Houston city limits over the 5-year period 2006-2010 (see Diggle 2003; Møller and Waagepetersen 2004). We emphasize that calculating (7) requires the complete data. Hence, for each incomplete observation, we marginalize the intensity over the domain of the missing marks,

Λ_i(z_i^O) = Σ_{z_i^M} Λ(z_i^O, z_i^M),    (8)

where we can use the summation notation because the missing marks for this application are always defined over a discrete space (spatial locations are always known). Inserting Λ_i(z_i^O) into the likelihood function in (7) yields a likelihood function using only observed data, given by

L(θ | {z_i^O}) ∝ exp{−∫_Z Λ(z) dz} ∏_{i=1}^{2054} Λ_i(z_i^O).    (9)

The above method for accounting for incomplete observations assumes that the data are missing at random (MAR; Little and Rubin 2002). Alternatively, if the data are not missing at random (NMAR), the missing data mechanism would need to be included in the likelihood in (9). Building such a model for the missing data mechanism is beyond the scope of this article and requires additional data (or strong prior assumptions) that are not available to us.
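The marginalization over discrete missing marks can be sketched on a toy joint intensity; the two-cell, three-mark setting and all of the values below are invented for illustration.

```python
import numpy as np

# toy joint intensity over (spatial cell, discrete mark), e.g. race/ethnicity;
# rows index observed spatial cells, columns index mark levels
Lambda = np.array([[2.0, 1.0, 0.5],
                   [0.5, 3.0, 1.0]])

def obs_intensity(cell, mark=None):
    """Intensity contribution of one call: the joint value when the mark is
    observed, or the joint intensity summed over the discrete mark space
    when the mark is missing (as in (8))."""
    if mark is None:
        return Lambda[cell].sum()        # marginalize over the missing mark
    return Lambda[cell, mark]

def log_lik(calls):
    """Observed-data NHPP log-likelihood up to a constant: minus the integral
    of Lambda plus the sum of log (possibly marginalized) intensities."""
    total = Lambda.sum()                 # integral of Lambda over the toy domain
    return -total + sum(np.log(obs_intensity(c, m)) for c, m in calls)

calls = [(0, 1), (1, None)]              # the second call has a missing mark
ll = log_lik(calls)
```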
One challenge in calculating the likelihood in (9) is enforcing the density constraints for each component in (2). For example, the model constrains ∫_S Σ_{h∈H} λ_HS(s, h | e) w(s | e)(24^{−1}) ds = 1 for all e. Enforcing such density constraints for λ_ADC, λ_G, and λ_E is a straightforward normalization because these are defined over discrete spaces. For λ_HS, however, this normalization cannot be done exactly because S is a continuous domain. Hence, to ensure λ_HS(h, s | e) is a density, we rely on the discrete Riemann approximation of the normalizing constant used by Liang, Carlin, and Gelfand (2009).
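The Riemann approximation amounts to normalizing by a cell-area-weighted sum over a fine grid; the grid values and cell area below are placeholders for the unnormalized surface.

```python
import numpy as np

rng = np.random.default_rng(3)

# unnormalized surface on a fine spatial grid (a stand-in for lambda_HS * w)
grid = rng.uniform(0.5, 2.0, size=(50, 50))
cell_area = 0.01                          # area of one Riemann cell (assumed)

# Riemann approximation of the normalizing constant over the continuous domain
Z = np.sum(grid) * cell_area
lam = grid / Z                            # normalized so it integrates to ~1
```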
To obtain estimates of the parameters, we ran a Markov chain Monte Carlo (MCMC) algorithm to obtain 100,000 draws from the posterior distribution after a burn-in of 20,000 iterations. Note that the likelihood in (9), combined with the form for λ(z) in (2), factors into independent pieces conditional on race/ethnicity. This convenient form allows race/ethnicity-specific parameters to be updated in parallel. Using parallel processing, one iteration in MATLAB took 0.35 sec on a 32-core Intel Xeon system with 512 GB of memory. Hence, one chain took approximately 12 hr of computation time to obtain the desired number of posterior draws. The Gelman and Rubin (1992) diagnostic on −2×log-likelihood for 5 independent chains was used to assess convergence and indicated that the 20,000 iterations of burn-in were sufficient. Mixing of the post-burn-in draws was evaluated using trace plots. The accuracy of point estimates of posterior means and 0.025 and 0.975 quantiles was monitored using the mcmcse package in R (Flegal and Hughes 2012). The largest Monte Carlo standard error was associated with the posterior mean of δ, at a value of 0.12.
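The convergence check can be sketched as follows; this is a textbook implementation of the Gelman and Rubin (1992) diagnostic applied to synthetic well-mixed chains, not the authors' code.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of length n, each a
    scalar summary per draw (e.g. -2 x log-likelihood), following
    Gelman and Rubin (1992)."""
    chains = np.asarray(chains)
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)                 # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(4)
rhat = gelman_rubin(rng.normal(size=(5, 2000)))   # 5 well-mixed chains
```

Values of the statistic near 1 indicate that the chains have mixed; here the synthetic chains are iid draws, so the statistic should be very close to 1.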

Model Comparison
The fit of the full model described in Section 2 was compared to a null model, a separable model, and a separable-in-ethnicity model using DIC. Each model is described in more mathematical detail in the supplementary materials but, briefly, the null model assumes λ(z) = 1 for all z, so that the weighting measure w(z) adequately explains 911 calling patterns; the separable model assumes that λ(z) is separable (i.e., modeled additively on the log scale) in all marks; and, finally, the separable-in-ethnicity model is of the same form as (2) but modified so that each deviation surface is common across race/ethnicity. Each of these models represents increasing complexity, such that subsequent comparison determines the degree of complexity necessary to explain the 911 call data.
The DIC values for the null, separable, separable-in-ethnicity, and full models were 122,200; 39,361; 37,971; and 37,235, respectively, suggesting that the null model is clearly inadequate and that the full model is preferred. The preference for the full model is substantiated by the fact that the full model had 4147 effective parameters compared to 85 for the separable model and 1039 for the separable-in-ethnicity model. Hence, the increase in the number of parameters is justified by an increase in model fit. While the full model is preferred in terms of DIC, the decrease from the separable-in-ethnicity model to the full model is small, suggesting that including an additive race/ethnicity effect for the log-deviation surface may be an appropriate simplifying assumption.
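DIC and the effective number of parameters p_D can be computed from posterior deviance draws as follows; the three deviance values and the deviance at the posterior mean below are placeholders, not the paper's results.

```python
import numpy as np

def dic(deviance_draws, deviance_at_mean):
    """Deviance information criterion from posterior draws:
    DIC = D-bar + p_D, where the effective number of parameters is
    p_D = D-bar - D(theta-bar)."""
    d_bar = np.mean(deviance_draws)            # posterior mean deviance
    p_d = d_bar - deviance_at_mean             # effective parameter count
    return d_bar + p_d, p_d

# toy deviance draws and deviance evaluated at the posterior mean
dic_val, p_d = dic(np.array([10.0, 12.0, 14.0]), 11.0)
```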

Model Validation
We validated the full model fit by holding out the 2010 calls, fitting the model in Section 2 to data collected in 2006-2009, and subsequently predicting the locations of the 2010 calls according to a temperature scenario. Generating predictions under our point process model requires a temperature field for each day on which predictions are to be made. That is, locations, ages, genders, etc. of 911 callers are generated conditional on a temperature scenario. To generate a predicted point pattern of heat-related 911 calls, let R denote the joint region for (d, C) that corresponds to a predicted temperature scenario. Bayesian posterior simulation of a predicted point pattern occurs by (i) drawing the total number of 911 calls from a Poisson distribution with mean Λ_R = Σ_{(d,C)∈R} Λ(d, C), where Λ(d, C) is the intensity for (d, C) after marginalizing over the other marks, and (ii) drawing the spatial locations, time periods, and marks according to the conditional intensity Λ(z | (d, C) ∈ R) for each draw of θ from the posterior distribution. Because s is continuous over the spatial domain S, we draw s by discretizing S into 1301 discrete locations given by the centroids of the 2010 census block groups within city limits (see Figure 1).
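Steps (i) and (ii) of the posterior predictive simulation can be sketched over a discretized domain; the cell intensities below are random stand-ins for one posterior draw of the marginalized intensity, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(5)

# stand-in intensity over 1301 discrete cells (block-group centroids)
Lambda_cells = rng.gamma(2.0, 0.05, size=1301)

def predict_pattern(Lambda_cells, rng):
    """One posterior predictive draw of a point pattern:
    (i) total count ~ Poisson(total intensity mass), then
    (ii) locations drawn from the normalized intensity over the cells."""
    total = Lambda_cells.sum()
    n = rng.poisson(total)
    cells = rng.choice(len(Lambda_cells), size=n, p=Lambda_cells / total)
    return np.bincount(cells, minlength=len(Lambda_cells))  # calls per cell

counts = predict_pattern(Lambda_cells, rng)
```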
While high-resolution, hourly temperature predictions across Houston may be available on a short time scale, high-resolution predictions for an entire summer are not typically available for predictive purposes. Indeed, the HRLDAS model cannot be used for predictive purposes because it is constrained by observational data. However, it is reasonable to assume that regional daily maximums and minimums could be estimated from historical data and used for predictive purposes. For this reason, we used the daily minimum and maximum temperatures from the IAH airport weather station for June, July, and August 2010 from the NCDC website (http://www.ncdc.noaa.gov/) as the 2010 temperature scenario. Figure 2(a) displays a histogram of the posterior predictive draws of the total number of heat-related 911 calls during summer 2010, where the vertical bar indicates the observed number of calls (229). Clearly, our model is useful in predicting the total number of calls that will occur under summer 2010 temperatures because the observed value of 229 falls in the center of our predictive distribution. To validate our model at finer spatial scales, let N_b and N̂_b represent the observed and predicted number of calls in block group b during the summer of 2010, respectively. Figure 2(b) displays the average predictive probability p̄_k(n) = n_k^{−1} Σ_{b:N_b=k} Pr(N̂_b = n | data), where n_k is the number of block groups such that N_b = k. Strong predictive accuracy is indicated by large probabilities along the increasing (left-to-right) diagonal. Figure 2(b) suggests that the model predicts well when N_b = 0 because p̄_0(0) is high. Also, encouragingly, the model frequently places positive posterior predictive mass at the observed N_b where N_b > 0 (probabilities along the increasing diagonal). For example, the block groups where N_b = 1 had an average predictive probability p̄_1(1) of approximately 0.45.
Qualitatively, however, the model seems to under-predict higher counts (e.g., counts greater than 1), but N_b > 1 occurred in only 2% of the hold-out sample.
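The validation summary p̄_k(n) can be computed from posterior predictive draws as follows; the observed counts and predictive draws below are toy values for illustration.

```python
import numpy as np

def avg_predictive_prob(observed, pred_draws, k, n):
    """Average posterior predictive probability p-bar_k(n): among block groups
    whose observed count N_b equals k, the mean of Pr(N-hat_b = n | data),
    estimated from posterior predictive draws (draws x block groups)."""
    idx = np.flatnonzero(observed == k)
    probs = (pred_draws[:, idx] == n).mean(axis=0)   # Pr(N-hat_b = n) per group
    return probs.mean()

# toy hold-out: 4 block groups, 3 posterior predictive draws of their counts
observed = np.array([0, 0, 1, 2])
pred_draws = np.array([[0, 1, 1, 2],
                       [0, 0, 1, 1],
                       [1, 0, 2, 2]])
p0 = avg_predictive_prob(observed, pred_draws, k=0, n=0)
```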
Results from Figure 2 indicate that our model accurately predicts the total number of calls and frequently places substantial predictive mass at the observed N_b. We can substantiate the statistical learning in our model further by comparing Pr(N̂_b = N_b | data) to the same probabilities under a "null model" that assumes heat-related 911 calls are completely explained by population demographics (i.e., a model where λ(z) ≡ 1 for all z). On average, the posterior predictive probability Pr(N̂_b = N_b | data) under the full model was 6.9 times larger than under the null model, suggesting successful statistical learning and improved prediction.
To validate predictive accuracy using interval estimators rather than point estimators, let P_b represent a 95% central posterior predictive set for block group b, where Pr(N̂_b ∈ P_b | data) ≈ 0.95. For predicting heat-related 911 calls during the summer of 2010, 94.2% of the {P_b} contained the observed N_b, suggesting appropriate coverage of the predicted values.

Estimated Intensity Surface
Using all the data from 2006-2010, Figure 3(a) displays the posterior mean of log(λ_ADC(a, d, C | e)) averaged over age and race/ethnicity. Positive (negative) values in Figure 3(a) correspond to increased (decreased) relative risk with respect to the weighting measure. Not surprisingly, high temperatures during the summer exhibit the strongest increased risk from the baseline measure w(·). Interestingly, note that the regions of greatest increased risk are above the average daily temperature. Investigating this further, let Pr(N > 0 | C, d) denote the probability of at least one heat-related 911 call for a given temperature (C) and day of the year (d), and let C̄_d denote the average temperature on day d. Figure 3(b) displays the posterior mean of the number of degrees above average (say, ε) such that Pr(N > 0 | C = C̄_d + ε, d) is 25% larger than Pr(N > 0 | C = C̄_d, d). From Figure 3(b), notice that on January 1, temperatures need to be about 6-9 degrees above average to increase Pr(N > 0 | C = C̄_d, d) by at least 25%. On July 1, however, a 1-degree increase in the temperature can lead to a 25% increase in Pr(N > 0 | C, d). Hence, in terms of public health, slightly above average summer temperatures can lead to an increased risk of a heat-related 911 call.
[Figure 3(c) caption: Posterior mean of the probability of at least one heat-related 911 call as a function of the NWS heat index. The dashed line corresponds to the current threshold of 42°C (108°F) which, when exceeded, triggers a heat advisory.]
Panels (a) and (b) in Figure 3 are in terms of air temperatures. However, humidity is also an important factor when considering the effect of weather on public health (Smoyer-Tomic and Rainham 2001; Rainham and Smoyer-Tomic 2003). Perhaps for this reason, the National Weather Service (NWS) uses a heat index (a combination of temperature and relative humidity) threshold of 42°C to issue heat advisories. To validate this threshold, we also fit the model in Section 2 using the NWS heat index in place of temperature from the HRLDAS model output. Figure 3(c) displays the posterior predictive probability of at least one heat-related 911 call as a function of day of the year and heat index (note that Figure 3(c) is restricted to summer months because the NWS heat index is only well defined during this time period). Notably, our results indicate that the current threshold of 42°C (108°F) is too high. That is, Figure 3(c) shows that the highest probability of at least one 911 call occurs toward the end of July at a heat index value of 39.5°C (105°F). This result has prompted discussion among Houston health officials regarding appropriate weather conditions at which to issue a heat advisory.
Figure 4 displays the posterior mean of λ_HS(s, h | e) for h = 9 (between 8:31-9:29 AM) and h = 16 (between 3:30-4:29 PM) and all ethnicities, where λ_HS(s, h | e) > 1 is interpreted as an increased risk. Hour 9 occurs a few hours after the daily warming cycle begins (minimum temperatures occur around 6:00-7:00 AM), whereas hour 16 is about an hour after the warmest time of the day throughout the city. From Figure 4, the risks for h = 16 are, on average, higher than for h = 9 across all ethnicities, suggesting that the hottest hours of the day pose a higher risk for heat-related 911 calls. Notice, also, that the regions of elevated risk during the h = 9th hour of the day seem to be concentrated in more residential areas than during the h = 16th hour (compare with Figure 1).
This result seems to suggest that, during working hours, urban areas present a higher risk for 911 calls. The exception to this pattern is the Asian population, which seems to have the opposite temporal trend to the other ethnicities (i.e., regions of elevated risk are more concentrated in urban areas in the morning rather than the afternoon).

Figure 5 displays the posterior mean of λ_ADC(a, d, C | e) for all ethnicities averaged across day of the year, where a value greater than 1 indicates elevated risk. Marginally, the elevated risk of older populations for heat-related 911 emergencies is apparent across all ethnicities and higher temperatures. That is, λ_ADC(a, d, C | e) for a > 80 and C > 35 is large for all ethnicities. For the Hispanic population, there is statistically significant evidence that the average value of λ_ADC(a, d, C | e) for ages a ∈ [25, 40] is greater than 1 (in the sense that a 95% credible interval for the average value of λ_ADC(a, d, C | e) in this region lies above 1). This same increased risk was not significant for the Asian population. We hypothesize that this increase in risk is likely due to an underlying unmeasured sociological covariate (e.g., occupation or income level), but further investigation is required.
The posterior mean (95% credible interval) for the (normalized) gender intensity of male callers (i.e., the percent of male callers) for the Caucasian, African American, Hispanic, and Asian populations was 0.58 (0.54, 0.62), 0.63 (0.60, 0.66), 0.74 (0.69, 0.78), and 0.60 (0.56, 0.66), respectively. Each of these proportions differs from the corresponding population percentages of 50%, 47%, 51%, and 49% male Caucasians, African Americans, Hispanics, and Asians. Notably, the Hispanic population seems to exhibit different 911 calling patterns for heat-related calls than the other ethnicities by having a higher concentration of male callers. Finally, the posterior mean of the (normalized) race/ethnicity intensity (i.e., the distribution of calls by race/ethnicity) is 0.31, 0.44, 0.23, and 0.02 for the Caucasian, African American, Hispanic, and Asian groups, respectively, suggesting that African Americans are the most common heat-related 911 callers. These race/ethnicity proportions are statistically different from those reflected in the population of Houston, wherein Caucasians, African Americans, Hispanics, and Asians make up 49.3%, 19.6%, 37%, and 7.0% of the population.

Comparison to Complete Case Analysis
For comparison, we also estimated model parameters using only the 1385 complete 911 calls. The greatest difference between the complete case and the full analysis was in the posterior mean of δ. In the complete case analysis, the posterior mean (95% credible interval) for δ was 1385.2 (1314.7, 1454.0), compared to 2054.7 (1960.5, 2138.2) in the full analysis. This result is expected because, in the complete case analysis, 669 calls were discarded, leading to a gross underestimation of the total number of 911 calls over the domain Z.
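A quick numerical check, using only the counts and posterior means quoted above, shows that the complete-case estimate of δ is deflated by almost exactly the observed completeness fraction:

```python
# Counts and posterior means quoted in the text above.
n_complete, n_incomplete = 1385, 669
frac_complete = n_complete / (n_complete + n_incomplete)

delta_full = 2054.7   # posterior mean of delta, full analysis
delta_cc = 1385.2     # posterior mean of delta, complete cases only
deflation = delta_cc / delta_full

# Both ratios are ~0.674: discarding incomplete calls shrinks the
# estimated total call count by roughly the fraction discarded.
```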
The posterior means of λ(z) and its marginal surfaces in the complete case analysis changed little from those displayed in Figures 3, 4, and 5. However, the widths of credible intervals increased when using only complete cases. For example, the 95% credible interval for the percent of male African American callers was (0.60, 0.66) using all the data, compared to (0.58, 0.67) using only complete cases.

Imputing Missing Marks
We demonstrate our model's capability of imputing missing values for four hypothetical callers (imputation for the actual observations contains personal health information which cannot be shared publicly). We assume that these incomplete calls occur during hour h = 16 of the day on August 1, 2010. The locations of these calls, denoted s^M_1, . . . , s^M_4, were taken to be the centroids of the 2010 census tracts where the intensity λ_HS(s, h = 16 | e) achieved its maximum for each of the four ethnicities. Assume, also, that the temperature was known to be 38°C at the time of the call. For this example we assume that the marks for gender, age, and race/ethnicity are missing. Imputing the missing values for this scenario amounts to sampling the missing components from the normalized conditional intensity (10) for i = 1, . . . , 4, where the normalization constant is calculated by summing λ(a, g, e, s^M_i, d = 227, h = 16, C = 100) over a, g, and e. Sampling from (10) at each iteration of the MCMC algorithm results in an estimate of the posterior distribution for the missing marks. Table 1 displays the posterior mean of the distribution for the race/ethnicity of each caller (i.e., the posterior mean of the conditional intensity summed over a and g). We emphasize that because the calls were chosen at the location of maximum intensity for each race/ethnicity, calls 1-4 should roughly coincide with each of the four different ethnicities. This fact is reflected in Table 1, as the maximum probability for callers 1-3 corresponds to the hypothesized race/ethnicity. For caller 4, note that the posterior probability of an Asian caller is 300% larger than the marginal probability of an Asian caller (0.02). This lends credence to the claim that information from the spatial location, hour of the day, day of the year, and temperature is being used correctly to impute the missing values.
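The imputation step itself reduces to drawing from a discrete distribution obtained by normalizing the conditional intensity over the missing marks. The sketch below uses a randomly generated intensity table over hypothetical mark levels; the keys, values, and groupings are illustrative, not model output.

```python
import random

random.seed(1)

# Hypothetical conditional intensities lambda(a, g, e | s, d, h, C) for a
# caller at a fixed location/time: keys are (age_group, gender, ethnicity).
lam = {
    (a, g, e): random.random()
    for a in ("0-40", "40-80", "80+")
    for g in ("M", "F")
    for e in ("Caucasian", "AfricanAmerican", "Hispanic", "Asian")
}

def impute_marks(lam):
    """Draw one (age, gender, ethnicity) from the normalized intensity."""
    keys = list(lam)
    total = sum(lam.values())            # normalization over missing marks
    u, acc = random.random() * total, 0.0
    for k in keys:
        acc += lam[k]
        if u <= acc:
            return k
    return keys[-1]

# Repeating the draw at each MCMC iteration (with that iteration's intensity)
# yields a posterior distribution over the missing marks; here we simply
# draw many times from one fixed intensity table.
draws = [impute_marks(lam) for _ in range(5000)]
```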

Using the Statistical Model as a Public Health Tool
To demonstrate how our statistical model can be used as a public health tool for emergency preparedness, in this section we compare the predicted volume of 911 calls (along with all marks) under the temperature scenario during the summer of 2011 described in the Introduction to the observed number of calls during the same time period in 2010. Similar to the validation exercise in Section 4.2, we use the maximum and minimum daily temperatures at IAH as the summer 2011 heat scenario to generate predictions.
For the high temperatures experienced in 2011, the mean of the posterior predictive distribution for the number of heat-related 911 calls was 250, with a 95% predictive interval of (220, 273). In comparison, there were 229 heat-related 911 calls in 2010, such that, with probability 0.93, a summer similar to 2011 would have more heat-related 911 calls than were observed in 2010. Stated differently, we predict the number of heat-related 911 calls under the 2011 weather scenario to be approximately 10% more than under the observed 2010 scenario.
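An exceedance probability of this kind is computed directly from posterior predictive draws. The sketch below uses Gaussian pseudo-draws whose mean (250) and spread roughly match the reported predictive interval; these are simulated for illustration, not the actual posterior predictive sample.

```python
import random

random.seed(2)

# Pseudo posterior predictive draws for the 2011-scenario call count;
# sd 13.5 roughly matches the reported 95% interval (220, 273).
draws = [random.gauss(250, 13.5) for _ in range(10000)]

observed_2010 = 229
p_exceed = sum(d > observed_2010 for d in draws) / len(draws)
# p_exceed lands near the reported probability of 0.93.
```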
As this research focuses on assessing public vulnerability to heat, it is important to understand the time of day, locations, and attributes of those who are predicted to call 911 for a heat-related health emergency. Figure 6 summarizes the simulated realizations of the 911 call point pattern under the summer 2011 temperature scenario. Figure 6(a) displays the average number of calls for each day of the summer and hour of the day. The simulated calls tended to cluster near the 15th hr (2:30-3:30 PM), which is approximately the hour when temperature reaches its maximum. An interesting feature in Figure 6(a) is the presence of a few days that did not realize a high volume of heat-related 911 calls (note the light vertical bands in Figure 6(a)). For example, June 22, on average, saw very few realizations of heat-related 911 calls. This is because the temperature profile for June 22, 2011, was relatively cool, with a minimum of 20°C (69°F) and a maximum of only 29°C (84°F). Hence, the data-based tool appropriately predicts fewer 911 calls for June 22 due to lower temperatures.
In Figure 6(b) we classify block groups as having high, moderate, or low volume based on the predicted number of heat-related 911 calls within the block group after controlling for population. That is, Figure 6(b) classifies a block group by its rank of the total number of simulated 911 calls within the block group divided by the total population. One strategy implemented by Houston to counteract the negative effects of high heat is to construct cooling centers that are available for public use (see http://www.houstonoem.org/go/doc/4027/1871258/City-of-Houston-Cooling-Center-Locations). Using this map, public officials may be able to pinpoint high-risk areas that may benefit from a cooling center. Table 2 displays the mean number of calls by age group and race/ethnicity across simulated realizations of 911 calls using the summer 2011 temperature scenario. By considering the predicted ages and ethnicities of callers, public health officials will be able to design intervention strategies for each group more effectively. We predict that the ages of Hispanic callers will be younger than those of Caucasian callers, suggesting, perhaps, a different form of intervention for these two ethnicities. Comparing these numbers to the observed number of calls in 2010, we predict the largest increase in calls among African Americans and Hispanics in the (30, 40] age group.
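The block-group classification in Figure 6(b) amounts to ranking per-capita simulated call counts and cutting the ranking into thirds. A minimal sketch with hypothetical block groups and counts:

```python
# Hypothetical block groups: (simulated call count, population).
block_groups = {
    "BG-A": (120, 3100),
    "BG-B": (45, 2800),
    "BG-C": (300, 5000),
    "BG-D": (10, 4200),
    "BG-E": (80, 1500),
    "BG-F": (60, 3600),
}

# Per-capita call rates, ranked ascending, then split into terciles.
rates = {bg: calls / pop for bg, (calls, pop) in block_groups.items()}
ranked = sorted(rates, key=rates.get)
n = len(ranked)
labels = {}
for i, bg in enumerate(ranked):
    if i < n // 3:
        labels[bg] = "low"
    elif i < 2 * n // 3:
        labels[bg] = "moderate"
    else:
        labels[bg] = "high"
```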

DISCUSSION
In this article, we developed a tool for performing inference on the number and marked locations of 911 calls for specified time periods and weather scenarios using an incomplete marked point pattern of 911 calls within the city of Houston. We relied on kernel convolution models to model the intensity surface of the observed point pattern and constrain the marks to the appropriate circular and cylindrical domains. To incorporate information from incomplete records, we marginalized the full intensity over the domain of the missing marks, yielding a marginal intensity for the observed marks. By including incomplete cases, our estimate of the total number of 911 calls was more accurate and the widths of credible intervals decreased.
The effect of heat stress on the public is manifest in many different forms, such as emergency room visits, 911 calls, and, in some cases, mortalities. We emphasize that this research analyzes the incidence of heat-related 911 calls, which is but one manifestation of a heat-related public health problem; the cumulative impact of heat on public health is broader. Furthermore, this analysis focused on 911 calls where "heat," "dehydration," "hyperthermia," "heat exhaustion," or "heat stroke" was listed as a diagnosis. We admit that this is a restrictive definition, but extending our definition of a "heat-related" 911 call would likely increase the noise in the data because cases would be included that may not be heat-related. Rather than extending the definition of a "heat-related" 911 call, subsequent analyses of the cumulative impact of heat could simultaneously analyze the many different types of public health data. This is part of the ongoing work associated with the SIMMER project.
Many extensions of our model are possible, which we leave for future work. For example, several studies have found lagged temperatures to be associated with increases in negative public health outcomes (see, e.g., Hajat et al. 2005; Anderson and Bell 2009; Heaton and Peng 2012). However, including lagged effects is a difficult problem for this application because we do not have information on the whereabouts of the 911 caller prior to the call. Hence, any lagged temperature estimate would be an indirect proxy for the patient's heat exposure. Likewise, considering lagged effects of a day or more would necessitate including characteristics of the patient's home, such as the presence of air conditioning, which would certainly influence the timing and spatial location of a call.
As another extension, humidity and various heat metrics (e.g., the heat index, apparent temperature, etc.) could also be included because these may also affect the occurrence of a 911 call. Recent studies by Kovats and Hajat (2008), Gosling et al. (2009), and Heaton et al. (2014) have formally compared heat indices to raw temperatures and concluded that raw temperatures can be more explanatory of heat-related illnesses than heat indices. Rather than including only a single measure of heat, multiple heat indices could be included, but this greatly increases the complexity of the model because these alternate heat metrics would need to be modeled nonseparably from the temperature intensity surface.
As mentioned in Section 4.2, model predictions rely on accurate portrayals of future temperatures. For the prediction examples described here, the temperature scenarios were simplified to daily minima and maxima at IAH and assumed to be constant across the city. While high-resolution, accurate temperature forecasts may be available for near-future predictions (1-5 days), constructing accurate temperature scenarios over a long temporal range is difficult. Furthermore, any predicted temperature scenario is also associated with a degree of uncertainty. Similar to predictions of future climate (Tebaldi and Knutti 2007), the uncertainty in temperature scenarios could be incorporated into model predictions by averaging predictions across various temperature scenarios weighted by the likelihood of each scenario.
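Scenario averaging of this kind is straightforward once each scenario is assigned a weight. A minimal sketch, with hypothetical scenario means and likelihood weights:

```python
# Average model predictions over several temperature scenarios, weighting
# each by its assumed likelihood (scenarios and weights are hypothetical).
scenarios = {          # scenario -> (predicted mean call count, weight)
    "cool":   (180, 0.2),
    "normal": (215, 0.5),
    "hot":    (250, 0.3),
}

total_w = sum(w for _, w in scenarios.values())
weighted_mean = sum(n * w for n, w in scenarios.values()) / total_w
```

In practice the same weighting would be applied to full posterior predictive distributions rather than to point predictions, so the scenario uncertainty propagates into the predictive intervals as well.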
We note that our treatment of the incomplete cases implicitly assumes that the marks are missing at random. That is, we assume that the missing data mechanism only depends on the observed marks. This is likely an unrealistic assumption. For example, we imagine that race/ethnicity was often not recorded for callers whose race/ethnicity was not obvious, was of a race/ethnicity not familiar to the paramedics, or was not recorded by request of the patient. This suggests that the marks are not missing at random. To account for this, additional data would need to be gathered to develop an appropriate missing data mechanism and either (i) include the missing data mechanism in the likelihood or (ii) impute missing marks at each iteration of the MCMC chain using the missing data mechanism.
In conclusion, the value of this work is realized when considered in the context of climate change and urbanization. That is, projected increases in the number of warm days and nights (Oleson et al. 2013) and in urbanization (Alig, Kline, and Lichtenstein 2004) will affect the volume of heat-related calls. For this reason, public health tools such as the one developed herein will be important to help mitigate the negative consequences of such changes. Using this tool to assess the impacts of climate change and urbanization presents interesting statistical challenges, particularly the challenge of extrapolation. For example, under climate change, temperatures greater than 38°C (100°F) could become more common, whereas in this dataset such high temperatures were rarely observed, rendering our estimated intensity surface low in this temperature region. Hence, using this tool for predictive purposes under a climate change scenario would require additional constraints on the intensity surface to account for potential shifts in temperatures.

SUPPLEMENTARY MATERIALS
Technical Details: Technical details regarding the exploratory analysis of the 911 call data to justify modeling assumptions, the kernel convolution construction, and alternative models for model comparison. [Received August 2013. Revised September 2014.]