SpTe2M: An R package for nonparametric modeling and monitoring of spatiotemporal data

Abstract Spatiotemporal data are common in practice. Such data often have complicated structures that are difficult to describe by parametric statistical models. Thus, it is often challenging to analyze spatiotemporal data effectively since most existing statistical methods and software packages in the literature are based on parametric modeling and cannot handle certain applications properly. This paper introduces the new R package SpTe2M, which was developed for implementing some recent nonparametric methods for modeling and monitoring spatiotemporal data. This package provides analytic tools for modeling spatiotemporal data nonparametrically and for monitoring dynamic spatial processes sequentially over time. It can be used for different applications, including disease surveillance and environmental monitoring. The use of the package is demonstrated using the Florida influenza-like illness data observed during 2012–2014 and the PM2.5 concentration data in China collected during 2014–2016.


Introduction
Spatiotemporal data have become increasingly popular in many disciplines, including agriculture, climate science, epidemiology, neuroscience, and social science.In the literature, there have been many existing methods developed for analyzing spatiotemporal data (e.g., Cressie and Wikle 2011;Diggle et al. 2013).However, spatiotemporal data often have complicated structures, including complex spatiotemporal variation, latent spatiotemporal correlation, and unknown data distribution, which can hardly be described properly by parametric models (cf.Yang and Qiu 2020).But most existing methods for analyzing spatiotemporal data are parametric, and it is well demonstrated in the literature that nonparametric methods often perform better than the parametric methods in cases when some of the parametric assumptions are violated (cf.Subection 3.2, Qiu and Yang 2023).Therefore, there is a need to develop nonparametric methods and the corresponding software packages for analyzing spatiotemporal data.This paper introduces a new R package, called SpTe2M, that provides analytic tools for modeling spatiotemporal data nonparametrically and for monitoring dynamic spatial processes sequentially over time.
In the literature, there are some statistical methods and software packages available for analyzing spatiotemporal data.One popular approach is the dynamic spatiotemporal modeling framework that was first discussed in Stroud, M€ uller, and Sans� o (2001) and can be implemented by using the R package spBayes (Finley, Banerjee, and Gelfand 2015).In the dynamic spatiotemporal model, spatial observations are assumed to be normally distributed, and the sequence of observed spatial response surfaces over time is assumed to follow a linear evolution process.Some of its generalized versions have been proposed to extend the original dynamic spatiotemporal model to cases with multiple responses (Bradley, Holan, and Wikle 2015) and/or cases with a nonlinear evolution equation (Wikle and Hooten 2010).As an alternative, Møller, Syversveen, and Waagepetersen (1998) proposed a framework for modeling a spatiotemporal point process based on the log-Gaussian Cox process, where the logarithm of the related spatiotemporal intensity function is assumed to be a spatiotemporal Gaussian process, conditional on which the observed spatiotemporal data are assumed to have a Poisson distribution (e.g., Diggle et al. 2013).This method can be implemented using the lgcp package (Taylor et al. 2013).There are some other methods for spatiotemporal modeling and prediction, including the hierarchical Bayesian spatiotemporal modeling method via the R package spTimer (Bakar and Sahu 2015), the lattice kriging method via the R package LatticeKrig (Nychka et al. 2015), the local approximate Gaussian process method via the R package laGP (Gramacy 2016), the fixed rank kriging method via the R package FRK (Zammit-Mangion and Cressie 2017), the metakriging method (Guhaniyogi and Banerjee 2018), the non-stationary Gaussian process method via the R package BayesNSGP (Risser and Turek 2020), and the nearest neighbor Gaussian process method via the R package spNNGP (Finley, Datta, and Banerjee 2020).
The existing methods mentioned above require various parametric assumptions on the spatiotemporal data variation, spatiotemporal data correlation, and/or data distribution.Such parametric assumptions are rarely valid in practice, and nonparametric methods usually perform better in such cases.To fill the gap, Yang and Qiu (2018, 2019, 2022) developed a nonparametric framework for modeling spatiotemporal data based on local kernel smoothing, in which no parametric assumptions were imposed on the spatiotemporal mean function, spatiotemporal covariance function, and data distribution.Thus, this spatiotemporal data modeling approach is flexible and can be used reliably in many applications.Based on this modeling approach, Yang and Qiu (2020) and Qiu andYang (2021, 2023) proposed several methods for online monitoring of spatial processes and detecting possible shifts in their distributions.These spatial process monitoring methods can be used to solve many real-world applications, including environmental monitoring and disease surveillance.
Regarding spatial process monitoring, major existing methods and software packages in the literature include those based on the scan statistics via the R package scanstatistics (All� evius 2018) and those based on the Knox statistics via the R package surveillance (Meyer, Held, and H€ ohle 2017).The scan and Knox methods are designed mainly for retrospective data analysis, in which the time interval is fixed in advance.They could not accommodate some common structures in the observed spatiotemporal data, including seasonality and spatiotemporal data correlation.They require the data distribution to have a parametric form (e.g., normal, Poisson, or negative binomial).As a comparison, the online process monitoring methods discussed in Yang and Qiu (2020) and Qiu andYang (2021, 2023) do not require such parametric assumptions and can accommodate complex structures in the observed spatiotemporal data.
To help researchers to use the nonparametric spatiotemporal data modeling and monitoring methods suggested in Yang and Qiu (2018, 2019, 2020, 2022) and Qiu andYang (2021, 2023), the R package SpTe2M was developed to implement these methods.This article provides an introduction and some demonstrations of the package.The remainder of the paper is organized as follows.Section 2 provides a brief overview of the methods that can be implemented in the SpTe2M package.Section 3 demonstrates the functionality of the package using a real-world dataset about the influenza-like illness (ILI) in Florida observed during 2012-2014.Section 4 discusses the application of the package to another real data example about the PM2.5 concentrations in China during 2014-2016.Finally, Section 5 provides some concluding remarks.
Estimation of the mean function lðt, sÞ: In this part, we discuss estimation of the mean function lðt, sÞ from the observed spatiotemporal data fyðt i , s ij Þ, j ¼ 1, :::, m i , i ¼ 1, :::, ng: For any ðt, sÞ 2 ½0, T� � X, Yang and Qiu (2018) suggested estimating lðt, sÞ by the following local linear kernel smoothing procedure: where Then, the covariance function Vðt, t 0 ; s, s 0 Þ can be estimated from these residuals by the weighted moment estimation procedure, as discussed in Yang and Qiu (2019).More specifically, the variance function r 2 ðt, sÞ ¼ Vðt, t; s, sÞ can be estimated by r2 ðt, sÞ ¼ where w r ði, j; t, sÞ ¼ K t ððt i − tÞ=g t ÞK s ðd E ðs ij , sÞ=g s Þ, for 1 � j � m i and 1 � i � n, and g t and g s are two bandwidths that can be different from the bandwidths h t and h s .When ðt, sÞ 6 ¼ ðt 0 , s 0 Þ, the covariance function Vðt, t 0 ; s, s 0 Þ can be estimated by where w v ði, j, k, l; t, s, t 0 , s 0 Þ ¼ w r ði, j; t, sÞw r ðk, l; t 0 , s 0 Þ: Note that the estimates r2 ðt, sÞ and V ðt, t 0 ; s, s 0 Þ obtained by [5] and [6] may not be positive semidefinite, and thus may not be legitimate variance and covariance functions.To address this issue, a modification is needed to make them positive semidefinite.To this end, the matrix-projection-based modification procedure discussed in Yang and Qiu (2019) is used, which can be implemented using the function nearPDðÞ in the R package Matrix.
A three-step estimation of lðt, sÞ and Vðt, t 0 ; s, s 0 Þ: Although the local linear kernel smoothing procedure in [3]-[4] provides a reliable estimate of lðt, sÞ, the covariance structure of the observed data is not taken into account in that procedure.Thus, it still has room for improvement.To improve the local linear kernel smoothing estimator defined in [4], Yang and Qiu (2022) developed a three-step local smoothing approach for estimating the spatiotemporal mean and covariance functions.This method consists of three main steps described below.(iii) Calculate the final estimate of lðt, sÞ using a weighted local linear kernel smoothing procedure, in which the estimated spatiotemporal covariance function is used.Next, we describe the third step in more details.Let RY be the estimated covariance matrix of where f 1 and G are defined immediately after [4].From [7], it can be seen that the final estimate lðt, sÞ has taken into account spatiotemporal data correlation through RY in RK :

Online monitoring of spatiotemporal data
In this part, we provide a detailed description of the online spatiotemporal process monitoring methods suggested in Yang and Qiu (2020) and Qiu andYang (2021, 2023).Assume that an in-control (IC) dataset i , s � ij Þ, j ¼ 1, :::, m � i , i � 1g could be spatiotemporally correlated, and their IC means could vary across both space and time.Thus, they cannot be monitored effectively by conventional statistical process control (SPC) charts, because the conventional charts are designed for cases with independent and identically distributed IC process observations.In the SPC literature, it has been well discussed that the conventional charts are unreliable and can give misleading results when they are used for monitoring serially correlated data (cf.Apley and Tsung 2002;Qiu, Li, and Li 2020).To overcome this limitation, we can sequentially decorrelate the observed data before online process monitoring, using the following data decorrelation and standardization procedure: respectively, all of which are obtained from V ðt, t 0 ; s, s 0 Þ: After using the above data decorrelation and standardization algorithm, the original spatiotemporal n If the process under monitoring is IC, then these decorrelated observations should be asymptotically uncorrelated with each other, and the asymptotic mean and variance of each of them should be 0 and 1, respectively.Then, process monitoring can focus on these decorrelated and standardized observations rather than the original observations.
Online monitoring of a spatial process.From the description in the previous part, it can be seen that each decorrelated and standardized observation êðt � i , s � ij Þ is a linear combination of the original process observations.Thus, its IC distribution should be close to the standard normal distribution under some regularity conditions by the central limit theorem.Consequently, the asymptotic distribution of degrees of freedom of m � i , and the asymptotic mean and variance of ð should be 0 and 1, respectively.Then, Yang and Qiu (2020) suggested the following cumulative sum (CUSUM) chart for online process monitoring: and the chart gives a signal at time t � i if the charting statistic C i exceeds a control limit L, where C 0 ¼ 0 and c > 0 is often called a reference value or an allowance constant (cf.Qiu 2014, Chapter 4).
Usually, the performance of a control chart is evaluated by the IC average run length (ARL), denoted as ARL 0 and defined as the average number of observation time points from the beginning of process monitoring to a signal by the chart when the process is IC, and the out-of-control (OC) ARL, denoted as ARL 1 and defined as the average number of observation time points from the occurrence of a process distributional shift to a signal time by the chart.When designing a control chart like the one in [8], the ARL 0 value is usually prespecified at a given level and its control limit is chosen to achieve the prespecified ARL 0 value.Then, the chart performs better if its ARL 1 value is smaller for detecting a given shift.See a systematic discussion about the design of a control chart in Qiu (2014).
To use the CUSUM chart [8], we need to specify the allowance constant c.In the literature, it has been well studied that a large c value is good for detecting large shifts and a small c value is good for detecting small shifts (Qiu 2014, Chapter 4).Commonly used c values include 0.1, 0.2, 0.3, 0.5, and 1.For a prespecified ARL 0 value, we also need to determine the control limit L such that the prespecified ARL 0 value is reached.To this end, the block bootstrap procedure described in Section B of the supplementary file can be used.
Online monitoring of a spatial process using covariate information.The CUSUM chart [8] provides a reliable tool for online monitoring of a spatial process.In many applications, the spatiotemporal response yðt, sÞ is associated with some covariates.For example, disease incidence can be affected by many risk factors including air temperature, humidity, and other weather and/or environmental conditions.In such applications, it is our belief that proper use of the covariate information can improve the performance of the related control chart.Next, we introduce the exponentially weighted moving average (EWMA) chart suggested by Qiu and Yang (2021) that can make use of covariate information.This chart is denoted as EWMAC hereafter, where the last letter "C" denotes "covariates."The EWMAC method consists of two main steps.First, a semiparametric spatiotemporal model is fitted from an IC dataset to estimate the regular spatiotemporal pattern of the process under monitoring in presence of the related covariates.Second, the EWMAC chart is developed for online process monitoring, which can accommodate the covariate information.Next, we provide more details about these two steps.
Let fyðt i , s ij Þ, j ¼ 1, :::, m i , i ¼ 1, :::, ng be an IC dataset obtained before online process monitoring.Observations in this dataset are assumed to follow the semiparametric spatiotemporal model where X 1 ðtÞ is a vector of p 1 time-dependent covariates, X 2 ðt, sÞ is a vector of p 2 space/time-dependent covariates, b 1 and b 2 are their regression coefficients, lðt, sÞ is the mean of yðt, sÞ after excluding the part explained by X 1 ðtÞ and X 2 ðt, sÞ, and �ðt, sÞ is a zeromean random error, for any ðt, sÞ 2 ½0, T� � X: It should be pointed out that there could be time-independent covariates (they could depend on space) that are associated with yðt, sÞ in practice.Such covariates are not included explicitly in the model [9] because they would not provide any helpful information about the temporal variation of yðt, sÞ and thus are not helpful for online process monitoring.
To estimate the regression coefficients Qiu and Yang (2021) proposed the following iterative algorithm: i. Set b ¼ 0 and obtain an initial estimate of lðt, sÞ by [4], denoted as lð0Þ ðt, sÞ: ii.In the kth iteration, for k � 1, implement the following two steps: iii.Compute the least squares estimate of b, denoted as bðkÞ , from the linear model Z ðkÞ ðt, sÞ ¼ for each i and j.The updated estimate of lðt, sÞ is denoted as lðkÞ ðt, sÞ: v.The iterative algorithm stops when k bðkÞ − bðk−1Þ k 1 =k bðk−1Þ k 1 � 1, where 1 > 0 is a prespecified small number and kxk 1 denotes the summation of absolute values of all elements in the vector x: Then, bðkÞ and lðkÞ ðt, sÞ are the final estimates of b and lðt, sÞ, respectively.These final estimates are also denoted as b and lðt, sÞ: By using the above iterative algorithm, we obtain the estimated regression coefficients b1 and b2 : Let zðt, sÞ ¼ X T 1 ðtÞ b1 þ X T 2 ðt, sÞ b2 denote the estimated covariate effect on the response yðt, sÞ, l y ðt, sÞ and V y ðt, t 0 ; s, s 0 Þ be the mean and covariance functions of yðt, sÞ, and l z ðt, sÞ and V z ðt, t 0 ; s, s 0 Þ be the mean and covariance functions of zðt, sÞ: These functions can be estimated from the IC dataset by using [3]-[4] and [5]-[6], and their respective estimates are denoted as ly ðt, sÞ, V y ðt, t 0 ; s, s 0 Þ, lz ðt, sÞ and V z ðt, t 0 ; s, s 0 Þ: Next, we provide a detailed description of the EWMAC chart for detecting an upward mean shift in the spatiotemporal process under monitoring.The EWMAC chart for detecting downward shifts can be discussed similarly.Assume that the spatiotemporal observations to monitor are observed at times t � i and locations fs � ij , j ¼ 1, :::, m � i g, for i � 1: These observations are denoted as , and the related observations of the time-dependent and space/time-dependent covariates are denoted as , respectively.Before monitoring the spatiotemporal process, let us first apply the data decorrelation and standardization algorithm discussed before to both , and their decorrelated data are denoted as fê y ðt � i , s � ij Þg and fê z ðt � i , s � ij Þg: Then, the following EWMA chart is used for detecting upward shifts in the process zðt, sÞ: where k 2 ð0, 1� is a weighting parameter and E z, 0 ¼ 0: If there is an upward mean shift in zðt, sÞ at or before the time t � i , then the value of E z, i would be relatively large because of the shift.Therefore, the EWMA charting statistic E z, i provides a measure of the likelihood of an upward mean shift in zðt, sÞ: In the spatiotemporal process monitoring problem, our ultimate goal is to detect shifts in yðt, sÞ, which may or may not be caused by shifts in zðt, sÞ: In addition, shifts in zðt, sÞ are not our major concern in the current process monitoring problem, although any helpful information in zðt, sÞ should be used in process monitoring.By these considerations, Qiu and Yang (2021) suggested the following EWMAC chart for online monitoring of yðt, sÞ : where E y, 0 ¼ 0, and WðE z, i ; k, jÞ 2 ð0, 1� is a weighting parameter that depends on the covariate charting statistic E z, i and two parameters k 2 ð0, 1� and j > 0: The chart gives a signal at time t � i if E y, i > L, where L > 0 is a control limit.
Online monitoring of a spatial process using exponentially weighted spatial LASSO.In many spatiotemporal process monitoring applications, a systematic process shift starts in some small regions that are spatially clustered (Wang et al. 2018).This kind of spatial feature has not been taken into account in the CUSUM chart [8] or the EWMAC chart [10] because they treat the spatial locations equally in their construction.To address this issue, Qiu and Yang (2023) proposed an effective method for spatiotemporal process monitoring by combining the ideas of regression modeling in the space domain by spatial LASSO and exponentially weighted data smoothing in the time domain.The related control chart is denoted as EWSL, representing "exponentially weighted spatial LASSO."Some details about this chart are given below.
For online process monitoring, a basic strategy is to use as much available information about the process under monitoring as possible and give less weight to process observations that are collected farther away from the current observation time in the decision-making process.On the other hand, the spatial feature that process shifts usually start in small clustered spatial regions should be taken into account as well.Based on these considerations, Qiu and Yang (2023) proposed the EWSL chart that combined the following two procedures.First, a spatial LASSO procedure is proposed to find small clustered regions with possible shifts in the process distribution at the current observation time, which can accommodate complex spatial data structures.Second, an exponentially weighted smoothing procedure is suggested in the temporal domain to make use of all historical spatiotemporal data.These two procedures are then combined seamlessly in the spatiotemporal domain for online process monitoring.More specifically, by combining the spatial LASSO idea with the exponentially weighted data smoothing idea, Qiu and Yang (2023) proposed the following penalized exponentially weighted kernel smoothing procedure for estimating the means of êðt where h > 0 is a bandwidth, K s ð�Þ is the Epanechnikov kernel function, w 1 , w 2 > 0 are two tuning parameters, are the adaptive LASSO weights (cf.Zou 2006), and have taken into account the information from previous process observations.Due to the shrinkage property of the spatial LASSO penalty in [11] (i.e., the last two terms in that expression), many elements in fl ê ðt � i , s � ij Þg would be zero and only those at locations with process shifts would be nonzero.So, to detect process shifts, it is natural to consider the statistic ÞÞ T , and Rl ê, i is an estimate of the covariance matrix of lê, i : Then, the charting statistic of EWSL is the following standardized version of Q i : where ÊðQ i Þ and d VarðQ i Þ are the estimates of the mean and variance of Q i , respectively.The chart [12] gives a signal at time t � i if SQ i > L, where L > 0 is a control limit.

Description of the R package SpTe2M
The R Package SpTe2M has been developed recently to implement spatiotemporal data modeling and monitoring methods described in Section 2. The package can be downloaded directly from the Comprehensive R Archive Network (CRAN) with the web address https://cran.r-project.org/.After the package is downloaded to your computer, it can be installed using the following R commands.R > install.packages("SpTe2M")R > library("SpTe2M") In the current version of the SpTe2M package, there are seven main functions that are summarized in Table 1.The package manual contains additional information for all these functions.This documentation can be accessed conveniently in R using the following commands: R > help(package ¼ "SpTe2M") # Access the package manual R>? spte_meanest # Read documentation about the spte_meanest() function The package also includes a vignette to illustrate how to implement the main functions in a real data application.Users can run the R code below to have access to the vignette.R > vignette('SpTe2M',package¼'SpTe2M') # Read the vignette Demonstration of the package SpTe2M using the Florida ILI dataset.In the next two subsections, the functionality of the package SpTe2M is demonstrated using the Florida ILI data that were collected by the Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) of the Florida Department of Health (FDOH).ILI is a respiratory infection caused by a variety of influenza viruses.Patients with ILI usually have severe respiratory illness with fever, cough, sore throat, and even difficulty in breathing.In the United States, it is reported that 15% to 40% of the population experience illness from influenza every year.Among those patients with ILI, there are about 36,000 hospitalizations and 114,000 deaths due to influenza infection (Fiore et al. 2010).Previously, the method to estimate the incidence rate of ILI has been to carry out repeated seroprevalence surveys that are resourceintensive and slow.Thus, it is unfeasible for early detection of ILI outbreaks.To overcome this difficulty, FDOH built a syndromic surveillance system (i.e., ESSENCE) for collecting near real-time prediagnostic data from 264 emergency departments and urgent care centers that are distributed in all counties of Florida.As a demonstration, the ILI incidence rates collected by the ESSENCE system for all 67 Florida counties on 06/15 (a summer time) and 12/15 (a winter time) in the three years 2012-2014 are presented in Figure 1.It can be seen from the figure that ILI incidence rates have seasonal patterns, with more ILI cases in the winters and fewer ILI cases in the summers.Moreover, there seems to be an unusual pattern of ILI incidence rates in the winter of 2014 because the incidence rates on 12/15/2014 were much higher than those on 12/15/2012 and 12/15/2013.Next, we will demonstrate the use of different functions in the SpTe2M package using this built-in ILI dataset that contains the observed daily ILI incidence rates at 67 Florida counties during 2012-2014.In the literature, spatial data can be roughly classified into the following three categories: areal data, geospatial data, and point process data (cf.Cressie and Wikle 2011).The ILI dataset belongs to the category of areal data since observations included in the dataset are measured for 67 geographical regions.In addition to observations of the variable Rate (i.e., ILI incidence rate), the dataset also contains observations of the following seven variables: County, Date, Lat (i.e., latitude of the centroid of a county), Long (i.e., longitude of the centroid of a county), Time, Temp (i.e., air temperature), and RH (i.e., relative humidity).

Spatiotemporal data modeling
This subsection demonstrates how to estimate the mean and covariance functions based on the local linear kernel smoothing procedure estimated values of the mean function.Its default is NULL: In such a case, stE ¼ st: The output of the function spte meanestðÞ is a list that contains three elements-bandwidth, stE, and muhat-which correspond to the bandwidths ðht, hsÞ, the times and spatial locations, and the related mean estimates, respectively.
As The estimated mean values for three other counties-Lake, Pinellas, and Seminole-are also presented in Figure 2. From the figure, it can be seen that the estimated spatiotemporal mean function can well describe the longitudinal pattern of the observed ILI incidence rates.Thus, the function spte meanestðÞ can indeed provide a reliable tool for estimating the spatiotemporal mean function.
Next, we describe how to estimate the spatiotemporal covariance Vðt, t 0 ; s, s 0 Þ by using the function spte covestðÞ: A typical spte covestðÞ code snippet looks as follows: spte_ covest(y,st,gt Regarding the above command, the auguments y and st in spte covestðÞ are the same as those in spte meanestðÞ explained earlier.The rest of the arguments are described in Table 2.The output of spte covestðÞ is also a list, and its element covhat is the estimated covariance.To estimate the covariance of the ILI data in the year 2013 by the weighted moment estimation procedure [5]-[6], we can use the following code: R > cov.est <-spte_covest(y ¼ y.sub,st ¼ st.sub)In the above command, we did not specify the arguments gt and gs: In such cases, the two bandwidths (g t , g s ) would be determined by minimizing the mean squared prediction error defined in (A3) of the supplementary file.

Spatiotemporal data monitoring
This subsection demonstrates how to monitor a spatiotemporal process by the methods described in Subsection 2.2 using the package SpTe2M.Recall that construction of the CUSUM chart [8] consists of two main steps.The first step is to model an IC spatiotemporal dataset and estimate the regular spatiotemporal pattern of the IC data.Then, in the second step, the control limit of the chart is first determined by the block bootstrap procedure and then the chart can be used for online process monitoring.See the discussions in Section 2 for more details.For the ILI data, the observed ILI incidence rates in the years 2012 and 2013 are quite stable (cf. Figure 1).So, these data will be used as the IC dataset for setting up the CUSUM chart [8].The IC observations are divided into two parts: the IC data in the year 2013 are used for estimating the mean and covariance functions of the IC spatiotemporal model, and the data in the year 2012 are used for determining the control limit of the chart, as described in Section B of the supplementary file.Then, the chart [8] is used to monitor process observations in the year 2014.To this end, we first decorrelate the observed ILI incidence rates in the year 2014 and then apply the CUSUM chart [8] to the decorrelated data.All these procedures can be accomplished by using the function sptemnt cusumðÞ: The main arguments of this function are listed below: � y : A vector of spatiotemporal observations of the response y.� st : A three-column matrix specifying the locations and times of the observations in y: � type : A vector specifying the type of each observation, and type could be "Mnt, " "IC1" or "IC2, " where type ¼ ``Mnt 00 implies that the related observation is for process monitoring, type ¼ ``IC1 00 implies that the related observation is in the IC data for determining the control limit, and type ¼``IC2 00 implies that the related observation is in the IC data for estimating the IC spatiotemporal mean and covariance functions.If there is a single IC dataset specified, then it will be used for both purposes.If no observations are specified as IC data, then an error will be returned.The following code is an example to monitor the ILI data using the CUSUM chart [8]: R > y <-ili_dat$Rate; st The bandwidth g t used in ( 5) and ( 6).Its default value is NULL: If gt¼NULL, then gt is chosen by (A3) of the supplementary file.gs The bandwidth g s used in ( 5) and ( 6).Its default value is NULL: If gs¼NULL, then gs is chosen by (A3).stE1 A three-column matrix specifying ðs, tÞ: Its default value is NULL: In such a case, stE1 ¼ st: stE2 A three-column matrix specifying ðs 0 , t 0 Þ: Its default value is NULL: In such a case, stE2 ¼ st: In this example, the bandwidths (h t , h s ) for estimating the spatiotemporal mean function are chosen based on the modified cross-validation procedure in (A1)-(A2) of the supplementary file.This bandwidth selection procedure can be implemented by using the function mod cvðÞ in SpTe2M, and the selected bandwidths for (h t , h s ) are 0.05 and 6.50, respectively.Regarding the bandwidths (g t , g s ) for estimating the covariance function, they are determined by the mean square prediction error criterion defined in (A3) of the supplementary material.After running the function cv mspeðÞ, they are chosen to be 0.25 and 1.50, respectively, in this example.
The function sptemnt cusumðÞ in the above code returns a list, and the computed charting statistic values and the control limit of the CUSUM chart [8] can be accessed via ili:cusum$cstat and ili:cusum$cl, respectively.Then, we can plot the CUSUM charting statistic values versus the observation times by using the following code: The resulting plot is shown in the left panel of Figure 3.To better perceive the charting statistic values around its first signal time 10/16/2014, the charting statistic values during the time period from 09/15/2014 to 10/31/ 2014 are presented in the right panel of Figure 3.
In some applications, we may want to apply the data decorrelation and standardization procedure to some spatiotemporal observations.This can be accomplished by using the spte decorðÞ function.For instance, to decorrelate the observed ILI incidence rates in the year 2013, we can do the following: R > decor <-spte_decor(y ¼ y.sub,st ¼ st.sub,y0 ¼ y.sub,st0 ¼ st.sub)Then, the decorrelated and standardized data can be obtained via decor$std:res: The arguments y, st, y0, and st0 of the above R command are explained below: � y : A vector of the spatiotemporal observations to decorrelate.� st : A three-column matrix specifying the locations and times of the observations in y: � y0 : A vector of the IC spatiotemporal observations that we can use to estimate the IC mean and covariance functions by the procedures [3]-[4] and [5]-[6].� st0 : A three-column matrix specifying the locations and times of the observations in y0: In the ILI dataset, there are observations of the two covariates "air temperature" and "relative humidity" available.Next, we try to monitor the observed ILI incidence rates data by using these covariate information.To this end, the association between the two covariates and the ILI incidence rates can be investigated by fitting the semiparametric spatiotemporal model [9], which can be accomplished by using  � tol : The tolerance level Ϛ of the convergence criterion in the iterative estimation procedure for estimating Model [9].Its default value is 0.0001.
Then, we can use the following code to fit the semiparametric model for the ILI data in 2013, and the estimated regression coefficients b can be obtained via semi:est$beta: From the estimated model, the estimated regression coefficients of "relative humidity" and "air temperature" are −1:17 � 10 −6 and −1:06 � 10 −6 , respectively.Therefore, both covariates are negatively associated with the observed ILI incidence rates.This is consistent with our intuition and the conclusions found in the literature (cf.As can be seen from the code, the arguments of the function sptemnt ewslðÞ are the same as those of the function sptemnt ewmacðÞ: Here, we would like to mention that the EWSL chart is based on the penalized exponentially weighted kernel smoothing procedure [11].To use this procedure, the bandwidth h and the tuning parameters ðw 1 , w 2 Þ should be chosen properly.When using sptemnt ewslðÞ, h is chosen by the cross-validation procedure in [A6] and  To plot the EWSL chart, we can first obtain the EWSL charting statistic values and the control limit via ili:ewslcstat and ili:ewsl$cl: Then, the plot can be generated, which is shown in the left panel of Figure 5.Its zoom-in part in the time period during 09/15/2014-10/31/2014 is shown in the right panel of the figure .From the figure, we can see that the EWSL chart gives its first signal on 10/6/2014, which is 10 days earlier than that of the CUSUM chart [8] shown in Figure 3. It should be pointed out that we choose to use the county centroids as the locations of the 67 Florida counties in this example because this strategy is commonly used in the statistical and epidemiological literatures (e.g.Hooten, Ver Hoef, and Hanks 2019).In some applications, however, it may be more natural to represent locations of the areal units by population centers or some other strategies.For instance, most disease cases occur around the population centers rather than the county centroids for some counties in the ILI example.In such a case, population centers might be more appropriate to represent the county locations.After taking this into consideration, we also implement the related methods in the package by using the population centers as the county locations.The corresponding results are presented in Figure S1 and Table S1 of the supplementary file.By comparing these results with those described above, it can be seen that the performance of our methods is quite robust to the choice of county locations, which can be explained below.Our spatiotemporal data modeling methods are based on the local kernel smoothing procedure.The related nonparametric estimators at a given location and time are weighted averages of neighboring data, where the weights are determined by the kernel function and the neighborhood size is controlled by the bandwidths.After replacing the county centroids by the population centers as the county locations, the weights in our estimation procedures become different for some counties but the selected bandwidths are almost the same.In the kernel smoothing literature, it has been well discussed that the choice of the kernel functions (or the weights) is less important than the choice of the bandwidths (cf.Brabanter et al. 2011).This explains why our estimators are quite robust to the choice of the county locations in this example.

PM 2.5 data example
In this section, we further demonstrate the main functions in SpTe2M by using another real data example related to the observed PM 2.5 concentration levels described below.PM 2.5 refers to fine inhalable particles in the air, with diameters less than 2.5 micrometers.Since PM 2.5 concentration is positively associated with the incidence of many diseases such as cancers and respiratory diseases, governments around the world have invested a great amount of money in establishing data collection and monitoring systems to collect PM 2.5 data and monitor them sequentially over time.For instance, the China National Environmental Monitoring Center (CNEMC) has developed such a system to collect daily PM 2.5 concentrations at 183 major cities in China.Next, we apply the SpTe2M package to the PM 2.5 data collected by CNEMC during the years 2014-2016, which contain the observed data collected at 183 air pollution monitoring stations.Next, we will discuss how to monitor the observed PM2.5 data by using the functions sptemnt cusumðÞ and sptemnt ewslðÞ: We cannot use the function sptemnt ewmacðÞ in this example since there are no covariates contained in the PM2.5 dataset.To use the related functions, we first need to specify the argument type, which separates the observed data into three parts: the IC observations for estimating the IC spatiotemporal model, the IC observations for determining the control limit of a chart, and the observations for process monitoring.To this end, the observed data in the year 2014 are used for estimating the IC model, the observed data in the year 2015 are used for determining the control limits of the related control charts, and the observed data in the year 2016 are for online process monitoring using the CUSUM chart The two control charts are presented in Figure 7. From the figure, it can be seen that the first signals of the CUSUM chart [8] and the EWSL chart [12] are given on 4/7/2016 and 3/17/2016, respectively.So, in this example, the first signal of the EWSL chart is about 20 days earlier than that of the CUSUM chart.

Concluding remarks
This paper introduces the R package SpTe2M that was developed recently to implement some nonparametric methods for modeling and monitoring spatiotemporal data discussed in Yang and Qiu (2018, 2019, 2020, 2022) and Qiu andYang (2021, 2023).These methods can well accommodate the complex structures of spatiotemporal processes and thus are reliable to use in practice.Some original computer codes of these methods were written in Fortran for efficient computation.For users' convenience, SpTe2M has packaged these Fortran codes in R: Thus, the package SpTe2M should provide a convenient and effective tool for modeling and monitoring spatiotemporal data.However, there are still some issues about the package that need to be addressed in the future.For example, the current package can only handle cases with univariate spatiotemporal data (i.e., the response variable yðt, sÞ is univariate).It is our belief that the related methods can be extended to cases with multiple spatiotemporal response variables.In addition, it is often our interest in practice to study the relationship between the response variable yðt, sÞ and some of its covariates, as discussed in Section 2. The semiparametric model [9] and the related R function spte semiparmregðÞ in the package are designed for such a purpose.But, they depend on the assumption that the relationship is linear.Of course, this assumption could be violated in certain applications.In such cases, the semiparametric model [9] should be generalized to the one with space-/time-varying regression coefficients or even to a nonparametric model.In addition, after the related control charts give their signals, a reliable post-signal diagnostic tool should be developed to figure out when and where the detected shifts occur in both the time and space domains.All these issues will be addressed in our future research, and the R package SpTe2M will be updated accordingly.1996-1998, and as an Assistant Professor (1998-2002), Associate Professor (2002-2007), and Full Professor (2007-2013)

Figure 2 .
Figure 2. Observed ILI incidence rates in four Florida counties during the year 2013 (small circles) and the corresponding estimated mean functions (solid lines).
� ARL0 : The prespecified IC average run length ARL 0 : The default value is 200.� gamma : The prespecified allowance constant c in the CUSUM chart [8].The default value is 0.1.� B : Bootstrap sample size.The default value is 1,000.� bs : The block size b of the block bootstrap procedure.The default value is 5.
spte semiparmregðÞ: The arguments y, st, ht, hs, and stE of this function have the same meaning as those of the function spte meanestðÞ discussed earlier.The other arguments of spte semiparmregðÞ are explained below: � x : A p-column matrix containing the observed data of the p covariates.� maxIter : The maximum number of iterations allowed in the iterative estimation procedure for estimating Model [9].Its default value is 1,000.

Figure 3 .
Figure 3.The CUSUM chart for monitoring the ILI data in the year 2014 (left panel), together with its zoom-in part during the time period from 09/15 to 10/31 in 2014 (right panel).In each plot, the horizontal line denotes the control limit of the chart.

Figure 4 .
Figure 4.The EWMAC chart for monitoring the ILI data in the year 2014 (left panel), together with its zoom-in part in the time period from 09/25/2014 to 10/31/2014 (right panel).In each plot, the horizontal line denotes the control limit of the chart.
ðw 1 , w 2 Þ are chosen by BIC defined in [A7] of the supplementary file.

Figure 5 .
Figure 5.The EWSL chart for monitoring the ILI data in the year 2014 (left panel), together with its zoom-in part in the time period during 09/15/2014-10/31/2014 (right panel).In each plot, the horizontal line denotes the control limit of the chart.

Figure 7 .
Figure 7.The two control charts CUSUM and EWSL for monitoring the observed PM 2.5 concentration levels in the year 2016.The dashed horizontal line in each plot denotes the control limit of the related control chart.

Kai Yang ,
Ph.D., is an Assistant Professor in Biostatistics at the Medical College of Wisconsin (MCW).He joined the Division of Biostatistics at MCW in the fall of 2021, after receiving a PhD degree in biostatistics from the University of Florida in the summer 2021.His major methodological research interests include spatial epidemiology, online monitoring of medical outcomes, and quality control and improvement.He is particularly interested in developing statistical methodologies and computational tools for nonparametric modeling and sequential monitoring of largescale spatio-temporal data.His collaborative research interests are focused on infectious diseases, cancers, and cardiovascular diseases.He has close collaborations with several research groups at MCW, including the Department of Orthopaedic Surgery, Department of Physical Medicine and Rehabilitation, and Department of Surgery.Peihua Qiu, Ph.D., is Dean's Professor and Founding Chair of the Department of Biostatistics at the University of Florida.He received his PhD in statistics from the Department of Statistics at the University of Wisconsin at Madison in 1996.He then worked as a senior research consulting statistician for the Biostatistics Center at the Ohio State University during of the School of Statistics at the University of Minnesota during 1998-2013.He joined the University of Florida to develop its new Department of Biostatistics in 2013.Qiu has made substantial contributions in the research areas of jump regression analysis, image processing, statistical process control, survival analysis, dynamic disease screening, and spatio-temporal disease surveillance.So far, he has published two research monographs and over 160 research papers in refereed journals in these areas.He is an elected fellow of the American Association for the Advancement of Science (AAAS), an elected fellow of the American Statistical Association (ASA), an elected fellow of the American Society for Quality (ASQ), an elected fellow of the Institute of Mathematical Statistics (IMS), and an elected member of the International Statistical Institute (ISI).He served as associate editor for a number of top statistical journals, and was the Editor-in-Chief of the flagship statistical journal Technometrics during 2014-2016.
h t and h s are two bandwidths, K t ð�Þ and K s ð�Þ are the Epanechnikov kernel function K e ðxÞ ¼ 0:75ð1 − x 2 ÞIðjxj � 1Þ (Epanechnikov 1969), and d E ðs ij , sÞ is the Euclidean distance between the two spatial locations s ij and s.The solution of [3] to h l is defined to be the estimate of lðt, sÞ, which has the expression lðt, sÞ where W ¼ diagfw l ð1, 1; t, sÞ, :::, w l ðn, m n ; t, sÞg, w l ði, j; t, sÞ ¼ K t ððt i − tÞ=b t ÞK s ðd E ðs ij , sÞ=b s Þ, and b t and b s are two

Table 1 .
Main functions in the SpTe2M package.