Constrained Bayesian optimization and spatio-temporal surveillance for sensor network design in the presence of measurement errors

ABSTRACT The optimal placement of sensors is studied to construct a surveillance sensor network for a complicated stochastic system with random measurement errors. The problem is formulated as a joint problem of constrained black-box optimization for the fast detection of an anomaly event and spatio-temporal change-point detection for a low false alarm rate. An algorithm called Confidence-Set based Constrained Bayesian Optimization (CSCBO) is proposed, which models performance measures as Gaussian Processes (GPs) and provides a flexible and easy-to-implement framework for handling noisy black-box constraints. As the decision variables of this problem are high-dimensional binary vectors, the Wasserstein metric is introduced as a distance measure to properly capture the similarity among different solutions. Finally, a newly proposed detection statistic for spatio-temporal surveillance is combined with CSCBO to identify the optimal sensor placement while controlling the false alarm rate. The combined procedure is applied to the Altamaha River.


Introduction
Advances in sensor technology have enabled online monitoring of complicated systems. A sensor network consists of a group of sensors dispersed in space that collects multiple data streams in real time. Since there are usually random measurement errors, the design of a sensor network should not only involve the placement of sensors in space but also how the data streams collected by the sensors should be analysed. This article focuses on designing sensor networks for water quality monitoring in a river system. The sensor network generates online monitoring statistics and aims to detect a contaminant spill event quickly once it happens.
A sensor network measures the amount of a certain type of contaminant at each sensor location and raises an alarm when the measured contaminant level at a sensor location, or a processed quantity over the entire sensor network, exceeds a certain threshold. The location of a contaminant spill is random, and how much contaminant is transported depends on complicated river hydrodynamics and on dilution in the case of rain. Thus, a process simulator is needed to simulate the hydrodynamics, and random events such as spill events and rain events also need to be simulated. A spill will be detected early if a sensor is located close to the spill location, but detection will be delayed if the contaminant has to travel for a while and is diluted by rain. It is even possible that the spill is not detected at all. Owing to the randomness in spill and rain events, along with the complex hydrodynamics and contaminant transport in a river, there are no analytical forms for the detection delay and the probability of detection, which can only be evaluated via stochastic simulation. In that sense, optimal sensor placement can be viewed as a stochastic black-box optimization problem.
A sensor network is often used for water quality monitoring (Adu-Manu et al. 2017; Jiang et al. 2020). For a water quality monitoring network, the optimal selection of sensor locations has been studied in much prior work. Strobl and Robillard (2008) provide a comprehensive review of past approaches. Telci et al. (2008, 2009) are among the first to consider uncertainties in spill and rain events. They formulate the problem as an optimization problem with two objectives: minimizing the detection delay and maximizing the probability of detection. A genetic algorithm is used to solve the optimization problem. Park et al. (2010, 2014) formulate the problem as a constrained optimization problem to minimize the detection delay with a constraint on the probability of detection and adopt a combined procedure of the Nested Partition (NP) method (Shi and Ólafsson 2000) and the penalty function with memory (Park and Kim 2015). The combined procedure is referred to as NP+PFM, and it is proved to converge almost surely to the true optimal feasible solution (Park and Kim 2015). Although their algorithm is shown to work well, it requires a large number of runs or replications; the environmental black-box simulation is expensive in the sense that one run or replication easily takes two hours on a personal computer (Park et al. 2014).
Bayesian Optimization (BO) is a prevalent method for expensive black-box optimization problems (Mockus, Tiesis, and Zilinskas 2014; Brochu, Cora, and de Freitas 2010). A BO algorithm typically models a function by a Gaussian Process (GP) (Rasmussen and Williams 2006) and guides the search based on an acquisition function. An acquisition function quantifies how much improvement in the objective function is expected when a potential solution is sampled, and the solution with the largest acquisition function value is selected as the next evaluation point. See Brochu, Cora, and de Freitas (2010) for popular acquisition functions in BO. Snoek, Larochelle, and Adams (2012) provide an overview of how BO algorithms are applied to various problems. For the feasibility detection problem, He (2020) proposes a Bayesian algorithm. However, that algorithm is designed to detect feasible regions rather than to find an optimal feasible solution. Some researchers extend BO algorithms to multi-task or multi-output optimization; for example, see Swersky, Snoek, and Adams (2013), Pearce and Branke (2018) and Chowdhury and Gopalan (2021). They formulate the optimization problem as multi-objective optimization rather than constrained optimization. BO algorithms in the presence of black-box constraints are proposed in Gardner et al. (2014), who use the product of the expected improvement and the probability of feasibility for the constraints as an acquisition function. Although easy to implement, one issue with this method is that the acquisition function values for feasible solutions close to the boundary between the feasible and infeasible regions are penalized too much, as the probabilities of feasibility for those solutions are around 0.5. In many applications, however, the best feasible solutions are exactly those boundary solutions. Works with more complicated acquisition functions for constrained BO include Gramacy and Lee (2011), Hernández-Lobato et al. (2016), Lam and Willcox (2017), and Letham et al. (2019). However, their acquisition functions are difficult to implement in practice because they do not have closed forms and require Monte Carlo simulation for estimation.
BO algorithms require a similarity metric among different solutions, which is critical for applications of BO algorithms in sensor network design. Typically, applications of BO algorithms have continuous decision variables with a relatively low dimensionality, in which case the Euclidean distance measure works well. However, in the problem of sensor network design, the decision variables are sets of sensor locations (i.e. vectors of categorical variables). The Euclidean distance measure does not properly capture the similarity among vectors of categorical variables. Garnett, Osborne, and Roberts (2010) propose the earth mover's distance, which is also known as the Wasserstein metric, as the similarity metric and apply it to an unconstrained BO algorithm to find the optimal locations for weather sensors in the United Kingdom. Here the Wasserstein distance measure is also employed but for a different setting.
Besides the selection of sensor locations, data analysis is another important aspect of sensor network design. In practice, sensors are usually subject to random measurement errors. Kim et al. (2017) point out that decision making that assumes perfect sensors and ignores measurement errors is unreliable unless the randomness in the data is accounted for in a principled way. In water quality monitoring, the detection of contamination using sensor networks can be formulated as a statistical change-point detection problem. For wireless sensor networks, Ciuonzo, Rossi, and Willett (2017) and Wang, Li, and Varshney (2018) propose statistical tests for anomaly detection. Kim et al. (2017) adopt Statistical Process Control (SPC) methods, including a Shewhart chart and a CUSUM chart, in sensor network design when measurement errors exist. However, these approaches do not capture the spatial and temporal correlation in the sensor data. Recently, Chen, Kim, and Xie (2020) proposed a new score statistic called S³T, which detects the emergence of a signal that causes a shift in the monitored process from a noisy background. The S³T statistic captures both spatial and temporal correlation in the data and hence is particularly good at detecting weak signals, which is important in environmental monitoring.
This article's contribution is threefold: it (i) formulates the problem of sensor network design for water quality monitoring on river systems as a joint problem of constrained black-box optimization and online statistical change-point detection; (ii) proposes a new algorithm, called Confidence-Set based Constrained Bayesian Optimization (CSCBO), that provides a flexible and easy-to-implement framework for handling noisy black-box constraints; and (iii) adopts the Wasserstein similarity metric to enable CSCBO to solve problems with a high-dimensional binary search space and combines CSCBO with the S³T statistic to find sensor networks robust to sensor measurement errors; here, the temporal correlation of the errors is assumed to follow a first-order vector autoregressive (VAR(1)) model.
Note that there are some recent works in sensor network design for water quality monitoring. For example, de Winter et al. (2019) consider imperfect water quality sensors, Cardoso et al. (2021) formulate the problem as a multi-objective problem, and Bartos and Kerkez (2021) and Taha et al. (2021) develop new metrics for the problem of sensor network design in water quality monitoring systems. These works use different assumptions (imperfect sensors instead of sensor measurement errors) or different formulations (a multi-objective formulation instead of constrained optimization) and consider a different problem (new measures instead of finding the optimal sensor location).
The rest of the article is organized as follows: Section 2 provides background on constrained black-box optimization and the BO approach. Section 3 presents the CSCBO algorithm. Section 4 combines CSCBO with the Wasserstein metric and S³T. Finally, Section 5 shows experimental results of the proposed method on the Altamaha River, followed by concluding remarks in Section 6.

Background
This section provides a general setting of a constrained black-box optimization problem, a brief overview of the BO method, and a problem formulation for the optimal selection of sensor locations when random measurement errors do not exist. Table 1 provides the notation used throughout the article.

Constrained black-box optimization
Suppose that a stochastic system of interest has J + 1 performance metrics. One is considered the primary performance measure, and the rest are secondary performance measures used as guardrail performance measures or nonlinear constraints. The goal is to find the parameters of the system that lead to the best primary performance measure subject to non-negative constraints on each guardrail performance measure. In mathematical notation, the following constrained optimization problem is considered:

    max_{x ∈ X} f_0(x)  subject to  f_j(x) ≥ 0,  j = 1, . . . , J,   (1)

where x is a d-dimensional vector of input variables (or simply a solution), X ⊂ R^d is a bounded search space, f_0 : X → R is the primary performance function, and f_j : X → R (j ≥ 1) denotes the jth guardrail performance function. Note that X is a discrete set containing a finite number of points.

Table 1. Notation used throughout the article.

Constrained optimization:
f_0 — the primary performance function
f_j — the jth guardrail performance function, j = 1, 2, . . . , J
y_j(x) — a simulation estimate of the jth performance measure of x
λ_j(x) — the sampling variance of the jth performance measure of x

Bayesian optimization:
μ_0(x) — a prior mean function; K_0(x, x′) — a prior kernel function
x^(n) — the nth evaluated solution
μ^(n)(x), K^(n)(x, x′) — posterior mean and kernel functions given x^(1:n)
f_0^* — the true best function value among solutions evaluated so far
Φ(·), φ(·) — CDF and pdf of a standard normal distribution

Sensor network:
D — the number of nodes in a network
M — the number of sensors
c_it — a quantity measured by the ith sensor at time t
c_t — (c_1t, c_2t, . . . , c_Mt)
W_t — detection statistic constructed based on the measurements up to t
b — detection threshold
t_0 — the starting time of an abnormal event
t_a(x) — the timestamp when the sensor network at x raises an alarm
R(x) — an indicator function for detection
q — threshold value for a constraint on the reliability E[R(x)]

Results:
NUM — the number of SWMM runs required until each algorithm stops
ECEDD — conditional sample mean of t_a(x) given R(x) = 1 over 1000 SWMM runs
ER — sample mean of R(x) over 1000 SWMM runs
For a bounded continuous space, the search space is discretized. The functions f_j, j = 0, . . . , J, are considered to be unknown black-box functions that are expensive to evaluate and subject to noise. In other words, for any point x ∈ X, f_j(x) cannot be evaluated analytically. Usually, f_j(x), j = 0, . . . , J, are expectations of random estimates obtained from simulation data for each performance measure, given a parameter x:

    f_j(x) = E[y_j(x)],  j = 0, 1, . . . , J.

Here, y_j(x) is a simulation estimate of the jth performance measure, which can be either a single simulation observation or a sample average of multiple observations. It is assumed that

    y_j(x) ∼ N(f_j(x), λ_j(x)),

where λ_j(·) is referred to as the sampling variance of the jth performance measure. In practice, λ_j(·) can be unknown, in which case it needs to be estimated. From one run of the simulation with input parameter x, it is possible to evaluate all performance measures jointly and obtain a vector of observations [y_0(x), y_1(x), . . . , y_J(x)].

Bayesian optimization
BO was originally proposed to solve an unconstrained version of problem (1), i.e. J = 0 (Brochu, Cora, and de Freitas 2010; Frazier 2018). A BO algorithm is a sequential procedure and typically consists of two components: a surrogate model characterizing values of the function over x and an acquisition function guiding which solution to simulate next. In practice, a GP is widely adopted to model the black-box objective function owing to its flexibility and tractability (Brochu, Cora, and de Freitas 2010; Frazier 2018). In the initial stage of the optimization, a GP prior is imposed, which is specified by a mean function μ_0(x) : X → R and a kernel function K_0(x, x′) : X × X → R as follows:

    f_0(·) ∼ GP(μ_0(·), K_0(·, ·)).

Then, the posterior distribution on f_0 can be evaluated by updating the prior with the observations using the Bayes rule. Owing to conjugacy, the posterior is still a GP with updated mean and kernel functions,

    μ^(n)(x) = μ_0(x) + K_0(x, x^(1:n)) [K_0(x^(1:n), x^(1:n)) + Λ^(n)]^{-1} (y^(1:n) − μ_0(x^(1:n)))   (2)

and

    K^(n)(x, x′) = K_0(x, x′) − K_0(x, x^(1:n)) [K_0(x^(1:n), x^(1:n)) + Λ^(n)]^{-1} K_0(x^(1:n), x′),   (3)

where Λ^(n) is the diagonal matrix of sampling variances λ(x^(1)), . . . , λ(x^(n)). For the derivation of Equations (2) and (3), see Rasmussen and Williams (2006). Typically, a BO algorithm determines the next evaluation point x^(n+1) by maximizing an acquisition function a^(n)(·) that relies on the posterior distribution of f_0:

    x^(n+1) = argmax_{x ∈ X} a^(n)(x).

One of the most commonly used acquisition functions is the Expected Improvement (EI) (Jones, Schonlau, and Welch 1998), which is given by

    EI^(n)(x) = E[max(f_0(x) − f_0^*, 0)] = (μ^(n)(x) − f_0^*) Φ(z^(n)(x)) + σ^(n)(x) φ(z^(n)(x)),   (4)

where z^(n)(x) = (μ^(n)(x) − f_0^*)/σ^(n)(x); Φ(·) and φ(·) are the Cumulative Distribution Function (CDF) and probability density function (pdf) of a standard normal distribution, respectively; σ^(n)(x)² = K^(n)(x, x); f_0^* is the true best function value among the solutions evaluated so far, x^(1:n); and the expectation is taken over the posterior distribution of f_0. Note that the original EI proposed in Jones, Schonlau, and Welch (1998) assumes that λ(x) = 0 for all x ∈ X, i.e. the function to be optimized is not subject to any evaluation noise. When noise exists, the evaluation of (4) is difficult, as f_0^* is unknown and can only be estimated.
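The GP posterior update of Equations (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative, self-contained stand-in: the squared-exponential kernel, zero prior mean, and toy data below are assumptions of the sketch, not quantities from the article.

```python
import numpy as np

def gp_posterior(X_train, y_train, X_query, kernel, mean_fn, noise_var):
    """Posterior mean and covariance of a GP at X_query, given noisy
    observations y_train at X_train; mirrors Equations (2) and (3)."""
    K = kernel(X_train, X_train) + np.diag(noise_var)   # K_0(x^(1:n), x^(1:n)) + Lambda
    K_s = kernel(X_query, X_train)                      # K_0(x, x^(1:n))
    K_inv_resid = np.linalg.solve(K, y_train - mean_fn(X_train))
    post_mean = mean_fn(X_query) + K_s @ K_inv_resid
    post_cov = kernel(X_query, X_query) - K_s @ np.linalg.solve(K, K_s.T)
    return post_mean, post_cov

# Squared-exponential kernel on scalar inputs (bandwidth fixed at 1)
def sq_exp(A, B):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)

zero_mean = lambda X: np.zeros(len(X))
X_train = np.array([0.0, 1.0, 2.0])
y_train = np.array([0.5, -0.2, 0.3])
X_query = np.array([0.5, 1.5])
mu, cov = gp_posterior(X_train, y_train, X_query, sq_exp, zero_mean,
                       noise_var=0.01 * np.ones(3))
# Near observed data the posterior variance drops below the prior variance of 1
```

Note how the observation noise enters only through the diagonal matrix added to the training kernel, exactly as Λ^(n) does in (2) and (3).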
Gramacy and Lee (2011) propose modifying EI by replacing f_0^* in (4) with the GP posterior mean estimate of the best function value among the evaluated solutions,

    μ_max^(n) = max_{i = 1, . . . , n} μ^(n)(x^(i)),

which yields the modified EI

    EI^(n)(x) = (μ^(n)(x) − μ_max^(n)) Φ(z^(n)(x)) + σ^(n)(x) φ(z^(n)(x)),  z^(n)(x) = (μ^(n)(x) − μ_max^(n))/σ^(n)(x).   (5)

Please note that, for the rest of the article, the modified version defined in (5) is adopted as the EI. Another example of an acquisition function is the Upper Confidence Bound (UCB) (Srinivas et al. 2009). The UCB returns the next evaluation point with the highest upper confidence limit based on the posterior distribution of f_0:

    x^(n+1) = argmax_{x ∈ X} μ^(n)(x) + β_n^{1/2} σ^(n)(x),

where β_n > 0 controls the width of the confidence interval. While EI and UCB are powerful methods and are successful in many applications, they only tackle unconstrained problems. Section 3 proposes an efficient and practical algorithm that is capable of handling optimization problems with noisy black-box constraints.
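Both acquisition functions have closed forms in the posterior mean and standard deviation. A short sketch for a maximization problem (the candidate values below are illustrative; `incumbent` plays the role of the plug-in best posterior mean of Gramacy and Lee (2011)):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(post_mean, post_std, incumbent):
    """Modified EI of (5): `incumbent` is the largest posterior mean
    among the points evaluated so far."""
    z = (post_mean - incumbent) / post_std
    return (post_mean - incumbent) * norm.cdf(z) + post_std * norm.pdf(z)

def ucb(post_mean, post_std, beta=2.0):
    """Upper confidence bound acquisition: mean plus a scaled std deviation."""
    return post_mean + np.sqrt(beta) * post_std

# Three candidate points: EI favours high mean and/or high uncertainty
mu = np.array([0.2, 0.8, 0.5])
sd = np.array([0.3, 0.3, 1.0])
ei = expected_improvement(mu, sd, incumbent=mu.max())
next_point = int(np.argmax(ei))
```

In this toy example the third candidate wins: its mean is below the incumbent, but its large posterior standard deviation gives it the highest expected improvement.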

Sensor placement problem formulation with perfect sensors
Consider a sensor network for monitoring with the goal of quickly and accurately detecting any anomaly in a monitored quality measure. For example, a sensor network for monitoring river systems aims to detect any (upward) shift in certain chemical concentration levels in the water as soon as possible. In this setting, the problem of optimal sensor placement can be formulated as a constrained black-box optimization problem under the assumption that sensors are perfect, i.e. that sensor measurement errors do not exist.
Assume that there are D nodes in the network, indexed from one to D. Each node is a potential location to place a sensor. Denote the number of sensors as M and assume M ≪ D. The decision variable x is a D-dimensional binary vector with x_i = 1 if a sensor is placed on the ith node and zero otherwise. Thus, the search space X = {x ∈ BV^D : |x|_1 = M}, where BV^D denotes a D-dimensional binary vector space and |·|_1 denotes the ℓ1 norm of a vector.
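For concreteness, the binary encoding and the resulting search space can be generated directly. A toy sketch with illustrative values of D and M (a real instance would sample placements rather than enumerate them, since C(D, M) grows quickly):

```python
import numpy as np
from itertools import combinations

def placement_vector(nodes_with_sensors, D):
    """Binary vector x with x_i = 1 iff node i holds a sensor."""
    x = np.zeros(D, dtype=int)
    x[list(nodes_with_sensors)] = 1
    return x

def search_space(D, M):
    """All placements of M sensors on D nodes: C(D, M) binary vectors."""
    return [placement_vector(c, D) for c in combinations(range(D), M)]

# Toy instance: 5 nodes, 2 sensors, so C(5, 2) = 10 candidate placements
X_space = search_space(D=5, M=2)
```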
The sensors collect data sequentially and in real time. At time t, the M sensors obtain a vector of measurements c_t = (c_1t, . . . , c_Mt), where c_it denotes a quantity measured by the ith sensor at time t. Under the perfect sensor assumption, sensors always report exact measurements. A detection statistic W_t is constructed based on the measurements collected up to t. A simple example of a detection statistic is the maximum measurement over the network,

    W_t = max_{1 ≤ i ≤ M} c_it.

A sensor network raises an alarm at time t if W_t exceeds a pre-specified threshold b. Since sensors are assumed to be perfect, no alarm raised by the network is a false alarm. Denote by t_0 the starting time of an abnormal event that causes a shift in the monitored quality measure and by t_a(x) the timestamp when the sensor network at x raises an alarm (i.e. t_a(x) = inf{t : W_t ≥ b}). Then the detection delay T(x) is defined as T(x) = t_a(x) − t_0, which is the amount of time elapsed between the start and the detection of an anomaly event. A sensor network may fail to detect the anomaly event; in such a case, the sensor network never raises an alarm and T(x) = ∞. An indicator function R(x) is defined as

    R(x) = 0 if the sensor network fails to detect an anomaly event (i.e. T(x) = ∞), and R(x) = 1 otherwise.
Note that both T(x) and R(x) are random variables observed via stochastic simulations.
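Both quantities can be read off one simulated trajectory of the detection statistic. A minimal sketch, where the max-over-sensors statistic is just the simple example mentioned above and the measurement matrix is invented for illustration:

```python
import numpy as np

def detection_outcome(W, b, t0):
    """Return (T, R) from a trajectory W[t] of detection statistics:
    t_a = inf{t : W_t >= b}, T = t_a - t0, and R = 0 if no alarm is raised."""
    alarms = np.nonzero(W >= b)[0]
    if alarms.size == 0:
        return np.inf, 0          # anomaly never detected: T(x) = inf, R(x) = 0
    return alarms[0] - t0, 1

# Perfect-sensor example with M = 2 sensors over 4 time steps;
# the spill starts at t0 = 2 and the statistic is the max measurement
C = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [0.3, 0.1],
              [0.9, 0.4]])
W = C.max(axis=1)
T, R = detection_outcome(W, b=0.8, t0=2)   # alarm fires one step after the spill
```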
In the case of river water quality monitoring, rain events and spill events (such as spill location, start time, intensity and duration) cause uncertainty in T(x) and R(x), and a black-box simulation is needed to obtain T(x) and R(x). An anomaly event in river water quality monitoring corresponds to a contaminant spill event. In this setting, the problem of optimal sensor placement can be formulated as in Park et al. (2014):

    min_{x ∈ X} E[T(x) | R(x) = 1]  subject to  E[R(x)] ≥ q,   (7)

where 0 ≤ q ≤ 1. The quantity E[T(x) | R(x) = 1] is referred to as the conditional expected detection delay and E[R(x)] as the reliability, both of which can be estimated only by stochastic black-box simulations.

Confidence-set based constrained Bayesian optimization
This section presents a new algorithm for solving problem (1), namely Confidence-Set based Constrained Bayesian Optimization (CSCBO).
CSCBO imposes independent GP priors over each function f_j at the initialization step:

    f_j(·) ∼ GP(μ_j(·), K_j(·, ·)),  j = 0, 1, . . . , J.

Recall that μ_j(·) and K_j(·, ·) are the prior mean and prior kernel functions for f_j. In the nth iteration, suppose the algorithm chooses to evaluate the functions at x^(n) and obtains a vector of observations [y_0(x^(n)), y_1(x^(n)), . . . , y_J(x^(n))]. The posterior of each f_j is then a GP with mean μ_j^(n)(·) and kernel K_j^(n)(·, ·), which denote the posterior mean and kernel functions for f_j after n evaluations and can be calculated using (2) and (3).

Confidence set
To handle the stochastic constraints, a confidence set (denoted CS_1) is constructed using the posterior distributions. It is a subset of the search space X and is defined as follows.
• Define a reward function for each constraint j = 1, . . . , J as follows:

    r_j(x) = 1{f_j(x) ≥ 0}.

• Calculate the expected reward with respect to the posterior distribution of each function,

    ρ_j^(n)(x) = E[1{f_j(x) ≥ 0}] = Φ(μ_j^(n)(x) / σ_j^(n)(x)),

where 1{·} denotes an indicator function and the expectation is taken over the posterior distribution of f_j.

• Confidence set CS_1 is constructed based on the expected rewards for all constraints,

    CS_1 = {x ∈ X : ρ_j^(n)(x) ≥ h_1 for all j = 1, . . . , J},

where h_1 ∈ [0, 1] is a pre-specified constant.
Confidence set CS_1 contains points whose probabilities of satisfying each constraint are at least h_1 for all constraints. In each iteration, the algorithm selects the next evaluation point using the EI criterion defined in (5), but only from the set CS_1, which gives the following modified selection rule:

    x^(n+1) = argmax_{x ∈ CS_1} EI^(n)(x).

For reporting the best feasible solution found so far, another confidence set CS_2 is constructed with a different parameter h_2. The point with the largest posterior mean in CS_2 is reported as the best feasible solution. A formal description of CSCBO is given in Algorithm 1.
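Over a finite candidate set, the confidence set and the modified selection rule reduce to a few array operations. A sketch, with two caveats: the probability-of-feasibility form Φ(μ_j/σ_j) is the standard Gaussian computation, and the fallback to the whole space when CS_1 is empty is an assumption of this sketch rather than part of the article's algorithm:

```python
import numpy as np
from scipy.stats import norm

def select_next_point(ei, cons_mean, cons_std, h1):
    """Maximize EI over CS_1 = {x : P(f_j(x) >= 0) >= h1 for all j}.
    `cons_mean` and `cons_std` hold one column per constraint GP."""
    prob_feasible = norm.cdf(cons_mean / cons_std)   # P(f_j(x) >= 0) per point, per j
    in_cs1 = np.all(prob_feasible >= h1, axis=1)
    if not in_cs1.any():
        in_cs1[:] = True   # fallback: search the whole space (sketch assumption)
    return int(np.argmax(np.where(in_cs1, ei, -np.inf)))

# Three candidates, one constraint: candidate 0 has the largest EI but is
# almost surely infeasible (posterior mean -2, std 1), so CS_1 excludes it
ei = np.array([0.9, 0.5, 0.1])
cons_mean = np.array([[-2.0], [0.5], [1.0]])
cons_std = np.ones((3, 1))
chosen = select_next_point(ei, cons_mean, cons_std, h1=0.05)   # → 1
```

A small h_1 keeps the mask permissive: only points with strong posterior evidence of infeasibility are removed, matching the design intent discussed below.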
The stopping criteria for CSCBO are up to a decision maker. One may choose to terminate the algorithm after a pre-specified maximum number of iterations. Alternatively, one can terminate the algorithm after the same solution is reported as the current optimal feasible solution in k consecutive iterations.
To achieve good performance of CSCBO, the choices of h_1 and h_2 are important. The purpose and meaning of h_1 and h_2 are discussed, and guidelines for choosing them are provided, in the next subsection.

Choices for h_1 and h_2
A trade-off that one faces when dealing with a constrained black-box optimization problem is whether to (i) search for a promising solution in terms of the primary performance measure first and then check feasibility, or (ii) attempt to identify the boundary between the feasible and infeasible regions first and then search for promising solutions. While an optimal feasible solution is eventually desired, the two approaches lead to different search schemes. CSCBO is designed on the idea that improving the primary performance metric should take higher priority. Specifically, the feasibility of a solution matters only when the solution seems to have a large value of the primary performance measure. Thus, in CSCBO, the acquisition function itself is based purely on the primary performance function f_0, and the constraints affect the selection of the next evaluated solution only via the confidence set CS_1. Confidence set CS_1 restricts the region of potential next solutions and is meant to eliminate solutions from consideration only when there is strong evidence that they are infeasible. Thus, h_1 should be a relatively small value.
Confidence set CS_1 is initialized to include all points in X, and hence the restriction on the search space is small in the early iterations of the optimization. In practice, the computation budget is limited, and the optimization algorithm terminates after a finite number of iterations. The confidence set CS_2 is constructed to report an optimal feasible solution after a finite number of iterations. The current best feasible solution identified in the nth iteration, denoted x^(n), is the point with the highest posterior mean in CS_2. To avoid reporting an infeasible solution as the current best feasible solution, it is recommended to set h_2 > h_1 so that a reported solution x^(n) has strong evidence of being feasible. The impact of h_1 and h_2 on the performance of CSCBO is demonstrated using an example; for the results, see Section S1 of the online supplement, which can be accessed at https://doi.org/10.1080/0305215X.2021.2014475. Based on the roles of h_1 and h_2 in CSCBO and the small experimental study in Section S1, it is recommended to use h_1 = 0.025 or 0.05 and h_2 = 0.5.

Combining CSCBO and S³T for sensor network design
This section extends problem (7) to the case where random measurement errors exist. CSCBO and a spatio-temporal change-point detection procedure, S³T, are combined to solve the problem.

Sensor measurement errors and statistical surveillance
In practice, sensors are usually subject to measurement errors: measurements collected by sensors fluctuate around the true values by small random amounts. It is assumed that, when there is no contaminant event, the measurements collected by the M sensors at each timestamp are a series of i.i.d. multivariate normal random variables,

    c_t ∼ N_M(0, Σ),

where 0 represents an M-dimensional zero vector and Σ is the spatial covariance matrix of the measurements. There are often enough reference data in the in-control case (no anomaly event) to estimate the mean and temporal correlation and to whiten the data. Thus, the assumption that c_t has zero mean and no temporal correlation when there is no anomaly event is reasonable (since the mean can be pre-estimated and subtracted). It is also assumed that all sensors in the network are of the same type and that Σ is known or has been estimated. When an anomaly event occurs, the mean of c_t shifts and is no longer a zero vector. In addition, the anomaly usually induces spatial and temporal correlation in the observations (for example, due to the spread of the contaminant along the river in the case of water quality monitoring).
In the presence of sensor measurement errors, a sensor network not only needs to detect an anomaly with a low detection delay and a high reliability but must also ensure a low false alarm rate. A false alarm is an alarm raised when there is no anomaly event. The in-control Average Run Length (ARL) is used as the metric to quantify the frequency of false alarms; it is the average number of time steps until an alarm is raised (from the beginning) when there is no anomaly. For a specific x, ARL_0(x) denotes the in-control ARL of the sensor network at x. Then, the problem formulation (7) with perfect sensors is modified to the following problem in the presence of measurement errors:

    min_{x ∈ X} E[T(x) | R(x) = 1]  subject to  E[R(x)] ≥ q and ARL_0(x) ≥ γ,   (9)

where γ > 0 is a pre-specified lower bound on the in-control ARL. The detection delay T(x) here depends not only on the sensor placement x but also on the detection statistic W_t. This article uses the score statistic for spatio-temporal surveillance (S³T) proposed in Chen, Kim, and Xie (2020) as the detection statistic. The S³T statistic captures both spatial and temporal correlation information and is effective in detecting a weak anomaly event. For brevity, the description of S³T is given in Section S2 of the online supplement, which also shows how to calibrate the threshold b to achieve a desired ARL_0 target value so as to control the false alarm rate.
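Calibrating the threshold b against the in-control ARL can be done by Monte Carlo: simulate runs of in-control data c_t ∼ N(0, Σ), record when the statistic first crosses b, and adjust b until the average run length reaches the target. The sketch below uses a simple Shewhart-type statistic as a stand-in for S³T (whose form is in the online supplement), so the numbers are illustrative only:

```python
import numpy as np

def estimate_arl0(b, Sigma, stat_fn, horizon=1000, reps=100, seed=0):
    """Monte Carlo estimate of ARL_0 for threshold b; run lengths are
    censored at `horizon` when no (false) alarm occurs."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)
    run_lengths = []
    for _ in range(reps):
        for t in range(1, horizon + 1):
            c_t = L @ rng.standard_normal(Sigma.shape[0])  # in-control data
            if stat_fn(c_t) >= b:
                run_lengths.append(t)        # false alarm at time t
                break
        else:
            run_lengths.append(horizon)      # censored: no alarm within horizon
    return float(np.mean(run_lengths))

# Stand-in statistic: the largest absolute standardized measurement
stat = lambda c: np.max(np.abs(c))
Sigma = np.eye(3)
# A larger threshold yields a larger ARL_0, i.e. fewer false alarms
arl_low = estimate_arl0(b=1.0, Sigma=Sigma, stat_fn=stat)
arl_high = estimate_arl0(b=3.0, Sigma=Sigma, stat_fn=stat)
```

In practice one would bisect on b until the estimated ARL_0 matches the target γ, then hold b fixed while estimating T(x) and R(x) for each candidate placement.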

Wasserstein similarity metric
BO algorithms exploit the 'smoothness' of a function, i.e. similar decision variables have similar function values. The similarities among different points in the search space are captured by the kernel function of a GP. An example of a commonly used kernel function is the squared-exponential kernel,

    K(x, x′) = exp(−d(x, x′)² / (2σ²)),

where σ > 0 is the kernel bandwidth and d(·, ·) ≥ 0 is a similarity metric. A smaller d(x, x′) indicates a higher similarity between x and x′ in performance. If the decision variables are continuous variables or ordinal discrete variables, a natural choice for d is the scaled Euclidean distance,

    d(x, x′) = ‖x − x′‖_2 / ℓ,

where ℓ > 0 is the length-scale parameter. However, the decision variable x is a D-dimensional binary vector in problems (7) and (9), so the Euclidean distance is inappropriate as the similarity metric. Section S3 of the online supplement demonstrates, using the example in Figure S3, that the Euclidean distance does not properly capture the difference between sensor locations. The Wasserstein metric is therefore proposed as the similarity metric for the problem in this article. It was originally proposed as a measure of discrepancy between two distributions (Villani 2008) and is widely applied in image processing (Rubner, Tomasi, and Guibas 2000). The metric is adapted to the setting of the sensor placement problem as follows. For a specific sensor placement, the D-dimensional binary vector x is first converted into a series of weights {w_i ≥ 0}_{i=1}^D by assigning w_i = 1/M if a sensor is placed at the ith node and w_i = 0 otherwise. Note that Σ_{i=1}^D w_i = 1. For two different sensor placements x and x′ with corresponding weights {w_i} and {w′_i}, the Wasserstein distance, denoted d_w(x, x′), is intuitively the minimum amount of work required to move all sensors from placement x to placement x′. A linear program to formally calculate d_w(x, x′) is given in Section S3 of the online supplement.
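The linear program behind d_w(x, x′) is a small transportation problem: move mass 1/M per sensor from the nodes of x to the nodes of x′ at minimum total cost. A sketch with `scipy.optimize.linprog`; the ground-distance matrix between nodes (here, simple distances on a line) is an illustrative assumption, since the article's actual between-node distances are defined in the supplement:

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_distance(x, x_prime, ground_dist, M):
    """Earth mover's distance between two placements of M sensors on D nodes.
    ground_dist[i, j] is the cost of moving one unit of mass from node i to j."""
    D = len(x)
    w, w_prime = x / M, x_prime / M            # weights sum to 1 per placement
    # Decision variable: flow matrix f (flattened); minimize sum_ij f_ij * d_ij
    A_eq, b_eq = [], []
    for i in range(D):                          # row sums: sum_j f[i, j] = w[i]
        row = np.zeros((D, D)); row[i, :] = 1.0
        A_eq.append(row.ravel()); b_eq.append(w[i])
    for j in range(D):                          # column sums: sum_i f[i, j] = w'[j]
        col = np.zeros((D, D)); col[:, j] = 1.0
        A_eq.append(col.ravel()); b_eq.append(w_prime[j])
    res = linprog(ground_dist.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

# One sensor moved from node 0 to node 2 on a 3-node line with unit spacing:
# the single unit of mass travels a distance of 2
ground_dist = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
d = wasserstein_distance(np.array([1, 0, 0]), np.array([0, 0, 1]), ground_dist, M=1)
```

The flow variables f_ij are the optimal transport plan; for equal numbers of sensors the problem always has a feasible solution because both weight vectors sum to one.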

Experiments
This section solves problems (7) and (9) for designing an optimal sensor network for water quality monitoring on the Altamaha River.

River process simulation
The Altamaha River is located in Georgia, United States, and is the largest watershed in the state. Figure 1(a) shows the shape of the Altamaha River network with 100 nodes. Each node represents a potential spill or sensor location. In order to evaluate E[T(x) | R(x) = 1] and E[R(x)] for different sensor placements on a river sensor network, a simulation model capable of simulating hydrodynamics and contaminant transport is needed. The Storm Water Management Model (SWMM; Rossman 2010), developed by the United States Environmental Protection Agency, is widely used in environmental engineering for water-related studies. In SWMM, Manning's equation is solved to calculate the depth of flow in all conduits, and the momentum equations with the conservation of mass are used to calculate the flow within a conduit link. By assuming that each sub-reach of a conduit behaves as a continuously stirred tank reactor, contaminant transport within a conduit link is simulated in the model. Additionally, SWMM requires geologic, geometric and fundamental hydrodynamics data to construct a river network. The Altamaha River system is constructed in the SWMM model based on the United States Geological Survey (USGS) digital elevation data in the National Elevation Dataset (Telci et al. 2009). For more details of the hydrodynamics adopted in this article, see Telci and Aral (2011).
For each spill, it is assumed that the spill location is uniformly randomly selected among the mesh discretization points in a specific segment of the river network, the starting time is uniformly distributed between 0 and 240 hours, and the intensity is uniformly distributed between 10 and 1000 grams/liter. Random rain events are generated by SWMM based on the same data used in Telci and Aral (2011). The Altamaha River watershed is divided into ten sub-catchments, as shown in Figure 1(b). The rainfall measurements were obtained from different USGS stations close to the ten sub-catchments in 2006. Based on a statistical analysis of these measurements, over nine million rain patterns are generated for the entire watershed. Each rain pattern describes time-dependent rainfall events and keeps changing the hydrologic conditions in each sub-catchment during the simulation.
Given the rainfall information, as well as the location, starting time and intensity of a contaminant spill, SWMM simulates the contaminant transport process through the river over a period of time. In an SWMM trial, simulating the hydrodynamics under rain events takes a long time, whereas simulating contaminant transport under given hydrodynamic conditions is relatively fast. Thus, Park et al. (2014) perform 100 different spill events under each given set of random rain events for the ten sub-catchments. Each SWMM trial runs until the simulation clock reaches 40 days and reports concentration levels at each potential sensor location every 15 minutes of the simulation clock for each spill event. In each iteration of an optimization algorithm, a new SWMM trial is implemented, and then 100 observations of t_a(x) and R(x) for any solution x are obtained from the trial. Sample averages of the 100 observations are used as y_j(x) in the experiment. Section 4.1 of Park et al. (2014) provides more details of the parameters for the river process simulation.
In the experiment, the maximum number of SWMM trials available is 1000, and the goal is to identify the optimal sensor network design using as few SWMM trials as possible for computational efficiency. In each SWMM trial, 100 contaminant events are generated, and each event is a single instantaneous spill at a different location. Note that the spills generated by the same SWMM trial share the same rain pattern but have different intensities and starting times. For a specific sensor placement x, each SWMM trial returns estimates of the conditional expected detection delay and reliability based on the 100 contaminant events.
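How one trial's 100 events become those two estimates can be sketched as follows. This is a minimal sketch under an assumption consistent with the "conditional" qualifier: undetected spills lower the reliability estimate but are excluded from the delay average.

```python
def trial_estimates(delays):
    """Turn the detection delays of the spill events in one SWMM trial into
    the trial's estimates of reliability and conditional expected detection
    delay. An entry of `delays` is the delay (in hours) for a detected
    spill, or None if the spill was never detected."""
    detected = [d for d in delays if d is not None]
    reliability = len(detected) / len(delays)
    # conditional on detection: average over detected events only
    cedd = sum(detected) / len(detected) if detected else float("inf")
    return cedd, reliability
```

For example, `trial_estimates([1.0, 2.0, None, 3.0])` returns a conditional expected delay of 2.0 hours and a reliability of 0.75.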
Once each algorithm terminates according to its stopping criterion, three performance measures are reported: (i) the Estimated Conditional Expected Detection Delay (ECEDD) of the sample best feasible solution found by the algorithm; (ii) the Estimated Reliability (ER) of the sample best feasible solution found by the algorithm; and (iii) the number of SWMM trials required until termination of the algorithm (NUM). The ER and ECEDD of the sample best feasible solution found by each algorithm are calculated using 1000 SWMM trials. With this number of SWMM trials, ER and ECEDD are statistically valid up to the third and second decimal places, respectively.
To demonstrate the efficiency of CSCBO compared with a competing optimization algorithm, namely NP+PFM adopted in Park et al. (2014), perfect sensors are considered first. Section S4.1 of the online supplement gives the values of the implementation parameters of NP+PFM. Although perfect sensors have no measurement errors, there is still randomness due to stochastic rain and spill events. For brevity, the detailed experiment setup and results for this case are reported in Section S4 of the online supplement. The experiment results show that CSCBO achieves significant savings in computation time (up to 65% savings in NUM) while showing similar performance.
Sensors with measurement errors are considered in the next subsection.

Sensors with measurement errors
This section considers sensors with measurement errors and applies the combined procedure of CSCBO and the S³T statistic to the corresponding problem (9). When measurements are noisy, the value of the threshold b should be chosen carefully to control the false alarm rate.

Experiment setup
The 'tail-up' model proposed in Ver Hoef and Peterson (2010) is adopted for the spatial correlation of the concentration measurements collected by the sensors in a river system. Section S5 of the online supplement provides a description of the tail-up model, including the required notation and parameters. The marginal variance of the measurement errors is set to ζ₁ ∈ {0.005, 0.01}. Note that, in practice, ζ₁ should be related to the specification of a sensor. In the experiment, ζ₁ = 0.01 is considered a high variance for the measurement errors, as it is sufficiently large to cause a massive number of false alarms if the values of the threshold b adopted in Section S4.1 of the online supplement are used. The ARL₀ target is set to 10,000, which is approximately equivalent to 100 days, as the inter-reporting time of concentration levels is set to 15 minutes in the SWMM model.
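The correspondence between the ARL target and calendar time follows directly from the 15-minute reporting interval:

```python
# 15-minute reporting interval => 96 concentration reports per day
SAMPLES_PER_DAY = 24 * 60 // 15
arl0_target = 10_000
days = arl0_target / SAMPLES_PER_DAY  # about 104 days, i.e. roughly 100 days
```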
Finally, for the S³T statistic, the window length ω = 20 is used. For CSCBO, the same parameter settings are used as in Section S4.1 of the online supplement. The minimum requirement for reliability is set to q = 0.9.

Comparison on detection delay
To demonstrate the advantage of the S³T statistic, S³T is compared with two other detection statistics in terms of conditional expected detection delay: the Shewhart statistic and the max-CUSUM statistic. To save space, the descriptions of the two competing statistics and the experiment results are given in Section S6 of the online supplement.
As shown in Table S2 of the online supplement, S³T outperforms the other two statistics in all experiments, achieving a conditional expected detection delay that is seven to twelve hours shorter than that of the Shewhart statistic and one to three hours shorter than that of the max-CUSUM statistic. The advantage is more pronounced when the variance of the measurement errors is high (ζ₁ = 0.01). This effectiveness of S³T is due to its capacity to capture both the spatial and temporal correlation in the data.

Optimal solution in the presence of measurement errors
Now the performance of the combined procedure of CSCBO and S³T on problem (9) is presented. In each iteration of the optimization, a new sensor placement x is evaluated, and the S³T statistic is computed from the concentration levels collected by the sensor network. The value of the threshold b for the currently tested sensor placement x is adjusted so that its actual ARL₀ is approximately equal to the target of 10,000. Figure 2 presents the optimal feasible solutions found by the combined procedure for M = 5 and M = 7. As the sensors tend to be located at the roots of subtrees with multiple branches in order to achieve high detection reliability, the number of potential sensor locations is reduced to 50 for M = 5 sensors and 30 for M = 7 sensors. Table 2 reports the performance metrics of these solutions based on 1000 SWMM trials. From Table 2, it is observed that all solutions have an estimated reliability higher than q = 0.9, and hence the reported solutions appear feasible. In addition, it can be observed from Figure 2 that the optimal sensor placements found by the combined procedure do not differ much across the levels of the marginal variance ζ₁ of the measurement errors. This suggests that the optimal sensor network design identified by the combined procedure is robust to the variance level of the sensor measurement errors.
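Adjusting b so that the actual ARL₀ hits the target can be viewed as a monotone root-finding problem. The bisection routine below is an illustrative sketch, not the calibration procedure used in the article; `estimate_arl0` stands in for a simulation-based estimate of the in-control average run length under no spill.

```python
def calibrate_threshold(estimate_arl0, target=10_000, lo=0.0, hi=100.0, tol=1e-3):
    """Bisection search for a threshold b whose in-control average run length
    ARL0(b) matches the target. estimate_arl0 is assumed to increase
    monotonically in b: a higher threshold triggers fewer false alarms and
    hence yields a longer run length."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if estimate_arl0(mid) < target:
            lo = mid  # too many false alarms: raise the threshold
        else:
            hi = mid  # run length already long enough: lower the threshold
    return 0.5 * (lo + hi)

# toy monotone stand-in for the simulation-based ARL0 estimator
b = calibrate_threshold(lambda b: 500.0 * b)  # exact answer here is b = 20
```

In practice each evaluation of `estimate_arl0` would itself require Monte Carlo runs of the monitoring statistic, so the number of bisection steps matters.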

Conclusion
In this article, the problem of optimal sensor network design in the presence of random measurement errors is studied for water quality monitoring in river systems. The spatial and temporal correlation in the sensor measurements is taken into consideration. The problem is formulated as a joint problem of constrained black-box optimization and spatio-temporal change-point detection. The proposed algorithm, CSCBO, combined with the Wasserstein similarity metric and the S³T statistic, quickly identifies sensor network designs shown to have a low conditional expected detection delay while satisfying pre-specified levels of detection reliability and false alarm rate. The identified sensor networks are also robust to different levels of measurement error variance. The CSCBO algorithm imposes independent GP priors on each performance measure and does not capture potential correlations among them. The extension of CSCBO to GP priors that capture correlated performance measures is ongoing.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The work reported herein was supported by the National Science Foundation [Grant CMMI-1538746].

Data availability statement
The data that support the findings of this study are available in the Georgia River Network, the Summit to the Sea, and the Altamaha River Basin Management Plan, GA EPD, at https://garivers.org/altamaha-river/, http://coastgis.marsci.uga.edu/summit/altamaha.htm and https://epd.georgia.gov, reference number Altamaha River Basin Management Plan 2003, Georgia Department of Natural Resources Environmental Protection Division. These data were derived from the resources available in the public domain as listed above.