Multivariate versus Univariate Sensor Selection for Spatial Field Estimation

Abstract: This paper discusses the sensor selection problem in estimating spatial fields. It is shown that selecting a subset of sensors depends on how the spatial processes are modelled. We first propose using a Gaussian process (GP) to model a univariate spatial field and a multivariate GP (MGP) to jointly represent multivariate spatial phenomena. A Matérn cross-covariance function is employed in the MGP model to guarantee that its joint (cross-)covariance matrices are positive semi-definite. We then consider the two corresponding problems, univariate and multivariate sensor selection, for effectively monitoring multiple spatial random fields. Both sensor selection approaches were implemented in real-world experiments and their performances were compared. The difference between the results obtained by the univariate and multivariate sensor selection techniques is insignificant; that is, either method can be used efficiently in practice.


I. INTRODUCTION
Sensor selection for estimating spatial fields, which we call spatial sensor selection, is a fundamental and critical problem in various monitoring applications [1]. In practice, partly because of their low cost, spatial sensors monitoring spatial phenomena may be deployed in excess for a sensing task, which can lead to over-sampling in the system. Over-sampling occurs when multiple deployed sensors record very similar data, causing redundancy in the sensor measurements. Redundancy is operationally expensive, since it consumes memory, computation, communication, maintenance and energy resources, while the redundant data contribute no additional information to the sensing system. Selecting the most informative sensors out of all available ones is therefore an important problem to address.
The crux of spatial sensor selection is that the selected sensors must not only observe spatial phenomena effectively but also predict the fields efficiently at any unmeasured locations of interest [2]. In other words, the selected sensors are equipped with a spatial process model that represents the spatial fields and predicts them at unobserved locations. This leads to a formulation of spatial sensor selection in which the most informative sensors are chosen by minimizing the prediction uncertainty at the prediction locations. Spatial modelling is therefore central to the formulation of spatial sensor selection.
A single, or univariate, spatial process such as indoor temperature can be modelled mathematically with a statistical model such as a Gaussian process (GP) [3]. The sensor selection problem for monitoring a univariate spatial field, also called univariate sensor selection, has been well studied [4]. When multiple, or multivariate, spatial processes exist in the same environment, e.g. indoor temperature and humidity, they may be cross-correlated [5]. If those spatial phenomena are observed simultaneously, their cross-correlation may influence the results of spatial sensor selection for monitoring the multivariate spatial fields, which we define here as multivariate sensor selection. To the best of our knowledge, multivariate sensor selection has not yet been well studied, which raises a natural question: does it outperform univariate sensor selection? In this paper, we compare the spatial prediction results obtained by the two approaches.
We first use a GP to model a univariate spatial field. To represent multivariate spatial phenomena, we employ a multivariate Gaussian process (MGP) [6] with a Matérn cross-covariance function, which has been proved to produce valid cross-correlation between multiple spatial processes [7]. Both spatial sensor selection approaches were implemented in real-life experiments, and the results demonstrate their effectiveness. The remainder of the paper is organized as follows. Section II introduces the univariate and multivariate models, and Section III discusses the spatial sensor selection problems. Section IV summarizes and discusses our experimental results, and conclusions are drawn in Section V.

II. UNIVARIATE AND MULTIVARIATE MODELS
To formulate the univariate and multivariate sensor selection problems, i.e. finding the most informative sensor nodes for efficiently estimating spatial fields, this section introduces the corresponding univariate and multivariate models. These models represent spatial processes from a limited number of sensor measurements and predict them at unmeasured locations of interest.

A. Univariate Spatial Prediction
In univariate scenarios, we consider only a single spatial random phenomenon, e.g. temperature [8]. It is assumed that n sensors positioned at locations $s = (s_1^T, \ldots, s_n^T)^T \in \mathbb{R}^{n \times d}$, where d is the dimension of the environment, are used to measure that spatial process at their locations. A measurement $y_i$ collected by the i-th sensor can be modelled mathematically by

$$ y_i = x(s_i)\beta + w(s_i) + \varepsilon(s_i), \tag{1} $$

where $x(s_i)$ is a row vector of spatially referenced covariates and $\beta$ is a vector of trend parameters. The term $x(s_i)\beta$ represents the trend of the spatial phenomenon at location $s_i$. The spatial trend can be constant, first order or second order, as chosen by the user. For instance, if a second-order trend is selected in a two-dimensional environment, then $x(s_i)$ and $\beta$ are $1 \times 6$ and $6 \times 1$ vectors, respectively. $w(s_i)$ is a latent random function represented by a zero-mean Gaussian process (GP), while $\varepsilon(s_i)$ is independent and identically distributed Gaussian noise with zero mean and unknown variance $\tau^2$. Let $y = (y_1, \ldots, y_n)^T$, $w(s) = (w(s_1), \ldots, w(s_n))^T$ and $x(s) = (x(s_1)^T, \ldots, x(s_n)^T)^T$. If there are p unobserved locations of interest in the environment, $z = (z_1^T, \ldots, z_p^T)^T \in \mathbb{R}^{p \times d}$, then the corresponding variables can be predicted through a posterior normal distribution whose mean and covariance matrix are

$$ \mu_{z|s} = x(z)\beta + \Sigma_{zs}\left(\Sigma + \tau^2 I\right)^{-1}\left(y - x(s)\beta\right), \tag{2} $$

$$ \Sigma_{z|s} = \Sigma_{zz} - \Sigma_{zs}\left(\Sigma + \tau^2 I\right)^{-1}\Sigma_{zs}^T, \tag{3} $$

where $x(z)$ is the covariate matrix corresponding to z and I is the $n \times n$ identity matrix. $\Sigma$ and $\Sigma_{zz}$ are the covariance matrices of the random variables at locations s and z, respectively, and $\Sigma_{zs}$ is the $p \times n$ cross-covariance matrix between the random variables of the single spatial process at z and s. The matrices $\Sigma$, $\Sigma_{zz}$ and $\Sigma_{zs}$ can be calculated with a covariance function such as the squared exponential function [2], whose hyperparameters can be estimated from the collected sensor measurements by the maximum likelihood method [9].
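The posterior mean (2) and covariance (3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a first-order trend with covariates [1, s_1, s_2], a squared exponential covariance, and hyperparameters (β, τ², signal variance, length scale) that are simply given rather than estimated by maximum likelihood [9]. The function names and the 4 × 5 sensor layout are hypothetical.

```python
import numpy as np


def sq_exp_cov(a, b, sigma2=1.0, length=1.0):
    """Squared exponential covariance between location sets a (n x d) and b (p x d)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return sigma2 * np.exp(-0.5 * d2 / length**2)


def linear_trend(locs):
    """Assumed first-order trend covariates: each row is [1, s_1, ..., s_d]."""
    return np.hstack([np.ones((locs.shape[0], 1)), locs])


def gp_predict(s, y, z, beta, tau2, sigma2=1.0, length=1.0):
    """Posterior mean (2) and covariance (3) at locations z given measurements y at s."""
    Sigma = sq_exp_cov(s, s, sigma2, length)      # n x n
    Sigma_zs = sq_exp_cov(z, s, sigma2, length)   # p x n
    Sigma_zz = sq_exp_cov(z, z, sigma2, length)   # p x p
    K = Sigma + tau2 * np.eye(len(s))             # Sigma + tau^2 I
    # Linear solves are used instead of forming an explicit inverse of K.
    alpha = np.linalg.solve(K, y - linear_trend(s) @ beta)
    mean = linear_trend(z) @ beta + Sigma_zs @ alpha
    cov = Sigma_zz - Sigma_zs @ np.linalg.solve(K, Sigma_zs.T)
    return mean, cov


# Synthetic usage: 20 sensors on a hypothetical 4 x 5 grid with 2 m spacing.
s = np.array([[2.0 * i, 2.0 * j] for i in range(4) for j in range(5)])
beta = np.array([25.0, 0.1, -0.05])
rng = np.random.default_rng(0)
y = linear_trend(s) @ beta + rng.normal(0.0, 0.3, size=len(s))
z = np.array([[3.0, 3.0], [100.0, 100.0]])
mean, cov = gp_predict(s, y, z, beta, tau2=0.01, sigma2=1.0, length=2.0)
```

Far from every sensor (the second row of z), the prediction falls back to the trend and the posterior variance to the prior variance, which is a quick sanity check on (2) and (3).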

B. Multivariate Spatial Prediction
In some practical scenarios, several spatial phenomena are monitored simultaneously [5]. Although each spatially distributed process can be modelled individually, cross-correlation between such processes is ubiquitous [10]. In those cases, it is desirable to model the multiple spatial fields jointly.
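It is assumed that m different types of sensors monitor m spatial phenomena and that the different types of sensors are co-located: n sensors of each type are positioned at locations s. All the collected measurements are then modelled by

$$ Y = X(s)\beta + W(s) + \varepsilon(s), $$

where $Y = (y_1^T, \ldots, y_m^T)^T \in \mathbb{R}^{mn}$ stacks the measurement vectors of the m spatial fields, $X(s)$ and $\beta$ collect the trend covariates and trend parameters of all m fields, and $W(s) = (w_1(s)^T, \ldots, w_m(s)^T)^T$ is represented by a zero-mean MGP. The measurement noises are now modelled by a multivariate normal distribution, $\varepsilon(s) \sim \mathcal{N}(0, \Psi)$. The multivariate cross-covariance matrix $\Sigma_m \in \mathbb{R}^{mn \times mn}$ is given by

$$ \Sigma_m = \begin{pmatrix} C_{11} & \cdots & C_{1m} \\ \vdots & \ddots & \vdots \\ C_{m1} & \cdots & C_{mm} \end{pmatrix}, $$

where $C_{ii}$ is the univariate covariance matrix computed from the i-th spatial field only, and $C_{ij}$ is the cross-covariance matrix between the i-th and j-th spatial phenomena, with $1 \le i \ne j \le m$. Computing the univariate covariance matrix $C_{ii}$ is well known [2]; however, computing the cross-covariance matrix $C_{ij}$ is not straightforward because of the notoriously difficult requirement that $\Sigma_m$ be positive semi-definite. To this end, Gneiting et al. [7] proposed a colocated correlation coefficient $\rho_{ij}$ to account for the cross-correlation between spatial process components. That is, the cross-covariance can be calculated as

$$ C_{ij}(h) = \rho_{ij}\,\sigma_i\,\sigma_j\,\mathrm{COV}(h \mid \theta_{ij}), $$

where $\mathrm{COV}(h \mid \theta)$ is a unit-variance covariance function, $\theta$ is a vector of its hyperparameters, $\sigma_i^2$ is the marginal variance of the i-th field and $h \in \mathbb{R}^d$ is the separation vector between any two locations in s; evaluating $C_{ij}(h)$ at every pair of sensor locations gives the entries of $C_{ij}$. The authors of [7] proposed using the Matérn covariance function, $\mathrm{COV}(h \mid \nu, \kappa) = \frac{2^{1-\nu}}{\Gamma(\nu)}(\kappa\|h\|)^{\nu} K_{\nu}(\kappa\|h\|)$ with $K_\nu$ the modified Bessel function of the second kind, and derived valid values of $\rho_{12}$ for two spatial processes so that the joint covariance matrix is guaranteed to be positive semi-definite, as stated in the following theorem.

Theorem 1 [7]: If $\nu_{12} = \frac{1}{2}(\nu_1 + \nu_2)$ and $\kappa_{12} \ge \max(\kappa_1, \kappa_2)$, the colocated correlation coefficient $\rho_{12}$ yields a valid bivariate model if and only if

$$ |\rho_{12}| \le \frac{\kappa_1^{\nu_1}\kappa_2^{\nu_2}}{\kappa_{12}^{2\nu_{12}}} \left(\frac{\Gamma(\nu_1 + \frac{d}{2})\,\Gamma(\nu_2 + \frac{d}{2})}{\Gamma(\nu_1)\,\Gamma(\nu_2)}\right)^{1/2} \frac{\Gamma(\nu_{12})}{\Gamma(\nu_{12} + \frac{d}{2})}, $$

where $\nu_i$ and $\kappa_i$ are the Matérn smoothness and spatial scale parameters of the i-th component process, respectively, and $\Gamma(\cdot)$ denotes the gamma function.

Similarly to the univariate modelling in Section II-A, multiple spatial phenomena at unmeasured locations z can be predicted with (2) and (3) after some substitutions: y and x(s) are replaced by Y and X(s), respectively, $\Sigma$ is replaced by $\Sigma_m$, and $\tau^2 I$ is replaced by $\Psi$.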
Both the covariance matrix $\Sigma_{zz} \in \mathbb{R}^{mp \times mp}$ and the cross-covariance matrix $\Sigma_{zs} \in \mathbb{R}^{mp \times mn}$ can be computed in the same way as $\Sigma_m$. Note that these covariance and cross-covariance matrices now account for all m spatial fields jointly.
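The bivariate (m = 2) construction can be sketched numerically as follows. This is an illustrative sketch under stated assumptions, not the model fitted in the experiments: `rho12_bound` evaluates the Theorem 1 bound (Matérn parameterization with κ acting as an inverse range, d = 2 by default), both components use the Matérn special case ν = 1/2 (the exponential covariance, where the bound reduces to √(κ₁κ₂)/κ₁₂), and the noise matrix is assumed to be Ψ = diag(τ₁² I, τ₂² I). All function names and parameter values are hypothetical.

```python
import math

import numpy as np


def rho12_bound(nu1, nu2, kappa1, kappa2, kappa12, d=2):
    """Largest admissible |rho_12| under Theorem 1 (nu_12 = (nu_1 + nu_2) / 2)."""
    if kappa12 < max(kappa1, kappa2):
        raise ValueError("Theorem 1 requires kappa12 >= max(kappa1, kappa2)")
    nu12 = 0.5 * (nu1 + nu2)
    scale_term = kappa1**nu1 * kappa2**nu2 / kappa12 ** (2.0 * nu12)
    gamma_term = math.sqrt(
        math.gamma(nu1 + d / 2) * math.gamma(nu2 + d / 2)
        / (math.gamma(nu1) * math.gamma(nu2))
    ) * math.gamma(nu12) / math.gamma(nu12 + d / 2)
    return scale_term * gamma_term


def exp_cov(a, b, kappa):
    """Matérn covariance with nu = 1/2, i.e. exp(-kappa * ||a_k - b_l||)."""
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-kappa * h)


def bivariate_cov(a, b, sigma, kappa, kappa12, rho12):
    """Block matrix [[C11, C12], [C21, C22]] between location sets a and b (m = 2)."""
    c11 = sigma[0] ** 2 * exp_cov(a, b, kappa[0])
    c22 = sigma[1] ** 2 * exp_cov(a, b, kappa[1])
    # The cross-covariance depends only on ||h||, so C21(a, b) equals C12(a, b).
    c12 = rho12 * sigma[0] * sigma[1] * exp_cov(a, b, kappa12)
    return np.block([[c11, c12], [c12, c22]])


def joint_posterior_cov(s, z, sigma, kappa, kappa12, rho12, tau2):
    """Sigma_m and the joint posterior covariance (3) with Sigma -> Sigma_m, tau^2 I -> Psi."""
    Sigma_m = bivariate_cov(s, s, sigma, kappa, kappa12, rho12)    # 2n x 2n
    Sigma_zs = bivariate_cov(z, s, sigma, kappa, kappa12, rho12)   # 2p x 2n
    Sigma_zz = bivariate_cov(z, z, sigma, kappa, kappa12, rho12)   # 2p x 2p
    # Assumed noise model: Psi = diag(tau_1^2 I_n, tau_2^2 I_n).
    Psi = np.diag(np.repeat(tau2, len(s)))
    post = Sigma_zz - Sigma_zs @ np.linalg.solve(Sigma_m + Psi, Sigma_zs.T)
    return Sigma_m, post


rng = np.random.default_rng(1)
s = rng.uniform(0.0, 10.0, size=(20, 2))   # 20 co-located sensor devices
z = rng.uniform(0.0, 10.0, size=(5, 2))    # 5 prediction locations
kappa, kappa12 = (0.5, 0.8), 0.8
rho_max = rho12_bound(0.5, 0.5, kappa[0], kappa[1], kappa12)
Sigma_m, post = joint_posterior_cov(s, z, (1.0, 2.0), kappa, kappa12, 0.9 * rho_max, (0.01, 0.04))
```

Choosing ρ₁₂ inside the admissible range keeps Σ_m positive semi-definite, which can be confirmed with `np.linalg.eigvalsh(Sigma_m)`; the rows of Σ_zs and Σ_zz are ordered field by field, matching the stacking of Y.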

C. Comparison of Multivariate and Univariate Modelling
To demonstrate how well the univariate and multivariate models can represent and then predict spatial processes, we conducted experiments with temperature and humidity sensors monitoring the indoor climate. Twenty temperature and 20 humidity sensors were used in the experiments; the experimental set-up and data collection are discussed in detail in Section IV. In this section, we present the difference between the two predictions of each spatial phenomenon, temperature and humidity, over the whole experimental environment, obtained by the univariate and multivariate models. Specifically, given the 20 temperature measurements and the 20 humidity measurements, we built two separate univariate models as presented in Section II-A. Each model was then used to predict its corresponding spatial process at 2500 unmeasured locations on a uniform grid covering the space. The results are visualized in Figures 1b for the temperature and 1e for the humidity. We also used all 40 temperature and humidity measurements to build a joint multivariate model as presented in Section II-B, which was then used to estimate the temperature and humidity concurrently at the same 2500 unobserved locations. These predictions are depicted in Figures 1a and 1d, respectively. Qualitatively, Figures 1a and 1b (likewise 1d and 1e) are highly comparable. To quantify the difference within each pair, we computed the difference between the predictions at each of the 2500 locations and summarized these differences in the histograms shown in Figures 1c and 1f for the temperature and humidity, respectively. Interestingly, there is no difference between the predictions obtained by the multivariate and univariate models in the temperature case. In the humidity case, differences do occur, but they are very small, less than 1% in magnitude, and spread fairly uniformly from -0.8% to 0.7%.
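The comparison above can be sketched as follows, assuming the 2500 prediction locations form a 50 × 50 grid (2500 = 50²) over a rectangular room. The room dimensions, bin count and function names are hypothetical; the two prediction maps would come from the univariate and multivariate models.

```python
import numpy as np


def prediction_grid(width, length, n_side=50):
    """Uniform n_side x n_side grid of prediction locations over a width x length area."""
    gx, gy = np.meshgrid(np.linspace(0.0, width, n_side), np.linspace(0.0, length, n_side))
    return np.column_stack([gx.ravel(), gy.ravel()])


def difference_histogram(pred_multi, pred_uni, bins=20):
    """Pointwise differences between two prediction maps and their histogram counts."""
    diff = np.asarray(pred_multi, dtype=float) - np.asarray(pred_uni, dtype=float)
    counts, edges = np.histogram(diff, bins=bins)
    return diff, counts, edges


z = prediction_grid(8.0, 6.0)   # hypothetical 8 m x 6 m room
pred_uni = np.full(len(z), 24.0)
pred_multi = np.full(len(z), 24.0)
diff, counts, edges = difference_histogram(pred_multi, pred_uni)
```

Identical maps, as in the temperature case, put every difference into a single bin at zero.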

III. SPATIAL SENSOR SELECTION
Spatial sensor selection is a fundamental problem in various applications [1]. The crux of the sensor selection problem is to select the most informative subset of sensor nodes out of all available ones, such that the subset can be used to predict spatial fields efficiently at any unmeasured positions of interest. In other words, the sensor selection problem requires an incorporated spatial model, such as the univariate or multivariate models introduced in Section II. Note that in multivariate scenarios, different types of sensors are assumed to be co-located. For instance, to monitor the temperature and humidity spatial processes, a temperature sensor and a humidity sensor are embedded in a single device deployed at one location.

A. Problem and Solution
For the spatial phenomena to be predicted effectively from the observations of the selected sensors, the uncertainty of the prediction results must be minimized; that is, the predicted variances at the unmeasured locations of interest, given the measurements gathered by the selected sensors, must be minimized. If the univariate model is employed, the predicted covariance matrix can be calculated by (3); for the multivariate model, a similar calculation applies, as discussed in Section II-B. The predicted variances at the prediction locations z then lie along the diagonal of this covariance matrix. Therefore, if P denotes the set of all possible sensors and C denotes a selected subset of k sensors, the spatial sensor selection problem can be formulated as the optimization problem

$$ C^{*} = \arg\min_{C \subseteq P,\ |C| = k} \operatorname{tr}\left(\Sigma_{z|s_C}\right), \tag{6} $$

where $\operatorname{tr}(\Sigma_{z|s_C})$ is the trace of the covariance matrix $\Sigma_{z|s_C}$ and $s_C$ denotes the locations of the selected sensors in C. It can be shown that (6) is NP-hard [2]. Although solving (6) exactly is intractable for large numbers of candidate sensors, it can be addressed efficiently and near-optimally by an approximation algorithm such as a greedy algorithm. Interested readers are referred to [2] for more detail about the algorithm.
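A plain greedy heuristic for (6) can be sketched as below. It is a generic sketch, not necessarily the exact variant in [2]: at each step it adds the candidate whose measurements most reduce tr(Σ_{z|s_C}) computed with (3). The `groups` argument is a hypothetical device for co-location: in the univariate case each candidate owns one row of the covariance matrix, while in the multivariate case one device owns both its temperature and humidity rows (e.g. rows g and g + n). Each evaluation solves a small linear system, so the cost grows with k and the number of candidates.

```python
import numpy as np


def posterior_trace(Sigma_s, Sigma_zs, Sigma_zz, noise, rows):
    """tr(Sigma_{z|s_C}) from (3), using only the covariance rows listed in `rows`."""
    if not rows:
        return float(np.trace(Sigma_zz))
    K = Sigma_s[np.ix_(rows, rows)] + noise[np.ix_(rows, rows)]
    A = Sigma_zs[:, rows]
    return float(np.trace(Sigma_zz) - np.trace(A @ np.linalg.solve(K, A.T)))


def greedy_select(Sigma_s, Sigma_zs, Sigma_zz, noise, groups, k):
    """Greedily add the candidate whose rows most reduce the trace objective (6).

    groups[g] lists the rows of Sigma_s belonging to candidate g: one row per
    sensor in the univariate case, or the co-located rows of one device in the
    multivariate case.
    """
    chosen, rows, objective = [], [], []
    for _ in range(k):
        best, best_val = None, np.inf
        for g, g_rows in enumerate(groups):
            if g in chosen:
                continue
            val = posterior_trace(Sigma_s, Sigma_zs, Sigma_zz, noise, rows + g_rows)
            if val < best_val:
                best, best_val = g, val
        chosen.append(best)
        rows = rows + groups[best]
        objective.append(best_val)
    return chosen, objective


def exp_cov_1d(a, b):
    """Hypothetical 1-D exponential covariance used only for this example."""
    return np.exp(-np.abs(a[:, None] - b[None, :]))


cand = np.linspace(0.0, 9.0, 10)          # 10 candidate sensor positions
targets = np.array([3.2, 7.9])            # locations of interest z
chosen, objective = greedy_select(
    exp_cov_1d(cand, cand), exp_cov_1d(targets, cand), exp_cov_1d(targets, targets),
    0.01 * np.eye(10), [[j] for j in range(10)], k=3,
)
```

Because adding observations never increases posterior variances, the recorded objective values are non-increasing in k, mirroring the curves in Fig. 2.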

B. Selection Comparison
To show how spatial sensor selection works and to compare the influence of multivariate and univariate spatial modelling on the selection results, we continue the experiments briefly mentioned in Section II-C. Suppose that we wish to select the 1 to 10 most informative sensors out of the 20 candidates in both the temperature and humidity cases. Given the measurements, we first built two univariate models and one multivariate model. We then ran the greedy algorithm [2] to solve the optimization problem (6) three times, given the univariate temperature model, the univariate humidity model and the multivariate model, respectively. The values of the objective function in (6) for the selected subsets of sensors are shown in Fig. 2. More specifically, the red solid curve shows the objective function values when the algorithm selected the 1 to 10 most informative sensor nodes given the multivariate model. The blue curve is the sum of all the predicted variances in both the temperature and humidity fields, obtained from two separate algorithm runs for the two corresponding univariate models. The difference between the objective values under multivariate and univariate modelling is negligible and increases only slightly as the number of selected sensors grows.
Moreover, the locations of the 10 selected sensor nodes are illustrated in Fig. 3, where the blue circles are the locations jointly selected by the multivariate sensor selection method for both the temperature and humidity sensors. For univariate sensor selection, the 10 most informative temperature sensors are located at the red stars in Fig. 3a, while those for measuring humidity are at the red triangles in Fig. 3b. In both cases, the selections obtained by the multivariate and univariate algorithms coincide in 9 out of 10 sensor nodes.

IV. EXPERIMENTAL RESULTS
This section demonstrates in more detail how effectively a subset of sensor nodes selected by the multivariate and univariate sensor selection techniques can monitor spatial fields. We implemented the methods in experiments using two networks, one of temperature sensors and one of humidity sensors, to observe the indoor climate at the Nanyang Technological University campus, Singapore.

A. Experimental Data Collection
The experiments were conducted at the room S2. of April, 2016. In the experiments, 20 Libelium temperature and 20 Libelium humidity sensors were used to observe the indoor temperature and relative humidity. Owing to the Libelium sensor board design, each temperature sensor is integrated with a humidity sensor in the same device, so the two sensors of each pair were co-located when deployed. We organized the 40 sensors into two separate networks monitoring two separate spatial random fields. In each network, the sensors communicated wirelessly and were deployed at 20 predefined locations in the room. The measurements collected by each sensor were transmitted to a base station through a wireless router [11]. Since the collected data can be used to evaluate human comfort in a building, we deliberately deployed all the sensors on the same plane at seated height.

B. Results and Discussion
We now investigate how well the 10 most informative sensor nodes selected by the methods presented in Section III can monitor the spatial phenomena. Note that one may select 6, 12 or any number of sensors that suits a particular application. For reasons of space, we present only the results for the 10 selected nodes.
Recall that two subsets of 10 selected temperature (likewise, humidity) sensor nodes were obtained by the multivariate and univariate sensor selection approaches, as discussed in Section III. In the multivariate scenario, since the 10 most informative sensor nodes were selected jointly, we used all 10 corresponding temperature and 10 corresponding humidity measurements to build a multivariate model, and then used the model to predict the two spatial processes simultaneously at the 2500 unmeasured locations introduced in Section II-C. The predicted fields and variances are depicted in Figures 4a and 4b for the temperature and Figures 4c and 4d for the humidity, respectively. In the univariate scenario, by contrast, we developed a univariate model from the 10 corresponding measurements for each subset of selected sensors; that is, we have two univariate models, one for the temperature and one for the humidity. Using the two univariate models separately, we also predicted the spatial phenomena independently at the 2500 unobserved positions. The predicted results in the univariate scenario are shown in Figures 4e, 4f, 4g and 4h. In each column of Fig. 4, we can now compare the results obtained by the multivariate and univariate sensor selection approaches. Visually, the difference in each column is small. Furthermore, although only 10 sensors were used to take the measurements, the prediction uncertainty in both scenarios is quite low, which is highly practical in building monitoring applications [11]. More specifically, we quantified the errors between the predictions obtained by the selected subsets of sensor nodes and those obtained by all 20 sensors. That is, we computed the errors between the predictions in Fig. 4a and Fig. 1a and between Fig. 4c and Fig. 1d for the multivariate scenario, and likewise between Fig. 4e and Fig. 1b and between Fig. 4g and Fig. 1e for the univariate scenario. These errors are summarized in the box plots in Fig. 5. The errors in both scenarios are relatively small, which demonstrates the effectiveness of the spatial sensor selection methods. We further summarize the quantiles of these prediction errors in Table I. The quantiles show that 90% of the temperature errors obtained with the multivariate and univariate sensor selection techniques lie within about 0.5 °C, while those in the humidity case range at most from -2.47% to 1.21%, with univariate sensor selection performing slightly better.
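The error summary can be sketched as below: the error at each grid location is the subset-based prediction minus the prediction from all 20 sensors, and its quantiles are taken over the grid. The particular quantile levels (5%, 25%, 50%, 75%, 95%, so that the outer pair brackets the central 90% of errors) are an assumption, as is the function name; Table I may use different levels.

```python
import numpy as np


def error_quantiles(pred_subset, pred_full, qs=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Quantiles of pointwise errors between subset-based and full-network predictions."""
    err = np.asarray(pred_subset, dtype=float) - np.asarray(pred_full, dtype=float)
    return dict(zip(qs, np.quantile(err, qs)))


# Synthetic example on 2500 grid locations: errors spread evenly over [-1, 1].
pred_full = np.zeros(2500)
pred_subset = np.linspace(-1.0, 1.0, 2500)
q = error_quantiles(pred_subset, pred_full)
central_90 = (q[0.05], q[0.95])
```

NumPy's default linear interpolation is used between order statistics; with real predictions the same call would be made once per field and per selection method.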

V. CONCLUSIONS
Two spatial sensor selection problems, using univariate and multivariate models for effectively monitoring spatial random processes, have been considered in this paper. Spatial modelling is required in spatial sensor selection because it gives the selected sensors the ability to predict spatial fields at unmeasured locations. GP and MGP were therefore used to model univariate and multivariate spatial phenomena, respectively, which also allows the most informative sensor nodes to be selected by minimizing prediction uncertainty. The efficient greedy algorithm [2] was used to solve the sensor selection optimization problems, and its implementation in real-life experiments monitoring indoor temperature and relative humidity validated the effectiveness of the selection methods. A comparison of the prediction results obtained by the univariate and multivariate sensor selection techniques showed an insignificant difference between them.