Multimodal Sensor Selection for Multiple Spatial Field Reconstruction

This paper addresses the multimodal sensor selection problem, in which selected colocated sensor nodes are used to effectively monitor and efficiently predict multiple spatial random fields. We first propose to exploit multivariate Gaussian processes (MGP) to model multiple spatial phenomena jointly. By using the Matérn cross-covariance function, the cross-covariance matrices in the MGP model are guaranteed to be positive semi-definite, which in turn enables efficient prediction of all the multivariate processes at unmeasured locations. The multimodal sensor selection problem is then formulated and solved by an approximate algorithm that selects the most informative sensor nodes so that the prediction uncertainties across all the fields are minimized. The proposed approach was validated in real-life experiments with promising results.


I. INTRODUCTION
In the data era, strongly supported by the booming Internet of Things, a fundamental but critical question is how to collect sensor measurements efficiently. Given advances in low-cost sensor technology, an excessive number of sensors may be deployed for a specific sensing task. If sensors are deployed in the vicinity of one another, they may record similar data samples; this is called over-sampling. Over-sampling causes redundancy in the sensor measurements, and the redundant measurements contribute no additional information. Moreover, in the long run, over-sampling is operationally expensive: unnecessary sensors increase data collection, analysis and maintenance effort as well as energy consumption. It is therefore desirable to select, out of all possible sensor nodes, the most informative ones that are sufficient for the sensing task. This is called sensor selection [1].
In this work, we focus on the sensor selection problem where sensors are employed to monitor spatial fields. More specifically, the selected sensors are required not only to observe spatial phenomena effectively but also to predict the fields efficiently at any unmeasured locations of interest [2]. The sensor selection problem for monitoring a single (univariate) spatial field, e.g. indoor temperature, has been well studied [2]. Nevertheless, selecting the optimal sensor nodes for efficiently monitoring multiple (multivariate) spatial fields, which we call multimodal sensor selection, has not yet been considered. This work discusses multimodal sensor selection for reconstructing multiple spatial phenomena in detail.
Several practical considerations underlie multimodal sensor selection for spatial fields. First, technological developments in micro-electro-mechanical systems enable different types of sensors, e.g. temperature, humidity, gas and air pollution sensors, to be embedded on a single small, low-cost integrated circuit board. That is, an individual sensor node can record multiple types of environmental data. Second, spatial parameters may be cross-correlated. For instance, a change in temperature in a room also causes a change in humidity in that room. In [3], Mastrantonio et al. exploited a multivariate spatio-temporal Bayesian hierarchical framework to examine extreme temperature and precipitation jointly. Third, addressing the spatial sensor selection problem requires modelling the spatial fields, since the selected sensor nodes are expected to predict the fields effectively at unobserved positions of interest. Once a spatial phenomenon is modelled and can be predicted, a subset of sensor nodes is selected if it minimizes the prediction uncertainty. In the literature, modelling a univariate spatial field has been widely studied with considerable success. For instance, in our previous work [2], we employed the widely used machine-learning-based Gaussian process (GP) to model a spatio-temporal process. Nonetheless, incorporating multivariate spatial phenomena into a single model is not straightforward, since each spatial process exhibits different statistical characteristics. More importantly, if a GP is employed to represent multiple spatial random fields, its covariance structure plays a critical role in quantifying the cross-correlation among the fields. In particular, the cross-covariance matrices in a multivariate Gaussian process (MGP) [4] are required to be positive semi-definite so that the statistical model is valid and can provide efficient prediction. Several approaches have been proposed to construct cross-covariance functions for multiple spatial random fields [5]; among them, Gneiting et al. [6] constructed Matérn cross-covariance functions and derived the conditions under which they yield valid cross-covariances between any two of the multivariate spatial phenomena.
In this paper, we first exploit an MGP with a Matérn cross-covariance function to model multiple spatial fields and employ the model to predict the phenomena at unmeasured locations. We then propose an efficient approach to the multimodal sensor selection problem. The proposed method was implemented in real-life experiments, and the obtained results demonstrate its effectiveness. The remainder of the paper is organized as follows. We introduce multimodal sensing in Section II before discussing the multimodal sensor selection problem in Section III. Section IV presents our experimental results and discussion, and conclusions are drawn in Section V.

II. MULTIMODAL SENSING
This section introduces multimodal sensing, which can be employed to monitor multiple spatial fields, such as temperature, humidity and air pollutants, simultaneously. In this work, it is assumed that all types of sensors are embedded on the same board when taking measurements. That is, when deployed in monitoring tasks, these types of sensors are colocated.

A. Multivariate Modelling
Let us recall the sensing model for a single type of sensor monitoring a specific spatial phenomenon [7], e.g. ocean temperature. If there are $n$ sensors positioned at locations $s = (s_1^T, \ldots, s_n^T)^T \in \mathbb{R}^{n \times d}$, where $d$ is the dimension of the deployment space, the collective measurements gathered by all the sensors can be denoted by $y = (y_1, \ldots, y_n)^T$, which can be modelled by

$$y = x(s)\beta + w(s) + \varepsilon(s), \qquad (1)$$

where $x(s)$ is a matrix of spatially referenced covariates and $\beta$ is a vector of trend parameters. The term $x(s)\beta$ represents the trend (constant, first order or second order) of the spatial field at locations $s$; hence, the dimensions of $x$ and $\beta$ depend on the selected trend. For instance, with $d = 2$, a first-order trend leads to an $n \times 3$ matrix $x$ and a $3 \times 1$ vector $\beta$. $w(s)$ is an $n \times 1$ vector of latent random variables, conveniently modelled by a zero-mean Gaussian process (GP), while $\varepsilon(s)$ is an $n \times 1$ vector of independent and identically distributed noises following a normal distribution with zero mean and unknown variance $\tau^2$.

Now let us consider $m$ spatial phenomena monitored by $m$ different types of colocated sensors, with $n$ sensors of each type. All the sensors are positioned at locations $s$, and all the collected measurements are represented by

$$Y = X\beta + W(s) + \varepsilon, \qquad (2)$$

where $Y = (y_1^T, \ldots, y_m^T)^T$ and $\beta = (\beta_1^T, \ldots, \beta_m^T)^T$. The covariates $X$ are specified by the block-diagonal matrix

$$X = \mathrm{diag}\big(x_1(s), \ldots, x_m(s)\big).$$

If all the different types of sensors are colocated, then $x_1(s) = \cdots = x_m(s) = x(s)$, so that

$$X = I_m \otimes x(s).$$

The measurement noises are now modelled by a multivariate normal distribution $\varepsilon \sim MVN(0, \Psi)$, where the dispersion matrix $\Psi$ is arranged as

$$\Psi = \mathrm{diag}\big(\tau_1^2 I_n, \ldots, \tau_m^2 I_n\big).$$

The key component of model (2) is the $mn \times 1$ vector process $W(s)$, which is also called an MGP [4], $W(s) \sim MGP(0, \Sigma)$. The multivariate cross-covariance matrix $\Sigma$ is given by

$$\Sigma = \begin{pmatrix} C_{11} & \cdots & C_{1m} \\ \vdots & \ddots & \vdots \\ C_{m1} & \cdots & C_{mm} \end{pmatrix}, \qquad (4)$$

where $C_{ii} = E\big(w_i(s + h)\, w_i(s)\big)$ is the univariate covariance matrix computed from the $i$th spatial field alone, and $C_{ij} = E\big(w_i(s + h)\, w_j(s)\big)$ is the cross-covariance matrix between the $i$th and $j$th spatial phenomena, $1 \le i \ne j \le m$. Note that $h \in \mathbb{R}^d$ is the separation vector between any two locations in $s$.
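The block structure of model (2) can be assembled directly. The following is a minimal NumPy sketch, assuming colocated sensors, a first-order trend in $d = 2$ dimensions and the diagonal noise matrix $\Psi$ above; the function names are hypothetical and the locations are random placeholders, not the experimental layout.

```python
import numpy as np


def first_order_covariates(s: np.ndarray) -> np.ndarray:
    """x(s) for a first-order trend: columns [1, s_1, ..., s_d], shape (n, d + 1)."""
    n = s.shape[0]
    return np.hstack([np.ones((n, 1)), s])


def multivariate_design(s: np.ndarray, m: int) -> np.ndarray:
    """X = I_m (kron) x(s): block-diagonal covariates for m colocated fields."""
    return np.kron(np.eye(m), first_order_covariates(s))


def noise_covariance(tau2, n: int) -> np.ndarray:
    """Psi = diag(tau_1^2 I_n, ..., tau_m^2 I_n): independent noise per field."""
    return np.kron(np.diag(tau2), np.eye(n))


# Hypothetical example: n = 4 nodes in a 2-D room, m = 2 fields.
rng = np.random.default_rng(0)
s = rng.uniform(0.0, 10.0, size=(4, 2))
X = multivariate_design(s, m=2)
Psi = noise_covariance([0.1, 0.5], n=4)
print(X.shape, Psi.shape)  # (8, 6) (8, 8)
```

Using the Kronecker product keeps the field-major ordering $Y = (y_1^T, \ldots, y_m^T)^T$ consistent across $X$, $\Psi$ and $\Sigma$.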
Computing the univariate covariance matrix $C_{ii}$ is well known [2]; however, computing the cross-covariance matrix $C_{ij}$ is not straightforward because of the notorious requirement that $\Sigma$ be positive semi-definite. To this end, Gneiting et al. [6] proposed a colocated correlation coefficient $\rho_{ij}$ to account for the cross-correlation between spatial process components. That is, the cross-covariance can be calculated as

$$C_{ij}(h) = \rho_{ij}\, \mathrm{COV}(h \mid \theta_{ij}), \qquad (5)$$

where $\mathrm{COV}(h \mid \theta)$ is a covariance function and $\theta$ is the vector of its hyperparameters. More specifically, they discussed employing the Matérn covariance function

$$\mathrm{COV}(h \mid \theta) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \big(\kappa \lVert h \rVert\big)^{\nu} K_\nu\big(\kappa \lVert h \rVert\big), \qquad (6)$$

where $\sigma^2$ is the marginal variance, while $\nu$ and $\kappa$ are the Matérn smoothness and spatial scale parameters, respectively. $K_\nu$ is the modified Bessel function of the second kind of order $\nu > 0$, and $\Gamma(\cdot)$ denotes the gamma function. Here $\theta = (\sigma^2, \nu, \kappa)$. In (5), $\rho_{ii} = 1$ and $\theta_{ii} = (\sigma_i^2, \nu_i, \kappa_i)$, whereas for $i \ne j$ the marginal variance is replaced by $\sigma_i\sigma_j$, i.e. $\theta_{ij} = (\sigma_i\sigma_j, \nu_{ij}, \kappa_{ij})$. In this work, for simplicity, we consider a sensing model with two types of sensors observing bivariate spatial phenomena; the model can, however, be extended to more than two spatial fields [6]. When two spatial component processes are modelled with the Matérn covariance function, the colocated correlation coefficient $\rho_{12}$ must be chosen according to the validity conditions established in [6] in order to guarantee that the cross-covariance matrix is positive semi-definite.

Fig. 1: Cross-covariance between temperature and humidity in the whole environment. White circles are the sensor locations.
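The admissible range of $\rho_{12}$ depends on the chosen Matérn parameterization. As a hedged illustration, the sketch below assumes the parsimonious bivariate Matérn model of [6], in which both fields share a common scale $\kappa$ and the cross smoothness is $\nu_{12} = (\nu_1 + \nu_2)/2$; for that special case, [6] shows that the model is valid if and only if

$$|\rho_{12}| \le \frac{\Gamma(\nu_1 + d/2)^{1/2}\,\Gamma(\nu_2 + d/2)^{1/2}}{\Gamma(\nu_1)^{1/2}\,\Gamma(\nu_2)^{1/2}} \cdot \frac{\Gamma\big(\tfrac{1}{2}(\nu_1 + \nu_2)\big)}{\Gamma\big(\tfrac{1}{2}(\nu_1 + \nu_2) + d/2\big)}.$$

The function names are hypothetical.

```python
import math


def parsimonious_rho_bound(nu1: float, nu2: float, d: int = 2) -> float:
    """Upper bound on |rho_12| for the parsimonious bivariate Matern model (assumed)."""
    nu12 = 0.5 * (nu1 + nu2)
    # Work on the log scale with lgamma to avoid overflow for large smoothness values.
    log_bound = (
        0.5 * (math.lgamma(nu1 + d / 2) + math.lgamma(nu2 + d / 2))
        - 0.5 * (math.lgamma(nu1) + math.lgamma(nu2))
        + math.lgamma(nu12)
        - math.lgamma(nu12 + d / 2)
    )
    return math.exp(log_bound)


def is_valid_rho(rho12: float, nu1: float, nu2: float, d: int = 2) -> bool:
    """True if rho_12 keeps the parsimonious bivariate Matern model positive semi-definite."""
    return abs(rho12) <= parsimonious_rho_bound(nu1, nu2, d)


print(round(parsimonious_rho_bound(0.5, 1.5), 4))  # 0.866  (= sqrt(3)/2)
print(is_valid_rho(0.9, 0.5, 1.5))                 # False
```

When $\nu_1 = \nu_2$ the bound equals 1, so any $|\rho_{12}| \le 1$ is admissible; unequal smoothness parameters shrink the admissible range.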
To demonstrate the cross-correlation between two spatial phenomena, we employ the temperature and humidity fields collected by 20 temperature and 20 humidity colocated sensors monitoring the indoor climate. Details of the data collection are presented in Section IV. Using (5) and (6), the cross-covariance between the two spatial processes over the whole environment can be computed, as depicted in Fig. 1.
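Equations (5) and (6) can be evaluated numerically with SciPy's `scipy.special.kv` (modified Bessel function of the second kind) and `scipy.special.gamma`. The sketch below builds the bivariate $\Sigma$ of (4), assuming the parsimonious bivariate Matérn form of [6] (common $\kappa$, $\nu_{12} = (\nu_1 + \nu_2)/2$, cross variance $\sigma_1\sigma_2$); the parameter values and locations are hypothetical placeholders, not the values fitted in the experiments.

```python
import numpy as np
from scipy.special import gamma, kv


def matern(h, sigma2, nu, kappa):
    """Matern covariance (6) evaluated elementwise on an array of distances h."""
    h = np.asarray(h, dtype=float)
    out = np.full(h.shape, float(sigma2))  # limit of (6) as h -> 0 is sigma2
    pos = h > 0
    u = kappa * h[pos]
    out[pos] = sigma2 * 2.0 ** (1.0 - nu) / gamma(nu) * u**nu * kv(nu, u)
    return out


def bivariate_matern_sigma(s, sigma, nu, kappa, rho12):
    """Sigma = [[C11, C12], [C21, C22]] from (4)-(6), assuming the parsimonious model."""
    h = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)  # pairwise distances
    nu12 = 0.5 * (nu[0] + nu[1])
    C11 = matern(h, sigma[0] ** 2, nu[0], kappa)
    C22 = matern(h, sigma[1] ** 2, nu[1], kappa)
    C12 = rho12 * matern(h, sigma[0] * sigma[1], nu12, kappa)
    return np.block([[C11, C12], [C12.T, C22]])


# Hypothetical example: 20 colocated nodes in a 19.80 m x 14.86 m room.
rng = np.random.default_rng(0)
s = rng.uniform([0.0, 0.0], [19.80, 14.86], size=(20, 2))
Sigma = bivariate_matern_sigma(s, sigma=(1.0, 2.0), nu=(0.5, 1.5), kappa=0.5, rho12=0.8)
print(Sigma.shape)  # (40, 40)
print(np.linalg.eigvalsh(Sigma).min())  # should be non-negative up to round-off
```

With $\nu = 1/2$, (6) reduces to the exponential covariance $\sigma^2 e^{-\kappa \lVert h \rVert}$, which is a convenient sanity check for the implementation.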

B. Multiple Spatial Field Prediction
Because of the limited number of deployed sensors, a spatial field cannot be measured directly everywhere in the environment. In this section, we discuss how to predict multiple spatial phenomena from the multivariate model given the gathered measurements.
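Let us define $z = (z_1^T, \ldots, z_p^T)^T \in \mathbb{R}^{p \times d}$ as the unmeasured locations of interest, and let $x(z)$ be the corresponding matrix of covariates, so that $X(z) = I_m \otimes x(z)$. The multiple spatial fields at $z$ can be predicted by the posterior multivariate Gaussian distribution

$$Y(z) \mid Y \sim MVN\big(\mu_{z|s}, \Sigma_{z|s}\big),$$

where

$$\mu_{z|s} = X(z)\beta + \Sigma_{zs}(\Sigma + \Psi)^{-1}\big(Y - X\beta\big), \qquad (7)$$

$$\Sigma_{z|s} = \Sigma_{zz} - \Sigma_{zs}(\Sigma + \Psi)^{-1}\Sigma_{zs}^T. \qquad (8)$$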
Both $\Sigma_{zz} \in \mathbb{R}^{mp \times mp}$ and $\Sigma_{zs} \in \mathbb{R}^{mp \times mn}$ can be computed similarly to $\Sigma$ in (4), noting that $\Sigma_{zz}$ holds the prior covariances and cross-covariances among the spatial fields at $z$, while $\Sigma_{zs}$ holds the cross-covariances between the spatial phenomena at $z$ and those at $s$. The trend parameters $\beta$ and the hyperparameters $\theta$ used in (7) and (8) can be estimated by maximum likelihood [8].
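The posterior mean and covariance of the multivariate prediction are standard Gaussian conditioning. The NumPy sketch below computes them with a Cholesky factorization of $\Sigma + \Psi$ instead of an explicit inverse. For a self-contained toy example it uses, as a hypothetical stand-in for the Matérn model, a separable covariance $B \otimes R$ with an exponential correlation $R$ (the Matérn case $\nu = 1/2$) and an assumed 2 × 2 between-field matrix $B$; $\beta$ is fixed rather than estimated by maximum likelihood.

```python
import numpy as np


def mgp_posterior(Y, Xs, Xz, beta, Sigma, Psi, Sigma_zs, Sigma_zz):
    """Posterior mean and covariance of the stacked fields at the locations z."""
    L = np.linalg.cholesky(Sigma + Psi)  # Sigma + Psi = L L^T
    resid = Y - Xs @ beta
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))  # (Sigma + Psi)^{-1} (Y - X beta)
    V = np.linalg.solve(L, Sigma_zs.T)  # L^{-1} Sigma_zs^T
    mean = Xz @ beta + Sigma_zs @ alpha
    cov = Sigma_zz - V.T @ V  # Sigma_zz - Sigma_zs (Sigma + Psi)^{-1} Sigma_zs^T
    return mean, cov


def exp_corr(a, b, kappa=0.5):
    """Exponential correlation exp(-kappa * ||a_i - b_j||), i.e. Matern with nu = 1/2."""
    return np.exp(-kappa * np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1))


def trend(loc):
    """First-order covariates [1, x, y] for each location."""
    return np.hstack([np.ones((loc.shape[0], 1)), loc])


# Hypothetical toy data: m = 2 fields, n = 6 sensor nodes, p = 3 prediction sites.
rng = np.random.default_rng(1)
s = rng.uniform(0.0, 10.0, size=(6, 2))
z = rng.uniform(0.0, 10.0, size=(3, 2))
B = np.array([[1.0, 0.6], [0.6, 1.0]])  # assumed between-field covariance
Sigma = np.kron(B, exp_corr(s, s))
Sigma_zs = np.kron(B, exp_corr(z, s))
Sigma_zz = np.kron(B, exp_corr(z, z))
Psi = 0.01 * np.eye(12)
Xs, Xz = np.kron(np.eye(2), trend(s)), np.kron(np.eye(2), trend(z))
beta = np.zeros(6)
Y = rng.normal(size=12)

mean, cov = mgp_posterior(Y, Xs, Xz, beta, Sigma, Psi, Sigma_zs, Sigma_zz)
print(mean.shape, cov.shape)  # (6,) (6, 6)
```

The predicted variances of all fields at $z$ are `np.diag(cov)`; the first $p$ entries belong to the first field and the next $p$ to the second.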

III. MULTIMODAL SENSOR SELECTION
Sensor selection, i.e. selecting the most informative subset of sensor nodes out of all possible ones, is well studied, mainly for a single (univariate) spatial field [1]. In this section, we extend the sensor selection problem to multivariate spatial phenomena monitored by multiple types of sensors under the multimodal sensing paradigm.

A. Problem Statement
The key requirement in selecting the most informative subset of sensor nodes is to minimize the uncertainty in the prediction results of all spatial process components. In other words, the selected sensor subset (among all potential ones) must reduce the predicted variances at the unmeasured locations of interest as much as possible. Let us define $P$ as the set of all possible sensor nodes and $C$ as a selected subset of sensor nodes. As given by (8), the predicted variances of all the spatial fields at the unmeasured locations of interest $z$ lie along the diagonal of the posterior cross-covariance matrix $\Sigma_{z|s_C}$, where $s_C$ denotes the locations of the selected sensor nodes in $C$. Therefore, the multimodal sensor selection problem can be formulated as

$$C^* = \arg\min_{C \subset P,\ |C| = k} \mathrm{tr}\big(\Sigma_{z|s_C}\big), \qquad (9)$$

where $\mathrm{tr}(\Sigma_{z|s_C})$ is the trace of $\Sigma_{z|s_C}$ and $k$ is the predefined number of sensor nodes to be selected. Unfortunately, the multimodal sensor selection problem (9) is NP-hard [2]; that is, no polynomial-time algorithm is known for finding its optimal solution.
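Because the posterior covariance depends only on the sensor locations and hyperparameters, not on the measured values, the trace objective can be evaluated for any candidate subset. A minimal sketch follows, assuming the stacked ordering $Y = (y_1^T, \ldots, y_m^T)^T$, so that selecting node $j$ keeps row $j$ of every field block; `node_rows` and `trace_objective` are hypothetical names, and the toy covariance is a hypothetical separable stand-in $B \otimes R$ with exponential $R$, not the fitted Matérn model.

```python
import math

import numpy as np


def node_rows(nodes, n, m):
    """Indices in the stacked vector (y_1, ..., y_m) that belong to the given nodes."""
    nodes = np.asarray(nodes, dtype=int)
    return np.concatenate([i * n + nodes for i in range(m)])


def trace_objective(C, n, m, Sigma, Psi, Sigma_zs, Sigma_zz):
    """tr(Sigma_{z|s_C}) when only the colocated sensors of nodes C are used."""
    idx = node_rows(C, n, m)
    K = Sigma[np.ix_(idx, idx)] + Psi[np.ix_(idx, idx)]
    S = Sigma_zs[:, idx]
    return float(np.trace(Sigma_zz) - np.trace(S @ np.linalg.solve(K, S.T)))


def exp_corr(a, b, kappa=0.5):
    """Exponential correlation between two sets of 2-D locations."""
    return np.exp(-kappa * np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1))


# Hypothetical toy setup: n = 20 candidate nodes, m = 2 fields, p = 25 prediction sites.
rng = np.random.default_rng(2)
n, m = 20, 2
s = rng.uniform(0.0, 10.0, size=(n, 2))
z = rng.uniform(0.0, 10.0, size=(25, 2))
B = np.array([[1.0, 0.6], [0.6, 1.0]])
Sigma = np.kron(B, exp_corr(s, s))
Sigma_zs = np.kron(B, exp_corr(z, s))
Sigma_zz = np.kron(B, exp_corr(z, z))
Psi = 0.01 * np.eye(m * n)

print(trace_objective([0, 5, 9], n, m, Sigma, Psi, Sigma_zs, Sigma_zz))
# Exhaustive search over all 10-node subsets of 20 nodes would need this many evaluations:
print(math.comb(20, 10))  # 184756
```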

B. Approximate Algorithm
An efficient way to solve an NP-hard optimization problem near-optimally is to employ a greedy heuristic algorithm. More specifically, in this work we start with $C = \emptyset$. For each candidate sensor node in $P$ that is not yet in $C$, we temporarily add the node to $C$, compute $\mathrm{tr}(\Sigma_{z|s_C})$, and then remove the node from $C$. Note that since each node consists of multiple colocated types of sensors, $\Sigma_{z|s_C}$ is a cross-covariance matrix. After all candidate nodes have been evaluated, the node yielding the minimum trace value is added to $C$ permanently. We call this operation OP.
In the second iteration, $C$ already contains one sensor node. To find the next most informative sensor node and add it to $C$, we repeat OP. OP is repeated until the cardinality of $C$ reaches a predefined number, namely the desired number of selected sensor nodes.
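The greedy procedure (repeating OP until $k$ nodes are selected) can be sketched as follows. This is a minimal NumPy illustration, assuming the stacked ordering $Y = (y_1^T, \ldots, y_m^T)^T$ and a hypothetical separable stand-in covariance; the function names are hypothetical, and the trace is recomputed from scratch for every candidate, which is simple but not the fastest option.

```python
import numpy as np


def node_rows(nodes, n, m):
    """Stacked-vector indices of the colocated sensors at the given nodes."""
    nodes = np.asarray(nodes, dtype=int)
    return np.concatenate([i * n + nodes for i in range(m)])


def trace_objective(C, n, m, Sigma, Psi, Sigma_zs, Sigma_zz):
    """tr(Sigma_{z|s_C}) using only the sensors of nodes C."""
    idx = node_rows(C, n, m)
    K = Sigma[np.ix_(idx, idx)] + Psi[np.ix_(idx, idx)]
    S = Sigma_zs[:, idx]
    return float(np.trace(Sigma_zz) - np.trace(S @ np.linalg.solve(K, S.T)))


def greedy_select(k, n, m, Sigma, Psi, Sigma_zs, Sigma_zz):
    """Start from C = {} and apply OP k times; return C and the trace after each step."""
    C, history = [], []
    for _ in range(k):
        best_node, best_val = None, np.inf
        for j in range(n):  # OP: try every node in P that is not yet in C
            if j in C:
                continue
            val = trace_objective(C + [j], n, m, Sigma, Psi, Sigma_zs, Sigma_zz)
            if val < best_val:
                best_node, best_val = j, val
        C.append(best_node)  # keep the node with the minimum trace permanently
        history.append(best_val)
    return C, history


def exp_corr(a, b, kappa=0.5):
    """Exponential correlation between two sets of 2-D locations."""
    return np.exp(-kappa * np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1))


# Hypothetical toy setup: select k = 10 of n = 20 nodes, m = 2 fields.
rng = np.random.default_rng(3)
n, m = 20, 2
s = rng.uniform(0.0, 10.0, size=(n, 2))
z = rng.uniform(0.0, 10.0, size=(25, 2))
B = np.array([[1.0, 0.6], [0.6, 1.0]])
Sigma = np.kron(B, exp_corr(s, s))
Sigma_zs = np.kron(B, exp_corr(z, s))
Sigma_zz = np.kron(B, exp_corr(z, z))
Psi = 0.01 * np.eye(m * n)

C, history = greedy_select(10, n, m, Sigma, Psi, Sigma_zs, Sigma_zz)
print(C)
print(np.round(history, 3))
```

Each outer iteration evaluates the remaining candidates once, so selecting $k = 10$ of $n = 20$ nodes takes $20 + 19 + \cdots + 11 = 155$ trace evaluations.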
To demonstrate the efficacy of the approximate algorithm in addressing the optimization problem (9), we applied it in our experiments to select the 1 to 10 most informative sensor nodes out of 20 available ones for monitoring both indoor temperature and humidity. The near-optimal values (i.e. trace values) of the objective function are shown by the red solid curve in Fig. 2. For comparison, we also selected sensor nodes randomly from the candidate set. Because of the randomness, we ran the random selection 1000 times and summarized the results in the box plots in Fig. 2. As can be seen, the results obtained by the approximate algorithm are always better than those obtained by random selection, regardless of the number of selected sensor nodes.

IV. EXPERIMENTAL RESULTS AND DISCUSSION
To demonstrate how multimodal sensor selection can be applied to choose the best locations for multiple types of sensors that monitor multivariate spatial fields simultaneously, we implemented the proposed approach in experiments monitoring the indoor climate on the Nanyang Technological University campus, Singapore. Note that only two spatial fields, temperature and relative humidity, are discussed in this section.

A. Experiments and Data Collection
We conducted the experiments in room S2.1.B4.01, which measures 19.80 m in length and 14.86 m in width, on 25 April 2016. In these experiments, we employed 20 Libelium temperature sensors and 20 Libelium humidity sensors to measure the indoor temperature and relative humidity. The sensors formed two wirelessly connected networks deployed at 20 predefined locations in the room, as depicted by the white circles in Fig. 1. Owing to the design of the Libelium sensor board, the temperature and humidity sensors are embedded on the same integrated circuit board; that is, each pair of temperature and humidity sensors is colocated. After taking measurements, the sensor nodes transmitted their data to a central station via the network routers [9]. Since the collected data can be used to evaluate human comfort in a building, we deliberately deployed all the sensors on the same plane at sitting level.

B. Results and Discussion
To represent both the temperature and relative humidity spatial fields simultaneously, we employed the multivariate model presented in Section II. We chose a first-order function of the spatial locations to account for the trends of the spatial phenomena. Moreover, the Matérn covariance function was used to compute the covariance and cross-covariance matrices of model (2). Given the 20 measurements collected for each process component, we used the multiple spatial field prediction formulated in Section II-B to predict the two spatial processes over the whole environment, as shown in Fig. 3a for temperature and Fig. 4a for relative humidity. For this whole-environment prediction, we discretized the room into 2500 unmeasured locations of interest on a 50 × 50 uniform grid and predicted both indoor climate variables at those 2500 locations. The corresponding predicted variances of the fields at the 2500 grid positions are shown in Figs. 3b and 4b, respectively. Both figures show that the prediction uncertainty of both temperature and humidity is fairly low, which indicates the efficacy of the multivariate model in representing multiple spatial phenomena jointly. Now suppose that we wish to select the 10 most informative sensor nodes out of all 20. Since each node carries both a temperature and a humidity sensor, this requires solving the multimodal sensor selection problem. To this end, we ran the proposed algorithm presented in Section III; the resulting values of the objective function are plotted in Fig. 2. Because the measurements of the spatial fields have different ranges, the measurements of each process were normalized before being used in the multimodal sensor selection algorithm. Fig. 2 also reveals that the more sensor nodes are selected, the more accurate the prediction results are, as the total predicted variance decreases with the number of selected sensors. The 10 corresponding near-optimal sensor node locations are shown by the white circles in Fig. 3d (or Fig. 4d).
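The 50 × 50 prediction grid and the per-field normalization can be sketched as follows. The exact grid placement and normalization used in the experiments are not stated; this sketch assumes a grid that includes the room boundaries and z-score normalization (subtract the mean, divide by the standard deviation) of each field, and the readings below are made-up placeholders.

```python
import numpy as np


def room_grid(length=19.80, width=14.86, k=50):
    """k x k uniform grid of prediction locations z over the room, shape (k*k, 2)."""
    gx, gy = np.meshgrid(np.linspace(0.0, length, k), np.linspace(0.0, width, k), indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel()])


def zscore(y):
    """Normalize one field's measurements to zero mean and unit standard deviation."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()


z = room_grid()
print(z.shape)  # (2500, 2)

# Placeholder readings from 4 nodes: temperature (deg C) and relative humidity (%).
temp = zscore([24.1, 24.6, 25.0, 24.3])
rh = zscore([61.0, 58.5, 57.2, 60.1])
```

Normalizing each field separately keeps the temperature and humidity variances on comparable scales, so neither field dominates the trace in the selection objective.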
Let us consider whether the selected sensors can monitor the spatial fields as efficiently as all the available ones. Given the measurements gathered by the 10 selected sensor nodes, i.e. 10 temperature and 10 humidity measurements, we retrained the multivariate model and predicted the two phenomena at the 2500 grid locations covering the whole room. The predicted fields are illustrated in Fig. 3c for temperature and Fig. 4c for humidity. Qualitatively, the prediction results obtained by the 10 most informative sensor nodes are highly comparable to those obtained by all 20 sensor nodes, shown in Figs. 3a and 4a. In addition, although only 10 sensor nodes are used to observe the climate in the room, the prediction uncertainties shown in Figs. 3d and 4d are reasonable compared with those obtained by all 20 sensors, depicted in Figs. 3b and 4b. A reduced number of sensors is highly practical in building monitoring applications [9]. More importantly, to quantify the difference between the predictions obtained by the selected subset of 10 sensors and those obtained by all 20 sensors, we computed the errors between the predicted results in each pair of figures: Figs. 3a and 3c for temperature, and Figs. 4a and 4c for humidity.
The temperature and relative humidity errors are summarized in the histograms in Figs. 5a and 5b, respectively. We also calculated quantiles of the errors in both scenarios and summarized them in Table I. Statistically, 90% of the differences between the whole-environment predictions obtained by the 10 selected sensor nodes and those obtained by all 20 nodes lie within about 0.5 °C for temperature and range from -2.47% to 1.21% for relative humidity.
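A 90% band like the one reported for the prediction differences can be read off the 5% and 95% quantiles of the pointwise differences. The sketch below assumes that this central interval is what is meant; `central_interval` is a hypothetical helper, and the two prediction maps are synthetic placeholders, not the experimental predictions.

```python
import numpy as np


def central_interval(diff, coverage=0.90):
    """Lower and upper quantiles enclosing the central `coverage` fraction of diff."""
    tail = (1.0 - coverage) / 2.0
    return np.quantile(np.ravel(diff), [tail, 1.0 - tail])


# Synthetic 50 x 50 prediction maps standing in for the 20-node and 10-node results.
rng = np.random.default_rng(4)
pred_all = 25.0 + rng.normal(0.0, 0.3, size=(50, 50))
pred_sel = pred_all + rng.normal(0.0, 0.25, size=(50, 50))
lo, hi = central_interval(pred_sel - pred_all)
print(f"90% of differences lie in [{lo:.2f}, {hi:.2f}] deg C")
```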

V. CONCLUSIONS
This paper has discussed the multimodal sensor selection problem for reconstructing multivariate spatial phenomena. Statistically, modelling multiple spatial random fields simultaneously is not trivial because the cross-covariance matrices must be positive semi-definite. However, by using the MGP with the Matérn cross-covariance function, multivariate spatial processes can be represented by a single model and predicted at any unobserved location, which makes multimodal sensor selection possible. The multimodal sensor selection problem aims to select the most informative sensor nodes out of all potential ones, so that the selected sensors can effectively monitor multiple phenomena and efficiently predict those processes jointly at unmeasured positions. The proposed approach was implemented in real-world experiments, and the obtained results demonstrate its practical effectiveness.

Fig. 2: Values of the objective function in the multimodal sensor selection problem with the temperature and humidity spatial fields. The red solid curve was obtained by our algorithm, while the box plots summarize 1000 experiments with random sensor selection.

Fig. 3: Predicted temperature field in the whole environment obtained by 20 sensors (top row) and 10 selected sensors (bottom row), respectively.White circles are the sensor locations.

Fig. 4: Predicted humidity field in the whole environment obtained by 20 sensors (top row) and 10 selected sensors (bottom row), respectively.White circles are the sensor locations.

Fig. 5: Differences between the spatial field predictions obtained by all 20 sensors and 10 selected sensors: (a) temperature comparison and (b) humidity comparison.

TABLE I: Quantiles of Prediction Differences