Boundary Tracking of Continuous Objects Based on Binary Tree Structured SVM for Industrial Wireless Sensor Networks

Due to the flammability, explosiveness and toxicity of continuous objects (e.g., chemical gas, oil spill, radioactive waste) in the petrochemical and nuclear industries, boundary tracking of continuous objects is a critical issue for industrial wireless sensor networks (IWSNs). In this article, we propose a continuous object boundary tracking algorithm for IWSNs, which fully exploits the collective intelligence and machine learning capability within the sensor nodes. The proposed algorithm first determines an upper bound of the event region covered by the continuous objects. A binary tree-based partition is performed within the event region, obtaining a coarse-grained boundary area mapping. To study the irregularity of continuous objects in detail, the boundary tracking problem is then transformed into a binary classification problem; a hierarchical soft margin support vector machine training strategy is designed to address the binary classification problem in a distributed fashion. Simulation results demonstrate that the proposed algorithm reduces the number of nodes required for boundary tracking by at least 50%. Without additional fault-tolerant mechanisms, the proposed algorithm is inherently robust to false sensor readings, even for high ratios of faulty nodes (≈ 9%).


INTRODUCTION
Production activities in the petrochemical, plastics, and energy industries are always accompanied by the risk of normal accidents [1], which may result in leakage of chemical gases, radioactive contamination, oil spills and other continuous objects [2]. These continuous objects tend to spread over a wide region and may cause poisoning, fire, or explosion. Studying the spatio-temporal evolution of continuous objects is of significance in disaster prediction and reduction, including hazard suppression and personnel evacuation [3].
Incorporating Wireless Sensor Networks (WSNs) within industrial automation systems, i.e., Industrial Wireless Sensor Networks (IWSNs), can assist in mitigating the impact of such accidents. IWSNs embed smart electronics that can run intelligent algorithms into production systems [4] [5]. A typical IWSN consists of hundreds of tiny wireless sensor nodes installed on industrial equipment or deployed in the field [6]. The nodes are low-cost, yet capable of sensing, processing, and communication. By offering sensing services and ubiquitous networking to industrial systems, IWSNs show great potential in serving industrial safety before and after the occurrence of an accident. For example, after the leakage of a toxic gas, pervasive sensor nodes in an IWSN can acquire the gas concentration at different positions, so that the distribution of the leaked gas can be visualized and emergency relief operations can be carried out in an orderly manner [7].
Recently, research efforts around the globe have focused on exploiting the fine-grained perception of IWSNs to track the movement of continuous objects [8] [9]. Detecting the boundary of diffusing continuous objects by identifying the sensor nodes located closest to the evolving front line has attracted intensive attention [10]-[12]. The identified nodes are termed boundary nodes [13]. Boundary nodes report their coordinate positions to a base station for delineating the area affected by the hazard [14].
The above-mentioned idea faces two main challenges. First, the base station monitors either the interior or the exterior boundary by selecting the boundary nodes that are either inside or outside the event region. As shown in Fig. 1, the interior boundary is formed by connecting interior boundary nodes w, x, y, z, while the exterior boundary is formed by connecting exterior boundary nodes m, n, o, p, q. The predicted actual boundary lies between the interior and the exterior boundary. Second, because continuous objects constantly change their sizes and shapes, the number of boundary nodes increases significantly with the expansion of the boundary. In addition, low-cost nodes are prone to failure in harsh industrial environments, leading to sensor readings that contradict reality. Based on false sensor readings, a faulty node may yield a spurious event boundary, potentially deceiving the base station and preventing the signaling of an actual event [15] [16].
In this article, a binary tree structure-based continuous object boundary tracking algorithm (BTS-COT) is proposed for IWSNs, which makes active use of on-site supervision data. The BTS-COT addresses the problem of boundary tracking by learning a classification hyperplane for the environment in which event and non-event nodes exist. Taking advantage of the sparse representation that the support vector machine (SVM) provides for decision boundaries, we regard sensor readings as training samples, and enable sensor nodes to implement SVM learning in a distributed manner under the binary tree architecture. As a result, boundary node selection is transformed into support vector searching in a high dimensional feature space. Without specially designed fault-tolerant mechanisms, the algorithm is robust to the abnormal sensor readings caused by node failure. The main contributions of this article are:
• We propose a full binary tree structured network partition mechanism to achieve boundary area mapping, reducing the search space of boundary nodes.
• We realize boundary tracking using soft margin SVM with binary tree architecture, and design a hierarchical SVM learning strategy.
• The exterior and interior boundaries are used as upper and lower bounds, in order to estimate the real boundary probabilistically.
• Simulation experiments are performed to evaluate the tracking accuracy of the proposed algorithm in scenarios of uniform and non-uniform diffusion, and to verify the fault-tolerance of the proposed algorithm.
The remainder of this article is organized as follows: Section II reviews related work regarding continuous object tracking in sensor networks. Section III presents the boundary area mapping under full binary tree structured IWSNs. Section IV introduces the boundary tracking algorithm using soft margin SVM with binary tree architecture. Experimental results are given in Section V, and conclusions and future work are presented in Section VI.

RELATED WORK
In recent years, multiple research studies have attempted to realize boundary tracking of continuous objects by partitioning the sensor network into uniform or nonuniform cells. The partition units include grids, Delaunay triangulation cells, Voronoi cells and other types of closed polygons generated through different planarization algorithms.

Grid-based Boundary Tracking
Han et al. [17] proposed a continuous object tracking scheme with a two-layer grid model (TGM-COT) in sensor networks. The TGM-COT divides the network into coarse-grained grids for continuous object detection. Fine-grained grids are established within the coarse-grained grids that contain the boundary of the continuous object. Boundary node selection is performed separately in each fine-grained grid. The TGM-COT is easy to implement; it only involves grid-based partition and node clustering. However, frequent cluster formation and deletion degrade the energy efficiency of the TGM-COT.
Oh et al. [18] combined the two-layer grid model in the TGM-COT with a convex hull algorithm to track the boundary of continuous objects. Since convex hull methods cannot work on concave polygons, the authors proposed a recovery mechanism to detect the shape loss between the obtained convex hull and the actual boundary. This method performs well in densely deployed networks, but incurs high communication overhead when tracking highly irregular boundaries.
Kim et al. [19] proposed an origin-mediated communication scheme to support sink mobility (OCSM) for phenomena detection in IWSNs. When a phenomenon occurs in a specific location, the sensor node with the largest sensing intensity is selected as the source node. The source node establishes a data delivery path to a mobile sink through a virtual backbone network. The virtual backbone network consists of routing nodes deployed in the form of grids. Although the OCSM utilizes a mobile sink to relieve the problem of energy imbalance, the communication overhead of the backbone nodes in the virtual backbone network cannot be reduced.

Voronoi-based Boundary Tracking
Imran et al. [20] proposed an efficient continuous object detection approach for Voronoi diagram-based WSNs. In this method, each node in the network is positioned in a Voronoi cell. A node that detects the object within its Voronoi cell utilizes two-hop neighbor detection information to avoid reporting minimal changes in the phenomenon shape. If a node depletes its energy, a spare node is woken up in the Voronoi cell where the dead node is detected. It is noted that the division of a network into Voronoi cells is non-unique, leading to variable results in boundary tracking.

Delaunay Triangulation-based Boundary Tracking
Han et al. [21] proposed a novel boundary recognition and tracking algorithm for continuous objects (BRTCO) in WSNs. With the help of a triangular mesh, the nodes located outside the event region but close to the boundary collaborate to determine the geometric characteristics of the boundary line, and filter out less reliable nodes to improve the tracking accuracy. Similar to Voronoi-based partitions, Delaunay triangulation-based partitions cannot ensure the uniqueness of the results of boundary tracking.

Planarization-based Boundary Tracking
Ping et al. [22] proposed a two-stage boundary face detection mechanism for duty-cycled IWSNs. When a target event is detected by active nodes, the boundary faces of the continuous object are constructed by planarization algorithms. Boundary nodes are selected within each boundary face so that the size of boundary faces can be refined. The authors adopted four kinds of planarization algorithms: Gabriel graph (GG), relative neighborhood graph (RNG), Yao graph (YG), and k-localized Delaunay graph (LDelk). Experimental results show that LDel2 achieves the best accuracy among these planarization algorithms.
Rahman et al. [23] combined four proximity graphs with different spatial interpolation methods for accurate boundary estimation in dense duty-cycled WSNs. The four proximity graphs studied in this work are the same as those in [22]. To estimate the sensor values at the geographical points where asleep nodes are positioned, the authors studied four spatial interpolation methods: Inverse Distance Weighting, Kriging, Spline, and Natural. The inner and outer boundaries can then be determined without waking dormant nodes.

Discussion
Through network partition, the process of boundary node selection is easy to manage. Simultaneously, boundary tracking can be performed energy-efficiently. However, the fixed cells utilized in the existing algorithms cannot adapt to the evolution of continuous objects. To track the boundary of continuous objects adaptively, a dynamic network partition mechanism, which is able to adjust the resolution of partition cells, appears to be more suitable than a static one.

BOUNDARY AREA MAPPING UNDER FULL BINARY TREE STRUCTURED NETWORK
This section introduces a full binary tree structured network partition mechanism. The mechanism adaptively adjusts the size of partition cells based on the area of the event region covered by the continuous object. The aim is to localize the boundary area of the object by a collection of non-uniform partition cells in the full binary tree structured network.

Coarse-grained Event Region Search based on Inter-cluster Collaboration
For the sake of concreteness in describing the movement of continuous objects, we focus on the diffusion of leaked gas, which is the most common event in petrochemical production. Nowadays, various types of chemical sensor nodes are available to monitor gas leakage. The nodes provide analog outputs that measure the gas concentration. The presence of gas leakage always corresponds to a high sensor reading, while a low reading indicates no event. In order to suppress external interference and measurement noise, a node judges whether a specific event is detected by mapping its numerical sensor reading into a binary variable. If the node is reached by the evolving gas cloud, its numerical sensor reading is mapped into a positive value; otherwise it is mapped into a negative value.
As shown in Fig. 2(a), we consider an IWSN in which N homogeneous nodes are organized into clusters. The nodes in each cluster switch between sleep and active states following a predefined duty cycle. At each time step, the active nodes in the network achieve full coverage to detect the existence of leaked gas. The area covered by leaked gas is termed the event region. In each cluster, the active nodes which detect the existence of gas send EventMsg packets to their cluster head. The cluster head extracts the node IDs and corresponding coordinates to infer the upper bound and lower bound of the event region at the scale of the local cluster. By combining the local bounds determined by all of the cluster heads, a global bound of the event region can be drawn.
An example is showcased in Fig. 2(b); the cluster head CH_A in cluster A extracts coordinates from the EventMsg packets received from its cluster members, and then determines the maximum and minimum of the x coordinate and the y coordinate. The local bound of the event region in cluster A can be represented by a 4-tuple Edge_A = [Xmin_A, Xmax_A, Ymin_A, Ymax_A]. CH_A sends Edge_A to its neighboring cluster heads CH_B and CH_C. CH_B compares its local 4-tuple Edge_B with Edge_A to determine the bound of the event region covered by clusters A and B. The 4-tuples keep being exchanged and updated in the network until convergence. Finally, the global bound of the event region converges to a 4-tuple: Edge = [Xmin_C, Xmax_B, Ymin_C, Ymax_A]. It can be observed [Fig. 2(b)] that a rectangular zone is determined by the global 4-tuple Edge. The algorithm uses the minimum circumscribed circle of the rectangle as the outmost layer of a full binary tree, and proposes a full binary tree structured network partition mechanism.
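The bound exchange described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tuple layout (x_min, x_max, y_min, y_max), and the sample coordinates are assumptions made for the example, and the message exchange between cluster heads is reduced to a pairwise merge.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]
Edge = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def local_edge(event_nodes: Iterable[Point]) -> Edge:
    """Bounding 4-tuple of the event nodes reported to one cluster head."""
    pts = list(event_nodes)  # raises ValueError below if the cluster saw no event
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), max(xs), min(ys), max(ys))


def merge_edges(a: Edge, b: Edge) -> Edge:
    """Combine two 4-tuples into the bound that covers both."""
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))


def circumscribed_circle(edge: Edge) -> Tuple[Point, float]:
    """Centre and radius of the smallest circle enclosing the bounding rectangle."""
    x_min, x_max, y_min, y_max = edge
    centre = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    radius = math.hypot(x_max - x_min, y_max - y_min) / 2.0
    return centre, radius


# Hypothetical event-node coordinates for clusters A, B and C.
edge_a = local_edge([(2.0, 5.0), (3.0, 9.0)])
edge_b = local_edge([(4.0, 4.0), (8.0, 6.0)])
edge_c = local_edge([(1.0, 1.0), (2.0, 3.0)])
global_edge = merge_edges(merge_edges(edge_a, edge_b), edge_c)
centre, radius = circumscribed_circle(global_edge)
```

Because the merge only takes minima and maxima, it is commutative and associative, so the order in which cluster heads exchange their 4-tuples does not affect the converged result.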

Full Binary Tree Structured Network Partition
Binary trees are defined recursively as a collection of elements (starting at a root), where each element has at most two children. The elements that are directly under an element are called its children. The element directly above an element is called its parent. The elements with no children are referred to as leaves. A binary tree is considered full if every element in the tree has either two or zero children, and every element without children is on the lowest level of the tree. Inspired by the hierarchy of full binary trees, we embed their organization into network partition.
Once deployed in an area of interest, sensor nodes form a self-organized network. We partition the sensor network into non-uniform cells. Cells are basic partition units which occupy certain areas. Nodes in different locations belong to different cells. As shown in Fig. 3, the sensor network is arranged in a full binary tree structured pattern. The binary tree structured network is made up of cells that have either two or zero child cells. The cell located in the center of the tree structured network is the root. Cell 1 has exactly two children, namely, cell 2 and cell 3. Similarly, cell 4 and cell 5 are the children of cell 2, while cell 6 and cell 7 are the children of cell 3. Once the spacing d of two adjacent layers is determined, the area of interest can be occupied by cells of different sizes distributed in h layers. It can be seen that the number of cells increases exponentially with the number of layers. The partition aims to determine an event region composed of coarse-grained cells from inner layers, while profiling the boundary area using fine-grained cells in outer layers.

Mathematical Foundation of Full Binary Tree Structured Network
The profile of the full binary tree network structure is determined mathematically by exploiting the knowledge behind Fick's law [24]. Fick's law is a classic theory in fluid mechanics. It applies to a two-dimensional static aquatic environment where a pollution source diffuses at a constant rate. Let c(x, y, t) denote the diffusion concentration at location (x, y) at diffusion time t. The partial differential equation of the diffusion concentration derived from Fick's law is:
$$\frac{\partial c}{\partial t} = \lambda \left( \frac{\partial^2 c}{\partial x^2} + \frac{\partial^2 c}{\partial y^2} \right) \tag{1}$$
where λ denotes the diffusion rate.
Assume that an initial amount A of pollutant is discharged at the location (x_0, y_0) at t = 0. By integrating Eq. 1, the diffusion concentration c(x, y, t) at location (x, y) at diffusion time t (t > 0) can be derived:
$$c(x, y, t) = \frac{A}{4 \pi \lambda t} \exp\left( -\frac{d^2(x, y)}{4 \lambda t} \right) \tag{2}$$
where d(x, y) denotes the distance between the sampling location (x, y) and the location of the pollution source (x_0, y_0). Eq. 1 has been widely used in a number of approaches [25]-[27] and validated by experiments in [28]. According to Eq. 2, the spatial distribution of the concentration at diffusion time t follows a Gaussian distribution. Furthermore, if the concentration threshold detected at the gas boundary is c_th, the analytic expression of the time-varying gas boundary can be derived by solving Eq. 2:
$$d(x, y) = \sqrt{4 \lambda t \, \ln \frac{A}{4 \pi \lambda t \, c_{th}}} \tag{3}$$
Eq. 3 shows that, governed by Fick's law, the boundary of a diffusing object is a circle evolving over time. That is why the proposed full binary tree network structure adopts a set of concentric circles as its basic outline. It is worth noting that the diffusion of continuous objects does not always follow Fick's law in the real world; thus, we further divide the concentric circles into cells of different sizes for fine-grained boundary mapping.
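As a numerical illustration of the circular boundary, the following sketch evaluates the concentration field and the boundary radius. It assumes the standard two-dimensional point-source solution of the diffusion equation (an instantaneous release of amount A, concentration A/(4πλt)·exp(−d²/(4λt))); the function names and the parameter values are hypothetical.

```python
import math


def concentration(d: float, t: float, amount: float, lam: float) -> float:
    """Concentration at distance d from the source at time t > 0 (point release)."""
    return amount / (4.0 * math.pi * lam * t) * math.exp(-d * d / (4.0 * lam * t))


def boundary_radius(t: float, amount: float, lam: float, c_th: float) -> float:
    """Radius at which the concentration equals c_th.

    Returns 0.0 when even the peak concentration at the source is below c_th,
    i.e. when no detectable event region remains.
    """
    ratio = amount / (4.0 * math.pi * lam * t * c_th)
    if ratio <= 1.0:
        return 0.0
    return math.sqrt(4.0 * lam * t * math.log(ratio))


# Hypothetical parameters: A = 100, lambda = 0.5, threshold c_th = 0.1.
r_early = boundary_radius(t=2.0, amount=100.0, lam=0.5, c_th=0.1)
r_late = boundary_radius(t=1000.0, amount=100.0, lam=0.5, c_th=0.1)
```

Under this model the radius first grows and then shrinks back to zero as the pollutant dilutes below the threshold, which is one reason a partition with adjustable resolution is useful.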

Boundary Area Mapping under Full Binary Tree Structured Network
As shown in Fig. 3, each cell in the full binary tree structured network is assigned a unique ID. The cell IDs are assigned in anticlockwise order. A node whose polar coordinate is (ρ_i, θ_i) determines the ID of the cell in which it is located according to Eq. 4, where L_i denotes the layer of the node in the full binary tree; L_i is calculated according to Eq. 5. We term the cells which contain event nodes event cells. The cells which contain no event nodes are termed non-event cells. The cells which can be used to characterize the boundary of the event region are termed boundary mapping cells. We further subdivide boundary mapping cells into non-leaf boundary mapping cells and leaf boundary mapping cells. The definitions of the two types of boundary mapping cells are as follows:

Definition 1. Non-leaf boundary mapping cell: A cell that contains event nodes and has at least one non-event child cell.

Definition 2. Leaf boundary mapping cell: A leaf cell (a cell located in the outmost layer of the tree structured network) that contains event nodes.
In Fig. 3, cell 9 represents an event cell, and its child cells are event cell 18 and non-event cell 19. Cell 9 can therefore be identified as a non-leaf boundary mapping cell. Cell 16 is an event cell and both of its child cells are non-event cells, so cell 16 can also be identified as a non-leaf boundary mapping cell. Cells 35, 36, 47 and 48 are leaf boundary mapping cells. All the boundary mapping cells are marked in green. It can be observed that only a few boundary mapping cells in outer layers are required to profile the border region.
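The cell numbering and the two definitions can be sketched in code. The parent-child rule used here (cell n has children 2n and 2n+1) follows the examples in the text (1 → 2, 3; 2 → 4, 5; 9 → 18, 19). The mapping from a polar coordinate to a cell ID, however, is only an assumed reconstruction consistent with that numbering: layer L is taken to cover radii in ((L−1)d, Ld] and to be split into 2^(L−1) equal sectors counted anticlockwise. The set of event cells below is hypothetical and does not reproduce Fig. 3.

```python
import math
from typing import Set, Tuple


def layer_of(rho: float, d: float) -> int:
    """Assumed layer index: layer L covers radii in ((L-1)*d, L*d]; the root is layer 1."""
    return max(1, math.ceil(rho / d))


def cell_id(rho: float, theta: float, d: float) -> int:
    """Assumed heap-style cell ID for a node at polar position (rho, theta)."""
    layer = layer_of(rho, d)
    sectors = 2 ** (layer - 1)  # number of cells in this layer
    frac = (theta % (2.0 * math.pi)) / (2.0 * math.pi)
    sector = min(int(frac * sectors), sectors - 1)  # anticlockwise sector index
    return sectors + sector  # IDs in layer L run from 2^(L-1) to 2^L - 1


def children(cell: int) -> Tuple[int, int]:
    """Child cells under heap numbering, e.g. cell 9 -> (18, 19)."""
    return (2 * cell, 2 * cell + 1)


def is_leaf(cell: int, h: int) -> bool:
    """A cell is a leaf if it lies in the outmost layer h."""
    return cell >= 2 ** (h - 1)


def boundary_mapping_cells(event_cells: Set[int], h: int) -> Tuple[Set[int], Set[int]]:
    """Split event cells into non-leaf and leaf boundary mapping cells."""
    non_leaf, leaf = set(), set()
    for cell in event_cells:
        if is_leaf(cell, h):
            leaf.add(cell)  # Definition 2
        elif any(c not in event_cells for c in children(cell)):
            non_leaf.add(cell)  # Definition 1
    return non_leaf, leaf


# Hypothetical event cells in a tree with h = 6 layers (leaf IDs 32..63).
non_leaf, leaf = boundary_mapping_cells({9, 16, 17, 18, 35, 36}, h=6)
```

With heap numbering, the parent of any cell is obtained by integer division of its ID by two, so a node can walk up the tree without storing the structure explicitly.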

Rationality Analysis of Full Binary Tree Structured Network
Compared with grid-based, Voronoi-based, Delaunay triangulation-based, and planarization-based network partition mechanisms, the proposed full binary tree structured network partition offers the following advantages in boundary tracking.
First of all, the full binary tree structured network partition extracts as many of the features that represent the border shape as possible. According to Fick's law, the boundary of a diffusing object can be roughly approximated by a circle evolving over time, motivating the proposed network structure to adopt a set of concentric circles as its basic outline. Considering the anisotropy of the boundary, the concentric circles are further divided into cells of different sizes following the rules of full binary tree construction. By controlling the parameters of the network structure, the body of the continuous object is occupied by coarse-grained cells in the inner layers of the full binary tree structured network, while the boundary area is profiled using fine-grained boundary mapping cells in the outer layers. Combining cells of different sizes ensures that the irregularity of the boundary can be captured. Based on the above analysis, the proposed full binary tree structured network partition is able to provide part of the boundary feature information before boundary tracking.
Secondly, the proposed full binary tree structured network partition is dynamically reconfigurable. As the continuous object diffuses, the network adaptively adjusts the resolution of partition cells. Thus the cost of maintaining the full binary tree structured network partition will not increase exponentially, as it does for network partition mechanisms using fixed cells.
Finally, in the next section, we will show how to visualize the peripheries of continuous objects in great detail through boundary node selection. In the phase of boundary node selection, full binary tree structured network partition is an effective method for reducing the search space of boundary nodes from the global network to boundary mapping cells. Further, under two types of cycle cascade architectures, which are the substructures extracted from the binary tree structured network, the nodes in boundary mapping cells are allowed to implement boundary node selection collaboratively.

Binary Classification Problem Formulation in Boundary Tracking
Assume node i is located at p_i = (x_i, y_i). The observation of node i is a binary variable z_i, which is set to positive one if node i detects the object, and negative one otherwise. Each node encapsulates its observation and coordinate information into a packet. If all the nodes upload the packets to a base station, the packets can be viewed as training samples from the perspective of machine learning. The training samples generated in IWSNs can be fed into a classification algorithm to output a decision boundary f : R^2 → {−1, +1}. There is a natural analogy between the decision boundary and the boundary of continuous objects. The decision boundary divides the feature space into positive and negative sample spaces, while the boundary of continuous objects divides the monitoring area into event and non-event regions. Due to this functional equivalence, boundary tracking in IWSNs can be formulated as decision boundary searching in a binary classification problem.
Popular binary classification algorithms include Artificial Neural Networks (ANN), SVM, Naive Bayes, K Nearest Neighbors (KNN), and decision trees. The choice among these algorithms always depends on the task at hand. Considering the challenges of continuous object tracking in IWSNs and drawing on the comparison studies of binary classification algorithms in [29]-[30], we provide the following remarks to explain why we select the original soft-margin SVM as a prototype to be adapted for boundary tracking in IWSNs.
Small sample size: For the sake of energy conservation in IWSNs, it is recommended to use as few nodes as possible for generating data samples. Learning machines are required to achieve high classification accuracy with a small sample size. Among current classification algorithms, ANN and decision trees pursue high accuracy at the cost of sample size, whereas SVM and Naive Bayes are not restricted by the number of samples. In addition to sample size, KNN and Naive Bayes require the number of training examples per class to be balanced.
Robustness to imprecise samples: Imprecise training samples are inevitable for low-cost, failure-prone nodes working in harsh industrial fields. Among current classification algorithms, ANN and SVM are able to provide reasonable classification results using samples with incorrect labels. KNN is sensitive to irrelevant features because its similarity measures can be easily distorted by errors in attribute values. Contrary to KNN, decision trees are resistant to noise because their pruning strategies avoid overfitting. Naive Bayes is naturally robust to missing values, which are simply ignored in computing probabilities and hence have no impact on the final decision.
Low time complexity: Boundary tracking of continuous objects is a real-time task and requires learning machines with low time complexity. Naive Bayes trains quickly because it needs only a single pass over the data, either to count frequencies or to compute the normal probability density function. KNN has a large time complexity because it calculates the similarity between the unlabeled sample and all training samples. The computational complexity of SVM stems from solving a Lagrangian dual problem. Compared with SVM, ANN requires much more computation time. Decision trees work faster than neural networks and SVMs.
Ease of use: The number of model parameters to be tuned by the user is an indicator of an algorithm's ease of use. KNN has a single parameter, which is relatively easy to tune. SVM involves the choice of a kernel function and the determination of a regularization parameter. Decision trees require the selection of a criterion for feature selection and the determination of a number of parameters for pruning. ANN has the largest number of parameters to be determined, such as the number of layers, the number of units in each layer, the maximum training time, the learning rate, and the types of activation functions.
Interpretability: A neural network is a black box model. The knowledge hidden in an ANN is hard to explain. KNN calculates the similarity between the unlabeled samples and all training samples to implement classification. Because similarity can be defined in different ways, KNN lacks interpretability. Decision trees have good interpretability due to the interpretation and assessment of rules and the expression of knowledge. Naive Bayes and SVM also have good interpretability because they are supported by well-defined mathematical theory. Naive Bayes is based on probability theory, although its attribute independence assumption is often violated in the real world. SVM is based on statistical learning theory, which focuses on learning from small samples, and can trade off model complexity against generalization performance. Last but not least, SVM provides the decision function of the separating hyperplane, which is unavailable in ANN, KNN, decision trees, and Naive Bayes.
Based on the above analysis, it can be concluded that SVM is particularly well suited to learning from small samples, is supported by rigorous mathematical theory, and has fewer parameters to tune. All these features are desirable for boundary tracking in sensor networks. However, the most fundamental reason for ruling out ANN, KNN, decision trees, and Naive Bayes is that all these classifiers only judge whether discrete points lie within or outside the region covered by a continuous object. For the points lying on the boundary of the continuous object, they cannot provide a unified, definitive and formal mathematical model.
On the contrary, the decision function provided by SVM can be treated as the functional form of the boundary of the continuous object. Moreover, the sparse representation of the decision boundary indicates that we only need to identify a small number of nodes, whose training samples are support vectors, as boundary nodes. To strengthen the fault-tolerance of SVM, we finally select soft-margin SVM as the most promising original algorithm. All that remains is how to run soft-margin SVM in IWSNs in a rapid manner.

Preliminaries of Soft-margin SVM
Soft-margin SVM is a supervised learning model that attempts to find an optimal hyperplane, denoted by H(p), that maximizes the margin between two classes. When the training data is linearly separable, the two classes can be separated by two margins parallel to the optimal hyperplane. The optimal hyperplane, taking the form H(p) = w · p + b, can be found by solving the following quadratic problem:
$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad z_i (w \cdot p_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, N \tag{6}$$
where w denotes the normal vector to the hyperplane and b denotes the offset of the hyperplane from the origin. The slack variable ξ_i measures the amount of violation of the constraints. The penalty parameter C weights the cost of constraint violation. By solving the Lagrangian dual of the above problem, one obtains the simplified problem:
$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j z_i z_j \, p_i \cdot p_j \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i z_i = 0, \;\; 0 \le \alpha_i \le C \tag{7}$$
where α_i denotes the Lagrange multipliers. The training samples corresponding to non-zero Lagrange multipliers are called support vectors. By solving Eq. 7, w can be written as a linear combination of the support vectors, and the hyperplane H(p) can be derived as follows:
$$H(p) = \sum_{i=1}^{N} \alpha_i z_i \, p_i \cdot p + b \tag{8}$$
Nevertheless, real-world problems are usually nonlinear in nature. The solution derived in the case of linear classification is inapplicable to these problems. In a general formulation, SVM works by mapping a training set that is not linearly separable in the input space into a high or even infinite dimensional feature space. This higher-dimensional space is called the transformed feature space. In a transformed feature space of sufficient dimensionality, any non-linearly separable training set in the input space can be made linearly separable. The optimal hyperplane determined in the feature space thus corresponds to a nonlinear decision boundary in the input space.
Given a nonlinear mapping p → φ(p), the classifier is transformed into the form:
$$H(p) = \sum_{i=1}^{N} \alpha_i z_i \, \varphi(p_i) \cdot \varphi(p) + b \tag{9}$$
This formulation can be extended to general nonlinear functions by using the concept of kernels. Kernels are symmetric positive definite functions that allow inner products to be calculated directly in feature space without defining the mapping operator φ(·). By the kernel trick, we have K(p_i, p_j) = φ(p_i) · φ(p_j), and Eq. 9 can be transformed into the form:
$$H(p) = \sum_{i=1}^{N} \alpha_i z_i \, K(p_i, p) + b \tag{10}$$
The selection of an appropriate kernel function is important, since the kernel function defines the transformed feature space in which the training set instances will be classified. In our work, the radial basis function (RBF) kernel is selected for SVM training, because RBF is a universal kernel function that can be applied to any distribution of samples through the choice of parameters. Many studies have demonstrated that the RBF kernel is one of the most popular kernel functions with convincing performance and is a reasonable first choice [31]-[34].
The expression of the RBF kernel function is:
$$K(p_i, p_j) = \exp\left( -\gamma \|p_i - p_j\|^2 \right) \tag{11}$$
For soft-margin SVM with the RBF kernel, two parameters, C and γ, remain to be determined. The determination of C and γ, known as hyper-parameter tuning, is usually done by performing a grid search over pairs of C and γ with cross-validation [35]-[37]. We use a sigmoid function to assess the probability Pr that a new sample p belongs to the positive class. The sample p is assigned to the positive class if Pr > 0.5, and to the negative class if Pr < 0.5. Setting the value of Pr to 0.5 yields an equation with one unknown p. The solution of this equation is the set of points on the decision boundary.
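To make the role of the slack variables concrete, the sketch below evaluates them for a fixed linear hyperplane, assuming the standard soft-margin primal (objective ½‖w‖² + C Σ ξ_i with ξ_i = max(0, 1 − z_i H(p_i)) at the optimum). It does not solve the quadratic problem; the hyperplane, the samples, and the function names are hypothetical. The last sample mimics a faulty node that reports "no event" from inside the event region.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Sample = Tuple[Point, int]  # (position p_i, label z_i in {-1, +1})


def decision(w: Point, b: float, p: Point) -> float:
    """Linear decision value H(p) = w . p + b."""
    return w[0] * p[0] + w[1] * p[1] + b


def slacks(w: Point, b: float, samples: Sequence[Sample]) -> List[float]:
    """Smallest slack xi_i satisfying z_i * H(p_i) >= 1 - xi_i and xi_i >= 0."""
    return [max(0.0, 1.0 - z * decision(w, b, p)) for p, z in samples]


def primal_objective(w: Point, b: float, samples: Sequence[Sample], c: float) -> float:
    """Soft-margin primal value: 0.5 * ||w||^2 + C * sum of slacks."""
    return 0.5 * (w[0] ** 2 + w[1] ** 2) + c * sum(slacks(w, b, samples))


samples = [
    ((2.0, 0.0), +1),
    ((3.0, 0.0), +1),
    ((-2.0, 0.0), -1),
    ((-3.0, 0.0), -1),
    ((2.5, 0.0), -1),  # hypothetical faulty reading inside the event region
]
w, b = (0.5, 0.0), 0.0
xi = slacks(w, b, samples)
value = primal_objective(w, b, samples, c=1.0)
```

Only the faulty sample receives a non-zero slack, so the hyperplane does not need to bend around it; a smaller C makes such violations cheaper and the boundary smoother.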
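The kernel decision function and the probability mapping can be sketched as follows. The RBF kernel is taken in its usual form exp(−γ‖p − q‖²). The exact sigmoid used in the original is not recoverable from the text, so the link Pr = 1/(1 + exp(−s·H(p))) with a hypothetical slope s is an assumption; with this choice, Pr = 0.5 exactly where H(p) = 0. The support vectors and multipliers are made-up values, not the output of a trained model.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rbf_kernel(p: Point, q: Point, gamma: float) -> float:
    """RBF kernel K(p, q) = exp(-gamma * ||p - q||^2)."""
    return math.exp(-gamma * ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2))


def decision_value(
    p: Point,
    support_vectors: Sequence[Point],
    alphas: Sequence[float],
    labels: Sequence[int],
    b: float,
    gamma: float,
) -> float:
    """Kernel expansion H(p) = sum_i alpha_i * z_i * K(p_i, p) + b over support vectors."""
    return b + sum(
        a * z * rbf_kernel(sv, p, gamma)
        for sv, a, z in zip(support_vectors, alphas, labels)
    )


def event_probability(h_value: float, slope: float = 1.0) -> float:
    """Assumed sigmoid link: Pr = 1 / (1 + exp(-slope * H(p)))."""
    return 1.0 / (1.0 + math.exp(-slope * h_value))


# Hypothetical model: one event and one non-event support vector.
svs = [(1.0, 0.0), (-1.0, 0.0)]
alphas = [1.0, 1.0]
labels = [+1, -1]
h_mid = decision_value((0.0, 0.0), svs, alphas, labels, b=0.0, gamma=0.5)
h_in = decision_value((1.0, 0.0), svs, alphas, labels, b=0.0, gamma=0.5)
```

In this toy model the point midway between the two support vectors has H(p) = 0 and therefore lies on the estimated boundary, while a point at the event support vector has probability above 0.5.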

Overview
Ongoing research in the machine learning community seeks to design statistically sound learning algorithms that scale to large data sets. The massive number of training samples generated by nodes in IWSNs turns boundary tracking into a large-scale classification problem. Notice that in traditional centralized SVM, all the training samples must be sent to a base station for decision boundary learning. However, broadcast communication in an IWSN is highly energy inefficient and prone to collisions. Moreover, the time complexity of standard SVM, which solves a quadratic program, is O(N^3), where N is the number of samples fed into the SVM [38]. If standard SVM is applied to IWSNs directly, real-time responses will be unavailable to industrial applications. Thus, it is preferable for IWSNs to perform SVM training in a distributed fashion. By fully exploiting the collective intelligence of nodes in the sensor network, we design a hierarchical SVM training strategy under the proposed binary tree structured IWSNs. By means of domain decomposition, a large-scale SVM training problem can be turned into a number of smaller discrete sub-SVM problems. The sub-SVM problems are addressed locally in each boundary mapping cell. Our training strategy parallelizes the learning process and diffuses the local support vectors generated in each boundary mapping cell over the entire network until convergence.

Search Space Reduction
Eq. 9, which involves a total of N training samples, is the normal form of the separating hyperplane. This equation indicates that a standard SVM regards the complete training set as the search space for support vectors. However, from a mathematical point of view, the non-support vectors, whose Lagrange multipliers are zero, are superfluous terms. A compact expression of the separating hyperplane can be obtained by removing the non-support vectors from Eq. 9. Inspired by this compact expression, the original search space can be reduced by safely removing those training samples that are non-support vectors.
Since non-support vectors contribute no knowledge to classifier training yet account for a large portion of the training set, we can build a representative subset by eliminating as many non-support vectors as possible from the original training set. The representative subset is small in volume but covers all the knowledge contained in the complete set. A reduced SVM can then use this subset to formulate a smaller quadratic programming problem, which requires far less computation to solve than the standard SVM.
Because all the training samples are provided by sensor nodes, search space reduction can be carried out by limiting the number of sensor nodes allowed to provide training samples. After the possible distribution area of the hyperplane has been narrowed to the collection of boundary mapping cells, the sensor nodes outside the boundary mapping cells do not need to generate training samples because they are far away from the hyperplane. The search space of support vectors can thus be reduced to the set of training samples generated inside the boundary mapping cells. Let M denote the number of nodes in the boundary mapping cells. Only these M nodes are required to generate training samples for reduced SVM learning.
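The node filtering step can be sketched as follows. This is a minimal Python illustration under stated assumptions: boundary mapping cells are modeled as axis-aligned rectangles `(x_min, y_min, x_max, y_max)`, each node is a tuple `(x, y, label)`, and the function names and sample coordinates are hypothetical.

```python
def in_cell(node, cell):
    """Check whether node (x, y, label) lies inside a rectangular cell."""
    x, y = node[0], node[1]
    x_min, y_min, x_max, y_max = cell
    return x_min <= x <= x_max and y_min <= y <= y_max

def reduce_search_space(nodes, boundary_cells):
    """Keep only the nodes located in at least one boundary mapping cell."""
    return [n for n in nodes if any(in_cell(n, c) for c in boundary_cells)]

# Hypothetical deployment: N = 4 nodes, one boundary mapping cell.
nodes = [(1.0, 1.0, 1), (4.5, 4.0, 1), (5.5, 4.2, -1), (9.0, 9.0, -1)]
boundary_cells = [(4.0, 3.0, 6.0, 5.0)]
reduced = reduce_search_space(nodes, boundary_cells)
print(len(nodes), "->", len(reduced))  # N -> M
```

Only the nodes returned by `reduce_search_space` would contribute training samples; the remaining nodes stay silent.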

Hierarchical SVM Training under Binary Tree Structure
After search space reduction, the nodes in different boundary mapping cells are organized into different numbers of equal-sized learning clusters. Assuming that an individual node is able to train support vectors from a training set of K samples (K << M << N), the size of each learning cluster is less than K. Learning clusters in the boundary mapping cells perform distributed learning under two types of cycle cascade architectures, which are substructures extracted from the binary tree structured network. Fig. 4 illustrates the vertical cycle cascade architecture, in which the boundary mapping cells are in a parent-child relationship from top to bottom. For example, in Fig. 3, there is a parent-child relationship between cells 9 and 18, and between cells 18 and 36. The learning clusters in these three cells perform distributed SVM training under a three-layer vertical cycle cascade architecture.
As shown in Fig. 4, $LC^{i}_{1}, LC^{i}_{2}, \ldots, LC^{i}_{4}$ are learning clusters formed in boundary cells located in the i-th layer of the full binary tree. Each learning cluster outputs local support vectors and then uploads them to the learning clusters formed in its descendant boundary cell in the (i+1)-th layer. $LC^{i+1}_{1}$ and $LC^{i+1}_{2}$ receive the support vectors from their ancestors, add them to their local support vector sets, and output the corresponding support vectors. Afterwards, $LC^{i+1}_{1}$ and $LC^{i+1}_{2}$ upload their support vectors to the learning cluster formed in their descendant boundary cell in the (i+2)-th layer. $LC^{i+2}_{1}$ receives the support vectors, adds them to its local support vector set, and outputs the corresponding support vectors. We introduce a feedback loop that feeds the resulting support vectors of the last layer back into the first layer. The training procedure keeps cycling under the vertical cascade architecture until the final support vectors converge.
Fig. 5 illustrates the horizontal cycle cascade architecture, in which the learning clusters are formed in sibling cells. At the beginning, $LC^{i}_{1}$ outputs local support vectors and then transmits them to $LC^{i}_{2}$, which is formed in the sibling cell of $LC^{i}_{1}$. $LC^{i}_{2}$ receives the support vectors from $LC^{i}_{1}$, adds them to its local support vector set, and outputs the corresponding support vectors. Afterwards, $LC^{i}_{2}$ transmits its support vectors to $LC^{i}_{3}$, which is formed in the sibling cell of $LC^{i}_{2}$. We again introduce a feedback loop that feeds the resulting support vectors of the last learning cluster back into the first one. The training procedure keeps cycling under the horizontal cascade architecture until the final support vectors converge.
A layered cascade architecture is guaranteed to converge to the global optimum if we keep the best set of support vectors produced in one layer, and use it in at least one of the subsets in the next layer [39]. This is the case shown in Fig. 4 and Fig. 5.
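The data flow of such a cycle cascade can be sketched in Python as follows. To keep the example self-contained, the local trainer is a deliberately simple stand-in rather than the soft-margin RBF SVM used in this article: for linearly separable one-dimensional samples, the maximum-margin classifier is fixed by the closest pair of opposite-class samples, so that pair plays the role of the support vectors. The names `train_1d` and `cycle_cascade`, the toy data, and the handling of single-class clusters are all assumptions for illustration.

```python
def train_1d(samples):
    """Stand-in trainer: return the support vectors of a separable 1-D set.

    samples is a set of (x, label) pairs with label in {+1, -1}.
    """
    pos = [x for x, y in samples if y > 0]
    neg = [x for x, y in samples if y < 0]
    if not pos or not neg:
        # Assumption: with one class only, keep both extremes as candidates.
        xs, label = (pos, 1) if pos else (neg, -1)
        return {(min(xs), label), (max(xs), label)}
    if max(neg) < min(pos):
        return {(max(neg), -1), (min(pos), 1)}
    if max(pos) < min(neg):
        return {(max(pos), 1), (min(neg), -1)}
    raise ValueError("toy trainer requires linearly separable samples")

def cycle_cascade(clusters, train, max_cycles=10):
    """Pass support vectors from cluster to cluster, with a feedback loop.

    Returns the final support vectors and the number of feedback iterations
    needed before a full pass no longer changes them.
    """
    feedback = set()
    for cycle in range(1, max_cycles + 1):
        carried = set(feedback)
        for local in clusters:
            # Each cluster merges received support vectors with local samples.
            carried = train(set(local) | carried)
        if carried == feedback:
            return carried, cycle - 1
        feedback = carried  # feed the last result back into the first cluster
    return feedback, max_cycles

# Three hypothetical learning clusters (sibling or parent-child order).
clusters = [
    [(1, -1), (4, -1), (9, 1)],
    [(5, -1), (7, 1)],
    [(2, -1), (6, 1), (8, 1)],
]
support_vectors, feedback_iterations = cycle_cascade(clusters, train_1d)
print(support_vectors, feedback_iterations)
```

The same loop covers both architectures: the order of `clusters` follows the parent-child chain in the vertical case and the sibling chain in the horizontal case.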
Finally, the function of the decision boundary is as follows: where $p^{sv}_{i}$ denotes the support vectors generated under the binary tree architecture, $\alpha^{*}_{i}$ denotes the non-zero Lagrange multiplier of $p^{sv}_{i}$, and $N_{sv}$ denotes the number of support vectors. The nodes whose training samples are support vectors can be regarded as boundary nodes. After the base station receives the support vectors and the corresponding Lagrange multipliers from the boundary nodes, the analytical solution of the boundary line can be derived by substituting H(p) for H(p) in Eq. 12.

Complexity Analysis
The basic idea behind the proposed BTS-COT is to find an optimal hyperplane using the coordinates of sensor nodes as input and their sensing labels as output. BTS-COT employs a soft-margin SVM to train the hyperplane because of its generalization performance on small sample sizes, solid mathematical foundation, ease of use, and fault tolerance. However, a traditional soft-margin SVM requires a large amount of memory and computation time when dealing with large data sets. Thus, BTS-COT trains the soft-margin SVM in a hierarchical framework, so that the training time is dramatically reduced while the training burden remains tolerable for sensor nodes with limited memory and computing capacity. The complexity of BTS-COT in the context of hyperplane training is discussed as follows.
Recall that N sensor nodes are deployed in the IWSN; a traditional soft-margin SVM tends to scale with the cube of the number of training vectors, i.e., $O(N^3)$. After boundary area mapping, a certain number of boundary mapping cells are selected to profile the border region. From the perspective of SVM training, boundary area mapping eliminates non-support vectors early from the optimization. Recall that the total number of sensor nodes in the boundary cells is denoted by M; the complexity of the traditional soft-margin SVM can therefore be reduced to $O(M^3)$.
To further reduce computation and speed up SVM training, the nodes in the boundary mapping cells are grouped into learning clusters. The learning clusters split the soft-margin SVM with $O(M^3)$ complexity into a number of smaller quadratic programming subproblems and solve these subproblems in parallel under either the horizontal or the vertical cascade architecture. Recall that the maximum number of samples that a node can train is K; the number of sensor nodes in each learning cluster is denoted by P (P < K). Initially, multiple soft-margin SVMs with $O(P^3)$ complexity are trained in parallel, one in each learning cluster. Under either the horizontal or the vertical cascade architecture, a learning cluster transmits its local support vectors to its descendant or sibling learning cluster. The learning cluster that receives the support vectors trains a soft-margin SVM with $O((P - Q + R)^3)$ complexity, where Q is the number of local non-support vectors removed from the P local samples and R is the number of received support vectors. The set of support vectors is optimized iteratively until the global optimum is reached.
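The following Python sketch puts illustrative numbers on these cubic terms. The values of N, M, P, Q and R are hypothetical and chosen only to show the orders of magnitude; constant factors and communication costs are ignored.

```python
def cubic_cost(n):
    """Cost model n^3 for solving an SVM quadratic program on n samples."""
    return n ** 3

# Hypothetical sizes, not taken from the simulations in this article.
N = 2000   # all deployed nodes
M = 400    # nodes inside the boundary mapping cells
P = 50     # nodes per learning cluster
Q = 40     # local non-support vectors discarded after the first training
R = 15     # support vectors received from another learning cluster

centralized = cubic_cost(N)        # standard SVM at the base station
reduced = cubic_cost(M)            # after search space reduction
per_cluster = cubic_cost(P)        # first local training in a cluster
retrain = cubic_cost(P - Q + R)    # training after receiving support vectors

print(centralized, reduced, per_cluster, retrain)
```

Under these assumed sizes, the per-cluster problems are several orders of magnitude cheaper than the centralized one, which is the effect the hierarchical strategy relies on.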

Simulation Setting
Simulations were performed in a square area using Matlab. For a network with a fixed area, we use the node density to control the number of deployed nodes. Node density is the ratio of the number of nodes to the network area; thus, the number of nodes is the product of node density and network area. We set up two experimental environments, in which the gas diffused evenly and unevenly, respectively. The leak source was placed at the center of the sensor field, and the leaked gas diffused at a rate of 1 m/s on average. Boundary tracking was performed every 10 s. When carrying out hierarchical SVM training in each learning cluster, a grid search over pairs of C and γ is performed for hyper-parameter tuning. Both C and γ range from $2^{-8}$ to $2^{8}$, with the exponent increased in steps of 1. The remaining simulation parameters and their default values are listed in Table 1.
BTS-COT was compared with VFPOD, TGM-COT and BRTCO, all of which were proposed for continuous object tracking in sensor networks. The performance indicators are the number of boundary nodes, the related contour accuracy (RCA), and the overlap ratio (OR). The number of boundary nodes reflects energy efficiency, while RCA and OR evaluate tracking accuracy. The mathematical definition of related contour accuracy is: where $A_{act}$ denotes the actual area of the continuous object and $A_{est}$ denotes the area of the zone bounded by the tracking boundary.
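The node count and the hyper-parameter grid described above can be written out in a few lines of Python. This is a sketch under assumptions: the side length of the square field is a hypothetical value, and "a step of 1" is read as a step of 1 in the exponent of 2.

```python
def node_count(density, side_length):
    """Number of nodes = node density x area of the square field."""
    return round(density * side_length ** 2)

def hyperparameter_grid(min_exp=-8, max_exp=8):
    """All (C, gamma) pairs with both values in {2^-8, 2^-7, ..., 2^8}."""
    values = [2.0 ** e for e in range(min_exp, max_exp + 1)]
    return [(c, gamma) for c in values for gamma in values]

grid = hyperparameter_grid()
print(node_count(0.2, 100))  # hypothetical 100 m x 100 m field
print(len(grid))             # number of (C, gamma) pairs to cross-validate
```

Each pair in `grid` would be scored by cross-validation inside a learning cluster, and the best-scoring pair kept for the local SVM.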
The mathematical definition of overlap ratio is:
We first simulated an ideal environment in which the gas diffused evenly and there was no risk of node failure. We established Cartesian coordinates on the deployed sensor field; the source of the gas was initially placed at the coordinate origin and then kept expanding. Because of gradient transport, the boundary of the gas took the shape of a circle whose radius grew over time.
Fig. 6 shows snapshots of BTS-COT, VFPOD, TGM-COT and BRTCO in the first quadrant when the diffusion radius of the gas was 30 m. In the first quadrant, the boundary of the gas is shown as a red quarter-circle arc. Black dots are non-event nodes, while red dots are event nodes. It can be observed that VFPOD works in a network partitioned by a Voronoi diagram, TGM-COT works in a network partitioned by uniform grids, and BRTCO works in a network partitioned by Delaunay triangles. For BTS-COT, the full binary tree network partition is applied only to the sensor field covered by continuous objects rather than to the entire network area. The boundary nodes selected by each algorithm are marked by squares.
Fig. 7 shows the number of boundary nodes selected by each algorithm during diffusion. The diffusion radius of the gas was the product of the tracking interval and the mean diffusion rate. It can be observed that, as the gas boundary grows, more boundary nodes are involved in boundary tracking. BTS-COT always introduces the fewest boundary nodes, because support vectors typically account for a small portion of the total samples. By searching for support vectors in a high-dimensional feature space, the impact of the expanding boundary line on the number of boundary nodes is weakened.
In Fig. 8, we used curve fitting of the selected boundary nodes to estimate the boundary line. The similarity between the estimated boundary and the actual boundary is measured by RCA and OR. Fig. 8(a) shows the RCA obtained by each algorithm.
BRTCO and TGM-COT select non-event nodes as boundary nodes, so the area enclosed by the resulting fitted curve is larger than that enclosed by the actual boundary. VFPOD selects event nodes as boundary nodes, so the fitted curve is enclosed by the actual boundary. BTS-COT fits curves probabilistically using support vectors, which consist of both event and non-event nodes. The fitted curves therefore lie between the upper bound and the lower bound determined by the non-event nodes and the event nodes, respectively. As a result, the RCA of BTS-COT is better than that of the other three algorithms.
Fig. 8(b) shows the OR obtained by each algorithm. It can be observed that the OR obtained by BRTCO and TGM-COT is always 1. This is because the boundary estimated by BRTCO and TGM-COT is delimited by non-event nodes, so the actual boundary lies within the estimated boundary. The boundary estimated by VFPOD is bounded by event nodes; thus, the estimated boundary lies within the actual boundary. BTS-COT also shows a high OR when compared with BRTCO and TGM-COT. However, BTS-COT does not exaggerate the area of the continuous object as BRTCO and TGM-COT do.
Fig. 9 depicts the relationship between the number of boundary nodes and the node density. In this experiment, the radius of the leaked gas was set to 30 m. It can be observed that, as the node density increases, the number of boundary nodes selected by each algorithm increases to varying degrees. BTS-COT is the least sensitive to the node density among the algorithms. The reason is that the number of boundary nodes selected by BTS-COT is equal to the number of support vectors. As the node density increases, if all the added nodes are far away from the hyperplane, their Lagrange multipliers are zero and no new support vectors are produced.

Experiment Results in Scene of Non-uniform Diffusion
In a real environment, the leaked gas is susceptible to outside influences and always follows non-uniform diffusion. We use the degree of irregularity (DOI) to evaluate the irregularity of the boundary. In this set of experiments, we set the node density to 0.2. The radius of the leaked gas in different directions follows a Gaussian distribution $N(R, R \cdot DOI)$, where R is the product of the tracking interval and the mean diffusion rate. As the gas spreads, its boundary line suffers serious external interference and becomes more irregular.
Fig. 10 depicts the relationship between DOI and the number of boundary nodes at 30 s. It can be observed that, as DOI increases, the number of boundary nodes selected by BTS-COT increases linearly. The reason is that an irregular boundary corresponds to a hyperplane that requires more support vectors to characterize. For VFPOD, TGM-COT and BRTCO, more location information of boundary nodes is needed to profile a boundary with a high DOI. Thus, a large number of boundary nodes are involved in boundary tracking.
Fig. 11 shows the impact of DOI on tracking accuracy. It can be observed that, as DOI varies, the tracking accuracy of each algorithm remains stable. The RCA achieved by BTS-COT is about 0.9, while the OR achieved by BTS-COT is about 1. The satisfactory performance of BTS-COT is sustained by the increasing number of support vectors.
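One way to generate such an irregular boundary is sketched below in Python. Several modeling choices are assumptions rather than details given in this article: the second parameter of the Gaussian is treated as a standard deviation, the boundary is discretized into a fixed number of angular sectors with a constant radius in each, and the function names are hypothetical.

```python
import math
import random

def irregular_radii(mean_radius, doi, directions=36, seed=0):
    """Draw one radius per direction from a Gaussian centered on mean_radius."""
    rng = random.Random(seed)
    sigma = mean_radius * doi  # assumption: R * DOI is a standard deviation
    return [max(0.0, rng.gauss(mean_radius, sigma)) for _ in range(directions)]

def is_event_node(x, y, radii):
    """True if node (x, y) is inside the boundary; the source is at the origin."""
    angle = math.atan2(y, x) % (2 * math.pi)
    sector = int(angle / (2 * math.pi) * len(radii)) % len(radii)
    return math.hypot(x, y) < radii[sector]

# R = tracking time x mean diffusion rate = 30 s x 1 m/s = 30 m.
radii = irregular_radii(mean_radius=30.0, doi=0.05)
labels = [1 if is_event_node(x, y, radii) else -1
          for x, y in [(10.0, 0.0), (0.0, 60.0)]]
print(labels)
```

Setting `doi` to zero recovers the circular boundary of the uniform diffusion scene, which gives a convenient sanity check for the generator.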

Convergence Speed of the BTS-COT
In this section, numerical experiments are carried out to show the convergence speed of BTS-COT. Specifically, the convergence speed of BTS-COT depends on the number of feedback iterations required by soft-margin SVM training, under the horizontal and the vertical cascade architecture. In this set of experiments, each learning cluster starts with 500 training samples for its local SVM training.
For the case of uniform diffusion, most of the learning clusters train their local SVMs under the horizontal cascade architecture. We vary the number of learning clusters from three to six to build horizontal cascade architectures of different sizes. Table 2 shows that SVM training under the horizontal cascade architecture always converges with one feedback iteration. Compared with standard SVM training, the horizontal cascade architecture reduces the training time by more than 50%, and the reduction becomes more pronounced as the number of training samples increases. For the case of non-uniform diffusion, learning clusters train their local soft-margin SVMs under the horizontal and vertical cascade architectures jointly. Because BTS-COT profiles the boundary area of the continuous object using boundary mapping cells in the outer layers, the number of layers of the vertical cascade architecture remains small. Experimental results show that the vertical cascade architecture usually has two or three layers. As with the horizontal cascade architecture, SVM training under the vertical cascade architecture always converges with one feedback iteration. Compared with standard SVM training, the reduction in training time obtained by the vertical cascade architecture is remarkable, even with only two layers.
The numerical experiments show that BTS-COT converges quickly. This is likely because BTS-COT decomposes the traditional SVM training problem into a number of smaller, independent optimization tasks, whose partial results are combined hierarchically in later stages. All the subproblems are processed in parallel at each step, and non-support-vector data are filtered out sequentially.

Experiment Results in Scene of Node Failure
Besides the influence of the external environment on the network, a high ratio of faulty nodes in the industrial field can also affect the tracking accuracy. The ratio of faulty nodes is defined as the ratio of the number of faulty nodes to the total number of nodes. A faulty node might produce a false positive, which is a high reading indicating that an event occurred when it did not, or a false negative, which is a low reading indicating the absence of an event when one did occur. Lacking fault-tolerant mechanisms, TGM-COT, VFPOD, and BRTCO cannot handle node failures and blindly trust the sensor readings. Consequently, some false boundary nodes will be selected, fabricating a non-existent boundary. The soft-margin SVM follows the structural risk minimization principle and provides good performance in terms of global optimization and generalization ability. Thus, BTS-COT has an inherent ability to eliminate the influence of abnormal sensor readings.
In Fig. 12, we validated the fault-tolerant capability of BTS-COT. We set the average radius of the leaked gas to 30 m, DOI to 0.05, and the node density to 0.2. As shown in Fig. 12, the red solid line denotes the actual boundary, while the blue solid line denotes the tracking boundary. Faulty nodes are marked by green squares. It can be observed that, when the ratio of faulty nodes is no more than 0.09, BTS-COT is able to eliminate the influence of false sensor readings, and the tracking result agrees well with the actual boundary. However, when the ratio of faulty nodes is 0.12, BTS-COT is not robust enough to defend against the faults. As shown in Fig. 12(c), the hyperplane learned by BTS-COT is determined by both real boundary nodes and faulty nodes. When the ratio of faulty nodes reaches 0.15, the real hyperplane is distorted by the high percentage of faulty nodes. These results demonstrate that BTS-COT is fault tolerant in IWSNs in which a small number of nodes report false sensor readings.
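A fault-injection step of this kind can be sketched in Python as follows. The sketch assumes that a faulty node simply reports the opposite binary label (a false positive for a non-event node, a false negative for an event node) and that faulty nodes are chosen uniformly at random; the function name and the label encoding are hypothetical.

```python
import random

def inject_faults(labels, faulty_ratio, seed=0):
    """Flip the labels of a random subset of nodes.

    labels holds +1 (event) or -1 (non-event) per node; faulty_ratio is the
    number of faulty nodes divided by the total number of nodes.
    """
    rng = random.Random(seed)
    n_faulty = round(faulty_ratio * len(labels))
    faulty = set(rng.sample(range(len(labels)), n_faulty))
    noisy = [-y if i in faulty else y for i, y in enumerate(labels)]
    return noisy, faulty

# 100 hypothetical nodes, half inside the gas region, 9% of them faulty.
true_labels = [1] * 50 + [-1] * 50
noisy_labels, faulty_nodes = inject_faults(true_labels, 0.09)
print(sorted(faulty_nodes))
```

Training on `noisy_labels` and comparing the learned boundary with the one obtained from `true_labels` would reproduce the kind of comparison reported for the different faulty-node ratios.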

CONCLUSIONS AND FUTURE WORK
In this article, a boundary tracking algorithm for hazardous continuous objects was proposed for IWSNs. By organizing the network nodes into a full binary tree structured network, the border region of the continuous objects can be profiled by a small number of boundary mapping cells. To study the geometric characteristics of the boundary line in detail, we transformed the boundary tracking problem into a binary classification problem and designed a hierarchical soft-margin SVM training strategy for the binary tree structured IWSNs. Boundary node selection was realized by searching for support vectors in a high-dimensional feature space. Simulation results demonstrated that the proposed algorithm achieves high tracking accuracy with a small number of boundary nodes. Even without specially designed fault-tolerant mechanisms, the algorithm displays an inherent ability to eliminate the influence of spurious sensor readings caused by faulty nodes.
The proposed boundary tracking algorithm is designed for continuous objects with planar motion. Future research will be directed towards studying a variant configuration of the full binary tree structure, which can be applied for boundary mapping in 3D environments.