MM-PHD filter-based sensor control for tracking multiple maneuvering targets hidden in the DBZ

To improve the performance of tracking multiple maneuvering targets hidden in the Doppler blind zone (DBZ), we propose, for the first time, using the sensor control technique to suppress the DBZ masking problem. The approach exploits the principle that the absolute Doppler of a target with respect to a sensor is affected by the target-to-sensor relative geometry, and extends the multiple model probability hypothesis density (MM-PHD) filter for DBZ masking to the partially observable Markov decision process (POMDP) framework. First, the process flow of sensor control is systematically constructed based on our existing work. Second, in the core sensor controller module, we devise three objective functions (a new safety indicator ensuring sensor safety, a novel reward rule for the DBZ avoidance, and the Cauchy-Schwarz divergence (CSD) compatible with multi-maneuvering-target tracking) and a decision-making logic for the selection of control commands. Finally, the feasibility and effectiveness of the proposed control scheme are verified through numerical examples, which show that it clearly outperforms both the random control strategy and the earlier work that does not use sensor control.


I. INTRODUCTION
For a ground moving target indication (GMTI) radar, the existence of the Doppler blind zone (DBZ) is a major challenge [1]. The DBZ arises from the clutter cancellation used to separate moving target returns from the clutter background. When targets fall into the DBZ, the sensor cannot obtain measurements originating from them. The resulting series of missed detections leads to many short, fragmented trajectories, makes track labels switch frequently, and dramatically degrades the tracking performance [2].
In [30] and [31], the PHD and CPHD filters were applied to GMTI tracking, respectively, by modeling the effect of Doppler blindness with a state-dependent detection probability. However, these references did not provide a rigorous derivation. Moreover, they only utilized the minimum detectable velocity (MDV) knowledge [32, 33] while ignoring the Doppler information. Hence, we proposed a PHD-based filter that exploits both the MDV and the Doppler information in [20]. However, the above works cannot be applied to the case of maneuvering targets. Multi-maneuvering-target tracking (MMTT) has long been a core and difficult problem in the field of target tracking. When target maneuvering is coupled with the DBZ problem, the tracking task becomes very challenging.
Based on our previous work [20], the more complicated MMTT in the presence of the DBZ was first investigated in [34] via the multiple model (MM) PHD filter [35]. Nevertheless, the proposed MM-PHD_DBZ tracker is based on the PHD filter, which is only a first-order moment approximation of the multi-target filter and cannot, in principle, output track labels. To improve the tracking performance, we further proposed an MM-GLMB_DBZ filter in [36] based on the multiple model generalized labeled multi-Bernoulli (MM-GLMB) filter [37-39]. However, in the above works, including ours, the sensor trajectory is preset. To the best of our knowledge, the sensor control technique has never been used to suppress the influence of the DBZ. Note that many random finite set (RFS) control approaches have been extended to the partially observable Markov decision process (POMDP) framework to jointly solve the multi-target tracking (MTT) and sensor control problems [40], such as the PHD-POMDP [41], MB-POMDP [42-48], LMB-POMDP [49-57], and GLMB-POMDP [58]. However, these algorithms are not applicable to the DBZ masking case and are not suitable for multiple maneuvering targets. In response to these problems, this paper proposes an online sensor control algorithm for tracking multiple maneuvering targets hidden in the DBZ. This is accomplished by formulating the problem as a POMDP with the MM-PHD_DBZ tracker [34]. The underlying idea can be briefly described as follows: the MM-PHD_DBZ tracker is used to track multiple maneuvering targets masked by the DBZ, while sensor control is exploited to suppress the DBZ, since the radial velocity of a target with respect to a sensor is affected by the relative geometry between the target and the sensor. Concretely, the main contributions of this work are as follows.
1) We put forward the idea of using the sensor control technique to suppress the DBZ masking problem and, for the first time, verify its feasibility and effectiveness by simulation.
2) We systematically construct a sensor control processing flow diagram for the GMTI MMTT by combining the sensor control technique and the MM-PHD_DBZ tracker.
3) We devise a new safety indicator that is simple and easy to implement.
4) We design a novel reward rule for the DBZ avoidance, based on a newly defined concept, the absolute Doppler (see Section II.A), and its change rate.
5) We derive the Cauchy-Schwarz divergence (CSD) compatible with the MM-PHD_DBZ tracker. In other words, the existing CSD for a single-model system is extended to the MM version.
6) We present a decision-making logic for the selection of control commands, given the above three objective functions, i.e., the safety indicator, the reward for the DBZ avoidance and the CSD for the MM-PHD_DBZ tracker.
The remainder of the paper is organized as follows. Section II provides the necessary background including problem statement, MM-PHD_DBZ tracker and POMDP. Section III presents the sensor control algorithm. Section IV provides numerical results to illustrate the effectiveness of the proposed algorithm. Section V reports concluding remarks.

II. BACKGROUND
To jointly solve the sensor control and multi-target tracking problems, we formulate the sensor control problem under the POMDP framework in conjunction with an MM-PHD_DBZ tracker [34]. Hence, this section provides an overview of: i) the problem formulation for GMTI tracking; ii) the MM-PHD_DBZ tracker; and iii) the POMDP framework.

A. Problem formulation
Consider an MMTT scenario with a GMTI radar. Suppose that at time $k-1$ there are $|X|$ targets. In the measurement model, $r$, $a$, $e$ and $\dot{r}$ denote the true range, azimuth, elevation and Doppler of the target with respect to the sensor, respectively. In this paper, $\dot{r}$ is called the relative Doppler, since the velocities of both the sensor and the target are taken into account in (5). Its counterpart that accounts for the target velocity only is given in (6), and the absolute Doppler is stated in (7). The word "absolute" in absolute Doppler has two meanings. First, it distinguishes the quantity from the relative Doppler $\dot{r}$ in (5), since only the target velocity is taken into account in (6), while both the target velocity and the sensor velocity are incorporated in (5). Second, the absolute value symbol $|\cdot|$ is included in its definition. The clutter notch not only suppresses clutter but also affects the detection of low-Doppler targets. Hence, a low-Doppler target, whose absolute Doppler magnitude falls below the MDV, can easily be masked by the DBZ. This results in a series of missed detections and significantly deteriorates the tracking performance.
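The geometric dependence described above can be illustrated with a minimal 2-D sketch. Equations (5)-(7) are not reproduced in this text, so the functions below only assume the usual line-of-sight projection: the relative Doppler projects the target-minus-sensor velocity onto the line of sight, while the absolute Doppler projects the target velocity alone and takes the magnitude. The function names and the `mdv` argument are illustrative, not the paper's notation.

```python
import math

def relative_doppler(target_pos, target_vel, sensor_pos, sensor_vel):
    """Range rate seen by the sensor (assumed form): both velocities are used."""
    dx = target_pos[0] - sensor_pos[0]
    dy = target_pos[1] - sensor_pos[1]
    rng = math.hypot(dx, dy)
    dvx = target_vel[0] - sensor_vel[0]
    dvy = target_vel[1] - sensor_vel[1]
    return (dx * dvx + dy * dvy) / rng

def absolute_doppler(target_pos, target_vel, sensor_pos):
    """Magnitude of the target velocity projected on the line of sight (assumed form)."""
    dx = target_pos[0] - sensor_pos[0]
    dy = target_pos[1] - sensor_pos[1]
    rng = math.hypot(dx, dy)
    return abs(dx * target_vel[0] + dy * target_vel[1]) / rng

def in_dbz(target_pos, target_vel, sensor_pos, mdv):
    """A target is treated as masked when its absolute Doppler is below the MDV."""
    return absolute_doppler(target_pos, target_vel, sensor_pos) < mdv

# Same target, two sensor positions: tangential motion is masked, radial motion is not.
target_pos, target_vel = (1000.0, 0.0), (0.0, 10.0)
print(in_dbz(target_pos, target_vel, (0.0, 0.0), mdv=1.0))
print(in_dbz(target_pos, target_vel, (1000.0, -1000.0), mdv=1.0))
```

In this toy case the target does not change its motion at all; only the sensor position differs, which is the effect the sensor control scheme relies on.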

B. Summary of the MM-PHD_DBZ tracker
To track multiple maneuvering targets hidden in the DBZ, we have proposed the MM-PHD_DBZ tracker in [34]. For the sake of completeness, we present a brief overview of this tracker, which is composed of prediction, update, multi-target state estimation (MSE), and ATI, as depicted in Fig. 1.

[Fig. 1: block diagram of the MM-PHD_DBZ tracker; recoverable block labels: "MM-PHD_DBZ update", "Multi-target state estimation"]
The outline of the tracker is as follows. At time $k-1$, the prior intensity (comprising the posterior intensity $v_{k-1}$ and the birth intensity) is returned by the tracker. At the next time $k$, the prior from the previous step is processed through the prediction step of the MM-PHD_DBZ tracker, resulting in the predicted intensity $v_{k|k-1}$, which has the Gaussian mixture (GM) form (8). In (8), the two arguments of the intensity are the kinematic state and the mode state, respectively, $\mathcal{N}(\cdot; m, P)$ denotes a Gaussian density with mean $m$ and covariance $P$, and $J$, $w$, $m$ and $P$ are the associated parameters of the GM intensity.
The predicted intensity, together with the sensor measurement set $Z$, is fed to the MM-PHD_DBZ update block, which outputs the updated intensity $v_k$; this intensity also has a GM form. The updated posterior $v_k$ is further post-processed (e.g., the "pruning-merging" or resampling procedure is used for the GM or Sequential Monte Carlo implementation, respectively). The resulting posterior is then used as the surviving prior in the next time step. Moreover, it is fed to the MSE block, which provides the tracker's output, i.e., the multi-target state estimate $\hat{X}_k$. Meanwhile, the output $\hat{X}_k$ is used to remove measurements located near $\hat{X}_k$ and to select possible measurements originating from birth targets, which realizes measurement-driven ATI. The ATI block outputs the birth intensity, which is used as the birth prior at the next time step. To avoid repetition, see [34] for more details of prediction, update, MSE, and ATI.

C. POMDP
Based on the MM-PHD_DBZ tracker, we employ the sensor control technique for suppressing the influence of the DBZ and further improving the tracking performance. The sensor control problem is formulated in a POMDP framework [60] [61].
The POMDP used in this work comprises the following four groups of elements at any time $k$.
1) Elements related to multiple maneuvering targets include:
• $v_{k|k-1}(\cdot,\cdot)$: the predicted intensity to time $k$, which comes from the prediction step of the tracker;
• $v_k(\cdot,\cdot)$: the posterior intensity at time $k$, given by the update step of the tracker;
• the birth intensity at time $k$, obtained from the ATI step of the tracker;
• $\hat{X}_k$: the multi-target state estimate, i.e., the output of the tracker.
Remark: the above elements are associated with the multi-target filter in essence.
4) Elements related to the controller include:
• the action space: a discrete space of allowable sensor control actions (commands) $a$. In the POMDP framework, the action space is infinite and continuous in principle. However, this space is assumed to be discrete to reduce computational complexity;
• $H \ge 1$: the length of the control horizon. Generally, implementing sensor control with an infinite horizon for a problem with a large state space is prohibitively expensive. Hence, the implementation in this paper follows a myopic policy with one-step-ahead planning. The term "myopic" means that only one control action is decided at a time, rather than planning multiple actions into the future, although the sequence of actions is devised over $H$ look-ahead steps;
• the objective function: a real-valued function obtained by applying an action command $a$, which can be the safety indicator, the reward for the DBZ avoidance, or the CSD in Section III.B. These functions are used to measure the quality of control actions.
In the POMDP framework, the goal of sensor control is to find the executing control command $a_{exe}$ that satisfies and optimizes the objective functions (reward or cost).

III. SENSOR CONTROL
The problem we focus on here is online sensor control for tracking multiple maneuvering targets hidden in the DBZ, in which the sensor is controlled to improve the tracking performance. We first give an overview of the overall flow diagram of the proposed approach. With the big picture in mind, we then present the details of the flow diagram.

A. Overall flow diagram
Assume that an MM-PHD_DBZ tracker, summarized in Section II, is running in a controlled sensor, and that the sensor can be controlled via the control command $a$. Fig. 2 shows the block diagram of the MMTT system in the presence of the DBZ via sensor control. The key difference between Fig. 2 and Fig. 1 is the incorporation of an additional "Controller" module, which is highlighted in gray in Fig. 2 and whose details are provided in Fig. 3.
In Fig. 2, the predicted ideal measurement set (PIMS) $Z(a)$ is a future hypothetical measurement set that would be observed at time $k+h$ if control action $a$ were executed. The hypothetical PIMS is then taken as the actual measurement set. The term "ideal" means that the PIMS is clutter-free and noise-free. Although all possible future measurement sets, including measurement noise, clutter, and missed detections, could be considered when computing the following objective functions, averaging over the whole measurement set distribution is computationally expensive. Hence, the lower-cost PIMS approach is used. Using the PIMS associated with a control command, the MM-PHD_DBZ update step is performed. Since the PIMS is a hypothetical measurement set, this step is called the pseudo update.
Fig. 3. The detailed contents of the controller block in Fig. 2.
The resulting pseudo-updated intensity is used to calculate the appropriate objective functions. After all objective function values have been computed for all possible sensor commands, they are processed by a decision-making module, which outputs the control command $a_{exe}$ to be executed. The selected command is applied to the sensor (for instance, the sensor is displaced or rotated according to the chosen action command). The sensor state then changes accordingly and the sensor acquires the real measurements $Z$. Using these measurements, the predicted intensity $v_{k|k-1}$ is updated. The updated posterior $v_k$ is further processed by the MSE and ATI modules, which are the same as those in Fig. 1.
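As a rough illustration of the PIMS idea, the sketch below generates one noise-free, clutter-free measurement per predicted target for a hypothesized future sensor state. The 2-D measurement vector (range, azimuth, range rate) and the state layouts are assumptions made for this example; the paper's measurement model also includes elevation and is not reproduced in this text.

```python
import math

def ideal_measurement(target, sensor):
    """Noise-free measurement of one target.

    target and sensor are (x, y, vx, vy) tuples; this layout is an assumption.
    Returns (range, azimuth, range_rate).
    """
    x, y, vx, vy = target
    sx, sy, svx, svy = sensor
    dx, dy = x - sx, y - sy
    rng = math.hypot(dx, dy)
    azimuth = math.atan2(dy, dx)
    range_rate = (dx * (vx - svx) + dy * (vy - svy)) / rng
    return (rng, azimuth, range_rate)

def pims(predicted_targets, future_sensor):
    """One ideal measurement per predicted target: no clutter, no noise, no misses."""
    return [ideal_measurement(t, future_sensor) for t in predicted_targets]

# Hypothetical predicted estimates and a hypothesized sensor state under some action a.
predicted = [(3.0, 4.0, 1.0, 0.0), (0.0, 10.0, 0.0, -2.0)]
print(pims(predicted, (0.0, 0.0, 0.0, 0.0)))
```

One such set would be generated per candidate command, and each set would then drive a pseudo update of the tracker.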

B. Designing objective functions
The core of the Controller module in Fig.3 is the calculation of objective functions. Indeed, the proper choice of meaningful and computationally tractable objective functions is a critical part of the control design task. Here, we devise three objective functions including a safety indicator, a reward for the DBZ avoidance and a CSD compatible with the MM-PHD_DBZ tracker.

1) Safety indicator
Generally speaking, the sensor needs to maintain a safe distance from tracked targets, although getting close to the targets of interest helps to improve tracking accuracy. The notion of void probability is usually employed to meet the safety requirement [51] [58]. The void probability of an exclusion region is the probability that no object exists in that region. However, in general, a closed form may not exist, so it must be computed using numerical methods such as cubature or Monte Carlo integration. For this reason, this section devises another safety indicator that is simple and easy to implement. The indicator is evaluated from the distance between each estimated target and the future sensor state $s^{(a)}_{k+h}$ given by (13).

2) The reward for the DBZ avoidance
The reward design for avoiding DBZ masking is inspired by two observations. First, for a target outside the DBZ (e.g., target 1 or target 2 in Fig. 4), if its absolute Doppler is increasing, then the target will not fall inside the DBZ in the following periods. This is the most desirable case, so we give it the highest score, for example, 2 points.
Second, the first case may be difficult to satisfy, and the absolute Doppler may be decreasing and approaching the MDV (see target 3 or target 4 in Fig. 4). Nevertheless, the rate of decrease is so small that the absolute Doppler values over the future $H$ time steps are still outside the DBZ. This case basically meets the need of avoiding the DBZ, so we give it a lower score, e.g., 1 point. In other words, the first case is preferred over the second.
Then, each single-target estimate is assigned a score according to these two cases. Finally, we obtain the total score over all of the estimated targets, which is also called the reward for the DBZ avoidance.
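The two-case scoring rule described above can be sketched as follows. The score values 2 and 1 are the examples given in the text; the score of 0 for all remaining cases, the strict-increase test, and the input format (a sequence holding the current absolute Doppler followed by its predicted values over the look-ahead steps) are assumptions made for this illustration, since the scoring equations are not reproduced here.

```python
def target_score(doppler_seq, mdv):
    """Score one estimated target.

    doppler_seq[0] is the current absolute Doppler and doppler_seq[1:] are the
    predicted values over the H look-ahead steps (assumed input format).
    """
    if min(doppler_seq) < mdv:
        return 0  # inside the DBZ now or within the horizon (assumed score)
    increasing = all(b > a for a, b in zip(doppler_seq, doppler_seq[1:]))
    if increasing:
        return 2  # case 1: outside the DBZ and moving away from it
    return 1      # case 2: not increasing, but still outside over the horizon

def dbz_reward(doppler_seqs, mdv):
    """Total score over all estimated targets for one candidate action."""
    return sum(target_score(seq, mdv) for seq in doppler_seqs)

# Three hypothetical targets with H = 1 and an MDV of 2 m/s.
print(dbz_reward([[5.0, 6.0], [5.0, 4.0], [3.0, 1.0]], mdv=2.0))
```

A sequence that stays outside the DBZ without being monotonic also receives 1 point in this sketch, which is a simplification.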

3) Extended Cauchy-Schwarz divergence
To improve tracking accuracy, the information divergence between the predicted intensity and the posterior intensity can be used as the third objective function. The rationale is that the information divergence quantifies the expected information gain from predicted intensity to posterior intensity, and the latter is equally or more informative than the former [62] [63]. Hence, a larger information divergence often results in a more informative posterior intensity, and consequently, yields better estimation results.
Compared with the commonly used Kullback-Leibler and Renyi divergences [64], the CSD [65] [58] has a closed-form analytical expression for GM intensities and hence leads to a more efficient implementation. Thus, the CSD is adopted in this work. The CSD between the predicted intensity and the posterior intensity is given by (21), where $K$ denotes the unit of hyper-volume in the state space, and the dependence of the intensities on the control action $a$ is omitted for clarity whenever no confusion arises.
It should be noted that the existing CSD formula [65] is for a single-model system, so it is not suitable for the case of multiple maneuvering targets. We therefore extend it to a multi-model system.
The standard inner product is usually defined by $\langle f, g \rangle = \int f(x)\, g(x)\, dx$. Then, substituting the predicted intensity (23) and the posterior intensity (24) into (21), we obtain the closed-form inner products (25)-(27). Finally, substituting the results (25)-(27) into (21) yields the CSD for the MM-PHD_DBZ tracker. Note that the resulting CSD incorporates the mode variable. Hence, it is an extension of (9) in [65].
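A minimal numerical sketch of a multi-model CSD for GM intensities is given below. It assumes the Poisson-point-process form of the CSD, i.e., $K/2$ times the squared $L_2$ distance between the two intensities, with the inner product taken as an integral over the kinematic state and a sum over modes, and it uses the Gaussian product identity $\int \mathcal{N}(x; m_1, P_1)\,\mathcal{N}(x; m_2, P_2)\,dx = \mathcal{N}(m_1; m_2, P_1 + P_2)$. The dictionary layout (mode label mapped to a list of `(w, m, P)` components) is hypothetical, and the paper's equations (21)-(27) may differ in detail.

```python
import numpy as np

def gaussian(x, mean, cov):
    """Value of the Gaussian density N(x; mean, cov)."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    norm = np.sqrt((2.0 * np.pi) ** d.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

def inner_product(v1, v2):
    """<v1, v2>: integral over the kinematic state, sum over shared modes."""
    total = 0.0
    for mode in v1.keys() & v2.keys():
        for w1, m1, P1 in v1[mode]:
            for w2, m2, P2 in v2[mode]:
                cov = np.asarray(P1, dtype=float) + np.asarray(P2, dtype=float)
                total += w1 * w2 * gaussian(m1, m2, cov)
    return total

def mm_csd(v_pred, v_post, K=1.0):
    """Assumed form: (K / 2) * ||v_pred - v_post||^2 expanded into inner products."""
    return 0.5 * K * (inner_product(v_pred, v_pred)
                      + inner_product(v_post, v_post)
                      - 2.0 * inner_product(v_pred, v_post))

# One-dimensional toy intensities with a single mode labelled 1.
v_pred = {1: [(1.0, [0.0], [[1.0]])]}
v_post = {1: [(1.0, [1.0], [[0.5]])]}
print(mm_csd(v_pred, v_post))
```

The triple loop grows with the product of the component counts in each mode, which is consistent with the observation that the CSD dominates the running time of the control algorithm.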
Remark: we find that, in the sensor control algorithm, the calculation of the CSD consumes most of the time; see the original CSD in (21) and the expression in (29).

C. Decision-making
Given these objective functions, the decision-making module outputs the executing action $a_{exe}$ that satisfies an optimality condition in terms of the multi-objective function. To this end, we present a simple heuristic method.
The decision-making logic is depicted in Fig. 5.
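Since the details of Fig. 5 are not reproduced in this text, the following is only one plausible heuristic, not the authors' logic: keep the commands that pass the safety check, prefer the largest DBZ-avoidance reward among them, and break ties with the largest CSD. The dictionary keys and the fallback to all candidates when no command is safe are assumptions made for this sketch.

```python
def select_action(candidates):
    """Pick a command from a list of evaluated candidates.

    Each candidate is a dict with keys 'action', 'safe' (bool),
    'reward' (DBZ-avoidance score) and 'csd' (information divergence).
    The ordering safety > reward > CSD is an assumed priority.
    """
    safe = [c for c in candidates if c["safe"]]
    pool = safe if safe else candidates  # assumed fallback when nothing is safe
    best = max(pool, key=lambda c: (c["reward"], c["csd"]))
    return best["action"]

# Hypothetical turn-rate commands (rad/s) with their objective values.
candidates = [
    {"action": -0.1, "safe": True, "reward": 3, "csd": 0.20},
    {"action": 0.0, "safe": True, "reward": 3, "csd": 0.35},
    {"action": 0.1, "safe": False, "reward": 4, "csd": 0.90},
]
print(select_action(candidates))
```

In this example the unsafe command is discarded even though it has the highest reward, and the tie between the two safe commands is resolved by the CSD.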

IV. SIMULATION EXPERIMENTS
This section validates and evaluates the proposed control approach against the random control scheme and the existing MM-PHD_DBZ tracker without sensor control [34] through 100 Monte Carlo runs. They are compared using a performance metric called the optimal sub-pattern assignment (OSPA) for tracks (OSPA-T) [67], which includes the labeling error as well as the cardinality and localization errors. The order and cut-off (see [68] for the meaning of these parameters) are set to $p = 1$ and $c = 50$ m, and an additional metric parameter is set to 50 m (see eq. (7) in [67] for the meaning of this parameter).

A. Experimental Settings
For illustration purposes, we consider two scenarios where single or multiple maneuvering targets are tracked with measurements from a controlled and moving GMTI sensor in the 2-dimensional x-y plane.
The test scenarios contain 3 targets or one of them (e.g., target 1 in Table 1). Each target has survival probability $p_S = 0.99$. It is assumed that each target is moving and that the mode space includes three types of motion models: constant velocity, right turn (coordinated turn with a 3° angle), and left turn (coordinated turn with a −3° angle), which can be expressed by (30) with the state transition matrix $F(\omega)$, where the target state is $[x, y, \dot{x}, \dot{y}]^T$. The state transition matrices for these three models are obtained by substituting ω = 0 (for mode 1), ω = 3π/180 (for mode 2) and ω = −3π/180 (for mode 3), respectively, into $F(\omega)$. The safe distance for the safety indicator calculation is $r_{\min} = 500$ m, and the parameter for the reward calculation is set to 1. When the control strategy is preset (i.e., no control), the sensor moves in a straight line, i.e., the default action is an angular (turn) rate of 0 rad/s. When the random control scheme is adopted, the angular rate is randomly selected from the action space.
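The three motion models above can be sketched with the standard coordinated-turn transition matrix for the state ordering $[x, y, \dot{x}, \dot{y}]^T$. Equation (30) is not reproduced in this text, so the matrix below is the textbook form rather than a copy of the paper's; the sampling period `T` is a hypothetical parameter, and the sign convention (positive ω rotates the velocity counterclockwise) is an assumption that may differ from the paper's right-turn and left-turn labels.

```python
import numpy as np

def transition_matrix(omega, T=1.0):
    """Coordinated-turn transition matrix for state [x, y, vx, vy].

    omega is the turn rate in rad/s; omega = 0 gives the constant-velocity model.
    """
    if abs(omega) < 1e-12:
        return np.array([[1.0, 0.0, T, 0.0],
                         [0.0, 1.0, 0.0, T],
                         [0.0, 0.0, 1.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])
    s, c = np.sin(omega * T), np.cos(omega * T)
    return np.array([[1.0, 0.0, s / omega, -(1.0 - c) / omega],
                     [0.0, 1.0, (1.0 - c) / omega, s / omega],
                     [0.0, 0.0, c, -s],
                     [0.0, 0.0, s, c]])

# The three modes used in the scenarios: omega = 0, 3*pi/180 and -3*pi/180.
modes = {1: 0.0, 2: 3.0 * np.pi / 180.0, 3: -3.0 * np.pi / 180.0}
state = np.array([0.0, 0.0, 10.0, 0.0])
for mode, omega in modes.items():
    print(mode, transition_matrix(omega) @ state)
```

The turn models rotate the velocity vector without changing its magnitude, so speed is preserved across all three modes.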
The sensor's detection probability is 0.98. Clutter is assumed to be uniformly distributed in the detection region, and its cardinality distribution is Poisson with mean 10. The parameters used in the MM-PHD_DBZ tracker are exactly the same as those in [34]; see [34] for details.

B. Experiments and Results
In the first experiment, the test scenario involves relatively simple non-maneuvering single-target tracking for easy elaboration and understanding. In the second experiment, the test scenario involves more complicated MMTT to demonstrate the effectiveness of the proposed sensor control.

1) Scenario 1: single non-maneuvering target tracking
In Scenario 1, only target 1 in Table 1 is tracked. For this scenario, the sensor trajectories produced by the different control schemes over 100 Monte Carlo runs are shown in Fig. 6, the true relative distance of the target with respect to the sensor and the corresponding absolute Doppler of the target are depicted in Fig. 7 and Fig. 8, respectively, and the performance comparison of the different schemes is provided in Fig. 9. Under the no-control strategy, the target is inside the DBZ during 39s-58s (see Fig. 8). Hence, the total OSPA and the OSPA cardinality (Card) component approach the cut-off value of 50 m over 39s-58s due to missed detections caused by the DBZ masking (see Fig. 9). Note that, since the sensor follows the preset trajectory, the results (e.g., the sensor trajectories in Fig. 6, the relative distances in Fig. 7 and the absolute Doppler in Fig. 8) coincide across runs. Under the random control strategy, the relative distance sometimes falls below the safe distance (see Fig. 7, where the safe distance is represented by the red line), and the period during which the target is inside the DBZ is also uncertain (see Fig. 8). This is not surprising, since the random scheme takes neither the DBZ masking problem nor the safety factor into account. As a result, its performance is unsatisfactory (see Fig. 9). In contrast, under the proposed control scheme, the sensor successfully keeps the target out of the DBZ (Fig. 8) while ensuring its own safety (Fig. 7). Therefore, the proposed scheme obtains a significant performance gain over the other control schemes, as indicated by the lower OSPA distance (especially during 39s-58s) in Fig. 9.

2) Scenario 2: multiple maneuvering target tracking
The second scenario contains all targets (the non-maneuvering target and the maneuvering targets) in Table 1. For Scenario 2, the scenario geometry, relative distances, absolute Doppler curves, and performance comparison are given in Fig. 10, Fig. 11, Fig. 12, and Fig. 13, respectively. Conclusions similar to those of Scenario 1 can be drawn for Scenario 2. In summary, the results verify that the proposed control scheme can significantly improve the tracking performance for multiple maneuvering targets by avoiding the DBZ while satisfying the safety constraint.
Discussion: it should be noted that, for our control problem, maintaining a safe distance, avoiding the DBZ and improving the tracking performance are the three objectives driving the control. For tracking a single non-maneuvering or maneuvering target, conflicts between control actions come mainly from these objective functions. In the case of multiple targets, however, conflicts arise not only between objective functions but also between targets. The goal of the control is to resolve these conflicts by providing a decision that optimizes the multi-target estimation performance in an overall sense. In other words, the proposed control treats all targets without distinction. Since the absolute Doppler is affected by the relative geometry of a target and a sensor, keeping one target out of the DBZ may come at the expense of another target falling into it. Hence, it is to be expected that not all targets can meet the safe-distance requirement, and especially the DBZ-avoidance requirement, in the presence of multiple conflicting targets. One possible direction for future research is sensor control for selective target tracking [55,56,57], because different targets have different priorities for a sensor. For instance, the closer a target is, the higher its priority and the more important it is to keep it out of the DBZ to ensure good tracking performance. Also, because the absolute Doppler is affected by the target-to-sensor geometry, another possible direction is multi-sensor fusion, in which sensors at different locations collaborate to eliminate the DBZ and maintain continuous, uninterrupted tracking of targets.

V. CONCLUSIONS
In this paper, a sensor control algorithm for tracking multiple maneuvering targets hidden in the DBZ is proposed to further improve the tracking performance. The proposed algorithm is developed under the POMDP framework in conjunction with our previous MM-PHD_DBZ tracker. We first construct an overall flow diagram, in which the Controller block is the core module. We then design three objective functions, namely a safety indicator that is simple and easy to implement, a novel reward rule for avoiding the DBZ, and an extended CSD for the MMTT, and present the decision-making logic. Simulation results show that the proposed approach improves the tracking performance through DBZ avoidance and CSD guidance while ensuring sensor safety, and that it significantly outperforms the existing work with no control, which in turn is superior to the random control scheme. To further suppress the DBZ masking problem, future work will focus on sensor control for selective multi-target tracking and on multi-sensor fusion techniques.