Diffusion Particle Filtering on the Special Orthogonal Group Using Lie Algebra Statistics

In this letter, we introduce new distributed diffusion algorithms to track a sequence of hidden random matrices that evolve on the special orthogonal group. The algorithms are based on the Adapt-then-Combine and the Random Exchange methods, and diffuse Gaussian approximations of posterior densities computed in the Lie algebra of the special orthogonal group. Simulation results show that, in scenarios with nonlinear observation functions, the proposed algorithms perform closely to the centralized particle filter estimator and can outperform competing Extended Kalman Filters.

In [32], instead, we built Gaussian parametric approximations in tangent spaces to the Stiefel manifold to represent such p.d.f.'s. A different approach to tangent space parametrization for the Stiefel manifold is also found in [4].
In this letter, we consider a state tracking problem in which the hidden state evolves as a random walk [11] on the special orthogonal group, denoted SO(n), i.e., the set of the real orthogonal matrices of dimension n × n with determinant equal to one. We follow a different approach to represent the p.d.f.'s that are transmitted over the network, using Gaussian approximations computed in the Lie algebra of SO(n). The state estimates at each network node, on the other hand, are determined via the averaging algorithm described in [33] that approximates the Karcher mean of the particle set, which, in turn, generalizes to non-Euclidean spaces the notion of minimum mean-square error estimates.
The signal model in this letter can be employed, for example, to track the attitude of a rigid body [14], [15]. The proposed algorithms can deal with arbitrary nonlinearities in the observation function, which could be used to model limitations or defects of the sensing components, encoders, transmitters or receivers. Experimental results show that, by explicitly taking into account that the hidden state matrices evolve on SO(n), the proposed distributed PFs can achieve significant performance gains compared to previous methods.
The remainder of the letter is organized as follows: in Section II-A we present a brief summary of the properties of SO(n). The signal model is then described in Section II-B. We present the proposed ATC and RndEx algorithms in Sections III and IV, respectively. Results of numerical simulations are reported in Section V. Finally, our conclusions are left to Section VI.
Notation: We use regular weight lowercase or uppercase fonts to denote deterministic scalars (e.g., k, λ, Q), lowercase boldface italic fonts to denote deterministic vectors (x) and uppercase boldface italic fonts to denote deterministic matrices (X). Random vectors and matrices are denoted by lowercase (x) and uppercase (X) boldface sans-serif fonts, respectively. The sentence "Sample S (q) ∼ p(S)" means the same as "Draw a real matrix sample S (q) of a random matrix S with p.d.f. p(S)". Similarly "Sample s (q) ∼ p(s) " means that a vector sample s (q) of s is drawn according to p(s).

A. Geometry of SO(n)
The Special Orthogonal Group is a matrix Lie group [34] closed with respect to matrix multiplications and, accordingly, we will denote it in the remainder of this letter as G.
The matrix group G is also a smooth manifold [35]. The tangent space to G at a point M ∈ G, denoted T M G, is the space of n × n real matrices X such that M T X is skew-symmetric, where (·) T denotes the transpose of a matrix. Furthermore, it is possible to map a point S ∈ G into a point X in the tangent space T M G and vice-versa using respectively the logarithmic and exponential maps, which, for an arbitrary tangency point M ∈ G, are given by [33] Log where logm(·) and expm(·) are the matrix logarithm and matrix exponential functions [33], respectively. In particular, making M equal to the identity matrix I, the tangent space T I G is by definition the Lie algebra g of the matrix Lie group G, which, in this case, is the set of all real skewsymmetric matrices of dimension n × n.
If X Log M (S), then the geodesic connecting M and S on G is the curve γ : R → G, such that γ(t) = Exp M (Xt) [36]. Moreover, the geodesic distance d G (M , S) between M and S can be then shown to be ||X|| F [36], where || · || F is the usual matrix Frobenius norm.
It is also possible to move from a point X ∈ T M 1 G to another point in T M 2 G, with M 1 , M 2 ∈ G, using the transport operator [34] T : In particular, if M 2 = I, the transport of X ∈ T M 1 G to the Lie algebra g is simply the skew-symmetric matrix M T 1 X. Finally, since an n × n skew-symmetric matrix has only d n(n − 1)/2 free-varying entries, we can define a bijective mapping Φ(·) that associates each skew-symmetric matrix to a vector in R d , collecting its free entries in an arbitrarily chosen order. The inverse mapping is denoted by Φ −1 (·).

B. Problem Formulation
Let {S k } be a sequence of n × n random matrices that evolve on the Special Orthogonal Group G according to the random walk model [6], [11] where k ∈ N is the time index, {v k } is a sequence of independent, identically distributed Gaussian random vectors taking values in R d with zero mean and covariance matrix λ 2 I, and λ ∈ R is a hyperparameter. Multiple nodes on a sensor network record at each time instant k the observations {Y k,r }, taking values in R n×n , such that where the index r ∈ {1, . . ., R} denotes the r-th node in the network, H k,r : R n×n → R n×n is a (possibly nonlinear) function, and N R n×n is the matrix normal p.d.f. in R n×n [37], whose expression is where Ω r ∈ R n×n and Γ r ∈ R n×n are symmetric, positivedefinite matrices, and etr(·) and | · | denote the exponential of the trace and the determinant of a matrix, respectively. Estimation Objectives Given a realization {Y j,r }, 1 ≤ j ≤ k, 1 ≤ r ≤ R, of the observations {Y j,r }, we want to recursively estimate the value assumed by the matrix state S k in a distributed, cooperative fashion.

III. ATC DIFFUSION PARTICLE FILTER ON THE LIE GROUP (ATC-PF)
Assume that, at instant k − 1, node r has a posterior p.d.f. p k−1|k−1,r (S k−1 ) conditioned 1 on all network measurements that have been assimilated by node r up to instant k − 1. The ATC diffusion filtering method [38] is divided into two steps: Adapt Step We update p k−1|k−1,r top k|k,r , defined as the posterior p.d.f. obtained by assimilating the measurements {Y k,u } available at instant k at all nodes u in N (r), the neighborhood of node r, defined as the set of all nodes connected to r including itself. Assuming that the observation vectors Y k,u are conditionally independent from node to node given the state, we makep If p k−1|k−1,r (S k−1 ) is represented by a weighted particle set {w k,r } forp k|k,r (S k ) using a marginal particle filter [39] as follows: 2) Evaluate the weights with proportionality constant such that Q q=1w (q) k,r = 1. The optimal local state estimateS k|k,r prior to sensor fusion at node r is the intrinsic mean S ∈ G that minimizes E[d 2 G (S, S k )] [36], where the previous expectation is taken with respect top k|k,r . The particle filter approximates the intrinsic mean by the Karcher mean [33], [41] of the weighted particle set {w (q) k,r ,S (q) k,r } on the group G, i.e., and transmits it to nodes u ∈ {N (r) \ {r}}. Combine Step We fuse the local state estimatesS k|k,u , u ∈ N (r) to obtain a new merged estimate at node r. We propose to do it in a principled way that conforms to the geometry of the matrix group G.
Upon receiving the local state estimatesS k|k,u from each node u ∈ {N (r) \ {r}}, node r first computes the Karcher mean where {a r,u } is a set of positive real weights such that u a r,u = 1 for all r. Next, node r executes the following steps: k,r ) ∈ TS k|k,r G, with Log(·) defined as in (1), evaluate the skew-symmetric matrix Z 2) Fit a d-variate Gaussian approximation π k|k,r to the p.d.f.π k|k,r using the Geometric Average (GA) fusion rule [29], [42] π k|k,r (z k ) ∝ using the same network weights a r,u as in (10).

6) Resamplez
where Exp(·) is defined as in (2). 7) Reset the particle weights w k,r }, which represents the final posterior p k|k,r that is propagated to the next time step. The fusion rule in (11) finds the p.d.f.π that minimizes u a r,u D KL (π π k|k,u ), where D KL is the Kullback-Leibler divergence. An extensive theoretical analysis of that and other alternative p.d.f. fusion rules is found in [43] and [44].
Implementation Details Given (1), the points Z (q) k|k,u in the Lie algebra g are obtained directly at each node u as Similarly, from (2), the points S (q) k,r on the manifold G at each node r are computed as On the other hand, since all underlying p.d.f.'s π k|k,u defined in R d are Gaussian, then the merged p.d.f.π k|k,r on the left-hand side of (11) is also Gaussian with fused mean and covariance matrices given by [17] Finally, to obtain the Karcher mean of a weighted particle set {w (q) , S (q) }, we used the iterative steepest descent algorithm in [33]. For numerical stability, the expm(·) and logm(·) functions were implemented using the Rodrigues' Formula (see [45]).

IV. RNDEX DIFFUSION PARTICLE FILTER ON THE LIE GROUP (RNDEX-PF)
Assume that, at instant k − 1, node l has a posterior p.d.f. p(S k−1 |Ỹ 0:k−1,l ) conditioned on all measurementsỸ 0:k−1,l that have been assimilated by node l up to instant k − 1. As explained in [28], [38], in the RndEx diffusion method, node l first exchanges a representation of its posterior p.d.f. with another random node r in the network using a pre-specified random exchange protocol. Node r, upon receiving that representation, updates the received posterior p.d.f. assimilating the measurementsỸ k,r that are available at the nodes u ∈ N (r).
Specifically, let {w k−1,l } be a weighted particle set that represents p(S k−1 |Ỹ 0:k−1,l ) and letŜ k−1|k−1,l be its corresponding Karcher mean. The RndEx-PF executes then the following steps: 1) Compute . . , Q} using sample moment matching, and transmit the parameters to a random node r according to the RndEx protocol.

2) Compute the importance weights
The state estimateŜ k|k,r is then the Karcher mean of the particle set {w

V. SIMULATION RESULTS
To evaluate the performance of the proposed algorithms we ran Monte Carlo simulations consisting of 1,000 independent runs. We used a network with five nodes: nodes 1 to 4 are on the vertices of a square and node 5 is at its center and is connected to all other nodes, as in [26], [27]. The noise covariance matrices were set to Ω r = I and Γ r = I · 10 −α r /10 , with α r equal to 3, 6, 10, 13 and 20 dB for r = 1, . . . , 5, respectively. The weights {a r,u } in (10) were obtained using the Metropolis rule (see [17]). We assumed that n = 3, Q = 200, and λ = 0.15.
The observation function H k,r in (4) was defined as i,j denotes the matrix element with the given indexes and h(·) : R → R a scalar function. We employed in the simulations two formulations for h(·), For comparison, we ran in the same setup the following competing methods: i) A centralized 2 EKF (Euclid.-EKF), based on a vectorized version of the model in (3) and (4): the state evolves according to an adjusted Gaussian random walk in the Euclidean space R n 2 ×1 , and its estimates were projected onto G using their polar factors; ii) A centralized manifold-constrained EKF (MC-EKF), based on [12] and [13]; iii) Particle filters (Isol-PF) that operate in isolation at each node; iv) A centralized particle filter (Joint-PF); v) A modified likelihood consensus distributed PF (LC-PF), which follows the algorithm in [47], but is adapted to the geometry of the Special Orthogonal Group for a fairer comparison.
Methods iii)-v) sample particles from the state transition p.d.f. in G defined by (3), and compute estimates as the Karcher means of their local particle sets as in (9). Fig. 1 displays the simulations results. As shown in the Figure, the ATC-PF has steady-state performance similar to that of the centralized particle filter for both formulations of h(x). On the other hand, the RndEx-PF performs worse than the ATC-PF for the polynomial nonlinearity, but better than the LC-PF. In both cases, the proposed methods outperform the particle filters operating in isolation and the EKFs by a large margin.
For the saturation-type nonlinearity ( Fig. 1(b)), all distributed PFs perform similarly to the centralized PF. The proposed methods have, however, smaller internode communication costs than the LC-PF, as it can be seen by replacing the simulation parameters n = d = 3, k c = 8 [47], and k e ≤ (|N (r) − 1|) in the expressions on the right column of Table I.  In this letter, we introduced new distributed diffusion PF algorithms to track a sequence of hidden random matrices that evolve on the Special Orthogonal Group. The algorithms employ a Gaussian parametric approximation of the weighted particle set defined in an isomorphism of the Lie algebra of the matrix group. Simulation results show that, in setups with different nonlinear observation functions, the performance of the proposed algorithms surpasses or equals that of an alternative distributed PF at a lower communication cost, and can outperform a manifold-constrained centralized EKF by a large margin.