Intrinsic spherical smoothing method based on generalized Bézier curves and sparsity inducing penalization

This study examines an intrinsic penalized smoothing method on the 2-sphere. We propose a method based on the spherical Bézier curves obtained using a generalized de Casteljau algorithm to provide a degree-based regularity constraint to the spherical smoothing problem. A smooth Bézier curve is found by minimizing the least squares criterion under the regularization constraint. The de Casteljau algorithm constructs higher-order Bézier curves in a recursive manner using linear Bézier curves. We introduce a local penalization scheme based on a penalty function that regularizes the velocity differences in consecutive linear Bézier curves. The imposed penalty induces sparsity on the control points so that the proposed method determines the number of control points, or equivalently the order of the Bézier curve, in a data-adaptive way. An efficient Riemannian block coordinate descent algorithm is devised to implement the proposed method. Numerical studies based on real and simulated data are provided to illustrate the performance and properties of the proposed method. The results show that the penalized Bézier curve adapts well to local data trends without compromising overall smoothness.


Introduction
This study considers a regularized curve fitting method on the unit sphere. The spherical Bézier curve combined with a sparsity inducing penalty is introduced to fit a smooth path to a given set of noisy spherical data at known times. Spherical data observed over a continuum arise frequently in various fields including cardiology, computer vision, physiology, and geophysics; see, for example, [11,12,15,26,27] and the references therein.
Previous work on spherical smoothing methods can be broadly categorized into two groups. One way of approaching this problem is to map the data to a tangent space and use the usual smoothing techniques developed for Euclidean data. A seminal work in this direction is the smoothing spline method based on the rolling and wrapping procedures introduced by [12], which fully takes account of the geometric structure unlike previous studies. Kim et al. [14] and Kume et al. [15] extended the method using parallel transport in the context of shape data analysis. The primary advantage of this approach is that it enables researchers to apply well-developed Euclidean methodologies to various smoothing-related problems on general manifolds. However, the mapping between the manifold and tangent space is not direct, as it depends on a base path, which usually requires an iterative algorithm.
Another approach is to formulate an intrinsic optimization problem on the sphere. A smoothing curve is usually obtained by minimizing the sum of the squared Riemannian distance between the curve and data points under certain regularizing constraints. As pointed out by [23], the classical regularity constraint of restricting the curve to the family of polynomial functions cannot be directly applied since the notion of a polynomial is not straightforwardly defined on general manifolds. Instead, variational calculus approaches have been proposed to extend the smoothing spline construction to more general Riemannian manifolds; see, for example, [6,18]. More recently, [23,26] considered a gradient-based method with global regularizing constraints defined as the mean squared velocity or acceleration of the curve, as in [16,17].
Although the concept of algebraic polynomials does not carry over to the sphere, the Bézier curve can be generalized to provide a degree-based regularity constraint to the smoothing problem on manifolds. The Bézier curve has numerous applications in computer graphics, animation, modeling, CAD, and CAGD. It also provides a foundation for the B-spline construction. One may refer to, for example, [10] for an overview. Unlike the algebraic polynomial, the concept of the Bézier curve generalizes to manifolds through a generalization of the classical de Casteljau algorithm [3,8] obtained by replacing line segments with minimal geodesics. In particular, [25] generalized the de Casteljau algorithm to a three-dimensional unit sphere and provided a condition for a uniform C 1 spline, while [7] studied the algorithm in compact Lie groups and spheres. Popiel and Noakes [20,21] investigated the C 2 interpolation problem on Riemannian manifolds based on the generalized de Casteljau algorithm. The performance of the Bézier curve is completely determined by the number and location of the control points. The number of control points coincides with the order of the Bézier curve and their location is crucial for a Bézier curve of a specified order to capture both the local and global trends of the given data.
In this study, we introduce an intrinsic curve fitting method based on the generalized Bézier curve and a local penalization scheme on the unit sphere. A smooth Bézier curve is found by minimizing the least squares criterion under a regularization constraint. To select a control polygon in a data-adaptive way, we propose a penalty function that regularizes the differences in the velocity vectors of consecutive linear Bézier curves that construct the higher-order Bézier curve. The imposed penalty function has the form of the group lasso penalty [31] and this induces sparsity on the control points. As a result, the proposed method determines the number of control points, or equivalently the order of the Bézier curve, in a data-adaptive way. The majority of previous curve fitting methods use certain global smoothness constraints that do not take account of local fluctuations. On the contrary, our method adopts a local penalization scheme that can simultaneously provide a global regularity condition by the determination of the order of the Bézier curve. Therefore, the resulting Bézier curve adapts well to spatially inhomogeneous fluctuations without compromising global smoothness. We devise an efficient Riemannian block coordinate descent algorithm [29] to implement the proposed method. Numerical studies based on simulated data are provided to illustrate the performance and properties of the proposed method. We also apply the method to Triassic and Jurassic polar wander data for North America [13] and elephant seal data that record the daily position of a migrating elephant seal [4]. The obtained paths coincide with the findings of previous research and illustrate that our method provides appropriate orders of Bézier curves in a data-adaptive way.
The remainder of this paper is organized as follows. Section 2 defines the penalized intrinsic spherical Bézier curve and explains the penalization scheme. Section 3 summarizes the Riemannian gradients of the squared loss and velocity difference penalty functions with respect to the control points. An implementation scheme based on a coordinate descent algorithm and a model selection criterion are described in Section 4, followed by numerical studies in Section 5. Section 6 summarizes the results of the study and discusses possible generalizations of the proposed method. The proofs of the technical lemmas and theorems are relegated to the supplementary material.

Penalized intrinsic spherical Bézier curve fitting
Although the ensuing arguments can be readily extended to higher-dimensional scenarios, we focus on the 2-sphere case to illustrate the main idea in a lucid manner with the aid of visualization. Let S denote the 2-sphere. Given two points v and w on S, a linear Bézier curve γ 1 : I = [0, 1] → S is simply a geodesic segment between v and w defined as where θ = d(v, w) arccos(v w) denotes the great circle distance between v and w. Consider now the spherical Bézier curve of an arbitrary order. Assuming the distance between any two consecutive points is less than π, we adopt the generalized de Casteljau construction to define the spherical Bézier curve of an arbitrary order; see [20] and the references therein. Let ξ = (ξ 0 , . . . , ξ J ) be a set of distinct control points on S. We define where j = 2, . . . , J and i = 0, . . . , J − j. Here, ξ l:m , l ≤ m, denotes (ξ l , ξ l+1 , . . . , ξ m ). The curve is called the spherical Bézier curve of degree J (order J + 1) with control points ξ 0 , . . . , ξ J . Figure 1 illustrates the Bézier curves for J = 1, . . . , 4, which correspond to a linear, quadratic, cubic, and quartic Bézier curve in the Euclidean case.
Let (t n , y n ) N n=1 be a set of data, where t n ∈ I are the given time points and y n are the associated data points on S. Our goal is to find a spherical Bézier curve that fits the data well. To this end, we adopt the squared distance loss function: where the parameter space is the (J + 1)-product of S satisfying the aforementioned assumption.
The main difficulty that arises in the polynomial curve fitting problem is determining an appropriate order of the polynomial. To avoid the oversmoothing and undersmoothing problem, we propose a penalty function that enables us to choose the order of the Bézier curve in a data-adaptive way. The order of a Bézier curve is completely determined by the number of control points. In other words, the recursive definition implies that the global smoothness of the curve is characterized by the number of geodesic segments.
We introduce a smoothness penalty function which regularizes the number of geodesic segments needed to fit the data. Here, · above a curve denotes its derivative with respect to t and | · | denotes the Euclidean 2 -norm of a vector.
The elimination process reduces the order of the resulting Bézier curve by one. Figure 2 illustrates the eliminating process that starts with J = 2. The process works in a similar way for higher-order cases. In the limiting case in which λ → ∞, all the velocity differences in the penalty term tend to zero so that only two control points are left. This minimal model provides a linear Bézier curve that fits the given data, which can be viewed as an analog of the least squares line in the Euclidean case. Section 4.2 describes the procedure for finding an optimal complexity parameter λ and determining the order of the PISB curve.

Riemannian derivatives of the objective function
Our implementation scheme is based on a block coordinate-wise gradient descent algorithm that sequentially updates each control point. This section summarizes the Riemannian gradient of the objective function with respect to the control points, which provides the foundation of the proposed algorithm. We first set some notations in the differential calculus. The derivative of g : R n → R m in standard bases is . . .
Example of the elimination process that starts with J = 2. Plot (a) shows the initial Bézier curve (black line) with three initial control points (blue dots) and the corresponding control polygon (blue line). As the complexity parameter λ increases, the difference in the velocity vectors of the two geodesic segments becomes smaller as in Plot (b). When the two segments are aligned on the same great circle, we eliminate a control point, which reduces the order of the Bézier curve as in Plot (c).

The (Euclidean) gradient of a scalar-valued function
Suppose thath is a smooth scalar field defined on a Riemannian manifold M. The Riemannian gradient ofh at x, denoted by grad xh (x), is the unique element of the tangent space of M at x defined through the directional derivative that has the steepest ascending property. Consider now a Riemannian submanifold M of M endowed with the Riemannian metric on the tangent bundle of M. Then, the gradient of h is equal to the projection of the gradient ofh onto the tangent space of M at x so that where P x denotes the projection operation. One may refer to [1] for an overview of the first-order geometry on Riemannian manifolds. In the case of the 2-sphere, we view S as a Riemannian submanifold of R 3 with the inner product inherited from the usual inner product on R 3 . Suppose thatḡ : R 3 → R and g denotes the restriction ofḡ to S. Then, following the above argument, we obtain where ∂ xḡ (x) is the Euclidean gradient ofḡ in the ambient manifold R 3 defined as and P x = I − xx is the projection matrix. Here, I denotes the identity matrix. Hereafter, we identifyḡ with g and suppress the subscripts x and x j when no confusion occurs. All the proofs of the results in this section are collected in Section 1 of the supplementary material.

Squared distance loss
For s ∈ I and z ∈ R, let and denote Define Then, we can re-express the Bézier curves as We first obtain the usual Euclidean derivative of a linear Bézier curve with respect to each endpoint.

Theorem 3.1:
Let for s ∈ I and z ∈ R.
Theorem 3.2 enables us to recursively compute the derivatives of the Bézier curves of an arbitrary order with respect to the control points.
Theorem 3.3 states the Riemannian gradient of the squared distance loss. This provides an update formula for the proposed block coordinate descent algorithm when no smoothness penalty is imposed.

Theorem 3.3: We have
where φ n = d(y n , γ J (t n ; ξ)) and the derivative of the curve γ J (t n ; ξ) is given by the recursive formula in Theorem 3.2.

Velocity difference penalty
To obtain the Riemannian gradient of the velocity difference penalty function, we express it in terms of elementary functions. To this end, we first reconsider the linear Bézier curve. Let Consider now a set of control points ξ 0 , . . . , ξ J . The above relations imply Applying Lemmas 4 and 5 yields the Riemannian gradient of the penalty term with respect to the control points.

Block coordinate descent algorithm
To obtainξ λ for the PISB curve, we use a block coordinate descent method, where each block consists of the three-dimensional coordinates of each control point; see, for example, [29] for the block coordinate algorithm and its standard convergence properties. First, we set the initial values of ξ by J + 1 number of data points y n associated with (j/J)th quantiles of the time points t n for j = 0, 1, . . . , J. Denote the initial values by With this initialization scheme, the proposed method can fail if the given observations on the sphere are separated by a large distance in view of the requirement on the generalized de Casteljau algorithm. However, we believe that this issue is of little practical concern, since it does not make much sense to fit a smooth path to a set of spherical data with drastic fluctuations.
From Theorems 3.3 and 3.4, we can obtain the Riemannian gradient of λ with respect to ξ j in view of (1) and (2). Let ξ (l) j denote the current values of the control point ξ j in the lth iteration. For l = 1, 2, . . ., the update formula has the following form: where ρ l,j denotes a specified step size for the update of ξ j in the lth cycle. Here, Exp u , u ∈ S, denotes the usual exponential map from the tangent space of S at u to S itself; see [9] for more details. As a line search strategy, we adopt the step-halving method to determine the step size ρ l,j to guarantee the stable convergence of the proposed algorithm. When the objective function does not decrease with the current step size, the step size is halved. The step-halving procedure is repeated until the objective function decreases with the reduced step size. The iteration stops when the difference in between the current and the updated objective values of λ is less than = 10 −6 .

Optimal smoothness parameter and order selection
The procedure for finding an optimal order of the PISB curve is as follows. The Bézier curves that minimize the objective function are computed for an increasing sequence of the complexity parameters λ 1 < · · · < λ K , where λ K is sufficiently large so that only two control points are left. As the complexity parameter increases, we eliminate the control points that are no longer active.
In general, the choice of the polynomial order plays a crucial role in the curve fitting problem. We start with a sufficiently high order J to guarantee the flexibility of the model. Since the order of the Bézier curve is determined by the number of control points, the elimination process coincides with the order reduction process. Thus, we run the algorithm to obtain a Bézier curve whose order is adaptively determined by the given set of data.
To choose an optimal complexity parameter, we use a spherical version of the Bayes information criterion (BIC; [24]). The BIC for a sequence of the complexity parameters is defined as where J k denotes the number of control points for λ k . The optimal value for λ is chosen as λk, wherek = argmin 1≤k≤K BIC k .
We can also use the cross-validation (CV) to select an optimal complexity parameter. However, an optimal complexity parameter based on the CV results in overfitting for the PISB, as shown in Section 5.2.

Numerical studies
We present several numerical studies that illustrate the properties and performance of the proposed method. Section 5.1 compares the proposed first-order optimization algorithm with a second-order method in a simplified setup to provide insights into the optimization problem. A simulation study is conducted in Section 5.2 to examine performance of the PISB using a set of data generated from it with noise. In Section 5.3, we apply the proposed method to the Triassic and Jurassic polar wander data for North America to obtain a smooth apparent polar wander (APW) path. Section 5.4 presents an analysis of elephant seal data that consist of 73 daily positions of a migrating female elephant seal.

Discussion on the second-order optimization method
We briefly discuss the second-order optimization method and compare it with the proposed method. For ease of analysis, we first assume J = 1 and N = 1, and denote the control points by v and w. We fix the first control point v and minimize the squared distance loss with respect to w. In other words, we consider the problem of finding a geodesic segment connecting v and a single data point y.
In this case, the Riemannian gradient with respect to w can be immediately obtained from Theorem 3.2. Thus, to implement a second-order method, it suffices to compute the Riemannian Hessian defined as (see [1]) where ∂ 2 w (w) denotes the Euclidean Hessian of the squared distance with respect to w. Theorem 5.1 provides the Riemannian Hessian, with its proof given in Section 2 in the supplementary material.

Theorem 5.1: Denote
For s ∈ [0, 1], let R s (θ ) and R s (θ ) be the first and second derivatives of R s (θ ) with respect to θ and define Then, we have and For the Newton-Raphson update formula, we want to solve The optimization problem is inherently a two-dimensional problem on the tangent space at w, and thus we re-express it with the orthonormal basis E = {ε 1 , ε 2 }, where We denote by the subscript [·] ε the coordinates of a vector in terms of the basis E. Let the matrix for the linear map hess f : and the coordinate vectors of η and grad f be respectivelȳ Then, we have Therefore, solving the spherical normal equation (3) reduces to solving the Euclidean normal equation: hess fη = grad f .
We implement a block coordinate descent algorithm based on the second-order Newton-Raphson method with the step-halving line search method. We first compare the number of outer iterations for the first-order and second-order methods in the geodesic fitting problem with a single data point. Through 100 replicates, we record the number of outermost iterations for both algorithms. The result shows that the average number of iterations is 2 for the first-order method and 10.55 for the second-order method. Second, we generate N = 10 and N = 20 data points from a linear Bézier curve with noise. Through repeated experiments, we find that the second-order method is more sensitive to the initial values. Although it has a relatively stable local convergence property (see also the discussions in [5,19]), it often fails to converge when an initial control point is far away from the true one. For these reasons, we adopt the proposed first-order optimization method explained in Section 4. The derivation of the second-order optimization method in Section 2 of the supplementary material can be readily extended to implement the PISB method or other related optimization problems on the unit sphere. However, this seems to be of little practical importance for the current method, especially because the proposed first-order algorithm works well in practice.

Simulation
In this section, a simulation study is conducted to assess performance of the proposed method. We compare the the proposed method with a spherical smoothing spline (SS) in [12]. The R-code for the PISB is available at https://github.com/Jae-Kyung/PISB-method.git. We consider the three example spherical curves given as • example 1: Bézier curve with J = 1 • example 2: Bézier curve with J = 2 • example 3: C 1 quadratic Bézier spline with four control points.
For construction of the C 1 quadratic Bézier spline, refer to [20]. For each case, we consider N = 50, 100 and generate a set of observations as follows. First, N equispaced time points t 1 , . . . , t N are generated on [0, 1]. We set control points corresponding to each case and denote by γ the curve determined by the corresponding control points. To simulate noisy data, we adopt the von Mises-Fisher distribution on S. For each n = 1, . . . , N, we generate y n from the von Mises-Fisher distribution with the mean direction γ (t n ) and a concentration parameter 5 × 10 2 . We use the rvmf function from the Directional package [28] in R to obtain random samples from the von Mises-Fisher distribution, which uses rejection sampling, as suggested by [30]. To evaluate performance of the proposed method, we compute the mean squared error (MSE) between a underlying curve and a fitted curve at given points. Moreover, we measure the order of the PISB curve to assess whether the proposed method identifies the correct order of a Bézier curve from noisy data.
We start with an initial Bézier curve of order 12 (J = 11). The optimal complexity parameter for the PISB method is selected based on both BIC and CV. Through 100 replicates, we record the MSE and the order of the PISB curve obtained from a set of simulated data (t n , y n ) N n=1 . Tables 1 and 2 summarize the averages and standard errors of MSE and order for each case, respectively. We focus on results for the PISB (BIC) since the PISB (CV) tends to overfit compared with the PISB (BIC). Table 1 shows that the proposed method performs well compared with the SS for cases of example 1 and 3. Figure 3 illustrates the simulation results for N = 50. The underlying curves (black line), typical examples of the PISB (red line) and SS curves (blue line) with the data are displayed, showing that the PISB curves closely approximate the true Bézier curves. Especially, the PISB curve is a geodesic segment in the case of example 1 while the SS curve cannot be a geodesic since it is based on smoothing spline. Moreover, the proposed method results in the smoother curve than the SS method in the case of example 3, by locally controlling the smoothness. Table 2 shows that the proposed method can provide a reasonable estimate of the order of an underlying Bézier curve. In example 1, the proposed method yields geodesic segments for all 100 replications. Our method seems to overestimate the order for the case of example 2. This is because underestimation does not occur in this scenario. Underestimation implies that the obtained PISB curve is a geodesic segment with only two control points, which cannot be selected as an optimal curve since the underlying Bézier curve is quite curved.

Triassic and Jurassic APW path for North America
We consider the polar wander data presented by [13]. They argued that previous estimates of Triassic and Jurassic paleolatitudes for North America tend to be biased because of inclination error in sedimentary rocks. They thus constructed a new composite APW path for Triassic through Paleogene based on igneous rocks, which are not subject to inclination error, and corrected sedimentary results.
In the data presented by [13], the 17 Triassic/Jurassic cratonic poles from other major cratons are rotated into North American coordinates and combined with the 14 observations from North America. Our method is applied to these 31 observations ranging in age from 243 to 144 Ma (millions of years ago), which covers the late Triassic and Jurassic periods. We start with an initial Bézier curve of order 12 (J = 11) and the tuning parameter is selected based on the BIC. A sequence of 30 tuning parameters is generated on the log-scale between λ 1 = 10 −8 and λ 30 = 10.
The APW path chosen by the BIC is a Bézier curve with J = 3. Figure 4 presents the plots of the obtained APW path (red line) and the associated control points (red points) in the geographic coordinates. The path goes from left to right in the plots. It shows that the path has a clockwise rotational trend on the projection map. We see 4 control points are sufficient to capture the local fluctuations of the data. The data point far below the path corresponds to the observation at 243 Ma. Kent and Irving [13] started their APW path at 230 Ma because there are no observations between 243 and 227 Ma. In our result, the proposed algorithm automatically pulls the initial control point toward around the 230 Ma point based on the data. The control polygon appears to be formed more inward than expected. This is due to the fact that the data points are not strictly ordered along the path. One may refer to [13] for more details about the data and implications of the results.

Migration path of elephant seals
We apply the proposed method to the elephant seal data presented in Table 1 of [4]. These data consist of 73 daily positions (by longitude and latitude in degrees) of a migrating female elephant seal. The positions are recorded to keep track of a round trip of the migration. An interesting hypothesis is that elephant seals migrate through a great circle [4]. If that were the case, it would imply that elephant seals possess an ability to assess their positions and constantly adjust their routes to follow the shortest path.
To estimate the path of a migrating elephant seal, we divide this dataset into inbound and outbound trips. The first 39 points represent the outbound trip, while the remainder depict the inbound trip. Figure 5 displays the two sets of data in the longitude and latitude coordinates. We regard the first point of the outbound trip data as the departure point and the 39th point as the migration destination. The same applies to the inbound trip data.  We incorporate this information by fixing the first and last control points at the departure point and the destination of each set of data. We start with an initial Bézier curve of order 12 (J = 11) and the tuning parameter is selected based on the BIC. A sequence of 30 tuning parameters is generated on the log-scale between λ 1 = 10 −8 and λ 30 = 10.
The resulting PISB paths for the two trips are Bézier curves with degree 1, which represent great circle paths. The BIC value is −29.8. Figure 6 displays the obtained path (red line) with the associated control points (red points) and data points for the outbound trip in the geographic coordinates. The left plot shows the path on the globe and the right plot is  We omit the plot of the PISB path for the inbound trip data since it shows an identical trend. The result of our analysis implies that the elephant seal moves toward a destination via the shortest route, which supports the hypothesis that she migrates through a great circle. This implies that such seals are aware of their destination at the time of departure and can adjust their routes in the most efficient way during their migration. The obtained result coincides with the previous findings by [4,22] even though our method is a strict testing procedure.

Concluding remarks
This study investigated an intrinsic smoothing method on the 2-sphere using spherical Bézier curves and sparsity inducing penalization. The proposed method finds a smooth Bézier curve based on a set of spherical data observed at known times whose order is determined in a data-adaptive way using the velocity-based penalization scheme. The Riemannian derivatives of the objective function are obtained in a recursive way to implement the PISB curves of an arbitrary order. The numerical studies show that the local penalization scheme helps detect subtle data trends while regularizing the overall smoothness of the PISB curves using the degree-based constraint. In particular, the estimated APW path and migration path obtained from the polar wander data and elephant seal data illustrate the advantages and promising possibilities of the proposed method in various fields of application.
The results of this study can be extended in several ways. First, [2,7,21] considered the generalization of the de Casteljau algorithm on various Riemannian manifolds. Because the principal scheme of our approach does not depend on spherical geometry, it can be directly generalized to intrinsic smoothing methods on general Riemannian manifolds. For example, it can be extended to the space of positive matrices to analyze data from computer vision and medical imaging such as diffusion tensor imaging. It can also be generalized to higher-order cases to handle shape data. In addition, our study can provide a foundation for the development of penalized spline smoothing methods on Riemannian manifolds. Since generalized Bézier curves can be used to construct splines on Riemannian manifolds [20,21], the differential calculus results obtained in our study carry over to this setup to implement spline-based smoothing methods. Indeed, the velocity difference penalty introduced in this study provides a knot removal condition for C 1 splines. A higher-order variant of the penalty function can be combined with C r , r ≥ 1, generalized spline to induce sparsity in a knot sequence. This approach is expected to provide a flexible intrinsic spline smoothing method whose knots are selected in a data-dependent way.