Bayesian Framework for Simultaneous Registration and Estimation of Noisy, Sparse and Fragmented Functional Data

In many applications, smooth processes generate data that is recorded under a variety of observation regimes, such as dense, sparse or fragmented observations that are often contaminated with error. The statistical goal of registering and estimating the individual underlying functions from discrete observations has thus far been mainly approached sequentially without formal uncertainty propagation, or in an application-specific manner. We propose a unified Bayesian framework for simultaneous registration and estimation, which is flexible enough to accommodate inference on individual functions under general observation regimes. Our ability to do this relies on the specification of strongly informative prior models over the amplitude component of function variability. We provide two strategies for this critical choice: a data-driven approach that defines an empirical basis for the amplitude subspace based on training data, and a shape-restricted approach when the relative location and number of local extrema is well-understood. The proposed methods build on elastic functional data analysis, which separately models amplitude and phase variability inherent in functional data. We emphasize the importance of uncertainty quantification and visualization of these two components as they provide complementary information about the estimated functions. We validate the framework using simulations and real applications to medical imaging and biometrics.


Introduction
Functional Data Analysis (FDA) is an area of statistics in which the primary objects of interest are more naturally understood as functions rather than vectors [22,8,25].This perspective is advantageous in a wide range of application domains including biology, medicine, environmental science and engineering, where the underlying evolution of variables of interest is often smooth.Though the goals of FDA, including sample summarization and visualization, inference, and prediction, are similar to those of multivariate statistics, they are more challenging due to the inherent difficulty of working in infinite-dimensional function spaces.Importantly, functional data often exhibit two types of structural variability: amplitude variability, akin to differences in magnitude within variables in the multivariate setting, and phase or warping variability, related to differences in the timing of amplitude features, absent in the multivariate setting.Failure to account for these sources of variability in FDA can result in misleading summaries and biased inference [20].

Motivation
The survey of FDA approaches in Section 1.2 highlights that existing methods frequently follow a sequence of estimation steps which perform the necessary tasks of (1) smoothing (projecting data to a lower-dimensional function space), (2) registration of smoothed functions (separation of amplitude and phase variability), and (3) summarization and/or inference.While conceptually straightforward, this pipeline generally suffers from two drawbacks: difficulty in formally propagating uncertainty between stages of estimation leading to over-confidence in the results, and lack of flexibility under different observation regimes.We argue that both of these issues can be resolved by a unified Bayesian inferential framework that performs smoothing, registration and inference simultaneously.Within this framework, data-or information-driven prior choices are necessary to extract meaningful observationlevel information regardless of the collection protocol and/or issues with data quality, which often occur in practice.
In particular, functional data objects are observed on a finite subset of an interval [a, b] ⊂ R, resulting in a variety of possible observational regimes: (1) densely observed functional data, wherein a large number of observations per function is available; (2) sparse functional data, comprised of a small number of non-uniformly spaced observations per function; (3) fragmented functional data, where observations of each function are unavailable over certain subsets of [a, b].While scenario (1) is often assumed for modeling, scenarios (2) and (3) are common in biomedical and industrial applications.Additionally, in each of these scenarios, observations are frequently measured with noise or error.Figure 1 illustrates such data and resulting inference on the underlying trajectory of a single observed function.Throughout this paper and in the figures, we transform the domain to the unit interval without loss of generality.Panels (a) and (b) show fractional anisotropy (FA) data, extracted from diffusion tensor magnetic resonance images (DT-MRIs), that are (a) fragmented early in the domain, or (b) densely observed.Panel (c) shows densely measured, but noisy, growth velocity curves for a group of children, and panel (d) shows both fragmented and sparse observations of bone mineral density (BMD).These examples vary in terms of the amount of sparsity, the degree of fragmentation, and the amount of noise.Panels (e)-(g) illustrate posterior uncertainty based on the proposed unified framework in different components (amplitude, warping, and their composition) of a smooth function underlying a single fragmented observation selected from panel (a).Posterior samples (grey lines) and pointwise 95% credible intervals (dashed lines) illustrate the extent of posterior uncertainty, with two estimates generated by existing methods [35,28] (PACE in red and WPACE in blue) provided for comparison.
Estimating the functions in such diverse observational regimes in the presence of amplitude and phase variation is a challenging problem.While the problem has been tackled for specific observation regimes, to the best of our knowledge, it has not been studied under a unified methodological framework.Such a framework should exploit the assumption that functional data share certain features, such as number of modes or inflection points.For example, in panels (a), (b) and (c) of Figure 1, the different functional observations contain We propose a Bayesian approach that satisfies these desiderata for simultaneous estimation and registration of functional data.

Related work
A popular approach to estimate underlying functions from discrete observations is penalized least squares (PLS [31]).Coefficients of a truncated basis expansion are estimated via least squares subject to an application-specific penalty, which enforces a soft constraint on some features of the function such as "wiggliness" as measured by total concavity [22,25].Sub-sequently, the phase and amplitude variation of the estimated functions can be extracted via landmark, metric-or model-based registration procedures [22,25,14,19,3].A drawback is the lack of a systematic mechanism to propagate uncertainty between the stages of such procedures.While [19] propose simultaneous smoothing and registration within a Bayesian framework, the model pools information from all available sample elements to estimate a common underlying state, or template, rather than the underlying functions, which are often of interest, but cannot be estimated individually in data-poor observation settings.
Certain observation regimes pose further difficulties.Under a low signal-to-noise ratio, individual estimation of functions via PLS tends to overlook population-level features, resulting in misleading uncertainty estimates in subsequent analyses.To address this problem, methods have been developed to simultaneously infer the functions underlying the observations and population quantities such as mean and covariance functions.Mixed model approaches have been effective in modeling discrete observations with a fixed population mean function, random subject-specific functions and pointwise error terms [24,34,23].However, none of these approaches consider phase variability as part of the model.In contrast, Raket et al.
[21] define a mixed effects model that partially accounts for phase variability by assuming a random subject-specific warping of the population mean.Descary and Panaretos [7] take an entirely different approach and model both small-and large-scale variation instead of considering small-scale variation as noise.Underlying functions are modeled as a mixture of smooth and rough components and are estimated from discrete, noisy data.
Shape-invariant models [17,9,29] jointly estimate a population template function and subject-specific scaling factors, translations and warpings from discrete, noisy observations.However, such models are often too rigid for inference of individual functions that may exhibit more complex subject-specific variation which cannot be captured by translating, scaling and warping a single, fixed template function.A related set of approaches consider shape-constrained estimation [33], wherein additional structure is built into the model, such as the number of peaks and valleys, to constrain the resulting shape of the estimated func-tions.This idea was also recently used in the context of density estimation by Dasgupta et al. [4] to constrain the number of modes.Methods to estimate functional data that is sparsely-observed with pointwise noise have also been developed.We adopt the informal convention that a functional observation is sparsely recorded if the number of observations for a subject is orders of magnitude smaller than the number of subjects in the dataset, as is common in longitudinal studies [32].The proposal in [23] and [29] consists of fitting mixed models and shape-invariant models to sparse data, respectively.Principal Analysis through Conditional Expectation (PACE), a framework for inference on sparse functional data proposed in [35], pools discrete observations across all sample elements to estimate mean and covariance functions via smoothing and functional Principal Component Analysis (fPCA).
Classification of fragmented functional data was studied in [6,12], without focusing on estimation or registration.Estimation of the individual functions, along with the population mean and covariance, for fragmented functional data was studied in [5] and [15] under fragmentary regimes distinguished by their assumptions on the coverage of observation intervals.
While these works implicitly assumed a missing-at-random mechanism for the unobserved parts of the function, recent work in [18] carried out the estimation tasks under a missingnot-at-random assumption.

Contributions and Organization
The issues raised above may be addressed by simultaneously combining coherent uncertainty propagation between smoothing, registration and inference, while automatically informing both the phase and amplitude structure of individual functions.In particular, we argue that an inferential framework under a general observation regime must include an automatic method to appropriately restrict the model space [21,33], whether based on a physical understanding of the underlying process or on previously observed data.Bayesian inference is a natural framework for introducing such hard and soft constraints through the choice of a prior distribution.We focus on two automatic approaches for prior specification on subject-specific amplitude components to carry out this restriction: 1.When densely-observed data is available, a data-driven prior for inference on sparsely observed and/or fragmented samples drawn from the same population (e.g., Figure 1(a)-(b)) is designed to capture important modes of amplitude variation.
2. When concrete physical understanding about the data-generating mechanism is available, such as the distribution of peaks and valleys (e.g., Figure 1(c)-(d)), we leverage a prior that informs the shape and smoothness by controlling the number of extrema.
We further model phase variation via a recently proposed point process-based prior on warping functions compatible with the discrete nature of sparse and fragmented data [2].
While Bayesian models for functional data with phase variation have been considered under the sparse and dense observational regimes (e.g., [29,36]), we are unaware of any work that deals with estimation for fragmented functional data in the presence of phase variation.Moreover, the proposed model for handling amplitude and phase variation in the dense and sparse regimes adds flexibility relative to current approaches, by allowing for subject-specific templates rather than assuming a common shape template.
The remainder of this paper is organized as follows.Section 2 describes an approach for prior modeling of amplitude and phase variability in densely observed functional data, which is used to inform the Bayesian model described in Section 3; this enables subjectspecific inference under general observation settings.We validate the proposed framework on simulated and real data in Section 4, and close with a brief discussion and some directions for future work in Section 5.The Supplementary Materials include (1) detailed statistical analysis of the complete FA functions, (2) an additional simulation for the shape-restricted amplitude prior model, (3) an additional real data example that considers estimation of CD4 cell count functions for HIV patients, (4) the Markov chain Monte Carlo (MCMC) algorithm used to sample from the posterior distribution and other implementation details, and (5) MCMC diagnostics.
2 Phase-Amplitude Separation via EFDA Functional samples from a population often share common features, such as the number and relative location of peaks and valleys.Differences in magnitude and location of these features along the domain are commonly referred to as amplitude and phase variability, respectively, as formalized in [22,25,20].It has been shown in multiple places [25,26,30] that modeling functional data without appropriately accounting for phase variability can result in inaccurate descriptive statistics and biased or misleading inference.Decoupling these two sources of variation is known as registration or alignment of functional data.For a sample of densely-observed functions, f i : [0, 1] → R, i = 1, . . ., k, a registration procedure yields corresponding amplitude functions fi : [0, 1] → R and phase functions γ i .Their . ., k reconstructs the original functions uniquely.A formal registration procedure must thus define a representation space of phase (these can vary in flexibility from linear to diffeomorphic transforms) and an appropriate optimality criterion.
To perform simultaneous smoothing and registration coupled with informative shape restriction, we choose a warping functional form and optimality criterion based on the elastic functional data analysis (EFDA) framework of Srivastava et al. [26].Indeed, its ability to define amplitude purely in terms of the shape of a function, that is, the number and (relative) heights of peaks and valleys independent of their locations on [0, 1], is consistent with our goal of semi-parametric shape restriction.Phase variability is modeled via smooth, where γ is the time derivative.Though most other methods use the L 2 metric as the optimality criterion for registration, this choice suffers from major deficiencies including the pinching effect (distorted amplitude) and asymmetry in registration (ill-defined amplitude) [25,20].This is because the L 2 metric is not invariant to simultaneous warping of functions, making its use inconsistent with the goal of registration.Instead, EFDA uses an extension of the Fisher-Rao Riemannian metric as a foundation for registration and statistical modeling [26]; we omit its formal definition here for brevity.While this choice has useful mathematical properties, including the critical invariance to identical warping, it is difficult to use directly.
However, the simple transformation described below reduces this metric to the standard L 2 metric, facilitating simplified computation.
Let F = {f : [0, 1] → R | f is absolutely continuous} denote the function space of interest.For any f ∈ F, define its square-root velocity function (SRVF) using the mapping The space of SRVFs corresponding to F is denoted as Q.Because the SRVF is invertible up to a translation, the original function can be reconstructed from its SRVF using

EFDA defines the amplitude of a function f as an equivalence class
Equivalently, on the SRVF space Q we have [q] := {(q, γ) | γ ∈ Γ}, where (q, γ) = (q • γ) √ γ is the corresponding action of Γ on Q under the SRVF map f → q.The set of equivalence classes forms a partition of Q, and is referred to as the quotient space Q/Γ, i.e., Q/Γ defines the amplitude space.Therefore, registration under this framework requires the determination of an average or mean equivalence class, and alignment of all functions to one of its elements.Let f 1 , . . ., f k denote a sample of densely-observed functions, and q 1 , . . ., q k their corresponding SRVFs.The sample mean equivalence class is taken to be the Karcher mean: [μ q ] = argmin To ensure identifiability, a representative element of the mean equivalence class, μq ∈ [μ q ], is chosen such that the average of the optimal warpings of all functions to µ q is the identity warping γ id (t) = t.In addition to the mean amplitude function, μf = Q −1 (μ q , f (0)) where f (0) = , this registration procedure also yields (1) phase functions, Functional Principal Component Analysis (fPCA) is a decomposition of the total sample variation into orthogonal modes of variability.We propose to perform fPCA on amplitudes to construct informative empirical priors over this component of variation.Also called vertical fPCA, it was first used within the EFDA framework by [30] in the context of building generative models for functional data.fPCA on the amplitude component corresponds to an eigendecomposition of the sample covariance function across the aligned SRVFs, K(s, t) = where φb , b = 1, 2, . . .are fPCs that form an orthogonal basis for the space of aligned SRVFs.With this datadriven basis, a finite representation of aligned SRVFs can be obtained through truncation Then, a generative model consists of (1) drawing a random coefficient vector c ∼ M V N B (0, diag( λ1 , . . ., λB )), ( 2) constructing a random SRVF as q r = μq + B b=1 c b φb , and (3) computing the corresponding function f r = Q −1 (q r , T ), where T ∼ N ( f (0), τ 2 ) is a random translation, and τ 2 is the sample variance of the function starting points f 1 (0), . . ., f k (0).Thus, the resulting random function f r is necessarily aligned to the mean amplitude function μf .
To illustrate EFDA, we consider the sample of simulated functions shown in Figure 2 T is a small translation that is included for improved display.The primary mode of variability captures differences in the height of the right peak, while the second mode describes variability in the left peak, and to a lesser extent, the relative heights of the left and right peaks.The remaining modes of variability, which are not displayed here, capture a negligible amount variability.
A model for functional data can be defined either on the original function space F or on the SRVF space Q.While modeling phase variation through the usual warping action f → f • γ is simpler on F, the simple L 2 geometry of Q makes dimension reduction through a (possibly data-driven) basis expansion possible.We choose a marriage of the two options, and assume that functional observations are generated under the model, where i (t) ind ∼ N (0, σ 2 i ) (in most cases, one can fix all σ 2 i to a common value), and the subscript i indexes each sample element (e.g., subject).Thus, individual curves y i are pointwise perturbations of a function Q −1 (q i , T i ) warped by γ i .This allows us to specify structured prior distributions on the amplitude component based on a judicious choice of shape-defining basis functions, while specifying a prior distribution on warping functions Γ that is naturally compatible with possible fragmentation and sparsity.This model is characterized by subjectspecific amplitude (and phase) components, in contrast to a common amplitude template as done in [29,36].We let y = (y 1 , . . ., y m ) and t = (t 1 < • • • < t m ) denote a vector of noisy, discrete functional observations and the corresponding time grid on which this data was observed, respectively.Then, for a function f , f (t) = (f (t 1 ), . . ., f (t m )) is a vector of function evaluations on this same time grid.

Likelihood Specification
We now formulate error model (1) for discretely observed functions, as is typical in practice.
The vector of observations of function i, denoted by y i , is obtained from evaluations over a subject-specific grid of time points ) of an appropriately translated (via T i ) and warped (via γ i ) SRVF q i specifying the overall shape, as This formulation allows general observation regimes, such as when t i is sparse or fragmented.
Under the normal distributional assumption on i , the likelihood for our model, based on observation y i , is given by

Prior Models for Translation, Phase and Error Variance
We first focus on model components whose probable values a-priori may be assumed relatively similar between problems.We assume that the translation parameters, T i , i = 1, . . ., n, and error variances, and σ 2 i ∼ Inverse-Gamma(α σ , β σ ).The choice of hyperparameters τ 2 , α σ and β σ is largely problem-specific.In all applications, we use µ T = 0.
Defining a prior process on warping functions is more challenging due to its restricted functional form.Common approaches in the literature model phase functions using basis expansions with constrained coefficients [9,29] or directly via the Riemannian geometry of the group of warping functions [19].In contrast, we use a piecewise linear process proposed in Bharath and Kurtek [2], consisting of random phase increments p(γ i ) = (γ i (t 1 ), . . ., γ i (t j ) − γ i (t j−1 ), . . ., 1 − γ i (t mγ )) over m γ successive time points where U * (0, 1) is a vector of order statistics from a Uniform(0, 1) random sample of size m γ , and θ γ acts as a precision parameter.This finite-dimensional model specification is computationally convenient; furthermore, Bharath and Kurtek [2] show that it has desirable asymptotic properties by relating it to a stochastic process with (measurably) increasing sample path as the time increments become arbitrarily small.
In this prior specification, the choice of the order statistics dictates where the prior process is centered.For example, uniform order statistics result in a prior distribution over Γ centered at the identity warping γ i d; order statistics from an arbitrary distribution G on [0, 1] would result in the distribution being centered at G −1 .We choose the uniform order statistics to regularize the prior model toward the identity warping.We evaluate the prior approximately by assigning the random order statistics to a uniform spacing of the domain grid, since the successive spacings in both cases are essentially of order O(1/m γ ).
The precision hyperparameter θ γ controls the spread of the prior over Γ: a small value results in a diffuse prior, whereas a large value concentrates the prior around its mean warping.The choice of θ γ should in part depend on the number of discrete time observations in the data (observational regime), i.e., when the functions are densely sampled θ γ can be small, but when faced with sparse data, a considerable amount of regularization is required to identify subject-specific model components.The second type of prior is appropriate when information is available about the maximum number and relative locations of local extrema that underlie the functional observations.

Shape-informed Prior Models for Amplitude
Importantly in practice, this prior model does not require the location of the extrema on the domain.We will also explain how this second scenario has connections to landmark registration [13].

Empirical Amplitude Prior
When densely-observed training data f 1 , . . ., f k is available, we can construct a shapeinformed prior for the amplitude of n new, partially observed functions on the subspace spanned by the empirical basis constructed from the amplitudes of training data SRVFs the Karcher mean μq so that the SRVFs q i , i = 1, . . ., n in model ( 1) can be represented as Thus, a prior process over the amplitude component q i can be defined through a prior distribution over the coefficient vector c i , such as

Shape-restricted Amplitude Prior
When reliable information about the number and relative ordering of extrema of underlying functions is available, a prior on the amplitude q i can be specified by choosing a set of basis functions that reflects this information.In contrast to the empirical amplitude prior, such a prior does not require training data.We assume the following form for the SRVFs q i : The basis functions are defined as , where U b are B-spline basis functions.This basis system is based on a modification of the shaperestricted B-splines developed by Wheeler et al. [33].These bases relate to the derivative of the function since amplitudes are defined on the SRVF space rather than the original function space.This basis system forces the SRVFs to be exactly zero at the change point locations α 1 , . . ., α H , corresponding to extreme values at these locations in the corresponding amplitude functions.The constant M ∈ {−1, 1} is application-specific and defines the order of extreme values.
Our use of this basis system differs from [33] as our model accounts for phase variability explicitly, i.e., we treat the change points as constants that determine the common locations of extreme values of the underlying amplitude functions, unlike the original work where the change points are also inferred.We use a diffuse exponential prior model on the coefficient vector c i to ensure that each basis coefficient is positive; this, in turn, guarantees that all of the amplitude functions have the same ordering of extrema.For this prior specification, the size of the discretization grid for the phase functions, m γ , cannot be set arbitrarily; to ensure identifiability, we set m γ = H.
The shape-restricted amplitude prior model has a clear connection to landmark-based registration [13] in which one must first identify a set of common landmarks on each function in a dataset.The landmarks either correspond to mathematical features of the data, e.g., extrema, or application-specific, interpretable features.Given a set of landmarks on each function, the functions are registered to each other via a piecewise linear warping that aligns the landmark locations exactly.If only extrema are considered as landmarks, then the change points in the shape-restricted amplitude prior act as domain locations for landmark alignment.In general, selecting appropriate landmarks can be challenging, especially when the number of functions and/or landmarks is large.The process of selecting non-mathematical landmarks can also be highly subjective.Recently, Strait et al. [27] developed an automated approach for mathematical landmark selection that alleviates the aforementioned issues.amplitude space are only registered based on the extrema.In this sense, the empirical basis is far more informative than the shape-restricted basis as illustrated in Figures 3 and 4.

Simulations and Real Data Applications
In this section, we discuss several simulated and real data examples which illustrate the performance of the proposed framework.We compare our methodology to PACE, implemented via a publicly available software package [35,28].The PACE framework is the most appropriate for comparison to our approach as it is able to accommodate sparsity, pointwise noise, and phase and amplitude variability in functional data.In brief, the PACE approach uses three steps: (1) estimation of population parameters, such as the mean and principal components, based on noisy and sparse observations, (2) estimation of fitted functional observations on a common domain grid based on the parameter estimates from (1) [35], and (3) registration of the estimated functions via a penalized L 2 metric criterion [28].There are two versions of the procedure, one for observations recorded over a common grid of domain points and one for observations recorded on different grids.When observations are recorded on a common grid, the PACE procedure begins with steps (1)-(3) (step (3) results in the final phase functions).Then, the estimate from step (2) is disregarded; instead, the phase functions are applied to the original noisy observations followed again by steps ( 1) and (2) to estimate the final amplitude functions.In this setting, a single estimate of the original observations is generated through composition of the final phase and amplitude estimates (termed PACE in results).On the other hand, when observations are recorded on different domain grids, following a first iteration through steps ( 1)-(3) (step (3) here results in a first estimate of amplitude functions and the final phase functions), steps ( 1) and ( 2) are applied again to the registered estimated functions to extract a second estimate of amplitude.Thus, one can actually obtain two different estimates: the first through composition of the final phase functions and the first estimate of amplitude functions (termed PACE in results), and the second through composition of the final phase functions and the second estimate of amplitude functions (termed WPACE in results).All tuning parameters are set by default in the package.In the simulation examples, we consider three estimators of a sparsely-observed or fragmented function f from a posterior MCMC sample {q [j] , j = 1, . . ., N } after burn-in: 2. The maximum a-posteriori (MAP) estimator fMAP = Q −1 (q [p] , T [p] ) • γ [p] where p is the index with the largest unnormalized log-posterior density.

The pointwise estimator fpointwise
The performance of each estimator is compared with the PACE and WPACE estimators.

Bayesian Model with Empirical Amplitude Prior
We begin the results section with a few examples that use the empirical amplitude prior in the proposed framework to fit fragmented and sparse functions.A major advantage of the proposed approach over PACE and WPACE is its ability to assess uncertainty in the estimated function.As evidenced by the credible bands, uncertainty is smaller along the portion of the function where data was observed than the portion that was not observed, both in terms of the height and location of the missing peak.
To provide a quantitative comparison of our method with PACE/WPACE, we simulated 100 missing portions of different functions from the complete data in Figure 5 the diffuse prior.Consequently, the MAP estimate generally fits the data very well in the non-fragmented region, but exhibits random warping in the fragmented region; this results in a high level of misalignment between the estimated and true functions in the fragmented region, which is greatly penalized by the L 2 distance.That said, the estimated amplitude portion of the MAP samples is very accurate and reflects the shape of the true function in terms of the heights of the peaks and valley.the body surface.Often, one studies the shape of PQRST complexes extracted from a long ECG signal, which can be associated with abnormal heart function [16].The letters PQRST refer to the first peak (P wave), the shallow, deep valley followed by the sharp second peak and another valley (QRS complex), and finally the third peak (T wave).In this simulation example, we study the performance of the proposed Bayesian model in the context of PQRST complex estimation from sparse observations.Figure 6  (e) describing posterior uncertainty in the unknown PQRST complex function.It turns out that the posterior distribution in this case is bimodal, with two modes formed by the phase functions.Thus, we display modewise summaries for the phase sample in panel (d) and the composition of amplitude and phase in (e).Again, the means are shown in bold black and the 95% confidence bands as dashed lines.The two modes correspond to two plausible locations of the QRS complex given the data.Indeed, none of the sparse observations cover the sharp R peak making its location difficult to predict.Importantly, the structure of the estimated PQRST complex based on each mode of the posterior distribution is valid.In contrast, the QRS complex in the PACE and WPACE estimates is highly distorted.
As with the preceding example, we numerically assess estimation performance of all methods.In this simulation, for each of 100 iterations, we select a PQRST complex from the training data at random, and artificially subsample it to ten observations chosen indepen- are not at all successful at capturing the true shape of the PQRST complex.To confirm this behavior, we additionally display the L 2 distances between the amplitude components of the estimated PQRST complexes and the true amplitude component.This is done by first optimally aligning the estimates to the true PQRST complex using the eFR metric.
The corresponding boxplots are displayed in Figure 7(b).It is clear that, with respect to this measure, the proposed model recovers amplitude features much better than PACE or WPACE.In fact, the plug-in estimate performs better than the PACE and WPACE estimates in 95% and 98% of the 100 iterations, respectively.Similarly, the MAP (pointwise) estimates perform better than the PACE and WPACE estimates in 85% (83%) and 97% (97%) of the 100 iterations, respectively.FA is given by the scalar quantity , where ν = ν 1 +ν 2 +ν 3

3
. A large FA value indicates a large degree of anisotropy.For practitioners, this summary of a DT-MRI provides a measurement of the quality of neuronal connections between particular regions of interest, and has been found to be a useful quantity to study subjects with various diseases including multiple sclerosis (MS) [10].In the MS setting, the autoimmune disease causes lesions and damage to tracts that results in a decrease in FA.Thus, FA can be used as a diagnostic measurement to distinguish between healthy controls and subjects with MS,

Bayesian Model with Shape-restricted Amplitude Prior
Next, we focus on examples where the proposed shape-restricted amplitude prior is most appropriate to estimate functions under considerable noise and sparsity.

Simulated Example 1: Functions with Low Signal-to-Noise Ratio
We first consider simulated functional data that not only contains phase and amplitude variability, but also low signal-to-noise ratio.The data is shown in Figure 10 phase component is underestimated resulting in a fair amount of phase variability that remains in the amplitude estimates.On the other hand, the proposed model is able to appropriately account for the pointwise noise.It results in estimated amplitude functions that have properly aligned peaks and valleys.Furthermore, there is a common degree of smoothness provided by the shape-restricted amplitude prior.In fact, the proposed model is able to estimate the true error variance (red line) in the likelihood fairly well (panel (b)).

Real Data Example 1: Berkeley Growth Velocity Functions
To further illustrate the structure imposed by modeling amplitude with the shape-restricted prior, we applied our methods to the well-known Berkeley growth dataset [22], in which the heights of children were tracked over the course of their lives from one to 18 years of age.In this study, we only use a subset of the data corresponding to 39 boys.In many cases, it is more natural to study the rate or velocity of growth, i.e., the derivative of height with respect to time, than growth itself.This allows for better understanding of growth patterns of the subjects wherein peaks in the velocity functions correspond to growth spurts, with the last, largest peak being the pubertal growth spurt.Consequently, we consider two different settings for our model; the first one allows for a single pubertal growth spurt in the amplitude functions, H = 2, α = (.57,.72),while the second one allows for an additional earlier growth spurt, H = 4, α = (.23,.57,.57,.72).We also set θ γ = 10 and B = 20 in both models.additional growth spurt estimated for some of the subjects in (b).Contrasting these results to the WPACE registration and smoothing results shown in (c), the pre-pubertal growth spurt is often smoothed-out, with the model failing to properly align many of the subjects' pubertal growth spurts.This is especially surprising since the phase functions estimated by WPACE are much less regular than those estimated using the proposed model.Figure 12 shows detailed inferential results for three individuals with and without the pre-pubertal growth spurt.In panel (a), it appears that the subject has a fairly significant pre-pubertal growth spurt.When the amplitude prior in our model is restricted to allow for a single peak, the pattern of observations around this growth spurt is treated as noise and consequently it is missed in the resulting estimated function.Additionally, posterior uncertainty in this region is relatively large.In contrast, when the prior is relaxed to allow for an additional growth spurt, we are able to nicely estimate both the pre-pubertal and the pubertal growth spurts.
In panel (b), there appears to be a single large growth spurt.Both estimates provided by our model appear to fit the data well.Finally, in (c), it is unclear whether there is a small prepubertal growth spurt.Again, both estimates provided by our model are reasonable.This example suggests that fixing the amplitude hyperparameter H a-priori can be limiting in

Real Data Example 2: Bone Mineral Density
As a last example, we consider indirect x-ray measurements of bone mineral density (BMD), associated with skeletal health and diseases such as osteoporosis [1].While osteoporosis affects individuals later in life, bone development during adolescence through early adulthood can be used to assess an individual's risk for the disease.We focus on a subset of the entire dataset corresponding to females aged nine to 25 years that had their BMD measured during two, three or four doctor appointments over the course of several years.Of primary interest in the study was the mean difference in BMD for four different ethnicities.Figure 13(a) shows the data where the different ethnic groups are highlighted using different colors.
While this subset of the data has been used to classify individuals in previous work [12,6], our focus is on estimation of a mean BMD trajectory for each group that accommodates phase variability.We restrict our model by forcing all individuals within the same ethnic group to have a common translation and a common strictly increasing amplitude function (prior hyperparameters are set to B = 5, H = 0 and M = 1).We do allow for individual phase variability with θ γ = 100 and m γ = 1, resulting in very simple and regular phase function estimates.The composition of the amplitude and phase samples then corresponds to individual BMD trajectories, and our interest is in their mean estimate.Our assumption that the mean BMD function is strictly increasing stems from the age range of the individuals in the study, and enforces the principle that, on average, BMD increases during this period.
The resulting groupwise mean BMD trajectory estimates, with 95% credible intervals, are shown in Figure 13(b).The results obtained via the proposed model visually corroborate the major finding of the original study that the Black ethnic group (red) has a significantly higher BMD than the other groups.Structurally, the estimated mean function for the Asian group, shown in blue, is also quite different from the others: there is a lot of BMD growth early followed by minimal growth later on in life.The estimated mean BMD functions for the Caucasian and Hispanic groups appear extremely similar.The original study concluded that the Asian, Caucasian and Hispanic groups have BMD patterns that are difficult to distinguish from one another.

Discussion and Future Work
We have presented a Bayesian framework for simultaneous registration and inference for functional observations that can handle challenging observational regimes such as sparsity, fragmentation and low signal-to-noise ratio.The framework explicitly accounts for phase and amplitude variability, and imposes amplitude restrictions in situations where different types of prior information are available.Indeed, we show that the key to inference under general observation regimes is the ability to inform the prior on the underlying structure of amplitude.We demonstrate the performance of the proposed model on several different simulated and real data examples to show the diverse range of scenarios that can be analyzed.
We also compare the proposed method to a state-of-the-art competitor.
We have identified two main directions for future work.First, in both the empirical and shape-restricted amplitude prior scenarios, we suggest additional modeling strategies that will further improve the proposed framework.In the first case, we will incorporate an additional hierarchical layer corresponding to subspace estimation (via fPCA) for the amplitude subspace.This will allow for direct propagation of uncertainty from the training stage to the estimation stage.In the second case, we will treat the number of local extrema allowed in the amplitude estimates H as well as their ordering pattern M , as unknown quantities to be estimated.This will result in a shape-restricted amplitude mixture prior wherein each mixture component corresponds to a different combination of H and M .Finally, we will extend the proposed model to other functional data scenarios including shapes of higher-dimensional curves, images and shapes of surfaces.

Figure 1 :
Figure 1: (a) Fragmented and (b) densely observed functional data from fractional anisotropy (FA) measurements.(c) Noisy growth velocity functions.(d) Fragmented and sparsely observed functions of bone mineral density (BMD).(e)-(g) Posterior summaries from the proposed Bayesian model for recovering a single FA function from (a) showing posterior uncertainty in the amplitude, phase, and composition of amplitude and phase, respectively.The blue and red functions in (g) correspond to estimates provided by existing methods.
(a).The phase and amplitude functions extracted via the eFR-based registration procedure are shown in panels (b) and (c), respectively, with the amplitude mean shown in bold in (d) on top of the original functions.The first two principal modes of variability for the amplitude component are given in panels (e) and (f), respectively.That is, we plot

Figure 2 :
Figure 2: (a) Simulated, (b) phase, and (c) amplitude functions.(d) Same as (a) with mean amplitude in bold.(e) First and (f) second principal modes of amplitude variability.

3
Model-based Estimation and RegistrationWe summarize the notation used thus far and introduce notation for discretely observed functional observations, which become key at the modeling and implementation stages.Let F be the set of absolutely continuous functions f : [0, 1] → R, Q the corresponding SRVF space under the mapping f → Q(f ) = sign( ḟ ) | ḟ | := q, and T ∈ R a scalar translation.The function Q −1 : Q × R → F takes in an SRVF and a translation, and maps them to the corresponding point f in the function space F. A warping function is denoted by γ ∈ Γ, the input domain [0, 1].The partition size is a user-specified value m γ ≤ min i m i , whose magnitude provides a trade-off between model flexibility and computation time.Each finite-dimensional vector of phase increments follows a Dirichlet distribution, p(γ i ) iid ∼ Dirichlet(θ γ p(U * (0, 1))), Besides allowing direct interpretation of phase and amplitude uncertainty, modeling underlying functions in model (2) via the EFDA framework allow us to define the amplitude model on the mathematically convenient SRVF representation space.Choice of the prior model over the registered SRVFs, representing amplitude, is both problem-specific and has a potentially strong impact on posterior inference under general observation regimes.We propose two semi-automatic approaches to model prior information about amplitude in different data collection scenarios.The first type is appropriate when, along with data that is sparsely observed or fragmented, we also have access to functional observations of different subjects from the same population that are neither fragmented nor sparse; we refer to this data as training and define an empirical prior model based on statistics computed from this data.

q 1 ,
. . ., q k .Elastic fPCA is carried out by first jointly performing registration and computation of the Karcher mean μq , and then decomposing the sample covariance of the amplitude components of the training data to obtain eigenfunctions { φb , b = 1, . . ., B} and corresponding eigenvalues λb , b = 1, . . ., B. The basis functions φb represent amplitude variation about )).This prior specification provides a data-driven way of imposing structure on the amplitude model based on the training data.Note that the training observations must be representative of the important features of amplitudes in the population, and their number must be sufficient to be able to estimate these.The proposed prior model is illustrated in Figures2 and 3for a simulated training dataset.Figure3(a)shows the sample mean SRVF (black) and fPCA basis computed using the amplitude functions in Figure2(c).SRVF draws from the prior, and corresponding amplitude functions obtained by the transformation Q −1 , are shown in Figure3(b)-(c), respectively.We illustrate visually that these random functions are registered to the amplitude functions in the training data, as required.

Figure 3 :
Figure 3: (a) Sample mean SRVF (black) and fPCA basis elements computed using the training data in Figure 2. (b) SRVF draws, and (c) corresponding amplitude functions transformed under Q −1 , generated from the proposed empirical amplitude prior model.

Figure 4 Figure 4 :
Figure 4 illustrates the shape-restricted amplitude prior and the structure it enforces on the amplitude component in our observation model.Panel (a) depicts the basis system that is used to represent the SRVFs.Panels (b) and (c) show several prior draws of SRVFs and corresponding amplitude functions, respectively.The basis system is constrained to take zero values at the change point locations.Combined with the appropriate restrictions on M and the basis coefficients, this pattern propagates to the generated SRVFs.Zeros of SRVFs map to extrema of amplitude functions; the relative heights of the extrema are flexible.Both methods proposed here for informing the amplitude model consist of constructing a set of meaningful basis elements, either empirically based on training data or from prior shape information concerning the number and relative location of extrema.The shape-restricted amplitude prior relies on the practitioner to specify the number and pattern of extreme values, while the empirical amplitude prior achieves this automatically, but requires training data.Another difference is that functions in the amplitude space spanned by the empirical basis are aligned with respect to the eFR metric, while elements of the shape-restricted

4. 1 . 1
Figure 5(a)-(b) shows simulated training data and a fragmented functional observation (black points) that we wish to infer, respectively.Within our unified framework, we employ an empirical amplitude prior with θ γ = .1,m γ = 8 and B = 8, to infer the full underlying

Figure 5 :
Figure 5: (a) Simulated training data.(b) Fragmented simulated function with observations shown as black points.Posterior draws (transparent), pointwise mean (solid black) and 95% credible interval (dashed) for the (c) amplitude and (d) phase components.(e) Composition of amplitude and phase, and PACE (red) and WPACE (blue) estimates.(f) Boxplots of L 2 distances between the true function and estimated function.Each boxplot corresponds to a different approach: (1) plug-in, (2) MAP, and (3) pointwise estimators based on posterior samples from the proposed Bayesian model, and (4) PACE and (5) WPACE.

4. 1 . 2
Simulated Example 2: Sparse ECG Signals The electrocardiogram (ECG) is an important diagnostic tool for many conditions including myocardial infarction.It records fluctuations in electrical potential of the heart muscle on

Figure 7 :
Figure 7: (a) Boxplots of L 2 distances between the true function and estimated function.Each boxplot corresponds to a different approach: (1) plug-in, (2) MAP, and (3) pointwise estimators based on posterior samples from the proposed Bayesian model, and (4) PACE and (5) WPACE.(b) Same as (a) but for the amplitude component only.(c) The true function (black) and corresponding sparse observations (black points), and plug-in (magenta), MAP (green) and pointwise estimates (blue).(d) Same as (c) but with PACE (red) and WPACE (orange) estimates.
dently and uniformly along the domain; the remaining PQRST complexes are treated as training data to generate the empirical prior for the amplitude.We consider the same three posterior estimates as in the previous section and compare them to the PACE and WPACE estimates.Boxplots of the L 2 distance from the true function to the five different estimates are shown in Figure7(a).It appears that all methods show comparable performance.Consensus amongst the different estimators occurs mainly due to the fact that the L 2 distance criterion used here greatly penalizes misalignment between the true and estimated functions.Consider the example visualized in Figure7(c)-(d).The three estimates in (c) are based on the proposed model, while the two estimates in (d) correspond to PACE and WPACE.The estimated functions in panel (c) are clearly better at capturing the shape of the PQRST complex.Unfortunately, the estimated phase results in a slight misalignment of the very pronounced R peak, which carries a significant penalty based on the L 2 distance.This misalignment in the estimate is due to a lack of observations along this important feature of the PQRST complex.On the other hand, the PACE and WPACE estimates in panel (d)

4. 1 . 3 Figure 8 :
Figure 8: (a) Observed complete FA functions.(b) Observed fragmented FA functions.(c) Zoom-in on the fragmented region.Posterior mean estaimtes of amplitude (d) and phase (e) for the nine fragmented FA functions shown in (a).(f) Composition of the amplitude and phase posterior means.

Figure 10 :
Figure 10: (a) Simulated observations with high noise level.(b) Normalized histogram of posterior draws of the error variance σ 2 , with the value used to generate the data in red.(c) Posterior means of the amplitude (top) and phase (bottom) components estimated using the proposed Bayesian model with the shape restricted amplitude prior.(d) Estimated amplitude (top) and phase (bottom) using WPACE.

Figure 11 :
Figure 11: Posterior means of the amplitude (top) and phase (bottom) components based on the proposed model with the shape restricted amplitude prior with (a) H = 2 and (b) H = 4. (c) Estimated amplitude (top) and phase (bottom) functions using WPACE.

Figure 1 (Figure 12 :
Figure 1(c) shows the observed growth velocity data.The estimated amplitude and phase functions under the two amplitude prior settings are visualized in Figure 11(a)-(b).A comparison to the result generated by WPACE is given in panel (c).As expected, the registration and level of smoothness for the pubertal growth spurt is similar across panels (a) and (b).The main difference between the two sets of amplitude estimates is in the
practice and motivates future work to jointly estimate H for different subjects.The WPACE estimates are shown in each panel in red; the WPACE estimate in (a) appears to severely oversmooth and underestimate the pubertal growth spurt.