On non-parametric density estimation on linear and non-linear manifolds using generalized Radon transforms

Here we present a new non-parametric approach to density estimation and classification derived from theory in Radon transforms and image reconstruction. We start by constructing a"forward problem"in which the unknown density is mapped to a set of one dimensional empirical distribution functions computed from the raw input data. Interpreting this mapping in terms of Radon-type projections provides an analytical connection between the data and the density with many very useful properties including stable invertibility, fast computation, and significant theoretical grounding. Using results from the literature in geometric inverse problems we give uniqueness results and stability estimates for our methods. We subsequently extend the ideas to address problems in manifold learning and density estimation on manifolds. We introduce two new algorithms which can be readily applied to implement density estimation using Radon transforms in low dimensions or on low dimensional manifolds embedded in $\mathbb{R}^d$. We test our algorithms performance on a range of synthetic 2-D density estimation problems, designed with a mixture of sharp edges and smooth features. We show that our algorithm can offer a consistently competitive performance when compared to the state-of-the-art density estimation methods from the literature.

1. Introduction.Motivation: Consider Figure 1, a point cloud that is obtained by IID sampling from an unknown density f : R n → R (in this case n = 2, but we consider the theoretical aspects of the general n dimensional case in this paper).A principal goal in machine learning is to estimate the distribution function f from a given finite sample, under the assumption that f is smooth [15,8,14,18,20].The primary contribution of this paper is the development and analysis of methods for recovering f from such a sample set using ideas from the theory of Radon transforms.More specifically, Radon theory provides for a highly flexibly family of algorithms for stably estimating f from projections of the empirical data set onto low dimensional manifolds such as affine subspaces, spheres, and the like.See figure 1 for an approximation of a half space Radon transform projection using the empirical distribution function (by observation counting in half spaces).Moreover, unlike more common density estimation methods, the Radon-based technique is naturally adapted to densities, which may not be smooth.The prediction of one dimensional densities is well studied, from a theoretical standpoint [30] and in implementation.Indeed there are many effective techniques which can be applied with good results in the one dimensional case [8, Figure 1: Point cloud of IID samples from an unknown density f (left).Half space projection of f along the x axis (right).Here θ = (1, 0) is in the direction of the x axis.The values in the right hand plot are the density count in the right half space to the line shown in the left hand plot as it is translated in the direction θ. 15,18,20].It is well known that the problem of reconstructing a density f (of a certain type) from its Radon transform Rf , given enough projection data, is mildly ill-posed [6, page 42], and hence we can expect a small amplification in the error from a prediction of Rf in the reconstruction of f from Rf .By combining known approximation techniques for one dimensional density estimation and reconstruction methods in image reconstruction and inverse problems, we propose new techniques in non-parametric density estimation, for which we can derive convergence estimates and demonstrate the effectiveness of our methods in implemenation on a variety of challenging synthetic density estimation problems.
2. Relevant literature.The classical and still the most-widely used methods for nonparametric density estimation are based on a Kernel Density Estimate (KDE) [15,8,14,1], which approximates the unknown density by its convolution with a known density function (e.g.isotropic Gaussians).Other methods in non-parametric density estimation consider the reconstruction of the density by maximizing the log-likelihood max f i log f (X i ) [18,20] with additional smoothness regularization on the density f .We give a detailed comparison of such ideas and our own approach later in section 7. Extensions of the KDE approach to density estimation on manifolds (embedded in high dimensions) has been considered in [4,3,2].
The approaches based on KDE can fail to capture precise details in the density (e.g.edge effects) due to smoothing, particularly if the bandwidth is too large.To alleviate this issue the use of the hyperplane Radon transform for density estimation has been considered in the literature [21,22,23].In [22] the hyperplane Radon transform is approximated by the application of a smoother to empirical density functions and the filtered backprojection formula is used to reconstruct Gaussian mixture densities in two dimensions.In [21] it is shown that when the Radon transform is approximated by a series of one dimensional kernels, the reconstruction is a type of convolution kernel estimate.Using this idea new kernel density estimators are derived.In [23] a new approach to Gaussian mixture model fitting is presented which uses one dimensional projections of the target density, which are described by the Radon transform, to give a more stable convergence with random initializations than the Expectation Maximization (EM) algorithm.
3. Main contributions.This paper advances the use of Radon transforms for non-parametric density estimation in the following novel directions: • We use new types of empirical distribution functions to approximate generalized Radon transform projections of the density.Here we consider spherical and half space Radon transforms, whereas hyperplane transforms are only considered in previous work.This allows us to elegantly derive error estimates for our approach and provides the basis for a new, fast method in density estimation, which is completely non-parametric.• While the current methods use the explicit inversion formula for the hyperplane Radon transform (a filtered backprojection approach), we discretize generalized Radon transform operators on pixel grids (assuming a piecewise constant density) and use existing machinary for solving regularized inverse problems with Total Variation (TV) regularization.We choose TV as it is a powerful regularizer commonly applied image reconstruction which favours piecewise smooth solutions [27,28].The idea of using TV for density estimation has so far been only generally applied to the 1-D case when using the likelihood approach [18,20].A 2-D example is considered in [20] but we extend on the bivariate case significantly and introduce a variety of synthetic 2-D test examples in this paper.• For the test densities considered, we give a full error and image quality comparison using spherical Radon transforms, half space Radon transforms, the methods presented in [22,20] and a kernel estimate.We see a consistently high performance using spherical Radon transforms when compared to the other methods considered, and across a range of point cloud sample sizes.• We derive convergence estimates for our approach by combining the theory of Radon transforms [6, page 94] and results on the estimation of one dimensional densities by empirical distributions as described by the Dvoretzky-Kiefer-Wolfowitz inequality [30].In [22] convergence estimates are given for a hyperplane approximation if certain assumptions are met.For assumption (c) in 5.1.2. in [22] to hold based on the suggested approximation using truncated Fourier series, we require knowledge of the Sobolev norms g θ k , where g θ are the (unknown) projections to be approximated.
Here we give estimates for half space and spherical approximators only assuming a degree of smoothness for the underlying distribution.• Somewhat in a vein similar to extending KDE methods to density estimation on manifolds, [4,3,2], we extend the Radon idea from Euclidean space to density estimation on manifolds embedded in R d by reconstructing the density on local tangent spaces.This combines the theory of Radon transforms with the theory of manifold sampling and reconstruction, taking inspiration from a number of methods in non linear dimensionality reduction, such as Locally Linear Embedding [25] and Local Tangent Space Alignment (LTSA) [29,5].The main idea is to embed the data locally into low-dimensions using these methods and then use the proposed density estimation approach on these patches.
• We present two new algorithms which can be readily applied to implement our density estimation techniques in low dimensional Euclidean space and on low dimensional manifolds embedded in R d .The codes are available from the authors on request.
4. Organisation of the paper.This paper is organized as follows.In section 5 we state some mathematical preliminaries and theory on Radon transform, and explain the intuitive microlocal idea behind edge detection and reconstruction using Radon transforms.We then introduce our main idea, where we propose to use empirical distribution functions derived from Independent and Identically Distributed (IID) point clouds to approximate Radon transforms of the probability density.
In section 6 we provide error and convergence estimates for our approach by combining theory from geometric inverse problems [6, page 94] and empirical distribution function estimators [30].We show that the convergence rate of the least squares error in our approximate density is (1/m) 1 2(n+2) when using the half space Radon transform, where m is the number of observations and n is the dimension.We also give estimates for the expected error in a spherical Radon transform approximation and explain the increase in error with n, the number of projections taken and the confidence level.
In section 7 we present simulated density reconstructions in two dimensions using an algebraic approach with TV regularization, and give an error and side by side image reconstruction comparison with the methods presented in [22,20] and a Gaussian kernel estimate.The algebraic approach implemented is formalized in Algorithm 7.1 for density estimation by inverse Radon transform in low dimensional Euclidean space.Later in section 7.2 we extend our ideas from Euclidean space in low dimensions to density estimation on low dimensional manifolds embedded in R d , using theory and techniques in manifold learning [5,24,25].

5.
The main idea and approach.In this section we state some mathematical preliminaries and theory on Radon transforms, and explain briefly their inversion and edge detection properties.We then introduce our main idea for using empirical distribution functions of IID point clouds to approximate Radon transforms of the probability density.

Mathematical preliminaries and Radon
Transforms.Throughout this paper X = {x 1 , . . ., x m } will denote a point could in R n and f will denote the density from which X is drawn.We also denote: 1. Z = R × S n−1 as the unit cylinder in R n+1 .
2. C k 0 (Ω) as the space of k continuously differentiable functions compactly supported on Ω ⊂ R n .
3. Ω n as the unit ball in R n .
4. g p L p (R) = S n−1 R |g(s, θ)| p dsdθ as the L p norm for functions g on the cylinder.5. V s,n−1 and w s,n−1 as the volume and surface area of a sphere radius s in R n respectively.
Then the hyperplane Radon transform of f is defined as where δ denotes the Dirac delta function.
Then the spherical (volumetric) Radon transform of f is defined as Here the set of spheres in R n are parameterized by (s, x), where s ∈ R + is the radius and x ∈ R n is the center of the sphere.After differentiating with respect to the s variable and dividing by s n−1 the transform R M f is equivalent to the spherical means transform defined in [9].
Then the half space Radon transform of f is defined as where δ denotes the Dirac delta function.
Here the set of hyperplanes in R n are parameterized by (s, θ) ∈ Z. See figure 2. After differentiating with respect to the s variable the transform R H f reduces to the classical hyperplane Radon transform Rf .It is well known [7, page11, Theorem 2.5], that the hyperplane (and hence half space) Radon transforms have an explicit left inverse and are injective for f ∈ C ∞ 0 (R n ).Similarly for the spherical Radon transform we can derive explicit reconstruction formulae for the density [9,10,11], and the solution is unique if enough spherical centers and radii are known (e.g. for any fixed radius s, if all centers x are known, the reconstruction is obtained by spherical deconvolution or deblurring).The previous works of [22] apply the formula [7, page11, Theorem 2.5] directly in density estimation.Our reconstruction method discretizes the linear operator R H (or R M ) on voxel grids and solves the resulting set of sparse linear equations with regularization.
Radon transforms are known to have edge detection properties [12,13].If we know the integrals of f over a set of hypersurfaces tangent to a direction ξ ∈ S n−1 in a small neighborhood of a given x ∈ R n , then we can detect jump singularities in f near x in the direction ξ (this forms the intuition behind the theory of microlocal analysis applied to Radon transforms).See figure 3. We can see that the jump discontinuities in the density are present in the derivatives (finite differences) of Radon transform (either spherical or half space) approximations, but the projections are noisy.We will see later that we can combat this noise while effectively preserving edges in the density by a discrete inversion of the spherical or half space Radon transform with a Total Variation (TV) penalty.TV is a powerful regularizer commonly applied in image reconstruction which favours piecewise smooth solutions.Figure 3: An IID sample taken from two overlapping uniform densities (left figures).There are four edges (marked by E1, E2, E3 and E4) in the direction ξ highlighted.The jump discontinuities are present in the derivative of a spherical (bottom right) or half space (top right) projection.Here s is the radius of the sphere with fixed center (as pictured).That is, with center (x, y) = (12.5,40).

Application to density estimation:
The core idea.Let X = {x 1 , . . ., x m } be a point cloud in Ω n drawn as IID samples from a density f ∈ C ∞ 0 (Ω n ).Then the half space Radon transform R H of f can be approximated as the set of one dimensional empirical distribution functions That is, for every fixed θ, the projection R H f θ defined by R H f θ (s) = Rf (s, θ) (which is a one dimensional cumulative density function) can be approximated by the empirical distribution function of the projection of X to the line {sθ : s ∈ R} [22].Similarly for the spherical Radon transform we have So by counting the number of observations which lie in a set of half spaces or spheres we can approximate Radon transforms of the density f .See figure 4. We aim to discretize the Radon transform operators above to a pixel grid (assuming a piecewise constant density as is commonly applied in image reconstruction and inverse problems) and solve the resulting sparse system of linear equations to recover the density.Here we will make use of widely applied regularization techniques (such as TV) to combat the noise due to finite sampling, and apply known automated methods in tomography to choose the regularization parameter (such as the Generalized Cross Validation (GCV)).First we provide theoretical performance garuantees for our method, in the next section.

Error estimates.
Here we give error estimates for our method for density estimation combining results on the stability of inverse Radon transforms and the expected error of empirical distribution functions.The next result is a statement of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [30] which explains the error in an approximation of a univariate density from its empirical distribution function.
We now have our main theorem which gives bounds for the error in a half space Radon transform approximation from a full set of projected one dimensional empirical distribution functions.Theorem 6.1.Let x 1 , . . ., x m be continuous, independant and identically distributed random variables in Ω n with probability density function f ∈ C ∞ 0 (Ω n ) and let 1 be an approximation to the half space Radon transform g = R H f .Let K projection directions {θ j } K j=1 be uniformly spread over S n−1 .Then with probability 1 − p for any 0 ≤ p ≤ 1, where g θ (s) = g(s, θ) and g m,θ (s) = g (s, θ) and is the error term in approximating the integral over S n−1 by a Reimann sum, where the area elements w j are such that K j=1 w j = w 1,n−1 and w 1,n−1 is the surface area of S n−1 .Proof.The proof is deferred to section A. Corollary 6.2.Let x 1 , . . ., x m be continuous, independant and identically distributed random variables in Ω n with probability density function f ∈ C ∞ 0 (Ω n ) and let 1 be an approximation to the half space Radon transform g = R H f .Then the worst case error in a reconstruction f m of f from g m is which holds with probability 1−p.Here K is the number of projections (which are assumed to be uniformly distributed on the sphere), f H 1 2 ≤ ρ, and (K) is an error term in approximating the integral over S n−1 by a finite sum.
Proof.The result follows immediately from Theorem 6.1 and Theorem B.8.
Regarding the above results and the error term (K), the idea here is that we can pick K large enough (indeed we have no restriction (in theory) on the number of projections taken) to imply a stable inversion (or such that (K) ≈ 0).That is, we can choose K so that the inverse problem of reconstructing f from R H f (known for a finite number of projections) is (effectively) mildly ill-posed, and such that there are no image artefacts, in the sense that the singularities of f are stably recovered in all directions [12,13] (hence the choice of uniformity on S n−1 for the set of directions {θ j } K j=1 ).
Let K = k n−1 (we expect the number of projections needed to adequately cover S n−1 to increase exponentially with n) be large enough so that the error (K) ≈ 0 is negligible.For example we could fix k = 360 for an angular step size of 1 with probability 1−p.The bound above explains how we would expect the error in our Radon transform approximation to increase with n, m and the confidence 1 − p.For fixed n, k and p we expect the convergence rate to the solution of (1/m) 1 2(n+2) with m.The next theorem describes the least squares error expected in a spherical Radon transform approximation using empirical distribution functions.Theorem 6.3.Let x 1 , . . ., x m be continuous, independant and identically distributed random variables in Ω n with probability density function f ∈ C ∞ 0 (Ω n ) and let 1 be an approximation to the spherical Radon transform g = R M f .Then with probability 1 − p, for any finite set {x j } K j=1 of circle centers.Here g x j (s) = g(s, x j ) and g m,x j (s) = g m (s, x j ).
Proof.The proof is deferred to section A. See Theorem A.4.
So we would expect the same rate of decrease in the error in a spherical Radon transform approximation as with half space data.The spherical Radon transform is mildly ill-posed and to the same degree as the half space transform if enough spherical centers x and radii s are known (this can be shown using the theory presented in [9]).So we would expect the same amplification in the error in our solution.In general, if we approximate Radon transform projections (e.g.ellipsoidal, hyperbolic) as empirical distribution functions, we can expect the same rate of convergence, provided that the Radon transforms used have a stable inverse.

Method and results.
Here we present practical methods for density estimation in low dimensions by a discrete inversion of Radon transforms.We test our approach on a number of synthetic densities in two dimensions and give a side by side comparison with a kernel estimator.Later in section 7.2 we expand our approach to density estimation on manifolds and provide an error analysis of the reconstructions of densities on manifold patches of varying levels of curvature and radius.
There are two algorithms presented in this section, which we will now introduce.Algorithm 7.1 implements density estimation in low dimensions by an explicit, discrete inversion of Radon transforms with regularization.Algorithm 7.2 applies Algorithm 7.1 to local tangent spaces of manifolds to reconstruct densities on low dimensional manifolds embedded in R d .

Density estimation in low dimensions.
To implement our density estimation method in low dimensional (n ≤ 3) Euclidean space we will approximate Radon transform operators as discrete sums over a uniform grid (which would typically represent image pixels in image reconstruction).Specifically we aim to minimize the following functional where R is the discrete form of the Radon transform (either spherical or half space) and G is a regularization penalty with regularization parameter λ.For example To solve the least squares problem (7.1) we apply the iterative solvers of [19], and choose the regularization parameter λ via a Generalized Cross Validation (GCV) method [32, page 95].To estimate densities in low dimensions we implement Algorithm 7.1.

Algorithm 7.1 Density estimation by inverse Radon transform Result: Density estimation by inverse Radon transform
Input: A point cloud X ⊂ R n , a pixel grid which covers X, with pixel centers j=1 and half spaces {h j } l 2 j=1 in R n Output: A vector of density values v, where Set the transform (T ), either spherical (T = T 1 ) or half space (T = T 2 ) Initialize a zero sparse matrix R with l 1 columns and l 2 rows, and a zero vector b with l 2 entries if T = T 1 then for j ← 1 to l 2 do Find the p i such that p i ∈ s j and set R ji To show the effectiveness of our method we give an error and image reconstruction comparison with a collection of density estimation methods from the literature using TV and Radon transforms.The first is that of Koenker et.al. [20] (denoted by Koen), which finds the density maximizing the log-likelyhood function with TV penalties.In [20] the selection of λ is not automated and they set λ = 2.We could employ the use (ordinary) cross validation (CV) methods here to choose λ.However this would require an impractically long run time, as is discussed in [35].Further, the use of GCV methods would be ill-advised due to the arguments given in [18, page 11].Hence, for every density and sample size m considered, we report the best result (in terms of ) over three levels of smoothing, namely λ = 0.5, 1, 2. The second point of comparison is a Gaussian kernel estimate with a uniform badwidth kernel (denoted by Ker), with the bandwidth chosen to be optimal for Gaussian densities [8].The third point of comparison considered in the main text is that of O'Sullivan et.al. [22] (denoted by Os).Here we use the truncated Fourier series approximations to the Radon projections g θ given on page 518 of [22].Such approximation ideas are shown to the satisfy the performance guarantees provided in [22].We set where ĝθ denotes the Fourier transform of g θ , rect is the rectangle function and h m is a chosen bandwidth parameter.This approximation is equivalent to a 1-D kernel estimate using sinc functions.That is In [22, page 518] a choice for h m is suggested based on the size of the Sobolev norms g θ H k .However, as g θ H k is not known a-priori, we report the best result (in terms of ) over three levels of smoothing h m = 0.5, 1, 2 (scaling the sinc functions to half and twice the pixel length of 1) for every density and sample size m considered.We calculate (7.2) for θ = { πi 180 : 0 ≤ i ≤ 179} and then reconstruct the density using the filtered back-projection formula, as in [22].In the appendix, we also present the density estimation errors using TV and Poisson image denoising approaches, and using the Split Bregman techniques of [35], which offer a faster implementation of the density estimation method suggested by Koenker [20].The results using these methods are left to the appendix (see figure 12) as they were found to produce a significantly higher error than those considered in figure 5. Hence we focus on the more competitive approaches in the main text.It seems, based on the current experiments, that a direct density estimation using TV (by minimizing the least squares error as in [33]) is ill advised for the types of densities considered, and we see a better performance with a density estimation in the Radon domain combined with a TV regularizer in the inversion.
Let v be the vectorized form of the original density, and let v m denote an approximation.Then we measure the error by = v−vm 2 v 2 .We consider four test densities, whose probabilty density functions are displayed in the left hand of figures 8-7.See figure 13 for a larger view and description of the ground truth densities.We give the estimation errors for m ∈ {100, 500, 1000, 2000, 3000, 4000, 5000} and present image reconstruction comparisons for the two most competitive methods (with the least error) for m = 2000.All densities considered are piecewise smooth, some with more edges and more smooth features than others.We use these as example densities to test the effectiveness of our reconstruction method with TV for recovering the jump singularities in the density while also preserving smoothness.Each of the test densities displayed are to be reconstructed on a 100-100 pixel grid (the density is assumed to be piecewise constant with 100 2 densities, the matrix R has 100 2 columns).For half space Radon transform data, we sampled s and θ as s ∈ {−50 + i : 0 ≤ i ≤ 100} Figure 5: Comparisons of errors in density reconstructions 1-4, using the methods Sph, Hs, Os, Ker and Koen.and θ = { πi 180 : 0 ≤ i ≤ 179} to be perfectly uniform on Z, as is most optimal by Theorem B.9, and in line with the theory presented in the previous section.For spherical Radon transform data, we sample circle centers x ∈ {(0.5 + i, 0.5 + j) : 0 ≤ i, j ≤ 99} and radii s ∈ {4 + i : 0 ≤ i ≤ 16}.Counting observations in circles of smaller radii (s = 4), while being a more noisy approximation, helps to highlight finer features in the density, whereas the larger radii (r = 20) offer less precise information but with less noise.We find that using a range of radii is most optimal and overall gives better performance than using half space data (see appendix D for a full comparison), which is what we would expect as spherical Radon transform data is overdetermined.
We find that a density reconstruction using Sph and Koen gave the best and most consistent performance in terms of for all densities considered, and for varying sample sizes m.See figure 5 for a comparison of the errors using our methods (spherical and half space, denoted as Sph and Hs respectively), and the methods Os [22], Koen [20] and Ker.For a full comparison of spherical vs half space consult appendix D. To generate the Gaussian mixture in density 1 we randomly (uniformly) selected 100 Gaussian centers (means) on a 1-100 meshgrid, with constant covariance, and took the sum.To see how the methods considered compare over a variety of Gaussian mixture densities, see table 1, where we have given the mean and standard deviations of the errors in a reconstruction of density 1 with 20 random Gaussian mean sets and a constant sample size m = 1000.As we see an occasional failure in the reconstruction of density 3 using Hs (we see a spike in the error in the red curve in the bottom left hand of figure 5 at m = 3000), we also give the mean and standard deviation reconstruction errors from 20 random draws of density 3, with m = 1000 samples.See table 2. We see further evidence of a more robust and accurate performance using Sph and Koen when compared to the other methods.Although as suspected, the range and mean of errors is far greater using Hs, so there are cases when Algorithm 7.1 may break down using Hs techniques.
Refer to figures 8-7, where we have presented density reconstruction comparisons (as 2-D images) for the two most competitive methods (with the least error) on each density 1-4, for m = 2000.In each case the most competitive method to Sph, was Koen.We see a similar image reconstruction quality and error level when we compare Sph and Koen across densities and sample size.Although we see a greater image quality using Koen for densities 3 and 4, and conversely using Sph for densities 1 and 2. It is noted that the choice of smoothing parameter is not automated using Koen, and in [20] no such method is suggested.In [35] Mohler et.al. provide a faster implementation of the Koen method using Split Bregman techniques, which allows for an automated choice of λ by 10-fold CV.The increased speed of implementation seems to be at a detrement to the error however, as can been from the results presented in C, and the errors when using the faster implementation of Koen are not competitive with Sph.As the choice of the smoothing λ is one of the most crucial aspects in density estimation (which we can fully automate using GCV with spherical Radon transforms, given the linear inversion), and there is no known automated method for choosing λ using the more accurate but less efficient method of [20], on this basis, the above discussions and the experiments presented here, we conclude that a spherical Radon approach is optimal for the types of densities considered.

Density estimation on low dimensional manifolds.
Here we extend our previous results to density estimation on low dimensional manifolds embedded in R d , where we will incorporate our previous reconstruction methods and theory, the theorems of [5] and ideas in   manifold learning.First some definitions.Let X be a metric space.Throughout this section we denote B X r (x) to be the ball of radius r centerd at x and B n r (x) = B X r (x) for X = R n .We now have the theorem from [5] which gives bounds for the error in a local tangent space approximation of a manifold M in terms of its principle curvatures.So far we have dealt with density estimation in low dimensions, where the true data dimension is known (n = 2 in the above simulations), in the sense that the observation sets X lie on 2-manifolds in R 2 .In many machine learning applications however, the true data dimension n is hidden and and we are given a set of observations x i ∈ R d , where n < d.It is often assumed that the observations lie on a manifold M which is embedded in R d .More precisely for every 1 ≤ i ≤ m, x i = φ(x i ), where φ : M → R d is an embedding.This is the idea behind the field of "manifold learning".
Let M ⊂ R n be an n-manifold and let our observation set X = {x 1 , . . ., x m } ⊂ M. Let us assume that the data we are given is X = {φ(x 1 ), . . ., φ(x m )}, where φ : M → R d is an embedding in the general topological sense (a homeomorphism onto its image), with d > n.Let M be described by a collection of charts {U l , ϕ l }, where U l are open sets which cover M and each ϕ l : the charting {U l , ϕ l } is known, then we can recover densities on M using Radon transforms.
More precisely, for every x ∈ φ(M), there exists an l such that x ∈ U l , and the function h = f • ϕ −1 l : Ω n → R can be reconstructed explicitly from its spherical or half space Radon transform.Once the function h is known we can compose it with ϕ l to recover the corresponding density value for x .For a finite sample X on M the error in the reconstruction of a patch U l is described by Corollary 6.2.This is under the assumption of course that the charting of M is known, which is not often the case.We aim to tackle this problem in this section.κ = 0.01 κ = 0.05 κ = 0.
, provided the principle curvatures of M are low or r is small enough.This is an idea used in techniques in manifold learning and non-linear dimensionality reduction [5,29].For finite data X ⊂ M we will take subsamples of the data S = X ∩ B d r (x) (to represent a sample on U l ), for r small enough, and project onto the first n principle components.This approximates the projection of S onto T x M (or the application of the chart ϕ l : U l → Ω n ).So here the atlas {U l , ϕ l } which describes M is such that each U l is approximately flat.After we have transformed the samples S to principle component space, and let us denote the projected subsample by S, we can reconstruct the density from which S is drawn (this would be h, using the notation above) by applying Algorithm 7.1.The reconstruction in the last step therefore determines an approximation to the density f on the neighborhood (or patch) U l .Precisely, we will implement Algorithm 7.2 estimate densities on low dimensional manifolds embedded in R d .
To demonstrate this consider figure 11, where we have embedded an IID sample X of density 1 (f : M → R, where M = [0, 1] 2 ) into three dimensions via Here κ > 0 is a bound for the principle curvatures of the manifold at the origin.Note that a direct application of Algorithm 7.1 here would be ill advised.Indeed if we extend f to be zero for x ∈ R 3 \φ(M), then R M f = R H f = 0 (the support of f has measure zero in R 3 ).In this case (κ = 0.05 and r = 20) the curvature of M is low on the patch U l = M ∩ B 3 r (x) (x is located at the center of the patch) highlighted and the data dimension is recovered correctly Algorithm 7.2 PCA patching density estimation Result: PCA patching density estimation Set the transform (T ), either spherical (T = T 1 ) or half space (T = T 2 ) Set the radius r > 0 and the variance percentage p for i ← 1 to m do Set S = X ∩ B d r (z i ) Center S so that the elements of S have mean 0 Set S ⊂ R n as the first n principle component scores of S , choosing n to be large enough to account for p% of the variance in S Set z i ∈ R n as the transformation of z i to principle component space Reconstruct the density h of the point cloud S using Algorithm 7.1, with the transform set to T Output the corresponding density value f (a density sample on U l ) to principle component space we can reconstruct the density on U l by applying Algorithm 7.1.See table 3 for the relative errors in reconstructions of the density on U l for a range of r and κ values (keeping the x and y coordinates of the center of U l constant).Here we measure the relative error using equation (7.1) as in section 7. We see that the relative error increases with the curvature κ.This is as expected by Theorem 7.1 as the local tangent space approximations of the manifold become less accurate.We can attempt to deal with this by reducing the ball radius r, to counter balance the increase in the upper bound κr 2 on the error in local tangent space approximations for increased κ.However a reduction in r also serves to the lower the sample size m on U l (see table 5), and hence the reconstruction error is further amplified (by Corollary 6.2), which is evident from the figures in table 3. See also table 4 for the errors in the reconstruction of the density on U l using the half space Radon transform.

Conclusions and further work.
Here we have presented a new approach to nonparametric density estimation inspired by ideas and theory in image reconstruction and Radon transforms.Using classical theory in geometric inverse problems we derived error estimates for our approach in section 6.This combined the classical theory of [30] with that of Natterer in [6], where we showed that the expected convergence rate to the solution was (1/m) 1/2(n+2) , where m is the number of observations and n is the dimension.We then went on to develop new density estimation methods in section 7 from known numerical reconstruction algorithms in image reconstruction and inverse problems.The implementation of which was described in Algorithm 7.1.In particular we found that a discrete inversion of the spherical Radon transform with a total variation regularizer produced satisfactory piecewise smooth approximations to four example densities (with varying amounts of edges and smooth features) in low dimensions (n = 2), and offered a consistently improved performance when compared to the techniques of [22,20] and a kernel estimate.In particular, the spherical Radon method was the most effective in recovering singularities in the density while retaining the smooth features.The half space techniques presented were also shown to offer a good performance in most cases.However we discovered that in some cases Algorithm 7.1 broke down using half space transforms.This was indicated by the results of table 2.
We extended our methods to reconstruct densities on manifolds which are embedded in R d in section 7.2, using ideas in manifold learning and the theory of [5].The implementation is described by Algorithm 7.2, which is an extention of Algorithm 7.1 to local tangent spaces of manifolds.We gave an analysis of the errors in density reconstructions on manifold patches for varying levels of the principle curvature κ (at the origin) and patch radius r.Our analysis was consisitent with the theory of [5], and the higher curvature manifolds proved more troublesome for an accurate density estimation.To deal with this we propose to apply methods in nonlinear dimensionality reduction [24,26,25] to better "unravel" the manifold patches of higher curvature and compare the performance to a linear approximation as is presented here.
We note that the current work is only suited to density estimation on low dimensional manifolds (mainly due to the practical restrictions of discretizing Radon transform operators in high dimensions).To deal with this, in further work we aim to develop a filtered backprojection approach for high dimensional data.As the filtering process described by an explicit inversion from hyperplane Radon transform data is ill-posed (and this is more pronounced in high dimensions), it is desired to develop a more stable filtering process that is effective in edge reconstruction, while offering little amplification in the noise.For spherical Radon transforms we aim to apply the formulae derived by John [31, page 82] for expressing densities in terms of their iterated spherical means, which again will require sufficient regularization.
Throughout this paper, we have chosen to focus on the use of empirical distribution functions to approximate spherical and half space Radon transform data in this paper.For further work we aim to use more advanced methods in density estimation (such as kernel estimators and those in [18,20]) to more accurately estimate the projection densities of Radon transforms, and test for an improvement in the results presented here.
with probability 1 − p.Given K projections {θ j } K j=1 we have p m with probability 1−p, for all 1 ≤ j ≤ K by the Bonferroni inequality.Let the θ j be distributed uniformly over S n−1 .Then we have with probability 1 − p, where (K) is the error in approximating the integral in the first step above by a finite sum, and the w j are small area elements of S n−1 , where K j=1 w j = w 1,n−1 .We now restate Theorem 6.3 and provide proof of the result.
Theorem A.4.Let x 1 , . . ., x m be continuous, independant and identically distributed random variables in Ω n with probability density function f ∈ C ∞ 0 (Ω n ) and let 1 be an approximation to the spherical Radon transform g = R M f .Then with probability 1 − p, for any finite set {x j } K j=1 of circle centres.Here g x j (s) = g(s, x j ) and g m,x j (s) = g m (s, x j ).
Proof.For each x j , let us extent g x j and g m,x j to the real line by defining g x j (s) = g m,x j (s) = 0 for s < 0. Then g m,x j is the empirical distribution function of g x j and we have (A.9) with probability 1 − p, for all 1 ≤ j ≤ K by the DKW theorem and Bonferroni's inequality.The result follows.
Appendix B. Results on the stability of Radon tranforms.Here we provide background theory and results regarding the stability of inverse Radon transforms.First we have some definitions.
From [38, page 59] we have the definitions of Sobolev spaces.
Definition B.2.Let α ∈ R. Then we define the Sobolev spaces with the norm for functions on the cylinder Z, g ∈ H α (Z) if the norm exists and is finite.Here the Fourier transform is taken with respect to the variable s.
Let Ω be an open subset of R n .Then we define Next from [6, page 11] we have where the Fourier transform is taken with respect to the s variable.
From Natterer [6, page 42], we have: Now we prove a similar result for the half space Radon transform, which is a simple consequence of the additional degree of smoothing (+1) applied by the integration in the s variable when transforming hyperplane integral data to half space integrals.
Corollary B.5.For each α ∈ R there exist positive constants c(α, n), C(α, n) such that for where the Fourier transform is taken with respect to the s variable.From which it follows that where the last step follows from Theorem B.3.Substituting ξ = σθ we get The next theorem gives bounds for the least squares error in a reconstruction from Radon transform data in terms of the least squares error in the Radon transform approximation.The proof of which [6, page 94] follows from Theorem B.4 and the interpolation inequality.
Theorem B.7.Let f ∈ C ∞ 0 (Ω n ), let g = Rf , and let g be such that g − g L 2 (Z) < .Then there exists a constant c(n) such that For the half space Radon transform the theorem above reads the same except for the difference in the exponent Theorem B.8.Let f ∈ C ∞ 0 (Ω n ), let g = R H f , and let g be such that g − g L 2 (Z) < .Then there exists a constant c(n) such that where f So a uniform sampling of Z is preferred to obtain the best solution using Radon transform data.Typically in tomography applications, X-ray scanners are designed to satisfy a uniform sampling rate of which there are limitations due to the hardware.In the current work we can sample Rf as we please (although using higher k and l will be more computationally expensive).
Appendix C. Additional tables and reconstruction errors.
Here we give additional tables showing the errors in reconstructions of density patches for varying levels the principle curvature κ (at the origin) and ball radius r.We also give the sample sizes m on the patch U l (as is represented in figure 11) in table 5, for all combinations of κ and r.
In figure 12, we present graphs showing reconstruction errors (as in figure 5), using TV [33] (denoted as ROF) and Poisson [37] (denoted as Poi) image denoising techniques.Here our noisy input image is a histogram with 100 2 equally sized bins with centers on a 1-100 meshgrid (the histogram bins are the image pixels p i ).We choose the best performing (in terms of ) λ from λ = 0.5, 1, 5 (ROF) and report the results for J = 1, 2 (i.e. the J such that 2 J divides the image dimension of 100 pixels) with LET2 (for Poi, as is shown to be most optimal in [37]).We also report the reconstruction errors using the Split Bregman methods of Mohler et.al. [35] (denoted by Moh).Here we use 10-fold CV to choose λ (the λ chosen is that which maximizes the sum of the log-likelihood over all folds) as in [35].When we ran the cross validation (using the Matlab code available in the supplementary materials of [35]) on density 3, we experienced crashing issues upon multiple runs and on mulitple machines.Hence, for density 3, we report the minimal over the 25 λ values considered (in [35] they also suggest to choose from 25 λ values).In figure 12 we also include the reconstruction errors using Sph for comparison, and we crop the y axis of the plot to only show errors between 0 and 1 (to aid the visual comparison, discounting errors greater than 100%).Note that for density 4, the errors using Poi were all greater than 1 (for all m), and hence why there are no errors curves corresponding to Poi in the bottom right hand of figure 12. κ = 0.01 κ = 0.05 κ = 0.

Figure 2 :
Figure 2: A hyperplane in two dimensions parameterized by the perpendicular distance s from the origin and a direction θ (right).A circle parameterized by a center point x and radius s (left).

Figure 4 :
Figure 4: Half space and spherical Radon transforms in two dimensions, approximated by counting observations in a set of half spaces and (inside of) spheres in the plane.

= 1 Rv − b 2 2 +
⇐⇒ p i ∈ s j Count the number of observations in s j and set b j = |X ∩ s j | end else for j ← 1 to l 2 do Find the p i such that p i ∈ h j and set R ji = 1 ⇐⇒ p i ∈ h j Count the number of observations in h j and set b j = |X ∩ h j | end end Set the regularization function G and find arg min v λ 2 G(v) using an iterative solver, choosing λ by a GCV technique Output v

Figure 6 :
Figure 6: From left to right, Density 1, a reconstruction using Sph, and Koen.

Figure 7 :
Figure 7: From left to right, Density 2, a reconstruction using Sph, and Koen.

Figure 8 :
Figure 8: From left to right, Density 3, a reconstruction using Sph, and Koen.

Theorem 7 . 1 .
Let M be a submanifold of R d whose principle curvatures are bounded by κ

Figure 9 :Figure 10 :
Figure 9: From left to right, Density 4, a reconstruction using Sph, and Koen.

Figure 11 :
Figure 11: Density 4 embedded in three dimensions (top left) with a patch U l highlighted.The principle component eigenvalues (top right) of the patch subset.The patch density in principle component space (bottom left) and a reconstruction using spherical Radon transforms and TV (bottom right).

B. 10 )
which proves the left hand inequality.The right hand inequality is left to the reader.Next we have the interpolation inequality for Sobolev spaces[6, page 203].Theorem B.6 (Interpolation inequality)

Table 5 :
Number of samples m on the patch U l for varying levels of curvature (κ) and patch radius (r).

Figure 14 :
Figure 14: Gaussian mixture reconstructions using half space (left) and spherical (right) Radon transforms, with TV.

Table 3 :
Relative errors in patch density reconstructions for varying levels of curvature (κ) and patch radius (r).The reconstructions were produced using Algorithm 7.2 with the method set to spherical and the variance percentage set to p = 90%.7.2.1.Local tangent space approximations.Let {U l , ϕ l } be an atlas of an n-manifold M embedded in R d , with n < d, and let each U l = B M r (x) be a neighborhood of a point x ∈ M for some r.By Theorem 7.1, we can approximate U l by the tangent patch B TxM r

Table 4 :
Relative errors in patch density reconstructions for varying levels of curvature (κ) and patch radius (r).The reconstructions were produced using Algorithm 7.2 with the method set to half space and the variance percentage set to p = 90%.