Distribution of Distances based Object Matching: Asymptotic Inference

Abstract In this article, we aim to provide a statistical theory for object matching based on a lower bound of the Gromov-Wasserstein distance related to the distribution of (pairwise) distances of the considered objects. To this end, we model general objects as metric measure spaces. Based on this, we propose a simple and efficiently computable asymptotic statistical test for pose invariant object discrimination, built on a (β-trimmed) empirical version of the aforementioned lower bound. We derive the distributional limits of this test statistic for the trimmed and untrimmed case. For this purpose, we introduce a novel U-type process indexed in β and show its weak convergence. The theory developed is investigated in Monte Carlo simulations and applied to structural protein comparisons. Supplementary materials for this article are available online.


Introduction
Over the last decades, the acquisition of geometrically complex data in various fields of application has increased drastically. For the digital organization and analysis of such data it is important to have meaningful notions of similarity between datasets as well as between shapes. This most certainly holds true for the area of three-dimensional object matching, which has many relevant applications, for example in computer vision (Viola and Jones 2001; Torralba et al. 2003), mechanical engineering (Au and Yuen 1999; El-Mehalawi and Miller 2003) or molecular biology (Nussinov and Wolfson 1991; Krissinel and Henrick 2004). In most of these applications, an important challenge is to distinguish between shapes while regarding identical objects in different spatial orientations as equal. A prominent example is the comparison of three-dimensional protein structures, which is important for understanding structural, functional and evolutionary relationships among proteins (Kolodny, Koehl, and Levitt 2005; Srivastava et al. 2016). Most known protein structures are published as coordinate files, where for every atom a three-dimensional coordinate is estimated based on an indirect observation of the protein's electron density (see Rhodes 2010 for further details), and stored, for example, in the protein database PDB (Berman et al. 2000). These coordinate files lack any kind of orientation, and any meaningful comparison has to take this into account. Figure 1 (created with PyMOL (Schrödinger, LLC 2015)) shows two cartoon representations of the backbone of the protein structure 5D0U in two different poses. These two representations, obtained from the same coordinate file, highlight how difficult it is to recognize them as the same object from noisy measurements.
Many approaches to pose invariant shape matching, classification and recognition have been suggested and studied in the literature. The majority of these methods compute and compare certain signatures in order to decide whether the considered objects are equal up to a previously defined notion of invariance. In the literature, these methods are often called feature (or signature) based methods, see Cárdenas et al. (2005) for a survey. Some examples for features are the shape distributions (Osada et al. 2002), that are connected to the distributions of lengths, areas and volumes of an object, the shape contexts (Belongie, Malik, and Puzicha 2002), that rely in a sense on a local distribution of inter-point distances of the considered object, and reduced size functions (d'Amico, Frosini, and Landi 2010), which count the connected components of certain lower level sets.
As noted by Mémoli (2007, 2011), several signatures describe different aspects of a metric between objects. In these and subsequent papers, the author develops a unifying viewpoint by representing an object as a metric measure space (X, d_X, μ_X), where (X, d_X) is a compact metric space and μ_X denotes a Borel probability measure on X. The additional probability measure, whose support is assumed to be X, can be thought of as signaling the importance of different regions of the modeled object. Based on the original work of Gromov (1999), Mémoli (2011) introduced the Gromov-Wasserstein distance of order p ∈ [1, ∞) between two (compact) metric measure spaces (X, d_X, μ_X) and (Y, d_Y, μ_Y), which will be fundamental to this article. It is defined as

GW_p(X, Y) := inf_{π ∈ M(μ_X, μ_Y)} (1/2) ( ∫_{X×Y} ∫_{X×Y} |d_X(x, x′) − d_Y(y, y′)|^p dπ(x, y) dπ(x′, y′) )^{1/p}.   (1)
Here, M(μ_X, μ_Y) stands for the set of all couplings of μ_X and μ_Y, that is, the set of all measures π on the product space X × Y such that π(A × Y) = μ_X(A) and π(X × B) = μ_Y(B) for all measurable sets A ⊂ X and B ⊂ Y. In Section 5 of Mémoli (2011), it is ensured that the Gromov-Wasserstein distance GW_p is suitable for pose invariant object matching by proving that it is a metric on the collection of all isomorphism classes of metric measure spaces. Hence, objects are considered to be the same if they can be transformed into each other without changing the distances between their points and such that the corresponding measures are preserved. For example, if the distance is Euclidean, this leads to identifying objects up to translations, rotations and reflections (Lomont 2014). This makes the Gromov-Wasserstein distance theoretically well suited for a variety of shape matching tasks, including protein structure comparisons. However, the practical usage of the Gromov-Wasserstein approach is severely hindered by its computational complexity: Already for two finite metric measure spaces X and Y with metrics d_X and d_Y and probability measures μ_X and μ_Y, respectively, the computation of GW_p(X, Y) boils down to solving a (nonconvex) quadratic program (Mémoli 2011, sec. 7). This is in general NP-hard (Pardalos and Vavasis 1991). To circumvent the precise determination of the Gromov-Wasserstein distance in practice, it can be approximated by conditional gradient descent (Mémoli 2007, 2011). The result of this numerical scheme, however, is difficult to interpret, as it does not come with any guarantee how close it is to the minimum in (1). Nevertheless, this approach has been used in various applications and led to several extensions of the Gromov-Wasserstein distance (Solomon et al. 2016; Chowdhury and Mémoli 2019; Chowdhury and Needham 2020), especially in the area of machine learning (Alvarez-Melis and Jaakkola 2018; Bunne et al. 2019; Titouan et al. 2019; Xu et al. 2020). Gellert et al. (2019) pursued a different route to approximating the Gromov-Wasserstein distance by applying certain lower bounds of the Gromov-Wasserstein distance derived in Mémoli (2007, 2011) for the comparison of the isosurfaces of various proteins. Among other things, the authors used that

GW_p(X, Y) ≥ (1/2) DoD_p(X, Y)^{1/p}

and that DoD_p can be reformulated in terms of the distribution of (pairwise) distances. Let μ_U be the probability measure of the random variable d_X(X, X′), where X, X′ iid∼ μ_X, and let μ_V be the one of d_Y(Y, Y′), where Y, Y′ iid∼ μ_Y. It follows from Chowdhury and Mémoli (2019, Thm. 24) that

DoD_p(X, Y) = ∫_0^1 |U^{-1}(t) − V^{-1}(t)|^p dt,   (2)

where U^{-1} and V^{-1} are the quantile functions of μ_U and μ_V, respectively. Thus, this bound quantifies the differences between the distributions of pairwise distances of the metric measure spaces (X, d_X, μ_X) and (Y, d_Y, μ_Y) in terms of the Kantorovich distance (also known as Wasserstein distance, see Villani 2003).
In this article, we investigate the statistical properties of the sample counterpart of DoD_p in (2), which is on the one hand extremely simple to compute in a quadratic number of elementary operations (see Section 1.1) and on the other hand statistically accessible and useful for inference tasks such as object discrimination when the data are randomly sampled or the dataset is massive and subsampling becomes necessary. Generally, DoD_p is a simple and natural measure to compare distance matrices. Such distance matrices underlie many methods of data analysis, for example, various multidimensional scaling techniques (see Dokmanic et al. 2015). Thus, we believe that our analysis is of general statistical interest beyond the described scenario.

The Proposed Approach
Given two metric measure spaces, denoted as (X, d_X, μ_X) and (Y, d_Y, μ_Y), we aim to construct an (asymptotic) test based on DoD_p(X, Y) for the hypothesis testing problem

H_0: DoD_p(X, Y) = 0   versus   H_1: DoD_p(X, Y) > 0.   (3)

Since DoD_p is (up to constants) a lower bound of the Gromov-Wasserstein distance, rejecting H_0 implies GW_p(X, Y) > 0. Hence, it is possible (as done in the following) to employ an (asymptotic) level α test for H_0 for pose invariant object discrimination, that is, to test

H*_0: GW_p(X, Y) = 0   versus   H*_1: GW_p(X, Y) > 0.

It is well known that the distribution of distances does not uniquely characterize a metric measure space (Mémoli 2011), that is, there are metric measure spaces X and Y such that DoD_p(X, Y) = 0, although GW_p(X, Y) > 0. In consequence, a test for H_0 applied to test for H*_0 cannot develop power for every alternative in H*_1. However, this seems to be a minor issue for many practical applications. Indeed, the distribution of distances was proposed as a feature itself for feature based object matching and was shown to work well in various examples (Osada et al. 2002; Brinkman and Olver 2012; Berrendero, Cuevas, and Pateiro-López 2016; Gellert et al. 2019). Furthermore, the discriminative abilities of the distribution of distances are well studied theoretically (Boutin and Kemper 2004; Mémoli and Needham 2021), see also Sections 2 and 4.
To set up our statistical framework, let X_1, …, X_n iid∼ μ_X and Y_1, …, Y_m iid∼ μ_Y be two independent samples.
The sample analog to (2) is defined with respect to the empirical counterparts of μ_U and μ_V, and we obtain the DoD-statistic as

DoD_p := DoD_p(X_n, Y_m) := ∫_0^1 |U_n^{-1}(t) − V_m^{-1}(t)|^p dt,   (4)

where, for t ∈ R, U_n and V_m are defined as the empirical cdf's of all pairwise distances of the samples X_n = {X_1, …, X_n} and Y_m = {Y_1, …, Y_m}, respectively, that is,

U_n(t) := (2/(n(n−1))) Σ_{1≤i<j≤n} 1{d_X(X_i, X_j) ≤ t}  and  V_m(t) := (2/(m(m−1))) Σ_{1≤k<l≤m} 1{d_Y(Y_k, Y_l) ≤ t}.   (5)

Besides, U_n^{-1} and V_m^{-1} denote the corresponding empirical quantile functions. We stress that the evaluation of DoD_p boils down to the calculation of a sum and no formal integration is required. Let d_X(i) denote the ith order statistic of the sample {d_X(X_i, X_j)}_{1≤i<j≤n} and let d_Y(i) be defined analogously. Let N := n(n − 1)/2 and M := m(m − 1)/2. Then,

DoD_p = Σ_{i=1}^{N} Σ_{j=1}^{M} |d_X(i) − d_Y(j)|^p λ_{i,j},

where

λ_{i,j} := ( (iM ∧ jN) − ((i−1)M ∨ (j−1)N) ) / (NM) · 1{iM ∧ jN > (i−1)M ∨ (j−1)N}.

Here, and in what follows, a ∧ b denotes the minimum and a ∨ b the maximum of two real numbers a and b. Hence, the representation (2) admits an empirical version which is computable in O((m ∨ n)²) elementary operations, if the computation of one distance is considered as O(1).
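To illustrate the computation, the following R snippet evaluates (4) for two point clouds. It is a minimal sketch under our own naming: the function dod_stat and its arguments are illustrative and are not the interface of the accompanying R package.

```r
# Empirical DoD statistic (4) between point clouds X (n x d) and Y (m x d);
# a minimal illustrative sketch, not the accompanying package's interface.
dod_stat <- function(X, Y, p = 2, beta = 0, dist_method = "euclidean") {
  dx <- sort(as.vector(dist(X, method = dist_method)))  # N = n(n-1)/2 distances
  dy <- sort(as.vector(dist(Y, method = dist_method)))  # M = m(m-1)/2 distances
  N <- length(dx); M <- length(dy)
  # Common refinement of the jump points of both empirical quantile functions,
  # augmented by the trimming bounds so the trimmed integral is exact
  grid  <- sort(unique(c(0, beta, 1 - beta, (1:N) / N, (1:M) / M)))
  width <- diff(grid)
  mid   <- head(grid, -1) + width / 2       # cell midpoints, safely inside cells
  qx <- dx[ceiling(N * mid)]                # U_n^{-1} is constant on each cell
  qy <- dy[ceiling(M * mid)]
  keep <- mid > beta & mid < 1 - beta       # beta-trimmed integration range
  sum(abs(qx[keep] - qy[keep])^p * width[keep])
}

# Example: two independent samples of size 100 from the unit square
set.seed(1)
X <- matrix(runif(200), 100, 2); Y <- matrix(runif(200), 100, 2)
dod_stat(X, Y)                              # untrimmed, p = 2
```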

Main Results
The main contributions of the paper are various upper bounds and distributional limits for the statistic defined in (4) (as well as trimmed variants). Based on these, we design an asymptotic test for the hypothesis H_0 defined in (3). Other statistical applications such as confidence intervals for DoD_p or multisample extensions are straightforward and omitted for brevity. We focus, for ease of notation, on the case p = 2 (see Section 2.3 for p ∈ [1, ∞)), that is, we derive for β ∈ [0, 1/2) the limit behavior of the statistic

DoD^{(β)} := DoD^{(β)}(X_n, Y_m) := ∫_β^{1−β} ( U_n^{-1}(t) − V_m^{-1}(t) )² dt   (6)

under the hypothesis as well as under the alternative in (3). While in many applications β = 0 in (6) is a natural choice, the introduced trimming parameter β can be used to robustify the proposed method (Czado and Munk 1998; Alvarez-Esteban et al. 2008). Furthermore, it gives the possibility to focus the comparison on specific areas of the considered distributions of distances when additional information about their shapes is available. In Section 4, we illustrate the influence of this parameter empirically. Next, we briefly summarize the setting in which we are working and introduce the conditions required.
Setting 1.1. Let (X, d_X, μ_X) and (Y, d_Y, μ_Y) be two metric measure spaces and let μ_U and μ_V denote the distributions of (pairwise) distances of the spaces (X, d_X, μ_X) and (Y, d_Y, μ_Y), respectively. For U the cdf of μ_U, assume that U is differentiable with derivative u and let U^{-1} be the quantile function of U. Let V, V^{-1} and v be defined analogously. Further, let the samples X_1, …, X_n iid∼ μ_X and Y_1, …, Y_m iid∼ μ_Y be independent of each other and let U_n^{-1} and V_m^{-1} denote the empirical quantile functions of U_n and V_m in (5).
Since the statistic DoD^{(β)} in (6) is based on empirical quantile functions, or more precisely empirical U-quantile functions, we have to ensure that the corresponding U-distribution functions are well behaved. In the course of this, we distinguish the cases β ∈ (0, 1/2) and β = 0. The subsequent condition guarantees that the inversion functional φ_inv: F ↦ F^{-1} is Hadamard differentiable as a map from the set of restricted distribution functions into the space of all bounded functions on [β, 1 − β].

Condition 1.2. Let U be continuously differentiable on an interval [U^{-1}(β) − ε, U^{-1}(1 − β) + ε] for some ε > 0 with strictly positive derivative u, and let the analogous assumption hold for V and its derivative v.
When the densities of μ_U and μ_V vanish at the boundaries of their support, which commonly happens (see Example 2.1), we can no longer rely on Hadamard differentiability to derive the limit distribution of DoD^{(β)} under H_0 for β = 0. In order to deal with this case, we require stronger assumptions. The subsequent ones resemble those of Mason (1984).
Condition 1.3. Let U be continuously differentiable on its support. Further, assume that there exist constants such that (U^{-1})′ satisfies the growth conditions of Mason (1984) (discussed in detail in Section 2.1), and let the analogous assumptions hold for V and (V^{-1})′.
Both Conditions 1.2 and 1.3 are comprehensively discussed in Section 2.1, and various illustrative examples are given there.
Limit distribution under H_0: Here, we have that the distributions of distances of the considered metric measure spaces, μ_U and μ_V, are equal, that is, U = V. In this case, we show that for n, m → ∞ with n/(n + m) → λ ∈ (0, 1),

(nm/(n + m)) DoD^{(β)} ⇝ Ξ_β := ∫_β^{1−β} G²(t) dt,   (7)

where G is a centered Gaussian process with covariance depending on U (under H_0 we have U = V) in an explicit but complicated way, see Theorem 2.6. Further, "⇝" denotes weak convergence in the sense of Hoffman-Jørgensen (see van der Vaart and Wellner 1996, Part 1). Additionally, in Section 2, we establish a simple concentration bound for DoD^{(β)} and demonstrate that for β ∈ (0, 1/2) and α ∈ (0, 1) the corresponding α-quantile of Ξ_β, which is required for testing, can be obtained by a bootstrap scheme, see Section 3.
Limit distribution under H_1: Under the additional assumption (which is only required for β > 0) that

DoD^{(β)}(X, Y) := ∫_β^{1−β} ( U^{-1}(t) − V^{-1}(t) )² dt > 0,

we can prove (see Theorem 2.7) that given Condition 1.2 it holds for n, m → ∞ and β ∈ (0, 1/2) (resp. given Condition 1.3 for β = 0) that

√(nm/(n + m)) ( DoD^{(β)} − DoD^{(β)}(X, Y) ) ⇝ N(0, σ²_{U,V,λ}),   (8)

where N(0, σ²_{U,V,λ}) denotes a normal distribution with mean 0 and variance σ²_{U,V,λ} depending in an explicit way on U, V, β and λ = lim_{n,m→∞} n/(m + n).

Applications
From our theory it follows that for any β ∈ [0, 1/2) a (robust) asymptotic level α test for H_0 against H_1 is given by rejecting H_0 if

(nm/(n + m)) DoD^{(β)} > ξ_{1−α},   (9)

where ξ_{1−α} denotes the (1 − α)-quantile of Ξ_β. This has many possible applications.
Exemplarily, in Section 5, we model proteins as metric measure spaces by assuming that the coordinate files are samples from (unknown) distributions (see Rhodes 2010) and apply our method to compare the protein structures depicted in Figure 2.
Our major findings can be summarized as follows: 5D0U versus 5JPT: 5D0U and 5JPT are two structures of the same protein from different organisms. Consequently, their secondary structure elements can be aligned almost perfectly (see Figure 2, left). Only small parts of the structures are slightly shifted and do not overlap in the alignment. Applying (9) for this comparison generally yields no discrimination between these two protein structures, as DoD^{(β)} is robust with respect to these kinds of differences. This robustness indeed makes the proposed method particularly suitable for protein structure comparison.
5D0U versus 6FAA: 5D0U and 6FAA are structures from closely related proteins and thus they are rather similar. Their alignment (Figure 2, right) shows minor differences in the orientation of some secondary structure elements and that 5D0U contains an α-helix that is not present in 6FAA. We find that DoD^{(β)} is highly sensitive to such a deviation from H_0, as the proposed procedure discriminates very well between both structures already for small sample sizes.
Besides testing H_0, we mention that our theory immediately allows to perform tests for relevant differences, that is, to test H: DoD^{(β)} ≤ Δ versus K: DoD^{(β)} > Δ for some specified Δ > 0 (see Munk and Czado (1998) or Dette and Wu (2019) for a discussion). Further, k-sample comparisons and asymptotic confidence intervals for DoD^{(β)} can be obtained analogously. Our results also justify subsampling (possibly in combination with bootstrapping) as an effective scheme to further reduce the computational costs of O((m ∨ n)²) for evaluating DoD^{(β)} in large scale applications.

Related Work
First, we note that U_n and V_m can be viewed as empirical cdf's of the N := n(n − 1)/2 and M := m(m − 1)/2 pairwise distances within the respective samples. Hence, (4) can be viewed as the one-dimensional empirical Kantorovich distance based on N and M (dependent) data points, respectively. There is a long standing interest in distributional limits for the one-dimensional empirical Kantorovich distance (Munk and Czado 1998; del Barrio et al. 1999; del Barrio, Giné, and Utzet 2005; Sommerfeld and Munk 2018; Bobkov and Ledoux 2019; Tameling, Sommerfeld, and Munk 2019) as well as for empirical Kantorovich type distances with more general cost functions (Berthet, Fort, and Klein 2018; Berthet and Fort 2019). Apparently, the major difficulty in our setting arises from the dependency of the random variables {d_X(X_i, X_j)} and the random variables {d_Y(Y_k, Y_l)}, respectively. Compared to the techniques available for stationary and α-dependent sequences (Dede 2009; Dedecker et al. 2017), the statistic DoD^{(β)} admits an intrinsic structure related to U- and U-quantile processes (Nolan and Pollard 1988; Arcones and Giné 1994; Wendler 2012). Note that for β > 0 we could have used the results of Wendler (2012) to derive the asymptotics of DoD^{(β)} as well, as they provide almost sure approximations of the empirical U-quantile processes U_n^{-1} and V_m^{-1}, however, at the expense of slightly stronger smoothness requirements on U and V. In contrast, the more interesting case β = 0 is much more involved, as the processes U_n^{-1} and V_m^{-1} do in general not converge in ℓ^∞(0, 1) under Condition 1.3 and the technique in Wendler (2012) fails. Under the null hypothesis, we circumvent this difficulty by targeting our statistic for β = 0 directly, viewed as a process indexed in β. Under the alternative, we show the Hadamard differentiability of the inversion functional φ_inv onto the space of R-valued, integrable functions on (0, 1) (denoted as L¹(0, 1)) and verify that this is sufficient to derive (8).
Notice that tests based on distance matrices appear naturally in several applications, see, for example, the recent works by Baringhaus and Franz (2004), Sejdinovic et al. (2013), and Montero-Manso and Vilar (2019), where the two sample homogeneity problem, that is, testing whether two probability measures μ, ν ∈ P(R^d) are equal, is considered for high dimensions. Most similar in spirit to our work is Brécheteau (2019), who also considers an asymptotic statistical test for a different lower bound of the Gromov-Wasserstein distance. This is based on a nearest neighbor-type approach and subsampling. However, the subsampling scheme is such that asymptotically all distances considered are independent, while we explicitly deal with the dependency structures present in the entire sample of the n(n − 1)/2 distances. In Sections 4.2 and 5.1 we empirically demonstrate that this leads to an increase of power and compare our test with the one proposed by Brécheteau (2019) in more detail. Closely related from a practical point of view is also the work of Gellert et al. (2019), who used and empirically compared several lower bounds of the Gromov-Wasserstein distance for clustering of various redoxins, including our lower bound in (2). In fact, to reduce the computational complexity they heuristically employed a bootstrap scheme related to the one investigated in this article and reported good empirical results. Finally, we mention that permutation based testing for U-statistics (see Berrett, Kontoyiannis, and Samworth 2020) is an interesting alternative to our bootstrap test and worth to be investigated further in our context.

Organization of the article
Section 2 states the main results and is concerned with the derivation of a simple finite sample bound for the expectation of DoD^{(β)} as well as the proofs of (7) and (8). In Section 3, we propose a bootstrapping scheme to approximate the quantiles of Ξ_β defined in (7) for β ∈ (0, 1/2). Afterwards, in Section 4, we investigate the speed of convergence of DoD^{(β)} to its limit distribution under H_0 in a Monte Carlo study. In the same section, we further study the introduced bootstrap approximation and investigate by means of various examples what kind of deviations from H_0 are detectable employing DoD^{(β)}. We apply the proposed test for the discrimination of three-dimensional protein structures in Section 5 and compare our results to the ones obtained by the method of Brécheteau (2019). Our simulations and data analysis of the example introduced previously (see Figure 2) suggest that the proposed DoD^{(β)} based test outperforms the one proposed by Brécheteau (2019) for protein structure comparisons.
In Part I of the supplementary materials, we provide additional details for the examples considered and give the full, technical proofs of our main results. Furthermore, supplementary materials I.B include a more general consideration of distributions of Euclidean distances of a certain class of metric measure spaces, and supplementary materials I.D contain additional material on simulation results and examples. Part II of the supplementary materials contains several technical auxiliary results that seem to be folklore, but have not been written down explicitly in the literature, to the best of our knowledge. An R package that implements the test proposed in Section 3 (see (15)) is available at https://github.com/cweitkamp3/Distribution-of-Distances.

Limit Distributions
In this section, we investigate the limit behavior of the proposed test statistic under the hypothesis H_0 in (3), where it holds that μ_U = μ_V (see Theorem 2.6), and under the alternative H_1 in (3), where we have μ_U ≠ μ_V (see Theorem 2.7).

Conditions on the Distributions of Distances
Before we come to the limit distributions of the test statistic DoD (β) under H 0 and H 1 , we discuss Conditions 1.2 and 1.3.We ensure that these conditions comprise reasonable assumptions on metric measure spaces that are indeed met in some standard examples.
Example 2.1. Let X₁ = [0, 1]² be the unit square in R², let d_{X₁} denote the metric induced by the supremum norm and let μ_{X₁} be the uniform distribution on X₁. Then, a straightforward calculation shows that the density u₁ of d_{X₁}(X, X′), X, X′ iid∼ μ_{X₁}, is given as u₁(s) = 4s³ − 12s² + 8s, if 0 ≤ s ≤ 1, and zero else. For an illustration of u₁ see Figure 3. Obviously, u₁ is strictly positive and continuous on (0, 1) and thus Condition 1.2 is fulfilled for any β ∈ (0, 1/2) in the present setting. Furthermore, we find that for t ∈ (0, 1) the quantile function is given as U₁^{-1}(t) = 1 − √(1 − √t), with derivative (U₁^{-1})′(t) = 1/(4 √t √(1 − √t)). Since this derivative can be controlled appropriately for t ∈ (0, 1), the requirements of Condition 1.3 are satisfied.
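This density is easy to check empirically; the following short R simulation (an illustrative sanity check, not part of the article's analysis) overlays u₁ on a histogram of simulated supremum-norm distances:

```r
# Monte Carlo check of u1(s) = 4s^3 - 12s^2 + 8s, the density of the
# supremum-norm distance of two independent uniform points on [0,1]^2;
# an illustrative sanity check only.
set.seed(1)
n  <- 1e5
P1 <- matrix(runif(2 * n), n, 2)
P2 <- matrix(runif(2 * n), n, 2)
d  <- pmax(abs(P1[, 1] - P2[, 1]), abs(P1[, 2] - P2[, 2]))
hist(d, breaks = 100, freq = FALSE, main = "", xlab = "s")
curve(4 * x^3 - 12 * x^2 + 8 * x, from = 0, to = 1, add = TRUE, col = "red")
```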
Given two random point clouds in R^d, it is often natural to assume that they are uniform samples from some compact set and to compare them based on their Euclidean distances, which are easily computable. Even if both samples stem from a curve or a hypersurface, this approach might be reasonable, since the corresponding (possibly more meaningful) intrinsic distances are unknown in general. Hence, the distributions of distances of standard Euclidean metric measure spaces (i.e., metric measure spaces (X, d_X, μ_X), where d_X denotes the Euclidean distance and μ_X the uniform distribution on X) deserve special attention. In what follows, let S^{d−1} be the unit sphere in R^d.
Example 2.2. Let X₂ denote a disk in R² with diameter 1 and let X₃ denote a square in R² with diameter 1. Let X₄ be the sphere S¹, X₅ the sphere S² and X₆ the sphere S⁴. Furthermore, let X₇ be a further example space (its density u₇ is depicted in Figure 3). Now, consider the standard Euclidean metric measure spaces induced by the sets X_i and denote by u_i the density of the distribution of distances of the respective space, 2 ≤ i ≤ 7. All densities are illustrated in Figure 3. In Section A.1 of the supplementary materials, we carefully check which of the just defined metric measure spaces meet the requirements of Condition 1.2 for all β > 0 and which meet the requirements of Condition 1.3. Our findings show that only the distributions of distances of X₄ and X₇ fail to meet the conditions of Condition 1.3 and that only X₇ fails to meet the requirements of Condition 1.2 for all β > 0 (it does not meet them for any β > 0).
Due to the importance of standard Euclidean metric measure spaces, we will provide simpler conditions than Conditions 1.2 and 1.3 for these spaces in the following. However, Example 2.2 already suggests that the distributions of distances of standard Euclidean metric measure spaces based on curves or surfaces might be less regular and more complex (X₃ does not meet the requirements of Condition 1.3, the density u₃ is unbounded) than those of spaces based on sets with nonempty interior. Therefore, we concentrate in a first step on spaces of the latter kind. To this end, we require some notation. Let λ_d denote the Lebesgue measure in R^d and let A ⊂ R^d, d ≥ 2, be a bounded Borel set with λ_d(A) > 0. Recall that y ∈ R^d is determined by its polar coordinates (t, v), where t = ‖y‖₂ and v ∈ S^{d−1} is the unit length vector y/t. Thus, we define the covariance function (Stoyan, Kendall, and Mecke 2008, sec. 3.1)

k_A(t, v) := λ_d( A ∩ (A + tv) ),  t ≥ 0, v ∈ S^{d−1}.

Furthermore, we define the diameter of a given metric space (X, d_X) as diam(X) := sup_{x,x′∈X} d_X(x, x′).

Lemma 2.3. Let X ⊂ R^d, d ≥ 2, be a compact Borel set with λ_d(X) > 0, d_X the Euclidean metric and μ_X the uniform distribution on X. Let diam(X) = D.
(i) If k_X is strictly positive on [0, D), then the induced metric measure space (X, d_X, μ_X) meets the requirements of Condition 1.2 for any β ∈ (0, 1/2).
(ii) If additionally there exist ε > 0 and η > 0 such that, among further regularity requirements near the boundary of the support, the function k_X is monotonically decreasing on [D − ε, D), then (X, d_X, μ_X) also meets the requirements of Condition 1.3.

As an example, let Y ⊂ R^d be a compact Borel set with λ_d(Y) > 0 and consider the standard Euclidean metric measure spaces (X, d_X, μ_X) and (Y, d_Y, μ_Y) induced by X and Y, respectively. Suppose that φ: X → Y is a measure preserving isometry (i.e., φ#μ_X = μ_Y). Then, it clearly holds that the distributions of distances of both spaces agree, that is, μ_U = μ_V. Hence, Lemma 2.3 can be applied to Y as well.
The full proof of the above lemma is deferred to Section A.2 of the supplementary materials. To conclude this section, we remark that we investigate the distributions of distances of standard Euclidean metric measure spaces based on various curves and hypersurfaces in Section B of the supplementary materials. There, we derive, under several technical assumptions, an analogue of Lemma 2.3.

The Hypothesis
Throughout this section, we assume that the distributions of distances of the two considered metric measure spaces coincide, that is, μ_U = μ_V. Further, assume that X₁, …, X_n iid∼ μ_X and Y₁, …, Y_m iid∼ μ_Y are two independent samples. In order to study the finite sample bias of the statistic DoD^{(β)}, a bound on its expectation (Theorem 2.5) is helpful (for its proof see Section A.3 of the supplementary materials).
Theorem 2.6. (i) Let Condition 1.2 be met and let m, n → ∞ such that n/(n + m) → λ ∈ (0, 1). Then,

(nm/(n + m)) DoD^{(β)} ⇝ Ξ_β = ∫_β^{1−β} G²(t) dt,

where G is a centered Gaussian process with covariance

cov(G(t), G(t′)) = 4 [ (1 − λ) Γ_X(U^{-1}(t), U^{-1}(t′)) + λ Γ_Y(U^{-1}(t), U^{-1}(t′)) ] / ( u(U^{-1}(t)) u(U^{-1}(t′)) ).   (11)

Here, Γ_X(x, y) := Cov( 1{d_X(X₁, X₂) ≤ x}, 1{d_X(X₁, X₃) ≤ y} ) for X₁, X₂, X₃ iid∼ μ_X, and Γ_Y is defined analogously in terms of d_Y and μ_Y.
(ii) If we assume Condition 1.3 instead of Condition 1.2, then the analogous statement holds for the untrimmed version, that is, for β = 0.
The full proof of Theorem 2.6 can be found in Section A.4 of the supplementary materials.

The Alternative
In this section, we are concerned with the behavior of DoD (β) given that the distributions of distances of the metric measure spaces (X , d X , μ X ) and (Y, d Y , μ Y ) do not coincide.We distinguish the cases β ∈ (0, 1/2) and β = 0.
Theorem 2.7. (i) Let β ∈ (0, 1/2), let Condition 1.2 be met, assume that DoD^{(β)}(X, Y) > 0 and let m, n → ∞ such that n/(n + m) → λ ∈ (0, 1). Then, √(nm/(n + m)) ( DoD^{(β)} − DoD^{(β)}(X, Y) ) converges in distribution to a normal distribution with mean 0 and variance

σ²_{U,V,λ} = 16λ ∫_β^{1−β} ∫_β^{1−β} (U^{-1} − V^{-1})(s) (U^{-1} − V^{-1})(t) Γ_Y(V^{-1}(s), V^{-1}(t)) / ( v(V^{-1}(s)) v(V^{-1}(t)) ) ds dt + 16(1 − λ) ∫_β^{1−β} ∫_β^{1−β} (U^{-1} − V^{-1})(s) (U^{-1} − V^{-1})(t) Γ_X(U^{-1}(s), U^{-1}(t)) / ( u(U^{-1}(s)) u(U^{-1}(t)) ) ds dt.

Here, Γ_X(x, y) is as defined in Theorem 2.6 and Γ_Y(x, y) is defined analogously.
(ii) If we assume Condition 1.3 instead of Condition 1.2, then the analogous statement holds for the untrimmed version, that is, for β = 0.
The proof of Theorem 2.7 is given in Section A.5 of the supplementary materials.
Remark 2.8. The assumptions of Theorem 2.7 (i) include that β is chosen such that DoD^{(β)}(X, Y) > 0. Suppose on the other hand that μ_U ≠ μ_V, but DoD^{(β)}(X, Y) = 0, that is, their quantile functions agree Lebesgue a.e. on the interval [β, 1 − β]. Then, the limits found in Theorem 2.7 are degenerate, and it is easy to verify along the lines of the proof of Theorem 2.6 that DoD^{(β)} exhibits the same distributional limit as in the case μ_U = μ_V.
Remark 2.9. As noted by a referee, it is possible to slightly relax the assumptions of Theorem 2.7 (ii). It is sufficient to assume that U and V admit continuous densities that are strictly positive on the interior of their respective supports and that J₂(μ_U) (see (10) for a definition) as well as J₂(μ_V) are finite (see Section A.5.2 of the supplementary materials for more information). These relaxed assumptions are related to those of Proposition 2.3 of del Barrio and Loubes (2019).
Remark 2.10. An immediate application of Theorem 2.7 is testing for relevant differences (or equivalence testing), that is, testing H: DoD^{(β)} ≤ Δ versus K: DoD^{(β)} > Δ (or K vs. H) for some specified Δ > 0 (see Munk and Czado 1998; Dette and Wu 2019). In both cases, the quantiles required for testing are quantiles of the limiting normal distribution. Hence, a consistent estimator for the limiting variance (e.g., a plug-in estimator) yields consistent estimates for the quantiles required.
Remark 2.11. So far, we have restricted ourselves to the case p = 2. However, most of our findings transfer to results for the statistic DoD_p, p ∈ [1, ∞), in (4). Using the same ideas one can directly derive Theorems 2.5 and 2.6 for (a trimmed version of) DoD_p (see Sections A.3 and A.4 of the supplementary materials) under slightly different assumptions. Only the proof of Theorem 2.7 requires more care (see Section A.5 of the supplementary materials).

Bootstrapping the Quantiles
The quantiles of the limit distribution Ξ_β of DoD^{(β)} under H_0 depend on the unknown distribution U and are therefore in general not accessible. One possible approach, which is quite cumbersome, is to estimate the covariance function of the Gaussian limit process G from the data and use this to approximate the quantiles required. Alternatively, we suggest to directly bootstrap the quantiles of the limit distribution of DoD^{(β)} under H_0. To this end, we define and investigate a bootstrap version of the empirical U-quantile process. Let μ_n denote the empirical measure based on the sample X₁, …, X_n. Given the sample values, let X*₁, …, X*_{n_B} be an independent identically distributed sample of size n_B from μ_n. Then, the bootstrap estimator of U_n is defined as

U*_{n_B}(t) := (2/(n_B(n_B − 1))) Σ_{1≤i<j≤n_B} 1{d_X(X*_i, X*_j) ≤ t},

and the corresponding (empirical) bootstrap quantile process for t ∈ (0, 1) as √(n_B) ( (U*_{n_B})^{-1}(t) − U_n^{-1}(t) ). One can easily verify along the lines of the proof of Theorem 2.6 that for n → ∞ it also holds for β ∈ (0, 1/2) that

Ξ*_{n_B} := n_B ∫_β^{1−β} ( (U*_{n_B})^{-1}(t) − U_n^{-1}(t) )² dt ⇝ Ξ_β.   (12)

This suggests to approximate the quantiles of Ξ_β by the bootstrapped ones of Ξ*_{n_B}. Let β ∈ (0, 1/2), suppose that Condition 1.2 holds, and let ξ^{(R)}_{n_B,α} denote the empirical bootstrap α-quantile of R independent bootstrap realizations Ξ*^{(1)}_{n_B}, …, Ξ*^{(R)}_{n_B}. Under these assumptions, we derive (see Section C of the supplementary materials) that for any α ∈ (0, 1),

lim_{n,n_B,R→∞} ξ^{(R)}_{n_B,α} = ξ_α,   (14)

where ξ_α denotes the α-quantile of Ξ_β. Because of (12), Equality (14) guarantees the consistency of ξ^{(R)}_{n_B,α} for n, n_B, R → ∞. Hence, a consistent bootstrap analogue of the test defined by the decision rule (9) is, for β ∈ (0, 1/2), given by the bootstrapped Distribution of Distances (DoD) test

Φ*_{DoD} := 1{ (nm/(n + m)) DoD^{(β)} > ξ^{(R)}_{n_B, 1−α} }.   (15)
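In code, the scheme might look as follows; this is a minimal sketch under our own naming (dod_boot_quantile, the Riemann-grid evaluation of the trimmed integral, and the choice n_B = n are illustrative simplifications, not the accompanying package's implementation):

```r
# n out of n bootstrap approximation of the (1 - alpha)-quantile of Xi_beta,
# mirroring the scheme of this section; an illustrative sketch, the integral
# is evaluated on a Riemann grid rather than exactly.
dod_boot_quantile <- function(X, beta = 0.01, alpha = 0.05, R = 1000,
                              dist_method = "euclidean") {
  n  <- nrow(X)
  dx <- sort(as.vector(dist(X, method = dist_method)))
  tt <- seq(beta, 1 - beta, length.out = 2000)       # grid on [beta, 1 - beta]
  q0 <- dx[ceiling(length(dx) * tt)]                 # U_n^{-1} on the grid
  xi_star <- replicate(R, {
    Xb  <- X[sample(n, n, replace = TRUE), , drop = FALSE]  # sample from mu_n
    dxb <- sort(as.vector(dist(Xb, method = dist_method)))
    qb  <- dxb[ceiling(length(dxb) * tt)]            # bootstrap quantile function
    n * mean((qb - q0)^2) * (1 - 2 * beta)           # Xi*_{n_B} with n_B = n
  })
  unname(quantile(xi_star, 1 - alpha))               # empirical 1 - alpha quantile
}

# The bootstrapped DoD-test (15) then rejects H_0 if
# (n * m / (n + m)) * dod_stat(X, Y, beta = beta) exceeds this quantile.
```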

Simulations
We investigate the finite sample behavior of DoD^{(β)} in Monte Carlo simulations. To this end, we simulate the speed of convergence of DoD^{(β)} under H_0 to its limit distribution (see Theorem 2.6). Moreover, we showcase the accuracy of the approximation by the bootstrap scheme proposed in Section 3 and investigate what kind of differences are detectable in the finite sample setting using the bootstrapped DoD-test Φ*_{DoD} defined in (15). Based on Theorem 2.7, it is further possible to test H: DoD^{(β)} ≤ Δ versus K: DoD^{(β)} > Δ for some specified Δ > 0 (see Remark 2.10). However, only few distributions of distances are known explicitly and hence the choice of Δ is slightly problematic. Therefore, and due to page restrictions, we did not include this application in the article. All simulations were performed in R (R Core Team 2017). In order to increase the readability of this section, several tables have been postponed to Section D.1 of the supplementary materials.

The Hypothesis
We begin with the simulation of the finite sample distribution under the hypothesis and consider the metric measure space (X, d_X, μ_X) from Example 2.1, where X denotes the unit square in R², d_X the distance induced by the supremum norm and μ_X the uniform distribution on X. We generate for n = m = 10, 50, 100, 250 two samples X_n and X′_n of μ_X and calculate for β = 0.01 the statistic (n/2) DoD^{(β)}. For each n, we repeat this process 10,000 times. The finite sample distribution is then compared to a Monte Carlo sample of its theoretical limit distribution (sample size 10,000). Kernel density estimators (Gaussian kernel with bandwidth given by Silverman's rule) and Q-Q-plots are displayed in Figure 4. All plots highlight that the finite sample distribution of DoD^{(β)} is already well approximated by its theoretical limit distribution for moderate sample sizes. Moreover, for n = 10 the quantiles of the finite sample distribution of DoD^{(β)} are in general larger than the ones of the sample of its theoretical limit distribution, which suggests that the DoD-test will be rather conservative for small n. For n ≥ 50 most quantiles of the finite sample distribution of DoD^{(β)} match the ones of its theoretical limit distribution reasonably well.
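A condensed version of this simulation, reusing the illustrative dod_stat helper from Section 1.1 (with the supremum metric passed to dist), might look as follows:

```r
# Finite sample null distribution of (n/2) * DoD^(beta) for the unit square
# with supremum metric (Example 2.1); a condensed sketch of the Monte Carlo
# study, reusing the illustrative dod_stat helper from above.
set.seed(1)
beta <- 0.01; n <- 100
stats <- replicate(1000, {            # 10,000 repetitions in the article
  X <- matrix(runif(2 * n), n, 2)
  Y <- matrix(runif(2 * n), n, 2)
  (n / 2) * dod_stat(X, Y, p = 2, beta = beta, dist_method = "maximum")
})
plot(density(stats), main = "", xlab = "(n/2) DoD")   # cf. Figure 4
```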

The Bootstrap Test
We now investigate the finite sample properties of the bootstrap test Φ*_{DoD} (defined in (15)). To this end, let μ_{W₁} denote the uniform distribution on a 3D-pentagon (inner pentagon side length: 1, Euclidean distance between inner and outer pentagon: 0.4, height: 0.4) and let μ_{W₆} denote the uniform distribution on a torus (center radius: 1.169, tube radius: 0.2) with the same center and orientation (see the plots for t₁ = 0 and t₆ = 1 in Figure 5). To interpolate between these spaces, we consider (μ_{W_t})_{t∈[0,1]}, the 2-Wasserstein geodesic between μ_{W₁} and μ_{W₆} (see Santambrogio 2015, sec. 5.4, for a formal definition). Figure 5 displays for t_i ∈ {0, 0.1, 0.2, 0.4, 0.6, 1} the corresponding metric measure spaces, discretely approximated based on 40,000 points with the WSGeometry-package (Heinemann and Bonneel 2021).
Before we employ the bootstrap DoD-test with β = 0.01 to compare W₁ to the spaces {W_i}_{i=1}^6, we consider the bootstrap approximation proposed in Section 3 in this setting. Therefore, we generate n = 100, 250, 500, 1000 realizations of μ_{W₁} and compute the bootstrap statistic Ξ*_{n_B} (with n_B = n) as described in Section 3. We then compare the obtained finite sample distributions to the ones of DoD^{(β)}(W_{1,n}, W′_{1,n}) for the different n (W_{1,n} and W′_{1,n} denote two independent samples of μ_{W₁} of size n). The results are summarized as kernel density estimators (Gaussian kernel with bandwidth given by Silverman's rule) and Q-Q-plots in Figure 6. Both the kernel density estimators and the Q-Q-plots show that for n ≤ 250 the bootstrap quantiles are clearly larger than the empirical quantiles, leading to a rather conservative procedure for smaller n, an effect that disappears for large n.
Next, we aim to apply Φ*_{DoD} with β = 0.01 at 5%-significance level for discriminating between (W₁, d_{W₁}, μ_{W₁}) and each of the spaces (W_i, d_{W_i}, μ_{W_i}), i = 1, …, 6. To this end, we bootstrap the quantile ξ^{(R)}_{n_B,0.95} based on samples from μ_{W₁} as described in Section 3 (R = 1000), and then we apply the test Φ*_{DoD}, defined in (15), with the bootstrapped quantile on 1000 samples of size n = 100, 250, 500, 1000 as illustrated in Section 3. We find that the prespecified significance level (see t₁ = 0) is never exceeded and the test is rather conservative for smaller n. Concerning the power of the test Φ*_{DoD}, we observe that it increases consistently with the increasing Wasserstein distance between the measures μ_{W_i}, 1 ≤ i ≤ 6. For n ≥ 250 the differences between W₁ and W_i, 4 ≤ i ≤ 6 (see Figure 5), are clearly detected. If we choose n = 1000, even the spaces W₁ and W₃ (which correspond to t₁ and t₃) are almost always discriminated, although in this case the (approximated) Wasserstein distance between μ_{W₁} and μ_{W₃} is smaller than 0.034. The test even develops some power for the comparison of W₁ and W₂ despite their strong similarity (see Figure 5). The detailed results are summarized in Table D.1 in Section D.1 of the supplementary materials.
In order to highlight how much power we gain in the finite sample setting by carefully handling the occurring dependencies, we repeat the above comparisons, but calculate DoD^{(β)} based only on independent distances, that is, on {d_X(X₁, X₂), d_X(X₃, X₄), …, d_X(X_{n−1}, X_n)} and {d_Y(Y₁, Y₂), d_Y(Y₃, Y₄), …, d_Y(Y_{m−1}, Y_m)}, instead of all available distances. From now on, this statistic is denoted as D_{β,ind}. Similarly, we construct an asymptotic level α test D_{ind} based on D_{β,ind} (a minimal sketch of the construction of the independent distances is given below). The results for comparing (W₁, d_{W₁}, μ_{W₁}) and (W_i, d_{W_i}, μ_{W_i})_{i=1}^6 using D_{ind} with β = 0.01 are displayed in Table D.2 in Section D.1 of the supplementary materials. Apparently, D_{ind} keeps its prespecified significance level of α = 0.05, but develops significantly less power than Φ*_{DoD} in the finite sample setting.
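The independent-distances construction can be sketched as follows (illustrative naming; Euclidean distances assumed):

```r
# Independent distances as used for D_{beta,ind}: pair up disjoint
# observations (X1,X2), (X3,X4), ... so the n/2 resulting distances are iid;
# a minimal illustrative sketch.
indep_distances <- function(X) {
  n2  <- 2 * (nrow(X) %/% 2)
  odd <- seq(1, n2, by = 2)
  sqrt(rowSums((X[odd, , drop = FALSE] - X[odd + 1, , drop = FALSE])^2))
}
# D_{beta,ind} is then the beta-trimmed squared Kantorovich distance between
# the empirical quantile functions of indep_distances(X) and indep_distances(Y).
```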
Furthermore, we investigate the influence of β on our results. To this end, we repeat the previous comparisons with n = 500 and β = 0, 0.01, 0.05, 0.25. The results highlight that the test Φ*_{DoD} holds its level for all β. While the results are overall comparable, we observe some slight differences for the various values of β. For instance, for β = 0.25 the test Φ*_{DoD} develops slightly more power for the comparison of (W₁, d_{W₁}, μ_{W₁}) and (W₃, d_{W₃}, μ_{W₃}) than for β = 0. Apparently, in this case the respective (true) distributions of distances strongly resemble each other for small and large distances, and the comparison of W₁ and W₃ becomes to some degree more informative if we do not consider these distances. All results are summarized in Table D.3 in Section D.1 of the supplementary materials.
To conclude this section, we remark that in the above simulations the quantiles required for the applications of Φ*_{DoD} were always estimated based on samples of μ_{W₁}. Evidently, this slightly affects the results obtained, but we found that this influence is not significant.

Structural Protein Comparisons
Next, we apply the DoD-test to compare the protein structures displayed in Figure 2. First, we compare 5D0U with itself, in order to investigate the actual significance level of the proposed test under H_0 in a realistic example. Afterwards, 5D0U is compared with 5JPT and with 6FAA, respectively. However, before we can apply Φ*_{DoD}, we need to model proteins as metric measure spaces. Thus, we briefly recap some well known facts about proteins to motivate the subsequent approach. A protein is a polypeptide chain made up of amino acid residues linked together in a definite sequence. Tracing the repeated amide, C_α and carbonyl atoms of each amino acid residue, a so-called backbone can be identified. It is well established that the distances between the C_α atoms of the backbone contain most of the information about the protein's structure (Rossman and Liljas 1974; Jones and Thirup 1986; Holm and Sander 1993). In order to verify that the test is able to compare protein structures based on subsamples (which might be important for database queries), we randomly select n = 10, 50, 100, 250, 500 of the 650 to 750 C_α atoms of the respective proteins and assume that the corresponding coordinates are samples of unknown distributions (μ_{X_i})_{i=1}^3 supported on Borel sets X_i ⊂ R³ with λ₃(X_i) > 0 that are equipped with the Euclidean distance. We stress that although the backbone of a protein is usually represented as a curve in R³ (see, e.g., Figure 2), it is important to note that these representations are extracted from indirect, noisy observations of the electron density (see Rhodes 2010). In consequence, it is more realistic to assume that the positions are drawn from a tube-like structure with nonempty interior. We choose β = 0.01, α = 0.05 and determine for each n the bootstrap quantile ξ^{(R)}_{n_B,0.95} based on a sample of size n from 5D0U (R = 1000, n_B = n) as illustrated in Section 3. This allows us to directly apply the test Φ*_{DoD} on the drawn samples.
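For illustration, extracting and subsampling C_α coordinates could proceed as follows; this is a sketch assuming the third-party bio3d package, which is not part of the article's own implementation:

```r
# Subsample C-alpha coordinates of a PDB structure; a sketch assuming the
# third-party bio3d package (not the article's own implementation).
library(bio3d)
pdb <- read.pdb("5d0u")                      # a 4-letter code is fetched online
ca  <- pdb$atom[pdb$atom$elety == "CA", c("x", "y", "z")]
n   <- 250
X   <- as.matrix(ca[sample(nrow(ca), n), ])  # random subsample of C-alpha atoms
# X can now be passed to dod_stat / dod_boot_quantile from above.
```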
The results of our comparisons are summarized in Figure 7. It displays the empirical significance level and the empirical power, respectively, of the proposed method as a function of n. 5D0U versus 5D0U: In accordance with the previous simulation study, this comparison (see Figure 7, left) shows that Φ*_{DoD} is conservative in this application as well.

5D0U versus 5JPT:
We have already mentioned in Section 1.3 that 5D0U and 5JPT are structures of the same protein from two different organisms and thus highly similar (their alignment has a root mean square deviation of less than 0.59 Å). The empirical power for this comparison (Figure 7, middle) stays below α = 0.05 for all n. Thus, the test does not discriminate between the two protein structures, in accordance with our biological knowledge.
5D0U versus 6FAA: Although the protein structures 5D0U and 6FAA are similar at large parts (their alignment has a root mean square deviation of 0.75 Å), the DoD-test is able to discriminate between them with high statistical power. The empirical power (Figure 7, right) is a strictly monotonically increasing function in n that is greater than 0.63 for n ≥ 100 and approaches 1 for n = 500 (recall that we use random samples of the 650-750 C_α atoms).
We remark that throughout this section we have always based the quantiles required for testing on samples of the structure 5D0U. By the definition of Φ*_{DoD} it is clear that this affects the results. If we compared the structures 6FAA and 5D0U using Φ*_{DoD} with quantiles obtained by a sample of 6FAA, the results would change slightly, but remain comparable.

Comparison to the DTM-Test
We investigate how the test proposed by Brécheteau (2019) compares to Φ*_{DoD}. To this end, we first briefly introduce the method proposed in Brécheteau (2019), empirically study various toy examples and analyze the differences of both tests for protein structure comparison. We summarize the results of these comparisons here and give the tables with the precise results in Section D.2 of the supplementary materials.
Let (X, d_X, μ_X) denote a metric measure space with sample X_n = {X₁, …, X_n} iid∼ μ_X. For n_S ≤ n, the empirical distance to measure signature with mass parameter κ = k/n is defined as

D_{X_n,κ}(n_S) := (1/n_S) Σ_{i=1}^{n_S} δ_{ d̂_{X_n,κ}(X_i) },  with  d̂_{X_n,κ}(X_i) := ( (1/k) Σ_{j=1}^{k} d_X(X_i, X_i^{(j)})² )^{1/2},   (16)

where X_i^{(j)} denotes the jth nearest neighbor of X_i in the sample X_n (for general κ see Brécheteau (2019)). In particular, we observe that D_{X_n,κ}(n_S) denotes a discrete probability distribution on R. Let D_{Y_n,κ}(n_S) be defined analogously to (16). Then, given that n_S/n = o(1), Brécheteau (2019) constructs an asymptotic level α test for H*_0 defined in Section 1.1 based on the 1-Kantorovich distance between the respective empirical distance to measure signatures, that is, on the test statistic

T_{n_S,κ}(X_n, Y_n) := W₁( D_{X_n,κ}(n_S), D_{Y_n,κ}(n_S) ).   (17)

The corresponding test, which rejects if T_{n_S,κ}(X_n, Y_n) exceeds a bootstrapped critical value q^{DTM}_α, is denoted as Φ_{DTM} in the following. Brécheteau (2019) proves that, similar to DoD^{(β)}, the statistic T_{n_S,κ} is a (subsampled) empirical version of a lower bound T_κ(X, Y) of the Gromov-Wasserstein distance (see Brécheteau 2019, sec. 1, for a formal definition). It is important to note that there are metric measure spaces X and Y such that T_κ(X, Y) = 0 although DoD^{(0)}(X, Y) > 0 and vice versa (see Section A.7 of the supplementary materials for a detailed comparison of T_κ(X, Y) and DoD^{(0)}(X, Y)).
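A compact sketch of this statistic follows; it assumes, as in our reconstruction of (16), that the signature value at a point is the root mean squared distance to its k nearest neighbors, and it does not reproduce the exact subsampling scheme of Brécheteau (2019):

```r
# DTM-signature-type statistic in the spirit of Brecheteau (2019); an
# illustrative sketch assuming the signature value at a point is the root
# mean squared distance to its k nearest neighbors in the full sample.
dtm_stat <- function(X, Y, nS, kappa = 0.05) {
  sig <- function(Z) {
    n <- nrow(Z); k <- max(1, floor(kappa * n))
    D <- as.matrix(dist(Z))
    idx <- sample(n, nS)                    # subsampled evaluation points
    vapply(idx, function(i) sqrt(mean(sort(D[i, -i])[1:k]^2)), numeric(1))
  }
  # 1-Kantorovich distance of two empirical measures on R with nS atoms each
  mean(abs(sort(sig(X)) - sort(sig(Y))))
}
```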
We now compare both methods in two simulated examples. To this end, we first repeat the comparisons of (W₁, d_{W₁}, μ_{W₁}) with the spaces (W_i, d_{W_i}, μ_{W_i})_{i=1}^6 (see Section 4.2 for the definitions) with Φ_{DTM}. Second, we simulate the empirical power of Φ*_{DoD} in the setting of Section 4.2 of Brécheteau (2019) for the comparison of different spiral types. For both comparisons, we choose a significance level of α = 0.05. We remark that the test Φ_{DTM} is not easily applied in the finite sample setting. Although it is an asymptotic test of level α, the parameters n_S and κ have to be chosen carefully for the test to hold its prespecified significance level for finite samples. In particular, choosing n_S and κ large violates the assumption of (asymptotic) independence underlying the results of Brécheteau (2019). In both settings, we found comparable results. While the test Φ_{DTM} (just like Φ*_{DoD}) approximately holds its α level in both frameworks (κ ≤ 0.1 and n_S ≤ n/15 for the comparison of the spaces (W_i, d_{W_i}, μ_{W_i})_{i=1}^6; κ = 0.05 and n_S = 20 for the spiral comparison of Brécheteau 2019, sec. 4.2), the additional subsampling in the definition of T_{n_S,κ}(X, Y) in (17) leads to a notable loss of power. The complete results of these comparisons can be found in Tables D.4 and D.5 of Section D.2 of the supplementary materials.
Finally, we come to the protein structure comparison. We repeat the previous comparisons of 5D0U, 5JPT and 6FAA for a significance level α = 0.05, n = 100, 250, 500, n_S = n/5 and κ = 0.05, 0.1. The test Φ_{DTM} approximately holds its significance level and, for small mass parameters κ, is more sensitive than Φ*_{DoD} to small local changes such as slight shifts of structural elements. However, the evident differences between 5D0U and 6FAA are detected much better by Φ*_{DoD} (see Figure 7). The complete results of this numerical study are reported in Table D.6 (see Section D.2 of the supplementary materials).

Discussion
We conclude this section with some remarks on the modeling of proteins as metric measure spaces. So far, we have treated all C_α atoms as equally important, although it appears to be reasonable for some applications to put major emphasis on the cores of the proteins. Further, one could have accounted for the fact that the error of measurement is in general higher for some parts of the protein by adjusting the measure on the considered space accordingly. We remark that throughout this section we have considered proteins as rigid objects and shown that this allows us to efficiently discriminate between them. However, it is well known that proteins undergo different conformational states. In such a case, the usage of the Euclidean metric as done previously will most likely cause Φ*_{DoD} to discriminate between the different conformations, as the Euclidean distance is not suited for the matching of flexible objects (Elad and Kimmel 2003). Depending on the application, one might want to take this into account by adopting a different metric reflecting (estimates of) the corresponding intrinsic distances and by modifying the theory developed accordingly. Conceptually, this is straightforward but beyond the scope of this illustrative example.

Figure 1 .
Figure 1. Illustration of the proteins to be compared: Cartoon representation of the DEAH-box RNA-helicase Prp43 from Chaetomium thermophilum bound to ADP (PDB ID: 5D0U (Tauchert et al. 2016)) in two different poses. The DEAH-box helicase Prp43 unwinds double stranded RNA and rearranges RNA/protein complexes. It has essential roles in pre-mRNA splicing and ribosome biogenesis (Arenas and Abelson 1997; Lebaron et al. 2005).

Figure 2 .
Figure 2. Illustration of the proteins to be compared: Cartoon representation of the DEAH-box RNA-helicase Prp43 from Chaetomium thermophilum bound to ADP (purple, PDB ID: 5D0U (Tauchert et al. 2016)) in alignment with Prp43 from Saccharomyces cerevisiae in complex with CDP (cyan, PDB ID: 5JPT (Robert-Paganin et al. 2016), left) and in alignment with the DEAH-box RNA helicase Prp2 in complex with ADP (orange, PDB ID: 6FAA (Schmitt et al. 2018), right). Prp2 is closely related to Prp43 and is necessary for the catalytic activation of the spliceosome in pre-mRNA splicing (Kim and Lin 1996).

Figure 4 .
Figure 4. Finite sample accuracy of the limit law under the hypothesis: Upper row: Kernel density estimators of the sample of DoD^{(β)} (in blue) and a Monte Carlo sample of its theoretical limit distribution (in red, sample size 10,000) for n = 10, 50, 100, 250 (from left to right). Lower row: The corresponding Q-Q-plots.

Figure 5 .
Figure 5. Different metric measure spaces: A graphical illustration of the metric measure spaces (W_i, d_{W_i}, μ_{W_i})_{i=1}^6.

Figure 6 .
Figure 6. Bootstrap under the hypothesis: Illustration of the n out of n plug-in bootstrap approximation for the statistic DoD^{(β)} based on two samples from (W₁, d_{W₁}, μ_{W₁}). Upper row: Kernel density estimators of 1000 realizations of DoD^{(β)} (in red) and its bootstrap approximation (blue, 1000 replications) for n = 100, 250, 500, 1000 (from left to right). Lower row: The corresponding Q-Q-plots.

Figure 7 .
Figure 7. Protein Structure Comparison: Empirical significance level for comparing 5D0U with itself (left), empirical power for the comparison of 5D0U with 5JPT (middle), as well as the empirical power for comparing 5D0U with 6FAA (right). 1000 repetitions of the test Φ*_{DoD} have been simulated for each n.