Persistence Flamelets: Topological Invariants for Scale Spaces

Abstract In recent years there has been noticeable interest in the study of the “shape of data.” Among the many ways a “shape” could be defined, topology is the most general one, as it describes an object in terms of its connectivity structure: connected components (topological features of dimension 0), cycles (features of dimension 1) and so on. There is a growing number of techniques, generally denoted as Topological Data Analysis, or TDA for short, aimed at estimating topological invariants of a fixed object; when we allow this object to change, however, little has been done to investigate the evolution in its topology. In this work we define the Persistence Flamelet, a multiscale version of one of the most popular tool in TDA, the Persistence Landscape. We examine its theoretical properties and we show its performance as both an exploratory and inferential tool. In addition, we provide open source implementation of the objects and methods presented in the R-package pflamelet. Supplementary materials for this article are available online.


Introduction
Topological data analysis (TDA) is a new and expanding branch of statistics devoted to recovering the shape of the data, focusing in particular on their connectivity. At its core, TDA comprises a set of tools for detecting and recovering some structure in the data such as connected blocks or clusters (i.e., 0-dimensional topological features) and cycles or loops (i.e., 1-dimensional topological features), while also providing a measure of the importance of each of these elements. As these "topological" features can be defined for very different types of data, from standard vectors in Euclidean spaces to networks or functions, TDA has gained popularity as a unified framework to describe arbitrarily complex objects through easily interpretable features such as their peaks, loops and voids (Wasserman 2018).
Summaries of the topology of data have been exploited in both explorative Bendich et al. 2016) and inferential settings (Padellini and Brutti 2017;Atienza, González-Díaz, and Soriano-Trigueros 2018;Le and Yamada 2018), when the object of interest is supposed to have only one resolution. This assumption can however, be limiting, as data can have multiple resolution either because of the way they are defined (this is the case for example of time series, whose resolution parameter is time) or because of the way we represent them (this is the case for example when we impose smoothing on the data, and we obtain different scales corresponding to different levels of smoothness).
In general, due to the ever-growing complexity of data, being able to examine it at different resolutions, hence, obtaining different insights, has become a crucial feature of statistical CONTACT Tullia Padellini tullia.padellini@bancaditalia.it Directorate General for Economics, Statistics and Research, Bank of Italy, Rome, Italy. The views expressed in this paper are those of the authors and do not necessarily reflect those of the Bank of Italy.
Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JCGS. tools. Building on the TDA's toolbox, we thus introduce a new topological summary, the Persistence Flamelet, which is able to characterize the evolution of the topological structure of an object across different scales and can be used for both exploratory and inferential purposes. This informative yet interpretable summary, has in fact probabilistic properties that make it suitable for statistical inference. At the same time, the Persistence Flamelet is an effective visualization tool, which allows for exploration of arbitrarily high dimensional data. This work is structured as follows: in Section 2 we briefly review TDA and its tools for a fixed scale. In Section 3 we introduce the Persistence Flamelet as a topological summary of a scale space and we investigate its theoretical properties. Finally, Section 4 shows how the Persistence Flamelets can be exploited in applications, with a special emphasis on two famous scale parameters: the time in dynamical point-clouds and the bandwidth in kernel density estimation. Implementations of the methods described in this work are freely available in the Rpackage pflamelet.

The Shape of Fixed-Scale Data
The mathematical backbone of TDA is the notion of Persistent Homology (Edelsbrunner and Harer 2010). Roughly speaking, Persistent Homology provides a characterization of the topological structure of any arbitrary function of data f by building a filtration on it (typically its sub-or super-levelsets, f ε and f ε , respectively).
The reason why it is necessary to investigate the topology of some function f rather than that of data directly is that data itself often presents only a trivial topological structure. This is especially true when our data is a set of points X = {X i ∈ R d , i = 1, . . . , n}, an object usually referred to as point cloud in the TDA literature.
As can be seen in Figure 1, in fact, when we look at the observed points (in black) we can see no connection between them. Every point X i is "connected" only to itself, thus, we have as many connected components as there are observations but no higher dimensional topological structures such as circles or voids. In order to recover the clear circular structure on which data lay, it is necessary to connect the points through an additional function f , in this case a distance function (whose sub-levelsets are shown in gray). The link between Persistent Homology and the "shape of the data" is that for some choice of f , sub-(or respectively super-) levelset filtrations are topologically equivalent to the space data X was sampled from, say M, which in the following we will assume to be a compact manifold with no boundary embedded in R d . We will now show the construction of Persistent Homology for two classes of functions for which this equivalence holds: distances and kernel density estimators. While we focus on these two example for the ease of interpretation, the result presented in the following are not limited to these functions, and, depending on the application of interest, other choices of f may be better suited (Chintakunta et al. 2015;Atienza, Gonzalez-Diaz, and Rucco 2019).
Distance functions. The most common choice for analyzing the topological structure of M is to investigate the Persistent Homology of the sub-levelset filtrations of a distance function. At each level ε, the ε-sub-levelset of the distance d, d ε , is defined as denotes a ball of radius ε and center X i , and d X is an arbitrary distance function.
The topology of d ε can be recovered by computing its Homology Groups; Homology groups of dimension 0, H 0 (d ε ), represent connected components of d ε , H 1 (d ε ) represent its loops, and so on. d ε is topologically more interesting than the original point cloud, but it is extremely sensible to the radius ε. For each value of ε, in fact, we obtain a different estimate d ε , with a different topological structure: for small values of ε, the topology of d ε is close to the one of the point-cloud itself. As ε grows more and more points start to be connected, until eventually the corresponding d ε is homeomorphic to a point.
The key feature of encoding data into a filtration is that as ε grows, different sub-levelsets d ε 1 , d ε 2 with ε 1 < ε 2 are related, so that if a feature is present in both we can say that it remains alive in the interval [ε 1 , ε 2 ]. Persistent Homology then allows to see how features appear and disappear at different scales. Values ε b < ε d of ε corresponding, respectively, to when two components are connected for the first time (birth-step) and when they connect to some other larger component (death-step) are the generators of a Persistent Homology Group (Figure 1).
In the statistical literature, d ε is often known as the Devroye-Wise support estimator (Devroye and Wise 1980). The consistency of the Devroye-Wise estimator justifies and motivates the use of the distance function: as d ε is a consistent estimator of M, the topology of d ε is a sensible approximation of the topology of M.
Kernel Density estimators. The second way of linking levelset filtrations to the topology of M, the support of the distribution generating the data, is that the super-levelsets of a density function p can be topologically equivalent to the support of the distribution itself ). More formally, if the data are sampled from a distribution P supported on M, and if the density p of P is smooth and bounded away from 0, then there is an interval [η, δ] such that the super-levelset p ε = {x | p(x) ≥ ε} is homotopic (i.e., topologically equivalent) to M, for η ≤ ε ≤ δ.
Since the true generating density p is most often unknown, it is typically approximated by a kernel density estimator p. A naïve way to estimate the topology of M is then to compute topological invariants of the super-levelset of the kernel density estimator p: The super-levelsets p ε , with ε ∈ [0, max p], form a decreasing filtration, which means that p ε ⊂ p δ for all δ ≤ ε. As in the case of distances, for each element in the filtration, that is, for each value ε, we obtain a different estimate p ε , whose topology can be characterized by its Homology Groups. Since in practice it is not possible to determine the interval [η, δ] in which the topology of p ε is closest to that of M, we analyze the evolution of the topology over the whole filtration. Once again, Persistent Homology allows to analyze how those Homology Groups change with ε. Persistent loops in p ε naturally represent circular structures in p, Persistent Homology Groups of dimension 2 indicate holes in p and so on.
An example of this construction can be seen in Figure 2. When ε is close to max p, the super-levelsets of p (highlighted in gray), are disjoint, as shown in the left panel of the Figure. When ε decreases, the local peaks that may be disconnected at the beginning (as the smaller one in the Figure), merged into the same super-levelset. Finally, for ε = 0, p 0 is formed by two disjoint components, corresponding to the two main peaks of the distribution. Since 0 is the smallest value the function p can take, it is also the "last" value for which the peaks can be defined, hence, it is taken to be their "time of death. " It is worth noticing that topological features of dimension 0, or connected components have a relevant interpretation in terms of "bumps"; connected components in the filtration p ε are in fact local maxima of p; this is true for any super-levelset filtration. When the filtration is defined in terms of sub-levelset instead, as in the case of the distance function, connected components represent local minima.

Persistence Diagram
Persistent Homology Groups can be summarized by the Persistence Diagram, is the ith generator of the Persistent Homology Group. Features with a long "lifetime" (or persistence pers = b − d) are those which can be found at many different resolution of the filtration, and are informative of the topology of M. Points that are close to the diagonal instead represent short-lived features, which may be only noisy artifacts and can be neglected .
The space of Persistence Diagrams D is a metric space when endowed with the Bottleneck distance, which, given two Persistence Diagrams D and D , is defined as where the infimum is taken over all bijections γ : D → D .
Several other metrics have been proposed to compare Persistence Diagrams, for example the Wasserstein distance (Mileyko, Mukherjee, and Harer 2011) or the Fisher Information metric (Le and Yamada 2018). One of the main advantages of adopt-ing the Bottleneck distance, is that it allows to prove stability, arguably one of the most important properties of Persistence Diagrams, under relatively mild assumptions (Chazal et al. 2016).
Theorem 2.1 (Stability). Let f and g be two functions on a triangulable space X and let D f , D g be the Persistence Diagram built on their respective sub-(or super-) levelset filtrations, then In the special case of f = d X and g = d Y two distance functions defined on two point-clouds X and Y, respectively, the stability result can be written in a more easily interpretable way: where d H (X, Y) is the Hausdorff distance between two topological spaces X and Y. Roughly speaking this means that if the two point clouds X and Y are close, their persistence Diagrams will be as well, and can be interpreted in two ways: • the Persistence Diagram is a topological signature: stability reassures us that if two point clouds X, Y are similar their Persistence Diagrams will be as well, and is therefore instrumental for using them in statistical tasks such as classification or clustering; • the Persistence Diagram is statistically consistent: stability reassure us that if we are using a point-cloud X n to estimate the topology of an unknown object X, if X n → X as n → ∞, then D X n converges to D X as well.
Stability is also key to assess statistical significance of topological features. Building on the core idea that features that are close to the diagonal are more likely to be noise than those that are far away from it, Fasy et al. (2014) proposes a way to define a bootstrap confidence band around the diagonal. Points of the Diagram laying outside the band are taken to be significant, as they are most likely signal, whereas those inside the band may be just noise.

Persistence Landscape
Persistence Diagrams are defined in spaces endowed with only a metric structure, which can be limiting in data analysis. A collection of Persistence Diagrams D 1 , . . . , D n in fact does not have a unique mean, nor a satisfying measure of variability (Turner et al. 2014). More critically, although it is possible to define a probability distribution on the space of Persistence Diagrams D, (Mileyko, Mukherjee, and Harer 2011), it is still not clear how to explicitly derive it (if it is possible to derived it at all). In order to overcome these issues and to work with more statistics-friendly spaces, several tools have been developed to convert Persistence Diagrams into functional objects, the most famous being the Persistence Landscape (Bubenik 2015) and the Persistence Silhouette (Chazal et al. 2014). These topological summaries are built by mapping each point z = (b, d) of a Persistence Diagram D to a piecewise linear function called the "triangle" function T z , which is defined as Informally a triangle function links each point of the Diagram to the diagonal with segments parallel to the axes, and then rotates them of 45 degrees.
The triangles T z can be combined in many different ways. If we take their k-max, that is, the kth largest value in the set T z (y), we obtain the kth Persistence Landscape The Persistence Landscape λ D is a representation of the Persistence Diagram D as a collection {λ 1 D , . . . , λ K D } of piecewise linear functions, indexed by the order of the maximum to be considered in defining the Landscape, k, which can be any number between 1 and m, the cardinality of the Persistence Diagram D. If we take the weighted average of the functions T z (y), we have the Power Weighted Silhouette Figure 3 shows a point cloud with its corresponding Persistence Diagram and Landscape. The two circles in the data, clearly picked up by the Persistence Diagram, correspond to the two peaks of the Persistence Landscape. Interestingly, the Landscape also retains information about the persistence of these two loops or cycles, with the peak on the left, corresponding to the smaller circle, being slightly shorter than the one on the right.
While the space of Persistence Diagrams D is only a metric space, Persistence Landscapes are defined in a much richer Banach space L, endowed with the following norm It is not possible to go back from Persistence Landscapes to Persistence Diagrams, meaning that there is a loss of information in going from Persistence Diagrams to Persistence Landscapes. However, the Persistence Landscape is still informative, since stability still holds (Bubenik 2015).
Theorem 2.2. Let f , g be two functions on X and let D f and D g be the Persistence Diagrams built from their super-(or sub-) levelsets, then Persistence Landscapes are piece-wise linear functions, which makes it possible to define a (unique) mean and a variance for any collection of them. The main advantage of the Persistence Landscape over the Persistence Diagram is that it is defined in a Banach Space, which is instrumental in statistical learning as it allows for a full characterization of the Persistence Landscape as a random variable. More details can be found in the supplementary materials.

The Persistence Flamelet
Persistence Diagrams and Persistence Landscapes give us a full characterization of a function f in terms of the topology of its sub-levelset (or super-levelset) filtration, however, they do not show changes in its structure as f varies. We now focus on the case where rather than one function f we are dealing with a family of functions F = {f σ , σ ∈ S}, indexed by some parameter σ , which represent the resolution or the scale of the object f σ . This is a very common setting in statistics, as the presence of multiple scales in the analysis can arise both from the nature of the data themselves (this is the case for example when σ is the time in time series or the spatial resolution in images or geo-referenced data) and from the algorithm used to perform the analysis, where σ typically represent a tuning parameter.
Although traditional methods focus on selecting a single optimal scale σ , inspired by scale space theory (Lindeberg 1994), we adopt the idea that different scales yield different information, hence, all of them must be simultaneously taken into account in order to get a better understanding of the phenomenon under analysis. We restrict ourselves to the case where the F = {f σ , σ ∈ S} is continuously indexed by the scale parameter σ , defined in a bounded interval S. For the sake of simplicity we will assume σ ∈ [0, 1], as every bounded interval can be rescaled to [0, 1]. Two notable examples that we will explore more thoroughly in the following are kernel smoothers, for which the resolution σ is given by the bandwidth parameter h, and time-varying processes, whose scale σ is the time, t.
Previous attempts at encoding a multi-resolution family F = {f σ , σ ∈ [0, 1]} into the TDA framework focused on considering the Persistence Diagram itself as a function of the scale parameter σ . The family of Persistence Diagrams D = {D σ , σ ∈ [0, 1]} corresponding to F, is known as Persistence Vineyards (Cohen-Steiner, Edelsbrunner, and Morozov 2006) and is a stable and continuous representation of the topology of the whole F (Munch 2013). Despite their theoretical relevance, however, Persistence Vineyards suffer from several drawbacks that hinder their popularity in applications. Comparing two of them, for example, is indeed computationally very intensive, due to the fact that matching distances need to be computed for all the Persistence Diagrams comprising the scale-space. As opposed to the Persistence Diagram, which is praised for being an invaluable visualization tool, graphical representation of Vineyards may at times be cumbersome to interpret. Moreover, they share all the drawbacks and limitations of Persistence Diagrams, more specifically they lack a unique average and a measure of variability for a group of them (Turner et al. 2014), and, once again, since their probabilistic behavior is not well understood, the use of Persistence Vineyard in statistical inference is severely compromised.
We thus introduce a new representation, based on the Persistence Landscape, that overcomes most of these issues and provides a functional representation with very favorable probabilistic properties, which can be exploited for statistical inference using well known tools of Functional Data Analysis, thus, greatly extending the potential application of topological summaries. It is worth noticing that although in the following we focus on Persistence Landscapes, the same results hold for Persistence Silhouettes as well. In order to explicitly take into account the multiple resolutions of F, we consider the Persistence Landscapes λ D σ corresponding to the family F = {f σ , σ ∈ [0, 1]} as a function of the scale parameter σ . Visually we can think of such function as a "flow" of landscapes, one for each resolution, smoothly moving and resembling a tiny fire (see, e.g., Figure 7). Definition 3.1 (Persistence Flamelet). Given a Persistence Vineyard D, that is a collection of Persistence Diagrams D = {D σ , σ ∈ [0, 1]}, continuously indexed by some parameter σ ∈ [0, 1], and k ∈ N + , we define the kth Persistence Flamelet as the function As the Landscape itself, the Persistence Flamelet is also a collection = { k , k ∈ N + } indexed by the order of the max we consider.
The theoretical reassurance that the Persistence Flamelet is a meaningful topological summary is its stability, which we will prove in the following. Before doing so, however, we need to introduce a notion of proximity between Persistence Flamelets.
Vineyards and D , G the corresponding Persistence Flamelets. We define the Integrated Landscape distance between D and G as As functional objects, Persistence Flamelets can be compared by a natural extension of the usual L p metrics, which is way less computationally demanding than the matching distances needed for Persistence Diagrams and Vineyards. Proof. These statements are a direct consequence of the Stability Theorem for Persistence Landscapes (Theorem 2.2) and the continuity of Persistence Vineyards, in fact: 1. For a fixed σ , consider D σ and D σ +ε (same applies for G). By Theorem 2.2 and the continuity of D we have 2. Since for a fixed σ we have, by Theorem 2.2 we have integrating both terms is enough to prove the result.
The Persistence Flamelet is also a random variable defined in a Banach space. In analogy with what Bubenik (2015) has done for Persistence Landscapes, we define a norm for Persistence Flamelets, more specifically Then, following Ledoux and Talagrand (2013), we can extend the Law of Large Numbers and the Central Limit Theorem to this new object.
Corollary 3.1.1 (Strong Law of Large Numbers). Let { n } n∈N be a sequence of independent copies of and, for a given n, let S n = 1 + · · · + n , where the sum is defined pointwise.
Corollary 3.1.2 (Central Limit Theorem). Assume B has type 2 in the sense of Hoffmann-Jorgensen and Pisier (1976). If E(V) = 0 and E( 2 ) < ∞ then S n √ n converges weakly to a Gaussian random variable G( ) with the same covariance structure as .
Proofs follow from Theorems A.1 and A.2 of the supplementary materials.

Confidence Band for Persistence Flamelets
In order to strengthen the role of the Persistence Flamelet as a statistical tool, we now show that it is possible to build confidence bands on the mean Persistence Flamelet by means of bootstrapping, in analogy with what has been done for Persistence Landscapes . What follows applies to any level k of a Flamelet, hence, we omit any explicit reference to it in order to ease the notation.
Let L n = { 1 , . . . , n }, be a sequence of Persistence Flamelets independently sampled from some probability distribution P. Then the mean Persistence Flamelet, μ, is defined as Theorem 3.2 (Consistency of the Bootstrap). Let G = {g σ ,t , σ ∈ [0, 1], y ∈ [0, Y]} be the class of functions defining the Persistence Flamelet, that is, for any Persistence Vineyard D, the function g σ ,t is given as If the family F = {f σ , σ ∈ [0, 1]} characterizing the Flamelet is Lipschitz in its scale argument, that is then the class G is Donsker, and the bootstrap procedure defined in Algorithm 1 is valid.
Corollary 3.2.1. The band CB n obtained from Algorithm 1 is an asymptotic confidence band for the Persistence Flamelet at confidence level 1 − α.
As any topological summary, the Persistence Flamelet may be affected by noise and highlight spurious features, and confidence bands may be used to assess whether or not a topological feature is statistically significant. If we take the empty Flamelet, that is, the constant Flamelet located on 0, to represent the case of Compute the bootstrapped sample mean no significant topological structure, the procedure illustrated before can in fact be used to define a confidence band on the noise component of the topological structure. Comparing this band with the observed Flamelet allows us to detect which parts (if any) of the Flamelet are to be considered noisy artifacts resulting from the procedure adopted to build the topological summaries, or significantly different than zero and can be thus considered topological signal. Alternatively, in order to check whether or not a feature is to be judged as relevant, we suggest to exploit the fact that, in practice, the Persistence Flamelet is computed over a finite set of values S = {σ 1 , . . . , σ J } of the scale parameter σ defining the family F. For each of the Persistence Diagrams corresponding to elements of S, it is possible to define a simplification scheme that retains only significant topological feature filtering out the noise. One strategy to implement this idea is to reshape the Diagram so that the persistence of its "relevant" features is maximized, as suggested in Atienza, Gonzalez-Diaz, and Rucco (2019). Here, instead, in order to preserve the original structure of the Diagram, we adopt the approach introduced in Fasy et al. (2014), which consists in building a confidence band around the diagonal of the Diagram via bootstrap: points of the Diagram that lay within this band are not significantly different from the diagonal itself, and can thus be removed from the Diagram. The Persistence Flamelet built using Diagrams "denoised" in this fashion, then contains only significant topological features; in the following we will refer to this technique as "Diagram cleaning. " It is worth noticing that since this procedure requires building multiple confidence bands, possibly using the same data, it may be necessary to adopt a multiplicity correction in order to ensure the proper coverage level. With respect to the denoising based on the Confidence Band for the Flamelets, this is more of a global approach, in the sense that it does not depend on the choice of a functional representation of the Persistence Diagram, nor on the tuning parameter k and p of, respectively, Landscape and Silhouettes; nevertheless, this procedure is not explicitly tailored for the Persistence Flamelet and may be affected by the choice of the grid S.
It is worth mentioning that Confidence Bands are one way of simplifying the Persistence Diagrams (or Persistence Flamelets) "a posteriori. " As shown in (Chintakunta et al. 2015), however, it is also possible to directly build the Diagram so that the information it contains is "significant. " The construction of the Persistence Flamelet is agnostic with respect to the way the family of Diagrams comprising the Persistence Vineyard are computed, hence, both strategies are viable to ensure that the topological noise it may contain is minimized.

Applications
In this last section, we illustrate how the Persistence Flamelets, so far only a rather abstract object, may be encountered and fruitfully used as both an inferential and exploratory tool.

Time Series / EEG Dynamic Point-Clouds
The easiest way to understand the need for topological characterization of a continuously varying space is to consider the case where the scale parameter σ is time, t. The Persistence Flamelets allows in fact for a characterization of a time-varying system F = {f t , t ∈ [0, 1]} in terms of its topology (Munch 2013;Munch et al. 2015) by allowing us to simultaneously study the shape of any time-dependent function f t and how it evolves with time t. Again, although this framework is general enough to cover any arbitrary function f t , as long as it is continuous with respect to time, we are especially interested in the case where f t is a function of data.
Assume that at each time t we observe a sample X(t) = {X 1 (t), . . . , X k (t)} drawn from some distribution P t . The Persistence Flamelet built on distance functions or kernel density estimators estimate the topology of the whole continuous-time generating process {P t , t ∈ [0, 1]}. The trace of the sample in the time interval {X(t), t ∈ [0, 1]}, usually called Dynamic Point Cloud, is just a high dimensional time series, hence, Persistence Flamelets can be exploited as a tool to extract a new type of insights on time series of arbitrarily high dimension.
In the special case of dynamic point-clouds, the stability result of Theorem 3.1 can be restated as follows.
Corollary 4.0.1. Let {X(t), Y(t)} with t ∈ (0, 1) two continuous dynamic point clouds, X and Y their corresponding Persistence Flamelets, then: where I H (X, Y) = 1 0 d H X(t), Y(t) dt is the Integrated Hausdorff distance for dynamic point-clouds, as defined in Munch (2013). Figure 4 shows two Persistence Flamelets built from electroencephalography (EEG) tracks, freely available on the UCI Machine Learning Repository. EEG are electric impulses recorded at a very high frequency (256 Hz) through multiple electrodes (64 in this study), located in different areas of the skull. At each time t, topological features represent dependency structure in the signal, which is relevant information per se, but since it is also important to assess whether or not these connection persist in time, this kind of data fits perfectly in our framework.
We compare the EEGs of 10 alcoholic and 10 control patient, all subject to the same stimulus. For each of them we have five trials of one second; EEG are typically very noisy hence, we average them across repetition before computing their topological summaries. Features of both dimension 1 and dimension 0 seem to be concentrated in the same area of the Persistence Diagrams, we thus, build the Persistence Flamelet using as a base summary the Persistence Silhouette, which takes into account all points in the Diagram, as well as the Persistence Landscapes, for which the selection of the order of the k-max is nontrivial in this application.
The Persistence Flamelet highlights a topological difference in behavior of the two groups; as shown in Figure 4 the signal from the control group, in fact, appears to be characterized by a few persistent features. In the alcoholic patient instead there is less structure; the number of features is higher than in the control patient, yet they all have a smaller persistence, and could thus be interpreted as noise. In order to assess whether there is statistical substance to this claim, we perform a two sample test to compare the average silhouettes of the two groups. As the sample size is rather small, exploiting the asymptotic normality of the average Flamelet is not advisable, hence, we resort to the permutation test detailed in Algorithm 2. Let m be the

Algorithm 2: Two-Sample Permutation Tests for Persistence Flamelets
Input:  i (σ , y))|; 9 end Output: θ (1,2) ) ; maximum number of elements per Diagram in D and J be the number of elements of the finite grid used to in practice to compute the Persistence Flamelet (S = {σ 1 , . . . , σ J }). In both Algorithms 2 and 1 the most intensive step, the computation of θ i , is O(mJ), as follows naturally from the results of Bubenik and Dłotko (2017). If we wanted to compare in the same fashion two Persistence Vineyards, however, the computational cost would be a considerably larger O(m 1.5 log(m)J) (Kerber, Morozov, and Nigmetov 2017). As the Flamelet allows to isolate topological invariants of different dimensions, it can be used to retrieve qualitative differences on the topology, as well as quantitative ones. In addition to assessing the presence of a topological discrepancy, in fact, it is also informative on which kind of features are responsible for it. Results shown in Table 1, in fact, highlight that the distinction between the two groups does not depend on the sole presence of connected components (i.e., electrodes that are in some sense "close") but it is determined by the nature of the association within these components, which is formalized by whether or not these sets of electrodes form cyclical structures. We consider two different orders of Landscapes for each dimension to show that the pattern we observe depend on the dimension and not on the order of the landscape we consider. We take the two largest order k of the landscape, which in the case of dimension 0 corresponds to k = 2, 3, as k = 1 is a constant triangle equal to the diameter of the data and it is not discriminative. Interestingly, this behavior is robust with respect to the base function used to build the Flamelet, as inferential conclusions when considering the Persistence Landscape (with different choices of k) are coherent with those obtained using the Persistence Silhouette.

Data Smoothing / Kernel Density Estimation
In the statistical literature, scale-space ideas have been especially popular in the context of data smoothing. In its broader definition, data smoothing is a family of methods aimed at recovering some structure in the data. Depending on their scale, however, smoothing methods may enhance noise or neglect relevant features, so that it is crucial to understand the impact of the smoothing level on the smoothed object. The Persistence Flamelet can be used to summarize and evaluate the evolution of the whole smoothing process, by tracking (and visualizing) the appearance and disappearance of feature of arbitrary dimension.
Among all the smoothing methods, we focus on Kernel Density Estimation (KDE) (Scott 2015), for which the role of topological features (especially that of 0th dimensional Homology Groups) is a well-established problem (Chaudhuri and Marron 1999). Features affected by the smoothing process such as local peaks (or, in topological terms, 0th dimensional Homology Groups), are in fact especially meaningful in the case of KDE; local modes of a density and their basin of attraction represent for example one way of defining clusters (Comaniciu and Meer 2002). Persistence Flamelets allows us to explore also higher dimensional features, such as cycles or voids, which have been noticeably neglected.
Given a sample {X 1 . . . , X n }, drawn from some smooth density p, a Kernel Density Estimator p h is defined as is a scaled kernel, h is the bandwidth parameter and K(·), the kernel, is a nonnegative, symmetric function that integrates to 1.
While any kernel function K(·) may be used without compromising the performance of the estimator, the bandwidth parameter represent the level of smoothing and needs to be finely tuned. In the scale-space approach, given some bounded range of bandwidths H ⊂ R + , all the estimators p h are simultaneously considered, so that the object of interest becomes the family of smooths F = { p h : h ∈ H}. Since K h is continuous with respect to h by definition, it is immediate to see that the Persistence Flamelets can be used to investigate and characterize F.
In the exploration framework, the first attempt at investigating the relation between the bandwidth of a kernel density estimator and its topology is SiZer (Chaudhuri and Marron 1999). Roughly speaking, given a sample {X 1 , . . . , X n } drawn from a univariate density p, SiZer (SIgnificant ZERo crossings of derivatives) is a map showing where in space, x, and scale, h, the kernel density estimator p h (x) is significantly increasing or decreasing. Since local peaks of a curve can be thought of as points where its derivative changes sign, the basic idea of SiZer is assess where this change happens, by testing whether the sign of the derivative p h (x) for each couple of values (x, h) is positive or negative. Values (x, h) corresponding to significantly positive derivatives are shown in red and significantly negative are shown in black, as in Figure 7.
SiZer is intrinsically one-dimensional and even though it has been extended to two-dimensional densities, especially in the context of image analysis, (Godtliebsen, Marron, and Chaudhuri 2004) the features it hunts for are always and only local modes. The Persistence Flamelet provides a further extension in two different directions: • it can be used to investigate topological features of any dimension, rather than only feature of dimension 0, that is, local peaks; • it does not depend on the dimension of the data and can thus be used to investigate kernel densities for very high dimensional data.
Like SiZer, the Persistence Flamelet is able to assess the significance of each peak, by exploiting the tools shown in Section 3.1, but, in addition, it also provides a measure of the relevance of each feature: its persistence.

Bandwidth Exploration
We now show two real-data applications. In the first univariate one we quickly compare the Persistence Flamelet with SiZer and show that, when both are available they yield similar insights. The second is a bivariate example, which motivates investigating higher dimensional features and highlights the potential of the Persistence Flamelet when other tools are not available.
Eartquakes I / Depth. In our first example we consider a classical dataset in kernel density estimation, the depth of the 512 earthquakes beneath the Mt. St. Helens volcano in the months before the eruption of 1982 (Scott 2015). Figure 5 shows the first and the second Persistence Flameles for the zero-dimensional topological feature of the density estimator p built with the Gaussian Kernel: In order to retain only significant topological information, before computing the Flamelets we built confidence bands on the diagonal of each of the Diagrams used to define it, and features whose persistence was smaller than the upper limit of such bands, were then discarded from the Diagram ( Figure 6). As the same data are used to build the Diagrams corresponding to the different bandwidths, we adopt a Bonferroni correction on the confidence level to ensure proper coverage over the whole Flamelet.
The first Persistence Flamelet consists of only one peak, representing the global maximum, which, as we can expect, always persists. This is not very informative, and when analyzing dimension 0 topological features, it is thus, advisable to consider second Persistence Flamelet, which represents the most relevant local peaks. In this case we can see that the two peaks appearing in the second Persistence Flamelet correspond to the two points in the Diagram (which in turn correspond to the two bumps we can see in the KDE in Figure 7). As we can see from Figure 5, the second Persistence Flamelet behaves differently than first Persistence Flamelet; when the bandwidth grows in fact, the two secondary peaks are smoothed away. Figure 7 shows the comparison with SiZer, and it is easy to see that the two approaches lead to very similar conclusions. In order to make this comparison possible, we highlight here that we cleaned the Diagrams following the procedure introduced in Section 3.1 and we removed from the Persistence Diagrams the noisy features falling inside the confidence band before computing the Flamelet, so that the topological features displayed on it are all statistically significant. The three peaks appear for h = 0.05, then one of them disappear at around h = 0.25, one other around h = 0.35 and, the last one always survives (in the given range of bandwidths).
Earthquakes II / Locations. For our second example we consider earthquake data coming from the USG catalog. Our sample consists of the locations, expressed in latitude and longitude, of 6500 events with magnitude higher than 5, taking place between June 2013 and June 2017. The two-dimensional density p generating the data {X 1 , . . . , X n } can still be estimated using the kernel density estimator with a Gaussian Kernel: Notice that in the multivariate case, the bandwidth is not a scalar but rather a matrix H, however, we chose an isotropic Gaussian Kernel, which corresponds to imposing a spherical structure to the covariance matrix so that the kernel density estimator expression can be simplified as follows:     Earthquakes are concentrated around circular structures, also known as plates. According to Plate Tectonics, in fact, the Earth's lithosphere is broken into seven main plates, plus a number of minor ones. Since earthquakes are caused by the movements of neighboring plates, the density p naturally inherits the Earth's plates structure. In terms of topology, plates can be thought as loops, or dimension 1 Homology Groups. The dimension 1 Persistence Flamelet of the kernel density estimator p can be employed to assess whether or not kernel density estimators are able to recover these loops. The Persistence Flamelet shown in Figure 8 presents seven crests, each of them representing one persistent loop in F; this seems to suggest that at, different resolution, the kernel density estimator is able to recover all the seven main plates. Notice that as opposed to the 0th dimensional case, where there is always one feature, the global maximum, dominating all the others, when analyzing loops we can limit our analysis to the first Persistence Flamelet.
In this example specifically, the Persistence Flamelet shows that there is one loop that persists noticeably more than all the others; as persistence is a measure of the importance of a feature, this suggests that there is one plate which is more neatly detected than all others. This highly persistent loop closely resembles the contour of the Philippine plate, which is not surprising, since more than 26% of the seismic activity in the given time interval was concentrated in the area between Philippine and Japan.

Bandwidth Selection
We conclude by showing that, even though the Persistence Flamelet is intrinsically related to the scale-space principle, according to which no bandwidth candidate is taken to be more informative than the others, our topological summary also plays a role in the context of bandwidth selection, and we can use it to heuristically choose a "topologically aware" bandwidth.
In the literature on bandwidth selection the topological structure of the KDE is usually ignored (with the exception of local modes Genovese et al. 2016). However, as standard approaches to the task (most noticeably cross-validation) have proven to fail when the density is concentrated around lower dimensional structures (Genovese et al. 2016), we believe that taking into account topological invariants whose dimension is larger than 0 (local modes) yet smaller than the ambient space, may be beneficial.
Intuitively, since persistence can be interpreted as a measure of the importance of each feature, bandwidths corresponding to peaks in the Persistence Flamelet result in estimators that  highlight the most prominent features in the density. By selecting the value of h that maximize the Persistence Flamelet, the topologically-aware h TA , we are forcing the density estimator to retain the most relevant topological treats.
Let us consider again the Earthquake II example. By choosing the value of h that maximize the Persistence Flamelet, we are forcing the density estimator to emphasize the most persistent loop. The kernel density estimator p h TA , shown in Figure 9, is in fact concentrated around the Philippine plate, as we could expect. To understand why such a topologically-aware bandwidth selection heuristic may be useful, let us compare it with more established methods for bandwidth selection: Silverman's Normal Rule and a Plug-in bandwidth selection criterion. We intentionally ignore cross-validation methods because, as shown in Genovese et al. (2016), they are known to display poor behavior in this setting. When the density is singular, which is the case when the support of the distribution is concentrated on an object whose dimension is smaller than the ambient dimension, cross-validation will in fact select 0 as optimal bandwidth, and it is thus not informative.
The first alternative we consider is an extension of Silverman Normal Rule, one of the most famous "rule of thumb" for bandwidth selection, to the case of densities with singular features, as detailed in Chacón, Duong, and Wand (2011). More specifically, given a sample {X 1 , . . . , X n } ∈ R D , from some distribution P, the optimal bandwidth h for recovering the d-dimensional where s = D −1 D j=i s 2 j and s 2 j is the variance of the jth variable. Despite the fact that we set d = 1, in order to take into account the loop structure, the density estimator, shown in Figure 10, does not seem to recover any of the plates at all.
The second approach we consider is a plug-in bandwidth estimator H PI , obtained by minimizing the AMISE (Asymptotic Mean Integrated Square Error) with respect to the bandwidth h; details are given in Chacón, Duong, and Wand (2011). Since limiting the case of scalar bandwidths, as we did until here, may seem too restrictive, in this final example we relax the hypothesis of spherical covariance and do not impose any structure on the bandwidth matrix H. The additional complexity of the estimator does not however, result in a better estimation: as we can see in Figure 11, the plates structure of the true density is still not recognizable.

Conclusion and Future Developments
In this work we introduced a new multiscale topological summary, the Persistence Flamelet, we characterized it in a probabilistic framework and we proved that it retains the topological information contained in the original scale spaces, and thus it is a meaningful topological summary. The very different nature of the two examples we considered (dynamic point clouds and kernel smoothers) show the versatility of this representation, which allows to account for the presence of multiple scales in the data and also in the tools we typically use to analyze them, and illustrate how the Persistence Flamelet allows to effectively summarize nonlinear dependencies within an object.
So far we exclusively focused on comparing objects of the same kind, that is, Flamelets built on the same type of data, such as EEG recordings. One of the main features of the Persistence Flamelets, however, is that they allow to compare data structures of potentially different types (e.g., networks and functional data), hence, providing a unified framework for analyzing complex data. In the future we plan to exploit this property to match different sources of neuroimaging data, more specifically EEG recordings, which are functional objects, and functional/structural networks obtained from fMRI data. We also wish to extend the use of the Flamelet to the supervised setting, especially for problems related to neuroimaging data, and see whether this topological summary is informative enough to help in predicting different neurological conditions. Finally, we plan to investigate further the properties of Persistence Flamelets-related heuristics for bandwidth selection. We have already seen how picking the bandwidth that maximizes the "persistency" seems to be promising. We plan to investigate it even further also considering the use of the Persistence Flamelet to select a bandwidth that reflects some previous knowledge on the topology of the object of interest. In addition, since features that appear at many different resolution can be thought as the most relevant ones, it may also be interesting to explore persistence in bandwidth ranges as an additional measure of relevance for topological traits.

Supplementary Materials
The supplementary material contains a concise introduction to probability theory in Banach spaces, the proof of Theorem 2.2 and a simulation study to validate the use of the Persistence Flamelet.