Greedy Segmentation for a Functional Data Sequence

Abstract We present a new approach, greedy segmentation (GS), to identify multiple changepoints in a functional data sequence. The proposed multiple changepoint detection criterion links detectability with the projection onto a suitably chosen subspace and with the changepoint locations. The changepoint estimator identifies the true changepoints for any predetermined number of changepoint candidates, whether that number over- or under-reports the truth. This theoretical finding supports the proposed GS estimator, which can be obtained efficiently in a greedy manner. The GS estimator's consistency holds without the conventional at-most-one-changepoint restriction and is robust to the relative positions of the changepoints. Based on the GS estimator, the test statistic's asymptotic distribution leads to the novel GS algorithm, which identifies both the number and the locations of changepoints. Through intensive simulation studies, we compare the finite sample performance of the GS approach with competing methods. We also apply our method to temporal changepoint detection in weather datasets.


Introduction
Changepoint analysis is a long-standing and active research area in statistics, and its related problems have been studied intensively. However, discussions of changepoint detection in the functional data domain are relatively rare, and they began only around 2009. Scholars have concentrated on the offline setting and have primarily focused on single changepoint detection under the assumption that there is at most one changepoint (AMOC) in the data sequence. To the best of our knowledge, changepoint detection for functional data was pioneered by Berkes et al. (2009) and Aue et al. (2009). Berkes et al. (2009) derived the null distribution of a CUSUM-based statistic for a single change in mean functions against its alternative for independent random functions. In a similar setting, Aue et al. (2009) studied the asymptotic behavior of a changepoint estimator.
For functional data sequences with weakly dependent structures, Hörmann and Kokoszka (2010) and Aston and Kirch (2012) modified the test statistic using the long-run covariance of functional principal component (FPC) scores. In addition, Aston and Kirch (2012) considered the epidemic change scenario in which the mean function returns to the original level after the first changepoint. Instead of using the long-run covariance of FPC scores, Hörmann, Kidziński, and Hallin (2015) proposed the use of dynamic FPC scores. The authors decomposed the spectral density operator at different frequencies, thereby processing a dependent type of functional data. To detect a change for spatially indexed panels of functions, Gromenko, Kokoszka, and Reimherr (2017) considered a weighted CUSUM-based statistic at different locations, which reduces to that of Berkes et al. (2009). In contrast to the aforementioned studies, which used FPC analysis for dimension reduction, Aue, Rice, and Sönmez (2018) suggested a fully functional CUSUM-based statistic with no dimension reduction procedure in order to minimize information loss in the detection of mean function shifts.
In the context of multiple changepoint detection, Chiou, Chen, and Hsing (2019) derived a criterion for detecting changepoints in mean functions through segmentation of a functional data sequence. The authors proposed a two-stage procedure comprising dynamic segmentation (DS) and backward elimination (BE). Specifically, in the first stage, DS iteratively identifies the changepoints with a predetermined number K using the local AMOC idea. This process requires an initial segmentation with K that must exceed the correct number M to obtain consistent estimators. In the second stage, BE removes the statistically insignificant changepoint candidates.
The idea of segmentation has appeared in many changepoint detection methods for conventional univariate or multivariate data. Such methods use various criteria, including maximum likelihood and least squares, and often aim to design a fast algorithm that achieves optimal segmentation and determines the number of changepoints. Auger and Lawrence (1989) proposed the segment neighborhood method, which uses dynamic programming to identify the optimal segmentation for a given number of changepoints K. The number of changepoints is then determined separately using selection criteria such as the Schwarz criterion (Yao 1988; Braun, Braun, and Müller 2000; Zou et al. 2014) and its variant (Lavielle 2005). Several methods have been developed to accelerate the dynamic programming algorithm, including Maidstone et al. (2017) and Gedikli et al. (2010). Additionally, various penalty terms can be incorporated to search for the optimal segmentation, including the optimal partitioning method of Jackson et al. (2005), the pruning method PELT of Killick, Fearnhead, and Eckley (2012), and the L1-penalty method of Harchaoui and Lévy-Leduc (2010).
Binary segmentation (BS), in which single changepoint detection is performed iteratively within a subsegment, is another popular approach for multiple changepoint detection. BS is straightforward to implement and computationally inexpensive. However, BS methods may perform poorly because the AMOC assumption is violated when they are applied more generally (i.e., moving from the detection of a single changepoint to multiple changepoints). Therefore, variants of the BS algorithm, including circular binary segmentation (Olshen et al. 2004) and wild binary segmentation (Fryzlewicz 2014), have been proposed to address the complications that arise from the AMOC assumption. Although these methods can be extended to the functional data domain, further research is required, and each method's consistency properties need to be justified.
In this study, we propose a new method for the detection of multiple changepoints for a functional data sequence. We further investigate the theoretical properties of the changepoint estimator proposed in Chiou, Chen, and Hsing (2019) for cases with under-reported numbers of changepoints. The theoretical outcome inspires a novel greedy segmentation (GS) estimator, which can be efficiently obtained in a greedy manner. Furthermore, based on the GS estimator, we derive the test statistic's asymptotic distribution, which leads to an algorithm that determines the number and locations of changepoints.
The theoretical highlights of our method, as well as its advantages over existing methods, are summarized as follows: (a) the changepoint estimator's consistency property holds even when the predetermined number of changepoints is under-reported relative to the true number of changepoints; (b) the GS approach, along with the theoretical results, does not require the conventional AMOC assumption or its relatives; (c) the GS estimator is obtained in a greedy manner and requires neither an initial segmentation nor a predetermined number of changepoints, making the procedure easy to implement and as efficient as the BS algorithm, with linear computational order, while retaining the consistency property; (d) the test statistic's asymptotic distribution, based on the GS estimators, provides theoretical justification for the significance of changes in a functional data sequence and assists in determining the correct number of changepoints; and (e) the approach unravels a detectability condition connecting the projected subspaces of the changepoint criterion with the changepoint locations.
The remainder of this article is organized as follows. In Section 2, we introduce the multiple changepoint model for a functional data sequence and elucidate the changepoint estimator's theoretical properties, including consistency, subspace, and detectability. In Section 3, we propose the GS estimator and the test statistic's asymptotic distribution based on the GS estimators, and we state the novel GS algorithm for multiple changepoint detection. Numerical studies, including simulations, comparisons, and data applications, are presented in Sections 4 and 5. Technical details of the theorems are compiled in the Appendix A.1-A.4, and the proof of the lemma is provided in the supplementary material.

Functional Changepoint Model and Changepoint Estimator
Let {X_i, i = 1, ..., N} be an observed sequence of random functions, where X_i = X_i(t) ∈ L²(T) is a square-integrable random function over a closed interval T, that is, E ∫_T X_i(t)² dt < ∞. Each X_i is associated with a location variable. We assume that the location variables are equally spaced within the interval (0, 1], so that each X_i is associated with the location i/N.

Changepoint Estimator and Its Properties
Consider an arbitrary K-segmentation θ_K as an estimate of θ*_M. When K = 0, set θ_K = ∅; when K > 0, θ_K = {θ_1, ..., θ_K}, where 0 < θ_1 < · · · < θ_K < 1, with the boundary points θ_0 = 0 and θ_{K+1} = 1 as is conventional. By analogy with (2), we define the kth segment mean function conditional on θ_K, for k = 1, ..., K+1, as follows: In particular, ν_m(t|θ*_M) = ν*_m(t). In turn, the covariance function of X_i(t) conditional on θ_K and the sequence {X_i(t), i = 1, ..., N} can be expressed in the following way: In particular, c(s, t|θ*_M) = c(s, t). Chiou, Chen, and Hsing (2019) proved the following: where the bias term B(s, t|θ_K) is detailed as follows. Let m_k be the number of changepoints in the subinterval (θ_{k−1}, θ_k). Then B(s, t|θ_K) can be explicitly expressed as follows: Most importantly, (4) implies that B(s, t|θ_K) = 0 if and only if m_k = 0 for all k = 1, ..., K+1; in other words, θ*_M ⊂ θ_K. Let C_{θ_K} and C be the integral operators with kernels c(s, t|θ_K) and c(s, t), respectively. Given a p-dimensional subspace V_p ⊂ L²(T), we define T_p(θ_K|V_p) = Σ_{ℓ=1}^p ⟨C_{θ_K} e_ℓ, e_ℓ⟩, where e_1, ..., e_p is an arbitrary set of orthonormal basis functions of V_p. If it holds that (C2) there exists a fixed 0 < p < ∞ such that, for all 1 ≤ m ≤ M, ν*_{m+1} − ν*_m ∉ V_p^⊥, then (3) and (4), together with the fact that both c(s, t) and B(s, t|θ_K) are nonnegative definite, imply that T_p(θ_K|V_p) attains its minimum Σ_{ℓ=1}^p ⟨C e_ℓ, e_ℓ⟩ if and only if θ*_M ⊂ θ_K. Consequently, T_p(θ_K|V_p) serves as a valid criterion to detect changepoints when K ≥ M.
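To make the criterion concrete, the following is a minimal numerical sketch of its empirical analogue: the variance of the curves around their segment means, projected onto a p-dimensional basis. It assumes curves observed on a common grid; the function name, grid, and basis are illustrative, not the authors' implementation.

```python
import numpy as np

def conditional_criterion(X, theta, basis, dt):
    """Empirical analogue of T_p(theta_K | V_p): the sum over l of
    <C_theta e_l, e_l>, i.e. the variance of the curves around their
    segment means after projection onto span{e_1, ..., e_p}.

    X     : (N, T) array, each row one curve on a common grid
    theta : sorted interior breakpoints in (0, 1)
    basis : (p, T) array of orthonormal basis functions e_l
    dt    : grid spacing, for approximating L2 inner products
    """
    N = X.shape[0]
    bounds = [0.0] + list(theta) + [1.0]
    R = np.empty_like(X)                  # residuals X_i - segment mean
    for k in range(len(bounds) - 1):
        lo = int(np.ceil(bounds[k] * N))
        hi = int(np.ceil(bounds[k + 1] * N))
        R[lo:hi] = X[lo:hi] - X[lo:hi].mean(axis=0)
    scores = R @ basis.T * dt             # <X_i - segment mean, e_l>
    return float((scores ** 2).sum() / N)
```

Because an unaccounted mean shift inflates the conditional covariance by the nonnegative bias term, the criterion evaluated at a segmentation containing the true changepoint is smaller than at one that misses it.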
Condition (C2) is a common assumption used to detect functional changepoints; see, for example, Berkes et al. (2009), Aue et al. (2009), Hörmann and Kokoszka (2010), and Aston and Kirch (2012). For a particular choice of V_p, denote the spectral decomposition of c(s, t|θ_0) by Σ_{ℓ=1}^∞ λ_ℓ φ_ℓ(s)φ_ℓ(t), with the eigenvalues λ_ℓ in non-ascending order. In Theorem 2, we show that choosing V_p as span{φ_1, ..., φ_p} has the advantage of satisfying (C2). For notational convenience, we denote T_p(θ_K) = T_p(θ_K | span{φ_1, ..., φ_p}). Under condition (C3), we further show that T_p(θ_K) serves as a valid changepoint criterion not only when K ≥ M but also when K < M.
In contrast to (C2), which holds in general, (C3) is stipulated merely for the technical reason of excluding from (C2) the special case with coefficients 0 < β_j < 1. For the estimation of T_p(θ_K) with unknown C_{θ_K} and {φ_ℓ}, we further assume the following: Given θ_K, the conditional covariance c(s, t|θ_K) can be estimated by ĉ(s, t|θ_K). Assumption (C4) admits a weak dependency structure within the sequence {Y_i}. It also ensures that ĉ(s, t|θ_K) is consistent in the Hilbert-Schmidt norm for arbitrary θ_K. This assumption was adopted, for example, in an application to daily air pollutants in Hörmann and Kokoszka (2010), as well as to traffic streams along a freeway in Chiou, Chen, and Hsing (2019). Beyond the L⁴-m-approximability assumption, Aston and Kirch (2012) used the concept of strong mixing to characterize the dependency within a functional data sequence.
In particular, recall that c(s, t) = c(s, t|θ_0); then, given the spectral decomposition of ĉ(s, t|θ_0), T_{N,p}(θ_K) is a consistent estimator of T_p(θ_K) in (5), where Ĉ_{θ_K}, defined analogously to C_{θ_K}, is the integral operator with kernel ĉ(s, t|θ_K). Under (C1)-(C5), we define the multiple changepoint estimator using T_{N,p}(θ_K) for any fixed K > 0 as follows: where the minimization is over all possible K-segmentations. Theorem 1 elucidates the relationship between θ̃_K and θ*_M for K ≥ M and K < M. The following lemma is essential for proving Theorem 1. The property of strict concavity motivates the following theorem.
Theorem 1 suggests that the consistency property of the changepoint estimator θ̃_K is robust to the value of K relative to M, the true number of changepoints. Specifically, the limit of θ̃_K in probability either covers θ*_M as a subset (when K ≥ M) or is itself a subset of θ*_M (when K < M). The latter result is somewhat surprising: when K < M, there must exist some 1 ≤ k ≤ K+1 such that (θ_{k−1}, θ_k) contains at least one undetected changepoint, and thus the conditional segment mean function ν_k(t|θ_K) is necessarily inconsistent. Nevertheless, the changepoint estimator θ̃_K still retains the consistency property.

Oracle Subspace and Detectability of Functional Changepoints
Most studies of functional changepoint detection have applied a projection onto a subspace and assumed a detectability condition similar to (C2). Notably, Aue, Rice, and Sönmez (2018) proposed a fully functional test statistic and developed its asymptotic theory; however, truncation to a finite dimension is still required when computing the critical values of the test statistic. Condition (C2) ensures that the difference between the adjacent segment mean functions at a changepoint does not vanish after subspace projection. In view of this, we call a subspace V^ora ⊂ L²(T) an oracle subspace if ν*_{m+1} − ν*_m ∉ (V^ora)^⊥ for every m = 1, ..., M, so that all information about mean changes is preserved after projection. An obvious example of an oracle subspace is V^ora_0 = span{ν*_{m+1} − ν*_m, m = 1, ..., M}, which is contained in all other oracle subspaces.
Nevertheless, the segment mean functions ν*_m(t) are latent, which means that identifying the oracle subspace V^ora_0 is impractical. Wang and Samworth (2018) had similar considerations in detecting mean changes for a sequence of high dimensional data, and they used sparse singular value decomposition to obtain a projection direction that is close to being an oracle. For functional data, to the best of our knowledge, the literature contains no relevant discussion about how to approximate the oracle basis. Studies have simply treated the detectability assumption as a condition. We now prove that (C2) holds if V_p in (5) is selected as the eigenspace of c(s, t|θ_0).
The eigenvalues {λ_ℓ, ℓ ≥ 1} of c(s, t|θ_0) measure the mixed variability involving both the randomness of Y_i(t) and the mean shifts. Theorem 2 states that if the variation caused by mean shifts exceeds a certain portion of the total mixed variation, then the detectability assumption (C2) is guaranteed. Based on Theorem 2, we define the "strength" of a changepoint θ*_m as γ_m γ_{m+1} ‖ν*_m − ν*_{m+1}‖². In practice, we may focus more on large-strength than small-strength changes, especially when the number of changepoints M is large. In this case, Theorem 2 guarantees that all changes with a strength greater than some minimum level are detectable if the dimension of the eigenspace is chosen accordingly. In this study, without any prior assumption regarding the strength of change, we aim to detect as many changepoints as possible and determine the dimension p in (5) by the fraction of variance explained (FVE). Specifically, p = p(δ) = min{p : Σ_{ℓ=1}^p λ̂_ℓ / Σ_{ℓ=1}^∞ λ̂_ℓ > δ} for some 0 < δ < 1, where λ̂_ℓ is a consistent estimator of λ_ℓ. In our simulation study and real data application, we set δ to 0.95 or 0.99, close to 1, to ensure that the detectability assumption (C2) is likely to hold.
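The FVE rule above is a one-line computation once the (estimated) eigenvalues are available; the following sketch applies it to the eigenvalue sequence κ_ℓ = 30 × 0.7^ℓ used in the simulation design of Section 4.

```python
import numpy as np

def fve_dimension(eigvals, delta=0.95):
    """Smallest p such that the leading p eigenvalues explain more
    than a fraction delta of the total variance (the FVE rule)."""
    eigvals = np.asarray(eigvals, dtype=float)
    frac = np.cumsum(eigvals) / eigvals.sum()
    return int(np.argmax(frac > delta)) + 1

# eigenvalues kappa_l = 30 * 0.7**l from the simulation design
kappa = 30 * 0.7 ** np.arange(1, 31)
print(fve_dimension(kappa, delta=0.99))   # → 13
```

With δ = 0.99, the rule returns p = 13 for this eigenvalue sequence, matching the dimension fixed in the sensitivity study of Section 4.2.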

Greedy Segmentation Estimator and Its Properties
The estimatorθ K in (8) considers multiple changepoints simultaneously, globally minimizing T N,p (θ K ) with respect to θ K in K for all possible K-segmentations. However, obtaining multiple changepoints simultaneously is challenging, especially for large values of K. Also, the number of changepoints remains to be determined. In this section, the greedy segmentation (GS) approach is developed.

GS Estimator for Changepoints
The following theorem serves as the basis for the proposed GS estimator, which further leads to the asymptotic results that help to determine the number of changepoints.
Theorem 3, stated in terms of the changepoint estimator θ̃_K in (8), suggests that conditional on θ̃_K for a given K < M, the additional changepoint estimator identified as the minimizer in (9) converges in probability to one of the underlying changepoints. This suggests that the changepoints can be identified greedily, one by one, which greatly reduces the computational complexity compared with other optimization algorithms, including dynamic programming and its variants (Gedikli et al. 2010; Maidstone et al. 2017).
For practical and theoretical considerations, we introduce a trimming parameter h for the segment in which a new changepoint is located, satisfying condition (C6). Let θ̂_K be a GS estimator and denote its elements sorted in ascending order as θ̂_[1] < · · · < θ̂_[K], setting θ̂_[0] = 0 and θ̂_[K+1] = 1. Under condition (C6), we use the following procedure to obtain the GS estimator: (G1) Initialize with K = 0 and θ̂_0 = ∅. (G2) Given the current GS estimator θ̂_K, obtain an additional element θ̂_{K+1} of the GS estimator by minimizing the criterion over the trimmed intervals J_K. (G3) Update the GS estimator as θ̂_{K+1} = θ̂_K ∪ {θ̂_{K+1}}. (G4) Set K = K + 1 and repeat (G2)-(G4) to obtain the updated GS estimator.
In contrast to the global minimization concept, the GS estimator θ̂_K searches for one additional changepoint candidate at a time based on the current segmentation. Consequently, it produces a nested sequence of segmentations θ̂_1 ⊂ θ̂_2 ⊂ . . . ⊂ θ̂_K. In practice, the additional element θ̂_{K+1} in (G2) can be obtained using a grid search. However, several aspects are critical when selecting the parameter h in I_k. On the one hand, h must be small, ensuring that the intervals J_K in (G2) contain the remaining changepoints to be detected as N → ∞. On the other hand, h must be bounded away from 0 to prevent the asymptotic distribution in Theorem 4 (ii) from diverging. The next section discusses the influence of h on the asymptotic distribution of the GS estimator-based test statistic.
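The greedy steps (G1)-(G4) can be sketched generically as follows. This is a schematic, not the authors' implementation: `criterion` stands in for T_{N,p}, the candidate grid and the trimming rule are illustrative, and the demo cost is a univariate least-squares stand-in.

```python
import numpy as np

def greedy_segmentation(criterion, N, K_max, h=0.02):
    """Greedy counterpart of (G1)-(G4): starting from no changepoints,
    repeatedly add the grid candidate minimizing the segmentation
    criterion, trimming candidates within h of the current estimates
    and of the boundaries.  `criterion` maps a sorted tuple of
    breakpoints in (0, 1) to a real cost."""
    est = []
    grid = np.arange(1, N) / N
    for _ in range(K_max):
        ok = (grid > h) & (grid < 1 - h)
        for s in est:                       # trim around current estimates
            ok &= np.abs(grid - s) > h
        cands = grid[ok]
        if cands.size == 0:
            break
        vals = [criterion(tuple(sorted(est + [c]))) for c in cands]
        est.append(float(cands[int(np.argmin(vals))]))
    return sorted(est)

# demo with a univariate least-squares cost as a stand-in criterion
y = np.concatenate([np.zeros(30), np.full(40, 2.0), np.full(30, -1.0)])

def sse_cost(theta):
    cuts = [0] + [int(np.ceil(s * len(y))) for s in theta] + [len(y)]
    return sum(float(((y[a:b] - y[a:b].mean()) ** 2).sum())
               for a, b in zip(cuts, cuts[1:]))

print(greedy_segmentation(sse_cost, len(y), K_max=2))   # → [0.3, 0.7]
```

Each pass scans the grid once given the current segmentation, which is what keeps the overall cost linear in the grid size per added changepoint.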

GS Algorithm
The nested sequence of GS estimators, along with the consistency property shown in Theorem 3, motivates the following development.
Let q̂_α(K) denote the (1 − α) quantile of the asymptotic distribution of Q_K in (10). The distribution involves the long-run covariance matrix and the Brownian bridges B(x). We estimate the long-run covariance matrix using the kernel estimator in (6). The weight function ω_u(·) is bounded and symmetric about 0. In this study, we use the Bartlett kernel, ω_u(j) = 1 − |j|/(1 + u) for |j| ≤ u and ω_u(j) = 0 otherwise. To select the bandwidth u, we adopt the strategy of Andrews (1991) under a first-order vector autoregressive model for the projections, yielding a data-driven bandwidth selector that minimizes the mean squared error. We simulate the Brownian bridges B(x) with a step size of 1/4000 for each x. Finally, we obtain the empirical (1 − α) quantile q̂_α(K) based on 10,000 replicate simulations.
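The Bartlett-weighted long-run covariance estimator described above can be sketched as follows; the function name and the centering convention are our own illustrative choices, and the bandwidth u is taken as given rather than selected by the Andrews (1991) rule.

```python
import numpy as np

def bartlett_lrcov(scores, u):
    """Kernel long-run covariance estimator for an (N, p) sequence of
    projection scores, using the Bartlett weights
    w_u(j) = 1 - j/(1 + u) for j <= u and 0 otherwise."""
    S = np.asarray(scores, dtype=float)
    S = S - S.mean(axis=0)
    N = S.shape[0]
    lr = S.T @ S / N                       # lag-0 autocovariance
    for j in range(1, min(u, N - 1) + 1):
        w = 1.0 - j / (1.0 + u)
        G = S[j:].T @ S[:-j] / N           # lag-j autocovariance
        lr += w * (G + G.T)                # symmetrize over +/- lags
    return lr
```

With u = 0 the estimator reduces to the ordinary sample covariance; larger u adds downweighted autocovariances to capture serial dependence in the scores.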
Theorem 4 provides a basis for testing the null hypothesis H_{0,K}: M ≤ K for any K ≥ 0 by rejecting it if Q_K exceeds q̂_α(K) for some α (e.g., α = 0.05). To determine the number of changepoints, we iteratively test H_{0,K}, greedily increasing K until H_{0,K} is not rejected. We propose the GS algorithm below to obtain the number and locations of the changepoints.
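The stopping rule just described amounts to the following outer loop. All three callables are placeholders for the components in the text (the greedy step (G2), the statistic Q_K, and the quantile q̂_α(K)); the toy run below wires in stubs purely to illustrate the control flow.

```python
def gs_algorithm(next_changepoint, test_stat, critical_value, K_max=10):
    """Skeleton of the GS algorithm: grow the nested GS estimator one
    changepoint at a time and stop at the first K for which
    H_{0,K}: M <= K is not rejected."""
    est = []
    for K in range(K_max + 1):
        if test_stat(est) <= critical_value(K):   # H_{0,K} not rejected
            return K, sorted(est)
        est.append(next_changepoint(est))         # greedy step (G2)
    return K_max, sorted(est)

# toy run: pretend the truth has M = 2 changepoints at 0.3 and 0.7
truth = [0.3, 0.7]
out = gs_algorithm(
    next_changepoint=lambda est: truth[len(est)],
    test_stat=lambda est: 10.0 if len(est) < len(truth) else 1.0,
    critical_value=lambda K: 5.0,
)
print(out)   # → (2, [0.3, 0.7])
```

Because the GS estimator is nested, each iteration reuses all previously found changepoints and only the newest candidate needs to be searched for.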
Notably, Theorem 4 (ii) indicates that the asymptotic distribution depends on h for K ≥ M. It follows that Q_K diverges as h → 0 when N → ∞ for K ≥ M, which would be indistinguishable from the case K < M in Theorem 4 (i). Therefore, to avoid the latter case, the parameter h in the weighting must be positive and bounded away from 0. As a consequence, the asymptotic distribution of Q_K is well defined in connection with the GS estimator obtained in (G2), which also depends on h. The assumption on h in (C6) is relatively mild under condition (C1), where the number of changepoints M and the locations θ*_M are fixed. In practice, a small value of h (e.g., h = 0.01 or h = 0.05) serves as a reasonable threshold for the minimum distance between adjacent changepoints. In the simulation study presented in Section 4, we discuss the sensitivity of the GS algorithm's performance to the choice of h.

Simulation Setting
The simulation designs under model (1) consider various types of mean shift and numbers of changepoints, along with their locations. To accommodate the L⁴-m-approximability assumption in (C4), we consider a functional autoregressive process for the random functions, where the scores ζ_{i,ℓ} follow the AR(1) model ζ_{i,ℓ} = ρ ζ_{(i−1),ℓ} + ε_{i,ℓ}. Here, ρ is the autocorrelation coefficient, ε_{i,ℓ} follows the standard normal distribution for each ℓ, and (κ_ℓ, ψ_ℓ(t)), ℓ = 1, ..., L, are fixed eigenpairs of c(s, t) = Cov(Y_i(s), Y_i(t)).
In this study, we set ψ_ℓ(t) as the Fourier basis and κ_ℓ = 30 × 0.7^ℓ with L = 30, and we specify ρ = 0, 0.2, and 0.5 for the cases of zero, low, and moderate dependence, respectively, in the functional data sequence. We consider cases from no changepoint (M = 0) up to three changepoints (M = 3), with the corresponding segment mean functions ν*_m defined in (C1) according to the number of changepoints M.
These segment mean functions are designed to exhibit different types of mean shift. The mean shifts ν*_2 − ν*_1 and ν*_3 − ν*_2 do not rely on specific eigenfunctions, as in general cases, whereas ν*_4 − ν*_3 is nearly orthogonal to the random functions {Y_i}. Given each M > 0, we also consider both evenly and unevenly located changepoints, as shown in Table 1, resulting in a total of 7 different scenarios. For each scenario, we set the sequence length to N = 200 and N = 1000 for small and large sample sizes, respectively. Combining the three levels of dependence (ρ = 0, 0.2, 0.5), we consider a total of 7 × 2 × 3 = 42 different cases.
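A generator for the error process above can be sketched as follows. The grid size, score initialization, and the exact ordering of the Fourier pairs are illustrative assumptions; only the eigenvalue decay κ_ℓ = 30 × 0.7^ℓ, L = 30, and the AR(1) score dynamics come from the design described in the text.

```python
import numpy as np

def simulate_far_sequence(N, rho, L=30, n_grid=100, seed=0):
    """Random functions built from a Fourier basis psi_l with
    eigenvalues kappa_l = 30 * 0.7**l and AR(1) scores
    zeta_{i,l} = rho * zeta_{i-1,l} + eps_{i,l}."""
    rng = np.random.default_rng(seed)
    t = (np.arange(n_grid) + 0.5) / n_grid
    psi = np.empty((L, n_grid))
    for l in range(L):                      # orthonormal Fourier pairs
        k = l // 2 + 1
        psi[l] = (np.sqrt(2) * np.sin(2 * np.pi * k * t) if l % 2 == 0
                  else np.sqrt(2) * np.cos(2 * np.pi * k * t))
    kappa = 30 * 0.7 ** np.arange(1, L + 1)
    eps = rng.standard_normal((N, L))
    zeta = np.empty((N, L))
    zeta[0] = eps[0]
    for i in range(1, N):                   # AR(1) dependence over i
        zeta[i] = rho * zeta[i - 1] + eps[i]
    return (zeta * np.sqrt(kappa)) @ psi    # curves Y_i on the grid
```

Mean shifts for a given scenario would then be added to the appropriate row blocks of the returned array.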

Effect of h on GS Algorithm
The choice of h is crucial for the GS estimator θ̂_K, as well as for the asymptotic distribution of Q_K in the GS algorithm. We conduct a simulation study to evaluate the sensitivity of the practical performance to the choice of h in the GS algorithm. To focus on the influence of h, we consider the independent case ρ = 0 and fix the dimension of ξ_i at p = 13, which satisfies Σ_{ℓ=1}^{13} κ_ℓ / Σ_{ℓ=1}^{30} κ_ℓ > 0.99. The critical value q̂_α(K) in (GS2) is obtained from the simulated empirical (1 − α) quantile, assuming that the long-run covariance matrix of ξ_i, diag(κ_1, ..., κ_13), is known. Here, we take α = 0.05. Table 2 reports the GS algorithm's success rates for different values of h in the scenarios, where success is defined as M̂ = M and max_{1≤m≤M} |θ̂_[m] − θ*_m| < 0.01. Overall, the GS estimators performed similarly for choices of h within the interval [0.01, 0.10], with the largest difference within the same scenario no greater than 3.2%. Figure 1 shows the 0.95 quantiles (marked with asterisks) of the asymptotic distributions of Q_0 in scenario CP0 for various values of h, superimposed on boxplots of the empirical Q_0 based on 500 replicate simulations. As h increases, the empirical quantile of Q_0 and the simulated distribution both decrease; that is, they move in the same direction. Consequently, the test in the GS algorithm is insensitive to the choice of h within [0.01, 0.10], which supports the outcome presented in Table 2. To approximate the desired critical value well, the resolution of the simulated Brownian bridges should increase as h decreases. In practice, we recommend h = 0.02 and approximating the 0.95 quantile with 10,000 simulated Brownian bridges at a resolution of 1/4000; these settings attained the best performance for most scenarios in our simulation study.
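The success criterion used throughout the simulation studies is simple enough to state in code; this helper is an illustrative restatement, not part of the authors' software.

```python
def is_success(est, true, tol=0.01):
    """Success as used in Tables 2-4: the estimated number of
    changepoints equals the truth and every sorted location error
    is below tol."""
    est, true = sorted(est), sorted(true)
    return (len(est) == len(true) and
            all(abs(e - s) < tol for e, s in zip(est, true)))
```

A run counts as a success only when both the count and every location are recovered, so a single spurious or missed changepoint fails the whole replicate.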

Comparing GS Estimator With DS Estimator
As mentioned in Section 3, the estimator θ̃_K in (8) is the minimizer of T_{N,p}(θ_K) over all possible K-segmentations. Chiou, Chen, and Hsing (2019) proposed the DS algorithm to obtain θ̃_K, which requires K ≥ M. In contrast, the GS estimator does not require a predetermined K. We compare the performance of the proposed GS estimator and the DS estimator. To demonstrate the interplay between M and K, we define the success of a changepoint estimator θ_K as follows. Table 3 presents the success rates of the GS and DS estimators, derived from 500 replicate simulations in scenarios CP2eve, CP2une, CP3eve, and CP3une, with values of K ranging from 2 to 5. Scenario CP1 is omitted because the GS and DS results are identical when K = 1. In the evenly located scenarios CP2eve and CP3eve, both the GS and DS estimators performed well in general, especially for the large sample size (N = 1000). In particular, given the equally spaced initial segmentation, the success rate of the DS estimator improved when K = M. However, the GS estimator outperformed the DS estimator in the unevenly located scenarios CP2une and CP3une when K ≥ M. This can be attributed to failure of the local AMOC assumption. Although increasing K may raise the probability of satisfying the local AMOC assumption, it also shortens the subintervals I^(0)_k and thus reduces the effective sample sizes. By contrast, the GS estimator requires no AMOC-related assumptions and uses the full sample information, which leads to a significant improvement in success rate.

Comparing GS Algorithm With BS methods
given φ̂_ℓ(t) as defined in (7). We evaluate the performance of the GS algorithm against two binary segmentation (BS) approaches: first, BScus, the BS algorithm that uses the CUSUM-based test statistic max_{1≤k≤N} U_N(k), where U_N(k) = S_N(k/N)′ Σ̂⁻¹ S_N(k/N) with Σ̂ the estimated long-run covariance matrix; and second, BSwcus, the BS algorithm that uses the weighted CUSUM-based test statistic. Although the BScus algorithm has been the mainstream approach for detecting a single changepoint in functional data (Berkes et al. 2009; Hörmann and Kokoszka 2010; Aston and Kirch 2012), Horváth, Miller, and Rice (2020) demonstrated that the detection power of the unweighted CUSUM-based statistic may be influenced by the changepoint positions, which motivates BSwcus. On this basis, we adopt the same trimming scheme as the GS algorithm for BSwcus in the asymptotic distribution, sup_{h≤x≤1−h} B(x)′B(x)/{x(1−x)}, for a given 0 < h < 0.5. For both methods, we set the trimming parameter h = 0.02, which leads to the best performance overall.
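Computing the CUSUM statistics from the score vectors can be sketched as follows. The 1/(x(1−x)) weight is an illustrative choice in the spirit of the weighted statistic, and the statistic here omits the Σ̂⁻¹ standardization of BScus, so it corresponds to the unstandardized variants.

```python
import numpy as np

def cusum_statistics(xi, h=0.02):
    """Partial-sum process S_N(k/N) of the (N, p) FPC score vectors
    and two CUSUM statistics: the unweighted max of S'S, and a
    weighted version dividing by x(1 - x) over h <= x <= 1 - h."""
    xi = np.asarray(xi, dtype=float)
    N = xi.shape[0]
    x = np.arange(1, N) / N
    # S_N(x) = (partial sum up to k) - x * (total sum), scaled by sqrt(N)
    S = (np.cumsum(xi, axis=0)[:-1] - np.outer(x, xi.sum(axis=0))) / np.sqrt(N)
    U = np.einsum('ij,ij->i', S, S)            # S_N(x)' S_N(x) at each x
    keep = (x >= h) & (x <= 1 - h)
    return U.max(), (U[keep] / (x[keep] * (1 - x[keep]))).max()
```

A mean shift inflates the partial-sum process around the change location, so both statistics rise sharply relative to a no-change sequence.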
Different ideas underpin the BS and GS algorithms. BS recursively conducts hypothesis testing to estimate a changepoint within each subinterval determined by the current changepoint estimates; thus, only the samples in that particular subinterval are used to estimate the related model components. Additionally, although both the BSwcus and GS methods use the same weighting scheme, they handle the long-run covariance matrix differently, which affects the algorithms. The long-run covariance matrix Σ and its eigenvalues λ_ℓ must be estimated, and these estimates are denoted by Σ̂ and λ̂_ℓ, respectively. BSwcus inserts the inverse Σ̂⁻¹ into the CUSUM-based test statistic, which makes its asymptotic distribution pivotal. In contrast, the GS test statistic Q_K is free of Σ̂, but its asymptotic distribution involves Σ. Due to this difference, and to ensure a fair comparison, we additionally consider a modified BS method, BS*wcus, which replaces U_N(k) with U*_N(k) = S_N(k/N)′ S_N(k/N) as the test statistic; the (1 − α) quantiles of the corresponding asymptotic distributions are obtained in a similar way to the GS algorithm. Table 4 compares the success rates of the methods, where the success rate is defined as in Section 4.2 (i.e., an estimated set of changepoints θ̂_M̂ for θ*_M is successful if M̂ = M and max_{1≤m≤M} |θ̂_[m] − θ*_m| < 0.01). In scenario CP0, both BScus and BSwcus achieved high success rates. However, given the 0.05 nominal significance level, BScus and BSwcus were rather conservative with respect to Type I error rates (1 − success rate), whereas the GS and BS*wcus Type I error rates were always closer to 0.05.
Comparing the relative performance of BScus and BSwcus shows that adding the weighting scheme to BS significantly improves the success rate. Even in the single changepoint case CP1, BScus deteriorated in CP1une compared with CP1eve, indicating that BScus favors a changepoint in the middle of the sequence but loses power when a changepoint is near the segment boundaries. In contrast, BSwcus is not sensitive to the changepoint locations and attains similar performance in both scenarios. In the multiple changepoint cases CP2 and CP3, the performance of BScus was generally low and unstable, and BSwcus performed poorly when ρ = 0.5. Notably, the success rate of BSwcus increased when N = 1000. The comparison demonstrates the advantage of the weighting scheme when using the CUSUM-based test statistic for multiple changepoint detection.
The results show that BS*wcus generally outperformed BSwcus, especially for small sample sizes or multiple changepoints. The disparity in performance is attributed to inserting Σ̂⁻¹ into the test statistic for BSwcus, in contrast to placing Σ in the asymptotic distribution for BS*wcus. When the number of FPCs is considerable, given that we set the FVE to 0.99 in the simulation study, the FPC score vector may contain an element with variance close to zero, leading to a nearly singular Σ̂ and unsatisfactory results for BSwcus. Regarding the comparison between GS and BS*wcus, which apply the same weighting scheme, GS has the advantage of using every sample in the entire interval to estimate Σ, whereas BS*wcus uses only the samples within a subinterval determined by the previous step. This was reflected in the relative performance of GS to BS*wcus in CP2une, CP3eve, and CP3une when N = 200. When the sequence length increased to N = 1000, this advantage was less apparent, and GS and BS*wcus were in close competition.
It is reasonable to conclude from this simulation study that (i) the reliability of the BS-based methods increases when the weighting scheme concerning changepoint locations is incorporated (BSwcus and BS * wcus vs. BScus); (ii) when using the inverse ofˆ in the conventional BS-based approach (BScus and BSwcus), it is important to consider the singularity problem if the number of components is large; (iii) GS and BS * wcus , sharing the same weighting scheme, outperform the other BSbased methods; and (iv) when the sample size is small or when the number of changepoints is considerable, GS outperforms BS * wcus in terms of estimating the covariance.

Central England Temperature Data
We apply the proposed GS algorithm to the central England temperature (CET) data (Parker, Legg, and Folland 1992) available at www.metoffice.gov.uk/hadobs/hadcet/. The dataset contains average daily temperatures in central England from 1780 to 2019, which forms a functional data sequence of length N = 240 with 365 measurements in each function. As in Berkes et al. (2009), we presmoothed the data using 12 B-spline basis functions before applying the GS algorithm. Table 5 presents the changepoints detected by the GS algorithm in comparison with BScus and BSwcus under different dimensions p in (5). It is worth noting that p = 8, 9, 10, 11 correspond to the FVE levels δ = 0.8, 0.85, 0.9, 0.95, respectively, as mentioned in Section 2.3. The parameter settings for both methods are the same as those in Section 4. The outcome of BScus with p = 8 is consistent with the finding of Berkes et al. (2009) in terms of both changepoint number and location. However, using BScus and BSwcus, the number of detected changepoints decreased as the dimension p increased, which was unexpected given that the detectability condition (C2) is more likely to hold for larger p. In contrast, the GS results were stable, with the changepoints remaining the same across different values of p. In Figure 2, the GS segment mean functions show a more recognizable increasing trend over the entire year compared to BScus and BSwcus, with an especially distinguishable pattern in January.

Australian Temperature Data
The second application of the proposed GS algorithm focuses on Australian temperature (AUT) data obtained from eight weather stations, available from the Australian Bureau of Meteorology (www.bom.gov.au). Unlike the CET data, these data comprise daily minimum temperatures over different time spans. Aue, Rice, and Sönmez (2018) also used this dataset to detect a single changepoint by applying their fully functional CUSUM-based statistic,

T^CUS_{N,ff}(θ) = N^{-1} ‖ ∑_{i=1}^{θ} {X_i(t) − X̄(t)} ‖²,

where X̄(t) = ∑_{i=1}^{N} X_i(t)/N and ‖·‖ is the L2 norm. Before applying the GS algorithm, we first excluded years with more than 1/3 of the daily values missing, and then we presmoothed the data using 21 Fourier basis functions, as in Aue, Rice, and Sönmez (2018). Table 6 lists the estimated changepoints together with the results of Aue, Rice, and Sönmez (2018), including their confidence intervals. For comparison with the fully functional CUSUM-based test, we set the FVE level δ to 0.99. The other parameters were identical to those used in Section 4.
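A fully functional CUSUM statistic of this form can be sketched on simulated data as follows; the Riemann-sum approximation of the L2 norm over the daily grid and the name `cusum_ff` are our own choices, not the authors' code.

```python
import numpy as np

def cusum_ff(X):
    """Fully functional CUSUM values T(theta) = ||sum_{i<=theta} (X_i - Xbar)||^2 / N,
    with the L2 norm approximated by an average over the observation grid."""
    N, m = X.shape
    S = np.cumsum(X - X.mean(axis=0), axis=0)   # partial sums of the centered curves
    return np.sum(S ** 2, axis=1) / (m * N)     # one value per candidate theta = 1..N

# Toy functional sequence with a mean-function shift after observation 60
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
X = rng.normal(0, 1, (120, 100))
X[60:] += 2 * np.sin(np.pi * t)                 # change in the mean function

T = cusum_ff(X)
theta_hat = int(np.argmax(T)) + 1               # estimated single changepoint
```

Maximizing `T` over the candidate locations recovers the shift near observation 60; this is the single-changepoint search that the weighted multiple-changepoint criterion generalizes.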
Across the eight stations, GS detected no changepoint at three stations, a single changepoint at three stations, and two changepoints at the remaining two stations. Among the five stations for which both methods detected at least one changepoint, one of the GS changepoints θ̂_M is close to θ̂_CUS, with the exceptions of the stations at Cape Otway Lighthouse and Gayndah Post Office. After closely examining the temperature profiles between 1864 and 2012 at Cape Otway Lighthouse, we found that the 1866 profile is markedly lower than the others, as shown in Figure 3. We discuss the results at Gayndah Post Office in more detail below.
In Figure 4, the GS segment mean functions exhibit an increasing trend over the calendar year, and the segment mean function for 1936-1972 is relatively close to the overall mean function X̄(t). More importantly, the values of T^CUS_{N,ff}(θ) show a pronounced trend that increases before 1935 and decreases after 1972, with a less evident increasing trend and random fluctuations in between. Consequently, the changepoints at 1935 and 1972 cannot both be detected when a single changepoint is assumed, as in the BS-based approach. A similar scenario occurs in CP2eve in Section 4: the overall mean function and the second segment mean function are very close, so the AMOC condition is violated and the changepoint consistency of BScus does not hold.
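This failure mode can be reproduced on hypothetical data: when the middle segment's mean function is close to the overall mean, the CUSUM values rise over the first segment, stay roughly flat with random fluctuations in between, and fall over the last segment, so a single maximizer cannot recover both changepoints. The construction below is a toy illustration, not the Gayndah data.

```python
import numpy as np

# Three segments of 50 curves each with mean functions 0, sin(pi t), 2 sin(pi t):
# the middle segment's mean equals the overall mean, mimicking the pattern above.
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 50)
X = np.tile(np.sin(np.pi * t), (150, 1)) + rng.normal(0, 0.5, (150, 50))
X[:50] -= np.sin(np.pi * t)
X[100:] += np.sin(np.pi * t)

S = np.cumsum(X - X.mean(axis=0), axis=0)   # partial sums of centered curves
T = np.sum(S ** 2, axis=1) / X.size         # CUSUM values over candidate changepoints
```

`T` climbs up to observation 50, plateaus until 100, and then declines, so an AMOC-based search reports at most one of the two true changepoints.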

Discussion and Concluding Remarks
In this study, we investigated the changepoint detection criterion T_N,p(θ_K|V_p) in (7) for the mean functions. The most crucial result sheds new light on the consistency property of the changepoint estimator θ̂_K in (8), which holds irrespective of whether the number of changepoints is under-reported or over-reported. This new finding motivated the development of the GS estimator, and the asymptotic behavior of the GS estimator-based test statistic led to the GS algorithm. Detectability in changepoint analysis is an interesting issue when subspace projection methods are used. A noteworthy discussion appeared in Theorem 4.1 of Aston and Kirch (2012) under the AMOC or epidemic change model for dependent functional data. In contrast, our focus is on multiple changepoints, stemming from the viewpoint that the distribution of changepoint locations should play a role in detectability. In our approach, choosing V_p in T_N,p(θ_K|V_p) as the eigenspace of c(s, t|θ_0) enables us to express the detectability condition explicitly in terms of the eigenvalues and the distances between adjacent changepoints, as shown in Theorem 2. Nevertheless, it is possible to develop relevant theorems and algorithms for the fully functional criterion ∫ ĉ(t, t|θ_K) dt without subspace projection, together with condition (C2), similar to Aue, Rice, and Sönmez (2018). However, that approach requires fully observed functions, whereas ours allows sparsely observed functional data.
The GS estimator, along with the GS algorithm, connects global optimization with the greedy algorithm for multiple changepoint detection in functional data, and it shows that the weight 1/{x(1 − x)} is essential. Although this weighting scheme induces a divergence issue near the endpoints, an adequately chosen trimming parameter provides a suitable remedy. A challenging problem to pursue in the future is the construction of simultaneous confidence bands for the multiple changepoints identified by the GS algorithm for a functional data sequence.
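The interplay of the weight 1/{x(1 − x)}, the trimming parameter, and the greedy search can be sketched as follows. This is a simplified stand-in for the GS estimator, not the authors' implementation: the CUSUM form, the minimum segment length, and the function names are our own assumptions.

```python
import numpy as np

def weighted_cusum(X, trim=0.05):
    """Weighted CUSUM values with weight 1/{x(1-x)}; candidates within a trim
    fraction of either end are excluded to avoid the endpoint divergence."""
    N, m = X.shape
    x = np.arange(1, N) / N                         # candidate fractions theta/N
    S = np.cumsum(X - X.mean(axis=0), axis=0)[:-1]  # partial sums, theta = 1..N-1
    T = np.sum(S ** 2, axis=1) / (m * N)            # unweighted L2 CUSUM
    Tw = T / (x * (1 - x))                          # weight 1/{x(1-x)}
    Tw[(x < trim) | (x > 1 - trim)] = -np.inf       # trimming
    return Tw

def greedy_segment(X, K, trim=0.05):
    """Greedily add K changepoints: at each step, place a changepoint where the
    weighted CUSUM over the current segments is largest."""
    cps = []
    for _ in range(K):
        best = None
        for a, b in zip([0] + sorted(cps), sorted(cps) + [len(X)]):
            if b - a < 20:                          # skip segments too short to split
                continue
            Tw = weighted_cusum(X[a:b], trim)
            j = int(np.argmax(Tw))
            if best is None or Tw[j] > best[0]:
                best = (Tw[j], a + j + 1)
        cps.append(best[1])
    return sorted(cps)

# Toy sequence with mean-function changes at observations 50 and 100
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)
X = rng.normal(0, 1, (150, 50))
X[50:100] += 2 * np.sin(np.pi * t)
X[100:] -= 2 * np.sin(np.pi * t)
cps = greedy_segment(X, K=2)
```

Each greedy step solves a single weighted maximization, so K changepoints cost K passes over the data; the trimming keeps the weighted criterion finite at the segment boundaries.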

Supplementary Material
The online supplementary material contains technical proofs of Lemmas 1 and A.1.