Envelope Model for Function-on-Function Linear Regression

Abstract The envelope model is a recently developed methodology for multivariate analysis that enhances estimation accuracy by exploiting the relation between the mean and the eigenstructure of the covariance matrix. We extend the envelope model to function-on-function linear regression, where the response and the predictor are assumed to be random functions in Hilbert spaces. We use a double envelope structure to accommodate the eigenstructures of the covariance operators for both the predictor and the response. The central idea is to establish a one-to-one relation between the functional envelope model and the multivariate envelope model and estimate the latter using an existing method. We also develop the asymptotic theory, confidence and prediction bands, an order determination method along with its consistency, and a characterization of the efficiency gain by the proposed model. Simulation comparisons with the standard function-on-function regression and data applications show significant improvement by our method in terms of cross-validated prediction error. Supplementary materials for this article are available online.


Introduction
The envelope model is a recently developed multivariate analysis strategy that enhances estimation accuracy by reducing model complexity using the reducing subspaces of the covariance matrices. It was first proposed for multivariate linear regression by Cook et al. (2010), and has been extended to many settings such as partial least squares (Cook et al. 2013), generalized linear models (Cook and Zhang 2015a), tensor regression (Zhang and Li 2017; Ding and Cook 2018), quantile regression (Ding et al. 2021), and spatial models (Rekabdarkolaee et al. 2019), among others. See Cook (2018) for a comprehensive review.
In this article, we extend the envelope models from multivariate linear regression to function-on-function linear regression. This type of regression is becoming increasingly common in modern applications. See, for example, Ferraty et al. (2012), Ivanescu et al. (2015), and Luo and Qi (2017). An envelope model adapted to this setting could significantly enhance estimation accuracy by exploiting inherent model parsimony.
The development of the envelope paradigm went through several phases: the response envelope model (Cook et al. 2010), the predictor envelope model (Cook et al. 2013), and the simultaneous envelope model (Cook and Zhang 2015b). Since both the response and the predictor envelope models are special cases of the simultaneous envelope, we only need to extend the latter to function-on-function regression. To facilitate this extension, we now give an overview of the simultaneous envelope in the multivariate setting.
Let Y ∈ R^r be the response vector and X ∈ R^p the predictor vector, and suppose that they satisfy the multivariate linear regression model
Y = μ + βX + ε, (1)
where μ ∈ R^r is a constant vector, β ∈ R^{r×p} is a matrix that contains the regression coefficients, ε is the error vector, independent of X, with mean 0 and covariance matrix Σ, and the predictor vector X has mean 0 and covariance matrix Σ_X.
Let us say that a matrix A is a basis matrix of a subspace S if the columns of A form a basis of S. For a finite-dimensional subspace S, let P_S and Q_S = I − P_S denote the projection matrices onto S and its orthogonal complement S^⊥, respectively, in the usual inner product. For the response envelope, we consider subspaces S of R^r that satisfy
(a) Q_S Y | X ∼ Q_S Y, (b) P_S Y ⊥⊥ Q_S Y | X. (2)
Conditions (a) and (b) imply that Q_S Y carries no information about β directly or indirectly through its relation with P_S Y.
They hold if and only if Q_S Y ⊥⊥ (P_S Y, X), and thus P_S Y carries all the regression information in (1). They hold trivially when S = R^r and, when S = {0}, they imply that Y ⊥⊥ X. The response envelope is defined as the intersection of all subspaces S that satisfy (a) and (b).
For the predictor envelope, we consider subspaces T of R^p that satisfy
(c) Y ⊥⊥ Q_T X | P_T X, (d) Q_T X ⊥⊥ P_T X. (3)
Conditions (c) and (d) imply that Q_T X does not affect Y directly or indirectly through its relation with P_T X. They hold if and only if Q_T X ⊥⊥ (Y, P_T X), and so P_T X carries all the regression information about Y. They hold trivially when T = R^p and, when T = {0}, they again imply that Y ⊥⊥ X. The predictor envelope is defined as the intersection of all subspaces T that satisfy (c) and (d).
The simultaneous envelope model is the multivariate regression model (1) that requires conditions (a)-(d), and is parameterized in terms of the response and predictor envelopes. We refer to this model as the Multivariate Envelope Linear Model (MELM). The response envelope model is the special case of MELM with T = R^p; the predictor envelope model is the special case of MELM with S = R^r; finally, the classical multivariate linear regression is the special case of MELM with both S = R^r and T = R^p.
The theoretical structure of our extension is sketched as follows. Let X and Y be random functions in Hilbert spaces H_X and H_Y. Assume that they satisfy the linear model
Y = α + BX + ε, (4)
where α is a fixed member of H_Y, ε is a random member of H_Y, and B: H_X → H_Y is a linear operator. We illustrate this function-on-function regression model with an example. To assess the economic effect of Covid-19, the daily new confirmed cases and the daily mobility of retail and recreation were measured for all 21 counties of New Jersey in June 2020. The curve of new confirmed cases over the entire month serves as the predictor X. The curve of mobility, which corresponds to visits to places like restaurants, shopping centers, and movie theaters, is the response Y. For many counties, the new confirmed cases are recorded while the mobility data are missing. Using the function-on-function regression model (4), it is possible to predict the mobility from the new confirmed cases for those counties. The goal of the envelope model is to achieve a more efficient estimation of the model parameters, which leads to more accurate prediction.

Now we provide a sketch of the construction of the envelope model under (4). Let Σ_ε and Σ_X be the covariance operators of ε and X, respectively. If A: H → H is a bounded linear operator, then a reducing subspace of A is any subspace S of H such that AS ⊆ S and AS^⊥ ⊆ S^⊥; if A is self-adjoint, then these two conditions are equivalent. Let S be the intersection of all reducing subspaces of Σ_ε that contain the range of B, and let T be the intersection of all reducing subspaces of Σ_X that contain the range of B*, the adjoint of B.
The subspaces S and T are the functional extensions of the response and predictor envelopes defined for (1). The rationale based on the covariance operators may seem different from that used for model (1), but it will be shown in Theorem 1 that these operator-based definitions lead to extensions of (2) and (3) to the functional model (4). When S and T are proper subspaces of H_Y and H_X, they offer a reduction of the complexity of the function-on-function linear regression, which, as we will show, can be very substantial in applications. Our goal is to estimate S, T, α, and B in the dimension-reduced function-on-function linear regression. We refer to model (4) with the simplifying structure (S, T) as the Functional Envelope Linear Model (FELM).
In the theoretical development of our approach, we rely on the assumption that the regression error is a Gaussian random element in a Hilbert space.When the Gaussian assumption is violated, the objective function used to construct our estimator is still valid if we replace the conditional independence relations by some weaker conditions such as conditional uncorrelation.
The function-on-function envelope model developed here is but one way to achieve model parsimony for functional regression. Other approaches have been considered previously, such as penalized function-on-function linear regression (Ivanescu et al. 2015; Scheipl and Greven 2016; Sun et al. 2018). Our approach uses a different philosophy: rather than penalizing the roughness of the coefficient functions, we impose sparse structures on the spectra of the covariance operators of the predictor and response processes, and let the data tell us which parts of the spectra are important. To the best of our knowledge, this article is the first attempt to generalize the envelope model to function-on-function regression. The closest earlier work is Zhang et al. (2018), which extended the envelope model to sufficient dimension reduction, where the response is a scalar and the predictor is a function. That article aims at estimating the functional dimension reduction space, whereas we provide an explicit regression estimator, furnished with confidence and prediction bands, as well as an order determination method.

Functional Envelope Linear Model
Let (Ω, F, P) be a probability space, let N ⊆ R denote an interval, and let H_X and H_Y denote separable Hilbert spaces of real-valued functions on N. Let X: Ω → H_X be a random element in H_X and Y: Ω → H_Y a random element in H_Y.
Recall that, if H is a generic Hilbert space, and X a random element in H with E(‖X‖_H) < ∞, then the function T: H → R, g ↦ E(⟨g, X⟩_H) is a bounded linear functional, which has a Riesz representation g_0 such that T(g) = ⟨g_0, g⟩_H for each g ∈ H. The function g_0 is defined as the expectation of X, and is written as E(X).
To define the second moment of X, recall that, if H_1, H_2 are Hilbert spaces and f_1, f_2 are members of H_1, H_2, respectively, then the tensor product f_1 ⊗ f_2: H_2 → H_1 is the bounded linear operator g ↦ ⟨f_2, g⟩_{H_2} f_1 (Conway 2013, chap. 2). If A is a random operator with E(‖A‖) < ∞, then there is a bounded linear operator B such that E(⟨Af, g⟩) = ⟨Bf, g⟩ for all f and g; the operator B is then defined as the expectation of A, written as E(A). Using this definition of expectation of a random operator, we define the second moment operator of X as E(X ⊗ X), and the variance operator of X as E[(X − E(X)) ⊗ (X − E(X))], which is denoted by var(X).
Our definitions of E(X) and E(A) are different from the standard definitions of moments in functional data analysis, which are made through pointwise moments such as E[X(t)] and E[X(s)X(t)]. See, for example, Ramsay and Silverman (2007). The two definitions of E(X), as the Riesz representation of the bounded linear functional f ↦ E⟨X, f⟩_H and as the pointwise expectation function t ↦ E[X(t)], are equivalent under mild conditions; for example, they are equivalent when H is separable. However, the former definition does not rely on the nature of H; that is, it does not require H to be a set of functions of t, whereas the latter definition does.
For a subspace U of H and a self-adjoint operator A: H → H, the A-envelope of U, denoted by E(U; A), is defined as E(U; A) = ∩{S: S ∈ Lat_U(A)}, where Lat_U(A) represents the collection of all reducing subspaces of A that contain U. The symbol Lat comes from the word "lattice." For a bounded linear operator B: H → H, let ran(B) represent the range (or image) of B, and ran(B)⁻ the closure of ran(B). We abbreviate E(ran(B); A) by E(B; A). In this article, we will only deal with E(B; A) where ran(B) = ran(B)⁻.
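In the finite-dimensional case with distinct eigenvalues, the A-envelope is easy to compute, because every reducing subspace of such an A is spanned by a subset of its eigenvectors; the envelope then consists of exactly the eigenvectors not orthogonal to ran(B). A minimal numpy sketch under that assumption (the function name and tolerance are ours):

```python
import numpy as np

def envelope_basis(A, B, tol=1e-10):
    """Smallest reducing subspace of symmetric A containing ran(B).

    Assumes A has distinct eigenvalues, so every reducing subspace of A
    is spanned by a subset of its eigenvectors; the envelope then
    consists of exactly the eigenvectors not orthogonal to ran(B).
    """
    _, vecs = np.linalg.eigh(A)
    keep = [i for i in range(A.shape[0])
            if np.linalg.norm(vecs[:, i] @ B) > tol]
    return vecs[:, keep]

# Toy example: A diagonal, ran(B) inside the span of the first two axes.
A = np.diag([3.0, 2.0, 1.0])
B = np.array([[1.0], [1.0], [0.0]])
G = envelope_basis(A, B)
print(G.shape[1])  # the envelope is 2-dimensional
```

With repeated eigenvalues the eigenspaces must be treated as blocks, and in the functional setting the lattice of reducing subspaces is infinite-dimensional, but the eigenvector picture above is the right intuition for the eigen-sparse models used later.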
Let B: H_X → H_Y be a bounded linear operator, and let ε be a random element in H_Y such that ε ⊥⊥ X and E(ε) = 0. For simplicity, we tentatively assume that E(X) = 0. We consider the function-on-function linear regression
Y = α + BX + ε, (5)
where ε and X are Gaussian random elements in H_Y and H_X, respectively, with covariance operators Σ_ε = var(ε) and Σ_X = var(X), and α is a nonrandom member of H_Y. It is easy to see that Σ_ε and Σ_X are self-adjoint operators. For an operator A, let A* represent the adjoint operator of A.
Next, we give a rigorous definition of FELM. Let {(λ_i, φ_i): i ∈ N}, {(ρ_i, χ_i): i ∈ N}, and {(τ_i, ψ_i): i ∈ N} be the sequences of eigenvalues and eigenfunctions of the linear operators Σ_Y, Σ_ε, and Σ_X, respectively. We say that a subspace S of H_Y is covered by a subset C of {χ_i: i ∈ N} if S is contained in the subspace spanned by C. The same applies to a subspace of H_X.

Definition 1. The functional linear model (5) is called a functional envelope linear model (FELM) with respect to the response envelope E(B; Σ_ε) and predictor envelope E(B*; Σ_X). Furthermore: (i) If at least one of the envelopes is a proper subset of its ambient space, then we call the FELM a proper envelope model. (ii) If ran(B) is covered by a finite subset of {χ_1, χ_2, ...} and ran(B*) by a finite subset of {ψ_1, ψ_2, ...}, then we call the FELM an eigen-sparse envelope model.
According to part (i) of this definition, any functional linear model is a FELM, but only when one or both envelopes are proper subspaces does the FELM lead to efficiency gains. When the predictor envelope is the ambient space, the FELM reduces to the functional response envelope model; when the response envelope is the ambient space, the FELM reduces to the functional predictor envelope model. Since, in the multivariate setting, the predictor envelope model is the underlying model of partial least squares, the FELM generalizes partial least squares (Wold 1966; De Jong 1993; Cook et al. 2013) to the functional setting. See also Delaigle and Hall (2012).
It is also reasonable to call E(B; Σ_ε) the residual envelope but, as we will show in Theorem 2, this envelope is the same as E(B; Σ_Y). Hence, it is justified to call it the response envelope.
In this article, we focus on the eigen-sparse envelope model in part (ii) of Definition 1, where the numbers of eigenfunctions χ_i and ψ_j that cover the response and predictor envelopes are fixed. In principle, we can allow the numbers of covering eigenfunctions to grow with the sample size n. The situation is similar to sparse estimation, except that, here, the sparsity is imposed on the eigenstructures. However, due to limited space, the development for FELM where the numbers of covering eigenfunctions increase with the sample size is left to future research.
We next express the eigen-sparse FELM explicitly in terms of the eigenfunctions of the covariance operators. Let I and J be the finite subsets of {1, 2, ...} such that {χ_i: i ∈ I} covers ran(B) and {ψ_j: j ∈ J} covers ran(B*). Then we have
B = Σ_{i∈I} Σ_{j∈J} b_{ij} (χ_i ⊗ ψ_j),
where the b_{ij} are real numbers. Corresponding to I and J, Σ_X and Σ_ε can be decomposed as
Σ_X = Σ_{j∈J} τ_j (ψ_j ⊗ ψ_j) + Σ_{j∉J} τ_j (ψ_j ⊗ ψ_j), Σ_ε = Σ_{i∈I} ρ_i (χ_i ⊗ χ_i) + Σ_{i∉I} ρ_i (χ_i ⊗ χ_i).
It is easy to see that E(B; Σ_ε) = span{χ_i: i ∈ I} and E(B*; Σ_X) = span{ψ_j: j ∈ J}.
Let s and t denote the cardinalities of I and J, respectively. Note that the dimensions of E(B; Σ_ε) and E(B*; Σ_X) are s and t.
Our Definition 1 of FELM was made solely in geometric terms but, as the next theorem will show, under the Gaussian assumption, it induces independence and conditional independence relations parallel to (a), (b), (c), and (d) in (2) and (3) for MELM. These relations are the true motivation behind our definition of FELM.
Theorem 1. Under the assumptions in Definition 1, with S = E(B; Σ_ε) and T = E(B*; Σ_X), we have
(a) Q_S Y | X ∼ Q_S Y, (b) P_S Y ⊥⊥ Q_S Y | X, (c) Y ⊥⊥ Q_T X | P_T X, (d) Q_T X ⊥⊥ P_T X. (11)

Relationships (11a) and (11b) are the functional counterparts of (2a) and (2b), and (11c) and (11d) are the functional counterparts of (3c) and (3d). Following the finite-dimensional analog, (11a) and (11b) hold if and only if Q_S Y ⊥⊥ (P_S Y, X), and (11c) and (11d) hold if and only if Q_T X ⊥⊥ (Y, P_T X). In consequence, if we knew the envelopes E(B; Σ_ε) and E(B*; Σ_X), then, to estimate B, all we would need to do is to regress P_S Y on P_T X, with Q_S Y and Q_T X entering solely through the estimation of the corresponding parts of the covariance operators Σ_ε and Σ_X. This is the underlying mechanism that drives the efficiency gains of the envelope model. Of course, at the sample level, the envelopes themselves have to be estimated as well. The next theorem shows that E(B; Σ_ε) and E(B; Σ_Y) are in fact the same, which generalizes a result of Cook et al. (2010, Proposition 3.1) to the functional setting.
Theorem 2. Under the functional linear model (5), E(B; Σ_ε) = E(B; Σ_Y).

Next we derive the Karhunen-Loève expansion of Y. In preparation, let i_1, ..., i_s be the members of I, and let QDQ^T be the spectral decomposition of U, where Q ∈ R^{s×s} is an orthogonal matrix and D ∈ R^{s×s} is a diagonal matrix. For a sequence a_1, a_2, ... and a subset A ⊆ {1, 2, ...}, we denote {a_i: i ∈ A} by a(A). Using this notation, we have E(B; Σ_ε) = span{χ(I)} and E(B*; Σ_X) = span{ψ(J)}.
In the following, if A is a matrix and a is a vector, we use diagmv(A) to represent the diagonal vector of A, and diagvm(a) the diagonal matrix with a as its diagonal.
Theorem 3.Under the assumptions in Definition 1, the Karhunen-Loève expansion of Y has the form

This implies that there is an isomorphism between the eigenfunctions {φ_1, φ_2, ...} of Σ_Y and the eigenfunctions {χ_1, χ_2, ...} of Σ_ε. In particular, for the indices in I, the two sets of eigenfunctions are related through the orthogonal matrix Q, and using this relation we can re-express B in terms of {φ_i} and {ψ_j}.

The exposition so far in this section is coordinate-free: the entire system is built on invariant objects such as subspaces, lattices, and linear operators. To make the presentation less abstract, we now give an alternative, and somewhat intuitive, construction of the FELM based on orthonormal bases. Let {e_i(X): i ∈ N} and {e_i(Y): i ∈ N} be orthonormal bases of H_X and H_Y, respectively. Then X, Y, and ε can be expanded with respect to these bases.
Let X_•, Y_•, and ε_• represent the corresponding coordinate sequences. These establish one-to-one correspondences between H_X and ℓ², and between H_Y and ℓ², where ℓ² is the collection of all square-summable sequences. In this framework the linear operators Σ_X and Σ_ε correspond to ∞ × ∞ matrices, namely the covariance matrices of X_• and ε_•; let X* and Y* denote the coordinates of X_• and Y_• with respect to the eigenvectors of these covariance matrices. Note that X* and X are different representations of the same random element; the same can be said of Y and Y*. The envelope model can be transparently explained in terms of X* and Y*: it simply assumes that only a finite number of components of X* and Y* participate in the regression between X and Y; that is, there exist finite subsets I and J of N such that {Y*_i: i ∈ I} and {X*_j: j ∈ J} follow a multivariate regression model. Furthermore, the theoretical properties of the envelope structure guarantee that there is no additional regression relation left in the rest of the components. Of course, the regression of Y* on X* needs to be translated back to the original spaces for Y and X through the isomorphisms between H_X and ℓ² and between H_Y and ℓ². The numerical procedures in Sections 5 and 6 are simply the implementation of the above construction at the sample level.

From FELM to MELM
Our basic idea for estimating FELM consists of three steps: first, find its one-to-one relation with MELM, second, use the available methods to estimate the MELM, and third, transform the results back to FELM via the one-to-one relation.In this and the next sections we develop such a one-to-one relation.
Before proceeding further, we need to review the coordinate form of the MELM. Recall from the Introduction that the multivariate linear regression model (1) is said to follow MELM if conditions (a), (b), (c), and (d) hold with S = E(β; Σ) and T = E(β^T; Σ_X). Also recall that Γ is an r × u matrix representing an orthonormal basis matrix of E(β; Σ) and Φ is a p × d matrix representing an orthonormal basis matrix of E(β^T; Σ_X). Similarly, Γ_0 ∈ R^{r×(r−u)} and Φ_0 ∈ R^{p×(p−d)} are matrices representing orthonormal basis matrices of E(β; Σ)^⊥ and E(β^T; Σ_X)^⊥, respectively. The coordinate form of MELM is
β = Γ η Φ^T, Σ = Γ Ω Γ^T + Γ_0 Ω_0 Γ_0^T, Σ_X = Φ Δ Φ^T + Φ_0 Δ_0 Φ_0^T, (14)
where Ω ∈ R^{u×u} and Ω_0 ∈ R^{(r−u)×(r−u)} carry the coordinates of Σ with respect to Γ and Γ_0, and Δ ∈ R^{d×d} and Δ_0 ∈ R^{(p−d)×(p−d)} carry the coordinates of Σ_X with respect to Φ and Φ_0. The matrix η ∈ R^{u×d} is the regression coefficient matrix for the reduced multivariate linear model in which Y is replaced by Γ^T Y and X by Φ^T X.
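As a numerical sanity check on this parameterization, the following sketch (dimensions and values are illustrative, not taken from the paper) builds β = ΓηΦ^T and Σ = ΓΩΓ^T + Γ_0Ω_0Γ_0^T, and verifies that span(Γ) contains ran(β) and reduces Σ:

```python
import numpy as np

# Toy MELM-structured parameterization; dimensions and values are
# illustrative, not taken from the paper.
rng = np.random.default_rng(1)
r, p, u, d = 4, 3, 2, 2
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
Gamma, Gamma0 = Q[:, :u], Q[:, u:]      # envelope basis and its complement
R, _ = np.linalg.qr(rng.standard_normal((p, p)))
Phi, Phi0 = R[:, :d], R[:, d:]
eta = rng.standard_normal((u, d))
Omega, Omega0 = np.diag([2.0, 1.0]), np.diag([0.5, 0.3])

beta = Gamma @ eta @ Phi.T                       # beta = Gamma eta Phi^T
Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T

# ran(beta) lies inside span(Gamma), and span(Gamma) reduces Sigma:
P = Gamma @ Gamma.T
print(np.allclose(P @ beta, beta))                  # True
print(np.allclose(P @ Sigma @ (np.eye(r) - P), 0))  # True
```

The two checks correspond to conditions (a)-(b): the coefficient matrix lives in the envelope, and the envelope and its complement decouple the error covariance.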
Assumption 1. There are integers k > 0 and l > 0, and orthonormal sets {b_1, ..., b_k} ⊆ H_X and {c_1, ..., c_l} ⊆ H_Y, such that span{b_1, ..., b_k} contains E(B*; Σ_X) and span{c_1, ..., c_l} contains E(B; Σ_ε).

By construction, t ≤ k and s ≤ l. In the following, let b and c be the vectors of functions (b_1, ..., b_k)^T and (c_1, ..., c_l)^T. We need to introduce some additional notation. Let Φ ∈ R^{k×t} be the matrix whose columns are the coordinates of ψ(J) with respect to b, and Γ ∈ R^{l×s} the matrix whose columns are the coordinates of φ(I) with respect to c. As implied by the notation, these will be the envelope basis matrices, analogous to those in (14), that arise from mapping FELM to MELM. Because {c_1, ..., c_l} and {b_1, ..., b_k} are orthonormal sets, we have φ(I) = Γ^T c and ψ(J) = Φ^T b. The next theorem describes how one goes from FELM to MELM, which is the theoretical foundation for the two estimation methods to be developed in Sections 5 and 6. Let X̃ = [X]_b and Ỹ = [Y]_c, where b and c are as defined immediately following Assumption 1.
Theorem 4. If (X, Y) follows the FELM in Definition 1 with response envelope E(B; Σ_ε) and predictor envelope E(B*; Σ_X), then (X̃, Ỹ) follows the MELM.

From Theorem 4, we obtain a corollary in which {b_1, ..., b_k} and {c_1, ..., c_l} are eigenfunctions of Σ_X and Σ_Y; that is, b_i = ψ_i and c_j = φ_j for i = 1, ..., k and j = 1, ..., l. We denote (φ_1, ..., φ_l) by φ(1: l) and (ψ_1, ..., ψ_k) by ψ(1: k). The next corollary shows that, if Y and X follow FELM, then Y_• and X_• follow the MELM in Theorem 4 with Γ and Φ taking a special form. This fact is the theoretical foundation for the estimation method developed in Section 6. If l = s and k = t, then Γ = I_s, Φ = I_t, and the MELM in Theorem 4 reduces to a version of the multivariate linear regression model (1). We refer to this as the full model.
An alternative way to write the MELM in Theorem 4 is through the following parameterization. Let Φ_0 ∈ R^{k×(k−t)} be a matrix whose columns form an orthonormal basis of the orthogonal complement of span(Φ), and let Γ_0 ∈ R^{l×(l−s)} be a matrix whose columns form an orthonormal basis of the orthogonal complement of span(Γ). In this parameterization, the covariance matrices of ε̃ and X̃ can be rewritten as
Σ_ε̃ = Γ Ω Γ^T + Γ_0 Ω_0 Γ_0^T, Σ_X̃ = Φ Δ Φ^T + Φ_0 Δ_0 Φ_0^T,
and the MELM in Theorem 4 can be restated as Ỹ | X̃ ∼ N(μ_{Ỹ|X̃}, Σ_{Ỹ|X̃}). The parameterized form also applies to the MELM in Corollary 1 with Γ and Φ taking the special form (17). Theorem 4 tells us that, under the conditions assumed therein, a FELM can always be converted into a MELM. This means we can use available methodologies and software to estimate the converted MELM.

From MELM to FELM
Suppose (X, Y) obeys the FELM in Definition 1. Let (X̃, Ỹ) be as defined in the last section.
Theorem 5. Under the conditions in the last paragraph, if (X̃, Ỹ) follows the MELM in Theorem 4, then (X, Y) follows the FELM in Definition 1.

In the next two sections we will implement the theoretical results of Sections 3 and 4 to develop sample estimates of FELM. In the multivariate setting, since X and Y are random vectors, it is natural to build the envelope model directly on X and Y themselves. In particular, there seems no reason to build the envelope model, say, on the principal components of X and Y. The situation is entirely different in the functional setting. Since we do not have random vectors to begin with, we face two apparent options: 1. building the envelope model directly on the coordinates of X and Y with respect to bases of H_X and H_Y that we happen to choose, which seems to be the most direct generalization of the original idea of the envelope model; 2. building the envelope model on the coefficients of the Karhunen-Loève expansions of X and Y.
We next develop estimation procedures via both routes. In application we do not observe the entire functions Y and X, but only their values at a finite number of points, so we need a method to connect the points. As a pragmatic approach, we assume that H_Y and H_X have finite bases.

Direct Estimation
We focus on the first approach in this section. To construct orthonormal bases for H_X and H_Y, we start with any finite bases b = (b_1, ..., b_k)^T for H_X and c = (c_1, ..., c_l)^T for H_Y.
In practice, k and l are picked according to the shape of the functions. For example, if cubic splines are used, k and l are decided by the number of knots; more knots are needed for a function with more fluctuation. As an illustration, let N = [0, 1], and let {t_0, ..., t_{m_1}} and {s_0, ..., s_{m_2}} be two sets of nodes in [0, 1] for H_X and H_Y, respectively, with t_0 = s_0 = 0 and t_{m_1} = s_{m_2} = 1. Let {b_1, ..., b_k} and {c_1, ..., c_l} be the cubic spline bases determined by the nodes t_0, ..., t_{m_1} and s_0, ..., s_{m_2}, respectively. In this case, k = m_1 + 3 and l = m_2 + 3. The specific forms of these bases can be found, for example, in Wang (2011). We use the L²([0, 1]) inner product for H_X and H_Y; for example, ⟨b_i, b_j⟩_{H_X} = ∫_0^1 b_i(t) b_j(t) dt. For a function h ∈ H_X, we use [h]_b to denote the coordinate of h with respect to the basis b, so that h = [h]_b^T b. The coordinate [X_i]_b is computed from the observed values of X_i by least squares; note that this is exactly the spline estimate of the curve {X_i(t): t ∈ [0, 1]} as expressed in the orthonormal basis. The coordinate of Y_i with respect to c is computed similarly. Note that this process does not assume the observation points to be equally spaced; thus, it is applicable when they are irregularly distributed. When the observed points are sparse, for example, m*_1 < k, we can use regularized regression such as ridge regression or the lasso to obtain [X_i]_b, but further investigation will be needed for its theoretical justification. We can also reduce the number of basis functions, which is itself a kind of regularization; for instance, if spline functions are used as the basis, we can reduce the number of knots. Wang et al. (2016) give a more detailed discussion of the various situations, sparse or dense, in which functional data are observed in practice.
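The least squares computation of [X_i]_b can be sketched as follows; for brevity we use a small monomial basis in place of the cubic B-splines, and the ridge option mirrors the regularization suggested above for sparsely observed curves (the function and argument names are ours):

```python
import numpy as np

def basis_coordinates(t_obs, x_obs, basis_funs, ridge=0.0):
    """Least squares coordinates [x]_b of a curve observed at t_1..t_m.

    Solves min_c sum_m (x(t_m) - sum_j c_j b_j(t_m))^2; ridge > 0 gives
    the regularized fit suggested for sparsely observed curves (m* < k).
    """
    Phi = np.column_stack([b(t_obs) for b in basis_funs])   # m x k design
    k = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(k), Phi.T @ x_obs)

# A curve lying exactly in the span of a small monomial basis.
basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]
t = np.linspace(0.0, 1.0, 10)
x = 1.0 + 2.0 * t
c = basis_coordinates(t, x, basis)  # approximately [1, 2, 0]
```

Only the design matrix changes if spline basis evaluations are substituted for the monomials, and the observation points need not be equally spaced.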
Based on Theorem 4, we fit a MELM with X̃ as the predictor and Ỹ as the response. Let β = Γ η Φ^T denote the regression coefficient matrix. Estimation of the parameters in (19) is performed by maximizing the log-likelihood function for (X̃, Ỹ). We first estimate the envelopes E(β; Σ_ε̃) and E(β^T; Σ_X̃). Since both are subspaces, the estimation involves optimization over Grassmann manifolds. Let a and b be two positive integers with a > b.
An a × b Grassmann manifold is the set of all b-dimensional subspaces of an a-dimensional space. The R package Renvlp for fitting envelope models can be found on CRAN; see Lee and Su (2022). Details can be found in Section 6 of the supplementary materials.
The regression coefficient is then estimated by β̂ = Γ̂ η̂ Φ̂^T; given the estimated envelopes, the MLEs of the rest of the parameters have closed forms. In Section 8, we will develop the confidence and prediction bands for the FELM regression estimate, which require the asymptotic distribution of the above estimator. The asymptotic distribution and efficiency gain of the MELM estimator are known (Cook and Zhang 2015b), which we now outline for later use. Let vec(·) denote the vector operator that stacks the columns of a matrix into a vector, and vech(·) the vector-half operator that stacks the lower half (including the diagonal) of a symmetric matrix into a vector. Let v_1 denote the vector of all parameters of interest under model (19), and let J denote the Fisher information for v_1 under the full model; the explicit forms of the gradient matrix G and of J are in Section 7 of the supplementary materials. Let v̂_{1,full} be the maximum likelihood estimator of v_1 under the full model; then √n (v̂_{1,full} − v_1) →d N(0, J^{−1}), where →d denotes convergence in distribution. The next theorem is due to Cook and Zhang (2015b).

Karhunen-Loève Expansion Based Estimation
We now turn to the second approach outlined at the end of Section 4. Let b and c be any finite bases and G_b and G_c their Gram matrices. Let [X_i]_b and [Y_i]_c be the least squares approximations of the coordinates of X_i and Y_i, as explained in Section 5. Let (τ̂_r, ψ̂_r) be the rth eigenvalue-eigenfunction pair of Σ̂_X. As shown in Solea and Li (2020), the coordinate [ψ̂_r]_b can be computed from an eigen-decomposition of a k × k matrix constructed from the coordinates [X_i]_b and the Gram matrix G_b, where v_r is the rth eigenvector of that matrix and τ̂_r is the rth eigenvalue. The rth eigenfunction of Σ̂_X is then ψ̂_r = [ψ̂_r]_b^T b, and the empirical Karhunen-Loève expansion of X_i follows. The expansion of Y_i is obtained similarly from (λ̂_r, ŵ_r), the rth eigenvalue-eigenvector pair of the corresponding matrix constructed from G_c. Note that X̃ and Ỹ follow the MELM (19) by Corollary 1. Estimation of the parameters in the MELM is the same as discussed in Section 5.
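A sketch of the Gram-matrix route to the empirical eigenpairs; the symmetrized k × k eigenproblem below is one standard way to carry this out (we do not reproduce the exact form used in Solea and Li 2020), and it reduces to an ordinary eigen-decomposition when the basis is orthonormal:

```python
import numpy as np

def kl_eigenpairs(C, Gb):
    """Eigenpairs of the empirical covariance operator from coordinates.

    C : n x k matrix of centered basis coordinates [X_i]_b.
    Gb: k x k Gram matrix of the basis (positive definite).
    Works through the symmetrized eigenproblem for Gb^{1/2} S Gb^{1/2};
    returns eigenvalues (descending) and the coordinate vectors of the
    eigenfunctions with respect to the basis b.
    """
    n = C.shape[0]
    S = C.T @ C / n                          # sample covariance of coordinates
    w, V = np.linalg.eigh(Gb)
    Gh = V @ np.diag(np.sqrt(w)) @ V.T       # Gb^{1/2}
    Ghi = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    tau, U = np.linalg.eigh(Gh @ S @ Gh)
    order = np.argsort(tau)[::-1]
    return tau[order], Ghi @ U[:, order]

# With an orthonormal basis (Gb = I) this is an ordinary eigen-decomposition.
rng = np.random.default_rng(0)
C = rng.standard_normal((50, 3))
C = C - C.mean(axis=0)
tau, Psi = kl_eigenpairs(C, np.eye(3))
```

The substitution a = Gb^{-1/2} u turns the generalized eigenproblem S Gb a = τ a into the symmetric one solved above, which is numerically more stable than working with the non-symmetric product directly.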

Order Determination
In both the direct estimation in Section 5 and the Karhunen-Loève expansion based estimation in Section 6, the dimension s of the response envelope and the dimension t of the predictor envelope need to be selected. Let (s_0, t_0) denote the true dimensions of the response and predictor envelopes, and let (s, t) denote a generic pair of dimensions that is varied in the set for optimization. To estimate (s_0, t_0), we compute the BIC value over a grid of (s, t) from (1, 1) to (l, k), and find the pair that minimizes BIC. Specifically, for the dimension pair (s, t),
BIC(s, t) = −2 ℓ̂(s, t) + log(n) K(s, t), (20)
where ℓ̂(s, t) is the maximized log-likelihood for a fixed (s, t), and K(s, t) = l(l + 1)/2 + k(k + 1)/2 + st + l is the total number of model parameters when the dimensions of the response and predictor envelopes are s and t. Note that all the hatted parameters in (20) depend on (s, t).
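The BIC grid search can be sketched as follows, with the MELM fit abstracted into a user-supplied log-likelihood function (names are illustrative):

```python
import numpy as np

def select_dims(loglik, n, l, k):
    """BIC grid search over envelope dimensions (s, t).

    loglik(s, t) returns the maximized log-likelihood for that pair (a
    stand-in for the MELM fit); K(s, t) is the parameter count from the
    text. Returns the (s, t) minimizing BIC over the full grid.
    """
    best, best_bic = None, np.inf
    for s in range(1, l + 1):
        for t in range(1, k + 1):
            K = 0.5 * l * (l + 1) + 0.5 * k * (k + 1) + s * t + l
            bic = -2.0 * loglik(s, t) + np.log(n) * K
            if bic < best_bic:
                best, best_bic = (s, t), bic
    return best

# A synthetic likelihood surface peaking at (2, 3):
print(select_dims(lambda s, t: -100.0 * ((s - 2)**2 + (t - 3)**2),
                  n=100, l=4, k=5))  # (2, 3)
```

Since only the st term of K(s, t) varies over the grid, the penalty effectively charges log(n) per entry of the reduced coefficient matrix η.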
We estimate (s_0, t_0) by minimizing the above BIC-type criterion over the grid {1, ..., l} × {1, ..., k}, and denote the estimate by (ŝ, t̂). Note that the consistency of order determination has not yet been proved for the envelope model even in the multivariate setting, so the next result is novel to the envelope model in general.
When implementing the dimension selection procedure, we make the following adjustment to speed up the computation. Let r denote the rank of β. Since r is no greater than min(s, t), if r were known, we would only need to compute the BIC value over a grid of (s, t) from (r, r) to (l, k). Following Cook and Zhang (2015b), r can be estimated from a chi-squared test developed in Bura and Cook (2003). The test statistic is constructed from the singular values of a standardized version of β̂_OLS = Σ̂_{Ỹ X̃} Σ̂_{X̃}^{−1}; according to Bura and Cook (2003), it is asymptotically chi-squared under the null hypothesis that r = d. Let r̂ be the smallest d for which this null hypothesis is not rejected. Instead of minimizing BIC(s, t) over the grid {1, ..., l} × {1, ..., k}, we minimize it over {r̂, ..., l} × {r̂, ..., k}, which works well in our examples.
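The sequential rank-selection rule can be sketched generically, with the Bura-Cook statistic and its degrees of freedom abstracted as inputs (we do not reproduce their exact formulas here):

```python
import numpy as np
from scipy.stats import chi2

def estimate_rank(stat, dof, d_max, alpha=0.05):
    """Sequential rank selection for the coefficient matrix.

    stat(d) and dof(d) return the test statistic and its degrees of
    freedom under the null hypothesis rank = d; any asymptotically
    chi-squared statistic (such as Bura and Cook's) can be plugged in.
    The estimate is the smallest d whose null is not rejected.
    """
    for d in range(d_max + 1):
        if stat(d) <= chi2.ppf(1 - alpha, dof(d)):
            return d
    return d_max

# Synthetic example: the statistic drops once d reaches the true rank 2.
print(estimate_rank(lambda d: 1000.0 if d < 2 else 0.5, lambda d: 4, 3))  # 2
```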

Confidence Band and Prediction Band
In this section, we construct the confidence band for the mean of a new response and the prediction band for the new response itself. Specifically, let X_new be a new observation of X. We are interested in constructing the confidence interval for E[Y_new(t)] and the prediction interval for Y_new(t), for each t ∈ N.
Let Ŷ_new denote the estimate of E(Y_new), which is also the prediction of Y_new. We estimate E(Y_new) as follows. First, we compute the coordinates [X_new]_b of X_new relative to the basis b. Then, the coordinates of Ŷ_new with respect to the basis c are [Ŷ_new]_c = μ̂ + β̂ [X_new]_b, so that Ŷ_new = [Ŷ_new]_c^T c. We next approximate the variance of Ŷ_new(t_0) at some t_0 ∈ N.
We approximate var[vec(β̂)] by n^{−1} times the asymptotic variance of √n [vec(β̂) − vec(β)], which is the upper left kl × kl block of V_melm in Theorem 6. We approximate var(μ̂) by n^{−1} times the asymptotic variance of √n (μ̂ − μ), which is the lower right l × l block of V_melm. The resulting confidence interval for E[Y_new(t_0)] is Ŷ_new(t_0) ± z_{1−α/2} √(var̂[Ŷ_new(t_0)]), where z_{1−α/2} is the 1 − α/2 quantile of the standard normal distribution.
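A sketch of the resulting pointwise interval, with the asymptotic-variance blocks of V_melm passed in as inputs (argument names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def pointwise_ci(c0, mu_hat, beta_hat, x_new, V_beta, V_mu, n, alpha=0.05):
    """Pointwise confidence interval for E[Y_new(t0)].

    c0 holds the l basis evaluations c(t0); V_beta (kl x kl) and V_mu
    (l x l) stand in for the asymptotic-variance blocks of V_melm for
    sqrt(n)(vec(beta_hat) - vec(beta)) and sqrt(n)(mu_hat - mu).
    """
    l = len(mu_hat)
    y0 = c0 @ (mu_hat + beta_hat @ x_new)     # point estimate at t0
    A = np.kron(x_new, np.eye(l))             # A @ vec(beta) = beta @ x_new
    var = c0 @ (A @ V_beta @ A.T + V_mu) @ c0 / n
    half = norm.ppf(1 - alpha / 2) * np.sqrt(var)
    return y0 - half, y0 + half

# Tiny example with identity variance blocks (purely illustrative).
lo, hi = pointwise_ci(np.array([1.0, 0.0]), np.zeros(2), np.eye(2),
                      np.array([1.0, 0.0]), np.eye(4), np.eye(2), n=100)
```

The Kronecker factor converts the variance of vec(β̂) into the variance of β̂[X_new]_b, after which the basis evaluations c(t_0) map the coordinate-level variance to the function value at t_0.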
The prediction interval for Y_new(t_0) is computed similarly, with the variance of the new error at t_0 added to the approximated variance of Ŷ_new(t_0).

Simulations
We generate n independent Gaussian random functions using a Fourier basis, where both the χ_i's and the ψ_i's are basis functions on [0, 1] such as sin(4πt), and the dimensions of the envelopes are s = 2 and t = 2. The observed t are 10 evenly spaced points in N = [0, 1], that is, {t_1, ..., t_10} = {0.1, ..., 1}, and they are the same for all the observations. The sample size varies from 50 to 400, and 100 repetitions were generated for each sample size.
In the implementation of FELM, we use the cubic spline basis with five knots 0, 0.25, 0.5, 0.75, and 1 for both H_X and H_Y. We compare FELM with three existing methods: the full function-on-function regression model (FFFR), principal component regression (PCR), and partial least squares (PLS). PCR performs a standard multivariate regression of [Y_i]_c on the first few principal components of [X_i]_b, and then post-multiplies the coefficients by the transpose of the loading matrix of the principal components. PLS is implemented with the R function plsr, applied to the same coordinates.

In the simulation we used two sets of tuning parameters: (i) we took s and t to be the true envelope dimension 2, and chose the numbers of components of PCR and PLS also as 2; (ii) we used the BIC-type criterion developed in Section 7 to estimate s and t, and used 5-fold cross-validation to select the numbers of components in PCR and PLS. We compare the four methods using prediction errors, computed by performing 5-fold cross-validation on each of the 100 simulated samples. Table 1 shows the mean squared error averaged over the 100 simulated samples.

Comparing the four methods, we see that FELM significantly outperforms the other three. The second best performer is FFFR, while PCR and PLS trail behind. Note that FFFR contains the true envelope model as a submodel and is therefore asymptotically unbiased; its difference from FELM is caused entirely by the smaller asymptotic variance of FELM, which explains why the difference decreases with the sample size. In comparison, since PCR and PLS each take only two leading components, they do not contain the true envelope model as a submodel and are therefore asymptotically biased. This explains their large prediction errors, and the fact that these errors do not significantly decrease with the sample size. We also note that the results obtained by the direct methods do not significantly differ from those obtained by the Karhunen-Loève expansion methods.
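The PCR step described above — regressing the response coordinates on the leading principal components of the predictor coordinates and mapping the coefficients back through the loadings — can be sketched as follows. This is a minimal Python illustration rather than the paper's R implementation; the function name, centering convention, and test matrices are our own.

```python
import numpy as np

def pcr_fit(Xb, Yc, ncomp):
    """Principal component regression of the coordinate matrix Yc (n x l)
    on the first ncomp principal components of the coordinate matrix Xb (n x k).
    Returns a k x l coefficient matrix mapping centered [X]_b to [Y]_c."""
    Xc = Xb - Xb.mean(axis=0)
    Ycen = Yc - Yc.mean(axis=0)
    # Columns of V are the leading eigenvectors (loadings) of the sample covariance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:ncomp].T                      # k x ncomp loading matrix
    scores = Xc @ V                       # n x ncomp component scores
    gamma, *_ = np.linalg.lstsq(scores, Ycen, rcond=None)   # ncomp x l
    return V @ gamma                      # map coefficients back through loadings

# Sanity check on synthetic coordinates: with all components, PCR reduces to OLS
rng = np.random.default_rng(1)
Xb = rng.standard_normal((50, 6))
Btrue = rng.standard_normal((6, 4))
Yc = Xb @ Btrue + 0.01 * rng.standard_normal((50, 4))
Bhat = pcr_fit(Xb, Yc, ncomp=6)
```

Taking `ncomp` smaller than the rank (two, in the simulation) restricts the regression to the leading principal components, which is the source of the asymptotic bias noted above.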
Due to space limitations, additional results on larger envelope dimensions and a summary of computation time are included in Sections 9.1 and 9.2 of the supplementary materials, respectively. FELM also performs well with irregular grids: an additional simulation with each sample (X_i, Y_i) observed at different random points is included in Section 9.3 of the supplementary materials. During implementation, representation error may occur due to violation of Assumption 1, so extra variation is involved in the estimation of X̃ and Ỹ; to investigate its effect on FELM, we include a simulation in Section 9.4 of the supplementary materials.
We now turn to confidence and prediction bands, using n = 100 as an illustration. The confidence and prediction intervals were constructed at t = 0.05, 0.15, 0.25, . . ., 0.95, which are different from the points at which the functional data are observed. Because Table 1 shows no significant difference between the direct and the Karhunen-Loève expansion based methods, here we only compare the intervals based on the direct method. Figure 1 shows the confidence and prediction bands computed by FELM and FFFR. Comparing the confidence intervals, we see that FELM's intervals are much narrower than FFFR's and at the same time have better coverage (note that the blue band does not cover the black curve in the right portion of the plot, whereas the red band does). The prediction intervals of FELM and FFFR have about the same length, because the efficiency gain is eclipsed by the noise level. The envelope dimensions in FELM were set at the true values; when they were instead estimated by BIC, the confidence and prediction bands were almost identical to those in Figure 1.

Economic Data
The World Bank website (https://data.worldbank.org/) contains complete data on trade and gross domestic product (GDP) growth for 141 countries from 2001 to 2018. The Trade variable of a country is defined as the sum of exports and imports of goods and services measured as a percentage of GDP of that country. GDP growth is the annual percentage growth rate of GDP at market prices based on constant local currency. Both Trade and GDP growth are functional data, observed over the 18 years. We treat Trade as the predictor and GDP growth as the response in a function-on-function regression problem, and apply FELM to the data for each continent separately.
Since South America has data for only nine countries, we combined it with North America. Because the continent of Australia has measurements for only seven countries, we omitted it from our analysis. The sample sizes for Africa, America, Asia, and Europe are 32, 30, 32, and 40, respectively. Spaghetti plots of Trade and GDP growth for eight countries (two for each region) are presented in Figure 2. The dimensions of the envelopes are selected by BIC. The prediction errors are measured by 5-fold cross-validation with 100 random splits. The percentages of the reduction of prediction error by FELM as compared with FFFR for the four regions are shown in Table 2. Overall, FELM achieves very significant improvement over FFFR. Moreover, it is interesting to note that the improvements are especially strong for America.
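The error comparison used throughout this section — 5-fold cross-validation with 100 random splits, summarized as a percentage reduction relative to FFFR — can be sketched generically. Here `fit` is a placeholder for any estimator (FELM, FFFR, etc.); it is not the paper's actual fitting code.

```python
import numpy as np

def cv_mse(X, Y, fit, n_folds=5, n_splits=100, seed=0):
    """Mean squared prediction error under repeated k-fold cross-validation.
    `fit(Xtr, Ytr)` must return a callable `predict(Xte) -> Yhat`."""
    rng = np.random.default_rng(seed)
    n, errs = len(X), []
    for _ in range(n_splits):                      # 100 random splits
        idx = rng.permutation(n)
        for fold in np.array_split(idx, n_folds):  # 5 folds per split
            mask = np.ones(n, dtype=bool)
            mask[fold] = False
            predict = fit(X[mask], Y[mask])
            errs.append(np.mean((predict(X[fold]) - Y[fold]) ** 2))
    return np.mean(errs)

def percent_reduction(mse_felm, mse_fffr):
    """Reduction of FELM's error expressed as a percentage of FFFR's error."""
    return 100 * (mse_fffr - mse_felm) / mse_fffr
```

For instance, `percent_reduction(0.4, 1.0)` reports a 60% reduction; the entries of Table 2 are of this form, with the two arguments replaced by the cross-validated errors of FELM and FFFR.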
A possible explanation for this discrepancy in the levels of improvement is that the GDP growths for countries in Africa are more associated with mainstream products such as minerals, crude oil, and agricultural products (coffee, cocoa, cotton, and so on), which are related to the leading eigenvectors of the responses. In this situation, the envelope components and the principal components are somewhat aligned, leading to milder (nonetheless significant) efficiency gains. In comparison, the GDP growths for countries in America are more associated with less mainstream products such as microchips, medicines, vaccines, or airplanes, which are more related to the non-leading eigenvectors of the responses. In this situation, the envelope components are largely orthogonal to the leading principal components, yielding substantial gains in efficiency and prediction. For more discussion on when an envelope achieves most of its gains, see Cook (2018).

Covid-19 Data
Covid-19 is a global pandemic of the coronavirus disease, which to date has 212 million reported cases in 220 countries and territories. The Open collects daily data for more than 50 countries around the world. We focus on the daily new confirmed cases and mobility data in June 2020 in the 21 counties of the state of New Jersey. We take the new confirmed cases as the predictor and mobility of retail and recreation as the response. The data for mobility of retail and recreation record the percentage change in visits to places like restaurants, shopping centers, museums, and movie theaters compared to a baseline value. The baseline is the median value for the corresponding day of the week during the five-week period from January 3 to February 6 in 2020. Figure 3 shows the spaghetti plots for the predictor (upper panel) and response (lower panel) in six counties in New Jersey. We fit FELM to the data. The dimensions of the response envelope and predictor envelope were estimated as 3 and 2 by BIC, respectively. We computed the mean squared prediction errors for FFFR and FELM with 5-fold cross-validation with 100 random splits. The predicted results for Camden County and Atlantic County are in Figures 2 and 3 of the supplementary materials. Compared with FFFR, FELM reduces the prediction error by 62.3%. The 95% confidence and prediction bands for the daily mobility in Hudson County are displayed in Figure 4. Notice that the confidence bands of both FELM and FFFR have good coverage, and FELM has significantly narrower bands than FFFR. We also performed an analysis taking the daily new confirmed cases in May as the predictor and the daily new confirmed cases in June as the response. The dimensions for the response and predictor envelopes are inferred to be 5 and 4 by BIC. FELM reduces the prediction error by 11.66% compared to FFFR.
participate in their estimation, and are thus indirectly involved in the estimation of B.

The Gram matrices have (i, j)th entries of the form ∫ b_i(t)b_j(t) dt, which can be written down explicitly for cubic splines. To turn b = (b_1, . . ., b_k)^T and c = (c_1, . . ., c_l)^T into orthonormal bases, let G_b and G_c denote the Gram matrices of b and c. The orthonormal bases b* and c* are then calculated by b* = G_b^{-1/2} b and c* = G_c^{-1/2} c. For notational simplicity, we reset (b*, c*) to (b, c).
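A minimal numerical sketch of this orthonormalization follows, assuming a uniform grid and a Riemann-sum approximation of the Gram matrix in place of the closed-form cubic-spline integrals; the monomial basis below is only illustrative.

```python
import numpy as np

def gram(basis_vals, grid):
    """Gram matrix G_ij ≈ ∫ b_i(t) b_j(t) dt via a Riemann sum on a uniform grid.
    basis_vals is k x m: row i holds b_i evaluated at the m grid points."""
    dt = grid[1] - grid[0]
    return (basis_vals * dt) @ basis_vals.T

def orthonormalize(basis_vals, grid):
    """Return b* = G^{-1/2} b, whose Gram matrix is the identity."""
    G = gram(basis_vals, grid)
    w, V = np.linalg.eigh(G)                   # G is symmetric positive definite
    G_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T  # matrix inverse square root
    return G_inv_sqrt @ basis_vals

grid = np.linspace(0.0, 1.0, 2001)
b = np.vstack([np.ones_like(grid), grid, grid ** 2])  # illustrative, not cubic splines
b_star = orthonormalize(b, grid)
```

After the transformation, `gram(b_star, grid)` equals the identity matrix up to floating-point error, so coordinates with respect to `b_star` behave like Euclidean coordinates.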

Figure 1. Confidence and prediction bands: solid lines indicate confidence bands of FELM; dashed lines indicate confidence bands of FFFR; the dash-dotted lines represent prediction bands of FELM; the dotted lines represent prediction bands of FFFR. The thick long dashed line in the middle represents the true mean response E[Y(t)]. The black dots represent the observed response at 0.05, . . ., 0.95. For better visibility, the confidence band by FELM is shaded gray.

Figure 2. Spaghetti plots of Trade and GDP growth for eight countries.

Figure 3. Plots of new confirmed cases and mobility to retail and recreation in June.

Figure 4. Confidence and prediction bands: solid lines indicate confidence bands of FELM; dashed lines indicate confidence bands of FFFR; the dash-dotted lines represent prediction bands of FELM; the dotted lines represent prediction bands of FFFR. The dots represent the observed response on each day of June. For better visibility, the confidence band by FELM is shaded gray.

Table 1. Comparison of mean squared prediction errors.
's and ν_i's are independent standard normal random variables, and the τ_i's and ρ_i's are constants. In general X and Y need not be in the same Hilbert space; the above construction is for simplicity. The b_ij's in the linear operator B are all zero except the entries b_22 = −1.25, b_24 = −1, and b_42 = b_44 = 0.4. Under this setting, H_X and H_Y are finite dimensional.

Table 2. Mean squared prediction errors reduced by FELM as percentages of those of FFFR.

Similar to FELM, the methods FFFR, PCR, and PLS can be performed either directly on the coordinates of X and Y as in Section 5, or on the empirical Karhunen-Loève expansions of X and Y as in Section 6. Taking the direct estimation as an illustration, we first compute the coordinates [X_i]_b and [Y_i]_c for the sample (X_1, Y_1), . . ., (X_n, Y_n). FFFR estimates the coefficients β by performing a standard multivariate regression on