Expansion and estimation of Lévy process functionals in nonlinear and nonstationary time series regression

ABSTRACT In this article, we develop a series estimation method for unknown time-inhomogeneous functionals of Lévy processes involved in econometric time series models. To obtain an asymptotic distribution for the proposed estimators, we establish a general asymptotic theory for partial sums of bivariate functionals of time and nonstationary variables. These results show that the proposed estimators converge to quite different random variables in different situations. In addition, the rates of convergence depend on various factors rather than just the sample size. Monte Carlo simulations are provided to evaluate the finite sample performance of the proposed model and estimation method.


Introduction
A Lévy process (Z(t), t ≥ 0) is a càdlàg stochastic process with independent and stationary increments and Z(0) = 0 almost surely (a.s.). Lévy processes constitute a very rich and attractive class of stochastic processes. For instance, two of the best known processes, the Poisson process and the Wiener process (Brownian motion), are fundamental Lévy processes. This area has attracted considerable attention due to its flexibility for a wide variety of modeling issues in finance, risk, and engineering, to name just a few. Although the celebrated Black-Scholes option pricing formula describes the price of an option as a functional of Brownian motion, the literature has pointed out that there are some significant theoretical drawbacks in modeling with Brownian motion in some circumstances. For example, empirical evidence suggests that log returns do not behave according to a normal distribution (see Schoutens, 2003). Hence, researchers have realized that more flexible stochastic processes should be utilized to relax the restriction of Brownian motion, mainly its normality, when one needs to formulate a more suitable stochastic regression model to depict some stochastic phenomenon. To this end, Lévy processes have much potential to play a role where Brownian motion is inadequate, since Lévy processes do not specify their distributions by definition.
Meanwhile, some empirical studies show that many economic and financial data sets admit nonlinearity and nonstationarity. Consequently, a number of nonparametric and semiparametric models and kernel-based methods have been proposed to deal with both nonlinearity and nonstationarity simultaneously. Among such studies, Tjøstheim (1994) gives an excellent survey of nonlinear time series models; Fan and Yao (2003) provide a comprehensive study of parametric and nonparametric nonlinear time series; and Gao (2007) further updates the developments on nonparametric and semiparametric time series models. In the nonstationary time series case, Phillips and Park (1998), Karlsen and Tjøstheim (2001), Karlsen et al. (2007), Phillips (2009), and Wang and Phillips (2009a,b) deal with nonparametric kernel estimation in co-integrating regression models; Cai et al. (2009) and Xiao (2009) study nonparametric kernel estimation in varying-coefficient regression models; Gao and Phillips (2013) and Chen et al. (2012) dwell on nonparametric kernel estimation in semiparametric regression models; Teräsvirta et al. (2010) give some detailed discussions of existing developments on nonlinear economic time series models; and Myklebust et al. (2012) propose to use null recurrent Markov chains in regression models where an integrated process is typically engaged.
Therefore, we are motivated to study the estimation of the following model based on discrete observations: Y(t) = m(t, Z(t)) + ε(t), (1.1) where Z(t) is a Lévy process, ε(t) is an error process with zero mean and finite variance, and m(t, z) is an unknown function on R_+ × R. Throughout, suppose that there exists a σ-field sequence F_t such that, for s < t, almost surely we have E(ε(t)|F_s) = 0, E(ε(t)²|F_s) = σ²_e, and sup_{t>0} E(|ε(t)|^p |F_s) < ∞ for some p ≥ 4.
Model (1.1) incorporates the time variable t in the regression function so that the model has the flexibility to be time-homogeneous or time-inhomogeneous depending on the particular form of the function m. Indeed, from a practical point of view, there is a need to consider the time-inhomogeneous case where the time variable t is involved explicitly. In recent years, the literature has naturally evolved towards the inclusion of the time variable t in econometric models and their applications. See, for example, Härdle et al. (2003) and Choi (2013).
Nonparametric estimation of m(t, Z_t) for the case where Z_t is nonstationary is a very difficult issue. A closely related work is a recent article by Phillips et al. (2013), who consider nonparametric kernel estimation for the case where m(t, Z_t) = α(t/T)^τ Z_t, in which α(·) is a vector of unknown functions and Z_t is a vector of integrated time series. However, as far as we know, there has not been any discussion of how to estimate m(t, z) by a nonparametric method. As pointed out by Phillips et al. (2013), there are certain difficult issues associated with nonparametric kernel degeneracy when using a nonparametric kernel method for this type of model. In this article, we therefore consider using a nonparametric series method (see, for example, Ai and Chen, 2003; Chapter 2 of Gao, 2007; and Li and Racine, 2007) and then develop an orthogonal series expansion method before we tackle the estimation issue. Meanwhile, we establish some general asymptotic theory for weighted sums of nonstationary random variables in Appendix B below as the first step towards establishing two limiting distributions for a consistent estimator of the unknown function m(t, z) involved in model (1.1).
Nonetheless, since the distribution of a Lévy process Z(t) varies with time t, the difficulty in doing so is to obtain an orthogonal polynomial sequence which is orthogonal with respect to the density or probability distribution of Z(t), ρ(t, x), say. In the supplementary material, we discuss differential and difference equations of hypergeometric type involving a time variable and obtain an orthogonal sequence given some conditions on the density or probability ρ(t, x). Such orthogonal sequences enable us to expand Lévy process functionals into orthogonal series in the random variable space L²(Ω) consisting of all variables with finite second moments. This is quite different from the usual expansion of stochastic process functionals, where researchers expand the functionals on the real line and then plug the process into the expansion. The advantage of the proposed expansion, compared with the conventional sieve method in the literature, is discussed in Appendix D of the supplementary file. It is not surprising that there exist orthogonal sequences whose weights are densities or probabilities of some Lévy processes. In effect, there is a long history of close connections between stochastic processes and orthogonal polynomials. For example, the so-called Karlin-McGregor representation expresses the transition probability of birth and death processes by means of a spectral representation in terms of orthogonal polynomials. Schoutens (2000) gives an extensive discussion of relations between stochastic processes and orthogonal polynomials.
This article aims to establish an asymptotically consistent estimator for m(t, z) in model (1.1) and the resulting asymptotic theory in the following two sampling situations: a) observing (Y_{t,n}, Z_{t,n}), where Y_{t,n} = Y(tT/n) and Z_{t,n} = Z(tT/n) with T > 0 fixed, at t = 1, 2, . . . , n; and b) observing (Y_{t,n}, Z_{t,n}), where Y_{t,n} = Y(tT_n/n) and Z_{t,n} = Z(tT_n/n), at t = 1, 2, . . . , n, with T_n → ∞ as the sample size n increases. To establish an asymptotic theory for the estimator, m̂, of the functional m, we develop a general asymptotic theory to deal with the sample mean and sample covariance of two classes of functionals of the observed data. It is noteworthy that the established asymptotic theory considerably extends existing results, such as Park and Phillips (1999, 2001) and Wang and Phillips (2009a). In addition, Monte Carlo simulations below show that the approximation by the truncated orthogonal series expansion is accurate both in the case where the time interval is fixed and in the case where the time interval varies with the sample size.
With the advantage of expanding an unknown functional into an orthogonal series, the proposed method is applicable to some estimation problems in economics and finance. For example, there are a number of studies dealing with conditional moment models involving unknown functionals, such as Chen (2003, 2007) and Chen and Ludvigson (2009). Since existing theory for expansions of functionals of stationary processes is not directly applicable, the expansion and estimation method proposed in this article is useful and significant in dealing with conditional moment models with nonstationarity. Additionally, Dong and Gao (2013) provide a direct application of the expansion for an unknown function involved in a continuous-time financial model. Further discussion in this direction will be given in future research.
The organization of this article is as follows. Section 2 introduces the estimation methodology. Section 3 develops a general asymptotic theory. Section 4 establishes an asymptotic theory for the estimator proposed in Section 2. Several Monte Carlo simulations are given in Section 5. Appendix A includes all assumptions, as well as remarks and justifications for some definitions. Appendix B presents two crucial lemmas on which the asymptotic theorems in Section 3 are based and which, more importantly, are of interest in their own right. Appendix C contains the proofs of the main theorems. The supplemental material includes Appendices D-F, where Appendix D gives some basic lemmas in preparation for the orthogonal series expansion and the establishment of the asymptotic theory; the existence and the explicit expression of an orthogonal polynomial system associated with an underlying Lévy process are studied in Appendix E; and Appendix F contains the limit distributions of the proposed estimators for the unknown functionals in the case where the time horizon is finite.

Functional orthogonal expansion and estimation
In this section, we state the estimation methodology for the unknown function m(·, ·) in model (1.1), given discrete observations of Y(t) and Z(t). The idea is based on the orthogonal series expansion of m(t, Z(t)); we then estimate the coefficients in a truncated orthogonal series. The orthogonal series expansion, however, relies on the existence of an orthonormal basis in a suitable function space to which the function m(·, ·) belongs. Such a function space is a closed subspace of L²(Ω), constructed in Appendix D of the supplement. The construction of the orthonormal basis in this subspace is investigated in Appendix E, while the significance of the constructed basis for the orthogonal series expansion is emphasized in Appendix D.
The orthonormal basis engaged in the sequel has a close relationship with the density or probability of the Lévy process Z(t). Because of the involvement of time t, we actually use a tensor product basis, denoted by {Q_i(t, Z(t)) φ_{jT}(t)} (i, j ≥ 0), t ∈ [0, T], to expand the m function, where {Q_i(t, Z(t))} (i ≥ 0) is the basis in the function space and {φ_{jT}(t)} (j ≥ 0) is the basis for L²[0, T] given in this article by φ_{0T}(t) = 1/√T and, for j ≥ 1, φ_{jT}(t) = √(2/T) cos(jπt/T) (which can be replaced by any other basis in L²[0, T] without any impact on the results).
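As a quick sanity check of this time-direction basis, the following sketch (our own illustration, not part of the paper's procedure; the grid size and the horizon T = 2 are arbitrary choices) verifies the orthonormality of φ_{jT} on [0, T] numerically:

```python
import numpy as np

def phi(j, t, T):
    """Cosine basis of L^2[0, T]: phi_0 = 1/sqrt(T); phi_j = sqrt(2/T) cos(j*pi*t/T)."""
    t = np.asarray(t, dtype=float)
    if j == 0:
        return np.ones_like(t) / np.sqrt(T)
    return np.sqrt(2.0 / T) * np.cos(j * np.pi * t / T)

# Numerical check of orthonormality on [0, T] by a midpoint Riemann sum.
T = 2.0
mid = (np.arange(20000) + 0.5) * (T / 20000)    # midpoints of a fine partition
dt = T / 20000
gram = np.array([[np.sum(phi(j, mid, T) * phi(k, mid, T)) * dt
                  for k in range(5)] for j in range(5)])
print(np.round(gram, 6))                        # approximately the 5 x 5 identity
```

The Gram matrix of inner products is numerically the identity, confirming that {φ_{jT}} is orthonormal in L²[0, T].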

Given that ∫₀^T E[m²(t, Z(t))] dt < ∞ and certain conditions on Z(t), we have the expansion m(t, Z(t)) = Σ_{i=0}^∞ Σ_{j=0}^∞ c_{ij} φ_{jT}(t) Q_i(t, Z(t)), (2.1) where c_{ij} = ∫₀^T φ_{jT}(t) E[m(t, Z(t)) Q_i(t, Z(t))] dt. The convergence of (2.1) is in the sense of the norm of the underlying Hilbert space. The following are two particular examples of such expansions.
Example 2.1. We consider the orthogonal expansion of a Brownian motion functional, f(t, B(t)). Suppose that {H_i} is the Hermite polynomial sequence orthogonal with respect to exp(−x²/2); the resulting expansion of f(t, B(t)) is given in (2.2). Example 2.2. Let N(t) be a Poisson process with intensity parameter λ. We consider the orthogonal expansion of a Poisson process functional, f(t, N(t)). With the orthogonal polynomial sequence associated with the Poisson distribution, the expansion of f(t, N(t)) is given in (2.4). We shall come back to the expansions (2.2) and (2.4) in the simulation section.
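For concreteness, both orthogonal systems in these examples can be generated by their classical three-term recurrences. The sketch below is our own illustration of that fact: probabilists' Hermite polynomials He_i for the Gaussian weight, and Charlier polynomials (the classical orthogonal family for the Poisson distribution) for the Poisson weight; the truncation of the Poisson sum at x = 60 is an arbitrary numerical choice.

```python
import math
import numpy as np

def hermite_e(n, x):
    """Probabilists' Hermite He_n via He_{k+1} = x He_k - k He_{k-1}."""
    x = np.asarray(x, dtype=float)
    h0, h1 = np.ones_like(x), x.copy()
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def charlier(n, x, a):
    """Charlier C_n(x; a) via a C_{k+1} = (k + a - x) C_k - k C_{k-1}."""
    x = np.asarray(x, dtype=float)
    c0, c1 = np.ones_like(x), 1.0 - x / a
    if n == 0:
        return c0
    for k in range(1, n):
        c0, c1 = c1, ((k + a - x) * c1 - k * c0) / a
    return c1

# Orthogonality of He_n under the N(0, 1) law, via Gauss-Hermite(E) quadrature:
x, w = np.polynomial.hermite_e.hermegauss(40)   # nodes/weights for weight exp(-x^2/2)
for m in range(4):
    for n in range(4):
        inner = np.sum(w * hermite_e(m, x) * hermite_e(n, x)) / np.sqrt(2 * np.pi)
        assert abs(inner - (math.factorial(n) if m == n else 0.0)) < 1e-8

# Orthogonality of C_n under the Poisson(a) law (norm n!/a^n):
a = 1.0
xs = np.arange(0, 60)
pmf = np.exp(-a) * a**xs / np.array([math.factorial(int(k)) for k in xs])
for m in range(4):
    for n in range(4):
        inner = np.sum(pmf * charlier(m, xs, a) * charlier(n, xs, a))
        assert abs(inner - (math.factorial(n) / a**n if m == n else 0.0)) < 1e-8
```

The assertions confirm the two orthogonality relations E[He_m He_n] = n! δ_{mn} under N(0, 1) and E[C_m C_n] = (n!/aⁿ) δ_{mn} under Poisson(a), so dividing by the norms yields orthonormal systems.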
The following study is divided into two categories according to whether t ∈ [0, T] with fixed T, or t ∈ [0, T_n] with T_n increasing with the sample size n. We establish and discuss our results mainly for the latter case, leaving the discussion of the former case to Appendix F of the supplemental material.
We are very much interested in the scenario where the time variable lies in [0, T_n] and T_n → ∞ as n → ∞. The relationship between T_n and n is crucial for the following development. Both of them diverge to infinity. The divergence of T_n, however, is negligible compared with that of n, viz., T_n/n → 0 as n → ∞. The main reason is that the proposed method requires sufficient information from the path of the process to estimate the unknown function in the model accurately. In technical terms, allowing T = T_n → ∞ and T_n/n → 0 amounts to both infill and long-span asymptotics. Meanwhile, the two-fold limit theory avoids a possible involvement of the so-called aliasing problem (i.e., different continuous-time processes may be indistinguishable when sampled at discrete times). Under Assumption A.5 below, m(t, Z(t)) is expanded as (2.1). To estimate the regression function, the infinite series is truncated. Precisely, we take a truncation parameter k for i and p_0, . . . , p_{k−1} for j, and define p = (p_0, . . . , p_{k−1}); the truncation leaves the two residues given in (2.6) and (2.7). For the truncation parameters k, p_i (i = 0, 1, . . . , k − 1) and the time span T = T_n, we make Assumption A.6 in Appendix A. Given the observation number n, one can choose T = T_n according to Assumption A.6. Let us sample on [0, T_n] at equally spaced points t_{s,n} = T_n s/n (s = 1, . . . , n) for model (1.1). Denote Y_{s,n} = Y(t_{s,n}), Z_{s,n} = Z(t_{s,n}), and e_s = ε(t_{s,n}). Using (2.5), we have (2.8). To simplify the notation, for any t, denote B_ℓ(t, Z(t)) = φ_{jT_n}(t) Q_i(t, Z(t)) and b_ℓ = c_{ij} for ℓ = 1, . . . , p̄ (p̄ = p_0 + · · · + p_{k−1}), with ℓ corresponding to the pair (i, j). Thus, we can rephrase (2.8) in matrix form as Y = Xβ + δ + γ + ε, where δ = (δ_p(t_{1,n}, Z_{1,n}), . . . , δ_p(t_{n,n}, Z_{n,n}))′ and γ = (γ_p(t_{1,n}, Z_{1,n}), . . . , γ_p(t_{n,n}, Z_{n,n}))′ are n-dimensional vectors, X = (x′_1, . . . , x′_n)′ is an n × p̄ matrix with rows x_s = (B_1(t_{s,n}, Z_{s,n}), . . . , B_p̄(t_{s,n}, Z_{s,n})), and ε = (e_1, . . . , e_n)′. The ordinary least squares (OLS) estimator of β is β̂ = (X′X)^{−1} X′Y, which yields m̂(τ, x) as an estimate of m(τ, x) for any fixed τ > 0 and fixed x on the path of Z(τ); m̂(τ, x) is generated from the expansion of m(τ, x) by replacing β with β̂ and removing all residues. The difference m̂(τ, x) − m(τ, x) is given in (2.11), where δ(τ, x) and γ(τ, x) are defined similarly to (2.6) and (2.7). In order to establish an asymptotic distribution for (2.11), we develop in the next section some asymptotic properties of partial sums of several classes of functionals of general processes.
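A minimal numerical sketch of this estimation step follows. It is our own illustration rather than the paper's exact design: Z is taken as a Brownian motion so that Q_i(t, x) can be chosen as the normalized Hermite polynomial He_i(x/√t)/√(i!) (orthonormal under N(0, t)), φ_j is the cosine basis of L²[0, T], and the truncation orders, sample size, and regression function are all arbitrary choices.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, T, k, p = 500, 2.0, 4, 4                       # sample size, horizon, truncations
t = T * np.arange(1, n + 1) / n                   # sampling points t_{s,n} = sT/n
Z = np.cumsum(rng.normal(0.0, np.sqrt(T / n), n)) # discretized Brownian path
Y = np.sqrt(t) * np.sin(Z**2) + rng.normal(size=n)  # model (1.1) with a known m

def He(i, x):
    """Probabilists' Hermite polynomials: He_{m+1} = x He_m - m He_{m-1}."""
    x = np.asarray(x, dtype=float)
    h0, h1 = np.ones_like(x), x.copy()
    if i == 0:
        return h0
    for m in range(1, i):
        h0, h1 = h1, x * h1 - m * h0
    return h1

def Q(i, s, x):                                   # orthonormal under the law of Z(s)
    return He(i, x / np.sqrt(s)) / np.sqrt(math.factorial(i))

def phi(j, s):                                    # cosine basis of L^2[0, T]
    return (np.ones_like(s) / np.sqrt(T) if j == 0
            else np.sqrt(2.0 / T) * np.cos(j * np.pi * s / T))

# Design matrix with columns B_l(t, Z) = phi_j(t) Q_i(t, Z), l <-> (i, j),
# and the OLS step beta_hat = (X'X)^{-1} X'Y computed via least squares.
X = np.column_stack([phi(j, t) * Q(i, t, Z) for i in range(k) for j in range(p)])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

def m_hat(tau, x):
    """Truncated-series estimate of m(tau, x) for fixed tau > 0."""
    b = np.array([phi(j, np.array([tau]))[0] * Q(i, np.array([tau]), x)[0]
                  for i in range(k) for j in range(p)])
    return float(b @ beta_hat)
```

The point of the sketch is only the mechanics: build the tensor-product regressors, run OLS, and evaluate the truncated series at a fixed point (τ, x).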

Asymptotic theory
We shall establish an asymptotic theory for both the sample mean and the sample covariance of two general classes of functionals F(t, x) (t > 0 and x ∈ R), denoted by T(HI) and T(HH). This will help us in the next section to derive the limit distribution of the estimator proposed in Section 2. These results extend the existing literature and are applicable beyond the setup of this article; hence, they are of general interest in their own right.

Asymptotic time-homogeneous and integrable functionals
The limits of the two basic forms Σ_{s=1}^{[nr]} f(s/n, c_n x_{s,n}) and Σ_{s=1}^{[nr]} f(s/n, c_n x_{s,n}) e_s are provided in Appendix B under Assumptions A.1-A.3. However, since the quantities of interest in practice are Σ_{s=1}^n F(s, c_n x_{s,n}) and Σ_{s=1}^n F(s, c_n x_{s,n}) e_s, we need to normalize the time variable involved in a functional. Towards this end, we introduce the following definition.
Definition 3.1. Let F(s, x) be defined for s ≥ 0 and x ∈ R. Suppose that, for every x ∈ R, any η > 0, and t ∈ [0, 1], F(s, x) admits a decomposition in which the following statements hold: the leading function is bounded on [0, 1]; Q(y) is bounded on any compact interval with lim_{y→+∞} Q(y) = 0; and P(x) and P²(x) are Lebesgue integrable.

Such functions F(s, x), asymptotically homogeneous with respect to s and integrable with respect to x, are said to be in Class (HI), denoted by T(HI). Here, υ and f are called the homogeneity power and the normal function, respectively. Functions such as F(s, x) = α(s)P(x), or combinations thereof, satisfy the definition, where α(s) is a polynomial or power function and P(x) is integrable.
Theorem 3.1. Suppose that F(t, x) is in the class T(HI) with homogeneity power υ and normal function f. If Assumptions A.1(a) and (c) and A.3 in Appendix A hold, then for any c_n → ∞ with c_n/n → 0 and r ∈ [0, 1], the stated convergence holds, where L_W(t, 0) is the local-time process of the Brownian motion W given by Assumption A.1, at the origin, over the time interval [0, t].
If, in addition, Assumption A.1(a) is replaced by Assumption A.1(b), then the corresponding convergence holds for any c_n → ∞ with n/c_n → ∞ and r ∈ [0, 1], on the same probability space as defined in Assumption A.1(b).
Moreover, suppose that f⁴(t, x) satisfies Assumption A.3 and that {e_s} and {x_{s,n}} satisfy Assumptions A.2 and A.1(c). Then, for n → ∞, c_n → ∞ with c_n/n → 0, and r ∈ [0, 1], the corresponding convergence holds, where G²(t) = ∫ f²(t, x) dx and N is a standard normal random variable independent of W.
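To illustrate the role of the local time L_W in limits of this kind, consider the simplest time-homogeneous special case F(s, x) = f(x) = exp(−x²/2) with c_n = √n and a Gaussian random walk for x_{s,n} (all choices here are our own, for illustration only). The limit is then (∫ f dx)·L_W(1, 0) = √(2π)·L_W(1, 0), and since L_W(1, 0) has the law of |W(1)| by Lévy's theorem, the statistic should average about √(2π)·√(2/π) = 2:

```python
import numpy as np

rng = np.random.default_rng(1)

def stat(n):
    """(c_n/n) * sum_{s<=n} f(c_n * x_{s,n}) with c_n = sqrt(n), x_{s,n} = S_s/sqrt(n),
    which simplifies to n^{-1/2} * sum_s f(S_s) for the unscaled random walk S_s."""
    S = np.cumsum(rng.standard_normal(n))       # random walk with N(0, 1) increments
    return np.sum(np.exp(-S**2 / 2)) / np.sqrt(n)

reps = np.array([stat(5000) for _ in range(1000)])
print(reps.mean())                              # close to 2 for large n
```

Each replication is a draw from (approximately) the distribution √(2π)·L_W(1, 0), so the Monte Carlo mean settles near 2 while individual draws remain random, in line with the random (mixture) limits of Theorem 3.1.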
The proofs of Theorems 3.1, 3.2, and 4.1 are given in Appendix C.

Asymptotic homogeneous regular functionals
To accommodate empirical applications, this definition is extended below to a more general form F(s, x) with (s, x) ∈ R_+ × R, referred to as asymptotically homogeneous regular functionals. Let T_LB denote the class of locally bounded functions on R; let T⁰_LB be the subclass of T_LB in which the functions are exponentially bounded, i.e., P(x) = O(e^{c|x|}) for some c > 0. The class of bounded functions on R is denoted by T_B, and T⁰_B is the subclass of T_B collecting all functions that are bounded and vanish at infinity, i.e., P(x) → 0 as |x| → ∞.

Definition 3.2. We say that a function F(s, x) admits a decomposition with positive functions A, a, P, q, b, B such that the stated conditions hold. If F(s, x) satisfies Definition 3.2, we say that F(s, x) is in Class (HH), denoted by F(s, x) ∈ T(HH), and call f(t, x) the normal function of F(s, x), and υ₁(·) and υ₂(·) the homogeneity powers with respect to s and x, respectively. See the verification in Appendix A below for discussion and examples.
Theorem 3.2. Let F(s, x) be in Class T(HH) with homogeneity powers υ₁(·) and υ₂(·) and normal function f(t, x). Let the martingale difference (e_s, F_{n,s}) and x_{s,n} satisfy Assumption A.2 in Appendix A. We then have the convergence results (3.5) and (3.6), where (U(r), W(r)) is the limit of (U_n(r), W_n(r)) for r ∈ [0, 1] in Assumption A.2.
Remark 3.1. Note that if F(s, x) reduces to a univariate function F(x), then with c_n = √n, Eq. (3.5) becomes Theorem 5.3 of Park and Phillips (1999) and the first part of Theorem 3.3 with singleton in Park and Phillips (2001), while Eq. (3.6) becomes the second part of Theorem 3.3 with singleton in Park and Phillips (2001).
In Section 4 below, we apply the asymptotic theory developed here to establish an asymptotic distribution for the estimator given in Section 2.

Asymptotic distribution of the estimator
Given the asymptotic theory in the preceding section, we are able to establish an asymptotic distribution for m̂(τ, x) − m(τ, x) in (2.11). To this end, we define the matrix B as stated, where ‖·‖ stands for the Euclidean norm of a vector and the notation in (2.11) is used. It is readily seen that the largest eigenvalue of B is λ_max = 1 and all the other eigenvalues are zero. Let the unit vector α be the left eigenvector of B pertaining to λ_max = 1. Hence, we have α′B = α′ and ‖α‖ = 1. Denote α′ = (α_{00}, . . . , α_{k−1,p_{k−1}}).
Assumption A.4 of Appendix A below imposes conditions on a double-index sequence that enable us to deal with the bias terms involved in Eq. (2.11). Define the two sequences S and S̄ by the stated conditions. By the Riesz-Fischer theorem in Dudley (2003, p. 167), for the two sequences S and S̄ there exist two functions, denoted by F(t, x) and G(t, x), satisfying (4.2) and (4.3). In view of the definitions of a_{ij} and ā_{ij} as well as (4.2) and (4.3), we have the stated expressions, where F′ = (F(t_{1,n}, Z_{1,n}), . . . , F(t_{n,n}, Z_{n,n})), G′ = (G(t_{1,n}, Z_{1,n}), . . . , G(t_{n,n}, Z_{n,n})), and δ′ = (δ_1, . . . , δ_n) with δ_s = δ_p(t_{s,n}, Z_{s,n}) = Σ_{i=0}^{k−1} Σ_{j=p_i}^∞ a_{ij} φ_{jT_n}(t_{s,n}) Q_i(t_{s,n}, Z_{s,n}). These transformations are introduced because we are working on a centralized version of the underlying process. Note that, by the stationarity of the increments and the infinite divisibility of Lévy processes, for every s, E[Z_{s,n}] = t_{s,n} μ with μ = E[Z(1)], and Var(Z_{s,n} − Z_{s−1,n}) = (T_n/n) σ²_z, where the standardized increments form an independent and identically distributed (i.i.d.) (0, 1) sequence. Denote by x_{s,n} the corresponding normalized partial-sum process; hence, Z_{s,n} − E[Z_{s,n}] = √T_n σ_z x_{s,n}. Observe that, by virtue of the functional central limit theorem, x_{[nr],n} converges in distribution to a Brownian motion W(r) on [0, 1] as n → ∞. Moreover, x_{s,n}, along with d_{l,k,n} = (l − k)/n, satisfies Assumption A.1(a) and (c) in Appendix A, and A.1(b) can also be achieved by the Skorohod representation theorem.
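The centering and scaling above can be checked by simulation. The sketch below is our own illustration (the Poisson choice, intensity, and horizon are arbitrary): it standardizes a Poisson Lévy process, for which μ = σ²_z = λ, and confirms that the endpoint of x_{s,n} is approximately standard normal, as the functional central limit theorem predicts.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, Tn = 1.0, 1000, 10.0                    # intensity, sample size, horizon
t = Tn * np.arange(1, n + 1) / n                # sampling points t_{s,n} = s T_n / n

def normalized_path():
    """x_{s,n} = (Z_{s,n} - E[Z_{s,n}]) / (sqrt(T_n) sigma_z) for a Poisson Z."""
    inc = rng.poisson(lam * Tn / n, n)          # i.i.d. Poisson increments
    Z = np.cumsum(inc)                          # Z_{s,n} = Z(t_{s,n})
    return (Z - lam * t) / (np.sqrt(Tn) * np.sqrt(lam))

end = np.array([normalized_path()[-1] for _ in range(4000)])
print(end.mean(), end.var())                    # approx 0 and 1 by the CLT
```

The endpoint x_{n,n} has mean zero and variance one exactly, and its law is close to N(0, 1) for λT_n as large as 10, which is the sense in which x_{[nr],n} approximates a Brownian motion on [0, 1].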
where f(t, x) is the normal function of F(t, x) given by Assumption A.7(a), W is a standard Brownian motion on [0, 1], N is a standard normal variable independent of W, and L_W is the local-time process of W. If Assumption A.7(b) is satisfied, then the corresponding result holds, where f(t, x) is the normal function of F(t, x) given by Assumption A.7(b) and the vector (U(r), W(r)) of Brownian motions is defined by Assumption A.2.
Note that for the case where T_n = T is fixed, the limit of m̂ is much simpler, since no normalization of T is needed. We therefore leave that discussion to Appendix F in the supplemental material.

Simulation experiments
This section reports Monte Carlo simulations for the estimation of unknown functionals of two particular Lévy processes, namely Brownian motion and the Poisson process, via orthogonal expansion.
Example 5.1. Consider a nonlinear time series model of the form Y(t) = m(t, B(t)) + ε(t), where the function m(·, ·) is unknown, B(t) is a Brownian motion, ε(t) is an error process, and t ≥ 0. We shall focus on two situations for the time variable t: (1) t ∈ [0, T] and (2) t ∈ [0, T_n], where n is the sample size. As shown in Example 2.1, we can expand m(t, B(t)) into an orthogonal series of the form (2.2). We estimate the m function by estimating the coefficients c_{ij} in its expansion based on the observations (y_t, B_t) at the sample points. More precisely, let n be the sample size and t_s = (s/n)T (s = 1, . . . , n) be the sample points equally spaced in [0, T]. Let I = [a·n^{κ₁}] and J = [b·n^{κ₂}] be the truncation parameters for i and j, respectively, in the double summation of the m function expansion, where 0 < κ₁, κ₂ < 1. Therefore, we obtain the truncated regression with y_{t_s} = Y(t_s), e_{t_s} = ε(t_s), and δ(·, ·) and γ(·, ·) the residues after truncation. The equations, like (2.9), can be written in matrix form Y = Xβ + δ + γ + e, and hence the OLS estimator of β is β̂ = (X′X)^{−1} X′Y, computed from the simulated values B(t_s) ∼ N(0, t_s), e ∼ i.i.d. N(0, 1), and y_{t_s}, s = 1, . . . , n. The bias and variance are then computed as in (5.3) and (5.4), where M is the number of Monte Carlo replications, m̂_ℓ(·, ·) is the ℓth simulated value of m̂(·, ·), and m̄(·, ·) is the mean over all Monte Carlo replications.
To generate data for the simulation, we use m(t, x) = √t sin(x²) and T = 2. The Monte Carlo simulation results, along with all parameters used in the simulation, are reported in Table 1.
For the second case, where t ∈ [0, T_n], we run the same simulation as in the first case but with T replaced by T_n = n^{κ₃} and κ₃ = 0.16. To generate data, we use m(t, x) = 1/(1 + x²), and the results are reported in Table 2.
As can be seen from Tables 1 and 2, both the bias and the estimated variance perform very well, in the sense that the bias remains at quite a low level and moves toward zero, while the variance decreases gradually as the sample size increases. The two cases, fixed T and T = T_n → ∞, have similar finite-sample properties.

Example 5.2. Consider another nonlinear time series model of the form Y(t) = m(t, N(t)) + ε(t), where the form of the function m(·, ·) is unknown, N(t) is a Poisson process with intensity λ, and ε(t) is an error process.
For t ∈ [0, T] with fixed T, as shown in Example 2.2, the expansion of m(t, N(t)) takes the form (2.4). As in the last example, we estimate the m function by estimating the coefficients in its expansion. More precisely, let n be the sample size and t_s = (s/n)T (s = 1, . . . , n) be the sample points equally spaced in [0, T]. Let I = [a·n^{κ₁}] and J = [b·n^{κ₂}] be the truncation parameters for i and j, respectively, in the double summation of the m function expansion, where 0 < κ₁, κ₂ < 1. Therefore, we obtain the truncated regression with y_{t_s} = Y(t_s), e_{t_s} = ε(t_s), and γ(·, ·) the residue after truncation. Similarly, the OLS estimator of β is β̂ = (X′X)^{−1} X′Y, computed from the simulated observations N(t_s) ∼ Poi(t_s), e ∼ i.i.d. N(0, 1), and the generated y_{t_s}, s = 1, . . . , n. Whence we obtain the estimate m̂, where ĉ_{ij} are the entries of β̂.
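A minimal sketch of this Poisson-functional estimation follows. It is again our own illustration, not the paper's exact implementation: Charlier polynomials, normalized to be orthonormal under Poisson(λt), serve as Q_i, with the cosine time basis and small fixed truncation orders; the sample size and seed are arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n, T, lam, I, J = 800, 2.0, 1.0, 4, 4
t = T * np.arange(1, n + 1) / n                 # sample points t_s = sT/n
N = np.cumsum(rng.poisson(lam * T / n, n))      # Poisson process sampled on [0, T]
Y = t + np.sin(N) + rng.normal(size=n)          # model with m(t, x) = t + sin(x)

def charlier(i, x, a):
    """Charlier C_i(x; a) by the recurrence a C_{k+1} = (k + a - x) C_k - k C_{k-1}."""
    x = np.asarray(x, dtype=float)
    c0, c1 = np.ones_like(x), 1.0 - x / a
    if i == 0:
        return c0
    for k in range(1, i):
        c0, c1 = c1, ((k + a - x) * c1 - k * c0) / a
    return c1

def Q(i, s, x):
    """Charlier polynomial normalized to be orthonormal under Poisson(lam * s)."""
    a = lam * s
    return charlier(i, x, a) * np.sqrt(a**i / math.factorial(i))

def phi(j, s):                                  # cosine basis of L^2[0, T]
    return (np.ones_like(s) / np.sqrt(T) if j == 0
            else np.sqrt(2.0 / T) * np.cos(j * np.pi * s / T))

X = np.column_stack([phi(j, t) * Q(i, t, N) for i in range(I) for j in range(J)])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]  # OLS estimates of the c_ij
fitted = X @ beta_hat                            # in-sample fit of m(t, N(t))
```

Apart from the discrete regressor and the Charlier basis, the pipeline is identical to the Brownian case, which is consistent with the similar finite-sample behavior reported in Tables 1 and 3.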
To generate data for the simulation, we let m(t, x) = t + sin(x), T = 2, and λ = 1. Bias and variance are calculated using formulae (5.3) and (5.4), respectively. The results, along with the parameters used in the simulation, are reported in Table 3.
For the case of t ∈ [0, T_n], letting T_n = n^{κ₃} with κ₃ = 1/6, we proceed as in the first case but with T replaced by T_n and with m(t, x) = √t/(1 + x²). The results are reported in Table 4. It is readily seen that in Tables 3 and 4 all the biases remain at quite a low level, while all the variances decrease gradually as the sample size increases. However, the variances in Table 4 are larger than those in Table 3, possibly because of the increase of the interval in the second case; this also happens in Example 5.1. Comparing the results in Tables 1 and 3, the estimates perform very similarly, indicating that the estimation is not sensitive to the choice of the regressor. Meanwhile, when T = T_n, so that the interval grows with the sample size, the estimates of the Brownian functional and the Poisson process functional have similar finite-sample behavior, although all the variances in Table 4 are slightly larger than those in Table 2. This may be due to the factor √t in the m function for the Poisson functional, since the regression function in Table 2 is 1/(1 + x²), while in Table 4 it is √t/(1 + x²).

Conclusion
The estimation of the regression function in the proposed nonstationary models has been given via an orthogonal series expansion method. The main characteristics and contributions of this article are that the regression function can include an explicit time variable, which impels us to explore the existence of a polynomial system involving the time variable and, therefore, to expand the regression function using a tensor product basis in a suitable Hilbert space. The asymptotic distributions of the proposed estimators have been established. The Monte Carlo experiments have shown that both the approximation of an orthogonal series to the regression function and the resulting estimates are accurate whether the regressor admits a continuous-type or a discrete-type distribution, specified as Brownian motion and the Poisson process in the experiments, respectively. It is noteworthy that the proposed method may have an advantage over its natural counterpart, the kernel estimation method, because, as pointed out by work such as Gao and Phillips (2013) and Phillips et al. (2013), the kernel method may not be workable in the case where a time variable is incorporated in the regression function.

Appendix A: Assumptions and veri cation
This section gives Assumptions A.1-A.3 for the establishment of the asymptotic theory in Section 3, which may be used beyond the scope of this article, and Assumptions A.4-A.7 for the establishment of the asymptotic distribution of the estimator in Section 4.
Given a triangular array x_{s,n} (x_{0,n} = 0 by definition), 1 ≤ s ≤ n, constructed from some underlying time series, we assume that x_{[nr],n} (0 ≤ r ≤ 1) converges in distribution to a stochastic process W(r) on D[0, 1] with respect to the Skorohod topology, where D[0, 1] stands for the space of real-valued functions that are right continuous with left limits. This setup for {x_{s,n}} is extensively used in the literature. See, for example, Phillips (1987), Park and Phillips (1999, 2001), Wang and Phillips (2009a), and Gao and Phillips (2013). We now state the following assumption on x_{s,n}.
there exist a sequence of positive constants d_{l,k,n} and a sequence of σ-fields F_{n,k}, where F_{n,0} = {∅, Ω}, such that the following statements hold: (i) For some m₀ > 0 and C > 0, inf_{(l,k)} d_{l,k,n} ≥ ǫ^{m₀}/C over the stated index set; for any fixed l > 0, n d_{l,0,n} → ∞ as n → ∞. (ii) If the x_{k,n} are discrete variables, then, conditional on F_{n,k}, (x_{l,n} − x_{k,n})/d_{l,k,n} has a probability distribution P_{l,k,n}(x) whose distribution function F_{l,k,n}(x) satisfies lim_{δ→0} lim_{n→∞} sup_{(l,k)} sup_{|u|<δ} |F_{l,k,n}(u) − F_{l,k,n}(0)| = 0. (A.5) Remark A.1. As we are concerned with both continuous and discrete variables in (ii) of (c), Assumption A.1 considerably extends the conditions assumed in Wang and Phillips (2009a), which cover the univariate function case as a subclass. Note that Assumption A.1 is quite weak and contains a variety of processes discussed in the literature. In particular, we remark that this setting accommodates discrete observations of any Lévy process.
The following assumption stipulates some necessary conditions for {x s,n } and error sequence {e s }.
Let U_n(r) = n^{−1/2} Σ_{s=1}^{[nr]} e_s and W_n(r) = x_{[nr],n}. Suppose that (U_n, W_n) converges in distribution to (U, W) in D[0, 1]² as n → ∞, where (U, W) is a correlated Brownian motion vector.
Remark A.2. Assumption A.2 is quite general and applicable in many situations, as it has already been used in many studies, such as Park and Phillips (1999, 2001) and Wang and Phillips (2009a). With the condition on ε(t) in model (1.1) and F_{n,s} = F_s, condition (a) is satisfied automatically by e_s = ε(s) at all sampling points s.
(c) f(t, x) is continuous in t, and there are at most a finite number of points t at which ∫ f(t, x) dx = 0.

Remark A.3. Condition (a) is an extension of Assumption 2.1 in Wang and Phillips (2009a). The integrability requirement is basic for dealing with this kind of problem. Note that if f(t, x) = f(x) is time-homogeneous, Condition (a) reduces to Assumption 2.1 in Wang and Phillips (2009a). Condition (b) requires that the function f(t, x) be dominated, uniformly in t over the compact interval [0, 1], by an integrable function c_f(x). In the case where f(t, x) is the product of a continuous function of t and an integrable function of x, or a superposition of such products, the condition is automatically fulfilled. Condition (c) excludes the situation where there are infinitely many points t_j ∈ [0, 1] such that ∫ f(t_j, x) dx = 0; thus, the measure of the set consisting of all such t_j is always zero.
We need the following assumptions on the regression function in model (1.1) and truncation parameters to ensure that m(t, x) can be expanded and consistently estimated.
Remark A.4. Note that Conditions (a) and (b) are both needed and cannot substitute for each other. Obviously, if there are some ǫ > 0 and η > 0 such that a_{ij} = O((1 + j)^{−(2+ǫ)} (1 + i)^{−(1+η)}) for i, j ≥ 0, both conditions are fulfilled. This assumption is used to tackle the bias term of the estimator.
In the following, the operator D stands for either differentiation or difference with respect to x only.
(c) For i large enough, the coefficient functions c_i(t, D³m) of D³m(t, Z(t)) expanded by the system {Q_{3i}(t, Z(t))} are such that ψ(t)³ c²_i(t, D³m) are bounded on [0, T] uniformly in i, where the function ψ(t) > 0 is determined by Z(t) as given by Remark E.1 in the supplement.
Remark A.5. Condition (a) gives us the possibility of expanding the functional m(t, z) into an orthogonal series and, owing to the smoothness, ensures the convergence of such an expansion. Condition (b) is mild and only requires sufficient smoothness for the coefficient functionals; certainly, the condition would be achieved if we imposed a stronger condition on m and the function ρ(t, x). While Condition (c) looks strong, it can be fulfilled by many functions and processes. For processes such as Poisson processes and Gamma processes, ψ(t) is actually a constant independent of t, so the condition is verified easily; even when ψ(t) is a power function of t, in expansions of functionals such as t^η sin x, t^η cos x, and t^η e^{−ct} P(x) with η ≥ 2, c > 0, and P(x) any polynomial of fixed degree, the c_i(t, m) include an exponential factor e^{−αt} (α > 0), so that the condition is also satisfied.

(a) ..., and T_n = [n^{κ_3}], where 0 < κ_i < 1 (i = 1, 2, 3) and κ_2 ≤ κ̄_2 < 1. (b) Let κ_1 > 1/2 and 3κ_3 + κ_1 + 1 < 3κ_2.
Remark A.6. Condition (a) connects all the truncation parameters and the sampling interval with the number of observations; thus, as the sample size increases, the truncated series converges to the function m. Condition (b) gives a relationship among the κ_i such that all biases in the estimation are offset. It looks untidy but is the minimum requirement to make the residues negligible. Feasible solutions for κ_i (i = 1, 2, 3) do exist, for instance, κ_1 = 0.55, κ_2 = 0.67, and κ_3 = 0.15.
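As a quick sanity check on the feasibility claim, the following sketch (the function name and setup are ours, not the authors') encodes the constraints of Conditions (a) and (b) and confirms the suggested values:

```python
def feasible(k1, k2, k3):
    """Check the parameter constraints quoted above:
    0 < kappa_i < 1 for each i, kappa_1 > 1/2, and
    3*kappa_3 + kappa_1 + 1 < 3*kappa_2 (Condition (b))."""
    in_unit = all(0.0 < k < 1.0 for k in (k1, k2, k3))
    return in_unit and k1 > 0.5 and 3.0 * k3 + k1 + 1.0 < 3.0 * k2

print(feasible(0.55, 0.67, 0.15))  # the feasible solution given above -> True
```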
Remark A.7. Note that the conditions in (a) and (b) look untidy because we wish to present the original (minimal) requirements on the parameters.

Verification of Definition 3.2. (a) In practice, often only one of the two dominating terms of R appears. (b) There are many functions that possess asymptotic homogeneity.

Appendix B. Crucial lemmas
Two crucial lemmas are stated in this section. They are of interest in their own right.

B.1 Time-normalized and integrable functionals
This section establishes a more general asymptotic theory than those available in the literature, such as Phillips (1999, 2001), Wang and Phillips (2009a), and Wang (2015). Let f(t, x) be defined on [0, 1] × R, where N is a standard normal random variable independent of W, and G_2(t) = ∫ f²(t, x)dx.
The proof of Lemma B.1 is relegated to Appendix D in the supplementary file of the article.
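As an informal illustration of the sample functional treated by Lemma B.1, the following Monte Carlo sketch (a hypothetical setup of ours, taking x_{s,n} to be a standardized Gaussian random walk and c_n = √n in the spirit of Wang and Phillips (2009a)) evaluates (c_n/n) Σ_{s≤n} f(s/n, c_n x_{s,n}) for a time-inhomogeneous, integrable-in-x functional:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_functional(n, f):
    """(c_n / n) * sum_{s<=n} f(s/n, c_n * x_{s,n}), with x_{s,n} the
    standardized partial-sum process of iid N(0, 1) errors and c_n = sqrt(n)."""
    eps = rng.standard_normal(n)
    x = np.cumsum(eps) / np.sqrt(n)        # x_{s,n}, s = 1, ..., n
    t = np.arange(1, n + 1) / n            # sampling points s/n
    c_n = np.sqrt(n)
    return (c_n / n) * np.sum(f(t, c_n * x))

# a time-inhomogeneous, integrable-in-x functional: f(t, x) = (1 + t) e^{-x^2}
f = lambda t, x: (1.0 + t) * np.exp(-x ** 2)
vals = [sample_functional(50_000, f) for _ in range(20)]
print(min(vals), max(vals))  # nonnegative, but varying across replications
```

The realized values vary noticeably from path to path, consistent with the limit in Lemma B.1 being a random local-time functional rather than a constant.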
Proof of Theorem 3.1. It follows from the definition of the class T(HI) that

c_n/(n υ(n)) Σ_{s=1}^{[nr]} F(s, c_n x_{s,n}) = (c_n/n) Σ_{s=1}^{[nr]} f(s/n, c_n x_{s,n}) + c_n/(n υ(n)) Σ_{s=1}^{[nr]} R_n(s/n, c_n x_{s,n}) =: I_1 + I_2.
By Lemma B.1, under (a) and (c) in Assumption A.1 we have I_1 →_D ∫_0^r G_1(t) dL_W(t, 0), while under (b) and (c) in Assumption A.1 we have I_1 →_P ∫_0^r G_1(t) dL_W(t, 0) uniformly in r. In order to complete (3.1) and (3.2), it thus suffices to prove that I_2 →_P 0 uniformly in r under Assumption A.1(c).
If F(t, x) is in the class T(HI_1) and q_n(t)/υ(n) → 0 uniformly in t ∈ [0, 1] as n → ∞, then for a given ǫ > 0 and all n large enough, 0 < q_n(t)/υ(n) < ǫ for all t. We then have, from Assumption A.1, a bound in which K is the uniform upper bound of the densities h_{l,k,n}(x). Thus, the desired result follows from (A.3) as n → ∞ and ǫ → 0. If F(t, x) is in the class T(HI_2), then |R_n(s/n, c_n x_{s,n})| ≤ q_n(t) Q(nt) P(c_n x_{s,n}), where lim_{n→∞} q_n(t)/υ(n) = l(t) is bounded on [0, 1], P(x) is integrable, and Q(y) is bounded on any compact interval with lim_{y→+∞} Q(y) = 0. When n is large, q_n(t)/υ(n) = l(t)(1 + o(1)), and for a given ǫ > 0 there exists s_0 > 0 such that 0 < Q(s) < ǫ whenever s > s_0. Whence, it follows from Lemma B.1 that I_3 →_D (∫_0^r G_3(t) dL_W(t, 0))^{1/2} N as n → ∞. Hence, it suffices to show I_4 →_P 0 to complete the proof.
The martingale difference structure of (e_s, F_{n,s}) and the adaptivity between e_s and x_{s,n} give

E[I_4²] = σ_e² (c_n/n) υ(n)^{−2} Σ_{s=1}^{[nr]} E[R_n²(s/n, c_n x_{s,n})].
In the same fashion as in the proof of I_2 →_P 0, we can show that E[I_4²] → 0. This finishes the proof.
Proof of Theorem 3.2. It follows from the asymptotic homogeneity of the function F that

1/(n υ_1(n) υ_2(c_n)) Σ_{s=1}^n F(s, c_n x_{s,n}) = (1/n) Σ_{s=1}^n f(s/n, x_{s,n}) + 1/(n υ_1(n) υ_2(c_n)) Σ_{s=1}^n R(n, c_n; s, c_n x_{s,n}).
Note that f(t, x) is regular and thus, by Lemma B.2, the first part converges almost surely to ∫_0^1 f(r, W(r)) dr as n → ∞. In order to complete the proof of (3.5), it thus suffices to show that the second part converges to zero in probability.
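The first part of the decomposition is a pathwise Riemann sum; the following sketch (with a hypothetical regular functional f(t, x) = t x² and a single simulated Brownian path, a setup of ours) illustrates that refining the grid leaves the sum essentially unchanged, as the almost sure limit ∫_0^1 f(r, W(r)) dr suggests:

```python
import numpy as np

rng = np.random.default_rng(1)

# One Brownian path approximated on a fine grid of N points.
N = 2 ** 18
W = np.cumsum(rng.standard_normal(N)) / np.sqrt(N)
grid = np.arange(1, N + 1) / N

f = lambda t, x: t * x ** 2    # hypothetical regular functional

def riemann_sum(step):
    """(1/n) * sum_{s<=n} f(s/n, W(s/n)) using every `step`-th grid point."""
    idx = np.arange(step - 1, N, step)
    return float(np.mean(f(grid[idx], W[idx])))

coarse, fine = riemann_sum(64), riemann_sum(1)
print(coarse, fine)  # close to each other: both approximate the pathwise integral
```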
To begin the proof of (C.3), by the expression of δ, the first term is bounded by A(1 + o(1)) n/k² → 0 by Assumptions A.5 and A.6, where A is the uniform bound of ψ(t)³ c_i²(t, D³m). As for the rest of (C.3), it follows from the expression of δ_s and the orthogonality of the basis that it is o(1) υ(T_n)² n^{κ_1 − 2κ_2 − κ_3/2} → 0 as n → ∞ by Assumptions A.4 and A.6.