Inference for Time Series Regression Models With Weakly Dependent and Heteroscedastic Errors

Motivated by the need to assess the significance of the trend in some macroeconomic series, this article considers inference of a parameter in parametric trend functions when the errors exhibit certain degrees of nonstationarity with changing unconditional variances. We adopt the recently developed self-normalized approach to avoid the difficulty involved in the estimation of the asymptotic variance of the ordinary least-square estimator. The limiting distribution of the self-normalized quantity is nonpivotal but can be consistently approximated by using the wild bootstrap, which is not consistent in general without studentization. Numerical simulation demonstrates favorable coverage properties of the proposed method in comparison with alternative ones. The U.S. nominal wages series is analyzed to illustrate the finite sample performance. Some technical details are included in the online supplemental material.


INTRODUCTION
Consider the simple linear trend model, X t,n = b 1 + b 2 (t/n) + u t,n , t = 1, . . . , n. (1) Inference of the linear trend coefficient b 2 is an important problem in many fields such as econometrics, statistics, and environmental sciences, and there is a rich literature. Hamilton (1994, chap. 16) derived the limiting distribution of the ordinary least-square (OLS) estimators of b 1 and b 2 when the errors are independent and identically distributed (iid) from a normal distribution. To account for possible dependence in the error, {u t,n } has often been assumed to be a stationary weakly dependent process. For example, Sherman (1997) applied the subsampling approach when the errors are from a stationary mixing process, and Zhou and Shao (2013) considered the M-estimation with stationary errors whose weak dependence was characterized by physical dependence measures (Wu 2005). Another popular way in econometrics is to model the dependence in the error as an autoregressive (AR) process, and testing procedures that are robust to serial correlation with possible unit root have been developed; see Canjels and Watson (1997), Vogelsang (1998), Bunzel and Vogelsang (2005), and Harvey, Leybourne, and Taylor (2007), among others. However, there is increasing evidence that many macroeconomic series exhibit heteroscedastic behavior. For example, the U.S. gross domestic product series has been observed to have less variability since 1980s; see Kim and Nelson (1999), McConnell and Perez-Quiros (2000), Busetti and Taylor (2003), and references therein. In addition, Sensier and van Dijk (2004) argued that majority of the macroeconomic data in Stock and Watson (1999) had abrupt changes in unconditional variances. The empirical evidence of weakly dependent and heteroscedastic errors can also be found from environmental time series; see Rao (2004), Zhou and Wu (2009), and Zhang and Wu (2011) for their data illustrations using global temperature series. There have been a number of articles on the inference of weakly dependent time series models with heteroscedastic errors. For example, Xu and Phillips (2008) studied the inference on the AR coefficients when the innovations are heteroscedastic martingale differences. Xu (2008) focused on inference of polynomial trends with errors exhibiting nonstationary volatility. Zhao (2011) and Zhao and Li (2013) explored inferences on the constant mean in the presence of modulated stationary errors. Xu (2012) presented a statistic with a pivotal limiting distribution for the multivariate trend model when the error process is generated from a stationary vector AR process with heteroscedastic innovations.
Our framework generalizes the model (1) in two nontrivial aspects. For one, our trend function is a finite linear combination of deterministic trends, each of which can contain finite number of breaks at known points. Most studies in the literature are restricted to the linear or polynomial trend, but allowing for break in intercept can be useful in practice. For example, if X t,n = b 1 + b 2 1(t > t B ) + b 3 (t/n) + u t,n , that is, a linear trend model with a jump at a known intervention time t B , then testing the significance of b 2 is of interest in assessing the significance of the jump at the intervention in the presence of a linear trend. This kind of one break model was considered in the analysis of nominal wage series by Perron (1989), and its inference is reinvestigated in Section 5. For the other, we allow two types of nonstationary processes for the errors u t,n . One type of non-stationary process is adapted from Cavaliere and Taylor (2007, 2008a, 2008b and Xu and Phillips (2008). For this class of models, the error process is represented as a linear process of independent but heteroscedastic innovations. This includes autoregressive moving average (ARMA) models with independent but heteroscedastic innovations as a special case. The other type of nonstationary process is the modulated stationary processes (Zhao 2011;Zhao and Li 2013), where a stationary process is amplified by a (possibly periodic) deterministic function that only depends on the relative location of an observation. The latter is a type of locally stationary processes (Priestley 1965;Dahlhaus 1997), as pointed out in Zhao and Li (2013). Together, they cover a wide class of nonstationary but weakly dependent models with unconditional heteroscedasticity.
When the errors are nonstationary with unconditional heteroscedasticity, the difficulty rises due to the complex form of the asymptotic variance of the OLS estimator. In this article, we apply the wild bootstrap (WB) (Wu 1986) to approximate the limiting behavior of the OLS estimator. However, simply applying the wild bootstrap to the unstudentized OLS estimator does not work because it cannot properly capture the terms in the asymptotic variance that are due to temporal dependence in the error. To overcome this difficulty, we propose to adopt the self-normalized (SN) method (Lobato 2001;Shao 2010), which was mainly developed for stationary time series. Due to the heteroscedasticity and the nonconstant regressor, the limiting distribution of the SN-based quantity is nonpivotal, unlike the existing SN-based methods. The advantage of using the SN method, though, is that the limiting distribution of the studentized OLS estimator only depends on the unknown heteroscedasticity, which can be captured by the wild bootstrap. Compared to the conventional heteroscedasticity and autocorrelation consistent (HAC) method or block-based bootstrap methods that are known to handle mildly nonstationary errors, the use of the SN method and the wild bootstrap are practically convenient and are shown to lead to more accurate inference.
To summarize, we provide an inference procedure of trends in time series that is robust to smooth and abrupt changes in unconditional variances of the temporally dependent error process, in a quite general framework. Our assumptions on deterministic trends and weakly dependent heteroscedastic errors are less restrictive than those from most earlier works in the trend assessment literature. Besides, our inference procedure is convenient to implement. Although our method is not entirely bandwidthfree, the finite-sample performance is not overly sensitive to the choice of the trimming parameter , and its choice is accounted for in the first-order limiting distribution and the bootstrap approximation. The work can be regarded as the first extension of the SN method to the regression model with weakly dependent and heteroscedastic errors. One of the key theoretical contributions of the present article is to derive the functional central limit theorem of the recursive OLS estimator, which leads to the weak convergence of the SN-based quantity. At a methodological level, the self-normalization coupled with the wild bootstrap eliminates the need to directly estimate the temporal dependence in the error process. Although some earlier works such as Cavaliere and Taylor (2008b) and Zhao and Li (2013) applied the wild bootstrap for similar error processes, it seems that they need to involve the choice of a block size or consistent estima-tion of a nuisance parameter to make the wild bootstrap work. In our simulation, our method is demonstrated to deliver more accurate finite-sample coverage compared to alternative methods, and thus inherits a key property of the SN approach that was known to hold only for stationary time series (Shao 2010). Recently, Zhou and Shao (2013) extended the SN method to the regression setting with fixed parametric regressors, where the error is stationary and the response variable is only nonstationary in mean. The framework here is considerably more general in that the error is allowed to exhibit local stationarity or unconditional heteroscedasticity and the response variable is nonstationary at all orders. On the other hand, we only consider the inference based on the ordinary least-square (OLS) estimator, whereas the M-estimation was considered in Zhou and Shao (2013).
The rest of the article is organized as follows. Section 2 presents the model, the OLS estimation, and the limiting distribution of the OLS estimator. Section 3 contains a functional central limit theorem for the recursive OLS estimators, which leads to the limiting distribution of the SN-based quantity. The wild bootstrap is described and its consistency for the SN quantity is justified in Section 3. Section 4 presents some simulation results and Section 5 contains an application of our method to the U.S. nominal wage data. Section 6 concludes. Technical details are relegated to a supplemental online appendix.
Throughout the article, we use D −→ for convergence in distribution and ⇒ weak convergence in D[ , 1] for some ∈ (0, 1), the space of functions on [ ,1], which are right continuous and have left limits, endowed with Skorohod metric (Billingsley 1968). The symbols O p (1) and o p (1) signify being bounded in probability and convergence to zero in probability, respectively. If O p (1) and o p (1) are used for matrices, they mean elementwise boundedness and convergence to zero in probability. We use a to denote the integer part of a ∈ R, B(·) a standard Brownian motion, and N (μ, ) the (multivariate) normal distribution with mean μ and covariance matrix . Denote by ij ) 1/2 be the Frobenius norm. Let X n = (X 1,n , . . . , X n,n ) denote the data.

Consider the model
where X t,n is a univariate time series, n is the sample size, β = (b 1 , . . . , b p ) is the parameter of interest, and the regressors F t,n = {f 1 (t/n), . . . , f p (t/n)} are nonrandom. The regressors {f j (t/n)} p j =1 are rescaled, thus increasing the sample size n means we have more data in a local window. We use F (s) = {f 1 (s), . . . , f p (s)} , s ∈ [0, 1] to denote the regressors with relative location s, and we observe F t,n = F (t/n). Let N = n − p + 1, where p is fixed.
The error process {u t,n } is assumed to be either one of the followings: (A1) Generalized Linear Process. The error process is defined as where ε t,n = ω t,n e t with e t iid (0,1), C(L) = ∞ j =0 c j L j , and L is the lag operator. We further assume (i) Let ω(s) = lim n→∞ ω ns ,n be some deterministic, positive (cadlag) function on s ∈ [0, 1]. Let ω(s) be piecewise Lipschitz continuous with at most finite number of breaks. If t < 0, let ω t,n < ω * for some (A2) Modulated Stationary Process. The error process is defined as where {η t } is a mean zero strictly stationary process that can be expressed as η t = G(F t ) for some measurable function G and F t = (. . . , t−1 , t ), where t are iid (0,1). We further assume (i) Let ν(s) = lim n→∞ ν ns ,n be some deterministic, positive (cadlag) function on s ∈ [0, 1]. Let ν(s) be piecewise Lipschitz continuous with at most finite number of breaks.
The key feature of the two settings above is that they allow both smooth and abrupt changes in the unconditional variance and second-order properties of u t,n through ω(s) in (A1) (i) and ν(s) in (A2) (i), which depend only on the relative location s = t/n in a deterministic fashion. If this part should be constant, then the two models correspond to the popular stationary models; the linear process for (A1) and stationary causal process (Wu 2005) for (A2). The model (A1) is considered in, for example, Cavaliere (2005), Cavaliere and Taylor (2007, 2008a, 2008b, and Xu and Phillips (2008). The assumptions (A1) (ii) is popular in the linear process literature to ensure the central limit theorem and the invariance principle, and (A1) (iii) implies the existence of the fourth moment of {u t,n }. The model (A2) is adapted from Zhao (2011) and Zhao and Li (2013) and is called the modulated stationary process, which was originally developed to account for the seasonal change, or periodicity, observed in financial or environmental data. The condition (A2) (ii) is slightly stronger than the existence of the fourth moment, which is required in the proof of the bootstrap consistency. (A2) (iii) implies that {u t,n } is short-range dependent and the dependence decays exponentially fast. These two classes of models are both nonstationary and nonparametric, which represent extensions of stationary linear/nonlinear processes to nonstationary weakly dependent processes with unconditional heteroscedasticity.
Remark 2.1. Note that the model (A1) is a special case of Xu and Phillips (2008), but it can be made as general as the framework of Xu and Phillips (2008), by letting e t be a martingale difference sequence with its natural filtration E t satisfying n −1 n t=1 E(e 2 t |E t−1 ) → C < ∞ for some positive constant C, rather than an iid sequence. See Remark 1 of Cavaliere and Taylor (2008b). In this article, we do not pursue this generalization only for the simplicity in the proofs. For (A2), our framework for the modulation, ν(s) is more general than that of Zhao (2011). For example, the so-called "k-block asymptotically equal cumulative variance condition" in Definition 1 in Zhao (2011) rules out a linear trend in the unconditional variance. The model (A2) can be replaced with some mixing conditions for the stationary part, but the details are omitted for simplicity. It is worthwhile to stress that both (A1) and (A2) are based on already popular and widely used stationary models, and they extend those stationary models by incorporating heteroscedasticity in a natural way.
In this article, we are interested in the inference of β = (b 1 , . . . , b p ) , based on the OLS estimator β N = ( n t=1 F t,n F t,n ) −1 ( n t=1 F t,n X t,n ). If the errors satisfy (A1) or (A2) and the fixed regressors F (s) are piecewise Lipschitz continuous, then under certain regularity conditions, the OLS estimator is approximately normally distributed, that is, where the covariance matrix has the sandwich form The statement (2) is a direct consequence of Theorem 3.1. Notice that Q 1 can be consistently estimated by n −1 n t=1 F t,n F t,n , but V depends on the nuisance parameters C(1), , and {ω(s), ν(s); s ∈ [0, 1]}. To perform hypothesis testing or construct a confidence region for β, the conventional approach is to consistently estimate the unknown matrix V using, for example, the HAC estimator in Andrews (1991). The HAC estimator involves a bandwidth parameter, and the finite sample performance critically depends on the choice of this bandwidth parameter. Moreover, the consistency of the HAC estimator is shown under the assumption that the errors are stationary or approximately stationary (Newey and West 1987;Andrews 1991). Alternatively, block-based resampling approaches (Lahiri 2003) such as the moving block bootstrap and subsampling (Politis, Romano, and Wolf 1999) are quite popular to deal with the dependence in the error process, see Fitzenberger (1998), Sherman (1997, and Romano and Wolf (2006) among others. However, these methods are designed for stationary processes with different blocks having the same or approximately the same stochastic property. Note that Paparoditis and Politis (2002) proposed the so-called local block bootstrap for the inference of the mean with locally stationary errors. Their method involves two tuning parameters and no guidance on their choice seems provided.
There have been recently proposed alternative methods to the HAC-based inference for dynamic linear regression models such as Kiefer, Vogelsang, and Bunzel (2000;KVB, hereafter). KVB's approach is closely related to the SN method by Lobato (2001) and Shao (2010). The KVB and SN methods share the same idea that by using an inconsistent estimator of asymptotic variance, a bandwidth parameter can be avoided. In finite samples this strategy is shown to achieve better coverage and size compared to the conventional HAC-based inference. In a recent article by Rho and Shao (2013), the KVB method is shown to differ from the SN method in the regression setting: the KVB method applies an inconsistent recursive estimation of the "meat" part V of the covariance matrix in (2) and consistently estimates the "bread" part Q 1 , whereas the SN method involves the inconsistent recursive estimation for the whole covariance matrix . Although the KVB and SN methods have the same limiting distribution, Rho and Shao (2013) had shown in their simulations that the SN method tends to have better finite sample performance than the KVB method. For this reason, we adopt the SN method for our problem.
To apply the SN method, we need to define an inconsistent estimate of based upon recursive estimates of β. Consider the OLS estimator β t = β t,N of β using the first t Here, N ( ) is an estimate of , but unlike the HAC method, the major role of this self-normalizer is to remove nuisance parameters in the limiting distribution of the SN quantity T N , rather than as a consistent estimator of the true asymptotic variance . With stationary errors, the same self-normalizer does this job well, providing pivotal limiting distributions for T N . However, under our framework of nonstationary errors, the self-normalizer N ( ) cannot completely get rid of the nuisance parameter in the limiting distribution of T N , as seen from the next section.

INFERENCE
For the regressor F t,n , we impose the following assumptions.
tinuous with at most finite number of breaks.
The assumptions (R1) and (R2) are satisfied by commonly used trend functions such as f j (s) = s τ for some nonnegative integer τ , f j (s) = sin(2πs) or cos(2πs). The assumption inf r∈[ ,1] det(Q r ) > 0 in (R1) excludes the collinearity of fixed regressors and is basically equivalent to the assumption (C4) in Zhou and Shao (2013), and the trimming parameter ∈ (0, 1) has to be chosen appropriately. The assumption (R1) appears slightly weaker than Kiefer, Vogelsang, and Bunzel's (2000) Assumption 2, nr −1 nr t=1 F t,n F t,n = Q + o p (1) for some p × p matrix Q, which is identical for all r ∈ (0, 1]. Although, in general, Kiefer, Vogelsang, and Bunzel (2000) allowed for dynamic stationary regressors whereas our framework targets fixed regressors, our method can be extended to allow for dynamic regressors by conditioning on the regressors. Since our work is motivated by the trend assessment of macroeconomic series, it seems natural to assume the regressor is fixed. The assumption (R2) allows for structural breaks in trend functions.
(i) Let the error process {u t,n } be generated from (A1). The recursive OLS estimator β Nr of β converges weakly, that is, . It follows from the continuous mapping theorem that Remark 3.1. For a technical reason, the above functional central limit theorem has been proved on D[ , 1] for some ∈ (0, 1) instead of on D[0, 1]. For most of the trend functions that include polynomial s τ with τ > 1/2 or sin(2πs), this has to be strictly greater than 0. However, for some trend functions, can be set as 0. In particular, when the trend function is a constant mean function, then Q r = 1 for r ∈ (0, 1], B F (r) = r 0 dB(s) = B(r), and B F,ν (r) = r 0 ν(s)dB(s). Our proof goes through when = 0. In any case, the choice of trimming parameter is captured by the first-order limiting distribution in the same spirit of the fixed-b approach (Kiefer and Vogelsang 2005).
Remark 3.2. If the trend function has breaks, the trimming parameter has to be rather carefully chosen so that the condition (R1) can be satisfied. For example, if we use a linear trend function with a jump in the mean, that is, However, as long as we choose > t B /n so that Q r is invertible, the choice of trimming parameter does not seem to affect the finite sample performance much, as we can see in the simulation reported in Section 4.3.
The limiting distribution L F in Theorem 3.1 is not pivotal due to the heteroscedasticity of the error process and nonconstant nature of Q r . If the error process is stationary, then ω(s) ≡ ω in (A1) and ν(s) ≡ ν in (A2) would be canceled out, and the only unknown part in the limiting distribution is Q r . If Q r = Q, r ∈ [ , 1] for some p × p matrix Q, then Q would be canceled out in the limiting distribution, which occurs in the use of the SN method for stationary time series with a constant regressor; see Lobato (2001) and Shao (2010). In our case, the part that reflects the contribution from the temporal dependence, C(1) or , does cancel out. However, the heteroscedasticity remains even if Q r = Q, and the limiting distribution still depends on the unknown nuisance parameter {ω(s), ν(s); s ∈ [0, 1]}.
Estimating unknown parameters in L F seems quite challenging due to the estimation of ω(s), ν(s), and integral of them over a Brownian motion. Instead of directly estimating the unknown parameters, we approximate the limiting distribution L F using the wild bootstrap introduced in Wu (1986). Although there has been some recent attempts to use the wild bootstrap to the trend assessment with nonstationary errors (e.g., Xu (2012) for the linear trend model and Zhao and Li (2013) for the constant mean model), this article seems to be the first to apply the wild bootstrap to the SN-based quantity. The main reason the wild bootstrap works for the SN method in such settings is that the part due to the temporal dependence, which is difficult to capture with the wild bootstrap, no longer exist in the limiting distribution L F of the SN-based quantity so that L F can be consistently approximated by the wild bootstrap. In contrast, without the self-normalization, the unknown nuisance parameter C(1) or still remains in the limiting distribution, and thus, the wild bootstrap is not consistent in this case.
Let u t,n = X t,n − F t,n β N denote the residuals. We first generate the resampled residuals u * t,n = u t,n W t , where W t is a sequence of external variables. Then we generate the bootstrap responses as Based on the bootstrap sample (F t,n , X * t,n ) n t=1 , we obtain the OLS estimator β * N = n t=1 F t,n F t,n −1 n t=1 F t,n X * t,n . Then the sampling distribution of N 1/2 ( β N − β) is approximated by the conditional distribution of N 1/2 ( β * N − β N ) given the data X n , and L F can be approximated by its bootstrap counterpart. We assume t=1 are independent of the data, and E(W 4 1 ) < ∞. In practice, we can sample {W t } n t=1 from the standard normal distribution with mean zero or use the distribution recommended by Mammen (1993). Notice that there are no user-determined parameters, which makes the wild bootstrap convenient to implement in practice. Let F t,n X * t,n be the OLS estimate using the first Nr + p − 1 of the bootstrapped sample (F t,n , X * t,n ) n t=1 . The following theorem states the consistency of the wild bootstrap. (i) If {u t,n } is generated from (A1), we have iii) It follows from the continuous mapping theorem that Remark 3.3. Notice that the wild bootstrap cannot successfully replicate the original distribution without the normalization, unless C 2 (1) = D(1) or = 1, because the wild bootstrap cannot capture the temporal dependence of the original error series. However, in the limiting distribution of N −1/2 Nr ( β Nr − β), the part that reflects the long run dependence, C(1) or , can be separated from the rest, for both data-generating processes (A1) and (A2). This property makes it possible to construct a self-normalized quantity, whose limiting distribution depends only on the heteroscedastic part (captured by the wild bootstrap), not on the temporal dependence part.
Remark 3.4. Hypothesis testing can be conducted following the argument in Zhou and Shao (2013). Let the null hypothesis be H 0 : , and the test can be formed by using the bootstrapped critical values, owing to the fact that in probability under the assumptions of Theorem 3.2. For example, if we let k = 1, λ = 0, and R be a vector of 0's except for the jth element being 1, we can construct a confidence interval or test the significance of individual regression coefficient b j . Let β t,j be the jth element of β t . The 100 N,j is the jth element of the OLS estimate of β using the ith bootstrapped sample, and B is the number of bootstrap replications.
Remark 3.5. Our method is developed in the same spirit of Xu (2012), who focused on the multivariate trend inference under the assumption that the trend is linear and errors follow a VAR(p) model with heteroscedastic innovations. In particular, Xu's Class 4 test used the studentizer first proposed in Kiefer, Vogelsang, and Bunzel (2000) and the wild bootstrap to capture the heteroscedasticity in the innovations. Since the KVB method and the SN method are closely related (see Rho and Shao 2013 for a detailed comparison), it seems that there are some overlap between our work and Xu (2012). However, our work differs from Xu's in at least three important aspects.
1. The settings in these two articles are different. Xu (2012) allowed for multivariate time series and is interested in the inference of linear trends, whereas ours is restricted to univariate time series. Our assumption on the form of the trend function and the error structure are considerably more general. In particular, we allow for more complex trend functions as long as it is known up to a finite-dimensional parameter. In addition, the AR(p) model with heteroscedastic innovation used by Xu (2012) is a special case of our (A1), that is, linear process with heteroscedastic innovations. Also Xu's method does not seem directly applicable to the modulated stationary process (A2) we assumed for the errors. 2. The wild bootstrap is applied to different residuals. In Xu's work, he assumed VAR(p) structure for the error process with known p. He needs to estimate the VAR(p) model to obtain the residuals, which are approximately independent but heteroscedastic. His wild bootstrap is applied to the residuals from the VAR(p) model, and is expected to work due to the well-known ability of wild bootstrap to capture heteroscedasticity. By contrast, we apply the wild bootstrap to OLS residuals from the regression model and our OLS residuals are both heteroscedastic and temporally dependent. The wild bootstrap is not expected to capture/mimic temporal dependence, but since the part that is due to temporal dependence is cancelled out in the limiting distribution of our self-normalized quantity, the wild bootstrap successfully captures the remaining heteroscedasticity and provides a consistent approximation of the limiting distribution. Note that our method works for errors of both types, but it seems that Xu's method would not work for the modulated stationary process as assumed in (A2) without a nontrivial modification. 3. Xu's Class 4 test without prewhitening is in a sense an extension of Kiefer, Vogelsang, and Bunzel (2000) to the linear trend model, whereas our method is regarded as an extension of SN method to the regression setting. In time series regression framework, Rho and Shao (2013) showed that there is a difference between the SN method and the KVB method; see Remark 3.1 therein and also Section 2 for a detailed explanation. It is indeed possible to follow the KVB method in our framework and we would expect the wild bootstrap to be consistent to approximate the limiting distribution of the studentized quantity owing to the separable structure of the errors. To save space, we do not present the details for the KVB studentizer in our setting. Finite sample comparison between our method and Xu's Class 3 and 4 tests are provided in Section 4.2.

SIMULATIONS
In this section, we compare the finite sample performance of our method to other comparable methods in Zhao (2011) and Xu (2012) for (i) constant mean models, (ii) linear trend models, and (iii) linear trend models with a break in the intercept. For the constant mean models, Zhao's method is applicable only when the errors have periodic heteroscedasticity. On the other hand, Xu's method can be used only for the linear trend models when the errors are from an AR process. In the cases where neither Zhao's nor Xu's methods can be directly applied, we compare our method to the conventional HAC method. Although it has not been rigorously proven that the traditional HAC method works in our framework, we can construct a HAC-type estimator that consistently estimates V in (2). Define V n,l n = n −1 n t 1 =1 (t 1 +l n −1)∨n where k(s) is a kernel function that is defined as 0 if |s| ≥ 1 and 1 if s = 0, l n is the bandwidth parameter, and u t,n is the OLS residual, that is, u t,n = X t,n − F t,n β N . This HAC-type estimate V n,l n can be shown to be consistent for appropriately chosen l n , and the inference is based on where Q N = n −1 nF t=1 t,n F t,n , and the χ 2 approximation. In fact, in the linear trend testing context, this statistic is almost identical to the Class 3 test without prewhitening in Xu (2012).

Constant Mean Models
Consider the linear model where μ is an unknown constant. Without loss of generality, let μ = 0. Note that for this constant mean models our method is completely bandwidth free, since the trimming parameter can be set to be 0, as mentioned in Remark 3.1. We conduct two sets of simulations for different kinds of error processes. The first simulation follows the data-generating process in Zhao (2011), where the error process {u t,n } is modulated stationary and has periodic heteroscedasticity. Consider the model (A2) with v(t/n) = cos(3πt/n) and j =0 a j t−j , a j = (j + 1) −ζ /10, ζ > 1/2, where t are independent standard normal variables in M1. We consider M2 with two different kinds of innovations: M2-1 denotes M2 with standard normal innovations, and M2-2 stands for M2 with t-distributed innovations with k = 3, 4, 5 degrees of freedom. We used 10,000 replications to get the coverage and interval length with nominal level 95% and sample size n = 150. In each replication, 1000 pseudoseries of iid N (0, 1) W t are generated. The coverage rates and interval lengths of Zhao's method (Zhao) and our method (SN) are reported in Table 1. Note that our method does not require any userchosen number in this particular example and m in the table is only for Zhao's method. With our choice of v(t/n), Zhao's "k-block asymptotically equal cumulative variance condition" holds for m = 25, 50, 75, . . ., which means that when m = 15 and 30 in M1 and M2-1, the conditions in Zhao (2011) are not satisfied. For the model M2-2, m = 25 is used for Zhao's method.
Zhao (2011) provided a comparison with some existing blockbased bootstrap methods and showed that his method tends to have more accurate coverage rates. Our method is comparable to Zhao's based on our simulation. For the nonlinear threshold autoregressive models M1, it seems that our method delivers slightly less coverage especially when the dependence Table 1. Percentages of coverage of the 95% confidence intervals (mean lengths of the interval in the parentheses) of the mean μ in (5), where μ is set to be zero. The error processes are from Zhao's (2011) setting with periodic heteroscedasticity. The method in Zhao (2011)  is strong and slightly wider confidence intervals compared to Zhao's method with correctly specified m. However, for the linear process models M2, our method has better coverage and slightly narrower intervals when ζ is small, that is, when dependence is too strong so that the short-range dependence condition (A2) (iii) does not hold.
Since this error structure do not show any periodicity, Zhao's method is no longer applicable. We use the HAC-type quantity introduced in (4) to make a comparison with our method. However, the performance of the HAC-based method is sensitive to the user-chosen parameter l n , the choice of which is often difficult. For the purpose of our simulation, we choose the oracle optimal l n , favoring the HAC method. Specifically, since we know the true V, we choose l n such thatl = argmin l || V n,l n − V || 2 F , where || · || F stands for the Frobenius norm. Note that this choice of l n is not feasible in practice. For the choice of the kernel, we use the Bartlett kernel, that is, a(s) = (1 − |s|)1(|s| ≤ 1), which guarantees the positive-definiteness of V n,l n . The results are presented in Table 2. The columns under the HAC (l) represent the result using the HAC-type quantity witĥ l,l/2, and 2l. When ρ = 0, the optimal bandwidths chosen for the HAC estimators are l = 1, so we do not have any values for the column corresponding tol/2.
If ρ = 0, all methods perform very well, which is expected because the error process is simply iid with some heteroscedasticity. The wild bootstrap without the self-normalization is also consistent because C(1) = 1. However, as the dependence increases, the performance of all methods gets worse. If ρ = 0, the wild bootstrap without the self-normalization is no longer consistent and thus produces the worst coverage. Of the three types of methods considered (SN-WB, WB, HAC), the SN method with the wild bootstrap delivers the most accurate coverage probability. For the case of moderate dependence ρ = 0.5, the SN-WB method still performs quite well, while the HAC-type method shows some undercoverage. When the temporal dependence strengthens, that is, ρ = 0.8, the SN-WB method delivers the best coverage. For the HAC-type method, note that the optimal bandwidth is chosen to minimize the distance from the true covariance matrix. However, this choice of the bandwidth may not coincide with the bandwidth that produces the best coverage; as can be seen for the cases (τ, δ) = (0.2, 3) or (τ, δ) = (0.8, 0.33), where 2l tends to produce better coverage thanl. The interval lengths for the SN-WB method are somewhat longer than those of HAC methods, but considering the gain in coverage accuracy, this increase of interval length seems reasonable. Table 2. Percentages of coverage of the 95% confidence intervals (mean lengths of the interval in the parentheses) of the mean μ in (5), where μ is set to be zero. The error processes exhibit nonperiodic heteroscedasticity; (A1)-AR(1) with single volatility shift (6).

Linear Trend Models
In this subsection, we are interested in inference on the linear trend coefficient from a linear trend model X t,n = b 1 + b 2 (t/n) + u t,n , t = 1, . . . , n.
Without loss of generality, let b 1 = 0 and b 2 = 5. The error process {u t,n } follows the same setting as in the second simulation of Section 4.1, except that {u t,n } is either generated from AR(1) or MA(1), that is, AR(1) : u t,n = ρu t−1,n + ε t,n , where ε t,n = ω(t/n)e t , e t iid N(0, 1), and ω(s) as in (6). For this setting, Xu's (2012) methods can be used. In fact, Xu's Class 3 test is almost identical to the HAC-type method in (4). Xu's Class 4 test is a fixed-b version of his Class 3 test, and it is closely related to our method, as mentioned in Remark 3.5. For this reason, we compare our method with Xu's Class 3 and 4 tests. In addition to his original statistics, Xu further applied the prewhitening. In our simulation, we examine finite sample performance of the following methods: "SN-WB( )" Our method with various values for the trimming parameter , "WB" Wild bootstrap without self-normalization, "F3" Xu' s Class 3 test without prewhitening with χ 2 limiting distribution, "PW3" Xu's prewhitened Class 3 test with χ 2 limiting distribution, "iid" Xu' s prewhitened Class 3 test with the iid bootstrap, "WB" Xu's prewhitened Class 3 test with the wild bootstrap, "F4" Xu's Class 4 test without prewhitening with the wild bootstrap, "PW4" Xu's prewhitened Class 4 test with the wild bootstrap.
For Xu's Class 3 tests, we need to choose the truncation parameter. Following Xu, we used Andrews' (1991) bandwidth selection for AR(1) or MA(1) model with Bartlett kernel. In the implementation of the wild bootstrap for Xu's method, we assumed the true order of the AR structure of the error process is known.
Remark 4.1. It should be noted that a direct comparison of our method and Xu's methods is not fair. We gave some advantages to Xu's methods in two aspects. The first aspect is that we assume the knowledge of the true error model, VAR(1), following Xu (2012), which may be unrealistic in practice. Xu's methods can take advantages of this knowledge of the true model; one advantage is that their bootstrap samples are generated following the correctly specified data-generating process, and the other advantage is that his Class 3 tests can be based on the most (theoretically) efficient choice of the bandwidth parameters. In contrast, our method does not rely on parametric assumption on the error structure, so knowing the true data-generating process does not provide useful information in the implementation of our method. The second aspect is that some of Xu's methods use Table 3. Percentages of coverage of the 95% confidence intervals (mean lengths of the interval in the parentheses) of the linear trend model (7) with (b 1 , b 2 )=(0, 5), (A1)-AR(1), single volatility shifts. The number of replications is 2000, and 1000 pseudoseries with iid N(0, 1)  another layer of correction, prewhitening. The prewhitening can be highly effective when the true order is known. The only tests that our method is directly comparable with are Xu's original Class 3 test (F3) and Class 4 test (F4), for which the advantage from knowing the true model may still exist. Table 3 presents the finite sample coverage rates of 95% confidence intervals of the linear trend coefficient b 2 of (7), along with the mean interval lengths in parentheses, for the AR(1) models. Comparing our method with the HAC method (Xu's Class 3 test without prewhitening), our method always Table 4. Percentages of coverage of the 95% confidence intervals (mean lengths of the interval in the parentheses) of the linear trend model (7) with (b 1 , b 2 )=(0, 5), (A1)-MA(1), single volatility shifts. The number of replications is 2000, and 1000 pseudoseries with iid N(0, 1)  have more accurate coverage rates regardless of the choice of the trimming parameter . Not surprisingly, prewhitening is very effective in bringing the coverage level closer to the nominal level. Xu's Class 4 test seems to provide the best coverage rates, for both prewhitened and nonprewhitened versions.
However, Xu's methods and his prewhitening are based on the assumption that the errors are from an AR model with a known lag. When the errors are not from an AR model, then the performance may deteriorate. As can be seen from Table 4, where the errors are generated from an MA model, Table 5. Percentages of coverage of the 95% confidence intervals of b 2 (mean lengths of the interval in the parentheses) of the linear trend model with a break in intercept (8)   In this linear trend assessment setting, we need to use > 0 so that our method is theoretically valid. As can be seen in Table 3, if the dependence is weak or moderate (ρ = 0 or 0.5), the choice of does not affect the coverage accuracy much, and = 0.1 or above seems appropriate. However, the effect of is more apparent when the dependence is stronger, that is, when ρ = 0.8. It seems that ∈ {0.2, 0.3, 0.4} are all reasonable choices here, considering the amount of improvement in coverage percentage. Notice that even when our choice of is not optimal, the SN-WB method delivers more accurate coverage than the HAC method. In terms of the interval length, there is a mild monotonic relationship between and interval length. The larger is, the longer the interval tends to become. As a rule of thumb in practice, we can use ∈ [0.2, 0.4] if the temporal dependence in the data is likely to be strong, otherwise, we can also use = 0.1, which was also recommended by Zhou and Shao (2013) under a different setting.

Linear Trend Models With a Break in Intercept
This section concerns the linear trend models with a break in its mean level at a known point, X t,n = b 1 + b 2 1(t/n > s b ) + b 3 (t/n) + u t,n , t = 1, . . . , n, where s b ∈ (0, 1) indicates the relative location for the intercept break, which is assumed to be known. Without loss of generality, let s b = 0.3, b 1 = 0, b 2 = 2, and b 3 = 5, which was modified from the model we fit in the data illustration in Section 5. Our interest is on constructing confidence intervals for the break parameter b 2 . For the error process {u t,n }, we use the AR(1) model, as introduced in the second simulation of Section 4.1. Because of the general form of the trend function, neither Xu's nor Zhao's methods work in this setting. Here, we compare the HAC-type method defined in (4) with our SN-WB( ) method, and the wild bootstrap alone. For the HAC-type method, the bandwidth parameter is chosen following Andrews' (1991) suggestion for the AR(1) model, so in a sense, this HAC-type method takes some advantage of the knowledge of the true model. Table 5 presents the empirical coverage rates for 95% confidence intervals of b 2 . The mean interval lengths are in parentheses. When there is no temporal dependence, that is, ρ = 0, all methods work well. When there is moderate (ρ = 0.5) or strong (ρ = 0.8) dependence, the wild bootstrap is no longer consistent. In this case, the HAC method with Andrews' bandwidth parameter choice also shows some departures from the nominal level, converging to its limit much slower than our method. In this setting, our SN-WB method provides the most accurate coverage rates. Even the worst coverage of our SN-WB method for a range of 's under examination seems to be more accurate than the HAC method in all simulation settings. Notice that for this kind of linear trend model with a break in the intercept, the trimming parameter in our method should be chosen so that > s b . See Remark 3.2 for more details. In this simulation, since s b = 0.3, we choose to be 0.35, 0.4, 0.5, 0.6, and 0.7. The results does not seem to be sensitive to the choice of this trimming parameter.

DATA ANALYSIS
In this section, we illustrate our method using the logarithm of the U.S. nominal wage data, which was originally from Nelson and Plosser (1982) for the period 1900-1970 and has been intensively analyzed in econometrics literature, including Perron (1989). The latter author argued that this series has a fairly stable 1900 1920 1940 1960 1980   linear trend with a sudden decrease in level between 1929 and 1930 (the Great Crash). In this section, we examine his assertion for an extended dataset for the period 1900-1988, provided by Koop and Steel (1994). The left panel of Figure 1 presents the data fitted with the linear trend model with one break in the intercept X t,n = b 1 + b 2 1(t > 30) + b 3 (t/n) + u t,n , t = 1, . . . , n = 89.

Logarithm of Nominal Wages
Here the second regressor 1(t > 30) represents the break at year 1929. From the residual plot on the right panel of Figure 1, the error appears to exhibit some degree of heteroscedasticity. Our inference method, which is robust to this kind of heteroscedasticity and weak temporal dependence, is expected to be more appropriate than other existing methods, the validity of which hinges on the stationarity of the errors. Table 6 reports the 100(1 − α)% confidence intervals with α = 0.01, 0.05, 0.1, for the three coefficients using our method. For the wild bootstrap, we used 1000 pseudoseries {W t } n t=1 drawn from iid N (0, 1). For the choice of the trimming parameter , we use = 0.4, 0.5, 0.6. Note that since there is a break in the regressor at 30/89 = 0.34, cannot be less than 0.34, see Remark 3.2. As can be seen in Table 6, the resulting confidence intervals are not very sensitive to the choice of the trimming parameter . In fact, the confidence intervals unanimously suggest that the linear trend is significantly positive, that is, the nominal wage is linearly increasing at significance level 1% with a 99% confidence interval of about [3.6,6], and the decrease at the Great Crash is significant at 5% level but not at 1% level.

CONCLUSION
In this article, we study the estimation and inference of nonstationary time series regression models by extending the SN method (Lobato 2001;Shao 2010) to the regression setting with fixed parametric regressors and weakly dependent nonstationary errors with unconditional heteroscedasticity. Due to the heteroscedasticity and nonconstant regressor, the limiting distribution of the SN quantity is no longer pivotal and contains a number of nuisance parameters. To approximate the limiting distribution, we apply the wild bootstrap (Wu 1986) and rigorously justify its consistency. Simulation comparison demonstrates the advantage of our method in comparison with HAC-based alternatives. The necessity of self-normalization is also demonstrated in theory and finite samples.
The two kinds of heteroscedastic time series models we consider in this article are natural generalizations of the classical linear process and the stationary causal process to capture unconditional heteroscedasticity often seen on the data. However, it is restricted only to the short-range dependence as seen from our condition (A1) (ii) and (A2) (iii). When there is long-range dependence, it is not clear if the SN method is still applicable. This can be an interesting topic for future research. Furthermore, the OLS is a convenient but not an efficient estimator of the regression parameter when the errors exhibit autocorrelation and heteroscedasticity. How to come up with a more efficient estimator and perform the SN-based inference is worthy of a careful investigation. Finally, our method assumes a parametric form for the mean function so it is easy to interpret and the prediction is straightforward. However, a potential drawback is that the result may be biased if the parametric trend function is misspecified. We can avoid this misspecification problem by preapplying Zhang and Wu's (2011) test to see if a parametric form fits the data well. The inferential procedure developed here can be readily used provided that a suitable parametric form is specified.

SUPPLEMENTARY MATERIALS
Technical Appendix: Proofs of Theorems 3.1 and 3.2. (pdf file)

ACKNOWLEDGMENTS
The research is supported in part by NSF grants DMS08-04937 and DMS11-04545. The authors are grateful to the referees for constructive comments that led to a substantial improvement of the article.