Forward detrending for heteroskedasticity-robust panel unit root testing

Abstract The variances of most economic time series display marked fluctuations over time. Panel unit root tests of the so-called first and second generation are not robust in such cases. In response to this problem, a few heteroskedasticity-robust panel unit root tests have been proposed. An important limitation of these tests is, however, that they become invalid if the data are trending. As a prominent means of drift adjustment under the panel unit root hypothesis, the (unweighted) forward detrending scheme of Breitung suffers from nuisance parameters if the data feature time-varying variances. In this article, we propose a weighted forward-detrending scheme. Unlike its unweighted counterpart, the new detrending scheme restores the pivotalness of the heteroskedasticity-robust panel unit root tests suggested by Demetrescu and Hanck and Herwartz et al. when applied to trending panels with heteroskedastic variances. As an empirical illustration, we provide evidence in favor of non-stationarity of health care expenditures as shares of GDP in a panel of OECD economies.


Introduction
Time-varying variances are a common feature of many macroeconomic time series. This could, for instance, reflect economic crises or major policy changes. As a prime example for the latter, one could think of the so-called Great Moderation, a decline in the variance of trending macroeconomic series, such as output and prices, from the mid-1980s until the great recession (e.g. Bernanke, 2004; Stock and Watson, 2002). When time series exhibit changes in variance, panel unit root tests (PURTs) of the first and second generation (e.g. Breitung and Das, 2005; Levin et al., 2002) suffer from nuisance parameters and, hence, could provide misleading inferential outcomes (Herwartz et al., 2016). To address this problem, recent studies have proposed new tests that remain pivotal under time-varying variances. Demetrescu and Hanck (2012a,b) propose PURTs based on the Cauchy estimator, which employs the sign function to instrument the lagged level series. Herwartz et al. (2016) show that the White-type test in Herwartz and Siedenburg (2008), which is a non-Cauchy counterpart of the PURT in Demetrescu and Hanck (2012a), is also robust to variance breaks.
To retain the pivotalness of first and second generation PURTs in the presence of linear trends, the adjustment scheme suggested in Breitung (2000) has become an established detrending procedure (Baltagi and Moscone, 2010; Herwartz and Siedenburg, 2008; Narayan and Smyth, 2008). As a particular merit of removing deterministic terms from the data in a way that respects the null hypothesis, subsequent steps of the testing procedure do not require the bias adjustments that are inevitable if PURT regressions pool projections of panel time series on cross-section-specific intercepts or trends.¹ At its core, the detrending approach suggested by Breitung (2000) consists of time-varying drift estimates obtained from unconditional averaging of future time series changes. Henceforth, we refer to this approach as unweighted forward detrending (UFD). While UFD immunizes PURTs against linear trends under homoskedasticity, it introduces a nuisance to the location of the asymptotic distribution of the heteroskedasticity-robust PURT statistics under heteroskedasticity. As a result, the heteroskedasticity-robust PURTs of Demetrescu and Hanck (2012a) and Herwartz et al. (2016) lack practical applicability in the important case of macroeconomic data that are subject to both linear trends and variance changes.
Addressing heteroskedasticity-robust panel unit root testing, the test in Westerlund (2014) relies on the restrictive assumption of a trend common to all cross-sectional units. Building upon the detrending scheme of Demetrescu and Hanck (2016), Herwartz et al. (2019) suggest modifying the White-type test in Herwartz et al. (2016) by adjusting for its non-zero expectation under heteroskedasticity. While this adjustment yields an asymptotically Gaussian test statistic in heteroskedastic panels with cross-section specific drift terms, it might come with sizeable finite sample size distortions if prewhitening is also applied to remove serial correlation from the data. Providing further devices to immunize PURTs against various sources of nuisance parameters, suitable resampling schemes have also been proposed in the literature (e.g. Herwartz and Walle, 2018; Smeekes and Urbain, 2014). Demetrescu and Hanck (2016) have noticed the limitations of UFD in heteroskedastic and trending panels and argue in favor of the potential merits of weighted forward detrending. Acknowledging the difficulty in estimating time-varying variances (i.e. the latent weights), however, they did not pursue this avenue further. In this article, we take up the conjecture raised by Demetrescu and Hanck (2016), and suggest weighted forward detrending (henceforth, WFD) as an alternative to the UFD scheme of Breitung (2000).² WFD rests on familiar motivations of weighted least squares estimation, and delivers an accurate centering of the (common) numerator of the heteroskedasticity-robust PURT statistics discussed in this article. Hence, it is applicable in the case of heteroskedastic trending panels. To make WFD feasible, we propose a simple way of estimating time-varying variances in a non-parametric manner. Henceforth, we refer to the feasible variant of WFD as FWFD.
Asymptotically, FWFD approximates WFD, and, hence, provides an accurate centering of the heteroskedasticity-robust PURTs of Demetrescu and Hanck (2012a) and Herwartz et al. (2016). Simulation outcomes confirm asymptotic results. FWFD leads to very good finite sample properties of the two tests in the presence of linear trends and time-varying variances. Moreover, the test in Demetrescu and Hanck (2012a) displays remarkably good size control even if the series are not only trending and heteroskedastic, but also cross-sectionally correlated along a (specific) common factor structure.
As an empirical illustration, we analyze the trending behavior of the share of health care expenditures in GDP for the case of 20 OECD member countries and the period 1970-2019. Seeing that the trending panel data display significant changes in variances, we detrend the data by means of FWFD and apply heteroskedasticity-robust PURTs. Results show that employing UFD could imply substantially different inferential outcomes in comparison with using FWFD. In general, the empirical evidence based on the heteroskedasticity-robust PURTs implies that the share of health care expenditures in GDP is well characterized as a panel random walk with drift, and not as a panel trend stationary process.

¹ Levin et al. (2002) discuss the case of removing deterministic terms from the panel by means of cross-section specific ordinary least squares (OLS) regressions. Owing to data centering by means of OLS parameter estimates, both the numerator and denominator of the test statistic in Levin et al. (2002) are subject to estimation bias. Levin et al. (2002) suggest (bias) adjustments for centering and rescaling the numerator of their test statistic. Results in Breitung (2000) highlight that OLS detrending with subsequent bias correction comes at the cost of substantial power loss in comparison with the performance of the Levin et al. (2002) test in purely stochastic panels (see also respective discussions in Bai and Ng, 2004). Moreover, it is noteworthy that bias adjustments might become overly complicated under heteroskedasticity of unknown form.

² We concentrate on the detrending scheme proposed in Breitung (2000). If forward demeaning is applied to first differences, detrending lagged levels according to the method in Demetrescu and Hanck (2016) invokes marked power losses of PURTs in comparison with transforming lagged levels following the method in Breitung (2000). Corresponding simulation results are available upon request.
Section 2 (i) states the testing problem at hand, (ii) motivates the WFD scheme, (iii) provides its feasible counterpart FWFD, and (iv) states the asymptotic properties of FWFD. The finite sample performances of heteroskedasticity-robust tests under alternative detrending strategies (UFD, WFD, and FWFD) are evaluated by means of a Monte Carlo study in Section 3. As an empirical illustration, the shares of health care expenditure in GDP are subjected to panel unit root testing in Section 4. Section 5 concludes.

Heteroskedasticity-robust PURTs
In this section, we first consider purely stochastic panels and sketch the heteroskedasticity-robust PURTs of Demetrescu and Hanck (2012a) and Herwartz et al. (2016). Subsequently, we augment the stochastic model with deterministic trends and illustrate that both PURTs are subject to nuisance parameters if the drift terms are removed by means of UFD under heteroskedasticity. As a solution, we propose WFD and state that the heteroskedasticity-robust tests of Demetrescu and Hanck (2012a) and Herwartz et al. (2016) are asymptotically Gaussian after heteroskedastic panel data are detrended by means of WFD or its feasible counterpart FWFD.

A purely stochastic model
A purely stochastic first order panel autoregression with given presample values and heteroskedastic residuals can be specified as

$$y_t = \rho y_{t-1} + e_t, \quad t = 1, 2, \ldots, T, \qquad (1)$$

where $y_t = (y_{1t}, \ldots, y_{Nt})'$, $y_{t-1} = (y_{1,t-1}, \ldots, y_{N,t-1})'$ and $e_t = (e_{1t}, \ldots, e_{Nt})'$ are $N \times 1$ vectors. Henceforth, we use $i$, $i = 1, 2, \ldots, N$, to indicate cross section members. PURTs are used to test the null hypothesis of a panel random walk ($H_0: \rho = 1$) against the alternative hypothesis of panel stationarity ($H_1: \rho < 1$) in (1). However, since assuming a homogeneous autoregressive coefficient under the alternative hypothesis seems restrictive, it can be relaxed to the case of cross-section specific stationary autoregressive parameters, $H_1: \rho_i < 1$. Breitung and Pesaran (2008) highlight that the power of first and second generation PURTs depends on the average of the individual-specific autoregressive coefficients (Westerlund and Breitung, 2013). By implication, pooled PURTs also have power against a 'mixed' alternative hypothesis where mean reverting behavior is stated to hold only for some non-empty subset of cross sectional units (leaving the remaining units unrestricted). We do not further consider such alternatives, as their scope for testing economic theory is limited. For instance, when it comes to testing behavioral models (e.g. interest or purchasing power parities), a test decision in favor of a mixed alternative cannot be interpreted to support a specific theoretical model at the panel level (Jönsson, 2005). We make the following assumptions:

Assumptions A:
(i) $e_t$ is serially uncorrelated with mean 0 and covariance $\Omega_t$.
(ii) For all $t$, $\Omega_t$ is a positive definite matrix with eigenvalues $\bar{c} > \lambda_t > \underline{c} > 0$, where $\underline{c}$ and $\bar{c}$ are positive real numbers bounding the $N$ eigenvalues from below and above, respectively.
(iii) $E[u_{it}^p u_{jt}^p u_{kt}^p u_{lt}^p] < \infty$ for all $i, j, k, l$ and some $p > 1$, where $u_{\bullet t}$, $\bullet \in \{i, j, k, l\}$, denote typical elements of $u_t = \Omega_t^{-1/2} e_t$. Here, we set $\Omega_t^{1/2} = \Gamma_t \Lambda_t^{1/2} \Gamma_t'$, where $\Lambda_t$ is a diagonal matrix of eigenvalues of $\Omega_t$ and the columns of $\Gamma_t$ are the corresponding eigenvectors.
Assumptions A are standard to establish weak cross-sectional dependence and finite fourth order moments of the panel residuals (see, for instance, Herwartz et al., 2016). The exclusion of serial dependence in A(i) is clearly restrictive for practical purposes. However, removing the autoregressive dynamics by means of prewhitening schemes leaves the validity of the PURTs considered in this work unaffected (for details see footnote 7 in Section 3). More specific assumptions on the covariance matrix have been made in the earlier literature on first and second generation PURTs. Assuming a diagonal covariance structure, for instance, the test of Levin et al. (2002) is valid for homoskedastic, cross sectionally independent panels, $\Omega_t = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_N)$. Allowing time invariant cross sectional correlation, Breitung and Das (2005) assume the covariance matrix to be time invariant ($\Omega_t = \Omega$). In this study, the marginal variances (i.e. the diagonal elements of $\Omega_t$) are allowed to evolve in a non-stationary manner. To enable variance estimation, however, the variance processes are subject to regularity conditions that we formalize in Assumptions B below.
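For testing the unit root hypothesis in purely stochastic (and heteroskedastic) panels the test statistics of Demetrescu and Hanck (2012a) and Herwartz et al. (2016) read as

$$t_{DH} = \frac{\sum_{t=1}^{T} \mathrm{sgn}(y_{t-1})' \Delta y_t}{\sqrt{\sum_{t=1}^{T} \mathrm{sgn}(y_{t-1})' \Delta y_t \Delta y_t' \, \mathrm{sgn}(y_{t-1})}} \quad \text{and} \quad t_{HSW} = \frac{\sum_{t=1}^{T} y_{t-1}' \Delta y_t}{\sqrt{\sum_{t=1}^{T} y_{t-1}' \Delta y_t \Delta y_t' y_{t-1}}}, \qquad (2)$$

respectively, where sgn(g) indicates the sign function, i.e. sgn(g) = 1 for g > 0 and sgn(g) = -1 for g < 0.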
The assumptions A establish that both statistics converge to a joint limit where both data dimensions approach infinity simultaneously, denoted as $N, T \to \infty$. Specifically, both statistics are asymptotically Gaussian under the panel unit root null hypothesis and diverge to $-\infty$ under the alternative hypothesis as $N, T \to \infty$. Both statistics have originated as generalizations of second generation PURTs which are robust to cross sectional correlation of model residuals. They offer robustness against time-varying second order features of panel residuals through the non-parametric 'White-type' covariance estimation in their denominators. Unlike $t_{HSW}$, the test statistic $t_{DH}$ builds upon the (instrumental) 'Cauchy' estimator, which utilizes only the sign, and not the magnitude, of the lagged level series.
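The computation of the two statistics can be illustrated with a minimal sketch. It assumes the pooled White-type forms in which each numerator sums $\mathrm{sgn}(y_{t-1})'\Delta y_t$ (DH) or $y_{t-1}'\Delta y_t$ (HSW) over time and each denominator is the square root of the sum of the squared summands. The function names `sgn` and `purt_statistics`, the list-of-lists data layout, and the treatment of sgn(0) are assumptions made for illustration only.

```python
import math

def sgn(x):
    # Sign function; mapping zero to +1 is an assumption of this sketch.
    return 1.0 if x >= 0 else -1.0

def purt_statistics(y):
    """Return (t_DH, t_HSW) for a panel y = [y_0, y_1, ..., y_T].

    Each y_t is a list of N levels; y_0 holds the presample values.
    """
    num_dh = den_dh = num_hsw = den_hsw = 0.0
    for t in range(1, len(y)):
        lag = y[t - 1]
        dy = [a - b for a, b in zip(y[t], lag)]
        a_dh = sum(sgn(l) * d for l, d in zip(lag, dy))  # sgn(y_{t-1})' dy_t
        a_hsw = sum(l * d for l, d in zip(lag, dy))      # y_{t-1}' dy_t
        num_dh += a_dh
        num_hsw += a_hsw
        den_dh += a_dh ** 2    # White-type variance contribution
        den_hsw += a_hsw ** 2
    return num_dh / math.sqrt(den_dh), num_hsw / math.sqrt(den_hsw)
```

Since both statistics diverge to minus infinity under the alternative, the test is left-tailed: large negative values speak against the panel unit root.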

Data adjustment in trending panels
As an important limitation, the test statistics in (2) lack applicability if the heteroskedastic panel data comprise deterministic terms such as non-zero intercepts or linear trends. Going beyond the purely stochastic model in (1) and adopting the model representation of Pesaran (2007), a panel autoregression featuring weak cross-sectional dependence and deterministic terms is stated in (3), where the vectors $\mu = (\mu_1, \ldots, \mu_N)'$ and $\delta = (\delta_1, \ldots, \delta_N)'$ stack cross-section specific intercepts and trend parameters, respectively. While assuming a presample initialization of $y_0 = 0$ is not uncommon in the related literature to facilitate asymptotic derivations, it might be criticized for being overly restrictive. Madsen (2010) discusses asymptotic (i.e. $N \to \infty$, $T$ fixed/small) properties, in particular power, of several panel unit root tests and their dependence on initial conditions. As an alternative to fixed initializations, one could relax the model by considering random (i.e. $O_p(1)$) initial values $y_{i0}$ stemming, for instance, from the stationary distribution under the alternative hypothesis. Westerlund and Breitung (2013) show that, even in the full asymptotic case ($N, T \to \infty$), initial conditions matter for local power if panel unit root statistics are immunized against deterministic terms by means of cross-section specific OLS regressions. In contrast, pooling data after suitable adjustments under the null hypothesis might result in test properties that are independent of the initialization. In the framework of (3) the null hypothesis of interest is the panel random walk with drift model ($H_0: \rho = 1$), such that the data align with the representations

$$\Delta y_t = \mu + e_t \quad \text{and} \quad y_t = y_0 + \mu t + \sum_{j=1}^{t} e_j. \qquad (4)$$

Under the alternative hypothesis $H_1: \rho < 1$ the panel is trend stationary. Without further adjustments, the numerators of the statistics $t_{DH}$ and $t_{HSW}$ in (2) lack a suitable centering in the presence of linear trends under the null hypothesis.
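For instance, inserting the right hand side of (4) into (2) shows that the numerators of the test statistics $t_{DH}$ and $t_{HSW}$ comprise the nuisance components $\sum_{t=1}^{T} \mathrm{sgn}(y_{t-1})'\mu$ and $\sum_{t=1}^{T} (t-1)\mu'\mu$, respectively, which lead to non-zero expectations of $t_{DH}$ and $t_{HSW}$ under the null hypothesis. For the case of homoskedastic and trending panels, the UFD suggested in Breitung (2000) is capable of isolating the stochastic components of $y_t$ and $\Delta y_t$ in (4) and thereby centering the numerator of both statistics under the null hypothesis of a panel unit root with drift. To implement the UFD, the level data and the first differences in (2) are replaced by

$$y^{\blacktriangle}_{t-1} = y_{t-1} - y_0 - (t-1)\tilde{\mu} \quad \text{and} \quad \Delta y^{\blacktriangle}_t = s_t (\Delta y_t - \tilde{\mu}_{t+1}), \qquad (5)$$

respectively, where (i) $y_0$ and $(t-1)\tilde{\mu}$ are, respectively, mean and time-trend estimators that are consistent with the null hypothesis,³ (ii) $\tilde{\mu}_{t+1}$ conditions on forward information which is essential for an accurate centering of $t_{DH}$ and $t_{HSW}$, and (iii) $s_t^2 = (T-t)/(T-t+1)$. Specifically, the unweighted drift and forward drift estimators are

$$\tilde{\mu} = \frac{1}{T} \sum_{t=1}^{T} \Delta y_t \quad \text{and} \quad \tilde{\mu}_{t+1} = \frac{1}{T-t} (\Delta y_{t+1} + \ldots + \Delta y_T). \qquad (6)$$

Taking account of the definition of $\Delta y_t$ in (4), the estimators in (6) are related to the true drift parameters by

$$\tilde{\mu} = \mu + \frac{1}{T} (e_1 + e_2 + \ldots + e_T) \quad \text{and} \quad \tilde{\mu}_{t+1} = \mu + \frac{1}{T-t} (e_{t+1} + e_{t+2} + \ldots + e_T). \qquad (7)$$

Hence, using the representations in (4) and the results in (7), the adjusted data read as

$$y^{\blacktriangle}_{t-1} = y_{t-1} - y_0 - (t-1)\tilde{\mu} = \sum_{j=1}^{t-1} e_j - \frac{t-1}{T} \sum_{j=1}^{T} e_j \qquad (8)$$

and

$$\Delta y^{\blacktriangle}_t = s_t (\Delta y_t - \tilde{\mu}_{t+1}) = s_t \Big( e_t - \frac{1}{T-t} \sum_{j=t+1}^{T} e_j \Big). \qquad (9)$$

Now consider the expectation of a typical element entering the numerator of the $t_{HSW}$ statistic after data adjustment, i.e.

$$E[y^{\blacktriangle\prime}_{t-1} \Delta y^{\blacktriangle}_t] = s_t \sum_{i=1}^{N} E\Big[ \Big( \sum_{j=1}^{t-1} e_{ij} - \frac{t-1}{T} \sum_{j=1}^{T} e_{ij} \Big) \Big( e_{it} - \frac{1}{T-t} \sum_{j=t+1}^{T} e_{ij} \Big) \Big]. \qquad (10)$$

Since $e_t$ is serially uncorrelated, the rightmost expressions in (8) and (9) obtain

$$E[y^{\blacktriangle\prime}_{t-1} \Delta y^{\blacktriangle}_t] = -s_t \frac{t-1}{T} \sum_{i=1}^{N} \Big( \sigma^2_{it} - \frac{1}{T-t} \sum_{j=t+1}^{T} \sigma^2_{ij} \Big), \qquad (11)$$

where $\sigma^2_{it}$ denotes the $i$th diagonal element of $\Omega_t$. Hence, after adopting UFD, terms entering the numerator of $t_{HSW}$ are generally distinct from mean zero, i.e. $E[y^{\blacktriangle\prime}_{t-1} \Delta y^{\blacktriangle}_t] \neq 0$, with the case of homoskedastic variances considered in Breitung (2000; $\sigma^2_{it} = \sigma^2_i$) as an important exception. To establish an appropriate centering of the numerator of $t_{HSW}$ under heteroskedasticity, it is helpful to observe that after division of all stochastic terms $\{e_{it}\}_{t=1}^{T}$ in (10) by their standard deviations $\{\sigma_{it}\}_{t=1}^{T}$ the expectation in (11) is zero. Therefore, and in analogy to weighted least squares estimation, we make use of a weighting scheme to estimate the drift parameters in $\mu$ (under the null hypothesis). Specifically, we suggest replacing the unweighted drift ($\tilde{\mu}$) and forward drift estimates ($\tilde{\mu}_{t+1}$) in (7) by weighted counterparts $\hat{\mu}$ and $\hat{\mu}_{t+1}$ that comprise, respectively, the typical elements

$$\hat{\mu}_i = \frac{\sum_{t=1}^{T} \sigma_{it}^{-2} \Delta y_{it}}{\sum_{t=1}^{T} \sigma_{it}^{-2}} \qquad (12)$$

and

$$\hat{\mu}_{i,t+1} = \frac{\sum_{j=t+1}^{T} \sigma_{ij}^{-2} \Delta y_{ij}}{\sum_{j=t+1}^{T} \sigma_{ij}^{-2}}. \qquad (13)$$

From common results for weighted least squares estimation we have

$$\hat{\mu}_i = \mu_i + \frac{\sum_{t=1}^{T} \sigma_{it}^{-2} e_{it}}{\sum_{t=1}^{T} \sigma_{it}^{-2}} \quad \text{and} \quad \hat{\mu}_{i,t+1} = \mu_i + \frac{\sum_{j=t+1}^{T} \sigma_{ij}^{-2} e_{ij}}{\sum_{j=t+1}^{T} \sigma_{ij}^{-2}}. \qquad (14)$$

With the results in (14), the cross-section specific counterparts of (8) and (9) are given in (15) and (16), respectively, to which we refer as WFD detrended data. The centering of the numerator of $t_{HSW}$ then obtains by virtue of (4), (15) and (16).

³ For testing the random walk model against a stationary panel with non-zero expectation (i.e. $y_t = (1-\rho)\mu + \rho y_{t-1} + e_t$), applying UFD is not necessary. Rather, consistent panel unit root testing is possible, even under heteroskedasticity, by simply subtracting initial observations from the data, i.e. by applying heteroskedasticity-robust PURTs on $y^{\blacktriangle}_t = y_t - y_0$.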
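The forward detrending steps described in this subsection can be sketched for a single cross-section unit as follows. The unweighted branch follows the UFD description (subtract $y_0$ and $(t-1)$ times the drift estimate from lagged levels, subtract the forward drift from differences, rescale by $s_t$). The weighted branch uses weights $1/\sigma_t^2$ in the spirit of weighted least squares. The function names, the list layout, and the choice to leave weighted differences unscaled are assumptions of this sketch, not a statement of the article's exact transformation.

```python
import math

def forward_drifts(dy, w=None):
    """Drift and forward drift estimates for one cross-section unit.

    dy: [dy_1, ..., dy_T]; w: weights 1/sigma_t^2 (None means unweighted).
    Returns (mu, mu_fwd), where mu_fwd[t-1] uses dy_{t+1}, ..., dy_T only.
    The last entry is None because no future observation is left.
    """
    T = len(dy)
    w = [1.0] * T if w is None else list(w)
    mu = sum(a * b for a, b in zip(w, dy)) / sum(w)
    mu_fwd = []
    for t in range(1, T + 1):
        w_f, dy_f = w[t:], dy[t:]
        mu_fwd.append(sum(a * b for a, b in zip(w_f, dy_f)) / sum(w_f) if w_f else None)
    return mu, mu_fwd

def forward_detrend(y, w=None):
    """y: [y_0, ..., y_T] for one unit.

    Returns adjusted lagged levels and differences for t = 1, ..., T-1.
    """
    T = len(y) - 1
    dy = [y[t] - y[t - 1] for t in range(1, T + 1)]
    mu, mu_fwd = forward_drifts(dy, w)
    lag, diff = [], []
    for t in range(1, T):
        # s_t rescaling as stated for UFD; leaving the weighted case
        # unscaled is an assumption of this sketch.
        s_t = math.sqrt((T - t) / (T - t + 1)) if w is None else 1.0
        lag.append(y[t - 1] - y[0] - (t - 1) * mu)
        diff.append(s_t * (dy[t - 1] - mu_fwd[t - 1]))
    return lag, diff
```

Applied to a purely deterministic linear trend, both branches return zeros, which mirrors the idea that the adjustment isolates the stochastic component under the null hypothesis.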

Feasible detrending and heteroskedasticity-robust PURTs
Estimating the weighted means in (14) relies on the true cross-section specific and time-varying variances which are, however, unknown in practice. To estimate the variances, we proceed similarly to Boswijk (2005) and Beare (2018), who adapt classical univariate unit root tests to apply in the presence of non-stationary heteroskedasticity by means of kernel estimators. Non-parametric variance estimates $\hat{\sigma}^2_{it}$ obtain from kernel smoothing of the squared centered first differences, where $K_h(\cdot) = K(\cdot/h)/h$, $K(\cdot)$ is a kernel function, $h$ is the bandwidth and the centering uses $\tilde{\mu}_i$, a typical element of the estimator in (6) (replaced by its feasible weighted counterpart $\check{\mu}_i$ in case of iterative estimation). For consistency of the non-parametric variance estimator, we impose the following regularity conditions.
Assumptions B:
(i) $K(\cdot)$ is a symmetric kernel function with compact support and $\int K(x)\,dx = 1$.
(ii) The bandwidth $h$ satisfies $h = aT^b$, where $0 < a < \infty$ and $b \in (1/(4p), 1)$ for $p$ determined by assumption A(iii).
(iii) The cross-section specific variance process converges weakly in $D[0,1]$, $\sigma^2_{i,\lfloor sT \rfloor} \to_w \sigma^2_i(s)$, $s \in [0,1]$, for $T \to \infty$, where $\lfloor sT \rfloor$ denotes the integer part of $sT$. The limit process $\sigma^2_i(s)$ has continuous sample paths.

The assumptions in B allow for the consistent estimation of the marginal variances, i.e. of the diagonal elements of $\Omega_t$ in assumption A(i). Assumption B(i) is standard in the literature on kernel estimation (Pagan and Ullah, 1999). The regularity conditions for the bandwidth (B(ii)) and the variance process (B(iii)) enable convergence of the variance estimator. Although level shifts are excluded from the variance process by assumption B(iii), Boswijk (2005) argues that they might be approximated by smooth transitions in practice, and underlines this reasoning convincingly by means of Monte Carlo experiments.
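A minimal sketch of kernel-based variance estimation for one cross-section unit may help to fix ideas. It assumes a two-sided Nadaraya-Watson-type smoother of squared centered first differences with the Epanechnikov kernel; the exact estimator, the function names `epanechnikov` and `kernel_variances`, and the default centering by the sample mean are assumptions of this illustration.

```python
import math

def epanechnikov(x):
    # K(x) = 3/4 (1 - x^2) on [-1, 1], zero otherwise (compact support).
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0

def kernel_variances(dy, h, mu=None):
    """Time-varying variance estimates for one unit.

    dy: first differences dy_1, ..., dy_T; h: bandwidth in time units;
    mu: drift used for centering (defaults to the unweighted mean of dy).
    """
    T = len(dy)
    mu = sum(dy) / T if mu is None else mu
    sq = [(d - mu) ** 2 for d in dy]
    out = []
    for t in range(T):
        k = [epanechnikov((s - t) / h) / h for s in range(T)]  # K_h(s - t)
        out.append(sum(a * b for a, b in zip(k, sq)) / sum(k))
    return out

# Illustration: a variance shift from 1 to 9 in the middle of the sample.
dy = [1.0, -1.0] * 10 + [3.0, -3.0] * 10
sigma2_hat = kernel_variances(dy, h=math.sqrt(len(dy)))
```

Away from the break the estimates recover the two variance regimes, while observations close to the break receive a smooth mixture of both, in line with the remark that level shifts are approximated by smooth transitions.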
Employing the estimated time-varying variances $\hat{\sigma}^2_{it}$, the vectors $\check{\mu}_{t+1}$ and $\check{\mu}$ stack the feasible counterparts of the weighted forward drift and drift estimates, respectively, in which the estimates $\hat{\sigma}^2_{it}$ take the place of the true variances. Henceforth, we refer to FWFD as the feasible counterpart of WFD that we suggest for the detrending of heteroskedastic panel data. To construct the panel unit root statistics $\check{t}_{DH}$ and $\check{t}_{HSW}$, we use the correspondingly detrended observations $y^*_{t-1}$ and $\Delta y^*_t$. Employing data that are detrended according to FWFD, the heteroskedasticity-robust PURTs of Demetrescu and Hanck (2016) and Herwartz et al. (2016) are denoted $\check{t}_{DH}$ and $\check{t}_{HSW}$, respectively, and are defined in (20). To avoid an abuse of notation we use '*' to indicate interchangeably WFD or FWFD transformed data. For instance, similar to the definitions in (20) we denote with $\hat{t}_{DH}$ and $\hat{t}_{HSW}$ the (typically infeasible) PURT statistics that obtain from detrending the data by means of WFD, defining $y^*_{t-1}$ and $\Delta y^*_t$ as in (15) and (16). In addition, $\tilde{t}_{DH}$ and $\tilde{t}_{HSW}$ are test statistics that obtain after adjusting observables by means of UFD, i.e. applying the definitions of $y^{\blacktriangle}_{t-1}$ and $\Delta y^{\blacktriangle}_t$ in (8) and (9), respectively.
In light of their implementation, both statistics $\check{t}_{\bullet}$, $\bullet \in \{DH, HSW\}$, combine elements of weighted least squares estimation (applied for the detrending) with pooled and unweighted least squares estimation. In view of the second-order heterogeneity of the panel data, one might alternatively opt for a fully fledged generalized least squares (GLS) approach to testing. Such a GLS PURT statistic, however, would need to build upon the entire time-varying covariance structures. Hence, a feasible counterpart would require the estimation of $N(N-1)/2$ additional moments (at each time instance), which might result in overly wiggly estimates, in particular if time dimensions $T$ are small.⁴

Theoretical results
We next derive asymptotic normality of the test statistics $\check{t}_{HSW}$ and $\check{t}_{DH}$. As a first step, we formulate large-sample properties of the feasible weighted drift estimator $\check{\mu}_t$, i.e. convergence in vector norm to $\hat{\mu}_t$, based on convergence of the non-parametric cross-section specific variance. We adapt results of Boswijk (2005) and Hansen (1995) for the cross-section specific non-parametric variance estimator and results of Carroll (1982) on the convergence of the resulting weighted least squares estimator. Under assumptions A and B we can state the following lemma.
Lemma 1. Under assumptions A and B, $\check{\mu}_t$ converges to the weighted least squares estimator $\hat{\mu}_t$ in $L_q$-norm, i.e. $\|\check{\mu}_t - \hat{\mu}_t\|_q \to 0$, where $\|x\|_q$ denotes the $L_q$-norm of a random vector $x = (x_1, \ldots, x_N)$, $q \geq 1$.

The proof of Lemma 1 is given in the Appendix. For subsequent derivations the lemma allows us to evaluate the asymptotic differences between PURTs building alternatively upon WFD and FWFD, i.e. $\check{t}_{HSW} - \hat{t}_{HSW}$ and $\check{t}_{DH} - \hat{t}_{DH}$. As a basis for convergence of the test statistics we use Lemma 3 of Demetrescu and Hanck (2016), who show that the weighted mean of residuals in (21) is a martingale difference array. Asymptotic normality of $\check{t}_{DH}$ under the null hypothesis follows directly under assumptions B and the regularity conditions in Demetrescu and Hanck (2012a; see their Section 2) that are largely analogous to assumptions A: for $N, T \to \infty$, $\check{t}_{DH} \to_d N(0,1)$. For convergence of the test statistic $\check{t}_{HSW}$, we apply a central limit theorem for martingale difference sequences by considering the weighted version of the elements of the numerator as used above in (10), with $e^* = \check{\mu} - \mu$ being the vector of weighted means of residuals as defined in (21) that involve the estimated variances $\hat{\sigma}^2_{it}$. By Lemma 1, $\check{\mu}$ converges to $\hat{\mu}$. Hence, by additionally applying results of Demetrescu and Hanck (2016; (21)) and convergence results for the original test statistic $t_{HSW}$ of Herwartz et al. (2016), we can state the following proposition on the asymptotic distribution of $\check{t}_{HSW}$.

Proposition 1. Under assumptions A, B and the null hypothesis, $H_0: \rho = 1$, the test statistic $\check{t}_{HSW}$ is asymptotically normally distributed, i.e. for $N, T \to \infty$ with $N/T^{1/2} \to 0$, $\check{t}_{HSW} \to_d N(0,1)$.

A detailed proof is given in the Appendix.

⁴ See Hafner and Linton (2010) for a semiparametric approach to the estimation of non-stationary covariances from high frequency data.

A Monte Carlo study
As an alternative to UFD, the proposed weighting scheme WFD and its feasible counterpart FWFD promise a valid centering of heteroskedasticity-robust PURTs in the case of trending panels. This Monte Carlo analysis addresses several issues which are relevant in empirical practice. First, while the properties of FWFD are well understood asymptotically, it is not clear how the feasible detrending scheme performs in finite samples in comparison with a detrending that relies on knowledge of the true variances (WFD). Second, one can expect performance losses for the robust FWFD approach in cases where it is not necessary to account for heteroskedasticity. Hence, it is of interest to compare the performance of PURTs in homoskedastic trending panels after detrending by means of UFD and FWFD. Similarly, one might imagine that an analyst feels insecure about the presence of linear trends. Hence, it is worth highlighting potential effects of applying UFD and FWFD in the case of non-trending panels. Third, noting that the test in Herwartz et al. (2016) is not robust to strong forms of cross-sectional dependence, simulation exercises are useful to elaborate how a defactoring procedure could allow consistent panel unit root testing under specific forms of strongly correlated panels. In this context it will be of further interest to evaluate the performance of the so-called PANIC (Panel Analysis of Nonstationarity in Idiosyncratic and Common components) approach of Bai and Ng (2004). While the PANIC methodology has been widely recognized as a powerful toolkit for handling panels subject to strong forms of cross-sectional dependence, it is not clear how the resulting tests perform in the presence of time-varying heteroskedasticity.

The simulation design
Data generation

For Monte Carlo experiments covered by the Assumptions A and B we sample from three distinct data generating processes (DGPs), stated in (25) to (27), where $\iota$ refers to a vector of ones and $\odot$ denotes the Hadamard product. Initializing all DGPs with $y_{-50} = 0$, we generated and subsequently discarded 50 presample values to minimize potential effects of initial values on simulation results.⁵ Whereas first order autoregressive processes with serially uncorrelated residuals are formalized by DGP1 in (25), autocorrelated disturbances are introduced in DGP2 in (26). The DGP in (27) is used to simulate non-trending panels. To investigate empirical size and power, we set $\rho = \iota$ and $\rho = (\rho_1, \ldots, \rho_N)'$, $\rho_i \sim iid\ U(0.85, 0.95)$, respectively.⁶ Following Pesaran (2007), we draw the elements of $\mu = (\mu_1, \ldots, \mu_N)'$ and trend parameters $\delta = (\delta_1, \ldots, \delta_N)'$ as $\mu_i \sim iid\ U(0, 0.02)$ and $\delta_i \sim iid\ U(0, 0.02)$ and set the short run dynamics in $\gamma = (\gamma_1, \ldots, \gamma_N)'$ as $\gamma_i \sim iid\ U(0.2, 0.4)$.⁷ As in Herwartz et al. (2016), we separate the issue of variance changes from cross sectional correlation by means of a decomposition of the covariance matrix into time-varying variances and a time-invariant correlation matrix $\Psi$. With respect to the degree of correlation among panel units, cross-sectional independence is simulated by setting $\Psi$ to an identity matrix. For weak cross sectional dependence we obtain $\Psi$ from a spatially autoregressive (SAR) model with spatial parameter $\Theta$, where $W$ is the row normalized symmetric contiguity matrix of the 'm ahead and m behind' structure, with $m = 1$ (e.g. Kelejian and Prucha, 1999). Accordingly, the covariance of $e_t$ is $((I_N - \Theta W)'(I_N - \Theta W))^{-1}$, and $\Psi$ is the implied correlation matrix.
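The construction of the SAR-implied correlation matrix can be sketched as follows. The sketch assumes that the 'm ahead and m behind' units are arranged on a circle, so that the first and last units are neighbours, and it treats the spatial parameter `theta` as a free input because no value is stated here. The helper names `contiguity`, `invert` and `sar_correlation` are hypothetical, and the plain Gauss-Jordan inversion merely keeps the example free of external libraries.

```python
import math

def contiguity(n, m=1):
    """Row-normalized 'm ahead and m behind' matrix (circular layout assumed)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(1, m + 1):
            w[i][(i + k) % n] = 1.0
            w[i][(i - k) % n] = 1.0
        row_sum = sum(w[i])
        w[i] = [v / row_sum for v in w[i]]
    return w

def invert(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def sar_correlation(n, theta, m=1):
    """Correlation matrix implied by the covariance ((I - theta W)'(I - theta W))^{-1}."""
    w = contiguity(n, m)
    a = [[(1.0 if i == j else 0.0) - theta * w[i][j] for j in range(n)]
         for i in range(n)]
    ata = [[sum(a[k][i] * a[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    cov = invert(ata)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]
```

For example, `sar_correlation(5, 0.5)` yields positive correlations that are larger for direct neighbours than for units two positions apart, while `theta = 0` returns the identity matrix, i.e. cross-sectional independence. The sketch presumes `n > 2 * m` and `abs(theta) < 1`.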
We model variance breaks for each individual series in the panel such that the standard deviation of the residuals equals $\sigma_{i,I}$ before and $\sigma_{i,II}$ after the break, where $\tau_i$ indicates the timing of the variance break as a fraction of the sample. As size distortions of PURTs arising from variance breaks depend on the timing of the (co)variance shifts (e.g. Cavaliere and Taylor, 2007; Herwartz et al., 2016), we consider scenarios of homogeneously early ($\tau_i = 0.2$) or late ($\tau_i = 0.8$) variance breaks for all panel units. We set $\sigma_{i,I} = \sigma_{i,II} = 1$ to formalize homoskedasticity, but change $\sigma_{i,II}$ to 3 and 1/3 to generate a late ($\tau_i = 0.8$) upward (i.e. positive) and an early ($\tau_i = 0.2$) downward (i.e. negative) variance shift, respectively.⁸ The considered data dimensions are from the set $\Phi = \{T \mid T = 25, 50, 100, 250\} \times \{N \mid N = 10, 50, 100\}$. We employ a 5% nominal significance level for all Monte Carlo experiments. Results for alternative nominal levels of 1% and 10% are qualitatively equivalent and available from the authors upon request.
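As an illustration of the variance break design, the following sketch simulates a cross-sectionally independent panel random walk with drift whose innovation standard deviation switches once. It is a simplified stand-in rather than one of the DGPs of the study: Gaussian innovations, the omission of presample values, the exact boundary convention for the break date, and the names `break_sd` and `simulate_null_panel` are all assumptions; only the drift range U(0, 0.02) and the break parameters mirror the values quoted in the text.

```python
import random

def break_sd(t, T, tau, sd_pre=1.0, sd_post=3.0):
    """Standard deviation at time t with one break after floor(tau * T) periods.

    The boundary convention (t <= floor(tau * T) is pre-break) is an assumption.
    """
    return sd_pre if t <= int(tau * T) else sd_post

def simulate_null_panel(N, T, tau=0.8, sd_post=3.0, seed=0):
    """Panel random walk with drift under the null; returns [y_0, ..., y_T]."""
    rng = random.Random(seed)
    mu = [rng.uniform(0.0, 0.02) for _ in range(N)]  # unit-specific drifts
    y = [[0.0] * N]
    for t in range(1, T + 1):
        sd = break_sd(t, T, tau, 1.0, sd_post)
        y.append([y[-1][i] + mu[i] + sd * rng.gauss(0.0, 1.0) for i in range(N)])
    return y

# Late upward break (tau = 0.8, post-break sd = 3) for N = 20, T = 100.
panel = simulate_null_panel(N=20, T=100, tau=0.8, sd_post=3.0)
```

An early downward break corresponds to `tau=0.2` and `sd_post=1/3` in this parametrization.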

Variance estimation
The practical implementation of FWFD employs the Epanechnikov kernel (i.e. $K(x) = \frac{3}{4}(1 - x^2)$ for $|x| \leq 1$ and $K(x) = 0$ otherwise). The bandwidth is throughout $h = T^{1/2}$. To improve finite sample performance of the tests, (forward) estimation of variances ($\hat{\sigma}^2_{it}$) and drift terms ($\check{\mu}_i$) is iterated five times. As argued in Kuk (1999), iterative estimation does not affect consistency of the mean estimator. Thus, the asymptotic distribution of the test statistics remains unchanged.

⁵ Monte Carlo simulation results from alternative initializations (constant or bounded) are provided in the Online Supplemental Material. As it turns out, alternative stationary initial values leave the size of the considered tests qualitatively unaffected, although they may affect power, at least in scenarios with a relatively short time series dimension (i.e. $T = 25, 50$).

⁶ Results for DGPs with a homogeneous autoregressive coefficient under the alternative hypothesis, i.e. $\rho = 0.9$, are qualitatively identical and available upon request (see also Breitung and Pesaran, 2008; Westerlund and Breitung, 2013).

⁷ To enable panel unit root testing under serial correlation the data are prewhitened prior to being detrended. Prewhitening of the trending data involves regressing first differences on their lags and a constant as $\Delta y_{it} = \gamma_{i0} + \sum_{j=1}^{p_i} \gamma_{ij} \Delta y_{i,t-j} + e_{it}$, with $p_i$ chosen by means of a lag-length selection criterion. Subsequently, the prewhitened levels obtain as $y_{it} - \hat{\gamma}_{i0} - \hat{\gamma}_{i1} y_{i,t-1} - \ldots - \hat{\gamma}_{ip_i} y_{i,t-p_i}$ and the prewhitened first differences as $\Delta y_{it} - \hat{\gamma}_{i0} - \hat{\gamma}_{i1} \Delta y_{i,t-1} - \ldots - \hat{\gamma}_{ip_i} \Delta y_{i,t-p_i}$.

⁸ As in Cavaliere (2003), we present results for early negative and late positive variance break scenarios. Results for other combinations of variance breaks and/or randomly selected break moments ($\tau_i \sim iid\ U(0.1, 0.9)$) are available upon request.
Simulation results documented in the Online Supplemental Material of this article provide three main lessons on the roles of bandwidth selection and kernel functions for the effectiveness of the FWFD. First, for the considered patterns of variance breaks, alternative bandwidth selections of $T^{1/4}$, $T^{1/3}$, $T^{2/3}$ and $T^{3/4}$ result in rather similar performances in terms of both size and power of the tests. Second, employing a data-based bandwidth determination makes little difference. In particular, we have considered leave-one-out cross validation, which obtains qualitatively similar results as choosing the bandwidth in the form of a deterministic function of the sample size. Third, while the Epanechnikov kernel appears to obtain somewhat preferable finite sample size estimates, the choice among the uniform, Gaussian, and Epanechnikov kernels appears to have negligible and unsystematic effects on test outcomes. Further underlining that the kernel choice is unlikely to be crucial in the present context, it is worth noticing that the Monte Carlo study in Boswijk (2005) relies exclusively on the one-sided exponential kernel. Moreover, Beare (2018) considers the Gaussian kernel. Both studies specifically hint at a slightly improved performance for two-sided kernels.

Simulation results
In the following, we discuss the finite sample performances of the test statistics of Demetrescu and Hanck (2012a) and Herwartz et al. (2016) under distinct DGPs featuring both homoskedastic and heteroskedastic residual distributions. We first compare simulation results obtained after using the UFD scheme of Breitung (2000; $\tilde{t}_{\bullet}$, $\bullet \in \{DH, HSW\}$) with results obtained after applying WFD and FWFD ($\hat{t}_{\bullet}$ and $\check{t}_{\bullet}$, $\bullet \in \{DH, HSW\}$) in first-order autoregressive panels. Second, we look at the performance of the PURTs when higher-order autoregressive panels are prewhitened and subsequently subjected to FWFD. Third, we take the perspective of an analyst who feels insecure about the necessity of detrending and falsely applies forward detrending to non-trending panels. Throughout, simulation results are documented for the alternative scenarios of cross-sectional independence and weak forms of dependence. Table 1 documents results that allow us to evaluate the relative merits of the proposed detrending schemes (WFD and FWFD) over UFD suggested in Breitung (2000). The data are generated according to DGP1 (no serial correlation). Under the benchmark scenario of homoskedasticity and cross-sectional independence, applying either FWFD or UFD (which is identical to WFD under homoskedasticity) results in empirical size estimates that are close to the nominal level of 5%. Since the practical implementation of FWFD builds upon (non-parametric) variance estimates, unsurprisingly, using FWFD instead of UFD (or WFD) involves some power loss under homoskedasticity.

Comparing UFD, WFD and FWFD
Rejection frequencies documented in the right-hand side of Table 1 reveal that both tests suffer from substantial size distortions under weak cross-sectional dependence and heteroskedasticity if the data are subjected to UFD. In particular, the UFD variants of the HSW and DH statistics barely result in rejections of the null hypothesis of a panel unit root if an upward variance shift occurs late in the sample. Confirming our theoretical results, using FWFD (or ideally WFD) restores size precision of the heteroskedasticity-robust test statistics if variance shifts coexist with linear trends. The expected power loss from using variance estimates (FWFD) instead of true variances is generally mild and vanishes asymptotically.
Since the HSW statistic makes full use of sample information in metric form, it shows some power advantages in most experiments over the DH statistic, which only processes the sign of the (adjusted) level data (see, e.g. results in the left-hand side of Table 1, homoskedasticity).

Table 1. Notes: Data are generated according to DGP1 with cross-sectionally independent (CS independence) or spatially dependent (SAR(1)) errors. We set $\sigma_{i,I} = \sigma_{i,II} = 1$ to model homoskedasticity. Under heteroskedastic residual distributions, variance parameters are changed to $\sigma_{i,II} = 3$ to introduce a positive break (up) and $\sigma_{i,II} = 1/3$ to generate a negative shift (down). 'Early' ('late') denotes a variance break occurring at the 20th (80th) percentile of the sample, i.e. $s_i = 0.2$ ($s_i = 0.8$). UFD refers to the unweighted forward detrending suggested in Breitung (2000), while (F)WFD stands for the (feasible) weighted forward detrending procedure suggested in this article; each scheme is applied to both statistics (DH and HSW). For variance estimation, the Epanechnikov kernel with bandwidth $h = T^{1/2}$ is used. Variance and drift estimation are iterated five times. The nominal size is 5% and power is size adjusted. All results are based on 5000 replications.

Serial correlation
Table 2 documents simulation results obtained by employing FWFD on data generated according to DGP2 (serial correlation). As in the case of DGP1, FWFD allows $\hat{t}_{HSW}$ and $\hat{t}_{DH}$ to display very good size control in most variance break cases when the data exhibit serial correlation and, hence, are subjected to an initial step of prewhitening. However, both tests suffer from finite sample size distortions if the time dimension is small ($T = 25$) or not markedly larger than $N$. These distortions likely arise because (i) the effectiveness of the prewhitening regression relies on $T$ being large enough to allow precise estimation of the autoregressive parameters and (ii), quite naturally, one gains additional control over the cross-sectional aggregation of (drift) estimation errors when $T$ is sufficiently in excess of $N$. Two observations are worth noting in this context. First, with regard to the relative importance of the encountered effects, a comparison of results in Tables 1 and 2 reveals that the adverse effects of estimation errors attached to the prewhitening coefficients somewhat dominate those originating from drift estimation, since the latter matter for the results documented in both tables while the former are specific to the higher-order dynamic model (DGP2). Second, size distortions of $\hat{t}_{HSW}$ and $\hat{t}_{DH}$ documented for small time dimensions ($T = 25, 50$) and prewhitened data are consistently smaller than the corresponding distortions of the robust PURT of Herwartz et al. (2019). For instance, simulation results in Table 2 of Herwartz et al. (2019) show that this robust test yields empirical size estimates below 1% for all experiments that involve DGP2 and time dimensions $T = 25, 50$.
The PURTs under consideration ($\hat{t}_{HSW}$ and $\hat{t}_{DH}$) retain sizeable power and consistency when applied to data that are detrended by means of FWFD. Most importantly, size-adjusted power increases with $N$ and $T$. It is also noteworthy that the finite sample performance of the tests is qualitatively similar under independence and weak cross-sectional dependence. Perhaps the only visible difference is that the tests are slightly less powerful when applied to correlated panels, which is consistent with the results in Herwartz et al. (2016).
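Since the tables report empirical size and size-adjusted power, a short sketch of how such numbers are commonly computed from simulated statistics may be helpful. It assumes left-tailed tests with a standard Gaussian limit (5% critical value of about -1.645); the function names and the input arrays are hypothetical and not taken from the simulation code behind the tables.

```python
import numpy as np

def empirical_size(stats_null, critical_value=-1.645):
    """Share of null-hypothesis replications rejected by a left-tailed test."""
    return float(np.mean(np.asarray(stats_null) < critical_value))

def size_adjusted_power(stats_null, stats_alt, level=0.05):
    """Rejection frequency under the alternative when the critical value is
    the empirical `level`-quantile of the statistics simulated under the null."""
    adjusted_cv = np.quantile(np.asarray(stats_null), level)
    return float(np.mean(np.asarray(stats_alt) < adjusted_cv))
```

Replacing the asymptotic critical value by the simulated null quantile makes power figures comparable across tests whose empirical sizes differ, which is the purpose of size adjustment.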

Detrending of non-trending panels
In empirical practice, an analyst might (unnecessarily) detrend heteroskedastic panel data that do not actually contain a deterministic trend. While just subtracting the first observation from the data ('demean') is the transformation of choice in this case (Breitung, 2000; see also footnote 3), it is interesting to examine how (unnecessarily) applying UFD and FWFD affects the performance of the tests. Table 3 shows the respective simulation outcomes for the first-order autoregressive model (DGP3). In all documented experiments, applying the correct demeaning yields the most accurate empirical size and the largest power. This result is not surprising, since both test statistics $t_{HSW}$ and $t_{DH}$ (and their variants applying to detrended data) are robust to heteroskedasticity. Unnecessarily subjecting non-trending heteroskedastic data to UFD, however, renders the test statistics useless, as indicated by severe size distortions under heteroskedastic model residuals. By contrast, applying FWFD to non-trending heteroskedastic panels is 'costly' only in terms of reduced power, and it yields accurate rejection frequencies under the null hypothesis.

Strong cross-sectional dependence
Results documented in Tables 1 and 2 show that removing linear trends by means of FWFD yields satisfactory finite sample properties of the heteroskedasticity-robust PURTs under distributional scenarios (independence, SAR(1)) that align in particular with Assumption A(ii). Under bounded eigenvalues of $\Omega_t$, it holds, for instance, that (weighted) cross-sectional averages of $e_{it}$ converge in probability as $N \to \infty$ if the weighting coefficients are of sufficient granularity (for a formal exposition and a detailed discussion of alternative definitions of cross-sectional dependence see, e.g. Chudik et al., 2011). In line with this characterization of weak cross-sectional dependence, a strong form of dependence is defined by assuming that the weighted average remains random in the limit of infinite $N$. Under strong cross-sectional dependence, the limit distributions of both statistics $t_{DH}$ and $t_{HSW}$ may differ from the Gaussian under the panel unit root hypothesis. Although several articles (e.g. Bai and Ng, 2004; Phillips and Sul, 2003; Moon and Perron, 2004; Pesaran, 2006; Breitung and Das, 2008) have addressed panel unit root testing under strong forms of cross-sectional dependence as formalized by means of (a variety of) factor structures, they have not treated heteroskedasticity as a prominent data characteristic. In the following, we provide some insights from simulation exercises that highlight the performance of the PURTs $\hat{t}_{HSW}$ and $\hat{t}_{DH}$ in cases when panel data are not only strongly correlated but also heteroskedastic and trending. In line with Theorem 3.1 of Chudik et al. (2011), we consider a factor structure of strong form to generate panels subject to strong cross-sectional dependence.
Specifically, we employ the one-factor DGP in (28), which is similar to Bai and Ng (2004). In (28), model parameters are drawn as $m_i \sim \text{iid } U(0, 0.02)$ and $d_i \sim \text{iid } U(0, 0.02)$. Moreover, the idiosyncratic component $e_{it}$ is simulated as $e_{it} = \rho e_{i,t-1} + \tilde{e}_{it}$, where the stochastic terms $\tilde{e}_{it}$ are cross-sectionally independent, but potentially heteroskedastic. For homoskedastic $\tilde{e}_{it}$, we set its variance $\sigma_{it}^2 = 1$. Among variance break scenarios, we only consider an early negative variance break for space considerations. In this case, we set $\sigma_{it} = 1$ for $t \le \lfloor 0.2T \rfloor$ and change to $\sigma_{it} = 1/3$ for $t > \lfloor 0.2T \rfloor$. Similar to the cases without cross-sectional dependence discussed above, data featuring a unit root in the idiosyncratic component $e_{it}$ are generated under the null hypothesis $H_0: \rho = 1$. Under the alternative hypothesis, data are generated with $\rho \sim \text{iid } U(0.85, 0.95)$. The autoregressive factor in (28) is sampled as $F_t = a F_{t-1} + u_t$ with $u_t \sim \text{iid } N(0, 1)$, and the loadings are drawn as $\lambda_i \sim \text{iid } N(0, 1)$. We consider both stationary and non-stationary factors and set, respectively, $a = 0.8$ and $a = 1.0$.

In cases where heteroskedasticity-robust PURTs lack pivotalness under specific forms of strong cross-sectional dependence, FWFD could be preceded by a suitable defactoring step to remove the strong cross-sectional dependence from the data (e.g. Phillips and Sul, 2003, and Moon and Perron, 2004). It is important to note, however, that with the defactoring approach the unit root test changes from a test on the data (i.e. $y_{it}$) to a test on the idiosyncratic error terms (i.e. $e_{it}$) only (see footnote 9). To demonstrate that existing panel unit root tests that are capable of handling strong forms of cross-sectional dependence do not necessarily provide valid inference under heteroskedasticity, we additionally simulate rejection frequencies for the PANIC test of Bai and Ng (2004).
Specifically, the PANIC test proceeds in four main steps: (i) take the first difference of the data, (ii) defactor the differenced data by means of the first principal component, (iii) apply Augmented Dickey-Fuller (ADF) tests, without an intercept and a linear trend, to the cumulative sums of the defactored series, and (iv) combine the p-values from (iii), denoted $p_i$ for the $i$-th panel unit, to obtain a Fisher-type statistic (see footnote 10).

Table 3. Notes: [...] Demetrescu and Hanck (2012a) to data that are transformed by subtracting the first observation from the lagged level. Data are generated according to DGP3. In the undocumented case of homoskedastic data, UFD shows accurate size estimates and a lack of power similar to that of FWFD documented in this table. For further notes see Table 1.

9 We employ the defactoring procedure proposed in Phillips and Sul (2003) and Moon and Perron (2004). Specifically, we (i) extract the first principal component from the cross section of first differenced data, (ii) project observables $y_t$ and $\Delta y_{t-1}$ on the space spanned by the principal component, and (iii) continue the analysis with the residuals from this projection.

Results in the left-hand side panel of Table 4 are obtained by assuming a stationary factor, while a non-stationary factor has been used to obtain the results in the right-hand side panel. With a stationary factor, both $\hat{t}_{HSW}$ and $\hat{t}_{DH}$ display substantial over-rejections if the data are not defactored. This reflects the fact that the stationary factor could dominate the non-stationary process in the idiosyncratic component, resulting in misleading inference on the order of integration of the data. With defactoring, however, both $\hat{t}_{HSW}$ and $\hat{t}_{DH}$ display remarkable size precision, highlighting the effectiveness of the defactoring step in enabling consistent panel unit root testing under the given form of strong dependence. Interestingly, while the PANIC test of Bai and Ng (2004) works well under homoskedasticity, it suffers from substantial oversizing when a negative variance break occurs early in the sample. This result is not surprising given that the PANIC tests considered in Bai and Ng (2004) and in this work are inherently built on ADF tests, and it is well documented that ADF-type unit root tests are not robust to heteroskedasticity (e.g. Hamori and Tokihisa, 1997; Cavaliere, 2003).
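The pooling of unit-specific p-values in step (iv) of the PANIC procedure can be sketched as follows. The sketch assumes the standardised Fisher-type form, in which $-2 \sum_i \ln p_i$ is centred by $2N$ and scaled by $\sqrt{4N}$; whether this exact form matches the statistic used for Table 4 is an assumption on our part, and the function name is hypothetical.

```python
import math

def fisher_pooled_statistic(p_values):
    """Standardised Fisher-type statistic for N independent p-values.

    -2 * sum(log p_i) is chi-squared with 2N degrees of freedom under the
    null (mean 2N, variance 4N); centring and scaling gives a statistic
    that is approximately N(0, 1) for large N. Large values speak against
    the null hypothesis.
    """
    n = len(p_values)
    if n == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    fisher = -2.0 * sum(math.log(p) for p in p_values)
    return (fisher - 2.0 * n) / math.sqrt(4.0 * n)
```

The chi-squared approximation relies on independence of the p-values across panel units, which is why the pooling is applied only after the common factor has been removed.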
Results documented in the right-hand side of Table 4 show that empirical sizes of $\hat{t}_{HSW}$ and $\hat{t}_{DH}$ are very close to the nominal 5% level even when the data are not defactored. This is not surprising as both the factors and the idiosyncratic components are non-stationary in this case. Size-adjusted power is, however, substantially smaller without defactoring, as the non-stationarity of the factors likely dominates the stationarity of the idiosyncratic components. The large power for defactored data underscores, however, the importance of interpreting the inferential analysis as a test on the idiosyncratic component only, and not on the data as a whole. For instance, if the factor, but not the idiosyncratic components, is the source of non-stationarity in the data, defactoring could lead to the misleading indication of panel stationarity (see footnote 11). In sum, the simulation results confirm that the heteroskedasticity-robust tests work well for strongly correlated panels if defactored data are subjected to FWFD. Moreover, accounting for cross-sectional correlation without addressing heteroskedasticity could lead to misleading inference, as highlighted by the large size distortions of the PANIC test under heteroskedasticity.
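For readers who wish to experiment with the design of this subsection, the following sketch simulates a trending one-factor panel with an early negative variance break and removes the first principal component. Several elements are assumptions rather than details taken from the article: the additive form $y_{it} = m_i + d_i t + \lambda_i F_t + e_{it}$, the initial conditions, the demeaning of the differences before the principal component is extracted, and the reading of the defactoring step as a projection of the cross-section vector off the estimated loading direction. All function names are hypothetical.

```python
import numpy as np

def simulate_one_factor_panel(N, T, rho=1.0, a=0.8, break_frac=0.2,
                              sigma_after=1/3, seed=0):
    """Simulate y_it = m_i + d_i * t + lam_i * F_t + e_it (assumed additive form)."""
    rng = np.random.default_rng(seed)
    m = rng.uniform(0.0, 0.02, N)        # intercepts
    d = rng.uniform(0.0, 0.02, N)        # drift slopes
    lam = rng.standard_normal(N)         # factor loadings
    t = np.arange(1, T + 1)
    # standard deviation 1 up to floor(break_frac * T), sigma_after afterwards
    sigma = np.where(t <= np.floor(break_frac * T), 1.0, sigma_after)
    shocks = rng.standard_normal((T, N)) * sigma[:, None]
    u = rng.standard_normal(T)
    e = np.zeros((T, N))
    F = np.zeros(T)
    e[0], F[0] = shocks[0], u[0]         # assumed initial conditions
    for s in range(1, T):
        e[s] = rho * e[s - 1] + shocks[s]   # rho = 1 gives the unit root null
        F[s] = a * F[s - 1] + u[s]          # a = 1 gives a non-stationary factor
    return m + np.outer(t, d) + np.outer(F, lam) + e

def defactor(y):
    """Project the panel off the first principal component of its differences."""
    dy = np.diff(y, axis=0)
    dy = dy - dy.mean(axis=0)            # assumption: remove drifts first
    # leading right-singular vector = estimated loading direction (length N)
    _, _, vt = np.linalg.svd(dy, full_matrices=False)
    lam_hat = vt[0]
    projector = np.eye(y.shape[1]) - np.outer(lam_hat, lam_hat)
    return y @ projector, lam_hat
```

A call such as `defactor(simulate_one_factor_panel(N=10, T=200))` returns residuals that are orthogonal to the estimated loading direction; these residuals would then be passed on to detrending and testing.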

Empirical illustration
In this section, we reexamine the stationarity of the share of health care expenditures in GDP (denoted HCE/GDP) by means of PURTs that are robust not only to cross-sectional dependence, but also to time-varying variances. While this type of analysis could be seen as inspired by the discussion of stationarity of the so-called 'big ratios' (e.g. investment or consumption over GDP; Kaldor, 1957; Jones, 2016), the international development of trending HCE is typically considered within a richer set of conditioning information (including, for instance, demographic profiles, unemployment rates, etc.) in the form of a cointegration analysis (e.g. Herwartz and Theilen, 2003, and the references therein). Yet, our unconditional analysis provides insights for analysts or social planners who (i) are interested in ad-hoc predictions of future HCE/GDP ratios or (ii) use such data in vector autoregressions or regression models. With regard to (i), it is worth recalling that non-stationarity of HCE/GDP complicates ex-ante prediction and the assessment of forecast uncertainty. With regard to (ii), the use of possibly non-stationary HCE/GDP ratios in regressions might result in spurious inferential outcomes. Similarly, in vector autoregressive analysis such data could call for an in-depth cointegration analysis.

We subject the data to the proposed FWFD scheme, which renders the tests in Demetrescu and Hanck (2012a) and Herwartz et al. (2016) robust in the presence of linear trends and (co)variance changes. For comparison purposes, we also report results based on the heteroskedasticity-robust PURT suggested in Herwartz et al. (2019). The HCE/GDP data have been drawn from the OECD database (see footnote 12). Dictated by data availability, our sample covers 20 OECD member countries and the period from 1970 until 2019. For unit root testing we use both the original HCE/GDP shares and their natural logarithms. We do not document test results for the changes (i.e. first differences) of the HCE/GDP shares or their logarithms: for the panel in first differences, all considered tests provide highly significant evidence against the panel unit root (i.e. test statistics are throughout smaller than $-2.22$).

Figure 1. Analyzed data on health care expenditures. Notes: HCE/GDP ratios and their natural logarithms are shown in the top panels. Graphs in the second row show time-varying standard errors ($\sigma_{it}$) estimated non-parametrically by means of the Epanechnikov kernel with bandwidth $h = T^{1/2}$ (Section 3.1.2). FWFD detrended level data ($y_t^*$) are shown in the third-row panels. The bottom panels display data subjected to FWFD after defactoring as suggested in Phillips and Sul (2003) or Moon and Perron (2004). Owing to the involved first differencing, the lag length selection and the prewhitening procedures, the first 4 or 5 observations are lost for subsequent analysis.

Table 4. Notes: Data are generated along the one-factor model (28) taken from Bai and Ng (2004), with $\lambda_i \sim \text{iid } N(0, 1)$, $u_t \sim \text{iid } N(0, 1)$, and $\tilde{e}_{it}$ denoting the cross-sectionally independent, but potentially heteroskedastic, errors. Stationary (non-stationary) factors are generated by setting $a = 0.8$ ($a = 1.0$). For HSW and DH results with defactoring, data are first defactored (Moon and Perron, 2004; Phillips and Sul, 2003) before being subjected to FWFD. BN refers to the PANIC test (Bai and Ng, 2004). For further notes, see Table 1.

10 The respective MATLAB package for implementing the PANIC test is obtained from Serena Ng's home page: https://sites.google.com/site/sn2294/home/code-and-data. The package also includes the p-values ('lm1.asc') required to construct the pooled test.

11 Unreported simulation results, which are available upon request, also reveal that applying the defactoring procedure to data with weak or no cross-sectional correlation leaves the small sample performance of the tests unaffected, although it leads to power losses. However, as results in Table 4 show, failing to defactor when the data contain a common factor leads to biases which worsen or do not vanish asymptotically.
To get an impression of the trending and heteroskedastic characteristics of the panel, we visualize first and second order properties of the data in Figure 1. By construction, HCE/GDP ratios account for common trends in the underlying variables to some extent. It is therefore interesting to see from visual inspection that the HCE/GDP ratios (and their log transforms) are both (linearly) trending. In addition, inspection of the detrended (and defactored) data suggests that the data construction might not have (fully) removed a common stochastic trend, since the random walk with drift model remains a potential candidate to describe HCE/GDP ratios. Moreover, the series exhibit time-varying variances for most of the economies. In sum, panel unit root testing is a non-trivial diagnostic step to investigate the trends in HCE/GDP ratios and, in particular, necessitates the use of heteroskedasticity-robust PURTs that work on detrended data.

Table 5 reports PURT results on the stationarity of HCE shares in GDP. As results are generally robust to the use of the AIC or BIC lag length selection criteria, we restrict our discussion to the results obtained with the BIC. At first glance, the results paint a mixed picture of the trend stationarity of the series. However, a close look at the individual results reveals three main lessons from this empirical illustration. First, results can vary substantially depending on the use of the FWFD or UFD detrending schemes. For instance, focusing on the defactored log series, the UFD variants of both the HSW and the DH statistic suggest a stationary GDP share of health care spending, while non-stationarity is suggested if $\hat{t}_{DH}$ or $\hat{t}_{HSW}$ (FWFD) are employed. Given that the tests are not robust under UFD, the evidence is thus generally in favor of the null hypothesis that HCE/GDP follows a panel random walk with drift. The only exception to this conclusion comes from $\hat{t}_{DH}$, which leads to rejecting the null hypothesis of non-stationary HCE/GDP ratios. Second, the use of the defactoring procedure in Phillips and Sul (2003) and Moon and Perron (2004) matters for the obtained results. Most of the rejections are recorded when the tests are applied to defactored data. As OECD member countries are highly integrated with each other, strong cross-sectional correlation is expected to exist in the data. Hence, given that the tests suffer from size distortions under strongly correlated panels, the results without defactoring deserve careful interpretation. Third, the $\hat{t}_{HSW}$ statistics are uniformly in line with the null hypothesis of a unit root in HCE/GDP shares. In sum, our empirical illustration highlights that HCE/GDP ratios are likely subject to both linear and stochastic trending, which should be taken into account for subsequent empirical exercises such as ex-ante prediction, regression modeling and cointegration analysis.

Table 5. Notes: Reported numbers are PURT statistics. Bold entries represent cases in which the panel unit root null hypothesis is rejected at the 5% significance level. 'HMW' indicates outcomes for the robust test of Herwartz et al. (2019), which is subject to weak empirical size in short samples. The lag order used for prewhitening (see footnote 7) is selected based on either the BIC or the AIC, respectively, with the minimum and maximum lag lengths set to one and two (annual data). For further notes see Table 1.

12 OECD (2020), Health spending (indicator). https://doi.org/10.1787/777a9575-en (Accessed on 27 August 2020).

Conclusions
We discuss the problem that existing detrending schemes affect the pivotalness of the heteroskedasticity-robust panel unit root tests (PURTs) of Demetrescu and Hanck (2012a) and Herwartz et al. (2016) under time-varying (co)variances. We propose a (feasible) weighted counterpart of the unweighted forward detrending method of Breitung (2000). While PURT statistics are subject to nuisance parameters if applied to heteroskedastic panels after unweighted forward detrending, the suggested weighting scheme restores asymptotic pivotalness (i.e. Gaussianity) of the heteroskedasticity-robust PURT statistics considered in this work.
To determine the required weights feasibly from the data, we employ non-parametric kernel estimates of time-varying variances of drift terms under the null hypothesis. Under certain regularity conditions, the non-parametric variance estimator is consistent. Moreover, the feasible weighted forward detrending approaches its (typically infeasible) counterpart that relies on true second order moments. Accordingly, the asymptotic distributions of the considered PURTs do not depend on the required steps of variance estimation. Monte Carlo results for weakly dependent panel structures show that the tests of Demetrescu and Hanck (2012a) and Herwartz et al. (2016) have good finite sample properties when applied to data detrended by means of the suggested feasible procedure. Under specific forms of strong cross-sectional dependence, defactoring is a viable means to restore the applicability of the heteroskedasticity-robust PURTs under scrutiny. At the same time, an ad-hoc application of the PANIC toolkit suggested in Bai and Ng (2004) to heteroskedastic panels is prone to marked size distortions.
As an empirical illustration, we examine the order of integration of health care expenditures (as a share of GDP, i.e. HCE/GDP) in OECD economies. Results show that the diagnosis of stochastic trends might crucially depend on the adopted detrending. As an outcome from heteroskedasticity-robust panel unit root testing, health care expenditures measured in relation to GDP are well described by both linear and stochastic trending. Put differently, within the OECD the dynamic patterns of HCE/GDP align with those of a random walk with drift.
The same holds true for the overall mean estimator of $\mu_i$ by setting $t = 0$ in (34). From convergence in probability and uniform integrability of the difference between the feasible and the infeasible mean estimator (see footnote 14), convergence in $L^q$ follows directly. To derive convergence of the vector of mean estimators, we additionally need to control for uniform convergence of the variance in the cross-section dimension, i.e. of $\max_{1 \le i \le N} \max_{1 \le t \le T} |\hat{\sigma}_{it}^2 - \sigma_{it}^2|$. Considering the definition of the variance estimator, $\hat{\sigma}_{it}^2 = \sum_{k=1}^{T} K_h(k - t)(\Delta y_{ik} - \hat{\mu}_i)^2$ with $K_h(\cdot)$ defined in (31), and noticing that $P(\max \ldots)$ [...] for a constant $C > 0$, the summands of the right-hand side form, independently of $i$, a null sequence as discussed in (33). Thus, with the relation $N/T^{1/2} \to 0$, uniform convergence follows (see footnote 15), which implies convergence of the mean estimation

Asymptotic normality of $t_{HSW}$
We first consider results for the statistic involving the true variances $\sigma_{it}^2$, i.e. the WFD variant of $t_{HSW}$. These results are then extended to the feasible statistic $\hat{t}_{HSW}$ with the results of Lemma 1. Similar to the asymptotic result in Herwartz et al. (2016), the following conditions of a central limit theorem (CLT) for martingale difference arrays (Davidson, 2000) need to be fulfilled for the sequence $X_{Tt}$: (a) $\{X_{Tt}, \mathcal{F}^*_{Tt}\}$ is a martingale difference array with finite unconditional variances $v_{Tt}^2$. [...]

Proof of Lemma 2. Condition (a): For $\{X_{Tt}, \mathcal{F}^*_{Tt}\}$ to be a martingale difference array, it needs to hold that (i) $E|X_{Tt}| < \infty$, and (ii) $E[X_{Tt} \mid \mathcal{F}^*_{T,t-1}] = 0$ a.s. for all $t = 1, \ldots, T$. For (i), it can be shown that for $s_t$ defined in (5), $E|\ldots - (t-1)\, e^{*\prime} e_t + (t-1)\, e^{*\prime} e^*_{t+1}| < \infty$. The first part corresponds to the elements of the test statistic considered in Herwartz et al. (2016) and thus, finiteness here follows with the same arguments of weak cross-sectional dependence and the assumption of finite fourth-order moments in A(iii) as for $t_{HSW}$. The remaining terms involve expressions $e^*_{t+1}$, which converge to zero with increasing $T$, implying finiteness of these expressions.

[...] to zero in probability. With comparable assumptions on the kernel density estimator of the variance which defines the estimated weights, our estimator of $\mu_i$ corresponds to the intercept estimator in Carroll (1982).

14 Uniform integrability follows from boundedness in $L^2$ (see, for instance, Davidson, 1994, Theorem 12.10), which obtains from the triangle inequality by bounding the second moment of the difference between the feasible and the infeasible mean estimator in terms of the second moments of the two estimators. Specifically, the infeasible estimator is bounded in $L^2$, since the squared weighted mean has a finite variance following Assumption A(iii), which is vanishing for finite $\sigma_{kt}^2 > 0$. An analogous argument applies to the feasible estimator after controlling for the variance estimation error as formalized by the convergence result in (33).
The martingale difference property in (ii) follows for all $t = 1, \ldots, T$ from evaluating $E[X_{Tt} \mid \mathcal{F}^*_{T,t-1}]$ [...]. Measurability of the first part follows as the sum $\sum_{k=1}^{t-1} e_k$ is directly measurable with respect to $\sigma(e_1, \ldots, e_{t-1})$. For $e^*$ it holds [...] and thus, the first part in (6) can be taken out of the conditional expectation. The second part equals zero, since [...]

[...] defines the square root of the covariance matrix of $e^*_k$, where $e^* = \sum_{k=1}^{T} e^*_k$. Thus, the entries of $\Omega_k$ are of order $O(T^{-2})$. Equivalently, $\Omega_k^{t+1,1/2}$ defines the covariance matrix of $e^*_{k,t+1}$ with entries $\omega_{ij}^k$ [...]. Weak cross-sectional dependence yields finiteness with respect to the cross-sectional dimension $N$. The first term is finite by finiteness of the fourth moments of $e_t$ and arguments in Herwartz et al. (2016). For the remaining three summands, we can use similar arguments on their order as in condition (b) and use the convergence in $L^q$ norm of $\|e^*\|_q$ and $\|e^*_{t+1}\|_q$, so that the expressions within the expected values are stochastically bounded.
Proof of Proposition 1. With Lemma 2, the conditions for the central limit theorem stated in Theorem 6.2.3 in Davidson (2000) are fulfilled and asymptotic normality of the test statistic follows directly. [...] the distance between the two mean estimators vanishes in $L^q$ norm. By decomposing $e^*$ into the distance between the two mean estimators and a remaining estimation error, we can separate this distance in $X_{Tt}$ and thus in the test statistic $\hat{t}_{HSW}$. By boundedness of the distance in $L^q$ and using Markov's inequality, the separated terms involving it converge to zero. For instance, in the numerator the scaling in $X_{Tt}$ (i.e. by $1/\sqrt{NT}$) and of the overall numerator (i.e. by $1/\sqrt{T}$) control the summation of these vanishing distances over $T$.
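The final step combines convergence in $L^q$ with Markov's inequality. As a brief reminder, stated for a generic sequence of random variables $Z_T$ (a placeholder of ours, not notation from the proofs above), for any $\varepsilon > 0$ and $q \ge 1$:

```latex
P\left(|Z_T| > \varepsilon\right)
  \le \frac{E|Z_T|^q}{\varepsilon^q}
  = \frac{\|Z_T\|_q^q}{\varepsilon^q},
\qquad\text{so that}\qquad
\|Z_T\|_q \to 0 \;\Longrightarrow\; Z_T \xrightarrow{\,p\,} 0 .
```

Applied to the separated terms, this is why a distance that vanishes in $L^q$ norm does not affect the limit distribution of the statistic.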