Consistent Testing for Pairwise Dependence in Time Series

ABSTRACT We consider the problem of testing pairwise dependence for stationary time series. To this end, we propose a Box–Ljung-type test statistic that is formed after calculating the distance covariance function among pairs of observations. The distance covariance function is a suitable measure for detecting dependencies between observations because it is based on the distance between the characteristic function of the joint distribution of the random variables and the product of the marginals. We show that, under the null hypothesis of independence and under mild regularity conditions, the test statistic converges in distribution to a normal random variable. The results are complemented by several examples. This article has supplementary material online.


Introduction
The problem of measuring and detecting generic serial dependence is often encountered in time series analysis. The autocorrelation function (ACF) (Brockwell and Davis 1991, Definition 1.3.1) is a traditional tool for measuring dependence and constructing tests of independence. Several authors have also considered the use of the spectral density function; see Hong (1996, 1999), among others. Accordingly, methodologies for constructing test statistics for serial dependence fall into two main categories: time-domain and frequency-domain methods. Time-domain analysis includes correlation-based tests such as those proposed by Box and Pierce (1970) and Ljung and Box (1978), whose test statistics employ the ACF to test for serial dependence. However, these tests are inconsistent against processes that are dependent but uncorrelated (Romano and Thombs 1996; Shao 2011). Another limitation of these tests is that the number of lags included in the test statistic is held constant in the asymptotic theory (Xiao and Wu 2014). This may be a severe limitation in practice, since the actual dependence may be of higher order (Hong 2000). Moreover, the ACF is suited to detecting serial dependence in Gaussian models; it may fail to detect dependence in nonlinear and non-Gaussian models, so alternative dependence measures are required (see Tjøstheim 1996 for an overview).
A different measure of dependence, termed the distance covariance function, has been proposed recently by Székely, Rizzo, and Bakirov (2007). The sample distance covariance function can be viewed as a degenerate V-statistic. The limiting distribution of degenerate U- and V-statistics for stationary and ergodic random variables, as well as for weakly dependent random variables, is examined thoroughly by Leucht and Neumann (2013a, 2013b). An important feature of the distance covariance function is that it identifies nonlinear dependence structures that are not detected by the ACF (for instance, in stock return data). Zhou (2012) extended this measure to time series, retaining its basic features, and called it the auto-distance covariance function (ADCV). Zhou (2012) studied the behavior of the ADCV at a fixed lag, whereas in this contribution we consider an increasing number of lags. This is achieved by employing spectral domain methods, which allow us to incorporate a growing number of lags. If a stationary time series is serially uncorrelated, then its normalized spectral density is flat, that is, it takes a constant value over the interval $(-\pi, \pi)$. Thus, any deviation of the normalized spectral density from a constant provides evidence of serial correlation. However, standard spectral density approaches work well only for Gaussian processes. They become inappropriate for non-Gaussian models because they miss nonlinear processes with zero autocorrelation (for instance, autoregressive conditional heteroscedastic (ARCH), generalized ARCH (GARCH), bilinear, nonlinear, or nonlinear moving average (NMA) models; see Priestley 1981; Hong 1999). Motivated by this fact, Hong (1999) introduced a generalized spectral density approach that captures all forms of pairwise dependence, using the empirical characteristic function (ECF) and its derivatives.
Applications of the ECF include the work of Feuerverger (1993), who developed a consistent rank test for bivariate dependence, and Knight and Yu (2002), who proposed an ECF-based estimation method for strictly stationary processes that yields consistent and asymptotically normal estimators. Building on Hong's (1999) approach, we employ the ADCV to propose a new test of serial independence. The main contribution of the article is that the number of lags included in the test statistic grows with the sample size. Moreover, because it is based on the generalized spectral density, the proposed test statistic captures all pairwise dependencies. In addition, it is faster to compute than Hong's (1999) statistic, since it essentially avoids a two-dimensional integration. This contribution builds a bridge between the theory of distance covariance functions proposed by Székely, Rizzo, and Bakirov (2007) and the work of Hong (1999).
The article is organized as follows. In Section 2, we give some basic definitions of the theoretical and empirical ADCV and, for a strictly stationary α-mixing process, we establish the consistency of the empirical ADCV. In Section 3, we discuss the definition of the ADCV by means of spectral analysis; in particular, we show the connection between the works of Székely, Rizzo, and Bakirov (2007) and Zhou (2012) and the work of Hong (1999). Our proposed test and its asymptotic properties are described in Section 4. Simulated and real data are analyzed in Section 5. Concluding remarks are provided in Section 6. This article has online supplementary material that includes further results and proofs.

Distance Covariance Function
Assume that $\{X_t, t\in\mathbb{Z}\}$ is a univariate strictly stationary time series and that a sample of size $n$ is available. We make the following assumptions for developing the theory.
Assumption 1. $\{X_t\}$ is a strictly stationary α-mixing process with mixing coefficients $\alpha(j)$, $j\ge 1$.
Assumption 1 is useful for developing theoretical results about the ADCV; it is a natural first step toward studying the estimation of (4) for time series. Assumption 2 guarantees that (4) is finite. Assumption 3(i) implies the existence of the generalized spectral density (9), as we will see in the next section. Assumption 3(ii) is the minimal condition needed to obtain a Marcinkiewicz–Zygmund-type inequality (Doukhan and Louhichi 1999, Lemma 6) used in the proofs of two lemmas given in the supplementary material. Assumption 3(ii) implies Assumption 3(i).
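We define the distance covariance function through the joint and marginal characteristic functions of the pair $(X_t, X_{t-|j|})$. Denote the joint characteristic function of $X_t$ and $X_{t-|j|}$ by $\phi_{|j|}(u,v)$, that is,
$$\phi_{|j|}(u,v) = E\left[\exp\{i(uX_t + vX_{t-|j|})\}\right],$$
where $(u,v)\in\mathbb{R}^2$ and $i^2=-1$. Furthermore, let
$$\phi(u) = E\left[\exp(iuX_t)\right]$$
be the marginal characteristic function of $X_t$. Following Hong (1999), and because of the assumed stationarity, we define
$$\sigma_j(u,v) = \operatorname{Cov}\left(e^{iuX_t}, e^{ivX_{t-|j|}}\right) = \phi_{|j|}(u,v) - \phi(u)\phi(v), \tag{1}$$
that is, (1) is the covariance function between the two series $e^{iuX_t}$ and $e^{ivX_{t-|j|}}$. From (1), $\sigma_j(u,v)$ is simply the difference between the joint characteristic function of $(X_t, X_{t-|j|})$ and the product of their marginals. Hence, $X_t$ and $X_{t-|j|}$ are independent if and only if $\sigma_j(u,v)=0$ for all $(u,v)\in\mathbb{R}^2$, which suggests the measure
$$\int_{\mathbb{R}^2}|\sigma_j(u,v)|^2\,W(u,v)\,du\,dv, \tag{2}$$
where $W(u,v)$ is an arbitrary positive weight function for which the above integral exists. In particular, the weight function
$$W(u,v) = \frac{1}{\pi^2|u|^2|v|^2} \tag{3}$$
results in the ADCV function
$$V_X^2(j) = \int_{\mathbb{R}^2}\frac{|\sigma_j(u,v)|^2}{\pi^2|u|^2|v|^2}\,du\,dv. \tag{4}$$
Although Hong (1999) implicitly defined (2) by using an integrable weight function $W(\cdot,\cdot)$, it turns out that a nonintegrable weight function such as (3) yields a closed-form expression for the estimator of the auto-distance covariance function. In addition, the estimator based on (3) is faster to compute than Hong's (1999) approach. The auto-distance correlation function (ADCF) $R_X(j)$ is the square root of
$$R_X^2(j) = \frac{V_X^2(j)}{V_X^2(0)}, \tag{5}$$
provided that $V_X^2(0)\neq 0$ (and $R_X^2(j)=0$ otherwise). It is scale invariant and nonzero when $X_t$ and $X_{t-|j|}$ are dependent at lag $j$. Moreover, an important feature of (4) is its nonintegrable weight: if the measure is instead calculated with an integrable weight function, it might miss potential dependence among observations (Székely, Rizzo, and Bakirov 2007, p. 2771). To develop an estimator for (4), define
$$\hat\sigma_j(u,v) = \frac{1}{n-|j|}\sum_{t=|j|+1}^{n}e^{i(uX_t+vX_{t-|j|})} - \left(\frac{1}{n-|j|}\sum_{t=|j|+1}^{n}e^{iuX_t}\right)\left(\frac{1}{n-|j|}\sum_{t=|j|+1}^{n}e^{ivX_{t-|j|}}\right). \tag{6}$$
Then, the sample auto-distance covariance function is defined by
$$\hat V_X^2(j) = \int_{\mathbb{R}^2}\frac{|\hat\sigma_j(u,v)|^2}{\pi^2|u|^2|v|^2}\,du\,dv. \tag{7}$$
Estimator (7) can be computed as
$$\hat V_X^2(j) = \frac{1}{(n-|j|)^2}\sum_{r,l=|j|+1}^{n}A_{rl}B_{rl}, \tag{8}$$
where $A_{rl} = a_{rl} - \bar a_{r\cdot} - \bar a_{\cdot l} + \bar a_{\cdot\cdot}$, with $a_{rl} = |X_r - X_l|$, $\bar a_{r\cdot} = \sum_{l=|j|+1}^{n}a_{rl}/(n-|j|)$, $\bar a_{\cdot l} = \sum_{r=|j|+1}^{n}a_{rl}/(n-|j|)$, and $\bar a_{\cdot\cdot} = \sum_{r,l=|j|+1}^{n}a_{rl}/(n-|j|)^2$; $B_{rl}$ is defined analogously in terms of $b_{rl} = |X_{r-|j|} - X_{l-|j|}|$. Then, the sample ADCF $\hat R_X(j)$ is the square root of
$$\hat R_X^2(j) = \frac{\hat V_X^2(j)}{\hat V_X^2(0)},$$
set to zero when $\hat V_X^2(0)=0$. Clearly, (8) can be easily implemented for any given time series, because it is computed by simple summation and multiplication.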
The sample ADCV is expected to perform better than the usual autocovariance function, especially for nonlinear time series models. In addition, it is an appealing measure of dependence since its computation is based on linear combinations of distances among observations. Note that (4) and its empirical analog have been studied by Zhou (2012) for multivariate time series. Although in this contribution we focus exclusively on univariate time series, our results can be extended to the multivariate case.
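The closed form (8) translates directly into array operations. The Python/NumPy sketch below is our own illustration rather than the authors' code (the computations in this article rely on the R package energy); the function names `adcv` and `adcf` are hypothetical, and the sample ADCF is taken to be the plug-in ratio $\hat V_X^2(j)/\hat V_X^2(0)$.

```python
import numpy as np


def _double_center(d):
    """Return A_rl = a_rl - a_r. - a_.l + a_.. for a square distance matrix."""
    return d - d.mean(axis=1, keepdims=True) - d.mean(axis=0, keepdims=True) + d.mean()


def adcv(x, j):
    """Sample auto-distance covariance V_n^2(j) computed through (8)."""
    x = np.asarray(x, dtype=float)
    n, j = x.size, abs(int(j))
    if j >= n:
        raise ValueError("the lag must be smaller than the sample size")
    lead = x[j:]       # X_t       for t = |j|+1, ..., n
    lag = x[: n - j]   # X_{t-|j|} for t = |j|+1, ..., n
    a = np.abs(lead[:, None] - lead[None, :])  # a_rl = |X_r - X_l|
    b = np.abs(lag[:, None] - lag[None, :])    # b_rl = |X_{r-|j|} - X_{l-|j|}|
    # The mean over all (n-|j|)^2 pairs equals (1/(n-|j|)^2) * sum_{r,l} A_rl B_rl.
    return float((_double_center(a) * _double_center(b)).mean())


def adcf(x, j):
    """Sample ADCF: square root of V_n^2(j) / V_n^2(0), set to 0 if V_n^2(0) = 0."""
    v0 = adcv(x, 0)
    if v0 <= 0.0:
        return 0.0
    # Clip tiny negative round-off before taking the square root.
    return float(np.sqrt(max(adcv(x, j), 0.0) / v0))


rng = np.random.default_rng(0)
x = rng.standard_normal(200)
print([round(adcf(x, j), 3) for j in range(1, 4)])
```

Each call builds two $(n-|j|)\times(n-|j|)$ distance matrices, so the cost is quadratic in the sample size per lag, in line with the remark that (8) needs only sums and products.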
The proof of the above proposition is based on the arguments in the proof of Székely, Rizzo, and Bakirov (2007, Theorem 2). The α-mixing condition enables application of the ergodic theorem to time series data. Under mild conditions, Zhou (2012) obtained the weak consistency of $\hat V_X^2(\cdot)$. Zhou's (2012) approach requires $E|X_t|^{1+\delta}<\infty$ for some $\delta>0$, whereas ours requires only $E|X_t|<\infty$. Moreover, Zhou (2012) proved this result using the physical dependence measure suggested by Wu (2005), whereas we employ the notion of α-mixing (Rosenblatt 1956).

Generalized Spectral Density Approach
We now discuss the connection of ADCV with the work by Hong (1999).
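Recall (1) and suppose that $\sup_{(u,v)\in\mathbb{R}^2}\sum_{j}|\sigma_j(u,v)|<\infty$, which holds under Assumption 3(i). Then, the Fourier transform of $\sigma_j(u,v)$ exists and is given by
$$f(\omega,u,v) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\sigma_j(u,v)\,e^{-ij\omega}, \qquad \omega\in[-\pi,\pi]. \tag{9}$$
When $\sigma_j(u,v)=0$ for all $j\neq 0$, (9) reduces to the constant
$$f_0(\omega,u,v) = \frac{1}{2\pi}\sigma_0(u,v).$$
Therefore, testing whether $f$ is constant with respect to $\omega$ amounts to testing whether $\sigma_j(u,v)=0$ for all $j\neq 0$ and all $(u,v)$, that is, whether $X_t$ and $X_{t-|j|}$ are independent for all $j\neq 0$. Hong (1999) studies a kernel estimator of $f(\omega,u,v)$ and calculates its $L_2$-distance from $f_0(\omega,u,v)$ to test whether $f$ is constant. In addition to Assumptions 1–3, consider also the following.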
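Assumption 4. Suppose that $k(\cdot)$ is a kernel function such that $k:\mathbb{R}\to[-1,1]$ is symmetric and continuous at 0 and at all but a finite number of points, with $k(0)=1$, …

Assumption 4 is mild and allows for kernels with bounded or unbounded support. Consider the following nonparametric estimator of $f$:
$$\hat f_n(\omega,u,v) = \frac{1}{2\pi}\sum_{j=-(n-1)}^{n-1}\left(1-\frac{|j|}{n}\right)^{1/2}k(j/p)\,\hat\sigma_j(u,v)\,e^{-ij\omega},$$
where $k(\cdot)$ is a kernel function satisfying Assumption 4 and $p$ is a bandwidth whose choice is discussed later. Similarly, put
$$\hat f_0(\omega,u,v) = \frac{1}{2\pi}\hat\sigma_0(u,v),$$
where $\hat\sigma_0(\cdot,\cdot)$ is given by (6). Then, consider the weighted squared norm of $\hat f_n$ minus $\hat f_0$,
$$\|\hat f_n-\hat f_0\|_W^2 = \int_{\mathbb{R}^2}\int_{-\pi}^{\pi}|\hat f_n(\omega,u,v)-\hat f_0(\omega,u,v)|^2\,d\omega\,W(u,v)\,du\,dv.$$
After some calculations, we obtain
$$\|\hat f_n-\hat f_0\|_W^2 \propto \sum_{0<|j|<n}\left(1-\frac{|j|}{n}\right)k^2(j/p)\int_{\mathbb{R}^2}|\hat\sigma_j(u,v)|^2\,W(u,v)\,du\,dv \tag{10}$$
for any suitable weight function such that the above integral exists. In particular, for the choice of $W(\cdot,\cdot)$ given by (3), we have
$$\|\hat f_n-\hat f_0\|_W^2 \propto \sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)k^2(j/p)\,\hat V_X^2(j). \tag{11}$$
This fact motivates our study of Box–Pierce-type statistics based on the auto-distance covariance function. Indeed, if $k(z)=1$ for $|z|\le 1$ and 0 otherwise (i.e., uniform weighting), then the last expression becomes proportional to
$$\sum_{j=1}^{p}\left(1-\frac{j}{n}\right)\hat V_X^2(j). \tag{12}$$
Equation (12) can be viewed as a Box–Pierce-type statistic for testing the hypotheses $V_X^2(j)=0$, $j=1,\dots,p$, since for large $n$ the factor $(1-j/n)$ can be replaced by unity. Interestingly, by recalling (1) and letting …, we can develop a test statistic for testing independence in terms of the ADCF, that is, …, where $\hat R_X^2(\cdot)$ is the sample version of (5).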
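A minimal sketch of the Box–Pierce-type sum (12), and of the kernel-weighted sum $\sum_{j=1}^{n-1}(1-j/n)k^2(j/p)\hat V_X^2(j)$ on the right-hand side of (11), is given below. The proportionality constant in (11) is omitted; the names `weighted_adcv_sum`, `uniform_kernel`, and `bartlett_kernel` are hypothetical; and the Bartlett form $k(z)=\max(1-|z|,0)$ is the textbook definition, not necessarily the exact form used in this article.

```python
import numpy as np


def uniform_kernel(z):
    """k(z) = 1 if |z| <= 1 and 0 otherwise (the weighting that yields (12))."""
    return 1.0 if abs(z) <= 1.0 else 0.0


def bartlett_kernel(z):
    """Textbook Bartlett kernel k(z) = max(1 - |z|, 0)."""
    return max(1.0 - abs(z), 0.0)


def adcv(x, j):
    """Sample ADCV V_n^2(j) via double-centred distance matrices, as in (8)."""
    x = np.asarray(x, dtype=float)
    n, j = x.size, abs(int(j))
    lead, lag = x[j:], x[: n - j]
    a = np.abs(lead[:, None] - lead[None, :])
    b = np.abs(lag[:, None] - lag[None, :])
    A = a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    B = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
    return float((A * B).mean())


def weighted_adcv_sum(x, p, kernel=uniform_kernel):
    """Compute sum_{j=1}^{n-1} (1 - j/n) k^2(j/p) V_n^2(j), skipping zero weights."""
    n = len(x)
    total = 0.0
    for j in range(1, n):
        w = kernel(j / p) ** 2
        if w == 0.0:
            continue  # lags outside the kernel's support contribute nothing
        total += (1.0 - j / n) * w * adcv(x, j)
    return total


rng = np.random.default_rng(1)
x = rng.standard_normal(100)
print(weighted_adcv_sum(x, p=5), weighted_adcv_sum(x, p=5, kernel=bartlett_kernel))
```

With the uniform kernel only lags $1,\dots,p$ enter, which reproduces the sum in (12); a kernel with unbounded support, such as the Daniell kernel, would make this naive loop visit all $n-1$ lags.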

Main Results
In this section, we develop a test statistic for testing the hypothesis that $\{X_t, t=1,\dots,n\}$ forms an iid sequence. The test statistic is motivated by (11) and, following Hong (1999), is based on … (13). However, there is an important difference between the test statistic of Hong (1999) and the one given by (13). In Hong (1999), the weight function used to form statistics like (13) is assumed to be integrable, whereas we propose (13) allowing a nonintegrable weight function $W(\cdot,\cdot)$. We have the following result.
Theorem 1. Suppose that Assumptions 2 and 4 hold and let $p=cn^{\lambda}$, where $c>0$ and $\lambda\in(0,1)$. Then, under the null hypothesis that $\{X_t\}$ is an iid sequence, we have that …, where … and $\hat C_0$, $\hat D_0$ are their sample counterparts; expectations are taken with respect to the distribution of $X_t$, with $X_t'$ an independent copy of $X_t$.
The above theorem ensures that, under any alternative hypothesis, $M_n$ has asymptotic power 1 whenever the weighted squared norm of $f(\omega,u,v)$ minus $f_0(\omega,u,v)$ is positive. This is a consequence of the fact that … (14). Clearly, (14) equals 0 if and only if $X_t$ and $X_{t-|j|}$ are independent for all $j\ge 1$. Therefore, the statistic $M_n$ is consistent against pairwise dependence alternatives. In fact, under some conditions one can obtain the kernel function $k(\cdot)$ that maximizes the asymptotic power of $M_n$; in this sense, the Daniell kernel is the optimal kernel for the test statistic proposed by Hong (1999, Theorem 6).

Simulations
We first report some empirical results on the behavior of the test statistic $T_n$ given by (13). The simulations cover different sample sizes, and we use the standard nonparametric bootstrap (B = 499 replications) to obtain critical values for studying the size and power of the proposed statistic. The test statistic is calculated with the R package energy of Rizzo and Székely (2013).
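The bootstrap calibration can be sketched as follows, assuming that under the iid null each bootstrap series is obtained by resampling the observations with replacement. To stay self-contained, the sketch uses the uniform-kernel sum (12) as a stand-in statistic instead of $T_n$ itself; the function names are hypothetical and this is not the authors' code.

```python
import numpy as np


def adcv(x, j):
    """Sample ADCV V_n^2(j) via double-centred distance matrices, as in (8)."""
    x = np.asarray(x, dtype=float)
    n, j = x.size, abs(int(j))
    lead, lag = x[j:], x[: n - j]
    a = np.abs(lead[:, None] - lead[None, :])
    b = np.abs(lag[:, None] - lag[None, :])
    A = a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    B = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
    return float((A * B).mean())


def bp_type_stat(x, p):
    """Stand-in statistic sum_{j=1}^{p} (1 - j/n) V_n^2(j), cf. (12); not T_n itself."""
    n = len(x)
    return sum((1.0 - j / n) * adcv(x, j) for j in range(1, p + 1))


def bootstrap_test(x, p, n_boot=499, alpha=0.05, seed=None):
    """Nonparametric bootstrap under the iid null: resample observations with replacement."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    stat = bp_type_stat(x, p)
    boot = np.array([bp_type_stat(rng.choice(x, size=x.size, replace=True), p)
                     for _ in range(n_boot)])
    crit = float(np.quantile(boot, 1.0 - alpha))         # bootstrap critical value
    p_value = (1 + np.sum(boot >= stat)) / (n_boot + 1)  # add-one bootstrap p-value
    return stat, crit, float(p_value)


# Example: a strongly autocorrelated AR(1) series should be rejected.
rng = np.random.default_rng(2)
e = rng.standard_normal(120)
x_ar = np.empty(120)
x_ar[0] = e[0]
for t in range(1, 120):
    x_ar[t] = 0.8 * x_ar[t - 1] + e[t]
print(bootstrap_test(x_ar, p=3, n_boot=99, seed=3))
```

The example uses 99 bootstrap replications only to keep it fast; the default of 499 matches the number of replications reported above.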
To examine the effect of different kernel functions on the test statistic $T_n$, we choose Lipschitz continuous kernels, that is, functions $k(\cdot)$ such that $|k(z_1)-k(z_2)|\le C|z_1-z_2|$ for any $z_1, z_2\in\mathbb{R}$ and some constant $C$. In particular, we use the Daniell (DAN), Parzen (PAR), and Bartlett (BAR) kernels (details are given in the supplementary material). We also compare $T_n$, given by (13), with other test statistics to examine its relative performance. In particular, we consider the Box–Pierce (BP) test statistic, the test statistic proposed by Hong (1996), and the test statistic given in Equation (11) computed with an integrable weight function and without the multiplier of $2/\pi$; we denote the latter by $M_n^{(2)}$. A convenient way to calculate $M_n^{(2)}$ is to build the weight function from the cumulative distribution function $\Phi$ of a standard normal random variable (Chen and Hong 2012), that is, $dW(u,v)=d\Phi(u)\,d\Phi(v)$. This allows us to consider a finite number $N$ of points $(u,v)$, so that the integral in Equation (11) is replaced by its empirical mean over these points. We set $N=500$, because larger values of $N$ did not alter the results significantly. Furthermore, the supplementary material shows that $M_n^{(2)}$ is computationally more expensive than $T_n$, especially when $n$ and $p$ are large (see Table S1 of the supplementary material).
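The Monte Carlo device described above can be sketched as follows, assuming $dW(u,v)=d\Phi(u)\,d\Phi(v)$: the integral of $|\hat\sigma_j(u,v)|^2$ is then an expectation over independent standard normal $u$ and $v$, and it is replaced by an average over $N$ random draws. As a check, the sketch also evaluates the same integral in closed form using $E[e^{iud}]=e^{-d^2/2}$ for $u\sim N(0,1)$; this closed form is our own derivation rather than a formula from the article, and all function names are hypothetical.

```python
import numpy as np


def sigma_hat(x, j, u, v):
    """Empirical sigma_j(u, v) of (6) evaluated at the points (u[k], v[k])."""
    x = np.asarray(x, dtype=float)
    n, j = x.size, abs(int(j))
    lead, lag = x[j:], x[: n - j]
    eu = np.exp(1j * np.outer(u, lead))  # e^{i u X_t}, one row per point
    ev = np.exp(1j * np.outer(v, lag))   # e^{i v X_{t-|j|}}
    return (eu * ev).mean(axis=1) - eu.mean(axis=1) * ev.mean(axis=1)


def gaussian_weighted_mc(x, j, n_points=500, seed=None):
    """Average of |sigma_j(u, v)|^2 over N draws (u, v) ~ N(0,1) x N(0,1)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n_points)
    v = rng.standard_normal(n_points)
    return float(np.mean(np.abs(sigma_hat(x, j, u, v)) ** 2))


def gaussian_weighted_exact(x, j):
    """Closed form of the same integral via E[exp(i u d)] = exp(-d^2 / 2)."""
    x = np.asarray(x, dtype=float)
    n, j = x.size, abs(int(j))
    lead, lag = x[j:], x[: n - j]
    g = np.exp(-0.5 * (lead[:, None] - lead[None, :]) ** 2)
    h = np.exp(-0.5 * (lag[:, None] - lag[None, :]) ** 2)
    return float((g * h).mean() + g.mean() * h.mean()
                 - 2.0 * (g.mean(axis=1) * h.mean(axis=1)).mean())


rng = np.random.default_rng(4)
x = rng.standard_normal(100)
print(gaussian_weighted_mc(x, 1, seed=5), gaussian_weighted_exact(x, 1))
```

The Monte Carlo version costs $O(N(n-|j|))$ complex exponentials per lag, which illustrates why this route becomes more expensive than the distance-based formula (8) as $n$ and $p$ grow.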
We first investigate the size of the test. Suppose that $\{X_t\}$ is an iid sequence of standard normal random variables. To examine the sensitivity of $T_n$ to the bandwidth $p$, we use $p=n^{\lambda}$ with $\lambda = 1/5, 2/5, 3/5$. For $n=100$, $p$ takes approximately the values 3, 7, and 16, and similarly for other sample sizes. Table 1 reports the achieved Type I error rates at the 5% and 10% nominal levels. The proposed test statistic keeps its size close to the nominal level, and the Bartlett kernel yields the closest approximation. Table 2 gives further support for the asymptotic normality of the proposed test statistic; these results show that the normal approximation is adequate, especially for large sample sizes.
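For $n=100$, the values 3, 7, and 16 are consistent with rounding $n^{\lambda}$ up to the next integer ($100^{1/5}\approx 2.51$, $100^{2/5}\approx 6.31$, $100^{3/5}\approx 15.85$); ordinary rounding would give 6 rather than 7 for $\lambda=2/5$. The hypothetical helper below assumes this rounding-up rule, which is our inference from the reported values rather than a rule stated in the article.

```python
import math


def bandwidth(n, lam):
    """Return ceil(n**lam); rounding to 10 decimals first guards against
    floating-point noise when n**lam is an exact integer (e.g. 32**0.2)."""
    return math.ceil(round(n ** lam, 10))


print([bandwidth(100, lam) for lam in (1 / 5, 2 / 5, 3 / 5)])  # [3, 7, 16]
```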
To investigate the power of $T_n$, we consider the following data-generating processes: an ARCH(2) model (15), a TAR(1) model (16), and an NMA(2) model (17), where $\{\epsilon_t\}$ is a sequence of iid standard normal random variables. Model (15) is an autoregressive conditional heteroscedastic model of order two (see Engle 1982), and (16) is a threshold autoregressive (TAR) model of order one (see, for instance, Tong 1990; Tsay 2005, Section 4.1.2); the TAR model generates data with a nonlinear dependence structure. Model (17) is an example of a nonlinear moving average model of order two. It is well known that the process $\{X_t\}$ generated by (17) consists of dependent but uncorrelated random variables. Figure 1 (respectively, Figure 2) shows the power of all test statistics considered for various sample sizes and bandwidths when the data are generated by (15) (respectively, (16)). In both cases, $T_n$ and $M_n^{(2)}$ perform better than all the other test statistics, in the sense that they achieve the highest power. For bandwidths $p=n^{1/5}$ and $p=n^{2/5}$, the power of both test statistics approaches one as the sample size grows. When $p=n^{3/5}$, the power of $M_n^{(2)}$ is superior to that of $T_n$ for model (15); however, the simulations suggest that, as the sample size increases, the power of $T_n$ approaches that of $M_n^{(2)}$. In Figure 2 the situation is reversed, but both tests again give similar results, especially for large sample sizes. Similar results are obtained for (17) and are reported in the supplementary material.
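The phrase "dependent but uncorrelated" can be illustrated numerically. The sketch below uses the hypothetical process $X_t=\epsilon_t\epsilon_{t-1}$, chosen only because its lag-one autocorrelation is zero while $|X_t|$ and $|X_{t-1}|$ share the factor $|\epsilon_{t-1}|$; it is not model (17). The sample ACF at lag 1 should be close to zero, whereas the sample ADCF at lag 1 should exceed that of an iid benchmark of the same length.

```python
import numpy as np


def acf(x, j):
    """Mean-corrected sample autocorrelation at lag j >= 1."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[j:], x[: x.size - j]) / np.dot(x, x))


def adcv(x, j):
    """Sample ADCV V_n^2(j) via double-centred distance matrices, as in (8)."""
    x = np.asarray(x, dtype=float)
    n, j = x.size, abs(int(j))
    lead, lag = x[j:], x[: n - j]
    a = np.abs(lead[:, None] - lead[None, :])
    b = np.abs(lag[:, None] - lag[None, :])
    A = a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    B = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
    return float((A * B).mean())


def adcf(x, j):
    """Sample ADCF: square root of V_n^2(j) / V_n^2(0), set to 0 if V_n^2(0) = 0."""
    v0 = adcv(x, 0)
    return 0.0 if v0 <= 0.0 else float(np.sqrt(max(adcv(x, j), 0.0) / v0))


rng = np.random.default_rng(6)
eps = rng.standard_normal(1001)
x_nma = eps[1:] * eps[:-1]         # hypothetical X_t = eps_t * eps_{t-1}
x_iid = rng.standard_normal(1000)  # iid benchmark of the same length
print(round(acf(x_nma, 1), 3), round(adcf(x_nma, 1), 3), round(adcf(x_iid, 1), 3))
```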

S&P 500 Stock Return Data
We analyze monthly excess returns of the S&P 500 index starting from 1926. The series consists of 792 observations (Tsay 2005, Example 3.3). The ACF plot of the original series (see Figure S2 of the supplementary material) suggests moderate serial correlation at lags 1 and 3, while the ACF plot of the squared series shows strong linear dependence, a common feature of financial returns. The ADCF plots likewise strongly suggest dependence, especially when compared with the displayed critical values for the independence test. These critical values are computed via the subsampling approach suggested by Zhou (2012, sec. 5.1), where the block size is chosen by the minimum volatility method of Politis, Romano, and Wolf (1999, sec. 9.4.2). Tsay (2005) suggested an AR(3)-GARCH(1,1) model for the series; however, all autoregressive parameters turn out to be insignificant at the 5% level. Hence, a Gaussian GARCH(1,1) model is fitted to these data, and we study the behavior of its standardized residuals. The upper panels of Figure 3 show the ACF plots of the standardized residuals and of the squared standardized residuals of the fitted model; these plots show no sign of serial correlation. On the other hand, the corresponding ADCF plots (lower panels of Figure 3) indicate dependence among the residuals. Table 3 contains …

Concluding Remarks
… density allows us to detect pairwise dependence in both linear and nonlinear time series structures. Our approach differs from that of Hong (1999) because our test statistic is calculated by means of a nonintegrable weight function, which yields a closed-form expression for the statistic and makes it faster to compute. In addition, we allow the number of lags tested in $H_0$ to increase with the sample size $n$.
Empirical results suggest that our new test of independence is more powerful than the portmanteau tests of Box and Pierce (1970) and Ljung and Box (1978) and the test proposed by Hong (1996) against nonlinear alternatives. In terms of power, the proposed test is quite close to the test proposed by Hong (1999), and in some cases it outperforms $M_n^{(2)}$. The test statistic depends on a bandwidth parameter $p$. A cross-validation method might be suitable for choosing $p$, but we can also vary $p$ to examine the sensitivity of the results. In our data example, we did not find any notable relation between $p$ and the outcomes of the test statistics.
We believe that this line of research can be extended in several directions. First, the ADCV can also be defined for multivariate time series, for the purpose of examining pairwise and concurrent dependence; similar arguments can provide new insight into testing pairwise dependence for multivariate time series. Another possible direction is to relax Assumption 1; for this goal, the framework of locally stationary processes introduced by Dahlhaus (1996) may be useful for defining a local ADCV. Furthermore, the concept of ADCV can be extended to spatial data, irregularly spaced time series, and high-dimensional time series.

Supplementary Materials
The online supplementary materials include some further results concerning applications (Section 1) and the proofs of the main theorems (Section 2).