Cross-sectional Independence Test for a Class of Parametric Panel Data Models

This paper proposes a new statistic for testing cross-sectional independence of the residuals involved in a parametric panel data model. The proposed test statistic, called a linear spectral statistic (LSS), is built on the characteristic function of the empirical spectral distribution (ESD) of the sample correlation matrix of the residuals. The main advantage of the proposed test statistic is that it can capture nonlinear cross-sectional dependence. Asymptotic theory for a general class of linear spectral statistics is established as the cross-sectional dimension N and the time length T go to infinity proportionally. This class of statistics covers many classical statistics, including the bias-corrected Lagrange Multiplier (LM) test statistic and the likelihood ratio test statistic. Furthermore, the power under a local alternative hypothesis is analyzed, and the asymptotic distribution of the proposed statistic under this local hypothesis is also established. Finite sample studies show that the proposed test statistic works well numerically in each case considered and that it can also distinguish some dependent but uncorrelated structures, for example, nonlinear MA(1) models and multiple ARCH(1) models.


Introduction
Cross-sectional dependence has been widely studied in panel data analysis. It plays an important role in economic and financial models and creates great challenges for classical statistical inference.
For example, the existence of cross-sectional dependence can lead to a loss of efficiency of the classical least-squares estimation method. Before imposing any structure on the models under study, it is necessary to test whether there is some type of cross-sectional dependence. The econometrics literature mainly discusses how to test for cross-sectional uncorrelatedness in panel data analysis. For the case of fixed N and large T, Breusch and Pagan (1980) proposed the Lagrange multiplier (LM) test statistic, which is based on the average of the correlation coefficients of the residuals. For large N and large T, Pesaran, Ullah and Yamagata (2008) developed a bias-adjusted LM test using finite sample approximations. Recently, Baltagi, Feng and Kao (2012) derived the asymptotic distribution of a scaled LM test statistic proposed in Pesaran (2004). However, both papers assume normally distributed error components. Pesaran (2004) provided a diagnostic test for parametric linear models based on the average of the sample correlations when N and T are comparable, which is called the CD test. Chen, Gao and Li (2012) extended the CD test to nonparametric nonlinear models. Other related studies include Su and Ullah (2009), who test conditional uncorrelatedness by examining a covariance matrix in the case of fixed N. Meanwhile, Schott (2005) also established an asymptotic distribution for a scaled LM test statistic for high dimensional normally distributed data. Bai and Silverstein (2004) analyzed this kind of statistic based on sample covariance matrices, and Bai et al. (2009) utilized it to develop an asymptotic theory for likelihood ratio (LR) statistics under high dimensional settings.
Since the population mean and variance of the original data are usually unknown, sample covariance matrices cannot provide sufficient and correct information about the data. To address such issues, Gao et al. (2014) proposed using linear spectral statistics of sample correlation matrices.
One of the main advantages of using sample correlation matrices over sample covariance matrices is that they do not require the first two population moments of the elements of the random vector under study to be known. In this paper, we further explore the idea of using the characteristic function of the empirical spectral distribution (ESD) of the sample correlation matrix of the data under study.
We then propose a new test statistic for testing cross-sectional independence of the cross-sectional residuals involved in a class of parametric panel data models. The construction of the new test statistic rests on the fact that it is a sum of the characteristic function evaluated at each eigenvalue of the sample correlation matrix. As a result, this statistic involves high order moments of the residuals under investigation. Because nonlinear dependence can be reflected in the relationships among high order moments of the residuals, our proposed statistic is able to distinguish various dependence structures. In view of this point, we are able to test for cross-sectional independence rather than just cross-sectional uncorrelatedness, as has been discussed in the econometrics literature (see, for example, Pesaran (2004)).
In terms of the comparison with the work by Gao et al. (2014), we stress the following points.
First, this paper deals with the case where the cross-sectional residuals are unobservable. By contrast, Gao et al. (2014) considered a vector of observable random variables. Second, Gao et al. (2014) focused on the case where the observed random variables are all independent and identically distributed. By contrast, this paper allows the cross-sectional residuals to be either independent or dependent.
We then establish new asymptotic distributions for the proposed test statistic for such cases. The main difficulty in establishing the main results of this paper is that the estimated versions of the cross-sectional residuals are always highly dependent, even when the cross-sectional residuals themselves are assumed to be independent in Sections 2 and 3. We should also point out that Section 4 then demonstrates both the effectiveness and the strength of the proposed test statistic for capturing some weak dependence structures. As a consequence, the proposed test is applicable for testing cross-sectional dependence among some commonly used econometric models, such as spatial moving average, dependent factor, nonlinear moving average and multiple ARCH models.
The rest of the paper is organized as follows. Section 2 introduces the proposed test statistic and some results related to large dimensional random matrix theory. Asymptotic theory is presented in Section 3, including the asymptotic distribution of the proposed test statistic under the null hypothesis and the power under a general class of local alternative hypotheses. Section 4 specifically studies a local alternative hypothesis, under which the asymptotic distribution of the new statistic is demonstrated. In Section 5, the finite sample performance illustrates the effectiveness of the proposed test statistic under different dependence structures, including some dependent but uncorrelated structures.
Conclusions are given in Section 6. All the mathematical proofs are given in Appendix A, and the computational code functions are displayed in Appendix B.

The Model and test statistics
Consider a parametric linear panel data model of the form

y_jt = α_j + x_jt^τ β + u_jt, j = 1, . . ., N, t = 1, . . ., T, (2.1)

where j indexes the j-th cross-sectional unit and t indexes the t-th time series observation; y_jt is the dependent variable; x_jt denotes the p-dimensional regressors with slope parameter β; α_j is the fixed effect with Σ_{j=1}^N α_j = 0 for the identifiability of model (2.1); and the error component u_jt is allowed to be cross-sectionally dependent but is uncorrelated with x_jt.
The aim of this paper is to conduct a cross-sectional independence test as follows.
H_0: {u_jt} is independent of {u_rt} for all j ≠ r, (2.2)

against

H_a: {u_jt} and {u_rt} are dependent for some j ≠ r.
Under the null hypothesis, the least squares estimator β̂ of β is available; the residuals for the error components are then defined by û_j = Y_j − X_j β̂ for j = 1, 2, . . ., N (2.5). We are now ready to introduce linear spectral statistics for the cross-sectional independence test (2.2).
Consider the sample correlation matrix R̂_N = (ρ̂_rj)_{N×N}, where ρ̂_rj denotes the sample correlation between the residual vectors û_r and û_j. Let us study a class of statistics related to the eigenvalues of the matrix R̂_N. First, the empirical spectral distribution (ESD) of the sample correlation matrix R̂_N is defined as

F^{R̂_N}(x) = (1/N) Σ_{j=1}^N I(λ_j ≤ x),

where λ_1 ≤ λ_2 ≤ . . . ≤ λ_N are the eigenvalues of R̂_N and I(•) is the indicator function.
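As a quick illustration of this definition (not code from the paper; the residual series here are simulated placeholders), the following Python sketch forms the sample correlation matrix of N simulated series and evaluates the ESD at a point:

```python
import numpy as np

def esd(eigenvalues, x):
    """Empirical spectral distribution: F(x) = (1/N) * #{ lambda_j <= x }."""
    return float(np.mean(np.asarray(eigenvalues) <= x))

# placeholder residuals: N independent series of length T (illustration only)
rng = np.random.default_rng(0)
N, T = 50, 100
U = rng.standard_normal((N, T))

R_hat = np.corrcoef(U)            # N x N sample correlation matrix
lam = np.linalg.eigvalsh(R_hat)   # eigenvalues lambda_1 <= ... <= lambda_N

# the ESD is a step function increasing by 1/N at each eigenvalue
print(esd(lam, 1.0))              # fraction of eigenvalues at most 1
```

Since the diagonal of a correlation matrix is identically one, the eigenvalues always sum to N, which gives a cheap sanity check on the computation.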
The strategy for analyzing the ESD of R̂_N is divided into two steps. The first step is to investigate the eigenvalues of the matrix R_N = (ρ_rj)_{N×N}, where ρ_rj is ρ̂_rj with û_r replaced by u_r; the second step compares the eigenvalues of R̂_N with those of R_N.
If u_1, u_2, . . ., u_N are independent, F^{R_N}(x) converges with probability one to the Marcenko-Pastur (simply called M-P) law F_c(x) with c = lim_{T→∞} N/T (see Jiang (2004)), whose density has the explicit form

f_c(x) = (1/(2πcx)) √((b − x)(x − a)), a ≤ x ≤ b, (2.7)

together with a point mass 1 − 1/c at the origin if c > 1, where a = (1 − √c)² and b = (1 + √c)². In the following section, we will prove that F^{R̂_N}(x) has the same limit as F^{R_N}(x).
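The density in (2.7) is easy to check numerically. The sketch below (an illustration under the assumption c ≤ 1, so that there is no point mass at the origin) verifies that the continuous part integrates to one:

```python
import numpy as np

def mp_density(x, c):
    """Continuous part of the Marcenko-Pastur density with ratio c = lim N/T:
    f_c(x) = sqrt((b - x)(x - a)) / (2*pi*c*x) on [a, b]."""
    a, b = (1.0 - np.sqrt(c))**2, (1.0 + np.sqrt(c))**2
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > a) & (x < b)
    out[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2.0 * np.pi * c * x[inside])
    return out

# for c <= 1 there is no point mass at 0, so the continuous part
# must carry total mass one over its support [a, b]
c = 0.5
xs = np.linspace(1e-9, (1.0 + np.sqrt(c))**2, 400001)
ys = mp_density(xs, c)
mass = float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))  # trapezoid rule
```

For c > 1 the same integral would return 1/c, with the remaining mass 1 − 1/c sitting at the origin as stated in the text.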
Based on the difference between the empirical spectral distribution F^{R̂_N}(x) and the M-P law, we consider linear spectral statistics of the form ∫ f(x) d(F^{R̂_N}(x) − F_c(x)), where f(•) is an analytic function on [0, ∞).
Consider a modified linear spectral statistic of the form T_N(f) = ∫ f(x) d(F^{R̂_N}(x) − F_{c_N}(x)), where F_{c_N} denotes the M-P law with c replaced by c_N = N/T. The linear spectral statistic T_N(f) is a general statistic in the sense that it covers some classical statistics as special cases.
The construction of our proposed test statistic mainly comes from the following observation: under the null hypothesis, the limit of the ESD of the sample correlation matrix R̂_N is the M-P law defined in (2.7) when u_1, . . ., u_N satisfy Assumptions 1 and 2. Moreover, numerical investigations indicate that when u_1, . . ., u_N are only uncorrelated rather than independent, the limit of the ESD of R̂_N is not the M-P law (see Ryan and Debbah (2009)). From this point of view, any deviation of the limit of the ESD from the M-P law is evidence of dependence. This motivates us to use the ESD of R̂_N, F^{R̂_N}(x), to build a test statistic. However, there is no central limit theorem available for (F^{R̂_N}(x) − F_{c_N}(x)), as argued by Bai and Silverstein (2004). Therefore, we instead consider the difference between the respective characteristic functions of F^{R̂_N}(x) and F_{c_N}(x).
The characteristic function of F^{R̂_N}(x) is

s_N(τ) = ∫ e^{iτx} dF^{R̂_N}(x) = (1/N) Σ_{j=1}^N e^{iτλ_j}, (2.11)

where λ_j, j = 1, 2, . . ., N, are the eigenvalues of the sample correlation matrix R̂_N.
Our test statistic is then proposed as

S_N = ∫ |s_N(τ) − s_{c_N}(τ)|² U(τ) dτ, (2.12)

where s_{c_N}(τ) is the characteristic function of F_{c_N}(x), obtained from the M-P law F_c(x) with c replaced by c_N = N/T, and U(τ) is a weight function supported on a compact interval. An important concept in the spectral analysis of large dimensional random matrices is the Stieltjes transform. For any cumulative distribution function (CDF) G, its Stieltjes transform is defined as

m_G(z) = ∫ 1/(x − z) dG(x), Im(z) ≠ 0.

Linear spectral statistics and the Stieltjes transform of any CDF G are related by

∫ f(x) dG(x) = −(1/(2πi)) ∮_C f(z) m_G(z) dz,

where f is analytic on an open set containing the support of G, and C is a closed contour, taken in the positive direction in the complex plane, enclosing the support of G.
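To make the construction concrete, the following sketch computes s_N(τ) on a grid and forms a squared-distance statistic against a reference characteristic function. The reference s_ref and the flat weight below are placeholders; the paper's statistic uses s_{c_N}(τ) from the M-P law and a weight U(τ) on a compact interval.

```python
import numpy as np

def char_fn_esd(eigenvalues, taus):
    """Characteristic function of the ESD: s_N(tau) = (1/N) * sum_j exp(i*tau*lambda_j)."""
    lam = np.asarray(eigenvalues, dtype=float)
    return np.exp(1j * np.outer(np.asarray(taus, dtype=float), lam)).mean(axis=1)

rng = np.random.default_rng(1)
N, T = 40, 80
lam = np.linalg.eigvalsh(np.corrcoef(rng.standard_normal((N, T))))

taus = np.linspace(-2.0, 2.0, 101)   # grid on a compact interval (placeholder support)
s_N = char_fn_esd(lam, taus)

# illustrative squared-distance statistic against a reference characteristic
# function; the reference and the flat weight are placeholder assumptions
s_ref = np.ones_like(s_N)            # char. function of a point mass at 0 (placeholder)
weights = np.ones_like(taus)         # flat weight U(tau) on [-2, 2] (placeholder)
S_N = float(np.sum(np.abs(s_N - s_ref)**2 * weights) * (taus[1] - taus[0]))
```

Characteristic functions of distributions on the real line satisfy s_N(0) = 1, |s_N(τ)| ≤ 1 and the conjugate symmetry s_N(−τ) = s_N(τ)*, which is a useful sanity check on the implementation.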

Asymptotic Theory
In this section, we will establish a new CLT for a general class of linear spectral statistics and then apply the CLT to the proposed test statistic S N .
Before stating the main results, we specify some notation. Let R̃_N denote the T × T companion matrix of R̂_N, which shares the non-zero eigenvalues of R̂_N. In Theorem 1 below, and then in Theorems 2-4 in Sections 3 and 4, we establish some new asymptotic properties. Their proofs are given in Appendix A of a supplementary document. Theorem 1 provides the CLT for linear spectral statistics based on the sample correlation matrix R̂_N.
Theorem 1. In addition to Assumptions 1 and 2, let f_1, f_2, . . ., f_k be functions on R that are analytic on an open interval containing the support of F_c. Moreover, let κ = E(u_11^4)/E(u_11^2). Then the random vector (T_N(f_1), . . ., T_N(f_k)) converges weakly to a Gaussian vector (U_{f_1}, . . ., U_{f_k}), with mean function given in (3.1) and covariance function given in (3.2), for j, r = 1, 2, . . ., k. The contours in (3.1) and (3.2) are closed and are taken in the positive direction in the complex plane, each enclosing the support of F_c.
Based on Theorem 1, we can derive an asymptotic distribution for the proposed test statistic S N as follows.
Theorem 2. Under the assumptions of Theorem 1, the scaled statistic N²S_N converges in distribution to ∫ (V²(τ) + Z²(τ)) U(τ) dτ, where (V(τ), Z(τ)) is a Gaussian vector whose mean and variance are determined in (3.1) and (3.2) by taking f_1(x) and f_2(x) as sin(τx) and cos(τx), respectively.
We can evaluate the power of the statistic S N for a class of local alternatives, although it is difficult to establish the asymptotic distribution for the test statistic under such a class of local alternative hypotheses.
Due to (2.12), the proposed statistic S_N can be written in the form (3.5), which involves a discrepancy term ∆_N. From (3.5), the power of the statistic S_N relies on the value of ∆_N.
Theorem 3. In addition to Assumptions 1 and 2, suppose that condition (3.6) holds in probability, where F^{R̂_N}_{H_0} stands for the ESD of R̂_N under H_0 and F^{R̂_N}_{H_a} is the ESD of R̂_N under H_a. Then the asymptotic power of the test is one, i.e. lim P(N²S_N ≥ γ_α | H_a) = 1, where γ_α is the critical value of N²S_N under H_0 corresponding to the significance level α.
Remark 1. Note that if F^{R̂_N}_{H_0} and F^{R̂_N}_{H_a} have different limits in probability, then ∫ e^{iτx} d(F^{R̂_N}_{H_0}(x) − F^{R̂_N}_{H_a}(x)) converges in probability to a nonzero constant depending on τ, by Levy's continuity theorem. This ensures that (3.6) holds. Most of the examples given in the subsequent sections satisfy (3.6).

A local alternative hypothesis
It is well known that there are two commonly used cross-sectional dependent structures in panel data analysis: spatial models and factor models.In this section, we consider a simple factor model to describe cross-sectional dependence.An asymptotic theory is established as a consequence of our discussion.
Note that the proposed test is based on the idea that the limits of ESDs under the null and local alternative hypotheses are different.Yet, it may be the case that there exists some dependence among the set of vectors u 1 , • • • , u N , but the limit of the ESD associated with such vectors is the M-P law.
Then a natural question is whether the statistic S N works under this case.
Model (4.1) can be written in the vector form u_t = ε_t + (1/√T) v_t e, or in the matrix form U = E + (1/√T) v e^τ, where v = (v_1, . . ., v_T)^τ and e is an N × 1 vector with all elements being one.
Under the local alternative hypothesis (4.1), the residuals u_1t, u_2t, . . ., u_Nt are dependent due to the common factor (1/√T) v_t. This kind of dependence is rather weak in the sense that the covariance between u_jt and u_kt (j ≠ k) is 1/T, which tends to 0 as T goes to infinity. By the rank inequality (see Lemma 3.5 of Yin (1986)) and the fact that rank(v e^τ) ≤ 1, it can be concluded that the limit of the ESD of the matrix R̂_N is the same as that of the sample correlation matrix of {ε_1t, ε_2t, . . ., ε_Nt}, i.e. the M-P law. Even so, we would still like to use the proposed statistic S_N to capture this kind of cross-sectional dependence.
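A minimal sketch of this local-alternative DGP, assuming unit-variance Gaussian innovations for both the idiosyncratic errors and the common factor (these distributional choices are illustrative assumptions), confirms that the off-diagonal covariance is of order 1/T:

```python
import numpy as np

def local_factor_panel(N, T, rng):
    """Local-alternative DGP in the spirit of (4.1): u_{jt} = eps_{jt} + v_t/sqrt(T).
    The single common factor has a loading shrinking at rate 1/sqrt(T), so
    Cov(u_{jt}, u_{kt}) = 1/T for j != k (with unit-variance innovations)."""
    eps = rng.standard_normal((N, T))   # idiosyncratic errors (assumed N(0,1))
    v = rng.standard_normal(T)          # common factor (assumed N(0,1))
    return eps + v / np.sqrt(T)

rng = np.random.default_rng(2)
N, T = 5, 10000
U = local_factor_panel(N, T, rng)
C = np.cov(U)
off_mean = float(C[~np.eye(N, dtype=bool)].mean())  # theoretical value 1/T
var_mean = float(C.diagonal().mean())               # theoretical value 1 + 1/T
```

At T = 10000 the common component contributes only 1/T = 0.0001 to each pairwise covariance, which is why this alternative is "weak": the sample cross-correlations are statistically indistinguishable from zero at this scale.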
Then the proposed test statistic N²S_N converges in distribution to the random variable R_2 = ∫ (W²(τ) + Q²(τ)) U(τ) dτ, where (W(τ), Q(τ)) is a Gaussian vector whose mean and covariance function are specified in (4.5) and (4.6); there m^{(2)}(z), its companion transform, and V(c, m(z_1), m(z_2)) are defined in (3.3) and (3.4), respectively.
The expressions for the means of W(τ) and Q(τ) are obtained by replacing the integrand with cos(τz) and sin(τz), respectively. The contours in (4.5) and (4.6) both enclose the interval [a, b], the support of F_c. Moreover, the contours γ_1 and γ_2 are disjoint.
In view of Theorem 4, we see that the proposed test statistic S_N still works, mainly due to the involvement of the last term on the right-hand side of (4.5). Section 5 below employs the proposed test statistic to evaluate the finite-sample performance and the practical applicability of the proposed test.

Finite sample studies
We will present the empirical sizes and power values of the proposed test statistic under several scenarios.

Empirical sizes and power values
First, we introduce the method of calculating the empirical sizes and power values.
where N²S_N^{H_0} represents the value of the test statistic N²S_N based on data simulated under the null hypothesis.
In our simulation, we choose K = 1000 as the number of replications. The significance level is α = 0.05. Similarly, the empirical power is calculated by (5.2), where N²S_N^{H_a} represents the value of the test statistic N²S_N based on data simulated under the alternative hypothesis.
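The Monte Carlo recipe in (5.1) and (5.2) can be sketched generically. The statistic below is a deliberately simple stand-in (a one-sided z-test on a mean), not the paper's N²S_N; the point is only the rejection-frequency machinery shared by both (5.1) and (5.2):

```python
import numpy as np

def empirical_rejection_rate(stat_fn, dgp, critical_value, K, rng):
    """Monte Carlo rejection frequency over K replications: the empirical size
    when `dgp` simulates under H0 (as in (5.1)), and the empirical power when
    it simulates under Ha (as in (5.2))."""
    hits = sum(stat_fn(dgp(rng)) > critical_value for _ in range(K))
    return hits / K

# toy stand-in statistic, used only to illustrate the machinery
rng = np.random.default_rng(3)
z_crit = 1.6449                                   # 95% standard normal quantile
zstat = lambda x: np.sqrt(len(x)) * x.mean() / x.std(ddof=1)

size = empirical_rejection_rate(zstat, lambda r: r.standard_normal(200),
                                z_crit, 1000, rng)
power = empirical_rejection_rate(zstat, lambda r: r.standard_normal(200) + 0.3,
                                 z_crit, 1000, rng)
```

With K = 1000 replications, an empirical size near the nominal 0.05 carries a binomial standard error of about √(0.05 × 0.95/1000) ≈ 0.007, which is the natural yardstick when reading the size tables below.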

Computational aspects
In calculating both the empirical size and the empirical power in (5.1) and (5.2), respectively, we need to compute the asymptotic mean and variance derived in Theorem 1. Since the computation is relatively involved, we provide a summary of the key steps to show how it is done.
The code functions involved are displayed in Appendix B.
There are four key steps involved in computing the numerical values of the asymptotic mean and variance functions.They are summarised as follows.
Step 1. The Stieltjes transforms m(z) and its companion m̲(z) of the LSD are replaced by the estimators m̂(z) = (1/N) tr(R̂_N − zI_N)^{−1} and m̲̂(z) = (1/T) tr(R̃_N − zI_T)^{−1}, respectively.
Step 2. The derivatives m′(z) and m̲′(z) are estimated by m̂′(z) = (1/N) tr(R̂_N − zI_N)^{−2} and m̲̂′(z) = (1/T) tr(R̃_N − zI_T)^{−2}, respectively.
Step 3. For the asymptotic mean, we set z = r·e^{iθ} by the polar coordinate transform and then replace the contour C by the circle {r·e^{iθ} : θ ∈ [0, 2π]}, which encloses the contour C. The integral in the asymptotic mean can then be computed numerically with the MATLAB function "quad".
Similarly, for computing the asymptotic variance, the polar coordinate transforms z_1 = r_1·e^{iθ_1} and z_2 = r_2·e^{iθ_2} are utilized, and the two contours C_1 and C_2 are then replaced by circles.
Step 4. The implementation of Steps 1-3 is realised in Section 5.3 by the code functions displayed in Appendix B.
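The polar-coordinate reduction in Step 3 can be sketched in a few lines (shown here in Python rather than with MATLAB's "quad", purely as an illustration): on the circle z = r·e^{iθ} we have dz = iz dθ, so the normalized contour integral reduces to an average over θ.

```python
import numpy as np

def cauchy_contour_integral(f, r, n=4096):
    """Evaluate (1/(2*pi*i)) * closed integral of f(z) dz over the circle
    z = r*exp(i*theta), theta in [0, 2*pi), in the positive direction.
    Since dz = i*z dtheta, the normalized integral is the theta-average of
    f(z)*z; the rectangle rule converges very fast for periodic integrands."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    z = r * np.exp(1j * theta)
    return np.mean(f(z) * z)

# Cauchy's integral formula as a check: the result is 1 for a simple pole
# inside the circle, and 0 when the pole lies outside
inside = cauchy_contour_integral(lambda z: 1.0 / (z - 0.5), r=1.0)
outside = cauchy_contour_integral(lambda z: 1.0 / (z - 2.0), r=1.0)
```

In the actual computation, f would involve the estimated Stieltjes transforms from Steps 1-2, and the radius r is chosen so that the circle encloses the original contour C.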

Examples of implementation
The procedure proposed to calculate the empirical size and power values is stated as follows.
1. Data Generating Process (DGP): generate the data Y_jt = α_j + x_jt^τ β + u_jt following each example.
6. The empirical size or power is calculated as in (5.1) or (5.2), where the regressors X_k,jt are generated as i.i.d. draws.
Tables 1 and 2 show the empirical sizes of our proposed test and of the CD test provided in Pesaran (2004) for (5.3), respectively. From the results, it can be seen that the proposed test statistic performs better than the CD test, in the sense that its empirical sizes are closer to the nominal size 0.05.

Table 1 and Table 2 near here
In the following sections, we consider several alternative hypotheses.

Spatial Models and Factor Models
In this part, we consider two types of cross-sectional dependent models: spatial models and factor models.
The empirical powers in Table 4 and Table 5 show that, as the correlation between u jt and u rt (which is reflected in γ) increases, the power values also increase.
Table 4 and Table 5 near here

A Local Alternative Hypothesis
We examine the finite sample performance of the proposed test for the general panel data model (4.1), i.e.
The simulation results in Table 6 show that the proposed test can capture the cross-sectional dependence in the residuals for the general panel data model (4.1).
Table 6 near here

Some Dependent but Uncorrelated Examples
Dependence among a set of random variables is often described by non-zero correlations among them. However, some data are dependent yet uncorrelated. We consider two examples and test their dependence using the proposed test statistic.

Nonlinear MA model
Consider nonlinear MA models of the form (5.8), where Z_jt ∼ N(0, 1). For any t, the correlation matrix of u_t = (u_1t, u_2t, . . ., u_Nt)^τ is a diagonal matrix. This model was employed by Kuan and Lee (2004) to test the martingale difference hypothesis. Our proposed cross-sectional independence test statistic can be applied to this nonlinear MA model, and the power values in Table 7 show that the test statistic performs well numerically for this model.
From another perspective, this result also implies that the limit of the ESD for the nonlinear MA model (5.8) is not the M-P law, since the proposed test statistic is built on the characteristic function of the M-P law.
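Since the exact form of (5.8) is not reproduced here, the sketch below uses a hypothetical dependent-but-uncorrelated construction in the same spirit (explicitly not Kuan and Lee's model): products of overlapping Gaussian innovations, whose levels are uncorrelated across units while their squares are correlated.

```python
import numpy as np

def uncorrelated_dependent_panel(N, T, rng):
    """Hypothetical dependent-but-uncorrelated panel (illustration only):
    u_{jt} = Z_{jt} * Z_{j+1,t} with i.i.d. N(0,1) innovations Z.
    Adjacent units share Z_{j+1,t}, hence are dependent, yet
    E[u_{jt} u_{j+1,t}] = E[Z_{jt}] E[Z_{j+1,t}^2] E[Z_{j+2,t}] = 0."""
    Z = rng.standard_normal((N + 1, T))
    return Z[:-1] * Z[1:]

rng = np.random.default_rng(4)
N, T = 5, 200000
U = uncorrelated_dependent_panel(N, T, rng)

corr_levels = np.corrcoef(U)      # off-diagonals ~ 0: cross-sectionally uncorrelated
corr_squares = np.corrcoef(U**2)  # adjacent off-diagonals ~ 0.25: dependence shows up
```

For this construction, Cov(u_j², u_{j+1}²) = E[Z_{j+1}⁴] − 1 = 2 while Var(u_j²) = 8, so adjacent squared series have correlation 0.25: exactly the kind of higher-moment dependence a correlation-based test cannot see but a characteristic-function statistic can.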

Some useful lemmas
Lemma 1 (Theorem 8.1 of Billingsley (1999)). Let P_n and P be probability measures on a measurable space (C, ϕ), where C is a space and ϕ is a σ-algebra. If the finite dimensional distributions of P_n converge weakly to those of P, and if {P_n} is tight, then P_n ⇒ P.
Lemma 3 (Continuous Mapping Theorem). Let X_n and X be random elements defined on a metric space S.
Suppose g : S → S has a set of discontinuity points D_g such that P(X ∈ D_g) = 0. Then g(X_n) converges in distribution to g(X). Throughout, Re(z) and Im(z) denote the real and imaginary parts of z, respectively, and L(a, b) := {a + t(b − a) : t ∈ (0, 1)}.

Proof of Theorem 1
To simplify notation, we use M to denote constants which may change from line to line. Recall from (2.4) in the main paper that the centralized version of the original model is used. Under the null hypothesis H_0, the convergence rate of the least squares estimator β̂ of the parameter β is well known (see Hsiao (2003)). With the estimator β̂, we can decompose the residual û_j for the error component u_j as û_j = Y_j − X_j β̂ = u_j + X_j(β − β̂).
where the contour C is closed and is taken in the positive direction in the complex plane, enclosing the support of F c (•).
First, we prove (A.9). Here e_k is a T × 1 vector with its k-th element being one and the others zero. This, together with Lemma 7 and (A.4), yields the claimed bound. Thus the first part of (A.9) is proved. By Lemma 7, the second part of (A.9) can be derived similarly.
As in (A.13), it is easy to obtain (A.14). Then (A.8) follows from (A.13) and (A.14). We now introduce some formulas, valid for any invertible matrices A and B, vectors r and w, and a scalar q, that will be used frequently in the proof; the required identities follow from (A.15). From the proof of Theorem 1, it is known that the asymptotic distribution of N(s_N(τ) − s(τ)) is the same as that of N(s̃_N(τ) − s(τ)), where s̃_N(τ) is s_N(τ) with R̂_N replaced by R_N, and R_N is the sample correlation matrix R̂_N with û_j, j = 1, 2, . . ., N, replaced by u_j, j = 1, 2, . . ., N, respectively. So it is enough to establish the tightness of N(s̃_N(τ) − s(τ)).
Repeating the same truncation and centralization steps as those in Gao et al. (2014), we can assume, with probability one and for N large enough, that the required moment conditions hold. The contour C involved in the above integral is specified as follows (A.47): let v_0 > 0, let x_r be any number greater than (1 + √c)², and let x_l be any negative number if c ≥ 1; otherwise choose x_l ∈ (0, (1 − √c)²). Then the contour C is defined as the union of C⁺ and its symmetric part C⁻ with respect to the x-axis. From Theorem 1 of Gao et al. (2014), the argument regarding the equivalence in probability of M_N(z) and its truncated version in the proof of Theorem 1 of Gao et al. (2014), and Lemma 3, we have (A.49), where M(z) is a Gaussian process, i.e. the limit of M_N(z).
We conclude from Lemma 4 that, for any δ > 0, the supremum term is bounded as in (A.50), where τ_3 and τ_4 lie in the interval [L_1, L_2]; the last inequality uses (A.49) and the fact that Re(ize^{iτ_3 z}) and Im(ize^{iτ_4 z}) are bounded on the contour C; and K (here and in the sequel) is a constant which may differ from line to line.
By (A.50), for any ε > 0 the corresponding probability bound holds with lim_{δ→0} lim sup equal to zero. 2. Consider (A.10) under H_a. From Lemma 5 we have the required expansion. We next consider the first term on the right-hand side of the second equality of (A.75). Define the quantities a^{(i,k)}_{j_1 j_2} with j, j_1, j_2 = 1, 2, . . ., N, i = 1, 2 and k = 1, 2. From the formula (A.17), the stated decomposition follows. Here we also write A^{−1}_{N j_2}(z) for its expanded form in order to simplify notation. It is easy to verify that the relevant terms are of order 1/T, and, by Lemma 6, E|a^{(1,2)(k)}_{j_1 j_2}|² = O(1/T). The above estimates, together with (A.57) and (A.58), give the required bound. As in (A.35), by (A.58), the remaining term is of the same order. The Stieltjes transforms of the ESD and LSD of R̂_N are denoted by m_N(z) and m_c(z), and the corresponding transforms for R̃_N are denoted by m̲_N(z) and m̲_c(z), respectively. Moreover, m_{c_N}(z) and m̲_{c_N}(z) are m_c(z) and m̲_c(z) with c replaced by c_N. For ease of notation, we denote m_c(z) and m̲_c(z) by m(z) and m̲(z), respectively.
Replacing cos(τz) in E[W(τ)] by sin(τz) yields the expression for E[Q(τ)]. The expressions for the covariances Cov(W(τ_1), W(τ_2)) and Cov(Q(τ_1), Q(τ_2)) are similar, with sin(τz) and cos(τz) interchanged accordingly, where O_1 and O_2 include C_1 and C_2, respectively. The double integral involved in the asymptotic variance can be computed with the MATLAB function "dblquad".
3. Repeat Steps 1-2 K times to obtain the K statistic values {S_N^{(m)} : m = 1, 2, . . ., K}.
4. The asymptotic mean and variance derived in Theorem 1 are calculated as follows. The LSD's m(z) and m̲(z) are replaced by the estimators m̂(z) = (1/N) tr(R̂_N − zI_N)^{−1} and m̲̂(z) = (1/T) tr(R̃_N − zI_T)^{−1}, respectively. The derivatives m′(z) and m̲′(z) are estimated by m̂′(z) = (1/N) tr(R̂_N − zI_N)^{−2} and m̲̂′(z) = (1/T) tr(R̃_N − zI_T)^{−2}, respectively.

Lemma (Vitali's convergence theorem). Let f_n be analytic in D, a connected open set in C, satisfying |f_n(z)| ≤ M for every n and z ∈ D, and suppose f_n(z) converges as n → ∞ for each z in a subset of D having a limit point in D. Then there exists a function f, analytic in D, for which f_n(z) → f(z) and f_n′(z) → f′(z) for all z ∈ D. Moreover, on any set bounded by a contour interior to D, the convergence is uniform and {f_n′(z)} is uniformly bounded by 2M/ε, where ε is the distance between the contour and the boundary of D.
The matrix R̃_N has the same non-zero eigenvalues as the sample correlation matrix R̂_N, apart from |N − T| zero eigenvalues. The main part of the proposed statistic can therefore be computed from the eigenvalues of either matrix.
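The fact quoted above is the standard observation that AB and BA share their non-zero eigenvalues; a quick numerical check (with an arbitrary data matrix, purely illustrative):

```python
import numpy as np

# for any N x T matrix A, the N x N matrix A A' and the T x T matrix A' A
# share all non-zero eigenvalues; the larger one has |N - T| extra zeros
rng = np.random.default_rng(5)
N, T = 6, 10
A = rng.standard_normal((N, T))

e_small = np.sort(np.linalg.eigvalsh(A @ A.T))  # N eigenvalues
e_big = np.sort(np.linalg.eigvalsh(A.T @ A))    # T eigenvalues: |N - T| zeros + e_small
```

This is why working with the T × T companion matrix is harmless: only |N − T| deterministic zero eigenvalues separate its spectrum from that of R̂_N.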

Table 1: Sizes of the proposed test at the 5% significance level

The power values are listed in Table 8. Although these power values are comparably small relative to those given for the other examples, the results show that our proposed test statistic is effective for this model.

Conclusion

This paper has proposed a new statistic for testing cross-sectional independence in a panel data model. The statistic is based on the characteristic function of the empirical spectral distribution of the sample correlation matrix. The asymptotic theory of a general class of linear spectral statistics for sample correlation matrices has been established, which is of significant interest in large dimensional random matrix theory. Our test statistic belongs to a general class of linear spectral statistics, in the sense that this class covers some classical statistics. Furthermore, it can capture nonlinear dependence rather than just correlation. The nonlinear MA and ARCH(1) models used in the simulation section demonstrate both the practical relevance and the applicability of the test proposed in this paper.

Table 2 :
Sizes of the CD test at the 5% significance level

Table 3 :
Powers of the proposed test at the 5% significance level for SMA model

Table 4 :
Powers of the proposed test at the 5% significance level for factor models (Case 1)

Table 5 :
Powers of the proposed test at the 5% significance level for factor models (Case 2)

Lemma 4 (Complex mean value theorem; see Lemma 2.4 of Guo and Higham (2006)). Let Ω be an open convex set in C. If f : Ω → C is an analytic function and a, b are distinct points in Ω, then there exist points u, v on L(a, b) such that Re((f(b) − f(a))/(b − a)) = Re(f′(u)) and Im((f(b) − f(a))/(b − a)) = Im(f′(v)).

Table 6 :
Powers of the proposed test at the 5% significance level for the local alternative model

Table 7 :
Powers of the proposed test at the 5% significance level for nonlinear MA model

Table 8 :
Powers of the proposed test at the 5% significance level for multiple ARCH(1) model