Exponent of Cross-sectional Dependence for Residuals

In this paper, we focus on estimating the degree of cross-sectional dependence in the error terms of a classical panel data regression model. For this purpose we propose an estimator of the exponent of cross-sectional dependence denoted by α, which is based on the number of non-zero pair-wise cross correlations of these errors. We prove that our estimator, α̃, is consistent and derive the rate at which it approaches its true value. We also propose a resampling procedure for the construction of confidence bounds around the estimator of α. We evaluate the finite sample properties of the proposed estimator by use of a Monte Carlo simulation study. The numerical results are encouraging and supportive of the theoretical findings. Finally, we undertake an empirical investigation of α for the errors of the CAPM model and its Fama-French extensions using 10-year rolling samples from S&P 500 securities over the period Sept 1989 - May 2018.


Introduction
Interest in the analysis of cross-sectional dependence applied to households, firms, markets, regional and national economies has become prominent over the past decade, especially so in the aftermath of the latest financial crisis given its effects on the global economy. Researchers in many fields have turned to network theory, spatial and factor models to obtain a better understanding of the extent and nature of such cross dependencies. There are many issues to be considered: how to test for the presence of cross-sectional dependence, how to measure the degree of cross-sectional dependence, how to model cross-sectional dependence, and how to carry out counterfactual exercises under alternative network formations or market inter-connections. Many of these topics are the subject of ongoing research. In this paper we focus on measuring cross-sectional dependence. Bailey, Kapetanios and Pesaran (2016, BKP hereafter) give a thorough account of the rationale and motivation behind the need for determining the extent of cross-sectional dependence, be it in finance, micro or macroeconomics. They focus on the asymptotic behaviour of the variance of the cross section average of the observations on a double array of random variables, say x it , indexed by i = 1, 2, . . . , N and t = 1, 2, . . . , T , over space and time. In particular, they analyse the rate at which this variance tends to zero and show that it depends on the degree or exponent of cross-sectional dependence which they denote by α. They explore a factor model setting as a vehicle for characterising strong and semi-strong covariance structures as defined in Chudik et al. (2011). They relate these to the degree of pervasiveness of factors in unobserved factor models often used in the literature to model cross-sectional dependence.
In this paper we build on BKP and extend the analysis in two respects. First, we consider a more generic setting which does not require a common factor representation and holds more generally for both moderate to sizable cross-sectional dependence. We achieve this by directly considering the significance of individual pair-wise correlations, and do not concern ourselves with the factors that might underlie these pair-wise correlations. Second, we consider estimating the exponent of cross-sectional dependence, α, of the residuals obtained from a panel data regression model.
We propose a new estimator of α based on the number of statistically significant pair-wise correlations of the residuals from the panel regression under consideration. To establish the statistical significance of the correlation coefficients we adopt the multiple testing (MT) estimator proposed by Bailey et al. (2019), BPS hereafter. Other thresholding estimators can also be used. See, for example, Bickel and Levina (2008), Karoui (2008), Cai and Liu (2011a) or Fan et al. (2013). The MT testing procedure advanced by BPS has the advantage that it directly considers the statistical significance of the correlation coefficient, which is invariant to scale. Other thresholding procedures focus on the sample covariances and resort to cross validation to identify the threshold. Bickel and Levina (2008) use universal thresholding, namely comparing all the sample covariances to the same threshold value, whilst Cai and Liu (2011a) propose an 'adaptive' thresholding procedure that allows for differing thresholds across the different pairs of sample covariances. Other contributions to this literature include the work of Huang et al. (2006), Rothman et al. (2009), Cai and Zhou (2011b), Cai and Zhou (2012), Wang and Zhou (2010), and Fan et al. (2011).¹ All these contributions apply the thresholding procedure to sample covariances and do not apply to the residuals from a panel regression model, which is what concerns us in this paper. It is also important to bear in mind that when estimating α we do not assume that the underlying error covariance matrix is sparse, as is assumed in the literature on regularization of the sample covariance. Our objective is to estimate the degree of sparsity of the covariance matrix rather than assume sparsity for the purpose of consistent estimation of the covariance matrix or its inverse. What matters for estimation of α is to ensure that all non-zero entries of the correlation matrix are correctly identified.
We establish consistency of our estimator under the assumptions of exogeneity of regressors and symmetry of the error distribution. We also explain how the derivations can be extended to the case when weakly exogenous variables are present, as for example in a dynamic panel data setting. The proposed estimator is simple to compute and is shown to perform well in small samples, for a variety of correlation matrices, irrespective of whether the cross correlations are generated from a multi-factor structure or specified by a given correlation matrix with a specified degree of sparsity. This is especially the case as compared to basing the estimation of α on the largest eigenvalue of the correlation matrix, which performs particularly poorly. The rate of convergence of our preferred estimator is complex and depends on an interplay of the cross-sectional and time dimensions, N and T . The Monte Carlo results also show that the error in estimating α is smaller for values of α close to unity, which is likely to be of greater interest in practice. The problem of making inference about the value of α raises additional technical difficulties and will not be addressed in this paper. In practice, bootstrap techniques can be used to obtain confidence bounds around our proposed estimator. We provide some Monte Carlo results in support of estimating the empirical distribution of the proposed estimator of α, using cross-sectional resampling as suggested in Kapetanios (2008). Finally, we provide an empirical application investigating the degree of inter-linkages between financial variables using the Standard & Poor's 500 index. We present 10-year rolling estimates of α applied to excess returns on securities included in the S&P 500 data set as well as α estimates applied to the residuals obtained from the CAPM and its Fama-French extensions used extensively in the finance literature.
The rest of the paper is organised as follows: Section 2 discusses alternative characterisations of α, the exponent of cross-sectional dependence, and the conditions under which these measures are equivalent as N → ∞. Section 3 sets up the panel data model and discusses its underlying assumptions. Section 4 proposes the estimator of α in terms of the number of statistically significant non-zero pair-wise correlations of the residuals. Section 5 presents the main theoretical results of the paper for a static panel data model with strictly exogenous regressors. Extensions to dynamic panels or panels with weakly exogenous regressors are discussed in Section 5.2, while inference on α by use of bootstrap procedures is discussed in Section 5.3. Section 6 presents a detailed Monte Carlo simulation study. The empirical application is discussed in Section 7. Finally, Section 8 concludes. Proofs of all theoretical results are provided in the Appendix.

Degrees of cross-sectional dependence: alternative measures
Our analysis focuses on the covariance matrix of $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt})'$, where $\varepsilon_t$ is the $N \times 1$ vector of errors from a panel data regression model. Let $\Sigma_N = E(\varepsilon_t \varepsilon_t') = (\sigma_{ij})$, and denote its largest eigenvalue by $\lambda_{\max}(\Sigma_N) > 0$. The errors $\varepsilon_{it}$ are said to be strongly cross-sectionally correlated if $\lambda_{\max}(\Sigma_N) = \ominus(N)$, where $\ominus$ denotes exact order of magnitude, and they are said to be weakly cross-sectionally correlated if $\lambda_{\max}(\Sigma_N)$ is bounded in $N$. All intermediate cases can be parameterized in terms of the exponent $\alpha_\lambda$, such that

$$\lambda_{\max}(\Sigma_N) = \ominus(N^{\alpha_\lambda}). \qquad (2.1)$$

The weak and strong cross dependence cases then relate to $\alpha_\lambda = 0$ and $\alpha_\lambda = 1$, respectively. It is important to emphasise that the exponent, $\alpha_\lambda$, is an asymptotic concept, in the sense that $\alpha_\lambda$ can be identified only as $N \to \infty$, as the definition in Eq. 2.1 makes clear. Suppose now that the cross dependence of $\varepsilon_{it}$ is characterized by the following approximate multiple-factor error process

$$\varepsilon_{it} = \beta_i' f_t + u_{it}, \qquad (2.2)$$

where $f_t$ is the $m \times 1$ vector of unobserved common factors with zero means, $\beta_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{im})'$ is the associated $m \times 1$ vector of factor loadings, and $u_{it}$ is the idiosyncratic component assumed to have mean zero and covariance matrix $V = E(u_t u_t')$, where $u_t = (u_{1t}, u_{2t}, \ldots, u_{Nt})'$. Then

$$\Sigma_N = B B' + V,$$

where $B = (\beta_1, \beta_2, \ldots, \beta_N)'$, and without loss of generality we have set $E(f_t f_t') = I_m$. To identify the factor component from the idiosyncratic component we assume that $\lambda_{\max}(V) = O(1)$, but allow the factor loadings to satisfy a condition governed by the exponent $\alpha_\beta$, which measures the degree to which the factors are pervasive, in the sense that they have non-zero effects on the individual errors, $\varepsilon_{it}$. In what follows we refer to $\alpha_\beta$ as the exponent of factor loadings. In the standard approximate factor models it is assumed that $\alpha_\beta = 1$, whilst in practice, where the possibility of weak factors cannot be ruled out, $\alpha_\beta$ could be a parameter of interest to be estimated.
To see how $\alpha_\beta$ and $\alpha_\lambda$ are related note that

$$\lambda_{\max}(\Sigma_N) \le \|B\|_1 \|B\|_\infty + \|V\|_1,$$

where $\|B\|_1$ and $\|B\|_\infty$ are the column and row norms of $B$, respectively. To ensure that $\mathrm{Var}(\varepsilon_{it})$ is bounded we must have $\|B\|_\infty < K$. Also, to ensure that $\lambda_{\max}(V) = O(1)$, we must have $\|V\|_1 < K$. Therefore, the rate at which $\lambda_{\max}(\Sigma_N)$ can rise with $N$ is controlled by $\|B\|_1$. To distinguish the effects of the factor component from those of the idiosyncratic component we must have $\alpha_\beta > 0$. Comparing this result with Eq. 2.1 establishes that $\alpha_\lambda = \alpha_\beta > 0$, as $N \to \infty$.
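The above analysis suggests two alternative ways of estimating $\alpha_\lambda$. A direct procedure would be to base the estimate of $\alpha_\lambda$ on $\lambda_{\max}(\Sigma_N)$ and set

$$\lambda_{\max}(\Sigma_N) = \kappa N^{\alpha_\lambda},$$

where $\kappa$ is a constant independent of $N$. Then

$$\alpha_\lambda = \frac{\ln \lambda_{\max}(\Sigma_N)}{\ln N} - \frac{\ln \kappa}{\ln N}. \qquad (2.5)$$

In order to identify $\alpha_\lambda$, as $N \to \infty$, we set $\kappa = 1$, so that Eq. 2.5 becomes

$$\alpha_\lambda = \frac{\ln \lambda_{\max}(\Sigma_N)}{\ln N}. \qquad (2.6)$$

In this form, the value of $\alpha_\lambda$ is susceptible to the scaling of the elements in $\varepsilon_t$. For this reason we focus our attention instead on the corresponding correlation matrix $R_N = (\rho_{ij})$ given by $R_N = D_N^{-1/2} \Sigma_N D_N^{-1/2}$, with $D_N = \mathrm{diag}(\sigma_{11}, \sigma_{22}, \ldots, \sigma_{NN})$. Hence, Eq. 2.6 finally becomes

$$\alpha_\lambda = \frac{\ln \lambda_{\max}(R_N)}{\ln N}, \qquad (2.8)$$

and $\alpha_\lambda$ has fixed bounds at zero and unity, as $N \to \infty$. Developing a theory based on the maximum eigenvalue of the correlation matrix $R_N$ can be challenging. To avoid some of the technical problems involved in estimating $\lambda_{\max}(R_N)$, and noting that Var(ε_t ... This means that at least $N^{1/2}$ of the factor loadings must have non-zero values for $\Sigma_N$ to differ sufficiently from a diagonal $\Sigma_N$.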
In this paper we consider an alternative estimation strategy that does not require $\varepsilon_{it}$ to have a factor representation. We focus directly on estimation of $\rho_{ij}$ and distinguish between values of $\rho_{ij}$ that are close to zero and those that are significantly different from zero, and measure the exponent of cross-sectional dependence in terms of the number of significant (non-zero) cross-correlation coefficients. Specifically, we define $\alpha$ such that $M_N = N^{2\alpha}$, where $M_N$ is the number of non-zero elements of $R_N$, which can be written equivalently as $M_N = \tau_N' \Delta_N \tau_N$, where $\tau_N$ is an $N \times 1$ vector of ones and $\Delta_N = (\delta_{ij})$ is an $N \times N$ matrix of population correlation indicators with typical elements given by

$$\delta_{ij} = I(\rho_{ij} \neq 0),$$

in which $I(A)$ is equal to unity if $A$ is true and zero otherwise. Note that by construction $\delta_{ii} = 1$. Hence,

$$\alpha = \frac{\ln(\tau_N' \Delta_N \tau_N)}{2 \ln N}. \qquad (2.9)$$

Cross-sectional independence refers to the case when $R_N = I_N$, and $\alpha = 1/2$, while the case of cross-sectional strong dependence corresponds to all pairwise correlation coefficients being non-zero such that $\alpha = 1$. Note that by construction $1/2 \le \alpha \le 1$, with $\alpha = 1/2$ arising when $\Delta_N = I_N$, and $\alpha = 1$ if $\rho_{ij} \neq 0$ for all $i$ and $j$. Other exponents of cross-sectional dependence can be defined by focussing only on the off-diagonal elements of $R_N$, giving an exponent $\alpha_\bullet$,² assuming that $\Delta_N \neq I_N$. Unlike $\alpha$, this measure is not defined if $R_N = I_N$. The two measures coincide, namely $\alpha = \alpha_\bullet = 1$, if $\rho_{ij} \neq 0$ for all $i$ and $j$, as $N \to \infty$. In cases where $\varepsilon_{it}$ have a multi-factor error representation given by Eq. 2.2, the largest exponent of the factor loadings is given by $\alpha_\beta > 0$. Assuming, for simplicity, that $V$ is diagonal, it then readily follows that

² One can also consider only the distinct off-diagonal elements of $R_N$ and define α as

is the total number of off-diagonal non-zero pair-wise cross correlations of the errors. In such a multi-factor error set up the relationship between $\alpha$ and $\alpha_\beta$ is given by Eq. 2.11. Recall that we must have $\alpha_\beta > 1/2$ for factors to be distinguishable from the idiosyncratic components. It is then easily seen that $\lim_{N \to \infty} \alpha = \lim_{N \to \infty} \alpha_\bullet = \alpha_\beta$. However, the two measures could differ if $N$ is not sufficiently large. In finite samples $\alpha_\bullet$ can be written in terms of $\alpha$ by first solving the quadratic equation for $\alpha_\beta$.

In the panel data model, Eq. 3.1, $x_{it}$ is a $k \times 1$ vector of observed regressors, $\gamma_i$ is the associated vector of coefficients, and $\varepsilon_{it}$ are the model's errors. We are interested in estimating the exponent of the cross-sectional dependence of the errors, $\varepsilon_{it}$, defined by Eq. 2.9. First, we obtain the residuals $e_{it}$ (Eq. 3.2), where $X_i$ is the $T \times k$ matrix of observations on the regressors for the $i$th unit, and $y_i$ is the $T \times 1$ vector of observations on the dependent variable of the $i$th unit. We assume that the regressors are strictly exogenous.
We define the standardized errors, $\xi_{it}$, and the associated standardized residuals, $z_{it}$ (Eq. 3.4). Further, in what follows we assume that the error terms are symmetrically distributed.
Assumption 1. Conditional on $X_i$, the errors of the panel data model, Eq. 3.1, (a) $\varepsilon_{it}$ are symmetrically distributed with zero means and variances $0 < c < \sigma_i^2 < K < \infty$, (b) $\varepsilon_{it}$ are serially independent, (c) $\varepsilon_{it}$ and $\varepsilon_{jt}$ are distributed independently if $E(\varepsilon_{it} \varepsilon_{jt}) = 0$, for all $i \neq j$.
Under the above assumption, and using Eq. 3.3 it readily follows that and (3.7) Our main analysis will condition on the observed regressors. Remark 2 will discuss an unconditional version of our results. For the observed regressors, we make the following assumption:

Under Assumption 1 it readily follows that which in turn implies that z it is a martingale difference process with respect to the non-decreasing information set, But in view of Eq. 3.8, E (z it |Ω iT ) = 0, and it also follows that E (z it |Ω it ) = 0, for all i and t.
We also require the following assumption that sets a lower bound condition on the non-zero values of the pair-wise correlations.
Remark 1. This assumption is needed for successful recovery of non-zero pair-wise correlations, and is weaker than requiring $\rho_{\min} > 0$, as it allows $\rho_{\min}$ to tend to zero with $N$ or $T$ or both, so long as its rate of decline is slower than $\ln(T)/\sqrt{T}$.
Consistent estimation of α
Consider the sample estimate of the pair-wise correlation coefficients of the residuals from units $i$ and $j$,

$$\hat\rho_{ij} = \frac{e_i' e_j}{(e_i' e_i)^{1/2} (e_j' e_j)^{1/2}}, \qquad (4.1)$$

where $e_i = (e_{i1}, e_{i2}, \ldots, e_{iT})'$, $e_{it}$ is defined by Eq. 3.2, and by construction the sample mean of $e_{it}$ is exactly zero. We can re-write Eq. 4.1 equivalently in terms of the standardized residuals,

where $z_{it}$ is defined by Eq. 3.4. In order to identify whether the pair-wise correlation coefficients $\hat\rho_{ij}$ are significantly different from zero we follow Bailey et al. (2019) and apply the multiple testing estimator associated with $\hat\rho_{ij}$. This is defined in terms of the critical value function $c_p(n, \delta)$ of Eq. 4.4, in which $n$ is the number of tests carried out, $p$ is the nominal size of the individual test, which can be set to 1%, 5% or 10%, $\Phi^{-1}(\cdot)$ is the inverse of the standard normal distribution function, and $\delta$ is a tuning parameter to be set a priori. This thresholding method is based on the notion that for the $(i, j)$ pairs of units we carry out a total of $\frac{1}{2} N(N-1)$ individual tests of the null hypothesis that $\rho_{ij} = 0$, where $j \neq i$ and $i, j = 1, 2, \ldots, N$. Such tests can result in spurious outcomes, especially when $N$ is larger than $T$. The critical value function, $c_p(n, \delta)$, is therefore adjusted using the parameter $\delta$ to take account of the effects of the multiple testing procedure for the estimation of $\alpha$. It is important to bear in mind that the multiple testing problem encountered here differs from the standard one studied in the literature by Bonferroni (1935), Holm (1979) and others. Our focus here is on identifying the range of values for $\delta$ such that $\alpha$ can be consistently estimated, rather than controlling the overall size of the multiple tests being carried out.
Accordingly, we propose to estimate α by α̃, defined in Eq. 4.5.
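The steps of this section can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the authors' code: it assumes the critical value takes the form c_p(n, δ) = Φ⁻¹(1 − p/(2n^δ)) with n = N(N − 1)/2, that a correlation is retained when |ρ̂_ij| exceeds T^(−1/2) c_p(n, δ), and that the estimate is the log of the number of retained entries (diagonal included) divided by 2 ln N, mirroring M_N = N^(2α). The function name `mt_alpha` is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def mt_alpha(resid, p=0.05, delta=0.5):
    """Exponent estimate from a T x N residual matrix (columns are units).

    Assumed forms (not confirmed by the text):
      c_p(n, delta) = Phi^{-1}(1 - p / (2 * n**delta)),  n = N(N-1)/2
      keep rho_hat_ij when |rho_hat_ij| > c_p / sqrt(T)
    """
    T, N = resid.shape
    # Sample correlations; OLS residuals from a model with an intercept
    # already have zero mean, so the demeaning in corrcoef is harmless.
    rho_hat = np.corrcoef(resid, rowvar=False)
    n = N * (N - 1) / 2.0
    c_p = NormalDist().inv_cdf(1.0 - p / (2.0 * n ** delta))
    signif = np.abs(rho_hat) > c_p / np.sqrt(T)
    np.fill_diagonal(signif, True)      # delta_ii = 1 by construction
    m_hat = signif.sum()                # estimated number of non-zero entries
    return np.log(m_hat) / (2.0 * np.log(N))
```

With independent columns the estimate should sit close to 1/2, and with a single pervasive common component it should equal 1, which matches the bounds 1/2 ≤ α ≤ 1 stated in Section 2.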

Main results
To establish that α̃ converges to α, in addition to Assumptions 1, 2 and 3, we also require the following technical sub-exponential assumption:

Assumption 4. There exist sufficiently large positive constants
This assumption is used to allow a relatively simple bounding of an infinite sum of probabilities, needed for the proof of Lemmas 2 and 3 in the Appendix. It can be relaxed to allow for fatter tails, at the expense of smaller allowable values for N .
The rate of convergence of α̃ to α is given in Theorem 1 (Eq. 5.2), which holds for any 0 < κ < 1, and some C_0, C_1 > 0.
As long as δ is set large enough (δ > 1 − α), the first term on the RHS of Eq. 5.2 can be made sufficiently small.

Remark 2.
If we do not wish to condition on the observed x it , one could obtain the result of Theorem 1 unconditionally, if it is assumed that the regressors satisfy the following sub-exponential condition for some s > 0, and if for some T 0 , exists for all T > T 0 . Under these conditions on x it (which replace Assumption 2), we can then use Lemma A6 of Chudik et al. (2018) to establish suitable probability bounds on , and to show that, for some C 0 , C 1 > 0, and 0 < π < 1,

Extension to panels with weakly exogenous regressors
In the case of panels with lagged dependent variables, the use of OLS residuals for estimation of α could still be justifiable so long as T is sufficiently large, such that the time series bias in the estimated residuals is not too large. This is supported by the Monte Carlo evidence provided for dynamic panels below.
However, the mathematical proofs provided above will not be applicable to the OLS residuals if the panel regression model, Eq. 3.1, contains weakly exogenous regressors, such as lagged values of y_it. An alternative approach which avoids some of the technical issues associated with the use of OLS residuals would be to base the estimation of α on recursive residuals. Specifically, one could compute recursive residuals and base the pair-wise correlations on them, with h denoting the size of the training period, which needs to be set by the researcher. It is then easily seen that under cross-sectional independence, the product of the standardized recursive residuals of units i and j at time t is a martingale process with respect to the information set containing (y_iτ, x_iτ; for τ = t − 1, t − 2, . . . , 1). This and other related results then allow us to apply the mathematical analysis of the previous sections to the recursive residuals, after suitable adjustments. The main open question is what critical value to use when checking the significance of the resulting correlation estimates, and hence the threshold value in the determination of the indicators δ_ij defined above. This issue will not be pursued in this paper.

Confidence intervals for α
Quantifying the uncertainty surrounding the proposed estimator of α is clearly of interest. Given the complexity of developing asymptotic inferential theory, a fruitful avenue is to use bootstrap procedures. In the case of panel datasets a number of important data features matter. One possibility is to consider a parametric bootstrap where residuals are resampled. Again there are a number of ways to construct such a bootstrap method. The first is to resample, with replacement, from the rows of the residual matrix, with the rows referring to time periods. This procedure applies if the residuals are serially uncorrelated, although block resampling methods can be considered to deal with serial correlation. It is important to resample whole rows, as otherwise the nature of the cross-sectional dependence of the original sample, on which α depends, is not inherited by the bootstrap sample. Initial experimentation suggests that coverage rates for such bootstrap methods are very low. An alternative is to use some estimation method for large dimensional covariance matrices, such as thresholding, to estimate the covariance matrix of the residuals and then implement a wild bootstrap. This approach has been considered, for example, by Gonçalves and Perron (2018). However, its desirable properties depend on the true covariance being sparse, and certainly sparser than the structures we consider in this paper. Another alternative is to resample from the columns (cross section units) of the residual matrix, namely to resample across units keeping all the residuals (observations) on a given unit together. This procedure is robust to the serial correlation problem. This resampling procedure, initially proposed by Kapetanios (2008), will be used to obtain bounds on the estimator of α in the Monte Carlo section below.
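The column-resampling scheme just described can be sketched as follows. This is a minimal illustration, assuming a T × N residual matrix and a user-supplied function that maps a residual matrix to an estimate of α; the names `column_bootstrap_bounds` and `estimator` are hypothetical, and the default of 200 bootstrap draws is an arbitrary choice rather than a value taken from the paper.

```python
import numpy as np


def column_bootstrap_bounds(resid, estimator, B=200,
                            lower=0.05, upper=0.95, seed=0):
    """Percentile bounds from resampling cross-section units (columns).

    resid     : T x N array of residuals, one column per unit
    estimator : callable taking a T x N array and returning a scalar
    """
    rng = np.random.default_rng(seed)
    T, N = resid.shape
    draws = np.empty(B)
    for b in range(B):
        # Draw N unit indices with replacement; each unit keeps its full
        # time series, so serial correlation within a unit is preserved.
        cols = rng.integers(0, N, size=N)
        draws[b] = estimator(resid[:, cols])
    # Note: a unit drawn twice is perfectly correlated with its copy,
    # which any correlation-count estimator will register as non-zero.
    return np.quantile(draws, [lower, upper])
```

Passing the estimator as an argument keeps the resampling logic separate from the choice of p and δ in the critical value function.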

Monte Carlo simulations
We investigate the small sample properties of our proposed estimator of α, defined by Eq. 4.5, using a number of different simulation designs, allowing for dynamics as well as non-Gaussian errors. We consider a relatively general dynamic panel data model with exogenous, but serially correlated, regressors. We consider the following cases: (i) a static panel data model, where ϑ_i = 0, a_i ∼ IIDN(1, 1), and γ_i ∼ IIDN(1, 1), for i = 1, 2, . . . , N; and (ii) a dynamic panel data model with exogenous regressors, where ϑ_i ∼ IIDU(0, 0.95), γ_i ∼ IIDN(1, 1), and a_i ∼ IIDN(1, 1), for i = 1, 2, . . . , N.³ Our estimator is robust to possible correlations between the fixed effects, a_i, and the regressors, x_it.
We consider two different designs for generating the errors, ε_it, both with the same exponent of cross-sectional dependence, α. Under the design based on a sparse correlation matrix, a subset of elements is given non-zero values and the remaining elements are set to zero. Then, we construct the correlation matrix R_N. The degree of sparsity of R_N is determined by the choice of α_β. If α_β = 0 then R_N = I_N and α = 1/2, while if α_β = 1 then all elements of R_N will be non-zero and we have α = 1. For all intermediate values of α_β, R_N will have a total of [N^{α_β}(N^{α_β} − 1) + N] non-zero elements. The exact relationship between α and α_β is given by Eq. 2.11. Further, we generate the variances of ε_it as σ_ii ∼ IID 0.5[1 + 0.5χ²(2)], for i = 1, 2, . . . , N, and set D_N = diag(σ_ii, i = 1, 2, . . . , N). We now generate ε_it so that its correlation matrix is equal to R_N. To this end we first obtain the matrix P_N as the Cholesky factor of R_N, and then construct ε_it from P_N, D_N and innovations u_jt, where u_jt are IID draws from Gaussian or non-Gaussian distributions, to be specified below.
In both designs, we examine two cases for the innovations u_jt: (i) Gaussian, where u_jt ∼ IIDN(0, 1) for j = 1, 2, . . . , N; (ii) non-Gaussian, where u_jt follows a multivariate t-distribution with v degrees of freedom. This is achieved by generating u_jt from ν_jt and χ²_{v,t}, where ν_jt ∼ IIDN(0, 1) and χ²_{v,t} is a chi-squared random variate with v = 8 degrees of freedom.
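The sparse-correlation design and the Cholesky step can be illustrated with a short Python sketch. Several details are assumptions, since the text does not fully specify them: the non-zero block is built from a hypothetical vector b whose first ⌊N^{α_β}⌋ entries are drawn from U(0.7, 0.9), errors are formed as ε_t = D_N^{1/2} P_N u_t, and the t-innovations are scaled as ν_jt·sqrt((v − 2)/χ²_{v,t}) so that they have unit variance. The function names are illustrative only.

```python
import numpy as np


def sparse_corr(N, alpha_beta, rng):
    """Correlation matrix with m(m-1) + N non-zero entries, m = floor(N**alpha_beta)."""
    m = int(np.floor(N ** alpha_beta))
    b = np.zeros(N)
    b[:m] = rng.uniform(0.7, 0.9, size=m)   # assumed range, not from the paper
    R = np.outer(b, b)                      # rho_ij = b_i * b_j for i, j <= m
    np.fill_diagonal(R, 1.0)
    return R


def draw_errors(R, T, rng, dof=None):
    """T x N errors with correlation matrix R (Gaussian, or t if dof is given)."""
    N = R.shape[0]
    P = np.linalg.cholesky(R)                               # R = P P'
    sigma = 0.5 * (1.0 + 0.5 * rng.chisquare(2, size=N))    # variances, mean 1
    nu = rng.standard_normal((T, N))
    if dof is None:
        u = nu
    else:
        # One chi-squared draw per period, shared across units (multivariate t).
        chi2 = rng.chisquare(dof, size=(T, 1))
        u = nu * np.sqrt((dof - 2) / chi2)
    return (u @ P.T) * np.sqrt(sigma)       # row t equals D^{1/2} P u_t
```

For example, with N = 25 and α_β = 0.5 the matrix has 5·4 + 25 = 45 non-zero entries, in line with the count N^{α_β}(N^{α_β} − 1) + N given above.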
For the estimation of α, in each replication r = 1, 2, . . . , R, we first compute the OLS residuals e_jt^(r), for j = 1, 2, . . . , N, and the corresponding sample correlation matrix. We evaluate the bias and root mean squared error (RMSE) of the exponent of cross-sectional dependence α̃ computed as in Eq. 4.5 with the critical value, c_p(n, δ), given by Eq. 4.4. For p and δ we consider the values p = {0.05, 0.10} and δ = {1/2, 1/3}.⁵ Further, we compute the bias-corrected version of the exponent of cross-sectional dependence estimator developed in BKP, α̊, and compare its performance with that of α̃.⁶ However, it is important to bear in mind that BKP provide theoretical justification for their estimator only in the case of demeaned observations, namely x_it − x̄_i, and do not consider residuals from panel regressions as we do in this paper. As a by-product, this paper also provides Monte Carlo evidence on the properties of the estimator, α̊, when applied to residuals from panel regressions.
Finally, for the construction of confidence intervals for α̃, we propose the following bootstrap procedure. In each replication, r = 1, 2, . . . , R, we collect the residuals in the T × N matrix E^(r) = (e_it^(r)) and proceed to resample (with replacement) from the columns a total number of B times. More precisely, for each bootstrap sample b = 1, 2, . . . , B, we take the resampled residuals E^(r),(b) and estimate α, which we denote by α̃^(r),(b). The bootstrap estimates of α are collected in the vector α̃^(r),B = (α̃^(r),(1), α̃^(r),(2), . . . , α̃^(r),(B)), from which we obtain the estimates of α that correspond to the 0.05 and 0.95 percentiles.

⁵ The value of δ = 1/4 was also considered. Results are in line with those for δ = 1/3 and are available in the online supplement, Tables S2a-S2d.
⁶ Recall that α̊ corresponds to the most robust bias-adjusted estimator of the exponent of cross-sectional dependence considered in Bailey et al. (2016) and allows for both serial correlation in the factors and weak cross-sectional dependence in the error terms. It is computed from consistent estimators of σ²_x̄, μ²_v, and c_N; see BKP for further details. We use four principal components when estimating ĉ_N.

Small sample results
First, we consider the small sample performance of our proposed estimator, α̃, and investigate its robustness to different choices of p and δ, which govern the critical value, c_p(n, δ), used in estimating it. The results for Gaussian errors are provided in Tables 1, 2, 3 and 4, and the results for non-Gaussian errors are reported in Tables 5, 6, 7 and 8. Each table reports bias and RMSE of α̃ computed using the residuals from either static or dynamic panel data regressions. Tables 1 and 2 give the results for static and dynamic panels, respectively, when the cross-sectional dependence in the errors is generated according to Design 1, whilst the same results for Design 2 are summarized in Tables 3 and 4. Similarly, Tables 5 and 6 give bias and RMSE of α̃ for static and dynamic panels when the errors are non-Gaussian and the cross correlations are generated according to Design 1, whilst the same results for Design 2 are provided in Tables 7 and 8. In each of these tables, the left panels give bias and RMSE for p = 0.05, and the right panels for p = 0.10, whilst the top panels give the results for δ = 1/2, and the bottom panels for δ = 1/3. Each table therefore gives four sets of results, one for each combination of p = 0.05, 0.10 and δ = 1/3, 1/2.
Comparing the left and right panels of the tables, it is clear that α̃ is robust to the choice of p, irrespective of the value of δ, and for all N and T combinations. Observing that n = N(N − 1)/2 is quite large even for moderate values of N, the effective p-value of the underlying individual tests is given by 2p/n^δ, which is likely to be dominated by the choice of δ as compared to p. Therefore, the test outcomes are more likely to be robust to the choice of p as compared to δ.
Turning to the choice of δ, comparing the results reported in the top and bottom panels of the tables, we note that for all N and T combinations the choice of δ = 1/2 produces smaller bias and RMSE as compared to δ = 1/3 for values of α close to 1/2 (α ≤ 0.75). The reverse is true when considering values of α close to unity (α > 0.80). Again, this is consistent with the result of Theorem 1, which requires δ to be larger than 1 − α. Hence, for α → 1/2 setting δ = 1/2 is more appropriate, while as α → 1 values of δ below 1/2 are more appropriate. In cases where there is no a priori information regarding the range in which the true value of α might fall, the simulation results suggest setting δ to its upper bound value of δ = 1/2.
Overall, irrespective of whether we consider static or dynamic panel regressions, with Gaussian or non-Gaussian errors, the tabulated results show that the small sample performance of α̃ improves as the true exponent of cross-sectional dependence, α, rises from 0.55 towards 1.0, uniformly over N and T combinations. This finding holds for both Designs, although α̃ generally performs better when Design 1 is used to generate the error cross-sectional dependence. Further, both bias and RMSE of α̃ diminish as N rises for all values of T considered. These results are in line with our main theoretical findings as set out in Theorem 1. It is also interesting to note that under Design 1, the bias and RMSE of α̃ are particularly small for values of α in the range of 0.9−1, even if we consider dynamic panels with non-Gaussian errors. For example, for T = 100, N = 500, p = 0.05 and δ = 1/3, the bias and RMSE of estimating α = 0.95 by α̃ in the case of dynamic panels with non-Gaussian errors are −0.00008 and 0.00067, respectively (see Table 6).
Tables 3 and 4 summarize the results for static and dynamic panel data models, respectively, when the error cross-sectional dependence is generated by Design 2 (the two-factor structure). Compared with Tables 1 and 2, both bias and RMSE are more sizeable across the range of α when T = 100. However, as T increases, the performance of α̃ improves for all values of α, especially when α approaches unity, as is to be expected. Perhaps the signal-to-noise ratio implied by Eq. 6.4 becomes somewhat distorted when the T dimension is short and adversely affects the accuracy of the multiple testing procedure used to identify the non-zero elements of the error correlation matrix, R_N. To verify this conjecture, we repeated the same experiments attaching a scaling parameter of ς = 1/2 to u_it in Eq. 6.4, in line with the simulation setup in BKP. The performance of α̃ is much improved in this case and comparable to that shown in Tables 1 and 2, even when T is small.⁷ Our conclusions regarding the robustness of α̃ to the choice of p and δ arrived at under Design 1 continue to hold for Design 2.
We now consider the small sample performance of α̃ relative to that of α̊, the estimator of α proposed in BKP. α̊ is a bias-corrected estimator of α based on the standard deviation of the cross-sectional average of the residuals. As noted earlier, the asymptotic properties of α̊ are established only for demeaned observations, but it is conjectured that these asymptotic properties are likely to hold even if α̊ is computed using residuals from panel regressions. For comparison we consider bias and RMSE of α̃ computed using p = 0.05 and δ = 1/2, and note that similar results are obtained for other choices of p and δ. Table 9 compares the resulting bias and RMSE of the two α estimators when applied to residuals obtained from static (top panel) and dynamic (bottom panel) panel data models. These results refer to Design 1 with Gaussian errors. Both estimators perform well for all values of α, irrespective of whether the panel regressions are static or dynamic. This is particularly so for values of α > 0.8. In comparative terms, α̃ outperforms α̊ on average, for all values of α, and all N and T combinations. The superior performance of α̃ is more pronounced when α ≤ 0.75, uniformly over N and T. The results for Design 2 (where the cross-sectional dependence of the errors is generated using a two-factor specification) are summarized in Table 10, which has the same format as Table 9. In the case of these experiments, α̊ (the estimator proposed by BKP) performs better than α̃ when T is small (T = 100), but the bias and RMSE of α̃ become more comparable to those of α̊ as both N and T rise. As noted above, scaling u_it by ς in Eq. 6.4 eliminates this relative outperformance of α̊.⁸ Further, Table 11 displays bias and RMSE results for estimates of the exponent of cross-sectional dependence given by Eq. 2.8, computed using the maximum eigenvalue of the sample correlation matrices derived from the residuals of a static or dynamic panel data model with Gaussian errors generated under Designs 1 or 2.
It is clear that all eigenvalue based estimates of α perform rather poorly even for large values of N and T , and even for values of α close to 1.
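To make the reporting convention of the tables concrete, the bias and RMSE figures (×100) can be thought of as computed along the following lines. This is only an illustrative sketch: the function name `bias_rmse_x100` and its arguments are hypothetical and are not taken from the paper.

```python
import math


def bias_rmse_x100(estimates, alpha_true):
    """Bias and RMSE (both scaled by 100) of Monte Carlo estimates of alpha.

    `estimates` holds one estimate per replication and `alpha_true` is the
    value of alpha used to generate the data. Both names are hypothetical.
    """
    n_rep = len(estimates)
    errors = [est - alpha_true for est in estimates]
    bias = sum(errors) / n_rep
    rmse = math.sqrt(sum(err * err for err in errors) / n_rep)
    return 100.0 * bias, 100.0 * rmse
```

Applied to the estimates from the R replications of a given (N, T, α) cell, the function returns the pair of numbers that would fill one cell of a bias panel and one cell of an RMSE panel.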
Table 9: Comparison of bias and RMSE (×100) for the α̃ and BKP estimates of the cross-sectional exponent of the errors from static and dynamic panel data models with exogenous regressors. Cross correlations are generated using Design 1 with Gaussian errors.

Turning to the problem of sampling uncertainty, we compute bootstrapped confidence intervals for the estimate α̃ in the case of Design 1 with Gaussian errors. The results are summarised in Table 12 and give the average coverage rates (in percent) over the R simulations, that is, the frequency with which the true α lies between the 5th and 95th percentiles of the empirical (bootstrap) distribution of α̃. The related confidence intervals are constructed by applying the resampling procedure with replacement to the residuals obtained for the static and dynamic panel data models, with the results summarized in the top and bottom panels of Table 12, respectively. We set δ = 1/2 in the critical value function c_p(n, δ), and p = 0.05 (left panel) or p = 0.10 (right panel), in each replication of the MC experiments and in each bootstrap. For α = 0.60 coverage is high for small values of N, and rises to 100% as N increases, for all values of T considered. For 0.65 ≤ α ≤ 0.95 coverage stands universally at 100%, but it drops to zero when α = 1.00. This is to be expected since, as noted earlier, the error in estimating α becomes negligible when α = 1.00.
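The resampling scheme can be sketched as follows. The sketch is hedged in several respects: it assumes that resampling is carried out over time periods, jointly for all units, and that the interval is read off the 5th and 95th percentiles of the bootstrap draws. The names `bootstrap_interval`, `coverage_rate` and the `estimator` argument (any function mapping an N × T residual panel to an estimate of α) are hypothetical.

```python
import math
import random


def bootstrap_interval(resid, estimator, n_boot=200, lower=0.05, upper=0.95, seed=0):
    """Percentile interval obtained by resampling time periods with replacement.

    `resid` is a list of N residual series, each of length T.
    """
    rng = random.Random(seed)
    t_obs = len(resid[0])
    draws = []
    for _ in range(n_boot):
        # The same resampled dates are used for every unit, so that the
        # cross-sectional dependence within each period is preserved.
        idx = [rng.randrange(t_obs) for _ in range(t_obs)]
        draws.append(estimator([[series[t] for t in idx] for series in resid]))
    draws.sort()
    lo = draws[math.floor(lower * (n_boot - 1))]
    hi = draws[math.ceil(upper * (n_boot - 1))]
    return lo, hi


def coverage_rate(intervals, alpha_true):
    """Share (in percent) of intervals that contain the true value of alpha."""
    hits = sum(1 for lo, hi in intervals if lo <= alpha_true <= hi)
    return 100.0 * hits / len(intervals)
```

In a Monte Carlo setting one interval would be computed per replication and the resulting list passed to `coverage_rate`. An interval that collapses to a single point, as can happen when the estimator takes the same value in every bootstrap draw, contains the true value only if the two coincide exactly.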
Overall, we conclude that using multiple testing to identify the non-zero elements of R̂_N when computing α̃ is computationally attractive and has sound theoretical properties, with performance comparable to that of the estimator of the exponent of cross-sectional dependence developed in BKP.
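As a rough illustration of why the multiple testing approach is computationally light, the following sketch thresholds the pair-wise residual correlations and converts the resulting count into an estimate of α. Eq. 4.5 is not reproduced in this section, so the exact forms used here are assumptions: the critical value is taken to be c_p(n, δ) = Φ⁻¹(1 − p/(2n^δ)) with n = N(N − 1)/2, a correlation is treated as non-zero when its absolute value exceeds c_p(n, δ)/√T, and the estimate is the log of the number of non-zero entries of the thresholded correlation matrix (diagonal included) divided by 2 ln N. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def critical_value(n_pairs, p=0.05, delta=0.5):
    """Assumed form of c_p(n, delta): Phi^{-1}(1 - p / (2 * n^delta))."""
    return NormalDist().inv_cdf(1.0 - p / (2.0 * n_pairs ** delta))


def alpha_tilde(resid, p=0.05, delta=0.5):
    """Assumed estimator of alpha from N residual series of common length T."""
    n_units, t_obs = len(resid), len(resid[0])
    n_pairs = n_units * (n_units - 1) // 2
    threshold = critical_value(n_pairs, p, delta) / math.sqrt(t_obs)
    # Standardise each series so that correlations are plain inner products.
    z = []
    for series in resid:
        mean = sum(series) / t_obs
        dev = [v - mean for v in series]
        norm = math.sqrt(sum(d * d for d in dev))
        z.append([d / norm for d in dev])
    significant = sum(
        1
        for i in range(n_units)
        for j in range(i + 1, n_units)
        if abs(sum(a * b for a, b in zip(z[i], z[j]))) > threshold
    )
    # Each significant pair appears twice in the matrix; add the N diagonal terms.
    return math.log(n_units + 2 * significant) / (2.0 * math.log(n_units))
```

Under these assumptions the estimate equals 1/2 when no off-diagonal correlation is found to be significant and 1 when all of them are, which matches the range of values discussed in the text.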
7 An empirical application: identifying the weak factor component of CAPM

In their paper, BKP investigate the extent to which excess returns on the Standard & Poor's 500 (S&P 500) securities are interconnected through the market factor by computing rolling estimates of α, the degree of cross-sectional dependence of S&P 500 securities. According to asset pricing theories such as the capital asset pricing model (CAPM) of Sharpe (1964) and Lintner (1965), and the arbitrage pricing theory (APT) of Ross (1976), such estimates of α should be close to unity at all times. This is because both CAPM and APT assume that security returns have a common factor representation with at least one strong common factor, with the idiosyncratic component being weakly correlated; see also the approximate factor model due to Chamberlain (1983). Such a factor structure implies that all individual stock returns are significantly affected by the common factor(s) and, in consequence, are all pair-wise correlated to varying degrees.
The subsequent analysis in BKP reveals that a disconnect between some asset returns and the market factor does occur, particularly at times of stock market bubbles and crashes, when these asset returns could be driven by non-fundamentals. In this paper, we focus on the exponent of cross-sectional dependence of the residuals obtained from different versions of the CAPM model, and provide rolling estimates of this exponent for the errors from CAPM and related APT models. This is important since under CAPM, after allowing for the market factor, the errors cannot be cross-sectionally strongly correlated. It is therefore of interest to see if this is in fact true at all times, or if there are episodes where market factors are not sufficient to capture all the significant interdependencies that might exist across the security returns.
We update the BKP analysis and consider monthly excess returns of the securities included in the S&P 500 index over the period from September 1989 to May 2018. We obtain estimates of α using rolling samples of 120 months (10 years) to capture possible time variations in the degree of cross-sectional dependence.⁹ Since the composition of the S&P 500 index changes over time, we compiled returns on all 500 securities at the end of each month and included in our analysis only those securities that had at least 10 years of data in the month under consideration. On average, we ended up with 442 securities at the end of each month for the 10-year rolling samples. The one-month US treasury bill rate was chosen as the risk-free rate (r_ft), and excess returns were computed as r_it − r_ft, where r_it is the monthly return on the i-th security in the sample inclusive of dividend payments (if any).¹⁰

First, following BKP, we estimated α for excess security returns to see the extent to which securities in the S&P 500 index are fully interconnected at all times. As noted above, under CAPM we would expect estimates of α to be close to unity. To this end we used our proposed estimator, α̃, defined by Eq. 4.5, and computed 10-year rolling estimates α̃_t, for t = September 1989 to May 2018 (a total of 345 estimates), with p = 0.05. To check the robustness of the estimates to the choice of δ we computed the rolling estimates for δ = 1/2 and 1/3.¹¹ The resultant estimates are shown in Fig. 1.¹² As expected, the estimates α̃_t for δ = 1/2 lie below those for δ = 1/3, but the two series track each other very closely. The quantitative differences between the two sets of estimates are also not that large. Specifically, all 345 rolling estimates α̃_t (δ = 1/2) fall in the interval 0.82 to 0.96, whilst the corresponding estimates α̃_t (δ = 1/3) all lie in the range 0.86 to 0.97.
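The rolling-window scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: returns are held as one list per security with `None` marking months in which the security has no observation, a security enters a window only if it is observed in every month of that window, and `estimator` is any function that maps a panel of excess returns to an estimate of α. The function name `rolling_estimates` and the data layout are hypothetical.

```python
def rolling_estimates(returns, risk_free, estimator, window=120):
    """Rolling estimates over windows of `window` consecutive months.

    `returns` maps a security name to its list of monthly returns (None where
    unavailable); `risk_free` is the risk-free rate for the same months.
    Returns a list of (index of last month, number of securities, estimate).
    """
    n_months = len(risk_free)
    results = []
    for end in range(window, n_months + 1):
        start = end - window
        rf = risk_free[start:end]
        panel = []
        for series in returns.values():
            chunk = series[start:end]
            # Keep only securities with a complete window of observations.
            if all(r is not None for r in chunk):
                panel.append([r - f for r, f in zip(chunk, rf)])
        results.append((end - 1, len(panel), estimator(panel)))
    return results
```

Because the set of admissible securities is re-determined in every window, the cross-section dimension varies over time, which is why the number of pairs n entering the critical value has to be recomputed window by window.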
These estimates show a high degree of inter-linkages across individual securities, and are very close to unity at the start and at the end of the sample, with important departures from unity in between. Considering α̃_t (δ = 1/2), it [...]

⁹ We also consider rolling samples of size 60 months (5 years). Results for this setting are shown in the online supplement, Figs. 3 and 4.
¹⁰ For further details of data sources and definitions see Pesaran and Yamagata (2012).
¹¹ For the remaining parameters in Eq. 4.5 we set p = 0.05 and n = N_t(N_t − 1)/2, where N_t is the number of securities in a given 10-year rolling window (t = 1, 2, . . . , 345).
¹² The same estimates including their 90% confidence intervals are shown in Figs. 5 and 6 of the online supplement, where in the critical value c_p(n, δ) we set δ = 1/2 and 1/3, respectively.
¹³ The measured increase in α̃ estimates during 2003 to 2007 is partly attributed to the length of the rolling windows being set to 10 years (120 months). Opting for 5-year rolling windows (60 months) produces more pronounced increments in α̃ estimates, which is expected; see Fig. 3 in the online supplement.

Figure 1 (panel titles): 10-year rolling estimates of α (α̃_t with δ = 1/2) based on excess returns; 10-year rolling estimates of α (α̃_t with δ = 1/3) based on excess returns.

We now turn our attention to the exponent of cross-sectional correlations of the error terms of the CAPM model, and two well known extensions using additional Fama and French factors. Specifically, the first regression is the usual CAPM one-factor representation given by

$$r_{it} - r_{ft} = a_{1i} + \beta_{1i}\,(r_{mt} - r_{ft}) + u_{1i,t}, \quad i = 1, 2, \ldots, N, \tag{7.1}$$

where r_mt is the market return computed as the value-weighted returns on all NYSE, AMEX, and NASDAQ stocks. The second and third regressions assume the following extensions to Eq. 7.1 proposed by Fama and French (2004):

$$r_{it} - r_{ft} = a_{2i} + \beta_{2i}\,(r_{mt} - r_{ft}) + \gamma_{2i}\,smb_{t} + u_{2i,t}, \quad i = 1, 2, \ldots, N, \tag{7.2}$$

and

$$r_{it} - r_{ft} = a_{3i} + \beta_{3i}\,(r_{mt} - r_{ft}) + \gamma_{3i}\,smb_{t} + \theta_{3i}\,hml_{t} + u_{3i,t}, \quad i = 1, 2, \ldots, N, \tag{7.3}$$

where smb_t stands for the average return on the three small portfolios minus the average return on the three big portfolios formed by size, while hml_t refers to the average return on securities with a high book value to market value ratio minus the average return on securities with a low book value to market value ratio.¹⁴

As noted previously, under CAPM we would expect the errors, u_1i,t, to be cross-sectionally weakly correlated, with α_u1 close to 1/2. But this need not be the case in reality. In fact the introduction of the FF factors, smb_t and hml_t, could be viewed as an attempt to ensure cross-sectionally weakly correlated errors for the augmented CAPM model. It is therefore of interest to consider the estimates of α for the errors u_1i,t, u_2i,t and u_3i,t, and see if they are close to 1/2 as required by the theory. To this end, we compute 10-year rolling estimates of α based on the pair-wise correlations of the OLS residuals û_1i,t, û_2i,t and û_3i,t in the panel regressions (7.1), (7.2) and (7.3), respectively. These estimates, denoted by α̃_{û_j t}, j = 1, 2, 3, for t = September 1989 to May 2018, are shown in Fig. 2.¹⁵,¹⁶ As expected, estimates of α based on the residuals are smaller than the estimates obtained for the securities themselves (as depicted in Fig. 1).
¹⁴ For further details of data sources and definitions see Pesaran and Yamagata (2012).
¹⁵ We set p = 0.05 and δ = 1/2 when estimating α for the residuals, since a priori we would expect the true value of α for the errors of CAPM models to be close to 1/2. See the discussion in Section 6.
¹⁶ The same estimates including their 90% confidence intervals are shown in Figs. 7, 8 and 9 of the online supplement, respectively.

Figure 2: 10-year rolling estimates of the exponent of cross-sectional correlation (α̃_t) of residuals from CAPM and its two Fama-French extensions. Panel titles: 10-year rolling estimates of α (α̃_t with δ = 1/2) for the residuals of the CAPM model; for the residuals of the CAPM model augmented by the SMB factor; for the residuals of the CAPM model augmented by SMB and HML factors. Notes: The CAPM model includes excess market returns; the CAPM model augmented by SMB includes excess market returns and small minus big (SMB) firm returns; and the CAPM model augmented by SMB and HML includes excess market returns, small minus big (SMB) firm returns and high minus low (HML) firm returns as regressors in Eqs. 7.1, 7.2 and 7.3, respectively.

It is also interesting that all three estimates α̃_{û_1 t}, α̃_{û_2 t} and α̃_{û_3 t} are closely clustered over the two sub-periods September 1989 to September 1997 and February 2011 to May 2018, suggesting that the standard CAPM model provides an adequate characterisation of the cross-sectional correlations of securities, and that the additional FF factors are not required in these sub-periods. It is also worth noting that, over these two sub-periods, estimates of α fall in the narrow range 0.63 to 0.71, values which are sufficiently small to support CAPM as an adequate model for characterising cross-correlations of S&P 500 security returns.
In contrast, the estimates α̃_{û_1 t}, α̃_{û_2 t} and α̃_{û_3 t} tend to diverge over the period from October 1997 to January 2011 and, more importantly, they all start to rise sharply, suggesting important departures from the basic CAPM model. Using only the market factor, as in Eq. 7.1, results in α̃_{û_1 t} jumping to levels around 0.74 to 0.76. Adding smb_t to Eq. 7.1 reduces the α estimates of the resulting residuals to 0.69 to 0.73, suggesting that the size portfolio does have some influence on individual security returns during this period. Adding the second FF factor (as in Eq. 7.3) further reduces the estimates of α to the range 0.66 to 0.68. These results are also in line with the sharp drop in the estimates of α we reported for the excess returns during the period 1998 to 2010, and provide further evidence in favour of the argument that factors other than the market factor, namely smb_t and hml_t, tend to become relevant during periods of financial crisis and turmoil.
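The residual-based estimates discussed in this section start from security-by-security OLS regressions of excess returns on the market factor, with or without the smb and hml factors. A minimal sketch of that first step is given below. It is not the authors' code: the helper names `ols_residuals` and `capm_residual_panels`, the labels of the three specifications and the use of the normal equations are all assumptions made for illustration.

```python
def ols_residuals(y, regressors):
    """Residuals from an OLS regression of y on an intercept and `regressors`."""
    t_obs = len(y)
    cols = [[1.0] * t_obs] + [list(x) for x in regressors]
    k = len(cols)
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k + 1) matrix.
    aug = [
        [sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
        + [sum(a * b for a, b in zip(cols[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * col[t] for b, col in zip(beta, cols)) for t in range(t_obs)]
    return [obs - fit for obs, fit in zip(y, fitted)]


def capm_residual_panels(excess_returns, mkt_excess, smb, hml):
    """Residual panels for the three specifications (hypothetical labels)."""
    specs = {
        "capm": [mkt_excess],
        "capm_smb": [mkt_excess, smb],
        "capm_smb_hml": [mkt_excess, smb, hml],
    }
    return {
        name: [ols_residuals(y, regs) for y in excess_returns]
        for name, regs in specs.items()
    }
```

Each of the three residual panels can then be passed to an estimator of α within every rolling window, so that the three resulting series are directly comparable.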
8 Conclusions

Cross-sectional dependence and the extent to which it occurs in large multivariate data sets is of great interest for a variety of economic, econometric and financial analyses. Such analyses vary widely. Examples include the effects of idiosyncratic shocks on aggregate macroeconomic variables, the extent to which financial risk can be diversified by investing in disparate assets or asset classes, and the performance of standard estimators such as principal components when applied to data sets with unknown collinearity structures. A common characteristic of such analyses is the need to quantify the degree of cross-sectional dependence, especially when it is prevalent enough to materially affect the outcome of the analysis.
In this paper we generalize the work of Bailey et al. (2016) by proposing a method of measuring the extent of inter-connections in the residuals of large panel data sets in terms of a single parameter, α. We refer to this as the exponent of cross-sectional dependence of the residuals. We show that this exponent can be used to characterize the degree of sparsity of correlation matrices, or the prevalence of factors in multi-factor representations routinely used in economic and financial analysis. We propose a simple consistent estimator of the cross-sectional exponent and derive the rate at which it approaches its true value. We also propose a resampling procedure for the construction of confidence bounds around the estimator of α.
A detailed Monte Carlo study suggests that the proposed estimator has desirable small sample properties, especially when α > 3/4. We apply our measure to the widely analysed Standard & Poor's 500 data set. We find that for individual securities in the S&P 500 index, the 10-year rolling estimates of cross-sectional exponents are sufficiently close to unity over the two sub-periods 1989 to 1997 and 2011 to 2018, but not during the intervening period 1998 to 2010, when markets were subject to several episodes of financial turmoil, starting with the LTCM crisis and the Dotcom bubble, and ending with the credit crunch of 2007 to 2008. These results carry over when we consider the cross-sectional dependence of errors from the CAPM model and its multi-factor extensions using Fama-French factors. Estimates of α based on the residuals from the CAPM model lend support to CAPM during the sub-periods 1989 to 1997 and 2011 to 2018, but not during the period 1998 to 2010.