Testing the Effects of High-Dimensional Covariates via Aggregating Cumulative Covariances

Abstract In this article, we test for the effects of high-dimensional covariates on the response. In many applications, different components of covariates usually exhibit various levels of variation, which is ubiquitous in high-dimensional data. To simultaneously accommodate such heteroscedasticity and high dimensionality, we propose a novel test based on an aggregation of the marginal cumulative covariances, requiring no prior information on the specific form of regression models. Our proposed test statistic is scale-invariance, tuning-free and convenient to implement. The asymptotic normality of the proposed statistic is established under the null hypothesis. We further study the asymptotic relative efficiency of our proposed test with respect to the state-of-art universal tests in two different settings: one is designed for high-dimensional linear model and the other is introduced in a completely model-free setting. A remarkable finding reveals that, thanks to the scale-invariance property, even under the high-dimensional linear models, our proposed test is asymptotically much more powerful than existing competitors for the covariates with heterogeneous variances while maintaining high efficiency for the homoscedastic ones. Supplementary materials for this article are available online.


Introduction
In regression analysis, it is of fundamental importance to infer whether a set of covariates x = (X 1 , . . ., X p ) T ∈ R p has any effect on a response Y ∈ R 1 .Consider the null hypothesis H 0 : E(Y|x) = E(Y), almost surely. (1.1) If H 0 holds, it implies that x does not contribute to the conditional mean function of Y. Thus, there is no need to build a regression model for the conditional mean function.In genomic studies, exploring whether a set of genes is predictive for certain clinical outcomes can be formulated as the hypothesis in (1.1).See, for example, the significant gene set selection in Subramanian et al. (2005), Efron and Tibshirani (2007) and Zhong and Chen (2011).
Testing for the covariates effects has received much attention, and many tests have been developed for low and fixeddimensional covariates.By assuming E(Y|x) = x T β, testing (1.1) is equivalent to checking whether β = 0.The classical F-test can be used to infer the overall significance of linear regression coefficients.Moreover, many model-specification tests are also designed to test (1.1), including the local (Hardle and Mammen 1993;Zheng 1996;Guo, Wang, and Zhu 2016) and global smoothing tests (Stute 1997;Stute, Thies, and Zhu 1998;Escanciano 2006).Without specifying a functional form of E(Y|x), Wang and Akritas (2006) developed a test for covariate effects in a completely nonparametric fashion.Shao and Zhang (2014) proposed a martingale difference divergence in the setting of fixed dimension.However, these methods do not target the case of highdimensional covariates and suffer from the curse of dimensionality when the covariate dimension p diverges.In particular, Zhong and Chen (2011) proved the power of F-test is adversely impacted by an increasing ratio p/n even when p < n − 1, where n is the sample size.For the martingale difference divergence that can capture any type of conditional mean dependence in the fixed dimension, Zhang, Yao, and Shao (2018) showed that it can only measure the component-wise linear dependence in high dimension.
In the case of high-dimensional covariates, many tests have been proposed under parametric model settings.For example, in high-dimensional linear regression models, Goeman, Van De Geer, and Van Houwelingen (2006) considered a score test in an empirical Bayesian model.Zhong and Chen (2011) proposed a simultaneous test for coefficients based on a U-statistic.This work is further modified by Feng et al. (2013) with Wilcoxon scores, and by Cui, Guo, and Zhong (2018) with refitted crossvalidation.In high-dimensional generalized linear regression models, Goeman, Van Houwelingen, and Finos (2011) extended the test of Goeman, Van De Geer, and Van Houwelingen (2006) and derived its asymptotic distribution.Guo and Chen (2016) introduced a test that is robust to a wide range of link functions.It is desirable to develop a test that can accommodate the highdimensionality without any parametric model assumptions.
Directly testing (1.1) without specifying any model structure is very challenging in high dimensions.McKeague and Qian (2015) proposed an adaptive resampling test (ART) using the maximum-type statistic on the slopes of marginal linear regressions.The ART may effectively detect the presence of significant covariates under assumption lean linear model: Y = β 0 +x β + ε, where ε and x are uncorrelated, and β = arg min γ E(Y − x γ ) 2 .Thus, the ART procedure may be directly applied for the hypothesis (1.1) under condition that β = 0 implies that E(Y|x) = E(Y).Luedtke and van der Laan (2018) established theoretical properties of the standardized ART procedure under high-dimensional setting and further introduced a more computationally tractable approach to the ART.Zhang, Yao, and Shao (2018) considered a relatively weaker null hypothesis: It is clear that H 0 in (1.1) implies H 0 in (1.2), but not vice versa.In other words, the distance between E(Y|x) and E(Y) cannot be fully captured by pairwise distances between E(Y|X s ) and E(Y), for s = 1, . . ., p.However, Zhang, Yao, and Shao (2018) pointed out that the difference between E(Y|X s ) and E(Y) can be regarded as the marginal effect of X s contributing to Y. From this point of view, the pairwise distances between E(Y|X s ) and E(Y) are still informative in testing H 0 in (1.1).The pairwise comparison partly motivates us to develop a novel high-dimensional nonparametric test for H 0 in (1.2).
In real applications, different components of covariates usually exhibit different levels of variation.For example, in highdimensional microarray data, the variation of expression differs substantially from gene to gene (Nettleton, Recknor, and Reecy 2008).The heterogeneous variances of covariates may affect the performances of the testing procedures, when the test statistics are not scale invariant.See, for example, McKeague and Qian (2015) and its discussions and rejoinder for extensive discussions on this issue.To deal with the issue, a widely used strategy is to standardize each covariate by its corresponding standard deviation prior to implementing the aforementioned tests.Examples include Zhong andChen (2011), McKeague andQian (2015) and Zhang, Yao, and Shao (2018).This strategy however, brings difficulties in theoretical justifications for diverging p and requires implicitly the variances of all covariates be finite.
To accommodate the issues of heterogeneous covariates variances and high dimensionality, we introduce a novel test to detect the mean effects of high-dimensional covariates on the response.Without any model assumptions, our test statistic is built on cumulative covariance (Zhou et al. 2020), which utilizes the rank of covariates and hence, is scale-invariance.To illustrate the appealing features of our proposed test, we compare it with two state-of-art tests: one is proposed by Zhong and Chen (2011) and the other by Zhang, Yao, and Shao (2018).The first test is designed for high-dimensional model and originates from the classical F-test that is known to be powerful when p is fixed as n → ∞.The second test is developed in a high-dimensional model-free setting.The asymptotic properties of these two tests are universal (Paindaveine and Verdebout 2016), in the sense that p may go to infinity in an arbitrary rate as n goes.This also applies to our proposed test.In what follows, we summarize our contributions as well as the desirable properties of the new test.
• A direct implementation of the proposed test statistic according to its definition has a computational complexity of order O(n 5 p), which is computationally expensive.By sorting the covariates in an increasing order, we provide an efficient algorithm to reduce the computational cost of its numerator to O{np log(n)}.This algorithm can also be adapted to Zhong and Chen's (2011) and Zhang, Yao, and Shao's (2018) test statistics to improve the computational efficiency of their numerators.The denominators of three statistics can be calculated in O(n 2 p) operations.Therefore, the overall computational complexities of three tests are all of order O(n 2 p).• We derive the universally asymptotic normality of our proposed test statistic under the high-dimensional null hypothesis, without any assumption on relative growth rate between p and n.No bootstrap or random permutation is required to approximate the asymptotic null distributions.In this sense, our testing procedure is tuning-free and distribution-free.• We derive the asymptotic power function of our proposed test and carefully study its asymptotic relative efficiency with respect to the tests proposed by Zhong and Chen (2011) and Zhang, Yao, and Shao (2018) in high-dimensional linear models.When each covariate has the same variance, we prove that the asymptotic relative efficiency of our proposal is near 0.872 with respect to Zhong and Chen (2011), and 0.979 with respect to Zhang, Yao, and Shao (2018).However, under the heterogeneous variances of covariates, the asymptotic relative efficiency can even go to infinity as p goes to infinity.See Section 3 for detailed discussions.This implies that our proposed test has little efficiency loss for homoscedastic covariates, but substantial efficiency gain for heteroscedastic cases.
The rest of the article is organized as follows.In Section 2, we introduce a new conditional mean testing procedure based on the cumulative covariance, and derive its asymptotic distribution under the null hypothesis and alternatives.Section 3 carefully studies the asymptotic relative efficiency of the proposed test with respect to the tests in Zhong and Chen (2011) and Zhang, Yao, and Shao (2018).We assess the finite sample performance of the proposed test through Monte Carlo simulations and a real data application in Sections 4.1 and 4.2, respectively.A short discussion is given in Section 5.All technical proofs and extra simulations are relegated to the supplementary materials.

A New Test Procedure
For the purpose of comparison, we first review the tests proposed by Zhong and Chen (2011) and Zhang, Yao, and Shao (2018).We will compare the asymptotic relative efficiency of these two tests with our proposed procedure in Section 3.

Two Existing Tests Allowing for Universal (n, p)-Asymptotics
Let x i = (X i1 , . . ., X ip ) T .Suppose we have a random sample {(x i , Y i ), i = 1, . . ., n} drawn independently from the joint distribution of (x, Y).Throughout this article, we denote ,j,k,l) and n (i,j,k,l,r) denote summations that are taken over all possible permutations of distinctive indices.
The test in Zhong and Chen ( 2011) is constructed through a modified F-statistic under linear model assumption, where The test statistic in Zhang, Yao, and Shao (2018) is built upon the martingale difference divergence without model assumptions, where , respectively.See more details in (Zhang, Yao, and Shao 2018, eq. 7).(Zhong and Chen 2011, Theorem 3) and (Zhang, Yao, and Shao 2018, Theorem 2.2) established the asymptotic normality properties for {n(n − 1)/2} 1/2 ZC n,p and {n(n−1)/2} 1/2 ZYS n,p when both the dimension and the sample size go to infinity.
Both tests of Zhong and Chen (2011) and Zhang, Yao, and Shao (2018) require the existence of the second moments of covariates and are not invariant under scale transformations, which indicates their power performances heavily depend on the variance magnitudes of covariates.In Sections 3 and 4, we will show the advantages of scale-invariance property from both the asymptotic and the numerical perspectives.
In high-dimension setting, easy implementation is a desirable property for testing.Therefore, we are interested in the computational complexity of these two tests.Naively computing ZC n,p and ZYS n,p by (2.1) and (2.2) is very complicated.Our fast algorithm given in the following Section 2.4 can be adapted to calculate the numerators of ZC n,p and ZYS n,p , which have the computational complexity of O(np) and O{np log(n)}, respectively.The details of the adapted algorithms can be found in Section S.7 of the supplementary materials.The denominators of ZC n,p and ZYS n,p , namely σ 1 and σ 2 , can be computed in O(n 2 p) operations.

Cumulative Covariance Revisited
We next review the cumulative covariance that is introduced by Zhou et al. (2020).Let X and Y denote two random variables.Under the assumption that the second moment of Y is finite, the cumulative covariance CCov(Y|X) is defined as (2.3) where ( X, Y) is an independent copy of (X, Y) and I(•) is an indicator function.The cumulative covariance is nonnegative and equals zero if and only if E(Y|X) = E(Y).In this sense, the cumulative covariance can fully characterize the conditional mean dependence.An appealing property of the cumulative covariance is that it keeps invariant with respect to arbitrary strictly monotone transformation of X.This invariance property, however, is not shared by martingale difference divergence.Therefore, the test proposed by Zhang, Yao, and Shao (2018), which is built upon martingale difference divergence, requires the second moment of X be finite, while the CCov-based test allows it to be infinity.We shall elaborate this in detail in the sequel.

The CCov-Based Test Statistic in High Dimension
To test H 0 in (1.2), it is natural to use the summation of all marginal cumulative covariances, which is nonnegative and equals zero if and only if the pairwise differences between E(Y|X s ) and E(Y), for all s = 1, . . ., p, are identically zero.For simplicity, we assume that all the components of x are continuous so that the probability of a tie occurring in the data is zero.A natural estimator of (2.4) can be defined as where This sample version appears straightforward, however, W n,p involves several redundant terms that bring in asymptotically nonnegligible bias-terms in high dimension, resulting in a fragile size performance for the test based on W n,p .The details of bias-terms can be found in Appendix S.2, supplementary materials.To formulate the CCov-based test in high dimension, we consider instead 1, supplementary materials ensures that T n,p is an unbiased estimator of (2.4).Thus, T n,p is basically all we need to test H 0 in (1.2).For arbitrary strictly monotone transformations M s , we have ψ(X is , X js , X rs ) ψ(X ks , X ls , Consequently, the proposed test statistic T n,p is automatically scale-invariance.

Computational Algorithm
Directly computing T n,p through (2.5) has a computational complexity of order O(n 5 p).To reduce the computational burden, we provide a computationally efficient algorithm to implement the proposed statistic T n,p .First, for any s = 1, . . ., p, sort the n observation of After these two preliminary steps, the following theorem shows how to compute T n,p efficiently.
Theorem 1 guarantees that T n,p can be computed in only O{np log(n)} operations.

Asymptotic Null Distribution
To establish the asymptotic normality of our test statistic, we study Hoeffding decomposition (Serfling 1980) for the variance of T n,p , which is valid for diverging p. Define . ., n and s = 1, . . ., p, and symmetrize it by Write h(i, j, k, l, r) = p s=1 h s (i, j, k, l, r), for i, j, k, l, r = 1, . . ., n.
Then, the statistic T n,p has the following expression, h (i, j, k, l, r).
It is clear that T n,p is actually a U-statistic of order five.This finding is very useful in subsequent derivations.For c = 1, . . ., 5, let h (c) (z 1 , . . ., z c ) = E{h(1, 2, 3, 4, 5)|z 1 , . . ., z c } be projections of h to lower-dimensional sample spaces, where To determine the asymptotic form of T n,p , we study its variance decomposition for high-dimensional data.Toward this end, we impose the following assumption.
almost surely for some constants c and C.
This assumption is also considered by Patilea, Sánchez-Sellero, and Saumard (2016) and Zhang, Yao, and Shao (2018) to derive the asymptotic properties of their test statistics.Assumption 1 holds true for any p when If the model error {Y − E(Y|x)} and x are independent, and the fourth moments of the model error are bounded, then (2.6) is trivially true.
To establish the asymptotic normality of T n,p /S, we use the martingale central limit theorem (Hall and Heyde 2014, CLT).The following assumption is imposed to facilitate the proof of martingale CLT and is closely related to the typical condition (2.1) of Hall (1984).
Assumption 2. As p → ∞ and n → ∞, Assumption 2 is presented in an abstract way and can be made more explicit under specific dependence structures.To illustrate this, we consider the commonly encountered banded dependence structure, where the random vector x is m-dependent.In Section S.9 of the supplementary materials, it can be verified that Assumption 2 is trivially satisfied if m = o(p 1/3 ) for the divergent p.In particular, if m is a fixed constant, the above conditions are fairly mild when p is divergent.Moreover, there is no explicit relationship between p and n in Assumption 2. If the coordinates of x are independent but not necessarily identically distributed, p can grow to infinity freely as n → ∞.
To formulate a test procedure based on T n,p , we need to provide a suitable variance estimator for S 2 .We consider the following estimator, where Ẏi = Y i − Y, F n,s is the empirical distribution function of X s and c n = {(1 − n −1 ) 2 + n −2 } 2 is a finite sample adjustment factor to reduce the bias of S 2 n,p .See the proof of Theorem 3 for details.The variance estimator S 2 n,p has a computational cost of order O(n 2 p).In what follows, we derive the ratio-consistency of this variance estimator.
Theorem 3. Suppose that Assumptions 1 and 2 hold.Then, we have the ratio consistency that as n, p → ∞, S 2 n,p /S 2 converges in probability to 1. Consequently, {n(n − 1)/2} 1/2 T n,p /S n,p converges in distribution to the standard normal distribution under the H 0 .
Theorems 2 and 3 reveal that the asymptotic normality of our proposed statistic under the null hypothesis holds with no restriction on relative growth rate between p and n.Theorem 3 suggests that our proposed test rejects the null hypothesis H 0 in (1.2) at significant level α if {n(n − 1)/2} 1/2 T n,p /S n,p > z α , where z α is the 1 − α quantile of standard normal.

Asymptotic Distribution Under Alternatives
For the power analysis, we consider a class of alternatives H 1 satisfying These two conditions are assumed to describe a small difference between H 0 and H 1 in intuitive way.Thus, the underlying alternatives may be viewed as "local" alternatives.Rigorous definition of local alternatives perhaps may be arguably phrased in terms of contiguity, but this is beyond the scope of this article.Under H 1 , the variance of T n,p defined in Lemma 1 remains valid.In the following theorem, we derive the asymptotic distribution of our proposed test statistic under the alternatives, which allows for power evaluations.Based on Theorems 3 and 4 as well as Slutsky's theorem, the power of the proposed test under H 1 is where (•) is the cumulative distribution function of N(0, 1), and z α denotes the 1 − α quantile of N(0, 1).The power of our test is in spirit controlled by which can be viewed as a signal-to-noise ratio.

Asymptotic Relative Efficiency
It is challenging to compare our test with the tests of Zhong and Chen (2011) and Zhang, Yao, and Shao (2018) in a completely model-free context.We study the asymptotic powers of these three tests under high-dimensional linear models, and anticipate that similar conclusions can be drawn from nonlinear models.Let us consider the model where β = (β 1 , . . ., β p ) T , x = (X 1 , . . ., X p ) T ∼ N(0, ), and ε is independent of x with E(ε) = 0 and var(ε) = σ 2 .To illustrate the implication of SNR NEW , we consider a diagonal matrix = diag(d 1 , . . ., d p ).Following Theorem 1(3) in Zhou et al. (2020), we can derive that By contrast, the asymptotic power of Zhong and Chen's (2011) test depends on From (Shao and Zhang 2014, Theorem 1(3)) and (Székely, Rizzo, and Bakirov 2007, Theorem 7(ii)), the asymptotic power of Zhang, Yao, and Shao's (2018) test is related to Based on the signal-to-noise ratios of three tests, the asymptotic relative efficiency of Zhong and Chen's (2011) test with respect to our proposal is The asymptotic relative efficiency of Zhang, Yao, and Shao's (2018) test with respect to ours is To view a rough picture of the asymptotic power comparison, we consider three scenarios in what follows.

Homoscedastic Case: The Marginal Variance of Each Covariate is the Same
In the homoscedastic case,

Heteroscedastic Case When the Number of Nonzero Effects is Fixed
For simplicity, we assume that all nonzero coefficients β s have the same magnitude, that is,  Zhong and Chen's (2011) and Zhang, Yao, and Shao's (2018) tests.To further compare the asymptotic power performances of these two tests under the heteroscedastic case, we further impose the condition This condition is also mild if the variance of each covariate differs much.Under this assumption, we show the asymptotic power of Zhang, Yao, and Shao's (2018) test is superior to that of Zhong and Chen's (2011) test, which is also a totally new finding and have not been discovered by Zhang, Yao, and Shao (2018).Therefore, the asymptotic powers of three tests arranged in a descending order are those of our proposed test, Zhang, Yao, and Shao's (2018) test and Zhong and Chen's (2011) test.
Consider an explicit scenario satisfying (3.2) and (3.3):There is a parameter δ > 0 not depending on the dimension p such that d s s δ , for s = 1, . . ., p, where for two sequences {a s } and {b s }, we write a s b s if there exist positive constants c and C such that c ≤ lim inf s a s /b s ≤ lim sup s a s /b s ≤ C.
In the ultrahigh dimension setting log p n θ , the signal-tonoise ratios of three tests are, respectively, As the dimension p → ∞, all the three tests will have trivial powers and cannot distinguish the alternatives from the null.The statistical intuition behind this phenomenon is that for fixed-dimensional signals, high dimensionality is a total curse and the signal-to-noise ratios of three tests converge to zero.Even in this case, the convergence rate of our proposed test is still slower than those of Zhong and Chen's (2011) and Zhang, Yao, and Shao's (2018) tests.We can also give the explicit order of asymptotic relative efficiency of these three tests in this case.ARE(NEW, ZC) p δ , ARE(NEW, ZYS) p δ/2 , and ARE(ZYS, ZC) p δ/2 .In terms of the asymptotic power, the performance of our proposed test is the best, followed by Zhang, Yao, and Shao's (2018) test.Zhong and Chen's (2011) test is unfortunately the worst among the three tests.

Heteroscedastic Case When the Number of Nonzero
Effects is Diverging Assume that β s = κI(1 ≤ s ≤ q), s = 1, . . ., p, for κ = 0, q(≤ p) is diverging.All other settings are remained exactly the same as those in Section 4.2.Suppose that q p τ with 0 < τ < 1.In the ultrahigh dimension setting of log p n θ for some 0 < θ < 1, the signalto-noise ratios of three tests are, respectively, (a) If the signals are dense, that is, in the order of q = p τ with 1/2 ≤ τ < 1, all the signal-to-noise ratios SNR ZC , SNR ZYS , and SNR NEW go to infinity as p → ∞.Therefore, these three tests have nontrivial power under H 1 .
Moreover, no matter the signal is dense or sparse, the asymptotic relative efficiency of these three tests are ARE(NEW, ZC) = p δ(1−τ ) , ARE(NEW, ZYS) = p δ(1−τ )/2 , and ARE(ZYS, ZC) p δ(1−τ )/2 , which implies that our proposed test is still asymptotically more powerful than Zhong and Chen's (2011) and Zhang, Yao, and Shao's (2018) tests.In summary, the aforementioned power analysis suggests that compared to Zhong and Chen's (2011) and Zhang, Yao, and Shao's (2018) tests, our proposed test has a substantial efficiency gain in heteroscedastic case while maintaining high power efficiency in homoscedastic case.We shall verify this finding through numerical studies.

Simulation Studies
We conduct simulations to evaluate the finite-sample performance of the proposed test and compare it with the two universal (n, p)-asymptotic tests proposed by Zhong and Chen (2011) and Zhang, Yao, and Shao (2018).In addition, we compare our proposal test with the ART (McKeague and Qian 2015) and its related test proposed by Zhang and Laber (2015) in the supplementary materials.
Let us consider the following three models: where x i = (X i1 , . . ., X ip ) T is generated from the following moving average model: for δ ≥ 0, T = 8 and s = 1, . . ., p. Here, The coefficients {ρ k } T k=1 are generated independently from the uniform distribution on [0, 1] and are kept fixed once generated.The moving average model (4.4) implies that = cov(x i ) = (σ st ) p×p , consists of The parameter δ controls the degree of heteroscedasticity.We consider δ = 0, 0.25, 0.5, 0.75 and 1, where δ = 0 indicates that all covariates are homogeneous.The error term ε i follows two different distributions: N(0, 1) and the centralized gamma distribution with shape parameter 1 and scale parameter 1, where the centralized gamma distribution is skewed to the right.Under the null hypothesis H 0 , the coefficients in all three models are all zero.Following Zhong and Chen (2011) and Zhang, Yao, and Shao (2018), we consider two configurations of alternative hypothesis.(a) Nonsparse case: the total number of active covariates q = [p 0.7 /2], where [x] denotes the largest integer not greater than x.(b) Sparse case: the total number of active covariates q = [3p 0.3 /2].The coefficients are defined as follows.
where ||β|| 2 = 0.04.All other entries are identically 0. We choose n = 80, 120 and p = 550, 1116, according to p = [exp(n   Table 1 reports the empirical sizes and powers of our proposed test as well as those of Zhong and Chen's (2011) and Zhang, Yao, and Shao's (2018) tests for linear model (4.1).The empirical sizes of all three tests are reasonably close to 5% under two different error distributions.We display the kernel density estimates for the standardized test statistic T n,p for linear model (4.1) in Figure 1, which can be well approximated by a standard normal distribution.This confirms the theoretical results in Theorem 3. Since our proposed test is scale-invariance, its empirical sizes stay the same under different values of δ.This property is not shared by Zhong and Chen's (2011) or Zhang, Yao, and Shao's (2018) test.In terms of power, when δ = 0, Zhong and Chen's (2011) test is an obvious winner, and Zhang, Yao, and Shao's (2018) test is slightly better than our proposed test.The differences among three tests are not remarkable.This echoes the theoretical finding in Section 3 that our proposed test has little efficiency loss when each covariate has the same variance.However, when δ is larger than zero, the story becomes totally different.Our proposed test has the best performances, followed by Zhang, Yao, and Shao's (2018) test, and then Zhong and Chen's (2011) test.When δ = 1, the empirical powers of our proposed test and the two competitors are 0.998, 0.133, 0.056, respectively, under sparse H 1 with (n, p) = (120, 1116) and normal errors.Under this scenario, our proposed test significantly outperforms the competitors while the empirical power of Zhong and Chen's (2011) test is close to the significance level.This implies that even under the linear models, our proposed test has substantial efficiency model assumptions.Our test statistic is built on the cumulative covariance, which has an explicit form and is completely free of tuning parameters.The limiting distributions of our proposed test statistic are normal under both the null hypothesis and the alternatives.Our asymptotic power analysis and numerical studies show that even under the high-dimensional linear models, our proposed test has substantial power improvement compared to the tests of Zhong and Chen (2011) and Zhang, Yao, and Shao (2018) in the heteroscedastic covariates setting, while maintaining high efficiency in the homoscedastic cases.It is also important to remark here that our proposed testing procedure can be easily generalized to multivariate response case.For the multivariate response y ∈ R q (q > 1), where q is allowed to be large but finite, we can analogously define E cov{y T , I(X < X)| X}cov{y, I(X < X)| X} , where ( X, y) is an independent copy of (X, y).This metric reduces to (2.3) in the univariate case of q = 1.The corresponding test statistic is further defined by ,j,k,l,r) (y i − y j ) T (y k − y l ) ψ(X is , X js , X rs )ψ(X ks , X ls , X rs ).
The computationally efficient algorithm and theoretical analysis for our test statistic in (2.5) can be directly applied to the above statistic.It is also worth noting that our test procedure is built on a sum-of-squares-based statistic and targets dense alternatives.
To enhance its power for sparse signals, we suggest to follow the

Figure 2 .
Figure 2. The standard deviation of each covariate.
p} is fixed.In this setting, if we assume the condition

Table 1 .
The empirical sizes and powers for linear model (4.1) at the significance level 5%, where δ controls the degree of heterogeneity in terms of the covariate variances.

Table 4 .
The empirical powers for ZC, ZYS tests and our proposed test.