High-dimensional asymptotic expansion of the null distribution for Schott’s test statistic for complete independence of normal random variables

Abstract This article is concerned with testing complete independence of the elements of an observed vector. Schott proposed the test statistic T and gave its limiting null distribution under the high-dimensional asymptotic framework in which the sample size n and the dimensionality p go to infinity together while p/n converges to a positive constant. In this article we give a one-term asymptotic expansion of the null distribution of T as min{n,p} tends to infinity. We derive a correction of the critical point for Schott’s test based on this expansion. The finite-sample, finite-dimension performance in terms of the attained significance level is evaluated in a simulation study, and the results are compared with those of Schott’s test.


Introduction
This article is concerned with high-dimensional tests for complete independence of the variables comprising the $p \times 1$ vector $x$ having a multivariate normal distribution with covariance matrix $\Sigma$. Let $x_1, \ldots, x_N$ be a random sample from the $p$-dimensional normal distribution with covariance matrix $\Sigma = (\sigma_{ij})$. We consider testing the null hypothesis $H_0: \rho_{ij} = \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}} = 0$ for all $i, j \in \{1, 2, \ldots, p\}$ with $i > j$ against all alternatives. Let $R = (r_{ij})$ be the sample correlation matrix defined by $r_{ij} = s_{ij}/\sqrt{s_{ii}s_{jj}}$, where $S = (s_{ij})$ is the sample covariance matrix. The likelihood ratio (LR) test rejects the null hypothesis when $|R|$ is small. An extensive overview is provided in the classical multivariate analysis literature; see, for example, Muirhead (1982), Anderson (2003), and Fujikoshi, Ulyanov, and Shimizu (2010). Clearly, the LR test statistic cannot be used whenever $p > n = N - 1$ because $|R| = 0$. As a test statistic that is applicable for $p > n$, Schott (2005) considered the sum of squared $r_{ij}$ for $i > j$. Schott (2005) showed that $T = \sum_{i=2}^{p}\sum_{j=1}^{i-1} r_{ij}^2 - p(p-1)/(2n)$ converges in distribution to the normal distribution with mean 0 and variance $\lim \sigma^2$ under a high-dimensional asymptotic framework; that is, $p$ and $n$ approach infinity in such a way that $\lim (p/n) = c \in (0, \infty)$, where

$$\sigma^2 = \frac{p(p-1)(n-1)}{n^2(n+2)}. \qquad (1)$$

Srivastava (2005) proposed the same test statistic as Schott (2005). In addition, Srivastava (2005) reported a test statistic that depends only on the elements of $S$. Mao (2014) gave a test based on the sum of $r_{ij}^2/(1 - r_{ij}^2)$ for $i > j$, and claimed that its limiting null distribution is normal under the same high-dimensional asymptotic framework as the one treated in Schott (2005). Chang and Qi (2018) asserted that the asymptotic normality of both Schott's (2005) test statistic and Mao's (2014) test statistic does not need the condition $\lim (p/n) = c \in (0, \infty)$ in the high-dimensional asymptotic framework. He et al. (2021) generalized Schott's (2005) test statistic as the norm of the vector comprising the off-diagonal elements of $R$ and proved asymptotic normality as $n \to \infty$, $p \to \infty$.

The aforementioned results are based on the limiting distribution. The present study, however, focuses on an asymptotic expansion under a high-dimensional asymptotic framework. Edgeworth expansions for the LR statistic are given by Akita, Jin, and Wakaki (2010), Kato, Yamada, and Fujikoshi (2010), and Yamada (2012). These results are derived under the high-dimensional asymptotic framework in which $p \to \infty$, $n \to \infty$, $p/n \to c \in (0, \infty)$. In this article, we give a one-term asymptotic expansion of the null distribution of the normalized Schott (2005) test statistic $T/\sigma$ as $\min\{n, p\}$ tends to infinity by making use of the asymptotic expansion for martingales. It is worth noting that the condition $\lim (p/n) = c \in (0, \infty)$ is not needed for our result. We propose a correction of the critical point for Schott's (2005) test based on the expansion.
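As a concrete illustration of the statistic described above, the following minimal sketch computes $T$ and $\sigma^2$ from raw data using only the Python standard library. The function names `sample_correlation` and `schott_statistic` and the column-wise data layout are assumptions made for this example and do not come from the article.

```python
import math

def sample_correlation(x, y):
    """Pearson correlation of two equal-length sequences of observations."""
    N = len(x)
    mx, my = sum(x) / N, sum(y) / N
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def schott_statistic(columns):
    """Return (T, sigma2) for a list of p columns, each holding N observations.

    T is the sum of squared correlations r_ij (i > j) minus p(p-1)/(2n),
    and sigma2 follows equation (1), with n = N - 1.
    """
    p = len(columns)
    n = len(columns[0]) - 1
    sum_sq = sum(sample_correlation(columns[i], columns[j]) ** 2
                 for i in range(1, p) for j in range(i))
    T = sum_sq - p * (p - 1) / (2 * n)
    sigma2 = p * (p - 1) * (n - 1) / (n ** 2 * (n + 2))
    return T, sigma2
```

The normalized statistic discussed in the text is then `T / math.sqrt(sigma2)`. A pairwise loop is used here for readability; for large $p$ a matrix library would be the more practical choice.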
The remainder of this article is organized as follows. In Sec. 2, we state the asymptotic expansion for martingales given in Mykland (1993). The main result is presented in Sec. 3. Using the asymptotic expansion, we correct the critical point for Schott's (2005) test in Sec. 4. We conducted a small-scale simulation to assess the precision of the asymptotic approximation; the results are reported in Sec. 5. The proof of the core theorem is provided in Appendix A, and additional theoretical results are given in the online supplementary material.
Hereafter, we use the notation $P_{n,k}$ for the number of $k$-permutations of $n$, that is, $P_{n,k} = n(n-1)\cdots(n-k+1)$. The symbol $(x)_{n,k}$ denotes the $k$-Pochhammer symbol (cf. Díaz and Pariguan 2007) defined by $(x)_{n,k} = x(x+k)(x+2k)\cdots(x+(n-1)k)$. Note that the Pochhammer symbol $(x)_n$ equals $(x)_{n,1}$. Define $(x)_{0,k} = 1$ for $k \in \mathbb{N}$. In addition, we define " " as the asymptotic equivalence, as follows: We denote "$\xrightarrow{p}$" and "$\xrightarrow{D}$" as convergence in probability and convergence in distribution, respectively. The asymptotic theory is considered under $\min\{n, p\} \to \infty$. We shall use the notation "$\sum^{p}_{i_1, \ldots, i_k}$" for the sum over all different indices $i_1, \ldots, i_k \in \{1, 2, \ldots, p\}$; for example,
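The two counting symbols introduced above can be written as short functions. This is a minimal sketch under the standard definitions, $P_{n,k} = n(n-1)\cdots(n-k+1)$ and, following Díaz and Pariguan, $(x)_{n,k} = x(x+k)\cdots(x+(n-1)k)$; the function names are hypothetical.

```python
def k_permutations(n, k):
    """P_{n,k}: the number of k-permutations of n, n (n-1) ... (n-k+1)."""
    result = 1
    for j in range(k):
        result *= n - j
    return result

def pochhammer_k(x, n, k=1):
    """(x)_{n,k} = x (x+k) (x+2k) ... (x+(n-1)k); the empty product gives 1 when n = 0."""
    result = 1
    for j in range(n):
        result *= x + j * k
    return result
```

With `k=1` the second function reduces to the ordinary Pochhammer symbol $(x)_n$. For nonnegative integers, the first function agrees with `math.perm(n, k)` from the standard library.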

Preliminaries
Let $(\zeta_i^{(n)})_{i \in \mathbb{N}}$ be a sequence of martingale differences with respect to a filtration $(\mathcal{F}_i)_{i \in \mathbb{N}}$ of the underlying probability space $(\Omega, \mathcal{F}, P)$, let $(p_n)$ be a sequence such that $\lim_{n \to \infty} p_n = \infty$ through $\{p_n : n \in \mathbb{N}\} \subset \mathbb{N}$, and let $(r_n)$ be a sequence such that $\lim_{n \to \infty} r_n = 0$. An asymptotic expansion of the distribution of $\sum_{i=1}^{p_n} \zeta_i^{(n)}$ can be obtained by simplifying the result for the Studentized version of the martingale given in Mykland (1993). To state this result, we first define the order proposed by Mykland (1992).
Definition 1. Suppose that $\{G_n : n \in \mathbb{N}\}$ is a set of functions of finite variation. Define for every set $C$ which is a collection of twice continuously differentiable functions $g : \mathbb{R} \to \mathbb{R}$ satisfying: (i) there is an $M > 0$ such that $|g(x)| \le M$, $|g'(x)| \le M$ and $|g''(x)| \le M$ for all $x$ and all $g \in C$, and (ii) $\{g'' : g \in C\}$ is equicontinuous almost everywhere with respect to Lebesgue measure.
Next, we state the conditions for the asymptotic expansion.
Integrability condition for the fourth-order variation.
Integrability condition for the square variation. There exist k and k with being that 0 where $I(\cdot)$ denotes the indicator function.
The central limit condition. Suppose that $\mathrm{Var}(\sum_{i=1}^{p_n} \zeta_i^{(n)}) = 1$. Then there exist Borel-measurable functions $\psi_1$ and $\psi_2$, so that, whenever
Now we give the asymptotic expansion proposed by Mykland (1993).

High-dimensional asymptotic expansion
In this section, we give a one-term asymptotic expansion of the null distribution of $T/\sqrt{\sigma^2}$ as $\min\{n, p\} \to \infty$. Put where Under the assumption that $H_0 : \rho_{ij} = 0$ $(i > j)$, $r_{ij}$ can be expressed as $r_{ij} = u_i' u_j$, where $u_1, \ldots, u_p$ are independently distributed, each having a uniform distribution on the surface of the $n$-sphere. Then we have Let $\mathcal{F}_i$ be the $\sigma$-algebra generated by $\{u_1, \ldots, u_i\}$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$. Then $\mathcal{F}_{i-1} \subset \mathcal{F}_i$ for $i \in \mathbb{N}$. Thus $T_0$ can be regarded as a martingale with respect to the filtration $(\mathcal{F}_k)$. The main goal of this article is to obtain an asymptotic expansion of the null distribution of $T_0$ as $\min\{n, p\} \to \infty$. In this asymptotic setting, one can see that $p = p_n$ for fixed $n$ such that $\lim_{n \to \infty} p_n = \infty$ through $\{p_n : n \in \mathbb{N}\} \subset \mathbb{N}$. Then it is observed that For this reason we can derive the asymptotic expansion according to Theorem 1.
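The representation $r_{ij} = u_i' u_j$ gives a direct way to simulate $T$ under $H_0$ without generating data matrices. The sketch below assumes that "the surface of the $n$-sphere" means the unit sphere in $\mathbb{R}^n$, draws each $u_i$ by normalising a standard Gaussian vector, and compares the simulated mean and variance of $T$ with 0 and $\sigma^2$ from (1). The function names, seed, and the small values of $n$, $p$ and the replication count are choices made for illustration only.

```python
import math
import random

def uniform_on_sphere(n, rng):
    """Draw a point uniformly from the unit sphere in R^n by normalising a Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]

def draw_T(n, p, rng):
    """One draw of T under H0 using r_ij = u_i' u_j for i > j."""
    u = [uniform_on_sphere(n, rng) for _ in range(p)]
    sum_sq = 0.0
    for i in range(1, p):
        for j in range(i):
            r_ij = sum(a * b for a, b in zip(u[i], u[j]))
            sum_sq += r_ij ** 2
    return sum_sq - p * (p - 1) / (2 * n)

def sigma2(n, p):
    """Variance of T given in equation (1)."""
    return p * (p - 1) * (n - 1) / (n ** 2 * (n + 2))

rng = random.Random(0)
n, p, reps = 5, 4, 4000
draws = [draw_T(n, p, rng) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((t - mean) ** 2 for t in draws) / (reps - 1)
# The sample mean should be near 0 and the sample variance near sigma2(n, p).
print(mean, var, sigma2(n, p))
```

Dividing each draw by `math.sqrt(sigma2(n, p))` gives draws of the normalized statistic whose distribution the expansion approximates.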
As the rate of convergence as $\min\{n, p\} \to \infty$, define $r_{n,p} = \max\{1/n, 1/\sqrt{p}\}$. (3) From the perspective given in the above paragraph, we can see that $r_{n,p} = \max\{1/n, 1/\sqrt{p_n}\} = r_n = o(1)$ as $n \to \infty$. Hereafter, we write $p$ instead of $p_n$ unless this causes confusion.
We give the main result in the following theorem.
Theorem 2. Under the assumption that $H_0$ is true, , where $T_0$ is defined by (2), $r_{n,p}$ is defined by (3),
Theorem 2 is deduced from Theorem 1 by taking The following lemma is used to show that the integrability conditions and the central limit condition hold.
Lemma 1. Let $u_1, u_2, u_3$ be independently distributed, each having a uniform distribution on the surface of the $n$-sphere. Then
Proof. Let $z = (z_1, z_2, \ldots, z_n)'$ be a random vector distributed as the $n$-dimensional normal distribution with mean 0 and covariance matrix $I_n$, the identity matrix, and independent of $\{u_2, u_3\}$. It can be observed from Lemma 1 in Yamada, Himeno, and Sakurai (2017) that the distribution of $(u_2' z, u_3' z)$ is equal to that of : By making use of this equivalence we can easily derive closed-form expressions for $E[(u_2' z)^{2k} \mid u_2]$ and $E[(u_2' z)^{2k} (u_3' z)^{2\ell} \mid u_2, u_3]$. On the other hand, letting $u_1 = z/\sqrt{z'z}$ and $r = z'z$, $u_1$ is independent of $r$, and $r$ is distributed as the chi-squared distribution with $n$ degrees of freedom. From this result we can find that The desired result is obtained by solving these equations.
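The moment calculation sketched in the proof can be illustrated for a single inner product. What follows is a worked sketch rather than the statement of Lemma 1; it uses only the facts quoted in the proof (that $u_1$ and $r = z'z$ are independent and that $r$ is chi-squared with $n$ degrees of freedom), together with the standard normal moments $E[w^{2k}] = (2k-1)!!$ and the chi-squared moments $E[r^k] = n(n+2)\cdots(n+2k-2)$.

```latex
% Conditional on u_2, u_2'z is N(0,1); also u_2'z = \sqrt{r}\, u_1'u_2 with r independent of u_1'u_2.
\begin{align*}
(2k-1)!! = E\big[(u_2'z)^{2k}\big]
  &= E\big[r^{k}\big]\, E\big[(u_1'u_2)^{2k}\big]
   = n(n+2)\cdots(n+2k-2)\, E\big[(u_1'u_2)^{2k}\big], \\
E\big[(u_1'u_2)^{2k}\big] &= \frac{(2k-1)!!}{n(n+2)\cdots(n+2k-2)}, \\
E\big[(u_1'u_2)^{2}\big] &= \frac{1}{n}, \qquad
E\big[(u_1'u_2)^{4}\big] = \frac{3}{n(n+2)}, \\
\operatorname{Var}\big[(u_1'u_2)^{2}\big] &= \frac{3}{n(n+2)} - \frac{1}{n^{2}} = \frac{2(n-1)}{n^{2}(n+2)}.
\end{align*}
```

Multiplying the last line by the number $p(p-1)/2$ of pairs $i > j$ reproduces $\sigma^2$ in (1), since the $r_{ij}^2$ are pairwise uncorrelated under $H_0$.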
In the following subsections, we shall show that the integrability conditions and the central limit condition hold.
Proof of the integrability condition for the fourth-order variation
From (22) in the supplementary material and the fact that $r_{n,p}^{-2} \le p$, we have where By virtue of the definition of $\sigma^2$ given by (1), the right-hand side of the inequality is dominated by 2

Proof of the integrability condition for the square variation
It can be expressed that From (16) in Appendix A, we have where the third equality follows from Lemma 1 and the first inequality follows from the fact that $r_{n,p}^{-2} \le n^2$. In view of (4), we can find that where the second inequality follows from Chebyshev's inequality. Thus, we find that, for any 0

Proof of the central limit condition
In this section we show that the central limit condition is satisfied. Define Firstly, we shall express $r_{n,p}^{-1} T_1$ as a sum of centralized terms. The statistic $r_{n,p}^{-1} T_1$ is given by Expanding the square in (5), and writing the sum of centralized terms, we have where (7) Note that the centering of these terms can be confirmed from Lemma 1.
Next, we shall give a closed-form expression for $r_{n,p}^{-1} T_2$. It can be expressed that In view of Lemma 1, we can derive and so where From (2), (6) and (8), where
Theorem 3. The statistic T1 defined by (7) converges to 0 in probability.
Proof. From the definition we have E( where the last equality follows from the fact that $\sigma^2$ is of the same order as $p^2 n^{-2}$ and It is found that Var(T1) converges to 0 as $\min\{n, p\} \to \infty$. The theorem follows from Chebyshev's inequality.
Theorem 4. Let $\{\eta_n\}$ be a sequence of random vectors defined in (10). Then where $V = (v_{ij})$ is a non-negative definite symmetric matrix whose components are defined as follows: Here, $r_{n,p}$ is defined by (3).
The proof of Theorem 4 is given in Appendix A. Note that we use (11) in Theorem 4. Now we turn to showing the central limit condition. From Theorem 3 and Theorem 4, the following convergence in distribution is derived by applying Slutsky's theorem to (9): where the diagonal matrix whose diagonal entries starting in the upper left corner are 1, 2, 4. It is noted that the convergence in (12) follows from (11). , we observe that $\rho_{21} = \rho_{31}$, and therefore The central limit condition is satisfied by virtue of (13). From Theorem 1, we have where This is the result given in Theorem 2.

Correction of critical point
In this section we provide a correction of the critical point for the test based on $T_0 = T/\sigma$ as an application of Theorem 2. Define the quantity $\theta_1$ by Covariances $\mathrm{Cov}(T_0, T_1)$ and $\mathrm{Cov}(T_0, T_2)$ are given as follows: Cov(T From this convergence we have Consequently, the following asymptotic expansion is obtained: As $\min\{n, p\} \to \infty$, where The following theorem immediately holds from expansion (14):
Theorem 5. Under the assumption that $H_0$ is true, as $\min\{n, p\} \to \infty$.
Theorem 5 shows that $\Pr(T_0 \le x_{cf}) - \Phi(x)$ is a term of order $o_2(r_{n,p})$. In this sense the approximation $\Pr(T_0 \le x_{cf}) \approx \Phi(x)$ is more precise than $\Pr(T_0 \le x) \approx \Phi(x)$. We note that $\theta_1$ in Theorem 5 can be taken to be $n^{-1}$ by ignoring the term $o_2(r_{n,p})$ in (15).

Simulation results
To see the finite-sample performance of our approximation, we show the graphs of $\mathrm{logit}(\Phi(x))$, $\mathrm{logit}(\Pr(T_0 \le x))$ and $\mathrm{logit}(AE(x))$ for the case in which $n = p = 128$, where $\mathrm{logit}(x) = \log\{x/(1 - x)\}$ and $AE(x)$ is the asymptotic expansion of the null distribution of $T_0$ based on Theorem 2, which is as follows: The true distribution function $\Pr(T_0 \le x)$ is obtained by Monte Carlo simulation based on 1,000,000 repetitions. From Figure 1, we can observe that our correction gives a better approximation than the limiting distribution $\Phi(x)$ in both tails.

Next, we investigated the attained significance level (ASL) based on $r = 10{,}000$ independent samples. With $r$ independent samples under the null hypothesis, the ASL for the test statistic $T$ with the critical point $c$ was computed as where $t_i$ is the value of the test statistic $T$ based on the $i$-th sample, $i = 1, \ldots, r$. We calculated the ASL for $c = z_{1-\alpha} - (2/3)\theta_1(1 - z_{1-\alpha}^2)$ and report it in Table 1 for the case in which $\alpha = 0.05$, where $z_{1-\alpha}$ is the $100(1 - \alpha)\%$ quantile of the standard normal distribution. The setting of the critical point is based on Theorem 5. Here, $\theta_1$ was computed by following the definition (15). We performed the simulation for $n, p \in \{4, 8, 16, 32, 64, 128, 256, 512\}$. To compare these results with the limiting approximation, we tabulated the values of the ASL for $c = z_{1-\alpha}$ under the same setting in Table 2. For each combination of $n$ and $p$, the values in Table 1 are expected to be closer to 0.05 than those in Table 2. In particular, our correction seems to give a considerable improvement when either $n$ or $p$ is small. In addition, we ran the simulation for the case in which $\alpha = 0.01$ and report the results in Tables 3 and 4. The same tendencies can be seen for the case in which $\alpha = 0.01$. From the simulation results, we can roughly say that the test becomes conservative when the critical point given in Theorem 5 is used. In addition, our correction has good precision in the tail of the distribution.
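The ASL computation described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that the ASL is the proportion of replicates in which the standardized statistic $T/\sigma$ exceeds $c$, it generates null draws through the representation $r_{ij} = u_i' u_j$ with $u_i$ uniform on the unit sphere in $\mathbb{R}^n$, and it takes $\theta_1$ as an input because definition (15) is not reproduced here. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_T0(n, p, rng):
    """One null draw of T / sigma using r_ij = u_i' u_j (assumed unit sphere in R^n)."""
    u = []
    for _ in range(p):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in z))
        u.append([v / norm for v in z])
    sum_sq = sum(sum(a * b for a, b in zip(u[i], u[j])) ** 2
                 for i in range(1, p) for j in range(i))
    T = sum_sq - p * (p - 1) / (2 * n)
    sigma = math.sqrt(p * (p - 1) * (n - 1) / (n ** 2 * (n + 2)))
    return T / sigma

def corrected_critical_point(alpha, theta1):
    """c = z - (2/3) * theta1 * (1 - z^2), where z is the 100(1 - alpha)% normal quantile."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z - (2.0 / 3.0) * theta1 * (1.0 - z * z)

def attained_significance_level(n, p, c, reps, rng):
    """Proportion of null replicates whose standardized statistic exceeds c (assumed ASL form)."""
    rejections = sum(simulate_T0(n, p, rng) > c for _ in range(reps))
    return rejections / reps

rng = random.Random(1)
n, p, alpha, reps = 8, 8, 0.05, 2000
c_limit = corrected_critical_point(alpha, 0.0)      # theta1 = 0 gives the uncorrected point z
c_corr = corrected_critical_point(alpha, 1.0 / n)   # the text notes theta1 may be taken as 1/n
print(attained_significance_level(n, p, c_limit, reps, rng))
print(attained_significance_level(n, p, c_corr, reps, rng))
```

Because $z_{1-\alpha}^2 > 1$ for the usual levels, a positive $\theta_1$ moves the critical point upward, which is consistent with the remark that the corrected test tends to be conservative.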

Appendix A. Proof of Theorem 4
It can be easily checked that $\{\eta_i\}$ is a sequence of martingale differences with respect to the filtration $(\mathcal{F}_k)$. By virtue of Corollary 6.3 in Kłopotowski (1977), the martingale central limit theorem for $\{\eta_i\}$ holds if the following conditions are satisfied:

These conditions are asserted when the following conditions are claimed, respectively: In view of the conditional expectation, we can write them as Derivations of the formulae (16) are simple but tedious; therefore, they are omitted here and provided in the supplementary material. Now, we treat the quantity $\mathrm{Var}(Y_1)$. It can be seen that From Jensen's inequality, the right-hand side of (17) is dominated by where $B_1$ and $B_2$ are the first and the second expectations within the braces on the left-hand side of (18), respectively. A closed-form expression for $B_1$ can be derived by using Lemma 1 in the following way: This implies $B_1 = O(p^6 n^{-2})$. By using the same derivation, we have

Table 2. ASL with the critical point $z_{1-\alpha}$ based on 10,000 samples when $\alpha = 0.05$.