Bootstrap methods for multivariate hypothesis testing

ABSTRACT Nonparametric and parametric bootstrap methods for multivariate hypothesis testing are developed. They are used to approximate the null distribution of the test statistics proposed by Duchesne and Francq (2015), resulting in bootstrap testing procedures. For the problem of testing for the mean vector of a multivariate distribution, the asymptotic validity of the bootstrap methods is proved. The finite-sample performance of the new solutions is demonstrated by means of Monte Carlo simulation studies. These indicate that, for small sample sizes, the bootstrap tests have better finite-sample properties than the asymptotic tests considered by Duchesne and Francq (2015).


Introduction
Let us follow the framework of a general testing problem introduced by Duchesne and Francq (2010, 2015). Assume that X_i = (X_{i1}, . . . , X_{id})′, i = 1, . . . , n, are d × 1 random vectors (d ≥ 1) whose entire distribution is not specified. However, this distribution is supposed to depend on a parameter of interest μ = (μ_1, . . . , μ_p)′, p ≥ 1, and a nuisance parameter ξ, which generally belongs to an infinite-dimensional parameter space. Let θ = (μ′, ξ′)′. We are interested in verifying H_0: μ = μ_0 versus H_1: μ ≠ μ_0. Let Z_n = √n(T_n − μ_0), where T_n = (T_{n1}, . . . , T_{np})′ is a sequence of estimators of μ = (μ_1, . . . , μ_p)′. Under the null hypothesis, assume that, as n → ∞, Z_n →_d N_p(0_p, Σ_0), (1) where →_d denotes convergence in distribution and 0_p is the p-dimensional null vector. The possibly singular covariance matrix Σ_0 = Σ(θ_0) ≠ 0_{p×p} is taken to be unknown, where θ_0 = (μ_0′, ξ_0′)′. Many particular hypothesis testing problems of statistics and econometrics may be represented by the general testing framework described above. For instance, a test for the mean vector of a multivariate distribution with possibly singular dispersion matrix is of interest in multivariate analysis (Bhimasankaram and Sengupta, 1991; Moore, 1977). We consider this problem in our simulations in Section 5. More examples of specific testing problems are described in Sections 1.1 and 6 of Duchesne and Francq (2010, 2015) (see also Duchesne and Francq, 2008; Hansen, 1982; Ljung, 1986; Moore, 1977; Robinson, 1972).
The Wald-type test statistic for testing H_0: μ = μ_0 against H_1: μ ≠ μ_0 is defined by the formula Q_n(W_n) = Z_n′ W_n Z_n, (2) where W_n is a weight matrix. The null hypothesis H_0: μ = μ_0 is rejected for large values of this test statistic. The Wald-type test statistic can be used whether Σ_0 is singular or nonsingular. Let Σ̂_n be an estimator of the asymptotic covariance matrix of Z_n such that Σ̂_n → Σ_0 almost surely under the null. This estimator depends on the problem at hand (e.g., in multivariate sampling, the sample covariance matrix S_n can be used). Different weight matrices usually yield different test statistics of the form (2), whose performance is not the same in general. A few weight matrices and the corresponding tests were considered by Duchesne and Francq (2010, 2015). They propose the following choices of W_n: the identity matrix of size p, denoted by I_p, leading to test statistics based on the Euclidean norm of Z_n; a {2}-inverse of Σ̂_n, denoted by Σ̂_n^{-k} (Getson and Hsuan, 1988); and finally the Moore-Penrose inverse of Σ̂_n, denoted by Σ̂_n^+. The testing procedures based on these weight matrices are asymptotic, i.e., the critical values are the (1 − α)-quantiles of the asymptotic distributions of the test statistics under the null. The results of Duchesne and Francq (2010, 2015) show that, under additional assumptions, these tests are asymptotically valid, i.e., they are of asymptotic level α under the null hypothesis and consistent under fixed alternatives. However, it is worth noting that a large sample size n is usually required to obtain a satisfactory approximation (see the simulations in Section 5). For small sample sizes, the tests do not keep the preassigned type I error: some of them are conservative while others are liberal. Such behavior of testing procedures based on the Wald-type statistic is quite common. For instance, Konietschke et al. (2015), Pauly et al. (2015), and Smaga (2015) encountered this problem in hypothesis testing in general factorial designs and partially solved it using bootstrap and permutation methods.
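To fix ideas, the statistic (2) for the mean-vector problem can be sketched in a few lines of Python (the article's own experiments use R; the function name wald_statistic and the use of numpy.linalg.pinv as the Moore-Penrose inverse are illustrative assumptions):

```python
import numpy as np

def wald_statistic(x, mu0, weight="identity", tol=2.220446e-16):
    """Wald-type statistic Q_n(W_n) = Z_n' W_n Z_n with Z_n = sqrt(n)(T_n - mu0),
    where T_n is the sample mean. 'weight' selects W_n: the identity I_p, or
    the Moore-Penrose inverse of the sample covariance matrix (pinv's rcond
    is a relative tolerance, a stand-in for the absolute eps of Section 2)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    z = np.sqrt(n) * (x.mean(axis=0) - np.asarray(mu0, dtype=float))  # Z_n
    if weight == "identity":
        w = np.eye(p)                                     # W_n = I_p
    else:
        w = np.linalg.pinv(np.cov(x, rowvar=False), rcond=tol)  # W_n = S_n^+
    return float(z @ w @ z)
```

For W_n = I_p the statistic reduces to n times the squared Euclidean norm of T_n − μ_0.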
The aim of the present article is to improve the small-sample behavior of the tests proposed by Duchesne and Francq (2010, 2015). This is achieved by applying an adequate bootstrap approach to these tests. The proposed bootstrap methods are shown (theoretically or empirically) to be asymptotically valid, at least for some special cases of the general testing problem under consideration.
This article is structured as follows. In Section 2, the asymptotic tests of Duchesne and Francq (2010, 2015) are discussed in more detail; the information given there is needed in the following sections. The novel nonparametric and parametric bootstrap testing procedures are then described in Sections 3 and 4, respectively, where their theoretical properties are also stated. In Section 5, the methods are compared with respect to type I error rates and power using Monte Carlo simulations. Finally, Section 6 concludes this article.

Asymptotic testing procedures
This section presents information about the asymptotic tests considered by Duchesne and Francq (2010, 2015), e.g., their asymptotic null distributions, conditions for consistency, and remarks about implementation. Some assumptions, definitions, and notation needed in the remainder of the article are also given here.
A {2}-inverse of a matrix Σ is any matrix Σ* satisfying Σ*ΣΣ* = Σ* (the second relation defining the Moore-Penrose inverse). The {2}-inverses of Σ_0 are constructed in the following way. For k = 1, . . . , r_0 − 1, let Σ_0^{-k} = Σ_{i=1}^{k} λ_i^{-1} v_i v_i′, where λ_1 ≥ · · · ≥ λ_{r_0} > 0 are the nonzero eigenvalues of Σ_0 and v_1, . . . , v_{r_0} are corresponding orthonormal eigenvectors. The matrix Σ_0^{-k} is not a generalized inverse of Σ_0, and it is not uniquely defined when some eigenvalues of Σ_0 display multiplicities. A further problem with the {2}-inverse is that the almost sure convergence of Σ̂_n to Σ_0 does not entail the almost sure convergence of Σ̂_n^{-k} to Σ_0^{-k}.
To overcome these problems, Duchesne and Francq (2010, 2015) considered two eigenvalues to be identical when the absolute value of their difference is less than a tolerance ε > 0, and they defined the algorithm U_{B,k,ε}, where B is an arbitrary but fixed basis of R^p, which transforms Σ̂_n into Σ̂_{n,B,ε}^{-k}. The obtained {2}-inverse is uniquely defined and converges appropriately under the following assumption, which is satisfied for almost all bases B; here P_B is a unique matrix, called the B-eigenvector matrix of Σ_0, defined by Duchesne and Francq (2010, 2015).
Assumption A(Σ, ε). Let B = {u_1, . . . , u_p} be a given basis of R^p. The eigenvalues of Σ are λ_1 ≥ λ_2 ≥ · · · ≥ λ_r > 0 and λ_{r+1} = · · · = λ_p = 0, and the tolerance ε > 0 is small enough to separate the distinct eigenvalues of Σ.
In Smaga (2015), a recent application of {2}-inverses in hypothesis testing in general factorial designs is given. Moreover, Smaga checked numerically the validity of the above assumption for Σ = TAT, a form of covariance matrix encountered in general factorial designs, where A is the identity matrix, a completely symmetric matrix, or an "autoregressive" covariance matrix, and T is a certain hypothesis matrix. Slight modifications of TAT did not result in completely different {2}-inverses, confirming the continuity of U_{B,k,ε}. The condition on the eigenvalues was also usually satisfied. Only for the "autoregressive" covariance matrix with autoregressive parameter close to 0 or ±1 may a smaller ε than usual (see the next paragraph) be needed to satisfy this condition.
The basis B composed of the columns of I_p and the tolerance ε = .Machine$double.eps, where .Machine$double.eps = 2.220446 · 10^{−16} is the machine epsilon in the R language (R Core Team, 2015), are used to compute Σ̂_{n,B,ε}^{-k} by the algorithm of Duchesne and Francq (2010, 2015).
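A minimal Python sketch of a {2}-inverse built from the k largest eigenvalues follows; it omits the basis-B machinery of the algorithm U_{B,k,ε} (which handles eigenvalue ties), so the function two_inverse_k and its tie-free treatment are simplifying assumptions:

```python
import numpy as np

def two_inverse_k(sigma, k, eps=2.220446e-16):
    """A {2}-inverse of sigma formed from its k largest eigenvalues:
    G = sum_{i<=k} v_i v_i' / lambda_i, which satisfies G sigma G = G.
    (Simplified: the article's U_{B,k,eps} additionally fixes a basis B
    to make the result unique when eigenvalues display multiplicities.)"""
    vals, vecs = np.linalg.eigh(sigma)
    order = np.argsort(vals)[::-1]               # sort eigenvalues decreasingly
    vals, vecs = vals[order], vecs[:, order]
    g = np.zeros(sigma.shape, dtype=float)
    for i in range(k):
        if vals[i] > eps:                        # skip (numerically) zero eigenvalues
            g += np.outer(vecs[:, i], vecs[:, i]) / vals[i]
    return g
```

With k equal to the rank of sigma, G coincides with the Moore-Penrose inverse.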
As for the {2}-inverse, the almost sure convergence of Σ̂_n to Σ_0 does not entail the almost sure convergence of Σ̂_n^+ to Σ_0^+ (Andrews, 1987). For this reason, a tolerance ε > 0 is also required for assessing the nonzero eigenvalues of Σ̂_n. Let Σ̂_n = P_n Λ_n P_n′ be the spectral decomposition of Σ̂_n, where Λ_n = diag(λ_1(Σ̂_n), . . . , λ_p(Σ̂_n)). Instead of Σ̂_n, we use Σ̂_{n,ε} = P_n Λ_{n,ε} P_n′, where Λ_{n,ε} is the matrix obtained by replacing by zero the elements of Λ_n that are less than ε. Under a condition given in Theorem 2.1, we have Σ̂_{n,ε}^+ → Σ_0^+ almost surely. In the R program, we can use the same ε as for the tests based on {2}-inverses. The asymptotic null distributions of the test statistics Q_n(I_p), Q_n(Σ̂_{n,B,ε}^{-k}), and Q_n(Σ̂_{n,ε}^+) are given in the following theorem.

Theorem 2.1. Assume that Σ̂_n → Σ_0 almost surely as n → ∞, and let r_0 = rank(Σ_0). Under the above assumptions:
1. Q_n(I_p) converges in distribution to Σ_{i=1}^{r_0} λ_i N_i², where λ_1 ≥ · · · ≥ λ_{r_0} > 0 are the nonzero eigenvalues of Σ_0 and N_i, i = 1, . . . , r_0, correspond to independent random variables of N(0, 1) distribution.
2. Let B = {u_1, . . . , u_p} be a basis of R^p. Under Assumption A(Σ_0, ε), if k ≤ r_0 − 1, it follows that Q_n(Σ̂_{n,B,ε}^{-k}) converges in distribution to the central χ²_k distribution.
3. If ε > 0 is sufficiently small, so that P(rank(Σ̂_{n,ε}) = rank(Σ_0)) → 1 as n → ∞, it follows that Q_n(Σ̂_{n,ε}^+) converges in distribution to the central χ²_{r_0} distribution.

By Theorem 2.1, the critical regions corresponding to the asymptotic tests based on Q_n(W_n) with W_n = Σ̂_{n,B,ε}^{-k}, Σ̂_{n,ε}^+ are easily defined. The critical region for the testing procedure based on Q_n(I_p) is of the form {Q_n(I_p) ≥ c_α(λ̂_1, . . . , λ̂_p)}, where λ̂_1, . . . , λ̂_p are the eigenvalues of Σ̂_n and c_α(λ̂_1, . . . , λ̂_p) is the (1 − α)-quantile of the distribution of Σ_{i=1}^{p} λ̂_i N_i². This quantile can be evaluated by means of Imhof's algorithm (Imhof, 1961). Thus, the testing procedure based on Q_n(I_p) is called the Imhof-based test. Using the imhof function available in the R package CompQuadForm (Duchesne and Lafaye De Micheaux, 2010), we can calculate p-values of the test based on Q_n(I_p) using this algorithm. In practice, the unknown rank r_0 has to be estimated based on Σ̂_n, unless we know its theoretical value; e.g., in testing for the vector of cell probabilities in a multinomial model we have r_0 = p − 1.
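The tolerance-regularized Moore-Penrose inverse used in the test based on Q_n(Σ̂_{n,ε}^+) can be sketched as follows in Python (mp_inverse_eps is an illustrative name; the article computes this in R with ε = .Machine$double.eps):

```python
import numpy as np

def mp_inverse_eps(sigma, eps=2.220446e-16):
    """Regularized Moore-Penrose inverse: take the spectral decomposition
    of sigma, replace eigenvalues below eps by exact zeros, and invert
    the remaining ones. This stabilizes the estimated rank when the
    covariance estimator is nearly singular."""
    vals, vecs = np.linalg.eigh(sigma)                     # spectral decomposition
    inv_vals = np.array([1.0 / v if v > eps else 0.0 for v in vals])
    return (vecs * inv_vals) @ vecs.T                      # P diag(inv_vals) P'
```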
Consistency of the asymptotic tests is established by the next theorem (Theorem 2.2). To state it, consider the spectral decomposition Σ_1 = P_1 Λ_1 P_1′, with P_1′ P_1 = I_p, of the covariance matrix Σ_1 of the asymptotic distribution of Z_n under a fixed alternative μ = μ_1, and let V({λ_{1,j_1}, . . . , λ_{1,j_l}}) denote the eigenvector space associated with the eigenvalues λ_{1,j_1}, . . . , λ_{1,j_l} of Σ_1.

The Imhof-based test is consistent under all alternative hypotheses. Unfortunately, this is not true for the tests based on {2}-inverses. If μ_1 − μ_0 belongs to a specific subspace of the nonzero eigenspace of the covariance matrix, the test based on Q_n(Σ̂_{n,B,ε}^{-k}) is consistent and may be even more powerful than the other methods. However, when μ_1 − μ_0 does not belong to such a subspace, this testing procedure may be completely powerless. Similar conclusions hold for the Moore-Penrose inverse test, but this test is consistent for a wider class of alternatives than the tests based on Q_n(Σ̂_{n,B,ε}^{-k}). The power of the tests depends on the alternative, and in general no test is dominated by another. In choosing possibly the most powerful test in practice, a careful examination of the spectral decomposition of a consistent estimator of the covariance matrix may be useful. For more evidence, we refer to the results about the Bahadur slopes and the asymptotic local powers of the tests obtained by Duchesne and Francq (2010, 2015), the simulation studies presented there, and those in Section 5 of this article.

A nonparametric Bootstrap method
In this section, a nonparametric bootstrap approach is considered. First, the steps of the procedure are described, together with some remarks about implementation. Next, the validity of this bootstrap method is shown in a certain special case of our general framework.

A nonparametric Bootstrap procedure
Let X*_i, i = 1, . . . , n, be a nonparametric bootstrap sample drawn with replacement from the original sample X_i, i = 1, . . . , n. Denote by T*_n, Σ̂*_n, W*_n, and Q*_n(W*_n) the nonparametric bootstrap versions of the estimators T_n, Σ̂_n, the weight matrix W_n, and the Wald-type test statistic Q_n(W_n). The weight matrix can be one of those proposed by Duchesne and Francq (2010, 2015) (see Section 2). Of course, when W_n = I_p, W*_n is also equal to I_p. We propose the following simple bootstrap procedure:
1. Compute T_n and Q_n(W_n) for the original sample data.
2. Generate B independent nonparametric bootstrap samples and compute Q*_{n,b} = n(T*_{n,b} − T_n)′ W*_{n,b} (T*_{n,b} − T_n), b = 1, . . . , B, where T*_{n,b} and W*_{n,b} are the values of the estimator T*_n and the weight matrix W*_n for the bth bootstrap sample.
3. Reject the null hypothesis at significance level α when p* < α, where the bootstrap p-value is p* = B^{−1} Σ_{b=1}^{B} I(Q*_{n,b} ≥ Q_n(W_n)), and I(A) stands for the usual indicator function on the set A (it takes value 1 if A is true and 0 otherwise).
The proposed nonparametric bootstrap procedure may be easily implemented in the R program (R Core Team, 2015). One thing worth noting is that we use only those bootstrap samples for which the rank of Σ̂*_n is the same as the rank of the estimator Σ̂_n. The idea behind this is to obtain a possibly more accurate finite-sample approximation by better mimicking the given covariance structure of the original observations. Moreover, this restriction is needed to perform the tests based on {2}-inverses: when rank(Σ̂*_n) is smaller than rank(Σ̂_n), the {2}-inverses of Σ̂*_n for k such that rank(Σ̂*_n) < k ≤ rank(Σ̂_n) do not exist. The rank issue is a serious problem for very small sample sizes. Fortunately, the proportion of rejected bootstrap samples decreases quite fast with increasing sample size, which seems to be theoretically confirmed in the third paragraph after Theorem 3.1 below. For instance, Table 15 in the Supplementary Materials presents the numbers of rejected bootstrap samples with rank(Σ̂*_n) ≠ rank(Σ̂_n) per 1000 for data generated similarly as in Section 5. It is also clear that, for each bootstrap sample, the basis B used to calculate (Σ̂*_n)_{B,ε}^{-k} has to be the same as that chosen for Σ̂_{n,B,ε}^{-k} (see Section 2); we do not choose it separately for bootstrap samples.
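The steps above can be sketched in Python as follows (the article works in R; the rank-matching redraw and the centring of the bootstrap statistic at T_n follow the description above, and stat is a user-supplied function computing the Wald-type statistic):

```python
import numpy as np

def np_bootstrap_pvalue(x, mu0, stat, B=1000, seed=0):
    """Nonparametric bootstrap p-value for a Wald-type test.
    stat(sample, centre) -> test statistic value. Bootstrap samples are
    recentred at T_n (the observed mean), and samples whose covariance
    rank differs from that of the original sample are redrawn."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    t_n = x.mean(axis=0)                                   # T_n
    r_n = np.linalg.matrix_rank(np.cov(x, rowvar=False))   # rank of the estimator
    q_obs = stat(x, mu0)                                   # Q_n(W_n)
    q_boot = []
    while len(q_boot) < B:
        xb = x[rng.integers(0, n, size=n)]                 # resample with replacement
        if np.linalg.matrix_rank(np.cov(xb, rowvar=False)) != r_n:
            continue                                       # rank-matching step
        q_boot.append(stat(xb, t_n))                       # centred at T_n, not mu0
    return float(np.mean(np.asarray(q_boot) >= q_obs))     # bootstrap p-value p*
```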

Validity of nonparametric Bootstrap
In order to obtain a valid bootstrap test, the conditional distribution of the Wald-type bootstrap statistic Q*_n(W*_n) should approximate the null distribution of Q_n(W_n). Obviously, we are not able to show this in the general framework under consideration. However, it is possible theoretically for the test statistics with W_n = I_p and S_{n,ε}^+, and empirically for those with W_n = (S_n)_{B,ε}^{-k}, in the problem of testing for the mean vector of a multivariate distribution with possibly singular dispersion matrix.
Before presenting the result, let us introduce the following notation. Assume that X_i = (X_{i1}, . . . , X_{ip})′, i = 1, . . . , n, are independent, with common multivariate distribution in R^p, where E(X_1) = μ and Cov(X_1) = Σ are unknown parameters. Aggregate the individual vectors into X = (X_1′, . . . , X_n′)′. We consider T_n = X̄_n = (1/n) Σ_{i=1}^{n} X_i. Let X̄*_n and S*_n denote the bootstrap versions of the sample mean and the sample covariance matrix, respectively.
Theorem 3.1. Under the notation given above, let r = rank(Σ):
1. The conditional distribution of Q*_n(I_p) weakly converges to Σ_{i=1}^{r} λ_i N_i² in probability for any underlying μ ∈ R^p, as n → ∞, where λ_1 ≥ · · · ≥ λ_r > 0 are the nonzero eigenvalues of Σ and N_i, i = 1, . . . , r, correspond to independent random variables of N(0, 1) distribution.
2. When ε > 0 is sufficiently small, so that P(rank(S*_{n,ε}) = rank(S_n)|X) → 1 as n → ∞, and rank(S_n(ω)) = rank(Σ) for all ω and for all but a finite number of n, the conditional distribution of Q*_n((S*_{n,ε})^+) weakly converges to the central χ²_r distribution in probability for any underlying μ ∈ R^p, as n → ∞.
The proof of Theorem 3.1 is deferred to the Appendix. Assume that W_n = I_p or S_{n,ε}^+ and, respectively, W*_n = I_p or (S*_{n,ε})^+. Under appropriate assumptions, the above result shows that the conditional bootstrap distribution of Q*_n(W*_n) always mimics the null distribution of the test statistic Q_n(W_n) (see Theorem 2.1). The asymptotic behavior of the conditional bootstrap distribution of Q*_n(W*_n) does not depend on μ, and it always approximates the unconditional null distribution of Q_n(W_n). To be more precise, under the assumptions of Theorems 2.1 and 3.1, we have sup_{x∈R} |P_μ(Q*_n(W*_n) ≤ x | X) − P_{μ_0}(Q_n(W_n) ≤ x)| → 0 in probability for any underlying μ ∈ R^p, as n → ∞, where P_μ(Z ≤ x|X) and P_μ(Z ≤ x) denote the conditional and unconditional distribution functions of a statistic Z, respectively, under the assumption that μ is the true underlying parameter. Such a property of a resampling procedure is desirable. For instance, the bootstrap tests of Dette and Paparoditis (2009) and Konietschke et al. (2015) and the permutation tests of Pauly et al. (2015) and Smaga (2015) also have this property. The reason is that this property ensures that, when the null hypothesis is true, the asymptotic distribution of Q_n(W_n) and the asymptotic conditional distribution of Q*_n(W*_n) are the same. Therefore, both the asymptotic and bootstrap testing procedures are of asymptotic level α. Moreover, if the null hypothesis is not true and the assumptions of Theorem 2.2 hold, Q_n(W_n) diverges, while the asymptotic conditional distribution of Q*_n(W*_n) is still the null distribution of Q_n(W_n), as n → ∞. Thus, the nonparametric bootstrap test is consistent, and it asymptotically has the same power as the test based on Q_n(W_n).
As we mentioned in Section 3.1, for certain bootstrap samples, the {2}-inverse (S*_n)_{B,ε}^{-k} may not exist. Similarly as for Σ̂_{n,ε} and Σ_0, convergence in conditional probability of S*_{n,ε} to Σ (see the proof of Theorem 3.1) does not entail convergence of their Moore-Penrose inverses, as indicated by Andrews (1987). The condition about the ranks of S*_{n,ε} and Σ in Theorem 3.1 is needed to guarantee that convergence. Although numerical investigation suggests that this condition is satisfied, it is not easy (if possible at all) to check it, but we can show that the probability of a certain subevent of {rank(S*_{n,ε}) ≠ rank(Σ)} converges to zero. Let r = rank(Σ) < p < n. Moreover, assume that the vector of all observations X = (X_1′, . . . , X_n′)′ is given. The number of all possible (ordered) bootstrap samples is n^n. Now, we calculate the number of bootstrap samples for which rank(S*_{n,ε}) is surely less than r. The rank of S*_{n,ε} is less than r when the number of different vectors in the bootstrap sample is less than or equal to r. Let x = 1, . . . , r be fixed. We have C(n, x) possibilities for choosing x vectors from the n available. Let A_i denote the set of bootstrap samples in which the ith of these chosen vectors appears at least once, i = 1, . . . , x; then |A_1 ∩ · · · ∩ A_x| is the number of bootstrap samples containing only these x chosen vectors, each at least once. By the inclusion-exclusion principle, we have |A_1 ∩ · · · ∩ A_x| = Σ_{j=0}^{x} (−1)^j C(x, j)(x − j)^n. (3) Denote the right side of (3) by f(x, n). Then the conditional probability that the number of different vectors in the bootstrap sample is less than or equal to r equals Σ_{x=1}^{r} C(n, x) f(x, n) / n^n, and hence it converges to zero as n → ∞. This indicates that the probability that rank(S*_{n,ε}) is less than r for a bootstrap sample is likely to converge to zero. Unfortunately, there may be observations X for which rank(S*_{n,ε}) ≠ rank(Σ). This indicates that the rank condition needs to be assumed.
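The counting argument can be transcribed directly into Python; f and prob_at_most_r_distinct are illustrative names, and the second function computes the conditional probability discussed above:

```python
from math import comb

def f(x, n):
    """f(x, n) of display (3): the number of ordered bootstrap samples of
    size n that use each of a fixed set of x original vectors at least
    once, by the inclusion-exclusion principle."""
    return sum((-1) ** j * comb(x, j) * (x - j) ** n for j in range(x + 1))

def prob_at_most_r_distinct(n, r):
    """Conditional probability that a bootstrap sample contains at most r
    distinct original vectors: sum_{x=1}^{r} C(n, x) f(x, n) / n^n."""
    return sum(comb(n, x) * f(x, n) for x in range(1, r + 1)) / n ** n
```

Summing over all x = 1, . . . , n recovers every bootstrap sample, so prob_at_most_r_distinct(n, n) equals 1, while for fixed r the probability tends to zero as n grows.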
Unfortunately, despite the asymptotic validity of the nonparametric bootstrap tests, they may not work well for small sample sizes (see Section 5). In the next section, we propose a parametric bootstrap solution (less frequently considered in a nonparametric framework), which seems to improve the finite-sample behavior of some of the test statistics under consideration.

A parametric Bootstrap method
In this section, we consider a parametric bootstrap approach, also called an asymptotic model-based bootstrap. This method is usually used in parametric models (Jafari and Kazemi, 2013; Krishnamoorthy and Lu, 2010; Krishnamoorthy et al., 2007; Ma and Tian, 2009; Tian et al., 2009; Xu, 2015; Xu et al., 2013). However, as was recently shown by Konietschke et al. (2015) and is presented in this article, a parametric bootstrap method may be successfully applied in a nonparametric framework. The idea is to consider parametric bootstrap samples from a multivariate normal distribution whose mean is dictated by the null hypothesis (i.e., μ = μ_0) and whose covariance matrix is connected with the observations (i.e., Σ = Σ̂_n).
Let X•_i, i = 1, . . . , n, given the data, be independent random vectors with X•_i ∼ N_p(μ_0, Σ̂_n). Observe that these vectors correspond to the situation when the null hypothesis is true. Let T•_n, Σ̂•_n, W•_n, and Q•_n(W•_n) be the estimators T_n, Σ̂_n, the weight matrix W_n, and the Wald-type test statistic Q_n(W_n) calculated from the parametric bootstrap variables. All weight matrices considered in Section 2 can be used. If W_n = I_p, then obviously W•_n := I_p. We propose the following parametric bootstrap test:
1. Compute Σ̂_n and Q_n(W_n) for the original sample data.

2. Generate B independent parametric bootstrap samples X•_{1,b}, . . . , X•_{n,b}, b = 1, . . . , B, from the N_p(μ_0, Σ̂_n) distribution.
3. For each parametric bootstrap sample, compute the value of the test statistic Q•_{n,b} = n(T•_{n,b} − μ_0)′ W•_{n,b} (T•_{n,b} − μ_0), b = 1, . . . , B.
4. Reject the null hypothesis at significance level α when p• < α, where p• = B^{−1} Σ_{b=1}^{B} I(Q•_{n,b} ≥ Q_n(W_n)).
Similarly to the nonparametric bootstrap method, the parametric bootstrap test may be easily implemented in the R language (R Core Team, 2015). Moreover, the same remarks about implementation as in the nonparametric case apply to the parametric bootstrap procedure. However, it is worth mentioning that, for the parametric bootstrap, our simulation studies suggest that the proportion of rejected bootstrap samples with rank(Σ̂•_n) ≠ rank(Σ̂_n) is very small or often even equal to zero. Thus, the rank issue is a much less serious problem for the parametric bootstrap tests than for the nonparametric ones, even for very small sample sizes.
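A Python sketch of the parametric bootstrap steps follows (again, the article uses R; stat is a user-supplied Wald-type statistic, and sampling from N_p(μ_0, Σ̂_n) imposes the null by construction):

```python
import numpy as np

def param_bootstrap_pvalue(x, mu0, stat, B=1000, seed=0):
    """Parametric bootstrap p-value: draw samples of size n from
    N_p(mu0, S_n), i.e., from a model in which H_0 is true with the
    covariance estimated from the data, and compare the observed
    statistic with the resulting bootstrap distribution."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    mu0 = np.asarray(mu0, dtype=float)
    s_n = np.cov(x, rowvar=False)                          # covariance estimator S_n
    q_obs = stat(x, mu0)
    q_boot = np.empty(B)
    for b in range(B):
        xb = rng.multivariate_normal(mu0, s_n, size=n)     # null holds by design
        q_boot[b] = stat(xb, mu0)                          # centre at mu0
    return float(np.mean(q_boot >= q_obs))
```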

Validity of parametric Bootstrap
Similarly as for the nonparametric bootstrap method, we show that the parametric bootstrap testing procedures are asymptotically valid for the problem of testing for the mean vector of a multivariate distribution with possibly singular dispersion matrix introduced in Section 3.2. Denote by X̄•_n and S•_n the parametric bootstrap versions of the sample mean and the sample covariance matrix, respectively. The proof of the following result is outlined in the Appendix.
Theorem 4.1. Under the notation of Section 3.2 and that given above, let r = rank(Σ):
1. The conditional distribution of Q•_n(I_p) weakly converges to Σ_{i=1}^{r} λ_i N_i² in probability for any underlying μ ∈ R^p, as n → ∞, where λ_1 ≥ λ_2 ≥ · · · ≥ λ_r > 0 are the nonzero eigenvalues of Σ and N_i, i = 1, . . . , r, correspond to independent random variables of N(0, 1) distribution.
2. When ε > 0 is sufficiently small, so that P(rank(S•_{n,ε}) = rank(S_n)|X) → 1, as n → ∞, and rank(S_n(ω)) = rank(Σ) for all ω and for all but a finite number of n, it follows that the conditional distribution of Q•_n((S•_{n,ε})^+) weakly converges to the central χ²_r distribution in probability for any underlying μ ∈ R^p, as n → ∞.
1. Let W_n = I_p or S_{n,ε}^+ and, respectively, W•_n = I_p or (S•_{n,ε})^+. Under the assumptions of Theorems 2.1 and 4.1, the conditional bootstrap distribution of Q•_n(W•_n) always approximates the null distribution of Q_n(W_n). Hence, the asymptotic behavior of the parametric bootstrap method is the same as that of the nonparametric bootstrap test described in the paragraph after Theorem 3.1. In particular, by Theorems 2.1 and 4.1, under their assumptions, we conclude that sup_{x∈R} |P_μ(Q•_n(W•_n) ≤ x | X) − P_{μ_0}(Q_n(W_n) ≤ x)| → 0 in probability for any underlying μ ∈ R^p, as n → ∞. 2. Similar (empirically obtained) properties as for the nonparametric bootstrap tests based on {2}-inverses (given in the second paragraph after Theorem 3.1) also hold for the parametric ones.

Simulation studies
In this section, we check the finite-sample performance of the bootstrap methods considered in the previous sections. We also compare them with the asymptotic testing procedures of Duchesne and Francq (2010, 2015).

Experimental setup
We consider the test for the mean vector of a multivariate distribution with singular dispersion matrix. For given n, mean vector μ, and covariance matrix Σ, we first generate a multivariate sample as X_i = μ + Σ^{1/2} e_i, i = 1, . . . , n. We independently generate the p entries of the error terms e_i using four schemes: (1) from the N(0, 1) distribution, (2) from the Laplace distribution, (3) from the χ²_20 distribution, and (4) from the log-normal distribution, in each case standardized so that we always have E(e_i) = 0_p and Cov(e_i) = I_p. This means that we generate a multivariate normal or nonnormal sample X_1, . . . , X_n with the given mean vector μ and covariance matrix Σ. We consider H_0: μ = 0_p, p = 6, Z_n = √n X̄_n, Σ̂_n = S_n (the sample covariance matrix), and the covariance matrices Σ = Σ^(i), i = 1, 2, 3, where Σ^(1) = diag(10, 10, 1, 1, 0, 0), and Σ^(2) (resp. Σ^(3)) is constructed as follows. First, we generate six independent observations from N_4(0_4, CS(0.6)) (resp. N_4(0_4, AR(0.7))), where CS(ρ) denotes a compound symmetry matrix with covariance coefficient ρ (resp. AR(ρ) a first-order autoregressive covariance matrix with autocorrelation coefficient ρ). From those observations, we form a 6 × 4 matrix, which is of full column rank. Next, we append to this matrix two columns equal to its first two columns. To obtain the symmetric matrix Σ^(2) (resp. Σ^(3)) from the resulting singular 6 × 6 matrix, the covariance matrix is then calculated. Hence, we have r = rank(Σ) = 4.
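The construction of Σ^(2) can be sketched in Python as follows (the seed, the function name, and the use of M M′ as the covariance matrix "calculated" from the singular 6 × 6 matrix M are our assumptions about the recipe above):

```python
import numpy as np

def singular_covariance(rho=0.6, seed=0):
    """Build a 6 x 6 singular covariance matrix of rank 4 in the spirit of
    Sigma^(2): six draws from N_4(0, CS(rho)) give a 6 x 4 matrix of full
    column rank; duplicating its first two columns yields a singular
    6 x 6 matrix M, from which the symmetric matrix M M' is formed."""
    rng = np.random.default_rng(seed)
    cs = np.full((4, 4), rho) + (1.0 - rho) * np.eye(4)   # compound symmetry CS(rho)
    m4 = rng.multivariate_normal(np.zeros(4), cs, size=6) # 6 x 4, full column rank a.s.
    m6 = np.hstack([m4, m4[:, :2]])                       # append first two columns
    return m6 @ m6.T                                      # rank-4 covariance matrix
```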
The numbers of observations and the non-null mean vectors under the alternative hypotheses are specified in Tables 1-14 in the Supplementary Materials, where the empirical sizes and powers (the proportions of rejections of the null hypothesis) of the considered tests are given. To estimate them, we used simulations consisting of 1000 runs and recorded the corresponding p-values. The p-values for the bootstrap tests were estimated from B = 1000 replications. The non-null mean vectors under the alternative belong to the space generated by the columns of the covariance matrix. Otherwise, the hypothesis is inconsistent with the data and is to be rejected right away (see, e.g., Bhimasankaram and Sengupta, 1991, p. 474). In all the simulations conducted, we used α = 0.05 for simplicity.
Experiments were performed using the R language (R Core Team, 2015). We use the imhof function available in the R package CompQuadForm (Duchesne and Lafaye De Micheaux, 2010) to calculate p-values of the test based on Q_n(I_p) using Imhof's algorithm.
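As an alternative to Imhof's algorithm, the weighted chi-square p-value of the Q_n(I_p) test can also be approximated by plain Monte Carlo; the following hedged Python sketch illustrates this (the study itself relies on the R function imhof, and weighted_chisq_pvalue is an illustrative name):

```python
import numpy as np

def weighted_chisq_pvalue(q_obs, lambdas, n_mc=200_000, seed=0):
    """Approximate P(sum_i lambda_i N_i^2 >= q_obs), the null tail
    probability of Q_n(I_p), by simulating independent chi-square(1)
    variables weighted by the eigenvalues of the covariance estimator."""
    rng = np.random.default_rng(seed)
    lambdas = np.asarray(lambdas, dtype=float)
    chis = rng.chisquare(1.0, size=(n_mc, lambdas.size))   # draws of N_i^2
    return float(np.mean(chis @ lambdas >= q_obs))
```

With a single unit weight this reproduces the χ²_1 tail, e.g., a value near 0.05 at the 3.84 quantile.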
To compute {2}-inverses, we use the algorithm of Duchesne and Francq (2010, 2015). The basis B and the tolerance ε are chosen as in Section 2.

Discussion of the simulation results
Tables 1-4 and 9-14 in the Supplementary Materials contain the simulation results for the cases when the number of observations is small and close to the number of parameters. Empirical sizes for greater numbers of observations are given in Tables 5-8 in the Supplementary Materials. The test based on Q_n(I_p) is slightly liberal for Σ^(1) and more liberal for the more complicated covariance matrices, i.e., for Σ^(2) and Σ^(3). The nonparametric bootstrap version of this test may be even more liberal (except possibly for n = 20). Moreover, it tends to highly over-reject the null hypothesis when n is very close to p, e.g., n = p + 1. The empirical sizes of the parametric bootstrap test are usually very similar to those of the asymptotic Imhof-based testing procedure.
The tests based on {2}-inverses with small k (k = 1 in our experiments) have a conservative character for Σ^(1), while for Σ^(2) and Σ^(3) they result in slightly liberal decisions. The bootstrap versions of these testing procedures improve the maintenance of the preassigned type I error, but they are still quite conservative for Σ^(1). Similarly as for the Imhof-based test, the nonparametric bootstrap test is highly liberal for n = p + 1.
We observe that the asymptotic tests based on Q_n(S_{n,B,ε}^{-k}) with k close to r − 1 (k = 2, 3 in our simulations) tend to highly over-reject the null hypothesis in almost all situations under consideration. Except for n = p + 1, the nonparametric bootstrap method controls the type I error rate, but it may be conservative. The parametric bootstrap procedure performs even better, although it may be a little conservative for n = p + 1.
The asymptotic test based on Q_n(S_{n,ε}^+) is doubtless the most liberal of all testing procedures under consideration. On the other hand, its nonparametric bootstrap version is very conservative in most scenarios, while the parametric bootstrap one demonstrates quite accurate control of the nominal type I error level in all investigated situations.
Under log-normally distributed errors, the tests based on Q_n(S_{n,B,ε}^{-k}) with k close to r − 1, Q_n(S_{n,ε}^+), and Q•_n((S•_{n,ε})^+) do not keep the preassigned type I error. The testing procedures based on {2}-inverses with small k behave similarly as under the other error distributions, but they are usually more conservative for Σ^(1) and more liberal for Σ^(i), i = 2, 3. Moreover, the nonparametric bootstrap method seems to perform better than the parametric one (except in the case n = p + 1). The tests based on Q*_n((S*_n)_{B,ε}^{-k}) with k close to r − 1 are usually only slightly liberal for n = 10, 14, 20. Unfortunately, they are much more liberal for n = 7. On the other hand, the nonparametric bootstrap test based on the Moore-Penrose inverse is slightly liberal in the case of n = p + 1 and more or less conservative in the other situations.
To sum up, the bootstrap methods do not seem to improve the finite-sample behavior of the Imhof-based test under the null hypothesis. For the other testing procedures, they often result in better retention of the preassigned type I error. The nonparametric bootstrap tests show a tendency toward conservativity, except for n very close to p, where they are too liberal. The parametric bootstrap procedures seem to be more accurate in maintaining the nominal type I error level, except in the case of an extremely skewed distribution like the log-normal one.
We also tried larger values of n to observe which of them guarantee that the empirical sizes of the tests belong to the usual significance interval [3.65%, 6.35%] (see Duchesne and Francq, 2010, 2015). The results of these simulations are given in Tables 5-8 in the Supplementary Materials. We observe that the behavior of the empirical sizes of the tests depends on n, but also on the dimension and on the amount of skewness. From our simulation results, we obtain the following conclusions: except for extremely skewed distributions, the tests based on W_n = I_p and W_n = (S_n)_{B,ε}^{-k} with small k, except possibly the asymptotic ones, perform adequately in scenarios with sample sizes greater than or equal to 30. The asymptotic testing procedures based on {2}-inverses with k close to r − 1 and on the Moore-Penrose inverse may highly over-reject the null hypothesis even for large numbers of observations. The conservativity of their nonparametric bootstrap versions decreases with increasing n, but its behavior depends on the distribution. Notice that the Q*_n((S*_{n,ε})^+) method performs quite well under the log-normal distribution for n ≥ 20. The empirical sizes of the parametric bootstrap test based on the Moore-Penrose inverse belong to the usual significance interval even for very small sample sizes, except for the log-normal distribution, where a large number of observations is needed. A similar situation can be observed for the parametric bootstrap tests based on {2}-inverses with k close to r − 1, but they may need more (resp. less) data in the non-log-normal (resp. log-normal) case.
In order to ensure that the bootstrap procedures controlled the type I error, our power simulation results were obtained under Laplace and χ²_20 distributions of the error terms and n = 20. The empirical powers of the considered tests are given in Tables 9-14 in the Supplementary Materials. Since the asymptotic tests based on Q_n(S_{n,B,ε}^{-2}), Q_n(S_{n,B,ε}^{-3}), and Q_n(S_{n,ε}^+) are usually too liberal, their empirical powers are not really comparable. However, they are included for illustration and completeness.
The empirical powers of the three tests based on W_n = I_p are very similar. This is not surprising, since their empirical sizes are also quite similar. Thus, the bootstrap methods do not appear to improve the power of the Imhof-based test. For Σ = Σ^(1), the bootstrap procedures based on {2}-inverses with small k are more powerful than the asymptotic tests, and the empirical powers of the nonparametric bootstrap are slightly greater than those of the parametric one. However, for Σ = Σ^(i), i = 2, 3, we observe the opposite behavior. The parametric bootstrap tests based on {2}-inverses with k close to r and on the Moore-Penrose inverse are more powerful than the nonparametric ones. This is also dictated by the finite-sample behavior of these testing procedures under the null. Unfortunately, for n = 20, we cannot correctly compare the power of the asymptotic tests based on Q_n(S_{n,B,ε}^{-2}), Q_n(S_{n,B,ε}^{-3}), and Q_n(S_{n,ε}^+) with that of their bootstrap versions. However, our additional simulations (not shown) indicate that, when n is large enough for these asymptotic tests to keep the preassigned type I error (e.g., n = 150), their empirical powers and those of the bootstrap tests are very similar.
The empirical powers of the tests behave very similarly under Laplace and χ²_20 distributed errors. However, they strongly depend on the alternative. More precisely, they depend on whether or not μ belongs to a specific subspace of V({λ_1, . . . , λ_r}), where λ_1 ≥ · · · ≥ λ_r are the nonzero eigenvalues of Σ. The behavior of the empirical power of the asymptotic tests is discussed by Duchesne and Francq (2010, 2015). From our simulation results, it follows that the nonparametric and parametric bootstrap procedures exhibit similar behavior. Below, we discuss it for the nonparametric bootstrap tests; the conclusions for the parametric bootstrap method are very similar. Assume that μ ∈ V(Λ), where Λ is a subset of {λ_1, . . . , λ_r} containing λ_i and λ_j, 1 ≤ i ≤ j ≤ r, as its largest and smallest element, respectively. Moreover, for simplicity, we write (S*_n)_{B,ε}^{-r} = (S*_{n,ε})^+, i.e., the case k = r corresponds to the Moore-Penrose inverse.
Then, the test based on Q*_n((S*_n)_{B,ε}^{-j}) is usually the most powerful when λ_j > λ_{j+1}. However, when the eigenvalue λ_j is multiple and λ_j = · · · = λ_l > λ_{l+1} for some l > j, the test based on Q*_n((S*_n)_{B,ε}^{-l}) seems to have the best power (see Tables 9 and 12 in the Supplementary Materials). If λ_{i−1} > λ_i, the bootstrap tests based on {2}-inverses with k = 1, . . . , i − 1 have no power, i.e., their empirical powers are close to their empirical sizes. On the other hand, when λ_{h−1} > λ_h = · · · = λ_i for a certain h < i, the bootstrap tests based on {2}-inverses with k = h, . . . , i − 1 offer some power, but it is quite low in comparison to the other tests. Low power may also be observed for k = i, . . . , j − 1. The empirical powers of the Imhof-based bootstrap test are very high, and even slightly greater than the largest of those for Q*_n((S*_n)_{B,ε}^{-k}), k = 1, . . . , m, when λ_1 = · · · = λ_m > λ_{m+1} and μ ∈ V({λ_1, . . . , λ_m}). However, for μ not belonging to V({λ_1, . . . , λ_m}), this test usually has low power, especially for the more complicated covariance matrices (i.e., Σ = Σ^(i), i = 2, 3 in our simulations).

Concluding remarks
In a general framework, the Wald-type statistic may yield efficient testing procedures. However, the tests based on it and its asymptotic distribution may not keep the preassigned type I error for smaller sample sizes. This drawback may be remedied by applying a resampling approach, and in the present article we have considered such a situation. The proposed nonparametric and parametric bootstrap methods provide an improvement over the testing procedures considered by Duchesne and Francq (2010, 2015) in the small-sample case. Both bootstrap procedures are shown (theoretically or empirically) to be asymptotically valid for the problem of testing for the mean vector of a multivariate distribution. However, our simulations suggest that the parametric bootstrap approach seems to yield the most accurate performance, except for the extremely skewed log-normal distribution, where it is usually highly liberal. For this distribution, the nonparametric bootstrap testing procedures based on {2}-inverses or the Moore-Penrose inverse perform better than the parametric ones, but they do not result in very satisfactorily accurate test decisions. Different resampling methods might be more appropriate in the skewed-distribution case.
In this article, we have focused on the problem of testing for the mean vector of a multivariate distribution. For the other particular problems covered by the general framework of Duchesne and Francq (2010, 2015) and presented in Section 1, further investigation or extension of the bootstrap methods may be needed.