Preliminary Multiple-Test Estimation, With Applications to k-Sample Covariance Estimation

Abstract Multisample covariance estimation—that is, estimation of the covariance matrices associated with k distinct populations—is a classical problem in multivariate statistics. A common solution is to base estimation on the outcome of a test that these covariance matrices show some given pattern. Such a preliminary test may, for example, investigate whether or not the various covariance matrices are equal to each other (test of homogeneity), or whether or not they have common eigenvectors (test of common principal components), etc. Since it is usually unclear what the possible pattern might be, it is natural to consider a collection of such patterns, leading to a collection of preliminary tests, and to base estimation on the outcome of such a multiple testing rule. In the present work, we therefore study preliminary test estimation based on multiple tests. Since this is of interest also outside k-sample covariance estimation, we do so in a very general framework where it is only assumed that the sequence of models at hand is locally asymptotically normal. In this general setup, we define the proposed estimators and derive their asymptotic properties. We come back to k-sample covariance estimation to illustrate the asymptotic and finite-sample behaviors of our estimators. Finally, we treat a real data example that allows us to show their practical relevance in a supervised classification framework.


Introduction
The present article is motivated by the problem of estimating the covariance matrices 1 , . . . , k associated with k distinct p-dimensional populations. This is a very classical point estimation problem in multivariate analysis. It is, for example, of paramount importance when building discriminant analysis rules or when performing MANOVA or MANOCOVA. When 1 , . . . , k are unconstrained, this multisample problem of course reduces to a collection of k separate estimation problems. In many applications, however, it is assumed or suspected that there is some link between the various covariance matrices. In line with this, Boente and Orellana (2004) and Jensen and Madsen (2004) considered k-sample covariance estimation under the assumption of proportionality, that specifies that 1 , . . . , k are equal to a common covariance matrix up to group-specific scalar factors. Flury (1984) tackled the same estimation problem in the common principal components (CPC) model, under which 1 , . . . , k share the same eigenvectors. Many later works focused on this CPC model: Flury (1986) derived what is now the textbook Gaussian asymptotic theory; Boente, Pires, and Rodrigues (2002) defined robust estimation procedures for this model, whereas Hallin, Paindaveine, and Verdebout (2014) focused on rank-based estimation; Browne and McNicholas (2014) considered the problem in high dimensions, while functional extensions were proposed in Benko, Härdle, and Kneip (2009 Prior to performing k-sample covariance estimation under some specific assumption (proportionality, CPC, etc.), it is of course natural to first perform a test to investigate whether or not the data are compatible with that particular assumption. If one assumes that the k covariance matrices are equal to each other, then one should accordingly perform a test of homogeneity, among those from Schott (2001) or Hallin and Paindaveine (2009), etc. For tests of proportionality, one may refer to Liu et al. (2014), Tsukuda and Matsuura (2019) and the references therein, while tests for the CPC structure were proposed in Flury (1986), Schott (1991), Boente, Pires, andRodrigues (2009), and, to cite only a few.
Estimation is then based on the decision of such a preliminary test, as we now illustrate by considering the assumption of homogeneity. If φ cov is a test of the null hypothesis of homogeneity H cov 0 : 1 = · · · = k , then, writing I[A] for the indicator function of A, the resulting natural estimator of ( 1 , . . . , k ) is where theˆ 1 , . . . ,ˆ k are "unconstrained" estimators of 1 , . . . , k and whereˆ is an estimator of the common value of the 's under the null hypothesis of homogeneity; as usual, φ cov = 1 (resp., φ cov = 0) indicates rejection (resp., nonrejection). The estimator in (1) is a preliminary test estimator (PTE) in the sense of Saleh (2006); we refer to Maeyama, Tamaki, and Taniguchi (2011) and Paindaveine, Rasoafaraniaina, andVerdebout (2017, 2021) for recent contributions on such estimators. PTEs typically achieve a good compromise between (ˆ 1 , . . . ,ˆ k ) and (ˆ , . . . ,ˆ ) in the vicinity of H cov 0 and, provided that φ cov is a consistent test, PTEs are also asymptotically equivalent to the classical estimator (ˆ 1 , . . . ,ˆ k ) "away from the null hypothesis H cov 0 , " that is, for fixed parameter values that do not satisfy H cov 0 . Motivation for the present work lies in the fact that, in many situations, multiple constraints may be considered. To provide an example in the above k-sample covariance estimation framework, let us factorize the k covariance matrices as = σ 2 V := (det ) 1/p { /(det ) 1/p } to emphasize their "scale" σ and their "shape" V . With this notation, one may consider the constraints associated with the null hypotheses of scale homogeneity H scale 0 : σ 2 1 = · · · = σ 2 k and of shape homogeneity H shape 0 : V 1 = · · · = V k (note that H shape 0 coincides with the null hypothesis of proportionality). If φ scale and φ shape are tests for these null hypotheses, then a natural estimator is the preliminary multiple-test estimator (PMTE) where theσ 2 's andV 's are unconstrained estimators (these are the scale and shape of the unconstrained estimatorsˆ ),σ 2 is an estimator of the common value of the σ 2 's under H scale 0 , and whereV, similarly, is an estimator of the common value of the V 's under H shape 0 . The estimator in (2) is a PTE that involves two constraints, whose intersection is associated with the null hypothesis H cov 0 of homogeneity of the k covariance matrices. Obviously, more than two constraints may be considered. For instance, factorizing further the covariance matrices into = σ 2 V = σ 2 O O based on the usual spectral decomposition of the shape matrix V (here, O is an orthogonal matrix and is a diagonal matrix collecting the eigenvalues of V on the diagonal), one may consider the constraints associated with the null hypothesis of scale homogeneity H scale 0 , the null hypothesis of shape eigenvalue homogeneity H eig 0 : 1 = · · · = k , and the CPC null hypothesis H CPC 0 : O 1 = · · · = O k . Combining the outcomes of tests for the three null hypotheses allows one to define a three constraints PMTE of the same nature as in (2). Such an estimator actually formalizes the estimator practitioners would use in practice in the present k-sample covariance estimation setup. It remains unclear, however, how this estimator behaves since, to the best of our knowledge, such PMTEs have not been considered in the literature.
The main objective of the present work is therefore to introduce and to study PMTEs, and show their practical relevance. The outline of the article is as follows. We first introduce PMTEs in a general context (Section 2). Then, we derive the asymptotic behavior of such estimators in the common framework of locally asymptotically normal (LAN) models (Section 3). In particular, we show that away from all constraints, there is no asymptotic loss to consider PMTEs rather than standard PTEs, since both types of estimators then turn out to be asymptotically equivalent to the unconstrained estimator. As we show, however, PMTEs dominate their competitors in the vicinity of the considered constraints. To demonstrate the practical relevance of PMTEs, we mainly focus on the multisample covariance estimation problem that motivated this work (Section 4). We first derive the various estimators in this context (Section 4.1), then compare them theoretically and empirically, respectively, through the computation of asymptotic efficiencies and through simulations (Section 4.2). Last, we illustrate our methodology in a real data example involving different species of voles (Section 4.3). We conclude with final comments (Section 5). In the supplementary materials, we perform finite-sample comparisons with estimators resulting from a BIC-based model selection (Section A) and provide the technical proofs (Section B).

Preliminary Multiple-Test Estimators (PMTEs)
Consider a model that is indexed by a parameter θ = (θ 1 , . . . , θ d ) ∈ ⊂ R d and assume that, in line with the situation considered in the introduction, m possible constraints on θ are suspected to hold. For the sake of clarity, we first introduce PMTEs in the case involving m = 2 constraints only. Since we will actually restrict throughout to linear constraints on θ , these two constraints take the form θ ∈ M(ϒ j ), j = 1, 2, for some d × r j matrices ϒ j (r j < d), where M(A) denotes the vector subspace of R d that is spanned by the columns of A (without any loss of generality, we will assume in the sequel that the ϒ j 's have full rank). To make the notation lighter, we will throughout tacitly restrict to values of θ that belong to , which allows us to write θ ∈ M(ϒ j ) instead of θ ∈ ∩M(ϒ j ), or θ ∈ R d \ M(ϒ j ) instead of θ ∈ \ M(ϒ j ), etc. For d = 2, the two constraints are vectorial lines that are, respectively, spanned by the d-vectors ϒ j , j = 1, 2. In this framework, we assume, for each j = 1, 2, that a test φ j for the null hypothesis H j0 : θ ∈ M(ϒ j ) is available. The outcome of these tests is coded as φ := (φ 1 , φ 2 ) ∈ {0, 1} 2 ; as in the introduction, φ j = 1 (resp., φ j = 0) indicates that φ j leads to rejection (resp., nonrejection) of H j0 . If φ = (1, 1), then it is natural to adopt an unconstrained estimatorθ U (taking values in R d ), whereas other values of φ would lead to considering various constrained estimators of θ , namely a constrained estimatorθ (0,1) Summing up, this leads to the PMTÊ which is obtained by taking into account the 2 m = 4 possible (joint) outcomes of the tests φ 1 and φ 2 . We now discuss the general case involving an arbitrary number m of linear constraints. To do so, let ϒ j , j = 1, . . . , m, be fullrank d×r j (r j < d) matrices such that the jth constraint takes the form H j0 : θ ∈ M(ϒ j ) and let φ j be a test for the corresponding null hypothesis. Any element c = (c 1 , . . . , c m ) ∈ C m := {0, 1} m may be used to indicate which constraints are satisfied by a given parameter value θ : letting J c = {j = 1, . . . , m : c j = 0}, θ ∈ ∩ j∈J c M(ϒ j ) means that θ meets the constraints indexed by J c but not the other ones. Any c ∈ C m also corresponds to a possible decision for the m-tuple of tests φ = (φ 1 , . . . , φ m ). With this notation, the resulting PMTE is then whereθ c is a constrained estimator taking values in ∩ j∈J c M(ϒ j ).
In the sequel, it will be convenient to fix, for any c ∈ C m , a fullrank matrix ϒ c such that M(ϒ c ) = ∩ j∈J c M(ϒ j ). Note that for c = 1 m −e j , where 1 m is the m-vector of ones and e j is the jth vector of the canonical basis of R m , we may simply take ϒ c = ϒ j (so thatθ c is a constrained estimator based on the jth constraint only), whereas for c = 1 m , then we may take ϒ c = I d (so thatθ c = θ U is an unconstrained estimator), where I stands for the -dimensional identity matrix. Clearly, for m = 2, the PMTE in (4) reduces to the one in (3).
under P (n) θ (here, ν −1 n and (n) θ are as in Assumption (A)); (ii) the test φ j rejects the null hypothesis H j0 : θ ∈ M(ϒ j ) at asymptotic level α when Q Assumption (B) might look restrictive but it is actually extremely mild: provided that Assumption (A) holds, it indeed only requires the existence of an unconstrained estimatorθ U admitting a Bahadur-type representation. To see this, assume for the sake of simplicity that ν n is as usual given by ν n = n −1/2 I d (extension to a general ν n is trivial), and that, for any θ ∈ , the estimatorθ U satisfies the Bahadur representation . . , n, are mutually independent and share a common distribution that has mean zero and has finite second-order moments. Since the CLT for triangular arrays entails that S (n) (6) under the usual mild Lévy-Lindeberg condition, this already ensures that Assumption (B)(i) holds for c = 1 m , with B 1 m ,θ : θ , which shows that Assumption (B)(i) is fulfilled. As for Assumption (B)(ii), it will be satisfied by Wald tests for H j0 : θ ∈ M(ϒ j ) constructed in the usual way from (7). This confirms that the only key point in Assumption (B) is the existence of an unconstrained estimatorθ U satisfying (7). In virtually all models, M-, L-, and R-estimation will provide an unconstrained estimator of this type, so that Assumption (B) is indeed extremely mild.
In the LAN framework of Assumption (A), one may alternatively want to rely on (constrained) asymptotically efficient estimation. The resulting estimators of θ will satisfy Assump- since, for the constraint associated with c ∈ C m and any corresponding θ ∈ M(ϒ c ), the constrained asymptotically efficient estimatorθ c is such that see, for example, Chapter 5 of Ley and Verdebout (2017).
Note that for locally and asymptotically discrete estimatorsθ c , (8) holds in particular when the LAN property in Assumption (A) is reinforced into a ULAN (uniform local asymptotic normality) one; see, for example, Chapter 5 of Ley and Verdebout (2017) for details. These considerations explain that Assumptions (A)-(B) will hold for a wide variety of models and corresponding estimators, including efficient estimators for the large class of Gaussian processes considered in Dahlhaus (1989), efficient estimators of regression models with long memory disturbances (Hallin et al. 1999), adaptive estimators in semiparametric ARMA, ARCH or TAR time series models (Drost, Klaassen, and Werker 1997), Mestimators (Lopuhaa 1992;Paindaveine and Van Bever 2014) and R-estimators (Hallin, Oja, and Paindaveine 2006;Hallin, Paindaveine, and Verdebout 2014) of scatter in elliptical models, efficient estimators and R-estimators of location in rotationally symmetric models for directional data (Ley et al. 2013), to cite only a few.

Asymptotic Results
We now study the asymptotic behavior of the PMTE estimator in (4). Fix c 0 ∈ C m and consider a parameter value θ that is such that θ / ∈ M(ϒ j ) for any j such that c 0j = 1. Using the notation introduced in Section 2, this rewrites θ ∈ R d \ ∪ j / ∈J c 0 M(ϒ j ). Consider then the oracle PMTE,θ PMTE,c 0 say, that would be the natural one to consider if it would be known , that is, the PMTE that does not involve tests of the constraints that are known not to be met.
Letting C m c 0 := {c ∈ C m : c j = 1 for any j / ∈ J c 0 }, this oracle PMTE is given bŷ where φ j = c 0 c means that φ j = c j for any j ∈ J c 0 (so that, as intended, this oracle PMTE does not involve the tests for the constraints that are known not to be met). We then have the following result.
This result, that interestingly only requires consistency of the tests φ j , j = 1, . . . , m, shows that the proposed PMTE is asymptotically equivalent in probability to the oracle PMTE constructed on the basis of the constraints associated with c 0 as soon as the true value of the parameter θ is fixed outside these constraints. Note that taking c 0 = 1 m readily yields the following corollary.
It directly follows from this result that, away from the constraints, there is no cost, asymptotically, to use the proposed PMTE rather than its unconstrained antecedentθ U . Finally we stress that it is only in the very particular case #J c 0 = 1 that the oracle PMTE of Theorem 3.1 is a standard single-constraint PTE, with a known asymptotic behavior; consequently, in cases where #J c 0 > 1, Theorem 3.1 on its own does not allow one to deduce the asymptotic behavior ofθ PMTE from the singleconstraint theory.
We thus turn to the study of this asymptotic behavior in the general case. We will actually derive the asymptotic distribution ofθ PMTE in two types of asymptotic scenarios, namely under fixed parameter values meeting (at least) one constraint or under sequences of local perturbations of such fixed parameter values.
To be more precise, fix again c 0 ∈ C m \ {1 m } and consider now a parameter value θ that meets the corresponding constraints; in other words, θ ∈ M(ϒ c 0 ). The aforementioned local perturbations are then of the form θ n = θ + ν n τ n , where τ n is a bounded sequence in R d . Since we do not exclude the case τ n ≡ 0, this actually also covers the case for which θ ∈ M(ϒ c 0 ) is fixed. To describe the asymptotic distribution ofθ PMTE in these asymptotic scenarios, we need to introduce the following notation. Recalling that J c 0 = {j = 1, . . . , m : c 0j = 0}, we will denote the (ordered) elements of J c 0 as j 1 , . . . , j s , and the corresponding 2 s elements of C m c 0 = {c ∈ C m : c j = 1 for any j / ∈ J c 0 } as c 1 , . . . , c 2 s (the ordering is here arbitrary but fixed). Based on this notation, let further where (D 1 , . . . , D s , ) , takes values in R sd and where, for (note that only the tests φ j (D), = 1, . . . , s, are involved in W(D)). We then have the following result.
Let R and D c 0 be random vectors whose joint distribution is described as In this result, W R, conditional on D c 0 , is asymptotically normal with mean vector and covariance matrix with P(A) := A(A A) − A (throughout, A − denotes the Moore-Penrose inverse of A). This allows us to obtain the unconditional asymptotic distribution of ν −1 n (θ PMTE − θ n ) under P (n) θ n : indeed, Theorem 3.2 implies that D c 0 is asymptotically normal with mean vector μ D c 0 := G θ τ and covariance where φ μ, stands for the density of the d-variate normal distribution with mean vector μ and covariance matrix . Comparing competing estimators will not be done on the basis of (13) but rather on asymptotic mean square errors (AMSEs). We define the AMSE ofθ PMTE under P (n)  (to keep the notation light, we do not stress the dependence of these quantities on c 0 ), where D c 0 is multinormal with mean vector μ D c 0 = G θ τ and covariance matrix D c 0 = G θ G as above. We then have the following result. θ n is given by Note that for τ = 0, (14) reduces to While the expressions (14)-(15) are quite complex, they allow one to compare theoretically the proposed PMTE with competing estimators, and in particular with single-constraint PTEs. In the next section, we illustrate this in the k-sample covariance estimation framework described in the introduction.

PMTE Based on Scale and Shape Constraints
Consider k(≥ 2) mutually independent samples of random pvectors X 1 , . . . , X n , = 1, . . . , k, with respective sample sizes n 1 , . . . , n k , such that, for any , the X i 's form a random sample from the p-variate normal distribution with mean vector 0 and nonsingular covariance matrix (due to the block-diagonality of the Fisher information matrix for location and scatter in elliptical models, all results below can easily be extended to the case where observations in the th sample would have a common, unspecified, mean μ , = 1, . . . , k; see, e.g., Paindaveine 2006, 2009). As explained in the introduction, the covariance matrices can be reparameterized into = σ 2 V , where σ := (det ) 1/(2p) is their "scale" and V := /(det ) 1/p is their "shape. " Under the only assumption that λ := λ (n) := n /n := n /( k r=1 n r ) converges in (0, 1) for any (to make the notation lighter, we will not stress the dependence in n in many quantities below), it follows from Hallin and Paindaveine (2009) that the sequence of Gaussian models indexed by where vech V =: (V 11 , ( • vech V) ) is the vector stacking the upper-triangular elements of V, is ULAN; note that since det V = 1, the upper-left entry V 11 of V can be obtained from • vech V (a vector with dimension b p := p(p + 1)/2 − 1), which explains that the upper-left entries of the various shape matrices do not enter the parameterization in (16). The dimension of θ is thus d := k(b p + 1).
To provide more details, we need the following notation: denoting as e r the rth vector of the canonical basis of R p and by ⊗ the Kronecker product, we let K p := p r,s=1 (e r e s ) ⊗ (e s e r ) be the p 2 × p 2 commutation matrix, whereas the nonsingular information matrix takes the form (A 1 , . . . , A m ) is the block-diagonal matrix with diagonal blocks A 1 , . . . , A m ). The corresponding matrices ν n in Assumption (A) are then given by ν n = n −1/2 r n := n −1/2 diag λ We consider here the estimation of 1 , . . . , k or, equivalently, the estimation of θ in (16). An advantage of the θparameterization is that it allows addressing situations in which one would suspect scale homogeneity H scale 0 : σ 2 1 = · · · = σ 2 k , shape homogeneity H shape 0 : V 1 = · · · = V k , or (the intersection between scale and shape homogeneity:) covariance homogeneity H cov 0 : σ 2 1 V 1 = · · · = σ 2 k V k , that is, H cov 0 : 1 = · · · = k . In the present Gaussian model, an asymptotically efficient unconstrained estimator of θ iŝ Writing S := n −1 k =1 n i=1 X i X i for the pooled covariance matrix estimator (with respect to the fixed locations μ 1 = · · · = μ k = 0), asymptotically efficient constrained estimators, for the three constraints H scale 0 , H shape 0 and H cov 0 above, arê : θ ∈ M(ϒ shape ), withϒ shape := diag(I k , 1 k ⊗ I b p ), and H cov 0 : θ ∈ M(ϒ cov ), withϒ cov := diag(1 k , 1 k ⊗ I b p ). Now, if the d × r matrix ϒ stands for either ϒ scale , ϒ shape , or ϒ cov (of course, each constraint matrix has its own r), the locally asymptotically most stringent test φ ϒ for H 0 : θ ∈ M(ϒ) rejects the null hypothesis at asymptotic level α when whereθ =θ ϒ is a constrained estimator (for the three constraints considered, these are the estimators in (18)-(20)). On the basis of these various tests, the PTEs involving a single constraint arê whereas the PMTE proposed in this work is given bŷ In the next section, we investigate how this PMTE compares with its single-constraint PTE competitors in (22)-(24), both asymptotically and in finite samples.

Comparing PMTE With Single-Constraint PTEs
In the vicinity of the scale homogeneity constraint and away from the shape homogeneity constraint,θ PMTE is asymptotically equivalent toθ scale PTE (Theorem 3.1). Similarly, in the vicinity of the shape homogeneity constraint and away from the scale homogeneity constraint,θ PMTE is asymptotically equivalent tô θ shape PTE . In both cases, thus, the asymptotic properties ofθ PMTE can be deduced from those of a single-constraint PTE, so that the AMSEs of this estimator can then be obtained from Theorem 2 in Paindaveine, Rasoafaraniaina, and Verdebout (2021). Recall from Section 3.2, however, that it is only when #J c 0 = 1 that the asymptotic behavior of the proposed PMTE can be obtained from the single-constraint theory. In the general case #J c 0 ≥ 1, the asymptotic results from Section 3.2 are the only ones that allow us to grasp the asymptotic behavior of our PMTE.
In the present situation involving m = 2 constraints, a point that is not covered by the single-constraint theory is the comparison betweenθ PMTE andθ cov PTE in the vicinity of covariance homogeneity, that is, close to the null hypothesis H cov 0 . There, usingθ PMTE rather thanθ cov PTE should intuitively have a cost, as the test for H cov 0 : θ ∈ M(ϒ cov ) is not used when definingθ PMTE . We now evaluate this cost by comparing the asymptotic performances of both estimators, measured by the corresponding AMSEs. Since these AMSEs are matrix-valued, one needs to base this comparison on a scalar summary, such as, for example, the trace of the AMSEs. In the present setup where asymptotically efficient estimators are used, the benchmark unconstrained estimator satisfies AMSE θ ,τ (θ U ) = −1 θ , which makes it natural to consider the scalar summary AMSE scalar θ,τ (θ ) := tr which, irrespective of θ , yields AMSE scalar θ,τ (θ U ) = d for the benchmark estimator. Under the considered covariance homogeneity constraint, we then have the following result.
where stands for the cumulative distribution function of the χ 2 distribution.
It follows from Paindaveine, Rasoafaraniaina, and Verdebout (2021) that, for any θ ∈ M(ϒ cov ), which allows for a direct comparison with the AMSE in (27). More precisely, Figure 1 plots the asymptotic relative efficiency as a function of the dimension p and of the number k of populations (here, the nominal level of all preliminary tests is fixed at α = 5%). Clearly, irrespective of p and k, the loss that results from usingθ PMTE rather thanθ cov PTE under covariance homogeneity is extremely small (the minimal ARE, which is obtained for p = 2 and k = 13, is about 0.928). Remarkably, this loss actually converges to zero as p or k diverges to infinity.
We turn to Monte Carlo exercises that aim at comparing the finite-sample performances of the proposed PMTE with those of its single-constraint PTE competitors. Throughout, these exercises focus on two populations (k = 2) that are two-dimensional (p = 2) and balanced (n 1 = n 2 ). We considered two scenarios. In the first one, we generated, for each ξ in {0, 1, . . . , 10} and for each value of the common sample size n 1 = n 2 in {100, 400, 1000}, a collection of M = 10,000 samples of mutually independent observations X 11 , . . . , X 1n 1 , X 21,ξ , . . . , X 2n 2 ,ξ , where the X 1i 's are N (0, 1 ) and the X 2i,ξ 's are N (0, 2,ξ ), with 1 = I 2 and 2,ξ = σ 2 2,ξ V 2,ξ based on The value ξ = 0 provides covariance homogeneity, hence also scale and shape homogeneity, whereas ξ = 1, . . . , 10 provide both increasingly distinct scales and increasingly distinct shapes. For any estimatorθ of the corresponding parameter value θ , the finite-sample performance ofθ can be measured through the scalar quantity with whereθ (m) denotes the value ofθ in the mth replication. The left panels of Figure 2 then plot MSE scalar θ , (θ ) as a function of ξ , for the PTEθ shape PTE in (23) based on the single shape homogeneity constraint, for the PTEθ cov PTE in (24) based on the single covariance homogeneity constraint, for the proposed PMTEθ PMTE in (25) based on the shape and scale homogeneity constraints, and for their unconstrained antecedentθ U (all preliminary tests were performed at asymptotic level α = 5%). In the present setup involving deviations from covariance homogeneity, the estimatorθ cov PTE is an oracle one, that is expected to outperform its competitors. Remarkably, the results show that the multipleconstraint estimatorθ PMTE show virtually the same performances asθ cov PTE . Under covariance homogeneity (ξ = 0), this is in line with our theoretical results above, as Figure 2 indeed confirms the very close values AMSE scalar θ,0 (θ PMTE ) = 3.679 and AMSE scalar θ,0 (θ cov PTE ) = 3.500 that result from (27) and (28), respectively.
We repeated the exercise above in a second scenario, that is obtained from the first one by replacing σ 2 2,ξ in (29) with σ 2 2,ξ = 2 for any ξ . Irrespective of ξ , thus, this new scenario stays away from scale homogeneity (hence also from covariance homogeneity), whereas ξ = 1, . . . , 10 provide increasingly severe departures from the shape homogeneity situation obtained for ξ = 0. The right panels of Figure 2 show the resulting values of MSE scalar θ , (θ ) for the same four estimators as above. The results clearly support Theorem 3.1, that states thatθ PMTE andθ shape PTE are asymptotically equivalent, as one cannot discriminate between the MSE curves of these estimators (further simulations revealed that one needs to consider sample sizes as small as n 1 = n 2 = 100 to see a tiny difference in these MSE curves). Overall, thus,θ PMTE dominatesθ shape PTE since the former was performing better than the latter in the first scenario. Incidentally, note that, in this second scenario,θ cov PTE andθ U behave very similarly, which is reasonable since all values of ξ considered provide a setup that is far from covariance homogeneity (Corollary 3.1).

CPC and Homogeneity of Eigenvalues: A Real Data Example
Studies of Microtus population biology have attracted a lot of attention in the past decades; see, for example, Wallace (2006) and Conroy and Gupta (2011), and the references therein. In this real data illustration, we discuss the estimation of covariance matrices for two samples (k = 2) of different species of voles: a sample of n 1 = 43 Microtus multiplex and a sample of n 2 = 46 Microtus subterraneus. Eight measurements (p = 8) are made on each animal: (i)-(iii) the width of upper left molar 1-3, (iv) the length of incisive foramen, (v) the length of palatal bone, (vi) the Condylo incisive length or skull length, (vii) the skull height above bullae, and (viii) the skull width across rostrum; see Airoldi, Flury, and Salvioni (1996). The dataset is available in the R package Flury (data microtus).
We consider estimation of the underlying covariance matrices 1 and 2 using preliminary tests of the following three constraints (m = 3): homogeneity of scales (H a 0 ), homogeneity of the shape matrices' eigenvectors (H b 0 , described in the introduction as the CPC hypothesis), and homogeneity of the shape matrices' eigenvalues (H c 0 ). We performed the optimal Gaussian test for H a 0 from Hallin and Paindaveine (2009), the optimal Gaussian test for H b 0 from Hallin, Paindaveine, and Verdebout (2013), and the optimal Gaussian test for H c 0 we derived from the local asymptotic normality result in Hallin, Paindaveine, and Verdebout (2013). The respective p-values are 0.0121, 0.0010, and 0.2004, indicating that, for α = 5%, the PMTE based on the three constraints above would be an estimator of ( 1 , 2 ) assuming common shape eigenvalues only, whereas for α = 1%, it would be an estimator assuming both common scales and common shape eigenvalues-that is, assuming that 1 and 2 share the same eigenvalues.
As explained in Airoldi, Flury, and Salvioni (1996), Microtus multiplex and Microtus subterraneus are difficult to distinguish morphologically. Actually, it is only since Krapp (1982) and Niethammer (1982) that they are considered two distinct species-as a reaction to the vision in Ellerman and Morrison-Scott (1951). One possible way to explore the practical relevance of the PMTE above is thus to perform supervised classification. To do so, we randomly sampled 30 observations in each group and trained various classifiers on the resulting training set of size 60. The misclassification rate of each classifier was then evaluated on the basis of the test set made of the remaining 29 observations. To ensure that the results are not specific to a particular partition of the dataset into a training set and a test set, this was repeated M = 2000 times; Figure 3 provides, for each classifier, a boxplot of the resulting M misclassification rates. The considered classifiers all perform quadratic discriminant analysis (QDA) using plain sample averages as estimatesμ , = 1, 2, of the group-specific mean vectors, hence only differ through the estimatesˆ , = 1, 2, of the corresponding covariance matrices. The classical QDA procedure, that will be the benchmark, is based on the unconstrained sample covariance matri-cesˆ ,U , = 1, 2. The other classifiers are based on various preliminary (single or multiple) test estimatorsˆ , = 1, 2, using the same unconstrained estimators as in the benchmark and the constrained estimators obtained from the following estimators: lettingˆ pool := (n 1ˆ 1,U + n 2ˆ 2,U )/(n 1 + n 2 ) be the pooled (θ ) for the single-constraint PTEsθ shape PTE andθ cov PTE , the PMTEθ PMTE based on the shape and scale homogeneity constraints, and their unconstrained antecedentθ U (all preliminary tests are performed at asymptotic level α = 5%). In Scenario 1, larger values of ξ provide increasingly severe deviations-both in terms of scale and shape-from the covariance homogeneity obtained at ξ = 0, whereas, in Scenario 2, scale heterogeneity holds for any ξ and larger values of ξ provide increasingly severe deviations from the shape homogeneity obtained at ξ = 0; see Section 4.2 for details. The green and orange points in the left panels indicate the values of AMSE scalar θ,0 (θ PMTE ) and AMSE scalar θ,0 (θ cov PTE ), respectively.
covariance matrix, the common value of the scale parameter under H a 0 is estimated by (detˆ pool ) 1/p (recall that p = 8), the common eigenvectors matrix under H b 0 is estimated using the eigenvectors matrixβ pool ofˆ pool , whereas the common value of the eigenvalues matrix under H c 0 is estimated usinĝ pool /(detˆ pool ) 1/p , withˆ pool :=β poolˆ poolβpool . Figure 3 provides the boxplots of the misclassification rates (and reports the average misclassification rates) for the resulting   (32) for (in red) the MCD estimator of and (in green) the MCD-based PMTE associated with the sphericity and multivariate independence constraints (with preliminary tests performed at asymptotic level α = 5%). Here, the larger the value of ξ , the more severe the departure from the sphericity and multivariate independence constraints (that are both met at ξ = 0); see Section 5 for details.
five QDA classifiers, namely the ones based (i) on unconstrained estimators (that is, the classical QDA classifier), (ii)-(iv) on PTE estimators associated with the single constraint of homogeneity of scales (H a 0 ), homogeneity of the shape matrices' eigenvectors (H b 0 ), and homogeneity of the shape matrices' eigenvalues (H c 0 ), and (v) on the PMTE involving these three constraints. Each of the classifiers (ii)-(v) was considered in four versions, accord-ing to the nominal level α used for preliminary tests: α = 0.1%, 1%, 5%, and a value of α obtained from 6-fold crossvalidation. Clearly, the results indicate that the best classifiers are the ones based on the PMTEs and that those based on PTEs only marginally improve over the benchmark unconstrained classifier. Moreover, it is seen that cross-validation provides an effective way to choose the tuning parameter α. Figure 4 also provides the boxplots of the p-values of the tests for H a 0 , H b 0 , and H c 0 , obtained in the collection of M = 2000 training samples above. These boxplots reveal that it is not uncommon that the null hypotheses H a 0 and H c 0 both fail to be rejected, which explains that our PMTE has an edge in the present classification exercise.

Final Comments
When demonstrating the practical relevance of our PMTE in Section 4, we focused (i) on multisample covariance estimation and (ii) on constrained and unconstrained estimators that are of a Gaussian nature (more precisely, the estimators there were all Gaussian maximum likelihood estimators). As explained in Section 3, however, our methodology is very widely applicable as it merely only requires that the considered model is LAN and that unconstrained estimators of the corresponding parameters admitting Bahadur representation results are available. To showcase practical use of PMTEs in another situation and based on estimators of another nature, we consider robust one-sample covariance matrix estimation.
Testing for linear constraints on a covariance matrix has been much considered in the literature; see, for example, Zhang, Pantula, and Boos (1991) and the references therein (see also Dryden, Koloydenko, and Zhou 2009 for testing constraints with observed covariance matrices). Commonly considered constraints on are associated, for example, with the assumptions of sphericity (H sph 0 : = λI p for some λ > 0) or multivariate independence (H ind 0 : = diag( 1 , 2 ), where 1 and 2 are q × q and (p − q) × (p − q) covariance matrices, respectively). When aiming at robust estimation based on preliminary testing, it is natural to consider robust tests for these constraints, such as the  sign test for H sph 0 and the Taskinen, Kankainen, and Oja (2003) rank test for H ind 0 . Obviously, one then also needs to rely on a robust unconstrained estimator of , such as the celebrated minimum covariance determinant (MCD) estimator from Rousseeuw (1985) (which, as required, satisfies a Bahadur representation result of the form (7); see Cator and Lopuhaa 2010). To explore the performance of a corresponding robust PMTE, we generated, for each ξ in {0, 1, . . . , 5} and for each n ∈ {100, 400, 1000}, M = 10,000 independent random samples X 1,ξ , . . . , X n,ξ of size n from the four-dimensional (p = 4) multinormal distribution with mean vector zero and covariance matrix ξ = and H ind 0 , with q = 2. For this PMTE, the preliminary tests (performed at asymptotic level α = 5%) are the robust ones mentioned above, and the MCD was used to obtain the needed unconstrained and constrained estimators of (for instance, the constrained estimator of under H ind 0 is diag(ˆ 1 ,ˆ 2 ), whereˆ 1 andˆ 2 stand for the MCD estimators obtained from the first q marginals and last pq marginals, respectively).
Denoting asˆ (m) the value of a given estimator in the mth replication, Figure 4 provides, both for the MCD and for this robust PMTE, the MSE quantities with respect to the value of = ξ at hand. As Figure 4 clearly shows, this robust PMTE shows the same dominance over its unconstrained antecedent in the vicinity of the constraints (note that the constraints are met at ξ = 0) as the Gaussian ones considered in the multisample framework of Section 4. We conclude the article with a brief discussion of possible research perspectives. First, implementing the PMTE proposed in this work requires selecting the nominal level α at which to perform the preliminary tests, and it would be natural to develop methods for this choice. One way to tackle this problem is to try to adapt the "minimax regret" strategy proposed in Giles, Lieberman, and Giles (1992) using the general value of the AMSE measure in (14). In some specific contexts, though, a suitable value of α may simply be chosen through cross-validation, as we showed in the supervised classification exercise conducted in Section 4.3. Second, while we focused on asymptotic scenarios where the dimension d of the parameter remains fixed as the sample size n diverges to infinity, it would be interesting to tackle the high-dimensional case where the dimension d = d n diverges to infinity with n. This is of course quite challenging, particularly so for covariance estimation since it is well-known that covariance matrices cannot always be estimated consistently in high dimensions. Finally, another interesting venue for future research on PMTEs, that would be especially relevant in high dimensions, would be to consider asymptotics as the number of constraints m = m n increases with n. The need to resort to multiple testing corrections would then maybe be more imperious than in the fixed-m framework we considered. These research perspectives all are quite ambitious and are left for future work.

Supplementary Materials
In the supplementary materials, we perform finite sample comparisons of PMTEs with estimators resulting from a BIC-based model selection and provide the technical proofs of the results.