Shrinkage of Variance for Minimum Distance Based Tests

This paper promotes information theoretic inference in the context of minimum distance estimation. Various score test statistics differ only through the embedded estimator of the variance of estimating functions. We resort to implied probabilities provided by the constrained maximization of generalized entropy to get a more accurate variance estimator under the null. We document, both by theoretical higher order expansions and by Monte-Carlo evidence, that our improved score tests have better finite-sample size properties. The competitiveness of our non-simulation based method with respect to bootstrap is confirmed in the example of inference on covariance structures previously studied by Horowitz (1998).


INTRODUCTION
The optimal minimum distance (OMD) estimator θ̂_n of a vector θ of p unknown parameters identified by K ≥ p constraints μ = g(θ) is the solution of the minimization problem

θ̂_n = arg min_{θ∈Θ} n (μ̂_n − g(θ))′ V_n^{−1} (μ̂_n − g(θ)),    (1)

where Θ ⊂ ℝ^p is the parameter space, μ̂_n is a √n-consistent asymptotically normal estimator of μ with a positive definite asymptotic variance V, and V_n is any consistent estimator of V.
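The OMD program above can be sketched numerically. The following is a minimal numpy illustration under a hypothetical toy model (K = 2 moments, p = 1 parameter, g(θ) = (θ, θ²)′; all names and the grid-search minimizer are our own illustrative choices, not the paper's):

```python
import numpy as np

# Hypothetical toy model: K = 2 constraints, p = 1 parameter, g(theta) = (theta, theta^2)'.
rng = np.random.default_rng(0)
theta_true, n = 2.0, 2000
X = rng.normal(loc=[theta_true, theta_true**2], scale=1.0, size=(n, 2))

mu_hat = X.mean(axis=0)                # sqrt(n)-consistent estimator of mu = g(theta)
V_n = np.cov(X, rowvar=False)          # consistent estimator of the asymptotic variance V
Vinv = np.linalg.inv(V_n)

def omd_objective(theta):
    """n * (mu_hat - g(theta))' V_n^{-1} (mu_hat - g(theta))"""
    d = mu_hat - np.array([theta, theta**2])
    return n * d @ Vinv @ d

# Crude grid search over the parameter space Theta = [0, 4] (enough for a 1-D toy).
grid = np.linspace(0.0, 4.0, 4001)
theta_hat = grid[np.argmin([omd_objective(t) for t in grid])]
```

With this sample size, θ̂_n lands close to the true value 2; any numerical minimizer could replace the grid search.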

The focus of our interest in this paper is the test of a null hypothesis H₀: θ = θ₀. We study the dependence of the finite-sample properties of such a test on the choice of the asymptotic variance estimator V_n. We recommend a shrinkage estimator that leads to overall superior performance of the test.
By doing so, we contribute to two strands of the econometrics literature.
First, we bring a new application of information theory in econometrics. All our shrinkage estimators are computed by using implied probabilities deduced from the minimization of some generalized entropy. While it has been known since Corcoran (1998) that the Bartlett adjustment derived by DiCiccio et al. (1991) for empirical likelihood does not work for other Cressie-Read discrepancy statistics, we are able to perform an adjustment that is similar in spirit to the Bartlett adjustment for any generalized entropy function. The reason is the following: by setting the focus on testing of hypotheses, we do not need to bother with the estimation of the unknown parameters θ. Our asymptotic theory of the higher order improvements provided by information theoretic extensions of the Generalized Method of Moments (GMM) is new because, by contrast with the extant literature (see, for example, Newey and Smith, 2004, and Guggenberger and Smith, 2005), our higher order expansions deal with conditional distributions given that the critical value of a given test is reached.
Second, we bring some new light on alternatives to bootstrap for finite-sample improvements. We do not try to improve the critical value of a test based on a given test statistic but rather to improve the test statistic itself. The goal is to make this statistic as close as possible to the infeasible one that is based on the known value of the asymptotic variance V of the sample mean of the moment vector. In this respect, our paper can be seen as an extension of the work of Rothenberg (1988) to nonlinear settings in overidentified models.
We assume throughout that the consistent estimator μ̂_n in (1) is a sample mean of some known functions of the observations. For the sake of notational simplicity, we can write without loss of generality μ̂_n := (1/n) Σ_{i=1}^n X_i. In other words, μ̂_n can also be seen as an efficient GMM estimator associated to the moment conditions

E[X_i − g(θ)] = 0.    (2)

S. CHAUDHURI AND E. RENAULT
The results of this paper could actually be partly extended to general nonseparable moment conditions E[ψ(X_i, θ)] = 0. While this general case is studied in a companion paper, Chaudhuri and Renault (2011), we focus here on the specific conclusions that can be drawn from the particular form ψ(X_i, θ) := X_i − g(θ) in (2), especially because, under the null hypothesis H₀: θ = θ₀, it makes the expected Jacobian matrix of the moment function known. All the results of this paper are based on the maintained assumption that the Jacobian matrix G := E[∂ψ(X_i, θ₀)/∂θ′] is known, but do not further use the specific form ψ(X_i, θ) = X_i − g(θ). We maintain throughout the common assumption for the asymptotic distributional theory of OMD estimators that the Jacobian matrix G (under the null) is of full column rank p. While this Jacobian matrix is key for efficient estimation and score-type tests, its estimation is known to be an important source of poor finite-sample behavior of GMM-based inference due to a perverse correlation of its estimator with the moment function. The problem generally gets worse when the identification of θ₀ is not strong. By contrast, inference in the context of (2) will involve only the estimation of the asymptotic variance matrix V of the sample mean of the moment vector. For instance, the Newey and West (1987) score test of the null hypothesis θ = θ₀ will simply be based on the test statistic

S_n := n ψ̄_n′ V_n^{−1} G (G′ V_n^{−1} G)^{−1} G′ V_n^{−1} ψ̄_n,

where ψ̄_n := (1/n) Σ_{i=1}^n ψ_i and ψ_i := ψ(X_i, θ₀) (for notational simplicity). Under the null, the score statistic S_n will asymptotically follow a χ²(p) distribution if V_n is a consistent estimator of the asymptotic variance matrix V. A common practice is to take for V_n a moment-based estimator of V.
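The score statistic and its asymptotic χ²(p) behavior can be sketched with a toy overidentified model (K = 2, p = 1; the DGP and all names are our illustrative assumptions, not the paper's design). The sketch estimates the rejection rate under the null at the 5% level:

```python
import numpy as np

def score_stat(psi, G, V_n):
    """S_n = n * psibar' Vinv G (G' Vinv G)^{-1} G' Vinv psibar."""
    n = psi.shape[0]
    psibar = psi.mean(axis=0)
    Vinv = np.linalg.inv(V_n)
    A = Vinv @ G @ np.linalg.inv(G.T @ Vinv @ G) @ G.T @ Vinv
    return n * psibar @ A @ psibar

# Toy overidentified model: g(theta) = (theta, theta^2)', theta0 = 1, so G = -(1, 2*theta0)'.
rng = np.random.default_rng(1)
theta0 = 1.0
G = -np.array([[1.0], [2.0 * theta0]])           # K = 2, p = 1
rejections, R, n = 0, 500, 300
for _ in range(R):
    X = rng.normal(loc=[theta0, theta0**2], scale=1.0, size=(n, 2))
    psi = X - np.array([theta0, theta0**2])      # psi_i = X_i - g(theta0)
    V_n = psi.T @ psi / n                        # naive moment-based estimator
    if score_stat(psi, G, V_n) > 3.841:          # chi2(1) critical value at 5%
        rejections += 1
rate = rejections / R
```

With well-behaved (normal) data, the empirical size stays near the 5% nominal level; the point of the paper is what happens when the moment function is skewed.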
For instance, in a case with observations without serial correlation, one would typically use a naive estimator like

Ṽ_n := (1/n) Σ_{i=1}^n ψ_i ψ_i′.

The main thesis of this paper is that, by contrast with this common practice, the size of the score test would be better controlled by using for V_n an estimator of V that is asymptotically efficient under the null hypothesis. We will see that such an efficient estimation of the variance matrix V amounts to a shrinkage of the naive estimator that takes into account the information content of the moment conditions implied by the null hypothesis. The rationale for efficiently estimating V is that we try to mimic the behavior of the infeasible test statistic S_n(V) that would use the knowledge of the unknown asymptotic variance V. With this goal in mind, our methodology for comparing competing tests will be twofold. On the one hand, we will provide some compelling Monte-Carlo evidence that the targeted χ²(p) distribution is better tracked by our proposed test statistics with variance shrinkage than by standard test statistics. On the other hand, we will display some rationale for this evidence through the asymptotic expansions of the distribution functions of the various test statistics. We refer to Cavanagh (1983) for technical results used for such expansions. We will typically show that, with variance shrinkage, we get a more accurate approximation of the distribution function of the infeasible score test statistic.
The reported Monte-Carlo illustration considers the estimation of covariance structures as commonly met in a variety of economic examples. Abowd and Card (1989) and Altonji and Segal (1996) have documented the poor finite-sample properties of OMD estimators and inference in this context. Horowitz (1998) has put forward bootstrap methods for finite-sample improvements. It is worth realizing that the shrinkage strategy studied in this paper is not aimed at replacing bootstrap. First, it proposes a simple and user-friendly way to improve finite-sample size properties of score tests, without resorting to any simulation. Second, it could well be coupled with bootstrap methods if necessary. Our approach is actually quite close in spirit to the bootstrapping for GMM devised by Brown and Newey (2002). As in their work, we take advantage of the probabilities implied by the moment conditions (under the null hypothesis) for a proper re-weighting of the observations at hand. While they do that for the purpose of resampling, we just do it to find the efficient estimator V_n of the asymptotic variance matrix V.
The paper is organized as follows. In Section 2, we discuss the efficient estimators of the asymptotic variance matrix that are provided by a generalized maximum entropy approach to the moment conditions under the null. In Section 3, score statistics are compared through asymptotic expansions of their distribution functions. An extensive Monte-Carlo illustration is provided in Section 4 in the context of OMD inference on covariance structures. Our approach appears in many cases to be competitive with the more involved bootstrap approach. A supplemental appendix contains additional simulation results to provide further intuitions on the improvements provided by the methods proposed in this paper. Section 5 concludes. All the proofs are collected in a technical appendix.

EFFICIENT ESTIMATION OF THE VARIANCE MATRIX UNDER THE NULL HYPOTHESIS
The information theoretic approaches to inference in moment condition models have become popular in econometrics since the seminal papers by Kitamura and Stutzer (1997) and Imbens et al. (1998). The idea in the context of general moment conditions (3) is to look simultaneously for an estimator θ̂_n of θ and for the implied probabilities π̂_n^(λ) = (π̂_i,n^(λ))_{1≤i≤n} as solutions of a program that minimizes the divergence (5) subject to the moment conditions holding in the sample. The objective function (5) is defined for any real λ, including the two limit cases λ → 0 and λ → 1. The family of these functions, indexed by λ, is generally referred to as the Cressie-Read family of power divergence statistics (see Kitamura and Stutzer, 1997, Imbens et al., 1998, and the references therein). It is known that, in the case of independent and identically distributed (i.i.d.) observations X_i, i = 1, …, n, and under standard regularity conditions, the estimator θ̂_n is asymptotically efficient (and asymptotically equivalent to efficient GMM) for any value of λ. In the case of serially dependent observations, this result can be extended by applying the above power divergence minimization to properly preaveraged moments in the spirit of Kitamura and Stutzer (1997). All that follows could be extended in this way to time series models but will not be stated explicitly for the sake of expositional simplicity.
It is generally believed that implied probabilities are relevant for inference only in the case of overidentified moment conditions since, when K = p, one may generically find a method of moments estimator θ̂_n such that the sample moment conditions hold exactly. Our use of implied probabilities in this paper is new since we want to devise the proper shrinkage implied by the null hypothesis θ = θ₀. In other words, in the context of separable moment conditions (2), we define the implied probabilities π̂_i,n^(λ) as the solutions of the minimization problem (7) subject to the constraints

Σ_{i=1}^n π_i = 1 and Σ_{i=1}^n π_i (X_i − g(θ₀)) = 0.    (8)

As a consequence, even in the just-identified case, the implied probabilities do not coincide with the empirical distribution (6) because the null hypothesis is not exactly fulfilled by the sample moments. The consistent estimators of the variance matrix V that we promote in this paper are the ones associated to these implied probabilities,

V_n^(λ) := Σ_{i=1}^n π̂_i,n^(λ) ψ_i ψ_i′.    (9)

It is worth comparing the estimators V_n^(λ) (for any choice of the power-divergence parameter λ) with the naive estimation principle based on the empirical probabilities (1/n) (and mentioned in the Introduction) that, under the null, i.e., when working with the constrained "estimator" θ = θ₀, would lead one to consider

Ṽ_n := (1/n) Σ_{i=1}^n ψ_i ψ_i′.    (10)

The key difference between (9) and (10) is that we have replaced the empirical distribution (6) by implied probabilities which make sure that the moment conditions (with the value of θ under the null) are fulfilled in the sample. In yet other words, we have shrunk the variance estimator to take advantage of the information brought by the null hypothesis H₀: θ = θ₀. This strategy is germane to pooling data to take advantage of invariance of parameter values across different samples (see, for example, Ziemer and Wetzstein, 1983). The shrinkage interpretation will be confirmed by the computation of the implied probabilities below. Let us also note that this can be related to a point already made by Hall (2000), who recommends that variances be calculated using the data in mean deviation form for improved power properties of over-identification tests.
It is precisely a way to acknowledge that the naive estimator must be shrunk in due proportion to the in-sample violation of the moment conditions. Hall's shrinkage would simply lead to replacing Ṽ_n by

Ṽ_n^Center := (1/n) Σ_{i=1}^n (ψ_i − ψ̄_n)(ψ_i − ψ̄_n)′.    (11)

However, under the null, Ṽ_n and Ṽ_n^Center are asymptotically equivalent estimators of V. By contrast with (11), the shrinkage (9) makes an efficient use of the information content of the moment conditions. To see this, first note that the first order conditions of the minimization in (7) subject to the constraints in (8) give, for a nonzero λ,

π̂_i,n^(λ) ∝ (1 + λ τ̂_n′ ψ_i)^{1/λ},    (12)

where ∝ means "proportional to" and τ̂_n stands for a vector of rescaled Lagrange multipliers. Note that, for the sake of expositional simplicity, we exclude the limit case λ = 1 which corresponds to the Kullback-Leibler Information Criterion estimator put forward by Kitamura and Stutzer (1997).
Since, for any value of λ, the sequences √n(nπ̂_i,n^(λ) − 1) are asymptotically normal and asymptotically equivalent (see, for example, Imbens et al., 1998), it is worth interpreting the implied probabilities in the particular case λ = −1, which corresponds to the Euclidean Empirical Likelihood (EEL) that is extensively documented in Antoine et al. (2007). However, it must be kept in mind that the score test extensively studied in the present paper, based on the estimator V_n^(−1) of the variance matrix, is not the score test associated to EEL or the one used by Kleibergen (2005). Indeed, the first order conditions of EEL, which deliver an estimator of θ numerically equal to the continuously updated GMM estimator of Hansen et al. (1996), do not resort to the efficient estimator of the variance matrix. This is actually the reason why Antoine et al. (2007) proposed the 3-step EEL. The first two steps are not needed here since θ is known under the null.
The advantage of the Euclidean case λ = −1 is that the Taylor expansion (13) is actually exact, so that we get closed form formulas for the Lagrange multipliers and the implied probabilities:

π̂_i,n^(−1) = (1/n) [1 − ψ̄_n′ (Ṽ_n^Center)^{−1} (ψ_i − ψ̄_n)].

This closed form formula allows Antoine et al. (2007) to give a control variable interpretation of the constrained estimator of the expectation of any integrable function of the variables X_i. Under the null, if the expectation with respect to the empirical distribution in (6) is denoted by Ê and that with respect to the implied probabilities π̂_i,n^(λ) by Ê^(λ), we get, for any scalar function h(X_1),

Ê^(−1)[h(X_1)] = Ê[h(X_1)] − Ĉov(h(X_1), ψ_1)′ (Ṽ_n^Center)^{−1} ψ̄_n.

Therefore, in order to estimate E[h(X_1)], we compute the sample mean of the residual of the regression of h(X_i) on ψ_i. The control variable principle tells us that it is an efficient way to estimate E[h(X_1)] while taking into account the information that E[ψ_1] = 0. In other words, as rigorously proved in Antoine et al. (2007), the estimator Ê^(−1)[h(X_1)] reaches the semiparametric efficiency bound for the estimation of E[h(X_1)] under the null hypothesis H₀: θ = θ₀. The aforementioned first order equivalence implies that the semiparametric efficiency bound is also reached under the null by a whole family of constrained estimators Ê^(λ)[h(X_1)], associated to any value of the power λ. The focus of our interest in this paper will be the use, for the purpose of score testing, of two constrained estimators V_n^(λ) of the variance matrix, associated respectively to the values λ = −1 and λ = 0. As explained above, the use of V_n^(−1) is in the line of score testing in the context of 3-step EEL.
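The closed form and its control variable interpretation can be illustrated numerically. The sketch below assumes a toy i.i.d. DGP with a skewed moment function (all names are ours); it checks that the EEL implied probabilities satisfy the constraints in the sample and that the implied-probability weighted mean of h(X_i) coincides with the sample mean of h minus the fitted part of its regression on ψ_i:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = rng.exponential(1.0, size=(n, 2))       # toy data
psi = X - 1.0                               # psi_i = X_i - g(theta_0), E[psi_i] = 0 under H0
h = X[:, 0] ** 2                            # any integrable scalar function h(X_i)

psibar = psi.mean(axis=0)
dev = psi - psibar
S = dev.T @ dev / n                         # centered second moment of psi

# EEL (lambda = -1) implied probabilities, closed form:
pi_eel = (1.0 - dev @ np.linalg.solve(S, psibar)) / n

# Control-variable estimator: mean of h minus fitted part of the regression of h on psi.
beta = np.linalg.solve(S, dev.T @ (h - h.mean()) / n)
est_cv = h.mean() - beta @ psibar

# Identity check: it coincides with the EEL implied-probability weighted mean.
est_eel = pi_eel @ h
```

The two estimators agree exactly in finite samples, which is the "exact in the EEL case" statement below; for other Cressie-Read powers the agreement is only asymptotic.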
The use of V_n^(0) is in the line of score testing with Empirical Likelihood (EL) as studied by Guggenberger and Smith (2005), since the minimization of (5) in the limit case λ → 0 amounts to the maximization of the empirical likelihood Σ_{i=1}^n log(π_i). The control variable interpretation above allows us to interpret these constrained estimators of the variance matrix as the result of a kind of shrinkage, namely replacing the cross products of components of ψ_i := X_i − g(θ₀) by the residual of their regression on the moment function. This interpretation is exact in the EEL case and asymptotic in the EL case (and in all Cressie-Read cases as well).
Note also that the implied probabilities in the EEL case may take negative values in finite samples. This may be an issue for positive definite estimation of the variance matrix. Antoine et al. (2007) have proposed an additional shrinkage step to get rid of this nonpositivity issue by considering suitably modified implied probabilities. They show that this additional shrinkage does not prevent one from reaching the semiparametric efficiency bound under the null. We will denote by V_n^(−1,p) the variance estimator associated with these modified probabilities. In all cases, we end up with an estimator of the variance matrix that is asymptotically equivalent under the null to

Ṽ_n − Σ_{l=1}^K [V^{−1} ψ̄_n]_l Λ_l,    (14)

where, for l = 1, 2, …, K, Λ_l is the square symmetric matrix of size K with coefficients Cov[ψ_ih ψ_ik, ψ_il], for h, k = 1, 2, …, K. Of course, in practice, this expression is infeasible and the population moments Λ_l, for l = 1, …, K, and V should be replaced by their sample counterparts. However, all these estimators would be asymptotically equivalent and, therefore, their differences are immaterial for us, as will be shown explicitly in the next section. Note that, as residuals of regressions on the moment function, these estimators are actually shrunk by comparison with the naive sample variance. Also see Brown and Newey (1998) and Antoine et al. (2007) for related discussion in the context of semiparametric efficiency.
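The relation between the implied-probability variance estimator and the skewness-correction form (14) can be made concrete. In the following numpy sketch (a toy skewed DGP of our choosing, with sample counterparts replacing the population moments Λ_l and V), the Euclidean estimator coincides exactly with the sample counterpart of (14):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 300, 2
psi = rng.exponential(1.0, size=(n, K)) - 1.0   # skewed moment function, E[psi] = 0 under H0
psibar = psi.mean(axis=0)
dev = psi - psibar
S = dev.T @ dev / n
w = np.linalg.solve(S, psibar)                  # sample counterpart of V^{-1} psibar

# EEL implied probabilities and the associated variance estimator (9).
pi_eel = (1.0 - dev @ w) / n
V_eel = (psi * pi_eel[:, None]).T @ psi

# Skewness-correction form: V_eel = Vtilde - sum_l w[l] * Lambda_l,
# with Lambda_l[h, k] the sample counterpart of Cov(psi_h psi_k, psi_l).
Vtilde = psi.T @ psi / n
V_corr = Vtilde.copy()
for l in range(K):
    Lambda_l = (psi * dev[:, l][:, None]).T @ psi / n   # mean of (psi_il - bar) psi_ih psi_ik
    V_corr -= Lambda_l * w[l]
```

In this Euclidean case the identity is exact by construction; with other Cressie-Read powers it would hold only up to asymptotically negligible terms.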
We refer the reader to the companion paper Chaudhuri and Renault (2011) for the statement of the regularity conditions that make all Cressie-Read variance estimators asymptotically equivalent. In particular, it may require the existence of the eighth moment of ψ_i. In this paper, we maintain this asymptotic equivalence as a high level assumption: Asymptotic Equivalence of Cressie-Read Estimators.
A comparison of (14) and (15) clearly shows that the proposed improvement for the estimation of the asymptotic variance matrix will matter when the moment function displays some kind of multivariate skewness. Of course, as exemplified by the recent literature on heteroskedasticity and autocorrelation consistent (HAC) estimation (see, for example, Andrews, 1991, and Sun et al., 2008), improving the variance estimator does not necessarily improve the finite-sample performance of inference. However, the next section of the paper will confirm that our proposed (i.e., the control-variable) improvement matters for testing of hypotheses. Even though the focus of our interest is more on the size of the test, we suspect (see also Chaudhuri and Renault, 2011) that finite-sample improvements would also be noteworthy for power. It must be kept in mind that, by contrast with Hall (2000), our focus of interest is not the power of over-identification tests but improved inference on the "structural" parameters.

THEORETICAL ANALYSIS
The distribution of the score test statistic S_n defined in the Introduction obviously depends upon the choice of an estimator V_n of the unknown variance matrix V of the moment function ψ_i.
The score statistic, as a function of V_n, is defined as

S_n(V_n) := n ψ̄_n′ V_n^{−1} G (G′ V_n^{−1} G)^{−1} G′ V_n^{−1} ψ̄_n.

The dependence on V_n is made explicit because the theme of the paper is the choice of V_n. We will assume throughout that our estimator sequence (V_n) is such that Assumption 1 holds.

Assumption 1.
(i) For all n sufficiently large, V_n is, with probability one, a positive definite matrix of size K.
(ii) √n (ψ̄_n′, Vec(V_n − V)′)′ is asymptotically normal under the null hypothesis.
Note that this assumption will be fulfilled for all our estimators of interest when V is positive definite and a central limit theorem is valid for (ψ̄_n′, Vec(V_n)′)′.
For our results, we will actually need to maintain a stronger assumption. To see this, it is useful to introduce the following two moment functions that define what Sowell (1996) had dubbed, respectively, the identifying and the overidentifying restrictions:

P(G) Z_1n and M(G) Z_1n, where Z_1n := √n ψ̄_n, P(G) := G (G′ V^{−1} G)^{−1} G′ V^{−1} and M(G) := Id_K − P(G).

Note that the infeasible score test statistic is nothing but S_n(V) = (P(G)Z_1n)′ V^{−1} (P(G)Z_1n). Hence the critical value of the score test will be defined by a quantile of (P(G)Z_1)′ V^{−1} (P(G)Z_1), where Z_1 is a Gaussian vector equal in distribution to the limit distribution of Z_1n under the null. For the sake of a higher order asymptotic assessment of the size of the test, we will need to maintain an additional regularity condition for such a quantile.
Note that the assumption CLT(a) supersedes Assumption 1. Besides the central limit theorem, we need the convergence of a few additional conditional moments; the last one is actually required only in the overidentified case. The first and last limit values, a and 0 respectively, are implied by the conditioning event P(G)Z_1n = a and by the fact that Z_2 − CZ_1 is independent of Z_1, since Z_1 and Z_2 are jointly Gaussian. Goggin (1994) gives some sufficient conditions for the convergence in distribution of such conditional distributions. However, this convergence in distribution does not imply almost sure convergence. This is the reason why we do not want to assume the validity of the limits for almost all a ∈ H but only for the given quantile at play in the proposed score test procedure.
Our assumption is more akin to assuming the following. First, the sequence of conditional distributions of (Z 1n , Z 2n ) given P(G)Z 1n = a converges weakly towards the normal conditional distribution of (Z 1 , Z 2 ) given P(G)Z 1 = a. Second, some specific moments converge accordingly.
The required convergence in distribution is germane to an assumption of stable convergence in law (see, for example, Jacod and Shiryaev, 2003). The convergence of the specific moments would come with suitable uniform integrability conditions.
The quantile a of interest will actually be defined as a = G (G′ V^{−1} G)^{−1/2} τ from a quantile τ of t_n, the vector of coefficients of the columns of G (G′ V^{−1} G)^{−1/2} in P(G)Z_1n:

t_n := (G′ V^{−1} G)^{−1/2} G′ V^{−1} Z_1n.

Of course, in practice and in this paper, τ will be defined as a quantile of the distribution of a standard normal vector.
The key idea is to use an approximation procedure derived in Cavanagh (1983, Lemma A1, Chapter 2) (also see equations A1 and A3 in Rothenberg, 1988). We will start from an expansion

t_n(V_n) = t_n + (1/√n) B_n(V_n) + o_p(1/√n),

and use Cavanagh's result to claim that t_n(V_n) admits the same Edgeworth expansion to order (1/√n) as the variable

t_n + (1/√n) E[B_n(V_n) | t_n].

For the sake of expositional simplicity, it is convenient to set the focus on the case where t_n(V_n), and hence t_n, is a real random variable, that is, dim(θ) = p = 1. This condition will be maintained throughout even though the approach is more generally valid, at the price of heavier notations. Note that the one-parameter setting allows us to even consider one-sided alternatives, so that the focus of our interest will be an asymptotic approximation of probabilities like P[t_n(V_n) ≤ τ]. Using the Cavanagh-Rothenberg-type approximation, we will get the expansion

P[t_n(V_n) ≤ τ] = P[t_n ≤ τ] − (1/√n) E[B_n(V_n) | t_n = τ] f_n(τ) + o(1/√n),

where f_n(·) stands for the probability density function of t_n. We first prove the following lemma.
Lemma 3.1. Under Assumption 1, the bias term admits the decomposition B_n(V_n) = B_1n(V_n) + B_2n(V_n), where Z̃_2n(V_n) stands for the K-dimensional random square matrix such that Vec(Z̃_2n(V_n)) = Z_2n(V_n).
Note that Cavanagh's approach can be applied since B_n(V_n) is a smooth function of the asymptotically Gaussian vector Z_n(V_n). It depends on our estimator V_n of V through the random matrix Z̃_2n(V_n). The focus of our interest is to devise choices of V_n such that the first term in the expansion is equal to zero, that is, E[B_n(V_n) | t_n = τ] = o(1). In order to do that, we will maintain the following assumption.
We are then able to prove our main result.
Theorem 3.2. Under Assumptions 1 and Edg(τ), and under the null hypothesis H₀, the following conditions hold, where, for l = 1, 2, …, K, Λ_l is the square symmetric matrix of size K with coefficients Cov[ψ_ih ψ_ik, ψ_il], for h, k = 1, 2, …, K; (iii) more generally, the analogous result holds for the Cressie-Read family.

Remarks.
(i) It is obvious from the expression of the matrices Λ_l, for l = 1, …, K, that what makes the first order bias E[B_n(V_n) | t_n = τ] non-negligible in general, when V_n = Ṽ_n or V_n = Ṽ_n^Center, is the nonzero (multivariate) skewness of the moment function. The intuition is actually quite clear from the formula of the bias in Lemma 3.1. In both terms, B_1n and B_2n, the bias comes from the asymptotic correlation between the error Z_2n in the estimation of the variance matrix V and the moment function Z_1n = √n ψ̄_n. This correlation is typically akin to multivariate skewness in the moment function. (ii) In the just-identified case, M(G) = 0 and hence B_2n(V_n) = 0 for all choices of V_n.
In addition, it is this distortion due to the skewness of the moment vector under H₀ that is being refined when one instead uses V_n = V_n^(−1) or V_n = V_n^(−1,p) or, equivalently up to the same order, some Cressie-Read family estimator V_n^(λ). Of course, as stated above, refinements are also obtained in overidentified cases. The key intuition is provided by Lemma 3.1 jointly with formula (14) above. The proposed improvement in variance estimation, based on the residuals of a regression on the moment function (according to the control variable interpretation), has gotten rid of the perverse correlation that is produced by multivariate skewness.
(iii) We stress that such refinements do not correct for all the skewness-related errors of approximation of the exact distribution of the test statistic. For example, in the case p = K = 1, as can be seen from a formal Edgeworth expansion of the infeasible statistic t_n, the effect of skewness is still present in the first-order approximation error of its exact distribution. To see this, note that, under the normalization E[ψ²] = 1 and the assumptions E[|ψ|³] < ∞ and sup_{|s|≥δ} |E[exp(i s ψ)]| < 1 for all δ > 0 (Cramér's condition), with i = √−1,

P[t_n ≤ x] = Φ(x) − (E[ψ³]/(6√n)) (x² − 1) φ(x) + o(1/√n),

where φ(·) and Φ(·) are, respectively, the pdf and cdf of a N(0, 1) distribution. Other existing methods of modifying the t-ratio without questioning the standard critical value, such as those proposed by Johnson (1978), Lyon et al. (1999), and Yanagihara and Yuan (2005), share with our approach a similar lack of complete skewness correction. It takes resampling methods, like the bootstrap, to remove completely (up to order o_p(1/√n)) the perverse effect of skewness in finite samples by modifying the critical value itself. This observation is confirmed by our Monte Carlo results in the next section. (iv) More generally, our approach does not really try to make the behavior of the test statistic S_n(V_n) the closest possible to χ²(p) (or, equivalently, t_n(V_n) the closest possible to normal) under the null, but rather the closest possible to the infeasible S_n(V). In this respect, our approach can be seen as an extension of the work of Rothenberg (1988) to nonlinear settings in overidentified models, although our focus is rather on size than on power. Extension of the results to √n-local alternatives is straightforward for parts (i) and (ii) of the theorem. For part (iii), it is also possible because the conditions in Chaudhuri and Renault (2011) allow for that.
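The residual effect of skewness on the infeasible statistic can be checked by simulation. The sketch below (our own toy choice: ψ = Exp(1) − 1, so E[ψ²] = 1 and E[ψ³] = 2, with the known variance V = 1) compares the empirical distribution of t_n with the plain normal approximation and with the first-order skewness-corrected Edgeworth term for the standardized mean:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def Phi(x):  # N(0,1) cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):  # N(0,1) pdf
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

rng = np.random.default_rng(5)
n, R, x = 100, 60_000, 1.645
gamma3 = 2.0   # E[psi^3] for psi = Exp(1) - 1 (already normalized: E[psi^2] = 1)

# Infeasible t_n = sqrt(n) * psibar with known V = 1, simulated R times.
t = sqrt(n) * (rng.exponential(1.0, size=(R, n)).mean(axis=1) - 1.0)
emp = (t <= x).mean()

# First-order Edgeworth approximation with the skewness term.
edgeworth = Phi(x) - (gamma3 / (6.0 * sqrt(n))) * (x * x - 1.0) * phi(x)
```

The empirical probability sits visibly below Φ(1.645) = 0.95 and close to the skewness-corrected value, illustrating that the O(1/√n) skewness term is genuinely present in the infeasible statistic.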
(v) As already mentioned in the Introduction, the proposed refinements are in the spirit of the Bartlett correction and differ fundamentally from the other strand of the literature that seeks to refine the critical value of the test, for instance by resampling. While it has been shown that empirical likelihood is Bartlett-correctable (DiCiccio et al., 1991) while other empirical discrepancy statistics are not in general (Corcoran, 1998), we circumvent this difficulty by focusing directly on implied probabilities (for the improvement of variance estimation) and not on estimators provided by the minimization of Cressie-Read discrepancies. (vi) The key point is that, even though the first order conditions of efficient GMM amount to picking a subset of just-identified moment conditions, the efficient estimation of the variance matrix under the null takes advantage of the whole set of moment conditions. In fact, simulation results reported below show that the empirical size can be made closer to the nominal level (based on the first-order asymptotics) by the use of the score statistics that involve, respectively, the modified estimators V_n^(−1) (i.e., EEL) and V_n^(0) (i.e., EL) of the asymptotic variance.
(vii) The higher order asymptotics derived in this paper had not been explicitly studied by the extant literature on the higher order improvements of GMM provided by empirical likelihood (see, for example, Newey and Smith, 2004, or Guggenberger and Smith, 2005). Since we extend the use of the Cavanagh-Rothenberg-type approximations, we set the focus on the conditional expectations E[B_n | t_n = τ] of the bias terms rather than just on their unconditional behavior.

COVARIANCE STRUCTURE MODEL: A MONTE-CARLO EXPERIMENT
In this section we demonstrate the improvement in the finite-sample behavior of the score tests, in terms of closeness of size in finite samples to the nominal level, due to the use of the modified estimators V_n^(−1) (EEL) and V_n^(0) (EL) instead of the naive estimator Ṽ_n^Center of the asymptotic variance V. (Results with Ṽ_n are similar to those with Ṽ_n^Center.) We also demonstrate that such improvements may often be comparable to those obtained by bootstrap.
Estimated sizes of the score tests of H₀: θ = θ₀ against H₁: θ > θ₀, H₂: θ < θ₀ and H₃: θ ≠ θ₀ at the 5% and 10% nominal levels are reported in Tables 1 and 2, respectively, for sample sizes 500, 1,000 and 5,000. For the sake of brevity, and to keep the ratio of the number of moment restrictions to observations relatively low, results are reported only for J = 4, 8 (i.e., k = 7, 15), based on 5,000 Monte-Carlo trials.
As would be expected from Theorem 3.2(i), score tests based on the naive variance estimator Ṽ_n^Center are heavily size-distorted in small samples. The size distortion increases with the skewness and kurtosis of the underlying distribution of the moment vector (due to the choice of the distributions: uniform, normal, t, and exponential, progressively).
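A stripped-down version of such a size experiment can be sketched as follows. This is NOT the paper's covariance-structure design: it is a toy linear model with exponential (skewed) moments of our own choosing, comparing the empirical size of the score test under the centered naive estimator and under the EEL-shrunk estimator:

```python
import numpy as np

rng = np.random.default_rng(6)
n, R, K = 100, 2000, 2
G = -np.array([[1.0], [1.0]])        # known Jacobian of a toy linear g(theta) = (theta, theta)'

def score_stat(psibar, V, nobs):
    Vi = np.linalg.inv(V)
    A = Vi @ G @ np.linalg.inv(G.T @ Vi @ G) @ G.T @ Vi
    return nobs * psibar @ A @ psibar

rej_naive = rej_eel = 0
for _ in range(R):
    psi = rng.exponential(1.0, size=(n, K)) - 1.0   # skewed moments, E[psi] = 0 under H0
    psibar = psi.mean(axis=0)
    dev = psi - psibar
    S = dev.T @ dev / n                             # centered naive variance estimator
    # EEL implied probabilities and the shrunk variance estimator V_n^(-1)
    piw = (1.0 - dev @ np.linalg.solve(S, psibar)) / n
    V_eel = (psi * piw[:, None]).T @ psi
    if score_stat(psibar, S, n) > 3.841:            # chi2(1) critical value at 5%
        rej_naive += 1
    if score_stat(psibar, V_eel, n) > 3.841:
        rej_eel += 1
rate_naive, rate_eel = rej_naive / R, rej_eel / R
```

Comparing `rate_naive` and `rate_eel` with the 5% nominal level in such skewed designs is the spirit of Tables 1 and 2; the exact magnitudes of the distortions in this toy setup need not match those reported in the paper.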
On the other hand, supporting Theorem 3.2(ii) and (iii), the use of the modified estimators of variance corrects the size distortion substantially in all cases, although in the exponential case such correction may not be deemed sufficient, at least against some alternatives.

CONCLUSION

In this paper, we consider situations where the Jacobian of the moment function is known under the null and is of full column rank. The need for finite-sample improvements only comes from the sampling uncertainty in the asymptotic variance of the moment function. It has been well documented (see, for example, Altonji and Segal, 1996, for the estimation of covariance structures) that a perverse correlation between the estimated asymptotic variance and the moment function is responsible for serious finite-sample bias in estimation and inference. Horowitz (1998) proposed a bootstrap approach for improved estimation and inference in covariance structure models.
The originality of the current paper is to propose a battery of possible adjustment procedures for score statistics that do not resort to any resampling strategy. The key idea is close to the well-known Bartlett adjustment, a correction directly on the test statistic itself without modifying the usual chi square-based critical value. Moreover, this adjustment pertains to the information theoretical methodology since it is obtained by using the implied probabilities derived from the minimization of any generalized entropy function.
While we provide evidence of improved performance of the score test, both by closed form higher order expansions and by Monte-Carlo experiments, this possibility of improvement may look at odds with the known impossibility results in the literature, at least for the generalized entropy approaches different from empirical likelihood. Corcoran (1998) had shown that the Bartlett adjustments put forward by DiCiccio et al. (1991) for empirical likelihood cannot be extended to general Cressie-Read empirical discrepancies. Newey and Smith (2004) had shown that, in case of skewness in the moment function, only the empirical likelihood (and no other Cressie-Read empirical discrepancy) takes care of the efficient estimation of the asymptotic variance matrix. We actually circumvent these impossibility results by assuming that there are no unknown parameters under the null; then, the implied probabilities can be used to improve the variance estimation, irrespective of the user's preferred generalized entropy function. This is important since it allows, in particular, the use of the Euclidean likelihood, which is more user-friendly both for numerical and analytical computations and for interpretation (connection with continuously updated GMM). In this respect, our paper extends the work of Antoine et al. (2007) to issues related to testing of hypotheses. Their strategy of 3-step Euclidean likelihood based estimation could be used for the more general case where some nuisance parameters, other than the asymptotic variance matrix, would remain unknown under the null. Our paper can also be seen as an extension of the classical work of Rothenberg (1988) to nonlinear settings in over-identified models. Of course, much remains to be done for the sake of finite-sample improvements in GMM inference.
To the best of our knowledge, the higher order expansions of power functions that we derive in this paper have not yet been obtained in more general circumstances with unknown Jacobian matrix and/or weak identification issues.