Robust Inference for Inverse Stochastic Dominance

The notion of inverse stochastic dominance is gaining increasing support in risk, inequality, and welfare analysis as a criterion for ranking distributions that is an alternative to the standard stochastic dominance approach. Its implementation rests on comparisons of two distributions’ quantile functions, or of their multiple partial integrals, at fixed population proportions. This article develops a novel statistical inference model for inverse stochastic dominance that is based on the influence function approach. The proposed method allows model-free evaluations that are only mildly affected by contamination in the data. Asymptotic normality of the estimators makes it possible to derive tests for the restrictions implied by various forms of inverse stochastic dominance. Monte Carlo experiments and an application illustrate the qualities of the influence function estimator when compared with alternative dominance criteria.


INTRODUCTION
One objective of social welfare analysis is to define and implement criteria to rank distributions representing, for instance, risky prospects or realizations of income, consumption, and wealth. The major challenge is to define dominance criteria that are robust to the choice of the evaluation method adopted to assess and compare the degree of risk or inequality implied by the distributions. Motivated by welfare concerns, the literature has brought about the notion of standard stochastic dominance, which has become a workhorse in empirical distribution analysis as statistical inference techniques have been made increasingly available (Bishop, Chakraborti, and Thistle 1989; Anderson 1996; Davidson and Duclos 2000; Barrett and Donald 2003; Linton, Maasoumi, and Whang 2005).
The notion of inverse stochastic dominance (ISD) introduced by Muliere and Scarsini (1989) identifies criteria for ranking distributions that are distinct from standard stochastic dominance. ISD is linked to the rank-dependent (nonexpected) utility approach pioneered by Yaari (1987), which resolves important paradoxes in decision theory (Quiggin 1993), and has natural applications in finance (Wang and Young 1998), as well as in inequality (Yaari 1988; Zoli 2002) and welfare analysis (Sen 1974; Aaberge 2009; Aaberge, Havnes, and Mogstad 2013). Connections with ISD are also found in program evaluation studies based on quantile regression methods (Firpo, Fortin, and Lemieux 2009; Andreoli, Havnes, and Lefranc 2014).
The most demanding notion of ISD, denoted ISD at order one or rank dominance (Saposnik 1981), holds whenever Pen's Parade chart (i.e., the quantile function graph) of the dominant distribution lies nowhere below, and at some point above, the corresponding chart of the dominated distribution. Weaker criteria of ISD involve comparisons of multiple partial integrals of the distributions' quantile functions at every abscissa corresponding to a population proportion. For instance, ISD at order two is implemented by comparing the integrals of the quantile functions, taken over the space of population proportions.
These give the generalized Lorenz curves of the distributions (Gastwirth 1971), with all the welfare implications that generalized Lorenz dominance bears (Kolm 1969; Atkinson 1970; Shorrocks 1983). Further integrations yield conditions for even weaker forms of ISD, which guarantee robust, yet more conclusive, welfare rankings of the distributions. Despite the economic and statistical advantages of ISD over standard stochastic dominance, ISD has found limited use in empirical analysis, as a consistent statistical framework for testing ISD is still missing.
This article develops a statistical inference model for testing ISD at higher orders. The methodology consists in calculating the ordinates of the recursive integrals of the generalized Lorenz curves for the full set of abscissae implied by the empirical distribution functions, and then using these estimates to compute tractable formulations of the covariances between ordinates of these curves. The implementation of this approach is based on the linear decomposition of the ISD estimators into their influence functions, which measure the extent to which the estimator is influenced by an infinitesimal amount of contamination in the data (Cowell and Victoria-Feser 2002; Barrett and Donald 2009). This is done by (i) linearly decomposing the sample estimator of the multiple partial integrals of the generalized Lorenz curves evaluated at various population proportions into the corresponding influence functions; (ii) retrieving an empirically tractable estimator of these influence functions; and (iii) obtaining analytical expressions of the underlying asymptotic covariances, which depend on the estimated influence functions. Standard algebra shows that asymptotic normality holds at the $\sqrt{n}$ convergence rate. Thus, Wald-type joint test statistics can be used to test the hypotheses of equality, ISD, and lack of dominance at any order.
There are proposals of estimators for ISD at order one and two that are model-free (Beach and Davidson 1983; Bishop, Chakraborti, and Thistle 1989) and/or robust to the complex design of the sample (Kovacevic and Binder 1997; Zheng 1999; Zheng 2002). Aaberge (2006) proposed related estimators in a model-dependent context. The influence function approach yields, instead, model-free estimators for ISD at any order. For instance, the covariance estimators for quantile and generalized Lorenz curves proposed by Beach and Davidson (1983) are shown to coincide with the influence function estimators for ISD at order one and two, respectively, if simple random sampling is assumed.
The ISD criteria at order one and two yield the same welfare implications as first-degree and second-degree stochastic dominance. The inference strategies proposed in the literature for these forms of standard stochastic dominance rest, nevertheless, on comparisons at fixed population proportions (Beach, Davidson, and Slotsve 1994; Abadie 2002; Barrett, Donald, and Bhattacharya 2014), as required by ISD implementation.
To rank distributions when their generalized Lorenz curves cross, it is wise to choose criteria that are finer than generalized Lorenz dominance, but still consistent with it, that is, that are implied by ISD at order two. Extending comparisons to ISD at order three and above is an interesting possibility, since the order of ISD identifies restrictions on the class of rank-dependent evaluation functions upon which welfare dominance is evaluated (Maccheroni, Muliere, and Zoli 2005). Refinements of standard stochastic dominance (Fishburn 1976; Fishburn and Vickson 1978; Le Breton and Peluso 2009) yield welfare implications that are distinct, although equally valid from a normative standpoint, from those of ISD.
Empirical analysis seems, nonetheless, to advocate for the use of the ISD criterion. In fact, when two generalized Lorenz curves cross, the empirical practice is to give priority to welfare considerations implied by one or a few inequality indicators, rather than resorting to finer dominance criteria. Almost ubiquitously, the focus is on the Gini inequality coefficient and/or on its single-parameter S-Gini extensions (Weymark 1981; Donaldson and Weymark 1983), which are strictly connected to the generalized Lorenz curve (Yitzhaki 1982) and for which asymptotic results are available (Barrett and Pendakur 1995; Barrett and Donald 2009). The ranking of distributions based upon the Gini index, however, is not necessarily consistent with standard stochastic dominance at orders higher than the second (Newbery 1970; Dardanoni and Lambert 1988). Rather, if the generalized Lorenz curve of the distribution with higher Gini coefficient intersects the generalized Lorenz curve of another distribution from above, it is always the case that the former inverse stochastically dominates the latter at order three (Zoli 1999).
Additional motivations supporting the ISD model are illustrated in the rest of the article, which is organized as follows. The normative background of ISD is examined in Section 2. The asymptotic results for the influence function approach for ISD are developed in Section 3, where it is also shown that, under reasonable assumptions, the influence function estimates are bounded. The proposed estimator is therefore robust, in the sense that its value cannot drastically change with an infinitesimal amount of contamination in the data (Cowell and Victoria-Feser 2002). Extensions to the complex survey design setting are discussed in Section 3.4. Section 4 develops joint tests for various forms of ISD. The properties of these tests are investigated in the context of a Monte Carlo study (Section 5), based on parametric models for distributions of disposable income and durable consumption in the U.S., and in an empirical evaluation of equality of opportunity in France (Section 6), where higher order ISD relations are tested to uncover patterns of unfair advantage in labor market outcomes of workers with heterogeneous backgrounds of origin. In both assessments, the performance of the influence function estimator is compared with that of the bootstrap estimator for ISD and with standard stochastic dominance tests (Davidson and Duclos 2000; Barrett and Donald 2003). The simulation study, in particular, supports the use of the influence function estimator for ISD as the most powerful testing criterion when the sample size is not too small, and especially when data are contaminated. Section 7 concludes.

ROBUST WELFARE ANALYSIS AND ISD
Let $Y$ be a random variable with cumulative distribution function (c.d.f.) $F$ and inverse (quantile) distribution function $F^{-1}(p) = \inf\{y \in \mathbb{R}_+ : F(y) \ge p\}$, for $p \in [0, 1]$. The distribution $F$ represents, for instance, the distribution of income in the population. Following Gastwirth (1971), the integral function $GL^F(p) = \int_0^p F^{-1}(t)\,dt$ defines the generalized Lorenz curve of $F$. Let $\Lambda^F_1(p) = F^{-1}(p)$ and $\Lambda^F_k(p) = \int_0^p \Lambda^F_{k-1}(t)\,dt$ for $k \ge 2$, so that $\Lambda^F_2 = GL^F$. Integration by parts reveals that $\Lambda^F_k$ is a linear transformation of the distribution's quantiles:

$$\Lambda^F_k(p) = \frac{1}{(k-2)!}\int_0^p (p-t)^{k-2}\,F^{-1}(t)\,dt, \quad k \ge 2. \qquad (1)$$

The relations between the quantile function ($k = 1$) and its recursive integrals at orders $k = 2$ and $k = 3$ are represented in Figure 1. At order one, the so-called Pen's Parade "of dwarfs and giants incomes" of $F$ gives the level of income attained by the poorest $100p\%$ of the population. At order two, the generalized Lorenz curve coordinates correspond to the expected income attained by the poorest $100p\%$. At order three, the integral of the generalized Lorenz curve trades off the expected income attained by the poorest $100p\%$ against income inequality. As the order of integration grows, more weight is given to inequality among the poor. Muliere and Scarsini (1989) defined a situation where the graph of $\Lambda^F_k$ lies nowhere below the graph of $\Lambda^G_k$, that is, $\Lambda^F_k(p) \ge \Lambda^G_k(p)$, $\forall p \in [0, 1]$, as $F$ inverse stochastic dominates $G$ at order $k$, denoted $F$ ISDk $G$. The ISD1 criterion stands for rank dominance (Saposnik 1981), while ISD2 denotes generalized Lorenz dominance (Shorrocks 1983). Although ISD at orders one and two has the same normative implications as standard first- and second-degree stochastic dominance, the two criteria do not coincide at higher orders of dominance.

Figure 1. The quantile function curve and the generalized Lorenz curve, along with its integrals. Gray dots on the curve in panel (b) correspond to the value of the integrals of the curves in panel (a), computed at different population shares $p_1$, $p_2$, and $p_3$.
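As an illustration of the recursive integrals of the quantile function referred to in (1), the following minimal sketch computes their ordinates for an equally weighted sample and compares two samples on a grid of population proportions. It assumes the standard repeated-integration identity, under which the order-$k$ integral equals $\frac{1}{(k-2)!}\int_0^p (p-t)^{k-2}F^{-1}(t)\,dt$, and it integrates the step-shaped empirical quantile function exactly. The names `lambda_k` and `isd_on_grid` are hypothetical, and the comparison is purely descriptive: it is not a statistical test.

```python
from math import factorial

def lambda_k(sample, p, k):
    """Ordinate at p of the order-k recursive integral of the empirical
    quantile function (k=1: quantile, k=2: generalized Lorenz ordinate)."""
    ys = sorted(sample)
    n = len(ys)
    if k == 1:
        # Smallest order statistic y_i whose cumulative share i/n reaches p.
        for i, y in enumerate(ys, start=1):
            if i / n >= p - 1e-12:
                return y
        return ys[-1]
    total = 0.0
    for i, y in enumerate(ys, start=1):
        a, b = (i - 1) / n, min(i / n, p)   # step of the quantile function
        if b <= a:
            break
        # Exact integral of (p - t)^(k-2) over [a, b], times y_i.
        total += y * ((p - a) ** (k - 1) - (p - b) ** (k - 1))
    return total / factorial(k - 1)

def isd_on_grid(f_sample, g_sample, k, grid):
    """True if the order-k ordinates of F are nowhere below those of G on the grid."""
    return all(lambda_k(f_sample, p, k) >= lambda_k(g_sample, p, k) for p in grid)

# Hypothetical incomes: the generalized Lorenz curves cross, yet F is ahead at order three.
F = [2.0, 2.0, 2.0, 2.0]
G = [1.0, 1.0, 1.0, 6.0]
grid = [0.25, 0.5, 0.75, 1.0]
print(isd_on_grid(F, G, 2, grid), isd_on_grid(F, G, 3, grid))
```

In this example the egalitarian sample has the lower mean, so its generalized Lorenz curve ends below that of the unequal sample, while the order-three ordinates still rank it first at every grid point.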
The ISD relation induces a partial order of distributions parameterized by $k$, which has a normative appeal for social welfare analysis. It is in fact related to Yaari's (1987) rank-dependent representation of social evaluation functions (SEF). Any SEF $W$ in the broadest class of rank-dependent SEF, denoted $\mathcal{R}$, can be written as a weighted average of realizations:

$$W(F) = \int_0^1 w(p)\,F^{-1}(p)\,dp.$$

When $w(p) = 1$ for all $p$, the SEF is simply the expectation of $F$. Otherwise, $w(p)$ provides a distortion of the probability $p$ of observing an income lower than $F^{-1}(p)$, thus incorporating value judgements about the role of low and high income realizations on overall welfare. Restrictions on these evaluations have been introduced in the form of assumptions on the alternate sign of high-order derivatives of $w(p)$ (Maccheroni, Muliere, and Zoli 2005; Aaberge 2009). The order of these derivatives, denoted by an integer $k \ge 2$, defines subsets of $\mathcal{R}$. So, if $\mathcal{R}_k$ gathers all SEF in $\mathcal{R}$ where $k$ assumptions have been made on the sign of the derivatives of the weighting function up to order $k$, and $\mathcal{R}_l$ is instead obtained by making $l > k$ assumptions on the first $l$ derivatives of the weighting function, then $\mathcal{R}_l \subset \mathcal{R}_k \subset \mathcal{R}$. Muliere and Scarsini (1989) provided the normative foundation for ISD, by showing that $F$ ISDk $G$ if and only if $W(F) \ge W(G)$ for all $W \in \mathcal{R}_k$. Since ISDk refines ISD2, ISDk is always consistent with generalized Lorenz dominance and $F$ ISDk $G$ implies $F$ ISDl $G$, for all $l > k$, but not the reverse. As a consequence, when GL curves cross and second-degree stochastic dominance is rejected, refinements are still possible by studying ISD3, which gathers agreement about the preferred distribution among all SEF in $\mathcal{R}_3$.
The ISD3 criterion is also coherent with the empirical practice, which, in the presence of intersecting GL curves, seems to focus on ordering distributions through the Gini inequality coefficient. The Gini coefficient is a member of a particular family of SEF in $\mathcal{R}_k$, the Generalized Single Parameter S-Gini SEF (Donaldson and Weymark 1983). Denote $W_k(p, F)$ as one of these SEF, measuring social welfare for the poorest $100p\%$ of the population in $F$, such that

$$W_k(p, F) = k\int_0^p (p-t)^{k-1}\,F^{-1}(t)\,dt. \qquad (2)$$

The SEF related to the Gini coefficient is simply $W_2(1, F)$ (Yaari 1988). Hence, on the one hand, $F$ ISD3 $G$ implies that the Gini coefficient of $F$ is smaller than that of $G$ (when both distributions have the same mean income), while, on the other hand, Gini coefficients computed at every proportion $p$ of the poorest population in both distributions can be used to establish $F$ ISD3 $G$ (Zoli 1999). Using the fact that $W_{k-1}(p, F)$ in (2) is proportional to (1) by a factor $1/(k-1)!$, it is possible to show that the relation between inequality measurement and ISD extends to any order $k \ge 3$.
This result reveals a clear parallel between the way in which comparisons of ISD and of standard stochastic dominance at orders three and above are implemented, although the two concepts remain distinct beyond order two. In fact, standard stochastic dominance is implemented by checking that, at every poverty line, the dominant distribution displays less poverty (as measured by the Foster, Greer, and Thorbecke (1984) poverty index) than the dominated one. Davidson and Duclos (2000) exploited poverty gaps from predetermined income thresholds to produce tests of stochastic dominance. In this vein, I propose tests for ISDk making use of estimates of $\Lambda^F_k$ and $\Lambda^G_k$, for any pair of distributions $F$ and $G$, at selected population proportions. Since the asymptotic results for the proposed estimator crucially depend upon its influence function decomposition, I call it the influence function (IF) estimator for ISDk.

The Influence Function (IF) Estimator
The derivation of the IF estimator relies on a key result in distributional theory. Denote by $T(\hat H)$ a scalar-valued functional of some empirical process $\hat H$ that is defined on $[0, 1]$. Using Hadamard differentiability of the linear functional $T$, it can be shown that

$$\sqrt{n}\,\big(T(\hat H) - T(H)\big) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} T\big(\phi_i(\cdot, H)\big) + o(1), \qquad (3)$$

where the iid random variables $\phi_i(\cdot, H)$ are referred to as the influence functions of $H$, and give the effect of an observation $i$ on the estimator $\hat H$ of the underlying process $H$. The value of the influence function at $p \in [0, 1]$ is $\phi_i(p, H)$ (for references, see Barrett and Donald 2009). Consider a sequence of realizations $y_1, \dots, y_i, \dots, y_n$ of $n$ independent random variables identically distributed as $F$. Denote the inverse of $F$ as $F^{-1}$ and its generalized Lorenz curve as $GL^F$. Their empirical counterparts are denoted as $\hat F$, $\hat F^{-1}$, and $\widehat{GL}^F$, respectively. The empirical counterpart of $\Lambda^F_k$ in (1) is denoted as $\hat\Lambda^F_k$. To avoid cumbersome notation, the explicit reference to $F$ in the superscript is dropped, unless disambiguation is needed.
The estimator $\hat\Lambda_k$ is tied to $\widehat{GL}$ by the following integral function:

$$\hat\Lambda_k(p) = \frac{1}{(k-3)!}\int_0^p (p-t)^{k-3}\,\widehat{GL}(t)\,dt, \quad k \ge 3. \qquad (4)$$

Since the generalized Lorenz curve is a continuous linear map defined on the space of population shares, it can always be decomposed into its influence functions as in (3). This follows from the fact that $GL$ is a linear transformation of the quantile function, so the process $\sqrt{n}\,(\widehat{GL}(p) - GL(p))$ consists in a linear transformation of Bahadur's (1966) representation of quantiles. By setting $\hat\Lambda_k(p) = T(\hat H)$ where $\hat H = \widehat{GL}$, the process $\sqrt{n}\,(\hat\Lambda_k(p) - \Lambda_k(p))$ can also be represented as a sum of iid variables plus a residual term that vanishes asymptotically:

$$\sqrt{n}\,\big(\hat\Lambda_k(p) - \Lambda_k(p)\big) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_i(p, \Lambda_k) + o(1), \quad \phi_i(p, \Lambda_k) = \frac{1}{(k-3)!}\int_0^p (p-t)^{k-3}\,\phi_i(t, GL)\,dt. \qquad (5)$$

The $n$ iid random variables $\phi_i(p, \Lambda_k)$ are the influence functions of $\Lambda_k(p)$. Their formula is implicitly given by (5). The influence function measures the impact that a given observation $i$ would have on the estimator if the realized income of $i$ were drawn from the true population distribution $F$ with probability $1 - \varepsilon$ and with probability $\varepsilon$ from a distribution $C^{(i)}(y)$, assigning a unit point mass to income $y_i$ (Cowell and Victoria-Feser 2002). The hypothetical contaminated distribution is denoted as $F^{(i)}_\varepsilon(y) = (1 - \varepsilon)F(y) + \varepsilon\,C^{(i)}(y)$, where $C^{(i)}(y) := \mathbb{1}(y \ge y_i)$ and $\mathbb{1}(\cdot)$ is the indicator function. The parameter $\varepsilon$ captures the importance of the contamination. For an infinitesimal amount of contamination, define

$$\phi_i(p, \Lambda_k) = \lim_{\varepsilon \to 0}\frac{\Lambda^{F^{(i)}_\varepsilon}_k(p) - \Lambda^F_k(p)}{\varepsilon}.$$

An estimator for a stochastic order is robust, that is, its value cannot drastically change with an infinitesimal amount of contamination in the data, if its influence function is bounded. Cowell and Victoria-Feser (1996) showed that poverty indicators, upon which standard stochastic dominance relations are constructed, are generally robust, while inequality indicators (comprising the Gini index) are generally not. I show that, under similar conditions, the ISDk estimator $\hat\Lambda_k(p)$ is robust.
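The contamination argument can be checked numerically. The sketch below is an illustration under stated assumptions rather than the estimator developed in this section: it mixes an equally weighted sample with a point mass of size `eps` at an income `z`, recomputes the order-$k$ integral of the quantile function by exact integration of the step-shaped quantile function, and rescales the change by `eps`, which approximates the influence function for small `eps`. The function names and the value of `eps` are hypothetical.

```python
from math import factorial

def lambda_k_weighted(values, masses, p, k):
    """Order-k (k >= 2) integral at p of the quantile function of a discrete
    distribution with the given support points and probability masses."""
    total, a = 0.0, 0.0
    for y, mass in sorted(zip(values, masses)):
        b = min(a + mass, p)
        if b > a:
            # Exact integral of (p - t)^(k-2) over the step [a, b], times y.
            total += y * ((p - a) ** (k - 1) - (p - b) ** (k - 1))
        a += mass
        if a >= p:
            break
    return total / factorial(k - 1)

def contamination_effect(sample, z, p, k, eps=1e-6):
    """Rescaled change in the order-k ordinate at p when a point mass eps
    is placed at income z (finite-eps analogue of the influence function)."""
    n = len(sample)
    base = lambda_k_weighted(sample, [1.0 / n] * n, p, k)
    mixed = lambda_k_weighted(list(sample) + [z],
                              [(1.0 - eps) / n] * n + [eps], p, k)
    return (mixed - base) / eps

sample = [1.0, 2.0, 3.0, 4.0]
for z in (0.0, 0.5, 100.0, 1e6):
    print(z, contamination_effect(sample, z, p=0.6, k=3))
```

For contaminating incomes above the quantile at `p` the effect no longer depends on `z`, and for nonnegative incomes it is bounded from below by its value at `z = 0`, which is the sense in which a support bounded from below keeps the influence function bounded.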
The condition that the support of $F$ is bounded from below is always satisfied in cases where, for instance, ISD is evaluated over income, wealth, or consumption distributions admitting nonnegative realizations. Analytical results reported in the supplemental Appendix show that the bias due to the contamination vanishes as $k$ grows large, implying that the asymptotic covariances of the influence function estimates are also bounded and measured with limited bias in the presence of contaminated data.
The next proposition establishes the asymptotic distributional behavior of the IF estimator for ISDk.
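Proposition 1. Suppose that the estimator $\hat\Lambda_k(p)$ for $k = 3, 4, \dots$ is obtained from a sample of size $n$ drawn from $F$, which is strictly monotonic and defined over a support that is bounded from below. Then $\sqrt{n}\,(\hat\Lambda_k(p) - \Lambda_k(p))$ converges in distribution to a normal distribution with mean zero and covariance kernel given by

$$\sigma_k(p, p') = E\big[\phi_i(p, \Lambda_k)\,\phi_i(p', \Lambda_k)\big].$$

Proof. Given that $\Lambda_k$ is a linear functional transformation of the quantile functions of $F$, the representation of quantiles by Bahadur (1966) directly implies that $\sqrt{n}\,(\hat\Lambda_k(p) - \Lambda_k(p))$ converges asymptotically to a sum of iid random variables with zero expectations, for any $p \in [0, 1]$. Hence, the central limit theorem applies. If $F$ is bounded from below, the influence functions of $\Lambda_k(p)$ are also bounded, and the process displays finite asymptotic covariance kernels. Noticing that the influence functions are iid random variables, it follows that $E[\phi_i(p, \Lambda_k)\,\phi_j(p', \Lambda_k)] = 0$ for $i \ne j$, while $E[\phi_i(p, \Lambda_k)\,\phi_i(p', \Lambda_k)]$ takes the same value for every $i$. There are $n$ of these equalities, implying that the covariance kernel can be written as $\frac{1}{n}\sum_{i=1}^{n} E[\phi_i(p, \Lambda_k)\,\phi_i(p', \Lambda_k)] = E[\phi_i(p, \Lambda_k)\,\phi_i(p', \Lambda_k)]$.

It is common practice in empirical analysis to estimate a set of ordinates of the process $\Lambda_k(p)$, corresponding to a set of prechosen $m$ abscissae indexed by $\{p_j \mid j = 1, \dots, m\}$ with $0 < p_1 < \dots < p_m \le 1$. The implied ordinates can be collected in an $m \times 1$ vector denoted $\hat\Lambda_k = (\hat\Lambda_k(p_1), \dots, \hat\Lambda_k(p_m))' \in \mathbb{R}^m_+$, with $\Lambda_k$ being the corresponding vector in the population. The IF decomposition in (5), as well as its robustness properties, extend to vector notation:

$$\sqrt{n}\,(\hat\Lambda_k - \Lambda_k) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_i + o(1),$$

where $\phi_i = (\phi_i(p_1, \Lambda_k), \dots, \phi_i(p_m, \Lambda_k))'$ and the term $o(1)$ should be understood in the proper dimensionality. It follows from Proposition 1 that the vector

$$\sqrt{n}\,(\hat\Lambda_k - \Lambda_k) \ \text{is asymptotically distributed as}\ N(0, \Sigma_k), \quad \Sigma_k = E[\phi_i\,\phi_i']. \qquad (7)$$

A direct consequence is that, for any $k \ge 3$, $\hat\Lambda_k$ is asymptotically distributed as $N(\Lambda_k, \Sigma_k/n)$.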

Sample Implementation of the IF Estimator
Consider a sample of size $n$ where realizations are denoted $y_1, \dots, y_n$. To save notation, the subscript $i$ denotes both an observation and the position it occupies in the ranking of realizations arranged by increasing magnitude, so that $y_0 \le y_1 \le \dots \le y_i \le \dots \le y_n$ with $y_0$ an inferior bound. The weights, often representing the inverse probability of selection from the population, are denoted $\omega_i \ge 0$ and are indexed according to the sample units: $\omega_1, \dots, \omega_n$. Results are first derived within the simple random sampling structure and then extended to complex survey design.
The empirical c.d.f. $\hat F$ estimated at any point $y$ is a step function with increments $\pi_i = \omega_i / \sum_i \omega_i$ associated with each observation. Its ordinates are denoted by $\hat p_i = \sum_{j=1}^{i}\pi_j$. If in the sample there are no ties, that is, $y_i < y_{i+1}$ for all $i$, then $\hat F(y) = \hat p_i$ for any $y \in [y_i, y_{i+1})$. If there are $\tau$ ties in the sample among $y_i = y_{i+1} = \dots = y_{i+\tau}$, then $\hat F(y) = \hat p_{i+\tau}$ whenever $y_{i-1} < y = y_i = y_{i+1} = \dots = y_{i+\tau}$. With this representation, it is possible to associate quantiles to observed incomes. The empirical quantile function $\hat F^{-1}(p)$ at population proportion $p$ is $\hat F^{-1}(p) = \inf\{y_i : \hat p_i \ge p\}$, where $\hat F(y_0) = \hat p_0 = 0$. Thus, for $p \in (\hat p_{i-1}, \hat p_i]$ the quantile function takes value $y_i$ and it is well defined even in the case of ties in the sample. The estimator $\widehat{GL}(p)$ of the generalized Lorenz curve is defined in (11) and, in a similar way, a consistent estimator $\hat\Lambda_k(p)$ of the integrals of the GL curve in (1) is defined in (12). The approximations in (11) and (12) are asymptotically valid, since the correction terms vanish as the sample size grows. Furthermore, notice that the estimator is linear in incomes, and therefore the presence of ties in the sample does not give rise to computational issues. The consistent estimator of the asymptotic variance-covariance matrix of $\hat\Lambda_k$, $\Sigma_k/n = \frac{1}{n}E[\phi_i\,\phi_i']$, can be obtained by replacing the expectation with its sample counterpart, element by element at each pair of abscissae $(p_j, p_{j'})$. To compute the empirical covariances, it is necessary to estimate the influence functions $\phi_i(p_j, \Lambda_k)$ at every abscissa $p_j$. The consistent estimators of the influence functions can be obtained by plugging the influence function estimator of the generalized Lorenz curve $GL(p)$, denoted as $\hat\phi_i(p, GL)$, into the definition of $\Lambda_k(p)$ in (5). Using the influence function algebra, one can show that

$$\hat\phi_i(p, GL) = \mathbb{1}\big(y_i \le \hat F^{-1}(p)\big)\,\big(y_i - \hat F^{-1}(p)\big) + p\,\hat F^{-1}(p) - \widehat{GL}(p), \qquad (14)$$

as in Cowell and Victoria-Feser (2002) and Barrett and Donald (2009).
Equation (14) defines a model-free estimator of the influence function, meaning that its values can be computed from a sufficiently large sample of observations and no parametric assumptions on the underlying distribution functions have to be made. Plugging (14) into (5) gives the empirical counterpart of the influence function of k , which can be further decomposed into three elements as φ i (p, k ) = 3 h=1 I h , where The empirical estimator of I h with h = 1, 2, 3 can be derived using similar approximations as in (11) and (12). These estimators are asymptotically unbiased and, using integrations by parts, can be written as weighted averages across sample units: with κ a positive integer, and g can be either the estimator of the GL curve (by setting g( p j ) := GL( p j ), which yields κ ( GL( p j ))), the estimator of the c.d.f. at income y j (by setting g( p j ) := p j , which yields κ ( p j )), or a constant (by setting g( p j ) := 1, which yields κ (1)).
Substituting the estimator in (11) into $I_h$ yields the asymptotically consistent estimator of $\phi_i(p, \Lambda_k)$. These estimators must be computed separately for all observations in the sample and for all quantiles implied by the chosen grid. The estimator of the asymptotic covariance $\Sigma_k$ is the weighted sample covariance between the influence function estimates.
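The last step, going from influence function estimates to a covariance matrix, can be sketched as follows. The sketch assumes that the influence function estimates are already available as an $n \times m$ array `phi` (one row per observation, one column per abscissa) and that the optional `weights` are the sampling weights. The function names are hypothetical, and centering the columns at their weighted means is a choice made here; it is immaterial when the estimates average to zero.

```python
def if_covariance(phi, weights=None):
    """Weighted sample covariance (m x m) of the rows of phi, where each row
    holds the influence function estimates of one observation."""
    n, m = len(phi), len(phi[0])
    if weights is None:
        weights = [1.0] * n                      # simple random sampling
    total = sum(weights)
    pi = [w / total for w in weights]            # normalized weights pi_i
    means = [sum(pi[i] * phi[i][a] for i in range(n)) for a in range(m)]
    return [[sum(pi[i] * (phi[i][a] - means[a]) * (phi[i][b] - means[b])
                 for i in range(n))
             for b in range(m)] for a in range(m)]

def standard_errors(sigma, n):
    """Asymptotic standard errors of the m ordinates: sqrt(diag(sigma) / n)."""
    return [(sigma[j][j] / n) ** 0.5 for j in range(len(sigma))]

# Hypothetical influence function estimates for n = 4 observations, m = 2 abscissae.
phi = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
sigma = if_covariance(phi)
print(sigma, standard_errors(sigma, n=4))
```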

The IF Estimator for ISD1 and ISD2
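Under simple random sampling, the validity of Proposition 1 extends to the IF estimators for rank dominance (ISD1) and for generalized Lorenz dominance (ISD2). Hence, the quantile and generalized Lorenz curve estimators are asymptotically normal with covariance kernels $\sigma_1(p, p') = E[\phi_i(p, F^{-1})\,\phi_i(p', F^{-1})]$ and $\sigma_2(p, p') = E[\phi_i(p, GL)\,\phi_i(p', GL)]$, where $\phi_i(p, F^{-1}) = \big(p - \mathbb{1}(y_i \le F^{-1}(p))\big)/f(F^{-1}(p))$ is the influence function of the quantile, while the influence function of the generalized Lorenz curve is given in (14). This representation turns out to be related to well-known estimators for rank and generalized Lorenz dominance.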
Remark 2. Suppose that for a set of population abscissae $\{p_j \mid j = 1, \dots, m\}$, the distribution function $F$ is continuous and differentiable at $p_j$ with density function $f(F^{-1}(p_j)) > 0$ for every $j$. Then, for $p_{j'} \ge p_j$, $\sigma_1(p_j, p_{j'})$ and $\sigma_2(p_j, p_{j'})$ coincide with the covariance estimators in Beach and Davidson (1983), Lemma 1 and Theorem 1, respectively.
Aside from particular situations identified in Cowell and Victoria-Feser (2002), which are unlikely to occur in empirical welfare analysis, when the domain of $F$ is bounded from below and from above the influence functions of $F^{-1}$ and of $GL$ are bounded, meaning that the estimators proposed by Beach and Davidson (1983) are robust to contamination. This makes the IF estimator the most natural candidate for extending the robust, model-free estimator for generalized Lorenz dominance to higher-order ISD assessments.

Implementation With Complex Sampling Design
In empirical analysis, the simple random sampling assumption is highly unrealistic. Most economic data display a stratified, clustered, or multistage design. Assumptions about the sampling method may lead to different estimation procedures for the IF estimator. This is shown, for instance, in Zheng (2002) where consistent estimators for the covariance between ordinates of the Lorenz curve are derived under nonsimple random sampling design. I derive here the consistent estimator for the asymptotic covariance in Proposition 1 under the stratified single-stage sample design, a case also discussed in the empirical exercise.
To extract a stratified single-stage sample, a population is first divided into strata $s = 1, \dots, S$, and then a set of clusters is selected in a simple random manner from each stratum $s$. There are $N_s$ clusters in stratum $s$ of the population, from which $n_s$ clusters are randomly selected. In a survey design context, each cluster $j = 1, \dots, n_s$ represents a primary sampling unit (PSU) that has a population of size $N_{sj}$, with $N = \sum_s \sum_j N_{sj}$ the total population. Only $n_{sj}$ observations are drawn from each PSU. An observation is denoted by $i = 1, \dots, n_{sj}$, and has a survey weight of $\omega_{sji}$. In the presence of a nonsimple sampling design, the random vector $\hat\Lambda_k - \Lambda_k$ is equivalent in distribution to a random vector that combines the cluster shares $\theta_{sj} = N_{sj}/N$ with the influence functions $\phi_{sji}$ of the vector $\Lambda_k$ for an observation $i$ in cluster $j$ of stratum $s$, obtained from the overall distribution. If the PSU are drawn independently both within and across strata, then $\hat\Lambda_k$ preserves its $m$-variate asymptotic normality, and its asymptotic covariance can be computed from the IF covariances (see Proposition 1).
To estimate the asymptotic covariance, denote the sample estimator of $N_{sj}$ as $\hat N_{sj} = \sum_{i=1}^{n_{sj}}\omega_{sji}$ and that of the overall population as $\hat N = \sum_{s=1}^{S}\sum_{j=1}^{n_s}\hat N_{sj}$. The asymptotic covariance matrix estimator with nonsimple random sample design of the data is the covariance of the influence function realizations across PSU, multiplied by a correction term, which depends upon the strata and cluster dimensions. It is built from the estimated cluster shares $\hat\theta_{sj} = \hat N_{sj}/\hat N$, from $\bar\phi_{sj}$, the weighted average of the influence function for each PSU, and from $\bar\phi_s$, the sample mean of the influence functions in stratum $s$. To obtain asymptotic consistency, the sampling design requires both $N_s$ and $n_s$ to be large. Estimation under more complex (multistage) sample designs is covered in Zheng (2002) and Deville (1999). The extension of the results presented above to that setting is rather direct.

Alternative Estimators
As an alternative to the IF estimator, I present the bootstrap estimator for ISDk. It consists in bootstrapping a sufficiently large number of times the empirical counterpart of $\Lambda_k$ from the original sample. The bootstrap covariance of these parameters is the covariance of the bootstrapped estimates. Let $Y$ be the original sample of size $n$ drawn from the distribution $F$. Bootstrap computations are conditional on $Y$. Let a random sample of size $n^*$ drawn with replacement from $Y$ be denoted as $Y_b$. It is often the case that $n^* = n$. For every subsample $Y_b$, calculate the estimator $\hat\Lambda^{F_b}_k(p_j)$ for a finite set of $m$ abscissae. By repeatedly drawing random samples from $Y$, say $B$ times, and calculating for each of the subsamples the values taken by $\hat\Lambda^{F_b}_k(p_j)$ at each abscissa, one obtains a $B \times m$ matrix of data. The $m \times m$ empirical covariance matrix computed from these data gives the bootstrap estimator $\hat\Sigma^{BS}_k/n$. The application of the bootstrap estimator only requires the calculation of a vector of $m$ ordinates of $\hat\Lambda^{F_b}_k$ at every resampling stage, although in general it does not offer a refinement of the asymptotic approximation illustrated in Proposition 1.
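The resampling scheme just described can be sketched in a few lines. This is a minimal illustration under simple random sampling with $n^* = n$. The ordinates are computed by exact integration of the step-shaped empirical quantile function, which is an assumption of the sketch and not necessarily the approximation used elsewhere in the article, and the names `lambda_k` and `bootstrap_covariance`, the seed, and the default number of replications `B` are hypothetical.

```python
import random
from math import factorial

def lambda_k(sample, p, k):
    """Order-k (k >= 2) integral at p of the empirical quantile function."""
    ys = sorted(sample)
    n = len(ys)
    total = 0.0
    for i, y in enumerate(ys, start=1):
        a, b = (i - 1) / n, min(i / n, p)
        if b <= a:
            break
        total += y * ((p - a) ** (k - 1) - (p - b) ** (k - 1))
    return total / factorial(k - 1)

def bootstrap_covariance(sample, grid, k, B=200, seed=0):
    """m x m covariance of the bootstrapped ordinates: an estimate of the
    covariance of the estimator itself (that is, of Sigma_k / n)."""
    rng = random.Random(seed)
    n, m = len(sample), len(grid)
    draws = []
    for _ in range(B):
        resample = [rng.choice(sample) for _ in range(n)]     # n* = n
        draws.append([lambda_k(resample, p, k) for p in grid])
    means = [sum(row[j] for row in draws) / B for j in range(m)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in draws) / (B - 1)
             for b in range(m)] for a in range(m)]

# Hypothetical incomes and grid of population proportions.
incomes = [12.0, 7.5, 30.0, 18.2, 9.9, 22.4, 15.1, 41.0, 11.3, 26.7]
cov = bootstrap_covariance(incomes, grid=[0.25, 0.5, 0.75, 1.0], k=3)
print([round(cov[j][j] ** 0.5, 3) for j in range(4)])   # bootstrap standard errors
```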
Other estimators, connected to the IF approach, have been proposed in the literature. For instance, Aaberge (2006) discussed an estimator for the covariances between the partial integrals of the generalized Lorenz curve taken at different population proportions, and proposed tests for comparing quantiles of the distributions and implied Gini indices. Aaberge, Havnes, and Mogstad (2013) studied the asymptotic properties of a Kolmogorov-Smirnov test for upward stochastic dominance criteria, which is obtained by representing the quantile function as a Gaussian continuous process. The implementation of these test statistics, however, relies on appropriate estimates of the population density functions. Reliable nonparametric estimators are difficult to obtain when the sample size is relatively small, and additional assumptions over the domain of realizations (such as boundedness) have to be imposed.

NULL HYPOTHESIS AND TEST STATISTICS FOR ISDK
For two distributions $F$ and $G$, the null hypothesis of equality in distributions, that is, $F$ ISDk $G$ and $G$ ISDk $F$, is equivalent to testing $\Lambda^G_k(p) = \Lambda^F_k(p)$ at every $p$. The null hypothesis of ISDk, that is, $F$ ISDk $G$, can be tested against an unrestricted one, comprising the nondominance case. The alternative case, placing nondominance at the null, is of interest when the researcher is confident in claiming ISD only when there is strong evidence in its favor. In all these cases, the hypotheses are not formulated on $\Lambda_k$ directly, but rather they postulate that an ISD relation at order $k$ holds between the two distributions.
Following Anderson (1996), Forcina (1998, 1999), and Davidson and Duclos (2000), I define tests for ISDk based on a finite number of $m$ abscissae. As a consequence, every null hypothesis can be formulated through joint hypotheses on a vector of parameters $\Lambda^F_k - \Lambda^G_k$. Let $\Lambda_k$ be the $2m \times 1$ vector obtained by stacking the vectors $\Lambda^F_k$ and $\Lambda^G_k$. Its sample counterpart is denoted as $\hat\Lambda_k$ and is estimated from samples of size $n_F$ and $n_G$, respectively, where $n = n_F + n_G$ indicates the pooled sample size. The relative size of the samples is denoted by $r_F = n_F/n$ and $r_G = n_G/n$. Let $R = (I_m, -I_m)$ be the $m \times 2m$ differences matrix, with $I_m$ indicating the $m \times m$ identity matrix. Define the parametric vector of differences, $\delta_k \in \mathbb{R}^m$, as $\delta_k = R\Lambda_k$.
Under the assumption that $F$ and $G$ are generated by independent processes, the asymptotic normality of the IF estimator makes it possible to establish that $\sqrt{n}\,\hat\delta_k = \sqrt{n}\,R\hat\Lambda_k$ is asymptotically distributed as $N(\sqrt{n}\,R\Lambda_k, \Omega)$ for $k \ge 1$, where $\hat\delta_k$ denotes the sample counterpart of $\delta_k$, and

$$\Omega = \frac{\Sigma^F_k}{r_F} + \frac{\Sigma^G_k}{r_G}.$$

An asymptotically valid estimator of $\Omega$, denoted as $\hat\Omega$, is obtained by plugging the empirical counterpart of the influence function estimator in Proposition 1 in place of $\Sigma^F_k$ and of $\Sigma^G_k$. The null hypotheses are formulated as $m$ linear constraints on $\delta_k$. The null hypothesis of equality can be confronted with an unrestricted alternative, indicating the case in which some equalities at given population proportions only occur as a result of intersections.
Under the asymptotic normality of $\hat{\delta}_k$, the null hypothesis can be assessed by a Wald-type test statistic $T_1^k := n\,\hat{\delta}_k'\,\hat{\Sigma}^{-1}\hat{\delta}_k$, which is asymptotically $\chi^2_m$ distributed. The decision rule can be formulated in terms of p-values as "Reject $H_0^k$ if $p^k < \alpha$." The null hypothesis of dominance, under which $\delta_k$ is nonnegative, is compared with the unrestricted alternative, that is, $H_0^k: \delta_k \geq 0$ versus $H_1^k: \delta_k \in \mathbb{R}^m$. Under the null $H_0^k$, consider the following test statistic: $T_2^k = \min_{\delta_k \geq 0} n\,(\hat{\delta}_k - \delta_k)'\,\hat{\Sigma}^{-1}(\hat{\delta}_k - \delta_k)$. Under the asymptotic normality of $\hat{\delta}_k$, Kodde and Palm (1986) showed that $T_2^k$ is asymptotically distributed as a mixture of $\chi^2$ distributions: $\Pr(T_2^k \geq c) = \sum_{j=0}^{m} w(m, m-j, \hat{\Sigma})\,\Pr(\chi^2_j \geq c)$, where $w(m, m-j, \hat{\Sigma})$ denotes the probability that $m - j$ elements of $\delta_k$ are strictly positive. To estimate $w(m, m-j, \hat{\Sigma})$, I draw 10,000 $m$-variate normal vectors with mean zero and covariance matrix $\hat{\Sigma}$. Then, I compute the proportion of vectors with $m - j$ positive elements. Kodde and Palm (1986) provided a tabulation of the lower ($lb_\alpha$) and upper ($ub_\alpha$) bounds of the rejection region for the null $H_0^k$ at standard significance levels $\alpha$. The decision rule becomes "Reject (accept) $H_0^k$ if $T_2^k > ub_\alpha$ ($T_2^k < lb_\alpha$)." In all cases in-between, the usual decision rule based on the p-value applies. To test the reverse dominance order, that is, G ISDk F, it is sufficient to replace $\hat{\delta}_k$ and $\delta_k$ with $-\hat{\delta}_k$ and $-\delta_k$ in the calculation of $T_2^k$. Tests for F ISDk G against restricted alternatives can be derived from Dardanoni and Forcina (1999), where the asymptotic distributions of tests for equality and strong dominance are compounded.
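The two Wald-type tests just described can be sketched as follows, assuming SciPy is available. The function names are hypothetical and this is not the article's implementation. One choice here is an assumption rather than something stated in the text: the mixture weights are simulated by counting the positive elements of each normal draw after projecting it onto the nonnegative orthant in the metric of $\hat{\Sigma}$, which is a common way of simulating such weights and coincides with counting positive elements of the raw draw when $\hat{\Sigma}$ is diagonal.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.optimize import nnls
from scipy.stats import chi2


def wald_equality_test(delta_hat, sigma_hat, n):
    """T1 = n * delta' Sigma^{-1} delta and its chi-square(m) p-value."""
    delta_hat = np.asarray(delta_hat, dtype=float)
    stat = n * delta_hat @ np.linalg.solve(sigma_hat, delta_hat)
    return stat, chi2.sf(stat, df=delta_hat.size)


def _project_on_orthant(z, L_inv):
    """Minimise (z - d)' Sigma^{-1} (z - d) over d >= 0 via whitening + NNLS."""
    d, resid_norm = nnls(L_inv, L_inv @ z)
    return d, resid_norm ** 2


def dominance_test(delta_hat, sigma_hat, n, draws=10_000, seed=0):
    """T2 for H0: delta >= 0, with a simulated chi-bar-square p-value."""
    delta_hat = np.asarray(delta_hat, dtype=float)
    m = delta_hat.size
    L = cholesky(np.asarray(sigma_hat, dtype=float), lower=True)  # Sigma = L L'
    L_inv = solve_triangular(L, np.eye(m), lower=True)
    stat = n * _project_on_orthant(delta_hat, L_inv)[1]

    # Simulated weights: weights[j] estimates w(m, m - j, Sigma).
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((draws, m)) @ L.T      # rows ~ N(0, sigma_hat)
    n_pos = np.array([np.count_nonzero(_project_on_orthant(zi, L_inv)[0] > 0)
                      for zi in z])
    weights = np.bincount(m - n_pos, minlength=m + 1) / draws

    # chi-square with 0 degrees of freedom is a point mass at zero.
    tail = chi2.sf(stat, df=np.arange(1, m + 1))
    p_value = weights[0] * float(stat <= 1e-12) + weights[1:] @ tail
    return stat, p_value, weights
```

The reverse order (G ISDk F) is obtained by passing `-delta_hat`. The tabulated bounds of Kodde and Palm (1986) are not reproduced here; the sketch relies on the simulated p-value only.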
Finally, the null hypothesis of nondominance encompasses cases in which either G ISDk F, or the graphs of $\Lambda^k_F(p)$ and $\Lambda^k_G(p)$ intersect. Opposing this hypothesis to an alternative of strong ISDk implies that the researcher is willing to conclude for ISD only if there is strong evidence in its support, which is a defensible perspective in evaluation studies: $H_0^k: \delta_{k,j} \leq 0$ for some $j$ versus $H_1^k: \delta_{k,j} > 0$ for all $j$, where $\delta_{k,j}$ is the $j$th element of $\delta_k$. Following Dardanoni and Forcina (1999), the test statistic under $H_0^k$ corresponds to a collection of standard-normal distributed statistics $Z_{p_j} = \sqrt{n}\,(\hat{\Lambda}^k_F(p_j) - \hat{\Lambda}^k_G(p_j))/\sqrt{\hat{\Sigma}_{jj}}$, $j = 1, \ldots, m$, where $\hat{\Sigma}_{jj}$ is the estimator of $\Sigma$ at abscissae $(p_j, p_j)$. A test statistic can be derived in more compact notation as $T_3^k = \min_{p \in \{p_j | j = 1, \ldots, m\}} Z_p \sim N(0, 1)$.
Rejection of the null is based on a one-sided test with critical values taken from the standard normal tabulation. Equivalently, $H_0^k$ is rejected only if the graph of the lower bound of the confidence interval of $\delta_k$ lies entirely above the horizontal axis.
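The minimum-statistic rule for the nondominance null can be illustrated with a short sketch. The function name is hypothetical, and `sigma_hat` is assumed to be the estimated covariance matrix of $\sqrt{n}\,\hat{\delta}_k$, so that the square roots of its diagonal serve as standard errors.

```python
import numpy as np
from scipy.stats import norm


def nondominance_test(delta_hat, sigma_hat, n, alpha=0.05):
    """Return (T3, reject): T3 is the smallest standardized difference.

    The nondominance null is rejected only when even the smallest
    z-statistic exceeds the one-sided standard normal critical value.
    """
    delta_hat = np.asarray(delta_hat, dtype=float)
    se = np.sqrt(np.diag(np.asarray(sigma_hat, dtype=float)))
    z = np.sqrt(n) * delta_hat / se          # one z-statistic per abscissa
    t3 = z.min()
    return t3, bool(t3 > norm.ppf(1 - alpha))
```

A single abscissa with a small or negative standardized difference is enough to prevent rejection, which is what makes the test conservative toward claims of dominance.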
As pointed out by Davidson and Duclos (2000) in the context of standard stochastic dominance analysis, assessing dominance at a finite number of thresholds might raise the problem of test inconsistency. This is true for standard stochastic dominance, where dominance is inferred on the basis of income thresholds potentially gathering no observational mass in the close neighborhood. Even contamination at the bottom tail of a distribution might lead to crossings that extend over a number of income thresholds. The tests for ISDk, instead, perform comparisons at fixed population proportions. In this way, sample coverage is always guaranteed. When the grid is very fine (a parameter that is controlled by the researcher), the ISDk test is likely to measure differences in transformations of realized income at a continuum of population shares in the sample. This is close to the logic underpinning a Kolmogorov-Smirnov statistic for quantile transformations. Barrett and Donald (2003) studied a consistent test for standard stochastic dominance based on the same principle, but evaluating transformations of the c.d.f. at every income level on a bounded support of realizations. Their result relies on the fact that the asymptotic distribution of this test statistic involves a Brownian bridge process. Using the Bahadur (1966) representation of quantiles, similar methods can be developed in the context of ISD analysis. For instance, Barrett, Donald, and Bhattacharya (2014) developed consistent tests for assessing Lorenz dominance at every population proportion. The Monte Carlo study hereafter provides intuitions on the effect of increasing the number of population shares at which ISD is tested. It also builds comparisons with standard stochastic dominance tests.

MONTE CARLO RESULTS
The size and power properties of the estimators discussed so far are assessed through a series of Monte Carlo experiments. Each experiment involves tests for ISD at orders one, two, and three, the last being the relevant case where standard stochastic dominance analysis and ISD analysis differ. The Monte Carlo experiment provides intuitions on the behavior of the different estimators when the sample size is relatively small, and allows conclusions to be drawn about the effect of increasing the sample size and of changing the number of thresholds at which dominance is assessed. The study is based on reliable models of real income and durable consumption distributions in the United States, already validated in the literature.
Various estimators have been used to test the null hypothesis that a distribution F dominates another distribution G at some order k, versus an unrestricted alternative. The Monte Carlo experiment consists in simulating 1000 independent sample draws from parametric models of F and G. The design of each parametric model is inspired by Barrett and Donald (2009), whereby each simulated draw $i$ of a random variable $Y_i$ at a given simulation stage is generated by a lognormal distribution, that is, $Y_i = \exp(\sigma Z_i + \mu)$, where $Z_i$ is a realization of an $N(0, 1)$ random variable and $(\sigma, \mu)$ are the dispersion and location parameters. Each experiment involves the simulation of three samples of size 100, 500, and 1500, respectively. For each experiment, the null hypothesis is tested for $k \in \{1, 2, 3\}$ (both in strict dominance and equality forms) using different estimators for the asymptotic covariances, while setting the number of abscissae to $m \in \{5, 10, 20\}$. These abscissae correspond to increments in population proportions of, respectively, 20%, 10%, and 5%. The parameters $\mu$ and $\sigma$ are chosen so that a specific dominance relation holds in the population. For each simulated sample, a series of indicators of acceptance or rejection of a given null hypothesis is recorded, and results are then reported as averages of these indicators across all Monte Carlo iterations.
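The simulation design described above can be sketched as a small harness. All function names are hypothetical, and `toy_reject` is a deliberately simple stand-in (a two-sample z-test on mean log incomes), not one of the ISD tests studied in the article; any function returning a reject/accept indicator can be plugged in instead.

```python
import numpy as np


def draw_lognormal(n, mu, sigma, rng):
    """Y_i = exp(sigma * Z_i + mu), with Z_i ~ N(0, 1)."""
    return np.exp(sigma * rng.standard_normal(n) + mu)


def abscissae(m):
    """m evenly spaced population proportions: 1/m, 2/m, ..., 1."""
    return np.arange(1, m + 1) / m


def rejection_rate(reject, par_F, par_G, n, m, reps=1000, seed=0):
    """Average of the rejection indicator across Monte Carlo draws."""
    rng = np.random.default_rng(seed)
    p = abscissae(m)
    hits = 0
    for _ in range(reps):
        y_F = draw_lognormal(n, *par_F, rng)   # par = (mu, sigma)
        y_G = draw_lognormal(n, *par_G, rng)
        hits += bool(reject(y_F, y_G, p))
    return hits / reps


def toy_reject(y_F, y_G, p):
    """Hypothetical stand-in test: two-sided 5% z-test on mean log income.

    It ignores the grid p; a real ISD test would compare the curves at p.
    """
    a, b = np.log(y_F), np.log(y_G)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return abs(a.mean() - b.mean()) / se > 1.96
```

With identical parameters for F and G the returned share estimates the size of the plugged-in test, and with different parameters it estimates its power.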
Three cases are investigated here. In the first case, I evaluate the size and power of the ISD3 test based on the IF estimator, and I compare it to the behavior of the bootstrap estimator. The first objective is to check the size of the tests by recording the proportion of simulated draws where the null is rejected by the data at a nominal size of 5%, knowing that the null is true and cannot be rejected when tested on the population (using one million observations). I consider F ISD3 G to be the null hypothesis. Following Barrett and Donald (2009), I assume that each income draw from F is representative of the gross individual-equivalent income in the United States from March 1998 CPS data, where $\mu_F = 9.85$ and $\sigma_F = 0.6$. I consider instead that the data drawn from G are generated using $\mu_G = 9.85$ and $\sigma_G = 0.7$. The difference between F and G approximates the change of gross individual-equivalent income over the 1980s and 1990s. Graphical analysis shows that the generalized Lorenz curve of F crosses that of G from above (hence F ISD2 G does not hold). Potentially, F is the distribution yielding higher social welfare for all SEFs sufficiently averse to inequality.
The second objective of the Monte Carlo study is to check the power of the ISD tests. This is done by recording the proportion of simulated draws in which the null is rejected, knowing that the alternative is true. In this case, I consider two lognormal distributions F and G such that F ISDk G for some k > 3 but not F ISD3 G. Again, the parameterization is as in Barrett and Donald (2009) and gives the most likely guess for the distribution of per-capita nondurable expenditures in the United States over the 1990s. Hence, $\mu_F = 6.37$ and $\sigma_F = 0.48$ while $\mu_G = 6.4$ and $\sigma_G = 0.55$. In this case, the generalized Lorenz curves of F and G cross once but ISD3 does not hold in the population.
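The population relations claimed for the two parameterizations above can be checked numerically. The sketch below relies on a standard closed form for the generalized Lorenz curve of a lognormal distribution, $GL(p) = \exp(\mu + \sigma^2/2)\,\Phi(\Phi^{-1}(p) - \sigma)$, which is not taken from the article, and integrates the gap between the two curves with a trapezoid rule to approximate the order-three comparison. Function names and the grid size are assumptions.

```python
import numpy as np
from scipy.stats import norm


def lognormal_gl(p, mu, sigma):
    """Generalized Lorenz curve of a lognormal(mu, sigma) distribution."""
    return np.exp(mu + sigma ** 2 / 2) * norm.cdf(norm.ppf(p) - sigma)


def compare_population(mu_F, s_F, mu_G, s_G, grid_size=2001):
    """Return (GL crossings, min and final value of the order-3 gap)."""
    p = np.linspace(0.0, 1.0, grid_size)
    gl_gap = lognormal_gl(p, mu_F, s_F) - lognormal_gl(p, mu_G, s_G)

    # Count sign changes of the GL gap, skipping p = 0 where both curves are 0.
    crossings = np.count_nonzero(np.diff(np.sign(gl_gap[1:])) != 0)

    # Order-3 gap: cumulative trapezoid integral of the GL gap from 0 to p.
    steps = (gl_gap[1:] + gl_gap[:-1]) / 2 * np.diff(p)
    lam3_gap = np.concatenate([[0.0], np.cumsum(steps)])
    return crossings, lam3_gap.min(), lam3_gap[-1]


case1 = compare_population(9.85, 0.6, 9.85, 0.7)   # size experiment
case2 = compare_population(6.37, 0.48, 6.4, 0.55)  # power experiment
```

In the first case the order-three gap should stay nonnegative over the whole grid, in line with F ISD3 G. In the second it should turn negative before $p = 1$, in line with the failure of ISD3.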
Detailed results about the size (case 1) and power (case 2) of the ISD tests based on different estimators are reported in Table 1. For relatively small samples, the IF estimator for ISD is shown to have the correct size and power in the baseline case. The size of the tests based on the ISD3-IF estimator is generally smaller than 0.10 when inference is made on five abscissae, while there is no clear pattern of variation with respect to the size of the samples. The size of the dominance test increases slightly with the number of abscissae. This is not surprising, given that the test becomes more demanding in terms of comparisons while the number of observations is held fixed, leading to a higher likelihood of rejection of the true null hypothesis of ISD3. When the number of abscissae is set to 20, there is evidence of a negative association between the size of the sample and the size of the test, which falls from 0.403 when the sample size is small (100 observations) to 0.223 when the sample size is larger (1500 observations). The discriminatory power of the test (case 2) is small when the number of abscissae is set to 5, but it grows rapidly to acceptable levels (in general larger than 0.7) when the sample size is 1500 observations. The inference for ISD3 based on the bootstrap estimator (ISD3-BS) yields results very similar to those for the IF estimator. In small samples, the bootstrap estimator nevertheless shows somewhat larger size and smaller power compared to the IF estimator.
In the second case, I evaluate the size and power of the ISD3 estimators in the presence of data contamination. Results presented in the supplemental Appendix reveal similar patterns as in the baseline scenario, thus showing that the IF estimator is substantially robust to artificial contamination of the data.
In the third and final case, I contrast the size and power of the IF estimator reported in Table 1 with tests for second- and third-degree stochastic dominance. For these cases, the focus is on the tests by Davidson and Duclos (2000) (denoted as DD), implementing comparisons of distributions at 5, 10, and 20 evenly spaced income thresholds, and on the consistent test by Barrett and Donald (2003) (denoted as BD). The BD test is a Kolmogorov-Smirnov statistic of the difference between recursive partial integrals of the c.d.f. taken over a fine grid of the income realizations domain. When evaluated in the population, both tests suggest that F stochastically dominates G at the third (but not the second) degree under the parameterization used to assess size, while F does not stochastically dominate G at either the second or the third degree under the parameterization used to assess power. Both tests reject the null of equality at conventional levels of significance. Table 2 reports the Monte Carlo study results for the DD and the BD tests. The analysis of third-degree stochastic dominance (SD3 in the table) reveals that the size of the SD3-DD estimator is generally larger than the nominal 5%, and that the power is acceptable only when the number of income thresholds is set to 20. The SD3-BD test has the correct size but very low power, generally smaller than 0.32. A comparative analysis reveals that the IF estimator for ISD3 outperforms the SD3-DD estimator when the sample size is large. Both estimators lead to tests that have larger size than the SD3-BD test, but also substantially larger power. In the supplemental Appendix, I document that these patterns persist even in case of contamination, and that the IF estimator outperforms the DD estimator in its ability to distinguish a genuine cross in the curves used to assess ISD3 or SD3 (which implies that the underlying distributions can be ranked at some higher order) from a situation where they are statistically indistinguishable.
This is an important issue in evaluation studies, where the impossibility of rejecting the equality null hypothesis might prevent the evaluator from further investigating higher order welfare effects.
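For comparison with the rank-based curves used for ISD, the income-threshold curves behind the standard stochastic dominance tests discussed above can be sketched. The code uses the usual sample analogue of the order-$s$ dominance curve in Davidson and Duclos (2000), $\hat{D}^s(z) = \frac{1}{(s-1)!}\,\frac{1}{n}\sum_i (z - y_i)_+^{s-1}$. The function name is hypothetical, and neither the variance estimation nor the Barrett and Donald (2003) simulated p-values are reproduced.

```python
import numpy as np
from math import factorial


def dominance_curve(y, z, s):
    """Sample order-s dominance curve evaluated at income thresholds z."""
    y = np.asarray(y, dtype=float)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    if s == 1:
        # Order one is the empirical c.d.f.
        return (y[None, :] <= z[:, None]).mean(axis=1)
    gaps = np.clip(z[:, None] - y[None, :], 0.0, None)   # (z - y_i)_+
    return (gaps ** (s - 1)).mean(axis=1) / factorial(s - 1)
```

Evaluating the difference between two such curves at a handful of evenly spaced thresholds mirrors the DD setting, while taking its supremum over a fine grid of thresholds mirrors the logic of the BD statistic.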
Tables 1 and 2 are also useful to assess the power of tests taking generalized Lorenz dominance as the null. In this case, the tests for ISD2 and for SD2 have equivalent normative implications but rely on substantially different implementation methods. The former is implemented by checking generalized Lorenz dominance at fixed population ranks, the latter at fixed incomes. Comparing power levels in Table 1, case 1, with the respective records in Table 2, it emerges that the DD estimator is somewhat more powerful than the Beach and Davidson (1983) estimator for ISD2, which in turn dominates the BD method when the number of population-proportion abscissae is above 5. This pattern is substantially preserved when contamination is artificially introduced. Again, I find that in samples of size 500 or less, the ISD2 test is substantially more discriminatory in rejecting the false null of statistical equality of generalized Lorenz curves than the alternative SD2 estimators, although this difference vanishes in samples of larger size.

ILLUSTRATION: EQUALITY OF OPPORTUNITY IN FRANCE
This section provides an illustrative application of ISD to check for robustness of equality of opportunity (EOP hereafter) assessments. The EOP principle for income acquisition posits that differences in the background of origin across individuals should not predict their labor market income prospects, a monetary measure of the individuals' opportunity set.
To formalize this notion, consider the situation where, from an ethical standpoint, the population can be divided into two groups gathering individuals with either background a or b. These backgrounds are referred to as individual circumstances $c \in \{a, b\}$. The income $y_i$ of individual $i$ is, then, the result of the interplay between her circumstances and other components. Building on this setting, Roemer (1998) clarified that under reasonable assumptions the empirical labor income distribution $F_c$ conditional on the circumstances $c$ serves as a valid proxy of the opportunities faced in the labor market by individuals belonging to this group. In this context, Lefranc, Pistolesi, and Trannoy (2009) proposed welfare-based criteria to assess when EOP is satisfied on the data. In their view, EOP prevails if there is no agreement among inequality averse social evaluation functions in preferring a society where incomes are distributed according to $F_a$ rather than $F_b$ (the distributions in the respective sub-populations) or vice versa. That is, $F_c$ ISD2 $F_{c'}$ fails for $c \neq c'$, implying that the economic advantage enjoyed by people with circumstances $c$ over $c'$ cannot be easily established. When EOP prevails, the cases $F_c$ ISD3 $F_{c'}$ and, more generally, $F_c$ ISDk $F_{c'}$ with k arbitrarily large are equivalent. However, there is larger agreement on the existence of an unjust advantage of $c$ over $c'$ whenever ISD3 holds rather than when ISDk for k > 3 holds. Hence, if EOP is not rejected by the data, ISD3 becomes a natural test for the robustness of the EOP statement coming from the violation of ISD2. Practically, a test for EOP robustness requires (i) partitioning the sample into groups defined by circumstances a and b, (ii) estimating the group-specific conditional distributions $F_a$ and $F_b$, and (iii) checking the minimal order k at which it is not possible to reject $F_a$ ISDk $F_b$ or $F_b$ ISDk $F_a$ or both. The focus of this section will be limited to ISD3 comparisons.
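Steps (i) to (iii) can be illustrated descriptively with plug-in point estimates. The sketch below is not the article's inference procedure: it carries out no statistical test, the function names are hypothetical, and the order-$k$ curves are computed from the empirical quantile function through the repeated-integration identity $\Lambda^k(p) = \frac{1}{(k-2)!}\int_0^p (p - t)^{k-2} F^{-1}(t)\,dt$ for $k \geq 2$.

```python
import numpy as np
from math import factorial


def lambda_k(y, p, k):
    """Plug-in estimate of the order-k curve at population shares p."""
    y = np.sort(np.asarray(y, dtype=float))
    p = np.atleast_1d(np.asarray(p, dtype=float))
    n = y.size
    if k == 1:  # empirical quantile function
        idx = np.clip(np.ceil(np.round(p * n, 9)).astype(int) - 1, 0, n - 1)
        return y[idx]
    lower = np.arange(n) / n            # left end of each rank interval
    upper = np.arange(1, n + 1) / n     # right end of each rank interval
    w = (np.clip(p[:, None] - lower, 0.0, None) ** (k - 1)
         - np.clip(p[:, None] - upper, 0.0, None) ** (k - 1))
    return w @ y / factorial(k - 1)


def minimal_isd_order(y_a, y_b, m=10, max_k=5):
    """Lowest order at which the estimated curves of a and b do not cross."""
    p = np.arange(1, m + 1) / m
    for k in range(1, max_k + 1):
        gap = lambda_k(y_a, p, k) - lambda_k(y_b, p, k)
        if np.all(gap >= 0):
            return k, "a dominates b"
        if np.all(gap <= 0):
            return k, "b dominates a"
    return None, "no ranking up to max_k"
```

With the toy incomes `[2, 2, 2, 2]` for group a and `[1, 1, 1, 6]` for group b, evaluated at quartiles, the quantile functions and the generalized Lorenz curves cross, while the order-three curves rank a above b. A formal assessment would replace the sign checks with the tests for ISDk.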
I make use of the French Labor Force Survey (LFS) data (Enquête Emploi) provided by INSEE to estimate the labor income prospects of French workers conditional on their background of origin. The circumstance a gathers all French workers whose parents are either non-French or were employed as manual workers or farmers. The circumstance b is instead associated with the middle class parental background, gathering artisans, small entrepreneurs, and nonmanual workers. The residual class, gathering individuals whose father was employed as a white-collar worker, manager, or professional, is not considered in this study for ease of exposition. The analysis by Lefranc, Pistolesi, and Trannoy (2009), based on the same data, shows that the opportunity profile associated with this class always dominates at ISD2 the profiles of the other classes.
To estimate the opportunity profiles of the two groups, I make use of monthly labor income realizations observed for relatively homogeneous cohorts of individuals born between 1958 and 1962 whose fathers' characteristics are observed. The investigation is restricted to French LFS waves 2004, 2006, 2008, and 2010. Picking up information every 2 years accommodates the panel rotation mechanism of the French LFS, which after 2003 lasts one and a half years (i.e., one-sixth of the sample is replaced every quarter). The pooled estimating sample consists of 2326 French workers (1810 observations are associated with circumstance a and 516 with circumstance b). Figure 2 reports the Pen's Parades and the generalized Lorenz curves of the two circumstances' income distributions, after the data have been purged of survey-year fixed effects. The two income profiles cannot be ranked according to ISD1 or ISD2 since their respective quantile functions and generalized Lorenz curves cross at least once. It seems clear, however, that the poorest workers in group a enjoy a higher advantage than the poorest workers in group b, while the order swaps as income deciles grow. Although group a is a priori expected to be the most disadvantaged one, due to the parental background characteristics it represents, it turns out that it is ranked as the advantaged one by all social evaluation functions that give enough weight to the poorest realizations. This suggests that the correct dominance relations to be verified on the data are $F_a$ ISDk $F_b$ for k = 1, 2, 3. The statistical behavior of various estimators is studied.
The French LFS data are areolar: they are not drawn directly from a selection of households or individuals, but from a selection of geographical areas made up of 20 adjacent households on average. Information on earnings for workers aged 15 to 65 within each area is then collected in the survey. The clustered sampling scheme of the French LFS is taken into account when computing the covariance structure of the influence function estimator for income deciles (10 abscissae). The gaps in the GL curves represented in panel (a) of Figure 3 clearly indicate that the null hypothesis of a crossing cannot be rejected. A similar conclusion cannot be drawn immediately from the comparisons reported in panel (b) of the same figure, since the presence of intersections may simply be a symptom of a weak form of dominance. Table 3 reports income quantiles, generalized Lorenz curve coordinates, and $\Lambda^3$ coordinates for selected deciles of $F_a$ and $F_b$. The survey design of the French LFS is taken into consideration when computing the standard errors of the GL curves, as well as for the IF and the bootstrap estimators for ISD3. The table shows that, independently of the evaluation method used to infer ISD3, pairwise comparisons of estimated coefficients at fixed deciles lead to inconclusive results: in many cases the differences in Pen's Parades, in generalized Lorenz curves, and in integrals of the generalized Lorenz curves at a given decile are not statistically different from zero, or their signs do not agree across deciles. Joint tests for the equality and dominance null hypotheses are therefore preferred.
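One common way of accounting for a clustered design when the estimator has an influence function representation is to sum the estimated influence values within each sampled area before forming outer products. The sketch below shows that generic cluster-robust calculation. It is an assumption about how the design could be handled, not a reproduction of the article's routine: `psi` is a hypothetical $n \times m$ matrix of influence function values (one column per abscissa), whose computation from Proposition 1 is not shown, and no finite-sample correction is applied.

```python
import numpy as np


def cluster_covariance(psi, cluster):
    """Cluster-robust covariance of an estimator with influence values psi.

    psi     : (n, m) array, influence function value of each observation
              at each of the m abscissae (hypothetical input).
    cluster : length-n array of area identifiers.
    Returns the (m, m) covariance sum_c s_c s_c' / n^2, where s_c is the
    sum of (centered) influence values within cluster c.
    """
    psi = np.asarray(psi, dtype=float)
    psi = psi - psi.mean(axis=0)          # influence values have mean zero
    _, inv = np.unique(np.asarray(cluster).ravel(), return_inverse=True)
    inv = inv.ravel()
    sums = np.zeros((inv.max() + 1, psi.shape[1]))
    np.add.at(sums, inv, psi)             # within-cluster sums
    return sums.T @ sums / psi.shape[0] ** 2
```

When every observation forms its own cluster, the expression reduces to the usual estimate based on independent observations.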
The Wald-type test statistics and their simulated p-values ($T_1^3$ for the equality in distributions null hypothesis and $T_2^3$ for the ISD3 null hypothesis) are reported in Table 4. The null hypotheses are rejected by the data at any conventional level of significance. According to the IF method, the equality of opportunity evaluation is robust, since it is not possible, at order three, to determine the advantaged group. Interestingly, the bootstrap estimator (which is computationally less intensive to obtain) leads to conclusions similar to those of the IF estimator. Table 4 also displays the evaluations of third-degree standard stochastic dominance, an alternative criterion for assessing the robustness of the EOP evaluation. Wald-type tests based on Davidson and Duclos (2000) estimators under the equality and dominance null hypotheses have been obtained by partitioning the income domain at 10 evenly spaced income thresholds, where poverty levels are evaluated. As shown in the table, the test rejects both dominance and equality at high significance. This is coherent with the predictions from ISD3 tests, implying that testing on a small number of income abscissae still captures the patterns of the gaps in the GL curves represented in Figure 3(a). The consistent test for standard stochastic dominance at order three by Barrett and Donald (2003) leads to similar conclusions. The test is operationalized by partitioning the income domain on a fine grid (100 thresholds) and computing simulated p-values of the Kolmogorov-Smirnov test for the dominance null hypothesis. This criterion also allows rejection of third-degree stochastic dominance. The empirical application highlights two facts. The first fact is that the ISD3 test based on the IF estimator produces evaluations that are perfectly aligned with the prediction of Davidson and Duclos (2000) tests. This coherency signals that both dominance criteria are able to detect that the negative gaps between the GL curves of $F_a$ and $F_b$ overcompensate the positive gaps (indicating dominance) at the bottom. The second fact is that the ISD3 test based on predictions at population deciles performs comparably to the consistent test for stochastic dominance in showing that EOP assessments are robust.

Table 4 (recoverable rows only).
Davidson and Duclos (2000): 50.16, 0.000, 683.43, 0.000, -
Barrett and Donald (2003): . . . 0.000
NOTE: French LFS data, waves 2004, 2006, 2008, and 2010. Wald-type tests for equality and dominance based on distribution deciles (for rank dominance, generalized Lorenz dominance, and ISD3) and on 10 evenly spaced income thresholds (for Davidson and Duclos (2000) tests of standard stochastic dominance).

CONCLUDING REMARKS
Inverse stochastic dominance is a convenient tool for assessing when there is consensus, in a well-defined class of social evaluation functions, in ranking an income distribution as socially preferred to another. Applications in income (re)distribution analysis, policy evaluation, and risk assessment have clarified the importance of this tool. This article provides estimators that can be used to produce inference for inverse stochastic dominance comparisons, and shows how the restrictions implied by various forms of inverse stochastic dominance can be tested on the data.
The preferred estimator for inverse stochastic dominance is grounded on the influence function decomposition of the recursive integrals of the generalized Lorenz curve. This methodology gives, as a special case, the estimators for rank and generalized Lorenz dominance developed by Beach and Davidson (1983) and provides their natural extension to less demanding comparisons of distributions at fixed population proportions. Monte Carlo experiments show that the influence function estimator outperforms the alternative bootstrap estimator both in terms of size and power, provided that the relevant estimators are computed for a sufficiently fine grid of abscissae corresponding to population proportions. Furthermore, the size and power of inverse stochastic dominance comparisons based on the influence function estimator are virtually unaffected by the presence of contamination in the data. This is an important feature for empirical income distribution analysis. Experiments involving standard stochastic dominance tests also seem to favor the adoption of the influence function estimator, although these criteria are not comparable from a normative standpoint.
One important caveat to these simulation results is that they are all based on comparisons of pairs of distributions that are generated by independent processes. This is empirically unattractive when, for instance, actual and counterfactual distributions from a policy experiment have to be compared, or when observations are serially correlated. Linton, Maasoumi, and Whang (2005) proposed estimates of Kolmogorov-Smirnov tests' critical values in the context of non-iid samples of correlated prospects. Interesting avenues for future research are to use the influence function methodology for testing ISD over the whole set of population proportions implied by the data, as well as to expand these results to the case where the estimating samples are correlated.

SUPPLEMENTARY MATERIALS
Proofs, simulation results, and Stata routines: A supplemental Appendix collects the proofs of Lemma 1 and Remark 2 and a detailed account of the Monte Carlo experiment results. Stata routines "ISDtest" and "SDtest," reporting code to perform ISD tests and selected standard stochastic dominance tests, along with Monte Carlo simulation routines and French LFS analysis records, are made available. (Zipped archive file)