Marginal zero-inflated regression models for count data

ABSTRACT Data sets with excess zeroes are frequently analyzed in many disciplines. A common framework used to analyze such data is the zero-inflated (ZI) regression model. It mixes a degenerate distribution with point mass at zero with a non-degenerate distribution. The estimates from ZI models quantify the effects of covariates on the means of latent random variables, which are often not the quantities of primary interest. Recently, marginal zero-inflated Poisson (MZIP; Long et al. [A marginalized zero-inflated Poisson regression model with overall exposure effects. Stat. Med. 33 (2014), pp. 5151–5165]) and negative binomial (MZINB; Preisser et al., 2016) models have been introduced that model the mean response directly. These models yield covariate effects that have simple interpretations that are, for many applications, more appealing than those available from ZI regression. This paper outlines a general framework for marginal zero-inflated models where the latent distribution is a member of the exponential dispersion family, focusing on common distributions for count data. In particular, our discussion includes the marginal zero-inflated binomial (MZIB) model, which has not been discussed previously. The details of maximum likelihood estimation via the EM algorithm are presented and the properties of the estimators as well as Wald and likelihood ratio-based inference are examined via simulation. Two examples presented illustrate the advantages of MZIP, MZINB, and MZIB models for practical data analysis.


Introduction
Count data -both bounded and unbounded counts -regularly occur in a variety of disciplines. However, the standard modeling frameworks for such data, logistic regression and Poisson log-linear models, often fail to adequately account for the observed variability, a phenomenon known as overdispersion. In the common scenario that the data exhibit greater spread throughout their range (fat tails) than standard distributions would predict, a dispersion parameter can be helpful, either as a component of a model based on a more general two-parameter count distribution such as the negative binomial [13] or beta-binomial [4] distribution, or as an extra variance parameter in a model based only on moment assumptions and analyzed via quasilikelihood [24]. However, in other cases, overdispersion may result from a more specific distributional violation: an excess frequency of zero values. This paper will focus on data that fall into the second scenario, which we refer to as excess zero (EZ) data. Many research disciplines encounter EZ data, such as manufacturing [12], horticulture [6], entomology [25], fish and wildlife conservation [15], and psychology [2]. The most common type of model for EZ data is the class of zero-inflated (ZI) regression models. Such models include the ZI Poisson (ZIP) [12] and ZI negative binomial (ZINB) [5] models for unbounded counts, and the ZI binomial (ZIB) [6] model for bounded counts. Continuous data with many zeros occur as well. While such data, known as semicontinuous data, can often be analyzed more easily than discrete EZ data, there are situations for which ZI models such as the zero-inflated tobit (ZIT) model prove useful (e.g. longitudinal semicontinuous data [18], data with both true zeros, zeros due to a detection limit [17]). ZI models are based on a mixture structure in which EZs are hypothesized to come from a degenerate distribution that produces zeros with probability one, mixed with a standard distribution that is responsible for the non-zero outcomes and possibly some of the zeros. Often these models include covariate effects, in which they become ZI regression models. In these cases, the dependence on covariates is built in through generalized linear model-like specifications for the non-degenerate distribution's mean and the mixing probability.
The goal of this paper is to formulate and explore an alternative to the ZI regression model in which the marginal mean of the response variable is modeled directly. In a ZI model, linear predictors are linked to the non-degenerate distribution's mean and the mixing probability. However, both these quantities are means for latent random variables rather than the observed response. This makes the interpretation of a ZI regression model awkward, especially if interest focuses on a covariate's direct effect on the mean response. To address this issue, we develop marginal ZI (MZI) models that retain the mixture structure of classical ZI regression, but parameterize it with a linear predictor that links directly to the marginal mean response. This approach makes model interpretation more straightforward and improves the efficiency of inferences on the marginal mean.
Among the situations in which the regression parameters of a ZI regression model are of direct interest are cases in which the model is motivated by an assumed latent class structure in the population generating the data. For instance, Heilbron [8] presented EZ data from self-reported counts of risky sexual behavior during the six months preceding data collection. In this case, it is plausible to assume that the surveyed population contains individuals who never engage in the behavior of interest (abstainers) as well as those who do. Here, we may be interested in covariate effects on the risky behavior rate among the nonabstinent subpopulation, as well as the effects of predictors of abstinence. For such effects, a ZIP or ZINB regression model is perfectly suited. However, there are many applications involving EZs where one is primarily interested in the overall mean response. For example, in Section 5 we analyze two examples involving EZ data, both arising from designed experiments from horticulture. In each case, there are no obvious latent sub-populations. Rather, as in many analyses of experimental data, the questions of primary interest focus on comparisons of the mean response across treatments. In these and many other examples, it is desirable that the regression function depend upon a single linear predictor that quantifies treatment and covariate effects. Such is the case in MZI models, which is, in contrast to ZI models, where two linear predictors must be resolved to summarize the overall effects of explanatory variables. Synthesizing results from a ZI model can be confusing and cumbersome, and is often met with frustration by applied researchers who simply want to know whether the mean response differed significantly across treatments (and other explanatory factors) and by how much.
The importance of quantifying treatment and covariate effects on the marginal mean was recognized by Albert et al. [1] and Todem et al. [22], both of whom considered approaches for deriving marginal effects from fitted ZI models. However, these approaches involve additional calculations after fitting the model to obtain marginal effect estimates and delta method or bootstrap computations to obtain relevant standard errors. We find these approaches cumbersome and less appealing than modeling the quantities of interest directly in the original model specification.
The MZI models that we discuss are not entirely new. Recently, special cases of MZI regression -all involving unbounded count data and a log link for the marginal meanhave been considered by other authors. In particular, the marginal zero-inflated Poisson (MZIP) model was introduced by Long et al. [14] and extended in the form of the marginal zero-inflated negative binomial (MZINB) model by Preisser et al. [19] (see also Todem et al. [22], whose formulation subsumes these models and allows, in principle, non-degenerate distributions other than the Poisson and negative binomial for the unbounded counts). We provide a general formulation that encompasses additional MZI models -most notably the marginal zero-inflated binomial (MZIB) model for bounded counts -and give details regarding the EM fitting algorithm that have not appeared previously. Note that our terminology differs slightly from that of some previous authors in that we favor the term MZI model rather than marginalized ZI model. The latter term echoes the vocabulary of marginalized mixed-effect models, which were introduced by Heagerty [7] in the generalized linear mixed model context and have been considered in the ZI regression setting by Kassahun et al. [11]. While the goal of marginalized mixed-effect models is also to obtain covariate effects on the marginal mean, this typically involves specification of both conditional and marginal regression functions which are connected through an integral equation. Because we specify only a model for the marginal mean and there is no need for a marginalization operation to resolve marginal and conditional specifications, we prefer the term 'marginal model'.
The remainder of the paper is organized as follows. Section 2 reviews ZI regression models. Section 3 introduces MZI regression models and discusses maximum likelihood estimation via the EM algorithm. In addition, this section catalogs the most important special cases of the MZI model and discusses some model interpretation issues. Section 4 presents the results of simulation studies that demonstrate the properties of ML estimators and likelihood-based inference in MZI regression, highlighting advantages over ZI regression. Examples involving both unbounded and bounded count data are used to illustrate the proposed class of models in Section 5. Section 6 provides some concluding remarks.

ZI regression
ZIP distributions have been in the statistical literature for more than a half century [3], but the first ZI regression model, the ZIP model, was developed by Lambert in 1992 [12]. The ZIP model is based on a finite mixture of a Poisson distribution and a distribution that is degenerate at zero. Denote the response vector as y = (y 1 , . . . , y n ) , where y i is the observed value of a random variable Y i , where we assume independence among {Y i } across i and (1) The model allows λ i and π i to depend on covariates through the relationships where x i and g i are predictor vectors for the Poisson mean and mixing probability, respectively, and β and γ are the corresponding regression parameters. The coefficients are typically estimated via maximum likelihood. This approach can be implemented by direct optimization of the loglikelihood using, for example, gradient methods, or via the EM algorithm, which leads to iterative fitting of standard generalized linear models (GLMs) [6,12].
Extensions of the ZIP model are conceptually straightforward and can be accomplished by replacing the Poisson component of the model by another distribution suitable for the characteristics of the data. In particular, in a common situation in which the non-zero data exhibit overdispersion relative to a (truncated) Poisson distribution, a ZINB model [5] is useful. In this case, the non-degenerate component in Equation (1) becomes NB(λ i , φ) and the same regression structure as in the ZIP model is retained. In the case of EZ bounded count data, the ZIB model [6] is useful, in which case the non-degenerate component becomes Bin(m i , λ i ), where λ i now represents the success probability and m i the number of trials constant for the binomial distribution. In this case, the regression structures of the model take the form In these and other ZI regression models, the methods of estimation and inference are largely the same as those originally developed for ZIP regression.

Model formulation
One problem with ZI regression is that, in general, the resulting parameter estimates describe covariate effects on the latent count distribution, not the effects on the marginal mean, E(Y|X). In many applications, interest focuses on the marginal mean, and while it is possible to compute the mean based on a standard ZI model, a covariate's effect comes in through both π and λ, complicating the interpretation of the model. Instead, it is desirable to model a covariate's effect on the mean directly while also accounting for EZs. To do so, we reparameterize the ZI model. Let π i and λ i be the probability of the response being generated from the degenerate distribution and mean of the distribution, respectively, for the ith individual. Instead of a GLM-like specification for λ i , as in standard ZI regression models, we assume g(μ i ) = x i β for g a link suitable for the range of μ i . In addition, a logistic model, log(π i /1 − π i ) = g i γ is assumed for the mixing probability as usual. The resulting class of models, which we refer to as MZI regression models, are still ZI mixtures of a degenerate distribution at zero with a standard exponential family distribution, but the mean of the non-degenerate distribution λ i , and thus also its canonical parameter θ i , now depends upon both β and γ through the relationship λ i = μ i /(1 − π i ). Writing the non-degenerate distribution's density function in standard exponential dispersion family form [10], the probability density function for subject i becomes where u i = 1 if y i = 0 and 0 otherwise. The joint loglikelihood for the model is the sum of log densities of the form Equation (2). A function optimizer can be used to find the MLE. However, this can be computationally taxing if the parameter dimension is large. Therefore, we also outline an EM algorithm that is similar to the one used for standard ZI models, although it is not as convenient in the MZI case because the complete data likelihood does not factor cleanly into separate components for γ and β.
The complete data loglikelihood is where h(y i ; ) is the non-degenerate distribution, is the vector of parameters, and At the r+1th EM iteration, the E step is taken by estimating the posterior mixing probabilities. Using Bayes' theorem, we obtain (cf. [12, p. 4]). Then the parameters, , can be estimated iteratively via a two-part M step. The first part uses a generic function optimizer (e.g. the nlm() function in R) to is calculated by fitting a weighted GLM of the form g * ( . Here, note that the link g * for this step is a function of λ i but also depends on 1 −π (r+1)) , which is treated as fixed. Its form matches that of g, the link function for μ i in the MZI model specification, evaluated at 1 −π (r+1)) λ i . The E and twopart M steps are iterated until the algorithm converges. The model is considered to have converged when ( ( (r) ; y, z) − ( (r+1) ; y, z))/ ( (r) ; y, z) is less than a tolerance level, which is taken as 10 −8 for the models fit in this paper.
The benefit of using this approach is that there is a reduction in the parameter dimension that is maximized using the function optimizer. Instead of maximizing (γ , β; y) with respect to (γ , β ) using unstructured optimization, only c (γ , β (r) ; y, z (r+1) ) needs to be optimized with respect to γ in this manner. The dimension of β is often larger than that of γ , leading to a large decrease in computational complexity at each iteration of the fitting algorithm, although with the EM algorithm we can expect many iterations to be necessary. The information matrix can be found via direct differentiation of the observed data loglikelihood and is given in the appendix. At convergence, an estimated asymptotic variance-covariance matrix can be obtained from the inverse information matrix evaluated at the MLEˆ .

Special cases
In this section, we provide some further details for the MZIB, MZIP and MZINB models, which are the most important special cases within the MZI class. In each case, we give the observed and complete data loglikelihoods as well as some details of the EM algorithm. The former function can be obtained as the sum over i of the log of Equation (2) after substitution of the specific form of the exponential density h. The complete data loglikelihoods follows from Equation (3), again with substitution of the appropriate form of h. In particular,

MZIB model
For the MZIB model, the non-degenerate component is Bin(m i , λ i ) and the observed data loglikelihood is where m i is the number of trials and μ i is the marginal probability of success for the ith observation. In this model, the link function g relating μ to x β typically will be the logit, but other links such as the probit and complementary log-log functions are also permissible. The complete data loglikelihood, on which the EM algorithm is based, takes the form For the r+1th iteration of the EM algorithm, the steps are , using (2). (2) (M Step) Update the estimates for γ and β: (a)γ (r+1) is found by maximizing c (γ , β; y, m) with β fixed at β (r) .
is calculated using a GLM with: Repeat steps 1 and 2 until convergence.

MZIP Model
For the MZIP model, the non-degenerate component takes the form Pois(λ i ), leading to an observed data loglikelihood of the form where again μ i is the marginal mean for the ith observation, and a log link for μ is assumed. In this case, the complete data loglikelihood is For the r+1th iteration of the EM algorithm, the steps are (a)γ (r+1) is found by maximizing c (γ , β; y) with β fixed at β (r) .
is calculated using a GLM with: (3) Repeat steps 1 and 2 until convergence.

MZINB model
For the MZINB model, the non-degenerate component is NB(λ i , φ), and the observed data loglikelihood is As in the MZIP case, a log-linear model for μ is assumed. In this case, the complete data loglikelihood is For the r+1th iteration of the EM algorithm, the steps are (a) ((γ (r+1) ) ,φ (r+1) ) is found by maximizing c (γ , β, φ; y) with β fixed at β (r) .
is calculated using a glm with: Repeat steps 1 and 2 until convergence.
Here, the quantity φ c i is treated as a fixed dispersion parameter in the iteratively reweighted least-squares (IRLS) fitting algorithm used for β. This can be done in some GLM software and IRLS algorithms, or can be programmed easily from scratch.
Alternatively, φ can be grouped with β in the M step and profiled during the M step for (β , φ) . This would fix φ over each value in grid of values in its parameter space, allowing β to be updated via a standard GLM algorithm. This is akin to the approach is some negative binomial regression software (e.g. the glm.nb function in the MASS package for R), but offers no computational advantage over numerically optimizing c (γ , β, φ; y) with respect to both γ and φ as described above, which we find to be simpler to implement and faster computationally.

Model interpretability
Because of the functional relationship between μ i and π i , one should take care to use sensible specifications for these two quantities in an MZI model. In particular, while it is not incompatible to include a covariate in the model for π i but exclude its effect from the linear predictor for μ i , such models are best avoided. The reason for this recommendation is that if a covariate x i is specified to have a direct effect on π i , then it will necessarily also have an effect on μ i through the relationship μ i = (1 − π i )λ i . Thus, omitting x i from the model for μ i implies that λ i changes with x i in just such a way as to nullify this covariate's effect on π i . Such a model seems highly implausible; therefore, for model interpretability's sake we recommend that, in general, covariates in G be a subset of those in X.
Also worth noting is the relationship between covariate effects on λ i and μ i in ZI and MZI models involving log link. In particular, consider a ZI model with specifications The relationship between μ i , λ i , and π i implies log(μ exp(g i γ )). Therefore, if the predictor x ik is not included in g i , then eβ k quantifies the multiplicative effect of x ik on both λ and μ: where, without loss of generality, , and λ i , λ i are defined similarly, conditional on z i = 1.
In particular, when the mixing probability is constant, then the ZI and MZI models are reparameterizations of each other. In this case, the regression coefficients β in an MZIP or MZINB model differ from those in the corresponding ZI model only through the intercept, β 0 , which is shifted by the constant ν = log(1 − π). The equivalence extends to models in which the mixing probability differs only across groups of subjects (that is, for an ANOVA-like specification for g i γ ), provided that the group structure is also included in the specification of x i β. For example, in a ZI model with logit(π ij ) = γ j , and log(λ ij ) = β j + x i β, then a marginal group mean for group j at fixed x i is given by For cases when the mixing probability depends on continuous covariates or for models that lack a log link (e.g. for bounded count data), however, the ZI and MZI model classes are not equivalent.
When MZI and ZI models are distinct, the advantage of the former class is that its parameters can be interpreted as covariate effects or differences in group means on the marginal mean response (on a link-transformed scale, as in all GLMs). In contrast, ZI model parameters pertain to the mean of the latent non-degenerate variable in the mixture underlying both model classes. While effects on the marginal mean can be estimated from ZI model parameters, there are problems with such quantities. First, in a ZI model, there is typically no scale on which a covariate effect is constant; it depends on the values of other explanatory variables in the model. This vastly diminishes the interpretability of a ZI regression model when interest focuses on marginal mean effects. Secondly, when marginal mean effects are computed in a ZI model, these quantities are, in general, nonlinear functions of bothβ andγ . While asymptotic normality for these quantities can often be justified via the delta method, the resulting asymptotic standard errors and normal-theory reference distributions for Wald inference can be expected to be poor relative to the more direct inference on marginal mean effects that is available for an MZI model. The latter consideration is not a concern when likelihood ratio inference is used, but Wald inference is often much easier in practice for these models because, for example, Wald confidence intervals are much less demanding computationally than profile likelihood intervals, and formulating and/or fitting the model under a null hypothesis of interest is sometimes difficult. We investigate the inference advantages of MZI models over traditional ZI regression in Section 4, but it is worth emphasizing the fact that the ZI model often does not yield constant covariate effects at all, over problems with inference on those effects in a ZI model.

Simulation
Simulation studies were performed to demonstrate the advantages of the MZI model over a ZI model, and to examine the properties of ML estimators and Wald and likelihood ratio-based inferences in these models. Following a similar formulation of Long et al. [14], data were generated from MZI models, all of which take the form logit 1 is a vector of an equal number of zeros and ones and x 2 ∼ Uniform(−1, 1). Table 1 reports results for data generated from MZIB models where true parameter values are taken to be γ = (−1.386, 0.539, .25) and β = (−.405, .811, −.25). Properties of the estimates from correctly specified models fit to 10,000 data sets of size n = 100, 200, and 1000 are reported. These properties include percent relative median bias (PRMB), simulation standard deviation (SSD), the model-based standard error (MBSE), and model-based coverage probability (MBCP) of 95% Wald confidence intervals. Similar simulations were conducted for MZIP and MZINB models, but due to the similarity of the simulation scenarios and results to those reported by Long et al. [14] and Preisser et al. [19], these cases are relegated to an online supplement to this article available from the journal website. Calculations for the simulations and examples were done in R [20] using the zeroinfl function of the pscl package [9] for fitting ZI models and functions written by the authors for MZI model estimation and inference. R code for fitting the MZI models of Section 5 is included in the online supplement.
Results in Table 1 show reasonably small values of PRMB that decrease with the sample size, as expected. While there is some inaccuracy in the MBSEs at the smaller sample sizes, this diminishes with increasing sample size, and the Wald coverage probabilities are accurate for all choices of n. For the MZIP and MZINB cases (not shown), the PRMB values at n = 100 are larger than those in Table 1, but the general pattern of results is similar, and these simulations suggest that ML estimation and Wald inference in MZI models perform reasonably well at moderate sample sizes. Table 2 shows the coverage probabilities of Wald 95% confidence intervals of μ i = g −1 (β 0 + β 1 + β 2 x 2i ). Ten thousand data sets were generated from the MZIP and MZINB models where logit(π i ) = .6 − 2x 1i + .25x 2i , log(μ i ) = .25 + log(1.5)x 1i + .5x 2i , and  φ = 3 and from an MZIB model with logit(π i ) = −1.386 + 0.539x 1i + .25x 2i and logit(μ i ) = −0.405 + 0.405x 1i + .25x 2i , where x is defined previously. Both MZI and ZI models were fitted to these data sets and the confidence interval for μ i was calculated by g −1 {log(μ i ) ± 1.96 × se(μ i )}. For the ZI models, the delta method was used to find the standard error ofμ i based on the estimated asymptotic covariance matrix of the ZI regression parameters, which is computed as the inverse observed information matrix evaluated at the ML estimates. Three different values of x 2i were chosen when calculating the confidence interval for μ i , −1, 0, and 1. As expected, due to the linear relationship between the estimators for the ZI and MZI models with a log link when x 2i = 0, the ZIP and MZIP models gave almost identical coverage probabilities, and the same is true for the ZINB and MZINB models. Results are also similar when x 2i = 0 for the ZIB and MZIB models, but to a lesser degree because the MZI and ZI model equivalence does not hold true for the logit link. When x 2i = 0 (most noticeably for x 2i = −1), while the MZI model-based Wald intervals have close-to-nominal coverage, intervals for the ZI model have lower coverage, especially for sample sizes of 1000. As noted by Long et al. [14], the delta method for a ZI model strongly depends on the values of extraneous covariates and the larger the value of these covariates, the greater the effect on estimation of the standard error, with the effect increasing with the sample size.   Table 3 contains the results of simulations that examine tests of hypotheses that compare marginal means across two treatment groups. Here, we generated data from MZI models with g(μ i ) = β 0 + β 1 trt i + β 2 x i , i = 1, . . . , n, where trt i is a 0-1 indicator of which subjects received the treatment. In addition, we set logit(π i ) = γ 0 + γ 1 trt i + γ 2 x i and the values of γ and β are the same as the simulations of Table 2 except β 1 takes three different values: 0, log(1.5), and log (2). Define R as the mean response ratio in the two treatment groups; we are interested in testing the hypothesis H 0 : log(R) = 0. In MZI models this quantity does not depend upon x i and the hypothesis can be formulated and computed easily using either a Wald or likelihood ratio (LR) test of H 01 : β 1 = 0. In this context, ZI regression models do not provide a convenient framework for inference on R. Nevertheless, when available explanatory variables include a treatment indicator and continuous covariate it is natural to consider ZI regression models of the form g( Unlike the corresponding MZI model, here the ratio of means depends upon x i . In particular, the hypothesis β * 1 = 0 no longer addresses whether the ratio of marginal means is 0 and is not a suitable substitute for H 0 . Instead, we consider two hypothesis tests that one might conduct in the ZI model if one were interested in whether the marginal mean was equal across treatment groups. First, using the relation- and conduct a Wald test of the hypothesis H 02 : log(R i ) = 0 using the delta method to estimate the required nonlinear functions of the model's regression parameters and their standard errors. Note that in the ZI model the ratio of marginal means depends on x 2i , hence the notation R i instead of R. Thus H 02 is not equivalent to H 0 and its target of inference varies with x 2i . Despite this non-equivalence, we examine the ZI regression-based Wald test of H 02 at x 2i = −1, 0, 1. Second, a natural way to assess the overall treatment effect in a ZI model is to consider models with and without the treatment factor in the set of explanatory variables in the model. That is, for ZI models we consider Wald and LR tests of H 03 : γ * 1 = β * 1 = 0. Note that, again, H 03 is not equivalent to H 0 [1] because while H 03 implies H 0 , the converse is not true. In particular, it is possible for the treatment to have positive effects on both λ and π that offset to give a net null effect on the marginal mean response.
In Table 3, data are generated from MZI models with R = 1, 1.5, and 2 by setting β 1 to 0, log(1.5) and log(2), respectively. In the first case, H 01 is true, H 02 is true provided that x 2i = 0, and H 03 is false. In this setting, all of the tests for H 01 , as well as tests for H 02 when x 2i = 0, have empirical sizes that are reasonably close to nominal (0.05) at the largest sample size; the Wald and LR tests of H 01 , which are those that are based on fitted MZI models, show the best performance overall. When x 2i = 0, the Wald test of H 02 based on a ZI model often rejects at a much higher rate than .05, and its rejection rate increases with the sample size. The non-equivalence of H 0 and H 03 can be seen by the high rejection rates for tests of the latter hypothesis based on ZI models, when the data are generated with log(R) = 0. Thus none of the tests based on ZI models that attempt to address this hypothesis are suitable. When H 0 is false, the power of the Wald and LRTs of H 01 are similar, with neither test dominating the other.
Collectively, our simulation results demonstrate that, except in special cases when MZI and ZI models are equivalent (see Section 3.3), the latter class of models does not provide a suitable framework for estimation and inference on the marginal mean.

Apple roots
In Ridout et al. [21], data from an experiment in which the number of roots produced by shoots of an apple tree cultivar were analyzed. Growing conditions were determined by the 2 × 4 factorial combinations of two factors: length of photoperiod (8 and16 h) and concentration of the cytokinin BAP (2.2, 4.4, 8.8, and 17.6 µM). The design was unbalanced, with a total of 267 shoots propagated, where two treatments had 40 shoots, one had 39 shoots, four had 30 shoots, and one had 28 shoots propagated. The response variable is the number of roots grown per shoot. The data set contains a large number of zeros, but histograms of the response by photoperiod suggest that zero inflation is primarily of concern under the 16-h photoperiod (see Figure 1). Let y ijk be the number of roots for the kth shoot at the ith photoperiod and the jth BAP concentration. Also, π ijk , λ ijk , and μ ijk are the mixing probability, latent mean, and overall mean for y ijk , respectively. Three different ZIP and ZINB models for the data were fitted and compared: In addition, corresponding MZIP and MZINB models were also fit that have linear predictors of the same form as Models 1-3. The parameter estimates for each model are found in Table 4. Because LR tests and AIC statistics favor the NB-based models over the corresponding Poisson-based models, we restrict further attention to ZINB and MZINB models. From Model 1, it would appear that log(BAP) has little effect on the mean since the standard error is greater thanβ 2 . However, when separated into two different slopes for each photoperiod, log(BAP) does have a significant effect on the mean, one that is positive for photoperiod 8 and negative for photoperiod 16. Since by Wald and likelihood ratio tests,γ 2 is not statistically different from zero in Model 3, and Model 2 has the lowest AIC, we conclude that log(BAP) has a negligible effect on the mixing probabilities and settle on the MZINB and ZINB versions of Model 2 for further analyses. Because Model 2 has anANOVA structure for π, the MZINB and ZINB models are equivalent as described in Section 3.3. In particular, for this model, we have This relationship is satisfied by the parameter estimates in Table 4. Note that the estimated probability of zero inflation for photoperiod = 8 is so small (π 1jk = logit −1 (γ 11 ) = 0.012) that the difference in the estimates of β 11 is negligible for the MZINB and ZINB models. In contrast (and as seen in Figure 1), much greater evidence of zero inflation was found for photoperiod = 16 (π 2jk = 0.479) leading to a much more substantial shift inβ 12 for the MZINB model relative to the ZINB model. Note also that while MZI and ZI results are quite similar for Model 3, these models are not equivalent because of the presence of a covariate effect in the model for π . The close similarity of the parameter estimates and AIC statistics for the two model types is simply because the estimate of γ 2 is very near 0.

Whitefly data
van Iersel et al. [23] report the results of a horticultural experiment to examine the effect of six methods of applying pesticide to control silverleaf whiteflies on greenhouse-raised poinsettia. Fifty-four plants were used in a randomized complete block design where experimental units containing three plants were randomly assigned to each treatment in three complete blocks. Each week a collection of whiteflies was placed on each plant (confined inside a cage clipped to a leaf of the plant) and then the number of surviving whiteflies was measured two days later. It was of interest to compare the survival probabilities across plants treated with different methods of pesticide application. Despite the large frequency of zero-valued responses in these data, the ZIB model is not the best framework for analyzing them because its estimates of β quantify treatment effects on the conditional probability of survival given that the response comes from the binomial component of the mixture. In the context of the problem, one might interpret this as the conditional probability of survival given that the treatment is not certain to kill all whiteflies. In contrast, the β coefficients in an MZIB model provide a more straightforward and desirable interpretation as direct treatment effects on the log odds of survival.
Although the original experimental design included repeated measures over time, for simplicity we analyze only the data from week 6, the middle week of the study. Extensions of MZI models to the longitudinal/clustered data setting will be considered elsewhere. The six treatments used in the experiment included four methods of sub-irrigated pesticide following 4, 2, 1 and 0 days without water (treatments 1-4, respectively), a treatment involving pesticide application via hand watering (treatment 5), and a control treatment in which no pesticide was used (treatment 6).
Following van Iersel et al. [23], we summed the responses for each trio of plants and analyzed a single bounded count per experimental unit. Let μ ij represent the probability of survival for the n ij insects from the experimental unit assigned to the jth treatment in the ith block, which was modeled by logit(μ ij ) = β 0 + β 1i + β 2j where β 0 is the intercept, β 1i , i = 1,2,3, is the effect of the ith block, and β 2j , j = 1, 2, . . . , 6, is the effect of the jth treatment. Two models with this specification for μ ij were fit to the data, differing in their specification of the mixing probability: (a) logit(π ij ) = γ 0 , and (b) logit(π ij ) = γ 0 + γ 1j . Models with both treatment and block effects on the mixing probability were not considered due to limited degrees of freedom and the sparse occurrence of zeros across blocks. A Wald test of H 0 : γ 11 = · · · = γ 16 = 0 strongly favors model (b) (P(χ 2 5 > 60.27) ≈ 0). Table 5 shows the estimated marginal probabilities for each block and treatment combination. Based on model (b), there are several sets of contrasts among the six treatment effects that one might choose to test. We tested H 0A : 1 25 , and H 0C : β 21 = · · · = β 24 to compare active treatments to the control (P(χ 2 1 > 30.61) ≈ 0), subirrigation to hand watering (P(χ 2 1 > 8.29) = 0.004), and the four levels of restricting water prior to subirrigation-based pesticide application (P(χ 2 3 > 3.71) = 0.294), respectively. These hypotheses have straightforward interpretations, and we conclude that the odds of survival are significantly different (lower) when pesticide is applied, significantly different (lower) when subirrigation is used for pesticide delivery, and that the survival odds do not differ significantly depending on the length of water restriction prior to subirrigation with pesticide. An advantage of the MZIB model compared to the ZIB model is that it estimates survival odds ratios for distinct treatments that do not depend on block. Given the additive structure of the model, this feature is desirable. For example, using the MZIB model, we may estimate the ratio of survival odds for treatments 2 and 5 as exp(β 25 −β 22 ) = 20.08. In contrast, a ZIB model with linear predictors of the same form as our chosen MZIB model yields an estimated survival odds ratio for treatments 2 and 5 of ( ). Note that this quantity depends upon i; so the model, despite the additive structure of its linear predictors, essentially induces an interaction between treatments and blocks. The ratio of survival odds for treatments 2 and 5 for blocks 1, 2, and 3 for a ZIB model are 27.29, 19.88, and 25.53, respectively. Note that the average of the odd ratios from the ZIB model is not especially close to the odds ratio found from the MZIB model, which highlights the fact that these models are not merely reparameterizations of each other and should not be expected to agree closely.

Conclusion
ZI regression is a useful and appealing framework for analyzing count data with EZs, but it is awkward to obtain inferences on the marginal mean from such models. Often there is no scale on which covariate effects are constant, precluding succinct statements about the relationship between the mean response and a covariate of interest. In many contexts an MZI model will prove more convenient and interpretable. Such models retain the basic mixture structure of ZI regression, but provide direct estimates of marginal mean effects.
In addition to providing direct estimates of interpretable quantities, MZI models allow more straightforward inferences, avoiding the need for delta method confidence intervals and tests. This is not only an advantage of convenience but, as shown in our simulation results, produces more accurate coverage and error rates. As the simulations show, the coverage probabilities for confidence intervals derived from the ZI models decrease as the sample size increases while the coverage of the intervals formed from the MZI models remain close to the desired level, which is consistent with the results from Long et al. [14]. When testing for a treatment effect on the marginal mean, only those using the results of the MZI model have the correct size and are properly targeted at the hypothesis of interest; therefore the ZI model should not be used for inference on the marginal mean of the response except in special cases when its parameters have direct marginal interpretations.
While in this paper we have focused on models for independent data (e.g. a crosssectional study design), extensions to accommodate longitudinal/clustered data are desirable. For this purpose, a generalization of the MZI regression model class that is natural and tractable incorporates random cluster-specific random effects. Discussion of such an extended mixed-effect, MZI class will appear elsewhere. While this paper gives details on several special cases of MZI models (MZIP, MZIB, MZINB), these are other examples we have not discussed. For instance, an MZI tobit model may be useful for semicontinuous data and can be seen as an alternative to the ZIT model [16].