Counting the Unseen: Estimation of Susceptibility Proportions in Zero-Inflated Models Using a Conditional Likelihood Approach

Abstract Zero-inflated count data models are widely used in various fields such as ecology, epidemiology, and transportation, where count data with a large proportion of zeros is prevalent. Despite their widespread use, their theoretical properties have not been extensively studied. This study aims to investigate the impact of ignoring heterogeneity in event count intensity and susceptibility probability on zero-inflated count data analysis within the zero-inflated Poisson framework. To address this issue, we propose a novel conditional likelihood approach that uses positive count data only to estimate event count intensity parameters and develop a consistent estimator for estimating the average susceptibility probability. Our approach is compared with the maximum likelihood approach, and we demonstrate our findings through a comprehensive simulation study and real data analysis. The results can also be extended to zero-inflated binomial and geometric models with similar conclusions. These findings contribute to the understanding of the theoretical properties of zero-inflated count data models and provide a practical approach to handling heterogeneity in such models.


Introduction
Zero-inflated models are statistical models specifically designed to account for excess zeros commonly observed in count datasets (Lambert 1992).These models are a mixture of an ordinary count distribution and a degenerate distribution at zero.By incorporating a zero-inflation component, these models account for the proportion of excess zeros in count data, allowing for greater flexibility in the fitting of the model to the data.Moreover, zero-inflated models are capable of disentangling the source of zeros into structural and random zeros, where a structural zero arises from the degenerate distribution at zero, and a random zero is the zero outcome of the ordinary count distribution (Cameron and Trivedi 2013).Zero-inflated models estimate the proportion of the susceptible population by considering observations from the ordinary count distribution alongside those with potential events.This allows for the estimation of the probability of susceptibility which is of interest for many applications.For instance, the susceptibility probability can be used to identify high-risk individuals in public health research (Todem, Kim, and Hsu 2016), nondormant periods in criminology (Roeder, Lynch, and Nagin 1999), imperfect products in manufacturing (Lambert 1992), non-safety road sections in traffic accident analysis (Shankar, Milton, and Mannering 1997), and occupancy probability in species distribution models (MacKenzie et al. 2002).In this study, we consider the species' occupancy probability as an CONTACT Jakub Stoklosa j.stoklosa@unsw.edu.auSchool of Mathematics and Statistics, The University of New South Wales, Sydney, Australia.Supplementary materials for this article are available online.Please go to www.tandfonline.com/r/TAS.illustrative example of the susceptibility probability under a zero-inflated modeling framework.
Accurate estimates of species abundance and occupancy are essential for ecologists and conservationists in managing and assessing population status (Wenger and Freeman 2008).However, limitations in data collection, such as time constraints and costs, can result in an underestimation of the true abundance or occupancy probability.To address this issue, occupancy modeling make use of zero-inflated binomial models, which account for imperfect detection to estimate the occupancy (i.e., susceptibility) probability (MacKenzie et al. 2002).Occupancy models have received significant attention, leading to the development of a broad class of models (MacKenzie et al. 2017).
The zero-inflated count data model consists of two components: the event count intensity and the susceptibility probability.Both of these components can be influenced by heterogeneity arising from a range of factors.For example, ecological studies often encounter heterogeneity in animal abundance and site environmental characteristics (Royle 2006), while traffic accident analyses require consideration of roadway types and geographic locations (Shankar, Milton, and Mannering 1997).Additionally, oral health studies may encounter heterogeneity in levels of sugar intake (Todem, Kim, and Hsu 2016).These are just a few examples of the diverse range of applications where heterogeneity arises in real-world data with counts.While these models can account for inter-subject heterogeneity by including covariates and latent variables (Hall 2000;Cameron and Trivedi 2013), limitations in data collection or practical constraints may preclude the inclusion of all relevant variables in model fitting, potentially resulting in model misspecification, motivating us to investigate the impacts of ignoring heterogeneity on estimates of the susceptibility probability.
To achieve our objective, we adopt a novel conditional likelihood approach.Specifically, we begin by demonstrating that when the event count intensity follows a mixing distribution, a conditional maximum likelihood estimator coincides with the conventional maximum likelihood estimator.This result has been established previously for truncated samples (Sanathanan 1977), but the conclusion in the presence of regression models with covariates has not been explored.We show that the conditional maximum likelihood estimator is less efficient than the maximum likelihood estimator when the event count intensities are related to covariates.However, we also find that the conditional likelihood approach has an optimal estimating function property and is robust to model misspecification in the susceptibility probability.Moreover, we highlight that the conditional likelihood framework is particularly useful for exploring the properties of zero-inflated models, providing an efficient means of investigating their characteristics and theoretical properties.
Through our analysis, we have found that ignoring heterogeneity in events can result in an underestimation of susceptibility probability when the event count intensity follows a mixing distribution.Additionally, ignoring heterogeneity in susceptibility probabilities can significantly impact the estimation of the average susceptibility probability, depending on the relationship between count intensities and susceptibility probabilities.To address this issue, we propose a robust method based on the conditional likelihood to estimate the average susceptibility probability consistently under the correct count intensity model specification.Our approach can also be extended to zero-inflated binomial and geometric models with similar conclusions.
In Section 2, we introduce the zero-inflated Poisson model and examine the effects of ignoring heterogeneity in event count intensities or susceptibility probabilities.In Section 3, we demonstrate that the conditional likelihood estimator is not asymptotically equivalent to the maximum likelihood estimator in regression models, and we examine the consequences of neglecting heterogeneity in susceptibility probabilities.We provide simulations and real-data examples in Sections 4 and 5, respectively.In Section 6, we discuss the results and present potential future work.We provide proofs for all propositions in a Web Appendix.

Zero-Inflated Poisson Models
We assume that a sample of n subjects belongs to some population of interest.Let y i be the number of events of the ith subject and let A be the collection of distinct subjects with a nonzero observed total frequency; that is, A = {i : y i ≥ 1}, and let A c be the complement of event A. Let ψ be the susceptibility probability of the population, which represents the fraction of subjects likely to have positive events.If subject i is not a member of the susceptible group, then y i = 0 by definition (structural zero).In contrast, y i is a nonnegative counting integer if it belongs to the susceptibility group.A popular framework for such count data assumes that y i follows a Poisson distribution (Lambert 1992) if subject i belongs to the susceptible group.
Since we cannot observe the susceptibility status of each subject, we introduce a Bernoulli latent variable L i , where L i = 1 indicates that subject i is susceptible, and 0 otherwise.Clearly, P(L i = 1) = ψ, y i = 0 if L i = 0, and we write y i | L i = 1 ∼ Poisson(λ) where Poisson(λ) denotes a Poisson distribution with mean λ.Alternatively, we may also say y i ∼ (1 − ψ)I(0) + ψPoisson(λ) where I(•) is the usual indicator function.We use the term "event count intensity" to refer to λ, which represents the average number of occurrences of an event.However, it is important to note that the survey effort involved in collecting the sample data may also impact the degree of count intensity observed in the data.

Estimation
Let m k be the number of subjects observed k times for all k ≥ 0, k≥1 m k = n − m 0 denote the number of subjects with at least one event occurred, so that m + is equal to the size of A. The likelihood function under the zero-inflated Poisson distribution is given by Throughout, we will refer to the above likelihood function (1) as the homogeneous model.In practice, the event count intensity may exhibit heterogeneity between sampled subjects, leading to variations across the population.To allow for heterogeneity in the count intensity, we first consider a mixture distribution for λ (Royle 2006).Specifically, let g θ (λ) be the probability density function (or probability mass function) for some parameter θ which reflects dispersion in λ and hence allows for heterogeneity in the population, and let p k (θ ) = (λ k e −λ /k!)g θ (λ)dλ for all k ≥ 0. Several well-known mixture distributions can be considered for g θ (λ); for example, finite mixtures (Böhning et al. 1999) and gamma mixtures (White and Bennetts 1996) where the latter yields a negative binomial distribution.Consequently, the likelihood function with a mixture distribution for the event count intensity is given by The maximum likelihood estimators for θ and ψ can be obtained by maximizing the likelihood function (2) using an appropriate optimization algorithm.Here, we consider an alternative estimating procedure that uses the conditional likelihood in two stages as follows: (a) we first estimate θ based on the conditional likelihood; and (b) obtain an estimate of ψ (using the estimate of θ ).To see this, let p + (θ ) = 1 − p 0 (θ) = k≥1 p k (θ ) and write the likelihood function (2) as where L 1 (θ ) is the conditional likelihood given m + subjects are observed with at least one event, and L 2 (θ, ψ) is the marginal likelihood of m 0 .Since the conditional likelihood does not involve ψ, we can maximize L 1 (θ ) with respect to θ and obtain an estimate of θ .The resulting estimator is denoted by θ which is then substituted into L 2 (θ, ψ) to obtain an estimator for the susceptibility probability, given by ψ = m + /{np + ( θ )}.
Lemma 1 (Web Appendix) proves that the conditional likelihood approach and the conventional maximum likelihood approach lead to the same estimator.Therefore, θ and ψ obtained through the conditional likelihood are also maximum likelihood estimators.

Effects of Ignoring Intensity Heterogeneity
Next, we discuss the implications of ignoring heterogeneity when estimating the susceptibility probability ψ.Let ψ 0 be the estimator of ψ obtained via the homogeneous model (1).
Proposition 1 provides valuable insights into the potential consequences of ignoring heterogeneity in event count intensity, even when the assumed model is the simple homogeneous model.It is essential to note that this phenomenon of underestimation persists not only in regression models (as discussed in Section 3) but also in several other related zero-inflated models, as demonstrated by the simulation studies presented in Section 4. Furthermore, Proposition 1 provides an approximation of the bias, which shows that the extent of variation (σ 2 ) and the mean intensity (μ) are associated with bias.When μ is large, the bias is negligible, but it can be substantial when μ is small.We also verify the expression for the bias through a simulation study in Section 4.

Zero-Inflated Poisson Regression Models
Heterogeneity in the event count intensity is often modeled with covariates in a regression setting; for example, the intensity of subject i is modeled as λ i = exp(θ x i ) where x i is a vector of covariates and θ is the associated regression parameter.In the following, suppose that y i ∼ (1 − ψ)I(0) + ψPoisson(λ i ).The corresponding likelihood function is then given by

Conditional Likelihood Approach for Zero-Inflated Poisson Regression Models
Unlike the models mentioned earlier, the estimators obtained through the conditional likelihood approach and maximum likelihood approach for model (3) are not equivalent.The conditional likelihood function is defined as which is independent of ψ.The conditional likelihood estimator θ c is the maximizer of the likelihood function (4).Let λ i = exp( θ c x i ), then we can estimate ψ via the profile likelihood . Thus, we define the conditional likelihood estimator ψ c by solving Let ( θ , ψ) be the maximum likelihood estimators of (θ , ψ).Also, let var( θ ) and var( θ c ) denote the corresponding asymptotic variance of θ and θ c , respectively.Similar notations are defined for var( ψ) and var( ψ c ).
In Lemma 2 of the Web Appendix, we show that both estimating equations for ( θ, ψ) and ( θ c , ψ c ) belong to the family of weighted estimating equations based on φ i .In the following proposition, we present a comparison between the estimators ( θ , ψ) and ( θ c , ψ c ).
Proposition 2. We have var( θ ) ≤ var( θ c ) in the sense that var( θ c ) − var( θ ) is a nonnegative definite matrix, and var( ψ) ≤ var( ψ c ).Moreover, the score function for ( θ, ψ) is the optimal estimating function for (θ , ψ).However, if ψ is a nuisance parameter, then the conditional score function of θ c is the optimal estimating function for θ .
Proposition 2 shows that in general, the conditional maximum likelihood estimator is less efficient than the maximum likelihood estimator, except when the covariate x i is a constant and the model reduces to the homogeneous model.The optimum property of the maximum likelihood approach is a fundamental result established by Godambe (1960).In contrast, the optimum property of the conditional score function for θ c is a consequence of Godambe (1976), which implies that θ c preserves some robustness in model misspecification for estimating θ .Moreover, the optimum property of the conditional maximum likelihood estimator remains, even if the susceptibility probability is not constant; that is, if the probability of susceptibility ψ i is specific to each subject i, and all ψ i are unknown parameters, as in the model specified by ( 7).However, if the regression model ( 3) is misidentified as a homogeneous model (i.e., no nonconstant regressors are included), there will be a downward bias effect on estimating the susceptibility probability.The proof follows a similar approach to the proof for Proposition 1, but with the assumption that x i is collected from a random sample, and is thus omitted.It is worth noting that the downward bias effect in the regression model persists when some of the covariates are ignored, in general.We demonstrate this using both the maximum likelihood and conditional likelihood approaches in a simulation study in Section 4.

Effects of Ignoring Heterogeneity in the Susceptibility Probability
As susceptibility status is binary, represented by the variable L i , heterogeneity in susceptibility probabilities is commonly modeled by logistic regression.In this framework, covariate information is used to account for sources of heterogeneity among individuals.Let H(x) = 1/{1 + exp(−x)} be the logistic function.For convenience, denote p y (θ ) as a mixture Poisson distribution with parameter θ , and assume that where ψ i = H(γ z i ) for a vector of covariates z i with γ being the associated regression parameter.Model ( 6) is reduced to model (2) when z i = 1 for all i.The average susceptibility probability ψ (i.e., the mean of ψ i ) is often the primary parameter of interest in many applications.For example, in ecology, ψ is interpreted as the overall occupancy rate over a study region when fitting a site occupancy model.Thus, we focus on estimating this parameter and examine the effects of ignoring heterogeneity in ψ i in greater detail.Note that although ψ can be consistently estimated even if heterogeneity in ψ i is ignored in model ( 6) (see Lemma 3 in the Web Appendix), the extension is not as straightforward when considering a regression framework for the count intensity component, as in model ( 7).However, when we consider the regression framework, we can only justify the limit of the ψ estimator using the conditional likelihood approach.Specifically, we consider where λ i = exp(θ x i ) and ψ i = 1/{1 + exp(−γ z i )} for some covariates x i and z i .In this setting, we can observe that ψ c remains consistent for ψ under the condition that the detection intensities λ i and susceptibility probabilities ψ i are uncorrelated.This assumes that the nonintercept covariates in z i are not related to those in x i , and that the covariates can be regarded as outcomes of random variables under a super-population assumption.However, if λ i and ψ i are positively or negatively correlated, ψ c tends to overestimate or underestimate ψ, respectively.Specifically, model ( 6) represents the uncorrelated case between λ i and ψ i .
A justification for the above phenomena can be summarized as follows.Let ω be the limit of ψ c as n → ∞.Since θ c is consistent for θ , by model ( 5), it can be shown that ω is, asymptotically, the solution of A first-order approximation then implies that ψπ − ω π / (1 − ω π ) ≈ 0 where ψπ denotes the average of ψ i π i .Therefore, we have ω ≈ ψπ/ π which supports the finding as 1 − e −λ is a monotonic increasing function of λ, noting that maximum likelihood ψ shows a similar behavior as ψ c as seen in Section 4.Moreover, count intensities and susceptibility probabilities are generally not independent; for example, Lambert (1992) considers ψ i as a function of λ i in the number of defects analysis in manufacturing.Thus, the average susceptibility probability estimator may be biased if ψ i are not homogeneous.
Although ψ c is not a consistent estimator for the average susceptibility probability ψ under model ( 7), we propose an alternative robust (to model misspecification) estimator ψ = (1/n) i∈A 1/(1 − e − λ i ) where λ i = exp( θ c x i ) and θ c is the conditional likelihood estimator of (4).Recall that A is the set of {i : Specifically, when the λ i 's are known, ψ is a Horvitz-Thompson type estimator since Consequently, ψ preserves consistency as long as the parameter θ can be estimated consistently, where the latter is generally true only if the event count component model is specified correctly.Moreover, as shown in Proposition 2, the estimating function of θ c is optimal in this case.Hence, the estimator ψ preserves some robustness to model misspecification.
To estimate the variance of ψ, we consider an approximation by using the delta method where D θ = i∈A e −λ i λ i x i /(1−e −λ i ) 2 and var( θ) is the inverse observed Fisher information of the likelihood (4).Since the ψ i 's are unknown, we propose the following variance estimator where θ is evaluated at θ .The estimator n ψ is frequently used in species richness estimation and closed population capture-recapture models.For instance, Van Der Heijden et al. (2003) employed this estimator to estimate the count of illegal immigrants in the Netherlands using a zero-truncated Poisson model.However, the variance estimator typically used in capture-recapture studies cannot be directly applied here because it necessitates the estimation of ψ i for each individual, as previously discussed.

Simulations
We conducted various simulation studies to verify the performance of conditional likelihood and maximum likelihood estimation, and to investigate the impacts of ignoring heterogeneity in count intensities and susceptibility probabilities.For each simulation scenario below, we generated 1000 datasets.

Scenario (a)
First, we examined the effects of ignoring heterogeneity in the event count intensity, see Section 2.2.We generated data from a zero-inflated negative binomial distribution.Let NB(κ, μ) denote a negative binomial distribution with dispersion parameter κ and mean μ, such that, NB(κ, μ) reduces to a Poisson(μ) as κ is increased to infinity, and NB(κ, μ) is equivalent to a mixture Poisson(λ) where λ ∼ Gamma(κ, μ/κ).Hence, the mixing distribution has a mean μ and variance μ 2 /κ, and this allows us to obtain the approximate bias as given in Proposition 1.We investigated the estimation of ψ when using the homogeneous zero-inflated Poisson model.In this simulation scenario, conditional likelihood estimates are exactly equal to maximum likelihood estimates (by Lemma 1), we therefore only presented maximum likelihood estimates for data generated from zero-inflated negative binomial models.
We set ψ = 0.2, 0.5, 0.8, μ = 1, 2, 5, n = 200 and let κ range from 10 −2 to 10 3 .In Figure 1 we plot the relative % bias for ψ when using the maximum likelihood estimator and compare this with the asymptotic (relative %) bias for ψ against increasing values of κ which are on the log 10 scale.These findings highlight the issue of underestimation that arises when heterogeneity in the count intensity is disregarded.We observe a consistent pattern between the empirical bias and the approximate bias derived in Proposition 1, particularly when the variance μ/κ 2 diminishes.

Scenario (b)
Next, we introduced covariates in the count (Poisson) component (i.e., regression models) and examined model performance.
To evaluate the performance of conditional likelihood and maximum likelihood in regression models (Proposition 2), we compared their estimates for θ when two covariates x 1 and x 2 are included in the detection component (Model x 1 + x 2 ).We set θ = (1, −1, 1), ψ = 0.75 and n = 200.Table 1 presents the results for θ , while Table 2 reports the results for ψ, obtained by fitting Model x 1 + x 2 and including covariates in the count (Poisson) component.We report the average of estimates (AVE), the sample standard deviation (SD), the average of the standard error estimates (A.SE), and the sample coverage percentage (CP) of the 95% confidence intervals of the estimators.In Table 2, we also compared estimates for ψ when fitting Models • and x 1 (i.e., when heterogeneity in the count intensity is completely or partially ignored).
In Table 1, we see that maximum likelihood outperforms conditional likelihood in terms of bias and root mean square error, but the differences are generally minor, especially for case (i).Nevertheless, we noticed that for the coverage percentage criterion, maximum likelihood estimation achieved the nominal  95% level closer than conditional likelihood estimation.Specifically, the sample coverage percentages of conditional likelihood were only around 90%-93% for the nonintercept regression parameters θ 1 and θ 2 .Although the parameter estimates for the conditional likelihood were close to unbiased, their standard error estimates were negatively biased for finite samples, resulting in shorter confidence intervals.A bootstrap confidence interval could improve the interval estimation, but this requires further investigation.Additionally, in Table 2, we once again see that the effects of ignoring heterogeneity in the count model will yield a negative bias when estimating the susceptibility probability.The downward bias effect persists even when only a subset of covariates is omitted.This is evident in the average estimates obtained from Model x 1 in Table 2, where the covariate x 2 was not included in the model.

Scenario (c)
Lastly, we examined the impact of disregarding heterogeneity in susceptibility probabilities.To explore this, we employed a similar approach as in Scenario (b) but now incorporated regression models for both the Poisson intensity and susceptibility probability components.Specifically, we defined λ i as exp(θ 0 + θ 1 x 1i + θ 2 x 2i ), where x 1i and x 2i follow independent standard normal distributions.Additionally, let z ji = x ji + ji for j = 1, 2, where ji ∼ N(0, σ 2 ), and they are independent.
We further standardized z ji to ensure they follow independent standard normal distributions for all j and i (although z ji and x ji are correlated).Subsequently, we defined the susceptibility probabilities ψ i as 1/{1 + exp(−γ 0 − γ 1 z 1i − γ 2 z 2i )}.For data generation, we set σ = 0.5, γ 1 = γ 2 = 1 and considered three values of γ 0 : −1, 0, and 1.These corresponding values yielded average susceptibility probabilities of approximately 0.3, 0.5, and 0.7, respectively, representing low, medium, and high susceptibility probability levels.Additionally, we considered three cases for θ : (1, 1, 1), (1, 0, 0), and (1, −1, −1), denoted as c+, c0, and c−, respectively.These cases were designed to have positive, zero, and negative correlations between the Poisson count intensities and susceptibility probabilities.As in Scenario (b), we examined different models, assuming the correct specification of the Poisson count structure.We fitted the following models for the susceptibility component: a constant only, the covariate z 1 only, and both covariates z 1 and z 2 (correct model).These models are referred to as Model •, Model z 1 , and Model z 1 +z 2 in Web Tables 1-6 and Figure 2. Maximum likelihood and conditional likelihood, along with the proposed Horvitz-Thompson estimator ψ (referred to as CL * ), were used for estimation.
We did not present the results for maximum likelihood and conditional likelihood estimates for θ in the full model as their performances were comparable.However, when using the misspecified reduced susceptibility probability models (Model • and Model z 1 ), maximum likelihood estimates of θ exhibited noticeable biases (refer to Web Tables 1-3).These biases were less pronounced in Model z 1 compared to the constant-only Model •, which was expected.Furthermore, the bias direction differed for the cases c+, c0, and c−, suggesting that the dependence structure between λ i and ψ i is crucial.Biases in the c0 case were particularly remarkable, even worse than those in the c− case.Recall that in the c0 case, the event count intensity is constant and unrelated to the covariates used in ψ i , whereas a negative correlation exists between λ i and ψ i in the c− case.On the other hand, Web Tables 1-3 also demonstrate that the conditional likelihood estimates were unaffected by the misspecification of susceptibility probability models, as indicated in Proposition 2.
Results for ψ are plotted in Figure 2 and tabulated in Web Tables 4-6.The boxplots in Figure 2 illustrate that the CL estimator yields a positive, negligible, and negative bias for cases c+, c0, and c−, respectively, when using the reduced models (Model • and z 1 ).The results align with our justifications in Section 3.2.As previously noted in the estimation of θ , these biases are less pronounced in Model z 1 compared to the constant-only Model •.Furthermore, Figure 2 shows that the biases for the maximum likelihood estimates follow the same direction as the conditional likelihood estimates for cases c+ and c−, but the bias magnitude is more pronounced under each misspecified reduced model.In the case c0, maximum likelihood still exhibits some positive biases for all three levels of low, medium, and high susceptibility probabilities.Additional comprehensive results can be found in Web Tables 4-6, where, for instance, the corresponding CPs significantly deviate from the nominal 95% level, with the lowest being only 10% for maximum likelihood and 33% for conditional likelihood.However, both maximum likelihood and conditional likelihood demonstrate similar performance and consistency when the susceptibility component is correctly specified (Model z 1 + z 2 ), as observed in Web Tables 4-6.
The proposed conditional likelihood Horvitz-Thompson estimator, CL * , is clearly shown to be a reliable method for estimating ψ where only minor biases were observed in certain situations (see Figure 2).The proposed standard error estimator for CL * also demonstrates good performance, as evidenced by the close adherence of the corresponding confidence intervals to the nominal level (Web Tables 4-6).In fact, in terms of confidence CPs, CL * outperforms maximum likelihood and conditional likelihood even when the full model is correctly specified.However, we note that CL * produces more outliers in the right column of Figure 2, which corresponds to the case of a negative correlation between λ i and ψ i and a higher proportion of susceptible subjects (those with L i = 1) going undetected.We also observed that CL * and its associated confidence intervals occasionally yielded unexpected values, such as estimates being greater than one or lower bounds of confidence intervals being smaller than the observed susceptibility probability (m + /n).In the simulation studies, we seldom encountered unreasonable estimates, and in those rare instances, we truncated them.Eren et al. (2012) proposed a modified log-transformation confidence interval that can handle the situation when the point estimator is smaller than one but is bounded, but there is currently no literature on implementing a constrained Horvitz-Thompson estimator with values in an interval.
Finally, it is worth noting that the bias behavior of the conditional likelihood estimator ψ c fundamentally aligns with our justifications.In Web Figure 1, we present a comparison of the five methods examined, measured along a sequence of θ 1 = θ 2 (denoted as α) ranging from −1 to 1.When covariates in the susceptibility component are not fully specified, the results confirm that the biases of both the maximum and conditional likelihood largely follow a monotonic pattern in relation to α.Typically, we observe negative biases when α < 0 and positive biases when α > 0 for the conditional likelihood.However, maximum likelihood estimates give positive biases around α = 0 and close to unbiased results when shifted to α = −0.2,indicating that the bias behaviors of maximum likelihood are notably more complex.Furthermore, it is crucial to emphasize that the bias became more apparent as σ was decreased.This aligns with our expectations, as reducing σ strengthens the correlation between the event count intensities and susceptibility probabilities.

Applications
We investigated the potential impacts of neglecting heterogeneity in the susceptibility and count intensity by fitting zeroinflated models to real data.In the first example, the data used consists of the presence-absence of fish, and we consider a zero-inflated binomial (site occupancy) modeling approach.For our second example, we used traffic violation data collected on motorcyclists in Taiwan and modeled these using a zeroinflated Poisson model, commonly applied for the analysis of traffic accident data (Shankar, Milton, and Mannering 1997).Both data can be found on Zenodo (Hwang et al. 2022).The aim of both analyses is to fit the methods developed in Section 2, and to examine their performances when ignoring heterogeneity and estimating the average susceptibility probability.

Brook Trout Data
These data were collected by Professor James T. Peterson via electro-fishing, and were analyzed for site occupancy estimation in Karavarsamis and Huggins (2019).Sampling was conducted on 50m sections of streams at 57 locations in the Upper Chattahoochee 371 River basin, USA.The number of sites was n = 77 with T = 3 sampling occasions.The frequencies for m 0 , . . ., m 3 were 45, 11, 17, and 4, so the observed occupancy percentage was 32/77 = 41.6%.Two covariates were known to be associated with event count detection (p) and occupancy (ψ ) probabilities (see Karavarsamis and Huggins 2019): the elevation (elev), which was measured in kilometers, and the stream mean cross-sectional area (CSA).We took the average of CSA measurements over three visits for each site.
We used CSA and elev to model the binomial count and occupancy components, respectively.In Table 3, we present the estimates of each coefficient when using maximum likelihood and conditional likelihood (we also considered the conditional likelihood with the Horvitz-Thompson estimator ψ, denoted by CL*).In this example, maximum likelihood and conditional likelihood showed some differences in the event count detection component, logit(p), where the effect of CSA is of lesser significance for conditional likelihood.However, results were more consistent for the occupancy component, logit(ψ).
Next, we postulate that the above (full) model is true, and we removed the covariates (i.e., the modeled heterogeneity) from the occupancy and binomial count components to see the impacts of estimating the average occupancy (susceptibility) probability ψ.We report the results in Table 3. First, the average occupancy probability is estimated at 48.6% by maximum likelihood; hence, it implies that about 7% of sites did not detect any fish.By removing the heterogeneity in the binomial count component (see model Cou.const.), the maximum likelihood estimate of ψ reduces by 2.6% (which contributes to a significant portion in the 7%).Similar findings were seen for conditional likelihood and CL*, though these were less significant due to the effects of CSA being less relevant by conditional likelihood (the degree of detection heterogeneity is more negligible in conditional likelihood).Finally, by removing the heterogeneity in the occupancy component (see model Sus.const.),all estimates of the three methods showed almost no change compared to the full model.The reason for this is that event count detection probabilities and occupancy probabilities are nearly uncorrelated as the correlation coefficient of CSA and elev is only 0.036 (p-value = 0.75).

Motorcycle Traffic Violation Data
This survey study consists of 7386 respondents who provided information on their motorcycle usage, such as maintenance cost, the purpose of use, mileage, engine volume, and information on the owners (i.e., the riders), including age and the number of traffic violations in one year.The Ministry of Transportation and Communication conducted the survey in Taiwan in 2007.
We considered traffic violation frequencies by riders as the response variable, y.Due to some riders violating the traffic rules but not being detected, the zero counts of y are a mixture of random and structural zeros.Lukusa et al. (2016) fitted a zeroinflated Poisson model with respect to two regressors: riding distance per year and motorcycle engine volume.Similarly, we define km = I(riding distance > 10,000km) and eng = I(engine volume > 250cc).Engines exceeding 250cc are classified as heavy motorcycles, which require a special driver's license in Taiwan.These data contained some missing values, but their absence only affected the results slightly (Lukusa et al. 2016).In our analysis, we used the complete dataset of 5447 observations.The observed presence percentage was 10.8%, km = 1(849) and eng = 1(754).We present model estimates when using maximum likelihood and conditional likelihood in Table 3.For this example, maximum likelihood and conditional likelihood showed similar estimates in both the log(λ) and logit(ψ) components.
As in Section 5.1, we postulate that the above (full) model is true, and we removed the model heterogeneity from the susceptibility probability and count intensity components.We report the results in Table 3. First, the average susceptibility probability is estimated at 17.5% by maximum likelihood; this implies that an additional 6.7% of the traffic violation drivers were not detected.Next, by removing heterogeneity in the count intensity component (see model Cou.const.),all three estimates of ψ were reduced by 0.2%.This may be due to low presence percentages in the data.However, by removing the heterogeneity in the susceptibility component (see model Sus.const.), the maximum likelihood estimate is increased to 28% (with an increment of 10.5%, see the full model).The reason for this is the count intensity and susceptibility probabilities are highly positively correlated, where both components use the same covariates, and all have significant positive effects.Here, the average susceptibility probability estimate for conditional likelihood had also increased, but this was not as profound.Again, the specification of the susceptibility component does not affect the average susceptibility probability estimate for the CL* approach.

Discussion
In this study, we investigated parameter estimators perform in zero-inflated Poisson models when there is heterogeneity in susceptibility probabilities and count intensities.Our research yielded three key findings.First, we found that the conditional maximum likelihood estimator is not equivalent to the usual maximum likelihood estimator for regression models with covariates in the event count intensity.Second, we discovered that the susceptibility probability is underestimated when heterogeneity in the event count intensity is not taken into account.This was evident for using either maximum or conditional maximum likelihood estimators in our simulation study.Third, we established new findings concerning the average susceptibility probability in scenarios where susceptibility heterogeneity is not considered.Although our results for findings (1) and (2) were established for constant susceptibility probabilities, they may also apply to non-homogeneous susceptibility probabilities.Note that a similar underestimation phenomenon as in finding (2) was observed in capture-recapture models (Hwang and Huggins 2005) and occupancy models (Hwang, Huggins, and Chen 2017).Regarding finding (3), we only provided partial results when heterogeneity in the count intensity is considered to be a mixture model or when the conditional likelihood approach is used in a regression framework.Further research is needed to determine results for maximum likelihood estimation in the regression setting.
Although we have demonstrated that conditional likelihood and maximum likelihood estimators are not asymptotically equivalent in the regression case, our empirical results suggest that they have similar levels of efficiency.In practice, maximum likelihood is widely used; however, the conditional likelihood has a much-simplified form which makes computation/model fitting manageable.Additionally, if covariates are used, condi-tional likelihood benefits from having an optimal estimating function property that ensures robustness against model misspecification on the susceptibility probability (Proposition 2).Hence, we recommend the use of conditional likelihood in studies involving zero-inflated models.
In addition, we proposed a Horvitz-Thompson type estimator based on maximizing the conditional likelihood when estimating the average susceptibility probability.We demonstrated its robustness when the count intensity component model is correctly specified.Although there is no general method for checking model misspecification in the count component, in practice, the Horvitz-Thompson type estimator can be a valuable tool when assessing the suitability of a susceptibility model.For example, when the underlying susceptible component model is complex due to the latent nature of the true susceptible status (L i ) and it cannot be directly observed, the Horvitz-Thompson type estimator can be treated as a reference estimate.A nonparametric bootstrap test could be used to compare the maximum (conditional) likelihood estimator with the Horvitz-Thompson type estimator.By evaluating the agreement between these two estimators, we can gain further insights into the adequacy of the susceptibility component model.
The findings in this study can be further extended to zeroinflated binomial or zero-inflated geometric models.Although we did not consider these models in the current presentation for the sake of simplicity, these extension are straightforward, with the exception of zero-inflated negative binomial models which are less trivial.For example, results from a simulation study (not reported here) showed that the susceptibility probability are overestimated when heterogeneity in the event count intensity is not taken into account, which is contrary to our finding (2).This discrepancy arises because the zero-inflated negative binomial model includes an additional dispersion parameter that may confound with the count intensity parameters.
We also focused on the impact of ignoring heterogeneity among subjects in zero-inflated models and how it can affect model accuracy.However, it is challenging to ensure that the count intensity component is accurately specified or to determine if the model contains the true covariates in practice.This problem is not unique to our study but is a well-known issue in statistical science.Ignoring significant covariates is a widespread problem, as seen in Simpson's paradox and the growing field of causal inference (Glymour, Pearl, and Jewell 2016).The two sub-models in the zero-inflated model exacerbate this issue, making it challenging to resolve.Despite these challenges, this study provides a robust method that allows for some model misspecification when fitting zero-inflated models.
Finally, additional factors such as within-subject correlation among repeated measured data and spatial-temporal correlation between subjects should also be considered in the model.Hall (2000) and Min and Agresti (2005) incorporated random effects to accommodate the within-subject correlation of repeated data and showed that it could substantially improve model fitting.Agarwal, Gelfand, and Citron-Pousty (2002) and Ver Hoef and Jansen (2007) developed spatial and spatial-temporal zero-inflated Poisson models, respectively; both studies have applications in ecology.To the best of our knowledge, no studies have emphasized the effects of such dependencies, if ignored, on susceptibility probability estimation and inference.Examining these additional effects has been similarly considered in capture-recapture models, see Rivest (2008).We envisage that similar results could be developed in a zero-inflated analysis provided that the susceptibility probability is constant; however, this can be challenging when heterogeneity in the susceptibility probabilities is also included.In addition, there may be interactions between these factors, which adds further complexity to the problem.Extensions to other zero-inflated models, such as a zero-inflated negative binomial model or a zero-inflated Conway-Maxwell-Poisson model (Shmueli et al. 2005) would also be useful, but we did not consider these models here.

Table 2 .
Simulation Scenario (b): Estimates for ψ when using maximum likelihood and conditional likelihood for cases (i) and (ii) under three Poisson count models (Models •, x 1 and x 1 +x 2 ).

Table 3 .
(a)Estimates and standard errors (in parentheses) using the Brook trout data (Trout) and Taiwan motorcycle data (Motor.)by maximum likelihood (ML) and conditional likelihood (CL).(b) Average susceptibility probability ( ψ) estimates and standard errors using the Trout and Motor.datasets when fitting CL and ML.
NOTE: Method CL * uses the conditional likelihood with the Horvitz-Thompson estimator ψ.