On Arbitrarily Underdispersed Discrete Distributions

Abstract
We survey a range of popular generalized count distributions, investigating which (if any) can be arbitrarily underdispersed, that is, can have a variance that is arbitrarily small compared to the mean. A philosophical implication is that models failing this simple criterion should not be considered "statistical models" according to McCullagh's extendibility criterion. Four practical implications are also discussed: (i) functional independence of parameters, (ii) double generalized linear models, (iii) simulation of underdispersed counts, and (iv) severely underdispersed count regression. We suggest that all future generalizations of the Poisson distribution be tested against this key property.


Introduction
Ask an undergraduate to "construct a distribution with mean 81.2 and variance 5.4," say, and the problem will be trivially solved using a normal, gamma, or essentially any other continuous distribution taught in first-year statistics. Modify this slightly, however, to "construct a discrete distribution with mean 81.2 and variance 5.4" and it immediately becomes a much more challenging exercise. This is because, unlike continuous distribution families, discrete families that can be both overdispersed and underdispersed have proven much harder to construct, less intuitive to characterize, and consequently more difficult to study (Sellers and Morris 2017).
Indeed, only a handful of proposed overdispersed and underdispersed generalizations of the Poisson distribution have gained traction in applied statistical work. These include the generalized Poisson (Consul and Jain 1973), double Poisson (Efron 1986), exponentially weighted Poisson (Ridout and Besbeas 2004), Conway-Maxwell-Poisson (CMP, Shmueli et al. 2005), hyper-Poisson (Sáez-Castillo and Conde-Sánchez 2013), and more recently the mean-parameterized Conway-Maxwell-Poisson (Huang 2017) and extended Poisson-Tweedie (Bonat et al. 2018) distributions. Of course, this is by no means an exhaustive list of generalized count distributions, with new approaches and refinements being proposed continually, such as the flexible weighted Poisson (Cahoy, Di Nardo, and Polito 2021), which extends the weighted Poisson of Castillo and Pérez-Casany (1998) and contains the CMP and hyper-Poisson distributions as special cases. Other novel approaches that do not contain the Poisson distribution as a special case when equidispersed have also been proposed, such as the generalized inverse trinomial distribution (Sim and Ong 2016). While such models are interesting, they are beyond the scope of this note and will be considered in a future study.
This note considers a simple question: can any of these models be arbitrarily underdispersed? For example, is it possible for any of these distributions to have a mean of 81.2 and a variance of 5.4, say? How about a mean of 81.2 and a variance of 1, or in the most extreme case a variance of 0.16 = 0.2 × 0.8, which is the smallest possible variance for any discrete distribution with mean 81.2 (Hagmark 2009)? Conversely, given a severely underdispersed set of counts, which (if any) of these discrete distributions can provide a decent fit to such data? Consider, for example, a sample of counts y = (26, 27, 27, 28, 28, 28, 28) with sample mean 27.43 and sample variance 0.62. The "perfect" model fit here would be the empirical distribution with p₂₆ = 1/7, p₂₇ = 2/7 and p₂₈ = 4/7, which attains the highest possible log-likelihood of −3.758 and lowest possible AIC of 11.52, but of course lacks predictive power outside of the observed support. For comparison, the maximum likelihood fits for each of the above models, along with their estimated means, variances and AICs, are given in Table 1.
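The summary statistics and the variance lower bound quoted above are easy to check numerically. The following is a minimal Python sketch, not taken from the paper; the helper name `min_variance` is introduced here only for illustration, and it writes Hagmark's (2009) bound as f(1 − f), where f is the fractional part of the mean.

```python
import math
from statistics import mean, variance


def min_variance(mu):
    """Smallest variance of an integer-valued distribution with mean mu >= 0."""
    frac = mu - math.floor(mu)  # fractional part of the mean (illustrative helper)
    return frac * (1.0 - frac)


y = [26, 27, 27, 28, 28, 28, 28]
print(round(mean(y), 2), round(variance(y), 2))  # sample mean and (n - 1) sample variance
print(round(min_variance(81.2), 2))              # lower bound for a mean of 81.2
```

The two printed lines should reproduce the values 27.43, 0.62, and 0.16 reported in the text.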
We immediately see from Table 1 that arbitrary underdispersion is not at all a trivial property: only the double Poisson, exponentially weighted Poisson and the mean-parameterized CMP have the potential to be arbitrarily underdispersed, with all other models being immediately ruled out of consideration due to restrictive parameter spaces that do not allow for severe (or even moderate) underdispersion. In other words, some of these other models fail to be "statistical models" according to McCullagh's philosophy that valid statistical models must admit a "natural extension that includes the domain for which inference is required" (McCullagh 2002), the "domain" here being the space of discrete random variables with arbitrarily small variance. This is also consistent with the critique and discussion in Sellers and Morris's (2017) review paper on underdispersed count models.

Table 1. Maximum likelihood fits to y = (26, 27, 27, 28, 28, 28, 28), with sample mean ȳ = 27.43, variance var(y) = 0.62, and best possible AIC of 11.52. Columns: Model, Parameter restrictions, MLE, Fitted values, AIC. (Table body not shown; recoverable fragments: "where m is the largest integer", θ = 57.834, σ² = 7.23.)
In Section 2, we investigate the limiting behavior of these three candidate distributions when the dispersion becomes arbitrarily small. We find that only one of these can have an arbitrarily small variance for any given mean value. We formally state this result in Section 2.3. In Section 3, we demonstrate four practical implications of this result. We conclude in Section 4 by proposing that all (future) generalized count distributions should be tested against this key property. The supplementary materials contain a proof of the main result, along with a generalization to sums of discrete random variables.

Double Poisson
The double Poisson distribution was introduced by Efron (1986) as a special case of a class of double exponential family models. It is characterized via its probability mass function (pmf),
$$P(Y = y \mid \lambda, \theta) = c(\lambda, \theta)\, \theta^{1/2} e^{-\theta\lambda} \left( \frac{e^{-y} y^{y}}{y!} \right) \left( \frac{e\lambda}{y} \right)^{\theta y}, \quad y = 0, 1, 2, \ldots,$$
where c(λ, θ) is a normalizing constant, λ ≥ 0 is a centering parameter and θ > 0 is a dispersion parameter such that θ = 1 coincides with the Poisson distribution, θ > 1 leads to underdispersion and θ < 1 leads to overdispersion relative to the Poisson distribution. Although the centering parameter λ is often interpreted as the "mean" with "high accuracy" (see Efron 1986, p. 715), we find that this is not strictly true, especially under severe underdispersion. This is demonstrated in Figure 1, which plots the limiting behavior of the double Poisson under increasing underdispersion. We see that when λ is an integer, the limiting distribution is a unit point mass at λ. However, when λ is not an integer, the limiting distribution is a unit point mass on the nearest integer round(λ). For example, the double Poisson distribution with λ = 4.321 and θ = 1000 is essentially a unit point mass at y = 4, and the double Poisson distribution with λ = 4.567 and θ = 1000 is essentially a unit point mass at y = 5. Thus, for a fixed centering parameter λ, the double Poisson distribution always degenerates to a single point mass as the variance becomes arbitrarily small.
A potential remedy is to allow λ = λ(μ, θ) to be a function of both the mean μ ≥ 0 and dispersion θ > 0 via the solution to
$$0 = \sum_{y=0}^{\infty} (y - \mu)\, e^{-\theta\lambda} \left( \frac{e^{-y} y^{y}}{y!} \right) \left( \frac{e\lambda}{y} \right)^{\theta y},$$
so that the mean of the distribution is fixed at μ for any θ. This modification is examined in more detail in Huang and Lembryk (2022). As it stands, however, the current implementation of the double Poisson in the state-of-the-art gamlss R package can only model the rate parameter λ. This limitation also leads to an inconsistency when interpreting regression models based on the double Poisson distribution: while regression parameters can be (and often are) interpreted as mean contrasts when the data are overdispersed, they lose this meaning when the data are underdispersed.
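The degeneracy of the double Poisson for fixed λ can be reproduced numerically. The sketch below is illustrative only and is not the gamlss implementation: it evaluates Efron's (1986) double Poisson kernel on a truncated support (the cutoff `y_max` and the function name `double_poisson_pmf` are assumptions made here) and normalizes it by direct summation, so the constant c(λ, θ) is never needed.

```python
import math


def double_poisson_pmf(lam, theta, y_max=200):
    """Double Poisson pmf on 0..y_max, normalized numerically (requires lam > 0)."""

    def log_kernel(y):
        # log of exp(-theta*lam) * (e^-y y^y / y!) * (e*lam/y)^(theta*y);
        # theta^(1/2) is constant in y and cancels on normalization.
        if y == 0:
            return -theta * lam
        return (-theta * lam - y + y * math.log(y) - math.lgamma(y + 1)
                + theta * y * (1.0 + math.log(lam) - math.log(y)))

    logs = [log_kernel(y) for y in range(y_max + 1)]
    shift = max(logs)  # log-sum-exp shift for numerical stability
    weights = [math.exp(v - shift) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


for lam in (4.321, 4.567):
    pmf = double_poisson_pmf(lam, theta=1000.0)
    mode = max(range(len(pmf)), key=pmf.__getitem__)
    fitted_mean = sum(y * p for y, p in enumerate(pmf))
    print(lam, mode, round(pmf[mode], 6), round(fitted_mean, 6))
```

Under this sketch, both cases place essentially all of their mass on a single integer (4 and 5, respectively), so the mean of the distribution no longer agrees with the centering parameter λ.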

Exponentially Weighted Poisson
The exponentially weighted Poisson (EWP) family of distributions was introduced by Ridout and Besbeas (2004) as a special case of a weighted Poisson distribution. It is characterized by its pmf,
$$P(Y = y \mid \lambda, \beta_1, \beta_2) = \frac{w_y\, e^{-\lambda} \lambda^{y}}{W\, y!}, \quad y = 0, 1, 2, \ldots, \qquad w_y = \begin{cases} e^{-\beta_1 (\lambda - y)}, & y \le \lambda, \\ e^{-\beta_2 (y - \lambda)}, & y > \lambda, \end{cases}$$
where W is a normalizing constant, λ ≥ 0 is a centering parameter and β₁, β₂ ∈ ℝ are dispersion parameters such that β₁ = β₂ = 0 coincides with the Poisson, β₁, β₂ > 0 corresponds to underdispersion and β₁, β₂ < 0 corresponds to overdispersion. Ridout and Besbeas (2004) also introduced a two-parameter version of this model with β₁ = β₂ = β so that the dispersion behavior is "symmetric" around λ. In examining the limiting behavior of EWP distributions, it suffices to consider this two-parameter symmetric version and take the common dispersion parameter β to be arbitrarily large. In Figure 2, we see the same limiting behavior as with the double Poisson: when λ is an integer, the limiting distribution is a unit point mass at λ, and when λ is not an integer, the limiting distribution is a unit point mass on the nearest integer round(λ). Thus, the EWP distribution also degenerates as the variance becomes arbitrarily small, and the centering parameter λ is no longer the mean of the distribution either.
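The same limiting behavior can be illustrated for the symmetric EWP. The sketch below assumes that the two-parameter weights take the form exp(−β|y − λ|) applied to a Poisson(λ) kernel; the function name `ewp_pmf` and the truncation point `y_max` are illustrative choices, not part of any published implementation.

```python
import math


def ewp_pmf(lam, beta, y_max=200):
    """Symmetric EWP pmf on 0..y_max: Poisson(lam) kernel times exp(-beta * |y - lam|)."""
    logs = [y * math.log(lam) - math.lgamma(y + 1) - beta * abs(y - lam)
            for y in range(y_max + 1)]
    shift = max(logs)  # log-sum-exp shift for numerical stability
    weights = [math.exp(v - shift) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


for lam in (4.321, 4.567):
    pmf = ewp_pmf(lam, beta=100.0)
    mode = max(range(len(pmf)), key=pmf.__getitem__)
    print(lam, mode, round(pmf[mode], 6))
```

With a large β, the weight exp(−β|y − λ|) is dominated by the integer closest to λ, which is why the mass collapses onto round(λ) rather than spreading over the two neighboring integers.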

Mean-Parameterized Conway-Maxwell-Poisson
The mean-parameterized Conway-Maxwell-Poisson distribution (CMP_μ, Huang 2017) is a re-parameterization of the Conway-Maxwell-Poisson distribution that has seen a recent surge in popularity for the modeling of both underdispersed and overdispersed counts in a wide range of fields, including ecology (Brooks et al. 2017; Wei et al. 2020), item response theory (Forthmann, Gühne, and Doebler 2020), medical statistics (Mui et al. 2021; Stuber et al. 2021), and generalized linear mixed models (Brooks et al. 2019). It is characterized by its pmf,
$$P(Y = y \mid \mu, \nu) = \frac{\lambda(\mu, \nu)^{y}}{(y!)^{\nu}} \frac{1}{Z(\lambda(\mu, \nu), \nu)}, \quad y = 0, 1, 2, \ldots,$$
where Z(λ, ν) = Σ_{y≥0} λ^y / (y!)^ν is a normalizing function, μ ≥ 0 is the mean, ν ≥ 0 is a dispersion parameter such that ν = 1 coincides with the Poisson, ν > 1 corresponds to underdispersion and ν < 1 to overdispersion, and the rate λ(μ, ν) is given implicitly by the solution to
$$0 = \sum_{y=0}^{\infty} (y - \mu) \frac{\lambda^{y}}{(y!)^{\nu}},$$
which ensures that the mean of the distribution is μ. Figure 3 visualizes the convergence paths of the CMP_μ as the dispersion parameter ν gets increasingly large. We see that when μ is an integer, the limiting distribution is again a unit point mass at μ, but when μ is not an integer, the limiting distribution is a shifted Bernoulli on the two integers ⌊μ⌋ and ⌈μ⌉ with probabilities precisely equal to the fractional parts of μ. For example, an arbitrarily underdispersed CMP_μ distribution with μ = 4.321 converges to a shifted Bernoulli on the values 4 and 5 with probabilities 0.679 and 0.321, respectively, while the CMP_μ distribution with μ = 4.567 converges to a shifted Bernoulli on the values 4 and 5 with probabilities 0.433 and 0.567, respectively.
Thus, in contrast to the double Poisson and EWP classes of models, CMP_μ distributions do not always degenerate under severe underdispersion and have the "correct" limiting behavior given the value of the mean parameter. We formalize this result in the following proposition. To the best of our knowledge, this is currently the only known generalization of the Poisson distribution that exhibits this property.
A. HUANG
Proposition 1. As ν → ∞, the CMP_μ distribution with mean μ ≥ 0 converges to
1. a unit point mass at μ if μ is an integer, that is, P(Y = μ) → 1;
2. a shifted Bernoulli on the two integers ⌊μ⌋ and ⌈μ⌉ if μ is noninteger, with probabilities equal to the fractional parts of μ, that is, P(Y = ⌊μ⌋) → ⌈μ⌉ − μ and P(Y = ⌈μ⌉) → μ − ⌊μ⌋.
A corollary of Proposition 1 is that the CMP_μ is currently the only known family of non-degenerate distributions on the nonnegative integers that can achieve Hagmark's (2009) lower bound of Δ(1 − Δ) for the smallest variance possible for a given mean μ with decimal part Δ = μ − ⌊μ⌋.
Proposition 1 is established with the aid of two lemmas that give bounds on the rate λ = λ(μ, ν) for a given mean μ as the dispersion ν gets arbitrarily large. The statements of these lemmas and the proof of Proposition 1 are given in the supplementary materials. The technical details rely on tedious but otherwise elementary calculations and are not particularly exciting; far more interesting are the practical implications of such a result.
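Proposition 1 can also be checked numerically. The following sketch is not the mpcmp implementation: it assumes the standard CMP kernel λ^y / (y!)^ν, truncates the support at ⌈μ⌉ + 100 (adequate for μ > 0 and ν ≥ 1), and finds the rate by bisection on log λ, which works because the mean is increasing in log λ. The function name `cmp_mu_pmf` and its arguments are illustrative.

```python
import math


def cmp_mu_pmf(mu, nu, extra=100, iters=200):
    """CMP_mu pmf on 0..ceil(mu)+extra; returns (pmf, log_lambda). Assumes mu > 0, nu >= 1."""
    support = range(math.ceil(mu) + extra + 1)
    log_fact = [math.lgamma(y + 1) for y in support]

    def pmf_at(log_lam):
        # Work on the log scale: lambda itself overflows for large nu.
        logs = [y * log_lam - nu * lf for y, lf in zip(support, log_fact)]
        shift = max(logs)
        weights = [math.exp(v - shift) for v in logs]
        total = sum(weights)
        return [w / total for w in weights]

    def mean_at(log_lam):
        return sum(y * p for y, p in enumerate(pmf_at(log_lam)))

    # Bracket the solution of E(Y) = mu, then bisect on log(lambda).
    lo, hi = -30.0, nu * math.log(mu + 10.0) + 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_at(mid) < mu:
            lo = mid
        else:
            hi = mid
    log_lam = 0.5 * (lo + hi)
    return pmf_at(log_lam), log_lam


for mu in (4.321, 4.567):
    pmf, _ = cmp_mu_pmf(mu, nu=200.0)
    print(mu, round(pmf[4], 3), round(pmf[5], 3))
```

At ν = 200 the probabilities on 4 and 5 already agree with the limiting values 0.679 and 0.321 (for μ = 4.321) and 0.433 and 0.567 (for μ = 4.567) to three decimal places, while the mean stays fixed at μ by construction.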

Functional Independence of Parameters
Proposition 1 demonstrates that the two parameters μ and ν in the mean-parameterized CMP distribution are functionally independent, that is, any choice of μ ≥ 0 and ν ≥ 0 leads to a legitimate probability distribution with no restrictions on the parameters. This makes it rather unique among generalized count distributions: the generalized Poisson, hyper-Poisson, extended Poisson-Tweedie, and the original CMP all place restrictions on one parameter based on the value(s) of the other parameter(s). In particular, the larger the mean count, the less underdispersion is permissible for these models. For example, the most underdispersed hyper-Poisson distribution, which is obtained by taking γ → 0, implies that the smallest possible variance is μ − 1 for any mean μ > 1. Thus, for a mean of 27.43, say, the most underdispersed hyper-Poisson distribution has variance 26.43. Some of the other distributions fare worse: the underdispersed generalized Poisson pmf does not sum to 1, and the underdispersed extended Poisson-Tweedie pmf does not even exist(!).
Additionally, unlike the double Poisson or EWP models, the mean-parameterized CMP is always parameterized through its mean and dispersion, making model specification and interpretation arguably more accessible for applied statistical work. In particular, the interpretation of the location parameter does not switch from an "approximate mean" (when overdispersed) to the "nearest integer mode" (when underdispersed), as with the other two candidate models.

Double Generalized Linear Models
The functional independence of parameters allows specification of double generalized linear models (e.g., Smyth and Verbyla 1999), such as
$$\log \mu = X^{\top} \beta \quad \text{and} \quad \log \nu = Z^{\top} \gamma,$$
where X and Z are vectors of predictors, without any constraints on the regression parameters β and γ. This again satisfies McCullagh's validity criterion for a statistical model.

Simulating Underdispersed Counts
The mean-parameterized CMP distribution remains a full probability model over its entire parameter space, and can therefore be used to generate counts with any combination of mean and dispersion. The ability to generate arbitrarily underdispersed counts is particularly useful for simulation studies, as demonstrated in Forthmann, Gühne, and Doebler (2020), which used the CMP_μ distribution to simulate underdispersed item-response counts with means as high as 81.2 and variances as low as 5.4. As we argued in the introduction, this is not at all a trivial task. For completeness, we answer our introductory question here: it is possible to construct a discrete distribution with mean 81.2 and variance 5.4, 1, or 0.16, simply via the CMP_μ distribution with mean μ = 81.2 and dispersion ν = 15.1, 81.7, or ∞, respectively.
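The limiting case ν = ∞ is the easiest to simulate, because by Proposition 1 the distribution reduces to a shifted Bernoulli. The sketch below draws from that limit directly; the function name `simulate_limit`, the sample size, and the seed are illustrative assumptions. Simulating at a finite ν would instead require the pmf, and hence a numerical solution for the rate λ(μ, ν).

```python
import math
import random
from statistics import mean, variance


def simulate_limit(mu, size, seed=None):
    """Draw from the nu -> infinity limit: floor(mu) + Bernoulli(mu - floor(mu))."""
    rng = random.Random(seed)
    base = math.floor(mu)
    frac = mu - base  # probability of landing on the upper integer
    return [base + (1 if rng.random() < frac else 0) for _ in range(size)]


draws = simulate_limit(81.2, size=100_000, seed=2024)
print(round(mean(draws), 2), round(variance(draws), 2))
```

The sample mean and variance should be close to 81.2 and 0.16, the latter being the smallest variance attainable by any integer-valued distribution with that mean.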

Severely Underdispersed Count Regression
Continuous datasets with arbitrarily small error variance pose no problems for normal, gamma, and other continuous models because these typically have a dispersion parameter that can be made arbitrarily small for any set of conditional mean values. This is not the case for discrete data. Consider the male elephant mating example from Sellers and Shmueli (2013, sec. 3), where the authors simulated underdispersed Mating counts as a function of Age. Here, we push the simulation setting to its limits via the deterministic relation Matings | Age = round(exp(−1.579 + 0.069 Age)), with the Age of elephants sampled from a normal population with mean 40 years and standard deviation 8. This leads to arbitrarily underdispersed count datasets, an example of which is given in Figure 4.
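The data-generating mechanism described above can be sketched in a few lines of Python. The sample size of 50 and the seed are assumptions made here for illustration, since the text does not state them, and the helper name `matings` is likewise hypothetical.

```python
import math
import random


def matings(age):
    """Deterministic count from the text: round(exp(-1.579 + 0.069 * age))."""
    return round(math.exp(-1.579 + 0.069 * age))


rng = random.Random(0)                            # arbitrary seed, for reproducibility only
ages = [rng.gauss(40.0, 8.0) for _ in range(50)]  # Age ~ Normal(40, 8); n = 50 is assumed
data = [(age, matings(age)) for age in ages]

for age, count in data[:5]:
    print(round(age, 1), count)
```

Because the count is a deterministic function of Age, the conditional variance of Matings given Age is exactly zero, which is the extreme case that a count regression model would need to accommodate.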
Fitting a CMP_μ generalized linear model (GLM) to these data gives almost exact point estimates (β̂₀ = −1.583, β̂₁ = 0.069) of the true regression coefficients with minimal standard errors and an overall AIC value of 28.8; see Figure 4 for the full fitted model output from the glm.cmp function in the mpcmp R package (Fung et al. 2020). In comparison, the fitted Poisson GLM gave a much larger AIC of 132.6, failing to capture the extreme underdispersion exhibited by the data, while the double Poisson GLM via the gamlss package failed to converge for these data. The EWP GLM could not be fit to these data because code is currently unavailable for this model.

Conclusion
Of the many discrete distributions considered in this note, only the mean-parameterized CMP distribution has been shown to handle arbitrarily small underdispersion, with the limiting distribution (either a single probability mass or a shifted Bernoulli) being the most underdispersed possible for any discrete distribution with a given mean. It is currently the only known generalized discrete count distribution possessing this property. The practical and philosophical implications of this result add to the increasingly strong case for the CMP_μ model to be the default for both analyzing and simulating underdispersed counts. We propose that all (future) generalizations of the Poisson distribution be tested against this property. Further research into the convergence rates in Proposition 1 is also warranted.

Supplementary Materials
Technical details: Contains proof of Proposition 1. (.pdf file)