Bayesian estimator of multiple Poisson means assuming two different priors

Abstract This paper describes an empirical Bayesian estimator of multiple Poisson means based on a novel concept. The idea is to assume two different priors: Jeffreys' prior for determining the hyperparameters, and the uniform prior for the canonical parameter when estimating the multiple Poisson means. The validity of this idea is discussed in detail. Furthermore, the empirical Bayesian estimator is constructed using the posterior mean of the canonical parameters instead of that of the mean parameters. The proposed estimator is evaluated under three familiar losses, and the results are found to be very promising. Finally, an actual dataset is analyzed to demonstrate the practical use of the proposed estimator.


Introduction
Let x_k be a sample from the k-th Poisson distribution Po(λ_k) with unknown mean λ_k (k = 1, …, K). An estimator of the Poisson mean vector λ = (λ_1, …, λ_K) is derived from the dataset x = (x_1, …, x_K). The conditional probability function of x_k, given the mean λ_k, is

p(x_k | λ_k) = λ_k^{x_k} e^{−λ_k} / x_k!,  x_k = 0, 1, 2, …,  (1)

which belongs to the exponential family, with canonical parameter θ_k = log λ_k. Let the Poisson sampling density with the K-dimensional mean λ be p(x; λ) = ∏_{k=1}^{K} p(x_k | λ_k). The prior and posterior densities in the conjugate analysis are expressed in terms of the Kullback-Leibler divergence (KLD) (Kullback and Leibler 1951) as

π(λ; m, δ) = exp{−δ D(m, λ)} b(λ) / k(m, δ),  π(λ | x; m, δ) ∝ p(x; λ) π(λ; m, δ),  (2)

where D(·, ·), b(·), and k(·, ·) are the KLD, the supporting prior density, and the normalizing constant, respectively (Yanagimoto and Ohnishi 2005). For Poisson means, the componentwise KLD is D(a, b) = a log(a/b) − a + b, and D(m, λ) = Σ_{k=1}^{K} D(m, λ_k). Here, b(λ) is often a non-informative function. Relatively little attention has been paid to this function, but it is important in constructing a promising estimator. Bolstad (2007, 184-185) reviewed some priors for the Poisson distribution, including Jeffreys' prior (b(λ) ∝ 1/√λ) and the uniform prior for the canonical parameter (b(λ) ∝ 1).
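As a concrete check on the Poisson form of the KLD, the closed-form expression D(a, b) = a log(a/b) − a + b can be compared against the divergence computed directly from the two probability functions. The following Python sketch is illustrative only (the paper's own calculations were carried out in R), and the values a = 3 and b = 5 are arbitrary:

```python
import math

def poisson_log_pmf(x, lam):
    # log p(x | lam) = x*log(lam) - lam - log(x!)
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def kld_poisson(a, b):
    # Closed-form KLD between Poisson means a and b: a*log(a/b) - a + b
    return a * math.log(a / b) - a + b

a, b = 3.0, 5.0
# Direct computation: sum over x of p(x|a) * log(p(x|a)/p(x|b)),
# using log(p(x|a)/p(x|b)) = x*log(a/b) - a + b; the tail beyond 200 is negligible
direct = sum(math.exp(poisson_log_pmf(x, a)) * (x * math.log(a / b) - a + b)
             for x in range(200))
assert abs(direct - kld_poisson(a, b)) < 1e-9
```

The direct sum reduces to E[x] log(a/b) − a + b = a log(a/b) − a + b, confirming the closed form.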
The Bayesian method often uses the posterior mean of the mean parameters, but it is also possible to use the posterior mean of the canonical parameters (Corcuera and Giummolè 1999; Yanagimoto and Ohnishi 2005). The estimator of multiple Poisson means is then obtained by the inverse logarithmic transformation of the posterior mean of the canonical parameters. Further, using the novel idea of assuming two different priors, we propose an empirical Bayesian estimator of multiple Poisson means.
Estimators of multiple Poisson means have been proposed to minimize various losses, such as the weighted squared error loss (Tsui 1979), squared error loss (Tsui 1981), s-normalized squared error loss (Tsui and Press 1982), and entropy loss (Ghosh and Yang 1988). Another use of these losses is for evaluating an estimator, and this paper uses three familiar losses for that purpose. The first is the loss based on the dual version of the KLD (dKLD), which attains a minimum at the posterior mean of the mean parameters. This was first proposed by Aitchison (1975) and is widely employed; see, for example, Komaki (2015). The second is the loss based on the KLD, which attains a minimum at the posterior mean of the canonical parameters. This was proposed by Corcuera and Giummolè (1999) and further explored by Yanagimoto and Ohnishi (2009). The third is the loss based on the s-normalized squared error (SE). We choose the simple case s = 0 because, in the setting considered in this paper, the choice of s does not affect the evaluation of the estimator. In previous studies, estimators have been evaluated under only one or two losses. As the performance of an estimator can depend strongly on the chosen loss, we employ all three losses in our comparison studies. Our aim is to construct an estimator that is close to the population Poisson means, in the sense that all three losses are small.
In Sec. 2, we introduce the empirical Bayesian method using the posterior mean of the canonical parameters. Further, using the novel idea of assuming two different priors, we propose an empirical Bayesian estimator of multiple Poisson means. In Sec. 3, the effectiveness of the proposed estimator is verified through risk comparisons over various population multiple Poisson means. Section 4 describes an attempt to estimate multiple Poisson means in different regions using the reported number of tetanus cases from only the first year. Finally, the conclusions of this study are presented in Sec. 5.

Empirical Bayesian method and proposed estimator
We first demonstrate that the empirical Bayesian estimators assuming Jeffreys' and the uniform priors are the same. To construct an estimator that is closer to the population Poisson means, we then propose an empirical Bayesian estimator of multiple Poisson means that assumes two different priors.

Jeffreys' prior
A conjugate prior density assuming Jeffreys' prior is written as:

π_J(λ_k; m, δ) = δ^{δm+1/2} λ_k^{δm−1/2} e^{−δλ_k} / Γ(δm + 1/2),  (3)

that is, a gamma density with shape δm + 1/2 and rate δ. The posterior density is given by π_J(λ_k | x_k; m, δ) = π_J(λ_k; (x_k + δm)/(1 + δ), 1 + δ) for an arbitrary sample x_k. The posterior mean of the canonical parameter is θ̂_{k,J} = ψ(x_k + δm + 0.5) − log(1 + δ), where ψ(·) is the digamma function. Through the inverse logarithmic transformation of θ̂_{k,J}, the corresponding estimator of the k-th mean, λ̂_{k,J}, can be written as:

λ̂_{k,J} = exp[ψ(x_k + δm + 0.5)] / (1 + δ).  (4)

The prior and posterior densities in K dimensions are written as π_J(λ; m, δ) = ∏_{k=1}^{K} π_J(λ_k; m, δ) and π_J(λ | x; m, δ) = ∏_{k=1}^{K} π_J(λ_k | x_k; m, δ), respectively. In the empirical Bayesian method, the hyperparameters are determined by maximizing the marginal likelihood. To determine the values of m and δ that maximize the logarithmic marginal likelihood log ML_J(m, δ), the partial derivatives of log ML_J(m, δ) with respect to m and δ are both set to 0. We can derive δm = δx̄ − 0.5 with x̄ = Σ_{k=1}^{K} x_k / K; details of the derivation are outlined in Supplemental material A. By substituting δm = δx̄ − 0.5 into log ML_J(m, δ), we obtain the logarithmic marginal likelihood as:

log ML(δ) = Σ_{k=1}^{K} {log Γ(x_k + δx̄) − log Γ(δx̄) − log x_k!} + Kδx̄ log δ − K(1 + δ)x̄ log(1 + δ),  (5)

which is independent of m. Then, δ = δ̂ is determined by maximizing log ML(δ). Further, by substituting δm = δx̄ − 0.5 and δ = δ̂ into (4), we obtain the estimator of the k-th mean as:

λ̂_{k,J} = exp[ψ(x_k + δ̂x̄)] / (1 + δ̂).  (6)

In this paper, this original version is referred to as the original estimator.
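To make the procedure concrete, the following Python sketch implements the original estimator: it evaluates the profiled log marginal likelihood (5), maximizes it over a simple grid of δ values (the grid is a simplifying assumption of this sketch; the paper's computations were done in R), and applies (6). The digamma function ψ is computed with a standard recurrence plus asymptotic series:

```python
import math

def digamma(x):
    # psi(x) via the recurrence psi(x) = psi(x+1) - 1/x, then an asymptotic series
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

def log_ml(delta, x):
    # Profiled log marginal likelihood (5), after substituting delta*m = delta*xbar - 0.5
    K = len(x)
    xbar = sum(x) / K
    a = delta * xbar  # assumes xbar > 0
    return (sum(math.lgamma(xk + a) - math.lgamma(a) - math.lgamma(xk + 1) for xk in x)
            + K * a * math.log(delta)
            - K * (1 + delta) * xbar * math.log(1 + delta))

def original_estimator(x):
    # Estimator (6): lambda_k = exp(psi(x_k + delta*xbar)) / (1 + delta)
    xbar = sum(x) / len(x)
    grid = [i / 10 for i in range(1, 501)]          # delta in (0, 50], illustrative grid
    dhat = max(grid, key=lambda d: log_ml(d, x))
    return [math.exp(digamma(xk + dhat * xbar)) / (1 + dhat) for xk in x]

x = [2, 7, 4, 1, 6]   # hypothetical counts
est = original_estimator(x)
```

Since exp[ψ(t)] < t, each estimate lies below the corresponding linear shrinkage value (x_k + δ̂x̄)/(1 + δ̂), and since ψ is increasing, the ordering of the raw counts is preserved.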

Uniform prior
A conjugate prior density assuming the uniform prior is written as:

π_U(λ_k; m, δ) = δ^{δm+1} λ_k^{δm} e^{−δλ_k} / Γ(δm + 1),  (7)

that is, a gamma density with shape δm + 1 and rate δ. Using the same procedure as for Jeffreys' prior, the corresponding estimator of the k-th mean, λ̂_{k,U}, is written as:

λ̂_{k,U} = exp[ψ(x_k + δm + 1)] / (1 + δ).  (8)

The logarithmic marginal likelihood assuming the uniform prior is written as log ML_U(m, δ). When the partial derivatives of log ML_U(m, δ) with respect to m and δ are both set to 0, we can derive δm = δx̄ − 1. By substituting δm = δx̄ − 1 into log ML_U(m, δ), we obtain log ML(δ) in (5). Further, by substituting δm = δx̄ − 1 and δ = δ̂ into (8), we obtain the estimator of the k-th mean as:

λ̂_{k,U} = exp[ψ(x_k + δ̂x̄)] / (1 + δ̂).  (9)
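The equality of (6) and (9) can be checked directly: under Jeffreys' prior the digamma argument is x_k + δm + 0.5 with δm = δx̄ − 0.5, and under the uniform prior it is x_k + δm + 1 with δm = δx̄ − 1, so both reduce to x_k + δx̄. A minimal Python check (the data and the assumed value of δ̂ are arbitrary, for illustration only):

```python
import math

def digamma(x):
    # psi(x) via recurrence and asymptotic series
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

x = [3, 0, 5, 2, 1]              # hypothetical counts
xbar = sum(x) / len(x)
delta = 2.0                      # assumed delta-hat, for illustration only
dm_J = delta * xbar - 0.5        # from Jeffreys' prior
dm_U = delta * xbar - 1.0        # from the uniform prior
est_J = [math.exp(digamma(xk + dm_J + 0.5)) / (1 + delta) for xk in x]  # via (4)
est_U = [math.exp(digamma(xk + dm_U + 1.0)) / (1 + delta) for xk in x]  # via (8)
# Both arguments reduce to x_k + delta*xbar, so the estimators coincide
assert all(abs(p - q) < 1e-12 for p, q in zip(est_J, est_U))
```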

Two different priors
In (6) and (9), the estimators assuming Jeffreys' and the uniform priors are the same; the same estimator is obtained regardless of which of the two priors is assumed. The original estimator uses a common prior both to determine the hyperparameters and to estimate the multiple Poisson means, so nothing is gained from the choice of prior. To overcome this drawback, we describe a novel idea: assuming two different priors. Note that the empirical Bayesian method can be divided into two parts, determining the hyperparameters and estimating the parameters, and we propose that a different prior be assumed in each part. This idea has not previously been discussed in the literature on the empirical Bayesian method, but our understanding is that the fundamental empirical Bayesian framework remains unchanged. The idea is expected to be applicable to other distributions as well.

Proposed estimator
We begin by explaining the two different priors assumed in the proposed estimator. First, Jeffreys' prior (see Sec. 2.1) is assumed to determine the two hyperparameters δ and m by maximizing the logarithmic marginal likelihood. The uniform prior (see Sec. 2.2) is then assumed to estimate the Poisson means by taking the posterior mean of the canonical parameters. Therefore, by substituting δm = δx̄ − 0.5 and δ = δ̂ into (8), the proposed estimator of the k-th mean is expressed as:

λ̂_k = exp[ψ(x_k + δ̂x̄ + 0.5)] / (1 + δ̂).  (10)

The difference between the proposed and the original estimators lies in the argument of the digamma function: the former is equal to the latter plus 0.5. Note that there are conditions on the choice of the two different priors in this paper. For example, if the two hyperparameters are determined assuming the uniform prior and the estimator is obtained assuming Jeffreys' prior, the argument of the digamma function may be negative; details of the conditions for the choice of the two different priors are given in Supplemental material B.
Here, we discuss some details of the practical numerical calculations. When the differences between the Poisson means are small, log ML(δ) increases monotonically in δ. Thus, log ML(δ) attains its maximum at infinity, and both the proposed and the original estimators reduce to x̄. We use the following treatment to obtain an estimator that is closer to the population Poisson means. Even when log ML(δ) is monotonically increasing in δ, the increase is often slow. The proposed and the original estimators therefore use δ = δ̂ as soon as log ML(δ) increases by less than 0.01 as δ increases by 1. Under this rule, although log ML(δ) ≡ 0 in the case of x = (0, …, 0), δ̂ = 0 is used in the proposed and the original estimators. This treatment ensures that δ̂ takes a finite value and that the estimators retain information from the individual x_k, so they are expected to be closer to the population Poisson means.
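The δ̂ stopping rule and the proposed estimator (10) can be sketched as follows. Python is used for illustration (the paper's computations were in R), the unit step over δ and the cap on δ are simplifying assumptions of this sketch, and the data are hypothetical with x̄ > 0:

```python
import math

def digamma(x):
    # psi(x) via recurrence and asymptotic series
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

def log_ml(delta, x):
    # Profiled log marginal likelihood (5); assumes xbar > 0
    K = len(x)
    xbar = sum(x) / K
    a = delta * xbar
    return (sum(math.lgamma(xk + a) - math.lgamma(a) - math.lgamma(xk + 1) for xk in x)
            + K * a * math.log(delta)
            - K * (1 + delta) * xbar * math.log(1 + delta))

def select_delta(x, max_delta=200):
    # Stop when log ML(delta) gains less than 0.01 as delta increases by 1;
    # this also halts at an interior maximum, where the gain turns negative.
    d = 1.0
    while d < max_delta and log_ml(d + 1, x) - log_ml(d, x) >= 0.01:
        d += 1.0
    return d

def proposed_estimator(x):
    # Estimator (10): exp(psi(x_k + delta*xbar + 0.5)) / (1 + delta)
    xbar = sum(x) / len(x)
    d = select_delta(x)
    return [math.exp(digamma(xk + d * xbar + 0.5)) / (1 + d) for xk in x]

x = [3, 0, 5, 2, 1]   # hypothetical counts
prop = proposed_estimator(x)
```

Because ψ is increasing, the +0.5 shift in the digamma argument makes each proposed estimate larger than the corresponding original estimate computed at the same δ̂.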

Risk comparisons
We compared the performance of the proposed estimator with that of four existing estimators (three competing estimators and the original estimator).

Competitors
We compared three existing estimators with the proposed estimator. The first is the maximum likelihood estimator (MLE), λ̂_MLE = x, which is widely used to estimate the parameter. However, it is known that the MLE is not close to the population parameter, especially when K is large. The second is the estimator proposed by Ghosh and Yang (1988). The first term on the right-hand side was x̄ in their paper, but the estimator can then take a negative value; we have therefore replaced x̄ with x̄ + 1. As the differences in λ̂_GY are small for δ = 1, 2, 3, 5, 10, we used λ̂_GY with δ = 2 to compare the estimators. This is called the GY estimator in this paper. Using simulation studies, Ghosh and Yang (1988) showed that the risk of dKLD from the GY estimator is less than that from the MLE + 1. The risk of dKLD from the GY estimator is greatly reduced when the population Poisson means are small, but only slightly reduced when they are large.
The third estimator is the posterior mean of the mean parameter assuming the uniform prior, written as:

λ̂_{k,T} = (x_k + δ̂x̄) / (1 + δ̂).

This was traditionally used as an empirical Bayesian estimator, and is called the traditional empirical Bayesian estimator in this paper. From our discussion of the empirical Bayesian method in Secs. 2.1 and 2.2, we know that the traditional empirical Bayesian estimators assuming Jeffreys' and the uniform priors are the same.
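After substituting δm = δx̄ − 1, the traditional empirical Bayesian estimator is exactly linear shrinkage of each count toward the grand mean, since (x_k + δx̄)/(1 + δ) = x̄ + (x_k − x̄)/(1 + δ). A minimal Python sketch, with hypothetical counts and an assumed value of δ̂:

```python
x = [3, 0, 5, 2, 1]   # hypothetical counts
xbar = sum(x) / len(x)
delta = 2.0           # assumed delta-hat, for illustration only
trad = [(xk + delta * xbar) / (1 + delta) for xk in x]
# Equivalent form: xbar + (x_k - xbar)/(1 + delta), i.e., linear shrinkage toward xbar
```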

Losses
Three losses are considered in this paper. The first is the loss based on KLD, written as:

L(λ̂, λ) = Σ_{k=1}^{K} D(λ̂_k, λ_k),

where λ̂ = (λ̂_1, …, λ̂_K) is the estimate and D(·, ·) is the KLD defined in (2). The second is the loss based on dKLD, written as L_m(λ̂, λ) = Σ_{k=1}^{K} D(λ_k, λ̂_k). The third is the loss based on SE, written as L_SE(λ̂, λ) = Σ_{k=1}^{K} (λ̂_k − λ_k)². Following Ghosh and Yang (1988), we set 17, 12, and 6 patterns of population multiple Poisson means for K = 2, 5, and 10, respectively. As there are not many combinations of (x_1, x_2) when K = 2, we calculated the three risks over all combinations of 0 ≤ x_1 ≤ 50 and 0 ≤ x_2 ≤ 50. Note that the probability of combinations satisfying x_1 > 50 or x_2 > 50 is negligibly small for all 17 patterns; details of the numerical calculations are given in Supplemental material C. When K was 5 or 10, the three risks were evaluated by simulation studies, as there were too many combinations to enumerate. The replication size in this paper was 1,000,000; details of the simulation study procedures are presented in Supplemental material C.

Results
The three risks given by the five estimators for K = 2 are summarized in Table 1. The three estimated risks given by the five estimators for K = 5, 10 are summarized in Tables 2 and 3, respectively. The three risks from the proposed estimator are smaller than those from the traditional empirical Bayesian estimator for K = 2, 5, 10. The three risks from the proposed estimator are much smaller than those from the GY estimator when λ_K/λ_1 < K for K = 2, 5, 10, except for the risk of dKLD in No. 5 for K = 2. When λ_K/λ_1 ≥ K, the three risks from the proposed estimator are larger than those from the GY estimator for many patterns for K = 2, but smaller for K = 10. The three risks from the proposed estimator are much smaller than those from the MLE for K = 2, 5, 10, except for the risks of KLD and SE in No. 3 for K = 2. The three risks from the proposed estimator are smaller than those from the original estimator when λ_K/λ_1 < K for K = 2, 5, 10, except for the risk of SE in No. 15 for K = 2. When λ_K/λ_1 ≥ K, the three risks from the proposed estimator are slightly larger than those from the original estimator for many patterns for K = 2, but the opposite is true for many patterns for K = 5, 10. Summarizing the above, when λ_K/λ_1 < K for K = 2, the three risks from the proposed estimator are smaller than those from the other four estimators, with a few exceptions. Even in these exceptional cases, the risk from the proposed estimator is the next-smallest, and only slightly greater than the smallest. When λ_K/λ_1 < K for K = 5, 10, the three risks from the proposed estimator are less than those from the other four estimators in all patterns without exception. When λ_K/λ_1 ≥ K, the proposed estimator does not give the smallest risk values for many patterns for K = 2, but does give the smallest values for many patterns for K = 10.
We used the software R version 3.6.2 (R Core Team 2019) for numerical calculations and simulation studies.
The risks of dKLD from the proposed and the GY estimators do not become infinite because all estimates satisfy λ̂_k > 0, even when x = (0, …, 0). For the case where x = (0, …, 0), the risks of dKLD from the traditional empirical Bayesian estimator, the MLE, and the original estimator become infinite because λ̂ = (0, …, 0) is estimated; here, exp[ψ(0)] in the original estimator is treated as 0. The proposed and the GY estimators satisfy λ̂_k > 0, consistent with the definition of the Poisson distribution.

Practical example
We analyzed the reported number of tetanus cases in Japan from 2006 to 2017 (source: National Institute of Infectious Diseases). The dataset is summarized by prefecture, but we have aggregated the numbers across five regions (J1, Hokkaido and Tohoku; J2, Kanto; J3, Chubu; J4, Kinki and Chugoku; and J5, Shikoku and Kyushu); see Table 4. The number of disease occurrences in each region in the first year was used to estimate the multiple Poisson means.

Table 3. Three estimated risks simulated for the five estimators (K = 10): population means (λ_1, …, λ_10) against the Proposed, Trad., GY, MLE, and Orig. estimators.

The reported numbers of tetanus cases in J1 and J5 are the minimum and the maximum values, respectively, and the ratio 36/10 = 3.6 is less than 5. The population multiple Poisson means are unknown, but the ratio is considered to be less than 5. We recommend the use of the proposed estimator, as its validity condition (as determined from the simulation study results) is satisfied in this practical example. Both the GY estimator and the MLE overestimate the maximum value and underestimate the minimum value, whereas the proposed estimator reduces this over- and underestimation.
In addition, we compared the three Bayesian estimators (the proposed estimator, the traditional empirical Bayesian estimator, and the original estimator). The estimated multiple Poisson means for the five regions given by the three Bayesian estimators are summarized in Table S1 of Supplemental material E. Although there are small differences between the estimators, we recommend the use of the proposed estimator in light of the simulation study results.

Conclusions
We have discussed the empirical Bayesian estimator of multiple Poisson means using the posterior mean of the canonical parameters. As the original estimator is the same regardless of the assumed prior, we proposed an empirical Bayesian estimator of multiple Poisson means that assumes Jeffreys' prior for determining the hyperparameters and the uniform prior for estimating the multiple Poisson means. Comparison studies with various population multiple Poisson means show that, of the estimators considered in this study, the proposed estimator gives estimates that are closest to the population Poisson means when λ_K/λ_1 < K for K = 2, 5, 10. Indeed, the proposed estimator is closest to the population Poisson means when K is large, regardless of λ_K/λ_1. As K = 5 or 10 is more common in practice than K = 2, the proposed estimator is effective in practical situations. In the practical example, multiple Poisson means were estimated from the reported number of tetanus cases during the first year. As λ_5/λ_1 < 5 could be assumed from the dataset, we were able to recommend the use of the proposed estimator.