The empirical Bayes estimators of the variance parameter of the normal distribution with a conjugate inverse gamma prior under Stein’s loss function

Abstract. For the hierarchical normal and inverse gamma model, we calculate the Bayes posterior estimator of the variance parameter of the normal distribution under Stein's loss function, which penalizes gross overestimation and gross underestimation equally, and the corresponding Posterior Expected Stein's Loss (PESL). We also obtain the Bayes posterior estimator of the variance parameter under the squared error loss function and the corresponding PESL. Moreover, we obtain the empirical Bayes estimators of the variance parameter of the normal distribution with a conjugate inverse gamma prior by two methods. In numerical simulations, we illustrate five aspects: the two inequalities of the Bayes posterior estimators and the PESLs; that the moment estimators and the Maximum Likelihood Estimators (MLEs) are consistent estimators of the hyperparameters; the goodness-of-fit of the model to the simulated data; the numerical comparisons of the Bayes posterior estimators and the PESLs of the oracle, moment, and MLE methods; and the plots of the marginal densities for various hyperparameters. The numerical results indicate that the MLEs are better than the moment estimators when estimating the hyperparameters. Finally, we utilize the %bodyfat data of 250 men of various ages to illustrate our theoretical studies.

As pointed out by Casella and Berger (2002), the squared error loss function penalizes overestimation and underestimation equally, which is fine in the location case with Θ = (−∞, ∞). In the positive restricted parameter case with Θ = (0, ∞), where 0 is a natural lower bound and the estimation problem is not symmetric, we should not choose the squared error loss function, but rather Stein's loss function (James and Stein 1961; see also Brown 1990), because it penalizes gross overestimation and gross underestimation equally; that is, an action a incurs an infinite loss as it tends to 0 or ∞. In the hierarchical normal and inverse gamma model (1), our parameter of interest is θ = σ² > 0, which is a variance parameter. Therefore, we will choose Stein's loss function. For more literature on Stein's loss function, the reader is referred to Brown (1968), Parsian and Nematollahi (1996), Bobotas and Kourouklis (2010), Zhang (2017), Xie et al. (2018), Zhang et al. (2018, 2019), Zhang, Rong, and Li (2019), and Sun, Zhang, and Sun (2021).
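To make the asymmetry concrete, a minimal Python sketch (ours, purely illustrative; the paper's computations use R) evaluates both loss functions at extreme actions for a true variance θ = 2:

```python
import math

def stein_loss(theta, a):
    # L_s(theta, a) = a/theta - log(a/theta) - 1  (zero iff a == theta)
    r = a / theta
    return r - math.log(r) - 1.0

def squared_loss(theta, a):
    # L_2(theta, a) = (a - theta)^2
    return (a - theta) ** 2

theta = 2.0  # the true variance parameter, as in Figure 1

# gross underestimation (a -> 0) and gross overestimation (a -> infinity)
print(stein_loss(theta, 1e-8), stein_loss(theta, 1e8))      # both blow up
print(squared_loss(theta, 1e-8), squared_loss(theta, 1e8))  # only the second blows up
```

Stein's loss explodes on both sides, while the squared error loss stays near θ² = 4 as a → 0, which is exactly the asymmetry criticized above.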
The Bayes estimation of the variance parameter (θ_{n+1}) of the normal distribution with a conjugate inverse gamma prior is studied in Example 4.2.5 (p. 236) of Lehmann and Casella (1998) and in Exercise 7.23 (p. 359) of Casella and Berger (2002). However, they only calculate the Bayes posterior estimator of θ_{n+1} under the squared error loss function. Since θ_{n+1} > 0 is a positive restricted parameter, the Bayes posterior estimator of θ_{n+1} under the squared error loss function is not appropriate. In contrast, we should choose Stein's loss function because it penalizes gross overestimation and gross underestimation equally. Moreover, Zhang (2017) has investigated the hierarchical normal and inverse gamma model (1) and calculated the Bayes posterior estimator of θ_{n+1} under Stein's loss function. However, Zhang (2017) assumes that the hyperparameters are known, which is unrealistic. In this manuscript, we determine the hyperparameters by the moment method and the Maximum Likelihood Estimation (MLE) method from the marginal distribution of the model. Then the estimated hyperparameters are plugged into the Bayes posterior estimators of θ_{n+1}, and finally we obtain the empirical Bayes estimators of θ_{n+1}.
The rest of the paper is organized as follows. In Sec. 2, we summarize four theorems. More specifically, we calculate the posterior distribution of θ_{n+1} for the hierarchical normal and inverse gamma model (1) in Theorem 1. Moreover, the estimators of the hyperparameters of the model by the moment method are summarized in Theorem 2. Furthermore, the estimators of the hyperparameters of the model by the MLE method are summarized in Theorem 3. Finally, the empirical Bayes estimators of the variance parameter of the model under Stein's loss function by the moment method and the MLE method are summarized in Theorem 4.
In Sec. 3, we will illustrate five aspects in the numerical simulations. First, we will numerically exemplify two inequalities of the Bayes posterior estimators and the Posterior Expected Stein's Losses (PESLs). Second, we will illustrate that the moment estimators and the Maximum Likelihood Estimators (MLEs) are consistent estimators of the hyperparameters. Third, we will calculate the goodness-of-fit of the model to the simulated data. Fourth, we will numerically compare the Bayes posterior estimators and the PESLs of the three methods. Finally, we will plot the marginal densities of the model for various hyperparameters. In Sec. 4, we utilize the %bodyfat data of 250 men of various ages to illustrate the calculations of the empirical Bayes estimators of the variance parameter of the normal distribution with a conjugate inverse gamma prior. Some conclusions and discussions are provided in Sec. 5.

Main results
Suppose that we observe X_1, X_2, ..., X_{n+1} from the hierarchical normal and inverse gamma model:

X_i | θ_i ∼ N(μ, θ_i), independently, θ_i ∼ IG(a, b), i = 1, 2, ..., n+1,   (1)

where −∞ < μ < ∞, a > 0, and b > 0 are hyperparameters to be determined, θ_{n+1} > 0 is the unknown parameter of interest, N(μ, θ_i) is the normal distribution with an unknown mean μ and an unknown variance θ_i, and IG(a, b) is the inverse gamma distribution with an unknown shape parameter a and an unknown scale parameter b. As described in Deely and Lindley (1981), the statistician observes data (x_1, x_2, ..., x_{n+1}) = x^{(n+1)} and wishes to make an inference about θ_{n+1}. Therefore, x_{n+1} provides direct information about the parameter θ_{n+1}, while supplementary information (x_1, x_2, ..., x_n) = x^{(n)} is also available. The connection between the prime data x_{n+1} and the supplementary information x^{(n)} is provided by the common distributions N(μ, θ_i) and IG(a, b). Suppose that X ∼ G(a, b) and Y = 1/X ∼ IG(a, b). More specifically, the pdfs of X and Y are respectively given by

f_X(x) = (b^a / Γ(a)) x^{a−1} e^{−bx}, x > 0,  and  f_Y(y) = (b^a / Γ(a)) y^{−(a+1)} e^{−b/y}, y > 0.
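The reciprocal relation between the two pdfs, and the mean E Y = b/(a − 1) for a > 1, can be checked numerically. The following Python sketch (an illustration of ours; the paper's computations are in R) uses the rate parameterization of G(a, b) given above:

```python
import math, random

def gamma_pdf(x, a, b):
    # f_X(x) = b^a / Gamma(a) * x^(a-1) * exp(-b x), x > 0 (rate b)
    return math.exp(a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(x) - b * x)

def invgamma_pdf(y, a, b):
    # f_Y(y) = b^a / Gamma(a) * y^(-(a+1)) * exp(-b / y), y > 0 (scale b)
    return math.exp(a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(y) - b / y)

a, b, y = 3.0, 1.0, 0.7
# change of variables Y = 1/X gives f_Y(y) = f_X(1/y) / y^2
lhs = invgamma_pdf(y, a, b)
rhs = gamma_pdf(1.0 / y, a, b) / y**2

# Monte Carlo check of E[Y] = b/(a-1) = 0.5 for a > 1
random.seed(2)
m = sum(b / random.gammavariate(a, 1.0) for _ in range(100000)) / 100000
print(lhs, rhs, m)
```

Here `b / random.gammavariate(a, 1.0)` draws from IG(a, b), since 1/G ∼ IG(a, b) when G is gamma with shape a and rate b.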

The Bayes posterior estimators and the PESLs
The inverse gamma prior is a conjugate prior for the variance parameter (θ_{n+1}) of the normal distribution, so that the posterior distribution of θ_{n+1} is also an inverse gamma distribution. For the hierarchical normal and inverse gamma model (1), the posterior distribution of θ_{n+1} is summarized in the following theorem, whose proof can be found in the supplement.
Theorem 1. For the hierarchical normal and inverse gamma model (1), the posterior distribution of θ_{n+1} is an inverse gamma distribution, that is,

θ_{n+1} | x^{(n+1)} ∼ IG(a*, b*), where a* = a + 1/2 and b* = b + (x_{n+1} − μ)²/2.   (2)

Since θ_{n+1} > 0 is a variance parameter in (1), the Bayes posterior estimator of θ_{n+1} under the squared error loss function is not appropriate. In contrast, we should choose Stein's loss function (James and Stein 1961; see also Brown 1990) because it penalizes gross overestimation and gross underestimation equally. Now let us explain why Stein's loss function is better than the squared error loss function on Θ = (0, ∞). Stein's loss function is given by

L_s(θ_{n+1}, a) = a/θ_{n+1} − log(a/θ_{n+1}) − 1,   (3)

and the squared error loss function is given by

L_2(θ_{n+1}, a) = (a − θ_{n+1})².   (4)

Notice that on the positive restricted parameter space Θ = (0, ∞), Stein's loss function penalizes gross overestimation and gross underestimation equally; that is, an action a incurs an infinite loss as it tends to 0 or ∞. The squared error loss function, in contrast, does not penalize gross overestimation and gross underestimation equally, since an action a incurs a finite loss (in fact θ²_{n+1}) as it tends to 0 and an infinite loss as it tends to ∞. Figure 1 shows the two loss functions on Θ = (0, ∞) when θ_{n+1} = 2.

Now we will derive two inequalities for any hierarchical model such that the posterior expectations exist. Similar to Zhang (2017), the Bayes posterior estimator of θ_{n+1} under Stein's loss function is given by

δ^π_s(x^{(n+1)}) = 1 / E(1/θ_{n+1} | x^{(n+1)}),

and the Bayes posterior estimator of θ_{n+1} under the usual squared error loss function is given by

δ^π_2(x^{(n+1)}) = E(θ_{n+1} | x^{(n+1)}).

Similar to Zhang (2017), we can prove by exploiting Jensen's inequality that

δ^π_s(x^{(n+1)}) ≤ δ^π_2(x^{(n+1)}).   (5)

Similar to Zhang (2017), the Posterior Expected Stein's Loss (PESL) at δ^π_s is not greater than that at δ^π_2, that is,

PESL^π_s(x^{(n+1)}) ≤ PESL^π_2(x^{(n+1)}).   (6)

For the model (1), the Bayes posterior estimator of θ_{n+1} under Stein's loss function is given by

δ^{π,θ_{n+1}}_s(x_{n+1}) = b*/a*,   (7)

where a* and b* are given by (2). Moreover, we also calculate the Bayes posterior estimator of θ_{n+1} under the usual squared error loss function,

δ^{π,θ_{n+1}}_2(x_{n+1}) = b*/(a* − 1), a* > 1.

It is easy to show that δ^{π,θ_{n+1}}_s(x_{n+1}) < δ^{π,θ_{n+1}}_2(x_{n+1}), which exemplifies the theoretical study of (5). Furthermore, similar to Zhang (2017), the PESLs at δ^{π,θ_{n+1}}_s(x_{n+1}) and δ^{π,θ_{n+1}}_2(x_{n+1}) are respectively given by

PESL^{π,θ_{n+1}}_s(x_{n+1}) = log a* − ψ(a*)

and

PESL^{π,θ_{n+1}}_2(x_{n+1}) = a*/(a* − 1) + log(a* − 1) − ψ(a*) − 1,

where ψ(a) = d log Γ(a)/da is the digamma function. It can be shown that PESL^{π,θ_{n+1}}_s(x_{n+1}) ≤ PESL^{π,θ_{n+1}}_2(x_{n+1}), which exemplifies the theoretical study of (6). It is worth noting that the PESLs PESL^{π,θ_{n+1}}_s(x_{n+1}) and PESL^{π,θ_{n+1}}_2(x_{n+1}) depend only on a*, but not on b*. Therefore, the PESLs depend only on a, but not on μ, b, and x_{n+1}. In the simulations section and the real data section, we will exemplify the two inequalities (5) and (6). Moreover, we will exemplify that the PESLs depend only on a, but not on μ, b, and x_{n+1}.

The empirical Bayes estimators of the variance parameter of the normal distribution with a conjugate inverse gamma prior

To prove Theorems 2 and 3, we need the following lemmas. Lemma 1, whose proof can be found in the supplement, is about the high-order moments of the normal distribution.
Lemma 1. Let X ∼ N(μ, σ²). Then the first four moments of X are:

E X = μ, E X² = μ² + σ², E X³ = μ³ + 3μσ², and E X⁴ = μ⁴ + 6μ²σ² + 3σ⁴.

The following lemma, whose proof can be found in the supplement, is about the first two moments of the inverse gamma distribution.
Lemma 2. Let θ ∼ IG(a, b) follow an inverse gamma distribution with a shape parameter a > 0 and a scale parameter b > 0, whose density is given by

f(θ) = (b^a / Γ(a)) θ^{−(a+1)} e^{−b/θ}, θ > 0.

Then E θ = b/(a − 1) for a > 1, and E θ² = b²/((a − 1)(a − 2)) for a > 2.

The following lemma, whose proof can be found in the supplement, relates a non standardized student-t distribution to a mixture distribution obtained by compounding a normal distribution, with mean μ and unknown variance, with an inverse gamma distribution placed over the variance with parameters a = ν/2 and b = νσ²/2.

Lemma 3. Suppose that

X | θ ∼ N(μ, θ) and θ ∼ IG(ν/2, νσ²/2),

where −∞ < μ < ∞, ν > 0, and σ > 0 are hyperparameters. Then the marginal distribution of X is a non standardized student-t distribution, that is, X ∼ t_ν(μ, σ²), with density

f(x | μ, ν, σ²) = [Γ((ν+1)/2) / (Γ(ν/2) √(νπ) σ)] (1 + (x − μ)²/(νσ²))^{−(ν+1)/2},   (11)

where μ is a location parameter, ν > 0 is a degrees of freedom parameter, and σ > 0 is a scale parameter.
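Lemma 3 can be verified by simulation: compounding N(μ, θ) with θ ∼ IG(ν/2, νσ²/2) should reproduce the moments of t_ν(μ, σ²), in particular the variance νσ²/(ν − 2) for ν > 2. A small Python sketch of ours (illustrative only; ν = 8 and σ² = 1 are arbitrary choices):

```python
import math, random

random.seed(5)
mu, nu, sigma2 = 0.0, 8.0, 1.0
a, b = nu / 2.0, nu * sigma2 / 2.0   # the IG parameters of Lemma 3

N = 200000
xs = []
for _ in range(N):
    theta = b / random.gammavariate(a, 1.0)        # theta ~ IG(a, b)
    xs.append(random.gauss(mu, math.sqrt(theta)))  # X | theta ~ N(mu, theta)

mean = sum(xs) / N
var = sum((x - mean) ** 2 for x in xs) / N
target = nu * sigma2 / (nu - 2.0)  # variance of t_nu(mu, sigma2) = 4/3
print(mean, var, target)
```

The simulated mean and variance should match μ = 0 and 4/3 up to Monte Carlo error.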
Combining Lemmas 1, 2, and 3, we can prove the following lemma, in which we calculate the first four moments of a non standardized student-t distribution, t_ν(μ, σ²). The proof of the lemma can be found in the supplement.
Lemma 4. Let X ∼ t_ν(μ, σ²) be a non standardized student-t distribution. Then the first four moments of X are:

E X = μ (ν > 1), E X² = μ² + νσ²/(ν − 2) (ν > 2), E X³ = μ³ + 3μνσ²/(ν − 2) (ν > 3), and E X⁴ = μ⁴ + 6μ²νσ²/(ν − 2) + 3ν²σ⁴/((ν − 2)(ν − 4)) (ν > 4).

The estimators of the hyperparameters of the model (1) by the moment method, μ̂₁(n), â₁(n), and b̂₁(n), and their consistencies are summarized in the following theorem, whose proof can be found in the supplement. Note that the proof of Theorem 2 depends on Lemma 4.
Theorem 2. The estimators of the hyperparameters of the model (1) by the moment method, μ̂₁(n), â₁(n), and b̂₁(n), are expressed in terms of A_k(n) = (1/n) Σ_{i=1}^{n} X_i^k, the sample kth moment of X. Moreover, the moment estimators are consistent estimators of the hyperparameters.
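The exact formulas of Theorem 2 are derived in the supplement. The Python sketch below (ours) reconstructs one standard system of moment equations implied by Lemma 4 with ν = 2a and σ² = b/a, matching the sample mean, variance, and kurtosis (Var X = b/(a − 1) and kurtosis 3(a − 1)/(a − 2), requiring a > 2); treat these closed forms as our assumption, to be checked against Theorem 2:

```python
import math, random

def moments_to_hyper(m1, m2c, m4c):
    # invert: mean = mu, variance = b/(a-1), kurtosis = 3(a-1)/(a-2)
    kappa = m4c / m2c**2                      # kurtosis; needs kappa > 3
    a = (2.0 * kappa - 3.0) / (kappa - 3.0)
    b = (a - 1.0) * m2c
    return m1, a, b

def moment_estimates(xs):
    n = len(xs)
    m1 = sum(xs) / n
    c = [x - m1 for x in xs]
    m2c = sum(v * v for v in c) / n    # 2nd central sample moment
    m4c = sum(v ** 4 for v in c) / n   # 4th central sample moment
    return moments_to_hyper(m1, m2c, m4c)

# exact check with the population moments for (mu, a, b) = (0, 3, 1):
# m2c = b/(a-1) = 0.5 and m4c = 3 b^2 / ((a-1)(a-2)) = 1.5
print(moments_to_hyper(0.0, 0.5, 1.5))  # recovers (0.0, 3.0, 1.0)

# demo on data simulated from model (1)
random.seed(7)
data = []
for _ in range(100000):
    theta = 1.0 / random.gammavariate(3.0, 1.0)   # theta ~ IG(3, 1)
    data.append(random.gauss(0.0, math.sqrt(theta)))
mu_hat, a_hat, b_hat = moment_estimates(data)
print(mu_hat, a_hat, b_hat)
```

The inversion recovers the true hyperparameters exactly from the population moments, which is the consistency mechanism of Theorem 2; the sample version inherits the heavy-tailed variability of the fourth moment.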
The estimators of the hyperparameters of the model (1) by the MLE method, μ̂₂(n), â₂(n), and b̂₂(n), and their consistencies are summarized in the following theorem, whose proof can be found in the supplement. Note that the proof of Theorem 3 depends on Lemma 3.

Theorem 3. The estimators of the hyperparameters of the model (1) by the MLE method, μ̂₂(n), â₂(n), and b̂₂(n), are the solutions to the likelihood equations (12)–(14). Moreover, the MLEs are consistent estimators of the hyperparameters.
We have to resort to numerical solutions of Equations (12)–(14), because analytical calculation of the MLEs of μ, a, and b by solving the equations is impossible. We can utilize Newton's method to solve Equations (12)–(14) and thus obtain the MLEs of μ, a, and b. Notice that the MLEs of μ, a, and b are very sensitive to the initial estimators, and the moment estimators usually prove to be good initial estimators.
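Equivalently, one can maximize the marginal t log-likelihood from Lemma 3 directly. The sketch below (ours, in Python with numpy/scipy rather than the paper's R, and using a derivative-free optimizer instead of Newton's method) illustrates the idea; the initial values here are crude placeholders where the paper recommends the moment estimators of Theorem 2:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_loglik(params, x):
    # marginal of model (1): X ~ t_nu(mu, s2) with nu = 2a, s2 = b/a
    mu, log_a, log_b = params        # log-parameterization keeps a, b > 0
    a, b = np.exp(log_a), np.exp(log_b)
    nu, s2 = 2.0 * a, b / a
    z2 = (x - mu) ** 2 / s2
    ll = (gammaln((nu + 1) / 2) - gammaln(nu / 2)
          - 0.5 * np.log(nu * np.pi * s2)
          - (nu + 1) / 2 * np.log1p(z2 / nu))
    return -np.sum(ll)

# simulate from model (1) with (mu, a, b) = (0, 3, 1)
rng = np.random.default_rng(0)
n = 5000
theta = 1.0 / rng.gamma(3.0, 1.0, size=n)   # theta_i ~ IG(3, 1)
x = rng.normal(0.0, np.sqrt(theta))

# crude initial values (placeholders for the moment estimators)
a0 = 3.0
b0 = (a0 - 1.0) * x.var()                   # matches Var(X) = b/(a-1)
start = np.array([x.mean(), np.log(a0), np.log(b0)])

res = minimize(neg_loglik, start, args=(x,), method="Nelder-Mead")
mu_hat, a_hat, b_hat = res.x[0], np.exp(res.x[1]), np.exp(res.x[2])
print(mu_hat, a_hat, b_hat)
```

Starting near the moment estimates matters here for the same reason the paper notes: the likelihood surface in a is flat for heavy-tailed samples, so poor initial values can stall the optimizer.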
Finally, the empirical Bayes estimators of the variance parameter of the model (1) under Stein's loss function by the moment method and the MLE method are summarized in the following theorem.
Theorem 4. The empirical Bayes estimator of the variance parameter of the model (1) under Stein's loss function by the moment method is given by (7) with the hyperparameters estimated by (μ̂₁(n), â₁(n), b̂₁(n)) in Theorem 2. Alternatively, the empirical Bayes estimator of the variance parameter of the model (1) under Stein's loss function by the MLE method is given by (7) with the hyperparameters estimated by (μ̂₂(n), â₂(n), b̂₂(n)), numerically determined in Theorem 3.
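Theorem 4 is a plug-in rule. Assuming the conjugate update a* = a + 1/2 and b* = b + (x_{n+1} − μ)²/2 (our reading of Theorem 1; an assumption of this sketch), the empirical Bayes estimates and the two PESLs can be computed in a few lines of Python (ours, illustrative; the digamma routine is a standard series approximation):

```python
import math

def digamma(x):
    # series approximation of psi(x), adequate for x > 0
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def empirical_bayes(x_next, mu_hat, a_hat, b_hat):
    # assumed conjugate update: a* = a + 1/2, b* = b + (x_{n+1} - mu)^2 / 2
    a_star = a_hat + 0.5
    b_star = b_hat + (x_next - mu_hat) ** 2 / 2.0
    delta_s = b_star / a_star            # Stein's loss estimator, Eq. (7)
    delta_2 = b_star / (a_star - 1.0)    # squared error loss estimator
    pesl_s = math.log(a_star) - digamma(a_star)
    pesl_2 = a_star / (a_star - 1.0) + math.log(a_star - 1.0) - digamma(a_star) - 1.0
    return delta_s, delta_2, pesl_s, pesl_2

d_s, d_2, p_s, p_2 = empirical_bayes(2.0, 0.0, 3.0, 1.0)
print(d_s, d_2, p_s, p_2)

# the PESLs depend only on a*: changing b_hat (hence b*) leaves them unchanged
_, _, q_s, q_2 = empirical_bayes(2.0, 0.0, 3.0, 7.0)
```

The run exhibits both inequalities (δ_s < δ_2 and PESL_s < PESL_2) and the fact that the PESLs are free of b*.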

Theoretical comparisons of the Bayes posterior estimators and the PESLs of three methods
In this subsection, similar to Sun, Zhang, and Sun (2021), we will theoretically compare the Bayes posterior estimators and the PESLs of the three methods (the oracle method, the moment method, and the MLE method). Note that the numerical comparisons of the Bayes posterior estimators and the PESLs of the three methods can be found in Subsec. 3.4, and that the subscripts 0, 1, and 2 below are for the oracle method, the moment method, and the MLE method, respectively. Following the route of Zhang (2017) and Sun, Zhang, and Sun (2021), the Bayes posterior estimators of θ_{n+1} under Stein's loss function, δ^{π,θ_{n+1}}_{s,i}(x^{(n+1)}) (i = 0, 1, 2), minimize the corresponding PESLs, that is,

δ^{π,θ_{n+1}}_{s,i}(x^{(n+1)}) = argmin_{a ∈ A} E[L_s(θ_{n+1}, a) | x^{(n+1)}],

where A = (0, ∞) is the action space, a = a(x^{(n+1)}) > 0 is an action (estimator), L_s(θ_{n+1}, a) given by (3) is Stein's loss function, and θ_{n+1} > 0 is the unknown parameter of interest. It is easy to calculate the Bayes posterior estimators of θ_{n+1} under the squared error loss function, δ^{π,θ_{n+1}}_{2,i}(x^{(n+1)}) (i = 0, 1, 2), as well as the PESLs evaluated at the Bayes posterior estimators δ^{π,θ_{n+1}}_{s,i}(x^{(n+1)}) and δ^{π,θ_{n+1}}_{2,i}(x^{(n+1)}) (i = 0, 1, 2). Similar to Sun, Zhang, and Sun (2021), our primary objectives are to estimate the Bayes posterior estimators and the PESLs by the oracle method. However, the hyperparameters μ, a, and b are unknown. The oracle method knows the hyperparameters μ, a, and b only in simulations; in reality, they are unknown.
The good news is that we can actually obtain the Bayes posterior estimators and the PESLs by the moment method, and by the MLE method, once the data x^{(n+1)} = (x_1, x_2, ..., x_{n+1}) are given.
To compare the moment method and the MLE method, we can compare the Bayes posterior estimators and the PESLs obtained by the two methods with those obtained by the oracle method in simulations. The method whose Bayes posterior estimators and PESLs are closer to those of the oracle method in simulations is the better method.

Simulations
In this section, we will carry out the numerical simulations for the hierarchical normal and inverse gamma model (1). We will illustrate five aspects. First, we will numerically exemplify the two inequalities of the Bayes posterior estimators and the PESLs, (5) and (6). Second, we will illustrate that the moment estimators (μ̂₁(n), â₁(n), and b̂₁(n)) and the MLEs (μ̂₂(n), â₂(n), and b̂₂(n)) are consistent estimators of the hyperparameters (μ, a, and b). Third, we will calculate the goodness-of-fit of the model to the simulated data. Fourth, we will numerically compare the Bayes posterior estimators and the PESLs of the three methods (the oracle method, the moment method, and the MLE method). Finally, we will plot the marginal densities of the model for various hyperparameters.

The simulated data are generated according to the model (1) with the hyperparameters being specified by μ = 0, a = 3, and b = 1. The reason why we choose these values is that −∞ < μ < ∞, a > 2, and b > 0 are required in the moment estimation of the hyperparameters. Other numerical values of the hyperparameters can also be specified.

Two inequalities of the Bayes posterior estimators and the PESLs
In this subsection, we will numerically exemplify the two inequalities of the Bayes posterior estimators and the PESLs, (5) and (6), for the oracle method.
First, we fix μ = 0, a = 3, and b = 1. Then we set seed (1) in R software and draw θ_{n+1} from IG(a, b). After that, we draw x_{n+1} from N(μ, θ_{n+1}). Figure 3 shows the histogram of θ_{n+1} | x_{n+1} and the density estimation curve of the posterior, which exemplify the theoretical studies of (5) and (6). Now we allow one of the four quantities μ, a, b, and x_{n+1} to change, holding the other quantities fixed. In other words, we are interested in the sensitivity analysis of the Bayes posterior estimators and the PESLs (see Adler et al. 2017; Zhang et al. 2017, 2019; Sun, Zhang, and Sun 2021). We remark that the R function persp() in the R package graphics cannot add another surface to an existing surface, but persp3d() can. Moreover, persp3d() allows one to rotate the perspective plots of the surface according to one's wishes. Figure 5 plots the surfaces of the Bayes posterior estimators and the PESLs, and the surfaces of the differences of the Bayes posterior estimators and the PESLs. From the left two plots of the figure, we see that δ^{π,θ_{n+1}}_s ≤ δ^{π,θ_{n+1}}_2 and PESL^{π,θ_{n+1}}_s ≤ PESL^{π,θ_{n+1}}_2. The results of the figure exemplify the theoretical studies of (5) and (6).

Consistency of the moment estimators and the MLEs
In this subsection, similar to Sun, Zhang, and Sun (2021), we will numerically exemplify that the moment estimators (μ̂₁(n), â₁(n), and b̂₁(n)) and the MLEs (μ̂₂(n), â₂(n), and b̂₂(n)) are consistent estimators of the hyperparameters (μ, a, and b) of the hierarchical normal and inverse gamma model (1). Note that only x^{(n)} = (x_1, x_2, ..., x_n) are used in this subsection.
Let ξ denote the hyperparameter μ, a, or b. Then the consistency means that ξ̂_i(n) →_p ξ, as n → ∞, for i = 1, 2, where i = 1 is for the moment estimator, i = 2 is for the MLE, →_p means convergence in probability, and n is the sample size. Alternatively, the consistency means that

P(|ξ̂_i(n) − ξ| ≥ ε) → 0, as n → ∞,

for every ε > 0. The probabilities are approximated by the corresponding frequencies:

(1/M) Σ_{m=1}^{M} I(|ξ̂_i^{(m)}(n) − ξ| ≥ ε),

where I(A) is the indicator function of A, which is equal to 1 if A is true and 0 otherwise, and M is the number of simulations. Therefore, frequencies tending to 0 mean that the estimators are consistent. The frequencies of the moment estimators (μ̂₁(n), â₁(n), and b̂₁(n)) and the MLEs (μ̂₂(n), â₂(n), and b̂₂(n)) of the hyperparameters (μ, a, and b) as n varies, for M = 100 and ε = 1, 0.5, and 0.1, are reported in Table 1. From the table, we observe the following facts.
Table 1. The frequencies of the moment estimators and the MLEs of the hyperparameters as n varies for M = 100 and ε = 1, 0.5, and 0.1.
Given ε = 1, 0.5, or 0.1, the frequencies of the estimators (μ̂₁(n), â₁(n), b̂₁(n), μ̂₂(n), â₂(n), b̂₂(n)) tend to 0 as n increases to infinity, which means that the moment estimators and the MLEs are consistent estimators of the hyperparameters. For ε = 0.1, the frequencies of the estimator â₁(n) are still very large; however, we observe a tendency of declining to 0 as n increases to infinity. Comparing the frequencies corresponding to ε = 1, 0.5, and 0.1, we observe that as ε gets smaller, the frequencies tend to be larger, since the constraints |ξ̂_i^{(m)}(n) − ξ| ≥ ε, for ξ = μ, a, and b and i = 1, 2, are easier to meet. Comparing the moment estimators and the MLEs of the hyperparameters μ, a, and b, we see that the frequencies of the MLEs are smaller than those of the moment estimators, which means that the MLEs are better than the moment estimators when estimating the hyperparameters in terms of consistency.
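The frequency approximation of the consistency probabilities can be sketched as follows. For brevity, this Python illustration (ours) uses the sample mean of the marginal data as a simple consistent estimator of μ, rather than the full moment estimators of Theorem 2; the sample sizes and ε are arbitrary choices:

```python
import math, random

def frequency(estimates, true_val, eps):
    # Monte Carlo frequency of |xi_hat - xi| >= eps over M replications
    return sum(1 for e in estimates if abs(e - true_val) >= eps) / len(estimates)

random.seed(11)
mu, a, b = 0.0, 3.0, 1.0
M, eps = 100, 0.1

def mu_hat(n):
    # sample mean of n draws from the marginal of model (1)
    tot = 0.0
    for _ in range(n):
        theta = b / random.gammavariate(a, 1.0)
        tot += random.gauss(mu, math.sqrt(theta))
    return tot / n

freqs = {n: frequency([mu_hat(n) for _ in range(M)], mu, eps) for n in (50, 5000)}
print(freqs)  # the frequency at n = 5000 should be (near) 0
```

As in Table 1, the frequency shrinks toward 0 as n grows, which is the operational meaning of consistency here.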

Goodness-of-fit of the model: KS test
In this subsection, similar to Sun, Zhang, and Sun (2021), we will calculate the goodness-of-fit of the hierarchical normal and inverse gamma model (1) to the simulated data (see Xue and Chen 2007; Ross 2013; Sun, Zhang, and Sun 2021). Note that only x^{(n)} = (x_1, x_2, ..., x_n) are used in this subsection. The chi-square test is a measure of the goodness-of-fit. However, the chi-square test is very sensitive because of the problems of choosing the number of groups and of finding the cut-points. Therefore, instead of using the chi-square test as a measure of the goodness-of-fit, we will use the Kolmogorov-Smirnov (KS) test.
The Kolmogorov-Smirnov statistic is the distance D between the empirical cumulative distribution function (cdf) F_n(x) and the population cdf F_0(x), that is,

D = sup_x |F_n(x) − F_0(x)|.   (15)

In R software, the built-in function ks.test() can perform the KS test (see Conover 1971; Durbin 1973; Marsaglia, Tsang, and Wang 2003). Note that the KS test is generally valid for one-dimensional continuous cdfs, but in the literature KS-type tests have been developed for discrete data too (Dimitrova, Kaishev, and Tan 2020; Aldirawi, Yang, and Metwally 2019; Santitissadeekorn et al. 2020). In our problem, the null hypothesis specifies that

H_0: X ∼ N−IG(μ, a, b),

where N−IG(μ, a, b) is the marginal distribution of the hierarchical normal and inverse gamma model (1). The marginal density of the N−IG(μ, a, b) distribution is given by (11) with ν = 2a and σ² = b/a, which is obviously one-dimensional continuous.
Hence, the KS test can be utilized as a measure of the goodness-of-fit. The return value of ks.test() is a list containing the components statistic (the value of the test statistic, or the D value) and p.value (the p-value of the test). It is well known in the literature that a smaller D value or a larger p-value indicates a better fit of the model to the simulated data. Inspired by and based on the D value and p-value, Sun, Zhang, and Sun (2021) propose five indices (D̄, p̄-value, (min D)%, (max p-value)%, and (Accept H_0)%) to compare the three methods, namely, the oracle method, the moment method, and the MLE method, in simulations.
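As a rough stand-in for R's ks.test(), the following Python sketch (ours) computes a two-sample KS distance between data simulated from the model and a large reference sample drawn from the fitted marginal t_{2a}(μ, b/a); under H_0 the distance should be small. The sample sizes are arbitrary choices:

```python
import math, random

def ks_two_sample(x, y):
    # two-sample KS distance: max |F_x - F_y| over the merged sample
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        if x[i] <= y[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

random.seed(3)
mu, a, b = 0.0, 3.0, 1.0

# data from model (1)
data = []
for _ in range(2000):
    theta = b / random.gammavariate(a, 1.0)
    data.append(random.gauss(mu, math.sqrt(theta)))

# reference sample from the fitted marginal t_{2a}(mu, b/a)
nu, s = 2.0 * a, math.sqrt(b / a)
ref = []
for _ in range(20000):
    z = random.gauss(0.0, 1.0)
    v = random.gammavariate(nu / 2.0, 2.0)     # chi-square(nu)
    ref.append(mu + s * z / math.sqrt(v / nu))

D = ks_two_sample(data, ref)
print(D)  # small under H_0
```

Unlike ks.test(), this sketch approximates F_0 by simulation instead of evaluating the t cdf exactly, so it yields only the D value, not a p-value.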
The results of the KS test goodness-of-fit of the model (1) to the simulated data are reported in Table 2. Note that the data are simulated according to the hierarchical normal and inverse gamma model (1) with μ = 0, a = 3, and b = 1. In the table, the hyperparameters μ, a, and b are estimated by three methods. The first method is the oracle method, in that we know the hyperparameters μ = 0, a = 3, and b = 1. The second method is the moment method, in that the hyperparameters μ, a, and b are estimated by their moment estimators (see Theorem 2). The third method is the MLE method, in that the hyperparameters μ, a, and b are estimated by their MLEs (see Theorem 3). In Table 2, the sample size is n = 1,000, and the number of simulations is M = 100. D̄ is the average of the D values (15) over the M simulations (the smaller the better). p̄-value is the average of the p-values over the M simulations (the larger the better). (min D)% is the percentage of the M simulations which attain the minimum D value among the three methods (the larger the better); the three (min D)% values should sum to 1. (max p-value)% is the percentage of the M simulations which attain the maximum p-value among the three methods (the larger the better); the three (max p-value)% values should sum to 1. (Accept H_0)% is the percentage of accepting H_0 (defined as p-value > 0.05) in the M simulations for each method (the larger the better); each (Accept H_0)% value should lie between 0% and 100%. From Table 2, we observe the following facts.
The D̄ values for the three methods are respectively given by 0.0267, 0.0226, and 0.0199, which means that the MLE method is the best method, the moment method is the second best method, and the oracle method is the worst method. A possible explanation for this phenomenon is that in (15), the empirical cdf F_n(x) is based on the data, and the population cdfs F_0(x) for the MLE method and the moment method are also based on the data, while the population cdf F_0(x) for the oracle method is not based on the data. The p̄-value values for the three methods are respectively given by 0.5207, 0.6772, and 0.7832, which also means that the MLE method ranks first, the moment method ranks second, and the oracle method ranks third. The possible explanation for this phenomenon is described in the previous paragraph.
The (min D)% values for the three methods are respectively given by 0.14, 0.26, and 0.60. The (min D)% value for the MLE method accounts for over half of the M simulations. The order of preference for the three methods is the MLE method, the moment method, and the oracle method. The (max p-value)% values for the three methods are respectively given by 0.14, 0.26, and 0.60. A small D value corresponds to a large p-value; hence, the smallest D value corresponds to the largest p-value. Therefore, the (min D)% values and the (max p-value)% values for the three methods are the same. The order of preference for the three methods is again the MLE method, the moment method, and the oracle method.
The (Accept H_0)% values for the three methods are respectively given by 98%, 100%, and 100%. The (Accept H_0)% values for the three methods are nearly 100%, which means that the three methods all have good performances in terms of goodness-of-fit. In summary, for the five indices (D̄, p̄-value, (min D)%, (max p-value)%, and (Accept H_0)%), the order of preference for the three methods is the MLE method, the moment method, and the oracle method. Comparing the moment method and the MLE method, we find that the MLE method has a better performance than the moment method in terms of all five indices.
The boxplots of the D values for the three methods (left) and the boxplots of the p-values for the three methods (right) are displayed in Figure 6. From the figure, we observe the following facts.

The D values of the oracle method are larger than those of the other two methods. Since for the D value the smaller the better, the order of preference for the three methods is the MLE method, the moment method, and the oracle method. The p-values of the oracle method are smaller than those of the other two methods. Since for the p-value the larger the better, the order of preference for the three methods is again the MLE method, the moment method, and the oracle method. Small D values correspond to large p-values, and large D values correspond to small p-values.

The MLE method has a better performance than the moment method in terms of the D values and the p-values.

Numerical comparisons of the Bayes posterior estimators and the PESLs of three methods
In this subsection, similar to Sun, Zhang, and Sun (2021), we will numerically compare the Bayes posterior estimators and the PESLs of the three methods (the oracle method, the moment method, and the MLE method). Note that the full data x^{(n+1)} = (x_1, x_2, ..., x_{n+1}) are used in this subsection. Moreover, the theoretical comparisons of the Bayes posterior estimators and the PESLs of the three methods can be found in Subsec. 2.3. Note that the data are simulated according to the hierarchical normal and inverse gamma model (1) with the hyperparameters being specified by μ = 0, a = 3, and b = 1. Moreover, the oracle method knows the hyperparameters μ = 0, a = 3, and b = 1 in simulations.
Comparisons of the Bayes posterior estimators and the PESLs of the three methods for sample size n = 10,000 and number of simulations M = 100 are displayed in Figure 7. From the figure, we observe the following facts.
For the estimators of μ, the MLE method is slightly closer to the oracle method than the moment method. For the estimators of a and b, the MLE method is much closer to the oracle method than the moment method.
For the Bayes posterior estimators δ^{π,θ_{n+1}}_{s,i}(x^{(n+1)}) and δ^{π,θ_{n+1}}_{2,i}(x^{(n+1)}) (i = 0, 1, 2), the MLE method is slightly closer to the oracle method than the moment method. For the PESLs PESL^{π,θ_{n+1}}_{s,i}(x^{(n+1)}) and PESL^{π,θ_{n+1}}_{2,i}(x^{(n+1)}) (i = 0, 1, 2), the MLE method is much closer to the oracle method than the moment method. All the plots indicate that the MLE method is better than the moment method, as the estimators of the hyperparameters, the Bayes posterior estimators, and the PESLs of the MLE method are closer to those of the oracle method than those of the moment method.
The boxplots of the absolute errors from the oracle method by the moment method and the MLE method for sample size n = 10,000 and number of simulations M = 100 are displayed in Figure 8. All the plots indicate that the MLE method is better than the moment method, as the absolute errors from the oracle method of the estimators of the hyperparameters, the Bayes posterior estimators, and the PESLs by the MLE method are much smaller than those by the moment method.
The averages and proportions of the absolute errors from the oracle method by the moment method and the MLE method for the estimators of the hyperparameters, the Bayes posterior estimators, and the PESLs are summarized in Table 3. The averages of the absolute errors are the sample averages of the absolute error vectors from the oracle method by the two methods (the smaller the better). The proportions of the absolute errors are the sample proportions of the absolute errors by a method being equal to the minimum of the two absolute errors (the larger the better). From the table, we observe that the averages of the absolute errors from the oracle method by the MLE method are much smaller than those by the moment method. Moreover, the proportions of the absolute errors from the oracle method by the MLE method are much larger than those by the moment method. In summary, the table illustrates that the MLE method is better than the moment method in terms of the averages and proportions of the absolute errors from the oracle method.

Let ξ = μ, a, or b, and let ξ̂_i be an estimator of ξ, where i = 1 is for the moment estimator and i = 2 is for the MLE. The Mean Square Error (MSE) of the estimator ξ̂_i is defined by (see Casella and Berger 2002)

MSE(ξ̂_i) = E(ξ̂_i − ξ)².

Similarly, the Mean Absolute Error (MAE) of the estimator ξ̂_i is defined by (see Casella and Berger 2002)

MAE(ξ̂_i) = E|ξ̂_i − ξ|.

Moreover, the Mean Entropy Error (MEE) of the estimator ξ̂_i is defined by

MEE(ξ̂_i) = E[ξ̂_i/ξ − log(ξ̂_i/ξ) − 1],

where the entropy loss function is also known as Stein's loss function, given by (3). Note that the MSE, MAE, and MEE are criteria for comparing two different estimators.
The MSE, MAE, and MEE are risk functions (expected loss functions) of the estimators of the hyperparameters, and they measure the performances of the estimators: the smaller the MSE, MAE, and MEE values, the better the estimator. The MSE, MAE, and MEE of the estimators of the hyperparameters by the moment method and the MLE method are summarized in Table 4. From the table, we see that the MLE method is slightly better than the moment method when estimating the hyperparameter μ, as the MSE and MAE of the MLE method are slightly smaller than those of the moment method. Moreover, the MLE method is far better than the moment method when estimating the hyperparameters a and b, as the MSE, MAE, and MEE of the MLE method are much smaller than those of the moment method. Note that in the table there are two NaNs for the MEE when estimating the hyperparameter μ, because the entropy (or Stein's) loss function applies only to a positive parameter, whereas here μ = 0, so the entropy loss function does not apply.
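In simulations the three risks are approximated by sample averages over the M estimates. A minimal Python sketch of ours (the two estimates and the target ξ = 1 are hypothetical numbers for illustration), including the reason the table shows NaNs for μ = 0:

```python
import math

def mse(estimates, xi):
    return sum((e - xi) ** 2 for e in estimates) / len(estimates)

def mae(estimates, xi):
    return sum(abs(e - xi) for e in estimates) / len(estimates)

def mee(estimates, xi):
    # entropy (Stein's) risk; needs xi > 0 and positive estimates,
    # which is why the table reports NaN for mu = 0
    return sum(e / xi - math.log(e / xi) - 1.0 for e in estimates) / len(estimates)

est = [1.2, 0.8]  # two hypothetical estimates of xi = 1
print(mse(est, 1.0), mae(est, 1.0), mee(est, 1.0))
```

All three criteria are zero exactly when every estimate equals ξ, and MEE, like Stein's loss, penalizes relative (multiplicative) rather than absolute error.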

Marginal densities for various hyperparameters
In this subsection, we will plot the marginal densities of the hierarchical normal and inverse gamma model (1) for various hyperparameters μ, a, and b. Note that the marginal density of X is given by (11) with ν = 2a and σ² = b/a, and is thus specified by the three hyperparameters μ, a, and b. We will explore how the marginal densities change around the marginal density with hyperparameters specified by μ = 0, a = 3, and b = 1. Other numerical values of the hyperparameters can also be specified.
Figure 9 plots the marginal densities for varied μ = −2, −1, 0, 1, 2, holding α = 3 and β = 1 fixed. From the figure we see that as μ increases, the marginal density shifts to the right while the shape of the curve stays unchanged. That is, μ is a location parameter. Moreover, all the marginal densities are symmetric about the mean μ.
Figure 10 plots the marginal densities for varied α = 0.75, 1.5, 3, 6, 12, holding μ = 0 and β = 1 fixed. From the figure we see that as α increases, the peak value of the curve also increases. In other words, the variance of the marginal distribution decreases, as Var(X) given by (16) is a decreasing function of α. Moreover, all the marginal densities are symmetric about the mean μ = 0.

Figure 11 plots the marginal densities for varied β = 0.25, 0.5, 1, 2, 4, holding μ = 0 and α = 3 fixed. From the figure we see that as β increases, the peak value of the curve decreases. In other words, the variance of the marginal distribution increases, as Var(X) given by (16) is an increasing function of β. Moreover, all the marginal densities are symmetric about the mean μ = 0.
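These curves can be reproduced with a few lines of code, assuming the marginal (11) is the t_{2α}(μ, β/α) density used in the real data section, i.e., a Student-t with 2α degrees of freedom, location μ, and squared scale β/α:

```python
import numpy as np
from scipy.stats import t

def marginal_pdf(x, mu, alpha, beta):
    # Marginal density of X in model (1): a t distribution with
    # 2*alpha degrees of freedom, location mu, and squared scale beta/alpha
    return t.pdf(x, df=2 * alpha, loc=mu, scale=np.sqrt(beta / alpha))

x = np.linspace(-4.0, 4.0, 401)
base = marginal_pdf(x, mu=0.0, alpha=3.0, beta=1.0)    # reference curve
sharp = marginal_pdf(x, mu=0.0, alpha=12.0, beta=1.0)  # larger alpha: higher peak
flat = marginal_pdf(x, mu=0.0, alpha=3.0, beta=4.0)    # larger beta: lower peak
print(base.max(), sharp.max(), flat.max())
```

Increasing α shrinks the scale β/α and hence raises the peak, while increasing β inflates the scale and lowers the peak, matching Figures 10 and 11; every curve is symmetric about μ.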

A real data example
In this section, we utilize the %bodyfat data of 250 men of various ages to illustrate our methods (see DASL (Data and Story Library) 2019).The %bodyfat is the percent of a man's body that is fat, which is a matter of concern for health and fitness.
The histogram of the sample x^(n+1) = (x_1, x_2, …, x_{n+1}) (the %bodyfat data) along with its density estimation curve is depicted in Figure 12. From the figure we see that the density estimation curve is roughly bell shaped and symmetric around its mean, and thus the hierarchical normal and inverse gamma model (1) should be appropriate (see the subsection "Marginal densities for various hyperparameters" for details).
The estimators of the hyperparameters l, a, and b, the goodness-of-fit of the model, the empirical Bayes estimators of the variance parameter of the normal distribution with a conjugate inverse gamma prior and the PESLs, and the mean and variance of the %bodyfat data by the moment method and the MLE method are summarized in Table 5.From the table, we observe the following facts.
The moment estimator of the hyperparameter μ is equal to the sample mean x̄ = 0.1897992 of the first n observations. It is interesting to note that the MLE of the hyperparameter μ is 0.1897987, which is very close to the moment estimator. The moment estimators and the MLEs of the hyperparameters α and β, however, are quite different. This does not mean that the hierarchical normal and inverse gamma model (1) fails to fit the real data, nor that the moment estimators and the MLEs are not consistent estimators of α and β. The reason for the large differences between the two estimators is that the sample size n = 249 is small. Of course, the MLEs of the hyperparameters α and β are more reliable, as assured by the figures and tables in the simulations section.
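Theorem 2 gives the paper's exact moment estimators; since those formulas did not survive extraction here, the following is a standard moment-matching sketch for the marginal t_{2α}(μ, β/α), using E(X) = μ, Var(X) = β/(α − 1) (formula (16)), and excess kurtosis 3/(α − 2). This construction is our assumption, not necessarily the authors' exact estimators:

```python
import numpy as np

def moment_estimators(x):
    # Match sample mean, variance, and excess kurtosis to those of the
    # marginal t_{2a}(mu, b/a): E X = mu, Var X = b/(a-1),
    # excess kurtosis = 3/(a-2) (valid for a > 2)
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    s2 = x.var()
    kurt = np.mean((x - mu_hat) ** 4) / s2 ** 2 - 3.0  # sample excess kurtosis
    alpha_hat = 2.0 + 3.0 / kurt
    beta_hat = (alpha_hat - 1.0) * s2
    return mu_hat, alpha_hat, beta_hat

rng = np.random.default_rng(1)
# Simulate from model (1): inverse gamma variances, then normal draws
theta = 1.0 / rng.gamma(6.0, 1.0 / 5.0, size=200_000)  # alpha = 6, beta = 5
x = rng.normal(0.0, np.sqrt(theta))                    # mu = 0
print(moment_estimators(x))
```

Because the sample kurtosis of a heavy-tailed sample is noisy, these estimators can be unreliable for small n, consistent with the large moment/MLE discrepancy at n = 249 noted above.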
We use the KS test as a measure of the goodness-of-fit. The p-value of the moment method is 0.2968 > 0.05, and thus the t_{2α}(μ, β/α) distribution with μ, α, and β estimated by their moment estimators fits the sample x^(n) = (x_1, x_2, …, x_n) well. Moreover, the p-value of the MLE method is 0.6983 > 0.05, and thus the t_{2α}(μ, β/α) distribution with μ, α, and β estimated by their MLEs fits the sample x^(n) well. When comparing the two methods, we observe that the D value of the MLEs is smaller and the p-value of the MLEs is larger, which means that the t_{2α}(μ, β/α) distribution with the hyperparameters estimated by the MLEs fits the sample x^(n) better than that estimated by the moment estimators.
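The KS check can be sketched on simulated data as follows, assuming the marginal is t_{2α}(μ, β/α); variable names are ours:

```python
import numpy as np
from scipy.stats import kstest, t

rng = np.random.default_rng(2)
mu, alpha, beta = 0.0, 3.0, 1.0
# Simulate from model (1): draw inverse gamma variances, then normals
theta = 1.0 / rng.gamma(alpha, 1.0 / beta, size=2000)
x = rng.normal(mu, np.sqrt(theta))

# One-sample KS test against the fitted t_{2a}(mu, b/a) marginal; a small
# D value and a p-value above 0.05 indicate an adequate fit
res = kstest(x, t(df=2 * alpha, loc=mu, scale=np.sqrt(beta / alpha)).cdf)
print(res.statistic, res.pvalue)
```

In practice one plugs the moment estimates or the MLEs of (μ, α, β) into the reference distribution, as done for Table 5.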
When the hyperparameters are estimated by the moment method, the Bayes estimators and the PESLs satisfy the two inequalities (5) and (6). When the hyperparameters are estimated by the MLE method, we observe a similar phenomenon for the Bayes estimators and the PESLs. Consequently, the two inequalities (5) and (6) are exemplified. The mean of X (the %bodyfat data) is estimated by E(X) ≈ μ̂. By (16), the variance of X is estimated by Var(X) ≈ β̂/(α̂ − 1). It is interesting to note that the estimated mean and variance of X by the two methods are very similar, although the estimators of the hyperparameters are quite different. Moreover, it is worth mentioning that E(X) ≈ 0.1897992 = 18.97992% and Var(X) ≈ 0.006812273 = 68.12273%² for the moment method. The mean and variance of X are similar for the MLE method. Therefore, the variance of X is quite small, not large!
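With a conjugate inverse gamma prior, the posterior of the variance parameter is again inverse gamma, say IG(α*, β*) (our notation, matching the (α*, β*) of Figure 5). Under this standard conjugate form, the squared error estimator is the posterior mean and Stein's estimator is the reciprocal of the posterior mean of 1/θ, so Stein's estimator is always the smaller of the two whenever α* > 1, which is the ordering the surfaces in Figure 5 display. A quick numerical check:

```python
def delta_squared_error(a_star, b_star):
    # Posterior mean of an IG(a*, b*) posterior: the Bayes estimator
    # under squared error loss (requires a* > 1)
    return b_star / (a_star - 1.0)

def delta_stein(a_star, b_star):
    # 1 / E[1/theta | x] for an IG(a*, b*) posterior: the Bayes
    # estimator under Stein's loss
    return b_star / a_star

for a_star, b_star in [(2.0, 1.0), (5.0, 3.0), (10.0, 2.5)]:
    ds = delta_stein(a_star, b_star)
    d2 = delta_squared_error(a_star, b_star)
    print(a_star, b_star, ds, d2, ds < d2)
```

Since β*/α* < β*/(α* − 1) for every α* > 1, the gap between the two estimators shrinks as α* grows.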

Conclusions and discussions
For the hierarchical normal and inverse gamma model (1), we have calculated the Bayes posterior estimators of the variance parameter under Stein's loss function and under the squared error loss function, together with the corresponding PESLs, which satisfy the two inequalities (5) and (6). After proving some lemmas, the estimators of the hyperparameters of the model (1) by the moment method are summarized in Theorem 2. Furthermore, the estimators of the hyperparameters of the model (1) by the MLE method are summarized in Theorem 3. Finally, the empirical Bayes estimators of the variance parameter of the model (1) under Stein's loss function by the moment method and the MLE method are summarized in Theorem 4.
In the simulations section, we have illustrated five aspects. First, we have numerically exemplified the two inequalities (5) and (6) of the Bayes posterior estimators and the PESLs. Second, we have illustrated that the moment estimators (μ̂₁(n), α̂₁(n), and β̂₁(n)) and the MLEs (μ̂₂(n), α̂₂(n), and β̂₂(n)) are consistent estimators of the hyperparameters (μ, α, and β). Third, we have calculated the goodness-of-fit of the model to the simulated data. Fourth, we have numerically compared the Bayes posterior estimators and the PESLs of the three methods. Finally, we have plotted the marginal densities of the model for various hyperparameters.
Note that in Theorem 3, we only stated that the estimators of the hyperparameters of the model (1) by the MLE method, μ̂₂(n), α̂₂(n), and β̂₂(n), are the solutions to Equations (12)-(14). We can exploit Newton's method to solve Equations (12)-(14) and numerically obtain the MLEs of μ, α, and β. However, we cannot prove the existence and uniqueness of the solutions to our system. Interested readers who have such knowledge and skills are encouraged to solve this issue.

We utilize the %bodyfat data of 250 men of various ages to illustrate our methods. The estimators of the hyperparameters μ, α, and β, the goodness-of-fit of the model, the empirical Bayes estimators of the variance parameter of the normal distribution with a conjugate inverse gamma prior and the PESLs, and the mean and variance of the %bodyfat data by the moment method and the MLE method are summarized in Table 5. The t_{2α}(μ, β/α) distribution with the hyperparameters μ, α, and β estimated by the MLEs has a better goodness-of-fit to the sample x^(n) than that estimated by the moment estimators. Moreover, the two inequalities (5) and (6) are exemplified for the sample x^(n+1). All the tables indicate that the MLEs are better than the moment estimators when estimating the hyperparameters μ, α, and β. However, nothing comes for free. Compared to the moment estimators, the MLEs have a heavier computational burden, suffer from numerical instability, require β̂ > 0 throughout the iteration process, and have no analytical solutions. Note also that the MLEs of the hyperparameters are very sensitive to the initial estimators, and the moment estimators usually prove to be good initial estimators. Moreover, if in some cases the MLEs do not exist, then we have a good reason to take the moment estimators.
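Since Equations (12)-(14) lack closed-form solutions, the numerical search can be sketched as follows. This is an illustrative routine, not the authors' Newton iteration: it maximizes the marginal t_{2α}(μ, β/α) log-likelihood directly with scipy, starting from crude initial values in the spirit of the moment estimators, and optimizes log(α) and log(β) so that the positivity constraint β̂ > 0 holds automatically throughout the iterations:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t

def neg_log_lik(params, x):
    # Negative log-likelihood of the marginal t_{2a}(mu, b/a); alpha and
    # beta are parametrized on the log scale to stay positive
    mu, log_a, log_b = params
    a, b = np.exp(log_a), np.exp(log_b)
    return -np.sum(t.logpdf(x, df=2 * a, loc=mu, scale=np.sqrt(b / a)))

def mle_hyperparameters(x, mu0, a0, b0):
    # Start from initial estimates (e.g. the moment estimators) and
    # maximize the marginal likelihood numerically
    res = minimize(neg_log_lik, x0=[mu0, np.log(a0), np.log(b0)],
                   args=(x,), method="Nelder-Mead")
    mu, log_a, log_b = res.x
    return mu, np.exp(log_a), np.exp(log_b)

rng = np.random.default_rng(3)
theta = 1.0 / rng.gamma(3.0, 1.0, size=5000)  # true alpha = 3, beta = 1
x = rng.normal(0.0, np.sqrt(theta))           # true mu = 0
print(mle_hyperparameters(x, mu0=x.mean(), a0=2.5, b0=x.var()))
```

As with any local search, the result depends on the initial values, which mirrors the sensitivity of the MLEs to the initial estimators noted above.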
From Lemma 3, we see that data from the nonstandardized Student-t distribution should be fitted well by the hierarchical normal and inverse gamma model (1).
Comparing the two Bayes posterior estimators δ_s(x^(n+1)) and δ_2(x^(n+1)) of the variance parameter θ_{n+1}, we prefer Stein's estimator δ_s(x^(n+1)), not because it is larger or smaller than the squared error estimator δ_2(x^(n+1)), but because Stein's loss function is more appropriate than the squared error loss function for the positively restricted parameter θ_{n+1}. Note that Stein's loss function penalizes gross overestimation and gross underestimation of θ_{n+1} equally, but the squared error loss function does not.
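The contrast between the two loss functions is easy to see numerically. The following sketch uses the standard form of Stein's loss, L(θ, a) = a/θ − log(a/θ) − 1, which is zero only at a = θ and tends to infinity as the action a tends to 0 or to infinity:

```python
import numpy as np

def steins_loss(a, theta):
    # Stein's (entropy) loss: zero iff a == theta, and it blows up both
    # as a -> 0 (gross underestimation) and as a -> infinity
    r = a / theta
    return r - np.log(r) - 1.0

def squared_error_loss(a, theta):
    return (a - theta) ** 2

theta = 1.0
for a in [1e-3, 0.5, 1.0, 2.0, 1e3]:
    print(a, steins_loss(a, theta), squared_error_loss(a, theta))
# Under squared error, estimating near 0 costs at most theta**2, while
# Stein's loss makes gross underestimation as costly as gross overestimation.
```

This is why, for a variance parameter bounded below by 0, the squared error loss effectively tolerates gross underestimation while Stein's loss does not.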
For the hierarchical normal and inverse gamma model (1), we can calculate the estimators of the hyperparameters, since the marginal distribution of the model (1) is proper.In empirical Bayes analysis, we use the marginal distribution to estimate the hyperparameters from the observations.There are two frequently used methods to estimate the hyperparameters by utilizing the marginal distribution, i.e., the moment method and the MLE method.In this manuscript, we use the two methods to estimate the hyperparameters of the hierarchical normal and inverse gamma model (1).
Note that the writing of this manuscript is similar to that of the published papers Zhang et al. (2019) and Sun, Zhang, and Sun (2021). However, we would like to point out that there are significant differences between the current manuscript and the published papers. In the current manuscript, we deal with the hierarchical normal and inverse gamma model (1), while in the published papers, Zhang et al. (2019) deal with the hierarchical Poisson and gamma model, and Sun, Zhang, and Sun (2021) deal with the hierarchical inverse gamma and inverse gamma model.

Now we present some future work. One may consider extending the hierarchical normal and inverse gamma model (1) to different types of nonconjugate priors for the

Figure 3. The histogram of θ_{n+1}|x^(n+1) and the density estimation curve of p(θ_{n+1}|x^(n+1)).

Figure 4. Left: the estimators as functions of μ, α, β, and x_{n+1}. Right: the PESLs as functions of μ, α, β, and x_{n+1}.

Figure 5. The domain for (α*, β*) is D = (1, 10] × (0, 10] for all the plots. In the axes of all the plots, a stands for α* and b stands for β*. In the upper two plots, the red surface is for δ_2(x^(n+1)) and the blue surface is for δ_s(x^(n+1)). (Upper left) The estimators as functions of α* and β*.

Figure 6. The boxplots of the D values for the three methods (left) and the boxplots of the p-values for the three methods (right).

Figure 7. Comparisons of μ, α, β, δ_s, δ_2, PESL_s, and PESL_2 of the three methods for sample size n = 10,000 and number of simulations M = 100.

Figure 8. The boxplots of the absolute errors from the oracle method by the moment method and the MLE method for sample size n = 10,000 and number of simulations M = 100.

Figure 12. The histogram of the sample x.

Table 2. The results of the KS goodness-of-fit test of the model (1) to the simulated data.

Table 3. The averages and proportions of the absolute errors from the oracle method by the moment method and the MLE method for the estimators of the hyperparameters, the Bayes posterior estimators, and the PESLs.

Table 4. The MSE, MAE, and MEE of the estimators of the hyperparameters by the moment method and the MLE method.