Small Area Estimation With Uncertain Random Effects

Random effects models play an important role in model-based small area estimation. Random effects account for any lack of fit of a regression model for the population means of small areas on a set of explanatory variables. In a recent article, Datta, Hall, and Mandal showed that if the random effects can be dispensed with via a suitable test, then the model parameters and the small area means may be estimated with substantially higher accuracy. The work of Datta, Hall, and Mandal is most useful when the number of small areas, m, is moderately large. For large m, the null hypothesis of no random effects will likely be rejected. Rejection of the null hypothesis is usually caused by a few large residuals signifying a departure of the direct estimator from the synthetic regression estimator. As a flexible alternative to the Fay–Herriot random effects model and the approach in Datta, Hall, and Mandal, in this article we consider a mixture model for random effects. It is reasonably expected that small areas with population means explained adequately by covariates have little model error, and the other areas with means not adequately explained by covariates will require a random component added to the regression model. This model is a useful alternative to the usual random effects model and the data determine the extent of lack of fit of the regression model for a particular small area, and include a random effect if needed. Unlike the Datta, Hall, and Mandal approach which recommends excluding random effects from all small areas if a test of null hypothesis of no random effects is not rejected, the present model is more flexible. We used this mixture model to estimate poverty ratios for 5–17-year-old-related children for the 50 U.S. states and Washington, DC. This application is motivated by the SAIPE project of the U.S. Census Bureau. We empirically evaluated the accuracy of the direct estimates and the estimates obtained from our mixture model and the Fay–Herriot random effects model. These empirical evaluations and a simulation study, in conjunction with a lower posterior variance of the new estimates, show that the new estimates are more accurate than both the frequentist and the Bayes estimates resulting from the standard Fay–Herriot model. Supplementary materials for this article are available online.


INTRODUCTION
Sample surveys are indispensable to estimate various characteristics of a population of interest. Reliability of estimates from sample surveys depends on the size of the sample. While government agencies need to estimate, for example, income and unemployment rates for the entire nation, they are also required to estimate these same characteristics for various geographic and demographic subdomains. Even though a survey may select a large sample to produce an estimate with desired accuracy for the whole population, the subsamples it allocates to various subdomains, also known as small areas, may be rather small to produce reliable direct estimates of small area characteristics based on individual subdomain samples.
Small area estimation methodology provides essential tools to statistical agencies for production of reliable small area estimates. Since a design-based direct estimate of a small area characteristic based on the area sample is usually less reliable, model-based approach to small area estimation has become very popular to produce indirect estimates. Importance and usefulness of model-based small area estimation approach may be assessed from an explosive growth of research publications. For early frequentist applications of small area estimation, we refer to Fay and Herriot (1979), Battese, Harter and Fuller (1988), and Prasad and Rao (1990). For a recent comprehensive review of this literature, one may refer to Rao (2003), Jiang and Lahiri (2006), and Pfeffermann (2013).
The landmark article by Fay and Herriot (1979) has popularized the model-based small area estimation in government statistics. Model-based small area estimation methods rely on appropriate regression models connecting direct estimates to suitable auxiliary variables that are available from other surveys and administrative records. These authors introduced an arealevel model to develop model-based estimates of per capita income for small places in the United States with population less than 1000. Denoting the design-based direct survey estimator of the ith small area characteristic by Y i and the corresponding area-level auxiliary variable by x i , a q × 1 vector, Fay and Herriot (1979) introduced the model where θ i is a summary measure of the characteristic to be estimated for the ith small area, e i is the sampling error with the estimator Y i , and the random effects v i denotes the model error measuring the departure of θ i from its linear regression on x i , x T i β. We assume that e 1 , . . . , e m are independent and normally distributed with e i ∼ N (0, D i ), and are independent of v 1 , . . . , v m , which are i.i.d. N (0, σ 2 v ). The sampling variances D i s are treated as known, but the model parameters β and σ 2 v are unknown. Random effects v i s are also known as small area effects. The model (1) was also introduced independently by Pfeffermann and Nathan (1981).
In model-based small area estimation, the goal is to develop an optimal predictor of θ i and a measure of error associated with this predictor. In small area estimation, both the frequentist and Bayesian approaches have been extensively used. In the frequentist approach based on the model given by (1), while Fay and Herriot (1979) developed an empirical Bayes (EB) predictor for θ i , Prasad and Rao (1990) and Lahiri and Rao (1995) considered an empirical best linear unbiased predictor (EBLUP) of θ i . For the normal mixed linear model in (1), EB predictor and EBLUP predictor of θ i are identical. Prasad and Rao (1990), Lahiri and Rao (1995), Datta and Lahiri (2000), and Datta, Rao, and Smith (2005) accurately approximated the mean squared error (MSE) of EBLUP of θ i by ignoring all terms of order o(m −1 ). These authors also provided secondorder unbiased estimators of the MSE with an approximate bias to the order o(m −1 ).
In the Bayesian approach to predict θ i based on model in (1), a hierarchical Bayes (HB) model, given below, is completed by assigning a prior distribution on the model parameters ψ = (β T , σ 2 v ) T .
The aforementioned HB model has been considered, for example, by Berger (1985), Ghosh (1992), Datta, Rao, and Smith (2005), among others. The success of developing reliable model-based small area estimates depends very much on the availability of good covariates that relate strongly with the small area means θ i . In the presence of good covariates, it is expected that random small area effects v i in (1) will have small variation leading to significant shrinkage of the direct estimator Y i to the synthetic regression estimator.
In a recent article, Datta, Hall, and Mandal (2011) demonstrated that in the presence of good covariates x i , the variability of the small area means θ i may be accounted well by x i , and the inclusion of a random effects term v i in the model (1) may be unnecessary. They argued that while the random effects may improve the adaptivity and flexibility of the Fay-Herriot model, it also increases the uncertainty of both point and interval estimators of small area means. In an effort to increase the accuracy of the small area estimators, these authors tested a hypothesis of no random effects in the small area model. If the null hypothesis is not rejected, these authors proposed more accurate synthetic estimators to estimate the small area means. They considered a discrepancy statistic measuring the lack of fit of the proposed multiple linear regression model of θ i on the covariates x i as a test statistic. The discrepancy function is constructed by combining certain deviance measures for all the small areas. The deviance measures the extent of lack of fit of the proposed model for each small area.
In contrast with the Fay-Herriot model which includes for each small area a random effects term to the regression function x T i β of θ i , the method of Datta, Hall, and Mandal (2011) advocates excluding the small area effects for all the small areas when a nonsignificant discrepancy statistic is realized. On the other hand, when a significant discrepancy statistic is obtained, it is often the case that the bigger value is due to large deviance from a small number of domains for which the regression model fails to be adequate. To develop a model reflecting this scenario, and as a middle ground between the Datta, Hall, and Mandal (2011) method and the Fay-Herriot model, we propose a "spike and slab" distribution for the random small area effects. In this formulation, with certain positive probability (1 − p), the random effects is assumed to be absent (i.e., a degenerate distribution at zero) for any small area, and with probability p the random effects has a nondegenerate distribution. In Section 2, we introduce this new HB model and compare it with the Fay-Herriot model. The new model provides a more flexible data-dependent shrinkage of the direct estimator to the synthetic estimator. To our knowledge, this type of model has not yet been explored in small area estimation. In Section 3, we explore the propriety of posterior distribution of our HB model that results from an improper prior distribution on the model parameters. As HB predictors of small area means for our proposed model do not admit closed-form expressions, we obtain them by Gibbs sampling. We provide the set of required full conditional distributions.
In Section 4, we consider an application of our model to the estimation of poverty ratios of school-age children for the U.S. states. We create a map of the poverty ratios based on the direct estimates and the HB estimates from the proposed model. Our proposed model uses a mixture of a degenerate distribution and a normal distribution for the random effects. Lahiri and Rao (1995) relaxed the normality of the random effects in the Fay-Herriot model by assuming finite higher order moments to estimate the MSE of the EBLUPs of small area means. For the application in Section 4, we present in Section 5 an external evaluation of the proposed estimator based on 2000 census data. We also compare the posterior variances of our estimators with the second-order unbiased estimators of the MSE of EBLUP, proposed by Lahiri and Rao (1995) for the nonnormal random effects Fay-Herriot model. We included a simulation study as supplementary materials to compare our estimators with the other standard estimators in terms of their measures of uncertainty and empirical measurers of accuracy. The simulation results show superiority of the proposed model. The article concludes with a brief summary in Section 6.

A MODEL WITH UNCERTAIN RANDOM EFFECTS
As in the Fay-Herriot model, in the proposed model we assume that the direct estimator Y i is an unbiased estimator of the small area mean θ i admitting a normal distribution with variance D i , for i = 1, . . . , m. The HB representation of our model is given as follows: Model H M : II. Conditional on δ 1 , . . . , δ m , v 1 , . . . , v m , p, β M and σ 2 vM , θ i is given by where δ 1 , . . . , δ m are independent and identically distributed with Conditional on δ 1 , . . . , δ m and σ 2 vM , random effects v 1 , . . . , v m are independently distributed with v i = 0 when δ i = 0, and conditional on δ i = 1, v i ∼ N (0, σ 2 vM ) for i = 1, . . . , m, independently. III. A priori the hyperparameters β M , σ 2 vM and p are independently distributed with prior density π (β M , σ 2 vM , p) = π σ 2 vM (σ 2 vM )π p (p), where we assign a standard improper uniform prior distribution on the regression parameter β M , and proper densities π σ 2 vM (σ 2 vM ) and π p (p) on σ 2 vM and p. While a proper prior distribution on a bounded parameter p is usually suggested, a proper prior distribution on the unbounded model variance parameter σ 2 vM is mandatory, for reasons explained next.
Based on stages (I) and (II) of the hierarchical model H M , after integrating out the v 1 , . . . , v m , and δ 1 , . . . , δ m from the joint density of (Y i , v i , δ i ), i = 1 . . . , m, we get, conditional on β M , σ 2 vM and p, that Y 1 , . . . , Y m are independently distributed, where Y i has a two-component mixture of normal distributions given by For the model H M , we use the subscript M to emphasize the mixture part of the model. Also, we use the subscript M to the model parameters β M and σ 2 vM to distinguish them from the model parameters in H 1 . From (3) we note that the first component of the mixture in the density of Y i does not include σ 2 vM . This necessitates a proper prior distribution for σ 2 vM so that the HB model admits a proper posterior distribution. For a discussion on this point, we refer the reader to Scott and Berger (2006).
The prior distribution of v i in (2) of Model H M assigns a positive mass (1 − p) at 0 and spreads the remaining mass p according to a normal distribution with mean zero and variance σ 2 vM . Ishwaran and Rao (2005) used this type of prior distribution, which is known in the literature as a "spike and slab" prior, in gene selection based on microarray data. Scott and Berger (2006) also recommended this prior in multiple hypothesis testing to determine the activated (or expressed) genes in microarray experiments. In their setup in the microarray context, since only a small fraction (p) of genes are expected to be activated, they assigned a prior distribution on p which is concentrated near zero. We are adopting this model in the small area estimation context to modify the prior distribution for the random effects v i in the Fay-Herriot model in (1). While for some small areas there may be a need to add a nondegenerate random effects to account for a lack of fit of the regression of θ i on x i , it is unlikely that all small areas will need nondegenerate random effects. In the spirit of Scott and Berger (2006), we generalize the Fay-Herriot model to specify the more flexible hierarchical model H M . In fact, in the special case of p = 1, our model H M encompasses the Fay-Herriot model.
We estimate the small area mean θ i based on the HB model H M by posterior mean of θ i ,θ iM (say), and measure the accuracy of the estimator by the posterior variance of θ i , V iM (say). While the numerical computation of these estimators will be performed by Gibbs sampling, to compare the estimatorsθ iM with the estimator of θ i under the Fay-Herriot model, we first consider θ iM (β M , σ 2 vM , p, y i ), the conditional mean of θ i , conditional on the model parameters β M , σ 2 vM , p and data y 1 , . . . , y m . By direct calculation, where the direct estimate y i of θ i is shrunk to the regression (or synthetic) "estimate" x T i β M ; the extent of the shrinkage depends on the coefficient Note thatp i is increasing in p and |y i − x T i β M |, and B iM is decreasing in σ 2 vM for fixedp i , and is decreasing inp i for fixed σ 2 vM . Thus, the shrinkage coefficient B iM will be large if either p, σ 2 vM or the "residual" |y i − x T i β M | is small, all of which make intuitive sense. While the shrinkage of the direct estimator for the proposed model is influenced by "residual" |y i − x T i β M | (i.e., the fit of the regression of the direct estimator on the covariate x i ), this is not the case for the Fay-Herriot model, where the shrinkage coefficient B iF H is given by and for known model parameters β and σ 2 v , the Bayes predictor of θ i is given byθ Note that while the shrinkage coefficient in the Fay-Herriot model depends only on the ratio D i σ 2 v , but not on the residual |y i − x T i β|, for the proposed mixture model the shrinkage depends on both these quantities and in turn it achieves a greater flexibility. Datta, Hall, and Mandal (2011) used an estimated version of the squared standardized residual (y i − x T i β) 2 /D i as the deviance of the ith small area to compute a discrepancy statistic that they proposed to test for the absence of a random effect.
While a fraction of small areas may contribute large deviances to the discrepancy statistic pushing it to be significant, suggesting a positive σ 2 v and thereby reducing the shrinkage for all small areas, the shrinkage coefficient B iM for the proposed model is more robust to large deviances from other small areas. It is definitely desirable to borrow more from the regression model by shrinking more a less reliable direct estimator to a regression synthetic estimator.

COMPUTATIONAL ISSUES AND THE POSTERIOR DISTRIBUTION
The HB mixture model H M proposed to estimate small area means θ i , for i = 1, . . . , m, is rather complex. We will implement Gibbs sampling (see, Gelfand and Smith 1990) to compute the posterior mean and posterior variance of θ i .
We discussed earlier that we need a proper prior distribution on the variance parameter σ 2 vM since it does not appear in all components of the mixture distribution. We will use a proper inverse gamma distribution as the prior with a corresponding density where both the rate parameter a > 0 and the shape parameter b > 0 are assumed to be known. Here, the mixing parameter p is the apriori probability that a random effect is nondegenerate. Our experience with small area estimation applications suggests that in the presence of good covariates nearly half of the small areas will not need any random effect term in the Fay-Herriot model. Based on such consideration, we propose a beta distribution with parameters c, d, with 0 < c < d. We use suitable values for a, b, c and d in our data analysis.

ESTIMATION OF CHILD POVERTY RATIO
The United States Census Bureau as part of their Small Area Income and Poverty Estimation (SAIPE) program annually provides estimates of poverty measure for various age groups at the state and county level. One measure of particular interest is the poverty ratio for the school going related children for the 5− to 17−year-old group, used by the United States Department of Education to implement the No Child Left Behind program. To provide an application of our method, we consider estimation of state-level poverty ratio for this demographic group based on the Current Population Survey (CPS) for the year 1999. Direct estimate y i for the ith state computed from the CPS is usually subject to large sampling error due to small sample size. To develop more accurate estimate of this characteristic, the Census Bureau suggested combining the direct estimate with other administrative data through a suitable regression model. The Census Bureau explored using the Internal Revenue Service (IRS) data measuring poverty ratio based on the number of child exemptions (covariate 1) and IRS nonfiler rate (covariate 2) which are found to be correlated with the direct CPS estimate of poverty rate y. The Census Bureau also found out that the residuals from fitting a model for the 1989 census poverty data on these covariates for the year 1989 have good predictive power in predicting the prevailing child poverty ratio in 1999. In our application, we consider the year 1999 since the poverty ratio obtained from the 2000 census data which collects poverty information for the income year 1999 can be used as a benchmark to compare accuracy of our proposed estimates and other various estimates available in the literature.
The discrepancy statistic of Datta, Hall, and Mandal (2011) suggested to test the absence of a random effects term in the Fay-Herriot model in (1)

is given by
iβ WLS ) 2 /D i is the deviance of the ith small area that contributed to the discrepancy statistic. For our application, the value of T is 67.72 which was significant with a p-value of 0.025, computed based on chi-square distribution with 47 degrees of freedom. From Figure 1, we note that only 9 or 10 (about 20%) of the 51 deviances have values larger than 4, the largest one being 14.2, corresponding to Massachusetts. While this test for the entire data suggests inclusion of the random effects term in the Fay-Herriot model, the result is dramatically different if we refit the model excluding Massachusetts. The new discrepancy statistic is quite small with a nonsignificant p-value of 0.284, and most of the individual deviances are smaller than 3.
An exploratory analysis of the poverty data presented earlier suggests adaptive shrinkage of the direct estimates to the synthetic regression-based estimates. We estimate the poverty ratio θ i by applying the HB Fay-Herriot model and the proposed model. For the Fay-Herriot model, we use an improper uniform prior π (β, σ 2 v ) = 1 to conduct the Bayesian analysis. It is wellknown that the resulting posterior will be proper (see Berger 1985), and we implement Bayesian computing via Gibbs sampling. For the proposed uncertain random effects model (add a random effect if δ i = 1, or do not add any random effect if δ i = 0), we use a partially proper prior. It is explained earlier that a proper prior on σ 2 vM is necessary to ensure a proper posterior distribution. We consider an inverse gamma distribution with meanD and varianceD 2 , whereD is the average sampling variance. The motivation behind this choice is to scale the prior variance in terms of the (average) error varianceD which is commonly done for a hypervariance (see, Scott andBerger 2006, p. 2146). We use improper uniform prior for β M , and a proper Beta(1, 4) prior for p. The beta prior reflects a 20% a priori probability that a random effects to a small area is nondegenerate (it is zero if δ i = 0).
For each model, we obtain a point estimate of the small area mean θ i by the corresponding posterior mean, and measure the uncertainty of the point estimate by the posterior variance. To demonstrate the effectiveness of the proposed model, we have plotted the ratio of the posterior variance of the proposed model to the posterior variance of the Fay-Herriot model in Figure 2. We find that except for Massachusetts, the ratio is less than 80%  for all other states, and the ratio is less than 50% for nearly half of the states. Only for Massachusetts the posterior variance of θ i for the proposed model is about 60% higher than the posterior variance of θ i for the Fay-Herriot model. The average of this ratio for all the states is 0.54.
We compare the shrinkage coefficients for all the small areas for the two models. For the Fay-Herriot model, we use D i /(D i +σ 2 v,FH-HB ) as the estimated shrinkage for the ith small area, hereσ 2 v,FH-HB is the posterior mean of σ 2 v based on the Fay-Herriot HB model (we found that this value is not much different from the average of the Gibbs sample values of B i,FH (σ 2 v )). For our proposed model, we estimated the shrinkage coefficient by E(B iM |y), where the posterior expectation is computed by averaging the values of B iM (β M , σ 2 vM , p, y i ) over the Gibbs samples generated from this model. In Figure 3, we plot these shrinkage coefficients. The plot shows that for most states shrinkage coefficients of the proposed model are much larger than the shrinkage coefficients provided by the Fay-Herriot model. This suggests that a random effects term in the Fay-Herriot model may not be needed for many areas. Indeed, we found that of the fifty-one 95% prediction intervals for the random effects in the Fay-Herriot model, 22 include the zero value. For our proposed model, we computed the posterior probability of {δ i = 1} for all the states, which are displayed in Figure 4. Except for Massachusetts, which has this probability 0.83 (nearly 1), all other states have this probability less than 0.30. Compared to a prior odds of 1 to 4, the posterior odds of δ = 1 is nearly 5 to 1 for Massachusetts, and nearly 1 to 2 (or less) for the other states. This indicates that a presence of random effects is suggested by the data only for Massachusetts.
We noticed earlier that the direct CPS estimate for Massachusetts does not fit the Fay-Herriot model well, and our model shrinks the CPS estimate much less to the synthetic estimate than the Fay-Herriot model shrinks it to the synthetic one. We also note that while our model provides greater shrinkage for CA and NY, the Fay-Herriot model shrinks much less for these two states.
In Figure 5, we provide the posterior density of δ i v i for three states (Massachusetts, California, New York). Each posterior distribution has a large positive mass at v i = 0. The vertical bar at v i = 0 corresponds to the posterior probability of v i = 0 (which is the same as P (δ i = 0|data)), and the smooth density displays how the remaining probability mass is distributed over nonzero values of v i . A higher value of the vertical bar and/or a higher concentration of the density near zero will indicate substantial shrinkage of the direct estimator toward the synthetic estimator. From these plots, we see that the direct estimate for Massachusetts is subject to moderate shrinkage, since the probability mass of v i at zero is quite small (0.17) and the remaining mass of 0.83 is spread over 0 to 15 (with practically no mass below zero), indicating that the direct estimator is substantially bigger than the synthetic estimator. This last fact is in agreement with the large positive residual for Massachusetts in Figure 1. On the other hand, direct estimates for CA and NY shrink substantially toward their synthetic values under the proposed model (the, respective, estimated shrinkage coefficients are 0.86 and 0.87). This large amount of shrinkage is also confirmed from the last two plots in Figure 5. Each of the plots has a large positive mass (bigger than 70%) at v i = 0, and the remaining mass of about 30% is spread over a relatively narrow interval around zero.
Finally, we create a poverty map to compare the CPS estimates with those obtained from the Fay-Herriot model and the proposed model. These maps, created by subtracting the "true" values (described in Section 5) of the poverty measures from these estimates, show the effectiveness of the proposed   (c) show the differences of these estimates from the "truth," for the proposed model, FH model, and CPS estimates, respectively. model. Panel (d) in Figure 6 provides the state poverty map using the true poverty ratios based on census data. Panels (a)−(c) plot the maps of the departures of the estimates from the true poverty ratios. Panel (a) is for proposed estimates, (b) is for the Fay-Herriot estimates, and (c) is for the CPS direct estimates. In panels (a)−(c), states with deviations of the estimated poverty ratios from the true ratios between −2% and 2% are shaded green, states with true poverty ratios underestimated by 2% or more are shaded with red, and overestimated by 2% or more are shaded with blue. These maps empirically demonstrate that the proposed model produces more accurate estimates than the Fay-Herriot model. While the proposed model produces estimates that are right on the target for 47 states (with deviations within 2%), it overestimates the poverty ratio by more than 2% for only three states, namely, MS, LA, and WV, and underestimates by more than 2% for MA alone. On the other hand, the estimates from the Fay-Herriot model miss the true state poverty ratios by more than 2% for eight states. The Fay-Herriot model overestimates for five states, namely, MS, LA, SD, KY, and WV, and underestimates for three states, namely, KS, DE, and MA. If we turn to the CPS estimates, they are much less accurate and they overestimate for 16 states and underestimate for 10 states. Additionally, smaller posterior variances associated with the estimates from the proposed model compared to the posterior variances associated with the estimates from the Fay-Herriot model for all states except one justify the superiority of the proposed model.

EXTERNAL EVALUATION AND COMPARISON WITH A ROBUST FREQUENTIST METHOD
We evaluate performance of the proposed model by comparing the poverty ratio estimates generated by our model with the corresponding population level poverty ratio we obtained from the 2000 census for the 50 states and Washington, DC. Since the questionnaires measuring poverty in the CPS are usually different from the questionnaires used in the census, there may be a systematic difference between the two sets of numbers. To account for any such difference, we compute ratio benchmarked census states poverty ratios by multiplying the 51 census poverty (pop) i is the estimated population of the 5 − 17 age group of related children in the ith state, c i is the poverty ratio for the group obtained from the 2000 census. This adjustment should allow an "apple to apple" comparison of the estimates based on the CPS sample to the "truth" based on the 2000 census.
To compare empirically the accuracy of an estimator t, we calculate the set of small area estimates {t i , i = 1, . . . , 51} and we compute various deviation measures of this set of numbers from the benchmarked census numbers {c i , i = 1, . . . , 51}, where c i = Rc i . Note that 51 i=1 (pop) i c i = 51 i=1 (pop) i y i , thus the benchmarked "true" census numbers c i are calibrated so that national "true" number of poor children agrees with the national CPS number of poor children, the latter is considered quite accurate since it is based on a large national sample. To evaluate effectiveness of an estimator t, we computed average absolute deviation AAD(t) = 1 (c i ) 2 of the estimated values {t i , i = 1, . . . , 51} from their corresponding "true" values {c i , i = 1, . . . , 51}. We evaluated these four summary performance measures for the direct estimate (y), the FH-HB predictor,θ FH−HB , and the proposed HB predictorθ M−HB . We display these summary measures for these three estimates in the following table. The table shows that model-based small area estimates perform much better than the direct estimates. Moreover, the lowest values of these deviation measures are realized by our proposed predictor. Figure 2, which shows that the posterior variance from the proposed model is nearly half of the posterior variance of the Fay-Herriot model, in conjunction with Table 1, establishes the superiority of our proposed model.

EBLUP Prediction for Nonnormal Small Area Means
It is well known that the BLUP of a small area mean based on a mixed linear model depends only on the first two moments of the random effects and the sampling error, and not on their distributional assumptions. In particular, the BLUP continues to remain valid under nonnormality of random effects.
Steps I and II of the model H M can be reexpressed as where e i , v * i , i = 1, . . . , m are independently distributed with a common mean zero, e i 's are normally distributed with variance D i , and v * i 's are identically distributed as the v i δ i 's. Clearly, the random effects v * i are not normally distributed but they have all moments of positive order finite. This setup is a special case of the pioneering article by Lahiri and Rao (1995). They estimated β by a two-stage weighted least squares method and the model variance σ * 2 v by the ANOVA method (see Prasad and Rao 1990). We omit the details here. Denoting the estimators byβ GLS and σ * 2 v , the EBLUP of θ i , derived by Lahiri and Rao (1995) is ). Assuming finite (8 + )-th moment of v * i , with > 0, Lahiri and Rao (1995) obtained a second-order accurate approximation to the MSE ofθ i,EBL−LR . In a serendipitous discovery, Lahiri and Table 2. Performance of the proposed methodology−average ratios of various deviation measures from the truth (Columns (1)−(4)) and average ratios of posterior variances of the Fay-Herriot model to the proposed model (Column (5))  Rao (1995) noted that the second-order unbiased estimator of the MSE, mse(θ i,EBL−LR ), derived in (5.15) of Prasad and Rao (1990) continues to be a second-order unbiased estimator of the MSE of the EBLUPθ i,EBL−LR , under the model (15). The uncertainty of the EBLUP predictors of the small area means θ i are given by the estimated MSE. For our poverty ratio data, these estimates are very similar to the posterior variances of the θ i in the Fay-Herriot HB model. In Figure 2, we also plot the ratio of the posterior variance of θ i for our proposed model to mse(θ i,EBL−LR ). This plot displays that except for the state MA, for the other 50 states, the proposed model results in predictors of small area means which are more accurate than the predictors, frequentist or Bayes, based on the traditional Fay-Herriot model.
Additionally, the estimated shrinkage coefficients, B iF H (σ * 2 v ), are very similar to the estimated shrinkage coefficients for the HB Fay-Herriot model with a uniform prior distribution on the model parameters. This results in very similar point estimates of the small area means. Various deviation measures of theθ i,EBL−LR from the "true" census poverty ratios are nearly identical to those for the HB Fay-Herriot model. Among the four sets of predictors of small area means, namely, Y i ,θ i,FHM ,θ i,FH−HB , andθ i,EBL−LR , our proposed predictorŝ θ i,M−HB have the smallest deviation measures.

CONCLUSION
We propose a flexible alternative to the Fay-Herriot model for estimation of small area means and compare performance of the resulting small area predictors with other predictors of small area means. Our model includes as special cases both the Fay-Herriot model and the model without any random effects term that may be suggested by the Datta, Hall, and Mandal (2011) method. Our model determines the "goodness of fit" of the proposed regression model for the small area mean, and adaptively includes a random effects to the model for small area mean θ i , if necessary. We use a Bayesian approach to fit our model, which was applied to estimate poverty ratios for schoolgoing children for the U.S. states. Our application shows the superiority of the proposed model when comparing resulting estimators with the HB and EBLUP predictors of small area means based on the Fay-Herriot model, in terms of posterior variance or estimated mean squared error of the estimates and various deviations of the estimates from the "true" values of the means based on census data. This superior performance continues to carry over for various simulated scenarios of our model.
The uncertain random effects model proposed in this article, which assumes an equal probability p of a nondegenerate effect in all small areas, may be generalized to unequal probabilities p i . Moreover, in the presence of suitable auxiliary variables, these probabilities may be modeled through a logistic regression. However, similar to the case of equal p for all small areas, here also we should use a proper prior on the logistic regression coefficients to ensure a proper posterior.
In this article, the proposed methodology has been illustrated with the example of SAIPE data. However, there are other applications that can also benefit by using our new method. For example, a four-person state-level median income data for the U.S. States reported by Fay (1987) or a hospital data reported by Morris and Christiansen (1995) and analyzed by Jiang and Tang (2011) are examples of potential application. Details of these examples can be found in the supplementary materials. In both examples, the proposed method produces smaller posterior variances than those from the Fay-Herriot HB model.

SUPPLEMENTARY MATERIALS
Supplementary materials contain the simulation study mentioned in the article, and additional examples. [Received April 2014. Revised December 2014