A class of general pretest estimators for the univariate normal mean

Abstract In this paper, we propose a class of general pretest estimators for the univariate normal mean. The main mathematical idea behind the proposed class is the adaptation of randomized tests, where the randomization probability is related to a shrinkage parameter. Consequently, the proposed class includes many existing estimators, such as the pretest, shrinkage, Bayes, and empirical Bayes estimators, as special cases. Furthermore, the proposed class can be easily tuned by users by adjusting the significance levels and the probability function. We derive theoretical properties of the proposed class, such as expressions for the distribution function, bias, and MSE. Our expressions for the bias and MSE turn out to be simpler than formulas previously derived for some special cases. We also conduct simulation studies to examine our theoretical results and demonstrate the application of the proposed class through a real dataset.


Introduction
The traditional estimators of unknown parameters are based solely on the observed samples. However, if one has some uncertain non-sample prior information about the parameters, it is natural to utilize such knowledge to improve the traditional estimators. Bancroft (1944) initially proposed the idea of pretest estimators that incorporate both sample and non-sample information by using a preliminary hypothesis test. In the literature, pretest estimators are often applied under normal models, especially for estimating the normal mean. For instance, the famous Stein-rule estimator for the normal mean vector (James and Stein 1961) can be regarded as a smooth version of a pretest estimator. Judge and Bock (1978) extensively studied pretest and Stein-rule estimators with applications to econometrics. Chang (1995) suggested five types of two-stage Stein-rule estimators. Ohtani (1998) compared the mean squared errors of the restricted Stein-rule estimator and minimum mean squared error estimators. Willink (2008) proposed shrinkage confidence intervals for the univariate normal mean in the presence of non-sample information. Recently, Shih et al. (2021) adopted the pretest and Stein-rule schemes for estimating an intercept term of a linear model.

Let $X = (X_1, X_2, \ldots, X_n)$ be a collection of independent and identically distributed (i.i.d.) random samples following a normal distribution with unknown mean $\mu$ and known variance $\sigma^2 > 0$, denoted by $N(\mu, \sigma^2)$. The most popular estimator for $\mu$ is the sample mean $\bar{X}_n = \sum_{i=1}^{n} X_i / n$. On the other hand, one may assume some uncertain non-sample prior information about $\mu$. In this case, the so-called restricted estimator $\hat{\mu} = \theta$, for a known quantity $\theta$, has a smaller risk than $\bar{X}_n$ if $\mu \approx \theta$. However, the mean squared error (MSE) of $\theta$ is large when $|\mu - \theta| \gg 0$ and is unbounded when the parameter space for $\mu$ is unrestricted. A possible solution to this problem is a pretest estimator, which compromises between $\bar{X}_n$ and $\theta$ according to the significance level of a test for $H_0: \mu = \theta$ vs. $H_1: \mu \neq \theta$ (Bancroft 1944). The existing solutions are limited to non-randomized tests for $H_0: \mu = \theta$ vs. $H_1: \mu \neq \theta$, though the test function could, in general, be defined as a randomized test.
In this paper, we propose a class of general pretest estimators for the univariate normal mean. The main mathematical idea behind the proposed class is the adaptation of randomized tests, where the randomization probability is related to a shrinkage parameter. Consequently, the proposed class includes many existing estimators, such as the pretest, shrinkage, Bayes, and empirical Bayes estimators, as special cases. Also, the proposed class can be easily tuned by users by adjusting the significance levels and the probability function. All estimators resulting from our class involve no random element (they are non-randomized estimators), even though the underlying test is randomized. We derive theoretical properties of the proposed class, such as expressions for the distribution function, bias, and MSE. Our expressions for the bias and MSE turn out to be simpler and easier to compute than formulas previously derived for some special cases. Simulation studies are conducted to examine our theoretical results, including a sensitivity analysis when the normal distribution is contaminated.
This paper is organized as follows. Section 2 proposes a class of general pretest estimators. Section 3 develops theoretical properties of the proposed class under known variance. Section 4 extends our theory to unknown variance. Section 5 conducts simulation studies. Section 6 demonstrates the proposed class by analyzing a real dataset. Section 7 discusses some related topics.

Proposed class of estimators
With the uncertain non-sample prior information in hand, it is natural to examine its correctness by testing a null hypothesis. This leads to pretest estimation.
A pretest estimator is a two-step estimator that estimates the parameter of interest based on the result of a preliminary test. For estimating a normal mean, we consider the hypothesis
$$H_0: \mu = \theta \quad \text{vs.} \quad H_1: \mu \neq \theta,$$
where $\theta$ is a known value from the prior information. Let $X = (X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots, X_n$ are i.i.d. random samples following $N(\mu, \sigma^2)$ with known $\sigma^2 > 0$ (extensions to unknown $\sigma^2$ will be discussed in Sec. 4). Let $a_\sigma: \mathbb{R}^n \mapsto [0, 1]$ be a test function with $a_\sigma = 0$ (accept $H_0$), $a_\sigma = 1$ (reject $H_0$), and $0 < a_\sigma < 1$ (reject $H_0$ with probability $a_\sigma$). For $0 \le \alpha_1 \le \alpha_2 \le 1$, we define a randomized test
$$a_\sigma(X) = \begin{cases} 1 & \text{if } |Z_n| > z_{\alpha_1/2}, \\ q(X) & \text{if } z_{\alpha_2/2} < |Z_n| \le z_{\alpha_1/2}, \\ 0 & \text{if } |Z_n| \le z_{\alpha_2/2}, \end{cases}$$
where $Z_n \equiv \sqrt{n}(\bar{X}_n - \theta)/\sigma$ is a Z-statistic, $z_p$ is the upper $p$-th quantile of $N(0, 1)$ for $0 < p < 1$, and $q: \mathbb{R}^n \mapsto [0, 1]$ is a known measurable function. If $\alpha_1 = \alpha_2 = \alpha$, the test is non-randomized and one can define an arbitrary probability, e.g., $q(X) = 1$ for all $X \in \mathbb{R}^n$, without influencing the test function. We let $z_0 \equiv \infty$ as usual. Here, we have $Z_n \sim N(\lambda, 1)$, where $\lambda \equiv \sqrt{n}(\mu - \theta)/\sigma$ is the departure constant from the null hypothesis, a standardized difference between $\mu$ and $\theta$. If the null hypothesis is true, then $\lambda = 0$; otherwise $\lambda \neq 0$. The performance of a pretest estimator depends mainly on $\lambda$.

With the randomized test, we define a general pretest estimator
$$\hat{\mu}_\sigma^{GPT} = a_\sigma(X)\bar{X}_n + \{1 - a_\sigma(X)\}\theta.$$
We may rewrite the above equation as
$$\hat{\mu}_\sigma^{GPT} = \theta + (\bar{X}_n - \theta)\,I(|Z_n| > z_{\alpha_1/2}) + q(X)(\bar{X}_n - \theta)\,I(z_{\alpha_2/2} < |Z_n| \le z_{\alpha_1/2}), \qquad (1)$$
where $I(\cdot)$ is the indicator function defined as $I(A) = 1$ if $A$ is true and $I(A) = 0$ if $A$ is false. One can tune $\alpha_1$, $\alpha_2$, $\theta$, and $q(X)$ in Eq. (1) to create a variety of estimators.
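To make the construction concrete, the following R sketch evaluates Eq. (1) from a sample under known $\sigma$; the function name, arguments, and default $q$ are our own illustration and do not come from the Supplementary Material.

```r
# A minimal sketch of the general pretest estimator in Eq. (1) with known sigma.
# theta: prior value; alpha1 <= alpha2: significance levels; q: measurable function of the data in [0, 1].
gpt_known_var <- function(x, theta, sigma, alpha1, alpha2, q = function(x) 0.5) {
  n    <- length(x)
  xbar <- mean(x)
  Zn   <- sqrt(n) * (xbar - theta) / sigma          # Z-statistic
  z1   <- qnorm(1 - alpha1 / 2)                     # upper alpha1/2 quantile (Inf when alpha1 = 0)
  z2   <- qnorm(1 - alpha2 / 2)
  a    <- if (abs(Zn) > z1) 1 else if (abs(Zn) > z2) q(x) else 0   # randomized test used as a weight
  a * xbar + (1 - a) * theta
}

# Example: 10 observations with mean near theta = 0
set.seed(1)
x <- rnorm(10, mean = 0.2, sd = 1)
gpt_known_var(x, theta = 0, sigma = 1, alpha1 = 0.05, alpha2 = 0.2)
```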
Remark. The probability function $q(\cdot)$ becomes a weight in the general pretest estimator, and hence, all the resultant estimators are non-randomized (estimators without any random element). Therefore, the concerns about randomized estimators (Berger 1985) do not apply to our proposed class.
The pretest estimator defined in Eq. (1) includes many well-known pretest and shrinkage estimators as special cases. We list some of them below.
(i) The classical pretest estimator. If $\alpha_1 = \alpha_2 = \alpha$, then Eq. (1) reduces to the classical pretest estimator
$$\hat{\mu}_\sigma^{PT} = \theta + (\bar{X}_n - \theta)\,I(|Z_n| > z_{\alpha/2}).$$
The case $\alpha = 0$ reduces to $\theta$, while the case $\alpha = 1$ reduces to $\bar{X}_n$. Within the class of classical pretest estimators, no estimator dominates any other.
(ii) The shrinkage estimator. If $\alpha_1 = 0$ and $\alpha_2 = 1$, then Eq. (1) reduces to the shrinkage estimator $\hat{\mu}_\sigma^{S} = \theta + q(X)(\bar{X}_n - \theta)$. It shrinks $\bar{X}_n$ toward $\theta$, where the shrinkage pattern depends on the choice of $q(X)$. By setting $q(X) = 1 - \sigma^2/\max\{\sigma^2, n(\bar{X}_n - \theta)^2\}$, $\hat{\mu}_\sigma^{S}$ becomes an empirical Bayes estimator under a prior $\mu \sim N(\theta, \tau^2)$, where $\tau^2$ is unknown and estimated by the maximum marginal likelihood estimator (p. 263, Example 4.6.1 of Lehmann and Casella 1998). Furthermore, if $q(X) = c$, where $c \in [0, 1]$ is a known constant, $\hat{\mu}_\sigma^{S}$ reduces to the shrinkage restricted estimator (Ahmed 2014) $\hat{\mu}_\sigma^{S*} = \theta + c(\bar{X}_n - \theta)$. The estimator $\hat{\mu}_\sigma^{S*}$ has a smaller MSE than $\bar{X}_n$ when $\mu \approx \theta$. However, it is extremely biased if $|\mu - \theta| \gg 0$. In a similar fashion, by setting $c = n\tau^2/(n\tau^2 + \sigma^2)$, $\hat{\mu}_\sigma^{S*}$ becomes a Bayes estimator under the same prior $\mu \sim N(\theta, \tau^2)$ when $\tau^2$ is known.
Note that the Stein-rule estimator (Khan and Saleh 2001) is defined as
$$\hat{\mu}_\sigma^{SR} = \theta + (1 - z_{\alpha/2}/|Z_n|)(\bar{X}_n - \theta).$$
However, $\hat{\mu}_\sigma^{SR}$ does not belong to the proposed class of estimators since $1 - z_{\alpha/2}/|Z_n|$ is negative when $|Z_n| < z_{\alpha/2}$.

(iii) The type I shrinkage pretest estimator. If $0 < \alpha_1 = \alpha < 1$, $\alpha_2 = 1$, and $q(X) = c$, then Eq. (1) reduces to the type I shrinkage pretest estimator (Ahmed 1992; Khan and Saleh 2001)
$$\hat{\mu}_\sigma^{SPT_1} = \bar{X}_n - (1 - c)(\bar{X}_n - \theta)\,I(|Z_n| \le z_{\alpha/2}).$$
This estimator was originally obtained from the classical pretest estimator with $\theta$ replaced by $\hat{\mu}_\sigma^{S*}$. Ahmed (1992) pointed out that the type I shrinkage pretest estimator dominates $\bar{X}_n$ over a larger range of $\mu$ than the classical pretest estimator does.
(iv) The type II shrinkage pretest estimator. If $\alpha_1 = 0$, $0 < \alpha_2 = \alpha < 1$, and $q(X) = c$, then Eq. (1) reduces to the type II shrinkage pretest estimator $\hat{\mu}_\sigma^{SPT_2} = \theta + c(\bar{X}_n - \theta)\,I(|Z_n| > z_{\alpha/2})$. Similar to the type I shrinkage pretest estimator, it can be obtained from the classical pretest estimator with $\bar{X}_n$ replaced by $\hat{\mu}_\sigma^{S*}$. Notably, by setting $q(X) = 1 - z_{\alpha/2}/|Z_n|$, it becomes the positive-part Stein-rule estimator (Ahmed and Krzanowski 2004)
$$\hat{\mu}_\sigma^{SR+} = \theta + (1 - z_{\alpha/2}/|Z_n|)(\bar{X}_n - \theta)\,I(|Z_n| > z_{\alpha/2}).$$
The positivity of the factor $1 - z_{\alpha/2}/|Z_n|$ is ensured by the presence of $I(|Z_n| > z_{\alpha/2})$. It is known that $\hat{\mu}_\sigma^{SR+}$ dominates $\hat{\mu}_\sigma^{SR}$ (Ahmed and Krzanowski 2004) since $\hat{\mu}_\sigma^{SR}$ may shrink $\bar{X}_n$ beyond the null hypothesis value $\theta$. Consequently, one should always consider $\hat{\mu}_\sigma^{SR+}$ instead of $\hat{\mu}_\sigma^{SR}$.

Following Magnus (2000), we graphically illustrate the aforementioned estimators under $\theta = 0$, $\sigma = 1$, $n = 1$, $\alpha = \alpha_1 = 0.05$, $\alpha_2 = 0.2$, and $c = 0.5$ in Figure 1. Clearly, $\hat{\mu}_\sigma^{S*}$ is a 50% shrinkage of $\bar{X}_n$ toward $\theta$. The usual pretest estimators $\hat{\mu}_\sigma^{PT}$, $\hat{\mu}_\sigma^{SPT_1}$, and $\hat{\mu}_\sigma^{SPT_2}$ are choices between two of the three estimators $\theta$, $\bar{X}_n$, and $\hat{\mu}_\sigma^{S*}$ according to the magnitude of $|\bar{X}_n|$. On the other hand, the proposed general pretest estimator $\hat{\mu}_\sigma^{GPT}$ with $q(X) = c$ is more flexible than the usual pretest estimators since $\hat{\mu}_\sigma^{GPT}$ is a choice among all three estimators $\theta$, $\bar{X}_n$, and $\hat{\mu}_\sigma^{S*}$. Lastly, the Stein-rule estimators $\hat{\mu}_\sigma^{SR+}$ and $\hat{\mu}_\sigma^{SR}$ shrink $\bar{X}_n$ toward $\theta$ by adding or subtracting a constant value. One can also observe that $\hat{\mu}_\sigma^{SR}$ over-shrinks $\bar{X}_n$ when $|\bar{X}_n|$ is small; this problem is resolved by imposing the term $I(|Z_n| > z_{\alpha/2})$, that is, by using $\hat{\mu}_\sigma^{SR+}$.

Figure 1. The patterns of pretest and shrinkage estimators under $\theta = 0$, $\sigma = 1$, $n = 1$, $\alpha = \alpha_1 = 0.05$, $\alpha_2 = 0.2$, and $c = 0.5$.
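As a usage note to accompany Figure 1, the special cases (i)-(iv) can be evaluated directly from the Z-statistic; the short R sketch below does so (the function name and the default $c = 0.5$ are our own choices, not the paper's code).

```r
# Special cases (i)-(iv) written directly in terms of the Z-statistic (known sigma).
# All quantities follow Sec. 2; c is the shrinkage constant q(X) = c.
pretest_family <- function(x, theta, sigma, alpha, c = 0.5) {
  n    <- length(x)
  xbar <- mean(x)
  Zn   <- sqrt(n) * (xbar - theta) / sigma
  z    <- qnorm(1 - alpha / 2)
  list(
    PT   = theta + (xbar - theta) * (abs(Zn) > z),                       # classical pretest
    S    = theta + c * (xbar - theta),                                   # shrinkage restricted
    SPT1 = xbar - (1 - c) * (xbar - theta) * (abs(Zn) <= z),             # type I shrinkage pretest
    SPT2 = theta + c * (xbar - theta) * (abs(Zn) > z),                   # type II shrinkage pretest
    SRp  = theta + (1 - z / abs(Zn)) * (xbar - theta) * (abs(Zn) > z)    # positive-part Stein-rule
  )
}

set.seed(1)
pretest_family(rnorm(10, 0.2, 1), theta = 0, sigma = 1, alpha = 0.05)
```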

Theoretical properties
This section develops some theoretical properties of the proposed estimator, such as the distribution function, bias, and MSE. One can study the theoretical properties of all existing estimators (i)-(iv) as special cases of our general theory. The following theorem derives the distribution function of the general pretest estimator $\hat{\mu}_\sigma^{GPT}$ in Eq. (1).
Theorem 1. The general pretest estimator $\hat{\mu}_\sigma^{GPT}$ follows a mixture distribution
$$\Pr(\hat{\mu}_\sigma^{GPT} \le x) = p_A \Pr(\bar{X}_n \le x \mid A) + p_B \Pr\{\theta + q(X)(\bar{X}_n - \theta) \le x \mid B\} + p_C\, I(\theta \le x),$$
where $A \equiv \{|Z_n| > z_{\alpha_1/2}\}$, $B \equiv \{z_{\alpha_2/2} < |Z_n| \le z_{\alpha_1/2}\}$, $C \equiv \{|Z_n| \le z_{\alpha_2/2}\}$,
$$p_A = 1 - \Phi(z_{\alpha_1/2} - \lambda) + \Phi(-z_{\alpha_1/2} - \lambda), \quad p_C = \Phi(z_{\alpha_2/2} - \lambda) - \Phi(-z_{\alpha_2/2} - \lambda), \quad p_B = 1 - p_A - p_C,$$
and $\Phi(\cdot)$ is the cumulative distribution function (c.d.f.) of $N(0, 1)$. The probabilities $p_A$, $p_B$, and $p_C$ are functions of $\lambda$.

Proof. We focus on the conditional distributions of $\hat{\mu}_\sigma^{GPT}$. First, $\Pr(\hat{\mu}_\sigma^{GPT} \le x \mid A) = \Pr(\bar{X}_n \le x \mid A)$; hence the conditional distribution of $\bar{X}_n$ given $A$ is a truncated normal distribution. Next, we have $\Pr(\hat{\mu}_\sigma^{GPT} \le x \mid B) = \Pr\{\theta + q(X)(\bar{X}_n - \theta) \le x \mid B\}$. Although the conditional distribution of $\theta + q(X)(\bar{X}_n - \theta)$ given $B$ depends on $q(X)$, it is still a proper distribution since $q(X)$ is measurable. Finally, $\Pr(\hat{\mu}_\sigma^{GPT} \le x \mid C) = I(\theta \le x)$, which is the c.d.f. of a point mass at $\theta$. This completes the proof.
Under the special case $q(X) = c$, one can derive the conditional distribution of $\theta + c(\bar{X}_n - \theta)$ given $B$; it also follows a truncated normal distribution.

The following theorem derives the bias and MSE of the general pretest estimator $\hat{\mu}_\sigma^{GPT}$.

Theorem 2. The bias and MSE of the general pretest estimator $\hat{\mu}_\sigma^{GPT}$ admit closed-form expressions in terms of $\Phi(\cdot)$, $\phi(\cdot)$, and the conditional expectation $E\{q(X)Z_n \mid B\}$, where $\phi(\cdot)$ is the density function of $N(0, 1)$; the expressions simplify further when $q(X) = c$.

Here, we define the conditional expectation $E\{q(X)Z_n \mid B\} = 0$ when $p_B = 0$ (i.e., $\alpha_1 = \alpha_2$). The existence of $E\{q(X)Z_n \mid B\}$ is then guaranteed by the measurability of $0 \le q(X) \le 1$. To show this, it suffices to check the integrability of the measurable function $q(X)Z_n$ given $B$ when $p_B > 0$ (i.e., $\alpha_1 < \alpha_2$). This follows from
$$0 \le E\{q(X)\tilde{Z}_n \mid B\} = \frac{1}{p_B} E\{I(X \in B)\, q(X)\tilde{Z}_n\} \le \frac{1}{p_B} E(\tilde{Z}_n) < \infty,$$
where $\tilde{Z}_n \equiv |Z_n|$. If $E\{q(X)Z_n \mid B\}$ is a continuous function of $\lambda$, then $\mathrm{Bias}(\hat{\mu}_\sigma^{GPT})$ and $\mathrm{MSE}(\hat{\mu}_\sigma^{GPT})$ are also continuous functions of $\lambda$. Furthermore, if $\alpha_1 > 0$, then for fixed $\theta$, $n$, and $\sigma^2$, one also has
$$\mathrm{Bias}(\hat{\mu}_\sigma^{GPT}) \to 0, \qquad \mathrm{MSE}(\hat{\mu}_\sigma^{GPT}) \to \frac{\sigma^2}{n}, \qquad \text{as } \mu \to \pm\infty.$$
Hence we conclude that $\mathrm{Bias}(\hat{\mu}_\sigma^{GPT})$ and $\mathrm{MSE}(\hat{\mu}_\sigma^{GPT})$ are bounded over the parameter space, provided that $E\{q(X)Z_n \mid B\}$ is continuous in $\lambda$ and $\alpha_1 > 0$.

Based on the above results, for fixed $0 < \alpha_1 \le \alpha_2 \le 1$ and $q(X) = c$, the largest absolute bias of the general pretest estimator occurs at the solutions of
$$(1 - c)\,S_{\alpha_1}(\lambda) + c\,S_{\alpha_2}(\lambda) = 0,$$
where $S_\alpha(\lambda) = z_{\alpha/2}\{\phi(z_{\alpha/2} - \lambda) + \phi(z_{\alpha/2} + \lambda)\} - \{\Phi(z_{\alpha/2} - \lambda) - \Phi(-z_{\alpha/2} - \lambda)\}$. Unfortunately, analytical solutions are unavailable. However, there is an intuitive interpretation of the function $S_\alpha(\lambda)$: it represents the difference between the area of a trapezoid with parallel sides $\phi(z_{\alpha/2} - \lambda)$ and $\phi(-z_{\alpha/2} - \lambda)$ ($= \phi(z_{\alpha/2} + \lambda)$) and height $2z_{\alpha/2}$, and the area under the normal density $\phi(x - \lambda)$ within $[-z_{\alpha/2}, z_{\alpha/2}]$ (Figure 2). On the other hand, the largest MSE of the general pretest estimator occurs at the solutions of the analogous equation obtained by setting the derivative of the MSE with respect to $\lambda$ to zero. Again, analytical solutions are unavailable.
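Because the estimator with $q(X) = c$ is a function of $Z_n \sim N(\lambda, 1)$ alone, the bias and MSE in Theorem 2 can be checked by one-dimensional numerical integration; the R sketch below is our own verification device, not code from the paper, and the Monte Carlo part mirrors the setting $\alpha_1 = 0.05$, $\alpha_2 = 0.2$, $c = 0.5$ of Figure 1.

```r
# Numerical check of Theorem 2 when q(X) = c: write the estimator as a function of Z_n ~ N(lambda, 1),
#   h(z) = z * I(|z| > z1) + c * z * I(z2 < |z| <= z1),  z1 = z_{alpha1/2}, z2 = z_{alpha2/2},
# so that mu_hat = theta + (sigma / sqrt(n)) * h(Z_n) and mu = theta + (sigma / sqrt(n)) * lambda.
gpt_bias_mse <- function(lambda, alpha1, alpha2, cc, n = 1, sigma = 1) {
  z1 <- qnorm(1 - alpha1 / 2)
  z2 <- qnorm(1 - alpha2 / 2)
  h  <- function(z) z * (abs(z) > z1) + cc * z * (abs(z) > z2 & abs(z) <= z1)
  bias <- integrate(function(z) (h(z) - lambda)   * dnorm(z - lambda), lambda - 10, lambda + 10)$value
  mse  <- integrate(function(z) (h(z) - lambda)^2 * dnorm(z - lambda), lambda - 10, lambda + 10)$value
  c(bias = sigma / sqrt(n) * bias, mse = sigma^2 / n * mse)
}

# Monte Carlo counterpart under the same configuration
set.seed(1)
n <- 10; sigma <- 1; theta <- 0; lambda <- 1; mu <- theta + lambda * sigma / sqrt(n)
z1 <- qnorm(1 - 0.05 / 2); z2 <- qnorm(1 - 0.2 / 2)
est <- replicate(10000, {
  xbar <- mean(rnorm(n, mu, sigma))
  Zn   <- sqrt(n) * (xbar - theta) / sigma
  a    <- (abs(Zn) > z1) + 0.5 * (abs(Zn) > z2 & abs(Zn) <= z1)
  a * xbar + (1 - a) * theta
})
c(bias = mean(est) - mu, mse = mean((est - mu)^2))                                # simulated
gpt_bias_mse(lambda, alpha1 = 0.05, alpha2 = 0.2, cc = 0.5, n = n, sigma = sigma) # theoretical
```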
The proof of Theorem 2 is available in the Supplementary Material. By using Theorems 1 and 2, one can obtain the bias and MSE of the estimators (i)-(iv) as special cases. For instance, we provide the results for the type I shrinkage pretest estimator $\hat{\mu}_\sigma^{SPT_1}$.

Figure 2. The area of the trapezoid (red dashed line) and the area under the normal density (blue dotted line) with $\lambda = 1$ and $\alpha = 0.1$.

Corollary 1. The type I shrinkage pretest estimator $\hat{\mu}_\sigma^{SPT_1}$ follows a mixture distribution
$$\Pr(\hat{\mu}_\sigma^{SPT_1} \le x) = p_D \Pr(\bar{X}_n \le x \mid D) + p_E \Pr\{\theta + c(\bar{X}_n - \theta) \le x \mid E\},$$
where $D \equiv \{|Z_n| > z_{\alpha/2}\}$ and $E \equiv \{|Z_n| \le z_{\alpha/2}\}$. In addition, we have
$$\mathrm{Bias}(\hat{\mu}_\sigma^{SPT_1}) = -\frac{(1-c)\sigma}{\sqrt{n}}\Big[\lambda\{\Phi(z_{\alpha/2} - \lambda) - \Phi(-z_{\alpha/2} - \lambda)\} + \phi(z_{\alpha/2} + \lambda) - \phi(z_{\alpha/2} - \lambda)\Big], \qquad (2)$$
$$\mathrm{MSE}(\hat{\mu}_\sigma^{SPT_1}) = \frac{\sigma^2}{n}\Big[1 + (1 - c^2)S_\alpha(\lambda) + (1-c)^2\lambda\big[\lambda\{\Phi(z_{\alpha/2} - \lambda) - \Phi(-z_{\alpha/2} - \lambda)\} + \phi(z_{\alpha/2} + \lambda) - \phi(z_{\alpha/2} - \lambda)\big]\Big]. \qquad (3)$$

The results for the classical pretest estimator $\hat{\mu}_\sigma^{PT}$ can be obtained by setting $c = 0$. The results for the MSE of the classical pretest estimator can also be found in Magnus (2000). Corollary 1 provides new expressions for the bias and MSE of the type I shrinkage pretest estimator. It reveals that $\mathrm{Bias}(\hat{\mu}_\sigma^{SPT_1}) = 0$ if $\lambda = 0$. Ahmed (2014) applied the lemmas in Appendix B of Judge and Bock (1978) to derive
$$\mathrm{Bias}(\hat{\mu}_\sigma^{SPT_1}) = -\frac{\sigma}{\sqrt{n}}(1 - c)\lambda\, H_{3,\lambda^2}(\chi^2_{\alpha,1}), \qquad (4)$$
$$\mathrm{MSE}(\hat{\mu}_\sigma^{SPT_1}) = \frac{\sigma^2}{n}\Big[1 - (1 - c^2)H_{3,\lambda^2}(\chi^2_{\alpha,1}) + (1 - c)\lambda^2\big\{2H_{3,\lambda^2}(\chi^2_{\alpha,1}) - (1 + c)H_{5,\lambda^2}(\chi^2_{\alpha,1})\big\}\Big], \qquad (5)$$
where $\chi^2_{p,k}$ is the upper $p$-th quantile of the chi-squared distribution with $k$ degrees of freedom for $0 < p < 1$ and $H_{k,\xi}$ is the c.d.f. of the noncentral chi-squared distribution with $k$ degrees of freedom and noncentrality parameter $\xi \ge 0$. When $\xi = 0$, the noncentral chi-squared distribution reduces to the usual chi-squared distribution (p. 26, Shao 2003).
It turns out that our new expressions in Eq. (2) and Eq. (3) can be computed more easily than Eq. (4) and Eq. (5). Only the z-table is needed to calculate Eq. (2) and Eq. (3). On the other hand, the table needed for Eq. (4) and Eq. (5) is not always accessible, and the presence of the noncentrality parameter makes it difficult to use appropriately. We can even avoid using any tables by the following approximation technique. According to Marsaglia (2004), the normal c.d.f. can be expressed through its density as
$$\Phi(x) = \frac{1}{2} + \phi(x)\left(x + \frac{x^3}{3} + \frac{x^5}{3 \cdot 5} + \frac{x^7}{3 \cdot 5 \cdot 7} + \cdots\right).$$
Thus, a natural approximation of $\Phi(x)$ can be defined as
$$\Phi_k(x) = \frac{1}{2} + \phi(x)\sum_{j=1}^{k}\frac{x^{2j-1}}{\prod_{i=1}^{j}(2i-1)}, \qquad k \in \{1, 2, \ldots\}.$$
To improve the approximation in the tails, we suggest a truncated version of $\Phi_k(x)$ as
$$\Phi_{k,c}(x) = \Phi_k(x)\,I(|x| \le c) + I(x > c), \qquad k \in \{1, 2, \ldots\},$$
where $c > 0$ is a suitable truncation point. In particular, we suggest choosing the truncation point as $c^* = \{\prod_{j=1}^{k}(2j-1)\}^{1/(2k)}$, which maximizes $\Phi_k(x)$. Based on this choice, $\Phi_{k,c^*}(x)$ provides an explicit approximation formula for $\Phi(x)$. Note that $\Phi_{k,c^*}(x)$ can achieve arbitrary accuracy by choosing a sufficiently large $k$. For $k = 13$, the graph of $\Phi_{k,c^*}(x)$ is almost identical to the graph of $\Phi(x)$ (Figure 3). To illustrate our explicit approximation formulas, we approximate $\mathrm{Bias}(\hat{\mu}_\sigma^{SPT_1})$ and $\mathrm{MSE}(\hat{\mu}_\sigma^{SPT_1})$ in Eq. (2) and Eq. (3) with $\Phi(x)$ replaced by $\Phi_{k,c^*}(x)$ under $k = 13$. For illustration, we set $\alpha = 0.05$, $n = 1$, $\sigma^2 = 1$, and $c = 0.5$. Figure 4 shows that there is no visible difference between the true and approximated values. We emphasize that this computation can be performed without any tables.
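The truncated approximation $\Phi_{k,c^*}$ is straightforward to code; a minimal R sketch follows (the function names are ours), comparing it with R's built-in pnorm at a few points.

```r
# Truncated Marsaglia-type approximation of the standard normal c.d.f.:
#   Phi_k(x)  = 1/2 + phi(x) * sum_{j=1}^k x^(2j-1) / (1 * 3 * ... * (2j-1)),
#   Phi_kc(x) = Phi_k(x) * I(|x| <= c*) + I(x > c*),  with c* = {prod_{j=1}^k (2j-1)}^(1/(2k)).
Phi_k <- function(x, k = 13) {
  j <- 1:k
  terms <- outer(x, 2 * j - 1, "^") / matrix(cumprod(2 * j - 1), length(x), k, byrow = TRUE)
  0.5 + dnorm(x) * rowSums(terms)
}
Phi_kc <- function(x, k = 13) {
  cstar <- prod(2 * (1:k) - 1)^(1 / (2 * k))   # truncation point maximizing Phi_k
  Phi_k(x, k) * (abs(x) <= cstar) + (x > cstar)
}

x <- c(-3, -1, 0, 1, 1.96, 3)
round(cbind(x, approx = Phi_kc(x), exact = pnorm(x)), 6)
```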
The simple approximation formulas are feasible because of the Taylor expansion of the normal c.d.f. around zero. A comparable expansion for the c.d.f. of the noncentral chi-squared distribution is not available unless one elaborates with more complicated calculus.
Lastly, we provide the results for the positive-part Stein-rule estimator $\hat{\mu}_\sigma^{SR+}$. To the best of our knowledge, these expressions have not been derived in the literature.

Corollary 2. The positive-part Stein-rule estimator $\hat{\mu}_\sigma^{SR+}$ follows a mixture distribution
$$\Pr(\hat{\mu}_\sigma^{SR+} \le x) = p_D \Pr\{\theta + (1 - z_{\alpha/2}/|Z_n|)(\bar{X}_n - \theta) \le x \mid D\} + p_E\, I(\theta \le x),$$
where $D$ and $E$ are defined in Corollary 1. In addition, closed-form expressions for $\mathrm{Bias}(\hat{\mu}_\sigma^{SR+})$ and $\mathrm{MSE}(\hat{\mu}_\sigma^{SR+})$ are obtained in terms of $\Phi(\cdot)$ and $\phi(\cdot)$; in particular, the MSE involves the term $(z_{\alpha/2} - \lambda)\phi(z_{\alpha/2} + \lambda) - (z_{\alpha/2} + \lambda)\phi(z_{\alpha/2} - \lambda)$.

With the aid of Corollary 2, it is now clear that $\hat{\mu}_\sigma^{SR+}$ shrinks $\bar{X}_n$ toward $\theta$ by adding or subtracting the constant value $z_{\alpha/2}\sigma/\sqrt{n}$ on the event $D$ (Figure 1). The proof of Corollary 2 is available in the Supplementary Material.

Extensions to unknown variance
Assume $n \ge 2$. In order to adapt our methods to unknown variance, we replace the test function $a_\sigma$ by
$$a(X) = \begin{cases} 1 & \text{if } |T_n| > t_{\alpha_1/2,\,n-1}, \\ q(X) & \text{if } t_{\alpha_2/2,\,n-1} < |T_n| \le t_{\alpha_1/2,\,n-1}, \\ 0 & \text{if } |T_n| \le t_{\alpha_2/2,\,n-1}, \end{cases}$$
where $T_n = \sqrt{n}(\bar{X}_n - \theta)/S_n$ is a t-statistic, $S_n^2 = \sum_{i=1}^{n}(X_i - \bar{X}_n)^2/(n-1)$ is the sample variance, and $t_{p,\nu}$ is the upper $p$-th quantile of the t-distribution with $\nu$ degrees of freedom for $0 < p < 1$. We let $t_{0,\nu} \equiv \infty$ as usual. Then, $T_n$ follows a noncentral t-distribution with $n - 1$ degrees of freedom and noncentrality parameter $\lambda$, the departure constant from the null hypothesis. Following our proposed idea in Sec. 2, we define the general pretest estimator with unknown variance as
$$\hat{\mu}^{GPT} = a(X)\bar{X}_n + \{1 - a(X)\}\theta.$$
The estimator can also be written as
$$\hat{\mu}^{GPT} = \theta + (\bar{X}_n - \theta)\,I(|T_n| > t_{\alpha_1/2,\,n-1}) + q(X)(\bar{X}_n - \theta)\,I(t_{\alpha_2/2,\,n-1} < |T_n| \le t_{\alpha_1/2,\,n-1}).$$
The general pretest estimator includes the following estimators as special cases:
$$\hat{\mu}^{PT} = \theta + (\bar{X}_n - \theta)\,I(|T_n| > t_{\alpha/2,\,n-1}),$$
$$\hat{\mu}^{SPT_1} = \bar{X}_n - (1 - c)(\bar{X}_n - \theta)\,I(|T_n| \le t_{\alpha/2,\,n-1}),$$
$$\hat{\mu}^{SR+} = \theta + (1 - t_{\alpha/2,\,n-1}/|T_n|)(\bar{X}_n - \theta)\,I(|T_n| > t_{\alpha/2,\,n-1}).$$
One may study the theoretical properties of these estimators by developing the theory for the general pretest estimator $\hat{\mu}^{GPT}$. As in Sec. 3, we derive the distribution function, bias, and MSE of $\hat{\mu}^{GPT}$.

Theorem 3. The general pretest estimator $\hat{\mu}^{GPT}$ follows a mixture distribution
$$\Pr(\hat{\mu}^{GPT} \le x) = p_{A^*}\Pr(\bar{X}_n \le x \mid A^*) + p_{B^*}\Pr\{\theta + q(X)(\bar{X}_n - \theta) \le x \mid B^*\} + p_{C^*}\, I(\theta \le x),$$
where $A^* \equiv \{|T_n| > t_{\alpha_1/2,\,n-1}\}$, $B^* \equiv \{t_{\alpha_2/2,\,n-1} < |T_n| \le t_{\alpha_1/2,\,n-1}\}$, $C^* \equiv \{|T_n| \le t_{\alpha_2/2,\,n-1}\}$, and the mixture components are expressed through the regularized gamma function
$$P(k, y) = \frac{1}{\Gamma(k)}\int_0^y t^{k-1}e^{-t}\,dt.$$

Proof. The event $A^*$ can be written as $A^* = \{S_n < \sqrt{n}\,|\bar{X}_n - \theta|/t_{\alpha_1/2,\,n-1}\}$. Since $(n-1)S_n^2/\sigma^2 \sim \chi^2_{n-1}$, we have $S_n \sim (\sigma/\sqrt{n-1})\,\chi_{n-1}$, a scaled chi distribution with $n - 1$ degrees of freedom. Because $\bar{X}_n$ is independent of $S_n$ under normality, the joint density of $(\bar{X}_n, S_n)$ is the product of the $N(\mu, \sigma^2/n)$ density and the scaled chi density. Therefore, the probability of the event $A^*$ can be obtained by integrating this joint density over the corresponding region. Let $\kappa = \sqrt{n}\,|x - \theta|/t_{\alpha_1/2,\,n-1}$ and consider the change of variable $u = (\sqrt{n-1}\,y/\sigma)^2/2$. The inner integral over $y \in (0, \kappa)$ then reduces to a regularized gamma function, giving the subdensity of $\bar{X}_n$ on $A^*$ as
$$g_{A^*}(x) = \frac{\sqrt{n}}{\sigma}\,\phi\!\left\{\frac{\sqrt{n}(x - \mu)}{\sigma}\right\} P\!\left\{\frac{n-1}{2},\ \frac{n(n-1)(x - \theta)^2}{2\sigma^2 t_{\alpha_1/2,\,n-1}^2}\right\},$$
so that $p_{A^*} = \int_{-\infty}^{\infty} g_{A^*}(x)\,dx$, and the conditional density of $\bar{X}_n$ on the event $A^*$ is $f_{\bar{X}_n \mid A^*}(x) = g_{A^*}(x)/p_{A^*}$, $-\infty < x < \infty$. The probability of the event $C^*$ is obtained similarly. On the other hand, the conditional distribution of $\hat{\mu}^{GPT}$ on $C^*$ is a point mass at $\theta$. This completes the proof.
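Computationally, the unknown-variance estimator only replaces the Z-statistic and normal quantiles with the t-statistic and t quantiles; a minimal R sketch (our own wrapper, not the code from the Supplementary Material) is given below before we turn to the special case $q(X) = c$.

```r
# General pretest estimator with unknown variance (Sec. 4): the test is based on the
# t-statistic T_n and t quantiles; q maps the data to a weight in [0, 1].
gpt_unknown_var <- function(x, theta, alpha1, alpha2, q = function(x) 0.5) {
  n    <- length(x)
  xbar <- mean(x)
  Tn   <- sqrt(n) * (xbar - theta) / sd(x)                 # t-statistic (sd uses the n-1 divisor)
  t1   <- qt(1 - alpha1 / 2, df = n - 1)                   # Inf when alpha1 = 0
  t2   <- qt(1 - alpha2 / 2, df = n - 1)
  a    <- if (abs(Tn) > t1) 1 else if (abs(Tn) > t2) q(x) else 0
  a * xbar + (1 - a) * theta
}

set.seed(1)
gpt_unknown_var(rnorm(18, mean = 0.3), theta = 0, alpha1 = 0.0001, alpha2 = 0.001)
```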
Under the special case $q(X) = c$, one can derive the conditional density of $\theta + q(X)(\bar{X}_n - \theta)$ on the event $B^*$. It follows that
$$f_{\theta + c(\bar{X}_n - \theta) \mid B^*}(x) = \frac{1}{p_{B^*}}\cdot\frac{\sqrt{n}}{c\sigma}\,\phi\!\left\{\frac{x - \theta - c(\mu - \theta)}{c\sigma/\sqrt{n}}\right\}\left[P\!\left\{\frac{n-1}{2},\ \frac{n(n-1)(x - \theta)^2}{2c^2\sigma^2 t_{\alpha_2/2,\,n-1}^2}\right\} - P\!\left\{\frac{n-1}{2},\ \frac{n(n-1)(x - \theta)^2}{2c^2\sigma^2 t_{\alpha_1/2,\,n-1}^2}\right\}\right].$$

The following theorem derives the bias and MSE of the general pretest estimator $\hat{\mu}^{GPT}$.

Theorem 4. The bias and MSE of the general pretest estimator $\hat{\mu}^{GPT}$ admit closed-form expressions that parallel those of Theorem 2, with the conditional expectation $E\{q(X)Z_n \mid B^*\}$ in place of $E\{q(X)Z_n \mid B\}$.

The existence of the conditional expectation $E\{q(X)Z_n \mid B^*\}$ can be proved by using the measurability of $0 \le q(X) \le 1$, as we did in Theorem 2. In a similar fashion, if $E\{q(X)Z_n \mid B^*\}$ is a continuous function of $\lambda$, then $\mathrm{Bias}(\hat{\mu}^{GPT})$ and $\mathrm{MSE}(\hat{\mu}^{GPT})$ are also continuous functions of $\lambda$. Furthermore, if $\alpha_1 > 0$, then for fixed $\theta$, $n$, and $\sigma^2$, one also has
$$\mathrm{Bias}(\hat{\mu}^{GPT}) \to 0, \qquad \mathrm{MSE}(\hat{\mu}^{GPT}) \to \frac{\sigma^2}{n}, \qquad \text{as } \mu \to \pm\infty.$$
Hence we conclude that $\mathrm{Bias}(\hat{\mu}^{GPT})$ and $\mathrm{MSE}(\hat{\mu}^{GPT})$ are bounded over the parameter space, provided that $E\{q(X)Z_n \mid B^*\}$ is continuous in $\lambda$ and $\alpha_1 > 0$. The proof of Theorem 4 is available in the Supplementary Material.

Similar to Sec. 3, we provide the results for the type I shrinkage pretest estimator $\hat{\mu}^{SPT_1}$.

Corollary 3. The type I shrinkage pretest estimator $\hat{\mu}^{SPT_1}$ follows a mixture distribution
$$\Pr(\hat{\mu}^{SPT_1} \le x) = p_{D^*}\Pr(\bar{X}_n \le x \mid D^*) + p_{E^*}\Pr\{\theta + c(\bar{X}_n - \theta) \le x \mid E^*\},$$
where $D^* \equiv \{|T_n| > t_{\alpha/2,\,n-1}\}$ and $E^* \equiv \{|T_n| \le t_{\alpha/2,\,n-1}\}$. The conditional densities of $\bar{X}_n$ on $D^*$ and of $\theta + c(\bar{X}_n - \theta)$ on $E^*$ take the same forms as above with $\alpha_1 = \alpha$ and $\alpha_2 = 1$, respectively. In addition, new expressions for $\mathrm{Bias}(\hat{\mu}^{SPT_1})$ (Eq. (6)) and $\mathrm{MSE}(\hat{\mu}^{SPT_1})$ (Eq. (7)) follow from Theorem 4.

The results for the classical pretest estimator $\hat{\mu}^{PT}$ can be obtained by setting $c = 0$. Corollary 3 gives new expressions for the bias and MSE of the type I shrinkage pretest estimator. Khan and Saleh (2001) applied the lemmas in Appendix B of Judge and Bock (1978) to derive
$$\mathrm{Bias}(\hat{\mu}^{SPT_1}) = -\frac{\sigma}{\sqrt{n}}(1 - c)\lambda\, G_{(3,\,n-1),\,\lambda^2}\big(F_{\alpha,(1,\,n-1)}/3\big), \qquad (8)$$
$$\mathrm{MSE}(\hat{\mu}^{SPT_1}) = \frac{\sigma^2}{n}\Big[1 - (1 - c^2)\,G_{(3,\,n-1),\,\lambda^2}\big(F_{\alpha,(1,\,n-1)}/3\big) + (1 - c)\lambda^2\big\{2\,G_{(3,\,n-1),\,\lambda^2}\big(F_{\alpha,(1,\,n-1)}/3\big) - (1 + c)\,G_{(5,\,n-1),\,\lambda^2}\big(F_{\alpha,(1,\,n-1)}/5\big)\big\}\Big], \qquad (9)$$
where $F_{p,(k_1,k_2)}$ is the upper $p$-th quantile of the F-distribution with $(k_1, k_2)$ degrees of freedom for $0 < p < 1$ and $G_{(k_1,k_2),\,\xi}$ is the c.d.f. of the noncentral F-distribution with $(k_1, k_2)$ degrees of freedom and noncentrality parameter $\xi \ge 0$. If $\xi = 0$, the noncentral F-distribution reduces to the usual F-distribution (p. 27, Shao 2003).
Unfortunately, we do not have special expressions for $G_{(3,\,n-1),\,\lambda^2}$ and $G_{(5,\,n-1),\,\lambda^2}$ that would allow us to retrieve our new expressions (Eq. (6) and Eq. (7)) from the previous results (Eq. (8) and Eq. (9)) as we did in Sec. 3. However, we provide numerical comparisons in Table 1 that illustrate the equivalence.
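The existing expressions Eq. (8) and Eq. (9) are easy to evaluate with the noncentral F distribution available in R, which gives one way to reproduce a comparison in the spirit of Table 1; the Monte Carlo row below is our own check rather than the paper's table.

```r
# Khan and Saleh (2001) expressions, Eq. (8) and Eq. (9), for the type I shrinkage pretest
# estimator with unknown variance, evaluated with the noncentral F c.d.f. in R.
spt1_bias_mse_ncF <- function(lambda, alpha, cc, n, sigma = 1) {
  Fq <- qf(1 - alpha, df1 = 1, df2 = n - 1)                      # F_{alpha,(1,n-1)}
  G3 <- pf(Fq / 3, df1 = 3, df2 = n - 1, ncp = lambda^2)         # G_{(3,n-1),lambda^2}(F/3)
  G5 <- pf(Fq / 5, df1 = 5, df2 = n - 1, ncp = lambda^2)         # G_{(5,n-1),lambda^2}(F/5)
  bias <- -sigma / sqrt(n) * (1 - cc) * lambda * G3
  mse  <- sigma^2 / n * (1 - (1 - cc^2) * G3 +
                           (1 - cc) * lambda^2 * (2 * G3 - (1 + cc) * G5))
  c(bias = bias, mse = mse)
}

# Monte Carlo check under the same configuration
set.seed(1)
n <- 10; lambda <- 1; sigma <- 1; theta <- 0; alpha <- 0.05; cc <- 0.5
mu  <- theta + lambda * sigma / sqrt(n)
est <- replicate(10000, {
  x  <- rnorm(n, mu, sigma)
  Tn <- sqrt(n) * (mean(x) - theta) / sd(x)
  mean(x) - (1 - cc) * (mean(x) - theta) * (abs(Tn) <= qt(1 - alpha / 2, n - 1))
})
rbind(formula     = spt1_bias_mse_ncF(lambda, alpha, cc, n, sigma),
      monte_carlo = c(bias = mean(est) - mu, mse = mean((est - mu)^2)))
```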
We also provide the results for the positive-part Stein-rule estimator $\hat{\mu}^{SR+}$.

Corollary 4. The positive-part Stein-rule estimator $\hat{\mu}^{SR+}$ follows a mixture distribution
$$\Pr(\hat{\mu}^{SR+} \le x) = p_{D^*}\Pr\{\theta + (1 - t_{\alpha/2,\,n-1}/|T_n|)(\bar{X}_n - \theta) \le x \mid D^*\} + p_{E^*}\, I(\theta \le x),$$
where $D^*$ and $E^*$ are defined in Corollary 3. In addition, closed-form expressions for $\mathrm{Bias}(\hat{\mu}^{SR+})$ and $\mathrm{MSE}(\hat{\mu}^{SR+})$ are obtained.

Corollary 4 reveals that $\hat{\mu}^{SR+}$ shrinks $\bar{X}_n$ toward $\theta$ by adding or subtracting the random variable $t_{\alpha/2,\,n-1}S_n/\sqrt{n}$ on the event $D^*$. The proof of Corollary 4 is given in the Supplementary Material.

Table 2. Simulation results on the bias and MSE of the pretest estimators under different departure constants $\lambda$ with $n = 10$, $\alpha = \alpha_1 = 0.0001$, $\alpha_2 = 0.001$, $\sigma^2 = 1$, and $c = 0.5$. Note: *Theoretical values are based on Theorem 4 and Corollaries 3-4. **Simulated values are based on $R = 10{,}000$ simulation runs.

Simulations
This section carries out Monte Carlo simulation studies to examine our theoretical results. Since the simulation results for the cases of known and unknown variance are similar, we only consider the latter. We generate i.i.d. data $X_i \sim N(\mu, 1)$, $i = 1, \ldots, n$. To control the departure constant $\lambda$, we set $\mu = (\lambda + 1)/\sqrt{n}$ and $\theta = 1/\sqrt{n}$ so that $\sqrt{n}(\mu - \theta) = \lambda$. Based on the generated data, we compute the pretest estimators $\hat{\mu}^{PT}$, $\hat{\mu}^{SPT_1}$, $\hat{\mu}^{GPT}$, and $\hat{\mu}^{SR+}$. Then, we compare our theoretical values of the bias and MSE with their simulated values
$$\widehat{\mathrm{Bias}}(\hat{\mu}^{*}) = \frac{1}{R}\sum_{j=1}^{R}\hat{\mu}^{*}_j - \mu, \qquad \widehat{\mathrm{MSE}}(\hat{\mu}^{*}) = \frac{1}{R}\sum_{j=1}^{R}(\hat{\mu}^{*}_j - \mu)^2,$$
respectively, where $R$ is the number of simulation runs, $\hat{\mu}^{*}$ is one of the aforementioned pretest estimators, and $\hat{\mu}^{*}_j$ is the estimate in the $j$-th simulation run. We consider departure constants $\lambda = 0.1, 0.5, 1, 2, 5$ and set the tuning parameters of the pretest estimators to $\alpha = \alpha_1 = 0.0001$, $\alpha_2 = 0.001$, and $c = 0.5$. These tuning parameters are chosen to match those used later in the real data analysis. Our simulation results are based on sample size $n = 10$ with $R = 10{,}000$ simulation runs.
Furthermore, we also consider a sensitivity analysis when the data are contaminated. We generate i.i.d. data from a contaminated normal distribution given by $w_i N(\mu, 10) + (1 - w_i)N(\mu, 1)$, $i = 1, \ldots, n$, where $w_i$ is a Bernoulli random variable with $\Pr(w_i = 1) = 1 - \Pr(w_i = 0) = \varepsilon$ being the contamination probability. This setting yields contamination in both the lower and upper tails. Similarly, we compute the pretest estimators based on the generated contaminated data and then compare the theoretical values of the bias and MSE with their simulated values. We consider contamination probabilities $\varepsilon = 0.1, 0.2, 0.3, 0.4, 0.5$ and fix the departure constant $\lambda = 0.1$. The other tuning parameters for the pretest estimators remain the same. These simulation results are also based on sample size $n = 10$ with $R = 10{,}000$ simulation runs.

Table 3. Sensitivity analysis on the bias and MSE of the pretest estimators under different contamination probabilities $\varepsilon$ with $\lambda = 0.1$, $n = 10$, $\alpha = \alpha_1 = 0.0001$, $\alpha_2 = 0.001$, $\sigma^2 = 1$, and $c = 0.5$.

Table 2 reveals that the simulated bias and MSE of all the pretest estimators are very close to their theoretical values under all levels of $\lambda$. This result supports the correctness of our theoretical derivations of the bias and MSE. Table 3 shows that the simulated bias and MSE of $\hat{\mu}^{SPT_1}$ deviate from their theoretical values as the contaminated distribution departs from the normal distribution. Interestingly, we also find that the contamination in the lower and upper tails has only minor effects on $\hat{\mu}^{PT}$, $\hat{\mu}^{GPT}$, and $\hat{\mu}^{SR+}$. This may be due to the unbiasedness of $\bar{X}_n$ under the contamination. Since we set the parameters $\lambda = 0.1$ and $\alpha = \alpha_1 = 0.0001$, these pretest estimators strongly shrink the estimates toward $\theta$, which is close to the true value $\mu$.
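A condensed R sketch of the sensitivity experiment is given below; it is our own abbreviated version, not the authors' simulation code, and it reports the simulated bias and MSE of $\hat{\mu}^{SPT_1}$ under increasing contamination.

```r
# Sensitivity experiment: contaminated normal w*N(mu, 10) + (1-w)*N(mu, 1), variance 10 in the
# contaminated component, w ~ Bernoulli(eps).
set.seed(1)
n <- 10; lambda <- 0.1; alpha <- 0.0001; cc <- 0.5; R <- 10000
theta <- 1 / sqrt(n); mu <- (lambda + 1) / sqrt(n)      # so that sqrt(n) * (mu - theta) = lambda

simulate_spt1 <- function(eps) {
  est <- replicate(R, {
    w  <- rbinom(n, 1, eps)
    x  <- rnorm(n, mu, sd = ifelse(w == 1, sqrt(10), 1))
    Tn <- sqrt(n) * (mean(x) - theta) / sd(x)
    mean(x) - (1 - cc) * (mean(x) - theta) * (abs(Tn) <= qt(1 - alpha / 2, n - 1))
  })
  c(eps = eps, bias = mean(est) - mu, mse = mean((est - mu)^2))
}

t(sapply(c(0, 0.1, 0.2, 0.3, 0.4, 0.5), simulate_spt1))   # eps = 0 included as an uncontaminated baseline
```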

Real data example
To demonstrate the application of the proposed class of estimators, we analyze the gene expression data from the microarray experiments of colon tissue samples of Notterman et al. (2001). The data consist of 7457 gene expressions for $n = 18$ patients on both tumor tissue and normal tissue. The main purpose of the experiments is to select genes that are associated with adenocarcinoma and to study how the expression values differ between tumor and normal tissue. The data have been analyzed by a number of authors, including Cipolli, Hanson, and McLain (2016), who used multiple tests with a nonparametric Bayesian approach. The R codes for the data analysis are available from the Supplementary Material.

For a gene in the $i$-th patient, the paired gene difference is denoted as $X_i = \mathrm{tumor}_i - \mathrm{normal}_i$, $i = 1, 2, \ldots, 18$. The mean gene difference is denoted by $\mu = E(X_i)$. We set the uncertain non-sample prior information $\theta = 0$ since the majority of genes may not be associated with the disease of interest. For each gene, we examined the normality assumption for the paired gene differences $X_i$ by using the Shapiro-Wilk test. Under the significance level of 0.05, the overall rejection rate of the normality tests across all genes is 0.228. Hence the normality assumption may be suitable for the majority of the genes. The normality of the tumor and normal groups was also examined separately by Feng, Zhang, and Liu (2020).

We first estimate the mean gene difference $\mu$ by the classical pretest estimator
$$\hat{\mu}^{PT} = \bar{X}_n\, I(|T_n| > t_{\alpha/2,\,n-1}),$$
where $\bar{X}_n = \sum_{i=1}^{n} X_i/n$ and $S_n^2 = \sum_{i=1}^{n}(X_i - \bar{X}_n)^2/(n-1)$. In order to select genes associated with the disease, it is usually recommended to choose a very small significance level. Here, we set the significance level $\alpha = 0.0001$ and found 304 genes with $\hat{\mu}^{PT} = \bar{X}_n$, while the remaining genes have $\hat{\mu}^{PT} = 0$. For those 304 genes associated with adenocarcinoma, it is of interest to see how the mean gene difference deviates from zero.
On the other hand, $\hat{\mu}^{PT} = 0$ implies that we do not intend to estimate the mean gene difference. In fact, the classical pretest estimator is essentially the sample mean for those genes rejected by the t-tests. However, the MSE of $\hat{\mu}^{PT}$ has to account for the process of gene selection by pretests.
If the quantities of the mean gene differences are also of interest for the genes not rejected by the t-tests, one may use the other pretest and shrinkage estimators. We computed the five estimators $\hat{\mu}^{S*}$, $\hat{\mu}^{SPT_1}$, $\hat{\mu}^{SPT_2}$, $\hat{\mu}^{GPT}$, and $\hat{\mu}^{SR+}$ that belong to the proposed class. For $\hat{\mu}^{S*}$, $\hat{\mu}^{SPT_1}$, and $\hat{\mu}^{SPT_2}$, we set the probability function $q(X) = 0.5$, corresponding to 50% shrinkage of the sample mean toward zero. For $\hat{\mu}^{SR+}$, we set the probability function $q(X) = 1 - t_{\alpha/2,\,n-1}/|T_n|$. The significance levels for $\hat{\mu}^{S*}$, $\hat{\mu}^{SPT_1}$, $\hat{\mu}^{SPT_2}$, and $\hat{\mu}^{SR+}$ are all set to $\alpha = 0.0001$. For $\hat{\mu}^{GPT}$, we set the probability function $q(X) = 0.5$ with $\alpha_1 = 0.0001$ as before, but set another significance level $\alpha_2 = 0.001$ to pick out genes that have moderate effects on the disease. This significance level is designed to allow some, but not too many, false positives (Simon 2003; Chen et al. 2007; Emura et al. 2018; Emura, Matsui, and Chen 2019a; Emura, Matsui, and Rondeau 2019b). Table 4 summarizes the estimation results together with the P-values of the t-tests. The estimate for the gene D00306 (the 9-th gene out of 7457 genes) is zero under the classical pretest estimator ($\hat{\mu}^{PT} = 0$). On the other hand, it is nonzero under the general pretest estimator ($\hat{\mu}^{GPT} = 0.5 \times \bar{X}_n = -7.85$). This is the effect of picking up the moderate effect of this gene; instead of shrinking the mean to zero, it shrinks the mean by half. We observe that $\hat{\mu}^{SR+}$ and $\hat{\mu}^{SPT_2}$ select the same genes and then strongly shrink the estimates toward zero. These nearly zero estimates do not seem to provide good estimates of the mean difference, though they might perform well in terms of the MSE around $|\mu| \approx 0$. The two shrinkage estimators $\hat{\mu}^{S*}$ and $\hat{\mu}^{SPT_1}$ exhibit similar values and are nonzero for all the genes. Although they provide more information than the zero estimates, the practical value of these shrinkage estimators may be limited within this case study.
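To illustrate how such a per-gene analysis can be organized in R, the sketch below applies the classical and general pretest estimators to each row of a matrix of paired differences; the matrix diffs is simulated placeholder data, not the Notterman et al. (2001) dataset, and the function is our own.

```r
# Per-gene pretest estimation on a matrix of paired differences (rows = genes, columns = patients).
# Placeholder data: 'diffs' is simulated here and is NOT the Notterman et al. (2001) dataset.
set.seed(1)
n_genes <- 1000; n <- 18
diffs <- matrix(rnorm(n_genes * n), nrow = n_genes)
diffs[1:30, ] <- diffs[1:30, ] + 2                       # a few genes with a genuine mean difference

pretest_by_gene <- function(x, theta = 0, alpha1 = 1e-4, alpha2 = 1e-3, cc = 0.5) {
  n    <- length(x)
  xbar <- mean(x)
  Tn   <- sqrt(n) * (xbar - theta) / sd(x)
  t1   <- qt(1 - alpha1 / 2, n - 1)
  t2   <- qt(1 - alpha2 / 2, n - 1)
  c(PT  = theta + (xbar - theta) * (abs(Tn) > t1),                                        # classical pretest
    GPT = theta + (xbar - theta) * ((abs(Tn) > t1) + cc * (abs(Tn) > t2 & abs(Tn) <= t1)))
}

res <- t(apply(diffs, 1, pretest_by_gene))
colSums(res != 0)     # number of genes with a nonzero estimate under each rule
```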
To illustrate our theoretical results, we use the gene D00306 (the 9-th gene) to evaluate the MSE of the five estimators $\hat{\mu}^{S*}$, $\hat{\mu}^{SPT_1}$, $\hat{\mu}^{SPT_2}$, $\hat{\mu}^{GPT}$, and $\hat{\mu}^{SR+}$ by using the formulas provided in Sec. 4. To do so, we assume that the true variance $\sigma^2 = \mathrm{var}(X_i)$ is equal to the sample variance $S_n^2 = 223.63$. Figure 5 reveals that $\hat{\mu}^{SPT_1}$ produces the smallest MSE among these estimators except near $|\mu| \approx 0$; however, it does not have the gene selection ability mentioned before. For the estimators having the selection ability, i.e., $\hat{\mu}^{PT}$, $\hat{\mu}^{GPT}$, $\hat{\mu}^{SPT_2}$, and $\hat{\mu}^{SR+}$, all of them behave similarly to the restricted estimator $\theta$ over a large range of $|\mu|$ due to the small significance level $\alpha = \alpha_1 = 0.0001$. Figure 5 also suggests that $\hat{\mu}^{GPT}$ may be the best choice for simultaneous estimation and selection based on the MSE criterion. Again, it should be emphasized that the MSE of the sample mean is not a valid benchmark here because it does not account for the selection process.

Concluding remarks
In this paper, we propose a class of general pretest estimators that incorporate non-sample prior information. By introducing a randomized test, the proposed class is extremely flexible and includes many well-known estimators as special cases. The probability function $q(\cdot)$ is treated as a shrinkage parameter, and hence, the proposed class of estimators is non-randomized. To the best of our knowledge, pretest estimators with randomized tests have not been considered in the literature; the existing pretest estimators are based on non-randomized tests (Giles and Giles 1993; Ahmed 2014). Theoretical properties of the proposed class are investigated, such as the distribution function, bias, and MSE. Our newly obtained expressions for the bias and MSE can be computed numerically more easily than the existing ones in the literature.

Figure 5. The estimated MSE of the five estimators $\hat{\mu}^{PT}$, $\hat{\mu}^{GPT}$, $\hat{\mu}^{SPT_1}$, $\hat{\mu}^{SPT_2}$, and $\hat{\mu}^{SR+}$, with $\alpha = \alpha_1 = 0.0001$, $\alpha_2 = 0.001$, and $c = 0.5$ for the gene D00306 (the 9-th gene).
As a by-product of the proposed class of estimators, a new derivation of the positive-part Stein-rule estimator is obtained: it is a randomized pretest estimator. In the literature, there is no formal derivation of the positive-part Stein-rule estimator as a pretest estimator, though the former is known to be a smoothed version of the latter. Note that such a smoothing argument is ad hoc since there are a number of different ways to smooth the test function. The randomized test gives an alternative framework to introduce the positive-part Stein-rule estimator. This idea has not been proposed in the literature.
The proposed class carries the sampling properties of pretest estimators, yielding a sampling distribution that is a mixture of three distributions (Theorem 3). Thus, the expression for the MSE is complex. In addition, optimizing the MSE under the usual minimax criterion may lead to unreasonable or trivial results. Hence it may only be suitable to optimize the MSE function numerically under a further criterion such as minimax regret (Sawa and Hiromatsu 1973; Magnus 2000). If the class were restricted to a deterministic weighted sum of the sample mean and its non-sample estimate, the MSE optimization would be possible (e.g., Vishwakarma and Kumar 2015). Therefore, the main application of the MSE function is numerical assessment, as in Khan and Saleh (2001), which provides insights for real applications (see Figure 5). In this respect, we emphasize some computational advantages of our newly obtained MSE expressions in Eq. (2) and Eq. (3) over the existing expressions.
The proposed class of estimators has tuning parameters, such as the significance levels $\alpha_1$ and $\alpha_2$, and the arbitrary form of the probability function $q(\cdot)$. This provides remarkable flexibility for real applications. For instance, in our application to high-dimensional gene expressions (Sec. 6), we set the probability function $q(X) = 0.5$ with two significance levels: $\alpha_1 = 0.0001$ to pick out potentially informative genes, and $\alpha_2 = 0.001$ to pick out genes that have moderate effects. The choice of $\alpha_2 = 0.001$ is suggested in analyses of gene expression data (Simon 2003; Emura et al. 2018; Emura, Matsui, and Rondeau 2019b) to allow some, but not too many, false positives. The availability of such existing criteria is often helpful for deciding the tuning parameters of the proposed class in applications. In some cases, one may set the probability function $q(X) = (\alpha - \alpha_1)/(\alpha_2 - \alpha_1)$ under the constraint $\alpha_1 < \alpha < \alpha_2$ to obtain a size-$\alpha$ test.
The idea of utilizing a randomized test may also be applied to other pretest estimators, especially in regression analysis. Examples include the pretest estimator for the intercept with some prior knowledge about the slope in a simple linear regression model (Khan, Hoque, and Saleh 2005; Waldl 2010; Shih et al. 2021) and the so-called specification pretest estimator (Hausman 1978; Gourieroux and Trognon 1984). In fact, Gourieroux and Trognon (1984) derived expressions for the bias and MSE of the specification pretest estimator for one regression parameter, which are similar to our expressions.