Bias-corrected estimators for the proportion of true null hypotheses: application to adaptive FDR control in segmented failure data

ABSTRACT Two recently introduced model-based bias-corrected estimators for the proportion of true null hypotheses (π₀) under a multiple hypothesis testing scenario have been restructured for random observations, available for each of the common hypotheses, under a suitable failure model. Based on stochastic ordering, a new motivation behind the formulation of some related estimators for π₀ is given. The reduction of bias for the model-based estimators is theoretically justified, and algorithms for computing the estimators are presented. The estimators are also used to formulate a popular adaptive multiple testing procedure. An extensive numerical study supports the superiority of the bias-corrected estimators. The necessity of a proper distributional assumption for the failure data in the context of the model-based bias-corrected method is highlighted. A case study with a real-life dataset arising in reliability and warranty studies demonstrates the applicability of the procedure under a non-Gaussian setup. The results obtained are in line with the intuition and experience of the subject expert. The concluding discussion also indicates the future scope of the study.


Introduction
The current work considers a segmented failure dataset, where the failure time, or some similar entity, of a particular component is available for a number of units, but the units are operated or tested under different conditions that may vary over space and time. Thus, the dataset is divided into several segments and observations are available for each segment. The number of observations per segment (of the order of tens or hundreds) might be much less than the number of segments (of the order of hundreds or thousands), as the segmentation is done on the basis of time and space, among other things. Thus the situation is quite similar to that of microarray datasets, where thousands of genes are tested to identify differentially expressed genes based on the gene expression levels of two small groups of subjects, viz. a treatment group and a control group. For a segmented failure dataset, similar questions may arise regarding the identification of segment(s) for which the failure pattern of that particular component is strikingly different (worse or better) from a benchmark, say the average. To answer this question, appropriate hypotheses for each segment are framed and tested simultaneously. While testing a large number of hypotheses, control over the false discovery rate (FDR) [1] is desirable, and the classical Benjamini-Hochberg algorithm [1] might be employed to achieve it. However, the power of the Benjamini-Hochberg algorithm, or of any general step-up procedure, can be improved by incorporating a conservative estimate of the proportion of true null hypotheses (π₀), or equivalently of the number of true null hypotheses [2,20].
The Gaussian model assumption for failure data is inappropriate, especially when the sample size corresponding to each segment, or equivalently each test, is small. On the contrary, in such a situation, the exponential distribution may be a reasonable primary model choice. Under such an exponential setup, we modify both the estimators proposed in Cheng et al. [6] and Biswas [3] and find that these model-based estimators are more efficient in practice than the existing π₀-estimators. Application of the adaptive Benjamini-Hochberg procedure then has the ability to list the significantly different segments with respect to such time-to-event or equivalent entities of a certain component in our case study. For microarray datasets, the model-based approach is well established, especially under the normality assumption [3,6]. In this article, we adapt and implement the same under the exponential model for a segmented failure dataset, where such a model assumption is appropriate and an alternative model formulation may not be satisfactory to the desired extent. In what follows we introduce the parameter π₀ through the empirical Bayesian setup given in Storey [22].
Consider m similar but independent hypotheses to be tested, viz. H₁, H₂, . . . , Hₘ. For Hᵢ = 1 the ith null hypothesis is true, and for Hᵢ = 0 it is false, for any i ∈ {1, 2, . . . , m}. Thus, the Hᵢ's are Bernoulli random variables with success probability π₀ ∈ (0, 1). Let m₀ be the number of true null hypotheses. Thus, m₀ = Σᵢ₌₁ᵐ Hᵢ is a binomial random variable with index m and parameter π₀. Clearly, the Hᵢ's, and hence m₀, remain latent and can never be realized in a given multiple testing scenario. As in the case of a single hypothesis testing problem, the test statistics T₁, T₂, . . . , Tₘ, respectively for H₁, H₂, . . . , Hₘ, may be observed. For F₀ being the common distribution of Tᵢ | Hᵢ = 1 and F₁ being the same for Tᵢ | Hᵢ = 0, a two-component mixture model for Tᵢ is F(t) = π₀ F₀(t) + (1 − π₀) F₁(t). Thus, π₀ may be thought of as the mixing proportion of the null test statistics with the non-null test statistics when multiple tests are performed. In the existing literature p-values are considered as test statistics, since their use ensures a similar nature of the critical region irrespective of the nature of the hypotheses framed. Usually, a little abuse of notation is made in denoting a p-value by p irrespective of whether it is a random variable or a realization of it.
The distinction of usage ought to be understood as the situation demands. The marginal density function of the p-value [13] is f(p) = π₀ f₀(p) + (1 − π₀) f₁(p), where f₀ and f₁ are the two p-value densities under the null and alternative hypotheses, respectively. When the tested null is simple and the corresponding test statistic is absolutely continuous, f₀(p) is simply 1, the density function of a uniform random variable over (0, 1), and the p-value under the alternative hypothesis is stochastically smaller than the uniform variate. In addition, the density estimation-based approaches for estimating π₀ impose certain restrictions on f₁ [9,13,17]. Often p-values under the alternative are modelled by parametric distributions [15,19] and π₀ is estimated using maximum likelihood methods. This requires the p-values to be independent among themselves, which is rarely satisfied. Storey's estimator [22] is constructed on the basis of a tuning parameter λ ∈ (0, 1) such that f₁(p) = 0 for p > λ. This assumption introduces a conservative bias in the estimator that can be corrected, or in practice reduced, as discussed in Cheng et al. [6]. The setup given therein for the applicability of the Gaussian model-based bias correction is discussed in Section 2. Biswas [3] has recently proposed an alternative model-based bias-corrected estimator for π₀ under the same setup. A comparative performance study of both the estimators with simulated microarray datasets has also been provided.
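The two-component mixture structure is easy to visualize by simulation. In the sketch below, null p-values are drawn from Uniform(0, 1) and, purely as an illustrative stand-in for f₁ (an assumption, not the exponential model developed later), non-null p-values from a stochastically smaller Beta(0.2, 1) density; the near-flat right tail of the resulting mixture then recovers π₀ approximately, which is exactly what λ-threshold estimators exploit.

```python
import numpy as np

rng = np.random.default_rng(7)
m, pi0 = 10_000, 0.8                      # pi0 = mixing proportion of nulls

# Null p-values are Uniform(0, 1); Beta(0.2, 1) is an illustrative,
# stochastically smaller stand-in for the non-null density f1.
is_null = rng.random(m) < pi0
p = np.where(is_null, rng.random(m), rng.beta(0.2, 1.0, m))

# Beyond a threshold lambda, the uniform (null) component dominates.
lam = 0.8
est = (p > lam).mean() / (1 - lam)        # crude, slightly conservative pi0 estimate
print(round(est, 3))
```

The estimate sits a little above the true π₀ = 0.8 because a small fraction of non-null p-values still lands beyond λ; that residual mass is precisely the bias the corrected estimators of this article target.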
There are several other works on the estimation of π₀ not directly related to the current work. The interested reader is referred to Storey and Tibshirani [23], Wang et al. [26] and Tong et al. [25]. The remaining part of the article is structured as follows. In Section 2, we reproduce Storey's estimator and the recently introduced bias-corrected estimators from a stochastic ordering approach, which ties them together and may inspire further work along similar lines. The next section is devoted to different testing scenarios and useful properties of the respective non-null p-values. In Section 4, we briefly revisit the estimation algorithms and discuss the adaptation of the π₀ estimates to the Benjamini-Hochberg algorithm. Section 5 deals with the performance comparison of the new estimators with existing ones through an extensive simulation experiment. In Section 6, a camouflaged real-life segmented failure dataset is presented, validated for the applicability of the proposed methods, and analysed to demonstrate the superior performance of the adaptive algorithm with the new estimators, along with proper justification of the findings. We conclude the article with a mention of a few limitations of the present work and a glimpse of the future direction of the study.

Methods of estimation
Let p denote a p-value corresponding to a simple null hypothesis testing problem with a continuous test statistic. Thus, p has the support (0, 1). Consider another random variable V on the same support (0, 1) with distribution function G. Then, P(p > V) = π₀ P(p > V | H = 1) + (1 − π₀) P(p > V | H = 0). In the following subsections, we take different choices for G and motivate the different estimators for π₀ mentioned in Section 1.

Storey's bootstrap estimator and related approaches
Consider V to be degenerate at some λ ∈ (0, 1). Thus, P(p > V) = P(p > λ) = F̄(λ). Putting (2) and (4) in (3), we obtain F̄(λ) = π₀(1 − λ) + (1 − π₀)Q(λ), where F is the distribution function of p, F̄ = 1 − F, and Q is the survival function of the non-null p-value. Assume:
• A1: For an appropriate choice of λ, Q(λ) = P(p > λ | H = 0) = 0, i.e. the probability of the non-null p-value being greater than λ equals zero [22].
When the parameter of interest under the alternative hypothesis is substantially far from the value specified under the null hypothesis, or the sample size is moderate to large, the p-value tends to be smaller for consistent tests. Hence, even for a moderate choice of λ, the probability of the p-value under a false null dominating λ vanishes. This is a reasonable but crucial assumption, in the sense that violation of the assumptions regarding the true value of the parameter of interest and the sample size may not result in Q(λ) = 0. Thus, applying A1 in (5) we get F̄(λ) = π₀(1 − λ). Let p₁, p₂, . . . , pₘ be the p-values corresponding to the m hypotheses tested, or equivalently m realizations of p. Denote by W(λ) = Σᵢ₌₁ᵐ I(pᵢ > λ) (I denoting the indicator function) the number of p-values greater than λ. Putting the plug-in estimator of F̄(λ), i.e. W(λ)/m, in (6), an estimator for π₀ depending upon the choice of λ may be suggested as π̂₀(λ) = W(λ)/{m(1 − λ)}. For a given dataset, two different choices of λ would yield two different estimates, and thus an optimum choice of λ for a given dataset is necessary. For a subjectively chosen set of possible values λ ∈ Λ, where Λ = {0, 0.05, 0.10, . . . , 0.95}, a bootstrap routine is given in Storey [22] and Storey et al. [24] to approximate the best λ. Thus, Storey's bootstrap estimator is π̂₀ᴮ = π̂₀(λ_best). In Storey and Tibshirani [23], a natural cubic spline has been fitted to the (λ, π̂₀(λ)) curve for smoothing, and the evaluated value of the fit at λ = 1 (as motivated in Corollary 1 of [22]) is taken as the final estimate, which we denote by π̂₀ᴾ. For a small choice of λ in π̂₀(λ), the bias of the estimator is large while the variance is small. The situation is exactly opposite for large λ. This was first noted by Jiang and Doerge [12], who suggested the use of multiple λ's instead of a single best choice, in some sense. For the time being assume a fixed set S_λ = {(λ₁, λ₂, . . . , λ_k) : 0 < λ₁ < λ₂ < · · · < λ_k < 1} for a fixed k and equal widths (λᵢ₊₁ − λᵢ) for i = 1, 2, . . . , k − 1, such that A1 holds. Then, the average-estimate-based approach suggests π̂₀ᴬ = (1/k) Σᵢ₌₁ᵏ π̂₀(λᵢ) as an appropriate estimator for π₀. The authors have also suggested a change-point-based algorithm to select S_λ.
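A minimal sketch of π̂₀(λ) = W(λ)/{m(1 − λ)} and of the multiple-λ average follows. The λ-grid and the Beta alternative below are illustrative assumptions, and the bootstrap selection of λ_best is omitted:

```python
import numpy as np

def storey_pi0(p, lam):
    """Storey's plug-in estimator: W(lambda) / (m * (1 - lambda)),
    clipped at 1 so the estimate stays in the parameter space."""
    p = np.asarray(p)
    return min(1.0, (p > lam).mean() / (1.0 - lam))

def average_pi0(p, lambdas=(0.2, 0.3, 0.4, 0.5)):
    """Jiang-Doerge style average over a fixed grid of lambdas
    (this grid is an illustrative choice, not the S_lambda of the paper)."""
    return float(np.mean([storey_pi0(p, lam) for lam in lambdas]))

rng = np.random.default_rng(1)
p = np.concatenate([rng.random(800),             # 800 null p-values
                    rng.beta(0.1, 1.0, 200)])    # 200 stochastically small non-nulls
print(storey_pi0(p, 0.5), average_pi0(p))
```

Both estimates land somewhat above the true π₀ = 0.8 here, illustrating the conservative bias that the next subsection corrects.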

Bias correction of Storey's estimator
Without the assumption A1, from (5) we get, for fixed λ, π₀ = {F̄(λ) − Q(λ)}/{(1 − λ) − Q(λ)}. Cheng et al. [6] obtained (8) from a somewhat different motivation. Substituting the plug-in estimator of F̄(λ) has already been discussed in Section 2.1. For estimating Q(λ), the following assumptions are necessary.
• A2: The availability of a common test for all the m hypotheses.
• A3: The data-arrays used for each test are generated from a known parametric family.
• A4: The closed-form distribution of the test statistics under the null is of a known family, enabling the calculation of exact p-values.
• A5: The distributions of the non-null test statistics, and hence the non-null p-values, are labelled by unknown effect sizes, which are different for each test.
A2 is generally true for microarray experiments and is also appropriate for the present setup. Cheng et al. [6] assumed normality for each expression level, so that A3 is valid there. Here, times to event, or equivalent entities, for each segment are assumed to be exponentially distributed, thus satisfying A3. Under normality, the test statistics for the usual single-sample or two-sample tests for the mean are normal under the null. In this work, the test statistic for the single-sample test related to the exponential rate parameter is a χ² variate under the null, and for the two-sample problem the test statistic is distributed as an F variate. Thus, A4 also holds good. As mentioned earlier, the test for Hᵢ is performed using Tᵢ, and we introduce the notation δᵢ to denote the effect size of the corresponding test. The non-null distribution of Tᵢ, and hence the non-null distribution of pᵢ, is labelled by δᵢ, i = 1, 2, . . . , m.
Hung et al. [11] have discussed properties of non-null p-values, where the non-null distribution of the p-value for the Z-test has been explored. For single-sample and two-sample t-tests, a similar discussion is available in Section 3 of Cheng et al. [6]. We will discuss such properties of the non-null p-value for single- and two-sample problems under the exponential setup in Section 3. Let I = {1, 2, . . . , m}. Also let T denote the set of indices corresponding to the originally true null hypotheses, i.e. T = {i ∈ I : Hᵢ = 1}. Thus, the cardinality of T is m₀. Similarly denote the set of originally false null hypotheses by F. Clearly, F = I − T, with cardinality m₁. Each null p-value has the same distribution, uniform over (0, 1), while the distributions of the non-null p-values are different but belong to the same family. Let f₁^{δ}(p) denote the density of p with effect size δ. Then for all i ∈ F, Q_{δᵢ}(λ) = ∫_λ^1 f₁^{δᵢ}(p) dp, the probability of the ith non-null p-value being greater than λ. Define Q(λ) = (1/m₁) Σ_{i∈F} Q_{δᵢ}(λ), the average probability of a non-null p-value being greater than λ. To estimate Q(λ), the individual δᵢ's are estimated by δ̂ᵢ, i ∈ F. In fact, the δ̂ᵢ's are strongly consistent for δᵢ for each i ∈ F. The estimation of δ under the different testing problems is discussed in Section 3. Each Q_{δᵢ}(λ) is continuous in δᵢ and thus Q_{δ̂ᵢ}(λ) is strongly consistent for Q_{δᵢ}(λ). Thus, a strongly consistent estimator for Q(λ) is Q̃(λ) = (1/m₁) Σ_{i∈F} Q_{δ̂ᵢ}(λ). In practice, F is unknown and hence Q̃(λ) is unavailable. Assume Q̂(λ) to be a dummy for Q̃(λ) such that Q̂(λ) ≤ Q̃(λ) with probability 1. The computation of Q̂(λ) is discussed in detail in Section 4. Substituting the plug-in estimators for F̄(λ) and Q(λ) in (8), we get π̃₀ᵁ(λ) (or π̂₀ᵁ(λ)), the bias-corrected estimator for π₀ with a fixed choice of λ. We now address the issues related to reduction in bias and over-correction in the following result.

Result 2.1:
With the setup and notations introduced in Section 2.2, for all λ ∈ (0, 1), π₀ ≤ E[π̂₀ᵁ(λ)] ≤ E[π̂₀(λ)]. Result 2.1 combines claims written in Section 2 and Section 4.2 of Cheng et al. [6]. We have been able to prove Result 2.1 in a more direct way. Thus, the approach reduces the conservative bias of Storey's primary estimator while refraining from over-correction.
The situations π̂₀(λ) ≤ 1 and Q̂(λ) ≤ (1 − λ) are quite usual in the multiple testing setup, as the first is a reasonable estimate of π₀ and Q̂(λ) is a consistent estimate of Q(λ), which is obviously less than (1 − λ). If these do not hold good, π̂₀ᵁ(λ) lies outside the parameter space, and then we take the estimate to be the nearest boundary point.
Λ = {0.20, 0.25, . . . , 0.50} is taken as in Jiang and Doerge [12] for a similar purpose (see Section 2.2 in [6]), and we identify the following estimator as the bias- and variance-reduced estimator for π₀: π̂₀ᵁ = (1/#Λ) Σ_{λ∈Λ} π̂₀ᵁ(λ), where # denotes cardinality.
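Given a plug-in value for Q(λ), the correction in (8) is a one-liner. In this sketch the true non-null survival probability is supplied directly, which is an idealization (Algorithm 4.1 in Section 4 instead builds Q̂(λ) from estimated effect sizes), to show the corrected estimate moving from Storey's conservative value toward the true π₀ = 0.8:

```python
import numpy as np

def bias_corrected_pi0(p, q_hat, lam):
    """Corrected estimator from Eq. (8): (Fbar - Q) / ((1 - lam) - Q),
    with the plug-in W(lam)/m for Fbar; clipped to the parameter space."""
    fbar = (np.asarray(p) > lam).mean()
    est = (fbar - q_hat) / ((1.0 - lam) - q_hat)
    return float(min(1.0, max(0.0, est)))

rng = np.random.default_rng(1)
p = np.concatenate([rng.random(800),             # nulls, so pi0 = 0.8
                    rng.beta(0.1, 1.0, 200)])    # illustrative non-nulls
lam = 0.5
q = 1.0 - 0.5 ** 0.1        # exact P(p > 0.5) under the Beta(0.1, 1) stand-in
storey = (p > lam).mean() / (1.0 - lam)
print(storey, bias_corrected_pi0(p, q, lam))
```

With Q(λ) known exactly the correction removes essentially all of the conservative bias; underestimating Q(λ), as the conservative Q̂(λ) of Section 4 does, leaves the estimate between the two values printed above.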

Estimator based on sum of all p-values
Instead of taking V degenerate at some fixed λ, assume V ∼ Uniform(0, 1). Putting G(v) = v in (3), we obtain E(p) = π₀ (1/2) + (1 − π₀) e, since p | H = 1 ∼ Uniform(0, 1). In (9), we use e to denote the expectation of the non-null p-value: e = E(p | H = 0). From (9), we get π₀ = {E(p) − e}/(0.5 − e). To estimate π₀, both E(p) and e are to be estimated. E(p) can be estimated by the mean of the observed p-values, p̄ = (1/m) Σᵢ₌₁ᵐ pᵢ. The average of expected p-values under the alternative, e, can be estimated by imitating the approach of estimating Q(λ) under assumptions A2-A5. The corresponding estimator for π₀ has recently been introduced in Biswas [3], and the computation of e_{δᵢ} = E(pᵢ | i ∈ F) has been demonstrated for single- and two-sample t-tests therein. Since each e_{δᵢ} is bounded and continuous in δᵢ, following the discussion in Section 2.2, a strongly consistent estimator for e is ẽ = (1/m₁) Σ_{i∈F} e_{δ̂ᵢ}, which cannot be realized in practice for the obvious reason mentioned earlier, and hence π̃₀ᴱ = (p̄ − ẽ)/(0.5 − ẽ) cannot be implemented. For ê being a dummy of ẽ with ê ≤ ẽ almost surely, a workable estimator for π₀ is π̂₀ᴱ = (p̄ − ê)/(0.5 − ê).

Result 2.2:
With the setup and notations introduced in Section 2.3, E(π̂₀ᴱ) ≥ π₀. The situations p̄ ≤ 0.5 and ê ≤ 0.5 are very natural in the multiple testing setup, as p̄ is consistent for E(p), which is less than 0.5, and similarly ẽ is consistent for e, which is also less than 0.5. If these do not hold good, π̂₀ᴱ lies outside the parameter space, and then we take the estimator for π₀ to be the nearest boundary point. Both the model-based bias-corrected estimators are shown to have conservative bias for estimating π₀. In Cheng et al. [6], π̂₀ᵁ has been shown to outperform the robust estimators under a reasonable model assumption, whereas under a similar situation π̂₀ᴱ outperforms it in terms of mean square error, as empirically studied in Biswas [3] through an extensive simulation study. Note that both the estimators use an initial estimator for π₀, but the computation of π̂₀ᴱ does not require a flexible threshold tuning parameter, owing to the fact that it uses all the p-values. To rule out the possibility of estimates taking values outside the parameter space under very unusual situations, π̂₀ᴱ is taken to be equal to the nearest boundary point when it lies outside the parameter space. Proofs of the results presented in this section are provided in the Appendix.
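A sketch of π̂₀ᴱ with the true non-null mean e supplied directly, again an idealization (Algorithm 4.2 in Section 4 builds ê from estimated effect sizes). Under the illustrative Beta(0.1, 1) stand-in for the non-null density, e = 0.1/1.1:

```python
import numpy as np

def sum_based_pi0(p, e_hat):
    """pi0_E = (pbar - e) / (0.5 - e), clipped to the parameter space;
    uses every p-value, so no threshold tuning parameter is needed."""
    est = (np.mean(p) - e_hat) / (0.5 - e_hat)
    return float(min(1.0, max(0.0, est)))

rng = np.random.default_rng(2)
p = np.concatenate([rng.random(800),             # nulls, so pi0 = 0.8
                    rng.beta(0.1, 1.0, 200)])    # illustrative non-nulls
e = 0.1 / 1.1                                    # E(p | H = 0) for Beta(0.1, 1)
print(sum_based_pi0(p, e))
```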

Properties of non-null p-values
To implement the bias-corrected estimators π̂₀ᵁ and π̂₀ᴱ, appropriate estimates of the unknown quantities Q(λ) and e are needed. To get explicit expressions for these quantities, we need the probability density functions f₁^{δᵢ}(p) (for notational convenience written f^{δᵢ}(p) henceforth) for each non-null p-value with effect size δᵢ, i ∈ F. The subscript i of the effect sizes is suppressed in this section for ease of notation. Thus, for the different testing scenarios, we determine the probability density function f^{δ}(p), then Q_{δ}(λ) by integrating f^{δ}(p) from λ to 1, and finally obtain e_{δ} through the following results. As discussed in Section 2.2, Q_{δ}(λ) for fixed λ and e_{δ} are continuous in δ under each of the testing problems considered here.
Result 3.1: Let X₁, X₂, . . . , Xₙ be a random sample of size n from an exponential distribution with mean θ. Consider the testing problem H₀ : θ = 1 against H₁ : θ < 1. For the corresponding likelihood ratio test, (a) δ = θ and thus δ̂ = min{1, X̄}. Here f_{χ²,ν}, F_{χ²,ν} and χ²_{p,ν} denote the probability density function, the distribution function and the upper-p point of the chi-square distribution with ν degrees of freedom, respectively.
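Under the exponential model the exact null distribution of the single-sample test statistic is pivotal: if X₁, . . . , Xₙ are i.i.d. exponential with mean θ₀, then 2ΣXᵢ/θ₀ follows a chi-square distribution with 2n degrees of freedom. A hedged sketch of the resulting one-sided p-value (the direction H₁ : θ < θ₀ matches δ̂ = min{1, X̄}; the function name and demo values are our own):

```python
import numpy as np
from scipy import stats

def exp_pvalue_less(x, theta0=1.0):
    """p-value for H0: theta = theta0 vs H1: theta < theta0, using the
    exact pivot 2*sum(x)/theta0 ~ chi-square(2n) under H0; small sums
    (short lifetimes) are evidence for the alternative."""
    x = np.asarray(x)
    return float(stats.chi2.cdf(2.0 * x.sum() / theta0, df=2 * x.size))

rng = np.random.default_rng(0)
p_null = exp_pvalue_less(rng.exponential(1.0, 30))   # null: p roughly uniform
p_alt = exp_pvalue_less(rng.exponential(0.4, 30))    # theta < 1: small p expected
print(p_null, p_alt)
```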

Result 3.2:
For X₁, X₂, . . . , Xₙ a random sample from an exponential distribution with mean θ, consider the testing problem H₀ : θ = 1 against H₁ : θ ≠ 1. For the corresponding likelihood ratio test, the notations used for stating Result 3.1 also remain relevant here. In addition to that, χ²ν(a, b) denotes the truncated chi-square distribution with ν degrees of freedom and region of truncation (a, b). Here μ denotes the median of the χ² distribution with 2n degrees of freedom.

Result 3.3:
Let X₁, X₂, . . . , X_{n₁} and Y₁, Y₂, . . . , Y_{n₂} be two random samples of sizes n₁ and n₂, respectively, from exponential distributions with means θ₁ and θ₂. Consider the testing problem of equality of θ₁ and θ₂ against a one-sided alternative. For the corresponding likelihood ratio test, we have the following.
Here f_{F,ν₁,ν₂}, F_{F,ν₁,ν₂} and F_{p,ν₁,ν₂}, respectively, denote the probability density function, the distribution function and the upper-p point of the F distribution with ν₁ and ν₂ degrees of freedom.
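For the two-sample problem the analogous pivot gives X̄/Ȳ an exact F distribution with (2n₁, 2n₂) degrees of freedom under H₀ : θ₁ = θ₂, since each 2Σ/θ term is chi-square with twice the sample size as degrees of freedom. A sketch (the one-sided direction, function name and demo values are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def exp_two_sample_pvalue(x, y):
    """One-sided p-value for H0: theta1 = theta2 vs H1: theta1 > theta2.
    Under H0, mean(x)/mean(y) ~ F(2*n1, 2*n2) exactly for exponential
    samples, so large ratios are evidence for the alternative."""
    x, y = np.asarray(x), np.asarray(y)
    return float(stats.f.sf(x.mean() / y.mean(), 2 * x.size, 2 * y.size))

rng = np.random.default_rng(4)
p_null = exp_two_sample_pvalue(rng.exponential(1.0, 25), rng.exponential(1.0, 25))
p_alt = exp_two_sample_pvalue(rng.exponential(3.0, 25), rng.exponential(1.0, 25))
print(p_null, p_alt)
```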

Result 3.4:
Consider X₁, X₂, . . . , X_{n₁} and Y₁, Y₂, . . . , Y_{n₂} to be independent random samples of sizes n₁ and n₂ from exponential distributions with means θ₁ and θ₂, respectively. Consider the testing problem H₀ : θ₁ = θ₂ against H₁ : θ₁ ≠ θ₂. For the corresponding likelihood ratio test, the notations used for Result 3.3 also remain relevant here. In addition to that, F_{ν₁,ν₂}(a, b) denotes the truncated F-distribution with degrees of freedom ν₁, ν₂ and region of truncation (a, b). Here μ denotes the median of the F distribution with (2n₂, 2n₁) degrees of freedom. Interested readers may find proofs of the results in the Appendix.

Algorithms
The algorithm for computing π̂₀ᵁ under the normal model assumption is given in Cheng et al. [6]; for π̂₀ᴱ under the same setup, see Biswas [3]. First, we reframe the algorithms under the current setup to maintain readability and to make the proposed estimation methods readily available to practitioners. For the sake of brevity we only consider the testing problem in Result 3.2 and use the corresponding non-null p-value properties here. For all four situations discussed here, the following algorithms can be implemented with obvious modifications.

Algorithm 4.1 (For computing π̂₀ᵁ):
• For each i ∈ I and each λⱼ ∈ Λ, estimate Q_{δᵢ}(λⱼ) by Q̂_{δᵢ}(λⱼ), obtained by evaluating the expression for Q_{δ}(λ) in Result 3.2 at δ = δ̂ᵢ, where nᵢ denotes the available sample size for testing the ith hypothesis.
• Using an available estimator of π₀ as the initial estimator π̂₀ᴵ, calculate d = [m × (1 − π̂₀ᴵ)], where [·] denotes the usual box (greatest integer) function. Arrange the Q̂_{δᵢ}(λⱼ)'s in increasing order and denote the ith quantity in the list by Q̂₍ᵢ₎(λⱼ). Thus a conservative estimator for Q(λⱼ) is Q̂(λⱼ) = (1/d) Σᵢ₌₁ᵈ Q̂₍ᵢ₎(λⱼ).

Algorithm 4.2 (For computing π̂₀ᴱ):
• For each i ∈ I, estimate e_{δᵢ} by ê_{δᵢ}, obtained by evaluating the expression for e_{δ} in Result 3.2 at δ = δ̂ᵢ, where nᵢ denotes the available sample size for testing the ith hypothesis and μᵢ denotes the median of the χ² distribution with 2nᵢ degrees of freedom.
• Using an available estimator of π₀ as the initial estimator π̂₀ᴵ, calculate d = [m × (1 − π̂₀ᴵ)], as before. Arrange the ê_{δᵢ}'s in increasing order and denote the ith quantity in the list by ê₍ᵢ₎. Thus a conservative estimator for e is ê = (1/d) Σᵢ₌₁ᵈ ê₍ᵢ₎.
• Given ê, calculate π̂₀ᴱ = min{1, max{0, (p̄ − ê)/(0.5 − ê)}}.
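The conservative averaging step shared by both algorithms — keep only the d = [m(1 − π̂₀ᴵ)] smallest per-test values before averaging — can be sketched generically (the function name and the d ≥ 1 guard are our own choices):

```python
import numpy as np

def conservative_average(values, pi0_init):
    """Average the d = floor(m * (1 - pi0_init)) smallest entries of the
    per-test values (the Q-hat's or e-hat's). Underestimating Q(lambda)
    or e keeps the final pi0 estimate on the conservative side."""
    values = np.sort(np.asarray(values, dtype=float))
    d = max(1, int(np.floor(values.size * (1.0 - pi0_init))))  # guard: d >= 1
    return float(values[:d].mean())

# Toy check: m = 5 per-test values, initial estimate 0.6 => d = 2 smallest kept.
print(conservative_average([0.5, 0.1, 0.3, 0.2, 0.4], 0.6))
```

Because only the smallest d values enter the average, any initial estimate π̂₀ᴵ ≥ π₀ guarantees d ≤ m₁ and hence the inequality of Note 1 below.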

Note 1:
The role of π̂₀ᴵ is important in obtaining Q̂(λ) and ê. For π̂₀ᴵ ≥ π₀, observe that m₁ ≥ d. Clearly, Q̂(λ) ≤ Q̃(λ) and ê ≤ ẽ. Thus, π̂₀ᵁ ≥ π̃₀ᵁ and π̂₀ᴱ ≥ π̃₀ᴱ.

Note 2: For the implementation of both the algorithms, we choose Storey's bootstrap estimator π̂₀ᴮ as the initial estimator. This choice seems reasonable, albeit non-universal, and further research on it is warranted. The algorithms could also be implemented with other choices of π̂₀ᴵ. The performance analysis of the bias-corrected estimators under the current setup would require an extensive simulation study, starting with different choices of the initial estimator. In fact, the algorithms could in principle be iterated several times, each time with the estimate of π₀ from the previous iteration. Obviously, this technique would become computation-intensive for all practical purposes. We refrain from addressing these issues, as they are beyond the scope of the current study.
It has already been mentioned in Section 1 that the Benjamini-Hochberg procedure for controlling the FDR is conservative. To understand this, we briefly discuss the FDR and the algorithm for controlling it at a prefixed level q ∈ (0, 1). While testing m hypotheses simultaneously, let R be the total number of hypotheses rejected by the application of a certain multiple testing algorithm. Among the rejected hypotheses, some may be originally true; these are categorized as false discoveries, and let V denote their total number. Then the false discovery proportion (FDP) is defined as FDP = V/max{R, 1}. Note that, prior to the application of any algorithm, both V and R are random variables, and the expected value of the FDP is termed the FDR. Let p₍₁₎ ≤ p₍₂₎ ≤ · · · ≤ p₍ₘ₎ be the ordered sequence of the available p-values. The Benjamini-Hochberg procedure identifies the largest k such that p₍ₖ₎ ≤ (k/m)q and rejects the hypotheses corresponding to p₍₁₎, . . . , p₍ₖ₎. This procedure is conservative, as its implementation ensures FDR = π₀q, where π₀ = m₀/m. To overcome this shortcoming, Craiu and Sun [7] worked with the following adaptive Benjamini-Hochberg procedure, which uses an approximation of π₀.
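The step-up rule and its adaptive variant can be sketched together; dividing q by an estimate π̂₀ < 1 enlarges every threshold and hence the rejection set. This sketch assumes the adaptive rule compares p₍ₖ₎ with kq/(mπ̂₀), a standard form of the adaptive BH procedure:

```python
import numpy as np

def bh_reject(pvals, q, pi0_hat=1.0):
    """Benjamini-Hochberg step-up at level q. With pi0_hat < 1 this is the
    adaptive variant: find the largest k with p_(k) <= k*q/(m*pi0_hat) and
    reject the hypotheses attached to the k smallest p-values."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= np.arange(1, m + 1) * q / (m * pi0_hat)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.nonzero(below)[0].max())
        reject[order[:k + 1]] = True
    return reject

p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.70]
print(bh_reject(p, 0.05).sum(), bh_reject(p, 0.05, pi0_hat=0.5).sum())  # 2 4
```

On this toy array the plain procedure rejects two hypotheses, while plugging in π̂₀ = 0.5 doubles every threshold and rejects four, which is exactly the power gain a less conservative π₀-estimate buys.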
Both the adaptive BH procedure and Storey's q-value approach are shown to be equivalent in Craiu and Sun [7]. They have also emphasized that both approaches require a good approximation of π₀. Less conservative estimators of π₀ are in demand, since a closer approximation of π₀ brings superiority to the adaptive procedure by increasing the number of rejections while controlling the FDR at level q, as evident from Algorithm 4.3. In the numerical study, we use the adjust.p( ) function (available in the cp4p R package) by Gianetto et al. [8] for obtaining the adjusted p-values.

Simulation study
We have conducted an extensive simulation study to investigate the performance of the bias-corrected estimators under different settings. The well-known and established estimators, apart from the proposed π̂₀ᵁ and π̂₀ᴱ, considered for performance comparison are listed below.
• π̂₀ᴮ: Storey's bootstrap estimator (discussed in Section 2.1)
• π̂₀ᴸ: Convest estimator [13]
• π̂₀ᴬ: Jiang and Doerge's average estimator (discussed in Section 2.1)
• π̂₀ᴾ: Natural cubic spline smoothing-based estimator (discussed in Section 2.1)
• π̂₀ᴴ: Histogram-based estimator [16]
• π̂₀ᴰ: A robust estimator of π₀ [18]
• π̂₀ˢ: Sliding linear model-based estimator (Wang et al. [26])

Simulation setting
We imitate a segmented time-to-event dataset to generate artificial datasets. For this purpose, we choose m = 100, 500, 1000 segments. Two different settings are considered: a balanced setting with the sample size for each segment equal to 35, and an unbalanced setting with sample sizes varying over 15, 25 and 35. For i ∈ T ⊂ {1, 2, . . . , m} we fix θᵢ = θ₀ = 1, and for the remaining cells in the array θ we generate values through a stochastic mechanism ensuring that they are not equal to θ₀. For segments with better average lifetime, θ ∼ Uniform(1, 1.5), and for segments with poor average lifetime, θ ∼ Uniform(0.5, 1). We take the proportion of better (or poor) non-null mean lifetimes to be 0.5. After generating the array of parameters θ, we generate a sample of size nᵢ from the exponential distribution with mean θᵢ for all i = 1, 2, . . . , m. Thus a dataset with m rows and a varying number of columns is generated, where each row corresponds to observations from a particular segment and, out of them, m₀ (fixed by the choice of π₀) segments originally have mean lifetime equal to θ₀ = 1. From each row of the dataset we obtain a p-value by applying the appropriate test, and construct a p-value array of length m to compute the bias-corrected estimators from Algorithms 4.1 and 4.2. The other estimators are computed using the estim.pi0 R-function (available in the cp4p library). Algorithm 4.3 also uses this array of p-values and an estimate of π₀ to identify the significantly different segments with control over the FDR.
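The balanced data-generating mechanism just described can be sketched as follows (the function name and seed are our own; the unbalanced setting only changes the per-row sample sizes):

```python
import numpy as np

def simulate_segmented_data(m=100, n=35, pi0=0.8, seed=0):
    """One balanced dataset as in Section 5.1: m0 = pi0*m true-null segments
    with theta = 1; the rest split evenly between Uniform(1, 1.5) ('better')
    and Uniform(0.5, 1) ('poor') mean lifetimes; rows are exponential samples."""
    rng = np.random.default_rng(seed)
    m0 = int(round(pi0 * m))
    m1 = m - m0
    theta = np.ones(m)
    theta[m0:m0 + m1 // 2] = rng.uniform(1.0, 1.5, m1 // 2)      # better segments
    theta[m0 + m1 // 2:] = rng.uniform(0.5, 1.0, m1 - m1 // 2)   # poor segments
    data = np.array([rng.exponential(t, n) for t in theta])      # m x n lifetimes
    return theta, data

theta, data = simulate_segmented_data()
print(data.shape, (theta == 1.0).sum())  # (100, 35) 80
```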

Simulation results
Under the different settings mentioned in Section 5.1, each experiment is repeated N = 1000 times and the estimators are compared through MSE(π̂₀) = (1/N) Σᵢ₌₁ᴺ (π̂₀ᵢ − π₀)² and Bias(π̂₀) = (1/N) Σᵢ₌₁ᴺ (π̂₀ᵢ − π₀). We also validate whether the adaptive BH algorithms using the different π₀ estimators are conservative or not by simulating FDR values as a function of π₀ for the adaptive algorithms. Here we identify the power of a multiple testing algorithm as the proportion of rejected nulls among the hypotheses in F. The non-adaptive BH algorithm and its different adaptive versions are also compared with respect to power. The comparative study for m = 100 under the balanced setting is provided in Figure 1. The results for the other simulation settings are provided in Figures 1-5 of the supplementary material. From Figure 1, it can be pointed out that the bias-corrected estimators beat the other estimators over a significant region of the parameter space (for π₀ ∈ (0, 0.6)), while π̂₀ᵁ performs only slightly better than π̂₀ᴱ, so their performance may be considered approximately equivalent. Thus, using the bias-corrected estimators for small to moderate values of π₀ brings significant improvement, while for larger values of π₀ they remain a viable alternative. For MSE, similar comments may be made. Additionally, we point out that π̂₀ᵁ substantially reduces the bias of π̂₀ᴮ over a significant portion of the parameter space. As expected, the mean squared errors of the different estimators increase with increasing m/n ratio, while the relative performance of the proposed bias-corrected estimators gets better as the same ratio increases. However, the gain from improved estimation of π₀ needs to be elaborated. Precise estimation of π₀ is used to apply the adaptive algorithm for identifying significant segments, as mentioned in Section 4. For lower to moderate values of π₀, the adaptive versions result in a substantial gain in power.
Percentage relative gains in power of the π̂₀ᵁ-adaptive BH over the non-adaptive version are 41%, 27% and 17% for π₀ = 0.2, 0.4 and 0.6, respectively. A marginal gain is observed for larger values of π₀ (8% for π₀ = 0.8). It is evident that the bias-corrected estimators outperform the others for lower to moderate values of π₀, where it really matters, as pointed out from Figure 1. For higher values of π₀, the effect of bias correction is present but to a lesser extent. When almost all the null hypotheses are true, Q(λ) and e are close to 0, so the bias correction does not work as effectively as it does for lower to moderate values of π₀. The FDRs of all the adaptive BH algorithms are seen to be controlled below 0.1, while the non-adaptive BH is the most conservative and the π̂₀ᵁ-adaptive BH is the least conservative. Similar conclusions can be drawn from the results of the other simulation settings reported in the supplementary material.

Data analysis
For the case study we have considered the real-life data (suitably camouflaged, after taking care of the data confidentiality issue) used by Gupta et al. [10] in connection with reliability and warranty studies. A detailed description of the data is available there, and we report only the part relevant to the present study. The date of failure of a particular component of automobiles, along with the mileage at failure as reflected through the odometer readings, is available. Although the entire dataset covers two disjoint geographical regions, as reported in Gupta et al. [10], the failures may be further subdivided across seven sub-regions, termed zones. Owing to the data confidentiality issue, let us number them from 1 to 7. We have considered failure data for the particular component, segmented by zone and month of failure. In line with the discussion in Section 1, here we are primarily interested in identifying the segments which have significantly better or poorer performance in terms of mileage at failure in comparison to a benchmark. Thus, appropriate hypotheses need to be framed and tested separately for each of the segments, making way for the application of adaptive FDR-controlling algorithms. To validate our model assumption we perform the Kolmogorov-Smirnov test with empirical p-value for the exponential distribution (using the R-function ks.exp.test available in the exptest package) and find that, at level 0.05, exponentiality fails for only 18 out of 84 segments, whereas at level 0.01 there are only 7 rejections. QQ-plots of some randomly chosen segments are given in Figure 6 of the supplementary material. As the sample sizes for most of the segments are moderate, we also check normality by applying the Shapiro-Wilk test (using the default R-function shapiro.test). At level 0.05, 59 out of 84 hypotheses are rejected, and at level 0.01 the number is 42.
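Our screening used the R-function ks.exp.test; in other environments a comparable check can be sketched by comparing the sample with an exponential distribution fitted by its mean. Only the KS distance is returned below, since plain KS p-values are anti-conservative when the scale is estimated from the data — hence the empirical p-values used in the text:

```python
import numpy as np
from scipy import stats

def ks_exp_stat(x):
    """Kolmogorov-Smirnov distance between the empirical distribution of x
    and an exponential with the fitted (sample-mean) scale. A calibrated
    p-value would require simulation, as with the empirical p-values in
    the text."""
    x = np.asarray(x)
    return float(stats.kstest(x, stats.expon(scale=x.mean()).cdf).statistic)

rng = np.random.default_rng(3)
d_exp = ks_exp_stat(rng.exponential(2.0, 60))   # exponential sample: small distance
d_norm = ks_exp_stat(rng.normal(5.0, 1.0, 60))  # normal sample: large distance
print(d_exp, d_norm)
```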
The first set of findings justifies the applicability of the model-based estimators for π₀ discussed in this article, whereas the results from the normality test demonstrate the necessity of the modifications achieved through this work over the existing related works, usually done under the normal model. Now we consider framing the appropriate hypotheses. We assume the mileages at failure for the ith segment to be exponentially distributed with mean θᵢ miles. Thus, θᵢ is the mean mileage to failure (MMTF) for the ith segment, a quantity similar to the mean time to failure (MTTF) in terms of the response variable 'mileage', for i = 1, 2, . . . , 84. As an indicator of the benchmark, we consider the MMTF of the entire dataset as our null hypothesis point, approximately given by θ₀ = 10973 miles. This value as a benchmark seems justified, as the warranty mileage limit for the database is 36,000 miles and it is well known that such failure data are usually positively skewed. According to the research question, we then simultaneously test the hypotheses H₀ᵢ : θᵢ = θ₀ against H₁ᵢ : θᵢ ≠ θ₀, for i = 1, 2, . . . , 84. The two-sided choice of the alternative hypotheses at all the segments needs clarification.
In the absence of any prior knowledge about the functioning of the component, it is not possible to mark any segment as better/worse than the overall benchmark in terms of MMTF. As a result, to be on the safe side, we have taken the alternative hypotheses at all segments to be two-sided. This is very common in multiple testing situations. As an example, in microarray data analysis, the samples used as a reference are called control samples, while the samples exhibiting a different phenotypic status are called treated samples. The gene expression levels between these groups may differ, and to identify whether a gene is differentially expressed or not, a two-sided alternative is fixed [5]. Likelihood ratio tests are performed for each of the hypotheses after scaling the original observations by θ0 (maintaining equivalence of the tests), and the corresponding p-values along with the effect sizes for each test are stored for further use. Table 1 of the supplementary material provides details under the following heads:
• segment: Serial number of the segment, 1 to 84, such that segment i is the ith month of zone 1 for i = 1(1)12, segment 12 + i is the ith month of zone 2 for i = 1(1)12, and so on for the 7 zones in the order mentioned in the first paragraph of this section.
• n: Available sample size for each segment.
• pval: The p-value of the common likelihood ratio test performed for each segment.
• del: The maximum likelihood estimate of the effect size corresponding to each test.
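To make the testing step concrete, the following is a minimal sketch (in Python, though the authors' code is in R; the function names are ours) of how a per-segment p-value and effect size under the exponential model can be computed. Under H0 the scaled total sum(x)/θ0 follows a Gamma(n, 1) (Erlang) law, and we use the common equal-tailed surrogate for the two-sided likelihood-ratio p-value.

```python
import math

def erlang_cdf(t, n):
    """P(T <= t) for T ~ Gamma(shape=n, scale=1), n a positive integer."""
    if t <= 0:
        return 0.0
    term, s = 1.0, 1.0
    for k in range(1, n):          # accumulate sum_{k=0}^{n-1} t^k / k!
        term *= t / k
        s += term
    return 1.0 - math.exp(-t) * s

def exp_mean_pvalue(x, theta0):
    """Equal-tailed p-value for H0: mean = theta0 vs a two-sided
    alternative, assuming i.i.d. exponential data; under H0 the
    statistic sum(x)/theta0 ~ Gamma(n, 1)."""
    n = len(x)
    t = sum(x) / theta0
    lower = erlang_cdf(t, n)
    return min(1.0, 2.0 * min(lower, 1.0 - lower))

def effect_size(x, theta0):
    """MLE of theta_i / theta0 on the scaled data."""
    return (sum(x) / len(x)) / theta0
```

A sample with mean close to θ0 yields a large p-value, and one far from θ0 a small one, mirroring the pval and del columns described above.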
This array of values can be readily fed into Algorithms 4.1-4.3 to get π̂0^U, π̂0^E and the list of rejected hypotheses when the adaptive FDR-controlling algorithm is applied with different π0-estimates. Estimates of π0 obtained using the other estimators already mentioned in this article, along with the corresponding lists of rejected hypotheses, are also reported. The estimated π0-values using the different estimators are given in the second column of Table 1.
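The adaptive step itself is simple: run the Benjamini-Hochberg step-up procedure at the inflated level q/π̂0. A minimal sketch of this step alone (ours, for illustration; Algorithms 4.1-4.3 additionally compute the bias-corrected π0-estimates):

```python
def adaptive_bh(pvals, q, pi0_hat=1.0):
    """Benjamini-Hochberg step-up procedure at level q / pi0_hat;
    pi0_hat = 1 recovers the non-adaptive BH procedure.
    Returns the set of indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # step-up comparison of the rank-th smallest p-value
        if pvals[i] <= rank * q / (m * pi0_hat):
            k = rank
    return set(order[:k])
```

Because π̂0 < 1 raises the comparison line, the adaptive version can only reject at least as many hypotheses as the non-adaptive one, which is exactly the gain in power discussed below.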
In Table 2, we indicate the segments that are found to be significantly different from the average in terms of the mean mileage at failure of the designated component when the adaptive BH-algorithm with different π0-estimators and the non-adaptive (N/A) BH-algorithm are applied to control the FDR at levels q = 0.05, 0.1. For visual display, we plot the adjusted p-values for the non-adaptive Benjamini-Hochberg algorithm and its adaptive version using π̂0^U with cut-off q = 0.1 (see Figure 7 of the supplementary material). From Table 2, it is evident that the adaptive BH-algorithm using the proposed methods is able to identify a larger number of segments with significant deviation from the benchmark while controlling the FDR at the same level, compared to the non-adaptive BH-algorithm as well as the adaptive BH-algorithm using existing estimators for π0.

Table 2. Significantly different segments identified by the adaptive BH-algorithm with different estimators for π0 for the synthetic data.

From the domain knowledge (not to be stated explicitly, owing to confidentiality), it is known that the functioning of the automobile component under consideration is likely to be influenced by the climatic condition, reflected through the effect of the month, as well as by the effect of the zone of its usual operation. The effect of climate on the functioning of automobiles is well known and has also been reported in Lawless [14]. For simplicity and demonstration purposes, we assume that each automobile is used only in the designated zone where the failure is reported. Although we have used the two-sided alternative, as is done in any multiple testing problem, the corresponding confidence interval falling entirely below or above the scaled null hypothesis point of 'unity' indicates the actual one-sided alternative for which the respective significance appears.
Thus, the MMTFs of zone 4 are consistently and significantly smaller than the benchmark value (the null hypothesis point), indicating a usage-related adverse problem of the automobiles; this problem is persistent in the first or second quarter of the year, indicating a transition from colder to warmer climate, or in the fourth quarter, indicating the transition from warmer to colder climate. Interestingly, for zone 2 exactly the converse situation prevails, and this seemingly high MMTF might not be due to the climatic condition but may instead be attributed to a better usage scenario. For zone 5, a better usage scenario is evident in at least two months, although weather-related issues might not be associated with this improvement. The findings for zone 6 are heavily dependent on the climatic condition, especially during the advent of spring, when a significant decrease in MMTF is identified, followed by a significant increase just after. Again, during the fall a significant decrease in MMTF is found, establishing the climate dependence of the failure data. For zone 7, climate plays an adverse role towards the end of winter and the start of summer. The data corresponding to the remaining two zones do not reveal any deviation from the usual usage pattern and/or are not affected by extreme climatic conditions. It is to be noted that for almost all the zones, the month of April turns out significant, indicating either improvement or deterioration relative to the benchmark. On the other hand, the two months of December and January never become markedly different from the benchmark in any zone. This might be attributed to the fact that in winter the relatively colder temperature does not affect all the zones, while a transition in temperature, as observed in April, may play a decisive role in the operating conditions in almost all the zones.
Zones 1 and 3 never figure in the list, and no marked deviation from the benchmark in any climatic condition (non-rejection of the null hypothesis in all seasons) is observed. This homogeneity might be attributed to the fact that these two zones correspond to a relatively warmer climate, so that climate dependence of the operating conditions is not present here. Although we have to suppress the zone identities for confidentiality, the findings are corroborated by the domain knowledge experts.
To conclude this section, we emphasize the appropriateness of the model-based bias-correction approach. We explore a 'what-if' scenario and assess the validity of the findings if we assume the mileages at failure in each segment to be normally distributed instead of exponentially distributed. Our main objective remains the same, i.e. to identify the significantly different segments with respect to the MMTF values. If we assume that the mileages at failure for the ith segment are normally distributed with mean θi and variance σi^2, the testing problem is still the same as in (15). We perform a two-sided one-sample t-test for each of the segments and obtain the array of p-values over all the segments. Computation of the robust estimates may be done as before, but for the bias-corrected estimators we follow the algorithms given in Cheng et al. [6] (for π̂0^U) and Biswas [3] (for π̂0^E) instead of Algorithms 4.1 and 4.2, for obvious reasons. The estimates of π0 under the normality assumption are reported in the third column of Table 1. The robust estimators are seen to underestimate π0. When the rates of the exponential distributions are considered as the means of the normal distributions, the sample means, being the estimators under both model assumptions, overestimate the normal means. Since the overestimation of the normal means makes the null hypotheses appear false, the observed p-values are smaller than those under the exponential case. The robust estimators of π0 are nondecreasing functions of the p-values, hence the underestimation of π0. The bias-corrected estimators get disrupted owing to the inappropriate model assumption, which leads to misleading estimates of the effect sizes, the upper-tail probabilities and the expectations of the non-null p-values. The problem of overestimation of the normal means carries over to overestimation of the effect sizes for the one-sample t-tests. The inflated estimates of the effect sizes result in large estimates of Q(λ) and e.
As a result, the numerator in π̂0^U and π̂0^E usually turns out to be zero or negative. Hence both bias-corrected estimates in Table 1 are zero. Thus, the model-based bias-correction proves appropriate and efficient, bringing out more power in the adaptive algorithms, while the findings may be misleading when the method is applied without adequate confidence in the underlying model assumption. Consequently, the modification of the bias-correction technique under the exponential model seems to be the only way out, particularly while dealing with multiple testing problems arising from segmented failure data, usually encountered in survival and reliability studies.
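For the 'what-if' analysis above, the only change is in the per-segment test. A sketch of the Gaussian-model p-value follows (ours, for illustration; a z-approximation to the one-sample t-test, reasonable for the moderate segment sizes assumed here):

```python
from statistics import NormalDist, mean, stdev

def normal_model_pvalue(x, theta0):
    """Two-sided p-value for H0: mu = theta0 under a Gaussian model,
    using the normal approximation to the one-sample t statistic."""
    n = len(x)
    z = (mean(x) - theta0) / (stdev(x) / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

Applied to right-skewed, exponential-like data, such p-values tend to come out smaller than their exponential-model counterparts, which is what drives the numerators of the bias-corrected estimators to zero or below.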

Discussion
We have approached the problem of estimating π0, and thus the construction of adaptive FDR-controlling procedures, from suitable model assumptions and a common test for all the hypotheses to be tested. Within the framework suggested in Cheng et al. [6] and Biswas [3], we have developed methods for the estimation of π0 under the exponential model and presented a simple adaptive Benjamini-Hochberg algorithm in a spirit similar to Craiu and Sun [7], which is shown to be more efficient than its counterparts for simulated as well as real-life synthetic data. The current work also motivates Storey's bootstrap estimator for π0 and the π0-estimator based on the sum of all p-values, via P(p ≥ V). The cases of V being degenerate at some λ and V being uniformly distributed over (0, 1) have been discussed; this may motivate other choices of V for further study of model-based π0-estimators. The case of V having a negatively skewed density over (0, 1), which gives more importance to the p-values corresponding to true null hypotheses, is presently under consideration and may lead to the construction of new estimators in the future. In the current work, it has been assumed that the p-values corresponding to the true null hypotheses are uniformly distributed. However, if there are composite null hypotheses, as in one-sided hypothesis testing scenarios, the p-values corresponding to the true null hypotheses are stochastically larger than a uniform variate. Superuniform p-values make the proposed estimators conservative owing to the increased values of W(λ) and p̄. The results and methods proposed in this work do not address the issues related to superuniformity of the null p-values. Though the results presented here strengthen the foundations of bias-corrected estimation of π0 in general, the distinguishing feature of this work lies in the innovative application of a multiple testing procedure to segmented failure data.
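For reference, the two classical choices of V mentioned above correspond to the following simple baseline estimators (a sketch with names of our own; these are the uncorrected baselines, not the bias-corrected π̂0^U and π̂0^E):

```python
def storey_pi0(pvals, lam=0.5):
    """Storey-type estimator #{p_i > lambda} / (m (1 - lambda)), capped
    at 1; corresponds to V degenerate at lambda."""
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))

def mean_pi0(pvals):
    """Estimator 2 * mean(p) capped at 1, based on the sum of all
    p-values; corresponds to V uniform over (0, 1), since then
    P(p >= V | p) = p and null p-values have mean 1/2."""
    m = len(pvals)
    return min(1.0, 2.0 * sum(pvals) / m)
```

Both are conservative because non-null p-values, concentrated near zero, can only pull the estimates down less than the null p-values hold them up.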
To the best of our knowledge, such procedures have never been applied to answer the kind of research questions framed in Section 6 in relation to large-scale industrial data. In this work, however, we have focused on presenting and motivating a simple yet powerful technique for identifying significantly different segments in terms of the performance of automobiles and for exploring the effect of the zone of operation coupled with climate, under an exponential model assumption. The synthetic data explored in this work pose several other issues that may be solved by the application of modified methods, to be formulated in future.
This analysis of the real-life synthetic data is based on one year of data and may be carried out on the basis of monthly or even weekly data associated with the component failures. Owing to the limited number of failure observations in each segment, one has to use standard failure models like the exponential or Weibull. If, instead, one uses the usual Gaussian model to describe the failure pattern, one is expected to commit a gross mistake and, consequently, to reach a false perception of the MMTF. This issue has been addressed with the same failure data: instead of the exponential model, the normal probability model has been used, and the test for equality of the respective means in all 84 segments, with the same null hypothesis point representing the benchmark, has been attempted, as in the usual multiple testing procedure. Interestingly, the test for normality fails miserably in the majority of the 84 segments, and hence conclusions from the test for MMTF with reference to the benchmark under normality will give a wrong signal about the true status of the MMTF in those segments.
This work focuses only on controlling the FDR by adaptive Benjamini-Hochberg algorithms with two new estimators for the proportion of true null hypotheses. It would be interesting to study the control of the family-wise error rate (FWER) by adaptive procedures [21] with the π0-estimators discussed here. We have demonstrated through simulation experiments that the new π0-estimators yield conservative procedures, and a proof is also given in an asymptotic setting. Still, it is desirable to prove the same under finite-sample considerations, and future research on this aspect is warranted. Sarkar [20] and Blanchard and Roquain [4] provide sufficient conditions on π̂0 for proving control over the FDR by the corresponding adaptive algorithms. Almost all the recently proposed estimators for π0, including the two taken up in this work, lack such structural simplicity, and hence the desired result can only be verified through finite-sample simulation experiments [3,5,24,26]. The estimators taken up in this work need an initial estimate of π0. The current work does not focus on a simulation-based choice of the initial estimate; the choices are not expected to be universal, and in this regard one may follow the routine presented in Biswas [3] for identifying a working initial estimator. The choice of the initial estimate in the current work is justified by Cheng et al. [6] and Biswas [3]. As the proposed method makes assumptions regarding the distribution of the mileage-to-failure data, we accept that the proposed estimators are not universally suitable in all situations. At the same time, the multiple testing problem in a non-Gaussian framework seems to be novel and may cover all parametric models for scenarios where non-negative-valued random variables are appropriate. In such a framework, we have introduced two simple estimators for π0 which simultaneously reduce the bias and variance of the existing estimator over a relatively important part of the parameter space.
The behaviour of these estimators is studied through extensive simulation, and the new estimators are shown to be more precise, under some practical assumptions, than those available in the existing literature. The involvement of numerical or Monte Carlo integration for each segment makes the proposed method rather computation-intensive. This extra labour is expected to be compensated by the gain in precision of the analysis, thus meaningfully addressing the multiple testing problem in a non-Gaussian setup.

Disclosure statement
No potential conflict of interest was reported by the authors.

Code availability statement
The necessary R codes for computing the estimators π̂0^U and π̂0^E are available at https://github.com/aniketstat/EstPi0Exp2021.