Supplementary for “Estimation of Field Reliability Based on Aggregate Lifetime Data”

The Weibull and lognormal distributions are the most common distributions used in reliability applications. However, using these two distributions for aggregate data is not tractable, because the sum of i.i.d. Weibull or lognormal random variables does not have a closed-form probability density function (PDF) or cumulative distribution function (CDF). The gamma and IG distributions, although not as widely used, are also popular models for lifetime data. The purpose of this section is to show that even if the underlying component lifetimes follow the Weibull/lognormal distribution, the gamma and IG distributions are robust in terms of the estimated quantiles, as long as there is not a substantial amount of extrapolation into the tails. The procedure is as follows. We first generate aggregate data of size n with m_i = 3 from the Weibull distribution wb(k, 1), where k is the shape parameter of the Weibull distribution. Then we fit the data with the gamma and IG distributions, and obtain the ML estimates of τ_0.1, τ_0.5 and τ_0.9, the 0.1, 0.5 and 0.9 quantiles of the random component lifetime. After 10,000 replications, we obtain the biases of τ̂_0.1, τ̂_0.5 and τ̂_0.9. Also, based on the interval estimation methods proposed in the paper, we obtain the coverage probabilities under model misspecification when the nominal value is set

as 95%. We consider n = 10, 20, 40 and the shape parameter k = 0.5, 1, 2. Similarly, we generate aggregate data of size n from the lognormal distribution lnN(0, σ), where σ is the shape parameter. After fitting the data with the gamma and IG distributions, we again investigate the biases of the estimated quantiles as well as the coverage probabilities under model misspecification. We also set n = 10, 20, 40 and σ = 0.5, 1, 2. The simulation results are shown in Tables A.1 and A.2. As can be seen, the gamma and IG distributions give reasonable inferences when the 0.1 and 0.5 quantiles are considered. On the other hand, extrapolation into the tails (the 0.9 quantile) would lead to large biases.

Identifiability is sometimes an issue in non-parametric inference. In parametric inference, the identifiability issue arises when the number of data points is less than the number of parameters, or when there are random effects (both discrete mixtures and continuous mixtures). In our aggregate data problem, the gamma and IG distributions were used for the aggregate data. For the gamma distribution, there will be an identifiability problem if (i) there is only one observation, or (ii) the t_i/m_i are the same for all systems. Point (i) is obvious. In the following, we justify point (ii) in terms of point estimation and interval estimation, respectively.
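For intuition, the misspecification experiment described above can be sketched as follows. The sketch assumes the gamma fit is done by directly maximizing the aggregate-data likelihood T_i ~ Gamma(m_i k, rate θ); the variable names and the use of scipy's Nelder–Mead optimizer are ours, not necessarily the paper's implementation.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
n, m, k_true = 20, 3, 1.0                         # systems, components/system, Weibull shape
t = rng.weibull(k_true, size=(n, m)).sum(axis=1)  # aggregate lifetimes from wb(k, 1)

# Fit a (misspecified) gamma model to the aggregate data: T_i ~ Gamma(m*k, rate theta)
def negloglik(par):
    k, theta = np.exp(par)  # optimize on the log scale to keep parameters positive
    return -np.sum(stats.gamma.logpdf(t, a=m * k, scale=1.0 / theta))

res = optimize.minimize(negloglik, x0=[0.0, 0.0], method="Nelder-Mead")
k_hat, theta_hat = np.exp(res.x)

# ML estimates of the 0.1/0.5/0.9 quantiles of a single component lifetime
tau = stats.gamma.ppf([0.1, 0.5, 0.9], a=k_hat, scale=1.0 / theta_hat)
print(k_hat, theta_hat, tau)
```

Repeating this over many replications and comparing tau against the true Weibull quantiles gives the biases reported in Tables A.1 and A.2.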
We first consider the point estimation of gamma parameters using the moment method.
The moment estimators for k and θ exist if and only if the model is identifiable. The cumulative operating time T_i follows a gamma distribution with shape parameter m_i k and rate parameter θ. Therefore, E(T_i/m_i) = k/θ and E(T_i²/m_i²) = (k² + k/m_i)/θ². Let M = Σ_i 1/m_i. By solving Σ_i (t_i/m_i) = nk/θ and Σ_i (t_i²/m_i²) = (nk² + Mk)/θ² together, we obtain the moment estimators of k and θ as

k̂ = M (Σ_i t_i/m_i)² / { n [ n Σ_i (t_i²/m_i²) − (Σ_i t_i/m_i)² ] },  θ̂ = M (Σ_i t_i/m_i) / [ n Σ_i (t_i²/m_i²) − (Σ_i t_i/m_i)² ].

As can be seen, the identifiability issue occurs when n Σ_i (t_i²/m_i²) − (Σ_i t_i/m_i)² = 0. By the Cauchy–Schwarz inequality, this happens only when t_1/m_1 = t_2/m_2 = ··· = t_n/m_n. In fact, we can also use a contour plot of the likelihood to illustrate this point. We take the airplane indicator light data provided in the paper as an example. The left panel of Figure A.1 shows the contour plot of the likelihood based on the original data. As can be seen, unique values of the ML estimates k̂ and θ̂ can be determined. However, if we set t_i/m_i = 33 for all systems, there is a wide range of combinations of (k, θ) for which the likelihoods are almost the same, as shown in the right panel of Figure A.1. In such a case, the identifiability problem arises.
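In code, the moment estimators and the degenerate case can be sketched as follows; the closed forms below are the ones solved for from the two moment equations above, and the function name is ours.

```python
import numpy as np

def gamma_moment_estimates(t, m):
    """Moment estimators of (k, theta) from aggregate data t_i with m_i components.
    Returns None when all t_i/m_i coincide (the model is then unidentifiable)."""
    t, m = np.asarray(t, float), np.asarray(m, float)
    n, M = len(t), np.sum(1.0 / m)
    x = t / m
    # By Cauchy-Schwarz this is >= 0, and zero iff all t_i/m_i are equal
    denom = n * np.sum(x ** 2) - np.sum(x) ** 2
    if np.isclose(denom, 0.0):
        return None
    k_hat = M * np.sum(x) ** 2 / (n * denom)
    theta_hat = M * np.sum(x) / denom
    return k_hat, theta_hat
```

Note that k̂/θ̂ = (1/n) Σ_i t_i/m_i by construction, matching the first moment equation.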
In the interval estimation procedure, we use the approximation based on W_0 = −2Nk log(ỹ/ȳ). When t_i/m_i is the same for all systems, ỹ = ȳ and hence W_0 ≡ 0 regardless of the parameter values, so the interval estimation procedure degenerates as well. The above analysis can be applied to the IG distribution, and the identifiability condition is the same. To be specific, the shape parameter λ is not identifiable when t_i/m_i is constant for all i. When the PDF of the product lifetime exists, the probability of observing constant t_i/m_i across all systems is 0. Nevertheless, due to round-off errors in data collection, there is a small chance of observing constant t_i/m_i. In the case that there is only a single system or t_i/m_i is constant, the exponential distribution has to be used for the data.

Derivation of Equation (1)
To establish Equation (A.1) (Equation (1) in the main text), we first show that S_0 and ȳ are statistically independent. If we express S_0 as S_0(t_1, ···, t_n), then it is equivalent to show that S_0(t_1, ···, t_n) is independent of Σ_i t_i. In addition, because S_0 is a scale-invariant statistic, without loss of generality we set θ = 1 here. Let φ(u, v) denote the characteristic function of (Σ_i t_i, S_0). Substituting q_i = t_i(1 − u), together with the fact that S_0(q_1, ···, q_n) = S_0(t_1, ···, t_n), the characteristic function factorizes into a product of two terms.
Note that the two terms in the above factorization are the characteristic functions of Σ_i t_i and S_0, respectively. Therefore, S_0 is independent of Σ_i t_i (Applebaum et al. 2005, Theorem 2.1) and thus independent of ȳ. The expectation in Equation (A.1) can then be evaluated, and it can be shown that it exists when z > −k.

Proof of Theorem 1
The moment expression holds when r > −1. In addition, we can show that 0 ≤ S_2 ≤ 1. Substituting Stirling's approximation to the gamma function (Wrench 1968) into the moment expression, and using W_0 = −2 log S_2, a simple transformation yields the moments of W_0 as k → ∞. On the other hand, applying a Taylor series expansion of the gamma function (Wrench 1968), we obtain the limiting behavior as k → 0, and similarly the moments of W_0 as k → 0.

Proof of Theorem 2
First, from the joint distribution of T_1, T_2, ···, T_n and the distribution of μ̂, the conditional joint distribution of T_1, T_2, ···, T_n given μ̂ can be obtained. The MGF of 1/λ̂ given μ̂ can then be derived. Since this MGF does not involve μ̂, and λV = λn/λ̂, we can easily see that λV is independent of μ̂ and its distribution is χ²_(n−1).
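As a quick sanity check on the distributional claim, here is a Monte Carlo sketch of the classical m_i = 1 special case, where μ̂ = X̄ and V = Σ_i (1/X_i − 1/X̄) for i.i.d. IG(μ, λ) lifetimes; the parameter values are ours, and note scipy's invgauss(m, scale=s) corresponds to IG with mean m·s and shape s.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, lam, n, reps = 2.0, 5.0, 10, 20000

# IG(mu, lam) in scipy's parametrization: invgauss(mu/lam, scale=lam)
x = stats.invgauss.rvs(mu / lam, scale=lam, size=(reps, n), random_state=rng)

# lam * V = lam * sum(1/X_i - 1/Xbar); should follow chi2 with n-1 df
lam_v = lam * np.sum(1.0 / x - 1.0 / x.mean(axis=1, keepdims=True), axis=1)
print(lam_v.mean())  # expected to be close to n - 1
```

The sample mean of λV should be close to n − 1 = 9, the mean of χ²_(n−1).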

Proof of Theorem 3
The joint distribution of Z and U can be derived, from which the joint distribution of H and U follows by a change of variables. Integrating with respect to u, we obtain the PDF of H. This completes the proof.
Simulation Results

Simulation results for the shape parameter k and the rate parameter θ of the gamma distribution

We first consider the shape parameter k. A wide range of values of k, i.e., k = 0.05, 1, 5, 10, is considered to examine the performance of our two-moment matching method. Small sample sizes, i.e., n = 5, 10, 20, are used in the simulation. We are interested in the equal-tailed 90% and 95% confidence intervals of k. For each combination of k and n, the simulation was replicated 10,000 times, based on which the coverage probabilities were obtained for the proposed method, the Wald method, and the bootstrap-t method. The results are shown in Figures A.2 and A.3. As can be seen, the bootstrap method performs poorly when the sample size is small, especially for large k, and its performance improves with the sample size n. The large-sample approximation tends to work well when n is large, but its performance is poor when n = 5, especially when the confidence level is 90%. On the other hand, our proposed two-moment matching method works for all combinations of k and n. Also, between the two methods we proposed in Section 2.1, the equation-solving method (MM1) seems to perform better than the plug-in method (MM2), where MM stands for moment matching.

We then consider the generalized pivotal quantity method for the rate parameter θ.
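For intuition, here is a minimal sketch of how such coverage probabilities are estimated, using the ordinary (m_i = 1) Wald interval for the gamma shape k as the baseline method. The variance formula Var(k̂) ≈ k/[n(kψ′(k) − 1)] is the standard Fisher-information result for the gamma distribution; the replication count is reduced for speed, and this is not a reproduction of the paper's aggregate-data methods.

```python
import numpy as np
from scipy import stats, special

rng = np.random.default_rng(7)
k_true, n, reps, z = 5.0, 20, 500, 1.96   # z: two-sided 95% normal quantile
hits = 0
for _ in range(reps):
    x = rng.gamma(k_true, 1.0, size=n)        # m_i = 1 baseline case
    k_hat, _, _ = stats.gamma.fit(x, floc=0)  # ML fit with location fixed at 0
    # asymptotic variance of k_hat from the Fisher information matrix
    se = np.sqrt(k_hat / (n * (k_hat * special.polygamma(1, k_hat) - 1.0)))
    hits += (k_hat - z * se <= k_true <= k_hat + z * se)
coverage = hits / reps
print(coverage)
```

With a small n such as 20, the estimated coverage typically falls somewhat below the nominal 95%, which is the pattern the Wald method exhibits in Figures A.2 and A.3.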

More simulation results for the mean parameter µ of the gamma distribution
In the paper, we have shown the simulation results for the mean parameter µ of the gamma distribution when constructing the one-sided 97.5% confidence limits in the case k = 3. Here, the coverage probabilities of the confidence interval for µ when k = 0.5, 1, 3 are presented in Figures A.10–A.15. As expected, the method based on the generalized pivotal quantity performs much better than the Wald method and the bootstrap-t method.

Normal Distribution Model
In the main text, we have used the gamma and IG distributions to fit the aggregate data.
Actually, the normal distribution can also be used for aggregate data, because the sum of i.i.d.
normally distributed random variables is again normally distributed. Though the normal distribution is less commonly used in lifetime data analysis, it has proved useful for certain lifetime data (Meeker and Escobar 1998, Section 4.5). Therefore, we also propose inference methods for the normal distribution based on aggregate data here. Suppose the component lifetime X follows a normal distribution N(µ, σ²) with PDF

f(x) = (1/(√(2π) σ)) exp{−(x − µ)²/(2σ²)}.

Since the sum of i.i.d. normally distributed random variables is again normally distributed, the distribution of the i-th aggregate lifetime is N(m_i µ, m_i σ²). The log-likelihood function based on the data D, up to a constant, can be specified as

ℓ(µ, σ | D) = −n log σ − Σ_i (t_i − m_i µ)²/(2 m_i σ²).

The ML estimates of µ and σ can then be obtained as

µ̂ = ȳ,  σ̂ = √{ (1/n) Σ_i m_i (y_i − ȳ)² },

where N = Σ_i m_i and ȳ = Σ_i t_i/N. For the interval estimation of these two parameters, we first construct the modified unbiased sample variance

s² = (1/(n − 1)) Σ_i m_i (y_i − ȳ)²,

where y_i = t_i/m_i as defined in the article. The following two theorems show that (n − 1)s²/σ² ∼ χ²_(n−1) and √N(ȳ − µ)/s ∼ t_(n−1), and then the confidence intervals of µ and σ can be constructed.
The proofs are trivial and hence omitted here.
For the interval estimation of quantiles of a normal distribution, the generalized pivotal quantity method can also be used. Define W_4 = (n − 1)s²/σ² ∼ χ²_(n−1). A GPQ for σ can be R_σ(D, W*_4) = √{(n − 1)s²/W*_4}, where W*_4 is treated as random. Based on Theorem 5, if we define W_5 = √N(ȳ − µ)/s ∼ t_(n−1), a GPQ for µ can be R_µ(D, W*_5) = ȳ − s W*_5/√N, where W*_5 is treated as random. Therefore, a GPQ for the p quantile of a normal distribution is R_τp(D, W*_4, W*_5) = R_µ(D, W*_5) + z_p R_σ(D, W*_4), where z_p is the p quantile of the standard normal distribution, W*_4 and W*_5 are treated as random, and the other quantities are computed based on the data D.
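A sketch of the GPQ procedure for τ_p follows, drawing copies of W*_4 and W*_5 and taking percentiles of the resulting GPQ sample; the function name, defaults, and the independent drawing of W*_4 and W*_5 (as the construction above treats them) are ours.

```python
import numpy as np
from scipy import stats

def gpq_normal_quantile_ci(t, m, p=0.9, conf=0.95, B=20000, seed=None):
    """GPQ-based CI for tau_p = mu + z_p*sigma under the aggregate normal model."""
    rng = np.random.default_rng(seed)
    t, m = np.asarray(t, float), np.asarray(m, float)
    n, N = len(t), m.sum()
    y = t / m
    ybar = t.sum() / N
    s2 = np.sum(m * (y - ybar) ** 2) / (n - 1)   # modified unbiased sample variance
    W4 = rng.chisquare(n - 1, size=B)    # copies of W4* ~ chi2_{n-1}
    W5 = rng.standard_t(n - 1, size=B)   # copies of W5* ~ t_{n-1}
    R_sigma = np.sqrt((n - 1) * s2 / W4)          # GPQ for sigma
    R_mu = ybar - np.sqrt(s2 / N) * W5            # GPQ for mu
    R_tau = R_mu + stats.norm.ppf(p) * R_sigma    # GPQ for tau_p
    alpha = 1 - conf
    lo, hi = np.quantile(R_tau, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

The percentile interval of the simulated R_τp values is the confidence interval produced by Algorithm 5.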
Algorithm 5: Confidence interval procedure for τ p of a normal distribution.
(i) Based on the data D, compute the quantities s, N and ȳ.