Estimation of Field Reliability Based on Aggregate Lifetime Data

Because of the exponential distribution assumption, many reliability databases record data in an aggregate form. Instead of individual failure times, each aggregate data point is the sum of a series of collective failures, representing the cumulative operating time of one component position from system commencement to the last component replacement. This data format differs from traditional lifetime data, and the statistical inference is challenging. We first model the individual component lifetime by a gamma distribution. Confidence intervals for the gamma shape parameter can be constructed using a scaled χ² approximation to a modified ratio of the geometric mean to the arithmetic mean, while confidence intervals for the gamma rate and mean parameters, as well as the quantiles, are obtained using the generalized pivotal quantity method. We then fit the data using the inverse Gaussian (IG) distribution, a useful lifetime model for failures caused by degradation. Procedures for point estimation and interval estimation of its parameters are developed, and we also propose an interval estimation method for the quantiles of an IG distribution based on the generalized pivotal quantity method. An illustrative example demonstrates the proposed inference methods. Supplementary materials for this article are available online.


INTRODUCTION
Field failure data are an important source of product reliability information. They are preferable to lab testing data because they inherently contain information on real use conditions (Wu 2013). Many organizations have launched ambitious plans to collect field failure data on a large variety of components, encompassing most common components seen in the market. The ultimate goal is to estimate reliability of any new system based on component reliabilities estimated from these databases. In the ideal case, we record individual failure times (possibly subject to censoring) of all components. Because there are thousands or even millions of components in a database, however, their failure information has to be recorded in a very succinct way.
According to Coit and Jin (2000), many large databases use a concise format that records the cumulative operating time of a component position in a system from system installation to the last component replacement, as well as the number of replacements in between. See the databases in OREDA (2009), Mahar et al. (2011), and Denson et al. (2014) for a few examples. Figure 1 illustrates the format of the data. The starting points of the systems in Figure 1 differ because the commission date of a system is random and generally different, while the end times differ because of the random component lifetimes. Coit and Jin (2000) extracted one dataset from an earlier version of a database (Mahar et al. 2011) maintained by the Reliability Information Analysis Center (RIAC), as presented in Table 1. One reason to record data in this format is the exponential distribution assumption widely adopted in industry. Under this assumption, the cumulative operating time, together with the total number of replacements, is a sufficient statistic that summarizes all information contained in the individual failure times. Unfortunately, the exponential assumption is not valid for many components. For example, it is well known that electrical components often have a bathtub failure rate curve, while mechanical components fail mostly due to wear-out. The hypothesis testing on the exponential distribution assumption performed by Coit and Dey (1999) also suggests that the exponential distribution fails to fit these merged field data.
Since the data have been aggregated because of an untenable assumption, a serious consequence is that it is difficult to fit the aggregate data using common lifetime distributions, as the convolutions of most distributions are intractable. Two exceptions are the gamma distribution and the inverse Gaussian (IG) distribution. These two distributions, though not as widely used as the Weibull or lognormal distributions in lifetime data analysis, do offer a good fit to various sets of lifetime data (Meeker and Escobar 1998, sec. 5.1; Lawless 2002, sec. 1.3.5). In addition, a simulation in the online supplementary materials reveals that, in terms of estimating quantiles and reliabilities, these two distributions are quite robust when the true lifetime distribution is Weibull or lognormal, over a wide range of parameter values, as long as there is not much extrapolation to small or large quantiles. Therefore, this study uses the gamma distribution and the IG distribution to deal with the aggregate data.

Figure 1. A schematic of the aggregate data: the circle denotes the commission date, the vertical short line at the right is the last replacement recorded, and the short dashed lines in between are the unobserved replacements.
The gamma distribution is a flexible model in lifetime data analysis. Applications of this distribution in lifetime analysis can be found in Krishnamoorthy, Mathew, and Mukherjee (2008), Fernández (2011), Fan and Yu (2013), and Balakrishnan and Ling (2014), among others. Similar to the Weibull distribution, the gamma distribution exhibits an increasing, decreasing, or constant failure rate when the shape parameter is greater than 1, less than 1, or equal to 1, respectively. The exponential distribution is a special case of the gamma distribution when the shape parameter is 1. A test of the exponential hypothesis, which is important for aggregate data (Coit and Dey 1999), can thus be performed by testing the shape parameter. Therefore, the first objective of this study is to make inference about the gamma distribution from the aggregate data. Based on the gamma distribution assumption, Coit and Jin (2000) specified the likelihood function for the aggregate data. They used extensive simulation to demonstrate that the maximum likelihood (ML) estimators of the model parameters have small bias and standard deviations for reasonable sample sizes. To capture uncertainty in the estimation, a more important task here is to construct confidence intervals for the parameters. There is a substantial literature on interval estimation of the gamma distribution based on individual lifetime data (e.g., Bain and Engelhardt 1975; Engelhardt and Bain 1977; Bhaumik, Kapur, and Gibbons 2009). However, these methods cannot be directly applied to interval estimation based on aggregate data because of the difference in data formats. In addition, there is still a lack of efficient methods for interval estimation of the rate parameter, the mean, and the quantiles even for individual gamma-distributed lifetime data. This argument is supported by our simulation results in Section 2.5, where large-sample approximations and the bootstrap are used.
In the individual lifetime case, the ratio of the geometric mean to the arithmetic mean is often used to construct a confidence interval for the shape parameter. This study modifies this ratio to obtain an interval estimate of the shape parameter when the lifetime data are aggregated. We then propose an interval estimation method for the rate parameter, the mean parameter, and the quantiles based on the generalized pivotal quantity (GPQ) (Weerahandi 1995; Hannig, Iyer, and Patterson 2006). Performance of the proposed methods is investigated both theoretically and by simulation.

The gamma distribution for lifetime data is more of a data-driven choice, as it does not have an obvious physical interpretation. In contrast, there has been strong support and justification for the IG distribution as a lifetime model, because it is the first-passage-time distribution of a Wiener degradation process to a fixed threshold (Seshadri 1993, sec. 7.7). Although the failure mechanism may deviate from the Wiener-degradation assumption, the IG distribution does provide a good approximation to degradation- or wear-induced failures (Goh, Tang, and Lim 1989; Durham and Padgett 1997; Park and Padgett 2006). This is partially because the IG distribution is quite similar to the Birnbaum-Saunders and lognormal distributions in terms of the hazard function, and they can approximate each other quite well in degradation-related applications (Bhattacharyya and Fries 1982; Desmond 1986; Newby 1988), while the Birnbaum-Saunders and lognormal distributions are good models for fatigue life. Discussions of this similarity can also be found in Meeker and Escobar (1998, sec. 5.7.5) and Newby (1988). Considering that some databases cover nonelectrical components (e.g., Mahar et al. 2011), including mechanical components subject to wear-out failure, the IG distribution can serve as a good model therein. By assuming the IG distribution for the component lifetime, the cumulative operating time is also IG distributed.
Therefore, the second objective of this study is to investigate ML estimation of the IG parameters and the construction of confidence intervals. In the individual lifetime case, there are well-developed interval estimation methods for the shape and mean parameters of the IG distribution (Seshadri 1999, sec. 3.5). Here, we extend these methods to the aggregate data case. For interval estimation of the quantiles, however, efficient methods are still lacking. We propose an interval estimation method for the quantiles based on the generalized pivotal quantity. The performance is assessed by a simulation study.
The normal distribution, though less common in reliability data analysis, has proved to be useful for certain lifetime data when the mean is greater than 0 and the coefficient of variation is small (Meeker and Escobar 1998, sec. 4.5). This is partially because of the popularity of the normal distribution in statistics. If the normal distribution is assumed for the component lifetime, the aggregate lifetime is again normally distributed. Inference procedures based on the normal distribution are developed in the online supplementary materials for reference.
The remainder of the article is organized as follows. Section 2 introduces the gamma distribution model and discusses methods for interval estimation. The performance is assessed by a simulation study. In Section 3, the IG distribution is used for the aggregate data and confidence intervals for the two parameters and quantiles are constructed. An illustrative example is provided in Section 4. Section 5 concludes the article and provides some further comments on the aggregate data.

GAMMA COMPONENT LIFETIME
Consider a component whose random lifetime X has a probability density function (PDF) f_X(x). Suppose the component is installed in a system and is immediately replaced with a brand-new one when it fails; the replacement time is assumed negligible. Failure times of the component installed in the system are aggregated. Let M be the total number of component failures and T the cumulative operating time from the first installation to the last failure. What we observe from the system is a realization of (M, T), denoted by (m, t). Suppose that we collect aggregate data from n systems, so the data available to us are D = {(m_i, t_i); i = 1, 2, . . . , n}. In this study, two popular lifetime distributions are considered for the component lifetime X, namely the gamma distribution and the IG distribution, each with two parameters. In the online supplementary materials, it is shown that as long as n > 1 and the ratios t_i/m_i are not constant across i, the two distributions are identifiable from the observed data D. When the PDF of the component lifetime X exists, the probability of observing constant t_i/m_i across all systems is 0. Nevertheless, because of round-off errors in data collection, there is a small chance of observing constant t_i/m_i, in which case the exponential distribution has to be used for the data. This section discusses statistical inference based on the gamma distribution, and the next section deals with the IG distribution.
Suppose the component lifetime X follows a gamma distribution Gamma(k, θ) with PDF f_X(x) = θ^k x^(k−1) e^(−θx)/Γ(k), x > 0, where k is the shape parameter and θ is the rate parameter. A useful property of the gamma distribution is that the sum of independent and identically distributed gamma random variables is again gamma distributed. Therefore, the ith aggregate lifetime t_i is the sum of m_i gamma random variables, and its distribution is Gamma(m_i k, θ). Based on this property, the log-likelihood function of (k, θ) based on the data D, up to an additive constant, can be easily specified as l(k, θ; D) = Σ_i [m_i k log θ + (m_i k − 1) log t_i − θ t_i − log Γ(m_i k)]. ML estimators can be obtained from direct maximization of this likelihood function. For example, Coit and Jin (2000) used a quasi-Newton search to obtain the ML estimators. They further conducted extensive simulation to show that the ML estimators of k and θ have small bias and standard errors.

In addition to point estimation, we are often more interested in confidence intervals for the parameters, as interval estimation captures the uncertainty in the inference. However, construction of confidence intervals for the gamma distribution tends to be difficult. There is a substantial literature discussing interval estimation for gamma parameters, but all existing studies deal with individual failure times X rather than aggregate lifetimes T. Interval estimation for the shape parameter k is often based on a χ² approximation to a quantity containing the ratio of the geometric mean to the arithmetic mean (Bain and Engelhardt 1975; Glaser 1976). Good performance of the χ² approximation for interval estimation has been verified by many subsequent studies (e.g., Bhaumik, Kapur, and Gibbons 2009; Bhaumik et al. 2015). In this study, we modify the χ² approximation for the aggregate data, as detailed in Section 2.1. On the other hand, there is no universal solution for interval estimation of the gamma rate parameter.
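The numerical maximization just described is straightforward to implement. The sketch below is not the authors' code (they used a quasi-Newton search); it fits (k, θ) by maximizing the aggregate log-likelihood with a derivative-free search, with illustrative function and variable names:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def gamma_aggregate_mle(m, t):
    """ML estimates of (k, theta) when t[i] ~ Gamma(shape=m[i]*k, rate=theta)."""
    m = np.asarray(m, dtype=float)
    t = np.asarray(t, dtype=float)

    def neg_loglik(log_params):
        # optimize on the log scale so that k > 0 and theta > 0 automatically
        k, theta = np.exp(log_params)
        ll = np.sum(m * k * np.log(theta) + (m * k - 1.0) * np.log(t)
                    - theta * t - gammaln(m * k))
        return -ll

    res = minimize(neg_loglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
    return np.exp(res.x)  # (k_hat, theta_hat)
```

With data simulated from the aggregate model, the estimates recover the true (k, θ) closely for moderate numbers of systems, consistent with the small bias reported by Coit and Jin (2000).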
A seemingly ideal approach is to base the interval on the uniformly most powerful unbiased test for θ (Engelhardt and Bain 1977, 1978). Nevertheless, implementation of the method requires looking up quantiles in a table, which is inconvenient in practice. In addition, the simulation in Bhaumik, Kapur, and Gibbons (2009) reveals poor performance of the tabulation method. Therefore, other methods such as the percentile bootstrap and a χ² approximation (Bhaumik, Kapur, and Gibbons 2009) have been proposed. However, the performance of these methods is quite poor for small sample sizes, based on our simulation results in Section 2.5. In this section, we propose a new confidence interval procedure for the rate parameter based on the generalized pivotal quantity (Weerahandi 1995). The procedure can be applied to individual lifetime data as well as aggregate data. The simulation results in Section 2.5 indicate that the performance of our method is uniformly good across different sample sizes and over the whole range of parameter values.

The mean time to failure of a component is a widely used metric in reliability (Grice and Bain 1980). It roughly reflects component reliability and is especially useful for short-life components, for example, bulbs in the databases. Existing methods for interval estimation of the mean parameter k/θ are similar to those for the rate parameter, and their performance is also similar (Kulkarni and Powar 2010). Again, generalized pivotal quantity methods can be developed for this inference, and a comparison with existing methods verifies the superior performance of the procedure. The quantiles and cumulative distribution function (CDF) of a lifetime distribution are sometimes of more interest than the traditional parameters (Meeker and Escobar 1998, sec. 1.1.3). Unfortunately, little literature has considered interval estimation for the gamma quantiles and CDF, mainly because they do not have closed-form expressions.
We show that confidence intervals for the gamma quantiles and CDF can also be effectively constructed using the generalized pivotal quantity method.

Interval Estimation for the Shape Parameter
When individual lifetimes are available, the ratio of the geometric mean to the arithmetic mean can be used for interval estimation. To show how this is done, let x_i, i = 1, 2, . . . , n, be n independent realizations of the component lifetime X. The arithmetic mean and the geometric mean are x̄ = Σ_{i=1}^n x_i/n and x̃ = (∏_{i=1}^n x_i)^{1/n}, respectively. Then the quantity −2nk log(x̃/x̄) is approximately a scaled χ² random variable cχ²(v), where c is a scaling parameter and v is the degrees of freedom (Bain and Engelhardt 1975). The values of these two parameters c and v can be determined by matching the first two moments. This idea is extended to aggregate data here.
Given the data {(m_i, t_i); i = 1, 2, . . . , n}, define y_i = t_i/m_i and N = Σ_i m_i. Then the pseudo arithmetic mean ȳ and the pseudo geometric mean ỹ can be defined as ȳ = Σ_i t_i/N and ỹ = (∏_i y_i^{m_i})^{1/N}, respectively. As a result, the pseudo ratio statistic is S_0 = ỹ/ȳ. It is readily seen that S_0 is scale invariant, and thus the distribution of S_0 does not depend on θ. Without loss of generality, let θ = 1 in this subsection.
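The pseudo statistics are simple to compute from the aggregate data; a minimal sketch (illustrative names, not from the paper):

```python
import numpy as np

def pseudo_ratio(m, t):
    """Pseudo ratio S0 = y_tilde / y_bar for aggregate data (m_i, t_i)."""
    m = np.asarray(m, dtype=float)
    t = np.asarray(t, dtype=float)
    N = m.sum()
    y_bar = t.sum() / N                          # pseudo arithmetic mean
    log_y_tilde = np.sum(m * np.log(t / m)) / N  # log of pseudo geometric mean
    return np.exp(log_y_tilde) / y_bar
```

Note that ȳ is the m_i-weighted arithmetic mean of the y_i, so by the weighted AM-GM inequality S_0 ≤ 1, with equality exactly when all t_i/m_i coincide; rescaling every t_i leaves S_0 unchanged, which is the scale invariance used above.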
Define the quantity W_0 = −2Nk log S_0. As in the individual-lifetime case, it is reasonable to expect that W_0 is approximately a scaled χ² random variable, denoted cχ²(v). The values of c and v can be obtained by matching the first two moments of W_0 and cχ²(v). In previous studies based on individual lifetimes, the two parameters c and v were tabulated through extensive simulation because closed forms of the first two moments of W_0 were not available (Bhaumik, Kapur, and Gibbons 2009). The need for tabulation makes the χ² approximation inconvenient to apply. In this study, we derive the first two moments of W_0, so the moment matching can be done easily.
Define S_1 = log S_0. Since W_0 = −2NkS_1, the mean and variance of W_0 can be obtained through the moment-generating function (MGF) of S_1. After some algebraic manipulation, we can show that E(S_0^z) has a closed-form expression, given as Equation (1), valid when z > −k. The derivation of Equation (1) is given in the online supplementary materials. Note that the MGF of S_1 is related to S_0 by M_{S_1}(z) = E(e^{zS_1}) = E(S_0^z), so Equation (1) yields the MGF of S_1, and thus the cumulant-generating function K_{S_1}(z) = log M_{S_1}(z). The expectation and variance of S_1 can then be derived in terms of the digamma function ψ(x) = (d/dx) log Γ(x) and the trigamma function ψ_1(x) = (d²/dx²) log Γ(x). Therefore, we have E(W_0) = −2Nk E(S_1) and var(W_0) = 4N²k² var(S_1). On the other hand, the mean and variance of cχ²(v) are cv and 2c²v, respectively. Consequently, matching the two moments gives c = var(W_0)/[2E(W_0)] and v = 2[E(W_0)]²/var(W_0). We express c and v in this way to reveal that they are functions of the unknown parameter k.

One method to deal with the unknown k is to plug in its ML estimate k̂. The distribution of W_0 is then fully specified, and an equal-tailed 100(1 − α)% confidence interval for k can easily be constructed as [cχ²_{α/2}(v)/(−2N log s_0), cχ²_{1−α/2}(v)/(−2N log s_0)], where s_0 is the observed value of S_0. Alternatively, the problem of the unknown k can be cleverly circumvented in the following way. Based on the scaled χ² approximation to W_0 and the expressions for c and v derived above, a 100(1 − α)% lower confidence bound k_l can be found by solving, in k_l, the equation that equates the observed value −2Nk_l log s_0 to the appropriate quantile of the c(k_l)χ²(v(k_l)) distribution. Similarly, an equal-tailed 100(1 − α)% confidence interval for k, say [k_l, k_u], can be found by solving the analogous pair of equations. Performance of the plug-in method and the equation-solving method will be evaluated in Section 2.5, where the equation-solving method is found to perform slightly better.

All of the above analysis rests on a scaled χ² approximation to the quantity W_0. The following theorem supports the rationale of the approximation; the proof is given in the online supplementary materials.
The results in the theorem indicate that the χ² approximation to the quantity W_0 should work very well when k is either small or large.

Interval Estimation for the Rate Parameter
We have described three approaches to interval estimation for the rate parameter θ based on individual lifetimes: the uniformly most powerful unbiased test, the bootstrap, and a χ² approximation to 2nx̄θ, where the arithmetic mean x̄ is defined at the beginning of the above subsection. The performance of these methods is not good for small sample sizes (Bhaumik, Kapur, and Gibbons 2009). If these methods are extended to aggregate data, poor performance is expected, as can be seen from the simulation results in Section 2.5. Therefore, we propose a new method based on the idea of a generalized pivotal quantity.
The generalized pivotal quantity method was introduced to deal with the problem of nuisance parameters (e.g., Weerahandi 1995). Define Y = Σ_i t_i; then 2Yθ ∼ χ²(2Nk). Here k can be regarded as a nuisance parameter when we construct a confidence interval for θ. From the above subsection, we know that W_0 = −2Nk log S_0 ∼ cχ²(v) approximately, where c and v can be determined analytically by moment matching. Therefore, we can express 2Nk = −W_0/log S_0. Conditional on W_0, define [W_1 | W_0] ∼ χ²(−W_0/log S_0), where S_0 is treated as a known quantity based on the data D. Based on this setting, we can define an approximate generalized pivotal quantity R_θ(D, W_0*, W_1*) = W_1*/(2Y) for θ, where W_0* and W_1* are treated as random variables and Y is a known value computed from D (Weerahandi 1995). Here, we call the generalized pivotal quantity R_θ(D, W_0*, W_1*) "approximate" because c and v are involved in the distribution of W_1 and they depend on k. However, our simulation below shows that the performance of this quantity is quite remarkable. The unconditional distribution of W_1* can be derived by marginalizing out W_0*; more easily, unconditional realizations of W_1* can be generated by Monte Carlo simulation. The following algorithm can be used to simulate W_1* and construct a confidence interval for θ.
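The listing of Algorithm 1 does not survive in this extract; the sketch below reconstructs it from the description above. The moment-matched values c and v are taken as inputs (in practice they would be evaluated at the ML estimate k̂), so treat this as an illustration rather than the authors' exact procedure:

```python
import numpy as np

def gpq_ci_theta(m, t, c, v, B=10000, alpha=0.05, rng=None):
    """Approximate GPQ confidence interval for the gamma rate theta.

    c, v: moment-matched scaled-chi2 parameters for W0 = -2*N*k*log(S0),
    supplied by the caller (e.g., evaluated at the ML estimate of k).
    """
    rng = np.random.default_rng(rng)
    m = np.asarray(m, dtype=float)
    t = np.asarray(t, dtype=float)
    N, Y = m.sum(), t.sum()
    log_S0 = np.sum(m * np.log(t / m)) / N - np.log(Y / N)  # log S0 < 0

    W0 = c * rng.chisquare(v, size=B)   # W0* ~ c * chi2(v)
    df = -W0 / log_S0                   # implied value of 2*N*k
    W1 = rng.chisquare(df)              # W1* | W0* ~ chi2(-W0*/log S0)
    r_theta = W1 / (2.0 * Y)            # GPQ realizations for theta
    return np.quantile(r_theta, [alpha / 2, 1 - alpha / 2])
```

The returned α/2 and (1 − α/2) quantiles of the simulated GPQ realizations form the equal-tailed 100(1 − α)% interval.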

Interval Estimation for the Mean Parameter
Existing methods for interval estimation of the gamma mean μ = k/θ are almost the same as those for the rate parameter θ . According to Bhaumik, Kapur, and Gibbons (2009), as well as our simulation study in Section 2.5, the existing methods perform poorly when the sample sizes are small. Similar to the rate parameter case, we can develop an interval estimation procedure based on generalized pivotal quantities.
From the last subsection, 2Yθ ∼ χ²(2Nk); therefore, 2Yk/μ ∼ χ²(2Nk). Previous studies plugged the ML estimate k̂ into this relation and used it to construct confidence intervals for μ (Kulkarni and Powar 2010). However, the plug-in method ignores the uncertainty in k̂, resulting in poor performance when the sample size is small. To overcome this problem, we propose a generalized pivotal quantity procedure similar to that for θ. Using the quantities W_0 and W_1 defined above in conjunction with the χ² distribution for 2Yk/μ, we can define an approximate generalized pivotal quantity for μ as R_μ(D, W_0*, W_1*) = −Y W_0*/(N W_1* log S_0), where W_0* and W_1* are treated as random and the other quantities are treated as known based on D. Quantiles of R_μ can again be obtained by simulation. The procedure is similar to that for the rate θ, as described below.
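The listing of Algorithm 2 is not reproduced in this extract; the following sketch reconstructs the simulation for μ from the description. The explicit form of R_μ is inferred from combining 2Yk/μ ∼ χ²(2Nk) with the GPQ for k, so treat it as an assumption:

```python
import numpy as np

def gpq_ci_mu(m, t, c, v, B=10000, alpha=0.05, rng=None):
    """Approximate GPQ confidence interval for the gamma mean mu = k/theta."""
    rng = np.random.default_rng(rng)
    m = np.asarray(m, dtype=float)
    t = np.asarray(t, dtype=float)
    N, Y = m.sum(), t.sum()
    log_S0 = np.sum(m * np.log(t / m)) / N - np.log(Y / N)

    W0 = c * rng.chisquare(v, size=B)
    k_star = -W0 / (2.0 * N * log_S0)      # GPQ realizations for k
    W1 = rng.chisquare(2.0 * N * k_star)   # W1* | W0* ~ chi2(2*N*k*)
    r_mu = 2.0 * Y * k_star / W1           # from 2*Y*k/mu ~ chi2(2*N*k)
    return np.quantile(r_mu, [alpha / 2, 1 - alpha / 2])
```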

Interval Estimation for the Quantiles
The quantile is often of more interest than the parameters of a lifetime distribution. However, because there is no closed form for the gamma quantile τ_p = F^(−1)(p | k, θ), its interval estimation tends to be difficult. Two approximation methods have been proposed in the literature. Bain, Engelhardt, and Shiue (1984) proposed an approximate method that obtains τ_p from the relation 2Xθ ∼ χ²(2k) when X ∼ Gamma(k, θ). They substituted k̂ into the χ² distribution, and the confidence interval for τ_p can then be readily obtained from the confidence interval for θ. As reported in Bain, Engelhardt, and Shiue (1984), the coverage probability is satisfactory only for very low quantiles and large k. Alternatively, Ashkar and Ouarda (1998) proposed a method that first approximates the gamma variable by a normal distribution, based on which a confidence interval for the normal quantile is constructed. The parameter values of the normal distribution have to be determined through extensive simulation. In addition, the method does not work well for small k (Ashkar and Ouarda 1998). In the following, we show that the idea of the generalized pivotal quantity can be naturally extended to the interval estimation of τ_p. The good performance of our method is reported in Section 2.5.
Although there is no closed-form expression for the quantile function of a gamma distribution, the quantile τ_p can be readily computed once the values of k and θ are known. Using the quantities W_0 and W_1 defined in the last two subsections, we can construct an approximate generalized pivotal quantity for τ_p as the p-quantile of the gamma distribution with the shape and rate replaced by their generalized pivotal quantities, that is, R_{τ_p}(D, W_0*, W_1*) = F^(−1)(p | −W_0*/(2N log S_0), W_1*/(2Y)), where W_0* and W_1* are treated as random and the other quantities are treated as known based on the data D. Realizations of R_{τ_p} can similarly be obtained by simulation, and the procedure is provided below.
Algorithm 3: Confidence interval procedure for τ_p.
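The steps of Algorithm 3 are not reproduced in this extract; the sketch below is consistent with the construction above, with the moment-matched c and v supplied by the caller:

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def gpq_ci_quantile(m, t, p, c, v, B=10000, alpha=0.05, rng=None):
    """Approximate GPQ confidence interval for the gamma p-quantile tau_p."""
    rng = np.random.default_rng(rng)
    m = np.asarray(m, dtype=float)
    t = np.asarray(t, dtype=float)
    N, Y = m.sum(), t.sum()
    log_S0 = np.sum(m * np.log(t / m)) / N - np.log(Y / N)

    W0 = c * rng.chisquare(v, size=B)
    k_star = -W0 / (2.0 * N * log_S0)                         # GPQ for k
    theta_star = rng.chisquare(2.0 * N * k_star) / (2.0 * Y)  # GPQ for theta
    # tau_p* = F^{-1}(p | k*, theta*); scipy parameterizes scale = 1/rate
    r_tau = gamma_dist.ppf(p, a=k_star, scale=1.0 / theta_star)
    return np.quantile(r_tau, [alpha / 2, 1 - alpha / 2])
```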
Remark 1. The generalized pivotal quantity method can also be used to construct confidence intervals for the CDF of a gamma distribution. Since the CDF is also a function of k and θ, we can construct an approximate generalized pivotal quantity for the CDF evaluated at a given time point. A similar algorithm can then be developed for interval estimation of the CDF.

Simulation Study
A simulation study is conducted to verify the performance of the proposed methods. We compare our methods, in terms of coverage probability, with two popular approaches: the large-sample approximation (also called the Wald method) and the bootstrap. The Wald method is based on a normal approximation to a studentized estimator. Usually, the ML estimate θ̂ and the inverse of the observed information matrix are used for the normal approximation; the observed information matrix is the negative of the matrix of second derivatives of the log-likelihood function l(θ; D) with respect to θ, evaluated at θ = θ̂. For the bootstrap, the bootstrap-t method proposed by Efron (1979) is used. In the bootstrap-t method, we generate B bootstrap samples, and in each sample we compute t_b = (θ̂_b − θ̂)/se(θ̂_b), where θ̂_b and θ̂ are the ML estimates based on the bootstrap sample and the original data, respectively, and se(θ̂_b) is the estimated standard error of θ̂_b. The α/2 and (1 − α/2) quantiles of the t_b's, denoted t_{α/2} and t_{1−α/2}, are used to construct the equal-tailed 100(1 − α)% confidence interval for θ as (θ̂ − t_{1−α/2} se(θ̂), θ̂ − t_{α/2} se(θ̂)), where se(θ̂) is the estimated standard error of θ̂. In this article, the number of bootstrap replications and the number of simulation samples are both set to 10,000.

We first consider the two parameters k and θ. For the shape parameter k, since the rate parameter θ does not affect the estimation of k, we simply let θ = 1. A wide range of values of k, that is, k = 0.05, 1, 5, 10, is considered. For the rate parameter θ, since the values of c and v depend on k in Algorithm 1, we consider k = 0.5, 1, 3 and then set θ = 0.5, 1, 3 under each k. In addition, we set B = 10,000 in Algorithm 1. For each parameter setting, small sample sizes, that is, n = 5, 10, 20, are used, and we are interested in the equal-tailed 90% and 95% confidence intervals. The proposed methods are compared with the Wald method and the bootstrap-t method in terms of coverage probability based on 10,000 replications.
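For reference, the bootstrap-t construction just described can be sketched generically. The paper studentizes the gamma ML estimators; here `estimator` and `std_err` are illustrative stand-ins for any estimator and its standard-error formula:

```python
import numpy as np

def bootstrap_t_ci(x, estimator, std_err, B=2000, alpha=0.05, rng=None):
    """Generic equal-tailed bootstrap-t confidence interval."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    est, se = estimator(x), std_err(x)
    tb = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=x.size, replace=True)  # resample with replacement
        tb[b] = (estimator(xb) - est) / std_err(xb)    # studentized statistic
    t_lo, t_hi = np.quantile(tb, [alpha / 2, 1 - alpha / 2])
    # note the quantiles swap roles when translated back to the parameter scale
    return est - t_hi * se, est - t_lo * se
```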
Based on the simulation results in the online supplementary material, our proposed methods work well regardless of the sample size n and the parameter settings. The coverage probabilities are quite close to the nominal ones. On the other hand, the Wald method and the bootstrap-t methods perform poorly when the sample size is small and their performance is not consistent in different parameter settings.
Next, the performance of Algorithm 2 for the mean μ is examined. As in the rate parameter case, we need to examine the potential effect of k, through the parameters c and v, in Algorithm 2. Therefore, we consider k = 0.5, 1, 3 and then set μ = 0.5, 1, 2, 3, 4, 5 under each k. Figure 2 shows the coverage probabilities when constructing one-sided 97.5% lower and upper confidence limits in the case k = 3. As can be seen, the GPQ method provides accurate one-sided confidence limits (both lower and upper), as the coverage probabilities are very close to the nominal values regardless of μ and n. On the other hand, the Wald method and the bootstrap-t method do not perform well when the sample size is small and when constructing the upper confidence limits, though their performance improves with the sample size n. Since one can infer two-sided coverage from the one-sided results, it is not surprising that the GPQ method also performs excellently in constructing the two-sided 95% confidence interval. The coverage probabilities for the two-sided confidence intervals are reported in the online supplementary materials.

Finally, we examine the performance of Algorithm 3 for the quantile τ_p. We set p = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5. Our proposed method based on the generalized pivotal quantity is compared with the bootstrap-t method under the parameter settings k = 0.5, 1, 3 and θ = 1. In the bootstrap-t procedure, the delta method is used to obtain the standard error of τ̂_p. Sample sizes n = 5, 10, 20 are considered. Figure 3 shows the simulation results when k = 3; the results for k = 0.5 and 1 are given in the online supplementary material. As expected, the coverage probabilities of our proposed method are quite close to the nominal values, while the performance of the bootstrap-t method is less satisfactory, especially when the sample size is small.

IG COMPONENT LIFETIME
The IG distribution proposed by Tweedie (1945) is an appropriate lifetime model for wear-out failures (Bhattacharyya and Fries 1982; Desmond 1986; Durham and Padgett 1997). Suppose the component lifetime X follows an IG distribution IG(μ, λ) with PDF f_X(x) = [λ/(2πx³)]^(1/2) exp[−λ(x − μ)²/(2μ²x)], x > 0, where μ and λ are the mean and shape parameters, respectively. Similar to the gamma distribution, the sum of independent and identically distributed IG random variables is still IG distributed. The distribution of an aggregate lifetime T consisting of m component failures is IG(mμ, m²λ). The log-likelihood function of (μ, λ) based on the data D, up to an additive constant, is then l(μ, λ; D) = (n/2) log λ − λ Σ_i (t_i − m_i μ)²/(2μ² t_i). Taking partial derivatives of the log-likelihood with respect to the two parameters, setting them to 0, and solving the two resulting equations yields the ML estimates of μ and λ in closed form: μ̂ = ȳ and 1/λ̂ = (Σ_i m_i²/t_i − N/ȳ)/n, where the notation is the same as in Section 2.

The remainder of this section discusses the construction of confidence intervals for μ and λ, as well as for the IG quantiles τ_p. There are well-developed methodologies for interval estimation of μ and λ based on individual lifetimes (Seshadri 1999, sec. 3.5); these methods will be modified to construct confidence intervals for the parameters based on the aggregate data. From a practical point of view, a more important task is to construct confidence intervals for the quantiles. Based on individual lifetimes, some attempts (Seshadri 1999, chap. 5) have been made at interval estimation of quantiles. Unfortunately, as stated by Seshadri (1999, p. 99), a graphical method is necessary to apply these procedures, so they are not easy to implement, especially in the aggregate data case. In this section, we propose a new method based on a generalized pivotal quantity. The procedure is simple and its performance is good, as shown by our simulation.
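The closed-form ML estimates can be coded directly; a minimal sketch (illustrative names, not the authors' code):

```python
import numpy as np

def ig_aggregate_mle(m, t):
    """Closed-form ML estimates of (mu, lambda) when
    t[i] ~ IG(m[i]*mu, m[i]**2 * lambda)."""
    m = np.asarray(m, dtype=float)
    t = np.asarray(t, dtype=float)
    n, N = m.size, m.sum()
    y_bar = t.sum() / N                          # pseudo arithmetic mean
    mu_hat = y_bar
    lam_hat = n / (np.sum(m**2 / t) - N / y_bar)
    return mu_hat, lam_hat
```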

Interval Estimation for the Shape and Mean Parameters
There are well-developed methods for constructing confidence intervals for λ and μ (Seshadri 1999, Proposition 1.2 and Theorem 1.3). These methods can be extended to the aggregate data case in a straightforward way, as shown in the following theorems. The proofs can be found in the online supplementary materials.
Based on the pivotal quantity λV in Theorem 2, which follows a χ²(n − 1) distribution, an exact two-sided 100(1 − α)% confidence interval for λ can be constructed as

[χ²_{α/2}(n − 1)/V, χ²_{1−α/2}(n − 1)/V],

where V can be computed based on the data D.
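Since λV is an exact χ²(n − 1) pivot, the interval requires only two chi-squared quantiles. A short sketch with hypothetical aggregate data (m and t are illustrative, not the paper's data):

```python
import numpy as np
from scipy import stats

# Hypothetical aggregate data: m[i] failures, cumulative time t[i].
m = np.array([3, 5, 2, 4, 6])
t = np.array([5.2, 11.7, 3.1, 9.6, 14.9])

n, N = len(t), m.sum()
ybar = t.sum() / N
V = (m**2 / t).sum() - N / ybar   # lambda * V ~ chi-square(n - 1)

alpha = 0.10
lo = stats.chi2.ppf(alpha / 2, n - 1) / V
hi = stats.chi2.ppf(1 - alpha / 2, n - 1) / V
print(lo, hi)   # exact two-sided 90% confidence interval for lambda
```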
Theorem 3 shows that the quantity H = H(μ) follows a weighted Student's t distribution with n − 1 degrees of freedom, where the weight q(h) involves the beta function B(x, y) = ∫₀¹ t^{x−1}(1 − t)^{y−1} dt and satisfies q(h) + q(−h) = constant. Therefore, an exact two-sided 100(1 − α)% confidence interval for μ can be constructed from the quantiles of this weighted t distribution.

Interval Estimation for the Quantiles
In this subsection, interval estimation of the IG quantiles is proposed based on an approximate generalized pivotal quantity. Define W₂ = λV. Based on Theorem 2, we have W₂ ~ χ²(n − 1). Therefore, a generalized pivotal quantity for λ is R_λ(D, W₂*) = W₂*/V, where W₂* is treated as random and V can be computed based on the data D. On the other hand, an exact generalized pivotal quantity for μ is difficult to construct because the distribution of H(μ) in Theorem 3 involves λ. Fortunately, an approximate generalized pivotal quantity for μ can be constructed based on the fact that √(λ/X)(X − μ)/μ approximately follows a standard normal distribution when X ~ IG(μ, λ) (Seshadri 1999, Theorem 1.2).

Because Ȳ = T/N is distributed as IG(μ, Nλ), the quantity W₃ = √(Nλ/Ȳ)(Ȳ − μ)/μ is approximately standard normal. Solving this relation for μ (with λ replaced by its generalized pivotal quantity) yields an approximate generalized pivotal quantity for μ,

R_μ(D, W₂*, W₃*) = ȳ / (1 + W₃* √(ȳV/(N W₂*))),

where W₂*, W₃* are treated as random and ȳ, N, V are computed based on the data D. Having these two approximate generalized pivotal quantities for λ and μ, an approximate generalized pivotal quantity for the IG quantile τ_p = F⁻¹(p | μ, λ) can be constructed as R_{τ_p} = F⁻¹(p | R_μ, R_λ). An equal-tailed 100(1 − α)% confidence interval for the IG quantile τ_p can be constructed as follows.
Algorithm 4: Confidence interval procedure for τ_p.
(i) Generate W₂* ~ χ²(n − 1) and W₃* ~ N(0, 1).
(ii) Compute R_λ = W₂*/V and R_μ = ȳ / (1 + W₃* √(ȳV/(N W₂*))).
(iii) Compute r_{τ_p} = F⁻¹(p | R_μ, R_λ). Repeat steps (i)-(iii) B times to obtain {r_{τ_p;b}; b = 1, 2, . . . , B}.
(iv) Sort the series {r_{τ_p;b}; b = 1, 2, . . . , B} in ascending order and use the α/2 and (1 − α/2) quantiles to construct an equal-tailed 100(1 − α)% confidence interval for τ_p.

Figure 5. Coverage probability of τ_p by the generalized pivotal quantity method (GPQ) and bootstrap-t method (BOOT) when two-sided 90% (solid) and 95% (dotted) confidence intervals are considered and (μ, λ) = (2, 1).

A simulation was carried out to assess the performance of this algorithm. Two parameter settings, (μ, λ) = (1, 1) and (2, 1), were examined. In each scenario, we are interested in the equal-tailed 90% and 95% confidence intervals for the 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 IG quantiles with the number of systems n = 5, 10, 20. The bootstrap-t method is used for comparison. We set B = 10,000 in Algorithm 4, and the coverage probabilities are obtained based on 10,000 replications, as shown in Figures 4 and 5. Compared with the bootstrap, the generalized pivotal quantity method performs quite well for small quantiles, which are of great interest in reliability engineering.
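A minimal sketch of this GPQ algorithm, assuming hypothetical aggregate data (the arrays m and t are illustrative, not the paper's data). The approximate GPQ for μ is obtained by inverting the normal approximation for Ȳ; scipy's invgauss parameterizes IG(μ, λ) as invgauss(mu=μ/λ, scale=λ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2023)

# Hypothetical aggregate data: m[i] failures, cumulative time t[i].
m = np.array([3, 5, 2, 4, 6])
t = np.array([5.2, 11.7, 3.1, 9.6, 14.9])

n, N = len(t), m.sum()
Y = t.sum() / N                      # overall average lifetime per failure
V = (m**2 / t).sum() - N / Y         # lambda * V ~ chi-square(n - 1)

p, alpha, B = 0.10, 0.10, 10_000
W2 = rng.chisquare(n - 1, size=B)    # step (i): W2* ~ chi-square(n - 1)
W3 = rng.standard_normal(size=B)     # step (i): W3* ~ N(0, 1)

R_lam = W2 / V                                     # step (ii): GPQ for lambda
R_mu = Y / (1.0 + W3 * np.sqrt(Y * V / (N * W2)))  # step (ii): GPQ for mu

# Step (iii): r_tau = F^{-1}(p | R_mu, R_lam); discard the rare draws where
# the approximate GPQ for mu is negative.
keep = R_mu > 0
r_tau = stats.invgauss.ppf(p, R_mu[keep] / R_lam[keep], scale=R_lam[keep])

# Step (iv): equal-tailed 90% confidence interval for tau_0.1.
lo, hi = np.quantile(r_tau, [alpha / 2, 1 - alpha / 2])
print(lo, hi)
```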
Remark 2. The generalized pivotal quantity method can also be used to construct confidence intervals for the CDF of an IG distribution. Since the CDF is also a function of μ and λ, we can construct an approximate generalized pivotal quantity for the CDF F(x | μ, λ) as R_F = F(x | R_μ, R_λ). A similar algorithm can be developed for interval estimation of the IG CDF.

ILLUSTRATIVE EXAMPLE
Coit and Jin (2000) extracted an airplane light dataset from an earlier version of the large database (Mahar et al. 2011) maintained by RIAC, as presented in Table 1. They applied the gamma distribution to fit this aggregate dataset and obtained ML estimates of the parameters. In this section, we further construct confidence intervals for the parameters and quantiles of the gamma distribution using our proposed methods. We then apply the IG model to fit the aggregate data and also construct confidence intervals for the parameters and quantiles of the IG distribution. For comparison, the normal distribution is also fit to the data, and confidence intervals for the normal-distribution parameters and quantiles are obtained. Detailed inference procedures for the normal distribution can be found in the online supplementary materials.
Inference results for the three models are presented in Table 2. These results show that the normal distribution is not appropriate for these data because the coefficient of variation (σ/μ) is large; as a result, the estimated small quantiles are negative. For the other two models, the gamma distribution (with Akaike information criterion, AIC, value 65.63) tends to fit the data better than the IG distribution (with AIC value 66.33) in terms of the AIC.
We then assess the goodness of fit of these models by a simple graphical method. For the gamma distribution model, if X ~ Gamma(k, θ), where k and θ are the shape and rate parameters, Wilson and Hilferty (1931) stated that (Xθ/k)^{1/3} is approximately N(1 − 1/(9k), 1/(9k)) distributed. For the IG distribution, Seshadri (1999) showed that √(λ/X)(X − μ)/μ approximately follows the standard normal distribution. The MLEs μ̂ and λ̂ can be substituted into this expression to assess the adequacy of the IG distribution. Figure 6 shows the quantile-quantile (Q-Q) plots based on the above approximate distributions. As can be seen, the gamma distribution model seems to provide a better fit to the aggregate data.

Figure 6. Q-Q plots of different models using airplane light data.
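The two normal approximations can be turned into Q-Q plot coordinates directly. A sketch under assumed inputs: the arrays m and t and the fitted parameters k_hat, theta_hat, mu_hat, lam_hat are illustrative placeholders (not the airplane light data or the Table 2 estimates), and each aggregate point is treated via the convolution properties, i.e., Gamma(m_i k, θ) or IG(m_i μ, m_i² λ):

```python
import numpy as np
from scipy import stats

# Hypothetical aggregate observations and placeholder parameter fits.
m = np.array([3, 5, 2, 4, 6])
t = np.array([5.2, 11.7, 3.1, 9.6, 14.9])
k_hat, theta_hat = 1.8, 0.9      # gamma shape and rate (placeholders)
mu_hat, lam_hat = 2.225, 19.85   # IG mean and shape (placeholders)

# Gamma model: Wilson-Hilferty, (X*theta/a)^(1/3) ~ N(1 - 1/(9a), 1/(9a)),
# applied per aggregate point with shape a = m[i]*k and common rate theta.
a = m * k_hat
z_gamma = (((theta_hat * t / a) ** (1 / 3)) - (1 - 1 / (9 * a))) / np.sqrt(1 / (9 * a))

# IG model: sqrt(lambda/X)*(X - mu)/mu is approximately N(0, 1), applied
# per aggregate point with T_i ~ IG(m[i]*mu, m[i]^2*lambda).
mu_i, lam_i = m * mu_hat, m**2 * lam_hat
z_ig = np.sqrt(lam_i / t) * (t - mu_i) / mu_i

# Theoretical standard normal quantiles for the Q-Q plot x-axis.
q_norm = stats.norm.ppf((np.arange(1, len(t) + 1) - 0.5) / len(t))
print(np.sort(z_gamma), np.sort(z_ig), q_norm)
```

Plotting the sorted scores of each model against q_norm reproduces the kind of comparison shown in Figure 6: the model whose points track the 45-degree line more closely fits better.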

CONCLUSION AND FURTHER COMMENTS
Aggregate lifetime data are common in field data collection systems. When the exponential distribution is not valid, analysis of such data is difficult. In this study, we investigated statistical inference for aggregate data using the gamma distribution and the IG distribution. Accurate methods for constructing confidence intervals for the parameters and quantiles of these two distributions were proposed. Interval estimation for the gamma shape was based on a scaled χ² approximation, with the two parameters of the scaled χ² determined by an analytical moment-matching method. A procedure based on the generalized pivotal quantity method was developed for the gamma rate, gamma mean, and gamma quantiles. According to our comprehensive simulation, this procedure, if applied to individual failure data (m_i = 1), outperforms existing interval estimation methods for the gamma rate, mean, and quantiles. For the IG mean and shape parameters, two pivotal quantities were developed and used to construct exact confidence intervals. An interval estimation method for IG quantiles was developed based on a generalized pivotal quantity. The simulation results showed the good performance of this method.
Future research can focus on other distributions such as the Weibull or lognormal. This, however, will not be an easy task because the convolution of these distributions no longer belongs to the same distribution family. Nonparametric inference for aggregate data is also of interest. As exact individual lifetimes are not available, Kaplan-Meier estimation is not applicable here. The phase-type distribution might be a useful candidate for this estimation, and further exploration is needed.
The aggregate data discussed in this article are assumed to be failure censored. That is, the observation ends at the time the final failure occurs. This is a reasonable assumption because the data are often recorded in response to an event such as a failure. Our study has systematically investigated this type of aggregate data. In other applications, however, data records are made for other reasons, perhaps due to a scheduled inspection or a planned data reporting interval. In this scenario, the aggregate data are time censored; that is, the final failure does not occur right at the end of the observation. When the gap after the last failure in each system is not too large, the failure-censored treatment of the aggregate data could provide a reasonable approximation. For demonstration, we add a gap in time to the airplane light data to make them time-censored. For each system, the gap is proportional to its average failure time (t_i/m_i). We still treat the modified data as failure-censored and use the gamma distribution to fit the data. The estimated quantiles under different proportions of modification are provided after the second column of Table 3. Comparing these results with column 2, we can see that the approximation works well even when the proportion is set to 80%.
However, rigorous analysis of time-censored aggregate data tends to be more difficult. This is because the probability P(M_i = m_i | t_i) does not have a closed form, even if the gamma or IG distribution is assumed for the component lifetime. Therefore, the likelihood function L(θ; D) = Π_{i=1}^n P(M_i = m_i | t_i) also does not have a closed form. As a result, both point estimation and interval estimation of the lifetime-distribution parameters are difficult. When the gamma or IG distribution is assumed, numerical integration methods such as the Newton-Cotes methods might be useful in computing the likelihood function. For other distributions such as the Weibull and lognormal, more advanced techniques such as data augmentation may be needed. This would be an interesting topic for future research.
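For the gamma case, one way to evaluate P(M_i = m_i | t_i) numerically (the data arrays and parameter values below are illustrative placeholders, and this renewal-process identity is our own sketch rather than the paper's procedure) is to use the fact that the sum of j component lifetimes is Gamma(j k, θ), so P(M_i = m_i | t_i) = F_{Gamma(m_i k, θ)}(t_i) − F_{Gamma((m_i+1) k, θ)}(t_i), where each CDF is a regularized incomplete gamma function:

```python
import numpy as np
from scipy import stats

def log_lik_time_censored(k, theta, m, t):
    """Log-likelihood for time-censored aggregate data under a gamma
    lifetime model: P(M_i = m_i | t_i) is the probability that the
    m_i-th failure occurs by t_i but the (m_i + 1)-th does not."""
    p = (stats.gamma.cdf(t, m * k, scale=1 / theta)
         - stats.gamma.cdf(t, (m + 1) * k, scale=1 / theta))
    return np.log(p).sum()

# Hypothetical time-censored records: m[i] failures observed by time t[i].
m = np.array([3, 5, 2, 4, 6])
t = np.array([5.9, 12.4, 3.8, 10.1, 15.6])
print(log_lik_time_censored(1.8, 0.9, m, t))
```

Maximizing this function numerically over (k, θ) would give ML estimates for the time-censored case; the analogous IG computation uses the IG convolution IG(m_i μ, m_i² λ) in place of the gamma CDFs.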

SUPPLEMENTARY MATERIALS
Technical details: The online supplementary materials include a simulation study assessing the robustness of the gamma and IG distributions to the Weibull and lognormal distributions. A discussion of the identifiability problem for aggregate data is provided. The derivation of Equation (1) and proofs of Theorems 1-3 can also be found there. In addition, some further simulation results are included. We also consider the normal distribution for component lifetimes in the supplement; point estimation and interval estimation for the normal parameters and quantiles based on aggregate data are developed therein.