Estimating DEA confidence intervals with statistical panel data analysis

This paper describes a statistical method for estimating data envelopment analysis (DEA) score confidence intervals for individual organizations or other entities. This method applies statistical panel data analysis, which provides proven and powerful methodologies for diagnostic testing and for estimation of confidence intervals. DEA scores are tested for violations of the standard statistical assumptions including contemporaneous correlation, serial correlation, heteroskedasticity and the absence of a normal distribution. Generalized least squares statistical models are used to adjust for violations that are present and to estimate valid confidence intervals within which the true efficiency of each individual decision-making unit occurs. This method is illustrated with two sets of panel data, one from large US urban transit systems and the other from a group of US hospital pharmacies.


Introduction
Developing statistical methodologies to deal with noise has become a significant focus of data envelopment analysis (DEA) research. Chambers and Färe [12, p. 329] observed that "more and more effort has been devoted to determining the statistical properties of the DEA approach". It has been shown that DEA scores possess stochastic characteristics that permit many types of statistical estimations [3,25]. Stochastic variations in DEA scores have been addressed by methodologies such as chance-constrained programming [15,44], window analysis [17], sensitivity-robustness-stability analysis [16], bootstrapping [1,28,34-38] and, most recently, a promising new Bayesian procedure [22]. However, none of these methodologies can estimate, with a specified probability, the confidence interval for the true efficiency of an individual organization or other entity, herein called a decision-making unit (DMU).
Bootstrapping as developed to date does not consider the stochastic variations of individual DMUs (with the exception of Atkinson and Wilson's 1995 article [1], which has not been utilized since). Indeed, Simar and Wilson's seminal articles on bootstrapping specifically state that their methodology treats the input-output set of each DMU as constant [34,35,37]. So, bootstrapped confidence intervals estimate the range within which a fixed input-output set's true efficiency occurs with a specified probability, using the set of inputs and outputs from a particular DMU. But the bootstrapping methodology does not incorporate stochastic variations in each individual DMU's output/input performance. Therefore, bootstrapping overestimates (by an unknown amount) the probability that a DMU's true efficiency will occur within the reported confidence limits. Furthermore, because the stochastic variation within each DMU is frequently heteroskedastic across DMUs, the aforementioned unknown overestimation will vary by unknown amounts among DMUs.
Gajewski et al. [22] have recently developed an innovative Bayesian methodology for estimating the best-practice frontier that represents an elegant alternative to bootstrapping. In general, it requires fewer assumptions than does bootstrapping and, more importantly, it introduces Bayesian statistical methodology to the DEA field. Given that bootstrapping and the new Bayesian approach are based on different assumptions and statistical methodologies, researchers estimating best-practice frontiers may wish to triangulate their estimates by utilizing both methods. However, the Bayesian methodology to date is also based on variations in the frontier while ignoring variations within individual DMUs. So, like bootstrapping, it can estimate the probability distribution for the efficiency of a fixed set of inputs and outputs, but it cannot estimate the probability distribution for the efficiency of individual DMUs.
To date, both bootstrapping and Bayesian estimation have been based on only one observation of each DMU. It is clearly impossible to estimate an individual DMU's variation without multiple observations. So, for those circumstances where only cross-sectional data are available, bootstrapping and Bayesian estimation may be the best alternatives.
But those relying on these methodologies cannot determine whether an assessed DMU is efficient (or inefficient) with a specified degree of statistical significance or construct a confidence interval within which the DMU's true efficiency will occur with a specified probability. Furthermore, it is not possible to determine if an individual DMU's efficiency uptrends or downtrends are statistically significant or just random variations or whether ongoing processes evaluated by DEA scores are in or out of control.

Use of PDA with DEA and the contribution of this paper
When multiple observations of each DMU's scores are available, as occurs with panel or clustered data, an expected mean score for each DMU can be computed. Then, we can use the distribution of the actual scores around the means to characterize the nature of the stochastic disturbances. All stochastic disturbances, including DEA score errors, can encapsulate complicated, unidentified interactions with other variables. Such disturbances can be treated as random if they satisfy appropriate empirical tests [21,23,24,47]. Therefore, we can empirically test whether DEA score residuals are independent and identically distributed (i.i.d.) and normally distributed. Statistical panel data analysis (PDA) methodologies can be used to identify violations of these requirements and, where violations exist, to employ appropriate statistical models to correct for them in confidence interval estimations.
The first article using PDA with DEA to estimate efficiency confidence intervals for individual DMUs was published in the Journal of Transportation Engineering [5] in 2008. It estimated confidence intervals for the mean DEA scores of Canadian paratransit systems, with the data adjusted for environmental variations. PDA/DEA methodologies have since been utilized by articles in journals from several fields. These consist of articles identifying confidence intervals of Canadian paratransit system efficiencies that had not been adjusted for environmental differences [6], replacing decision-making under uncertainty with decision-making under risk in operations research [7], identifying statistically significant trends in the efficiencies of hospital pharmacies [9] and measuring bus schedule reliability (rather than efficiency) for individual routes with a DEA-inspired linear programming model [30].
This paper contributes to the body of knowledge in several important respects. All of the preceding PDA/DEA papers have been directed to specific fields or industries, and all are published in journals whose target audiences are neither applied statisticians nor general users of applied statistical techniques. Because the PDA/DEA methodology would be useful across a wide range of DEA issues amenable to statistical applications, we feel that it should be introduced to a broad audience of applied statisticians. In this paper, we particularly emphasize the statistical methodology and assumptions and their consequences for valid estimation. Because the PDA/DEA methodology uses parametric statistical models to estimate, validly for the first time, DEA score confidence intervals for individual units, it opens up a new application of such models to statisticians.
Furthermore, the Journal of Applied Statistics (JAS) is an especially appropriate outlet for presenting this methodology to applied statisticians because of JAS's publication of other statistical methodologies for dealing with DEA scores. The first is one of the seminal articles on bootstrapping methodology [36] published in 2000, and another is the promising new Bayesian methodology published in 2009 [22]. Unfortunately, the 2000 DEA bootstrapping article [36] explicitly estimates DEA efficiency confidence intervals of individual DMUs, in its case schools. It, therefore, disseminates bootstrapping's fatal flaw of estimating DMU confidence intervals based solely on efficiency variations of the frontier while ignoring efficiency variations of individual DMUs. Based on this and several other key articles, bootstrapping DEA efficiency scores to (incorrectly) develop confidence intervals for individual DMUs has become widespread since 2000. So, it seems appropriate that a valid alternative be published in this journal.
Finally, there is one further concern with bootstrapping as described in [36]. The authors make five assumptions that serve to characterize their data-generating process (DGP). Only the first would be subject to empirical testing and, in fact, all five assumptions are accepted with no empirical evidence that they are valid either for the data set analyzed in [36] or indeed for any other specific data set. Although we have not addressed it in our early articles, the characteristics of the DGP assumed for PDA/DEA modeling can be empirically tested, as we specifically discuss in Section 5, with other heretofore overlooked statistical consequences identified in Section 9. These expositions should remind applied statisticians and others of the importance of both identifying and empirically validating DGP assumptions.

Organization of this paper
In this paper, we exhibit the use of PDA with two sets of panel data. One panel is DEA scores from 50 large urban transit systems, with 5 scores for each system. The other panel is DEA scores from 12 hospital pharmacies, with 13 scores for each pharmacy. Where heteroskedasticity, serial correlation and contemporaneous correlation are present, we identify generalized least squares (GLS) statistical models that account for these conditions in their estimations. Using our two data sets, one that complies with the i.i.d. assumptions and the other that does not, we demonstrate how PDA methods can be used to estimate valid confidence intervals for the true efficiencies of individual DMUs.
First, we discuss our DEA model, the DGP and our statistical diagnostic tests. Next, we apply the procedure to our pharmacy data: we describe the inputs and outputs, the PDA statistical model, the results of tests for i.i.d. and normality, confidence intervals for each pharmacy's expected efficiency over time and control charts for determining when a pharmacy's efficiency is "out of control". Then, we apply the procedure to our transit data: we identify the inputs and outputs, the PDA statistical model, the results of tests for i.i.d. and normality and the confidence interval estimates for each individual transit system's efficiency. Finally, we discuss related issues and conclude.

The DEA model
The DEA scores are based on linear program 1. We utilize the Charnes-Cooper-Rhodes (CCR) input-oriented DEA model [13] adapted so that its scores are not censored at 1. That is, instead of censoring input-oriented efficiency scores θ at one (0 ≤ θ ≤ 1), the model allows θ to vary over [0, ∞), where θ < 1 is inefficient and θ ≥ 1 is efficient, by adding Equation (1.3) to the conventional CCR model. For the J DMUs (j = 1, ..., J), there are data on M inputs (x_11, ..., x_JM) and on N outputs (y_11, ..., y_JN). The DEA score θ identifies the technical efficiency of the assessed DMU k. All DEAs are conducted with Scheel's efficiency measurement system (EMS) software [32]. We use this model with both the transit and the hospital pharmacy data.
We use the CCR model because it has been demonstrated in several other studies of hospital pharmacies [26,33] and confirmed for our data herein that hospital pharmacies have constant returns to scale. Although transit systems sometimes produce decreasing returns to scale, our sample of large urban transit systems showed constant returns to scale, so the CCR model is also appropriate for that data set.
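Linear program 1 itself does not survive in this copy. A form consistent with the description above can be sketched as follows, under stated assumptions: the intensity weights λ_j are notation introduced here, and only the role of Equation (1.3) is given in the text, so the other equation labels are guesses.

```latex
% Sketch of an uncensored, input-oriented CCR envelopment program for assessed DMU k.
% Assumed notation: \lambda_j are intensity weights; labels other than (1.3) are guesses.
\begin{align}
\min_{\theta,\,\lambda}\quad & \theta \tag{1}\\
\text{subject to}\quad
& \sum_{j=1}^{J} \lambda_j x_{jm} \le \theta\, x_{km}, \qquad m = 1,\dots,M \tag{1.1}\\
& \sum_{j=1}^{J} \lambda_j y_{jn} \ge y_{kn}, \qquad n = 1,\dots,N \tag{1.2}\\
& \lambda_k = 0 \tag{1.3}\\
& \lambda_j \ge 0, \qquad j = 1,\dots,J \tag{1.4}
\end{align}
```

Without constraint (1.3), the assessed DMU can serve as its own peer (λ_k = 1), which caps θ at 1. With it, an efficient DMU is compared only to the other DMUs and θ may exceed 1.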
The model is based on a contemporaneous frontier [45], that is, efficiencies are computed separately for each cross section of the data. We use a contemporaneous frontier in this paper, where efficiencies in each time period are estimated based on the most efficient DMUs in that time period, because it results in estimates more sensitive to the presence of contemporaneous correlation.
The conventional CCR model reports censored scores. The output/input ratio of the assessed DMU is compared to the output/input ratio of its efficient peers only if the assessed DMU is inefficient. If, however, the assessed DMU is efficient, then its output/input ratio is compared to its own output/input ratio, so its score cannot exceed 1.
Uncensored DEA scores result when the output/input ratio of the assessed DMU is compared to the output/input ratio of efficient peer DMUs, regardless of the assessed DMU's level of efficiency. (Note that Equation (1.3) prevents the DMU being assessed from entering into the comparison base.) For example, suppose that the maximum value of an assessed DMU's aggregated and weighted outputs to aggregated and weighted inputs is 4/10, while its composite efficient peer (using the identical set of weights) has an output/input ratio of 5/10. In this case, because the assessed DMU is inefficient, the CCR model would report its efficiency as 0.8 whether or not the scores are censored. Now, suppose that in periods 2, 3 and 4, the assessed DMU's ratios are 5/10, 7/10 and 6/10, while the composite to which they are compared remains at 5/10. If the scores are censored, the assessed DMU's reported efficiencies for the three periods will be 1.0, 1.0 and 1.0. If the scores are uncensored, then the assessed DMU's reported efficiencies will be 1.0, 1.4 and 1.2.
From the viewpoint of deterministic estimation of efficiency in which the data are assumed to be non-stochastic, the censored score is sufficient and can serve as a direct proxy for efficiency, thereby following the DEA convention that relative efficiency can never be greater than 100%. One purpose of the CCR program is to identify DMUs that are on the production frontier, and a score of 1 symbolizes a point on the production frontier.
But for statistical estimation, when ubiquitous stochastic variation is taken into account, the uncensored score is superior. Among other reasons, it does not discard information useful for estimating statistical significance and statistical confidence.
Utilizing the full range of scores (0 ≤ θ < ∞) does not affect which DMUs are reported to be efficient or affect the scores of any inefficient DMU. For any DMU with a score of 1 or greater, we know that no other DMU is more efficient at its location on the frontier. So, the estimated efficient frontier and the DMUs defining it will be identical whether or not the scores are truncated at 1.
Likewise, the reported efficiencies of inefficient DMUs are not affected, because whether or not a benchmark DMU's score is recorded as 1 or some higher value, the outputs and inputs underlying that score will be what determine the score of an inefficient DMU that it is benchmarking. For example, suppose that the DMU assessed above is a benchmark DMU for periods 2, 3 and 4. And, the DMU that it is benchmarking has a constant output/input ratio of 4/10 for all three periods. This constant ratio will be compared to the benchmark DMU's ratios of 5/10, 7/10 and 6/10, resulting in efficiency estimates of 0.8, 0.57 and 0.67 for the assessed DMU. This is true regardless of the benchmark's reported efficiencies being truncated at 1 for all three periods or reported as 1.0, 1.4 and 1.2. In short, using uncensored DEA scores provides valuable information for statistical estimation but has no effect on which DMUs are reported as efficient or on the efficiency scores of inefficient DMUs.
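The two numerical examples in this section can be reproduced with a few lines of arithmetic. The sketch below assumes the single-ratio simplification used in the text (one aggregated output/input ratio per DMU); the function name `dea_ratio_score` is hypothetical and is not part of any DEA package.

```python
def dea_ratio_score(own_ratio, peer_ratio, censored=False):
    """Score of a DMU: its output/input ratio relative to its efficient peer's ratio.

    Illustrative helper only; a real DEA score comes from solving the linear program.
    """
    score = own_ratio / peer_ratio
    return min(score, 1.0) if censored else score


peer = 5 / 10                                  # composite efficient peer
assessed = [4 / 10, 5 / 10, 7 / 10, 6 / 10]    # assessed DMU, periods 1 to 4

uncensored = [round(dea_ratio_score(r, peer), 2) for r in assessed]
censored = [round(dea_ratio_score(r, peer, censored=True), 2) for r in assessed]

# A second DMU with a constant ratio of 4/10, benchmarked against the
# assessed DMU's ratios in periods 2, 3 and 4.
benchmarked = [round(dea_ratio_score(4 / 10, b), 2) for b in assessed[1:]]

print(uncensored)   # [0.8, 1.0, 1.4, 1.2]
print(censored)     # [0.8, 1.0, 1.0, 1.0]
print(benchmarked)  # [0.8, 0.57, 0.67]
```

The benchmarked DMU's scores depend only on the benchmark's underlying ratios, so they are the same whether the benchmark's own score is reported as censored or uncensored.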

The data-generating process
The appropriate method for creating confidence intervals depends on the characteristics of the DGP. We do not assume that the DGP yields errors in DMU scores that are i.i.d. and normally distributed or that it does not. Rather, we hypothesize that deviations from (DMU level) mean scores are i.i.d. and normally distributed. We empirically test the data, via examination of residuals around their DMU-specific means, to verify or reject our hypotheses. Then, we use methods for generating probabilities that have been justified by empirical evidence about the DGP rather than basing our choices on implicit or explicit assumptions.
As is true for traditional DEA [13,17,22], the DMUs in our data sets are the population, and the data cover the time period of concern. That is, we adopt the convention that a DMU's relative efficiency is determined solely by comparisons with the other real DMUs in the analysis. In many cases, those concerned with the performance of particular DMUs want to know how those DMUs compare with their actual competitors/peers. This is certainly true in our pharmacy case: the reason for using the control system is to compare the 12 pharmacies with each other and with themselves over time. In transit, policy- and decision-makers usually are most interested in how specific operations compare with other specific operations [31,39,43]. (If indeed the sample at hand does not represent the entire population of interest, then, building on current work [22], panel data Bayesian methodologies that estimate the best-practice frontier for the population would be a welcome addition to the literature.)

Statistical diagnostic testing
The standard assumptions are that the random errors in the DMU's efficiency scores are i.i.d. and normally distributed. Before any attempt to construct significance levels or confidence intervals can be validly performed, it is necessary to confirm that the preceding assumptions hold. If the conditions do not hold, appropriate GLS models would be necessary to account for violations. That is, one must empirically test the data to determine whether the DGP has produced DEA scores that are (or are not) i.i.d. and normally distributed and then use models justified by the evidence. All statistical analyses in this paper use Stata 10 [42].
To test for normal distribution of the errors, we use the Shapiro-Wilk W test, the Shapiro-Francia W test and a joint skewness and kurtosis test [40]. To determine if the errors are identically distributed, we test for heteroskedasticity across the DMUs with the Breusch-Pagan/Cook-Weisberg method [24].
The errors are not independent if either a serial or a contemporaneous correlation is present. Serial correlation, often called autocorrelation, occurs among a DMU's error terms when its error in one time period is correlated with its errors in other time periods [2,24,29,47]. We test for serial correlation using the Wooldridge test for autocorrelation in panel data [47]. Contemporaneous correlation, also called cross-sectional correlation, cross-sectional dependence and spatial correlation, occurs when the error terms across DMUs are correlated [21,24,47]. The main reason for contemporaneous correlation in DEA is that each DMU's score is influenced by the performance of efficient DMUs. If certain efficient DMUs systematically influence certain other DMUs, it may cause correlation among their error terms. Two tests for contemporaneous correlation when the number of panel members exceeds the number of time periods are Frees' R 2 AVE evaluated with his Q-distribution [20,21] and Pesaran's CD cross-sectional dependence test [18]. When the number of periods exceeds the number of panel members, the Breusch-Pagan Lagrange Multiplier (LM) test for contemporaneous correlation can also be applied [47].
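As an illustration of two of the contemporaneous-correlation statistics named above, the sketch below computes Pesaran's CD statistic and the Breusch-Pagan LM statistic from pairwise residual correlations. It assumes a balanced panel and residuals that vary within each DMU. The function names are hypothetical, and this is not the Stata implementation used in the paper.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length residual series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_sectional_tests(residuals):
    """residuals: list of N lists, each holding T residuals for one DMU."""
    n, t = len(residuals), len(residuals[0])
    # Correlation for every distinct pair of DMUs (i < j).
    rhos = [pearson(residuals[i], residuals[j])
            for i in range(n) for j in range(i + 1, n)]
    # Pesaran's CD: asymptotically standard normal under independence.
    cd = math.sqrt(2 * t / (n * (n - 1))) * sum(rhos)
    cd_p = 2 * (1 - NormalDist().cdf(abs(cd)))
    # Breusch-Pagan LM: chi-square with N(N-1)/2 degrees of freedom.
    lm = t * sum(r * r for r in rhos)
    lm_df = n * (n - 1) // 2
    return {"CD": cd, "CD_p": cd_p, "LM": lm, "LM_df": lm_df}


# Made-up residuals for 3 DMUs over 4 periods (not data from the paper).
example = [[1, -1, 1, -1], [1, -1, 1, -1], [1, 1, -1, -1]]
result = cross_sectional_tests(example)
print(result["LM_df"])  # 3
```

With N = 12 panel members, the LM degrees of freedom are 12 × 11 / 2 = 66.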
GLS models that can simultaneously correct for serial and contemporaneous correlations, and for heteroskedasticity, are readily available. For example, when the number of DMUs exceeds the number of time periods, one can apply the Driscoll and Kraay [19] standard error estimator (available in Stata with the PDA command xtscc) [10]. This yields a non-parametric variance-covariance matrix estimator with standard errors that are robust to heteroskedasticity and to serial and contemporaneous correlations [27]. When the number of time periods exceeds the number of DMUs and a fixed-effects model is used, the preceding estimator as well as the Prais-Winsten estimator [24] can be utilized (the latter available in Stata with the PDA command xtpcse) [10,41]. The Prais-Winsten approach computes parametric variance-covariance estimates that are robust to contemporaneous and serial correlations and heteroskedasticity.

Data, inputs and outputs
The data set comes from a system of 12 hospitals in the USA and consists of 13 periods of bi-weekly data from the first six months of 2008. The data are from the pharmacy departments within these hospitals. This group of hospital pharmacies recently adopted a new set of clinical and distributional output indicators. In another paper that analyzed all of the inputs and outputs, we used an intertemporal frontier, so all 10 outputs and 3 inputs could be included and all efficiency scores would be estimated from the same efficient set [9]. For this paper, we use a contemporaneous frontier so the statistical diagnostic tests would be more sensitive to violations of independence. However, this results in only 12 observations for each DEA, so the number of inputs and outputs had to be decreased. Therefore, we use clinical outputs and input to measure the cross-sectional clinical efficiency of each pharmacy for 13 periods.
There is one input: clinical labor hours. Labor is the most important and controllable input that impacts hospital pharmacy efficiency, particularly from the standpoint of clinical pharmacy services. While the cost of drugs (another potential input) used in the production process in hospital pharmacy is a significant resource, it relates more to the product-related or "distributive" functions of the pharmacy and is primarily considered a pass-through (or "throughput"). All other operating and capital costs are relatively insignificant, are directly related to labor hours and certainly could not be substituted for labor.
There are two outputs included in the analyses. These are the number of clinical interventions made by pharmacists and the sum of the estimated dollar savings from those interventions. Clinical functions of hospital pharmacists include activities such as medication management, drug evaluation and selection and reviewing patient drug use. By making recommendations to physicians about the appropriateness of drug therapy, pharmacist time spent in clinical activities often results in substantial savings to the hospital in drug costs as well as improvements in patient outcomes. The number of interventions serves as a proxy for the positive effects on patients, and the dollar savings stands for itself. The data for each of the clinical outputs and inputs were available for each hospital and each pay period and were generated by the pharmacy electronic documentation system used at the facilities.

Our statistical model is Equation 2, where θ_jt is the efficiency score of pharmacy j in period t. The response variable is the product of θ_jt and the weight w_j, using Equation 3 to compute the weight based on the standard error of estimate for data from pharmacy j. α_j is the individual effect of pharmacy j, β_j is the mean change per period in the individual effect α_j of pharmacy j and u_jt is the random error in the response variable of pharmacy j in period t. Some pharmacies showed linear trends in efficiency over time, so Equation 2 includes a factor (t − 1) that adjusts expected efficiency for the period involved. If there is no change in the efficiency scores over time for pharmacy j, or if the temporal trend is inconsistent, then β_j will not be significant. Finally, in Equation 3, α*_j and β*_j are estimated using θ_jt as the response variable. Because the pharmacies' error distributions are heteroskedastic, we obtain homoskedasticity by weighting each pharmacy j's DEA scores by its standard error [11], as per Equation 3.
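Equations 2 and 3 are not legible in this copy. One form consistent with the definitions above is sketched here purely as an assumption: the weight is taken to be the reciprocal of the pharmacy-specific standard error of estimate (the usual weighted least squares convention), and the degrees-of-freedom divisor T − 2 is a guess.

```latex
% Hedged reconstruction; the exact published form may differ.
\begin{align}
w_j \theta_{jt} &= w_j \alpha_j + w_j \beta_j (t-1) + u_{jt} \tag{2}\\
w_j &= \left[ \frac{1}{T-2} \sum_{t=1}^{T}
      \bigl( \theta_{jt} - \alpha^{*}_j - \beta^{*}_j (t-1) \bigr)^{2} \right]^{-1/2} \tag{3}
\end{align}
```

Under this reading, α*_j and β*_j come from a first-pass, unweighted regression of θ_jt on (t − 1) for each pharmacy, and dividing each pharmacy's scores by its own residual standard error gives the transformed errors u_jt a common variance.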

Results of statistical diagnostic tests
The Breusch-Pagan/Cook-Weisberg test for heteroskedasticity found no statistically significant differences among the pharmacies [χ² = 0.00, P(χ²(11) > 0.005) = ...]. [...] Because the number of periods (13) slightly exceeds the number of pharmacies (12), the Breusch-Pagan LM test of independence is available [χ² = 66.667, P(χ²(66) > 66.667) = 0.454]. Therefore, the null hypotheses that the residuals are i.i.d. and normally distributed cannot be rejected, so we employ the standard assumptions in developing probabilities and confidence intervals.

Confidence intervals
Figure 1 provides the range within which true efficiencies (the actual value of the population mean) of each hospital pharmacy's clinical system are expected to occur with 0.95 confidence given the data. They are calculated as the expected value of the observation plus/minus the standard error of prediction times 1.96.
One point of interest is the relatively wide range within which the true efficiencies are expected to occur. For example, one would not be able to statistically reject, at a 5% level, a hypothesis that the true efficiency was 0.7 or 0.8 for any of the hospitals. One could, however, reject the hypothesis that true efficiency was below 0.6 for every hospital pharmacy except pharmacy 9.

Control charts
In the control charts, the bottom line is the expected value of the observation minus the standard error of forecast, that is, the standard error of the point prediction for one observation, times 1.282. If the most recent observation is at or below the bottom line, it is likely that its efficiency score is not just a random variation, but really is lower than expected. In such cases, an immediate examination is in order. Such an examination is needed for hospital pharmacy 9, whose most recent efficiency level is very likely a reflection of true inefficiency rather than random variation.
Because we had 12 hospitals but only 13 periods, we included all periods in our computations. However, in order to increase the power of the models to identify variations that are not random but a true change, the final period should be excluded from both the DEA and PDA models when developing control charts so that its variation does not influence the results.
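The interval and control-limit arithmetic described for the pharmacy data can be sketched as follows. The multipliers 1.96 and 1.282 are the standard normal quantiles for a two-sided 0.95 interval and a one-sided 0.10 tail, respectively. The function names and the numeric inputs are hypothetical and are not values from the pharmacy panel.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_interval(expected, std_error, confidence=0.95):
    """Expected value plus/minus z times the standard error of prediction."""
    z = STD_NORMAL.inv_cdf(0.5 + confidence / 2)   # about 1.96 for 0.95
    return expected - z * std_error, expected + z * std_error


def lower_control_limit(expected, std_error_forecast, tail_prob=0.10):
    """Bottom line of a control chart: expected value minus z times the
    standard error of forecast for a single new observation."""
    z = STD_NORMAL.inv_cdf(1 - tail_prob)          # about 1.282 for 0.10
    return expected - z * std_error_forecast


# Hypothetical pharmacy: expected efficiency 0.80, standard errors 0.05 and 0.09.
low, high = two_sided_interval(0.80, 0.05)
limit = lower_control_limit(0.80, 0.09)
latest_score = 0.66
needs_review = latest_score <= limit   # at or below the bottom line

print(round(low, 3), round(high, 3), round(limit, 3), needs_review)
```

The standard error of forecast is larger than the standard error of prediction for the mean because it also includes the variance of a single new observation, which is why the control limit uses it.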

Data, inputs and outputs
Urban transit agencies often oversee multiple modes and provider types of public transportation in their metropolitan areas. In the USA, the most common non-rail modes are scheduled motorbus and paratransit (demand-responsive transit), with service being provided both by the agency itself and by outsourcing. That is, US transit agencies provide non-rail service with one to four organizational subunits: directly-operated motorbus service, outsourced motorbus service, directly-operated demand-responsive service and/or outsourced demand-responsive service. The largest agencies generally have two or three subunits, with all four subunits being utilized by some [46].
We consider one output and one input from each subunit. We use the annual number of vehicle miles supplied by each subunit as our indicator of output and each subunit's operating expenses (standardized for cost differences across time and across cities) as our indicator of input. Thus, there are four outputs and four inputs. Our sample consists of the 50 agencies with 150 or more vehicles in maximum service, which included all such agencies for which all of the needed annual data were available for the years 2002-2006. The data are from the National Transit Database [46]. More detail on inputs, outputs and organization can be found in a paper providing a protocol for analyzing the efficiency of an urban area's transit when multiple types of services are operated [8].

Statistical panel data model
The PDA regression model is Equation 4, with definitions the same as those for Equation 2. However, unlike the model we used for the pharmacy data, we did not include a term for trends in the transit model. We did this partly because five observations seem to be too few to validly identify trends and partly to demonstrate a model that estimates each DMU's true efficiency based on multiple observations. Equation 5 provides the model used to determine weights [11], with definitions being the same as those for Equation 3.
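Equations 4 and 5 are likewise missing from this copy. Dropping the trend term from the pharmacy model suggests the following form, offered only as an assumption (reciprocal standard-error weights, with an assumed divisor of T − 1):

```latex
% Hedged reconstruction of the no-trend transit model; the published form may differ.
\begin{align}
w_j \theta_{jt} &= w_j \alpha_j + u_{jt} \tag{4}\\
w_j &= \left[ \frac{1}{T-1} \sum_{t=1}^{T}
      \bigl( \theta_{jt} - \alpha^{*}_j \bigr)^{2} \right]^{-1/2} \tag{5}
\end{align}
```

Here α*_j would simply be the unweighted mean of transit system j's five annual scores.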

Results of statistical diagnostic tests
[...] 1046) = 0.01 and CD = 3.937, P(CD > 3.937) = 0.0001. Because both serial and contemporaneous correlations are present to a statistically significant degree, we must adjust for this in our subsequent confidence limit estimations.

Ranges within which true mean efficiencies occur
Based on the preceding diagnostics and the fact that the number of DMUs exceeds the number of time periods, we estimated the efficiencies of each DMU using the Driscoll-Kraay model, thereby correcting for serial correlation and for contemporaneous correlation. The resulting efficiency estimates for the individual DMUs are given in Table 1. The DEA model is input oriented, so lower scores mean lower efficiency. The confidence intervals are based on the standard error of prediction of the true expected value at the 0.98 level of confidence. Because a two-sided 0.98 interval leaves 0.01 in each tail, any given system will be incorrectly reported as efficient (or as inefficient) only one percent of the time.
As shown in Table 1, the point estimates of the mean efficiency for 36 DMUs showed them to be inefficient, but 7 of them were not inefficient to a statistically significant degree. Of the 14 DMUs with efficient mean point estimates, 5 were not efficient to a statistically significant degree. Thus, whether a quarter of the agencies were or were not efficient cannot be determined with statistical confidence. Therefore, classifying a DMU as efficient (inefficient) based on a single DEA score or even a five-year mean, without considering confidence intervals, renders the validity of the classifications questionable.
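The classification logic implied here can be sketched as a three-way rule: a DMU is called efficient or inefficient only when its whole confidence interval lies on one side of 1. The sketch assumes normal critical values, as in the pharmacy example. The function name and inputs are hypothetical and are not values from Table 1.

```python
from statistics import NormalDist


def classify(mean_score, std_error, confidence=0.98):
    """Classify a DMU from the confidence interval around its mean DEA score."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 2.326 for 0.98
    lower = mean_score - z * std_error
    upper = mean_score + z * std_error
    if upper < 1.0:
        return "inefficient"      # entire interval below the frontier score of 1
    if lower >= 1.0:
        return "efficient"        # entire interval at or above 1
    return "indeterminate"        # interval straddles 1


print(classify(0.90, 0.02))  # inefficient
print(classify(1.02, 0.02))  # indeterminate
print(classify(1.10, 0.02))  # efficient
```

In the transit results, 7 + 5 = 12 of the 50 agencies fall in the indeterminate group, which is the "quarter" (24%) referred to above.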

Discussion
It may be worth noting that contemporaneous correlation does not bias the expected value of an estimated DEA score, but variance estimates can be more efficient if this correlation is taken into account. The main benefit of correcting for it when it exists is attaining better precision and thereby increasing the power of the model to detect true differences. However, contemporaneous correlation should not automatically be corrected for unless it can be shown to be present. For example, using a GLS model that corrects for contemporaneous correlation when it is not in truth present will underestimate the width of the model's confidence intervals, which could result in a DMU being incorrectly classified as efficient (inefficient).
We did not include any independent variables in our regressions, such as environmental or other exogenous influences on efficiency. They could be included if models correcting for any i.i.d. violations are used. But we would counsel caution in any procedure that computes DEA scores in the first stage and then estimates the effect of exogenous variables in a later stage. As is well known [14,25], such two-stage procedures can suffer from severe bias and precision problems and sometimes lack sufficient power to detect the true effects of independent variables [4,5,48].
In order to improve a DMU's efficiency, it is first necessary to validly estimate the range within which its true efficiency occurs. Developing a methodology to do so is the purpose of this paper. After the efficiency indicators are collected, it is next necessary to identify the causes behind their values. However, illustrating this type of analysis is beyond the scope of this methodology-focused paper.

Conclusions
As exhibited in this paper, statistical PDA methodology provides a useful tool for estimating valid confidence intervals for the DEA scores of individual organizations or other entities. The PDA deals with stochastic data from both the individual entities and the production frontier and greatly increases the variety and power of statistical models, diagnostic tests and remedies. Moreover, although violations of i.i.d. and normality by DEA score residuals can occur, they are not inevitable. In our pharmacy example, there were no violations except for heteroskedasticity. As we have demonstrated herein, the PDA provides methodologies that can identify violations of i.i.d. and normality when they do occur. And, the PDA offers statistical models that can be used to remedy any violations and thereafter estimate valid confidence intervals for individual DMUs.