A new hybrid estimation method for the generalized pareto distribution

ABSTRACT The generalized Pareto distribution (GPD) is important in the analysis of extreme values, especially in modeling exceedances over thresholds. Most of the existing methods for estimating the scale and shape parameters of the GPD suffer from theoretical and/or computational problems. A new hybrid estimation method is proposed in this article, which minimizes a goodness-of-fit measure and incorporates some useful likelihood information. Compared with the maximum likelihood method and other leading methods, our new hybrid estimation method retains high efficiency, reduces the estimation bias, and is computation friendly.


Introduction
The generalized Pareto distribution (GPD) is a two-parameter distribution first introduced by Pickands (1975) having the cumulative distribution function Given a random sample from the GPD(σ, k), various parameter estimation methods have been studied. As the most popular and widely used estimation method, the maximum likelihood (ML) method has been considered by Davison (1984), Smith (1984Smith ( , 1985, Hosking and Wallis (1987), Grimshaw (1993), and the references cited therein. In general, the ML estimators possess high efficiency and the usual asymptotic properties when k < 1/2. However, in the non regular case (defined in Smith, 1984) where 1/2 ≤ k ≤ 1, severe convergence problems may arise according to Grimshaw (1993). When k > 1, the ML estimators do not exist.
Alternatively, Hosking and Wallis (1987) studied the method of moment (MOM) estimators and the probability-weighted moment (PWM) estimators of σ and k. The performance of the MOM estimators and the PWM estimators was evaluated together with the ML estimators in the range −1/2 < k < 1/2 of the shape parameter. They concluded that the MOM was unreliable for k < −0.2, the PWM method performed well only when −1/2 < k < 0, and the ML method needed a sample size as large as n = 500 to reach its asymptotic efficiency. Additionally, when k ≤ −1/2, the MOM estimators do not exist, and when k ≤ −1 the PWM estimators do not exist. It was noted that even when both of the MOM and PWM estimators exist, they may produce unacceptable results because some of the sample points may fall outside the estimated endpointσ /k. The infeasibility of the PWM estimators was also identified in Chen and Balakrishnan (1995). To remedy this problem, Dupuis and Tsao (1998) derived a simple constraint on feasibility into the MOM and PWM estimators. However, all of the estimators based on moments are known to have low large-sample efficiencies and are available within a very restricted region of the parameter space. Castillo and Hadi (1997) proposed an elemental percentile method (EPM) which is assumed to work for all possible values of the parameters. The idea is to make use of the order statistics by initially equating GPD(σ, k) to all pairs of the order statistics, and to use the medians as the final estimators of σ and k. Compared with the MOM and the PWM methods, their simulation results indicated that there was no dominant method in terms of bias and root mean squared error (MSE) when k is restricted in −2 ≤ k ≤ 2. Luceño (2006) investigated a maximum goodness-of-fit (MGF) method based on the family of empirical distribution function (EDF) statistics. By minimizing any of the EDF statistics measuring the discrepancy between F (x; σ, k) given in (1) and the EDF with respect to the unknown parameters σ and k, this method can always yield valid estimates over the whole parameter space. The MGF method when the Anderson-Darling statistic is used gives the most balanced performance. However, the bivariate search for the minimum could be slow and convergence problems arise if without a well-specified initial point. Zhang and Stephens (2009) and Zhang (2010) introduced a new efficient estimation method based on the likelihood and the empirical Bayes method (EBM). This method is computationally friendly and the data-driven prior can ensure that the parameter estimates are valid. However, the performance of the EBM estimators turned out to be dependent on a careful choice of the prior distribution. In Zhang and Stephens (2009), the prior distribution was chosen as the GPD itself with the shape parameterk = −1/2. To improve the poor performance of the EBM in the heavy-tailed situation (k < −1), a modified EBM, denoted by EBM * , was introduced in Zhang (2010). The EBM* was found to dominate the other existing competitors in the extended range −6 ≤ k ≤ 1/2 in terms of estimation bias and efficiency.
The goal of this article is to develop a new hybrid parameter estimation method for the GPD to improve upon the existing methods. In Section 2, we provide the motivation and the details of the new hybrid method. In Section 3, the performance of the proposed hybrid method is evaluated against several significant competitors via simulation. In Section 4, we illustrate our new hybrid method through revisiting the Bilbao sea waves data analyzed in the literature, and in Section 5 we draw some conclusions.

A new hybrid parameter estimation method
The basis of our new hybrid method is the MGF method when the Anderson-Darling statistic is used. Given a random sample X = {X 1 , . . . , X n } from the GPD(σ, k) with distribution function F (x; σ, k) given in (1) and define the indicator function The EDF of the sample is given by The Anderson-Darling statistic A 2 (σ, k) is defined as Denoting the ith order statistic of the sample by X (i) and applying the probability integral transformation to get The MGF method estimatorsσ MGF andk MGF are obtained by minimizing the above Anderson-Darling statistic with respect to the unknown parameters σ and k over the param- This method uses the information from F (x; σ, k) and all order statistics. It ensures that the parameter estimators are valid and the estimators obtained have small biases, but it suffers from a slow bivariate search for minimum and more seriously from convergence problems when k ≤ −1 (the final estimates are very sensitive to the initial values (σ (0) , k (0) )). The MGF estimators also have low efficiency compared with the ML estimators if exist. The log-likelihood function for the parameters σ and k in GPD(σ, k) is given by and the score equations for estimating σ and k are As pointed out by Davison (1984), if we define θ = k/σ and reparameterize (σ, k) into (θ, k), we can use the second score equation to link k to θ through the relation , and obtain the profile log-likelihood function for θ as Suppose that a local maximum of (θ ) can be found atθ MLE numerically over the space B = θ ∈ R : θ < 1/X (n) , then the corresponding ML estimatorsσ MLE andk MLE of σ and k are given byk However, maximizing (θ ) can still be problematic. As shown in Grimshaw (1993), there can be more than one root to d (θ )/dθ = 0, and a carefully chosen initial value θ (0) is essential. An algorithm for computing the ML estimators based on (θ ) was designed in Grimshaw (1993). The interested reader can consult the supplementary materials for this article where proportions of the convergence problem that occurred for both the ML method and the MGF method are reported, based on the algorithm of Grimshaw (1993) and the R function gpdmgf available in package POT, respectively.
The MGF estimators enjoy small estimation bias and wide feasibility, while the high estimation efficiency of the ML estimators is also desired. We, however, want to avoid the computational issues associated with these two methods. These considerations motivate us to study the following hybrid parameter estimation method, which may retain a high degree of efficiency as well as small bias, and is computation friendly.
Under the parameterization of (θ, k) for the GPD and utilizing the likelihood-based link k = −n −1 n j=1 log 1 − θX j , we estimate θ througĥ where G(θ ) = A 2 (θ, k) and B = {θ : θ < 1/X (n) }. Let then the computation formula for the target function G(θ ) is It is much more convenient to minimize this function G(θ ) in one dimension than the ML and MGF methods subject to θ < 1/X (n) to obtain our hybrid estimatorθ HYB of θ. Then our hybrid estimatorsk HYB andσ HYB of k and σ can be found from log(1 −θ HYB X i ) andσ HYB =k HYB /θ HYB .
It seems not easy to derive the (asymptotic) variances for the hybrid estimators due to the hybrid mechanism. However, the bootstrap method introduced by Efron (1979) can provide useful approximations to the distributions of the hybrid estimators. Using the bootstrap method to find standard errors for other estimators of GPD has already been done by many authors, such as Castillo and Hadi (1997) for their EPM estimators and Zhang and Stephens (2009) for their EBM estimators. However, the potential inconsistency of the parametric bootstrap distributions due to the irregularity of the GPD leads us to examine several commonly used bootstrap procedures via simulation in the next section. We have written an R function to compute the suggested bootstrap standard errors and confidence intervals for the hybrid estimators, which is available upon request.

Finite-sample performance comparison
Based on the performance comparison results for various estimators in the literature, we only select the top performers in our performance comparison. This includes the ML estimators, the MGF estimators, and the EBM * estimators.
Data are generated from F (x; σ, k) for σ = 1 and −6 ≤ k ≤ 2. There is no loss of generality to fix σ at 1 because of its invariance property. This parameter range of k contains all published ranges in the literature as subsets and is also of theoretical and practical concerns. First, this range of k covers the exponential distribution when k = 0 and the uniform distribution when k = 1. Second, many real-world examples show the possibility of non regular case where estimates of k > 1/2, such as the two examples in Castillo and Hadi (1997), and one of them will be revisited in the next section. Third, estimates of k ≤ −1 are often observed in heavy-tailed financial data.
Since the ML method can lead to severe convergence problems, we follow Luceño (2006) to use the quasi-maximum likelihood (QML) estimates as the ML estimates when the ML method fails to converge, that is, when the algorithm of Grimshaw (1993) returns "MLE does not converge." It is well known that the ML method does not have a finite estimate of k when k > 1. To accommodate this boundary condition issue, we also use the QML estimates if the computedk > 1 in our simulation. We consider sample sizes n = 20, 50, 100, and 200. For a given sample size n and a given value of k in the interval [−6, 2], we simulate 10,000 samples of size n from F (x; 1, k) and estimate σ and k using our hybrid method and the other methods. For each method, we compute the bias and the MSE for each parameter estimator. For sample sizes n = 50 and n = 200, the biases for the various estimators of σ and k are plotted against k in Fig. 1, and the MSEs are plotted against k in Fig. 2.
We note that the MGF method of Luceño (2006) was not compared with the EBM * method in Zhang (2010) or in other published literature. Luceño (2006) used the root mean squared error (RMSE), while Zhang (2010) used the relative efficiency (the MSE divided by the Cramér-Rao lower bound) as their evaluation measures. In our simulation studies, we found that the RMSEs were usually too close to each other to tell the differences graphically. On the other hand, the Cramér-Rao lower bound does not exist for k > 1/2. We therefore choose the MSE to compare the different estimators.
From Fig. 1, we see that our hybrid method generally outperforms the other three methods in terms of estimation bias. In terms of MSE, we see from Fig. 2 that our hybrid method outperforms the ML method (especially in estimating σ ) and the MGF method (especially in estimating k) and performs better or the same as the EBM * method for −6 ≤ k < 0, slightly worse than the EBM * method for 0 ≤ k < 1 and surprisingly better than the EBM * method for 1 ≤ k ≤ 2. The poor performance of the popular EBM * method for 1 ≤ k ≤ 2 is probably due to the fact that the data-driven prior used in EBM * was set up when −6 ≤ k ≤ −1 was well looked after, but no consideration was given to the case of k > 1/2 because in this case the performance measure (relative efficiency) is not defined. In comparison, our proposed hybrid method behaves well in such a non regular range of k.
Since we would expect asymmetry in the sampling distributions fork andσ due to perhaps the heavy-tail behavior and the boundary condition, it is worth comparing the results for several bootstrap procedures. For the proposed hybrid estimators, we carry out a simulation study comparing three bootstrap percentile confidence intervals based on the parametric bootstrap, the non parametric bootstrap, and the skew-corrected and accelerated bootstrap (BC a ) described in Efron (1987) and implemented in R package boot. In the simulation, we consider 2, 000 replications with 1, 000 bootstrap samples for each replication. The scale parameter σ is fixed at 1 as before and k values are selected between −6 ≤ k ≤ 2. Table 1 reports the coverage probabilities and the average lengths (in parentheses) for the three types of 95% bootstrap percentile confidence intervals when the sample size n = 50. Similar conclusions can be drawn for 90% confidence level and other sample sizes.
It can be seen from Table 1 that in general all three bootstrap percentile confidence intervals perform well, except for k = 1 (uniform distribution over [0, σ ]) and k = 2. For both   The confidence level is %, the sample size n = 50, and σ = 1. the parametric bootstrap and the BC a , when k = 1, their coverage probabilities are quite a bit lower than the 95% confidence level. Since we know that the standard n-out-of-n parametric bootstrap has been proved to be inconsistent when the ML estimators are used in this case, the parametric bootstrap and the BC a based on our hybrid estimation may suffer the same way. When k = 2, the parametric bootstrap also produces slightly lower coverage probabilities (under 92%). On the other hand, the non parametric bootstrap has stable coverage probabilities around the 95% level with sometimes slightly longer interval lengths for the parameter range considered. Therefore, we recommend the non parametric bootstrap procedure for making inference about the hybrid estimators in practice.

Real data example
To illustrate the use of our new hybrid estimation method, a real-world example originally analyzed in Castillo and Hadi (1997) is revisited here. The dataset consists of the zero-crossing hourly mean periods (in seconds) of the sea waves measured in a Bilbao buoy, Spain, in January, 1997. Later on, this dataset was analyzed in Luceño (2006) and in Zhang and Stephens (2009). Only the 197 observations with periods above 7 s are taken into consideration. Following the above-mentioned authors, we fit this dataset to GPD(σ, k) using thresholds at t = 7.0, 7.5, 8.0, 8.5, 9.0, and 9.5. In Table 2, the parameters σ and k are estimated using different estimation methods (see also Table 4 in Castillo andHadi, 1997, Fig. 3 in Luceño, 2006 and Table  3 in Zhang and Stephens, 2009). Note that because the ML estimators do not exist if k > 1, instead of using NAs, we use the estimatorsk = 1 andσ = max X i if the computedk > 1 in Table 2. We can observe from Table 2 that all the methods give estimatesk outside the commonly used range −1 < k < 1/2. In this case, the MOM and the PWM estimates are deficient and may give infeasible results, and the ML estimates are computationally unstable especially for the small sample sizes, and thus is disfavored.
Using the parameter estimation results at threshold t = 7.5, we plot in Fig. 3 the GPD function fitted by the new hybrid estimators together with the EDF and also the quantilequantile plot for the Bilbao waves data (see also Fig. 2 in Luceño, 2006, andFig. 4 in Zhang andStephens, 2009). We see from Fig. 3 that the new hybrid method gives an overall good fit to the Bilbao waves data.
The non parametric bootstrap standard errors forσ HYB andk HYB are found to be se(σ HYB ) = 0.186 and se(k HYB ) = 0.103. The corresponding 95% non parametric bootstrap percentile confidence intervals for σ and k are (1.282, 2.024) and (0.409, 0.823), respectively. The bootstrap confidence interval for k implies that k is significantly different from 0 and 1 and is likely to belong to the non regular case.

Conclusions
In this article, we have developed a new hybrid parameter estimation method for the GPD. Unlike the MOMs, the PWM method, and the ML method, the new hybrid method can always provide valid estimates for a wide parameter space investigated. Unlike the ML method and the MGF method, the new hybrid method does not suffer from the convergence problem and is easier and faster to implement. Compared with the top performer proposed by Zhang (2010), the new hybrid method offers smaller estimation bias and comparable or smaller mean squared error. Coupled with the non parametric bootstrap procedure, the new hybrid method can allow practitioners to conduct reliable and accurate data analysis using the GPD.