Robust estimation of Pareto-type tail index through an exponential regression model

Abstract In this paper, we introduce a robust estimator of the tail index of a Pareto-type distribution. The estimator is obtained through the use of the minimum density power divergence with an exponential regression model for log-spacings of top order statistics. The proposed estimator is compared to existing minimum density power divergence estimators of the tail index based on fitting an extended Pareto distribution and exponential regression model on log-ratio of spacings of order statistics. We derive the influence function and gross error sensitivity of the proposed estimator of the tail index to study its robustness properties. In addition, a simulation study is conducted to assess the performance of the estimators under different contaminated samples from different distributions. The results show that our proposed estimator of the tail index has better mean square errors and is less sensitive to an increase in the number of top order statistics. In addition, the estimation of the exponential regression model yields estimates of second-order parameters that can be used for estimation of extreme events such as quantiles and exceedance probabilities. The proposed estimator is illustrated with practical datasets on insurance claims and calcium content in soil samples.


Introduction
Extreme value theory (EVT) has become an important tool for the estimation of rare events in many disciplines that are related to environmental science, hydrology, insurance and finance, among others. The process of extreme value analysis involves fitting an extreme value distribution, characterized by a tail index, which measures the tail heaviness of the distribution function. The most common method for estimating the parameters of an extreme value distribution in an extreme value analysis is maximum likelihood. Also, in the semi-parametric framework, the Hill estimator (Hill 1975) remains the most popular among a number of estimators. However, these estimators do not take into account possible deviations from assumed extreme value models. These may arise as a result of possible outliers in the data that may (or may not) have been recorded in error. In such a dataset, the estimators mentioned above are known to be sensitive to these outlying observations, affecting their quality. In addition, small errors in the estimation of model parameters, such as the tail index, can cause significant errors in the estimation of extreme events such as high quantiles and exceedance probabilities (see e.g., Brazauskas and Serfling 2000).
Robust statistics offers a principled way of addressing outliers and deviations from assumed parametric models. In the context of extreme value analysis, its usage may appear to be contradictory. However, it has been shown that employing robust statistical ideas in extreme value theory improves the quality and precision of estimates (Dell'Aquila and Embrechts 2006). Early applications of robust estimators include the Optimal Bias-Robust Estimator (OBRE) of the parameters of the GEV distribution (Dupuis and Field 1998), generalized mean and trimmed mean type estimators (Brazauskas and Serfling 2000, 2001), the method of medians for the generalized Pareto distribution (Peng and Welsh 2001), and an integrated squared error approach on partial density component estimation of the parameters of the generalized Pareto distribution (Vandewalle et al. 2007).
Furthermore, Juárez and Schucany (2004) seem to be the first authors to employ the minimum density power divergence (MDPD) of Basu et al. (1998) for the robust estimation of the parameters of an extreme value distribution. Since then, this divergence has become the most sought-after divergence measure for robust estimation of parameters of extreme value distributions. Kim and Lee (2008), Dierckx, Goegebeur, and Guillou (2013), Goegebeur, Guillou, and Verster (2014), and Dierckx, Goegebeur, and Guillou (2021) have made use of the MDPD in estimating the tail index and quantiles of Pareto-type distributions. Recently, Ghosh (2017) proposed a robust MDPD estimator for a real-valued tail index. This estimator is a robust generalization of the estimator proposed by Matthys and Beirlant (2003) and addresses the non-identical distributions of the exponential regression model using the approach in Ghosh and Basu (2013). Also, Dierckx, Goegebeur, and Guillou (2013) employ the MDPD concept on an extended Pareto distribution for relative excesses over a high threshold. This distribution has second-order properties that are suitable for bias reduction in, e.g., quantile estimation (Dierckx, Goegebeur, and Guillou 2021).
In the present paper, we propose a robust estimator for the tail index of a Pareto-type distribution using the MDPD idea on an exponential regression model. Our estimator is a robust generalization of the estimator in Beirlant et al. (1999), and hence, it is different from the estimator in Ghosh (2017). In addition, the use of this exponential regression model leads to estimates of other second-order parameters that can be used to obtain bias-reduced estimators of extreme events such as quantiles and exceedance probabilities.
The remainder of the paper is organized as follows. In Section 2, we present the robust estimation methods of the tail index, beginning with an introduction to extreme value theory. The robustness properties of the proposed estimators are studied using influence functions and gross error sensitivity analyses in Section 3. In Section 4, the proposed estimator of the Pareto-type tail index is compared with two existing estimators in the literature via a simulation study. Section 5 presents an illustration of the proposed estimator applied to estimation of the tail index of practical data sets from insurance and pedochemical studies. We provide concluding remarks in Section 6.

Estimation method
Let $X_1, X_2, \ldots, X_n$ be a sample of independent and identically distributed observations from some process with underlying distribution $F$. Also, let $X_{1,n} \le X_{2,n} \le \cdots \le X_{n,n}$ be the corresponding order statistics associated with the sample. In order to carry out inferences on extreme events in the far tails or beyond the data, one approach is to study the behavior of the sample maximum, $X_{n,n} = \max\{X_1, \ldots, X_n\}$. The well-known Fisher-Tippett Theorem (see e.g., Fisher and Tippett 1928; Gnedenko 1943; de Haan and Ferreira 2006) ensures that a suitably normalized maximum, $X_{n,n}$, converges in distribution to a non-degenerate limit as $n \to \infty$. Such a limit distribution was shown to be the generalized extreme value (GEV) distribution. Formally, if normalizing sequences of constants $a_n > 0$ and $b_n \in \mathbb{R}$ exist, then
$$\lim_{n\to\infty} P\left(\frac{X_{n,n} - b_n}{a_n} \le x\right) = G_\gamma(x), \tag{1}$$
where
$$G_\gamma(x) = \exp\left\{-(1 + \gamma x)^{-1/\gamma}\right\}, \quad 1 + \gamma x > 0, \tag{2}$$
read as $\exp\{-e^{-x}\}$ for $\gamma = 0$. If a distribution function $F$ satisfies (1) with limit (2), $F$ is said to belong to the domain of attraction of $G_\gamma$, denoted by $F \in D(G_\gamma)$. Here, $\gamma$ is the shape parameter (tail index) and it measures the tail heaviness of the underlying distribution, $F$. In particular, the distribution belongs to the Fréchet domain of attraction for $\gamma > 0$, the Gumbel domain of attraction for $\gamma = 0$, and the Weibull domain of attraction for $\gamma < 0$, with a finite right endpoint in the last case. The goal of extreme value analysis is mainly to obtain estimators of high quantiles, exceedance probabilities and return periods. However, each of these estimators depends on the extreme value index, $\gamma$. Therefore, the estimation of $\gamma$ remains an important research area in EVT. Another approach to obtaining the tail index relies on the Balkema and de Haan (1974) and Pickands (1975) theorem, which states that the underlying distribution is in the max-domain of attraction of the GEV distribution if and only if the distribution of excesses over high thresholds is asymptotically the generalized Pareto (GP).
An application of this theorem in Davison and Smith (1990) gave rise to the so-called Peaks-Over-Threshold (POT) methodology in extreme value analysis.
Early and popular methods for estimating the parameter in (2) include the maximum likelihood method, probability weighted moments and the elemental percentile method. In addition, other semi-parametric estimators exist, such as the Hill (Hill 1975), moment (Dekkers, Einmahl, and de Haan 1989), and exponential regression (Beirlant et al. 1999; Beirlant, Joossens, and Segers 2009) estimators. However, in most instances the parametric distribution, GEV or GP, may not model all the data well. In addition, small deviations from the assumed model may have a considerable effect on the estimation of parameters and thereby affect the estimation of extreme events such as high quantiles and exceedance probabilities. Robust estimation aims at providing estimates that are stable or consistent within a neighborhood of the assumed model and can provide an assessment of the fit of the data to the model. If an extreme observation is down-weighted, then inferences on the GEV or GP are potentially flawed. Two options are available: to base inferences on the part of the data that is well fitted by the extreme value distributions, GEV and GP, or to obtain a desirable model where the weights are consistent with the bulk of the data.
In this paper, we follow the latter and consider the estimation of $\gamma > 0$, i.e., in the Fréchet domain of attraction. Distributions in this domain have survival function
$$1 - F(x) = x^{-1/\gamma}\,\ell_F(x), \tag{3}$$
or, equivalently, tail quantile function
$$U(x) = Q(1 - 1/x) = x^{\gamma}\,\ell_U(x), \tag{4}$$
with $Q$ the quantile function of $F$. Here, $\ell_F$ and $\ell_U$ are slowly varying functions, defined through
$$\lim_{x\to\infty} \frac{\ell_F(tx)}{\ell_F(x)} = 1 \quad \text{for all } t > 0, \tag{5}$$
and similarly for $\ell_U$. In the next two sub-sections, we present the two methods used in the estimation of the tail index of a distribution function. In the third sub-section, we discuss the robust method of estimation of the tail index using the minimum density power divergence method of Basu et al. (1998).

Extended Pareto model
From (3), the conditional survival function of the relative excesses, $P(X/u > x \mid X > u)$, converges to $x^{-1/\gamma}$ for $x > 1$ as $u \to \infty$. Using the $k$ upper order statistics and the Pareto-type behavior, an estimate of $\gamma$ is obtained as the slope of the Pareto quantile plot. Also, the maximum likelihood estimator of $\gamma$ is the usual Hill estimator (Hill 1975) given by
$$\hat\gamma^{H}_{k} = \frac{1}{k}\sum_{j=1}^{k} j\left(\log X_{n-j+1,n} - \log X_{n-j,n}\right). \tag{6}$$
This estimator has been studied extensively in the literature because of its attractive properties. However, it is known to have large bias and to be sensitive to outliers.
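As an illustration of the Hill estimator, the following minimal Python sketch averages the weighted log-spacings of the $k$ largest observations. The function name `hill_estimator`, the seed, and the strict Pareto test sample are assumptions made for this example only and are not taken from the paper or its R code.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index from the k largest observations."""
    x = sorted(sample)                      # x[0] <= ... <= x[n-1]
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    if x[n - k - 1] <= 0:
        raise ValueError("the threshold X_{n-k,n} must be positive")
    # Weighted log-spacings j * (log X_{n-j+1,n} - log X_{n-j,n}), j = 1, ..., k
    spacings = [j * (math.log(x[n - j]) - math.log(x[n - j - 1]))
                for j in range(1, k + 1)]
    return sum(spacings) / k


# Hypothetical check on a strict Pareto sample with tail index 0.5:
# if U is uniform on (0, 1], then U**(-gamma) has survival function x**(-1/gamma).
rng = random.Random(1)
gamma_true = 0.5
sample = [(1.0 - rng.random()) ** (-gamma_true) for _ in range(5000)]
gamma_hat = hill_estimator(sample, 500)
print(gamma_hat)
```

Because every log-spacing enters the average with equal weight, a single grossly outlying top observation shifts the estimate by an amount proportional to the logarithm of that observation, which is the sensitivity the robust estimators in this paper aim to limit.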
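In view of the bias of the Hill estimator, Dierckx, Goegebeur, and Guillou (2013) employ the second-order condition of Beirlant, Joossens, and Segers (2009) on the rate of convergence of (5) to reduce it. Denote by $RV_\beta$ the class of functions that are regularly varying at infinity with index $\beta$, i.e., functions $f$ satisfying
$$\lim_{x\to\infty} \frac{f(tx)}{f(x)} = t^{\beta} \quad \text{for all } t > 0, \tag{7}$$
with (7) reducing to (5) if $\beta = 0$. The second-order condition needed to obtain the survival function of the extended Pareto distribution is given by:
Condition 1. Suppose $\gamma > 0$ and $\tau < 0$ are constants. The distribution function $F$ is said to satisfy the second-order condition if $x^{1/\gamma}(1 - F(x)) \to C \in (0, \infty)$ as $x \to \infty$ and the function $\delta$ defined via
$$1 - F(x) = C x^{-1/\gamma}\left\{1 + \gamma^{-1}\delta(x)\right\} \tag{8}$$
is ultimately non-zero, of constant sign and $|\delta| \in RV_\tau$ (Dierckx, Goegebeur, and Guillou 2013, 71).
Equivalently, from Condition 1, the tail quantile function $U$ satisfies $y^{-\gamma}U(y) \to C^{\gamma}$ as $y \to \infty$. Also define a function $a$ implicitly as
$$U(y) = C^{\gamma} y^{\gamma}\{1 + a(y)\},$$
with $a(y) = \delta(C^{\gamma} y^{\gamma})(1 + o(1))$ as $y \to \infty$. Thus, $|a| \in RV_\rho$ where $\rho = \gamma\tau$. The second-order condition was then used to obtain an extended Pareto distribution with survival function given by
$$\bar G(y) = \left\{y\left(1 + \delta - \delta y^{\tau}\right)\right\}^{-1/\gamma}, \quad y > 1, \tag{9}$$
and density function
$$g(y) = \frac{1}{\gamma}\, y^{-1/\gamma - 1}\left\{1 + \delta(1 - y^{\tau})\right\}^{-1/\gamma - 1}\left[1 + \delta\left\{1 - (1 + \tau) y^{\tau}\right\}\right], \tag{10}$$
where $\gamma > 0$, $\tau < 0$ and $\delta > \max\{-1, 1/\tau\}$. In practice, (9) is fitted to the relative excesses over a threshold, $X_{n-k,n}$, denoted $Y_j := X_{n-j+1,n}/X_{n-k,n}$, $j = 1, 2, \ldots, k$. The parameters $\gamma > 0$, $\tau < 0$ and $\delta$ can be estimated through maximum likelihood (Beirlant, Joossens, and Segers 2009).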

The exponential regression model
Consider again $X_1, X_2, \ldots, X_n$ i.i.d. random variables with common underlying distribution $F$ and associated quantile function, $Q$. Then for Pareto-type tails, i.e., $\gamma > 0$, the survival function is given by (3). Similarly, the associated tail quantile function $U$ can be written in terms of the associated slowly varying function $\ell_U$ in (4). From (4), the order statistics $X_{1,n}, X_{2,n}, \ldots, X_{n,n}$ can be represented jointly as
$$X_{n-j+1,n} \overset{d}{=} U\left(U_{j,n}^{-1}\right), \quad j = 1, 2, \ldots, n, \tag{11}$$
where $U_{1,n} \le \cdots \le U_{n,n}$ denote the order statistics of a sample from the standard uniform distribution, $U(0, 1)$. From (11), Beirlant et al. (1999) obtain the approximate representation (12), valid for $k \in \{2, 3, \ldots, n-1\}$. The authors state that a more accurate representation is obtained from (12) by imposing a slow variation with remainder condition on the rate of convergence to the limit in (5). This is given as Condition 2:
Condition 2. There exists a real constant $\rho \le 0$ and a rate function $b$ satisfying $b(x) \to 0$ as $x \to \infty$ such that for all $u \ge 1$,
$$\log\frac{\ell_U(ux)}{\ell_U(x)} \sim b(x)\,\kappa_\rho(u) \tag{13}$$
as $x \to \infty$, with $\kappa_\rho(u) = \int_1^u v^{\rho-1}\,dv$ (Beirlant et al. 1999, p. 183).
Under Condition 2, Beirlant et al. (1999) show that the weighted log-spacings of the order statistics,
$$Z_i := i\left(\log X_{n-i+1,n} - \log X_{n-i,n}\right), \tag{14}$$
are approximately exponentially distributed. Specifically, they obtain the approximation
$$Z_i \overset{d}{\approx} \left(\gamma + b_{n,k}\left(\frac{i}{k+1}\right)^{-\rho}\right) E_i, \tag{15}$$
where each $E_i$ is a standard exponential random variable, and $b_{n,k} = b((n+1)/(k+1)) \to 0$ as $k, n \to \infty$ and $\rho < 0$ are second-order parameters. The parameters in (15) were estimated by maximum likelihood in Beirlant et al. (1999) and shown to be better at reducing bias than traditional estimators such as the Hill estimator (Hill 1975). Also, when $b_{n,k} = 0$ in (15), the resulting maximum likelihood estimator is exactly the Hill estimator (Hill 1975).
In this paper, we propose estimating the parameters robustly using the density power divergence method of Basu et al. (1998). Our proposal differs from Ghosh (2017) in three ways. Firstly, whereas we use the distribution of log-spacings of order statistics, Ghosh (2017) uses the distribution of the log-ratio of order statistics. Secondly, our proposal is strictly for the Fréchet domain, i.e., $\gamma > 0$, as against $\gamma \in \mathbb{R}$. Lastly, the estimation of $\gamma$ and the second-order parameters yields estimates that can be used in reduced-bias estimators of, for example, quantiles and exceedance probabilities.
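The exponential regression model can be sketched in Python as follows: one helper computes the weighted log-spacings from a sample, another evaluates the model mean $\gamma + b_{n,k}(i/(k+1))^{-\rho}$, and a third simulates spacings directly from the approximation. The function names, the index range $i = 1, \ldots, k-1$ (chosen here to match the range used later for the robust estimator), and the parameter values are assumptions of this example, not the authors' implementation.

```python
import math
import random


def weighted_log_spacings(sample, k):
    """Weighted log-spacings Z_i = i * (log X_{n-i+1,n} - log X_{n-i,n}), i = 1, ..., k-1."""
    x = sorted(sample)
    n = len(x)
    if not 2 <= k <= n - 1:
        raise ValueError("k must lie in {2, ..., n-1}")
    if x[n - k] <= 0:
        raise ValueError("the order statistics used must be positive")
    return [i * (math.log(x[n - i]) - math.log(x[n - i - 1])) for i in range(1, k)]


def erm_mean(i, k, gamma, b, rho):
    """Mean of the i-th spacing under the model: gamma + b * (i / (k + 1)) ** (-rho)."""
    return gamma + b * (i / (k + 1)) ** (-rho)


def simulate_erm_spacings(k, gamma, b, rho, rng):
    """Draw Z_i = erm_mean(i) * E_i with independent standard exponential E_i."""
    return [erm_mean(i, k, gamma, b, rho) * rng.expovariate(1.0) for i in range(1, k)]


# Hypothetical parameter values: with b = 0 every spacing has mean gamma,
# so the average of the simulated spacings behaves like the Hill estimator.
rng = random.Random(7)
z = simulate_erm_spacings(k=20001, gamma=0.5, b=0.0, rho=-1.0, rng=rng)
print(sum(z) / len(z))
```

With a non-zero `b` the means of the spacings vary with `i`, which is why the spacings are independent but not identically distributed.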

Robust estimation through the minimum density power divergence
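Consider two density functions $f$ and $g$. The density power divergence between $f$ and $g$, introduced by Basu et al. (1998), has been used extensively to provide robust estimators and in recent years has received attention in extreme value analysis (see e.g., Dierckx, Goegebeur, and Guillou 2013, 2021; Kim and Lee 2008). The popularity of the density power divergence stems from its implicit usage of the empirical density function of the data. In this method, weighted likelihood estimating equations are developed, and observations that are outliers in relation to the model distribution are down-weighted by a power, governed by a robustness parameter $\alpha$, of the model density.
The density power divergence between any two density functions $f$ and $g$, with $g$ the data-generating density and $f$ the model density, is defined as
$$d_\alpha(g, f) = \begin{cases} \displaystyle\int \left\{ f^{1+\alpha}(x) - \left(1 + \frac{1}{\alpha}\right) f^{\alpha}(x)\, g(x) + \frac{1}{\alpha}\, g^{1+\alpha}(x) \right\} dx, & \alpha > 0, \\[2ex] \displaystyle\int g(x) \log\frac{g(x)}{f(x)}\, dx, & \alpha = 0. \end{cases} \tag{16}$$
Here, the case $\alpha = 0$ is obtained by taking the limit $\alpha \to 0$ of the first case, $\alpha > 0$, and the resulting divergence is the Kullback-Leibler divergence. Consider the i.i.d. sample $X_1, \ldots, X_n$ from a model distribution function $F$ with density $f_\theta$, of which $\theta$ is an unknown parameter of interest. The minimum density power divergence (MDPD) estimator of $\theta$ is obtained by minimizing the divergence between the data and the model density; for $\alpha > 0$,
$$\hat\theta_\alpha = \arg\min_{\theta} \left\{ \int f_\theta^{1+\alpha}(x)\, dx - \left(1 + \frac{1}{\alpha}\right) \frac{1}{n}\sum_{i=1}^{n} f_\theta^{\alpha}(X_i) \right\}, \tag{17}$$
while $\alpha = 0$ corresponds to maximum likelihood. The MDPD estimator of the parameters of the extended Pareto distribution, (10), applied to the relative excesses, $Y_j := X_{n-j+1,n}/X_{n-k,n}$, $j = 1, 2, \ldots, k$, is obtained by solving the system of estimating equations (18) and (19). These estimating equations depend on the unknown parameter $\tau$, which is obtained in Dierckx, Goegebeur, and Guillou (2013) using the reparametrisation $\tau = \rho/\gamma$. The asymptotic normality of these estimators is shown in that paper.
In the case of the exponential regression model, described in Section 2.2, the weighted log-spacings of order statistics, $Z_i$, $i = 1, \ldots, k-1$, in (14) each have distribution function $F_{\theta_i}$ and corresponding density function $f_{\theta_i}$. Although the $Z_i$'s are independent with approximate density $f_{\theta_i}$, an exponential density with mean $\theta_i$, they are not identically distributed. Note that $\theta_i = \gamma + b_{n,k}(i/(k+1))^{-\rho}$, and hence, it is a linear function of $\gamma$ and a non-linear function of the other parameters, $b_{n,k}$ and $\rho$. The minimum density power divergence estimator of the parameters $\gamma$, $b_{n,k}$ and $\rho$ can be obtained by following Ghosh and Basu (2013) and Ghosh (2017), viz. by minimization of the average density power divergence
$$\frac{1}{k-1}\sum_{i=1}^{k-1} d_\alpha\left(\hat g_i, f_{\theta_i}\right), \tag{20}$$
where $\hat g_i$ is a non-parametric estimator of the true density $g_i$ of $Z_i$ obtained from the observed sample. Since there is only one observation for each density $g_i$, following Ghosh (2017) the best possible non-parametric estimator $\hat g_i$ of $g_i$ is given by the degenerate distribution at $Z_i$. Then, rewriting (20) using the exponential density and dropping the term that does not depend on the parameters, we obtain, as in Ghosh (2017),
$$H_k(\eta) = \frac{1}{k-1}\sum_{i=1}^{k-1}\left[\frac{1}{(1+\alpha)\,\theta_i^{\alpha}} - \left(1 + \frac{1}{\alpha}\right)\frac{1}{\theta_i^{\alpha}}\, e^{-\alpha Z_i/\theta_i}\right], \tag{21}$$
where $\theta_i = \gamma + b\left(\frac{i}{k+1}\right)^{-\rho}$ with $b = b_{n,k}$. The parameters $\eta = (\gamma, b, \rho)$ can then be obtained by minimizing the objective function (21). Alternatively, these estimators can be obtained by solving the estimating equations
$$\frac{\partial H_k}{\partial \eta} = 0. \tag{22}$$
Taking the derivatives in (22) gives, up to a constant factor,
$$\sum_{i=1}^{k-1} J_\alpha\left(\frac{i}{k+1}, \eta\right)\left[\left(\theta_i - Z_i\right) e^{-\alpha Z_i/\theta_i} - \frac{\alpha\,\theta_i}{(1+\alpha)^2}\right] = 0,$$
where, for any $\mu \in (0, 1)$, we define $J_\alpha(\mu, \eta) = (J_{1,\alpha}(\mu, \eta), J_{2,\alpha}(\mu, \eta), J_{3,\alpha}(\mu, \eta))'$ with
$$J_{1,\alpha}(\mu, \eta) = \frac{1+\alpha}{(\gamma + b\mu^{-\rho})^{\alpha+2}}, \quad J_{2,\alpha}(\mu, \eta) = \frac{(1+\alpha)\,\mu^{-\rho}}{(\gamma + b\mu^{-\rho})^{\alpha+2}}, \quad J_{3,\alpha}(\mu, \eta) = \frac{-(1+\alpha)\, b\,\mu^{-\rho}\log(\mu)}{(\gamma + b\mu^{-\rho})^{\alpha+2}}.$$

Robustness of the proposed estimators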
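To make the MDPD criterion concrete, the sketch below evaluates the empirical objective $\int f_\theta^{1+\alpha}(x)\,dx - (1 + 1/\alpha)\, n^{-1}\sum_i f_\theta^{\alpha}(X_i)$ for an exponential model with mean $\theta$, for which the integral equals $1/\{(1+\alpha)\theta^{\alpha}\}$, and minimizes it over a grid. The exponential model, the grid search, the function names and the contamination scheme are all assumptions chosen for illustration; the paper's R code may proceed differently.

```python
import math
import random


def mdpd_objective(x, theta, alpha):
    """Empirical density power divergence objective for an exponential model with mean theta."""
    if theta <= 0:
        return math.inf
    n = len(x)
    if alpha == 0:
        # Limiting case alpha -> 0: average negative log-likelihood.
        return math.log(theta) + sum(x) / (n * theta)
    integral = 1.0 / ((1.0 + alpha) * theta ** alpha)   # integral of f_theta^(1+alpha)
    data_term = sum(math.exp(-alpha * xi / theta) for xi in x) / (n * theta ** alpha)
    return integral - (1.0 + 1.0 / alpha) * data_term


def mdpd_fit(x, alpha, grid):
    """Grid value of theta that minimizes the objective (a crude stand-in for an optimizer)."""
    return min(grid, key=lambda theta: mdpd_objective(x, theta, alpha))


# Hypothetical contaminated sample: exponential data with mean 0.5,
# with 5% of the observations replaced by the gross outlier 50.
rng = random.Random(2024)
x = [rng.expovariate(1.0 / 0.5) for _ in range(300)]
x[:15] = [50.0] * 15

grid = [0.05 + 0.005 * i for i in range(991)]   # candidate values from 0.05 to 5.00
theta_mle = mdpd_fit(x, 0.0, grid)              # alpha = 0: maximum likelihood
theta_robust = mdpd_fit(x, 0.5, grid)           # alpha = 0.5: down-weights the outliers
print(theta_mle, theta_robust)
```

In this setting the maximum likelihood fit is pulled toward the outliers, whereas for $\alpha > 0$ each observation enters through the factor $e^{-\alpha X_i/\theta}$, so that distant observations contribute almost nothing to the data term.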

Influence function analysis
Hampel (1974) provides a classical tool for measuring robustness known as the influence function. This function gives a first-order approximation of the asymptotic bias of an estimator under contamination by an outlying observation. In practice, we seek a bounded influence function, so that the estimator's bias does not increase indefinitely under contamination by very distant outlying points. In order to derive the influence function for our proposed estimator, we first need to define it in terms of statistical functionals. Let $Z_i$ have true distribution $G_i$ with density $g_i$ for each $i = 1, 2, \ldots, k-1$. Denote $G = (G_1, G_2, \ldots, G_{k-1})'$ and $\hat G = (\hat G_1, \hat G_2, \ldots, \hat G_{k-1})'$, where $\hat G_i$ is the empirical distribution function corresponding to $G_i$ for $i = 1, 2, \ldots, k-1$. Then our minimum density power divergence estimator of $\eta = (\gamma, b, \rho)$ is given by $\hat\eta_\alpha = T_\alpha(\hat G)$, where $T_\alpha(G)$ is the corresponding statistical functional, defined as the solution of the population estimating equation (27). Now, following Ghosh and Basu (2013) and Ghosh (2017), we may assume contamination to be in any one or more (even all) of the $g_i$'s. For simplicity, let us first assume the contamination is only in $g_{i_0}$ for some fixed $i_0 \in \{1, 2, \ldots, k-1\}$, and that the corresponding contaminated density and distribution functions are $g_{i_0,\epsilon} = (1-\epsilon) g_{i_0} + \epsilon\,\delta_{t_{i_0}}$ and $G_{i_0,\epsilon} = (1-\epsilon) G_{i_0} + \epsilon\,\Delta_{t_{i_0}}$ respectively, where $\delta_{t_{i_0}}$ and $\Delta_{t_{i_0}}$ are the density and distribution functions of a degenerate distribution at the contamination point $t_{i_0}$ and $\epsilon$ is the contamination proportion.
Then, $\eta_\epsilon = T_\alpha(G_1, \ldots, G_{i_0-1}, G_{i_0,\epsilon}, G_{i_0+1}, \ldots, G_{k-1})$ satisfies the estimating Equation (27) with $g_{i_0}$ replaced by $g_{i_0,\epsilon}$ and $\eta$ replaced by $\eta_\epsilon$ (so that $\theta_i$ is replaced by the corresponding value, $\theta_{i,\epsilon}$, computed from $\eta_\epsilon$ for all $i = 1, 2, \ldots, k-1$); this gives Equation (28). Differentiating (28) with respect to $\epsilon$ at $\epsilon = 0$ and evaluating terms, we obtain the required partial influence function of $T_\alpha$ under contamination only at the $i_0$-th density, given in (29). Here, $\eta^g = T_\alpha(G)$, $\theta^g_{i_0}$ is the corresponding value of $\theta_{i_0}$ obtained from $\eta^g$, and $\Psi_n(G)$ is defined from equations (3.3) and (3.5) of Ghosh and Basu (2013). In our case, we can simplify the form of $\Psi_n(G)$ for general distribution functions $G_i$ to obtain an expression built from the matrix
$$\begin{bmatrix} 1 & \mu^{-\rho} & -b\mu^{-\rho}\log(\mu) \\ \mu^{-\rho} & \mu^{-2\rho} & -b\mu^{-2\rho}\log(\mu) \\ -b\mu^{-\rho}\log(\mu) & -b\mu^{-2\rho}\log(\mu) & b^{2}\mu^{-2\rho}(\log(\mu))^{2} \end{bmatrix}.$$
To illustrate the influence function for our estimators, let us simplify the influence function for the case where the exponential regression model, (15), is valid. In that case, let $G_i \equiv F_{\theta_i^0}$, the exponential distribution with mean $\theta_i^0$ computed from the (true) parameter value $\eta^0 = (\gamma^0, b^0, \rho^0)$, for all $i = 1, 2, \ldots, k-1$. With suitable additional notation, the influence function then takes the simplified form (31). Considering the range of values $\alpha > 0$, the influence function of our estimators is bounded. However, when $\alpha = 0$, the expression (31) simplifies to (32). From (32) it is easy to see that the influence function is linear in the contamination point $t_{i_0}$ and hence unbounded. Therefore, we conclude that our estimators of the parameters $\gamma$, $b$ and $\rho$ of (15) obtained with $\alpha > 0$ are robust with respect to contamination at any or all of the $Z_j$'s, in contrast to the maximum likelihood estimators obtained in Beirlant et al. (1999), which correspond to the case $\alpha = 0$. The fixed-sample influence functions are shown in Figure 1 for various values of $\gamma$ and at different contamination points, $t_{i_0}$. Clearly, the influence function is unbounded for $\alpha = 0$: it increases linearly as the contamination point becomes more extreme. However, in the same figures the boundedness for $\alpha > 0$ can easily be seen, as the curves become flatter when the point of contamination moves farther away from the bulk of the data.
In a similar way, if there is contamination in all densities, $g_{i,\epsilon} = (1-\epsilon) g_i + \epsilon\,\delta_{t_i}$ at contamination point $t_i$, for $i = 1, 2, \ldots, k-1$, with the corresponding distribution function being $G_{i,\epsilon} = (1-\epsilon) G_i + \epsilon\,\Delta_{t_i}$, then define $\eta_\epsilon = T_\alpha(G_{1,\epsilon}, G_{2,\epsilon}, \ldots, G_{k-1,\epsilon})$ and proceed as before to obtain the corresponding influence function. When the model assumptions are correct, i.e., $G_i = F_{\theta_i^0}$ for all $i$, the total influence function of our proposed estimator of $\eta$ at the contamination points $t = (t_1, t_2, \ldots, t_{k-1})'$ has the form (33). Again, it is clear from (33) that the influence function is bounded for all $\alpha > 0$. However, in the case where $\alpha = 0$, the influence function is unbounded, as it is linear in $t_i$, $i = 1, 2, \ldots, k-1$. Lastly, we compare the influence function of our estimator with that of Ghosh (2017) in Figure 2. The results show that the IF values increase with increasing values of $\gamma$ and are bounded for large values of $t_0$. Furthermore, the maximum values of the IF and also the limiting values of these bounded IFs for larger $t_0$ decrease as $\alpha$ increases. We note that our estimator, ERM_M, compares favorably with the IF values of the Ghosh (2017) estimator, ERM_G. In particular, ERM_M has significantly lower values of its IF compared to that of the ERM_G for smaller values of $\gamma$ and larger $t_0$, indicating its greater expected robustness against infinitesimal contamination at distant outlying points.
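The contrast between bounded and unbounded influence can be seen in a one-parameter simplification. For i.i.d. exponential data with mean $\theta$, the general influence function formula of Basu et al. (1998) for the MDPD functional works out to $\theta\,\frac{(1+\alpha)^3}{1+\alpha^2}\left[(t/\theta - 1)e^{-\alpha t/\theta} + \frac{\alpha}{(1+\alpha)^2}\right]$, which reduces to $t - \theta$ at $\alpha = 0$. The Python sketch below evaluates this expression; it is a stand-in for, not a reproduction of, the three-parameter expressions (31) to (33), and the function name is hypothetical.

```python
import math


def influence_exponential_mean(t, theta, alpha):
    """Influence function at contamination point t of the MDPD estimator of an
    exponential mean theta (one-parameter simplification, not the paper's formula)."""
    scale = theta * (1.0 + alpha) ** 3 / (1.0 + alpha ** 2)
    bracket = ((t / theta - 1.0) * math.exp(-alpha * t / theta)
               + alpha / (1.0 + alpha) ** 2)
    return scale * bracket


# Evaluate at increasingly distant contamination points for several alpha values.
for alpha in (0.0, 0.1, 0.5, 1.0):
    values = [influence_exponential_mean(t, 1.0, alpha) for t in (1.0, 10.0, 100.0, 1000.0)]
    print(alpha, [round(v, 4) for v in values])
```

For $\alpha = 0$ the values grow linearly in $t$, while for any $\alpha > 0$ the exponential factor drives the first term to zero and the function levels off at $\theta\alpha(1+\alpha)/(1+\alpha^2)$.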

Gross error sensitivity
The Gross-Error Sensitivity (Hampel 1974) measures the maximum possible value of the bias of an estimator under infinitesimal contamination. Thus, the gross-error sensitivity is the supremum of the norm of the influence function and can be defined as
$$S(T_\alpha, G) = \sup_{t}\,\left\| IF(t; T_\alpha, G) \right\|. \tag{34}$$
Therefore, in considering the effect of $k$ and $\alpha$ on the robustness of our estimator $T_\alpha$, the smaller the value of $S(T_\alpha, G)$, the more robust the estimator is in terms of these parameters. In the case of contamination at a point $Z_{i_0}$ only, the gross-error sensitivity of the proposed estimator $T_\alpha$, denoted $S_{i_0}(T_\alpha, F_{\theta_i^0})$, is given by (35). Figure 3 presents the values of the sensitivity measure $S_{i_0}(T_\alpha, F_{\theta_i^0})$ over the parameter $\alpha$ for selected values of $k$. It can be seen that $S_{i_0}(T_\alpha, F_{\theta_i^0})$ decreases as $\alpha$ and $k$ increase. Furthermore, the sensitivity values decrease sharply for $\alpha < 0.2$, whereas they are smaller and nearly constant for $\alpha > 0.2$. These results imply that our proposed estimators show strong robustness properties for increasing values of $\alpha$ and $k$. Similarly, the sensitivity for contamination in at least two observations can be obtained but has been omitted here for ease of presentation.

Simulation study
In this section, we compare the performance of our proposed estimator with the equivalent minimum density power divergence estimators of the Pareto-type tail index in the literature. Specifically, we consider the proposed exponential regression model estimator based on log-spacings of order statistics, ERM_M, the Dierckx, Goegebeur, and Guillou (2013) estimator obtained from fitting an extended Pareto distribution to relative excesses, EPD_D, and the Ghosh (2017) exponential regression model estimator based on the log-ratio of order statistics, ERM_G.

Simulation design
We consider three distributions in the Fréchet domain of attraction, namely the Fréchet, Pareto, and Burr distributions, as shown in Table 1. For each distribution $F$, we generated samples from a mixture contamination model, $(1-\epsilon)F + \epsilon G$, where $G$ is a nuisance distribution. Specifically, $G$ is chosen in two ways: either from the same family as $F$ but with different parameters, or from a different family than $F$.
In each case, we assess the robustness of the estimators under different contamination scenarios with $\epsilon = 0.05$ and $\epsilon = 0.15$. Furthermore, to assess the effect of the robustness parameter, we take three values of $\alpha$, namely 0.1, 0.5 and 1, representing increasing levels of robustness.
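Sampling from the mixture $(1-\epsilon)F + \epsilon G$ can be sketched as below: each observation is drawn from $G$ with probability $\epsilon$ and from $F$ otherwise. The strict Pareto generator, the two tail index values and the function names are assumptions for illustration; the distributions and parameters of Table 1 are not reproduced here.

```python
import random


def draw_pareto(gamma, rng):
    """One draw from the strict Pareto law with survival function x**(-1/gamma), x >= 1."""
    return (1.0 - rng.random()) ** (-gamma)


def contaminated_sample(n, eps, draw_f, draw_g, rng):
    """Draw n observations from the mixture (1 - eps) * F + eps * G."""
    sample = []
    for _ in range(n):
        source = draw_g if rng.random() < eps else draw_f
        sample.append(source(rng))
    return sample


# Hypothetical scenario: Pareto base distribution with tail index 0.5,
# contaminated at rate 5% by a heavier-tailed Pareto with tail index 2.
rng = random.Random(42)
sample = contaminated_sample(
    n=1000,
    eps=0.05,
    draw_f=lambda r: draw_pareto(0.5, r),
    draw_g=lambda r: draw_pareto(2.0, r),
    rng=rng,
)
print(max(sample))
```

Passing the two generators as functions keeps the same routine usable for both contamination designs, since `draw_g` may come from the same family as `draw_f` or from a different one.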

Discussion of simulation results
This section discusses the behavior of the proposed estimator and the two existing estimators of the tail index in the case where contamination of the base distribution comes from the same distribution but with different parameters. Here, the contaminating distribution's parameter is chosen such that the observations are generally distinct from the bulk of the data. The results of the simulation studies for the Burr distribution contaminated by another Burr but with different tail index are presented in Figures 4-6 for the Mean Square Error (MSE) and in Figures 7-9 for the bias.
From these figures, the proposed ERM_M estimator shows clear improvement on MSE and bias over ERM_G and EPD_D across the three robust tuning parameters as well as the percentage of contamination. However, for smaller values of k, the ERM_G estimator seems to provide better MSEs than the ERM_M. This can be explained as the ERM_G estimator does not involve second-order parameters and hence should in theory have less variation.
In addition, the performance of the estimators in the case of samples generated from the other distributions (i.e., Fréchet and Pareto) is presented in Supplementary Appendices A.1 and A.2. In the case of the Fréchet distribution, the proposed ERM_M and the EPD_D estimators are far better than the ERM_G. The two estimators, ERM_M and EPD_D, have approximately equal performance under MSE, with the ERM_M slightly better for larger values of k. However, in terms of bias, the ERM_M is the preferred estimator as it has the smallest values across the sample sizes, percentages of contamination and robustness parameters.
Similar performance can be seen for the ERM_M estimator in the case of samples generated from the Pareto distribution. It has smaller values of the MSE in most cases. However, unlike for the other distributions, the ERM_G estimator is quite competitive and can be considered an appropriate estimator of the tail index, with bias and MSE values comparable to those of the ERM_M estimator. Therefore, the simulation results indicate that across the different distributions and factors considered, the ERM_M is generally a better alternative to the EPD_D and the ERM_G estimators.
In addition, in practice we found that a robustness parameter $\alpha \in [0.30, 0.35]$ provides a reasonable tradeoff between efficiency and robustness; this is in conformity with Ghosh (2017). Lastly, the R code for the computation of the tail index and the implementation of the influence function analysis and sensitivity measures can be found at https://github.com/rminkah/RobustTailIndex.

Application
In this section, we estimate the tail index of two practical datasets, on insurance claims and on calcium content in soil samples. The former is the Society of Actuaries (SOA) Group Medical Insurance data studied in Beirlant et al. (2004, Chapters 1 and 5) and can be found at https://lstat.kuleuven.be/Wiley/Data/soa.txt. However, all the estimators used there were non-robust, including the maximum likelihood estimators based on the extended Pareto distribution and the exponential regression model. The latter is the Condroz data studied in Beirlant et al. (2004, Chapters 1 and 6) and also in Vandewalle et al. (2007), where a robust estimator of the tail index is proposed based on an integrated squared error approach on partial density component estimation.
We illustrate the application of the proposed robust minimum density power divergence estimator of the tail index based on log-spacings of order statistics discussed in the previous section in estimating the tail index of the SOA and the Condroz datasets.
In Figure 10, two particularly large claims in the SOA data and seven large calcium contents in the Condroz data appear to be detached from the bulk of the data. Also, the exponential and Pareto quantile-quantile plots in Figure 11 show that these observations deviate from linearity in the case of the Condroz data and are far removed from the majority of the points for the SOA data. Such observations can be considered as outliers and have implications for traditional methods of estimation of the parameters of the GP distribution, such as maximum likelihood and probability weighted moments. Using different robust tuning parameters, we compute the tail index as a function of the number of top order statistics, k. The results are shown in Figure 12 and indicate that our proposed estimator, ERM_M, is mostly stable along the path of k compared to the robust estimator of Dierckx, Goegebeur, and Guillou (2013) based on the extended Pareto distribution, EPD_D, and the maximum likelihood estimator from the GP distribution. Also, in conformity with the behavior of robust estimators, the variation in the estimates increases with increasing $\alpha$. Therefore, ERM_M provides a better alternative robust estimator for the tail index in the Fréchet domain, as illustrated with the SOA and the Condroz datasets.

Conclusion
In this paper, we proposed a robust estimator of the tail index using the minimum density power divergence through an exponential regression model. The estimator is valid for the Fréchet domain of attraction, i.e., heavy-tailed distributions. The robustness of this estimator was studied analytically by deriving its influence function and gross error sensitivity measure. In addition, the finite sample properties of the estimator were studied through a simulation study together with similar estimators using the minimum density power divergence but based on an extended Pareto distribution fitted to relative excesses and on an exponential regression model for the log-ratio of order statistics. The results of the simulation study show that the proposed minimum density power divergence estimator based on an exponential regression model on log-spacings of order statistics generally has better performance in terms of mean square error and bias than the existing estimators. In addition, the proposed robust estimator of the tail index is less sensitive to the number of top order statistics. The proposed estimator is also illustrated by applying it to real data sets of insurance claims and calcium content in soil samples.
The estimation of the tail index yields estimators of second-order parameters whose influence and gross error sensitivity functions were derived. These estimators can be used in obtaining reduced-bias estimators of high quantiles and exceedance probabilities. This and the theoretical properties of the estimators are the subjects of future research. Furthermore, this proposed methodology and the properties of the resulting estimators (e.g., empirical versions of influence functions) can be used for the purpose of outlier detection, which we also hope to study in more detail in our future works.