Asymptotic and finite sample properties of Hill-type estimators in the presence of errors in observations

We establish asymptotic and finite sample properties of the Hill and Harmonic Moment estimators applied to heavy-tailed data contaminated by errors. We formulate conditions on the errors and the number of upper order statistics under which these estimators continue to be asymptotically normal. We specify analogous conditions which must hold in finite samples for the confidence intervals derived from the asymptotic normal distribution to be reliable. In the large sample analysis, we specify conditions related to second-order regular variation and divergence rates for the number of upper order statistics, k, used to compute the estimators. In the finite sample analysis, we examine several data-driven methods of selecting k, and determine which of them are most suitable for confidence interval inference. The results of these investigations are applied to interarrival times of internet traffic anomalies, which are available only with a round-off error.


Introduction
Heavy-tailed phenomena have been found in a variety of fields, including finance, insurance, computer network traffic and geophysics. The theory of regular variation provides a mathematical framework for their analysis. Hundreds of papers have been written on the subject, and it is difficult to present an unbiased selection of the most important contributions, so we merely cite here the book of Resnick (2007), and discuss the most closely related references, as the presentation progresses.
This work is concerned with semiparametric estimation of the tail index, α, of a heavy-tailed distribution from observations contaminated by measurement or other errors. We investigate asymptotic and finite sample properties of the Hill estimator, the most commonly used tool for inference on α, and of the harmonic moment estimator (HME), a class of estimators related to, and generalising, the Hill estimator. The asymptotic theory establishes conditions on the errors and on the number of the largest order statistics, k, that guarantee consistency and asymptotic normality. The finite sample investigation identifies the best methods of constructing confidence intervals for α, focusing on data-driven methods for the selection of k, in scenarios where data are observed with errors. While the estimators considered in this paper, especially the Hill estimator, have been extensively explored, their properties in the presence of errors have remained mostly unknown.
Suppose {X_i, i ≥ 1} is a sequence of independent, nonnegative random variables with common distribution function F, which has regularly varying tail probabilities, i.e.

P(X_1 > x) = x^{−α} L(x), x > 0, α > 0, (1)
where L is a slowly varying function. The class of distributions with tail behaviour (1) coincides with the maximum domain of attraction of the Fréchet distribution, one of the three basic types of extreme value distributions. The Hill estimator is defined as

H_{k,n} = (1/k) Σ_{i=1}^{k} ln( X_{(i)} / X_{(k+1)} ),

with the convention that X_{(i)} is the ith largest order statistic. Throughout the paper, we assume that

k → ∞ and k/n → 0, as n → ∞. (2)

The Hill estimator is often used after an examination of the Hill plot, which is also a tool for detecting the presence of heavy tails. The Hill plot and the Hill estimator have been extensively studied, and are introduced in all monographs on extreme value theory, see e.g. Embrechts, Klüppelberg, and Mikosch (1997), Beirlant, Goegebeur, Segers, and Teugels (2004), de Haan and Ferreira (2006), Resnick (2007) and Markovich (2008). Considerable research has been done to establish conditions for the asymptotic normality of the Hill estimator. If only the regular variation (1) is assumed, asymptotic normality holds with random centring. Several authors formulated conditions on F which permit replacing the random centring by a deterministic one. The first result of this type was established by Hall (1982) for slowly varying functions L which converge to a constant at a polynomial rate. Davis and Resnick (1984) showed that the estimator is asymptotically normal for any regularly varying function satisfying the von Mises condition; their centring, however, depends on the sample size n. To show that the Hill estimator centred by α^{−1} is asymptotically normal, second-order regular variation, a refinement of the concept of regular variation, is assumed, see Haeusler and Teugels (1985), Csörgő, Deheuvels, and Mason (1985), Resnick and Stărică (1997a, 1997b). The approach in Section 9.1 of Resnick (2007), which is based on tail empirical processes, also requires the second-order regular variation.
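A minimal numerical sketch may help fix ideas. The convention below, averaging log(X_(i)/X_(k+1)) over the k largest observations with X_(1) ≥ … ≥ X_(n), is one of several essentially equivalent variants found in the literature:

```python
import math
import random

def hill_estimator(sample, k):
    """Hill estimate of 1/alpha from the k largest order statistics.

    Uses the convention H = (1/k) * sum_{i=1}^{k} log(X_(i) / X_(k+1)),
    where X_(1) >= X_(2) >= ... are the descending order statistics.
    """
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < n")
    return sum(math.log(xs[i] / xs[k]) for i in range(k)) / k

# Exact Pareto sample with alpha = 2: H should be close to 1/alpha = 0.5.
random.seed(0)
data = [random.paretovariate(2.0) for _ in range(100_000)]
h = hill_estimator(data, k=2000)
```

For exact Pareto tails the estimator is unbiased for every k, which is consistent with the stable Hill plots discussed later in the paper.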
Kulik and Soulier (2011) also use the tail empirical process to study asymptotic normality of the Hill estimator for long memory stochastic volatility models assuming a second-order condition. The HME was introduced by Henry (2009) to provide a broad class of estimators which, in a sense, extend the Hill estimator and have desirable robustness properties against large outliers. Consistency and asymptotic normality of the HME were established by Henry (2009) for the Pareto distribution and by Beran, Schell, and Stehlík (2014) under a second-order regular variation condition. The HME was also studied, under a different name, by Brilhante, Gomes, and Pestana (2013), Paulauskas and Vaičiulis (2013) and Caeiro, Gomes, Beirlant, and de Wet (2016). The HME is defined in Beran et al. (2014) by

H^{(β)}_{k,n} = (1/(β − 1)) { [ (1/k) Σ_{i=1}^{k} ( X_{(k+1)} / X_{(i)} )^{β−1} ]^{−1} − 1 },

where β > 0, β ≠ 1, is a tuning parameter. For β = 1, the HME is defined by H^{(1)}_{k,n} := lim_{β→1} H^{(β)}_{k,n}. We therefore obtain the Hill estimator as the limit of the HME as β → 1. We study the Hill estimator and the HME computed from observations contaminated by measurement errors, or other errors whose origin is difficult to understand, model, or quantify precisely. We thus assume that we observe

Y_i = X_i + ε_i, 1 ≤ i ≤ n,

where the ε_i are i.i.d. random errors independent of the X_i. The Hill estimator computed from the observations Y_i is denoted Ĥ_{k,n}, and the HME based on the Y_i is denoted Ĥ^{(β)}_{k,n}; they are obtained by replacing the X_{(i)} with the Y_{(i)} in the definitions above. In our context, Ĥ_{k,n}, Ĥ^{(β)}_{k,n} are the estimators that can actually be used since what we observe are the Y_i, not the X_i. The consistency of the Hill estimator Ĥ_{k,n} has been established in very general scenarios in Kim and Kokoszka (2020). In this paper, we want to find conditions under which the asymptotic normality of Ĥ_{k,n}, Ĥ^{(β)}_{k,n} continues to hold. If the errors ε_i have lighter tails than the X_i, the Y_i inherit the regular variation of the X_i.
However, the second-order regular variation, needed for the asymptotic normality, is not inherited, and suitable conditions that quantify the interplay between the X_i, the ε_i and k must be found. Some specific questions we seek to answer are as follows. What must we assume about the errors ε_i to obtain asymptotic normality with random centring? What additional assumptions are needed for the deterministic centring? In either case, are any additional assumptions on the rate of k, beyond (2), needed? Which characteristics of the distribution of the ε_i enter into these assumptions? In finite samples, how 'large', and in what sense, can the ε_i be for the asymptotic confidence intervals to remain useful? It is hoped that the research we present answers such questions in a useful and informative way.
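The HME admits an equally short sketch. The parametrisation below, in which β → 1 recovers the Hill estimator, follows the general form described above; the exact display in Beran et al. (2014) may differ in inessential details:

```python
import math
import random

def hme(sample, k, beta):
    """Harmonic moment estimate of 1/alpha with tuning parameter beta > 0.

    For beta != 1: H^(beta) = ((1/m) - 1) / (beta - 1), where m is the
    empirical mean of (X_(k+1)/X_(i))^(beta-1) over the k largest
    observations; beta = 1 is the Hill-estimator limit.
    """
    xs = sorted(sample, reverse=True)
    if beta == 1.0:  # limiting case: the Hill estimator
        return sum(math.log(xs[i] / xs[k]) for i in range(k)) / k
    m = sum((xs[k] / xs[i]) ** (beta - 1.0) for i in range(k)) / k
    return (1.0 / m - 1.0) / (beta - 1.0)

random.seed(1)
data = [random.paretovariate(2.0) for _ in range(100_000)]
h_hill = hme(data, 2000, 1.0)
h_15 = hme(data, 2000, 1.5)            # also close to 1/alpha = 0.5
h_near1 = hme(data, 2000, 1.0 + 1e-6)  # numerically close to h_hill
```

For an exact Pareto tail, E[(t/X)^{β−1} | X > t] = α/(α + β − 1), so the statistic above indeed converges to 1/α for every β > 0.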
The problem of estimation in the presence of errors has received considerable attention. For example, Hall and Simar (2002), Goldenshluger and Tsybakov (2004), Kneip, Simar, and Keilegom (2015), and Leng, Peng, Zhou, and Wang (2018) study estimation of the end-point of data observed with additive measurement errors. While they all show asymptotic normality in the presence of Gaussian measurement errors, in our case we assume a broader class of error distributions because the heavy-tailed X_i are 'much larger' random variables than those with a finite end-point. Most closely related is the work of Matsui, Mikosch, and Tafakori (2013), who study the Hill estimator assuming that the observations have the form Y_i = 10^{−l} [10^l U_i^{−1/α}], where U_i is uniform on [0, 1] and [·] denotes the integer part, for l = 0, 1, 2, . . .. Such data can be written in the form Y_i = X_i + ε_i, where X_i = U_i^{−1/α} and ε_i = 10^{−l}[10^l X_i] − X_i ∈ (−10^{−l}, 0] is a non-positive, bounded error of a specific form. Another related, very recent work is Ma, Yan, and Zhang (2022), where rounded data from generalised Pareto distributions are treated as interval-censored data. The parameters of the GPD are estimated by maximum likelihood methods. This method works well in a parametric setting.
We consider broader classes for both the X i and the ε i under the assumption that ε i is independent of X i , reflecting our treatment of the ε i as measurement errors. We use a different asymptotic approach. We establish weak convergence of suitable empirical tail processes for observations contaminated by general errors. Asymptotic normality follows from these general results, which are also of independent interest.
The paper is organised as follows. Assumptions and main theoretical results are stated in Section 2. In Section 3, we present simulation studies examining finite sample properties of confidence intervals based on the asymptotic normal distribution, focusing on the impact of errors. This numerical investigation is followed in Section 4 by an application to the interarrival times of internet traffic anomalies. The proofs are presented in Section B of online Supplementary Material, after some preparation in Section A. Additional Tables examining the finite sample performance of the estimators we study are collected in Section C of the online material.

Assumptions and main asymptotic results
Recall that the observations are Y_i = X_i + ε_i, 1 ≤ i ≤ n. We first state assumptions on the unobservable random variables X_i. Recall that a function U : (0, ∞) → (0, ∞) is regularly varying with index ρ ∈ R (written U ∈ RV_ρ) if, for every x > 0, U(tx)/U(t) → x^ρ, as t → ∞.

Assumption 2.1 (Regular variation):
The X_i are nonnegative, independent random variables with common distribution function F_X such that F̄_X = P(X_i > ·) ∈ RV_{−α}.
Regular variation is not enough to establish asymptotic normality with centring by 1/α. For this, second-order regular variation is typically assumed. We stated Assumption 2.1 because it is sufficient for certain weaker results that are needed to establish our main result.
Assumption 2.2 (Second-order regular variation (2RV)): The X_i are nonnegative, independent random variables with common distribution function F_X, which is second-order (−α, ρ) regularly varying (written F̄_X ∈ 2RV(−α, ρ)), i.e. there exists a positive function g ∈ RV_ρ such that g(t) → 0, as t → ∞, and, for α > 0, ρ ≤ 0 and K ≠ 0,

lim_{t→∞} [ F̄_X(tx)/F̄_X(t) − x^{−α} ] / g(t) = K x^{−α} (x^ρ − 1)/ρ, x > 0, (3)

with (x^ρ − 1)/ρ interpreted as ln x for ρ = 0. Note that Assumption 2.2 implies Assumption 2.1. Observe, however, that condition (3) does not hold if the X_i have the exact Pareto distribution, i.e. P(X_i > x) = x^{−α}, x ≥ 1. In this case, one would need to allow K = 0, and would thus lose any information contained in the function g. The case of exact Pareto tails should, however, be included in any reasonable theory for heavy-tailed observations. We do so by introducing a parallel set of assumptions.
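As a quick numerical illustration of (3), consider the toy survival function F̄_X(t) = t^{−2}(1 + t^{−5}), our own example with α = 2, ρ = −5 and g(t) = t^{−5} (the same g as for Model 2 in Section 3); the second-order limit can be checked directly:

```python
# Toy check of the 2RV limit for Fbar(t) = t^(-2) * (1 + t^(-5)):
# alpha = 2, rho = -5, g(t) = t^(-5), and the limit constant is K = -5, since
# (Fbar(tx)/Fbar(t) - x^(-2)) / g(t) -> x^(-2) * (x^(-5) - 1)
#                                     = K * x^(-2) * (x^(-5) - 1) / (-5).
def fbar(t):
    return t ** -2.0 * (1.0 + t ** -5.0)

alpha, rho, K = 2.0, -5.0, -5.0
x, t = 2.0, 10.0
lhs = (fbar(t * x) / fbar(t) - x ** -alpha) / t ** rho
rhs = K * x ** -alpha * (x ** rho - 1.0) / rho
```

Already at t = 10 the prelimit quantity agrees with the limit to about 10^{-6}.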

Assumption 2.3 (Pareto):
The X_i are nonnegative, independent random variables with a common distribution function F_X such that F̄_X(x) = P(X_i > x) = x^{−α}, x ≥ 1.

The function g in (3) can be interpreted as the convergence rate of F̄_X(tx)/F̄_X(t) to x^{−α}. It has been used to restrict the sequence k = k(n) by Haeusler and Teugels (1985), Csörgő et al. (1985) and Resnick and Stărică (1997a, 1997b), along with the second-order regular variation for ρ ≤ 0, through the condition

√k g(b(n/k)) → 0, as n → ∞. (4)

In (4), and throughout the paper, b(·) is the quantile function, defined by b(t) = inf{x : F_X(x) ≥ 1 − 1/t}, so that, by (1),

b(t) = t^{1/α} L_b(t), (5)

where L_b is a slowly varying function. Condition (4) is sufficient in our setting if ρ > −1.
To cover the 2RV case with ρ ≤ −1 and the pure Pareto case, we consider the following condition:

√k / b(n/k) → 0, as n → ∞. (6)

Using (5), it is easy to verify that (4) implies k = o(n^{−2ρ/(α−2ρ)}), and (6) implies k = o(n^{2/(α+2)}). These two rates agree at the phase transition point ρ = −1. We use Assumption 2.4 in the 2RV case and Assumption 2.5 in the Pareto case.
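Ignoring slowly varying factors, so that g(t) behaves like t^ρ and b(t) like t^{1/α}, the two growth bounds for k can be compared directly; the small check below confirms that the implied exponents coincide at ρ = −1:

```python
def k_exponent_2rv(alpha, rho):
    """Exponent e in k = o(n^e) implied by (4): e = -2*rho/(alpha - 2*rho)."""
    return -2.0 * rho / (alpha - 2.0 * rho)

def k_exponent_pareto(alpha):
    """Exponent e in k = o(n^e) implied by (6): e = 2/(alpha + 2)."""
    return 2.0 / (alpha + 2.0)

# At the phase transition rho = -1 (with alpha = 2), both exponents are 1/2.
e_2rv = k_exponent_2rv(2.0, -1.0)
e_par = k_exponent_pareto(2.0)
```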
We now turn to the assumptions on the measurement errors ε i .
Assumption 2.6: The ε_i are i.i.d. with tails satisfying

P(|ε_1| > x) = o( P(X_1 > x) ), as x → ∞.

Under Assumption 2.1, Assumption 2.6 implies that Y_i = X_i + ε_i ∈ RV_{−α}. It does not, however, imply that the Y_i satisfy analogues of Assumptions 2.2 or 2.3. To obtain the asymptotic normality with a constant centring, a stronger, but still broadly applicable, assumption on the errors is needed; the errors must have lighter tails than a power function. Assumption 2.7 is needed when we assume the second-order regular variation, and Assumption 2.8 is suitable for the Pareto distribution.

Assumption 2.7 (2RV): The ε_i satisfy Assumption 2.6 and

E|ε_1|^κ < ∞ for some κ > α + max(−ρ, 1).
We now define the function spaces in which our functional convergence results hold. We work in D[0, ∞), the Skorokhod space of real-valued, right-continuous functions on [0, ∞) with finite left limits on (0, ∞). For any s > 0, the Skorokhod metric in D[0, s] is defined by

d_s(x, y) = inf_{λ∈Λ_s} max{ sup_{0≤t≤s} |λ(t) − t|, sup_{0≤t≤s} |r_s x(λ(t)) − r_s y(t)| },

where Λ_s is the set of strictly increasing, continuous bijections of [0, s] onto itself, and r_s x, r_s y are the restrictions of x, y ∈ D[0, ∞) to the interval [0, s]. Given a sequence of random processes X_n, n ≥ 0, in D[0, ∞), we denote weak convergence of X_n to X_0 by X_n ⇒ X_0. We also use ⇒ to denote weak convergence of random variables.
We define two 'increasingly empirical' measures, with only the last one being observable. We set

ν_n := (1/k) Σ_{i=1}^{n} δ_{X_i / b(n/k)}, ν̂_n := (1/k) Σ_{i=1}^{n} δ_{Y_i / b(n/k)},

where δ_x denotes the point mass at x and b(·) is the quantile function in (5). The random measures ν_n, ν̂_n, and all other Radon measures of this type, are defined on (0, ∞] compactified at ∞. Thus, for s ≥ 0, we can define the random processes

W_n(s) := √k ( ν_n(s^{−1/α}, ∞] − s ), Ŵ_n(s) := √k ( ν̂_n(s^{−1/α}, ∞] − s ).

We first investigate the asymptotic normality of the tail empirical processes W_n, Ŵ_n, then study when it implies the asymptotic normality of the Hill estimator Ĥ_{k,n} and the HME Ĥ^{(β)}_{k,n}. Theorem 2.1 shows that even the very general errors specified in Assumption 2.6 do not impact the asymptotic behaviour of the tail empirical processes W_n and Ŵ_n: the limit distributions of these statistics based on the Y_i are the same as those of the corresponding statistics based on the unobservable X_i, namely W_n ⇒ W and Ŵ_n ⇒ W in D[0, ∞), where W is the standard Brownian motion on [0, ∞).
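The one-dimensional distributions of the limit can be checked by simulation. Assuming the standard tail-empirical-process convention of Section 9.1 of Resnick (2007), with b(t) = t^{1/α} for exact Pareto tails, the value W_n(1) = √k(ν_n(1, ∞] − 1) should be approximately standard normal:

```python
import math
import random

random.seed(2)
alpha, n, k, n_rep = 2.0, 10_000, 100, 300
b = (n / k) ** (1.0 / alpha)  # quantile function t^(1/alpha) at t = n/k

vals = []
for _ in range(n_rep):
    # nu_n(1, inf] = (1/k) * #{ X_i / b(n/k) > 1 } for an exact Pareto sample
    count = sum(1 for _ in range(n) if random.paretovariate(alpha) > b)
    vals.append(math.sqrt(k) * (count / k - 1.0))

mean = sum(vals) / n_rep
sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n_rep - 1))
```

Since #{X_i > b(n/k)} is Binomial(n, k/n), the simulated values have mean close to 0 and standard deviation close to 1, matching W(1).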
The Hill estimator Ĥ_{k,n} can be written as an integral of the tail empirical measure ν̂_n, and, similarly, the HME Ĥ^{(β)}_{k,n} can be expressed as a transformed integral of ν̂_n.
The order statistics used to compute the Hill estimator and the HME must be positive. In the following, all statements are tacitly assumed to hold conditional on the event {Y_(k) > 0}, where k is the number of the largest order statistics in the definition of Ĥ_{k,n}, Ĥ^{(β)}_{k,n}. By putting β = 1 in Theorem 2.2, we obtain the asymptotic normality of the Hill estimator with random centring, which is stated as Corollary 2.1(a). Similarly, the asymptotic behaviour of M^{(β)}_{k,n} follows directly from Theorem 2.2; it is presented in Corollary 2.1(b).

Corollary 2.1: Under the assumptions of Theorem 2.2.
We emphasise that Theorems 2.1, 2.2, and Corollary 2.1 hold either under Assumption 2.2 or Assumption 2.3, since both imply Assumption 2.1.
The convergence in Theorem 2.2 requires the random centring (n/k) ∫_{Y_(k)}^{∞} F̄_Y(s) s^{−β} ds, which makes Corollary 2.1 of limited practical use, but it provides a starting point for improvements. To replace it with a constant centring, we need the assumption of second-order regular variation (or of exact Pareto tails) and the stronger assumptions on the errors.

Remark 2.1:
The case of the second-order regular variation exponent ρ = −1 needs special treatment because our arguments require that lim_{t→∞} t g(t) exists (∞ is allowed); cf. Proposition 2.6(i) in Resnick (2007).

The asymptotic normality of the Hill estimator Ĥ_{k,n} follows easily from Theorem 2.3. To obtain the asymptotic normality of the HME Ĥ^{(β)}_{k,n}, we must apply Theorem 2.3 and the delta method. The corresponding results are stated in the following corollary.

Corollary 2.2: Under the assumptions of Theorem 2.3.
The limits in (11) and (12) are the same as for observations without measurement errors; see Theorem 3.2.5 of de Haan and Ferreira (2006) and Theorem 2 of Beran et al. (2014). The effect of suitably small errors ε i is thus asymptotically negligible. However, even for such errors, we impose conditions (4) and (6) on the rate of k in the cases of 2RV (ρ ≤ −1) and exact Pareto observations, respectively. We do not know if Corollary 2.2 remains true without these conditions on k. We also remark that Corollary 2.2(a) cannot be easily proven by verifying the conditions in Theorem 3.2.5 of de Haan and Ferreira (2006). If the X i are exactly Pareto or second-order regularly varying, the Y i need not be in any of these classes. Proposition B.1 in the online material, which may be useful in other contexts, is a related result which plays an important role in the proof of Theorem 2.3.
In the next two sections, we explore how small the errors must be in finite samples to have a practically negligible effect on confidence interval inference.

Impact of errors on confidence intervals
We investigate the effect of error contamination on confidence intervals constructed using the more commonly used Hill estimator. The effect of various errors on the harmonic moment estimator (HME) is studied in a more limited, but informative, simulation study presented in Section C.2 of the online material.
The asymptotic level 1−p confidence interval for α^{−1} implied by Corollary 2.2(a) is

( α̂^{−1} (1 − z_{p/2}/√k), α̂^{−1} (1 + z_{p/2}/√k) ), (13)

where α̂^{−1} = Ĥ_{k,n}, and z_q is the upper quantile of the standard normal distribution defined by Φ(z_q) = 1 − q. The above interval is implemented by the function hill of the R package evir, with the default asymptotic coverage 1−p = 0.95. According to Corollary 2.2(a), it is asymptotically valid even if the observations are contaminated by fairly general errors. In this section, we investigate the impact of these errors on the empirical coverage probability of the interval (13). To obtain the interval (13), the number of upper order statistics, k, has to be chosen. We consider a range of values of k for a given sample size n. We also employ several methods of selecting k that have been proposed in the literature.
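A sketch of the interval (13) in code; the form α̂^{−1}(1 ± z_{p/2}/√k) used here follows from asymptotic normality with variance α^{−2} (the function hill of the R package evir implements the same idea):

```python
import math
import random
from statistics import NormalDist

def hill_ci(sample, k, level=0.95):
    """Approximate level-`level` confidence interval for 1/alpha,
    of the form h * (1 -/+ z/sqrt(k)) around the Hill estimate h."""
    xs = sorted(sample, reverse=True)
    h = sum(math.log(xs[i] / xs[k]) for i in range(k)) / k
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return h * (1.0 - z / math.sqrt(k)), h * (1.0 + z / math.sqrt(k))

# Exact Pareto data with alpha = 2: the interval should sit near 0.5.
random.seed(3)
data = [random.paretovariate(2.0) for _ in range(100_000)]
lo, hi = hill_ci(data, k=2000)
```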
The design of our simulation study is as follows. We generate observations Y i = X i + ε i , i = 1, 2, . . . , n, where {X i } and {ε i } are independent sets of random variables. For each model/error pair, we compute 1000 confidence intervals and report the fraction of the intervals that contain the reciprocal of the true tail index. We consider sample sizes n = 500 and n = 2000. The sample size n = 500 is representative of the sample sizes occurring in the application presented in Section 4.
We use two models for the X_i, both satisfying the condition of Corollary 2.2(a) and having the true tail index α = 2. The first is the standard Pareto distribution, which is not second-order regularly varying, and the second is a distribution in the Hall/Weiss class. The Hall/Weiss class provides examples of the second-order regular variation, see p. 142 of Geluk, de Haan, Resnick, and Stărică (1997). Model 2 satisfies Assumption 2.2 with g(t) = t^{−5}.
Error 1 [Normal] The ε i are i.i.d. random variables, drawn from a normal distribution with mean 0 and standard deviation σ Normal .
Error 2 [scaled t 8 ] The ε i are i.i.d. random variables, drawn from a scaled t-distribution with 8 degrees of freedom.
Error 4 [Uniform] The ε i are i.i.d. random variables, drawn from the uniform distribution on the interval [−a, a], a > 0.
In the investigations that follow, we need to separate the effect of the shape of the density from the effect of the typical size of the error relative to the size of the X i . We do so by reporting the ratio of the sample SDs: (error SD)/(model SD). The X i we consider have infinite variance, but the sample SD is always finite and provides a measure of the size of the generated data.
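The ratio used throughout the tables can be computed as below; the 2% calibration of the normal errors is our own illustrative choice:

```python
import random
from statistics import stdev

def sd_ratio(errors, model):
    """(error SD) / (model SD): the relative error size used in Section 3."""
    return stdev(errors) / stdev(model)

random.seed(4)
x = [random.paretovariate(2.0) for _ in range(2000)]    # model sample
sigma = 0.02 * stdev(x)                                 # ~2% of the model SD
eps = [random.gauss(0.0, sigma) for _ in range(2000)]   # Error 1 (normal)
r = sd_ratio(eps, x)
```

Even though the X_i have infinite variance, the sample SD in the denominator is always finite, so the ratio is well defined for every generated sample.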
We first consider a wide range of k for a given sample size n. Tables 5 and 6 in Section C of the online Supplementary Material report coverage probabilities of the approximate 95% confidence intervals for the Pareto model, with n = 500 and n = 2000, respectively. We first observe that the coverage probabilities for samples generated from the Pareto distribution without the errors are close to the target coverage, 95%, for large k. This is seen in the row with the ratio 0 in each table. This result is in agreement with the typical behaviour of the Hill plot, which shows stable, unbiased estimates for large k when the underlying distribution is exactly Pareto. Second, the coverage overall decreases with the ratio, but this decrease is relatively flat over the range of ratios from 0.01 to 0.1, for all the error types. In particular, for n = 2000, the coverage is surprisingly acceptable for a wide range of values of k; in many cases, it is close to the target of 95%. On the other hand, the coverage is sensitive to relatively large errors, with a ratio of more than 10%. An interesting observation is that, in the presence of errors, the coverage deteriorates as k gets larger. This is consistent with Corollary 2.2(a), which implies that the Hill estimator attains asymptotic normality if k satisfies Assumption 2.5: k goes to infinity with n, but not too fast. The reduction in the coverage probability caused by large k is not observed for data contaminated by relatively small errors. Finally, the impact on the coverage probability overall does not depend on the type of error distribution. In particular, for small ratios, the difference made by the error type is negligible.
Tables in Section C of the Supplementary Material report coverage probabilities of the asymptotic 95% confidence intervals for the 2RV model, with n = 500 and n = 2000, respectively. Unlike in the Pareto case, the 2RV model does not achieve the target coverage, 95%, even if there are no errors. This may be due to n not being sufficiently large. The errors with a small ratio, however, have only a small impact on the coverage. It can also be seen that the impact on the coverage probability for a small ratio does not depend on the error type. Finally, we see that k cannot increase too fast, indirectly confirming the need for Assumption 2.4.
We have found so far that, in a finite sample, the coverage can achieve the target probability for some properly chosen k, or fail to achieve it for any k. Even if we can identify some range of k for which the coverage approaches the target, the question remains of how to select an optimal k in practice. There are various methods for choosing it. A commonly used approach is based on the minimisation of the asymptotic mean squared error (AMSE), see e.g. Hall and Welsh (1985), Hall (1990), Drees and Kaufmann (1998), and Danielsson, de Haan, Peng, and de Vries (2001). These methods are, however, based on asymptotic arguments, which raises the question of how well they perform in finite samples. Danielsson, Ergun, de Haan, and de Vries (2019) suggest a data-driven method minimising a penalty function of the distance between empirical and theoretical quantiles to improve the performance in finite samples. There are also heuristic methods, mainly trying to find the region where the Hill plot, a plot of estimates of the tail index against k, becomes more stable, see Resnick and Stărică (1997b).
To provide practically useful information on choosing a data-driven cut-off k, we examined four methods based on different underlying ideas of selecting the optimal k. The first threshold selection method, introduced by Hall (1990), uses a bootstrap procedure to find the k which minimises the AMSE. This value is computed by the function hall of the R package tea. (We also considered a few related methods based on the minimisation of the AMSE, but they all gave disappointing results. The coverage that the Hall method produced was always among the best of these methods.) The second method, proposed by Danielsson et al. (2019), is based on minimising a penalty function of the distance between the observed quantiles and the fitted Pareto-type tail. This distance is in the quantile dimension, not in the probability dimension like the Kolmogorov-Smirnov distance. This method is intended to remedy the fact that a small change in probabilities can make a large difference in quantiles. We use two different penalty functions: the supremum of the absolute distance (KS), and the mean absolute distance (MAD). Both are implemented by the function mindist of the R package tea. The final method is an Eye-Ball technique, an automated version of which was developed by Danielsson et al. (2019); it is carried out by the function eye of the R package tea. This heuristic method attempts to find a stable portion of the Hill plot and selects the k at which a considerable drop in the variance occurs as k increases.
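To make the Eye-Ball idea concrete, here is a simplified stand-in (not the tea::eye implementation; the window, tolerance, and fraction parameters are our own illustrative choices): it scans the Hill plot for the first k whose following estimates remain close to the current one.

```python
import math
import random

def hill_path(sample, kmax):
    """Hill estimates for k = 1, ..., kmax (the Hill plot)."""
    xs = sorted(sample, reverse=True)
    logs = [math.log(v) for v in xs[:kmax + 1]]
    path, csum = [], 0.0
    for k in range(1, kmax + 1):
        csum += logs[k - 1]
        path.append(csum / k - logs[k])
    return path

def eyeball_k(path, window=30, tol=0.3, frac=0.9):
    """Smallest k such that at least `frac` of the next `window`
    Hill estimates lie within tol * H_k of H_k."""
    for k in range(1, len(path) - window + 1):
        h = path[k - 1]
        near = sum(abs(path[j] - h) <= tol * abs(h)
                   for j in range(k, k + window))
        if near >= frac * window:
            return k
    return len(path) - window  # fallback: no stable region found

random.seed(5)
data = [random.paretovariate(2.0) for _ in range(500)]
path = hill_path(data, kmax=250)
k_sel = eyeball_k(path)
```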
Tables 1 and 2 report coverage probabilities and the average optimal k selected using the four different methods. For the Pareto model, the coverage decreases with the ratio for all the selection methods, as shown in Table 1; again, a small ratio has a relatively small impact on the coverage. The MAD and Eye-Ball methods achieve the target coverage, 95%, when the underlying process is not contaminated by the errors. These methods are also less sensitive to an increase in the ratio. For the Pareto model, the MAD approach generally leads to coverage probabilities which are higher than 95%. However, as shown in Table 2, it gives very low coverage for the 2RV model. It also has an unexpected, difficult-to-explain property: the coverage increases with the ratio. The Hall method shows some fluctuation over the ratio as well, but this fluctuation is not present when the ratio is 0.01 or 0.02. The other methods also exhibit this insensitivity for small ratios. The Eye-Ball method seems to work well for both the Pareto and 2RV models since it gives relatively high coverage. Its average optimal k also falls into the range which gives high coverage in Tables 5 and 7 in Section C.
The main conclusions of the discussion above are as follows.
(1) The Eye-Ball method of selecting k is recommended for both the Pareto and 2RV models.
(2) For the heavy-tailed X i with the tail index α = 2, the coverage probability of the approximate 95% confidence interval containing the true index is robust to errors whose SD does not exceed 2% of model SD.
(3) There is no clear evidence that the coverage probability depends on the error distribution. Instead, the coverage is mainly affected by how large the ε i are compared to the X i , regardless of the threshold selection methods.
We conclude this section with a discussion of the confidence interval for α obtained via an application of the delta method. Corollary 2.2(a) and the delta method imply

√k ( Ĥ_{k,n}^{−1} − α ) ⇒ N(0, α²), as n → ∞.
Thus, setting α̂ = Ĥ_{k,n}^{−1}, we get the approximate level 1−p confidence interval for α of the form

( α̂ (1 − z_{p/2}/√k), α̂ (1 + z_{p/2}/√k) ). (14)

One might want to use the interval (14) rather than (13) to make inference on α, but care is needed in finite samples. Since the delta method is based on an additional asymptotic approximation, confidence intervals derived from it can provide a poor approximation for small sample sizes. We have performed a simulation study for the interval (14), similar to the one described earlier in this section. We have found that it almost always gives a coverage probability worse than the interval (13). Therefore, when working with sample sizes similar to n = 500 or n = 2000, we recommend using the reciprocals of the bounds of the interval (13). Finally, we note that a preliminary simulation study indicates that the moment estimator of Dekkers, Einmahl, and de Haan (1989) might also be robust to errors and suitable for the construction of confidence intervals in the case of error-contaminated data. A separate theoretical and empirical study is needed.
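The two routes to an interval for α can be compared directly. The sketch below implements the delta-method interval (14) and the reciprocal-of-bounds interval recommended above; the inputs h = 0.5 and k = 400 are illustrative numbers only:

```python
import math
from statistics import NormalDist

def alpha_intervals(h, k, level=0.95):
    """Given a Hill estimate h of 1/alpha, return (i) the delta-method
    interval (14) for alpha and (ii) the reciprocals of the bounds of
    the interval (13) for 1/alpha."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    a = 1.0 / h
    delta_ci = (a * (1.0 - z / math.sqrt(k)), a * (1.0 + z / math.sqrt(k)))
    lo, hi = h * (1.0 - z / math.sqrt(k)), h * (1.0 + z / math.sqrt(k))
    recip_ci = (1.0 / hi, 1.0 / lo)
    return delta_ci, recip_ci

delta_ci, recip_ci = alpha_intervals(h=0.5, k=400)
# Both intervals cover alpha = 2, but they are not identical: the two
# constructions agree only to first order in 1/sqrt(k).
```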

Application to Internet2 anomalous traffic
In this section, we present an application to interarrival times of anomalies in a backbone internet network, Internet2. These times are available only with round-off errors. We provide only minimal background; more details are presented in Bandara, Pezeshki, and Jayasumana (2014), a paper which to some extent motivates the present research. We describe the results of confidence interval inference for the tail index of these interarrival times. We restrict ourselves to confidence intervals based on the Hill estimator; the results for the HME are similar. We then examine the robustness of the Hill estimator to the round-off errors by a numerical experiment.

The Internet2 network consists of 14 two-directional links connecting major cities in the United States, as shown in Figure 1. A traffic disruption in any of these links can slow down service in the whole country. For this reason, anomalies in internet traffic have been extensively studied. An anomaly is a time- and space-confined burst of traffic whose volume is much higher than typical. Bandara et al. (2014) developed an anomaly extraction algorithm, which can identify the arrival time of an anomaly in any unidirectional link only at a resolution of five minutes. While network measurement devices operate at much higher frequencies, such a rough resolution is due to a limitation of the anomaly extraction algorithm: it is based on the Fourier transform, which eliminates noise by retaining only low-frequency harmonics. Bandara et al. (2014) created a database for the time period of 50 weeks, starting 16 October 2005.

A question we seek to answer in this section is whether the round-off error has a negligible or a non-negligible impact on the confidence intervals for the tail index of the interarrival times. Additionally, we would like to see if the various data-driven methods of selecting k, discussed in Section 3, lead to overlapping confidence intervals, or if they suggest different ranges of α.
These conclusions could potentially be different for each of the 28 unidirectional links. We index these links by integers from 1 to 28 since it is not important for the purpose of our investigation to which nodes they correspond.
In the context of this paper, each interarrival time Y i , computed by the algorithm, is treated as a 'true' interarrival time X i measured with a round-off error, i.e. Y i = X i + ε i . The unobserved X i is not rigorously defined, but we can think of it as the time separation based on a more precise algorithm, or just a different algorithm. In the latter case, the analysis that follows provides information about the uncertainty in the estimation of α caused by the choice of a specific algorithm. The value of the ε i does not depend on X i because there is no reason to believe that, say, larger X i have a 'preference' for falling into some specific part of the 5-minute interval separating the possible measurement times.
(The X_i are at least a few hours.) The errors need not be negative, and it is risky to assume that the X_i have exact Pareto tails, so the theory of Matsui et al. (2013) does not apply. Kokoszka, Nguyen, Wang, and Yang (2020) and Nicholson, Kokoszka, Lund, Kiessler, and Sharp (2021) showed that the Y_i have regularly varying, but not exact Pareto, tails. The autocorrelation analysis in these papers also showed that the Y_i can be assumed to be i.i.d.

Tables 3 and 4 report tail index estimates and 95% confidence intervals for each link, obtained using the four methods of selecting k discussed in Section 3. We first observe that all methods, except for the KS method, generally produce similar point estimates for each link. The interval estimates from the KS method are generally wider. In particular, some links have infinity as the upper end; it is inserted manually to deal with a negative lower end of the interval (13). We now check whether the intervals from the four methods overlap. We find 20 links with a nonempty intersection of the four intervals and 8 links with an empty intersection. The intersection does not have any interpretation in the usual frequentist sense of Neyman (1937), but it provides, so to speak, the safest region in an engineering sense for the 20 links for which it is nonempty. For the links with the empty intersection, or even for all links, we recommend using the confidence interval produced by the Eye-Ball method, which can be considered the most reliable based on the simulation results of Section 3.
We conclude this section by reporting the results of an experiment designed to assess whether the round-off errors have a practical impact on the estimates of α. For each link, we treat the value of α estimated from the observed interarrival times as the true value and the observed Y_i as the true X_i. We generate R = 1000 replications of error-contaminated data Y_i^{(r)} = Y_i + ε_i^{(r)}, 1 ≤ r ≤ 1000. We assume that the errors are uniformly distributed on [−1, 1] because, as noted above, there is no reason why the X_i should prefer some parts of the 5-minute interval. (The data are normalised so that this interval corresponds to the interval [0, 1].) For each of these replications, we compute the interval (13) with p = 10% and p = 5%. To choose k, we use the Hall, MAD, KS, and Eye-Ball methods described in Section 3. For each link, we determine the percentage of these intervals that cover the value of α estimated from the real data. If the interarrival times were measured perfectly, i.e. ε_i ≡ 0, then 100% of these intervals would cover the 'true value', so our target in this experiment is 100% rather than 95% or 90% as in Section 3. If the actual coverage is 100(1 − q)%, then we interpret q as the probability of getting a wrong interval estimate due to the round-off error. It turned out that for all links we achieved the target percentage, 100%, for both the 95% and 90% confidence levels, regardless of the threshold selection method. In light of the results of Section 3, the 100% coverage could be expected since the ratio of the error SD to the observation SD is less than 0.001 for each link. We have seen from Tables 1 and 2 that errors with a ratio of 0.01 had almost no impact on the coverage probability. Based on this 100% coverage, we conclude that the impact of the round-off error on the confidence interval estimate from the real data is practically negligible. This allows us to use the available rough interarrival times to make inference on the tail index.
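The experiment can be sketched as follows; this is a simplified version with a fixed k and synthetic Pareto data standing in for the Internet2 interarrival times (the real study uses data-driven k and R = 1000):

```python
import math
import random

def round_off_coverage(y, k, n_rep=200, seed=0):
    """Fraction of replications in which the Hill-based interval (13),
    computed from data perturbed by Uniform[-1, 1] round-off errors,
    covers the estimate obtained from the unperturbed data."""
    rng = random.Random(seed)
    ys = sorted(y, reverse=True)
    target = sum(math.log(ys[i] / ys[k]) for i in range(k)) / k
    z = 1.959963984540054  # 97.5% standard normal quantile
    hits = 0
    for _ in range(n_rep):
        yc = sorted((v + rng.uniform(-1.0, 1.0) for v in y), reverse=True)
        h = sum(math.log(yc[i] / yc[k]) for i in range(k)) / k
        half = z * h / math.sqrt(k)
        hits += (h - half <= target <= h + half)
    return hits / n_rep

# Stand-in data: Pareto(alpha = 2) times, scaled so that the grid width 1
# is tiny relative to the observations (as for the Internet2 data, where
# interarrival times are at least a few hours).
random.seed(6)
times = [50.0 * random.paretovariate(2.0) for _ in range(2000)]
cov = round_off_coverage(times, k=100)
```

Because the errors are tiny relative to the interval half-width, the empirical coverage equals 1, mirroring the 100% target achieved for all links.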
The conclusions of the research described in this section are as follows.
(1) For the purpose of confidence interval inference on the tail index of the anomalies interarrival times, the 5-minute resolution is acceptable.
(2) For most links, the confidence intervals obtained using the four data-driven methods of selecting k have a nonempty intersection.
(3) Based on the Eye-Ball method, one can be confident that, for all links, the true value of α is between 1.0 and 2.7. The most typical range for α is (1.2, 2.3); the interval for half of the links falls into this range.