Two-Sample Testing for Tail Copulas with an Application to Equity Indices

A novel, general two-sample hypothesis testing procedure is established for testing the equality of tail copulas associated with bivariate data. More precisely, using an ingenious transformation of a natural two-sample tail copula process, a test process is constructed, which is shown to converge in distribution to a standard Wiener process. Hence, from this test process a myriad of asymptotically distribution-free two-sample tests can be obtained. The good finite-sample behavior of our procedure is demonstrated through Monte Carlo simulations. Using the new testing procedure, no evidence of a difference in the respective tail copulas is found for pairs of negative daily log-returns of equity indices during and after the global financial crisis.


Introduction
Measuring dependence between economic variables such as asset returns and, in particular, between their tail events is of crucial importance for risk management, asset pricing, and portfolio choice. The tail dependence structure is, for example, a main ingredient in computing high quantiles of aggregate loss distributions, and hence risk capital requirements; it also dictates the extent of idiosyncratic tail risk, and hence the diversification potential, within a collection of assets.
A key question in financial econometrics and international finance is whether the dependence structure between economic variables, in particular asset returns, is constant over time and across markets, regions and institutions, or whether it varies due to, for example, financial market integration, emerging market developments, or financial contagion. Several papers analyze this question using a wide variety of methodologies; see, for example, King and Wadhwani (1990), Longin and Solnik (1995), Forbes and Rigobon (2002), Bekaert, Harvey, and Ng (2005), Bekaert, Hodrick, and Zhang (2009), Christoffersen et al. (2012), Bücher, Jäschke, and Wied (2015), Bormann and Schienle (2020), and the references therein.
In this article, we focus on the tail dependence structure between two risks, which is completely characterized by the bivariate tail copula. In this context, we develop a general procedure to test for the equality of the tail copulas associated with two bivariate samples. The standard one-sample problem, that is, testing for a specific parametric family of tail copulas, has been studied in Can et al. (2015). A challenging feature of the two-sample problem is that the tail copula common to both samples under the null hypothesis is not specified. This complexity arises naturally, as it follows from de Haan and Resnick (1977) that the class of all possible tail copulas is so large that it cannot be represented as a finite-dimensional parametric family, rendering the testing problem nonparametric. A pivotal property of our procedure is that, through a martingale transformation, it leads to asymptotically distribution-free test statistics, so that critical values can be tabulated for universal use and no computer-intensive methods have to be developed and validated.
CONTACT Sami Umut Can, s.u.can@uva.nl, Department of Quantitative Economics, University of Amsterdam, Amsterdam, Netherlands. Supplementary materials for this article are available online. Please go to www.tandfonline.com/UBES.
Our testing approach can be summarized as follows. We first construct two semiparametric estimators of the tail copulas, one for each sample, consider their suitably normalized difference, and prove that it converges weakly to a nontrivial limiting process under the null hypothesis of tail copula equality. Next, we describe the martingale transformation that transforms this limiting process into a standard bivariate Wiener process. Then, we establish that the empirical counterpart of this transformation, when applied to the empirical process given by the normalized difference of the two tail copula estimators, converges weakly to a standard bivariate Wiener process under the null hypothesis. Finally, two-sample tests for tail copulas can be conducted by comparing the transformed empirical process to a standard Wiener process. Our approach thus yields an asymptotically distribution-free test process and, by taking functionals of it, a myriad of asymptotically distribution-free test statistics.
We illustrate the finite-sample performance of our approach through Monte Carlo simulations. These confirm that tests based on our test process have good size and power, notwithstanding the difficult nature of the testing problem considered in this article.
We apply our procedure to test for tail copula equality of equity index returns during and after the global financial crisis. In particular, we analyze the tail dependence structure among the two European equity market indices FTSE 100 (UK) and DAX 30 (Germany) and the two transatlantic equity indices FTSE 100 and S&P 500 (US), during 2008-2011 (sample 1, covering the most volatile days of the global financial crisis and the subsequent European debt crisis) and 2012-2015 (sample 2, covering a "post-crisis" period of equal length). We construct synchronized daily observations of negative log-returns on the basis of high-frequency data using Coordinated Universal Time (UTC). Computing three commonly used test statistics from our test process, we find that the null hypothesis of tail copula equality between the two samples is not rejected, despite a clearly visible change in the marginal distributions of the index returns between the two samples. This finding applies to both the European pair {FTSE 100, DAX 30} and the transatlantic pair {FTSE 100, S&P 500}. That is, although the margins became markedly less volatile from the crisis period to the post-crisis period, no statistical evidence is found for a change in the number of margin-free joint extremes for either pair of equity market indices.
We briefly describe four papers that are related to the testing problem considered in this article. In Christoffersen et al. (2012), an extensive analysis is conducted of the evolution over time of the regular and tail dependence structures among both developed and emerging financial markets, and the associated diversification potential. Econometrically, a novel dynamic asymmetric copula model of a parametric nature is used to capture asymmetric tail dependence. When assessing tail dependence, attention is restricted to the less informative tail dependence coefficients, rather than to tail copula functions as in the present paper, and their development over a long period of 20 years. In Bücher and Dette (2013) and Bücher, Jäschke, and Wied (2015), new statistical methods are developed for testing tail copula equality, tail copula goodness-of-fit testing, and detecting structural breaks in tail dependence. The methods are applied in Bücher, Jäschke, and Wied (2015) to energy and financial markets data. An important difference between these two papers and the present one is that here convenient distribution-free weak limits of test statistics are obtained, as described above, whereas in Bücher and Dette (2013) and Bücher, Jäschke, and Wied (2015) computer-intensive multiplier bootstrap procedures are employed. Bormann and Schienle (2020) provides a detailed account of tail dependence asymmetries and inequalities in 90 years of U.S. equity data, using a refined nonparametric testing procedure that compares two tail copulas locally piecewise on disjoint intervals employing multiple testing principles. As the corresponding complex limiting distributions do not admit closed-form expressions, finite-sample approximations are simulated using multiplier bootstrap procedures, as in Bücher and Dette (2013) and Bücher, Jäschke, and Wied (2015).
The remainder of this article is organized as follows. In Section 2 we formalize our testing problem and in Section 3 we introduce our initial testing basis, which compares two semiparametric tail copula estimators constructed separately from the two samples, and analyze its asymptotic behavior. In Section 4 we describe the martingale transformation and in Section 5 we prove that the empirical counterpart of this transformation applied to our initial testing basis converges weakly to a standard Wiener process. In Section 6 our Monte Carlo simulation analysis is presented. Section 7 describes the results of our empirical analysis of equity indices. Conclusions are given in Section 8. The proofs of our theorems and some additional simulation results are deferred to the Appendix in the supplementary materials. An R script implementing our testing approach is also provided as supplementary material.

Testing Problem
Consider iid random vectors $(X_1, Y_1), \ldots, (X_n, Y_n)$ generated from some bivariate distribution function (df) $F$ and iid random vectors $(X'_1, Y'_1), \ldots, (X'_{n'}, Y'_{n'})$ generated from some bivariate df $F'$. We assume that the dfs $F$ and $F'$ lie in the respective domains of attraction of bivariate extreme value distributions $G$ and $G'$. This means that there exist normalizing sequences $a_j(n), a'_j(n) > 0$ and $b_j(n), b'_j(n) \in \mathbb{R}$, $j = 1, 2$, such that
\[ F^n\big(a_1(n)x + b_1(n),\, a_2(n)y + b_2(n)\big) \to G(x, y), \qquad (F')^n\big(a'_1(n)x + b'_1(n),\, a'_2(n)y + b'_2(n)\big) \to G'(x, y), \]
as $n \to \infty$, for all continuity points $(x, y) \in \mathbb{R}^2$ of $G$ and $G'$, respectively. The normalizing sequences are chosen in such a way that the margins $G_1, G_2$ of $G$ and the margins $G'_1, G'_2$ of $G'$ are in the form of a standard generalized extreme value (GEV) distribution:
\[ G_j(x) = \exp\{-(1 + \gamma_j x)^{-1/\gamma_j}\}, \qquad G'_j(x) = \exp\{-(1 + \gamma'_j x)^{-1/\gamma'_j}\}, \tag{1} \]
for $j = 1, 2$. The constants $\gamma_1, \gamma_2 \in \mathbb{R}$ and $\gamma'_1, \gamma'_2 \in \mathbb{R}$ are the marginal extreme value (EV) indices associated with $F$ and $F'$, respectively. Here, and in the rest of the article, expressions of the form $(1 + \gamma\,\cdot)^{1/\gamma}$ should be interpreted as $\exp(\cdot)$ when $\gamma = 0$. The distribution $G$, and hence the asymptotic joint tail behavior of $F$, is characterized by the marginal EV indices $\gamma_1, \gamma_2$ and the bivariate tail copula $R$ associated with $F$, which can be defined as
\[ R(x, y) := \lim_{t \to \infty} t\, P\big(1 - F_1(X_1) \le x/t,\ 1 - F_2(Y_1) \le y/t\big), \tag{2} \]
where $F_1, F_2$ denote the marginal dfs of $F$. The EV indices $\gamma_1, \gamma_2$ specify the margins of $G$ through (1), and the function $R$ specifies the dependence structure between the margins. While $R$ is not a copula itself, it does characterize the copula of $G$, and hence the tail dependence structure of $F$. Similarly, the distribution $G'$ is characterized by $\gamma'_1, \gamma'_2$ and the tail copula $R'$ associated with $F'$, defined analogously to (2). We refer to, for example, the monographs Kotz and Nadarajah (2000), Beirlant et al. (2004), and de Haan and Ferreira (2006) for further details about multivariate extreme value theory.
Observe that [...] For random variables with continuous margins, $R(1, 1)$ is often referred to as the upper tail dependence coefficient and denoted by $\lambda_U$; see, for example, Joe (2001), p. 33. It measures the probability of a "large rank" of $X_1$ conditionally upon a "large rank" of $Y_1$. Indeed,
\[ R(1, 1) = \lim_{t \to \infty} P\big(1 - F_1(X_1) \le 1/t \,\big|\, 1 - F_2(Y_1) \le 1/t\big). \]
We also note here that the function $R$ generates a $\sigma$-finite measure $R$ on the Borel subsets of $[0, \infty]^2 \setminus \{(\infty, \infty)\}$ via the identity
\[ R([0, x] \times [0, y]) = R(x, y). \]
In this article, we develop a general procedure to construct two-sample tests for the equality of the tail copulas $R$ and $R'$. In other words, we assume that we have two random samples independently generated from $F$ and $F'$, and provide a procedure that can be used to test the null hypothesis $R = R'$ against the alternative $R \neq R'$, where $R$ and $R'$ remain unspecified. Note that, while the bivariate extreme value setting is of a semiparametric nature, both the null and alternative hypotheses we consider in this article are nonparametric.

Comparing Two Tail Copula Estimators
Suppose we have a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ from $F$ and an independent random sample $(X'_1, Y'_1), \ldots, (X'_{n'}, Y'_{n'})$ from $F'$. Throughout Sections 3-5 we will assume that the null hypothesis holds, that is, $R$ and $R'$ are the same tail copula, which we will refer to as $R$. In the present section, we define two semiparametric estimators for the tail copula $R$, computed separately from the two available samples, and we describe the asymptotic behavior of the difference between these two estimators as the sample sizes tend to infinity.
From (2), one can verify that $R(x, y) = \lim_{t \to \infty} R^{(t)}(x, y)$ for $(x, y) \in [0, \infty)^2$, with
\[ R^{(t)}(x, y) := t\, P\big(X_1(t) \le x,\ Y_1(t) \le y\big), \tag{3} \]
where
\[ X_i(t) := \Big(1 + \gamma_1 \frac{X_i - b_1(t)}{a_1(t)}\Big)^{-1/\gamma_1}, \qquad Y_i(t) := \Big(1 + \gamma_2 \frac{Y_i - b_2(t)}{a_2(t)}\Big)^{-1/\gamma_2}. \tag{4} \]
We also define $X'_i(t), Y'_i(t)$ analogously to (4) and $R'^{(t)}$ analogously to (3). We will use the functions $R^{(t)}$ and $R'^{(t)}$ as a basis for estimating $R$ from the two available samples. To that end, we [...] For notational brevity, we will write $R_n$ for $R^{(n/k)}$ and $R'_{n'}$ for $R'^{(n'/k')}$. We estimate $R_n$, and hence $R$, by replacing the unknown quantities $a_j(n/k)$, $b_j(n/k)$ and $\gamma_j$ in (4) by appropriate estimators $\hat a_j(n/k)$, $\hat b_j(n/k)$ and $\hat\gamma_j$, for $j = 1, 2$, and the probability $P$ by the corresponding empirical measure. We define, therefore,
\[ \hat X_i(n/k) := \Big(1 + \hat\gamma_1 \frac{X_i - \hat b_1(n/k)}{\hat a_1(n/k)}\Big)^{-1/\hat\gamma_1}, \qquad \hat Y_i(n/k) := \Big(1 + \hat\gamma_2 \frac{Y_i - \hat b_2(n/k)}{\hat a_2(n/k)}\Big)^{-1/\hat\gamma_2}, \tag{5} \]
as an empirical analogue to (4), and we introduce the semiparametric estimator
\[ \hat R_n(x, y) := \frac{1}{k} \sum_{i=1}^{n} \mathbf{1}\big\{\hat X_i(n/k) \le x,\ \hat Y_i(n/k) \le y\big\} \tag{6} \]
for the tail copula $R$. The random variables $\hat X'_i(n'/k')$, $\hat Y'_i(n'/k')$ and the estimator $\hat R'_{n'}(x, y)$ are defined analogously to (5) and (6), respectively. Throughout, we fix $\delta$ and $T$ such that $0 < \delta < T < \infty$. For later reference, we also introduce the process [...] and the analogously defined $T'_{n'}$. It is known, by Einmahl, de Haan, and Sinha (1997, Lemma 3.1), that the weak convergence [...] Henceforth, we will omit the arguments $(n/k)$ and $(n'/k')$ where appropriate, for ease of notation.
Under the null hypothesis, the estimators $\hat R_n$ and $\hat R'_{n'}$ estimate the same tail copula $R$ from two different samples, while they would estimate different tail copulas under the alternative hypothesis. So the normalized difference between $\hat R_n$ and $\hat R'_{n'}$ is a natural starting point for a two-sample test. Thus, we define $\kappa := kk'/(k + k')$ and
\[ \eta_{n,n'}(x, y) := \sqrt{\kappa}\,\big(\hat R_n(x, y) - \hat R'_{n'}(x, y)\big). \]
Our first result will establish the asymptotic behavior of $\eta_{n,n'}$ as $n, n' \to \infty$. We first state the necessary assumptions and definitions.
A1. For some 6-variate random vector $(A_1, A_2, B_1, B_2, \Gamma_1, \Gamma_2)$, we have the joint weak convergence [...] Similarly, for some 6-variate random vector $(A'_1, A'_2, B'_1, B'_2, \Gamma'_1, \Gamma'_2)$, we have the joint weak convergence [...]
Assumption A1 is formulated such that it allows for flexibility in choosing the estimators. It is fulfilled, for example, by the moment estimators of $\gamma_j$, $a_j$, $b_j$, $\gamma'_j$, $a'_j$, and $b'_j$, provided that $k$ and $k'$ are chosen appropriately; see de Haan and Ferreira (2006), Sections 3.5 and 4.2.
Finally, for $x > 0$ and $\gamma \in \mathbb{R}$, we define the following functions: [...] (10) We are now prepared to state the convergence result for $\eta_{n,n'}$.
Theorem 3.1. If Assumptions A1-A3 hold, then [...] (11)
Remark 3.2. While the normalized difference between $\hat R_n$ and $\hat R'_{n'}$ provides a natural initial basis for two-sample testing, Theorem 3.1 reveals that it is not easily amenable to this purpose: the distribution of $\eta$ depends on the true underlying tail copula $R$, which is not specified by the null hypothesis, as well as on the true, but unknown, values of the parameters $\gamma_1$, $\gamma_2$, $\gamma'_1$ and $\gamma'_2$. To obtain a suitable testing basis, we will in the next section describe a martingale transformation that turns $\eta$ into a standard process with a distribution that depends neither on $R$ nor on the marginal extreme value indices.
Remark 3.3. The following relationships between the functions $f$, $g$, $h$ defined in (10) can be verified, for any $x > 0$ and $\gamma, \gamma' \in \mathbb{R}$: [...] Using these identities, the limiting process $\eta$ appearing in (11) can also be written as [...] This equivalent but, importantly, more parsimonious form of the limiting process $\eta$, rather than that in (11), will serve as the basis for the martingale transformation described in the next section, which would otherwise be ill-defined.
Remark 3.4. Although the bivariate case is the most interesting and the most relevant one, the setup and results of this article can be generalized to the $d$-variate case, for $d > 2$. We will not pursue this, however, because for $d > 2$ a $d$-variate tail copula $R$ on the domain $[0, \infty)^d$ yields, in contrast to the case $d = 2$, only limited information on the tail dependence structure of $F$.

Martingale Transformation
In this section, we describe a martingale transformation that enables us to transform the limiting process $\eta$ into a standard bivariate Wiener process. The transformation is obtained by a suitable application of the martingale innovation transform approach originally developed in Khmaladze (1981, 1988, 1993), and used, for example, in Koenker and Xiao (2002, 2006) for quantile regression, in Khmaladze and Koul (2004) and Delgado, Hidalgo, and Velasco (2005) for goodness-of-fit testing of parametric regression and time series models, and used and extended in Can et al. (2015) for parametric tail copula goodness-of-fit testing. The classical Doob-Meyer decomposition of a Brownian bridge, which turns a Brownian bridge into a standard Brownian motion by subtracting its compensator, occurs as a simple special case of this approach. Heuristically, the present martingale transformation may therefore be understood as the suitable counterpart for $\eta$ of the Doob-Meyer decomposition for a Brownian bridge.
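For orientation, the Brownian bridge special case mentioned above can be written out explicitly. The display below is a standard textbook fact, stated here as an illustration and not taken from this article's derivations; the symbols $B$ and $W$ are introduced only for this example.

```latex
% B: Brownian bridge on [0,1]; its compensator is -\int_0^t B(s)/(1-s) ds.
% Subtracting the compensator yields a standard Brownian motion W:
\[
  W(t) \;=\; B(t) + \int_0^t \frac{B(s)}{1-s}\,\mathrm{d}s,
  \qquad 0 \le t < 1 .
\]
% Equivalently, B solves dB(t) = -\frac{B(t)}{1-t}\,dt + dW(t), B(0) = 0.
```

The drift term plays the role that the parameter-dependent part of $\eta$ plays in the present setting: it is removed so that only a standard Wiener component remains.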
Consider the representation we derived in Remark 3.3 for the limiting process $\eta$ of Theorem 3.1. It is easy to see, by direct computation of the covariance, that the process [...] (12) where the $Z_i$ are random variables and the $Q_i$ are deterministic functions on $(0, \infty)^2$, defined by [...] (13) We note that (12) is of the general bivariate form (23) in Can et al. (2015). Hence, $\eta$ can be transformed into a standard Wiener process by suitably exploiting Theorem 3.1 of that paper. We will formally state this result in Theorem 4.1, but we first introduce some assumptions and notation.
[...] Observe that this condition allows $R$ to have mass on the "axes at infinity" $\{(x, \infty) : x \ge 0\} \cup \{(\infty, y) : y \ge 0\}$, but it excludes (strict) tail independence, that is, $r(x, y) = 0$ for all $(x, y) \in (0, \infty)^2$. We note that it can be tested whether the Ledford-Tawn coefficient of tail dependence is less than 1; see Draisma et al. (2004). If this is the case, the data are tail independent. Now, with the functions $Q_i$ as defined in (13), let us denote $q_i = dQ_i/dR$ for $i = 1, \ldots, 8$, so that [...] where $f^{(1)}(x, \gamma) := \partial f(x, \gamma)/\partial x$ and similarly for $g$ and $h$.
We also introduce matrices [...] and we denote [...] The functions $q_1, \ldots, q_8$ bear resemblance to score functions, corresponding to $a_1$, $a_2$, $b_1$, $b_2$, $\gamma_1$, $\gamma_2$, $\gamma'_1$, and $\gamma'_2$, respectively, using terminology from likelihood analysis. Note the asymmetry between the score functions associated with the two samples, which stems from the representation in Remark 3.3. Furthermore, $I_{\delta,T}(t)$ can be viewed as a partial Fisher information matrix built from these scores.
Remark 4.2. Recall (12). The martingale transformation induces a nullification of $\eta(x, y) - V_R(x, y) = \sum_{i=1}^{8} Q_i(x, y) Z_i$ and normalizes the resulting $R$-Wiener process to render a standard Wiener process on $[0, \tau]^2$. The double integrals with respect to the process $\eta$ can be understood pathwise as Riemann-Stieltjes integrals; we refer to, for example, Towghi (2002), Theorem 1.2(a), for an appropriate existence result.
Remark 4.3. We note that in the cases $\gamma_1 = \gamma'_1$ and $\gamma_2 = \gamma'_2$, we will have $q_5 \equiv q_7$ and $q_6 \equiv q_8$, respectively, and the matrix $I_{\delta,T}(t)$ will be singular. If this information is available, the redundant $q$-functions can simply be omitted when constructing $I_{\delta,T}(t)$, and Theorem 4.1 continues to apply mutatis mutandis. Otherwise, this case may be included by using the Moore-Penrose pseudoinverse of $I_{\delta,T}(t)$ in (15) and adapting the proof of Theorem 4.1 accordingly. Besides, as we will show by simulations in Section 6, our testing approach, which uses the regular inverse of the matrix $I_{\delta,T}(t, \gamma_1, \gamma_2, \gamma'_1, \gamma'_2, r, r_1, r_2)$ defined above, can in practice still be applied without size distortions when $\gamma_1 = \gamma'_1$ or $\gamma_2 = \gamma'_2$.
Let $K$ be a univariate density serving as a kernel for kernel density estimation. To estimate the tail copula density $r$, we propose the nonparametric estimator [...] where $w = c_n(x, y) k^{-1/10}$ and $w' = c'_n(x, y) (k')^{-1/10}$, with, for some $c > 0$, [...] For the partial derivative $r^{(1)}$, we propose the nonparametric estimator [...] where now $w = c_n(x, y) k^{-1/12}$ and $w' = c'_n(x, y) (k')^{-1/12}$, with, for some $c > 0$, $c_n(x, y) \to c$ and $c'_n(x, y) \to c$, uniformly on $[\delta, T]^2$, and where $K^{(1)}$ denotes the derivative of $K$. The partial derivative $r^{(2)}$ is estimated analogously. In the sequel, we choose $K$ to be the well-known triweight kernel, $K(u) = \frac{35}{32}(1 - u^2)^3\, \mathbf{1}\{|u| \le 1\}$, for convenience. We will demonstrate, as part of our main theoretical result that follows, that these estimators are consistent. [...] In view of Theorems 3.1 and 4.1, one might anticipate the empirical process $W_{n,n'}$ to converge weakly to a standard Wiener process as $n, n' \to \infty$. This turns out to be true, and we will formally establish this result in Theorem 5.1, but first we state one more assumption. [...]
Theorem 5.1, our main theoretical result, states that under the null hypothesis $R = R'$, the test process $W_{n,n'}$ converges weakly to a distribution-free limiting process, and hence $W_{n,n'}$ can be used as a generator of a multitude of asymptotically distribution-free test statistics for testing $R = R'$. We will not investigate how to construct optimal tests against certain specific alternative hypotheses. Instead, we will study in Section 6 three common examples of nonparametric omnibus tests based on $W_{n,n'}$, not only under the null but also under the alternative hypothesis.

Simulations
In this section, we study the finite-sample behavior of the test process $W_{n,n'}$ in (16), both under the null hypothesis $R = R'$ and the alternative hypothesis $R \neq R'$, via Monte Carlo simulations. All the computations of the present section, as well as those of the subsequent Section 7, are implemented in the software R. An .R file for the implementation of our testing approach is provided as a supplement.
To improve the finite-sample approximation of Theorem 5.1, we actually consider an asymptotically equivalent version of $W_{n,n'}$, obtained by replacing $k$ and $k'$ by $k - \sqrt{k}$ and $k' - \sqrt{k'}$, respectively, in the definition of the normalizing factor $\sqrt{\kappa}$. For each model considered below, we generate 1000 pairs of bivariate samples of size $n = n' = 1500$. We take $\delta = 0.25$ and $T = 2$. In additional simulation results (not reported here), we have seen that the actual choices of the tuning parameters $\delta$ and $T$ are relatively unimportant and we have thus fixed them throughout. Also, based on further simulations, we have decided to set all four multiplication factors in the bandwidths (those of the density estimator and of the partial derivative estimators, for each of the two samples) equal to 1.5. This fixes the bandwidth once $k$ is chosen. The choice of $k$ is known to be important and difficult in extreme value statistics. As a rule of thumb, we take $k \approx n^{3/4}$ if $n$ is not very large (say, $n \le 10{,}000$). For larger samples, $n > 10{,}000$, we suggest $k \approx 10 n^{1/2}$. In the simulations, for $n = n' = 1500$, we take $k = k' = 250$, but the simulation results under the null hypothesis do not change substantially when $k$ changes within reasonable limits. In particular, we find no clearly visible effect on the PP-plots. Of course, the power is affected when $k$ varies. Indeed, when $k$ becomes larger, the power increases sharply, for two reasons: (i) increased (effective) sample size; and (ii) increased bias. In empirical applications, a further sensitivity analysis can be helpful for a range of $k$-values, where the largest $k$ is twice the smallest one, with the rule-of-thumb value included; see Section 7.
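The rule of thumb for $k$ stated above is easy to encode. The following is a minimal sketch, not the authors' R implementation; the function name `rule_of_thumb_k` and the rounding to the nearest integer are assumptions made for illustration.

```python
import math


def rule_of_thumb_k(n):
    """Rule-of-thumb number of tail observations k for sample size n.

    Uses k ~ n^(3/4) for n <= 10,000 and k ~ 10 * sqrt(n) otherwise.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    if n <= 10_000:
        return round(n ** 0.75)
    return round(10 * math.sqrt(n))


if __name__ == "__main__":
    for n in (900, 1500, 40_000):
        print(n, rule_of_thumb_k(n))
```

For $n = 1500$ this gives a value close to the $k = 250$ used in the simulations, and for $n = 900$ a value inside the range $100, \ldots, 200$ used in Section 7. The two branches agree at $n = 10{,}000$, where both give $k = 1000$.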
The estimators $\hat a_j$, $\hat a'_j$ and $\hat\gamma_j$, $\hat\gamma'_j$, $j = 1, 2$, are taken to be the moment estimators (see, e.g., de Haan and Ferreira (2006), Sections 3.5 and 4.2), and we set, as usual, $\hat b_1 = X_{n-k:n}$, $\hat b_2 = Y_{n-k:n}$ (and analogously for $\hat b'_1$, $\hat b'_2$), with $X_{i:n}$, $Y_{i:n}$ denoting the marginal order statistics.
Once a pair of bivariate samples is generated, we construct the process $W_{n,n'} =: W_n$ on a $100 \times 100$ finite grid $G$ of equidistant points spanning $[0, 1]^2$. To compare this observed path with a standard Wiener process, we compute three test statistics, of Kolmogorov-Smirnov type ($\kappa_n$), Cramér-von Mises type ($\omega_n^2$, built from $W_n(x, y)^2$) and Anderson-Darling type ($A_n^2$), namely [...] (17) where the mesh length of the grid $G$ is 1/100. The same statistics are also computed from 10,000 paths of the true standard Wiener process generated on the same grid $G$, to create benchmark distribution tables. Recall that these tables have to be made only once, as a consequence of obtaining a distribution-free limit. We denote the statistics, computed from the true Wiener process, by $\kappa$, $\omega^2$ and $A^2$. For each model, we present PP-plots for a visual comparison of the empirical distributions of $\kappa_n$ versus $\kappa$, $\omega_n^2$ versus $\omega^2$ and $A_n^2$ versus $A^2$. We also present rejection counts at the 5% and 1% significance levels.
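The benchmark step described above (simulate standard Wiener paths on a grid, compute the three statistics, tabulate quantiles) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' R code: the exact forms of the statistics in (17) are not reproduced in this text, so the grid-based Kolmogorov-Smirnov (maximum), Cramér-von Mises (mesh-weighted sum of squares) and Anderson-Darling (squares weighted by $1/(xy)$, the variance of a Wiener sheet at $(x, y)$) forms below are assumptions, as are all function names. A small grid and few paths are used to keep the pure-Python example fast.

```python
import math
import random


def wiener_sheet(m, rng):
    """Simulate a standard Wiener sheet on the grid {1/m, ..., 1}^2.

    Each grid cell has area 1/m^2, so cell increments are iid N(0, 1/m^2);
    w[i][j] is the sum of all increments with row <= i and column <= j.
    """
    sd = 1.0 / m
    w = [[0.0] * m for _ in range(m)]
    for i in range(m):
        row_sum = 0.0
        for j in range(m):
            row_sum += rng.gauss(0.0, sd)
            w[i][j] = row_sum + (w[i - 1][j] if i > 0 else 0.0)
    return w


def omnibus_statistics(w):
    """Return assumed (KS, CvM, AD)-type statistics of a path on an m x m grid."""
    m = len(w)
    mesh = 1.0 / m
    ks = max(abs(v) for row in w for v in row)
    cvm = mesh ** 2 * sum(v * v for row in w for v in row)
    # Assumed Anderson-Darling weighting: divide by Var W(x, y) = x * y.
    ad = mesh ** 2 * sum(
        w[i][j] ** 2 / (((i + 1) * mesh) * ((j + 1) * mesh))
        for i in range(m)
        for j in range(m)
    )
    return ks, cvm, ad


def empirical_quantile(values, level):
    """Empirical quantile: smallest order statistic with ECDF >= level."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


if __name__ == "__main__":
    rng = random.Random(2024)
    benchmark = [omnibus_statistics(wiener_sheet(20, rng)) for _ in range(500)]
    for name, column in zip(("KS", "CvM", "AD"), zip(*benchmark)):
        print(name, "95% critical value:", round(empirical_quantile(column, 0.95), 3))
```

An observed test process evaluated on the same grid would be passed to `omnibus_statistics` and each statistic compared with the corresponding benchmark quantile. Because the limit is distribution-free, the benchmark table needs to be computed only once.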

Simulations Under the Null Hypothesis
We consider the following four models: [...] Note that we specify each bivariate distribution $F$ (and $F'$) in terms of a copula $C$ and marginal dfs $F_1$, $F_2$, which characterize $F$ via $F(x, y) = C(F_1(x), F_2(y))$. Also, Pareto($\alpha$) with $\alpha > 0$ refers to the df $1 - x^{-\alpha}$, $x > 1$; Gumbel($\theta$) with $\theta \ge 1$ refers to the copula
\[ C(u, v) = \exp\Big\{-\big[(-\log u)^{\theta} + (-\log v)^{\theta}\big]^{1/\theta}\Big\}; \]
Clayton($\theta$) with $\theta > 0$ refers to the copula
\[ C(u, v) = \big(u^{-\theta} + v^{-\theta} - 1\big)^{-1/\theta}, \]
and therefore the Clayton($\theta$) survival copula is given by
\[ \bar C(u, v) = u + v - 1 + \big((1 - u)^{-\theta} + (1 - v)^{-\theta} - 1\big)^{-1/\theta}. \]
Since the tail copula of a bivariate df is entirely determined by the copula, the null hypothesis $R = R'$ clearly holds in Models I-III, where $F$ and $F'$ have identical copulas. In Model IV, the copulas of $F$ and $F'$ are different, but $R = R'$ still holds, as both the independence copula and the Clayton(1) copula have upper tail independence (i.e., zero tail copula).
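The tail dependence claims above can be checked numerically. The sketch below uses the standard Gumbel and Clayton copula formulas and approximates the upper tail dependence coefficient $R(1,1) = \lim_{t \to \infty} t\,P(U > 1 - 1/t,\ V > 1 - 1/t)$ by evaluating the expression at a large finite $t$. The function names and the choice $t = 10^6$ are assumptions made for illustration.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel(theta) copula, theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def clayton_copula(u, v, theta):
    """Clayton(theta) copula, theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_survival_copula(u, v, theta):
    """Survival copula of Clayton(theta)."""
    return u + v - 1.0 + clayton_copula(1.0 - u, 1.0 - v, theta)


def independence_copula(u, v, theta=None):
    """Independence copula; theta is ignored (kept for a uniform signature)."""
    return u * v


def upper_tail_coefficient(copula, theta, t=1e6):
    """Finite-t approximation of R(1,1) = lim t * P(U > 1-1/t, V > 1-1/t)."""
    u = 1.0 - 1.0 / t
    return t * (1.0 - 2.0 * u + copula(u, u, theta))


if __name__ == "__main__":
    print("Gumbel(2):", upper_tail_coefficient(gumbel_copula, 2.0))
    print("Clayton(1) survival:", upper_tail_coefficient(clayton_survival_copula, 1.0))
    print("Clayton(1):", upper_tail_coefficient(clayton_copula, 1.0))
    print("Independence:", upper_tail_coefficient(independence_copula, None))
```

The Clayton(1) and independence copulas both give values near zero, in line with the upper tail independence used for Model IV, while the Clayton(1) survival copula gives a value near 0.5 and Gumbel($\theta$) a value near $2 - 2^{1/\theta}$.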
The PP-plots in Figure 1 indicate that the process $W_n$ indeed behaves like a standard Wiener process, providing an empirical confirmation of Theorem 5.1. Moreover, in Table 1 we present the rejection counts at the 5% significance level. For each particular model and test statistic, the rejection count is simply the observed number of times (out of 1000) the test statistic exceeds the 95th percentile of the corresponding distribution obtained from the true Wiener process. In all cases, the observed counts are consistent with the Binomial(1000, 0.05) distribution, substantiating a good size performance. Note that the results for Model II show that our approach works well when $F$ and $F'$ have identical marginal distributions; see Remark 4.3.
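To make "consistent with the Binomial(1000, 0.05) distribution" concrete: that distribution has mean 50 and standard deviation of about 6.9, so rejection counts within roughly two standard deviations of 50 are unremarkable. The sketch below computes a central 95% range of counts exactly; the function names and the equal-tailed definition of the range are assumptions of this example, and the counts in Table 1 are not reproduced here.

```python
import math


def binomial_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def central_range(n, p, coverage=0.95):
    """Equal-tailed range [lo, hi] of counts between the lower and upper
    (1 - coverage)/2 quantiles of Binomial(n, p)."""
    alpha = (1.0 - coverage) / 2.0
    cdf = 0.0
    lower = None
    for x in range(n + 1):
        cdf += binomial_pmf(n, p, x)
        if lower is None and cdf >= alpha:
            lower = x
        if cdf >= 1.0 - alpha:
            return lower, x
    return lower, n


if __name__ == "__main__":
    n, p = 1000, 0.05
    print("mean:", n * p, "sd:", round(math.sqrt(n * p * (1 - p)), 2))
    print("central 95% range of rejection counts:", central_range(n, p))
```

A rejection count far outside this range, in either direction, would point to size distortion of the corresponding test.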

Simulations Under the Alternative Hypothesis
Before describing the simulation models under the alternative, we first note that the estimation of $R(x, y)$ is statistically difficult in the sense that generally the corresponding estimator has considerable bias and variance. For simplicity, consider $R(1, 1)$, the upper tail dependence coefficient. For the sample sizes used in this simulation section, it is hard to distinguish between the values $R(1, 1) = 0.5$ and $R'(1, 1) = 0.6$, say, so a test cannot achieve a high power in such a case. In order to substantiate this, we reuse the simulation results from the df $F$ with the Clayton(1) survival copula and Pareto(3) margins in Model III above. In this case $R(1, 1) = 0.5$. The histogram in Figure 2 depicts the distribution of the 1000 estimates $\hat R_n(1, 1)$. The (empirical) bias of these estimates is as high as 0.046 and the sample standard deviation is 0.026. We see from the figure that the central 95% of the estimates range from 0.492 to 0.596; the true value is barely included in this interval and the range is rather wide. Note that when performing a two-sample test, the bias and variance of the second sample also have to be taken into account.
In this section, we consider the four models listed below. [...] The linear factor model of Model IVa refers to the bivariate random vector [...] with $Z_1$, $Z_2$ independent and $\lambda, \mu \in (0, 1)$ deterministic parameters.
As in Section 6.1, we construct PP-plots for each model and each of the three test statistics; see Figure 3. The PP-plots show dramatic deviations from the behavior under a standard Wiener process. We also present rejection counts at the 5% significance level in Table 2. As suggested by the PP-plots, the three omnibus tests have high power with, for the present alternatives, Cramér-von Mises outperforming Kolmogorov-Smirnov, and Anderson-Darling performing the best. Based on this, we recommend applying both the Cramér-von Mises and the Anderson-Darling type test statistics.
In Figure 4, we also provide empirical power curves at the 5% significance level for the three test statistics, computed from a sequence of eight models similar to Model Ia. In each model, $F$ has Gumbel($\theta$) copula and Pareto(3) margins, $F'$ has Gumbel(6) copula and Pareto(4) margins, and $\theta$ takes a sequence of values in the interval $(1, 6)$ so that the difference of the tail dependence coefficients $R'(1, 1) - R(1, 1)$ takes a sequence of values ranging from 0.1 to 0.8. Observe that the power grows quickly and is high as soon as the difference is 0.2. The exact values of $\theta$ that were used, and the corresponding differences $R'(1, 1) - R(1, 1)$, can be seen in Table 3. Each point in a given power curve represents the observed rejection rate at the 5% level among 200 sample pairs.
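The mapping between the Gumbel parameter $\theta$ and the tail dependence coefficient can be inverted in closed form, using the standard result $R(1,1) = 2 - 2^{1/\theta}$ for the Gumbel($\theta$) copula. The sketch below computes, for each target difference, the $\theta$ that produces it relative to Gumbel(6). The function names are assumptions, and the resulting values are illustrative only: Table 3 is not reproduced here and the values used by the authors may differ, for instance through rounding.

```python
import math


def gumbel_tail_coefficient(theta):
    """Upper tail dependence coefficient R(1,1) of the Gumbel(theta) copula."""
    return 2.0 - 2.0 ** (1.0 / theta)


def gumbel_theta_for_coefficient(lam):
    """Inverse map: theta such that 2 - 2^(1/theta) = lam, for 0 < lam < 1."""
    return math.log(2.0) / math.log(2.0 - lam)


if __name__ == "__main__":
    reference = gumbel_tail_coefficient(6.0)  # second sample: Gumbel(6)
    for step in range(1, 9):
        difference = step / 10.0              # target R'(1,1) - R(1,1)
        theta = gumbel_theta_for_coefficient(reference - difference)
        print(f"difference {difference:.1f} -> theta {theta:.3f}")
```

All eight resulting $\theta$ values lie strictly between 1 and 6, consistent with the interval stated above.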

Empirical Analysis
Emboldened by the Monte Carlo simulation results of the previous section, demonstrating good size and power performances, we now take our approach to real equity index data of three financial markets. We take $\delta = 0.25$, $T = 2$ and all four bandwidth multiplication factors equal to 1.5, as in Section 6, and consider a range of $k$-values (see below).
An important concern when analyzing multiple financial markets operating in different time zones is the synchronicity of the data, which requires careful treatment. We use intraday data to construct three synchronized time series of daily equity index returns: for FTSE 100 (UK), DAX 30 (Germany), and S&P 500 (US). Data are obtained from the Tick History Database of Thomson Reuters. Our full sample spans the period May 2008 to August 2015. Using Coordinated Universal Time (UTC), care has been taken to account for the change to Daylight Saving Time (DST), which does not occur on the same day around the world.
We focus on the pair {FTSE 100, DAX 30} for a detailed exposition. We construct two bivariate samples from the daily negative log-returns of these two indices: the first sample is generated from the 900 trading days preceding January 1, 2012 (spanning the period from May 23, 2008 to December 31, 2011), and the second sample is generated from the 900 trading days following January 1, 2012 (spanning the period from January 1, 2012 to August 13, 2015). To be more precise, the daily negative log-returns for a given index are the numbers $\log(p_t/p_{t+1})$, with $p_t$ denoting the index price recorded at 14:30 UTC (13:30 UTC during European DST) on trading day $t$. Note that the first sample covers some of the most volatile trading days of the global financial crisis and the subsequent European debt crisis, while the second sample can be considered a relatively calm "post-crisis" period. We aim to test whether or not the tail dependence structure in the two samples, as characterized by the tail copula, shows statistical evidence of a change. As an interesting alternative analysis, which we will not pursue here, one could compare the tail copulas associated with the upper and lower tails of the daily bivariate log-returns within the same sampling period. Indeed, these bivariate extremal gains can be seen to be asymptotically independent of the bivariate extremal losses.
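The negative log-return definition $\log(p_t/p_{t+1})$ used above can be sketched in a few lines. The function name and the toy price series are hypothetical and serve only to show the sign convention: a price drop yields a positive value, so that losses sit in the upper tail.

```python
import math


def negative_log_returns(prices):
    """Daily negative log-returns log(p_t / p_{t+1}) from a price series."""
    return [math.log(p0 / p1) for p0, p1 in zip(prices, prices[1:])]


if __name__ == "__main__":
    toy_prices = [100.0, 90.0, 99.0]  # hypothetical synchronized daily prices
    print(negative_log_returns(toy_prices))
```

In the toy series, the fall from 100 to 90 gives a positive negative log-return and the rebound from 90 to 99 gives a negative one. A series of $m$ prices yields $m - 1$ returns.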
Figure 5 displays plots of the daily negative log-returns for the two indices during the two time periods, and Table 4 provides some sample statistics for these time series. It is visibly clear that the marginal structure changes for both indices from period 1 to period 2, but a possible change in the tail dependence structure is much more difficult to detect by visual inspection. The change in the marginal structure can also be observed in Figure 6, which plots the estimated extreme value index $\gamma$ for each marginal dataset over a set of $k$ values ranging from 100 to 200. It is clear that the (right) tail of the daily negative log-returns is heavier in the first period (with EV index estimates firmly positive) than in the second period (with EV index estimates mostly hovering below 0).

The next step in our analysis is to transform the observed negative log-returns $(X_i, Y_i)$ of period 1, and likewise those of period 2, as described in (5), which leads to estimates of the tail copulas of the two periods. Figure 7 displays scatterplots of the transformed negative log-returns in the two periods, for $k = 150$. Since the transformation (5) maps joint extremes $(X_i, Y_i)$ to points near the origin, the tail copula estimate at $(x, y)$ is, in each period, proportional to the number of transformed points inside the rectangle $[0, x] \times [0, y]$. In particular, the upper tail dependence coefficient $R(1, 1)$ of each period can be estimated by dividing the number of transformed points in $[0, 1]^2$ by $k$. This leads to estimates of $115/150 = 0.77$ for period 1 and $104/150 = 0.69$ for period 2. The difference is small, and, also in light of the estimator's substantial bias and variance illustrated in Figure 2 for simulated data, it is not at all clear whether it indicates a difference in the true tail copulas of the two periods. Also note that $R(1, 1)$ does not tell the whole story about tail dependence, which can only be properly described by the entire tail copula $R$.
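The counting argument above can be illustrated with a rank-based sketch. Transformation (5) is not reproduced in this section, so the code assumes a common convention, mapping each margin to $(n + 1/2 - \text{rank})/k$, which may differ in detail from (5); the function names and the toy samples are likewise assumptions made for illustration.

```python
import numpy as np

def transform_to_tail_scale(sample, k):
    """Map each margin to (n + 1/2 - rank) / k, so the k largest values land in (0, 1)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.shape[0]
    # Double argsort gives ranks 1..n within each column (ties are assumed absent).
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1
    return (n + 0.5 - ranks) / k

def empirical_tail_copula(sample, k, x, y):
    """Count transformed points in the rectangle [0, x) x [0, y) and divide by k."""
    u = transform_to_tail_scale(sample, k)
    inside = (u[:, 0] < x) & (u[:, 1] < y)
    return inside.sum() / k

# Toy checks on two extreme cases (hypothetical data).
n = 1000
grid = np.arange(n, dtype=float)
comonotone = np.column_stack([grid, grid])         # perfect positive dependence
countermonotone = np.column_stack([grid, -grid])   # perfect negative dependence

r_como = empirical_tail_copula(comonotone, 150, 1.0, 1.0)
r_counter = empirical_tail_copula(countermonotone, 150, 1.0, 1.0)

# The reported estimates are plain counts divided by k = 150.
estimate_period1 = round(115 / 150, 2)
estimate_period2 = round(104 / 150, 2)
```

In the comonotone toy sample all $k$ joint extremes coincide, so the estimate at $(1, 1)$ equals 1, while in the countermonotone sample no point is extreme in both coordinates and the estimate is 0.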
It remains to construct the test process $W_n$ and compute the test statistics in (17), as in Section 6. We do this again for $k = 100, \ldots, 200$. Plots of the resulting test statistics can be seen in Figure 8, together with the 95th and the 99th percentiles of their distributions under the null hypothesis. Each test statistic stays below the 95th percentile of the corresponding null distribution essentially for the entire range of $k$ values, leading to nonrejection of the null hypothesis. Thus, the conclusion based on these three test statistics is that there is no evidence that the tail dependence structure between the daily negative log-returns of FTSE 100 and DAX 30 changes from period 1 to period 2, despite the visible evidence of change in the marginal (tail) behaviors of both equity indices.
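Because the limit of the test process is a standard Wiener process, percentiles of such null distributions can be approximated by simulation. The sketch below is an illustration under stated assumptions, not the computation behind Figure 8. The statistics in (17) are not reproduced in this section, so the code uses two stand-in functionals (a supremum type and an integrated-square type), assumes a Wiener process indexed by $[0, 1]^2$, and picks the grid size and number of replications arbitrarily.

```python
import numpy as np

def simulate_wiener_sheet(m, reps, rng):
    """Simulate `reps` two-parameter Wiener processes on an m-by-m grid over [0, 1]^2."""
    # Each grid cell has area 1/m^2, so its increment has standard deviation 1/m.
    increments = rng.normal(0.0, 1.0 / m, size=(reps, m, m))
    return increments.cumsum(axis=1).cumsum(axis=2)

def null_percentiles(functional, m=40, reps=2000, levels=(0.95, 0.99), seed=0):
    """Approximate percentiles of functional(W) by Monte Carlo simulation."""
    rng = np.random.default_rng(seed)
    sheets = simulate_wiener_sheet(m, reps, rng)
    values = np.array([functional(w) for w in sheets])
    return np.quantile(values, levels)

def sup_type(w):
    """Stand-in supremum-type functional: the largest absolute value on the grid."""
    return np.abs(w).max()

def integral_type(w):
    """Stand-in integrated-square functional: Riemann approximation of the integral of W^2."""
    return np.mean(w ** 2)

sup_q95, sup_q99 = null_percentiles(sup_type)
int_q95, int_q99 = null_percentiles(integral_type)
```

Under these assumptions, a test at the 5% level would reject when the observed value of a statistic exceeds the simulated 95th percentile of the matching functional.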
For an independent verification, we also apply the multiplier bootstrap test for the equality of tail copulas described in Bücher and Dette (2013) to our dataset. An implementation of the partial derivative multiplier (pdm) approach, as outlined in Section 4.1 of that paper, with $k = 150$ and 500 bootstrap replications, leads to an estimated p-value of 0.42 for the null hypothesis of equal tail copulas, which is consistent with our conclusion.
A possible objection against this application could be that, per period, the data may be serially dependent. First note that pairwise dependence in the tails can only be "positive"; for example, for the countermonotonic copula, the case of perfect negative dependence, the corresponding tail copula is just the 0-function, the same as for the independence copula. Loosely speaking, in case of (positive) serial dependence at high levels, the data contain less information about the underlying distribution function than in the iid case. Hence, estimation errors, that is, variances, become larger, and so, therefore, do the critical values of the tests. As a result, our tests, which use critical values derived under the iid assumption, can in fact be considered anticonservative in the presence of serial dependence (that is, they overreject): if the null hypothesis is not rejected under the iid assumption, it will not be rejected when serial dependence is taken into account. In the Appendix, we provide some empirical support for this heuristic argument via simulations: rerunning Model III of Section 6.1 with component-wise serial dependence in both samples leads to higher rejection rates, as expected. So, our conclusion of no change in tail copulas remains.
Having considered the tail dependence between a pair of European equity indices, we now turn our attention to the tail dependence between the transatlantic pair {FTSE 100, S&P 500}. We compute, as above, the daily negative log-returns of these two indices during the same periods 1 and 2 defined earlier, using daily synchronous index price data recorded at 14:30 UTC (13:30 UTC during European DST). Figure 9 displays plots of the daily negative log-returns for the two indices during the two time periods, and Table 5 provides sample statistics. As before, we see clear evidence of a change in the marginal behavior of the negative log-returns in period 1 versus in period 2.
To test if there is a change in the tail dependence structure, we again construct our test process $W_n$ from the two bivariate samples generated from the two periods, and compute the three test statistics in (17) for $k = 100, \ldots, 200$; see Figure 10. We find that each test statistic mostly stays below the 95th percentile of the corresponding null distribution, with a few exceptions at particular $k$ values. Keeping in mind that (i) looking at several $k$ values simultaneously amounts to a multiple testing problem, and (ii) our datasets likely feature some serial dependence, both of which have an upward effect on critical values for the test statistics, we conclude that our testing approach again provides no evidence against the null hypothesis of no change in the tail dependence structure of the negative log-returns in period 1 versus in period 2.
As with the earlier dataset of {FTSE 100, DAX 30} negative log-returns, we also apply the pdm bootstrap test of Bücher and Dette (2013) to the present dataset, with $k = 150$ and 500 bootstrap runs. This leads to an estimated p-value of 0.38 for the null hypothesis of equal tail copulas, which is again consistent with our conclusion.

Conclusions
We have developed a novel, general procedure to test for the equality of the tail copulas associated with two samples of bivariate data. A natural but complex feature of this problem is that the tail copulas whose equality is tested are not specified. Deploying a martingale transformation of the suitably normalized difference between two semiparametric estimators of the tail copulas, we have constructed a two-sample hypothesis testing process and established its weak convergence to a standard Wiener process under the null hypothesis. Applying our hypothesis testing procedure to samples of negative log-returns of equity indices, during and after the global financial crisis, we find no evidence to reject the null hypothesis of tail copula equality. That is, although large negative returns occur more frequently and more severely during the crisis than after it for the pairs of equity indices we have analyzed, our testing procedure reveals no statistical evidence of a change in their tail dependence structure. These findings suggest that, for the highly developed equity markets we consider, whereas inference about marginal (tail) behavior should account for changes over time or across "regimes," the tail dependence structure appears to be more stable and can thus be inferred from longer time periods.

Figure 1. PP-plots for the three test statistics constructed from 1000 simulated sample pairs per model.

Figure 3. PP-plots for the three test statistics constructed from 1000 simulated sample pairs per model.

Figure 4. Empirical power curves for the three test statistics, constructed from a sequence of eight models similar to Model Ia.

Figure 5. Daily negative log-returns of FTSE 100 (top) and DAX 30 (bottom) in the two periods. Tick marks on the horizontal axes indicate the first trading day of each year.

Figure 6. Estimated marginal extreme value indices of the daily negative log-returns of FTSE 100 and DAX 30 in the two periods.

Figure 8. The three test statistics in (17) computed from the daily negative log-returns of FTSE 100 and DAX 30 in periods 1 and 2. The dotted lines indicate the 95th and 99th percentiles of the corresponding null distributions.

Figure 9. Daily negative log-returns of FTSE 100 (top) and S&P 500 (bottom) in the two periods. Tick marks on the horizontal axes indicate the first trading day of each year.

Figure 10. The three test statistics in (17) computed from the daily negative log-returns of FTSE 100 and S&P 500 in periods 1 and 2. The dotted lines indicate the 95th and 99th percentiles of the corresponding null distributions.

Table 1. Number of rejections in 1000 repetitions at 5% significance level.

Table 2. Number of rejections in 1000 repetitions at 5% significance level.

Table 4. Sample statistics of the daily negative log-returns of FTSE 100 and DAX 30 in the two periods.

Table 5. Sample statistics of the daily negative log-returns of FTSE 100 and S&P 500 in the two periods.