Estimation of tail dependence with application to twin data
Master thesis by
Michael Osmann
May 21, 2012
Supervisors: Yuri Goegebeur and Jacob Hjelmborg
Contents

Abstract
Acknowledgements
1 Preliminaries
1.1 Classical convergence result
1.2 The Gumbel class
1.3 The extremal Weibull class
1.4 Estimation of the extreme value index in practice
2 Pareto-type distributions
2.1 Domain of attraction
2.2 Estimation of the extreme value index
2.3 Estimation of the second order parameter
2.4 Appendix
2.4.1 Proof of Lemma 2.2.6
2.4.2 Lemmas needed in the proof of Theorem 2.2.7
3 Multivariate extreme value theory
3.1 Limit laws
3.2 The exponent measure and the spectral measure
3.3 Domain of attraction and asymptotic independence
3.4 Pickands dependence function
3.5 The dependence measures χ and χ̄
3.6 The model of Ledford and Tawn
4 Estimation of the coefficient of tail dependence and the second order parameter in bivariate extreme value statistics
4.1 Estimation of the coefficient of tail dependence
4.2 Estimation of the second order parameter
4.3 Appendix
4.3.1 Proof of Lemma 4.1.3
5 Simulation study
5.1 Copula examples and simulation of data
5.2 Estimation of the second order parameter τ
5.3 Estimation of the first order parameter η
5.4 Estimation of the dependence measures χ and χ̄
6 Estimation of tail dependence in BMI twin data
6.1 Description of the data
6.2 Univariate analysis
6.3 Multivariate analysis
Epilogue
Bibliography
Abstract
This master thesis consists of a theoretical discussion of univariate and bivariate extreme value statistics along with an application to twin data. We first discuss the fundamental convergence results from extreme value theory, which we use to construct the traditional maximum likelihood estimators of the extreme value index. In order to put our work into the proper framework, attention is paid to the three classes of extreme value distributions situated within the max domain of attraction of the generalized extreme value distribution. Special attention is given to the class of Pareto-type distributions, since the methodology used to construct estimators in the multivariate setting resembles the methodology used to construct estimators within the class of Pareto-type distributions. For the class of Pareto-type distributions we propose an estimator of the extreme value index and an estimator of the second order parameter. For both of these estimators we establish asymptotic normality.
In the multivariate setting we start by discussing the transformation of the margins to standard Fréchet distributions and the fundamental convergence results. We discuss the domain of attraction of the bivariate extreme value distribution, asymptotic dependence and asymptotic independence. We furthermore discuss the exponent measure, the spectral measure, Pickands dependence function, the dependence measures χ and χ̄, and finally the coefficient of tail dependence. The interpretations of these measures are discussed and we show how they are all connected. For the coefficient of tail dependence we introduce a functional estimator, for which we show how it can be bias corrected. This bias correction requires estimation of the second order parameter τ, so we propose two estimators that can be used to estimate this second order parameter. The consistency of the estimators for the second order parameter is established. We examine the finite sample size behaviour of our estimator for the coefficient of tail dependence, the estimators of the second order parameter, and the estimators of χ and χ̄ using simulations.
The twin data we consider are from the older cohort of the Finnish Twin Cohort Study. For these data we make a full univariate data analysis and estimate the coefficient of tail dependence, the second order parameter τ, and the measures χ and χ̄ for age and sex defined subsets of the data.
Throughout the thesis, results that are from the literature are stated with a reference, while results that are our own are not.
Acknowledgements
I would like to thank my two supervisors Yuri Goegebeur and Jacob v. B. Hjelmborg for helping me write this master thesis during the last 8 months. I would not have been able to write this thesis without their help, and they have both spent a lot of time and effort on it. I am grateful that they decided to join forces and help me write a thesis on such an interesting topic.
Chapter 1
Preliminaries
This chapter serves to give a short introduction to some of the basic concepts in univariate extreme value statistics. First we introduce a convergence result which is the foundation of univariate extreme value statistics: it states what form the limiting distribution of a normalized maximum must take, if it exists. We then briefly describe two of the classes of extreme value distributions, known as the Gumbel and extremal Weibull families, respectively. Finally, we discuss some simple ways in which the extreme value index can be estimated in practice.
1.1 Classical convergence result
In the following we will consider a sample {X_i, 1 ≤ i ≤ n} of independent and identically distributed (i.i.d.) random variables having a distribution function F_X. In extreme value statistics we consider either the maximum or the minimum of the random sample, where the maximum is given by

    X_{n,n} := max{X_1, X_2, . . . , X_n}.
We will try to describe the statistical behaviour of this maximum, but it is easy to transform
any result we obtain for the maximum to the minimum because of the relation
    X_{1,n} := min{X_1, X_2, . . . , X_n} = −max{−X_1, −X_2, . . . , −X_n}.    (1.1)
Because of the i.i.d. nature of X_1, . . . , X_n, the distribution of X_{n,n} can be derived exactly for all possible values of n as follows:

    F_{X_{n,n}}(x) = P(X_{n,n} ≤ x)
                   = P(X_1 ≤ x, X_2 ≤ x, . . . , X_n ≤ x)
                   = P(X_1 ≤ x) P(X_2 ≤ x) · · · P(X_n ≤ x)
                   = (F_X(x))^n.
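As a purely illustrative aside (not part of the theoretical development), the product formula above is easy to sanity-check by simulation. The following Python sketch compares a Monte Carlo estimate of P(X_{n,n} ≤ x) with the exact expression for the standard exponential distribution; the sample sizes and evaluation point are arbitrary:

```python
import math
import random

random.seed(42)

def exact_max_cdf(x, n):
    # F_{X_{n,n}}(x) = (F_X(x))^n, here for the standard exponential F_X(x) = 1 - exp(-x)
    return (1.0 - math.exp(-x)) ** n

n, reps, x = 5, 100_000, 2.0
# Monte Carlo estimate of P(X_{n,n} <= x): the fraction of maxima of n exponentials below x
hits = sum(max(random.expovariate(1.0) for _ in range(n)) <= x for _ in range(reps))
empirical = hits / reps
assert abs(empirical - exact_max_cdf(x, n)) < 0.01
```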
For practical purposes this relation does not help much though, since the distribution F_X is usually unknown. One could try to estimate F_X and use this to estimate F_{X_{n,n}}, but small deviations in the estimation of F_X can lead to large deviations in the estimation of F_{X_{n,n}}. Instead we will look for approximate families of F_{X_{n,n}} which for large n can be estimated by use of the extreme data only.
We look at the behaviour of F_{X_{n,n}} as n approaches infinity. If we denote the right endpoint of F_X by x*, which means that x* := inf{x : F_X(x) = 1}, then for any x < x* we have that F_X^n(x) → 0 as n → ∞. So the distribution of X_{n,n} is degenerate in the limit. This degeneracy can possibly be avoided if we look at an appropriate normalization, for instance

    (X_{n,n} − b_n)/a_n,

where (b_n)_{n=1}^∞ is a sequence of constants and (a_n)_{n=1}^∞ is a sequence of positive constants. Appropriate choices of (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ can stabilize the location and scale of (X_{n,n} − b_n)/a_n. It can be shown that the entire range of limit distributions of (X_{n,n} − b_n)/a_n, if they exist, is given by
Theorem 1.1.1.
Theorem 1.1.1. (Fisher and Tippett, 1928; Gnedenko, 1943) Let X_1, . . . , X_n be i.i.d. random variables with distribution function F_X. If there exist sequences of constants (b_n)_{n=1}^∞ and positive constants (a_n)_{n=1}^∞ such that

    lim_{n→∞} P((X_{n,n} − b_n)/a_n ≤ x) = lim_{n→∞} F_X^n(a_n x + b_n) = G(x)    (1.2)

at all continuity points of G, where G is a non-degenerate distribution function, then G must be of the following type:

    G_γ(x) = exp(−(1 + γx)^{−1/γ}), 1 + γx > 0,    (1.3)

with γ real and where for γ = 0 the right-hand side is interpreted as exp(−e^{−x}).
This family of distribution functions is known as the generalized extreme value (GEV) family, for which the parameter γ is the shape parameter. This parameter is also called the extreme value index and it describes the tail behaviour of F_X, with larger values indicating heavier tails. The family consists of three classes known as the Gumbel, Fréchet and extremal Weibull families, which correspond to γ = 0, γ > 0 and γ < 0 respectively. The Fréchet class is also known as the class of Pareto-type models. If the distribution F_X satisfies (1.2)-(1.3) then we say that it belongs to the max domain of attraction of G_γ, denoted F_X ∈ D(G_γ).
The result in Theorem 1.1.1 has some equivalent formulations. Some of these formulations are based on the tail quantile function U(y) := Q(1 − 1/y), y > 1, where Q is the quantile function, defined as Q(p) := inf{x : F_X(x) ≥ p}, p ∈ (0, 1). These equivalent formulations are stated in Theorem 1.1.2.
Theorem 1.1.2. (Gnedenko, 1943; de Haan and Ferreira, 2006) Let X_1, . . . , X_n be i.i.d. random variables with distribution function F_X. For γ ∈ R the following statements are equivalent:

(i) There exist sequences of real constants (b_n)_{n=1}^∞ and positive real constants (a_n)_{n=1}^∞ such that

    lim_{n→∞} F_X^n(a_n x + b_n) = G_γ(x) = exp(−(1 + γx)^{−1/γ}),    (1.4)

for all x with 1 + γx > 0.

(ii) There is a positive function a such that for all x > 0,

    lim_{t→∞} (U(tx) − U(t))/a(t) = (x^γ − 1)/γ,    (1.5)

where for γ = 0 the right-hand side is interpreted as log x.

(iii) There is a positive function a such that

    lim_{t→∞} t(1 − F_X(a(t)x + U(t))) = (1 + γx)^{−1/γ},    (1.6)

for all x with 1 + γx > 0.

(iv) There exists a positive function f such that

    lim_{t↑x*} (1 − F_X(t + xf(t)))/(1 − F_X(t)) = (1 + γx)^{−1/γ}    (1.7)

for all x for which 1 + γx > 0.

Moreover, (1.4) holds with b_n := U(n) and a_n := a(n). Also (1.7) holds with f(t) = a(1/(1 − F_X(t))).
As seen in Theorem 1.1.2 the choice of the normalizing constant b_n does not depend on the sign of γ and can be shown to always work, if we choose b_n = U(n). The choice of a_n depends on whether γ is positive, negative or equal to zero, so we will address this in the sections dedicated to the corresponding classes.
In order to discuss the extremal Weibull and Fréchet classes, we need the concept of a slowly varying function. Slowly varying functions are special cases of regularly varying functions, so we will give the definition of what it means to be of regular variation. Regularly varying functions will also be needed later in this thesis.
Definition 1.1.3. (Beirlant et al., 2004, Definition 2.1) Let f be an ultimately positive and measurable function on R+. We say that f is regularly varying at infinity if there exists a real constant ρ for which

    lim_{x→∞} f(λx)/f(x) = λ^ρ for all λ > 0.

We write f ∈ R_ρ and we call ρ the index of regular variation. In the case ρ = 0, the function will be called slowly varying or of slow variation. We will reserve the symbol l for such functions. The class of all regularly varying functions is denoted by R.
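As an illustrative aside, the defining limit is easy to probe numerically. The sketch below uses f(x) = x^ρ log x, which is regularly varying with index ρ because log x is slowly varying; the choice of ρ and of the evaluation points is arbitrary:

```python
import math

# f(x) = x**rho * log(x) is regularly varying with index rho (log is slowly varying)
rho = -0.5
f = lambda x: x**rho * math.log(x)

def ratio(lam, x):
    return f(lam * x) / f(x)

# the ratio f(lam*x)/f(x) approaches lam**rho as x grows
d_small = abs(ratio(10.0, 1e4) - 10.0**rho)
d_large = abs(ratio(10.0, 1e12) - 10.0**rho)
assert d_large < d_small and d_large < 0.05
```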
The next two sections will be dedicated to the Gumbel and the extremal Weibull classes, while the Fréchet class, which is of more importance for this thesis, will be discussed in the next chapter.
1.2 The Gumbel class
The Gumbel class corresponds with the max domain of attraction of G_γ with γ = 0. The following proposition provides a characterization of the distributions that belong to this class.

Proposition 1.2.1. (Gnedenko, 1943) Let X be a random variable with distribution function F_X. Then we have, for x* finite or infinite and a, f suitable positive functions, that

    F_X ∈ D(G_0) ⇔ lim_{t↑x*} (1 − F_X(t + xf(t)))/(1 − F_X(t)) = exp(−x), x ∈ R    (1.8)
                 ⇔ lim_{t→∞} (U(tx) − U(t))/a(t) = log(x), x > 0.    (1.9)
For the Fréchet and extremal Weibull classes it is easy to show that the distributions belonging to those classes satisfy (1.5), but this is not the case for the Gumbel class. This also means that determining the scaling parameter a_n for the distributions in the Gumbel class is more difficult. It can however be determined by the formula

    a_n = n ∫_{U(n)}^{x*} (1 − F_X(y)) dy.

We will not derive this formula, but simply take it as a fact. For details we refer to de Haan and Ferreira (2006), Corollary 1.2.4.
Example 1.2.2. If we want to determine the constants a_n and b_n for the exp(1) distribution with distribution function F_X(x) = 1 − exp(−x), x > 0, then we must first find the tail quantile function of the exponential distribution. The distribution function has quantile function Q(p) = − log(1 − p), 0 < p < 1. So

    U(x) = Q(1 − 1/x) = log(x), x > 1.

This means b_n can be chosen as

    b_n = U(n) = log(n),

and a_n can be chosen as

    a_n = n ∫_{log(n)}^{∞} exp(−x) dx = n exp(−log(n)) = 1.

Since we know the constants a_n and b_n we can also show that the exponential distribution belongs to the max domain of attraction of the Gumbel class. Indeed,

    P((X_{n,n} − b_n)/a_n ≤ x) = F_X^n(a_n x + b_n)
                               = F_X^n(x + log(n))
                               = (1 − exp(−x − log(n)))^n
                               = (1 − exp(−x)/n)^n → exp(−exp(−x)) for n → ∞.
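As an illustrative aside, the pointwise convergence in Example 1.2.2 can be tabulated directly; the evaluation points and values of n below are arbitrary:

```python
import math

def F(y):
    # standard exponential distribution function
    return 1.0 - math.exp(-y) if y > 0 else 0.0

def Fn(x, n):
    # F_X^n(a_n x + b_n) with a_n = 1 and b_n = log(n)
    return F(x + math.log(n)) ** n

def G(x):
    # the Gumbel limit exp(-exp(-x))
    return math.exp(-math.exp(-x))

# maximal approximation error over a few points, for growing n
errs = [max(abs(Fn(x, n) - G(x)) for x in (-1.0, 0.0, 1.0, 2.0)) for n in (2, 5, 10, 100)]
assert errs == sorted(errs, reverse=True)  # the error shrinks as n grows
```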
The convergence of F_X^n(a_n x + b_n) to G(x) is shown in Figure 1.1. The solid line is G(x), the dashed line is for n = 2, the dotted line is for n = 5 and the dash-dotted line is for n = 10. It is clearly seen that as n grows, F_X^n(a_n x + b_n) converges pointwise to G(x).

[Figure 1.1: The convergence of F_X^n(a_n x + b_n) to G(x) for the standard exponential distribution.]

□
1.3 The extremal Weibull class
The extremal Weibull class corresponds with the max domain of attraction of G_γ with γ < 0. As was the case for the Gumbel class, we have a proposition which provides a characterization of the distributions that belong to this class.

Proposition 1.3.1. (Gnedenko, 1943) Let X be a random variable with distribution function F_X. Then we have for x* finite that

    F_X ∈ D(G_γ), γ < 0 ⇔ 1 − F_X(x* − 1/x) = x^{1/γ} l_{F_X}(x), x > 0    (1.10)
                        ⇔ U(x) = x* − x^γ l_U(x), x > 1,    (1.11)

where l_U(x) and l_{F_X}(x) are slowly varying at infinity.
From (1.11) it is easily seen that (1.5) is satisfied when t tends to infinity. Indeed,

    (U(tx) − U(t))/a(t) = (x* − (tx)^γ l_U(tx) − (x* − t^γ l_U(t)))/a(t)
                        = (t^γ l_U(t)/a(t)) (1 − x^γ l_U(tx)/l_U(t))
                        ∼ −γ (t^γ l_U(t)/a(t)) (x^γ − 1)/γ
                        ∼ (x^γ − 1)/γ,

if we choose a(t) such that a(t)/(x* − U(t)) → −γ. This indicates that a good choice of a_n would be

    a_n = a(n) = −γ(x* − U(n)) = −γ n^γ l_U(n).
Example 1.3.2. The reversed Burr distribution has distribution function given by

    F_X(x) = 1 − (ζ/(ζ + (1 − x)^{−δ}))^λ, x < 1; λ, ζ, δ > 0,

and so the quantile function is

    Q(p) = 1 − ζ^{−1/δ} ((1 − p)^{−1/λ} − 1)^{−1/δ}, 0 < p < 1.

So we find the tail quantile function U to be

    U(x) = Q(1 − 1/x) = 1 − ζ^{−1/δ} (x^{1/λ} − 1)^{−1/δ}, x > 1.

The distribution belongs to the max domain of attraction of G_γ with γ = −1/(λδ). If we consider the reversed Burr distribution with parameters λ = ζ = δ = 1, then we can choose the normalizing constant b_n as

    b_n = U(n) = 1 − (n − 1)^{−1}.

Since x* = 1 and γ = −1 we can choose the normalizing constant a_n as

    a_n = 1 − U(n) = (n − 1)^{−1}.

With these normalizing constants we can show that the reversed Burr distribution with parameters λ = ζ = δ = 1 belongs to the max domain of attraction of the extremal Weibull class. Indeed,

    P((X_{n,n} − b_n)/a_n ≤ x) = F_X^n(a_n x + b_n)
                               = F_X^n((x − 1)(n − 1)^{−1} + 1)
                               = (1 − 1/(1 + (n − 1)/(1 − x)))^n
                               = (1 − (1 − x)/(n − x))^n → exp(−(1 − x)) for n → ∞.
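As an illustrative aside, the final simplification in Example 1.3.2 makes the convergence easy to verify numerically; the evaluation points below are arbitrary:

```python
import math

def Fn(x, n):
    # F_X^n(a_n x + b_n) for the reversed Burr distribution with lambda = zeta = delta = 1,
    # using b_n = 1 - (n-1)^(-1) and a_n = (n-1)^(-1); simplifies to (1 - (1-x)/(n-x))^n
    return (1.0 - (1.0 - x) / (n - x)) ** n

def G(x):
    # the extremal Weibull limit with gamma = -1
    return math.exp(-(1.0 - x))

errs = [max(abs(Fn(x, n) - G(x)) for x in (-3.0, -1.0, 0.0, 0.5)) for n in (2, 5, 10, 100)]
assert errs == sorted(errs, reverse=True)  # the error shrinks as n grows
```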
The convergence of the reversed Burr distribution to its limit is illustrated in Figure 1.2. The solid line is G(x), the dashed line is for n = 2, the dotted line is for n = 5, while the dash-dotted line is for n = 10. It is clearly seen that as n grows, F_X^n(a_n x + b_n) converges pointwise to G(x).

[Figure 1.2: The convergence of F_X^n(a_n x + b_n) to G(x) for the reversed Burr distribution with λ = ζ = δ = 1.]

□
1.4 Estimation of the extreme value index in practice
In practice we do not know the constants a_n and b_n, so Theorem 1.1.1 is not very useful if we want to estimate γ. However, if for some finite n ∈ N we have that

    P((X_{n,n} − b_n)/a_n ≤ x) ≈ exp(−(1 + γx)^{−1/γ}), 1 + γx > 0,

then

    P(X_{n,n} ≤ z) ≈ exp(−(1 + γ(z − b_n)/a_n)^{−1/γ}), 1 + γ(z − b_n)/a_n > 0,

where z = b_n + a_n x. If we let µ = b_n and σ = a_n, then we are left with the model

    P(X_{n,n} ≤ z) ≈ exp(−(1 + γ(z − µ)/σ)^{−1/γ}), 1 + γ(z − µ)/σ > 0.    (1.12)
With this model we can easily obtain maximum likelihood estimates of µ, σ and γ. To do this, we divide the data into m blocks and define z_1, . . . , z_m to be the block maxima of the m blocks. Under the assumption that Z_1, . . . , Z_m are independent variables having the GEV distribution we get from (1.12) that the log likelihood is given by

    log L(µ, σ, γ) = −m log σ − (1 + 1/γ) Σ_{i=1}^{m} log(1 + γ(z_i − µ)/σ) − Σ_{i=1}^{m} (1 + γ(z_i − µ)/σ)^{−1/γ}.    (1.13)
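As an illustrative aside, the log likelihood (1.13) is straightforward to transcribe into code. The sketch below evaluates it on a deterministic pseudo-sample of GEV quantiles (the parameter values are arbitrary, and the numerical maximization itself is omitted; in practice one would pass this function to an optimizer):

```python
import math

def gev_loglik(z, mu, sigma, gamma):
    # direct transcription of (1.13); -inf outside the support 1 + gamma*(z_i - mu)/sigma > 0
    if sigma <= 0:
        return float("-inf")
    ll = -len(z) * math.log(sigma)
    for zi in z:
        t = 1.0 + gamma * (zi - mu) / sigma
        if t <= 0:
            return float("-inf")
        ll += -(1.0 + 1.0 / gamma) * math.log(t) - t ** (-1.0 / gamma)
    return ll

# pseudo block maxima: GEV(mu=0, sigma=1, gamma=0.5) quantiles on a probability grid
z = [((-math.log(p)) ** -0.5 - 1.0) / 0.5 for p in [i / 20 for i in range(1, 20)]]
# the likelihood prefers the generating parameters over a misspecified scale,
# and the support constraint kicks in for a negative gamma
assert gev_loglik(z, 0.0, 1.0, 0.5) > gev_loglik(z, 0.0, 3.0, 0.5)
assert gev_loglik(z, 0.0, 1.0, -0.5) == float("-inf")
```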
The maximum likelihood estimates are then obtained by maximizing (1.13) with respect to µ, σ and γ.
Another popular model is the peaks over threshold (POT) model. This model can be derived using Theorem 1.1.2. If we assume that (1.4) is satisfied, then there exists a positive function f such that
    lim_{t↑x*} P((X − t)/f(t) > x | X > t) = lim_{t↑x*} (1 − F_X(t + f(t)x))/(1 − F_X(t)), x > 0
                                           = (1 + γx)^{−1/γ}, 1 + γx > 0.

For t large, we thus have

    P((X − t)/f(t) > x | X > t) ≈ (1 + γx)^{−1/γ}, x > 0 and 1 + γx > 0,

which reduces to

    P(X − t > z | X > t) ≈ (1 + γz/σ)^{−1/γ}, z > 0 and 1 + γz/σ > 0,    (1.14)

if we set z = f(t)x and f(t) = σ. From this we are able to get maximum likelihood estimates of γ and σ when we choose a threshold t. If we let z_1, . . . , z_k denote the k observations which are greater than the threshold t, then we obtain the log likelihood function from (1.14). The log likelihood is given by

    log L(σ, γ) = −k log σ − (1 + 1/γ) Σ_{i=1}^{k} log(1 + γ z_i/σ).    (1.15)
The maximum likelihood estimates are obtained by maximizing (1.15) with respect to γ and σ.
Using maximum likelihood with block maxima or peaks over threshold is an easy way to estimate γ. There are many other ways to estimate γ, but we will not go into detail about them. Among the methods of estimating γ for the generalized extreme value distribution are the Pickands estimator (Pickands, 1975), the moment estimator (Dekkers et al., 1989), and the probability-weighted moment estimator (Hosking et al., 1985).
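As a companion to the block-maxima sketch, the POT log likelihood (1.15) can be transcribed in the same way; again the pseudo-sample of threshold excesses and the parameter values are arbitrary, and the maximization step is omitted:

```python
import math

def gpd_loglik(z, sigma, gamma):
    # direct transcription of (1.15) for threshold excesses z_i > 0
    if sigma <= 0:
        return float("-inf")
    ll = -len(z) * math.log(sigma)
    for zi in z:
        t = 1.0 + gamma * zi / sigma
        if t <= 0:
            return float("-inf")
        ll += -(1.0 + 1.0 / gamma) * math.log(t)
    return ll

# pseudo excesses: GPD(sigma=1, gamma=0.5) quantiles on a probability grid
z = [2.0 * ((1.0 - p) ** -0.5 - 1.0) for p in [i / 20 for i in range(1, 20)]]
# the likelihood prefers the generating parameters over misspecified ones
assert gpd_loglik(z, 1.0, 0.5) > gpd_loglik(z, 3.0, 0.5)
assert gpd_loglik(z, 1.0, 0.5) > gpd_loglik(z, 1.0, 2.0)
```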
When considering the POT model we have to choose the threshold ourselves. There are several
ways to do this, but we will only discuss how to choose the threshold using a mean residual
life plot. An introduction to mean residual life plots requires a small lemma about a property
of the generalized Pareto distribution.
Lemma 1.4.1. If X ∼ GPD(σ, γ), then X − u|X > u ∼ GPD(σ + γu, γ).
Proof. If X ∼ GPD(σ, γ), then F_X(x) = 1 − (1 + γx/σ)^{−1/γ}. From this we get that

    P(X − u > x | X > u) = P(X > u + x, X > u)/P(X > u), x > 0
                         = (1 − F_X(u + x))/(1 − F_X(u))
                         = ((1 + γ(x + u)/σ)/(1 + γu/σ))^{−1/γ}
                         = (1 + γx/(σ + γu))^{−1/γ},

which implies that X − u|X > u ∼ GPD(σ + γu, γ).
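As an illustrative aside, the threshold-stability property of Lemma 1.4.1 is a closed-form identity and can be confirmed to machine precision; the parameter values below are arbitrary:

```python
def gpd_sf(x, sigma, gamma):
    # survival function 1 - F(x) of GPD(sigma, gamma)
    return (1.0 + gamma * x / sigma) ** (-1.0 / gamma)

sigma, gamma, u = 2.0, 0.3, 1.5
for x in (0.1, 1.0, 5.0):
    lhs = gpd_sf(u + x, sigma, gamma) / gpd_sf(u, sigma, gamma)  # P(X - u > x | X > u)
    rhs = gpd_sf(x, sigma + gamma * u, gamma)                    # GPD(sigma + gamma*u, gamma) tail
    assert abs(lhs - rhs) < 1e-12
```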
If X ∼ GPD(σ, γ) with γ < 1, then

    E(X) = σ/(1 − γ),

while E(X) = ∞ for γ ≥ 1. So assuming γ < 1, it follows from Lemma 1.4.1 that

    E(X − u | X > u) = (σ + γu)/(1 − γ), u > 0,

and hence the mean excess function is linear in u. The mean residual life plot consists of the points

    {(u, (1/n_u) Σ_{i=1}^{n_u} (x_(i) − u)) : u < x_max},

where x_(1), . . . , x_(n_u) are the n_u observations that exceed u, and x_max is the largest observation. If the GPD approximation is good at threshold u, then it should also be good at a higher threshold, so the mean excess function should be approximately linear in u beyond a good threshold.
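As an illustrative aside, the points of the mean residual life plot are simple to compute, and on (pseudo-)GPD data they track the linear mean excess function above; the grid and parameter values are arbitrary:

```python
def mean_excess_points(data, thresholds):
    # points (u, average of x - u over observations x > u) of the mean residual life plot
    pts = []
    for u in thresholds:
        exc = [x - u for x in data if x > u]
        if exc:
            pts.append((u, sum(exc) / len(exc)))
    return pts

# deterministic pseudo-sample: GPD(sigma=1, gamma=0.25) quantiles on a fine grid
sigma, gamma = 1.0, 0.25
data = [sigma * ((1.0 - p) ** -gamma - 1.0) / gamma for p in [i / 2000 for i in range(1, 2000)]]
pts = mean_excess_points(data, [0.5, 1.0, 2.0])
for u, e in pts:
    # theory: E(X - u | X > u) = (sigma + gamma*u)/(1 - gamma)
    assert abs(e - (sigma + gamma * u) / (1.0 - gamma)) < 0.25
```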
Chapter 2
Pareto-type distributions
In this chapter we give an introduction to the Fréchet class. We start by considering the domain of attraction of this class, similar to the discussion of the Gumbel and extremal Weibull classes. Next we turn our attention to the estimation of the extreme value index γ for Pareto-type distributions which satisfy a second order condition. We prove asymptotic normality for a statistic proposed in Goegebeur et al. (2010) and use this to construct a class of estimators for γ. From this class of estimators we construct specific estimators using kernel functions. We end this chapter with a presentation of an estimator of the second order parameter. The asymptotic normality of the latter is established under a third order condition.
2.1 Domain of attraction
The class of Pareto-type models corresponds with the max domain of attraction of G_γ with γ > 0. The following proposition provides a characterization of the distributions that belong to this class.

Proposition 2.1.1. (Gnedenko, 1943) Let X be a random variable with distribution function F_X. Then we have for x* infinite that

    F_X ∈ D(G_γ), γ > 0 ⇔ 1 − F_X(x) = x^{−1/γ} l_{F_X}(x), x > 0    (2.1)
                        ⇔ U(x) = x^γ l_U(x), x > 1,    (2.2)

where l_U(x) and l_{F_X}(x) are slowly varying at infinity.
Tail quantile functions of the form (2.2) can be shown to satisfy (1.5) as t tends to infinity, in the following way:

    (U(tx) − U(t))/a(t) = ((tx)^γ l_U(tx) − t^γ l_U(t))/a(t)
                        = (l_U(t) t^γ/a(t)) ((l_U(tx)/l_U(t)) x^γ − 1)
                        ∼ (x^γ − 1)/γ,

when choosing a(t) = γ t^γ l_U(t) = γU(t). More generally a(t) can also be chosen as a function satisfying

    lim_{t→∞} a(t)/U(t) = γ.
This brings us to how a_n can be chosen as a normalizing constant. If we choose a_n = a(n) = γU(n), then we can use this constant as one of the normalizing constants for the Fréchet class.
There exists full equivalence between the Pareto-type models and the extremal Weibull class. If we let X be a random variable with F_X belonging to the max domain of attraction of the extremal Weibull class with x* as the right endpoint, and put Y := (x* − X)^{−1}, then the extremal Weibull class and the Pareto-type models are linked through the identification

    F_X ∈ D(G_γ), γ < 0 ⇔ F_Y ∈ D(G_{−γ}), −γ > 0.

The equivalence follows easily because

    1 − F_X(x* − 1/x) = P(X > x* − 1/x) = P((x* − X)^{−1} > x) = 1 − F_Y(x).
Example 2.1.2. The Fréchet distribution has distribution function given by

    F_X(x) = exp(−x^{−α}), x > 0, α > 0.

This means it has quantile function

    Q(p) = (− log p)^{−1/α}, 0 < p < 1,

and hence the tail quantile function is

    U(x) = (− log(1 − 1/x))^{−1/α}, x > 1.

The Fréchet distribution has γ = 1/α and the normalizing constant a_n can hence be chosen as

    a_n = γU(n) = (1/α)(− log(1 − 1/n))^{−1/α}.

The normalizing constant b_n can be chosen as

    b_n = U(n) = (− log(1 − 1/n))^{−1/α}.

Concerning the Fréchet distribution with α = 1 we see that

    P((X_{n,n} − b_n)/a_n ≤ x) = F_X^n(a_n x + b_n)
                               = F_X^n((− log(1 − 1/n))^{−1} x + (− log(1 − 1/n))^{−1})
                               = [(1 − 1/n)^n]^{1/(1+x)} → exp(−(1 + x)^{−1}) for n → ∞.
The convergence of the Fréchet distribution to its limit is illustrated in Figure 2.1. The solid line is G(x), the dashed line is for n = 2, the dotted line is for n = 5, while the dash-dotted line is for n = 10. It is clearly seen that as n grows, F_X^n(a_n x + b_n) converges pointwise to G(x).

[Figure 2.1: The convergence of F_X^n(a_n x + b_n) to G(x) for the Fréchet distribution with α = 1.]

□
Next we give two examples of distributions that are of Pareto-type.
Example 2.1.3. The Burr distribution has distribution function given by

    F_X(x) = 1 − (ζ/(ζ + x^δ))^λ, x > 0, λ, ζ, δ > 0.

In order to verify that the Burr distribution is of Pareto-type we start with

    1 − F_X(x) = (ζ/(ζ + x^δ))^λ = x^{−δλ} (ζ/(ζx^{−δ} + 1))^λ.

It is easily seen that g(x) := (ζ/(ζx^{−δ} + 1))^λ is slowly varying at infinity since it converges to a constant as x → ∞. So the Burr distribution is of Pareto-type with γ = 1/(λδ). □

Example 2.1.4. The absolute T distribution has distribution function given by

    F_X(x) = (Γ((n+1)/2)/(√(nπ) Γ(n/2))) ∫_{−x}^{x} (1 + t²/n)^{−(n+1)/2} dt, x > 0, n ∈ N.
In order to verify that the absolute T distribution is of Pareto-type we start with

    1 − F_X(x) = 2 (Γ((n+1)/2)/(√(nπ) Γ(n/2))) ∫_x^∞ (1 + t²/n)^{−(n+1)/2} dt
               = 2 (Γ((n+1)/2)/(√(nπ) Γ(n/2))) ∫_x^∞ (t²/n)^{−(n+1)/2} (n/t² + 1)^{−(n+1)/2} dt
               = K ∫_x^∞ t^{−n−1} (n t^{−2} + 1)^{−(n+1)/2} dt,

where K := 2 n^{n/2} Γ((n+1)/2)/(√π Γ(n/2)). We are concerned with large values of x, so we make a Taylor series expansion of (1 + x)^{−(n+1)/2} around 0, which yields

    (n t^{−2} + 1)^{−(n+1)/2} = 1 − ((n+1)/2) n t^{−2} + (1/2)((n+1)/2)((n+1)/2 + 1) n² t^{−4}
        − (1/6)((n+1)/2)((n+1)/2 + 1)((n+1)/2 + 2)(1 + t̃)^{−(n+1)/2 − 3} n³ t^{−6},

where t̃ is between 0 and n t^{−2}. From this it follows that

    1 − F_X(x) = K( ∫_x^∞ t^{−n−1} dt − (n(n+1)/2) ∫_x^∞ t^{−n−3} dt + (n²(n+1)(n+3)/8) ∫_x^∞ t^{−n−5} dt
        − (n³(n+1)(n+3)(n+5)/48) ∫_x^∞ t^{−n−1} (1 + t̃)^{−(n+1)/2 − 3} t^{−6} dt ).

Since (1 + t̃)^{−(n+1)/2 − 3} ≤ 1 it follows that ∫_x^∞ t^{−n−1}(1 + t̃)^{−(n+1)/2 − 3} t^{−6} dt ≤ ∫_x^∞ t^{−n−7} dt, and hence

    1 − F_X(x) = K( x^{−n}/n − (n(n+1)/(2(n+2))) x^{−n−2} + (n²(n+1)(n+3)/(8(n+4))) x^{−n−4} + O(x^{−n−6}) )
               = x^{−n} C_0 ( 1 − (n²(n+1)/(2(n+2))) x^{−2} + (n³(n+1)(n+3)/(8(n+4))) x^{−4} + O(x^{−6}) ),    (2.3)

where C_0 := K/n. Since the function g(x) := C_0(1 − (n²(n+1)/(2(n+2))) x^{−2} + (n³(n+1)(n+3)/(8(n+4))) x^{−4} + O(x^{−6})) converges to a constant as x → ∞, it is slowly varying at infinity, and hence the absolute T distribution is of Pareto-type with γ = 1/n. □
2.2 Estimation of the extreme value index
In the analysis of Pareto-type models, estimation of γ plays a central role. The asymptotic distribution of an estimator of γ is usually established under the following second order condition on the tail behaviour.

Assumption 2.2.1 (Second order condition). There exists a positive real parameter γ, a negative real parameter ρ and a function b with b(t) → 0 for t → ∞, of constant sign for large values of t, such that

    lim_{t→∞} (log U(tx) − log U(t) − γ log x)/b(t) = (x^ρ − 1)/ρ, ∀x > 0.

The second order condition implies that |b| is regularly varying with index ρ (Geluk and de Haan, 1987), so the parameter ρ determines the rate of convergence of log U(tx) − log U(t) to its limit γ log x as t tends to infinity. If ρ is close to zero then the convergence is slow and the estimation of tail parameters is practically difficult.
We will now verify that the Burr distribution and the absolute T distribution satisfy the second order condition. That they are of Pareto-type was verified in Example 2.1.3 and Example 2.1.4 respectively.
Example 2.2.2. In order to verify that the Burr distribution satisfies the second order condition we need to find its tail quantile function. The quantile function of the Burr distribution is easily found by inverting the distribution function and is given by

    Q(p) = ζ^{1/δ} ((1 − p)^{−1/λ} − 1)^{1/δ}, 0 < p < 1.

From this we obtain the tail quantile function

    U(x) = Q(1 − 1/x) = x^γ ζ^{1/δ} (1 − x^{−1/λ})^{1/δ}, x > 1.

We start with the expression

    log U(tx) − log U(t) − γ log x = (1/δ) log(1 − (xt)^{−1/λ}) − (1/δ) log(1 − t^{−1/λ}).

If we make a Taylor series expansion of log(1 − x) around 0, we obtain

    log U(tx) − log U(t) − γ log x = (1/δ)(−(tx)^{−1/λ} − (1/2)(tx)^{−2/λ}) − (1/δ)(−t^{−1/λ} − (1/2)t^{−2/λ}) + O(t^{−3/λ})
        = (1/(λδ)) t^{−1/λ} (x^{−1/λ} − 1)/(−1/λ) + (1/(λδ)) t^{−2/λ} (x^{−2/λ} − 1)/(−2/λ) + O(t^{−3/λ})    (2.4)
        = γ t^{−1/λ} (x^{−1/λ} − 1)/(−1/λ) + O(t^{−2/λ}).    (2.5)

From (2.5) we see that if we choose ρ = −1/λ and b(t) = γt^ρ, then the Burr distribution satisfies the second order condition. More generally b(t) can be chosen such that b(t) = γt^ρ(1 + o(1)). □
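As an illustrative aside, the rate statement in (2.5) can be checked numerically for the Burr distribution: with the exact tail quantile function, the normalized increment approaches (x^ρ − 1)/ρ at the rate b(t) = γt^ρ. The parameter values below (λ = 2, δ = ζ = 1, x = 3) are arbitrary:

```python
import math

def lhs(t, x, lam=2.0, delta=1.0, zeta=1.0):
    # (log U(tx) - log U(t) - gamma*log(x)) / b(t) for the Burr distribution,
    # with gamma = 1/(lam*delta), rho = -1/lam and b(t) = gamma * t**rho
    gamma = 1.0 / (lam * delta)
    rho = -1.0 / lam
    U = lambda s: s**gamma * zeta**(1.0 / delta) * (1.0 - s**(-1.0 / lam))**(1.0 / delta)
    return (math.log(U(t * x)) - math.log(U(t)) - gamma * math.log(x)) / (gamma * t**rho)

x, rho = 3.0, -0.5                      # rho = -1/lam for lam = 2
limit = (x**rho - 1.0) / rho
errs = [abs(lhs(t, x) - limit) for t in (1e2, 1e4, 1e6)]
assert errs == sorted(errs, reverse=True) and errs[-1] < 1e-2
```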
Example 2.2.3. From (2.3) we get for the absolute T distribution that

    1 − F_X(x) = x^{−1/γ} C_0 (1 − C_1 x^{−2} + C_2 x^{−4} + O(x^{−6})),

where C_1 := n²(n+1)/(2(n+2)) and C_2 := n³(n+1)(n+3)/(8(n+4)). In order to find the tail quantile function we have to invert

    1/y = x^{−1/γ} C_0 (1 − C_1 x^{−2} + C_2 x^{−4} + O(x^{−6})).

From this we find

    x = C_0^γ y^γ (1 − C_1 x^{−2} + C_2 x^{−4} + O(x^{−6}))^γ.

If we make a Taylor series expansion of (1 − x)^γ around x = 0, we obtain

    x = C_0^γ y^γ (1 − γ(C_1 x^{−2} − C_2 x^{−4} + O(x^{−6})) + (γ(γ−1)/2)(C_1 x^{−2} − C_2 x^{−4} + O(x^{−6}))² + O(x^{−6}))
      = C_0^γ y^γ (1 − γC_1 x^{−2} + (γC_2 + (γ(γ−1)/2) C_1²) x^{−4} + O(x^{−6})).

Since x^{−2} = C_0^{−2γ} y^{−2γ} (1 − γC_1 x^{−2} + (γC_2 + (γ(γ−1)/2) C_1²) x^{−4} + O(x^{−6}))^{−2}, a Taylor series expansion of (1 − x)^{−2} gives

    x = C_0^γ y^γ (1 − γC_1 C_0^{−2γ} y^{−2γ} (1 + 2γC_1 x^{−2} + O(x^{−4})) + (γC_2 + (γ(γ−1)/2) C_1²) x^{−4} + O(x^{−6})).

If we substitute the leading term C_0^γ y^γ for x on the right-hand side, it follows that

    x = C_0^γ y^γ (1 − γC_1 C_0^{−2γ} y^{−2γ} + (γC_2 − (γ(3γ+1)/2) C_1²) C_0^{−4γ} y^{−4γ} + O(y^{−6γ})).

So the tail quantile function can be written as

    U(x) = C_0^γ x^γ (1 − D_1 x^{−2γ} + D_2 x^{−4γ} + O(x^{−6γ})),

where D_1 := γC_1 C_0^{−2γ} and D_2 := (γC_2 − (γ(3γ+1)/2) C_1²) C_0^{−4γ}. We are now ready to verify that the absolute T distribution satisfies the second order condition. We start with the expression

    log U(xt) − log U(t) − γ log x = log(1 − D_1 (xt)^{−2γ} + D_2 (xt)^{−4γ} + O(t^{−6γ})) − log(1 − D_1 t^{−2γ} + D_2 t^{−4γ} + O(t^{−6γ})).

By making a Taylor series expansion of log(1 − x) around x = 0 we obtain

    log U(xt) − log U(t) − γ log x = −D_1 (xt)^{−2γ} + D_2 (xt)^{−4γ} − (1/2)(D_1 (xt)^{−2γ} − D_2 (xt)^{−4γ})²
        + D_1 t^{−2γ} − D_2 t^{−4γ} + (1/2)(D_1 t^{−2γ} − D_2 t^{−4γ})² + O(t^{−6γ})
        = −D_1 t^{−2γ} (x^{−2γ} − 1) + (D_2 − (1/2)D_1²) t^{−4γ} (x^{−4γ} − 1) + O(t^{−6γ})    (2.6)
        = −D_1 t^{−2γ} (x^{−2γ} − 1) + O(t^{−4γ}).    (2.7)

From (2.7) we see that if we choose ρ = −2γ and b(t) of the form b(t) = −ρD_1 t^ρ (1 + o(1)), then the absolute T distribution satisfies the second order condition. □
We now return to the estimation of γ. The estimator of γ we will consider is based on a kernel statistic with kernel function K. This statistic is given by

    T_{n,k}(K) := (1/k) Σ_{j=1}^{k} K(j/(k+1)) Z_j,    (2.8)

where Z_j := j(log X_{n−j+1,n} − log X_{n−j,n}). This statistic will also serve as the basic building block for the ρ estimator we propose in Section 2.3. We need some conditions on the kernel function, but first we introduce the following notation:

    µ(K) := ∫_0^1 K(u) du,
    I_1(K, ρ) := ∫_0^1 K(u) u^{−ρ} du,
    σ²(K) := ∫_0^1 K²(u) du.

With this notation, the kernel function must satisfy

Assumption 2.2.4. Let K be a function defined on (0, 1) such that

(i) K(t) = (1/t) ∫_0^t u(v) dv for some function u satisfying |(k+1) ∫_{(j−1)/(k+1)}^{j/(k+1)} u(t) dt| ≤ f(j/(k+1)) for some positive continuous and integrable function f defined on (0, 1),
(ii) σ²(K) < ∞.

Example 2.2.5. Examples of kernel functions satisfying Assumption 2.2.4 are the power kernels K_τ(t) := t^τ, τ > 0, and the log kernels L_δ(t) := (− log t)^δ, δ > 0.

Lemma 2.2.6. The function K(t) := t^τ (− log t)^δ satisfies Assumption 2.2.4.

The proof of Lemma 2.2.6 can be found in Appendix 2.4. □
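As an illustrative aside, the statistic (2.8) is short to implement. Note that for the constant kernel K ≡ 1 it reduces, by telescoping of the weighted spacings, to the classical Hill estimator, which is consistent for γ in the Pareto-type class; the sketch below checks this on a simulated strict Pareto sample (sample size, k and seed are arbitrary):

```python
import math
import random

def T(K, x, k):
    # the kernel statistic (2.8): (1/k) * sum_{j=1}^{k} K(j/(k+1)) * Z_j,
    # where Z_j = j * (log X_{n-j+1,n} - log X_{n-j,n})
    xs = sorted(x)
    n = len(xs)
    total = 0.0
    for j in range(1, k + 1):
        Z = j * (math.log(xs[n - j]) - math.log(xs[n - j - 1]))
        total += K(j / (k + 1)) * Z
    return total / k

random.seed(1)
gamma = 0.5
# strict Pareto sample with U(x) = x**gamma, via the inverse probability integral transform
sample = [(1.0 - random.random()) ** -gamma for _ in range(5000)]
hill = T(lambda t: 1.0, sample, 200)  # the constant kernel gives the Hill estimator
assert abs(hill - gamma) < 0.15
```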
With Assumption 2.2.1 and Assumption 2.2.4 we are able to establish the following result.

Theorem 2.2.7. Let X_1, . . . , X_n be i.i.d. random variables with a distribution satisfying Assumption 2.2.1. If further Assumption 2.2.4 holds, then for k, n → ∞ such that k/n → 0 we have

    T_{n,k}(K) =_D γ µ(K) + γ σ(K) N_k(K)/√k + b(n/k) I_1(K, ρ) (1 + o_P(1)),    (2.9)

where N_k(K) is asymptotically a standard normal random variable.

A proof of this theorem is given in Goegebeur et al. (2010); we will however give an alternative proof of the result.
Proof of Theorem 2.2.7. Let $U_{1,n} \leq \ldots \leq U_{n,n}$ be the order statistics of a random sample of size $n$ from the $U(0,1)$ distribution. By the inverse probability integral transform we find that
\[
X_{i,n} \stackrel{D}{=} Q(U_{i,n}) \stackrel{D}{=} Q(1 - U_{n-i+1,n}) = U\!\left(\frac{1}{U_{n-i+1,n}}\right).
\]
Since the $X_i$ are of Pareto-type it follows that
\[
X_{i,n} \stackrel{D}{=} \left(\frac{1}{U_{n-i+1,n}}\right)^{\gamma} l_U\!\left(\frac{1}{U_{n-i+1,n}}\right).
\]
From this we get
\[
\log X_{i,n} \stackrel{D}{=} -\gamma \log U_{n-i+1,n} + \log l_U\!\left(\frac{1}{U_{n-i+1,n}}\right).
\]
Hence
\[
\log X_{n-j+1,n} - \log X_{n-k,n} \stackrel{D}{=} -\gamma \log \frac{U_{j,n}}{U_{k+1,n}} + \log \frac{l_U\!\left(\frac{1}{U_{k+1,n}}\frac{U_{k+1,n}}{U_{j,n}}\right)}{l_U\!\left(\frac{1}{U_{k+1,n}}\right)}.
\]
Since $\frac{U_{j,n}}{U_{k+1,n}} \stackrel{D}{=} V_{j,k}$, where $V_{j,k}$ is the $j$'th order statistic of a random sample of size $k$ from the $U(0,1)$ distribution, it follows that
\[
\log X_{n-j+1,n} - \log X_{n-k,n} \stackrel{D}{=} -\gamma \log V_{j,k} + \log \frac{l_U\!\left(\frac{1}{V_{j,k}}\frac{1}{U_{k+1,n}}\right)}{l_U\!\left(\frac{1}{U_{k+1,n}}\right)} \stackrel{D}{=} -\gamma \log(1 - V_{k-j+1,k}) + \log \frac{l_U\!\left(\frac{1}{V_{j,k}}\frac{1}{U_{k+1,n}}\right)}{l_U\!\left(\frac{1}{U_{k+1,n}}\right)}.
\]
Using that the quantile function of the standard exponential distribution is $Q(p) = -\log(1-p)$, $0 < p < 1$, and denoting by $E_{1,n} \leq \ldots \leq E_{n,n}$ the order statistics of a random sample of
size $n$ from the standard exponential distribution, we get, using Assumption 2.2.1 and inspired by Lemma 2.4.3, that
\[
\log X_{n-j+1,n} - \log X_{n-k,n} \stackrel{D}{=} \gamma E_{k-j+1,k} + b_0\!\left(\frac{1}{U_{k+1,n}}\right)\frac{\left(\frac{1}{V_{j,k}}\right)^{\rho} - 1}{\rho} + b_0\!\left(\frac{1}{U_{k+1,n}}\right)\tilde{R}_{n,k}(j),
\]
where
\[
\tilde{R}_{n,k}(j) := \frac{\log U\!\left(\frac{1}{U_{k+1,n}}\frac{1}{V_{j,k}}\right) - \log U\!\left(\frac{1}{U_{k+1,n}}\right) - \gamma\log\frac{1}{V_{j,k}}}{b_0\!\left(\frac{1}{U_{k+1,n}}\right)} - \frac{\left(\frac{1}{V_{j,k}}\right)^{\rho} - 1}{\rho}.
\]
Thus
\[
Z_j = j\,(\log X_{n-j+1,n} - \log X_{n-j,n}) \stackrel{D}{=} j\left[\gamma E_{k-j+1,k} - \gamma E_{k-j,k} + b_0\!\left(\frac{1}{U_{k+1,n}}\right)\frac{\left(\frac{1}{V_{j,k}}\right)^{\rho} - \left(\frac{1}{V_{j+1,k}}\right)^{\rho}}{\rho} + b_0\!\left(\frac{1}{U_{k+1,n}}\right)R_{n,k}(j)\right], \tag{2.10}
\]
where $R_{n,k}(j) := \tilde{R}_{n,k}(j) - \tilde{R}_{n,k}(j+1)$, with the convention $\tilde{R}_{n,k}(k+1) := 0$, and with $b_0$ a function satisfying $b_0(t) \sim b(t)$ for $t \to \infty$. Using the Rényi representation (Rényi, 1953) we can express each $E_{j,k}$ as
\[
\{E_{j,k}\}_{j=1,\ldots,k} \stackrel{D}{=} \left\{\sum_{i=1}^{j}\frac{E_{k-i+1}}{k-i+1}\right\}_{j=1,\ldots,k},
\]
where $E_1,\ldots,E_k$ are independent random variables from a standard exponential distribution. Hence
\[
E_{k-j+1,k} - E_{k-j,k} \stackrel{D}{=} \sum_{i=1}^{k-j+1}\frac{E_{k-i+1}}{k-i+1} - \sum_{i=1}^{k-j}\frac{E_{k-i+1}}{k-i+1} = \frac{E_j}{j}. \tag{2.11}
\]
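The Rényi representation (2.11) states that the normalized spacings $j(E_{k-j+1,k} - E_{k-j,k})$ of exponential order statistics are again i.i.d. standard exponential. A quick simulation (our own sketch, not from the thesis) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(7)
k = 200_000
e = np.sort(rng.exponential(size=k))   # order statistics E_{1,k} <= ... <= E_{k,k}
j = np.arange(1, k)                    # j = 1, ..., k-1
# E_{k-j+1,k} is e[k-j] (0-indexed) and E_{k-j,k} is e[k-j-1]; scale the spacing by j
z = j * (e[k - j] - e[k - j - 1])
mean_z, var_z = z.mean(), z.var()      # both should be close to 1 for Exp(1)
```

Both the sample mean and the sample variance of the rescaled spacings should be close to 1, as for a standard exponential distribution.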
Combining (2.10) and (2.11) we find that
\[
Z_j \stackrel{D}{=} \gamma E_j + b_0\!\left(\frac{1}{U_{k+1,n}}\right) j\,\frac{\left(\frac{1}{V_{j,k}}\right)^{\rho} - \left(\frac{1}{V_{j+1,k}}\right)^{\rho}}{\rho} + b_0\!\left(\frac{1}{U_{k+1,n}}\right) j R_{n,k}(j).
\]
Let $Y_{1,k} \leq \ldots \leq Y_{k,k}$ be the order statistics of a random sample of size $k$ from the standard strict Pareto distribution. Then we have
\[
\frac{1}{V_{j,k}} \stackrel{D}{=} \frac{1}{1 - V_{k-j+1,k}} \stackrel{D}{=} Y_{k-j+1,k}.
\]
Using this we get that
\[
Z_j \stackrel{D}{=} \gamma E_j + b_0(Y_{n-k,n})\, j\,\frac{Y_{k-j+1,k}^{\rho} - Y_{k-j,k}^{\rho}}{\rho} + b_0(Y_{n-k,n})\, j R_{n,k}(j).
\]
Hence
\begin{align*}
T_{n,k}(K) &\stackrel{D}{=} \frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\gamma E_j + b_0(Y_{n-k,n})\, j\,\frac{Y_{k-j+1,k}^{\rho} - Y_{k-j,k}^{\rho}}{\rho} + b_0(Y_{n-k,n})\, j R_{n,k}(j)\right)\\
&= \gamma\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)E_j + b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) j\,\frac{Y_{k-j+1,k}^{\rho} - Y_{k-j,k}^{\rho}}{\rho}\\
&\quad + b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) j R_{n,k}(j)\\
&=: T_{n,k}^{(1)} + T_{n,k}^{(2)} + T_{n,k}^{(3)}.
\end{align*}
Using Assumption 2.2.4 (iii) we get for the first term that
\begin{align*}
T_{n,k}^{(1)} &= \gamma\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) + \gamma\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)(E_j - 1)\\
&= \gamma\mu(K) + o\!\left(\frac{1}{\sqrt{k}}\right) + \gamma\sigma(K)\frac{\tilde{N}_k(K)}{\sqrt{k}}, \tag{2.12}
\end{align*}
where $\tilde{N}_k(K) := \sqrt{k}\,\frac{\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)(E_j - 1)}{\sigma(K)}$. The term $\tilde{N}_k(K)$ is, according to Lemma 2.4.1 in Appendix 2.4, an asymptotic standard normal random variable. In (2.12) we can combine the $o\!\left(\frac{1}{\sqrt{k}}\right)$ with $\tilde{N}_k(K)$ to get
\[
T_{n,k}^{(1)} = \gamma\mu(K) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}},
\]
where $N_k(K)$ is again an asymptotic standard normal random variable.
Since $Y_{i,k} \stackrel{D}{=} \frac{1}{1-U_{i,k}}$ and the standard exponential distribution has quantile function $Q(p) = -\log(1-p)$, it follows that $T_{n,k}^{(2)}$ can be written as
\[
T_{n,k}^{(2)} \stackrel{D}{=} b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) j\,\frac{\exp(\rho E_{k-j+1,k}) - \exp(\rho E_{k-j,k})}{\rho}.
\]
Using the mean value theorem we find that
\[
T_{n,k}^{(2)} \stackrel{D}{=} b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) j\,(E_{k-j+1,k} - E_{k-j,k})\exp(\rho Q_{j,k}),
\]
where $Q_{j,k}$ is a random value between $E_{k-j,k}$ and $E_{k-j+1,k}$, and hence
\begin{align*}
T_{n,k}^{(2)} &\stackrel{D}{=} b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) E_j \exp(\rho Q_{j,k})\\
&= b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} E_j\\
&\quad + b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) E_j\left(\exp(\rho Q_{j,k}) - \left(\frac{j}{k+1}\right)^{-\rho}\right)\\
&=: T_{n,k}^{(2,1)} + T_{n,k}^{(2,2)}.
\end{align*}
Concerning the term $T_{n,k}^{(2,1)}$ we get
\[
T_{n,k}^{(2,1)} = b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} + b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho}(E_j - 1),
\]
so by the law of large numbers it follows that
\[
T_{n,k}^{(2,1)} = b_0(Y_{n-k,n})\, I_1(K,\rho)\,(1 + o_{\mathbb{P}}(1)).
\]
We now turn to $T_{n,k}^{(2,2)}$. Note that for $j = 1,\ldots,k$ we have
\[
\exp(E_{k-j+1,k}) \stackrel{D}{=} \exp(-\log(1 - U_{k-j+1,k})) \stackrel{D}{=} \exp(-\log U_{j,k}) = \frac{1}{U_{j,k}},
\]
and hence
\begin{align*}
\left|\exp(\rho Q_{j,k}) - \left(\tfrac{j}{k+1}\right)^{-\rho}\right| &\leq \max\left\{\left|\exp(\rho E_{k-j,k}) - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|,\; \left|\exp(\rho E_{k-j+1,k}) - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|\right\}\\
&\stackrel{D}{=} \max\left\{\left|U_{j+1,k}^{-\rho} - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|,\; \left|U_{j,k}^{-\rho} - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|\right\}\\
&\leq \max\left\{\left|U_{j+1,k}^{-\rho} - \left(\tfrac{j+1}{k+1}\right)^{-\rho}\right| + c_{j,k},\; \left|U_{j,k}^{-\rho} - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|\right\},
\end{align*}
where $c_{j,k} = \left(\frac{j+1}{k+1}\right)^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}$. From this it follows that
\begin{align*}
&\left|\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) E_j\left(\exp(\rho Q_{j,k}) - \left(\frac{j}{k+1}\right)^{-\rho}\right)\right|\\
&\qquad\leq \frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| E_j\left|U_{j+1,k}^{-\rho} - \left(\frac{j+1}{k+1}\right)^{-\rho}\right| + \frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| c_{j,k} E_j\\
&\qquad\quad + \frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| E_j\left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|\\
&\qquad=: T_{n,k}^{(2,2,1)} + T_{n,k}^{(2,2,2)} + T_{n,k}^{(2,2,3)}.
\end{align*}
According to Lemma 2.4.2 the terms $T_{n,k}^{(2,2,1)}$ and $T_{n,k}^{(2,2,3)}$ are $O_{\mathbb{P}}\!\left(\frac{1}{\sqrt{k}}\right)$. Using the mean value theorem we see that we can write the term $T_{n,k}^{(2,2,2)}$ as
\[
T_{n,k}^{(2,2,2)} = \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| z_{j,k}^{|\rho|-1} E_j,
\]
where $z_{j,k}$ is a value between $\frac{j}{k+1}$ and $\frac{j+1}{k+1}$. When $|\rho| \geq 1$ it follows that
\[
T_{n,k}^{(2,2,2)} \leq \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| E_j,
\]
and hence by the law of large numbers it follows that $T_{n,k}^{(2,2,2)} = O_{\mathbb{P}}\!\left(\frac{1}{k}\right)$. When $|\rho| < 1$ we have
\[
T_{n,k}^{(2,2,2)} \leq \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right|\left(\frac{j}{k+1}\right)^{|\rho|-1} E_j,
\]
which by Assumption 2.2.4 (v) and the law of large numbers implies that $T_{n,k}^{(2,2,2)} = O_{\mathbb{P}}\!\left(\frac{1}{k}\right)$. So
\[
T_{n,k}^{(2)} = b_0(Y_{n-k,n})\, I_1(K,\rho)\,(1 + o_{\mathbb{P}}(1)).
\]
Concerning the term $T_{n,k}^{(3)}$ we find, using Assumption 2.2.4 (i), that
\begin{align*}
\left|T_{n,k}^{(3)}\right| &= \left|b_0(Y_{n-k,n})\frac{k+1}{k}\sum_{j=1}^{k} R_{n,k}(j)\int_0^{\frac{j}{k+1}} u(v)\,dv\right|\\
&= \left|b_0(Y_{n-k,n})\frac{k+1}{k}\sum_{j=1}^{k} R_{n,k}(j)\sum_{i=1}^{j}\int_{\frac{i-1}{k+1}}^{\frac{i}{k+1}} u(v)\,dv\right|\\
&= \left|b_0(Y_{n-k,n})\frac{k+1}{k}\sum_{i=1}^{k}\int_{\frac{i-1}{k+1}}^{\frac{i}{k+1}} u(v)\,dv\sum_{j=i}^{k} R_{n,k}(j)\right|\\
&\leq |b_0(Y_{n-k,n})|\,\frac{1}{k}\sum_{i=1}^{k} f\left(\frac{i}{k+1}\right)\left|\sum_{j=i}^{k} R_{n,k}(j)\right|.
\end{align*}
For the term $\sum_{j=i}^{k} R_{n,k}(j)$ it follows that
\[
\sum_{j=i}^{k} R_{n,k}(j) = \sum_{j=i}^{k}\left(\tilde{R}_{n,k}(j) - \tilde{R}_{n,k}(j+1)\right) = \tilde{R}_{n,k}(i).
\]
For $\delta, \epsilon > 0$ there exists $n_0$ such that for any $n \geq n_0$, with arbitrarily large probability, for $i = 1,\ldots,k$,
\[
\left|\sum_{j=i}^{k} R_{n,k}(j)\right| \leq \epsilon\left(\frac{1}{V_{i,k}}\right)^{\rho}\max\left(\left(\frac{1}{V_{i,k}}\right)^{\delta}, \left(\frac{1}{V_{i,k}}\right)^{-\delta}\right) = \epsilon V_{i,k}^{-\rho-\delta},
\]
using Lemma 2.4.3. Hence
\[
\sup_{i\in\{1,\ldots,k\}}\left|\frac{\sum_{j=i}^{k} R_{n,k}(j)}{V_{i,k}^{-\rho-\delta}}\right| = o_{\mathbb{P}}(1),
\]
leading to
\[
\left|T_{n,k}^{(3)}\right| \leq b_0(Y_{n-k,n})\, o_{\mathbb{P}}(1)\,\frac{1}{k}\sum_{i=1}^{k} f\left(\frac{i}{k+1}\right) V_{i,k}^{-\rho-\delta},
\]
which by Assumption 2.2.4 (i) and assuming $\delta < |\rho|$ is $o_{\mathbb{P}}(b_0(Y_{n-k,n}))$. Combining the results on $T_{n,k}^{(1)}$, $T_{n,k}^{(2)}$ and $T_{n,k}^{(3)}$ establishes the result.
Using Theorem 2.2.7 we can construct a class of estimators $\hat{\gamma}_k(K) := \frac{T_{n,k}(K)}{\mu(K)}$ for $\gamma$ in the following way.

Proposition 2.2.8. Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.2.1. If further Assumption 2.2.4 holds with $\mu(K) \neq 0$, then for $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \lambda$ for some constant $\lambda$, we have
\[
\sqrt{k}\,(\hat{\gamma}_k(K) - \gamma) \to N\!\left(\lambda\frac{I_1(K,\rho)}{\mu(K)},\; \gamma^2\frac{\sigma^2(K)}{\mu^2(K)}\right). \tag{2.13}
\]

Proof. We have
\[
\sqrt{k}\,(\hat{\gamma}_k(K) - \gamma) \stackrel{D}{=} \gamma\frac{\sigma(K)}{\mu(K)} N_k(K) + \sqrt{k}\,b\!\left(\frac{n}{k}\right)\frac{I_1(K,\rho)}{\mu(K)}(1 + o_{\mathbb{P}}(1)) \to N\!\left(\lambda\frac{I_1(K,\rho)}{\mu(K)},\; \gamma^2\frac{\sigma^2(K)}{\mu^2(K)}\right)
\]
under the conditions of the proposition.

We verified in Lemma 2.2.6 that the kernel function $K(t) = t^{\tau}(-\log t)^{\delta}$ satisfies Assumption 2.2.4. This allows us to construct consistent and asymptotically normal estimators using this kernel. We do so in Corollary 2.2.9.
Corollary 2.2.9. Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.2.1. For $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \lambda$ for some constant $\lambda$, we have for the kernel function $K(t) = t^{\tau}(-\log t)^{\delta}$, $\tau, \delta \geq 0$, that
\[
\sqrt{k}\,(\hat{\gamma}_k(K) - \gamma) \to N\!\left(\lambda\frac{(\tau+1)^{\delta+1}}{(\tau-\rho+1)^{\delta+1}},\; \gamma^2\frac{\Gamma(2\delta+1)(\tau+1)^{2\delta+2}}{(2\tau+1)^{2\delta+1}(\Gamma(\delta+1))^2}\right).
\]
In particular, we obtain:

(i) For the Hill kernel, $\sqrt{k}\,(\hat{\gamma}_k(H) - \gamma) \to N\!\left(\lambda\frac{1}{1-\rho},\; \gamma^2\right)$.

(ii) For the power kernel, $\sqrt{k}\,(\hat{\gamma}_k(K_{\tau}) - \gamma) \to N\!\left(\lambda\frac{\tau+1}{\tau-\rho+1},\; \gamma^2\frac{(\tau+1)^2}{2\tau+1}\right)$.

(iii) For the log kernel, $\sqrt{k}\,(\hat{\gamma}_k(L_{\delta}) - \gamma) \to N\!\left(\lambda\frac{1}{(1-\rho)^{\delta+1}},\; \gamma^2\frac{\Gamma(2\delta+1)}{(\Gamma(\delta+1))^2}\right)$.
The question of which kernel function to choose is a topic of its own, and since it is not of great importance for this thesis we will not spend much time on it. We note, however, that the Hill kernel always has the smallest asymptotic variance. In general, the kernel function for which the asymptotic mean squared error of the resulting $\gamma$ estimator is minimal depends on the distributional parameters $\gamma$ and $\rho$. Comparing the log and power kernels with $\delta = \tau$, the log kernel tends to have a larger variance than the power kernel, although it suffers from less bias. For a detailed discussion of the performance of $\gamma$ estimators with kernel functions in the family $K(t) = t^{\tau}(-\log t)^{\delta}$ we refer to Gomes et al. (2007).
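The bias and variance factors in Corollary 2.2.9 are explicit, so the trade-off between kernels can be tabulated directly. The sketch below (our own code, not from the thesis) evaluates both factors for the family $K(t) = t^{\tau}(-\log t)^{\delta}$ and checks on a grid that the Hill choice $\tau = \delta = 0$ minimizes the asymptotic variance factor, in line with the remark above.

```python
from math import gamma as G

def bias_factor(tau, delta, rho):
    # asymptotic bias factor (tau+1)^{delta+1} / (tau - rho + 1)^{delta+1}
    return (tau + 1) ** (delta + 1) / (tau - rho + 1) ** (delta + 1)

def var_factor(tau, delta):
    # asymptotic variance factor from Corollary 2.2.9:
    # Gamma(2 delta + 1) (tau+1)^{2 delta + 2} / ((2 tau + 1)^{2 delta + 1} Gamma(delta+1)^2)
    return G(2 * delta + 1) * (tau + 1) ** (2 * delta + 2) / (
        (2 * tau + 1) ** (2 * delta + 1) * G(delta + 1) ** 2)

hill_var = var_factor(0, 0)       # Hill kernel: factor 1
# power kernel (delta = 0) gives (tau+1)^2/(2 tau + 1); log kernel (tau = 0)
# gives Gamma(2 delta + 1)/Gamma(delta + 1)^2, matching cases (ii) and (iii)
grid = [(t / 10, d / 10) for t in range(21) for d in range(21)]
min_var = min(var_factor(t, d) for t, d in grid)
```

On the grid $\tau, \delta \in [0,2]$ the minimum of the variance factor is attained at the Hill kernel; the bias factor, by contrast, shrinks as $\tau$ and $\delta$ grow, which is the trade-off driving the AMSE comparison.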
2.3 Estimation of the second order parameter
The estimation of the second order parameter in the univariate case is not of central importance to this thesis. However, in Chapter 4 we will construct estimators for the second order parameter in the bivariate extreme value framework which are based on the same ideas as the estimator for the second order parameter $\rho$ constructed here. In order to construct an estimator for $\rho$ we start from the basic building block $T_{n,k}(K)$ defined in (2.8). By a Taylor series expansion it follows from Theorem 2.2.7 that
\[
T_{n,k}^{\alpha}(K) \stackrel{D}{=} \gamma^{\alpha}\mu^{\alpha}(K) + \alpha\gamma^{\alpha}\mu^{\alpha-1}(K)\sigma(K)\frac{N_k(K)}{\sqrt{k}} + b\!\left(\frac{n}{k}\right)\alpha\gamma^{\alpha-1}\mu^{\alpha-1}(K) I_1(K,\rho)\,(1 + o_{\mathbb{P}}(1)),
\]
where $\alpha > 0$ and $K > 0$. The basic idea is to construct a statistic which converges in probability to a function of $\rho$ that does not depend on the unknown parameter $\gamma$. To this
end, let $K_1,\ldots,K_8$ be kernel functions and define
\begin{align*}
K^{(1)} &:= (K_1, K_2, K_3, K_4),\\
K^{(2)} &:= (K_5, K_6, K_7, K_8),\\
K^{(1,2)} &:= (K^{(1)}, K^{(2)}),\\
\bar{I}_1(K_i,\rho) &:= \frac{I_1(K_i,\rho)}{\mu(K_i)}, \quad i \in \{1,\ldots,8\},\\
\bar{I}_1^{(a)}(K_i,K_j,\rho) &:= \bar{I}_1^{\,a}(K_i,\rho) - \bar{I}_1^{\,a}(K_j,\rho), \quad a = 1,2,\; i,j \in \{1,\ldots,8\}.
\end{align*}
Using this notation, we consider the ratio of differences given by
\[
\Psi_{n,k}\!\left(K^{(1)}, \alpha_1, \alpha_2\right) := \frac{\left(\frac{T_{n,k}(K_1)}{\mu(K_1)}\right)^{\alpha_1} - \left(\frac{T_{n,k}(K_2)}{\mu(K_2)}\right)^{\alpha_1}}{\left(\frac{T_{n,k}(K_3)}{\mu(K_3)}\right)^{\alpha_2} - \left(\frac{T_{n,k}(K_4)}{\mu(K_4)}\right)^{\alpha_2}} \tag{2.14}
\]
and the function
\[
\psi\!\left(K^{(1)}, \alpha_1, \alpha_2, \rho\right) := \gamma^{\alpha_1-\alpha_2}\,\frac{\alpha_1\bar{I}_1^{(1)}(K_1,K_2,\rho)}{\alpha_2\bar{I}_1^{(1)}(K_3,K_4,\rho)},
\]
with $\alpha_1, \alpha_2 > 0$. If $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \infty$, then
\[
\frac{\left(\frac{T_{n,k}(K_1)}{\mu(K_1)}\right)^{\alpha_1} - \left(\frac{T_{n,k}(K_2)}{\mu(K_2)}\right)^{\alpha_1}}{b\!\left(\frac{n}{k}\right)} \stackrel{\mathbb{P}}{\to} \alpha_1\gamma^{\alpha_1-1}\bar{I}_1^{(1)}(K_1,K_2,\rho)
\]
and
\[
\frac{\left(\frac{T_{n,k}(K_3)}{\mu(K_3)}\right)^{\alpha_2} - \left(\frac{T_{n,k}(K_4)}{\mu(K_4)}\right)^{\alpha_2}}{b\!\left(\frac{n}{k}\right)} \stackrel{\mathbb{P}}{\to} \alpha_2\gamma^{\alpha_2-1}\bar{I}_1^{(1)}(K_3,K_4,\rho).
\]
Hence
\[
\Psi_{n,k}\!\left(K^{(1)}, \alpha_1, \alpha_2\right) \stackrel{\mathbb{P}}{\to} \psi\!\left(K^{(1)}, \alpha_1, \alpha_2, \rho\right).
\]
This statistic still depends on $\gamma$, but we can remove this dependence by considering a ratio of statistics of the form (2.14) with appropriately chosen $\alpha$ parameters. So define
\[
\Lambda_{n,k}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right) := \frac{\Psi_{n,k}\!\left(K^{(1)}, \alpha_1, \alpha_1 + l\right)}{\Psi_{n,k}\!\left(K^{(2)}, \alpha_2, \alpha_2 + l\right)}
\]
and
\[
\Lambda\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \rho\right) := \frac{\psi\!\left(K^{(1)}, \alpha_1, \alpha_1 + l, \rho\right)}{\psi\!\left(K^{(2)}, \alpha_2, \alpha_2 + l, \rho\right)},
\]
where $l > 0$. If we again assume that $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \infty$, then clearly
\[
\Lambda_{n,k}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right) \stackrel{\mathbb{P}}{\to} \Lambda\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \rho\right),
\]
which does not depend on $\gamma$. If the function $\rho \mapsto \Lambda\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \rho\right)$ is bijective, then we obtain the estimator
\[
\hat{\rho}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right) := \Lambda^{-1}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \Lambda_{n,k}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right)\right) \tag{2.15}
\]
for the second order parameter. The consistency of this estimator is established in Proposition 2.3.1 by a straightforward application of the continuous mapping theorem.

Proposition 2.3.1. (Goegebeur et al., 2010) Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.2.1. Let $K_1,\ldots,K_8$ satisfy Assumption 2.2.4, and suppose $\bar{I}_1^{(1)}(K_1,K_2,\rho)$, $\bar{I}_1^{(1)}(K_3,K_4,\rho)$, $\bar{I}_1^{(1)}(K_5,K_6,\rho)$ and $\bar{I}_1^{(1)}(K_7,K_8,\rho)$ are well defined and nonzero. Then if $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \infty$, we have $\Lambda_{n,k}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right) \stackrel{\mathbb{P}}{\to} \Lambda\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \rho\right)$. Further, if $\Lambda$ is bijective and $\Lambda^{-1}$ is continuous, then $\hat{\rho}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right)$ is a consistent estimator of $\rho$.
In order to establish asymptotic normality of the estimator of $\rho$, we need the following third order condition.

Assumption 2.3.2 (Third order condition). There exist a positive real parameter $\gamma$, negative real parameters $\rho$ and $\beta$, and functions $b$ and $\tilde{b}$ with $b(t) \to 0$ and $\tilde{b}(t) \to 0$ for $t \to \infty$, both of constant sign for large values of $t$, such that
\[
\lim_{t\to\infty} \frac{\frac{\log U(tx) - \log U(t) - \gamma\log x}{b(t)} - \frac{x^{\rho}-1}{\rho}}{\tilde{b}(t)} = \frac{1}{\beta}\left(\frac{x^{\rho+\beta}-1}{\rho+\beta} - \frac{x^{\rho}-1}{\rho}\right), \quad \forall x > 0.
\]
The third order condition implies that $|\tilde{b}|$ is regularly varying with index $\beta$ (de Haan and Ferreira, 2006). The third order condition is not too restrictive: among the Pareto-type distributions satisfying the second and third order conditions are the Fréchet, the Burr, the GP distributions and the absolute $T$ distribution, and this list is not exhaustive. As examples, we show that the Burr and the absolute $T$ distribution satisfy the third order condition.
Example 2.3.3. In order to verify that the Burr distribution satisfies the third order condition, it is convenient to choose $b(t) = \gamma\frac{t^{\rho}}{1-t^{\rho}}$. From (2.4) and this choice of $b(t)$ it follows that
\begin{align*}
\frac{\log U(tx) - \log U(t) - \gamma\log x}{b(t)} - \frac{x^{\rho}-1}{\rho} &= \frac{\gamma t^{\rho}\frac{x^{\rho}-1}{\rho} + \frac{\gamma}{2\rho}t^{2\rho}\left(x^{2\rho}-1\right) + O(t^{3\rho})}{\gamma\frac{t^{\rho}}{1-t^{\rho}}} - \frac{x^{\rho}-1}{\rho} \tag{2.16}\\
&= -t^{\rho}\,\frac{x^{\rho}-1}{\rho} + \frac{1}{2\rho}t^{\rho}\left(x^{2\rho}-1\right) + O(t^{2\rho}) \tag{2.17}\\
&= \rho t^{\rho}\,\frac{1}{\rho}\left(\frac{x^{2\rho}-1}{2\rho} - \frac{x^{\rho}-1}{\rho}\right) + O(t^{2\rho}). \tag{2.18}
\end{align*}
From (2.18) we see that if we choose $\beta = \rho$ and $\tilde{b}(t) = \rho t^{\rho}(1 + o(1))$, then the Burr distribution satisfies the third order condition.
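The expansion (2.16)-(2.18) can be checked numerically. For the Burr parametrization $1 - F(x) = (\eta/(\eta + x^{\tau}))^{\lambda}$ (our assumption for this illustration; the thesis' (2.4) is not reproduced in this section) one has the exact identity $\log U(tx) - \log U(t) - \gamma\log x = -\frac{\gamma}{\rho}\left[\log(1 - s x^{\rho}) - \log(1 - s)\right]$ with $s = t^{\rho}$, and the claim is that the normalized remainder, divided by $\tilde{b}(t) = \rho t^{\rho}$, approaches the third order limit with $\beta = \rho$. A small sketch of ours:

```python
from math import log

def remainder(x, rho, s, gamma=1.0):
    """(log U(tx) - log U(t) - gamma log x)/b(t) - (x^rho - 1)/rho for the
    Burr model, written in terms of s = t^rho, with b(t) = gamma*s/(1 - s)."""
    d = -(gamma / rho) * (log(1 - s * x**rho) - log(1 - s))
    b = gamma * s / (1 - s)
    return d / b - (x**rho - 1) / rho

x, rho, s = 2.0, -0.5, 1e-4
# third order limit with beta = rho, as in (2.18)
limit = (1 / rho) * ((x**(2 * rho) - 1) / (2 * rho) - (x**rho - 1) / rho)
ratio = remainder(x, rho, s) / (rho * s)   # divide by b~(t) = rho * t^rho
```

For small $s$ (large $t$) the ratio agrees with the limit up to $O(s)$, confirming the choice $\tilde{b}(t) = \rho t^{\rho}(1 + o(1))$.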
Example 2.3.4. To verify that the absolute $T$ distribution satisfies the third order condition, it is convenient to choose
\[
b(t) = \frac{-\rho D_1 t^{\rho}}{1 + 2\left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}}.
\]
With this choice of $b(t)$ and (2.6) it follows that
\begin{align*}
\frac{\log U(tx) - \log U(t) - \gamma\log x}{b(t)} - \frac{x^{\rho}-1}{\rho} &= 2\left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}\,\frac{x^{\rho}-1}{\rho} \tag{2.19}\\
&\quad - \left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}\,\frac{x^{2\rho}-1}{\rho} + O(t^{2\rho}) \tag{2.20}\\
&= -2\rho\left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}\,\frac{1}{\rho}\left(\frac{x^{2\rho}-1}{2\rho} - \frac{x^{\rho}-1}{\rho}\right) + O(t^{2\rho}). \tag{2.21}
\end{align*}
From this we see that if we choose $\beta = \rho$ and $\tilde{b}(t)$ of the form $\tilde{b}(t) = -2\rho\left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}(1 + o(1))$, then the absolute $T$ distribution satisfies the third order condition.
We also have to add an extra condition on the kernel function.

Assumption 2.3.5. Let $K$ be a function defined on $(0,1)$ such that Assumption 2.2.4 is satisfied, together with the following extra condition:

(vi) $\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} = I_1(K,\rho) + o\left(\frac{1}{\sqrt{k}}\right)$, $k \to \infty$.

Lemma 2.3.6. The kernel function considered in Example 2.2.5, given by $K(t) := t^{\tau}(-\log t)^{\delta}$, also satisfies Assumption 2.3.5.

This result is easily obtained from the proof of Assumption 2.2.4 (iii), and its proof is hence omitted.
Similarly to the procedure in Theorem 2.2.7, we can make an asymptotic expansion of the statistic in (2.8) using the third order condition.

Theorem 2.3.7. (Goegebeur et al., 2010) Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.3.2. If Assumption 2.3.5 holds, then for $k, n \to \infty$ such that $k/n \to 0$ we have
\begin{align*}
T_{n,k}(K) \stackrel{D}{=}\;& \gamma\mu(K) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}} + b(Y_{n-k,n})\, I_1(K,\rho) + b(Y_{n-k,n})\,\tilde{\sigma}(K,\rho)\frac{P_k(K,\rho)}{\sqrt{k}}\\
&+ b(Y_{n-k,n})\,\tilde{b}(Y_{n-k,n})\, I_2(K,\rho,\beta)(1 + o_{\mathbb{P}}(1)) + b(Y_{n-k,n})\,O_{\mathbb{P}}\!\left(\frac{1}{\sqrt{k}}\right),
\end{align*}
where $N_k(K)$ and $P_k(K,\rho)$ are asymptotically standard normally distributed random variables.

We will not give a proof of this result; it follows along the same lines as the proof of Theorem 2.2.7. The result in Theorem 2.3.7 can be used to obtain the asymptotic expansion
\begin{align*}
T_{n,k}^{\alpha}(K) \stackrel{D}{=}\;& \gamma^{\alpha}\mu^{\alpha}(K) + \alpha\gamma^{\alpha}\mu^{\alpha-1}(K)\sigma(K)\frac{N_k(K)}{\sqrt{k}} + b(Y_{n-k,n})\,\alpha\gamma^{\alpha-1}\mu^{\alpha-1}(K)\, I_1(K,\rho)\\
&+ b(Y_{n-k,n})\,\tilde{b}(Y_{n-k,n})\,\alpha\gamma^{\alpha-1}\mu^{\alpha-1}(K)\, I_2(K,\rho,\beta)(1 + o_{\mathbb{P}}(1))\\
&+ b^2(Y_{n-k,n})\,\frac{\alpha(\alpha-1)}{2}\gamma^{\alpha-2}\mu^{\alpha-2}(K)\, I_1^2(K,\rho)(1 + o_{\mathbb{P}}(1)) + b(Y_{n-k,n})\,O_{\mathbb{P}}\!\left(\frac{1}{\sqrt{k}}\right).
\end{align*}
Before we can present the limiting distribution of the $\rho$ estimator defined in (2.15), we need to introduce the following notation, with $i, j \in \{1,\ldots,8\}$:
\begin{align*}
\bar{I}_2(K,\rho,\beta) &:= \frac{I_2(K,\rho,\beta)}{\mu(K)},\\
\bar{I}_2(K_i,K_j,\rho,\beta) &:= \frac{I_2(K_i,\rho,\beta)}{\mu(K_i)} - \frac{I_2(K_j,\rho,\beta)}{\mu(K_j)},\\
\bar{\sigma}(K) &:= \frac{\sigma(K)}{\mu(K)},\\
N_k(K_i,K_j) &:= \bar{\sigma}(K_i)N_k(K_i) - \bar{\sigma}(K_j)N_k(K_j),\\
N_k\!\left(K^{(1)},\alpha_1,\alpha_2,\gamma,\rho\right) &:= \frac{\alpha_1\gamma^{\alpha_1}N_k(K_1,K_2) - \psi\!\left(K^{(1)},\alpha_1,\alpha_2,\rho\right)\alpha_2\gamma^{\alpha_2}N_k(K_3,K_4)}{\alpha_2\gamma^{\alpha_2-1}\bar{I}_1^{(1)}(K_3,K_4,\rho)},\\
c_1\!\left(K^{(1)},\alpha_1,\alpha_2,\gamma,\rho,\beta\right) &:= \frac{\alpha_1\gamma^{\alpha_1-1}\bar{I}_2(K_1,K_2,\rho,\beta) - \psi\!\left(K^{(1)},\alpha_1,\alpha_2,\rho\right)\alpha_2\gamma^{\alpha_2-1}\bar{I}_2(K_3,K_4,\rho,\beta)}{\alpha_2\gamma^{\alpha_2-1}\bar{I}_1^{(1)}(K_3,K_4,\rho)},\\
c_2\!\left(K^{(1)},\alpha_1,\alpha_2,\gamma,\rho\right) &:= \frac{\alpha_1(\alpha_1-1)\gamma^{\alpha_1-2}\bar{I}_1^{(2)}(K_1,K_2,\rho) - \psi\!\left(K^{(1)},\alpha_1,\alpha_2,\rho\right)\alpha_2(\alpha_2-1)\gamma^{\alpha_2-2}\bar{I}_1^{(2)}(K_3,K_4,\rho)}{\alpha_2\gamma^{\alpha_2-1}\bar{I}_1^{(1)}(K_3,K_4,\rho)},\\
N_k\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right) &:= \frac{N_k\!\left(K^{(1)},\alpha_1,\alpha_1+l,\gamma,\rho\right) - \Lambda\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\rho\right)N_k\!\left(K^{(2)},\alpha_2,\alpha_2+l,\gamma,\rho\right)}{\psi\!\left(K^{(2)},\alpha_2,\alpha_2+l,\rho\right)},\\
c_1\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho,\beta\right) &:= \frac{c_1\!\left(K^{(1)},\alpha_1,\alpha_1+l,\gamma,\rho,\beta\right) - \Lambda\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\rho\right)c_1\!\left(K^{(2)},\alpha_2,\alpha_2+l,\gamma,\rho,\beta\right)}{\psi\!\left(K^{(2)},\alpha_2,\alpha_2+l,\rho\right)},\\
c_2\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right) &:= \frac{c_2\!\left(K^{(1)},\alpha_1,\alpha_1+l,\gamma,\rho\right) - \Lambda\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\rho\right)c_2\!\left(K^{(2)},\alpha_2,\alpha_2+l,\gamma,\rho\right)}{\psi\!\left(K^{(2)},\alpha_2,\alpha_2+l,\rho\right)},\\
v^2\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right) &:= \mathrm{Var}\!\left(N_k\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right)\right).
\end{align*}
With this notation we can state the asymptotic normality of our $\rho$ estimator.

Proposition 2.3.8. (Goegebeur et al., 2010) Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.3.2. If the kernel functions $K_1,\ldots,K_8$ satisfy Assumption 2.3.5 and are such that $\bar{I}_1^{(1)}(K_1,K_2,\rho)$, $\bar{I}_1^{(1)}(K_3,K_4,\rho)$, $\bar{I}_1^{(1)}(K_5,K_6,\rho)$ and $\bar{I}_1^{(1)}(K_7,K_8,\rho)$ are well defined and nonzero, then for $k, n \to \infty$ such that $k/n \to 0$, $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \infty$, $\sqrt{k}\,b\!\left(\frac{n}{k}\right)\tilde{b}\!\left(\frac{n}{k}\right) \to \lambda_1$ and $\sqrt{k}\,b^2\!\left(\frac{n}{k}\right) \to \lambda_2$, we have
\begin{align*}
&\sqrt{k}\,b\!\left(\frac{n}{k}\right)\left[\Lambda_{n,k}\!\left(K^{(1,2)},\alpha_1,\alpha_2,l\right) - \Lambda\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\rho\right)\right]\\
&\qquad \stackrel{D}{\to} N\!\left(\lambda_1 c_1\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho,\beta\right) + \lambda_2 c_2\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right),\; v^2\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right)\right).
\end{align*}
2.4 Appendix
2.4.1 Proof of Lemma 2.2.6
i)

Since $K(t) = \frac{1}{t}\,t^{\tau+1}(-\log t)^{\delta}$ it follows that
\[
\int_0^t u(v)\,dv = t^{\tau+1}(-\log t)^{\delta},
\]
and hence
\[
u(v) = (\tau+1)v^{\tau}(-\log v)^{\delta} - \delta v^{\tau}(-\log v)^{\delta-1}.
\]
Now
\begin{align*}
\left|(k+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} u(t)\,dt\right| &\leq (k+1)(\tau+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} t^{\tau}(-\log t)^{\delta}\,dt + (k+1)\delta\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} t^{\tau}(-\log t)^{\delta-1}\,dt\\
&\leq \frac{k+1}{j}(\tau+1)\int_0^{\frac{j}{k+1}}(-\log t)^{\delta}\,dt + (k+1)\delta\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}}(-\log t)^{\delta-1}\,dt.
\end{align*}
We distinguish between the two cases $\delta > 1$ and $\delta \leq 1$. We start with the case $\delta > 1$. Then
\[
\left|(k+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} u(t)\,dt\right| \leq \frac{k+1}{j}(\tau+1)\int_0^{\frac{j}{k+1}}(-\log t)^{\delta}\,dt + \frac{k+1}{j}\delta\int_0^{\frac{j}{k+1}}(-\log t)^{\delta-1}\,dt =: f\left(\frac{j}{k+1}\right).
\]
Next we show that $\int_0^1 f(x)\,dx < \infty$:
\begin{align*}
\int_0^1 f(x)\,dx &= (\tau+1)\int_0^1\frac{1}{x}\int_0^x(-\log t)^{\delta}\,dt\,dx + \delta\int_0^1\frac{1}{x}\int_0^x(-\log t)^{\delta-1}\,dt\,dx\\
&= (\tau+1)\int_0^1(-\log t)^{\delta}\int_t^1\frac{1}{x}\,dx\,dt + \delta\int_0^1(-\log t)^{\delta-1}\int_t^1\frac{1}{x}\,dx\,dt\\
&= (\tau+1)\Gamma(\delta+2) + \delta\Gamma(\delta+1) < \infty.
\end{align*}
ii)

The second part is easily verified using the following argument:
\[
\sigma^2(K) = \int_0^1 K^2(u)\,du = \int_0^1 u^{2\tau}(-\log u)^{2\delta}\,du \leq \int_0^1(-\log u)^{2\delta}\,du = \Gamma(2\delta+1) < \infty.
\]
iii)

A similar argument shows that $I_{12} = O\!\left(\frac{(\log(k+1))^{\delta}}{k+1}\right)$.

Concerning the term $I_2$ we find that
\begin{align*}
I_2 &\leq \int_{\log(k+1)}^{\infty} z^{\delta}e^{-z}\,dz = \frac{(\log(k+1))^{\delta}}{k+1} + \delta\int_{\log(k+1)}^{\infty} z^{\delta-1}e^{-z}\,dz\\
&= \frac{(\log(k+1))^{\delta}}{k+1}\left(1 + \frac{k+1}{(\log(k+1))^{\delta}}\,\delta\int_{\log(k+1)}^{\infty} z^{\delta-1}e^{-z}\,dz\right).
\end{align*}
If we can show that $\frac{k+1}{(\log(k+1))^{\delta}}\,\delta\int_{\log(k+1)}^{\infty} z^{\delta-1}e^{-z}\,dz \to 0$ as $k \to \infty$, then $I_2 = O\!\left(\frac{(\log k)^{\delta}}{k}\right)$. Using l'Hôpital's rule and Leibniz's rule it follows that
\[
\lim_{x\to\infty}\frac{\delta\int_{\log x}^{\infty} z^{\delta-1}e^{-z}\,dz}{\frac{(\log x)^{\delta}}{x}} = \lim_{x\to\infty}\frac{-\delta(\log x)^{\delta-1}e^{-\log x}\frac{1}{x}}{\frac{\delta(\log x)^{\delta-1} - (\log x)^{\delta}}{x^2}} = \lim_{x\to\infty}\frac{-\delta}{\delta - \log x} = 0.
\]
iv)

The fourth condition is trivially satisfied, since
\[
\max_{j\in\{1,\ldots,k\}}\left|K\left(\frac{j}{k+1}\right)\right| \leq (\log(k+1))^{\delta} = o(\sqrt{k}).
\]

v)

This condition is also trivially satisfied, since
\[
\int_0^1 |K(u)|\,u^{|\rho|-1-\epsilon}\,du = \int_0^1 u^{\tau+|\rho|-1-\epsilon}(-\log u)^{\delta}\,du = \frac{\Gamma(\delta+1)}{(\tau+|\rho|-\epsilon)^{\delta+1}} < \infty.
\]
Lemma 2.4.1. Let
\[
Z_k := \frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)E_j, \tag{2.22}
\]
where the $E_i$ are standard exponential random variables and $K(u)$, $0 < u < 1$, is a kernel function. Furthermore, let
\[
v_k = \sqrt{\frac{1}{k}\sum_{j=1}^{k} K^2\left(\frac{j}{k+1}\right)}. \tag{2.23}
\]
Then
\[
\frac{\sqrt{k}\left(Z_k - \frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\right)}{v_k} \stackrel{D}{\to} N(0,1) \iff \max_{1\leq j\leq k}\left|K\left(\frac{j}{k+1}\right)\right| = o\!\left(\sqrt{k}\,v_k\right) \tag{2.24}
\]
as $k \to \infty$. If further we have
\[
\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) = \mu(K) + o\!\left(\frac{1}{\sqrt{k}}\right), \qquad v_k \to \sigma(K) > 0, \tag{2.25}
\]
with $\mu(K)$ and $\sigma(K)$ finite, and
\[
\max_{1\leq j\leq k}\left|K\left(\frac{j}{k+1}\right)\right| = o(\sqrt{k}), \quad k \to \infty, \tag{2.26}
\]
then
\[
\frac{\sqrt{k}\,(Z_k - \mu(K))}{\sigma(K)} \stackrel{D}{\to} N(0,1) \tag{2.27}
\]
as $k \to \infty$.
Lemma 2.4.2. (Goegebeur et al., 2010) Denote by $E_1,\ldots,E_k$ standard exponential random variables and by $U_{1,k} \leq \cdots \leq U_{k,k}$ the order statistics of a random sample of size $k$ from $U(0,1)$. Assume that $\int_0^1 |K(u)|\,du < \infty$. Then
\[
\frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| E_j\left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right| = O_{\mathbb{P}}\!\left(\frac{1}{\sqrt{k}}\right).
\]

Lemma 2.4.3. (de Haan and Ferreira, 2006) Let $f$ be a measurable function satisfying
\[
\lim_{t\to\infty}\frac{f(tx) - f(t)}{b_0(t)} = \frac{x^{\gamma}-1}{\gamma}, \quad \forall x > 0,
\]
where $\gamma$ is a real parameter. Then for all $\epsilon, \delta > 0$ there is a $t_0 = t_0(\epsilon,\delta)$ such that for $t, tx \geq t_0$,
\[
\left|\frac{f(tx) - f(t)}{b_0(t)} - \frac{x^{\gamma}-1}{\gamma}\right| \leq \epsilon x^{\gamma}\max\left(x^{\delta}, x^{-\delta}\right),
\]
where
\[
b_0(t) := \begin{cases} \gamma f(t), & \gamma > 0,\\ -\gamma(f(\infty) - f(t)), & \gamma < 0,\\ f(t) - t^{-1}\int_0^t f(s)\,ds, & \gamma = 0. \end{cases} \tag{2.29}
\]
Chapter 3
Multivariate extreme value theory
In this chapter we introduce the basic limit laws in multivariate extreme value theory. After a transformation of the marginal distribution functions to standard Fréchet margins, we discuss the dependence structure between the variables. This discussion starts with the exponent and spectral measures, before we turn our attention to the max domain of attraction in the multivariate framework and asymptotic independence. This is followed by an introduction to several other dependence measures: the Pickands dependence function and the pair of dependence measures $\chi$ and $\bar{\chi}$. We explain the relation between all these dependence measures and discuss ways of getting from one to the other. Finally, we introduce the model of Ledford and Tawn (1997) and make the connection between the coefficient of tail dependence $\eta$ and the other dependence measures discussed previously.
3.1 Limit laws
The results we present in this section will be based on two-dimensional spaces. Generalizations to higher-dimensional spaces are straightforward but require heavier notation. Suppose $(X_1,Y_1),\ldots,(X_n,Y_n)$ are i.i.d. random vectors with distribution function $F_{XY}$. We define the maximum of a set of vectors of this form as
\[
M_n := (\max(X_1,\ldots,X_n),\; \max(Y_1,\ldots,Y_n)),
\]
which is simply the vector of componentwise maxima. We start by deriving an important theorem, which is the foundation of our description of the asymptotic distributions that can occur for an appropriately normalized maximum of the form of $M_n$. Suppose there exist sequences of constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$, sequences of positive constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$, and a distribution function $G$ with nondegenerate marginals such that
\[
\lim_{n\to\infty} P\!\left(\frac{\max(X_1,\ldots,X_n) - b_n}{a_n} \leq x,\; \frac{\max(Y_1,\ldots,Y_n) - d_n}{c_n} \leq y\right) = G(x,y) \tag{3.1}
\]
for all continuity points $(x,y)$ of $G$. Any limit distribution function $G$ in (3.1) with nondegenerate marginals is called a multivariate extreme value distribution. It follows that
\[
\lim_{n\to\infty} P\!\left(\frac{\max(X_1,\ldots,X_n) - b_n}{a_n} \leq x\right) = G(x,\infty)
\]
and
\[
\lim_{n\to\infty} P\!\left(\frac{\max(Y_1,\ldots,Y_n) - d_n}{c_n} \leq y\right) = G(\infty,y),
\]
since (3.1) implies convergence of the marginal distributions. According to Theorem 1.1.2 we can choose the constants $a_n$, $b_n$, $c_n$ and $d_n$ such that for some $\gamma_1, \gamma_2 \in \mathbb{R}$ we have
\[
G(x,\infty) = \exp\!\left(-(1+\gamma_1 x)^{-\frac{1}{\gamma_1}}\right) \tag{3.2}
\]
and
\[
G(\infty,y) = \exp\!\left(-(1+\gamma_2 y)^{-\frac{1}{\gamma_2}}\right). \tag{3.3}
\]
It is relevant to note that $G$ is continuous, since the two marginal distributions of $G$ are continuous.

If we let $F_X$ and $F_Y$ be the two marginal distributions of $F_{XY}$ and $U_X$ and $U_Y$ the two corresponding tail quantile functions, then according to Theorem 1.1.2 there are positive functions $a_X(t)$ and $a_Y(t)$ such that
\[
\lim_{t\to\infty}\frac{U_X(tx) - U_X(t)}{a_X(t)} = \frac{x^{\gamma_1}-1}{\gamma_1}, \quad \forall x > 0,
\]
and
\[
\lim_{t\to\infty}\frac{U_Y(tx) - U_Y(t)}{a_Y(t)} = \frac{x^{\gamma_2}-1}{\gamma_2}, \quad \forall x > 0.
\]
Hence
\[
\lim_{n\to\infty}\frac{U_X(nx) - b_n}{a_n} = \frac{x^{\gamma_1}-1}{\gamma_1} \qquad\text{and}\qquad \lim_{n\to\infty}\frac{U_Y(nx) - d_n}{c_n} = \frac{x^{\gamma_2}-1}{\gamma_2},
\]
if we choose the constants $a_n$, $b_n$, $c_n$ and $d_n$ according to Theorem 1.1.2. We easily see that (3.1) can be written as
\[
G(x,y) = \lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\; c_n y + d_n).
\]
If $x_n \to u$ and $y_n \to v$, then by the continuity of $G$ and the monotonicity of $F_{XY}$ we have that
\[
G(u,v) = \lim_{n\to\infty} F_{XY}^n(a_n x_n + b_n,\; c_n y_n + d_n).
\]
Applying this result with
\[
x_n := \frac{U_X(nx) - b_n}{a_n}, \quad x > 0, \qquad\text{and}\qquad y_n := \frac{U_Y(ny) - d_n}{c_n}, \quad y > 0,
\]
gives
\[
G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1}, \frac{y^{\gamma_2}-1}{\gamma_2}\right) = \lim_{n\to\infty} F_{XY}^n(U_X(nx), U_Y(ny)).
\]
These results establish the following theorem.
Theorem 3.1.1. (de Haan and Ferreira, 2006) Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be i.i.d. random vectors with distribution function $F_{XY}$. Suppose there exist sequences of real constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ and positive real constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ such that
\[
\lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\; c_n y + d_n) = G(x,y)
\]
for all continuity points $(x,y)$ of $G$, and the marginals of $G$ are standardized as in (3.2) and (3.3). Then, with $F_X(x) := F_{XY}(x,\infty)$, $F_Y(y) := F_{XY}(\infty,y)$ and $U_X$ and $U_Y$ the two corresponding tail quantile functions, we have that
\[
\lim_{n\to\infty} F_{XY}^n(U_X(nx), U_Y(ny)) = G_0(x,y) \tag{3.4}
\]
for all $x, y > 0$, where
\[
G_0(x,y) := G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1}, \frac{y^{\gamma_2}-1}{\gamma_2}\right)
\]
and $\gamma_1, \gamma_2$ are the marginal extreme value indices from (3.2) and (3.3).
Remark 3.1.2. The multivariate extreme value distribution function $G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1}, \frac{y^{\gamma_2}-1}{\gamma_2}\right)$ has marginal distributions which are standard Fréchet, i.e. $F_Z(z) = \exp\!\left(-\frac{1}{z}\right)$, $z > 0$. This fact simplifies matters, because now we only have to discuss the dependence structure between the two variables.

The following corollary, which we state without proof, is obtained from Theorem 3.1.1. For details we refer to de Haan and Ferreira (2006), Corollary 6.1.3 and Corollary 6.1.4.
Corollary 3.1.3. (de Haan and Ferreira, 2006) Under the conditions of Theorem 3.1.1, we have for any $(x,y)$ for which $0 < G_0(x,y) < 1$ that
\[
\lim_{n\to\infty} n\{1 - F_{XY}(U_X(nx), U_Y(ny))\} = -\log G_0(x,y) \tag{3.5}
\]
and
\[
\lim_{t\to\infty} t\{1 - F_{XY}(U_X(tx), U_Y(ty))\} = -\log G_0(x,y), \tag{3.6}
\]
where $t$ runs through the real numbers.
3.2 The exponent measure and the spectral measure
From Corollary 3.1.3 we can obtain the following useful theorem.
Theorem 3.2.1. (de Haan and Ferreira, 2006) Let $F_{XY}$ and $G_0$ be distribution functions such that for $x, y > 0$ with $0 < G_0(x,y) < 1$ we have
\[
\lim_{n\to\infty} n\{1 - F_{XY}(U_X(nx), U_Y(ny))\} = -\log G_0(x,y),
\]
where $U_X$ and $U_Y$ are the tail quantile functions of the marginals of $F_{XY}$. Then there are set functions $\nu, \nu_1, \nu_2, \ldots$, defined for all Borel sets $A \subset \mathbb{R}_+^2$ with
\[
\inf_{(x,y)\in A}\max(x,y) > 0,
\]
such that

(i)
\[
\nu_n\{(s,t) \in \mathbb{R}_+^2 : s > x \text{ or } t > y\} = n\{1 - F_{XY}(U_X(nx), U_Y(ny))\}, \tag{3.7}
\]
\[
\nu\{(s,t) \in \mathbb{R}_+^2 : s > x \text{ or } t > y\} = -\log G_0(x,y). \tag{3.8}
\]

(ii) For all $a > 0$ the set functions $\nu, \nu_1, \nu_2, \ldots$ are finite measures on $\mathbb{R}_+^2\setminus[0,a]^2$.

(iii) For each Borel set $A \subset \mathbb{R}_+^2$ with $\inf_{(x,y)\in A}\max(x,y) > 0$ and $\nu(\partial A) = 0$,
\[
\lim_{n\to\infty}\nu_n(A) = \nu(A). \tag{3.9}
\]

Definition 3.2.2. The measure $\nu$ from (3.8) is called the exponent measure of the extreme value distribution $G_0$, since
\[
G_0(x,y) = \exp(-\nu(A_{x,y}))
\]
with
\[
A_{x,y} := \{(s,t) \in \mathbb{R}_+^2 : s > x \text{ or } t > y\}.
\]
In the following we let $\nu(x,y) := \nu(A_{x,y})$.
An important property of the exponent measure, which will be needed later in this chapter, is that it is homogeneous of order $-1$, as given in Theorem 3.2.3.

Theorem 3.2.3. (de Haan and Ferreira, 2006) For any Borel set $A \subset \mathbb{R}_+^2$ with $\inf_{(x,y)\in A}\max(x,y) > 0$ and $\nu(\partial A) = 0$, and any $a > 0$,
\[
\nu(aA) = a^{-1}\nu(A),
\]
where $aA$ is the set obtained by multiplying all elements of $A$ by $a$.

From the exponent measure we can also obtain the spectral measure. The spectral measure arises when we make a one-to-one transformation $\mathbb{R}_+^2\setminus\{(0,0)\} \to (0,\infty)\times[0,c]$ for some $c > 0$,
\[
\begin{cases} r = r(x,y),\\ d = d(x,y), \end{cases}
\]
with the property that for all $a, x, y > 0$ we have
\[
\begin{cases} r(ax,ay) = a\,r(x,y),\\ d(ax,ay) = d(x,y). \end{cases}
\]
We can think of $r$ as a radius and $d$ as an angle or a direction. In this thesis we will only consider the transformation
\[
\begin{cases} r(x,y) = x+y,\\ d(x,y) = \frac{x}{x+y}, \end{cases}
\]
in which case the following theorem can be shown to hold.
Theorem 3.2.4. (de Haan and Ferreira, 2006) For each limit distribution $G$ from (3.1), (3.2) and (3.3) there exists a probability distribution (denoted by its distribution function $H$) concentrated on $[0,1]$ with mean $\frac{1}{2}$ such that for $x, y > 0$,
\[
G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1}, \frac{y^{\gamma_2}-1}{\gamma_2}\right) = G_0(x,y) = \exp\!\left(-2\int_0^1\left(\frac{\omega}{x}\vee\frac{1-\omega}{y}\right)dH(\omega)\right), \tag{3.10}
\]
where $\frac{\omega}{x}\vee\frac{1-\omega}{y} := \max\!\left(\frac{\omega}{x}, \frac{1-\omega}{y}\right)$.
From (3.10) we see that the limit distributions in (3.1) are characterized solely by the spectral measure $H$ and the marginal extreme value indices. Many transformations other than the one we considered can be chosen in order to construct a spectral measure; in fact there are endless possibilities. The transformation to choose depends on the situation at hand, and in a sense they are all equivalent, since one can be transformed into the other.

From (3.8) and (3.10) we see that the connection between the exponent measure and the spectral measure is given by
\[
\nu(x,y) = 2\int_0^1\left(\frac{\omega}{x}\vee\frac{1-\omega}{y}\right)dH(\omega).
\]
However, it is not always obvious how to get from one measure to the other using this relation. In case this is not obvious, and $G_0$ is absolutely continuous, we can use a method derived by Coles and Tawn (1991) to compute the spectral density from the exponent measure. In the bivariate case, the point masses of $H$ at 0 and 1 are
\[
H(\{0\}) = -\frac{1}{2}\lim_{x\to 0}\frac{\partial\nu}{\partial y}(x,y), \tag{3.11}
\]
\[
H(\{1\}) = -\frac{1}{2}\lim_{y\to 0}\frac{\partial\nu}{\partial x}(x,y), \tag{3.12}
\]
and the density for $0 < \omega < 1$ is given by
\[
h(\omega) = -\frac{1}{2}\left.\frac{\partial^2\nu(x,y)}{\partial x\,\partial y}\right|_{(\omega,1-\omega)}. \tag{3.13}
\]
Next we will consider some examples of spectral and exponent measures.
Example 3.2.5. We start by considering two important special cases of $H$. The first is the distribution function which places a point mass of 1 on $\omega = \frac{1}{2}$. In this case we obtain
\[
G_0(x,y) = \exp\!\left(-\max(x^{-1}, y^{-1})\right), \quad x, y > 0,
\]
which corresponds to complete dependence between the two variables. Here $G_0$ is not absolutely continuous, so the method discussed above does not apply. The second case is the distribution function which places point mass $\frac{1}{2}$ on both $\omega = 0$ and $\omega = 1$. In this case it follows that
\[
G_0(x,y) = \exp\!\left(-(x^{-1} + y^{-1})\right), \quad x, y > 0,
\]
which corresponds to independence between the two variables. Here $G_0$ is absolutely continuous, though with a spectral measure putting masses of $\frac{1}{2}$ at 0 and 1.
Example 3.2.6. The logistic model (Gumbel, 1960a,b), given by
\[
\nu(x,y) = \left(x^{-\frac{1}{\alpha}} + y^{-\frac{1}{\alpha}}\right)^{\alpha}, \quad x, y > 0,\; 0 < \alpha < 1,
\]
is the oldest parametric family of bivariate extreme value dependence structures. It is a versatile model which covers all levels of dependence from independent to completely dependent variables. We see that for $\alpha \to 0$ we get
\[
\nu(x,y) = \max(x^{-1}, y^{-1}),
\]
and for $\alpha \to 1$ it follows that
\[
\nu(x,y) = x^{-1} + y^{-1},
\]
which correspond to complete dependence and independence between the variables, respectively. The logistic model does, however, not allow for asymmetry in the dependence structure, as the variables are exchangeable.

From the exponent measure we can compute the point mass of $H$ at 0 using (3.11):
\[
H(\{0\}) = \frac{1}{2}\lim_{x\to 0}\, y^{-\frac{1}{\alpha}-1}\left(x^{-\frac{1}{\alpha}} + y^{-\frac{1}{\alpha}}\right)^{\alpha-1} = 0.
\]
By symmetry, the point mass of $H$ at 1 is also 0. The spectral density on $(0,1)$ can be found using (3.13). We start by finding
\[
\frac{\partial^2\nu(x,y)}{\partial x\,\partial y} = -\frac{1-\alpha}{\alpha}\,x^{-\frac{1}{\alpha}-1}y^{-\frac{1}{\alpha}-1}\left(x^{-\frac{1}{\alpha}} + y^{-\frac{1}{\alpha}}\right)^{\alpha-2}.
\]
From this we obtain the spectral density on $(0,1)$:
\[
h(\omega) = \frac{1}{2}\,\frac{1-\alpha}{\alpha}\,\omega^{-\frac{1}{\alpha}-1}(1-\omega)^{-\frac{1}{\alpha}-1}\left(\omega^{-\frac{1}{\alpha}} + (1-\omega)^{-\frac{1}{\alpha}}\right)^{\alpha-2}.
\]
3.3 Domain of attraction and asymptotic independence
In order to discuss the domain of attraction in the multivariate case we first need to introduce the concept of max stability.

Definition 3.3.1. A distribution function $G$ is called max stable if there exist sequences of constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ and sequences of positive constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ such that
\[
G^n(a_n x + b_n,\; c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},\; \forall n \in \mathbb{N}. \tag{3.14}
\]

With this definition we are now able to discuss the bivariate max domain of attraction.

Definition 3.3.2. Let $G : \mathbb{R}^2 \to \mathbb{R}_+$ be a max stable distribution function. A distribution function $F_{XY}$ is said to be in the max domain of attraction of $G$ if there exist sequences of constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ and sequences of positive constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ such that
\[
\lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\; c_n y + d_n) = G(x,y) \tag{3.15}
\]
for all $x, y \in \mathbb{R}$.
Our next proposition shows that the class of max stable distributions and the class of extreme value distributions coincide.

Proposition 3.3.3. A distribution function $G$ is max stable if and only if it is an extreme value distribution.

Proof. Assume $G$ is a max stable distribution. Then by Definition 3.3.1 there exist sequences of constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ and sequences of positive constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ such that
\[
G^n(a_n x + b_n,\; c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},\; \forall n \in \mathbb{N}.
\]
Since then
\[
\lim_{n\to\infty} G^n(a_n x + b_n,\; c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},
\]
it follows by Theorem 3.1.1 that $G$ is an extreme value distribution.

Now assume that $G$ is an extreme value distribution. We can, without loss of generality, assume that $G$ is of the same form as $G_0$ defined in Theorem 3.1.1. By Definition 3.2.2 and Theorem 3.2.3 it follows that
\[
G^n(nx, ny) = \exp(-n\nu(A_{nx,ny})) = \exp(-n\nu(nA_{x,y})) = \exp(-\nu(A_{x,y})) = G(x,y), \quad \forall x, y > 0,\; \forall n \in \mathbb{N}.
\]
So $G$ satisfies Definition 3.3.1 with $a_n = c_n = n$ and $b_n = d_n = 0$, and is hence a max stable distribution.
Next we present a theorem which gives some equivalent formulations of the max domain of attraction condition.

Theorem 3.3.4. (de Haan and Ferreira, 2006) Let $G$ be a max-stable distribution. Let the marginal distribution functions be $\exp(-(1+\gamma_1 x)^{-1/\gamma_1})$ and $\exp(-(1+\gamma_2 y)^{-1/\gamma_2})$, and let $H$ be its spectral measure according to the representation of Theorem 3.2.4. Then:

(i) If the distribution function $F_{XY}$ of the random vector $(X, Y)$ with continuous marginal distribution functions $F_X$ and $F_Y$ is in the max domain of attraction of $G$, then the following equivalent conditions are fulfilled:

(a) With $U_X$ and $U_Y$ being the tail quantile functions of $F_X$ and $F_Y$, we have, for $x, y > 0$, that
\[
\lim_{t \to \infty} \frac{1 - F_{XY}(U_X(tx), U_Y(ty))}{1 - F_{XY}(U_X(t), U_Y(t))} = S(x, y) \tag{3.16}
\]
with $S(x, y) := \dfrac{\log G\left(\frac{x^{\gamma_1} - 1}{\gamma_1}, \frac{y^{\gamma_2} - 1}{\gamma_2}\right)}{\log G(0, 0)}$.

(b) For all $r > 1$ and all $s \in [0, 1]$ that are continuity points of $H$,
\[
\lim_{t \to \infty} P\left(V + W > rt \text{ and } \frac{V}{V + W} \le s \,\Big|\, V + W > t\right) = r^{-1} H(s), \tag{3.17}
\]
where $V := \frac{1}{1 - F_X(X)}$ and $W := \frac{1}{1 - F_Y(Y)}$.
(ii) Conversely, if the continuous marginal distribution functions $F_X$ and $F_Y$ are in the domains of attraction of $\exp(-(1+\gamma_1 x)^{-1/\gamma_1})$ and $\exp(-(1+\gamma_2 y)^{-1/\gamma_2})$, respectively, and either of the limit relations (3.16) and (3.17) holds for some positive function $S$ or some distribution function $H$, then $F_{XY}$ is in the max domain of attraction of $G$.
We saw in Example 3.2.5 that there exists a special case of the spectral measure where the max-stable distribution has independent components. This motivates the following definition.

Definition 3.3.5. A random vector $(X, Y)$ whose distribution function $F_{XY}$ is in the domain of attraction of a max-stable distribution with independent components is said to have the property of asymptotic independence.
From this definition we obtain the following theorem.

Theorem 3.3.6. (de Haan and Ferreira, 2006) Let $F_{XY} : \mathbb{R}^2 \to \mathbb{R}_+$ be a probability distribution function. Suppose that its marginal distribution functions $F_X : \mathbb{R} \to \mathbb{R}_+$ and $F_Y : \mathbb{R} \to \mathbb{R}_+$ satisfy
\[
\lim_{n \to \infty} F_X^n(a_n x + b_n) = \exp(-(1+\gamma_1 x)^{-1/\gamma_1})
\quad \text{and} \quad
\lim_{n \to \infty} F_Y^n(c_n y + d_n) = \exp(-(1+\gamma_2 y)^{-1/\gamma_2})
\]
for all $x, y$ for which $1 + \gamma_1 x > 0$ and $1 + \gamma_2 y > 0$, where $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ are sequences of real constants and $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ are sequences of positive real constants. Let $(X, Y)$ be a random vector with distribution function $F_{XY}$. If
\[
\lim_{t \to \infty} \frac{P(X > U_X(t), Y > U_Y(t))}{P(Y > U_Y(t))} = 0, \tag{3.18}
\]
then
\[
\lim_{n \to \infty} F_{XY}^n(a_n x + b_n, c_n y + d_n) = \exp\left(-(1+\gamma_1 x)^{-1/\gamma_1} - (1+\gamma_2 y)^{-1/\gamma_2}\right)
\]
for $1 + \gamma_1 x > 0$ and $1 + \gamma_2 y > 0$; hence $X$ and $Y$ are asymptotically independent. Conversely, asymptotic independence entails (3.18).
Proof. Assume (3.18) holds. Then also
\[
\lim_{t \to \infty} \frac{tP(X > U_X(t), Y > U_Y(t))}{tP(Y > U_Y(t))} = 0.
\]
Using Theorem 1.1.2 (i) and (iii) with $x = 0$ we find that
\[
\lim_{t \to \infty} tP(Y > U_Y(t)) = 1, \tag{3.19}
\]
and hence
\[
\lim_{t \to \infty} tP(X > U_X(t), Y > U_Y(t)) = 0.
\]
Because of monotonicity, it follows that
\[
\lim_{t \to \infty} tP(X > U_X(tx), Y > U_Y(ty)) = 0, \quad \forall x, y > 0,
\]
and then also, for the set $\tilde{A}_{x,y} := \left\{(s, t) \in \mathbb{R}^2_+ : s > x \text{ and } t > y\right\}$, we have
\[
\nu\left(\tilde{A}_{x,y}\right) = \lim_{n \to \infty} \nu_n\left(\tilde{A}_{x,y}\right) = \lim_{n \to \infty} nP(X > U_X(nx), Y > U_Y(ny)) = 0, \quad \forall x, y > 0.
\]
This means that the exponent measure puts its entire mass on the axes $x = 0$ and $y = 0$; equivalently, the spectral measure satisfies
\[
H[\{0\}] = \tfrac{1}{2} \quad \text{and} \quad H[\{1\}] = \tfrac{1}{2}.
\]
This is equivalent to $X$ and $Y$ being asymptotically independent.

Conversely, assume that $X$ and $Y$ are asymptotically independent. Then
\[
G_0(x, y) = \exp\left(-x^{-1} - y^{-1}\right), \quad x, y > 0,
\]
and hence for $x = y = 1$ we have $G_0(1, 1) = \exp(-2)$. Using Corollary 3.1.3, this implies that
\begin{align}
2 &= \lim_{t \to \infty} t\left(1 - P(X \le U_X(t), Y \le U_Y(t))\right) \tag{3.20} \\
&= \lim_{t \to \infty} t\left(P(X > U_X(t)) + P(Y > U_Y(t)) - P(X > U_X(t), Y > U_Y(t))\right). \tag{3.21}
\end{align}
From Theorem 1.1.2 (i) and (iii) it follows that
\[
\lim_{t \to \infty} tP(X > U_X(t), Y > U_Y(t)) = 0,
\]
and hence, by (3.19), we have that
\[
\lim_{t \to \infty} \frac{P(X > U_X(t), Y > U_Y(t))}{P(Y > U_Y(t))} = 0.
\]
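To make condition (3.18) concrete, the following small numerical sketch (my own illustration, not from the thesis) uses the Farlie-Gumbel-Morgenstern copula $C(u, v) = uv(1 + \theta(1-u)(1-v))$, a standard example of a family that is dependent for $\theta \neq 0$ yet asymptotically independent: on the copula scale the ratio in (3.18) equals $\frac{1}{t}\left(1 + \theta(1 - 1/t)^2\right)$ and tends to $0$.

```python
def fgm_copula(u, v, theta=0.8):
    # Farlie-Gumbel-Morgenstern copula; valid for theta in [-1, 1].
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def exceedance_ratio(t, theta=0.8):
    # P(X > U_X(t), Y > U_Y(t)) / P(Y > U_Y(t)) on the copula scale:
    # both margins exceed their (1 - 1/t)-quantile, and P(Y > U_Y(t)) = 1/t.
    u = 1.0 - 1.0 / t
    joint_exceed = 1.0 - 2.0 * u + fgm_copula(u, u, theta)
    return joint_exceed / (1.0 / t)

ratios = [exceedance_ratio(t) for t in (10.0, 100.0, 10000.0)]
# The ratio decays like (1 + theta)/t, consistent with (3.18).
assert ratios[0] > ratios[1] > ratios[2]
assert ratios[2] < 1e-3
```

The design choice here is to work directly with the copula so that the tail quantile functions drop out: the threshold $U(t)$ corresponds exactly to the level $u = 1 - 1/t$ on the uniform scale.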
3.4 Pickands dependence function
Whereas the dependence measures we have discussed previously generalize straightforwardly from the bivariate case to the multivariate case, this is not true for the following dependence measure, which is strictly bivariate. The dependence measure we are going to discuss is related to the function $L : \mathbb{R}^2_+ \to \mathbb{R}$ given by
\[
L(x, y) := -\log G_0\left(\frac{1}{x}, \frac{1}{y}\right). \tag{3.22}
\]
This can also be expressed in terms of the exponent measure as
\[
L(x, y) = \nu\left\{(s, t) \in \mathbb{R}^2_+ : s > \frac{1}{x} \text{ or } t > \frac{1}{y}\right\}
\]
using (3.8), or in terms of the spectral measure as
\[
L(x, y) = 2\int_0^1 (\omega x \vee (1 - \omega)y)\, dH(\omega) \tag{3.23}
\]
using (3.10). The function $L$ has the following properties, which are easy to derive from the properties of the exponent and spectral measures and will therefore, for brevity, not be proven here.

Proposition 3.4.1. (de Haan and Ferreira, 2006) Let $L$ be as defined in (3.22). Then $L$ has the following properties.

(i) Homogeneity of order 1: $L(ax, ay) = aL(x, y)$, for all $a, x, y > 0$.

(ii) $L(x, 0) = L(0, x) = x$, for all $x > 0$.

(iii) $x \vee y \le L(x, y) \le x + y$, for all $x, y > 0$.

(iv) Let $(X, Y)$ be a random vector with distribution function $G_0(x, y)$. If $X$ and $Y$ are independent, then $L(x, y) = x + y$, for $x, y > 0$. If $X$ and $Y$ are completely dependent, then $L(x, y) = x \vee y$, for $x, y > 0$.

(v) $L$ is continuous.

(vi) $L$ is convex: $L(\lambda(x_1, y_1) + (1 - \lambda)(x_2, y_2)) \le \lambda L(x_1, y_1) + (1 - \lambda)L(x_2, y_2)$ for all $x_1, x_2, y_1, y_2 > 0$ and $\lambda \in [0, 1]$.
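The properties in Proposition 3.4.1 are easy to check numerically on a concrete model. The sketch below (my own illustration; the logistic model is a standard example but the code is not from the thesis) uses the logistic stable tail dependence function $L(x, y) = (x^{1/\alpha} + y^{1/\alpha})^{\alpha}$, $0 < \alpha \le 1$, and verifies properties (i)-(iii) and (vi) at a few points.

```python
def L(x, y, alpha=0.5):
    # Logistic stable tail dependence function (x^(1/a) + y^(1/a))^a.
    if x == 0.0:
        return y
    if y == 0.0:
        return x
    return (x ** (1.0 / alpha) + y ** (1.0 / alpha)) ** alpha

for (x, y) in [(0.5, 1.5), (1.0, 1.0), (2.0, 0.3)]:
    # (i) homogeneity of order 1: L(ax, ay) = a L(x, y)
    assert abs(L(3.0 * x, 3.0 * y) - 3.0 * L(x, y)) < 1e-12
    # (iii) max(x, y) <= L(x, y) <= x + y
    assert max(x, y) <= L(x, y) <= x + y
# (ii) L(x, 0) = L(0, x) = x
assert L(2.0, 0.0) == 2.0 and L(0.0, 2.0) == 2.0
# (vi) convexity along one segment between two sample points
lam = 0.3
(x1, y1), (x2, y2) = (0.5, 1.5), (2.0, 0.3)
mid = L(lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2)
assert mid <= lam * L(x1, y1) + (1 - lam) * L(x2, y2) + 1e-12
```

For $\alpha = \frac{1}{2}$ the logistic $L$ is simply the Euclidean norm $\sqrt{x^2 + y^2}$, which makes the homogeneity, bound, and convexity checks transparent; as $\alpha \to 1$ it approaches independence, $L(x, y) = x + y$, and as $\alpha \to 0$ complete dependence, $L(x, y) = x \vee y$.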
From the function $L$ we can obtain the Pickands dependence function $A : [0, 1] \to \mathbb{R}$, introduced in Pickands (1981). This function is given by
\[
A(t) := -\log G_0\left(\frac{1}{1-t}, \frac{1}{t}\right) = L(1-t, t). \tag{3.24}
\]
If we let $t = \frac{y}{x+y}$, we easily find that
\[
L(x, y) = (x + y)\, A\left(\frac{y}{x+y}\right),
\]
and hence the Pickands dependence function completely determines the function $L$.

The Pickands dependence function can easily be connected to the spectral measure through the function $L$. If we combine (3.23) and (3.24) we get
function L. If we combine (3.23) and (3.24) we get
A(t) = 2
∫[0,1]
(ω(1− t) ∨ (1− ω)t)dH(ω)
= 2t
∫[0,t]
(1− ω)dH(ω) + 2(1− t)∫(t,1]
ωdH(ω).
Since H has mean 12 we have that∫[0,1] ωdH(ω) =
∫[0,1](1−ω)dH(ω) =
12 . Using this it follows
that ∫(t,1]
ωdH(ω) =1
2−H([0, t]) +
∫[0,t]
(1− ω)dH(ω).
Hence
\[
A(t) = 2\int_{[0,t]} (1-\omega)\, dH(\omega) + (1-t)\left(1 - 2H([0,t])\right).
\]
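The two expressions for $A(t)$, the integral form from (3.23)-(3.24) and the form just derived in terms of $H([0,t])$, can be checked against each other on discrete spectral measures, where the integrals become finite sums. The sketch below (my own illustration; the atom representation and function names are assumptions, not thesis notation) confirms that $H(\{0\}) = H(\{1\}) = \frac{1}{2}$ gives $A \equiv 1$ (independence) and that a single atom of mass 1 at $\frac{1}{2}$ gives $A(t) = t \vee (1-t)$ (complete dependence).

```python
def A_spectral(t, atoms):
    # A(t) = 2 * integral of (w(1-t) v (1-w)t) dH(w), for a discrete
    # spectral measure H given as a list of (location, mass) atoms.
    return 2.0 * sum(m * max(w * (1.0 - t), (1.0 - w) * t) for w, m in atoms)

def A_closed(t, atoms):
    # Equivalent form: 2 * int_{[0,t]} (1-w) dH(w) + (1-t)(1 - 2 H([0,t])).
    part = sum(m * (1.0 - w) for w, m in atoms if w <= t)
    h0t = sum(m for w, m in atoms if w <= t)
    return 2.0 * part + (1.0 - t) * (1.0 - 2.0 * h0t)

indep = [(0.0, 0.5), (1.0, 0.5)]   # H({0}) = H({1}) = 1/2
dep = [(0.5, 1.0)]                 # all mass at 1/2

for t in (0.1, 0.25, 0.5, 0.9):
    assert abs(A_spectral(t, indep) - 1.0) < 1e-12          # independence: A = 1
    assert abs(A_spectral(t, dep) - max(t, 1 - t)) < 1e-12  # complete dependence
    for H in (indep, dep):
        assert abs(A_spectral(t, H) - A_closed(t, H)) < 1e-12
```

Both spectral measures have mean $\frac{1}{2}$, as the derivation above requires.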
The term $\int_{[0,t]} (1-\omega)\, dH(\omega)$ can also be written as
\begin{align*}
\int_{[0,t]} (1-\omega)\, dH(\omega) &= \int_{[0,t]} \int_{[\omega,1]} du\, dH(\omega) \\
&= \int_{[0,1]} \int_{[0, u \wedge t]} dH(\omega)\, du \\
&= \int_{[0,t]} \int_{[0,u]} dH(\omega)\, du + \int_{(t,1]} \int_{[0,t]} dH(\omega)\, du
\end{align*}