Estimation of tail dependence with application to twin data
Master thesis by
Michael Osmann
May 21, 2012
Supervisors: Yuri Goegebeur and Jacob Hjelmborg
Contents

Abstract
Acknowledgements
1 Preliminaries
1.1 Classical convergence result
1.2 The Gumbel class
1.3 The extremal Weibull class
1.4 Estimation of the extreme value index in practice
2 Pareto-type distributions
2.1 Domain of attraction
2.2 Estimation of the extreme value index
2.3 Estimation of the second order parameter
2.4 Appendix
2.4.1 Proof of Lemma 2.2.6
2.4.2 Lemmas needed in the proof of Theorem 2.2.7
3 Multivariate extreme value theory
3.1 Limit laws
3.2 The exponent measure and the spectral measure
3.3 Domain of attraction and asymptotic independence
3.4 Pickands dependence function
3.5 The dependence measures χ and χ̄
3.6 The model of Ledford and Tawn
4 Estimation of the coefficient of tail dependence and the second order parameter in bivariate extreme value statistics
4.1 Estimation of the coefficient of tail dependence
4.2 Estimation of the second order parameter
4.3 Appendix
4.3.1 Proof of Lemma 4.1.3
5 Simulation study
5.1 Copula examples and simulation of data
5.2 Estimation of the second order parameter τ
5.3 Estimation of the first order parameter η
5.4 Estimation of the dependence measures χ and χ̄
6 Estimation of tail dependence in BMI twin data
6.1 Description of the data
6.2 Univariate analysis
6.3 Multivariate analysis
Epilogue
Bibliography
Abstract
This master thesis consists of a theoretical discussion of univariate and bivariate extreme value statistics along with an application to twin data. We first discuss the fundamental convergence results from extreme value theory, which we use to construct the traditional maximum likelihood estimators of the extreme value index. In order to put our work into the proper framework, attention is paid to the three classes of extreme value distributions situated within the max domain of attraction of the generalized extreme value distribution. Special attention is given to the class of Pareto-type distributions, since the methodology used to construct estimators in the multivariate setting resembles the methodology used to construct estimators within the class of Pareto-type distributions. For the class of Pareto-type distributions we propose an estimator of the extreme value index and an estimator of the second order parameter. For both of these estimators we establish asymptotic normality.
In the multivariate setting we start by discussing the transformation of the margins to standard Fréchet distributions and the fundamental convergence results. We discuss the domain of attraction of the bivariate extreme value distribution, asymptotic dependence and asymptotic independence. We furthermore discuss the exponent measure, the spectral measure, Pickands dependence function, the dependence measures χ and χ̄, and finally the coefficient of tail dependence. The interpretations of these measures are discussed and we show how they are all connected. For the coefficient of tail dependence we introduce a functional estimator, for which we show how it can be bias corrected. This bias correction requires estimation of the second order parameter τ, so we propose two estimators that can be used to estimate this second order parameter. The consistency of the estimators for the second order parameter is established. We examine the finite sample size behaviour of our estimator for the coefficient of tail dependence, the estimators of the second order parameter, and the estimators of χ and χ̄ using simulations.
The twin data we consider are from the older cohort of the Finnish Twin Cohort Study. For these data we make a full univariate data analysis and estimate the coefficient of tail dependence, the second order parameter τ, and the measures χ and χ̄ for age and sex defined subsets of the data.
Throughout the thesis, results that are from the literature are stated with a reference, while results that are our own are not.
Acknowledgements
I would like to thank my two supervisors Yuri Goegebeur and Jacob v. B. Hjelmborg for helping me write this master thesis during the last 8 months. I would not have been able to write this thesis without their help, and they have both spent a lot of time and effort on it. I am grateful that they decided to join forces and help me write a thesis on such an interesting topic.
Chapter 1
Preliminaries
This chapter serves to give a short introduction to some of the basic concepts in univariate extreme value statistics. First we introduce a convergence result which is the foundation of univariate extreme value statistics: it states what form the limiting distribution of a normalized maximum must take, if it exists. We then briefly describe two of the classes of extreme value distributions, known as the Gumbel and extremal Weibull families, respectively. Finally, we discuss some simple ways in which the extreme value index can be estimated in practice.
1.1 Classical convergence result
In the following we will consider a sample {X_i, 1 ≤ i ≤ n} of independent and identically distributed (i.i.d.) random variables having a distribution function F_X. In extreme value statistics we consider either the maximum or the minimum of the random sample, where the maximum is given by

    X_{n,n} := max{X_1, X_2, . . . , X_n}.
We will try to describe the statistical behaviour of this maximum, but it is easy to transform
any result we obtain for the maximum to the minimum because of the relation
    X_{1,n} := min{X_1, X_2, . . . , X_n} = −max{−X_1, −X_2, . . . , −X_n}.    (1.1)
Because of the i.i.d. nature of X_1, . . . , X_n, the distribution of X_{n,n} can be derived exactly for all possible values of n as follows:

    F_{X_{n,n}}(x) = P(X_{n,n} ≤ x)
                   = P(X_1 ≤ x, X_2 ≤ x, . . . , X_n ≤ x)
                   = P(X_1 ≤ x) P(X_2 ≤ x) · · · P(X_n ≤ x)
                   = (F_X(x))^n.
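As a purely illustrative aside (not part of the theoretical development), the product formula above is easy to sanity-check by simulation. The following Python sketch compares a Monte Carlo estimate of P(X_{n,n} ≤ x) with the exact expression for the standard exponential distribution; the sample sizes and evaluation point are arbitrary:

```python
import math
import random

random.seed(42)

def exact_max_cdf(x, n):
    # F_{X_{n,n}}(x) = (F_X(x))^n, here for the standard exponential F_X(x) = 1 - exp(-x)
    return (1.0 - math.exp(-x)) ** n

n, reps, x = 5, 100_000, 2.0
# Monte Carlo estimate of P(X_{n,n} <= x): the fraction of maxima of n exponentials below x
hits = sum(max(random.expovariate(1.0) for _ in range(n)) <= x for _ in range(reps))
empirical = hits / reps
assert abs(empirical - exact_max_cdf(x, n)) < 0.01
```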
For practical purposes this relation does not help much though, since the distribution F_X is usually unknown. One could try to estimate F_X and use this to estimate F_{X_{n,n}}, but small deviations in the estimation of F_X can lead to large deviations in the estimation of F_{X_{n,n}}. Instead we will look for approximate families of F_{X_{n,n}} which for large n can be estimated by use of the extreme data only.
We look at the behaviour of F_{X_{n,n}} as n approaches infinity. If we denote the right endpoint of F_X by x*, which means that x* := inf{x : F_X(x) = 1}, then for any x < x* we have that F_X^n(x) → 0 as n → ∞. So the distribution of X_{n,n} is degenerate in the limit. This degeneracy can possibly be avoided if we look at an appropriate normalization, for instance

    (X_{n,n} − b_n)/a_n,

where (b_n)_{n=1}^∞ is a sequence of constants and (a_n)_{n=1}^∞ is a sequence of positive constants. Appropriate choices of (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ can stabilize the location and scale of (X_{n,n} − b_n)/a_n. It can be shown that the entire range of limit distributions of (X_{n,n} − b_n)/a_n, if they exist, is given by
Theorem 1.1.1.
Theorem 1.1.1. (Fisher and Tippett, 1928; Gnedenko, 1943) Let X_1, . . . , X_n be i.i.d. random variables with distribution function F_X. If there exist sequences of constants (b_n)_{n=1}^∞ and positive constants (a_n)_{n=1}^∞ such that

    lim_{n→∞} P((X_{n,n} − b_n)/a_n ≤ x) = lim_{n→∞} F_X^n(a_n x + b_n) = G(x)    (1.2)

at all continuity points of G, where G is a non-degenerate distribution function, then G must be of the following type:

    G_γ(x) = exp(−(1 + γx)^{−1/γ}), 1 + γx > 0,    (1.3)

with γ real and where for γ = 0 the right-hand side is interpreted as exp(−e^{−x}).
This family of distribution functions is known as the generalized extreme value (GEV) family, for which the parameter γ is the shape parameter. This parameter is also called the extreme value index and it describes the tail behaviour of F_X, with larger values indicating heavier tails. The family consists of three classes known as the Gumbel, Fréchet and extremal Weibull families, which correspond to γ = 0, γ > 0 and γ < 0 respectively. The Fréchet class is also known as the class of Pareto-type models. If the distribution F_X satisfies (1.2)-(1.3) then we say that it belongs to the max domain of attraction of G_γ, denoted F_X ∈ D(G_γ).
The result in Theorem 1.1.1 has some equivalent formulations. Some of these formulations are based on the tail quantile function U(y) := Q(1 − 1/y), y > 1, where Q is the quantile function, defined as Q(p) := inf{x : F_X(x) ≥ p}, p ∈ (0, 1). These equivalent formulations are stated in Theorem 1.1.2.
Theorem 1.1.2. (Gnedenko, 1943; de Haan and Ferreira, 2006) Let X_1, . . . , X_n be i.i.d. random variables with distribution function F_X. For γ ∈ R the following statements are equivalent:

(i) There exist sequences of real constants (b_n)_{n=1}^∞ and positive real constants (a_n)_{n=1}^∞ such that

    lim_{n→∞} F_X^n(a_n x + b_n) = G_γ(x) = exp(−(1 + γx)^{−1/γ}),    (1.4)

for all x with 1 + γx > 0.

(ii) There is a positive function a such that for all x > 0,

    lim_{t→∞} (U(tx) − U(t))/a(t) = (x^γ − 1)/γ,    (1.5)

where for γ = 0 the right-hand side is interpreted as log x.

(iii) There is a positive function a such that

    lim_{t→∞} t(1 − F_X(a(t)x + U(t))) = (1 + γx)^{−1/γ},    (1.6)

for all x with 1 + γx > 0.

(iv) There exists a positive function f such that

    lim_{t↑x*} (1 − F_X(t + xf(t)))/(1 − F_X(t)) = (1 + γx)^{−1/γ}    (1.7)

for all x for which 1 + γx > 0.

Moreover, (1.4) holds with b_n := U(n) and a_n := a(n). Also (1.7) holds with f(t) = a(1/(1 − F_X(t))).
As seen in Theorem 1.1.2 the choice of the normalizing constant b_n does not depend on the sign of γ and can be shown to always work, if we choose b_n = U(n). The choice of a_n depends on whether γ is positive, negative or equal to zero, so we will address this in the sections dedicated to the corresponding classes.
In order to discuss the extremal Weibull and Fréchet classes, we need the concept of a slowly varying function. Slowly varying functions are special cases of regularly varying functions, so we will give the definition of what it means to be of regular variation. Regularly varying functions will also be needed later in this thesis.
Definition 1.1.3. (Beirlant et al., 2004, Definition 2.1) Let f be an ultimately positive and measurable function on R+. We say that f is regularly varying at infinity if there exists a real constant ρ for which

    lim_{x→∞} f(λx)/f(x) = λ^ρ for all λ > 0.

We write f ∈ R_ρ and we call ρ the index of regular variation. In the case ρ = 0, the function will be called slowly varying or of slow variation. We will reserve the symbol l for such functions. The class of all regularly varying functions is denoted by R.
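As an illustrative aside, the defining limit is easy to probe numerically. The sketch below uses f(x) = x^ρ log x, which is regularly varying with index ρ because log x is slowly varying; the choice of ρ and of the evaluation points is arbitrary:

```python
import math

# f(x) = x**rho * log(x) is regularly varying with index rho (log is slowly varying)
rho = -0.5
f = lambda x: x**rho * math.log(x)

def ratio(lam, x):
    return f(lam * x) / f(x)

# the ratio f(lam*x)/f(x) approaches lam**rho as x grows
d_small = abs(ratio(10.0, 1e4) - 10.0**rho)
d_large = abs(ratio(10.0, 1e12) - 10.0**rho)
assert d_large < d_small and d_large < 0.05
```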
The next two sections will be dedicated to the Gumbel and the extremal Weibull classes, while the Fréchet class, which is of more importance for this thesis, will be discussed in the next chapter.
1.2 The Gumbel class
The Gumbel class corresponds with the max domain of attraction of G_γ with γ = 0. The following proposition provides a characterization of the distributions that belong to this class.

Proposition 1.2.1. (Gnedenko, 1943) Let X be a random variable with distribution function F_X. Then we have, for x* finite or infinite and a, f suitable positive functions, that

    F_X ∈ D(G_0) ⇔ lim_{t↑x*} (1 − F_X(t + xf(t)))/(1 − F_X(t)) = exp(−x), x ∈ R    (1.8)
                 ⇔ lim_{t→∞} (U(tx) − U(t))/a(t) = log(x), x > 0.    (1.9)
For the Fréchet and extremal Weibull classes it is easy to show that the distributions belonging to those classes satisfy (1.5), but this is not the case for the Gumbel class. This also means that determining the scaling parameter a_n for the distributions in the Gumbel class is more difficult. It can however be determined by the formula

    a_n = n ∫_{U(n)}^{x*} (1 − F_X(y)) dy.

We will not derive this formula, but simply take it as a fact. For details we refer to de Haan and Ferreira (2006), Corollary 1.2.4.
Example 1.2.2. If we want to determine the constants a_n and b_n for the exp(1) distribution with distribution function F_X(x) = 1 − exp(−x), x > 0, then we must first find the tail quantile function of the exponential distribution. The distribution function has quantile function Q(p) = − log(1 − p), 0 < p < 1. So

    U(x) = Q(1 − 1/x) = log(x), x > 1.

This means b_n can be chosen as

    b_n = U(n) = log(n),

and a_n can be chosen as

    a_n = n ∫_{log(n)}^{∞} exp(−x) dx = n exp(−log(n)) = 1.

Since we know the constants a_n and b_n we can also show that the exponential distribution belongs to the max domain of attraction of the Gumbel class. Indeed,

    P((X_{n,n} − b_n)/a_n ≤ x) = F_X^n(a_n x + b_n)
                               = F_X^n(x + log(n))
                               = (1 − exp(−x − log(n)))^n
                               = (1 − exp(−x)/n)^n → exp(−exp(−x)) for n → ∞.
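As an illustrative aside, the pointwise convergence in Example 1.2.2 can be tabulated directly; the evaluation points and values of n below are arbitrary:

```python
import math

def F(y):
    # standard exponential distribution function
    return 1.0 - math.exp(-y) if y > 0 else 0.0

def Fn(x, n):
    # F_X^n(a_n x + b_n) with a_n = 1 and b_n = log(n)
    return F(x + math.log(n)) ** n

def G(x):
    # the Gumbel limit exp(-exp(-x))
    return math.exp(-math.exp(-x))

# maximal approximation error over a few points, for growing n
errs = [max(abs(Fn(x, n) - G(x)) for x in (-1.0, 0.0, 1.0, 2.0)) for n in (2, 5, 10, 100)]
assert errs == sorted(errs, reverse=True)  # the error shrinks as n grows
```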
The convergence of F_X^n(a_n x + b_n) to G(x) is shown in Figure 1.1. The solid line is G(x), the dashed line is for n = 2, the dotted line is for n = 5 and the dash-dotted line is for n = 10. It is clearly seen that as n grows, F_X^n(a_n x + b_n) converges pointwise to G(x).

[Figure 1.1: The convergence of F_X^n(a_n x + b_n) to G(x) for the standard exponential distribution.]

□
1.3 The extremal Weibull class
The extremal Weibull class corresponds with the max domain of attraction of G_γ with γ < 0. As was the case for the Gumbel class, we have a proposition which provides a characterization of the distributions that belong to this class.

Proposition 1.3.1. (Gnedenko, 1943) Let X be a random variable with distribution function F_X. Then we have for x* finite that

    F_X ∈ D(G_γ), γ < 0 ⇔ 1 − F_X(x* − 1/x) = x^{1/γ} l_{F_X}(x), x > 0    (1.10)
                        ⇔ U(x) = x* − x^γ l_U(x), x > 1,    (1.11)

where l_U(x) and l_{F_X}(x) are slowly varying at infinity.
From (1.11) it is easily seen that (1.5) is satisfied when t tends to infinity. Indeed,

    (U(tx) − U(t))/a(t) = (x* − (tx)^γ l_U(tx) − (x* − t^γ l_U(t)))/a(t)
                        = (t^γ l_U(t)/a(t)) (1 − x^γ l_U(tx)/l_U(t))
                        ∼ −γ (t^γ l_U(t)/a(t)) (x^γ − 1)/γ
                        ∼ (x^γ − 1)/γ,

if we choose a(t) such that a(t)/(x* − U(t)) → −γ. This indicates that a good choice of a_n would be

    a_n = a(n) = −γ(x* − U(n)) = −γ n^γ l_U(n).
Example 1.3.2. The reversed Burr distribution has distribution function given by

    F_X(x) = 1 − (ζ/(ζ + (1 − x)^{−δ}))^λ, x < 1; λ, ζ, δ > 0,

and so the quantile function is

    Q(p) = 1 − ζ^{−1/δ} ((1 − p)^{−1/λ} − 1)^{−1/δ}, 0 < p < 1.

So we find the tail quantile function U to be

    U(x) = Q(1 − 1/x) = 1 − ζ^{−1/δ} (x^{1/λ} − 1)^{−1/δ}, x > 1.

The distribution belongs to the max domain of attraction of G_γ with γ = −1/(λδ). If we consider the reversed Burr distribution with parameters λ = ζ = δ = 1, then we can choose the normalizing constant b_n as

    b_n = U(n) = 1 − (n − 1)^{−1}.

Since x* = 1 and γ = −1 we can choose the normalizing constant a_n as

    a_n = 1 − U(n) = (n − 1)^{−1}.

With these normalizing constants we can show that the reversed Burr distribution with parameters λ = ζ = δ = 1 belongs to the max domain of attraction of the extremal Weibull class. Indeed,

    P((X_{n,n} − b_n)/a_n ≤ x) = F_X^n(a_n x + b_n)
                               = F_X^n((x − 1)(n − 1)^{−1} + 1)
                               = (1 − 1/(1 + (n − 1)/(1 − x)))^n
                               = (1 − (1 − x)/(n − x))^n → exp(−(1 − x)) for n → ∞.
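As an illustrative aside, the final simplification in Example 1.3.2 makes the convergence easy to verify numerically; the evaluation points below are arbitrary:

```python
import math

def Fn(x, n):
    # F_X^n(a_n x + b_n) for the reversed Burr distribution with lambda = zeta = delta = 1,
    # using b_n = 1 - (n-1)^(-1) and a_n = (n-1)^(-1); simplifies to (1 - (1-x)/(n-x))^n
    return (1.0 - (1.0 - x) / (n - x)) ** n

def G(x):
    # the extremal Weibull limit with gamma = -1
    return math.exp(-(1.0 - x))

errs = [max(abs(Fn(x, n) - G(x)) for x in (-3.0, -1.0, 0.0, 0.5)) for n in (2, 5, 10, 100)]
assert errs == sorted(errs, reverse=True)  # the error shrinks as n grows
```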
The convergence of the reversed Burr distribution to its limit is illustrated in Figure 1.2. The solid line is G(x), the dashed line is for n = 2, the dotted line is for n = 5, while the dash-dotted line is for n = 10. It is clearly seen that as n grows, F_X^n(a_n x + b_n) converges pointwise to G(x).

[Figure 1.2: The convergence of F_X^n(a_n x + b_n) to G(x) for the reversed Burr distribution with λ = ζ = δ = 1.]

□
1.4 Estimation of the extreme value index in practice
In practice we do not know the constants a_n and b_n, so Theorem 1.1.1 is not very useful if we want to estimate γ. However, if for some finite n ∈ N we have that

    P((X_{n,n} − b_n)/a_n ≤ x) ≈ exp(−(1 + γx)^{−1/γ}), 1 + γx > 0,

then

    P(X_{n,n} ≤ z) ≈ exp(−(1 + γ(z − b_n)/a_n)^{−1/γ}), 1 + γ(z − b_n)/a_n > 0,

where z = b_n + a_n x. If we let µ = b_n and σ = a_n, then we are left with the model

    P(X_{n,n} ≤ z) ≈ exp(−(1 + γ(z − µ)/σ)^{−1/γ}), 1 + γ(z − µ)/σ > 0.    (1.12)
With this model we can easily obtain maximum likelihood estimates of µ, σ and γ. To do this, we divide the data into m blocks and define z_1, . . . , z_m to be the block maxima of the m blocks. Under the assumption that Z_1, . . . , Z_m are independent variables having the GEV distribution we get from (1.12) that the log likelihood is given by

    log L(µ, σ, γ) = −m log σ − (1 + 1/γ) Σ_{i=1}^{m} log(1 + γ(z_i − µ)/σ) − Σ_{i=1}^{m} (1 + γ(z_i − µ)/σ)^{−1/γ}.    (1.13)
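As an illustrative aside, the log likelihood (1.13) is straightforward to transcribe into code. The sketch below evaluates it on a deterministic pseudo-sample of GEV quantiles (the parameter values are arbitrary, and the numerical maximization itself is omitted; in practice one would pass this function to an optimizer):

```python
import math

def gev_loglik(z, mu, sigma, gamma):
    # direct transcription of (1.13); -inf outside the support 1 + gamma*(z_i - mu)/sigma > 0
    if sigma <= 0:
        return float("-inf")
    ll = -len(z) * math.log(sigma)
    for zi in z:
        t = 1.0 + gamma * (zi - mu) / sigma
        if t <= 0:
            return float("-inf")
        ll += -(1.0 + 1.0 / gamma) * math.log(t) - t ** (-1.0 / gamma)
    return ll

# pseudo block maxima: GEV(mu=0, sigma=1, gamma=0.5) quantiles on a probability grid
z = [((-math.log(p)) ** -0.5 - 1.0) / 0.5 for p in [i / 20 for i in range(1, 20)]]
# the likelihood prefers the generating parameters over a misspecified scale,
# and the support constraint kicks in for a negative gamma
assert gev_loglik(z, 0.0, 1.0, 0.5) > gev_loglik(z, 0.0, 3.0, 0.5)
assert gev_loglik(z, 0.0, 1.0, -0.5) == float("-inf")
```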
The maximum likelihood estimates are then obtained by maximizing (1.13) with respect to µ, σ and γ.
Another popular model is the peaks over threshold (POT) model. This model can be derived using Theorem 1.1.2. If we assume that (1.4) is satisfied, then there exists a positive function f such that
    lim_{t↑x*} P((X − t)/f(t) > x | X > t) = lim_{t↑x*} (1 − F_X(t + f(t)x))/(1 − F_X(t)), x > 0
                                           = (1 + γx)^{−1/γ}, 1 + γx > 0.

For t large, we thus have

    P((X − t)/f(t) > x | X > t) ≈ (1 + γx)^{−1/γ}, x > 0 and 1 + γx > 0,

which reduces to

    P(X − t > z | X > t) ≈ (1 + γz/σ)^{−1/γ}, z > 0 and 1 + γz/σ > 0,    (1.14)

if we set z = f(t)x and f(t) = σ. From this we are able to get maximum likelihood estimates of γ and σ when we choose a threshold t. If we let z_1, . . . , z_k denote the k observations which are greater than the threshold t, then we obtain the log likelihood function from (1.14). The log likelihood is given by

    log L(σ, γ) = −k log σ − (1 + 1/γ) Σ_{i=1}^{k} log(1 + γ z_i/σ).    (1.15)
The maximum likelihood estimates are obtained by maximizing (1.15) with respect to γ and σ.
Using maximum likelihood with block maxima or peaks over threshold is an easy way to estimate γ. There are many other ways to estimate γ, but we will not go into detail about them. Among the methods of estimating γ for the generalized extreme value distribution are the Pickands estimator (Pickands, 1975), the moment estimator (Dekkers et al., 1989), and the probability-weighted moment estimator (Hosking et al., 1985).
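As a companion to the block-maxima sketch, the POT log likelihood (1.15) can be transcribed in the same way; again the pseudo-sample of threshold excesses and the parameter values are arbitrary, and the maximization step is omitted:

```python
import math

def gpd_loglik(z, sigma, gamma):
    # direct transcription of (1.15) for threshold excesses z_i > 0
    if sigma <= 0:
        return float("-inf")
    ll = -len(z) * math.log(sigma)
    for zi in z:
        t = 1.0 + gamma * zi / sigma
        if t <= 0:
            return float("-inf")
        ll += -(1.0 + 1.0 / gamma) * math.log(t)
    return ll

# pseudo excesses: GPD(sigma=1, gamma=0.5) quantiles on a probability grid
z = [2.0 * ((1.0 - p) ** -0.5 - 1.0) for p in [i / 20 for i in range(1, 20)]]
# the likelihood prefers the generating parameters over misspecified ones
assert gpd_loglik(z, 1.0, 0.5) > gpd_loglik(z, 3.0, 0.5)
assert gpd_loglik(z, 1.0, 0.5) > gpd_loglik(z, 1.0, 2.0)
```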
When considering the POT model we have to choose the threshold ourselves. There are several
ways to do this, but we will only discuss how to choose the threshold using a mean residual
life plot. An introduction to mean residual life plots requires a small lemma about a property
of the generalized Pareto distribution.
Lemma 1.4.1. If X ∼ GPD(σ, γ), then X − u|X > u ∼ GPD(σ + γu, γ).
Proof. If X ∼ GPD(σ, γ), then F_X(x) = 1 − (1 + γx/σ)^{−1/γ}. From this we get that

    P(X − u > x | X > u) = P(X > u + x, X > u)/P(X > u), x > 0
                         = (1 − F_X(u + x))/(1 − F_X(u))
                         = ((1 + γ(x + u)/σ)/(1 + γu/σ))^{−1/γ}
                         = (1 + γx/(σ + γu))^{−1/γ},

which implies that X − u|X > u ∼ GPD(σ + γu, γ).
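As an illustrative aside, the threshold-stability property of Lemma 1.4.1 is a closed-form identity and can be confirmed to machine precision; the parameter values below are arbitrary:

```python
def gpd_sf(x, sigma, gamma):
    # survival function 1 - F(x) of GPD(sigma, gamma)
    return (1.0 + gamma * x / sigma) ** (-1.0 / gamma)

sigma, gamma, u = 2.0, 0.3, 1.5
for x in (0.1, 1.0, 5.0):
    lhs = gpd_sf(u + x, sigma, gamma) / gpd_sf(u, sigma, gamma)  # P(X - u > x | X > u)
    rhs = gpd_sf(x, sigma + gamma * u, gamma)                    # GPD(sigma + gamma*u, gamma) tail
    assert abs(lhs - rhs) < 1e-12
```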
If X ∼ GPD(σ, γ) with γ < 1, then

    E(X) = σ/(1 − γ),

while E(X) = ∞ for γ ≥ 1. So assuming γ < 1, it follows from Lemma 1.4.1 that

    E(X − u | X > u) = (σ + γu)/(1 − γ), u > 0,

and hence the mean excess function is linear in u. The mean residual life plot consists of the points

    {(u, (1/n_u) Σ_{i=1}^{n_u} (x_(i) − u)) : u < x_max},

where x_(1), . . . , x_(n_u) are the n_u observations that exceed u, and x_max is the largest observation. If the GPD approximation is good at threshold u, then it should also be good at a higher threshold, so the mean excess function should be approximately linear in u beyond a good threshold.
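As an illustrative aside, the points of the mean residual life plot are simple to compute, and on (pseudo-)GPD data they track the linear mean excess function above; the grid and parameter values are arbitrary:

```python
def mean_excess_points(data, thresholds):
    # points (u, average of x - u over observations x > u) of the mean residual life plot
    pts = []
    for u in thresholds:
        exc = [x - u for x in data if x > u]
        if exc:
            pts.append((u, sum(exc) / len(exc)))
    return pts

# deterministic pseudo-sample: GPD(sigma=1, gamma=0.25) quantiles on a fine grid
sigma, gamma = 1.0, 0.25
data = [sigma * ((1.0 - p) ** -gamma - 1.0) / gamma for p in [i / 2000 for i in range(1, 2000)]]
pts = mean_excess_points(data, [0.5, 1.0, 2.0])
for u, e in pts:
    # theory: E(X - u | X > u) = (sigma + gamma*u)/(1 - gamma)
    assert abs(e - (sigma + gamma * u) / (1.0 - gamma)) < 0.25
```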
Chapter 2
Pareto-type distributions
In this chapter we give an introduction to the Fréchet class. We start by considering the domain of attraction of this class, similar to the discussion of the Gumbel and extremal Weibull classes. Next we turn our attention to the estimation of the extreme value index γ for Pareto-type distributions which satisfy a second order condition. We prove asymptotic normality for a statistic proposed in Goegebeur et al. (2010) and use this to construct a class of estimators for γ. From this class of estimators we construct specific estimators using kernel functions. We end this chapter with a presentation of an estimator of the second order parameter. The asymptotic normality of the latter is established under a third order condition.
2.1 Domain of attraction
The class of Pareto-type models corresponds with the max domain of attraction of G_γ with γ > 0. The following proposition provides a characterization of the distributions that belong to this class.

Proposition 2.1.1. (Gnedenko, 1943) Let X be a random variable with distribution function F_X. Then we have for x* infinite that

    F_X ∈ D(G_γ), γ > 0 ⇔ 1 − F_X(x) = x^{−1/γ} l_{F_X}(x), x > 0    (2.1)
                        ⇔ U(x) = x^γ l_U(x), x > 1,    (2.2)

where l_U(x) and l_{F_X}(x) are slowly varying at infinity.
Tail quantile functions of the form (2.2) can be shown to satisfy (1.5) as t tends to infinity, in the following way:

    (U(tx) − U(t))/a(t) = ((tx)^γ l_U(tx) − t^γ l_U(t))/a(t)
                        = (l_U(t) t^γ/a(t)) ((l_U(tx)/l_U(t)) x^γ − 1)
                        ∼ (x^γ − 1)/γ,

when choosing a(t) = γ t^γ l_U(t) = γU(t). More generally a(t) can also be chosen as a function satisfying

    lim_{t→∞} a(t)/U(t) = γ.
This brings us to how a_n can be chosen as a normalizing constant. If we choose a_n = a(n) = γU(n), then we can use this constant as one of the normalizing constants for the Fréchet class.
There exists full equivalence between the Pareto-type models and the extremal Weibull class. If we let X be a random variable with F_X belonging to the max domain of attraction of the extremal Weibull class with x* as the right endpoint, and put Y := (x* − X)^{−1}, then the extremal Weibull class and the Pareto-type models are linked through the identification

    F_X ∈ D(G_γ), γ < 0 ⇔ F_Y ∈ D(G_{−γ}), −γ > 0.

The equivalence follows easily because

    1 − F_X(x* − 1/x) = P(X > x* − 1/x) = P((x* − X)^{−1} > x) = 1 − F_Y(x).
Example 2.1.2. The Fréchet distribution has distribution function given by

    F_X(x) = exp(−x^{−α}), x > 0, α > 0.

This means it has quantile function

    Q(p) = (− log p)^{−1/α}, 0 < p < 1,

and hence the tail quantile function is

    U(x) = (− log(1 − 1/x))^{−1/α}, x > 1.

The Fréchet distribution has γ = 1/α and the normalizing constant a_n can hence be chosen as

    a_n = γU(n) = (1/α)(− log(1 − 1/n))^{−1/α}.

The normalizing constant b_n can be chosen as

    b_n = U(n) = (− log(1 − 1/n))^{−1/α}.

Concerning the Fréchet distribution with α = 1 we see that

    P((X_{n,n} − b_n)/a_n ≤ x) = F_X^n(a_n x + b_n)
                               = F_X^n((− log(1 − 1/n))^{−1} x + (− log(1 − 1/n))^{−1})
                               = [(1 − 1/n)^n]^{1/(1+x)} → exp(−(1 + x)^{−1}) for n → ∞.
The convergence of the Fréchet distribution to its limit is illustrated in Figure 2.1. The solid line is G(x), the dashed line is for n = 2, the dotted line is for n = 5, while the dash-dotted line is for n = 10. It is clearly seen that as n grows, F_X^n(a_n x + b_n) converges pointwise to G(x).

[Figure 2.1: The convergence of F_X^n(a_n x + b_n) to G(x) for the Fréchet distribution with α = 1.]

□
Next we give two examples of distributions that are of Pareto-type.
Example 2.1.3. The Burr distribution has distribution function given by

    F_X(x) = 1 − (ζ/(ζ + x^δ))^λ, x > 0, λ, ζ, δ > 0.

In order to verify that the Burr distribution is of Pareto-type we start with

    1 − F_X(x) = (ζ/(ζ + x^δ))^λ = x^{−δλ} (ζ/(ζx^{−δ} + 1))^λ.

It is easily seen that g(x) := (ζ/(ζx^{−δ} + 1))^λ is slowly varying at infinity since it converges to a constant as x → ∞. So the Burr distribution is of Pareto-type with γ = 1/(λδ). □

Example 2.1.4. The absolute T distribution has distribution function given by

    F_X(x) = (Γ((n+1)/2)/(√(nπ) Γ(n/2))) ∫_{−x}^{x} (1 + t²/n)^{−(n+1)/2} dt, x > 0, n ∈ N.
In order to verify that the absolute T distribution is of Pareto-type we start with

    1 − F_X(x) = 2 (Γ((n+1)/2)/(√(nπ) Γ(n/2))) ∫_x^∞ (1 + t²/n)^{−(n+1)/2} dt
               = 2 (Γ((n+1)/2)/(√(nπ) Γ(n/2))) ∫_x^∞ (t²/n)^{−(n+1)/2} (n/t² + 1)^{−(n+1)/2} dt
               = K ∫_x^∞ t^{−n−1} (n t^{−2} + 1)^{−(n+1)/2} dt,

where K := 2 n^{n/2} Γ((n+1)/2)/(√π Γ(n/2)). We are concerned with large values of x, so we make a Taylor series expansion of (1 + x)^{−(n+1)/2} around 0, which yields

    (n t^{−2} + 1)^{−(n+1)/2} = 1 − ((n+1)/2) n t^{−2} + (1/2)((n+1)/2)((n+1)/2 + 1) n² t^{−4}
        − (1/6)((n+1)/2)((n+1)/2 + 1)((n+1)/2 + 2)(1 + t̃)^{−(n+1)/2 − 3} n³ t^{−6},

where t̃ is between 0 and n t^{−2}. From this it follows that

    1 − F_X(x) = K( ∫_x^∞ t^{−n−1} dt − (n(n+1)/2) ∫_x^∞ t^{−n−3} dt + (n²(n+1)(n+3)/8) ∫_x^∞ t^{−n−5} dt
        − (n³(n+1)(n+3)(n+5)/48) ∫_x^∞ t^{−n−1} (1 + t̃)^{−(n+1)/2 − 3} t^{−6} dt ).

Since (1 + t̃)^{−(n+1)/2 − 3} ≤ 1 it follows that ∫_x^∞ t^{−n−1}(1 + t̃)^{−(n+1)/2 − 3} t^{−6} dt ≤ ∫_x^∞ t^{−n−7} dt, and hence

    1 − F_X(x) = K( x^{−n}/n − (n(n+1)/(2(n+2))) x^{−n−2} + (n²(n+1)(n+3)/(8(n+4))) x^{−n−4} + O(x^{−n−6}) )
               = x^{−n} C_0 ( 1 − (n²(n+1)/(2(n+2))) x^{−2} + (n³(n+1)(n+3)/(8(n+4))) x^{−4} + O(x^{−6}) ),    (2.3)

where C_0 := K/n. Since the function g(x) := C_0(1 − (n²(n+1)/(2(n+2))) x^{−2} + (n³(n+1)(n+3)/(8(n+4))) x^{−4} + O(x^{−6})) converges to a constant as x → ∞, it is slowly varying at infinity, and hence the absolute T distribution is of Pareto-type with γ = 1/n. □
2.2 Estimation of the extreme value index
In the analysis of Pareto-type models, estimation of γ plays a central role. The asymptotic distribution of an estimator of γ is usually established under the following second order condition on the tail behaviour.

Assumption 2.2.1 (Second order condition). There exists a positive real parameter γ, a negative real parameter ρ and a function b with b(t) → 0 for t → ∞, of constant sign for large values of t, such that

    lim_{t→∞} (log U(tx) − log U(t) − γ log x)/b(t) = (x^ρ − 1)/ρ, ∀x > 0.

The second order condition implies that |b| is regularly varying with index ρ (Geluk and de Haan, 1987), so the parameter ρ determines the rate of convergence of log U(tx) − log U(t) to its limit γ log x as t tends to infinity. If ρ is close to zero then the convergence is slow and the estimation of tail parameters is practically difficult.
We will now verify that the Burr distribution and the absolute T distribution satisfy the second order condition. That they are of Pareto-type was verified in Example 2.1.3 and Example 2.1.4 respectively.
Example 2.2.2. In order to verify that the Burr distribution satisfies the second order condition we need to find its tail quantile function. The quantile function of the Burr distribution is easily found by inverting the distribution function and is given by

    Q(p) = ζ^{1/δ} ((1 − p)^{−1/λ} − 1)^{1/δ}, 0 < p < 1.

From this we obtain the tail quantile function

    U(x) = Q(1 − 1/x) = x^γ ζ^{1/δ} (1 − x^{−1/λ})^{1/δ}, x > 1.

We start with the expression

    log U(tx) − log U(t) − γ log x = (1/δ) log(1 − (xt)^{−1/λ}) − (1/δ) log(1 − t^{−1/λ}).

If we make a Taylor series expansion of log(1 − x) around 0, we obtain

    log U(tx) − log U(t) − γ log x = (1/δ)(−(tx)^{−1/λ} − (1/2)(tx)^{−2/λ}) − (1/δ)(−t^{−1/λ} − (1/2)t^{−2/λ}) + O(t^{−3/λ})
        = (1/(λδ)) t^{−1/λ} (x^{−1/λ} − 1)/(−1/λ) + (1/(λδ)) t^{−2/λ} (x^{−2/λ} − 1)/(−2/λ) + O(t^{−3/λ})    (2.4)
        = γ t^{−1/λ} (x^{−1/λ} − 1)/(−1/λ) + O(t^{−2/λ}).    (2.5)

From (2.5) we see that if we choose ρ = −1/λ and b(t) = γt^ρ, then the Burr distribution satisfies the second order condition. More generally b(t) can be chosen such that b(t) = γt^ρ(1 + o(1)). □
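As an illustrative aside, the rate statement in (2.5) can be checked numerically for the Burr distribution: with the exact tail quantile function, the normalized increment approaches (x^ρ − 1)/ρ at the rate b(t) = γt^ρ. The parameter values below (λ = 2, δ = ζ = 1, x = 3) are arbitrary:

```python
import math

def lhs(t, x, lam=2.0, delta=1.0, zeta=1.0):
    # (log U(tx) - log U(t) - gamma*log(x)) / b(t) for the Burr distribution,
    # with gamma = 1/(lam*delta), rho = -1/lam and b(t) = gamma * t**rho
    gamma = 1.0 / (lam * delta)
    rho = -1.0 / lam
    U = lambda s: s**gamma * zeta**(1.0 / delta) * (1.0 - s**(-1.0 / lam))**(1.0 / delta)
    return (math.log(U(t * x)) - math.log(U(t)) - gamma * math.log(x)) / (gamma * t**rho)

x, rho = 3.0, -0.5                      # rho = -1/lam for lam = 2
limit = (x**rho - 1.0) / rho
errs = [abs(lhs(t, x) - limit) for t in (1e2, 1e4, 1e6)]
assert errs == sorted(errs, reverse=True) and errs[-1] < 1e-2
```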
Example 2.2.3. From (2.3) we get for the absolute T distribution that

    1 − F_X(x) = x^{−1/γ} C_0 (1 − C_1 x^{−2} + C_2 x^{−4} + O(x^{−6})),

where C_1 := n²(n+1)/(2(n+2)) and C_2 := n³(n+1)(n+3)/(8(n+4)). In order to find the tail quantile function we have to invert

    1/y = x^{−1/γ} C_0 (1 − C_1 x^{−2} + C_2 x^{−4} + O(x^{−6})).

From this we find

    x = C_0^γ y^γ (1 − C_1 x^{−2} + C_2 x^{−4} + O(x^{−6}))^γ.

If we make a Taylor series expansion of (1 − x)^γ around x = 0, we obtain

    x = C_0^γ y^γ (1 − γ(C_1 x^{−2} − C_2 x^{−4} + O(x^{−6})) + (γ(γ−1)/2)(C_1 x^{−2} − C_2 x^{−4} + O(x^{−6}))² + O(x^{−6}))
      = C_0^γ y^γ (1 − γC_1 x^{−2} + (γC_2 + (γ(γ−1)/2) C_1²) x^{−4} + O(x^{−6})).

Since x^{−2} = C_0^{−2γ} y^{−2γ} (1 − γC_1 x^{−2} + (γC_2 + (γ(γ−1)/2) C_1²) x^{−4} + O(x^{−6}))^{−2}, a Taylor series expansion of (1 − x)^{−2} gives

    x = C_0^γ y^γ (1 − γC_1 C_0^{−2γ} y^{−2γ} (1 + 2γC_1 x^{−2} + O(x^{−4})) + (γC_2 + (γ(γ−1)/2) C_1²) x^{−4} + O(x^{−6})).

If we substitute the leading term C_0^γ y^γ for x on the right-hand side, it follows that

    x = C_0^γ y^γ (1 − γC_1 C_0^{−2γ} y^{−2γ} + (γC_2 − (γ(3γ+1)/2) C_1²) C_0^{−4γ} y^{−4γ} + O(y^{−6γ})).

So the tail quantile function can be written as

    U(x) = C_0^γ x^γ (1 − D_1 x^{−2γ} + D_2 x^{−4γ} + O(x^{−6γ})),

where D_1 := γC_1 C_0^{−2γ} and D_2 := (γC_2 − (γ(3γ+1)/2) C_1²) C_0^{−4γ}. We are now ready to verify that the absolute T distribution satisfies the second order condition. We start with the expression

    log U(xt) − log U(t) − γ log x = log(1 − D_1 (xt)^{−2γ} + D_2 (xt)^{−4γ} + O(t^{−6γ})) − log(1 − D_1 t^{−2γ} + D_2 t^{−4γ} + O(t^{−6γ})).

By making a Taylor series expansion of log(1 − x) around x = 0 we obtain

    log U(xt) − log U(t) − γ log x = −D_1 (xt)^{−2γ} + D_2 (xt)^{−4γ} − (1/2)(D_1 (xt)^{−2γ} − D_2 (xt)^{−4γ})²
        + D_1 t^{−2γ} − D_2 t^{−4γ} + (1/2)(D_1 t^{−2γ} − D_2 t^{−4γ})² + O(t^{−6γ})
        = −D_1 t^{−2γ} (x^{−2γ} − 1) + (D_2 − (1/2)D_1²) t^{−4γ} (x^{−4γ} − 1) + O(t^{−6γ})    (2.6)
        = −D_1 t^{−2γ} (x^{−2γ} − 1) + O(t^{−4γ}).    (2.7)

From (2.7) we see that if we choose ρ = −2γ and b(t) of the form b(t) = −ρD_1 t^ρ (1 + o(1)), then the absolute T distribution satisfies the second order condition. □
We now return to the estimation of γ. The estimator of γ we will consider is based on a kernel statistic with kernel function K. This statistic is given by

    T_{n,k}(K) := (1/k) Σ_{j=1}^{k} K(j/(k+1)) Z_j,    (2.8)

where Z_j := j(log X_{n−j+1,n} − log X_{n−j,n}). This statistic will also serve as the basic building block for the ρ estimator we propose in Section 2.3. We need some conditions on the kernel function, but first we introduce the following notation:

    µ(K) := ∫_0^1 K(u) du,
    I_1(K, ρ) := ∫_0^1 K(u) u^{−ρ} du,
    σ²(K) := ∫_0^1 K²(u) du.

With this notation, the kernel function must satisfy

Assumption 2.2.4. Let K be a function defined on (0, 1) such that

(i) K(t) = (1/t) ∫_0^t u(v) dv for some function u satisfying |(k+1) ∫_{(j−1)/(k+1)}^{j/(k+1)} u(t) dt| ≤ f(j/(k+1)) for some positive continuous and integrable function f defined on (0, 1),
(ii) σ²(K) < ∞.

Example 2.2.5. Examples of kernel functions satisfying Assumption 2.2.4 are the power kernels K_τ(t) := t^τ, τ > 0, and the log kernels L_δ(t) := (− log t)^δ, δ > 0.

Lemma 2.2.6. The function K(t) := t^τ (− log t)^δ satisfies Assumption 2.2.4.

The proof of Lemma 2.2.6 can be found in Appendix 2.4. □
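As an illustrative aside, the statistic (2.8) is short to implement. Note that for the constant kernel K ≡ 1 it reduces, by telescoping of the weighted spacings, to the classical Hill estimator, which is consistent for γ in the Pareto-type class; the sketch below checks this on a simulated strict Pareto sample (sample size, k and seed are arbitrary):

```python
import math
import random

def T(K, x, k):
    # the kernel statistic (2.8): (1/k) * sum_{j=1}^{k} K(j/(k+1)) * Z_j,
    # where Z_j = j * (log X_{n-j+1,n} - log X_{n-j,n})
    xs = sorted(x)
    n = len(xs)
    total = 0.0
    for j in range(1, k + 1):
        Z = j * (math.log(xs[n - j]) - math.log(xs[n - j - 1]))
        total += K(j / (k + 1)) * Z
    return total / k

random.seed(1)
gamma = 0.5
# strict Pareto sample with U(x) = x**gamma, via the inverse probability integral transform
sample = [(1.0 - random.random()) ** -gamma for _ in range(5000)]
hill = T(lambda t: 1.0, sample, 200)  # the constant kernel gives the Hill estimator
assert abs(hill - gamma) < 0.15
```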
With Assumption 2.2.1 and Assumption 2.2.4 we are able to establish the following result.

Theorem 2.2.7. Let X_1, . . . , X_n be i.i.d. random variables with a distribution satisfying Assumption 2.2.1. If further Assumption 2.2.4 holds, then for k, n → ∞ such that k/n → 0 we have

    T_{n,k}(K) =_D γ µ(K) + γ σ(K) N_k(K)/√k + b(n/k) I_1(K, ρ) (1 + o_P(1)),    (2.9)

where N_k(K) is asymptotically a standard normal random variable.

A proof of this theorem is given in Goegebeur et al. (2010); we will however give an alternative proof of the result.
Proof of Theorem 2.2.7. Let $U_{1,n} \leq \ldots \leq U_{n,n}$ be the order statistics of a random sample of size $n$ from the $U(0,1)$ distribution. By the inverse probability integral transform we find that
\[
X_{i,n} \stackrel{D}{=} Q(U_{i,n}) \stackrel{D}{=} Q(1 - U_{n-i+1,n}) = U\!\left(\frac{1}{U_{n-i+1,n}}\right).
\]
Since the $X_i$ are of Pareto-type it follows that
\[
X_{i,n} \stackrel{D}{=} \left(\frac{1}{U_{n-i+1,n}}\right)^{\gamma} l_U\!\left(\frac{1}{U_{n-i+1,n}}\right).
\]
From this we get
\[
\log X_{i,n} \stackrel{D}{=} -\gamma \log U_{n-i+1,n} + \log l_U\!\left(\frac{1}{U_{n-i+1,n}}\right).
\]
Hence
\[
\log X_{n-j+1,n} - \log X_{n-k,n} \stackrel{D}{=} -\gamma \log \frac{U_{j,n}}{U_{k+1,n}} + \log \frac{l_U\!\left(\frac{1}{U_{k+1,n}}\frac{U_{k+1,n}}{U_{j,n}}\right)}{l_U\!\left(\frac{1}{U_{k+1,n}}\right)}.
\]
Since $\frac{U_{j,n}}{U_{k+1,n}} \stackrel{D}{=} V_{j,k}$, where $V_{j,k}$ is the $j$'th order statistic of a random sample of size $k$ from the $U(0,1)$ distribution, it follows that
\[
\log X_{n-j+1,n} - \log X_{n-k,n} \stackrel{D}{=} -\gamma \log V_{j,k} + \log \frac{l_U\!\left(\frac{1}{V_{j,k}}\frac{1}{U_{k+1,n}}\right)}{l_U\!\left(\frac{1}{U_{k+1,n}}\right)} \stackrel{D}{=} -\gamma \log(1 - V_{k-j+1,k}) + \log \frac{l_U\!\left(\frac{1}{V_{j,k}}\frac{1}{U_{k+1,n}}\right)}{l_U\!\left(\frac{1}{U_{k+1,n}}\right)}.
\]
Using that the quantile function of the standard exponential distribution is $Q(p) = -\log(1-p)$, $0 < p < 1$, and denoting by $E_{1,n} \leq \ldots \leq E_{n,n}$ the order statistics of a random sample of
size $n$ from the standard exponential distribution, we get, using Assumption 2.2.1 and inspired by Lemma 2.4.3, that
\[
\log X_{n-j+1,n} - \log X_{n-k,n} \stackrel{D}{=} \gamma E_{k-j+1,k} + b_0\!\left(\frac{1}{U_{k+1,n}}\right)\frac{\left(\frac{1}{V_{j,k}}\right)^{\rho} - 1}{\rho} + b_0\!\left(\frac{1}{U_{k+1,n}}\right)\tilde{R}_{n,k}(j),
\]
where
\[
\tilde{R}_{n,k}(j) := \frac{\log U\!\left(\frac{1}{U_{k+1,n}}\frac{1}{V_{j,k}}\right) - \log U\!\left(\frac{1}{U_{k+1,n}}\right) - \gamma\log\frac{1}{V_{j,k}}}{b_0\!\left(\frac{1}{U_{k+1,n}}\right)} - \frac{\left(\frac{1}{V_{j,k}}\right)^{\rho} - 1}{\rho}.
\]
Thus
\[
Z_j = j\,(\log X_{n-j+1,n} - \log X_{n-j,n}) \stackrel{D}{=} j\left[\gamma E_{k-j+1,k} - \gamma E_{k-j,k} + b_0\!\left(\frac{1}{U_{k+1,n}}\right)\frac{\left(\frac{1}{V_{j,k}}\right)^{\rho} - \left(\frac{1}{V_{j+1,k}}\right)^{\rho}}{\rho} + b_0\!\left(\frac{1}{U_{k+1,n}}\right)R_{n,k}(j)\right], \tag{2.10}
\]
where $R_{n,k}(j) := \tilde{R}_{n,k}(j) - \tilde{R}_{n,k}(j+1)$, with the convention $\tilde{R}_{n,k}(k+1) := 0$, and with $b_0$ a function satisfying $b_0(t) \sim b(t)$ for $t \to \infty$. Using the Rényi representation (Rényi, 1953) we can express each $E_{j,k}$ as
\[
\{E_{j,k}\}_{j=1,\ldots,k} \stackrel{D}{=} \left\{\sum_{i=1}^{j}\frac{E_{k-i+1}}{k-i+1}\right\}_{j=1,\ldots,k},
\]
where $E_1,\ldots,E_k$ are independent random variables from a standard exponential distribution. Hence
\[
E_{k-j+1,k} - E_{k-j,k} \stackrel{D}{=} \sum_{i=1}^{k-j+1}\frac{E_{k-i+1}}{k-i+1} - \sum_{i=1}^{k-j}\frac{E_{k-i+1}}{k-i+1} = \frac{E_j}{j}. \tag{2.11}
\]
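The Rényi representation (2.11) states that the normalized spacings $j(E_{k-j+1,k} - E_{k-j,k})$ of exponential order statistics are again i.i.d. standard exponential. A quick simulation (our own sketch, not from the thesis) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(7)
k = 200_000
e = np.sort(rng.exponential(size=k))   # order statistics E_{1,k} <= ... <= E_{k,k}
j = np.arange(1, k)                    # j = 1, ..., k-1
# E_{k-j+1,k} is e[k-j] (0-indexed) and E_{k-j,k} is e[k-j-1]; scale the spacing by j
z = j * (e[k - j] - e[k - j - 1])
mean_z, var_z = z.mean(), z.var()      # both should be close to 1 for Exp(1)
```

Both the sample mean and the sample variance of the rescaled spacings should be close to 1, as for a standard exponential distribution.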
Combining (2.10) and (2.11) we find that
\[
Z_j \stackrel{D}{=} \gamma E_j + b_0\!\left(\frac{1}{U_{k+1,n}}\right) j\,\frac{\left(\frac{1}{V_{j,k}}\right)^{\rho} - \left(\frac{1}{V_{j+1,k}}\right)^{\rho}}{\rho} + b_0\!\left(\frac{1}{U_{k+1,n}}\right) j R_{n,k}(j).
\]
Let $Y_{1,k} \leq \ldots \leq Y_{k,k}$ be the order statistics of a random sample of size $k$ from the standard strict Pareto distribution. Then we have
\[
\frac{1}{V_{j,k}} \stackrel{D}{=} \frac{1}{1 - V_{k-j+1,k}} \stackrel{D}{=} Y_{k-j+1,k}.
\]
Using this we get that
\[
Z_j \stackrel{D}{=} \gamma E_j + b_0(Y_{n-k,n})\, j\,\frac{Y_{k-j+1,k}^{\rho} - Y_{k-j,k}^{\rho}}{\rho} + b_0(Y_{n-k,n})\, j R_{n,k}(j).
\]
Hence
\begin{align*}
T_{n,k}(K) &\stackrel{D}{=} \frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\gamma E_j + b_0(Y_{n-k,n})\, j\,\frac{Y_{k-j+1,k}^{\rho} - Y_{k-j,k}^{\rho}}{\rho} + b_0(Y_{n-k,n})\, j R_{n,k}(j)\right)\\
&= \gamma\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)E_j + b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) j\,\frac{Y_{k-j+1,k}^{\rho} - Y_{k-j,k}^{\rho}}{\rho}\\
&\quad + b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) j R_{n,k}(j)\\
&=: T_{n,k}^{(1)} + T_{n,k}^{(2)} + T_{n,k}^{(3)}.
\end{align*}
Using Assumption 2.2.4 (iii) we get for the first term that
\begin{align*}
T_{n,k}^{(1)} &= \gamma\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) + \gamma\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)(E_j - 1)\\
&= \gamma\mu(K) + o\!\left(\frac{1}{\sqrt{k}}\right) + \gamma\sigma(K)\frac{\tilde{N}_k(K)}{\sqrt{k}}, \tag{2.12}
\end{align*}
where $\tilde{N}_k(K) := \sqrt{k}\,\frac{\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)(E_j - 1)}{\sigma(K)}$. The term $\tilde{N}_k(K)$ is, according to Lemma 2.4.1 in Appendix 2.4, an asymptotic standard normal random variable. In (2.12) we can combine the $o\!\left(\frac{1}{\sqrt{k}}\right)$ with $\tilde{N}_k(K)$ to get
\[
T_{n,k}^{(1)} = \gamma\mu(K) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}},
\]
where $N_k(K)$ is again an asymptotic standard normal random variable.
Since $Y_{i,k} \stackrel{D}{=} \frac{1}{1-U_{i,k}}$ and the standard exponential distribution has quantile function $Q(p) = -\log(1-p)$, it follows that $T_{n,k}^{(2)}$ can be written as
\[
T_{n,k}^{(2)} \stackrel{D}{=} b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) j\,\frac{\exp(\rho E_{k-j+1,k}) - \exp(\rho E_{k-j,k})}{\rho}.
\]
Using the mean value theorem we find that
\[
T_{n,k}^{(2)} \stackrel{D}{=} b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) j\,(E_{k-j+1,k} - E_{k-j,k})\exp(\rho Q_{j,k}),
\]
where $Q_{j,k}$ is a random value between $E_{k-j,k}$ and $E_{k-j+1,k}$, and hence
\begin{align*}
T_{n,k}^{(2)} &\stackrel{D}{=} b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) E_j \exp(\rho Q_{j,k})\\
&= b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} E_j\\
&\quad + b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) E_j\left(\exp(\rho Q_{j,k}) - \left(\frac{j}{k+1}\right)^{-\rho}\right)\\
&=: T_{n,k}^{(2,1)} + T_{n,k}^{(2,2)}.
\end{align*}
Concerning the term $T_{n,k}^{(2,1)}$ we get
\[
T_{n,k}^{(2,1)} = b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} + b_0(Y_{n-k,n})\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho}(E_j - 1),
\]
so by the law of large numbers it follows that
\[
T_{n,k}^{(2,1)} = b_0(Y_{n-k,n})\, I_1(K,\rho)\,(1 + o_{\mathbb{P}}(1)).
\]
We now turn to $T_{n,k}^{(2,2)}$. Note that for $j = 1,\ldots,k$ we have
\[
\exp(E_{k-j+1,k}) \stackrel{D}{=} \exp(-\log(1 - U_{k-j+1,k})) \stackrel{D}{=} \exp(-\log U_{j,k}) = \frac{1}{U_{j,k}},
\]
and hence
\begin{align*}
\left|\exp(\rho Q_{j,k}) - \left(\tfrac{j}{k+1}\right)^{-\rho}\right| &\leq \max\left\{\left|\exp(\rho E_{k-j,k}) - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|,\; \left|\exp(\rho E_{k-j+1,k}) - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|\right\}\\
&\stackrel{D}{=} \max\left\{\left|U_{j+1,k}^{-\rho} - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|,\; \left|U_{j,k}^{-\rho} - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|\right\}\\
&\leq \max\left\{\left|U_{j+1,k}^{-\rho} - \left(\tfrac{j+1}{k+1}\right)^{-\rho}\right| + c_{j,k},\; \left|U_{j,k}^{-\rho} - \left(\tfrac{j}{k+1}\right)^{-\rho}\right|\right\},
\end{align*}
where $c_{j,k} = \left(\frac{j+1}{k+1}\right)^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}$. From this it follows that
\begin{align*}
&\left|\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) E_j\left(\exp(\rho Q_{j,k}) - \left(\frac{j}{k+1}\right)^{-\rho}\right)\right|\\
&\qquad\leq \frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| E_j\left|U_{j+1,k}^{-\rho} - \left(\frac{j+1}{k+1}\right)^{-\rho}\right| + \frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| c_{j,k} E_j\\
&\qquad\quad + \frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| E_j\left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right|\\
&\qquad=: T_{n,k}^{(2,2,1)} + T_{n,k}^{(2,2,2)} + T_{n,k}^{(2,2,3)}.
\end{align*}
According to Lemma 2.4.2 the terms $T_{n,k}^{(2,2,1)}$ and $T_{n,k}^{(2,2,3)}$ are $O_{\mathbb{P}}\!\left(\frac{1}{\sqrt{k}}\right)$. Using the mean value theorem we see that we can write the term $T_{n,k}^{(2,2,2)}$ as
\[
T_{n,k}^{(2,2,2)} = \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| z_{j,k}^{|\rho|-1} E_j,
\]
where $z_{j,k}$ is a value between $\frac{j}{k+1}$ and $\frac{j+1}{k+1}$. When $|\rho| \geq 1$ it follows that
\[
T_{n,k}^{(2,2,2)} \leq \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| E_j,
\]
and hence by the law of large numbers it follows that $T_{n,k}^{(2,2,2)} = O_{\mathbb{P}}\!\left(\frac{1}{k}\right)$. When $|\rho| < 1$ we have
\[
T_{n,k}^{(2,2,2)} \leq \frac{|\rho|}{k+1}\,\frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right|\left(\frac{j}{k+1}\right)^{|\rho|-1} E_j,
\]
which by Assumption 2.2.4 (v) and the law of large numbers implies that $T_{n,k}^{(2,2,2)} = O_{\mathbb{P}}\!\left(\frac{1}{k}\right)$. So
\[
T_{n,k}^{(2)} = b_0(Y_{n-k,n})\, I_1(K,\rho)\,(1 + o_{\mathbb{P}}(1)).
\]
Concerning the term $T_{n,k}^{(3)}$ we find, using Assumption 2.2.4 (i), that
\begin{align*}
\left|T_{n,k}^{(3)}\right| &= \left|b_0(Y_{n-k,n})\frac{k+1}{k}\sum_{j=1}^{k} R_{n,k}(j)\int_0^{\frac{j}{k+1}} u(v)\,dv\right|\\
&= \left|b_0(Y_{n-k,n})\frac{k+1}{k}\sum_{j=1}^{k} R_{n,k}(j)\sum_{i=1}^{j}\int_{\frac{i-1}{k+1}}^{\frac{i}{k+1}} u(v)\,dv\right|\\
&= \left|b_0(Y_{n-k,n})\frac{k+1}{k}\sum_{i=1}^{k}\int_{\frac{i-1}{k+1}}^{\frac{i}{k+1}} u(v)\,dv\sum_{j=i}^{k} R_{n,k}(j)\right|\\
&\leq |b_0(Y_{n-k,n})|\,\frac{1}{k}\sum_{i=1}^{k} f\left(\frac{i}{k+1}\right)\left|\sum_{j=i}^{k} R_{n,k}(j)\right|.
\end{align*}
For the term $\sum_{j=i}^{k} R_{n,k}(j)$ it follows that
\[
\sum_{j=i}^{k} R_{n,k}(j) = \sum_{j=i}^{k}\left(\tilde{R}_{n,k}(j) - \tilde{R}_{n,k}(j+1)\right) = \tilde{R}_{n,k}(i).
\]
For $\delta, \epsilon > 0$ there exists $n_0$ such that for any $n \geq n_0$, with arbitrarily large probability, for $i = 1,\ldots,k$,
\[
\left|\sum_{j=i}^{k} R_{n,k}(j)\right| \leq \epsilon\left(\frac{1}{V_{i,k}}\right)^{\rho}\max\left(\left(\frac{1}{V_{i,k}}\right)^{\delta}, \left(\frac{1}{V_{i,k}}\right)^{-\delta}\right) = \epsilon V_{i,k}^{-\rho-\delta},
\]
using Lemma 2.4.3. Hence
\[
\sup_{i\in\{1,\ldots,k\}}\left|\frac{\sum_{j=i}^{k} R_{n,k}(j)}{V_{i,k}^{-\rho-\delta}}\right| = o_{\mathbb{P}}(1),
\]
leading to
\[
\left|T_{n,k}^{(3)}\right| \leq b_0(Y_{n-k,n})\, o_{\mathbb{P}}(1)\,\frac{1}{k}\sum_{i=1}^{k} f\left(\frac{i}{k+1}\right) V_{i,k}^{-\rho-\delta},
\]
which by Assumption 2.2.4 (i) and assuming $\delta < |\rho|$ is $o_{\mathbb{P}}(b_0(Y_{n-k,n}))$. Combining the results on $T_{n,k}^{(1)}$, $T_{n,k}^{(2)}$ and $T_{n,k}^{(3)}$ establishes the result.
Using Theorem 2.2.7 we can construct a class of estimators $\hat{\gamma}_k(K) := \frac{T_{n,k}(K)}{\mu(K)}$ for $\gamma$ in the following way.

Proposition 2.2.8. Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.2.1. If further Assumption 2.2.4 holds with $\mu(K) \neq 0$, then for $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \lambda$ for some constant $\lambda$, we have
\[
\sqrt{k}\,(\hat{\gamma}_k(K) - \gamma) \to N\!\left(\lambda\frac{I_1(K,\rho)}{\mu(K)},\; \gamma^2\frac{\sigma^2(K)}{\mu^2(K)}\right). \tag{2.13}
\]

Proof. We have
\[
\sqrt{k}\,(\hat{\gamma}_k(K) - \gamma) \stackrel{D}{=} \gamma\frac{\sigma(K)}{\mu(K)} N_k(K) + \sqrt{k}\,b\!\left(\frac{n}{k}\right)\frac{I_1(K,\rho)}{\mu(K)}(1 + o_{\mathbb{P}}(1)) \to N\!\left(\lambda\frac{I_1(K,\rho)}{\mu(K)},\; \gamma^2\frac{\sigma^2(K)}{\mu^2(K)}\right)
\]
under the conditions of the proposition.

We verified in Lemma 2.2.6 that the kernel function $K(t) = t^{\tau}(-\log t)^{\delta}$ satisfies Assumption 2.2.4. This allows us to construct consistent and asymptotically normal estimators using this kernel. We do so in Corollary 2.2.9.
Corollary 2.2.9. Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.2.1. For $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \lambda$ for some constant $\lambda$, we have for the kernel function $K(t) = t^{\tau}(-\log t)^{\delta}$, $\tau, \delta \geq 0$, that
\[
\sqrt{k}\,(\hat{\gamma}_k(K) - \gamma) \to N\!\left(\lambda\frac{(\tau+1)^{\delta+1}}{(\tau-\rho+1)^{\delta+1}},\; \gamma^2\frac{\Gamma(2\delta+1)(\tau+1)^{2\delta+2}}{(2\tau+1)^{2\delta+1}(\Gamma(\delta+1))^2}\right).
\]
In particular, we obtain:

(i) For the Hill kernel, $\sqrt{k}\,(\hat{\gamma}_k(H) - \gamma) \to N\!\left(\lambda\frac{1}{1-\rho},\; \gamma^2\right)$.

(ii) For the power kernel, $\sqrt{k}\,(\hat{\gamma}_k(K_{\tau}) - \gamma) \to N\!\left(\lambda\frac{\tau+1}{\tau-\rho+1},\; \gamma^2\frac{(\tau+1)^2}{2\tau+1}\right)$.

(iii) For the log kernel, $\sqrt{k}\,(\hat{\gamma}_k(L_{\delta}) - \gamma) \to N\!\left(\lambda\frac{1}{(1-\rho)^{\delta+1}},\; \gamma^2\frac{\Gamma(2\delta+1)}{(\Gamma(\delta+1))^2}\right)$.
The question of which kernel function to choose is a topic of its own, and since it is not of great importance for this thesis we will not spend much time on it. We note, however, that the Hill kernel always has the smallest asymptotic variance. In general, the kernel function for which the asymptotic mean squared error of the resulting $\gamma$ estimator is minimal depends on the distributional parameters $\gamma$ and $\rho$. Comparing the log and power kernels with $\delta = \tau$, the log kernel tends to have a larger variance than the power kernel, although it suffers from less bias. For a detailed discussion of the performance of $\gamma$ estimators with kernel functions in the family $K(t) = t^{\tau}(-\log t)^{\delta}$ we refer to Gomes et al. (2007).
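The bias and variance factors in Corollary 2.2.9 are explicit, so the trade-off between kernels can be tabulated directly. The sketch below (our own code, not from the thesis) evaluates both factors for the family $K(t) = t^{\tau}(-\log t)^{\delta}$ and checks on a grid that the Hill choice $\tau = \delta = 0$ minimizes the asymptotic variance factor, in line with the remark above.

```python
from math import gamma as G

def bias_factor(tau, delta, rho):
    # asymptotic bias factor (tau+1)^{delta+1} / (tau - rho + 1)^{delta+1}
    return (tau + 1) ** (delta + 1) / (tau - rho + 1) ** (delta + 1)

def var_factor(tau, delta):
    # asymptotic variance factor from Corollary 2.2.9:
    # Gamma(2 delta + 1) (tau+1)^{2 delta + 2} / ((2 tau + 1)^{2 delta + 1} Gamma(delta+1)^2)
    return G(2 * delta + 1) * (tau + 1) ** (2 * delta + 2) / (
        (2 * tau + 1) ** (2 * delta + 1) * G(delta + 1) ** 2)

hill_var = var_factor(0, 0)       # Hill kernel: factor 1
# power kernel (delta = 0) gives (tau+1)^2/(2 tau + 1); log kernel (tau = 0)
# gives Gamma(2 delta + 1)/Gamma(delta + 1)^2, matching cases (ii) and (iii)
grid = [(t / 10, d / 10) for t in range(21) for d in range(21)]
min_var = min(var_factor(t, d) for t, d in grid)
```

On the grid $\tau, \delta \in [0,2]$ the minimum of the variance factor is attained at the Hill kernel; the bias factor, by contrast, shrinks as $\tau$ and $\delta$ grow, which is the trade-off driving the AMSE comparison.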
2.3 Estimation of the second order parameter
The estimation of the second order parameter in the univariate case is not of central importance to this thesis. However, in Chapter 4 we will construct estimators for the second order parameter in the bivariate extreme value framework which are based on the same ideas as the estimator for the second order parameter $\rho$ constructed here. In order to construct an estimator for $\rho$ we start from the basic building block $T_{n,k}(K)$ defined in (2.8). By a Taylor series expansion it follows from Theorem 2.2.7 that
\[
T_{n,k}^{\alpha}(K) \stackrel{D}{=} \gamma^{\alpha}\mu^{\alpha}(K) + \alpha\gamma^{\alpha}\mu^{\alpha-1}(K)\sigma(K)\frac{N_k(K)}{\sqrt{k}} + b\!\left(\frac{n}{k}\right)\alpha\gamma^{\alpha-1}\mu^{\alpha-1}(K) I_1(K,\rho)\,(1 + o_{\mathbb{P}}(1)),
\]
where $\alpha > 0$ and $K > 0$. The basic idea is to construct a statistic which converges in probability to a function of $\rho$ that does not depend on the unknown parameter $\gamma$. To this
end, let $K_1,\ldots,K_8$ be kernel functions and define
\begin{align*}
K^{(1)} &:= (K_1, K_2, K_3, K_4),\\
K^{(2)} &:= (K_5, K_6, K_7, K_8),\\
K^{(1,2)} &:= (K^{(1)}, K^{(2)}),\\
\bar{I}_1(K_i,\rho) &:= \frac{I_1(K_i,\rho)}{\mu(K_i)}, \quad i \in \{1,\ldots,8\},\\
\bar{I}_1^{(a)}(K_i,K_j,\rho) &:= \bar{I}_1^{\,a}(K_i,\rho) - \bar{I}_1^{\,a}(K_j,\rho), \quad a = 1,2,\; i,j \in \{1,\ldots,8\}.
\end{align*}
Using this notation, we consider the ratio of differences given by
\[
\Psi_{n,k}\!\left(K^{(1)}, \alpha_1, \alpha_2\right) := \frac{\left(\frac{T_{n,k}(K_1)}{\mu(K_1)}\right)^{\alpha_1} - \left(\frac{T_{n,k}(K_2)}{\mu(K_2)}\right)^{\alpha_1}}{\left(\frac{T_{n,k}(K_3)}{\mu(K_3)}\right)^{\alpha_2} - \left(\frac{T_{n,k}(K_4)}{\mu(K_4)}\right)^{\alpha_2}} \tag{2.14}
\]
and the function
\[
\psi\!\left(K^{(1)}, \alpha_1, \alpha_2, \rho\right) := \gamma^{\alpha_1-\alpha_2}\,\frac{\alpha_1\bar{I}_1^{(1)}(K_1,K_2,\rho)}{\alpha_2\bar{I}_1^{(1)}(K_3,K_4,\rho)},
\]
with $\alpha_1, \alpha_2 > 0$. If $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \infty$, then
\[
\frac{\left(\frac{T_{n,k}(K_1)}{\mu(K_1)}\right)^{\alpha_1} - \left(\frac{T_{n,k}(K_2)}{\mu(K_2)}\right)^{\alpha_1}}{b\!\left(\frac{n}{k}\right)} \stackrel{\mathbb{P}}{\to} \alpha_1\gamma^{\alpha_1-1}\bar{I}_1^{(1)}(K_1,K_2,\rho)
\]
and
\[
\frac{\left(\frac{T_{n,k}(K_3)}{\mu(K_3)}\right)^{\alpha_2} - \left(\frac{T_{n,k}(K_4)}{\mu(K_4)}\right)^{\alpha_2}}{b\!\left(\frac{n}{k}\right)} \stackrel{\mathbb{P}}{\to} \alpha_2\gamma^{\alpha_2-1}\bar{I}_1^{(1)}(K_3,K_4,\rho).
\]
Hence
\[
\Psi_{n,k}\!\left(K^{(1)}, \alpha_1, \alpha_2\right) \stackrel{\mathbb{P}}{\to} \psi\!\left(K^{(1)}, \alpha_1, \alpha_2, \rho\right).
\]
This statistic still depends on $\gamma$, but we can remove this dependence by considering a ratio of statistics of the form (2.14) with appropriately chosen $\alpha$ parameters. So define
\[
\Lambda_{n,k}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right) := \frac{\Psi_{n,k}\!\left(K^{(1)}, \alpha_1, \alpha_1 + l\right)}{\Psi_{n,k}\!\left(K^{(2)}, \alpha_2, \alpha_2 + l\right)}
\]
and
\[
\Lambda\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \rho\right) := \frac{\psi\!\left(K^{(1)}, \alpha_1, \alpha_1 + l, \rho\right)}{\psi\!\left(K^{(2)}, \alpha_2, \alpha_2 + l, \rho\right)},
\]
where $l > 0$. If we again assume that $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \infty$, then clearly
\[
\Lambda_{n,k}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right) \stackrel{\mathbb{P}}{\to} \Lambda\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \rho\right),
\]
which does not depend on $\gamma$. If the function $\rho \mapsto \Lambda\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \rho\right)$ is bijective, then we obtain the estimator
\[
\hat{\rho}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right) := \Lambda^{-1}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \Lambda_{n,k}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right)\right) \tag{2.15}
\]
for the second order parameter. The consistency of this estimator is established in Proposition 2.3.1 by a straightforward application of the continuous mapping theorem.

Proposition 2.3.1. (Goegebeur et al., 2010) Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.2.1. Let $K_1,\ldots,K_8$ satisfy Assumption 2.2.4, and suppose $\bar{I}_1^{(1)}(K_1,K_2,\rho)$, $\bar{I}_1^{(1)}(K_3,K_4,\rho)$, $\bar{I}_1^{(1)}(K_5,K_6,\rho)$ and $\bar{I}_1^{(1)}(K_7,K_8,\rho)$ are well defined and nonzero. Then if $k, n \to \infty$ such that $k/n \to 0$ and $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \infty$, we have $\Lambda_{n,k}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right) \stackrel{\mathbb{P}}{\to} \Lambda\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l, \rho\right)$. Further, if $\Lambda$ is bijective and $\Lambda^{-1}$ is continuous, then $\hat{\rho}\!\left(K^{(1,2)}, \alpha_1, \alpha_2, l\right)$ is a consistent estimator of $\rho$.
In order to establish asymptotic normality of the estimator of $\rho$, we need the following third order condition.

Assumption 2.3.2 (Third order condition). There exist a positive real parameter $\gamma$, negative real parameters $\rho$ and $\beta$, and functions $b$ and $\tilde{b}$ with $b(t) \to 0$ and $\tilde{b}(t) \to 0$ for $t \to \infty$, both of constant sign for large values of $t$, such that
\[
\lim_{t\to\infty} \frac{\frac{\log U(tx) - \log U(t) - \gamma\log x}{b(t)} - \frac{x^{\rho}-1}{\rho}}{\tilde{b}(t)} = \frac{1}{\beta}\left(\frac{x^{\rho+\beta}-1}{\rho+\beta} - \frac{x^{\rho}-1}{\rho}\right), \quad \forall x > 0.
\]
The third order condition implies that $|\tilde{b}|$ is regularly varying with index $\beta$ (de Haan and Ferreira, 2006). The third order condition is not too restrictive: among the Pareto-type distributions satisfying the second and third order conditions are the Fréchet, the Burr, the GP distributions and the absolute $T$ distribution, and this list is not exhaustive. As examples, we show that the Burr and the absolute $T$ distribution satisfy the third order condition.
Example 2.3.3. In order to verify that the Burr distribution satisfies the third order condition, it is convenient to choose $b(t) = \gamma\frac{t^{\rho}}{1-t^{\rho}}$. From (2.4) and this choice of $b(t)$ it follows that
\begin{align*}
\frac{\log U(tx) - \log U(t) - \gamma\log x}{b(t)} - \frac{x^{\rho}-1}{\rho} &= \frac{\gamma t^{\rho}\frac{x^{\rho}-1}{\rho} + \frac{\gamma}{2\rho}t^{2\rho}\left(x^{2\rho}-1\right) + O(t^{3\rho})}{\gamma\frac{t^{\rho}}{1-t^{\rho}}} - \frac{x^{\rho}-1}{\rho} \tag{2.16}\\
&= -t^{\rho}\,\frac{x^{\rho}-1}{\rho} + \frac{1}{2\rho}t^{\rho}\left(x^{2\rho}-1\right) + O(t^{2\rho}) \tag{2.17}\\
&= \rho t^{\rho}\,\frac{1}{\rho}\left(\frac{x^{2\rho}-1}{2\rho} - \frac{x^{\rho}-1}{\rho}\right) + O(t^{2\rho}). \tag{2.18}
\end{align*}
From (2.18) we see that if we choose $\beta = \rho$ and $\tilde{b}(t) = \rho t^{\rho}(1 + o(1))$, then the Burr distribution satisfies the third order condition.
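The expansion (2.16)-(2.18) can be checked numerically. For the Burr parametrization $1 - F(x) = (\eta/(\eta + x^{\tau}))^{\lambda}$ (our assumption for this illustration; the thesis' (2.4) is not reproduced in this section) one has the exact identity $\log U(tx) - \log U(t) - \gamma\log x = -\frac{\gamma}{\rho}\left[\log(1 - s x^{\rho}) - \log(1 - s)\right]$ with $s = t^{\rho}$, and the claim is that the normalized remainder, divided by $\tilde{b}(t) = \rho t^{\rho}$, approaches the third order limit with $\beta = \rho$. A small sketch of ours:

```python
from math import log

def remainder(x, rho, s, gamma=1.0):
    """(log U(tx) - log U(t) - gamma log x)/b(t) - (x^rho - 1)/rho for the
    Burr model, written in terms of s = t^rho, with b(t) = gamma*s/(1 - s)."""
    d = -(gamma / rho) * (log(1 - s * x**rho) - log(1 - s))
    b = gamma * s / (1 - s)
    return d / b - (x**rho - 1) / rho

x, rho, s = 2.0, -0.5, 1e-4
# third order limit with beta = rho, as in (2.18)
limit = (1 / rho) * ((x**(2 * rho) - 1) / (2 * rho) - (x**rho - 1) / rho)
ratio = remainder(x, rho, s) / (rho * s)   # divide by b~(t) = rho * t^rho
```

For small $s$ (large $t$) the ratio agrees with the limit up to $O(s)$, confirming the choice $\tilde{b}(t) = \rho t^{\rho}(1 + o(1))$.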
Example 2.3.4. To verify that the absolute $T$ distribution satisfies the third order condition, it is convenient to choose
\[
b(t) = \frac{-\rho D_1 t^{\rho}}{1 + 2\left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}}.
\]
With this choice of $b(t)$ and (2.6) it follows that
\begin{align*}
\frac{\log U(tx) - \log U(t) - \gamma\log x}{b(t)} - \frac{x^{\rho}-1}{\rho} &= 2\left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}\,\frac{x^{\rho}-1}{\rho} \tag{2.19}\\
&\quad - \left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}\,\frac{x^{2\rho}-1}{\rho} + O(t^{2\rho}) \tag{2.20}\\
&= -2\rho\left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}\,\frac{1}{\rho}\left(\frac{x^{2\rho}-1}{2\rho} - \frac{x^{\rho}-1}{\rho}\right) + O(t^{2\rho}). \tag{2.21}
\end{align*}
From this we see that if we choose $\beta = \rho$ and $\tilde{b}(t)$ of the form $\tilde{b}(t) = -2\rho\left(\frac{D_2}{D_1} - \frac{D_1}{2}\right)t^{\rho}(1 + o(1))$, then the absolute $T$ distribution satisfies the third order condition.
We also have to add an extra condition on the kernel function.

Assumption 2.3.5. Let $K$ be a function defined on $(0,1)$ such that Assumption 2.2.4 is satisfied, together with the following extra condition:

(vi) $\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\left(\frac{j}{k+1}\right)^{-\rho} = I_1(K,\rho) + o\left(\frac{1}{\sqrt{k}}\right)$, $k \to \infty$.

Lemma 2.3.6. The kernel function considered in Example 2.2.5, given by $K(t) := t^{\tau}(-\log t)^{\delta}$, also satisfies Assumption 2.3.5.

This result is easily obtained from the proof of Assumption 2.2.4 (iii), and its proof is hence omitted.
Similarly to the procedure in Theorem 2.2.7, we can make an asymptotic expansion of the statistic in (2.8) using the third order condition.

Theorem 2.3.7. (Goegebeur et al., 2010) Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.3.2. If Assumption 2.3.5 holds, then for $k, n \to \infty$ such that $k/n \to 0$ we have
\begin{align*}
T_{n,k}(K) \stackrel{D}{=}\;& \gamma\mu(K) + \gamma\sigma(K)\frac{N_k(K)}{\sqrt{k}} + b(Y_{n-k,n})\, I_1(K,\rho) + b(Y_{n-k,n})\,\tilde{\sigma}(K,\rho)\frac{P_k(K,\rho)}{\sqrt{k}}\\
&+ b(Y_{n-k,n})\,\tilde{b}(Y_{n-k,n})\, I_2(K,\rho,\beta)(1 + o_{\mathbb{P}}(1)) + b(Y_{n-k,n})\,O_{\mathbb{P}}\!\left(\frac{1}{\sqrt{k}}\right),
\end{align*}
where $N_k(K)$ and $P_k(K,\rho)$ are asymptotically standard normally distributed random variables.

We will not give a proof of this result; it follows along the same lines as the proof of Theorem 2.2.7. The result in Theorem 2.3.7 can be used to obtain the asymptotic expansion
\begin{align*}
T_{n,k}^{\alpha}(K) \stackrel{D}{=}\;& \gamma^{\alpha}\mu^{\alpha}(K) + \alpha\gamma^{\alpha}\mu^{\alpha-1}(K)\sigma(K)\frac{N_k(K)}{\sqrt{k}} + b(Y_{n-k,n})\,\alpha\gamma^{\alpha-1}\mu^{\alpha-1}(K)\, I_1(K,\rho)\\
&+ b(Y_{n-k,n})\,\tilde{b}(Y_{n-k,n})\,\alpha\gamma^{\alpha-1}\mu^{\alpha-1}(K)\, I_2(K,\rho,\beta)(1 + o_{\mathbb{P}}(1))\\
&+ b^2(Y_{n-k,n})\,\frac{\alpha(\alpha-1)}{2}\gamma^{\alpha-2}\mu^{\alpha-2}(K)\, I_1^2(K,\rho)(1 + o_{\mathbb{P}}(1)) + b(Y_{n-k,n})\,O_{\mathbb{P}}\!\left(\frac{1}{\sqrt{k}}\right).
\end{align*}
Before we can present the limiting distribution of the $\rho$ estimator defined in (2.15), we need to introduce the following notation, with $i, j \in \{1,\ldots,8\}$:
\begin{align*}
\bar{I}_2(K,\rho,\beta) &:= \frac{I_2(K,\rho,\beta)}{\mu(K)},\\
\bar{I}_2(K_i,K_j,\rho,\beta) &:= \frac{I_2(K_i,\rho,\beta)}{\mu(K_i)} - \frac{I_2(K_j,\rho,\beta)}{\mu(K_j)},\\
\bar{\sigma}(K) &:= \frac{\sigma(K)}{\mu(K)},\\
N_k(K_i,K_j) &:= \bar{\sigma}(K_i)N_k(K_i) - \bar{\sigma}(K_j)N_k(K_j),\\
N_k\!\left(K^{(1)},\alpha_1,\alpha_2,\gamma,\rho\right) &:= \frac{\alpha_1\gamma^{\alpha_1}N_k(K_1,K_2) - \psi\!\left(K^{(1)},\alpha_1,\alpha_2,\rho\right)\alpha_2\gamma^{\alpha_2}N_k(K_3,K_4)}{\alpha_2\gamma^{\alpha_2-1}\bar{I}_1^{(1)}(K_3,K_4,\rho)},\\
c_1\!\left(K^{(1)},\alpha_1,\alpha_2,\gamma,\rho,\beta\right) &:= \frac{\alpha_1\gamma^{\alpha_1-1}\bar{I}_2(K_1,K_2,\rho,\beta) - \psi\!\left(K^{(1)},\alpha_1,\alpha_2,\rho\right)\alpha_2\gamma^{\alpha_2-1}\bar{I}_2(K_3,K_4,\rho,\beta)}{\alpha_2\gamma^{\alpha_2-1}\bar{I}_1^{(1)}(K_3,K_4,\rho)},\\
c_2\!\left(K^{(1)},\alpha_1,\alpha_2,\gamma,\rho\right) &:= \frac{\alpha_1(\alpha_1-1)\gamma^{\alpha_1-2}\bar{I}_1^{(2)}(K_1,K_2,\rho) - \psi\!\left(K^{(1)},\alpha_1,\alpha_2,\rho\right)\alpha_2(\alpha_2-1)\gamma^{\alpha_2-2}\bar{I}_1^{(2)}(K_3,K_4,\rho)}{\alpha_2\gamma^{\alpha_2-1}\bar{I}_1^{(1)}(K_3,K_4,\rho)},\\
N_k\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right) &:= \frac{N_k\!\left(K^{(1)},\alpha_1,\alpha_1+l,\gamma,\rho\right) - \Lambda\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\rho\right)N_k\!\left(K^{(2)},\alpha_2,\alpha_2+l,\gamma,\rho\right)}{\psi\!\left(K^{(2)},\alpha_2,\alpha_2+l,\rho\right)},\\
c_1\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho,\beta\right) &:= \frac{c_1\!\left(K^{(1)},\alpha_1,\alpha_1+l,\gamma,\rho,\beta\right) - \Lambda\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\rho\right)c_1\!\left(K^{(2)},\alpha_2,\alpha_2+l,\gamma,\rho,\beta\right)}{\psi\!\left(K^{(2)},\alpha_2,\alpha_2+l,\rho\right)},\\
c_2\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right) &:= \frac{c_2\!\left(K^{(1)},\alpha_1,\alpha_1+l,\gamma,\rho\right) - \Lambda\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\rho\right)c_2\!\left(K^{(2)},\alpha_2,\alpha_2+l,\gamma,\rho\right)}{\psi\!\left(K^{(2)},\alpha_2,\alpha_2+l,\rho\right)},\\
v^2\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right) &:= \mathrm{Var}\!\left(N_k\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right)\right).
\end{align*}
With this notation we can state the asymptotic normality of our $\rho$ estimator.

Proposition 2.3.8. (Goegebeur et al., 2010) Let $X_1,\ldots,X_n$ be i.i.d. random variables from a distribution satisfying Assumption 2.3.2. If the kernel functions $K_1,\ldots,K_8$ satisfy Assumption 2.3.5 and are such that $\bar{I}_1^{(1)}(K_1,K_2,\rho)$, $\bar{I}_1^{(1)}(K_3,K_4,\rho)$, $\bar{I}_1^{(1)}(K_5,K_6,\rho)$ and $\bar{I}_1^{(1)}(K_7,K_8,\rho)$ are well defined and nonzero, then for $k, n \to \infty$ such that $k/n \to 0$, $\sqrt{k}\,b\!\left(\frac{n}{k}\right) \to \infty$, $\sqrt{k}\,b\!\left(\frac{n}{k}\right)\tilde{b}\!\left(\frac{n}{k}\right) \to \lambda_1$ and $\sqrt{k}\,b^2\!\left(\frac{n}{k}\right) \to \lambda_2$, we have
\begin{align*}
&\sqrt{k}\,b\!\left(\frac{n}{k}\right)\left[\Lambda_{n,k}\!\left(K^{(1,2)},\alpha_1,\alpha_2,l\right) - \Lambda\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\rho\right)\right]\\
&\qquad \stackrel{D}{\to} N\!\left(\lambda_1 c_1\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho,\beta\right) + \lambda_2 c_2\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right),\; v^2\!\left(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho\right)\right).
\end{align*}
2.4 Appendix
2.4.1 Proof of Lemma 2.2.6
i)

Since $K(t) = \frac{1}{t}\,t^{\tau+1}(-\log t)^{\delta}$ it follows that
\[
\int_0^t u(v)\,dv = t^{\tau+1}(-\log t)^{\delta},
\]
and hence
\[
u(v) = (\tau+1)v^{\tau}(-\log v)^{\delta} - \delta v^{\tau}(-\log v)^{\delta-1}.
\]
Now
\begin{align*}
\left|(k+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} u(t)\,dt\right| &\leq (k+1)(\tau+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} t^{\tau}(-\log t)^{\delta}\,dt + (k+1)\delta\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} t^{\tau}(-\log t)^{\delta-1}\,dt\\
&\leq \frac{k+1}{j}(\tau+1)\int_0^{\frac{j}{k+1}}(-\log t)^{\delta}\,dt + (k+1)\delta\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}}(-\log t)^{\delta-1}\,dt.
\end{align*}
We distinguish between the two cases $\delta > 1$ and $\delta \leq 1$. We start with the case $\delta > 1$. Then
\[
\left|(k+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} u(t)\,dt\right| \leq \frac{k+1}{j}(\tau+1)\int_0^{\frac{j}{k+1}}(-\log t)^{\delta}\,dt + \frac{k+1}{j}\delta\int_0^{\frac{j}{k+1}}(-\log t)^{\delta-1}\,dt =: f\left(\frac{j}{k+1}\right).
\]
Next we show that $\int_0^1 f(x)\,dx < \infty$:
\begin{align*}
\int_0^1 f(x)\,dx &= (\tau+1)\int_0^1\frac{1}{x}\int_0^x(-\log t)^{\delta}\,dt\,dx + \delta\int_0^1\frac{1}{x}\int_0^x(-\log t)^{\delta-1}\,dt\,dx\\
&= (\tau+1)\int_0^1(-\log t)^{\delta}\int_t^1\frac{1}{x}\,dx\,dt + \delta\int_0^1(-\log t)^{\delta-1}\int_t^1\frac{1}{x}\,dx\,dt\\
&= (\tau+1)\Gamma(\delta+2) + \delta\Gamma(\delta+1) < \infty.
\end{align*}
ii)

The second part is easily verified using the following argument:
\[
\sigma^2(K) = \int_0^1 K^2(u)\,du = \int_0^1 u^{2\tau}(-\log u)^{2\delta}\,du \leq \int_0^1(-\log u)^{2\delta}\,du = \Gamma(2\delta+1) < \infty.
\]
iii)

A similar argument shows that $I_{12} = O\!\left(\frac{(\log(k+1))^{\delta}}{k+1}\right)$.

Concerning the term $I_2$ we find that
\begin{align*}
I_2 &\leq \int_{\log(k+1)}^{\infty} z^{\delta}e^{-z}\,dz = \frac{(\log(k+1))^{\delta}}{k+1} + \delta\int_{\log(k+1)}^{\infty} z^{\delta-1}e^{-z}\,dz\\
&= \frac{(\log(k+1))^{\delta}}{k+1}\left(1 + \frac{k+1}{(\log(k+1))^{\delta}}\,\delta\int_{\log(k+1)}^{\infty} z^{\delta-1}e^{-z}\,dz\right).
\end{align*}
If we can show that $\frac{k+1}{(\log(k+1))^{\delta}}\,\delta\int_{\log(k+1)}^{\infty} z^{\delta-1}e^{-z}\,dz \to 0$ as $k \to \infty$, then $I_2 = O\!\left(\frac{(\log k)^{\delta}}{k}\right)$. Using l'Hôpital's rule and Leibniz's rule it follows that
\[
\lim_{x\to\infty}\frac{\delta\int_{\log x}^{\infty} z^{\delta-1}e^{-z}\,dz}{\frac{(\log x)^{\delta}}{x}} = \lim_{x\to\infty}\frac{-\delta(\log x)^{\delta-1}e^{-\log x}\frac{1}{x}}{\frac{\delta(\log x)^{\delta-1} - (\log x)^{\delta}}{x^2}} = \lim_{x\to\infty}\frac{-\delta}{\delta - \log x} = 0.
\]
iv)

The fourth condition is trivially satisfied, since
\[
\max_{j\in\{1,\ldots,k\}}\left|K\left(\frac{j}{k+1}\right)\right| \leq (\log(k+1))^{\delta} = o(\sqrt{k}).
\]

v)

This condition is also trivially satisfied, since
\[
\int_0^1 |K(u)|\,u^{|\rho|-1-\epsilon}\,du = \int_0^1 u^{\tau+|\rho|-1-\epsilon}(-\log u)^{\delta}\,du = \frac{\Gamma(\delta+1)}{(\tau+|\rho|-\epsilon)^{\delta+1}} < \infty.
\]
Lemma 2.4.1. Let
\[
Z_k := \frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)E_j, \tag{2.22}
\]
where the $E_i$ are standard exponential random variables and $K(u)$, $0 < u < 1$, is a kernel function. Furthermore, let
\[
v_k = \sqrt{\frac{1}{k}\sum_{j=1}^{k} K^2\left(\frac{j}{k+1}\right)}. \tag{2.23}
\]
Then
\[
\frac{\sqrt{k}\left(Z_k - \frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right)\right)}{v_k} \stackrel{D}{\to} N(0,1) \iff \max_{1\leq j\leq k}\left|K\left(\frac{j}{k+1}\right)\right| = o\!\left(\sqrt{k}\,v_k\right) \tag{2.24}
\]
as $k \to \infty$. If further we have
\[
\frac{1}{k}\sum_{j=1}^{k} K\left(\frac{j}{k+1}\right) = \mu(K) + o\!\left(\frac{1}{\sqrt{k}}\right), \qquad v_k \to \sigma(K) > 0, \tag{2.25}
\]
with $\mu(K)$ and $\sigma(K)$ finite, and
\[
\max_{1\leq j\leq k}\left|K\left(\frac{j}{k+1}\right)\right| = o(\sqrt{k}), \quad k \to \infty, \tag{2.26}
\]
then
\[
\frac{\sqrt{k}\,(Z_k - \mu(K))}{\sigma(K)} \stackrel{D}{\to} N(0,1) \tag{2.27}
\]
as $k \to \infty$.
Lemma 2.4.2. (Goegebeur et al., 2010) Denote by $E_1,\ldots,E_k$ standard exponential random variables and by $U_{1,k} \leq \cdots \leq U_{k,k}$ the order statistics of a random sample of size $k$ from $U(0,1)$. Assume that $\int_0^1 |K(u)|\,du < \infty$. Then
\[
\frac{1}{k}\sum_{j=1}^{k}\left|K\left(\frac{j}{k+1}\right)\right| E_j\left|U_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho}\right| = O_{\mathbb{P}}\!\left(\frac{1}{\sqrt{k}}\right).
\]

Lemma 2.4.3. (de Haan and Ferreira, 2006) Let $f$ be a measurable function satisfying
\[
\lim_{t\to\infty}\frac{f(tx) - f(t)}{b_0(t)} = \frac{x^{\gamma}-1}{\gamma}, \quad \forall x > 0,
\]
where $\gamma$ is a real parameter. Then for all $\epsilon, \delta > 0$ there is a $t_0 = t_0(\epsilon,\delta)$ such that for $t, tx \geq t_0$,
\[
\left|\frac{f(tx) - f(t)}{b_0(t)} - \frac{x^{\gamma}-1}{\gamma}\right| \leq \epsilon x^{\gamma}\max\left(x^{\delta}, x^{-\delta}\right),
\]
where
\[
b_0(t) := \begin{cases} \gamma f(t), & \gamma > 0,\\ -\gamma(f(\infty) - f(t)), & \gamma < 0,\\ f(t) - t^{-1}\int_0^t f(s)\,ds, & \gamma = 0. \end{cases} \tag{2.29}
\]
Chapter 3
Multivariate extreme value theory
In this chapter we introduce the basic limit laws in multivariate extreme value theory. After a transformation of the marginal distribution functions to standard Fréchet margins, we discuss the dependence structure between the variables. This discussion starts with the exponent and spectral measures, before we turn our attention to the max domain of attraction in the multivariate framework and asymptotic independence. This is followed by an introduction to several other dependence measures: the Pickands dependence function and the pair of dependence measures $\chi$ and $\bar{\chi}$. We explain the relation between all these dependence measures and discuss ways of getting from one to the other. Finally, we introduce the model of Ledford and Tawn (1997) and make the connection between the coefficient of tail dependence $\eta$ and the other dependence measures discussed previously.
3.1 Limit laws
The results we present in this section will be based on two-dimensional spaces. Generalizations to higher-dimensional spaces are straightforward but require heavier notation. Suppose $(X_1,Y_1),\ldots,(X_n,Y_n)$ are i.i.d. random vectors with distribution function $F_{XY}$. We define the maximum of a set of vectors of this form as
\[
M_n := (\max(X_1,\ldots,X_n),\; \max(Y_1,\ldots,Y_n)),
\]
which is simply the vector of componentwise maxima. We start by deriving an important theorem, which is the foundation of our description of the asymptotic distributions that can occur for an appropriately normalized maximum of the form of $M_n$. Suppose there exist sequences of constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$, sequences of positive constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$, and a distribution function $G$ with nondegenerate marginals such that
\[
\lim_{n\to\infty} P\!\left(\frac{\max(X_1,\ldots,X_n) - b_n}{a_n} \leq x,\; \frac{\max(Y_1,\ldots,Y_n) - d_n}{c_n} \leq y\right) = G(x,y) \tag{3.1}
\]
for all continuity points $(x,y)$ of $G$. Any limit distribution function $G$ in (3.1) with nondegenerate marginals is called a multivariate extreme value distribution. It follows that
\[
\lim_{n\to\infty} P\!\left(\frac{\max(X_1,\ldots,X_n) - b_n}{a_n} \leq x\right) = G(x,\infty)
\]
and
\[
\lim_{n\to\infty} P\!\left(\frac{\max(Y_1,\ldots,Y_n) - d_n}{c_n} \leq y\right) = G(\infty,y),
\]
since (3.1) implies convergence of the marginal distributions. According to Theorem 1.1.2 we can choose the constants $a_n$, $b_n$, $c_n$ and $d_n$ such that for some $\gamma_1, \gamma_2 \in \mathbb{R}$ we have
\[
G(x,\infty) = \exp\!\left(-(1+\gamma_1 x)^{-\frac{1}{\gamma_1}}\right) \tag{3.2}
\]
and
\[
G(\infty,y) = \exp\!\left(-(1+\gamma_2 y)^{-\frac{1}{\gamma_2}}\right). \tag{3.3}
\]
It is relevant to note that $G$ is continuous, since the two marginal distributions of $G$ are continuous.

If we let $F_X$ and $F_Y$ be the two marginal distributions of $F_{XY}$ and $U_X$ and $U_Y$ the two corresponding tail quantile functions, then according to Theorem 1.1.2 there are positive functions $a_X(t)$ and $a_Y(t)$ such that
\[
\lim_{t\to\infty}\frac{U_X(tx) - U_X(t)}{a_X(t)} = \frac{x^{\gamma_1}-1}{\gamma_1}, \quad \forall x > 0,
\]
and
\[
\lim_{t\to\infty}\frac{U_Y(tx) - U_Y(t)}{a_Y(t)} = \frac{x^{\gamma_2}-1}{\gamma_2}, \quad \forall x > 0.
\]
Hence
\[
\lim_{n\to\infty}\frac{U_X(nx) - b_n}{a_n} = \frac{x^{\gamma_1}-1}{\gamma_1} \qquad\text{and}\qquad \lim_{n\to\infty}\frac{U_Y(nx) - d_n}{c_n} = \frac{x^{\gamma_2}-1}{\gamma_2},
\]
if we choose the constants $a_n$, $b_n$, $c_n$ and $d_n$ according to Theorem 1.1.2. We easily see that (3.1) can be written as
\[
G(x,y) = \lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\; c_n y + d_n).
\]
If $x_n \to u$ and $y_n \to v$, then by the continuity of $G$ and the monotonicity of $F_{XY}$ we have that
\[
G(u,v) = \lim_{n\to\infty} F_{XY}^n(a_n x_n + b_n,\; c_n y_n + d_n).
\]
Applying this result with
\[
x_n := \frac{U_X(nx) - b_n}{a_n}, \quad x > 0, \qquad\text{and}\qquad y_n := \frac{U_Y(ny) - d_n}{c_n}, \quad y > 0,
\]
gives
\[
G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1}, \frac{y^{\gamma_2}-1}{\gamma_2}\right) = \lim_{n\to\infty} F_{XY}^n(U_X(nx), U_Y(ny)).
\]
These results establish the following theorem.
Theorem 3.1.1. (de Haan and Ferreira, 2006) Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be i.i.d. random vectors with distribution function $F_{XY}$. Suppose there exist sequences of real constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ and positive real constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ such that
\[
\lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\; c_n y + d_n) = G(x,y)
\]
for all continuity points $(x,y)$ of $G$, and the marginals of $G$ are standardized as in (3.2) and (3.3). Then, with $F_X(x) := F_{XY}(x,\infty)$, $F_Y(y) := F_{XY}(\infty,y)$ and $U_X$ and $U_Y$ the two corresponding tail quantile functions, we have that
\[
\lim_{n\to\infty} F_{XY}^n(U_X(nx), U_Y(ny)) = G_0(x,y) \tag{3.4}
\]
for all $x, y > 0$, where
\[
G_0(x,y) := G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1}, \frac{y^{\gamma_2}-1}{\gamma_2}\right)
\]
and $\gamma_1, \gamma_2$ are the marginal extreme value indices from (3.2) and (3.3).
Remark 3.1.2. The multivariate extreme value distribution function $G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1}, \frac{y^{\gamma_2}-1}{\gamma_2}\right)$ has marginal distributions which are standard Fréchet, i.e. $F_Z(z) = \exp\!\left(-\frac{1}{z}\right)$, $z > 0$. This fact simplifies matters, because now we only have to discuss the dependence structure between the two variables.

The following corollary, which we state without proof, is obtained from Theorem 3.1.1. For details we refer to de Haan and Ferreira (2006), Corollary 6.1.3 and Corollary 6.1.4.
Corollary 3.1.3. (de Haan and Ferreira, 2006) Under the conditions of Theorem 3.1.1, we have for any $(x,y)$ for which $0 < G_0(x,y) < 1$ that
\[
\lim_{n\to\infty} n\{1 - F_{XY}(U_X(nx), U_Y(ny))\} = -\log G_0(x,y) \tag{3.5}
\]
and
\[
\lim_{t\to\infty} t\{1 - F_{XY}(U_X(tx), U_Y(ty))\} = -\log G_0(x,y), \tag{3.6}
\]
where $t$ runs through the real numbers.
3.2 The exponent measure and the spectral measure
From Corollary 3.1.3 we can obtain the following useful theorem.
Theorem 3.2.1. (de Haan and Ferreira, 2006) Let $F_{XY}$ and $G_0$ be distribution functions such that for $x, y > 0$ with $0 < G_0(x,y) < 1$ we have
\[
\lim_{n\to\infty} n\{1 - F_{XY}(U_X(nx), U_Y(ny))\} = -\log G_0(x,y),
\]
where $U_X$ and $U_Y$ are the tail quantile functions of the marginals of $F_{XY}$. Then there are set functions $\nu, \nu_1, \nu_2, \ldots$, defined for all Borel sets $A \subset \mathbb{R}_+^2$ with
\[
\inf_{(x,y)\in A}\max(x,y) > 0,
\]
such that

(i)
\[
\nu_n\{(s,t) \in \mathbb{R}_+^2 : s > x \text{ or } t > y\} = n\{1 - F_{XY}(U_X(nx), U_Y(ny))\}, \tag{3.7}
\]
\[
\nu\{(s,t) \in \mathbb{R}_+^2 : s > x \text{ or } t > y\} = -\log G_0(x,y). \tag{3.8}
\]

(ii) For all $a > 0$ the set functions $\nu, \nu_1, \nu_2, \ldots$ are finite measures on $\mathbb{R}_+^2\setminus[0,a]^2$.

(iii) For each Borel set $A \subset \mathbb{R}_+^2$ with $\inf_{(x,y)\in A}\max(x,y) > 0$ and $\nu(\partial A) = 0$,
\[
\lim_{n\to\infty}\nu_n(A) = \nu(A). \tag{3.9}
\]

Definition 3.2.2. The measure $\nu$ from (3.8) is called the exponent measure of the extreme value distribution $G_0$, since
\[
G_0(x,y) = \exp(-\nu(A_{x,y}))
\]
with
\[
A_{x,y} := \{(s,t) \in \mathbb{R}_+^2 : s > x \text{ or } t > y\}.
\]
In the following we let $\nu(x,y) := \nu(A_{x,y})$.
An important property of the exponent measure, which will be needed later in this chapter, is that it is homogeneous of order $-1$, as given in Theorem 3.2.3.

Theorem 3.2.3. (de Haan and Ferreira, 2006) For any Borel set $A \subset \mathbb{R}_+^2$ with $\inf_{(x,y)\in A}\max(x,y) > 0$ and $\nu(\partial A) = 0$, and any $a > 0$,
\[
\nu(aA) = a^{-1}\nu(A),
\]
where $aA$ is the set obtained by multiplying all elements of $A$ by $a$.

From the exponent measure we can also obtain the spectral measure. The spectral measure arises when we make a one-to-one transformation $\mathbb{R}_+^2\setminus\{(0,0)\} \to (0,\infty)\times[0,c]$ for some $c > 0$,
\[
\begin{cases} r = r(x,y),\\ d = d(x,y), \end{cases}
\]
with the property that for all $a, x, y > 0$ we have
\[
\begin{cases} r(ax,ay) = a\,r(x,y),\\ d(ax,ay) = d(x,y). \end{cases}
\]
We can think of $r$ as a radius and $d$ as an angle or a direction. In this thesis we will only consider the transformation
\[
\begin{cases} r(x,y) = x+y,\\ d(x,y) = \frac{x}{x+y}, \end{cases}
\]
in which case the following theorem can be shown to hold.
Theorem 3.2.4. (de Haan and Ferreira, 2006) For each limit distribution $G$ from (3.1), (3.2) and (3.3) there exists a probability distribution (denoted by its distribution function $H$) concentrated on $[0,1]$ with mean $\frac{1}{2}$ such that for $x, y > 0$,
\[
G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1}, \frac{y^{\gamma_2}-1}{\gamma_2}\right) = G_0(x,y) = \exp\!\left(-2\int_0^1\left(\frac{\omega}{x}\vee\frac{1-\omega}{y}\right)dH(\omega)\right), \tag{3.10}
\]
where $\frac{\omega}{x}\vee\frac{1-\omega}{y} := \max\!\left(\frac{\omega}{x}, \frac{1-\omega}{y}\right)$.
From (3.10) we see that the limit distributions in (3.1) are characterized solely by the spectral measure $H$ and the marginal extreme value indices. Many transformations other than the one we considered can be chosen in order to construct a spectral measure; in fact there are endless possibilities. The transformation to choose depends on the situation at hand, and in a sense they are all equivalent, since one can be transformed into the other.

From (3.8) and (3.10) we see that the connection between the exponent measure and the spectral measure is given by
\[
\nu(x,y) = 2\int_0^1\left(\frac{\omega}{x}\vee\frac{1-\omega}{y}\right)dH(\omega).
\]
However, it is not always obvious how to get from one measure to the other using this relation. In case this is not obvious, and $G_0$ is absolutely continuous, we can use a method derived by Coles and Tawn (1991) to compute the spectral density from the exponent measure. In the bivariate case, the point masses of $H$ at 0 and 1 are
\[
H(\{0\}) = -\frac{1}{2}\lim_{x\to 0}\frac{\partial\nu}{\partial y}(x,y), \tag{3.11}
\]
\[
H(\{1\}) = -\frac{1}{2}\lim_{y\to 0}\frac{\partial\nu}{\partial x}(x,y), \tag{3.12}
\]
and the density for $0 < \omega < 1$ is given by
\[
h(\omega) = -\frac{1}{2}\left.\frac{\partial^2\nu(x,y)}{\partial x\,\partial y}\right|_{(\omega,1-\omega)}. \tag{3.13}
\]
Next we will consider some examples of spectral and exponent measures.
Example 3.2.5. We start by considering two important special cases of $H$. The first is the distribution function which places a point mass of 1 on $\omega = \frac{1}{2}$. In this case we obtain
\[
G_0(x,y) = \exp\!\left(-\max(x^{-1}, y^{-1})\right), \quad x, y > 0,
\]
which corresponds to complete dependence between the two variables. Here $G_0$ is not absolutely continuous, so the method discussed above does not apply. The second case is the distribution function which places point mass $\frac{1}{2}$ on both $\omega = 0$ and $\omega = 1$. In this case it follows that
\[
G_0(x,y) = \exp\!\left(-(x^{-1} + y^{-1})\right), \quad x, y > 0,
\]
which corresponds to independence between the two variables. Here $G_0$ is absolutely continuous, though with a spectral measure putting masses of $\frac{1}{2}$ at 0 and 1.
Example 3.2.6. The logistic model (Gumbel, 1960a,b), given by
\[
\nu(x,y) = \left(x^{-\frac{1}{\alpha}} + y^{-\frac{1}{\alpha}}\right)^{\alpha}, \quad x, y > 0,\; 0 < \alpha < 1,
\]
is the oldest parametric family of bivariate extreme value dependence structures. It is a versatile model which covers all levels of dependence from independent to completely dependent variables. We see that for $\alpha \to 0$ we get
\[
\nu(x,y) = \max(x^{-1}, y^{-1}),
\]
and for $\alpha \to 1$ it follows that
\[
\nu(x,y) = x^{-1} + y^{-1},
\]
which correspond to complete dependence and independence between the variables, respectively. The logistic model does, however, not allow for asymmetry in the dependence structure, as the variables are exchangeable.

From the exponent measure we can compute the point mass of $H$ at 0 using (3.11):
\[
H(\{0\}) = \frac{1}{2}\lim_{x\to 0}\, y^{-\frac{1}{\alpha}-1}\left(x^{-\frac{1}{\alpha}} + y^{-\frac{1}{\alpha}}\right)^{\alpha-1} = 0.
\]
By symmetry, the point mass of $H$ at 1 is also 0. The spectral density on $(0,1)$ can be found using (3.13). We start by finding
\[
\frac{\partial^2\nu(x,y)}{\partial x\,\partial y} = -\frac{1-\alpha}{\alpha}\,x^{-\frac{1}{\alpha}-1}y^{-\frac{1}{\alpha}-1}\left(x^{-\frac{1}{\alpha}} + y^{-\frac{1}{\alpha}}\right)^{\alpha-2}.
\]
From this we obtain the spectral density on $(0,1)$:
\[
h(\omega) = \frac{1}{2}\,\frac{1-\alpha}{\alpha}\,\omega^{-\frac{1}{\alpha}-1}(1-\omega)^{-\frac{1}{\alpha}-1}\left(\omega^{-\frac{1}{\alpha}} + (1-\omega)^{-\frac{1}{\alpha}}\right)^{\alpha-2}.
\]
3.3 Domain of attraction and asymptotic independence
In order to discuss the domain of attraction in the multivariate case we first need to introduce the concept of max stability.

Definition 3.3.1. A distribution function $G$ is called max stable if there exist sequences of constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ and sequences of positive constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ such that
\[
G^n(a_n x + b_n,\; c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},\; \forall n \in \mathbb{N}. \tag{3.14}
\]

With this definition we are now able to discuss the bivariate max domain of attraction.

Definition 3.3.2. Let $G : \mathbb{R}^2 \to \mathbb{R}_+$ be a max stable distribution function. A distribution function $F_{XY}$ is said to be in the max domain of attraction of $G$ if there exist sequences of constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ and sequences of positive constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ such that
\[
\lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\; c_n y + d_n) = G(x,y) \tag{3.15}
\]
for all $x, y \in \mathbb{R}$.
Our next proposition shows that the class of max stable distributions and the class of extreme value distributions coincide.

Proposition 3.3.3. A distribution function $G$ is max stable if and only if it is an extreme value distribution.

Proof. Assume $G$ is a max stable distribution. Then by Definition 3.3.1 there exist sequences of constants $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ and sequences of positive constants $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ such that
\[
G^n(a_n x + b_n,\; c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},\; \forall n \in \mathbb{N}.
\]
Since then
\[
\lim_{n\to\infty} G^n(a_n x + b_n,\; c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},
\]
it follows by Theorem 3.1.1 that $G$ is an extreme value distribution.

Now assume that $G$ is an extreme value distribution. We can, without loss of generality, assume that $G$ is of the same form as $G_0$ defined in Theorem 3.1.1. By Definition 3.2.2 and Theorem 3.2.3 it follows that
\[
G^n(nx, ny) = \exp(-n\nu(A_{nx,ny})) = \exp(-n\nu(nA_{x,y})) = \exp(-\nu(A_{x,y})) = G(x,y), \quad \forall x, y > 0,\; \forall n \in \mathbb{N}.
\]
So $G$ satisfies Definition 3.3.1 with $a_n = c_n = n$ and $b_n = d_n = 0$, and is hence a max stable distribution.
Next we present a theorem which gives some equivalent formulations of the max domain of attraction condition.

Theorem 3.3.4. (de Haan and Ferreira, 2006) Let $G$ be a max-stable distribution. Let the marginal distribution functions be $\exp(-(1+\gamma_1 x)^{-1/\gamma_1})$ and $\exp(-(1+\gamma_2 y)^{-1/\gamma_2})$, and let $H$ be its spectral measure according to the representation of Theorem 3.2.4. Then:

(i) If the distribution function $F_{XY}$ of the random vector $(X, Y)$ with continuous marginal distribution functions $F_X$ and $F_Y$ is in the max domain of attraction of $G$, then the following equivalent conditions are fulfilled:

(a) With $U_X$ and $U_Y$ being the tail quantile functions of $F_X$ and $F_Y$, we have, for $x, y > 0$, that
\[
\lim_{t \to \infty} \frac{1 - F_{XY}(U_X(tx), U_Y(ty))}{1 - F_{XY}(U_X(t), U_Y(t))} = S(x, y) \tag{3.16}
\]
with $S(x, y) := \dfrac{\log G\left(\frac{x^{\gamma_1} - 1}{\gamma_1}, \frac{y^{\gamma_2} - 1}{\gamma_2}\right)}{\log G(0, 0)}$.

(b) For all $r > 1$ and all $s \in [0, 1]$ that are continuity points of $H$,
\[
\lim_{t \to \infty} P\left(V + W > rt \text{ and } \frac{V}{V + W} \le s \,\Big|\, V + W > t\right) = r^{-1} H(s), \tag{3.17}
\]
where $V := \frac{1}{1 - F_X(X)}$ and $W := \frac{1}{1 - F_Y(Y)}$.
(ii) Conversely, if the continuous marginal distribution functions $F_X$ and $F_Y$ are in the domains of attraction of $\exp(-(1+\gamma_1 x)^{-1/\gamma_1})$ and $\exp(-(1+\gamma_2 y)^{-1/\gamma_2})$, respectively, and either of the limit relations (3.16) and (3.17) holds for some positive function $S$ or some distribution function $H$, then $F_{XY}$ is in the max domain of attraction of $G$.
We saw in Example 3.2.5 that there exists a special case of the spectral measure where the max-stable distribution has independent components. This motivates the following definition.

Definition 3.3.5. A random vector $(X, Y)$ whose distribution function $F_{XY}$ is in the domain of attraction of a max-stable distribution with independent components is said to have the property of asymptotic independence.
From this definition we obtain the following theorem.

Theorem 3.3.6. (de Haan and Ferreira, 2006) Let $F_{XY} : \mathbb{R}^2 \to \mathbb{R}_+$ be a probability distribution function. Suppose that its marginal distribution functions $F_X : \mathbb{R} \to \mathbb{R}_+$ and $F_Y : \mathbb{R} \to \mathbb{R}_+$ satisfy
\[
\lim_{n \to \infty} F_X^n(a_n x + b_n) = \exp(-(1+\gamma_1 x)^{-1/\gamma_1})
\quad \text{and} \quad
\lim_{n \to \infty} F_Y^n(c_n y + d_n) = \exp(-(1+\gamma_2 y)^{-1/\gamma_2})
\]
for all $x, y$ for which $1 + \gamma_1 x > 0$ and $1 + \gamma_2 y > 0$, where $(b_n)_{n=1}^{\infty}$, $(d_n)_{n=1}^{\infty}$ are sequences of real constants and $(a_n)_{n=1}^{\infty}$, $(c_n)_{n=1}^{\infty}$ are sequences of positive real constants. Let $(X, Y)$ be a random vector with distribution function $F_{XY}$. If
\[
\lim_{t \to \infty} \frac{P(X > U_X(t), Y > U_Y(t))}{P(Y > U_Y(t))} = 0, \tag{3.18}
\]
then
\[
\lim_{n \to \infty} F_{XY}^n(a_n x + b_n, c_n y + d_n) = \exp\left(-(1+\gamma_1 x)^{-1/\gamma_1} - (1+\gamma_2 y)^{-1/\gamma_2}\right)
\]
for $1 + \gamma_1 x > 0$ and $1 + \gamma_2 y > 0$; hence $X$ and $Y$ are asymptotically independent. Conversely, asymptotic independence entails (3.18).
Proof. Assume (3.18) holds. Then also
\[
\lim_{t \to \infty} \frac{tP(X > U_X(t), Y > U_Y(t))}{tP(Y > U_Y(t))} = 0.
\]
Using Theorem 1.1.2 (i) and (iii) with $x = 0$ we find that
\[
\lim_{t \to \infty} tP(Y > U_Y(t)) = 1, \tag{3.19}
\]
and hence
\[
\lim_{t \to \infty} tP(X > U_X(t), Y > U_Y(t)) = 0.
\]
Because of monotonicity, it follows that
\[
\lim_{t \to \infty} tP(X > U_X(tx), Y > U_Y(ty)) = 0, \quad \forall x, y > 0,
\]
and then also, for the set $\tilde{A}_{x,y} := \left\{(s, t) \in \mathbb{R}^2_+ : s > x \text{ and } t > y\right\}$, we have
\[
\nu\left(\tilde{A}_{x,y}\right) = \lim_{n \to \infty} \nu_n\left(\tilde{A}_{x,y}\right) = \lim_{n \to \infty} nP(X > U_X(nx), Y > U_Y(ny)) = 0, \quad \forall x, y > 0.
\]
This means that the exponent measure puts its entire mass on the axes $x = 0$ and $y = 0$; equivalently, the spectral measure satisfies
\[
H[\{0\}] = \tfrac{1}{2} \quad \text{and} \quad H[\{1\}] = \tfrac{1}{2}.
\]
This is equivalent to $X$ and $Y$ being asymptotically independent.

Conversely, assume that $X$ and $Y$ are asymptotically independent. Then
\[
G_0(x, y) = \exp\left(-x^{-1} - y^{-1}\right), \quad x, y > 0,
\]
and hence for $x = y = 1$ we have $G_0(1, 1) = \exp(-2)$. Using Corollary 3.1.3, this implies that
\begin{align}
2 &= \lim_{t \to \infty} t\left(1 - P(X \le U_X(t), Y \le U_Y(t))\right) \tag{3.20} \\
&= \lim_{t \to \infty} t\left(P(X > U_X(t)) + P(Y > U_Y(t)) - P(X > U_X(t), Y > U_Y(t))\right). \tag{3.21}
\end{align}
From Theorem 1.1.2 (i) and (iii) it follows that
\[
\lim_{t \to \infty} tP(X > U_X(t), Y > U_Y(t)) = 0,
\]
and hence, by (3.19), we have that
\[
\lim_{t \to \infty} \frac{P(X > U_X(t), Y > U_Y(t))}{P(Y > U_Y(t))} = 0.
\]
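To make condition (3.18) concrete, the following small numerical sketch (my own illustration, not from the thesis) uses the Farlie-Gumbel-Morgenstern copula $C(u, v) = uv(1 + \theta(1-u)(1-v))$, a standard example of a family that is dependent for $\theta \neq 0$ yet asymptotically independent: on the copula scale the ratio in (3.18) equals $\frac{1}{t}\left(1 + \theta(1 - 1/t)^2\right)$ and tends to $0$.

```python
def fgm_copula(u, v, theta=0.8):
    # Farlie-Gumbel-Morgenstern copula; valid for theta in [-1, 1].
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def exceedance_ratio(t, theta=0.8):
    # P(X > U_X(t), Y > U_Y(t)) / P(Y > U_Y(t)) on the copula scale:
    # both margins exceed their (1 - 1/t)-quantile, and P(Y > U_Y(t)) = 1/t.
    u = 1.0 - 1.0 / t
    joint_exceed = 1.0 - 2.0 * u + fgm_copula(u, u, theta)
    return joint_exceed / (1.0 / t)

ratios = [exceedance_ratio(t) for t in (10.0, 100.0, 10000.0)]
# The ratio decays like (1 + theta)/t, consistent with (3.18).
assert ratios[0] > ratios[1] > ratios[2]
assert ratios[2] < 1e-3
```

The design choice here is to work directly with the copula so that the tail quantile functions drop out: the threshold $U(t)$ corresponds exactly to the level $u = 1 - 1/t$ on the uniform scale.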
3.4 Pickands dependence function
Whereas the dependence measures we have discussed previously generalize straightforwardly from the bivariate case to the multivariate case, this is not true for the following dependence measure, which is strictly bivariate. The dependence measure we are going to discuss is related to the function $L : \mathbb{R}^2_+ \to \mathbb{R}$ given by
\[
L(x, y) := -\log G_0\left(\frac{1}{x}, \frac{1}{y}\right). \tag{3.22}
\]
This can also be expressed in terms of the exponent measure as
\[
L(x, y) = \nu\left\{(s, t) \in \mathbb{R}^2_+ : s > \frac{1}{x} \text{ or } t > \frac{1}{y}\right\}
\]
using (3.8), or in terms of the spectral measure as
\[
L(x, y) = 2\int_0^1 (\omega x \vee (1 - \omega)y)\, dH(\omega) \tag{3.23}
\]
using (3.10). The function $L$ has the following properties, which are easy to derive from the properties of the exponent and spectral measures and will therefore, for brevity, not be proven here.

Proposition 3.4.1. (de Haan and Ferreira, 2006) Let $L$ be as defined in (3.22). Then $L$ has the following properties.

(i) Homogeneity of order 1: $L(ax, ay) = aL(x, y)$, for all $a, x, y > 0$.

(ii) $L(x, 0) = L(0, x) = x$, for all $x > 0$.

(iii) $x \vee y \le L(x, y) \le x + y$, for all $x, y > 0$.

(iv) Let $(X, Y)$ be a random vector with distribution function $G_0(x, y)$. If $X$ and $Y$ are independent, then $L(x, y) = x + y$, for $x, y > 0$. If $X$ and $Y$ are completely dependent, then $L(x, y) = x \vee y$, for $x, y > 0$.

(v) $L$ is continuous.

(vi) $L$ is convex: $L(\lambda(x_1, y_1) + (1 - \lambda)(x_2, y_2)) \le \lambda L(x_1, y_1) + (1 - \lambda)L(x_2, y_2)$ for all $x_1, x_2, y_1, y_2 > 0$ and $\lambda \in [0, 1]$.
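The properties in Proposition 3.4.1 are easy to check numerically on a concrete model. The sketch below (my own illustration; the logistic model is a standard example but the code is not from the thesis) uses the logistic stable tail dependence function $L(x, y) = (x^{1/\alpha} + y^{1/\alpha})^{\alpha}$, $0 < \alpha \le 1$, and verifies properties (i)-(iii) and (vi) at a few points.

```python
def L(x, y, alpha=0.5):
    # Logistic stable tail dependence function (x^(1/a) + y^(1/a))^a.
    if x == 0.0:
        return y
    if y == 0.0:
        return x
    return (x ** (1.0 / alpha) + y ** (1.0 / alpha)) ** alpha

for (x, y) in [(0.5, 1.5), (1.0, 1.0), (2.0, 0.3)]:
    # (i) homogeneity of order 1: L(ax, ay) = a L(x, y)
    assert abs(L(3.0 * x, 3.0 * y) - 3.0 * L(x, y)) < 1e-12
    # (iii) max(x, y) <= L(x, y) <= x + y
    assert max(x, y) <= L(x, y) <= x + y
# (ii) L(x, 0) = L(0, x) = x
assert L(2.0, 0.0) == 2.0 and L(0.0, 2.0) == 2.0
# (vi) convexity along one segment between two sample points
lam = 0.3
(x1, y1), (x2, y2) = (0.5, 1.5), (2.0, 0.3)
mid = L(lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2)
assert mid <= lam * L(x1, y1) + (1 - lam) * L(x2, y2) + 1e-12
```

For $\alpha = \frac{1}{2}$ the logistic $L$ is simply the Euclidean norm $\sqrt{x^2 + y^2}$, which makes the homogeneity, bound, and convexity checks transparent; as $\alpha \to 1$ it approaches independence, $L(x, y) = x + y$, and as $\alpha \to 0$ complete dependence, $L(x, y) = x \vee y$.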
From the function $L$ we can obtain the Pickands dependence function $A : [0, 1] \to \mathbb{R}$, introduced in Pickands (1981). This function is given by
\[
A(t) := -\log G_0\left(\frac{1}{1-t}, \frac{1}{t}\right) = L(1-t, t). \tag{3.24}
\]
If we let $t = \frac{y}{x+y}$, we easily find that
\[
L(x, y) = (x + y)\, A\left(\frac{y}{x+y}\right),
\]
and hence the Pickands dependence function completely determines the function $L$.

The Pickands dependence function can easily be connected to the spectral measure through the function $L$. If we combine (3.23) and (3.24) we get
function L. If we combine (3.23) and (3.24) we get
A(t) = 2
∫[0,1]
(ω(1− t) ∨ (1− ω)t)dH(ω)
= 2t
∫[0,t]
(1− ω)dH(ω) + 2(1− t)∫(t,1]
ωdH(ω).
Since H has mean 12 we have that∫[0,1] ωdH(ω) =
∫[0,1](1−ω)dH(ω) =
12 . Using this it follows
that ∫(t,1]
ωdH(ω) =1
2−H([0, t]) +
∫[0,t]
(1− ω)dH(ω).
Hence
\[
A(t) = 2\int_{[0,t]} (1-\omega)\, dH(\omega) + (1-t)\left(1 - 2H([0,t])\right).
\]
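The two expressions for $A(t)$, the integral form from (3.23)-(3.24) and the form just derived in terms of $H([0,t])$, can be checked against each other on discrete spectral measures, where the integrals become finite sums. The sketch below (my own illustration; the atom representation and function names are assumptions, not thesis notation) confirms that $H(\{0\}) = H(\{1\}) = \frac{1}{2}$ gives $A \equiv 1$ (independence) and that a single atom of mass 1 at $\frac{1}{2}$ gives $A(t) = t \vee (1-t)$ (complete dependence).

```python
def A_spectral(t, atoms):
    # A(t) = 2 * integral of (w(1-t) v (1-w)t) dH(w), for a discrete
    # spectral measure H given as a list of (location, mass) atoms.
    return 2.0 * sum(m * max(w * (1.0 - t), (1.0 - w) * t) for w, m in atoms)

def A_closed(t, atoms):
    # Equivalent form: 2 * int_{[0,t]} (1-w) dH(w) + (1-t)(1 - 2 H([0,t])).
    part = sum(m * (1.0 - w) for w, m in atoms if w <= t)
    h0t = sum(m for w, m in atoms if w <= t)
    return 2.0 * part + (1.0 - t) * (1.0 - 2.0 * h0t)

indep = [(0.0, 0.5), (1.0, 0.5)]   # H({0}) = H({1}) = 1/2
dep = [(0.5, 1.0)]                 # all mass at 1/2

for t in (0.1, 0.25, 0.5, 0.9):
    assert abs(A_spectral(t, indep) - 1.0) < 1e-12          # independence: A = 1
    assert abs(A_spectral(t, dep) - max(t, 1 - t)) < 1e-12  # complete dependence
    for H in (indep, dep):
        assert abs(A_spectral(t, H) - A_closed(t, H)) < 1e-12
```

Both spectral measures have mean $\frac{1}{2}$, as the derivation above requires.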
The term $\int_{[0,t]} (1-\omega)\, dH(\omega)$ can also be written as
\begin{align*}
\int_{[0,t]} (1-\omega)\, dH(\omega) &= \int_{[0,t]} \int_{[\omega,1]} du\, dH(\omega) \\
&= \int_{[0,1]} \int_{[0, u \wedge t]} dH(\omega)\, du \\
&= \int_{[0,t]} \int_{[0,u]} dH(\omega)\, du + \int_{(t,1]} \int_{[0,t]} dH(\omega)\, du
\end{align*}