estimation of tail dependence with application to twin data · 2014. 4. 30. · the twin data we...

115

Upload: others

Post on 10-Feb-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

  • Estimation of tail

    dependence with

    application to twin data

    Master thesis by

    Michael Osmann

    May 21, 2012

    Supervisors: Yuri Goegebeur and Jacob Hjelmborg

  • Contents

    Abstract 4

    Acknowledgements 4

    1 Preliminaries 5

    1.1 Classical convergence result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.2 The Gumbel class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.3 The extremal Weibull class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    1.4 Estimation of the extreme value index in practice . . . . . . . . . . . . . . . . . 11

    2 Pareto-type distributions 14

    2.1 Domain of attraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    2.2 Estimation of the extreme value index . . . . . . . . . . . . . . . . . . . . . . . 17

    2.3 Estimation of the second order parameter . . . . . . . . . . . . . . . . . . . . . 27

    2.4 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    2.4.1 Proof of Lemma 2.2.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    2.4.2 Lemma's needed in the proof of Theorem 2.2.7 . . . . . . . . . . . . . . 34

    3 Multivariate extreme value theory 36

    3.1 Limit laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

    3.2 The exponent measure and the spectral measure . . . . . . . . . . . . . . . . . 38

    3.3 Domain of attraction and asymptotic independence . . . . . . . . . . . . . . . . 41

    3.4 Pickands dependence function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

    3.5 The dependence measures χ and χ̄ . . . . . . . . . . . . . . . . . . . . . . . . . 48

    3.6 The model of Ledford and Tawn . . . . . . . . . . . . . . . . . . . . . . . . . . 54

    2

  • CONTENTS 3

    4 Estimation of the coe�cient of tail dependence and the second order pa-

    rameter in bivariate extreme value statistics 57

    4.1 Estimation of the coe�cient of tail dependence . . . . . . . . . . . . . . . . . . 57

    4.2 Estimation of the second order parameter . . . . . . . . . . . . . . . . . . . . . 62

    4.3 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

    4.3.1 Proof of Lemma 4.1.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

    5 Simulation study 68

    5.1 Copula examples and simulation of data . . . . . . . . . . . . . . . . . . . . . . 68

    5.2 Estimation of the second order parameter τ . . . . . . . . . . . . . . . . . . . . 73

    5.3 Estimation of the �rst order parameter η . . . . . . . . . . . . . . . . . . . . . . 74

    5.4 Estimation of the dependence measures χ and χ̄ . . . . . . . . . . . . . . . . . . 74

    6 Estimation of taildependence in BMI twindata 97

    6.1 Description of the data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

    6.2 Univariate analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

    6.3 Multivariate analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

    Epilogue 111

    Bibliography 113

  • 4 CONTENTS

    Abstract

    This master thesis consists of a theoretical discussion on univariate and bivariate extreme value

    statistics along with an application to twin data. We �rst discuss the fundamental conver-

    gence results from extreme value theory, which we use to construct the traditional maximum

    likelihood estimators of the extreme value index. In order to put our work into the proper

    framework, attention is paid to the three classes of extreme value distributions situated within

    the max domain of attraction of the generalized extreme value distribution. Special attention

    is given to the class of Pareto type distributions, since the methodology of how to construct

    estimators in the multivariate setting resembles the methodology used to construct estima-

    tors within the class of Pareto-type distributions. For the class of Pareto-type distributions

    we propose an estimator of the extreme value index and an estimator for the second order

    parameter. For both of these estimators we establish the asymptotic normality.

    In the multivariate setting we start by discussing the transformation of the margins to stan-

    dard Fréchet distributions and the fundamental convergence results. We discuss the domain of

    attraction to the bivariate extreme value distribution and asymptotic dependence and asymp-

    totic independence. We discuss furthermore the exponent measure, the spectral measure,

    Pickands dependence function, the dependence measures χ and χ̄, and �nally the coe�cientof tail dependence. The interpretations of these measures are discussed and we show how they

    are all connected. For the coe�cient of tail dependence we introduce a functional estimator,

    for which we show how it can be bias corrected. This bias correction requires estimation of

    the second order parameter τ , so we propose two estimators that can be used to estimate thissecond order parameter. The consistency of the estimators for the second order parameter are

    established. We examine the �nite sample size behaviour of our estimator for the coe�cient

    of tail dependence, the estimators of the second order condition and estimators of χ and χ̄using simulations.

    The twin data we consider is from the older cohort of the Finnish Twin Cohort Study. For this

    data we make a full univariate data analysis and estimate the coe�cient of tail dependence,

    the second order parameter τ , and the measures χ and χ̄ for age and sex de�ned subsets ofthe data.

    Throughout the thesis, results that are from the literature are stated with a reference, while

    results that are our own are not stated with a reference.

    Acknowledgements

    I would like to thank my two supervisors Yuri Goegebeur and Jacob v. B. Hjelmborg for

    helping me write this master thesis during the last 8 months. I would not have been ableto write this thesis without their help, and they have both spend a lot of time and e�ort on

    this. I am gratefull that they decided to join forces and help me write a thesis with such an

    interesting topic.

  • Chapter 1

    Preliminaries

    This chapter serves to give a short introduction to some of the basic concepts in univariate

    extreme value statistics. First we will introduce a convergence result which is the foundation

    of univariate extreme value statistics. It states what form the limiting distribution of a nor-

    malized maximum will follow, if it exists. We will then describe shortly two of the classes of

    extreme value distributions, known as the Gumbel and extremal Weibull families, respectively.

    Finally, we discuss some simple ways in which the extreme value index can be estimated in

    practice.

    1.1 Classical convergence result

    In the following we will consider a sample {Xi, 1 ≤ i ≤ n} of independent and identicallydistributed (i.i.d.) random variables having a distribution function FX . In extreme valuestatistics we consider either the maximum or the minimum of the random sample, where the

    maximum is given by

    Xn,n := max{X1, X2, . . . , Xn}.

    We will try to describe the statistical behaviour of this maximum, but it is easy to transform

    any result we obtain for the maximum to the minimum because of the relation

    X1,n := min{X1, X2, . . . , Xn} = −max{−X1,−X2, . . . ,−Xn}. (1.1)

    Because of the i.i.d. nature of X1, . . . , Xn, the distribution of Xn,n can be derived exactly forall possible values of n as follows

    FXn,n(x) = P (Xn,n ≤ x)= P (X1 ≤ x,X2 ≤ x, . . .Xn ≤ x)= P (X1 ≤ x)P (X2 ≤ x) · · ·P (Xn ≤ x)= (FX(x))

    n .

    For practical purposes this relation does not help much though, since the distribution of

    FX is usually unknown. One could try to estimate the distribution of FX and use this toestimate FXn,n , but small deviations in the estimation of FX can lead to large deviations inthe estimation of FXn,n . Instead we will look for approximate families of FXn,n which for large

    5

  • 6 Classical convergence result

    n can be estimated by use of the extreme data only.We look at the behaviour of FXn,n as n approaches in�nity. If we denote the right endpointof FX as x∗, which means that x∗ := inf{x : FX(x) = 1}, then for any x < x∗ we have thatFnX(x) → 0 as n→ ∞. So the distribution of Xn,n is degenerate in the limit. This degeneracycan possibly be avoided if we look at an appropriate normalization, for instance

    Xn,n − bnan

    where (bn)∞n=1 is a sequence of constants and (an)

    ∞n=1 is a sequence of positive constants.

    Appropriate choices of (an)∞n=1 and (bn)

    ∞n=1 can stabilize the location and scale of

    Xn,n−bnan

    . It

    can be shown that the entire range of limit distributions ofXn,n−bn

    an, if they exist, is given by

    Theorem 1.1.1.

    Theorem 1.1.1. (Fisher and Tippet, 1928; Gnedenko, 1943) Let X1, . . . , Xn be i.i.d. randomvariables with distribution function FX . If there exists sequences of constants (bn)

    ∞n=1 and

    positive constants (an)∞n=1 such that

    limn→∞

    P

    (Xn,n − bn

    an≤ x

    )= lim

    n→∞FnX (anx+ bn) = G(x) (1.2)

    at all continuity points of G, where G is a non degenerate distribution function, then G shouldbe of the following type

    Gγ(x) = exp(−(1 + γx)−

    ), 1 + γx > 0, (1.3)

    with γ real and where for γ = 0 the right-hand side is interpreted as exp (−e−x).

    This family of distribution functions is known as the generalized extreme value (GEV) family,

    for which the parameter γ is the shape parameter. This parameter is also called the extremevalue index and it describes the tail behaviour of FX , with larger values indicating heaviertails. The family consists of three classes known as the Gumbel, Fréchet and extremal Weibull

    families which correspond to γ = 0, γ > 0 and γ < 0 respectively. The Fréchet class is alsoknown as the class of Pareto-type models. If the distribution FX satis�es (1.2)-(1.3) then wesay that it belongs to the max domain of attraction of Gγ , denoted FX ∈ D(Gγ).The result in Theorem 1.1.1 has some equivalent formulations. Some of these formulations

    are based on the tail quantile function U(y) := Q(1− 1y

    ), y > 1, where Q is the quantile

    function, de�ned as Q(p) := inf{x : FX(x) ≥ p}, p ∈ (0, 1). These equivalent formulationsare stated in Theorem 1.1.2.

    Theorem 1.1.2. (Gnedenko, 1943; de Haan and Ferreira, 2006) Let X1, . . . , Xn be i.i.d.random variables with distribution function FX . For γ ∈ R the following statements areequivalent:

    (i) There exists sequences of real constants (bn)∞n=1 and positive real constants (an)

    ∞n=1 such

    that

    limn→∞

    FnX (anx+ bn) = Gγ(x) = exp(−(1 + γx)−

    ), (1.4)

    for all x with 1 + γx > 0.

  • Classical convergence result 7

    (ii) There is a positive function a such that for all x > 0,

    limt→∞

    U(tx)− U(t)a(t)

    =xγ − 1γ

    , (1.5)

    where for γ = 0 the right-hand side is interpreted as log x.

    (iii) There is a positive function a such that

    limt→∞

    t(1− FX(a(t)x+ U(t))) = (1 + γx)−1γ , (1.6)

    for all x with 1 + γx > 0.

    (iv) There exists a positive function f such that

    limt↑x∗

    1− FX(t+ xf(t))1− FX(t)

    = (1 + γx)− 1

    γ (1.7)

    for all x for which 1 + γx > 0.

    Moreover, (1.4) holds with bn := U(n) and an := a(n). Also (1.7) holds with f(t) =

    a(

    11−FX(t)

    ).

    As seen in Theorem 1.1.2 the choice of the normalizing constant bn does not depend on thesign of γ and can be shown to always work, if we choose bn = U(n). The choice of an dependson whether we are dealing with γ positive, negative or equal to zero, so we will address thisin the sections dedicated to the corresponding classes.

    In order to discuss the extremal Weibull and Fréchet classes, we need the concept of a slowly

    varying function. Slowly varying functions are special cases of regularly varying functions, so

    we will give the de�nition of what it means to be of regular variation. The regularly varying

    functions will also be needed later in this thesis.

    De�nition 1.1.3. (Beirlant et al., 2004, De�nition 2.1) Let f be an ultimately positive andmeasurable function on R+. We say that f is regularly varying at in�nity if there exists a realconstant ρ for which

    limx→∞

    f(λx)

    f(x)= λρ for all λ > 0.

    We write f ∈ Rρ and we call ρ the the index of regular variation. In the case ρ = 0, thefunction will be called slowly varying or of slow variation. We will reserve the symbol l forsuch functions. The class of all regularly varying functions is denoted by R.

    The next two sections will be dedicated to the Gumbel and the extremal Weibull class, while

    the Fréchet class which is of more importance for this thesis, will be discussed in the next

    chapter.

  • 8 The Gumbel class

    1.2 The Gumbel class

    The Gumbel class corresponds with the max domain of attraction of Gγ with γ = 0. Thefollowing proposition provides a characterization of the distributions that belong to this class.

    Proposition 1.2.1. (Gnedenko, 1943) Let X be a random variable with distribution functionFX . Then we have for x∗ �nite or in�nite and a, f suitable positive functions, that

    FX ∈ D(G0) ⇔ limt↑x∗

    1− FX(t+ xf(t))1− FX(t)

    = exp(−x), x ∈ R (1.8)

    ⇔ limt→∞

    U(tx)− U(t)a(t)

    = log(x), x > 0. (1.9)

    For the Fréchet and extremal Weibull classes it is easy to show that the distributions belonging

    to those classes satisfy (1.5), but this is not the case for the Gumbel class. This also meansthat determining the scaling parameter an for the distributions in the Gumbel class is moredi�cult. It can however be determined by the formula

    an = n

    ∫ x∗U(n)

    (1− FX(y)) dy.

    We will not derive this formula, but simply take it as a fact. For details we refer to de Haan

    and Ferreira (2006), Corollary 1.2.4.

    Example 1.2.2. If we want to determine the parameters an and bn for the exp(1) distributionwith distribution function FX(x) = 1−exp(−x), x > 0, then we must �rst �nd the tail quantiledistribution of the exponential distribution. The distribution function has quantile function

    Q(p) = − ln(1− p), 0 < p < 1. So

    U(x) = Q

    (1− 1

    x

    )= log(x), x > 1.

    This means bn can be chosen asbn = U(n) = log(n)

    and an can be chosen as

    an = n

    ∫ ∞log(n)

    exp(−x)dx = n exp (− log(n)) = 1.

    Since we know the constants an and bn we can also show that the exponential distributionbelongs to the max domain of attraction of the Gumbel class. Indeed

    P

    (Xn,n − bn

    an≤ x

    )= FnX(anx+ bn)

    = FnX (x+ log(n))

    = (1− exp (−x− log(n)))n

    =

    (1− exp(−x)

    n

    )n→ exp (− exp (−x)) for n→ ∞.

  • The extremal Weibull class 9

    The convergence of FnX (anx+ bn) to G(x) is shown in Figure 1.1. The solid line is G(x), thedashed line is for n = 2, the dotted line is for n = 5 and the dashed dotted line is for n = 10.It is clearly seen that when n grows then FnX (anx+ bn) converges pointwise to G(x).

    −2 −1 0 1 2 3 4 5

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    x

    p

    Figure 1.1: The convergence of FnX (anx+ bn) to G(x) for the standard exponential distribu-tion.

    1.3 The extremal Weibull class

    The extremal Weibull class corresponds with the max domain of attraction of Gγ with γ < 0.As was the case for the Gumbel class, we have a proposition which provides a characterization

    of the distributions that belong to this class.

    Proposition 1.3.1. (Gnedenko, 1943) Let X be a random variable with distribution functionFX . Then we have for x∗ �nite that

    FX ∈ D(Gγ), γ < 0 ⇔ 1− FX(x∗ −

    1

    x

    )= x

    1γ lFX (x), x > 0 (1.10)

    ⇔ U(x) = x∗ − xγlU (x), x > 1, (1.11)

    where lU (x) and lFX (x) are slowly varying at in�nity.

  • 10 The extremal Weibull class

    From (1.11) it is easily seen that (1.5) is satis�ed when t tends to in�nity. Indeed

    U(tx)− U(t)a(t)

    =x∗ − (tx)γlU (tx)− (x∗ − tγlU (t))

    a(t)

    =tγlU (t)

    a(t)

    (1− xγ lU (tx)

    lU (t)

    )∼ −γ t

    γlU (t)

    a(t)

    xγ − 1γ

    ∼ xγ − 1γ

    if we choose a(t) such that a(t)x∗−U(t) → −γ. This indicates that a good choice of an would be

    an = a(n) = −γ(x∗ − U(n)) = −γnγlU (n).

    Example 1.3.2. The reversed Burr distribution has distribution function given by

    FX(x) = 1−(

    ζ

    ζ + (1− x)−δ

    )λ, x < 1;λ, ζ, δ > 0

    and so the quantile function is

    Q(p) = 1− ζ−1δ

    ((1− p)−

    1λ − 1

    )− 1δ, 0 < p < 1.

    So we �nd the tail quantile function U to be

    U(x) = Q

    (1− 1

    x

    )= 1− ζ−

    (x

    1λ − 1

    )− 1δ, x > 1.

    The distribution belongs to the max domain of attraction of Gγ with γ = − 1λδ . If we considerthe reversed Burr distribution with parameters λ = ζ = δ = 1, then we can choose thenormalizing constant bn as

    bn = U(n) = 1− (n− 1)−1 .Since x∗ = 1 and γ = −1 we can choose the normalizing constant an as

    an = 1− U(n) = (n− 1)−1 .

    With these normalizing constants we can show that the reversed Burr distribution with pa-

    rameters λ = ζ = δ = 1 belongs to the max domain of attraction of the Weibull class. Indeed

    P

    (Xn,n − bn

    an≤ x

    )= FnX (anx+ bn)

    = FnX((x− 1)(n− 1)−1 + 1

    )=

    1− 11 +

    (n−11−x

    )n

    =

    (1− 1− x

    n− x

    )n→ exp(−(1− x)) for n→ ∞.

  • Estimation of the extreme value index in practice 11

    The convergence of the reversed Burr distribution to its limit is illustrated in Figure 1.2. The

    solid line is G(x), the dashed line is for n = 2, the dotted line is for n = 5, while the dasheddotted line is for n = 10. It is clearly seen that when n grows, then FnX (anx+ bn) convergespointwise to G(x).

    −5 −4 −3 −2 −1 0 1

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    x

    p

    Figure 1.2: The convergence of FnX (anx+ bn) to G(x) for the reversed Burr distribution withλ = ζ = δ = 1.

    1.4 Estimation of the extreme value index in practice

    In practise we do not know the constants an and bn, so Theorem 1.1.1 is not very usefull ifwe want to estimate γ. However, if we for some �nite n ∈ N have that

    P

    (Xn,n − bn

    an≤ x

    )≈ exp

    (−(1 + γx)−

    ), 1 + γx > 0,

    then

    P (Xn,n ≤ z) ≈ exp

    (−(1 + γ

    z − bnan

    )− 1γ

    ), 1 + γ

    z − bnan

    > 0,

    where z = bn + anx. If we let µ = bn and σ = an, then we are left with the model

    P (Xn,n ≤ z) ≈ exp

    (−(1 + γ

    z − µσ

    )− 1γ

    ), 1 + γ

    z − µσ

    > 0. (1.12)

    With this model we can easily obtain maximum likelihood estimates of µ, σ and γ. To dothis, we divide the data into m blocks and de�ne z1, . . . , zm to be the block maxima of the

  • 12 Estimation of the extreme value index in practice

    m blocks. Under the assumption that Z1, . . . , Zm are independent variables having the GEVdistribution we get from (1.12) that the log likelihood is given by

    logL(µ, σ, γ) = −m log σ −(1 +

    1

    γ

    ) m∑i=1

    log

    (1 + γ

    zi − µσ

    )−

    m∑i=1

    (1 + γ

    zi − µσ

    )− 1γ

    .

    (1.13)

    The maximum likelihood estimates are then obtained by maximizing (1.13) with respect to

    µ, σ and γ.Another popular model is the peaks over threshold model (POT). This model can be derived

    using Theorem 1.1.2. If we assume that (1.4) is satis�ed, then there exists a positive function

    f such that

    limt↑x∗

    P

    (X − tf(t)

    > x

    ∣∣∣∣X > t) = limt↑x∗ 1− FX(t+ f(t)x)1− FX(t) , x > 0= (1 + γx)

    − 1γ , 1 + γx > 0.

    For t large, we thus have

    P

    (X − tf(t)

    > x

    ∣∣∣∣X > t) ≈ (1 + γx)− 1γ , x > 0 and 1 + γx > 0,which reduces to

    P (X − t > z|X > t) ≈(1 + γ

    z

    σ

    )− 1γ, z > 0 and 1 + γ

    z

    σ> 0, (1.14)

    if we set z = f(t)x and f(t) = σ. From this we are able to get maximum likelihood estimatesof γ and σ when we choose a threshold t. If we let z1, . . . , zk denote the k observations whichare greater than the threshold t, then we obtain the log likelihood function from (1.14). Thelog likelihood is given by

    logL(σ, γ) = −k log σ −(1 +

    1

    γ

    ) k∑i=1

    log(1 + γ

    ziσ

    ). (1.15)

    The maximum likelihood estimates are obtained by maximizing (1.15) with respect to γ andσ.Using maximum likelihood with block maxima or peaks over threshold is an easy way to

    estimate γ. There are many other ways to estimate γ but we will not go into detail aboutthem. Among the methods of estimating γ for the generalized extreme value distribution arethe Pickands estimator (Pickands, 1975), the moment estimator (Dekkers et al., 1989), and

    the probability-weighted moment estimator (Hosking et al., 1985).

    When considering the POT model we have to choose the threshold ourselves. There are several

    ways to do this, but we will only discuss how to choose the threshold using a mean residual

    life plot. An introduction to mean residual life plots requires a small lemma about a property

    of the generalized Pareto distribution.

    Lemma 1.4.1. If X ∼ GPD(σ, γ), then X − u|X > u ∼ GPD(σ + γu, γ).

  • Estimation of the extreme value index in practice 13

    Proof. If X ∼ GPD(σ, γ), then FX(x) = 1−(1 + γ xσ

    )− 1γ . From this we get that

    P (X − u > x|X > u) = P (X > u+ x,X > u)P (X > u)

    , x > 0

    =1− FX(u+ x)1− FX(u)

    =

    (1 + γ x+uσ1 + γ uσ

    )− 1γ

    =

    (1 + γ

    x

    σ + γu

    )− 1γ

    ,

    which implies that X − u|X > u ∼ GPD(σ + γu, γ).

    If X ∼ GPD(σ, γ) with γ < 1, then

    E(X) =σ

    1− γ,

    while E(X) = ∞ for γ ≥ 1. So assuming γ < 1, it follows from Lemma 1.4.1 that

    E(X − u|X > u) = σ + γu1− γ

    , u > 0,

    and hence the mean excess function is linear in u. The mean residual life plot consists of the

    points {(u,

    1

    nu

    nu∑i=1

    (x(i) − u

    )): u < xmax

    },

    where x(1), . . . , x(nu) consists of the nu observations that exceeds u, and xmax is the largestobservation. If the GPD approximation is good at threshold u, then it should also be good ata higher threshold, so the mean excess function should be approximately linear in u beyond agood threshold.

  • Chapter 2

    Pareto-type distributions

    In this chapter we give an introduction to the Fréchet class. We start by considering the domain

    of attraction of this class, similar to the discussion of the Gumbel and extremal Weibull classes.

    Next we turn our attention to the estimation of the extreme value index γ for Pareto-typedistributions which satisfy a second order condition. We prove asymptotic normality for a

    statistic proposed in Goegebeur et al. (2010) and use this to construct a class of estimators

    for γ. From this class of estimators we construct speci�c estimators using kernel functions.We end this chapter with a presentation of an estimator of the second order parameter. The

    asymptotic normality of the latter is established under a third order condition.

    2.1 Domain of attraction

    The class of Pareto-type models corresponds with the max domain of attraction of Gγ withγ > 0. The following proposition provides a characterization of the distributions that belongto this class.

    Proposition 2.1.1. (Gnedenko, 1943) Let X be a random variable with distribution functionFX . Then we have for x∗ in�nite that

    FX ∈ D(Gγ), γ > 0 ⇔ 1− FX(x) = x−1γ lFX (x), x > 0 (2.1)

    ⇔ U(x) = xγlU (x), x > 1, (2.2)

    where lU (x) and lFX (x) are slowly varying at in�nity.

    Tail quantile functions of the form (2.2) can be shown to satisfy (1.5) if x tends to in�nity, inthe following way

    U(tx)− U(t)a(t)

    =(tx)γlU (tx)− tγlU (t)

    a(t)

    =lU (t)t

    γ

    a(t)

    (lU (tx)

    lU (t)xγ − 1

    )∼ x

    γ − 1γ

    14

  • Domain of attraction 15

    when choosing a(t) = γtγlU (t) = γU(t). More generally a(t) can also be chosen as a functionsatisfying

    limt→∞

    a(t)

    U(t)= γ.

    This brings us to how an can be chosen as a normalizing constant. If we choose an = a(n) =γU(n) then we can use this constant as one of the normalizing constants for the Fréchet class.There exists full equivalence between the Pareto-type models and the extremal Weibull class.

    If we let X be a random variable with FX belonging to the max domain of attraction of theextremal Weibull class with x∗ as the right endpoint, and put Y := (x∗ −X)−1, then theWeibull class and the Pareto-type models are linked through the identi�cation

    FX ∈ D (Gγ) , γ < 0 ⇔ FY ∈ D (Gγ) , γ > 0.

    The equivalence follows easily because

    1− FX(x∗ −

    1

    x

    )= P

    (X > x∗ −

    1

    x

    )= P

    ((x∗ −X)−1 > x

    )= 1− FY (x).

    Example 2.1.2. The Fréchet distribution has distribution function given by

    FX(x) = exp(−x−α

    ), x > 0, α > 0.

    This means it has quantile function

    Q(p) = (− log p)−1α , 0 < p < 1,

    and hence the tail quantile function is

    U(x) =

    (− log

    (1− 1

    x

    ))− 1α

    , x > 1.

    The Fréchet distribution has γ = 1α and the normalizing constant an can hence be chosen as

    an = γU(n) =1

    α

    (− log

    (1− 1

    n

    ))− 1α

    .

    The normalizing constant bn can be chosen as

    bn = U(n) =

    (− log

    (1− 1

    n

    ))− 1α

    .

    Concerning the Fréchet distribution with α = 1 we see that

    P

    (Xn,n − bn

    an≤ x

    )= FnX (anx+ bn)

    = FnX

    ((− log

    (1− 1

    n

    ))−1x+

    (− log

    (1− 1

    n

    ))−1)

    =

    [(1− 1

    n

    )n] 11+x→ exp

    (−(1 + x)−1

    )for n→ ∞.

  • 16 Domain of attraction

    The convergence of the Fréchet distribution to its limit is illustrated in Figure 2.1. The solid

    line is G(x), the dashed line is for n = 2, the dotted line is for n = 5, while the dashed dottedline is for n = 10. It is clearly seen that when n grows, then FnX (anx+ bn) converges pointwiseto G(x).

    −1 0 1 2 3 4 5

    0.0

    0.2

    0.4

    0.6

    0.8

    x

    p

    Figure 2.1: The convergence of FnX (anx+ bn) to G(x) for the Fréchet distribution with α = 1.

    Next we give two examples of distributions that are of Pareto-type.

    Example 2.1.3. The Burr distribution has a distribution function given by

    FX(x) = 1−(

    ζ

    ζ + xδ

    )λ, x > 0, λ, ζ, δ > 0.

    In order to verify that the Burr distribution is of Pareto-type we start with

    1− FX(x) =(

    ζ

    ζ + xδ

    )λ= x−δλ

    ζx−δ + 1

    )λ.

    It is easily seen that g(x) :=(

    ζζx−δ+1

    )λis slowly varying at in�nity since it converges to a

    constant when x→ ∞. So the Burr distribution is of Pareto-type with γ = 1λδ .a �Example 2.1.4. The absolute T distribution has distribution function given by

    FX(x) =Γ(n+12

    )√nπΓ

    (n2

    ) ∫ x−x

    (1 +

    t2

    n

    )−n+12

    dt, x > 0, n ∈ N.

  • Estimation of the extreme value index 17

    In order to verify that the absolute T distribution is of Pareto-type we start with

    1− FX(x) = 2Γ(n+12

    )√nπΓ

    (n2

    ) ∫ ∞x

    (1 +

    t2

    n

    )−n+12

    dt

    = 2Γ(n+12

    )√nπΓ

    (n2

    ) ∫ ∞x

    (t2

    n

    )−n+12 ( n

    t2+ 1)−n+1

    2dt

    = K

    ∫ ∞x

    t−n−1(nt−2 + 1

    )−n+12 dt,

    where K := 2n

    n2 Γ(n+12 )√πΓ(n2 )

    . We are concerned with large values of x, so we make a Taylor series

    expansion of (1 + x)−n+12 around 0, which yields

    (nt−2 + 1

    )−n+12 =1− n+ 1

    2nt−2 +

    1

    2

    n+ 1

    2

    (n+ 1

    2+ 1

    )n2t−4

    − 16

    n+ 1

    2

    (n+ 1

    2+ 1

    )(n+ 1

    2+ 2

    )(1 + t̃

    )−n+12

    −3n3t−6,

    where t̃ is between 0 and nt2. From this it follows that

    1− FX(x) =K(∫ ∞

    xt−n−1dt− n(n+ 1)

    2

    ∫ ∞x

    t−n−3dt

    +n2(n+ 1)(n+ 3)

    8

    ∫ ∞x

    t−n−5dt

    − n3(n+ 1)(n+ 3)(n+ 5)

    48

    ∫ ∞x

    t−n−1(1 + t̃

    )−n+12

    −3t−6dt

    ).

    Since(1 + t̃

    )−n+12

    −3 ≤ 1 it follows that∫∞x t

    −n−1 (1 + t̃)−n+12 −3 t−6dt ≤ ∫∞x t−n−7dt, andhence

    1− FX(x) =K(x−n

    n− n(n+ 1)

    2(n+ 2)x−n−2 +

    n2(n+ 1)(n+ 3)

    8(n+ 4)x−n−4 +O

    (x−n−6

    ))=x−nC0

    (1− n

    2(n+ 1)

    2(n+ 2)x−2 +

    n3(n+ 1)(n+ 3)

    8(n+ 4)x−4 +O

    (x−6

    )), (2.3)

    where C0 :=Kn . Since the function g(x) := C0

    (1− n

    2(n+1)2(n+2) x

    −2 + n3(n+1)(n+3)

    8(n+4) x−4 +O

    (x−6

    ))converges to a constant, when x→ ∞, the function is slowly varying at in�nity and hence theabsolute T distribution is of Pareto-type with γ = 1n .a �

    2.2 Estimation of the extreme value index

    In the analysis of Pareto-type models, estimation of γ plays a central role. The asymptoticdistribution of the estimator of γ is usually established under the following second ordercondition on the tail behaviour.

  • 18 Estimation of the extreme value index

    Assumption 2.2.1 (Second order condition). There exists a positive real parameter γ, anegative real parameter ρ and a function b with b(t) → 0 for t→ ∞, of constant sign for largevalues of t, such that

    limt→∞

    logU(tx)− logU(t)− γ log xb(t)

    =xρ − 1ρ

    , ∀x > 0.

    The second order condition implies that |b| is regularly varying with index ρ (Geluk and Haan,1987), so the parameter ρ determines the rate of convergence for logU(tx) − logU(t) to itslimit γ log x, when t tends to in�nity. If ρ is close to zero then the convergence is slow andthe estimation of tail parameters is practically di�cult.

    We will now verify that the Burr distribution and the absolute T distribution satisfy the second

    order condition. That they are of Pareto-type was veri�ed in Example 2.1.3 and Example 2.1.4

    respectively.

    Example 2.2.2. In order to verify that the Burr distribution satis�es the second order condi-

    tion we need to �nd its tail quantile function. The quantile function of the Burr distribution

    is easily found by inverting the distribution function and it is given by

    Q(p) = ζ1δ

    ((1− p)−

    1λ − 1

    ) 1δ, 0 < p < 1.

    From this we obtain the tail quantile function

    U(x) = Q

    (1− 1

    x

    )= xγζ

    (1− x−

    ) 1δ, x > 1.

    We start with the expression

    logU(tx)− logU(t)− γ log x = 1δlog(1− (xt)−

    )− 1δlog(1− t−

    ).

    If we make a Taylor series expansion of log(1− x) around 0, we obtain

    logU(tx)− logU(t)− γ log x =1δ

    (−(tx)−

    1λ − 1

    2(tx)−

    )− 1δ

    (−t−

    1λ − 1

    2t−

    )+O

    (t−

    )=

    1λδ t

    − 1λ

    (x−

    1λ − 1

    )− 1λ

    +

    1λδ t

    − 2λ

    (x−

    2λ − 1

    )− 2λ

    +O(t−

    )(2.4)

    =γt−

    (x−

    1λ − 1

    )− 1λ

    +O(t−

    ). (2.5)

    From (2.5) we see that if we choose ρ = − 1λ and b(t) = γtρ, then the Burr distribution satis�es

    the second order condition. More generally b(t) can be chosen such that b(t) = γtρ(1 + o(1)).a �

    Example 2.2.3. From (2.3) we get for the absolue T distribution that

    1− FX(x) = x−1γC0

    (1− C1x−2 + C2x−4 +O

    (x−6

    )),

  • Estimation of the extreme value index 19

    where C1 :=n2(n+1)2(n+2) and C2 :=

    n3(n+1)(n+3)8(n+4) . In order to �nd the tail quantile function we

    have to invert1

    y= x

    − 1γC0

    (1− C1x−2 + C2x−4 +O

    (x−6

    )).

    From this we �nd

    x = Cγ0 yγ(1− C1x−2 + C2x−4 +O

    (x−6

    ))γ.

    If we make a Taylor series expansion of (1− x)γ around x = 0, we obtain

    x =Cγ0 yγ

    (1− γ

    (C1x

    −2 − C2x−4 +O(x−6

    ))+

    1

    2γ(γ − 1)

    (C1x

    −2 − C2x−4 +O(x−6

    ))2+O

    (x−6

    ))=Cγ0 y

    γ

    (1− γC1C−2γ0 y

    −2γ(1− γC1x−2 +

    (γC2 +

    γ(γ − 1)2

    C21

    )x−4 +O

    (x−6

    ))−2+

    (γC2 +

    γ(γ − 1)2

    C21

    )x−4 +O

    (x−6

    )).

    Now we make a Taylor series expansion of (1− x)−2 in which case we obtain

    x =Cγ0 yγ

    (1− γC1C−2γ0 y

    −2γ (1 + 2γC1x−2 +O (x−4))+

    (γC2 +

    γ(γ − 1)2

    C21

    )x−4 +O

    (x−6

    )).

    If we substitute the right hand side into the place of x, then it follows that

    x =Cγ0 yγ

    (1− γC1C−2γ0 y

    −2γ

    +

    (γC2 −

    γ(3γ + 1)

    2C21

    )C−4γ0 y

    −4γ +O(y−6γ

    )).

    So the tail quantile function can be written as

    U(x) = Cγ0 xγ(1−D1x−2γ +D2x−4γ +O(x−6γ)

    ),

    where D1 := γC1C−2γ0 , and D2 :=

    (γC2 − γ(3γ+1)2 C

    21

    )C−4γ0 . We are now ready to verify that

    the absolute T distribution satis�es the second order condition. We start with the expression

    logU(xt)− logU(t)− γ log x = log(1−D1(xt)−2γ +D2(xt)−4γ +O(t−6γ)

    )− log

    (1−D1t−2γ +D2t−4γ +O(t−6γ)

    ).

    By making a Taylor series expansion of log(1− x) around x = 0 we obtain

    logU(xt)− logU(t)− γ log x =−D1(xt)−2γ +D2(xt)−4γ −1

    2

    (D1(xt)

    −2γ −D2(xt)−4γ)2

    +D1t−2γ −D2t−4γ +

    1

    2

    (D1t

    −2γ −D2t−4γ)2

    +O(t−6γ)

    =−D1t−2γ(x−2γ − 1

    )+

    (D2 −

    1

    2D21

    )t−4γ

    (x−4γ − 1

    )+O(t−6γ) (2.6)

    =−D1t−2γ(x−2γ − 1

    )+O(t−4γ). (2.7)

  • 20 Estimation of the extreme value index

    From (2.7) we see that if we choose ρ = −2γ and b(t) of the form b(t) = −ρD1tρ(1 + o(1)),then the absolute T distribution satis�es the second order condition.

    a �

    We now return to the estimation of γ. The estimator of γ we will consider is based on a kernelstatistic with kernel function K. This statistic is given by

    Tn,k(K) :=1

    k

    k∑j=1

    K

    (j

    k + 1

    )Zj , (2.8)

    where Zj := j (logXn−j+1,n − logXn−j,n). This statistic will also serve as the basic buildingblock for the ρ estimator we propose in section 2.3. We need some conditions on the kernelfunction, but �rst we introduce the following notation

    µ(K) :=

    ∫ 10K(u)du,

    I1(K, ρ) :=

    ∫ 10K(u)u−ρdu,

    σ2(K) :=

    ∫ 10K2(u)du.

    With this notation the kernel function must satisfy

    Assumption 2.2.4. Let K be a function de�ned on (0, 1) such that

    (i) K(t) = 1t∫ t0 u(v)dv for some function u satisfying

    ∣∣∣∣(k + 1) ∫ jk+1j−1k+1

    u(t)dt

    ∣∣∣∣ ≤ f ( jk+1) forsome positive continuous and integrable function f de�ned on (0, 1),

    (ii) σ2(K) 0 and the log kernels Lδ(t) := (− log t)δ, δ > 0.

    Lemma 2.2.6. The function K(t) := tτ (− log t)δ satis�es Assumption 2.2.4.

    The proof of Lemma 2.2.6 can be found in Appendix 2.4.

    a �

  • Estimation of the extreme value index 21

    With Assumption 2.2.1 and Assumption 2.2.4 we are able to establish the following result.

    Theorem 2.2.7. Let X1, . . . , Xn be i.i.d. random variables according to a distribution sat-isfying Assumption 2.2.1. If further Assumption 2.2.4 holds, then for k, n → ∞ such thatkn → 0 we have

    Tn,k(K)D= γµ(K) + γσ(K)

    Nk(K)√k

    + b(nk

    )I1(K, ρ) (1 + oP(1)) , (2.9)

    where Nk(K) is asymptotically a standard normal random variable.

    A proof of this theorem is given in Goegebeur et al. (2010), we will however give an alternative

    proof of the result.

    Proof of Theorem 2.2.7. Let U1,n ≤ . . . ≤ Un,n be order statistics from a random sample ofsize n from the U(0, 1) distribution. By using the inverse probability integral transform we�nd that

    Xi,nD= Q (Ui,n)

    D= Q (1− Un−i+1,n)

    = U

    (1

    Un−i+1,n

    ).

    Since the Xi are of Pareto-type it follows that

    Xi,nD=

    (1

    Un−i+1,n

    )γlU

    (1

    Un−i+1,n

    ).

    From this we get

    logXi,nD= −γ logUn−i+1,n + log lU

    (1

    Un−i+1,n

    ).

    Hence

    logXn−j+1,n − logXn−k,nD= −γ log Uj,n

    Uk+1,n+ log

    lU

    (Uk+1,nUj,n

    1Uk+1,n

    )lU

    (1

    Uk+1,n

    ) .Since

    Uj,nUk+1,n

    D= Vj,k, where Vj,k is the j'th order statistic in a random sample of size k from

    the U(0, 1) distribution, it follows that

    logXn−j+1,n − logXn−k,nD= −γ log Vj,k + log

    lU

    (1

    Vj,k1

    Uk+1,n

    )lU

    (1

    Uk+1,n

    )D= −γ log (1− Vk−j+1,k) + log

    lU

    (1

    Vj,k1

    Uk+1,n

    )lU

    (1

    Uk+1,n

    ) .Using that the quantile function of the standard exponential distribution is Q(p) = − log(1−p), 0 < p < 1, and denoting by E1,n ≤ . . . ≤ En,n the order statistics of a random sample of

  • 22 Estimation of the extreme value index

    size n from the standard exponential distribution, we get using Assumption 2.2.1 and inspiredby Lemma 2.4.3, that

    logXn−j+1,n − logXn−k,nD= γEk−j+1,k + b0

    (1

    Uk+1,n

    ) ( 1Vj,k

    )ρ− 1

    ρ+ b0

    (1

    Uk+1,n

    )R̃n,k(j),

    where R̃n,k(j) :=logU

    (1

    Uk+1,n

    1Vj,k

    )−logU

    (1

    Uk+1,n

    )−γ log 1

    Vj,k

    b0

    (1

    Uk+1,n

    ) −(

    1Vj,k

    )ρ−1

    ρ . Thus

    Zj = j (logXn−j+1,n − logXn−j,n)

    D= j

    γEk−j+1,k − γEk−j,k + b0( 1Uk+1,n

    ) ( 1Vj,k

    )ρ−(

    1Vj+1,k

    )ρρ

    + b0

    (1

    Uk+1,n

    )Rn,k(j)

    ,(2.10)

    where Rn,k(j) := R̃n,k(j) − R̃n,k(j + 1), with the convention R̃n,k(k + 1) := 0 and with b0 afunction satisfying b0(t) ∼ b(t) for t → ∞. Using the Rényi representation (Rényi, 1953) wecan express each Ej,k as

    {Ej,k}j=1,...,kD=

    {j∑

    i=1

    Ek−i+1k − i+ 1

    }j=1,...,k

    ,

    where the E1, . . . , Ek are independent random variables from a standard exponential distri-bution. Hence

    Ek−j+1,k − Ek−j,kD=

    k−j+1∑i=1

    Ek−i+1k − i+ 1

    −k−j∑i=1

    Ek−i+1k − i+ 1

    =Ejj. (2.11)

    Combining (2.10) and (2.11) we �nd that

    ZjD= γEj + b0

    (1

    Uk+1,n

    )j

    (1

    Vj,k

    )ρ−(

    1Vj+1,k

    )ρρ

    + b0

    (1

    Uk+1,n

    )jRn,k(j).

    Let Y1,k ≤ . . . ≤ Yk,k be order statistics of a random sample of size k from the standard strictPareto distribution. Then we have

    1

    Vj,k

    D=

    1

    1− Vk−j+1,kD= Yk−j+1,k.

    Using this we get that

    ZjD= γEj + b0 (Yn−k,n) j

    Y ρk−j+1,k − Yρk−j,k

    ρ+ b0 (Yn−k,n) jRn,k(j).

  • Estimation of the extreme value index 23

    Hence

    Tn,k(K)D=1

    k

    k∑j=1

    K

    (j

    k + 1

    )(γEj + b0 (Yn−k,n) j

    Y ρk−j+1,k − Yρk−j,k

    ρ+ b0 (Yn−k,n) jRn,k(j)

    )

    =γ1

    k

    k∑j=1

    K

    (j

    k + 1

    )Ej + b0 (Yn−k,n)

    1

    k

    k∑j=1

    K

    (j

    k + 1

    )jY ρk−j+1,k − Y

    ρk−j,k

    ρ

    + b0 (Yn−k,n)1

    k

    k∑j=1

    K

    (j

    k + 1

    )jRn,k(j)

    = : T(1)n,k + T

    (2)n,k + T

    (3)n,k .

    Using Assumption 2.2.4 (iii) we get for the �rst term that

    T(1)n,k = γ

    1

    k

    k∑j=1

    K

    (j

    k + 1

    )+ γ

    1

    k

    k∑j=1

    K

    (j

    k + 1

    )(Ej − 1)

    = γµ(K) + o

    (1√k

    )+ γσ(K)

    Ñk(K)√k

    , (2.12)

    where Ñk(K) :=√k

    1k

    ∑kj=1 K(

    jk+1)(Ej−1)

    σ(K) . The term Ñk(K) is according to Lemma 2.4.1 in

    Appendix 2.4 an asymptotic standard normal random variable. In (2.12) we can combine the

    o(

    1√k

    )with Ñk(K) to get

    T(1)n,k = γµ(K) + γσ(K)

    Nk(K)√k

    ,

    where Nk(K) is again an asymptotic standard normal random variable.

    Since Yi,kD= 11−Ui,k and the standard exponential distribution has quantile function Q(p) =

    − log(1− p) it follows that T (2)n,k can be written as

    T(2)n,k

    D= b0 (Yn−k,n)

    1

    k

    k∑j=1

    K

    (j

    k + 1

    )jexp (ρEk−j+1,n)− exp (ρEk−j,n)

    ρ.

    Using the mean value theorem we �nd that

    T(2)n,k

    D= b0 (Yn−k,n)

    1

    k

    k∑j=1

    K

    (j

    k + 1

    )j (Ek−j+1,n − Ek−j,n) exp (ρQj,k) ,

  • 24 Estimation of the extreme value index

    where Qj,k is a random value between Ek−j,k and Ek−j+1,k, and hence

    T(2)n,k

    D=b0 (Yn−k,n)

    1

    k

    k∑j=1

    K

    (j

    k + 1

    )Ej exp (ρQj,k)

    =b0 (Yn−k,n)1

    k

    k∑j=1

    K

    (j

    k + 1

    )(j

    k + 1

    )−ρEj

    + b0 (Yn−k,n)1

    k

    k∑j=1

    K

    (j

    k + 1

    )Ej

    (exp (ρQj,k)−

    (j

    k + 1

    )−ρ)= : T

    (2,1)n,k + T

    (2,2)n,k .

    Concerning the term T(2,1)n,k we get

    T(2,1)n,k =b0 (Yn−k,n)

    1

    k

    k∑j=1

    K

    (j

    k + 1

    )(j

    k + 1

    )−ρ

    + b0 (Yn−k,n)1

    k

    k∑j=1

    K

    (j

    k + 1

    )(j

    k + 1

    )−ρ(Ej − 1) ,

    so by the law of large numbers it follows that

    T(2,1)n,k = b0 (Yn−k,n) I1(K, ρ) (1 + oP(1)) .

    We now turn to T(2,2)n,k . Note that for j = 1, . . . , k we have that

    exp (Ek−j+1,k)D= exp (− log (1− Uk−j+1,k))D= exp (− log (Uj,k))

    =1

    Uj,k,

    and hence∣∣∣∣∣exp (ρQj,k)−(

    j

    k + 1

    )−ρ∣∣∣∣∣ ≤ max{∣∣∣∣∣exp (ρEk−j,k)−

    (j

    k + 1

    )−ρ∣∣∣∣∣ ,∣∣∣∣∣exp (ρEk−j+1,k)−

    (j

    k + 1

    )−ρ∣∣∣∣∣}

    D= max

    {∣∣∣∣∣U−ρj+1,k −(

    j

    k + 1

    )−ρ∣∣∣∣∣ ,∣∣∣∣∣U−ρj,k −

    (j

    k + 1

    )−ρ∣∣∣∣∣}

    ≤ max

    {∣∣∣∣∣U−ρj+1,k −(j + 1

    k + 1

    )−ρ∣∣∣∣∣+ cj,k,∣∣∣∣∣U−ρj,k −

    (j

    k + 1

    )−ρ∣∣∣∣∣},

  • Estimation of the extreme value index 25

    where cj,k =(

    j+1k+1

    )−ρ−(

    jk+1

    )−ρ. From this it follows that∣∣∣∣∣∣1k

    k∑j=1

    K

    (j

    k + 1

    )Ej

    (exp (ρQj,k)−

    (j

    k + 1

    )−ρ)∣∣∣∣∣∣≤ 1k

    k∑j=1

    ∣∣∣∣K ( jk + 1)∣∣∣∣Ej

    ∣∣∣∣∣U−ρj+1,k −(j + 1

    k + 1

    )−ρ∣∣∣∣∣+ 1kk∑

    j=1

    ∣∣∣∣K ( jk + 1)∣∣∣∣ cj,kEj

    +1

    k

    k∑j=1

    ∣∣∣∣K ( jk + 1)∣∣∣∣Ej

    ∣∣∣∣∣U−ρj,k −(

    j

    k + 1

    )−ρ∣∣∣∣∣=: T

    (2,2,1)n,k + T

    (2,2,2)n,k + T

    (2,2,3)n,k .

    According to Lemma 2.4.2 the terms T(2,2,1)n,k and T

    (2,2,3)n,k are OP

    (1√k

    ). Using the mean value

    theorem we see that we can write the term T(2,2,2)n,k as

    T(2,2,2)n,k =

    |ρ|k + 1

    1

    k

    k∑j=1

    ∣∣∣∣K ( jk + 1)∣∣∣∣ z|ρ|−1j,k Ej ,

    where zj,k is a value betweenj

    k+1 andj+1k+1 . When |ρ| ≥ 1 it follows that

    T(2,2,2)n,k ≤

    |ρ|k + 1

    1

    k

    k∑j=1

    ∣∣∣∣K ( jk + 1)∣∣∣∣Ej ,

    and hence by the law of large numbers it follows that T(2,2,2)n,k = OP

    (1k

    ). When |ρ| < 1 we have

    T(2,2,2)n,k ≤

    |ρ|k + 1

    1

    k

    k∑j=1

    ∣∣∣∣K ( jk + 1)∣∣∣∣ ( jk + 1

    )|ρ|−1Ej ,

    which by Assumption 2.2.4 (v) and the law of large numbers implies that T(2,2,2)n,k = OP

    (1k

    ).

    So

    T(2)n,k = b0 (Yn−k,n) I1(K, ρ) (1 + oP(1)) .

    Concerning the term T(3)n,k we �nd using Assumption 2.2.4 (i) that∣∣∣T (3)n,k∣∣∣ =

    ∣∣∣∣∣∣b0 (Yn−k,n) k + 1kk∑

    j=1

    Rn,k(j)

    ∫ jk+1

    0u(v)dv

    ∣∣∣∣∣∣=

    ∣∣∣∣∣∣b0 (Yn−k,n) k + 1kk∑

    j=1

    Rn,k(j)

    j∑i=1

    ∫ ik+1

    i−1k+1

    u(v)dv

    ∣∣∣∣∣∣=

    ∣∣∣∣∣∣b0 (Yn−k,n) k + 1kk∑

    i=1

    ∫ ik+1

    i−1k+1

    u(v)dv

    k∑j=i

    Rn,k(j)

    ∣∣∣∣∣∣≤ |b0 (Yn−k,n)|

    1

    k

    k∑i=1

    f

    (i

    k + 1

    ) ∣∣∣∣∣∣k∑

    j=i

    Rn,k(j)

    ∣∣∣∣∣∣ .

  • 26 Estimation of the extreme value index

    For the term∑k

    j=iRn,k(j) it follows that

    k∑j=i

    Rn,k(j) =k∑

    j=i

    (R̃n,k(j)− R̃n,k(j + 1)

    )= R̃n,k(i).

    For δ, � > 0 there exists n0 such that for any n ≥ n0, with arbitrary large probability, fori = 1, . . . , k, ∣∣∣∣∣∣

    k∑j=i

    Rn,k(j)

    ∣∣∣∣∣∣ ≤ �(

    1

    Vi,k

    )ρmax

    ((1

    Vi,k

    )δ,

    (1

    Vi,k

    )−δ)= �V −ρ−δi,k ,

    using Lemma 2.4.3. Hence

    supi∈{1,...,k}

    ∣∣∣∣∣∑k

    j=iRn,k(j)

    V −ρ−δi,k

    ∣∣∣∣∣ = oP(1)leading to ∣∣∣T (3)n,k∣∣∣ ≤ b0 (Yn−k,n) oP(1)1k

    k∑i=1

    f

    (i

    k + 1

    )(V −ρ−δi,k

    ),

    which by Assumption 2.2.4 (i) and assuming δ < |ρ| is oP (b0 (Yn−k,n)). Combining the resultson T

    (1)n,k , T

    (2)n,k and T

    (3)n,k establishes the result.

    Using Theorem 2.2.7 we can create a class of estimators γ̂k(K) :=Tn,k(K)µ(K) for γ in the following

    way

    Proposition 2.2.8. Let X1, . . . , Xn be i.i.d. random variables according to a distributionsatisfying Assumption 2.2.1. If further Assumption 2.2.4 holds with µ(K) 6= 0, then fork, n→ ∞ such that kn → 0 and

    √kb(nk

    )→ λ for some constant λ we have

    √k (γ̂k(K)− γ) → N

    (λI1(K, ρ)

    µ(K), γ2

    σ2(K)

    µ2(K)

    ). (2.13)

    Proof. We have

    √k (γ̂k(K)− γ)

    D= γ

    σ(K)

    µ(K)Nk(K) +

    √kb(nk

    ) I1(K, ρ)µ(K)

    (1 + oP(1))

    → N(λI1(K, ρ)

    µ(K), γ2

    σ2(K)

    µ2(K)

    ),

    under the conditions of the Proposition.

    We veri�ed in Lemma 2.2.6 that the kernel function K(t) = tτ (− log t)δ satis�es Assumption2.2.4. This allows us to construct consistent estimators which are asymptotically normal using

    this kernel. We do so in Corollary 2.2.9.

  • Estimation of the second order parameter 27

    Corollary 2.2.9. Let X1, . . . , Xn be i.i.d. random variables according to a distribution satis-fying Assumption 2.2.1. For k, n → ∞ such that kn → 0 and

    √kb(nk

    )→ λ for some constant

    λ we have for the kernel function K(t) = tτ (− log t)δ, τ, δ ≥ 0 that

    √k (γ̂k(K)− γ) → N

    (τ + 1)δ+1

    (τ − ρ+ 1)δ+1, γ2

    Γ(2δ + 1)(τ + 1)2δ+2

    (2τ + 1)2δ+1(Γ(δ + 1))2

    ).

    In particular, we obtain

    (i) For the Hill Kernel√k (γ̂k(H)− γ) → N

    1

    1− ρ, γ2).

    (ii) For the Power kernel

    √k (γ̂k(Kτ )− γ) → N

    τ + 1

    τ − ρ+ 1, γ2

    (τ + 1)2

    2τ + 1

    ).

    (iii) For the Log kernel

    √k (γ̂k(Lδ)− γ) → N

    1

    (1− ρ)δ+1, γ2

    Γ(2δ + 1)

    (Γ(δ + 1))2

    ).

    A discussion on when to choose which kernel function is a topic of its own, so we will not

    spend much time on it since it is not of great importance for this thesis. However, the Hill

    kernel always has the smallest asymptotic variance. In general, the kernel function for which

    the asymptotic mean squared error of the resulting γ estimator is minimal depends on thedistributional parameters γ and ρ. Concerning the log and power kernel with δ = τ , wesee that the log kernel tends to have a bigger variance than the power kernel, although it

    su�ers from less bias. For a detailed discussion of the performance of γ estimators with kernelfunctions in the family K(t) = tτ (− log t)δ we refer to Gomes et al. (2007).

    2.3 Estimation of the second order parameter

    The estimation of the second order parameter in the univariate case is not of grave impor-

    tance to this thesis. We will however in Chapter 4 construct estimators for the second order

    parameter in the bivariate extreme value framework, which are based on the same ideas as is

    used to construct the estimator for the second order parameter ρ. In order to construct anestimator for ρ we start with the basic building block Tn,k(K) de�ned in (2.8). By making aTaylor series expansion it follows by Theorem 2.2.7 that

    Tαn,k(K)D= γαµα(K) + αγµα−1(K)σ(K)

    Nk(K)√k

    + b(nk

    )αγα−1µα−1(K)I1(K, ρ) (1 + oP(1)) ,

    where α > 0 and K > 0. The basic idea is to construct a statistic which converges inprobability to a function of ρ, which does not depend on the unknown parameter γ. To this

  • 28 Estimation of the second order parameter

    end, let K1, . . . ,K8 be kernel functions and de�ne

    K(1) := (K1,K2,K3,K4) ,

    K(2) := (K5,K6,K7,K8) ,

    K(1,2) :=(K(1),K(2)

    ),

    Ī1 (Ki, ρ) :=I1 (Ki, ρ)

    µ (Ki), i ∈ {1, . . . , 8} ,

    Ī(a)1 (Ki,Kj , ρ) := Ī

    a1 (Ki, ρ)− Īa1 (Kj , ρ) , a = 1, 2, i, j ∈ {1, . . . , 8} .

    Using this notation, we consider the ratio of di�erences given by

    Ψn,k

    (K(1), α1, α2

    ):=

    (Tn,k(K1)µ(K1)

    )α1−(Tn,k(K2)µ(K2)

    )α1(Tn,k(K3)µ(K3)

    )α2−(Tn,k(K4)µ(K4)

    )α2 (2.14)and the function

    ψ(K(1), α1, α2, ρ

    ):= γα1−α2

    α1Ī(1)1 (K1,K2, ρ)

    α2Ī(1)1 (K3,K4, ρ)

    ,

    with α1, α2 > 0.If k, n→ ∞ such that kn → 0 and

    √kb(nk

    )→ ∞, then(

    Tn,k(K1)µ(K1)

    )α1−(Tn,k(K2)µ(K2)

    )α1b(nk

    ) P→ α1γα1−1Ī(1)1 (K1,K2, ρ)and (

    Tn,k(K3)µ(K3)

    )α1−(Tn,k(K4)µ(K4)

    )α2b(nk

    ) P→ α2γα2−1Ī(1)1 (K3,K4, ρ) .Hence

    Ψn,k

    (K(1), α1, α2

    )P→ ψ

    (K(1), α1, α2, ρ

    ).

    This statistic still depends on γ, but we can get rid of this if we consider a ratio of statisticson the form of (2.14) with appropriately chosen α parameters. So de�ne

    Λn,k

    (K(1,2), α1, α2, l

    ):=

    Ψn,k

    (K(1), α1, α1 + l

    )Ψn,k

    (K(2), α2, α2 + l

    )and

    Λ(K(1,2), α1, α2, l, ρ

    ):=

    ψ(K(1), α1, α1 + l, ρ

    )ψ(K(2), α2, α2 + l, ρ

    )where l > 0. If we again assume that If k, n → ∞ such that kn → 0 and

    √kb(nk

    )→ ∞, then

    clearly

    Λn,k

    (K(1,2), α1, α2, l

    )P→ Λ

    (K(1,2), α1, α2, l, ρ

    ),

  • Estimation of the second order parameter 29

    which does not depend on γ. If the function ρ 7→ Λ(K(1,2), α1, α2, l, ρ

    )is bijective, then we

    obtain the estimator

    ρ̂(K(1,2), α1, α2, l

    ):= Λ−1

    (K(1,2), α1, α2, l,Λn,k

    (K(1,2), α1, α2, l

    ))(2.15)

    for the second order parameter. The consistency of this estimator is estblished in Proposition

    2.3.1 using a straightforward application of the continuous mapping theorem.

    Proposition 2.3.1. (Goegebeur et al., 2010) Let X1, . . . , Xn be i.i.d. random variables ac-cording to a distribution satisfying Assumption 2.2.1. Let K1, . . . ,K8 satisfy Assumption

    2.2.4, and suppose Ī(1)1 (K1,K2), Ī

    (1)1 (K3,K4), Ī

    (1)1 (K5,K6) and Ī

    (1)1 (K7,K8) are wellde-

    �ned and nonzero. Then if k, n → ∞ such that kn → 0 and√kb(nk

    )→ ∞ we have

    Λn,k

    (K

    (1,2), α1, α2, l)

    P→ Λ(K

    (1,2), α1, α2, l, ρ). Further, if Λ is bijective and Λ−1 is con-

    tinuous then ρ̂(K

    (1,2), α1, α2, l)is a consistent estimator for ρ.

    In order to establish asymptotic normality of the estimator of ρ, we need the following thirdorder condition.

    Assumption 2.3.2 (Third order condition). There exists a positive real parameter γ, negativereal parameters ρ and β, functions b and b̃ with b(t) → 0 and b̃(t) → 0 for t → ∞, both ofconstant sign for large values of t, such that

    limt→∞

    logU(tx)−logU(t)−γ log xb(t) −

    xρ−1ρ

    b̃(t)=

    1

    β

    (xρ+β − 1ρ+ β

    − xρ − 1ρ

    ), ∀x > 0.

    The third order condition implies that |b̃| is regularly varying of index β (de Haan and Ferreira,2006). The third order contion is not to restrictive. Among distributions of Pareto-type that

    satisfy the second and third order condition are the Fréchet, the Burr, the GP distributions

    and the absolute T distribution. This is not a complete list of Pareto-type distributions which

    satisfy the second and third order condition. As examples, we show that the Burr and the

    absolute T distribution satis�es the third order condition.

    Example 2.3.3. In order to verify that the Burr distribution satis�es the third order con-

    dition, it is a good idea to choose b(t) = γ tρ

    1−tρ . From (2.4) and the choice of b(t) it followsthat

    logU(tx)− logU(t)− γ log xb(t)

    − xρ − 1ρ

    =

    γtρ(xρ−1)ρ −

    12δ t

    2ρ(x2ρ − 1

    )+O

    (t3ρ)

    γ tρ

    1−tρ− x

    ρ − 1ρ

    (2.16)

    =− tρ (xρ − 1)

    ρ+

    1

    2ρtρ(x2ρ − 1

    )+O

    (t2ρ)

    (2.17)

    =ρtρ1

    ρ

    (x2ρ − 1

    2ρ− x

    ρ − 1ρ

    )+O(t2ρ). (2.18)

    From (2.18) we see that if we choose β = ρ and b̃(t) = ρtρ(1+o(1)) then the Burr distributionsatis�es the third order condition.

  • 30 Estimation of the second order parameter

    Example 2.3.4. To verify that the absolute T distribution satis�es the third order condition,

    it is a good idea to choose b(t) = − ρD1tρ

    1+2(

    D2D1

    − 12D1

    )tρ. With this choice of b(t) and (2.6) it follows

    that

    logU(xt)− logU(t)− γ log xb(t)

    − xρ − 1ρ

    =2

    (D2D1

    − 12D1

    )tρ (xρ − 1)

    ρ(2.19)

    −(D2D1

    − 12D1

    )tρ(x2ρ − 1

    +O(t2ρ) (2.20)

    =− 2ρ(D2D1

    − 12D1

    )tρ1

    ρ

    (x2ρ − 1

    2ρ− (x

    ρ − 1)ρ

    )+O(t2ρ).

    (2.21)

    From this we see that if we choose β = ρ and b̃(t) on the form b̃(t) = −2ρ(D2D1

    − 12D1)tρ(1 +

    o(1)), then the absolute T distribution satis�es the third order condition.

    We also have to add an extra condition on the kernel function.

    Assumption 2.3.5. Let K be a fuction de�ned on (0, 1) such that Assumption 2.2.4 is sat-is�ed, and the following extra condition.

    (vi) 1k∑k

    j=1K(

    jk+1

    )(j

    k+1

    )−ρ= I1(K, ρ) + o

    (1√k

    ), k → ∞.

    Lemma 2.3.6. The kernel function considered in Example 2.2.5 given by K(t) := tτ (− log t)δalso satis�es Assumption 2.3.5

    This result can easily be obtained from the proof of Assumption 2.2.4 (iii), and is hence

    omitted.

    Similar to the procedure in Theorem 2.2.7 we can make an asymptotic expansion of the statistic

    in (2.8) using the third order condition.

    Theorem 2.3.7. (Goegebeur et al., 2010) Let X1, . . . , Xn be i.i.d. random variables accordingto a distribution satisfying Assumption 2.3.2. If Assumption 2.3.5 holds, then for k, n → ∞such that kn → 0 we have

    Tn,k(K)D=γµ(K) + γσ(K)

    Nk(K)√k

    + b (Yn−k,n) I1(K, ρ) + b (Yn−k,n) σ̃(K, ρ)Pk(K, ρ)√

    k

    + b (Yn−k,n) b̃ (Yn−k,n) I2(K, ρ, β) (1 + oP(1)) + b (Yn−k,n)OP

    (1√k

    ),

    where Nk(K) and Pk(K, ρ) are asymptotic standard normally distributed random variables.

    We will not give a proof of this result, but the line of proof follows the same as the proof of

    Theorem 2.2.7. The result in Theorem 2.3.7 can be used to obtain the asymptotic expansion

    Tαn,k(K)D=γαµα(K) + αγαµα−1(K)σ(K)

    Nk(K)√k

    + b (Yn−k,n)αγα−1µα−1(K)I1(K, ρ)

    + b (Yn−k,n) b̃ (Yn−k,n)αγα−1µα−1(K)I2(K, ρ, β) (1 + oP(1))

    + b2 (Yn−k,n)α(α− 1)

    2γα−2µα−2(K)I21 (K, ρ) (1 + oP(1)) + b (Yn−k,n)OP

    (1√k

    )

  • Estimation of the second order parameter 31

    Before we can present the limiting distribution of the ρ estimator presented in (2.15) we needto introduce the following notation, with i, j ∈ {1, . . . , 8}.

    Ī2(K, ρ, β) :=I2 (K, ρ, β)

    µ(K),

    Ī2 (Ki,Kj , ρ, β) :=I2 (Ki, ρ, β)

    µ(K)− I2 (Kj , ρ, β)

    µ(K),

    σ̄(K) :=σ(K)

    µ(K),

    Nk (Ki,Kj) := σ̄ (Ki)Nk (Ki)− σ̄ (Kj)Nk (Kj) ,

    Nk

    (K(1), α1, α2, γ, ρ

    ):=

    α1γα1Nk (K1,K2)− ψ

    (K(1), α1, α2, ρ

    )α2γ

    α2Nk (K3,K4)

    α2γα2−1Ī(1)1 (K3,K4, ρ)

    ,

    c1

    (K(1), α1, α2, γ, ρ, β

    ):=

    α1γα1−1Ī2 (K1,K2, ρ, β)− ψ

    (K(1), α1, α2, ρ

    )α2γ

    α2−1Ī2 (K3,K4, ρ, β)

    α2γα2−1Ī(1)1 (K3,K4, ρ)

    ,

    c2

    (K(1), α1, α2, γ, ρ

    ):=

    α1 (α1 − 1) γα1−2Ī(2)1 (K1,K2, ρ)− ψ(K(1), α1, α2, ρ

    )α2 (α2 − 1) γα2−2Ī(2)1 (K3,K4, ρ)

    α2γα2−1Ī(1)1 (K3,K4, ρ)

    ,

    Nk

    (K(1,2), α1, α2, l, γ, ρ

    ):=

    Nk(K(1), α1, α1 + l, γ, ρ

    )− Λ

    (K(1,2), α1, α2, l, γ, ρ

    )Nk(K(2), α2, α2 + l, γ, ρ

    )ψ(K(2), α2, α2 + l, ρ

    ) ,c1

    (K(1,2), α1, α2, l, γ, ρ, β

    ):=

    c1(K(1), α1, α1 + l, γ, ρ, β

    )− Λ

    (K(1,2), α1, α2, l, γ, ρ

    )c1(K(2), α2, α2 + l, γ, ρ, β

    )ψ(K(2), α2, α2 + l, ρ

    ) ,c2

    (K(1,2), α1, α2, l, γ, ρ

    ):=

    c2(K(1), α1, α1 + l, γ, ρ

    )− Λ

    (K(1,2), α1, α2, l, γ, ρ

    )c2(K(2), α2, α2 + l, γ, ρ

    )ψ(K(2), α2, α2 + l, ρ

    ) ,v2(K(1,2), α1, α2, l, γ, ρ

    ):= Var

    (Nk

    (K(1,2), α1, α2, l, γ, ρ

    )).

With this notation we can obtain a result giving the asymptotic normality of our ρ estimator.

Proposition 2.3.8. (Goegebeur et al., 2010) Let X1, . . . , Xn be i.i.d. random variables according to a distribution satisfying Assumption 2.3.2. If the kernel functions K1, . . . , K8 satisfy Assumption 2.3.5 and are such that Ī_1^{(1)}(K1, K2, ρ), Ī_1^{(1)}(K3, K4, ρ), Ī_1^{(1)}(K5, K6, ρ) and Ī_1^{(1)}(K7, K8, ρ) are well defined and nonzero, then for k, n → ∞ such that k/n → 0, √k b(n/k) → ∞, √k b(n/k) b̃(n/k) → λ1 and √k b²(n/k) → λ2, we have
\[
\sqrt{k}\,b\!\left(\frac{n}{k}\right)\left[\Lambda_{n,k}(K^{(1,2)},\alpha_1,\alpha_2,l) - \Lambda(K^{(1,2)},\alpha_1,\alpha_2,l,\rho)\right]
\xrightarrow{D} N\!\Big(\lambda_1\, c_1(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho,\beta) + \lambda_2\, c_2(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho),\ v^2(K^{(1,2)},\alpha_1,\alpha_2,l,\gamma,\rho)\Big).
\]


2.4 Appendix

2.4.1 Proof of Lemma 2.2.6

i)

Since K(t) = (1/t) t^{τ+1}(−log t)^δ it follows that
\[
\int_0^t u(v)\,dv = t^{\tau+1}(-\log t)^{\delta},
\]
and hence
\[
u(v) = (\tau+1)\,v^{\tau}(-\log v)^{\delta} - \delta\, v^{\tau}(-\log v)^{\delta-1}.
\]

Now
\[
\left|(k+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} u(t)\,dt\right|
\le (k+1)(\tau+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} t^{\tau}(-\log t)^{\delta}\,dt
+ (k+1)\,\delta\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} t^{\tau}(-\log t)^{\delta-1}\,dt
\le \frac{k+1}{j}\,(\tau+1)\int_{0}^{\frac{j}{k+1}} (-\log t)^{\delta}\,dt
+ (k+1)\,\delta\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} (-\log t)^{\delta-1}\,dt.
\]
We distinguish between the two cases δ > 1 and δ ≤ 1. We start with the case δ > 1, where (−log t)^{δ−1} is also decreasing, so
\[
\left|(k+1)\int_{\frac{j-1}{k+1}}^{\frac{j}{k+1}} u(t)\,dt\right|
\le \frac{k+1}{j}\,(\tau+1)\int_{0}^{\frac{j}{k+1}} (-\log t)^{\delta}\,dt
+ \frac{k+1}{j}\,\delta\int_{0}^{\frac{j}{k+1}} (-\log t)^{\delta-1}\,dt
=: f\!\left(\frac{j}{k+1}\right).
\]
Next we show that ∫₀¹ f(x) dx < ∞ for δ > 1:
\[
\int_0^1 f(x)\,dx
= (\tau+1)\int_0^1 \frac{1}{x}\int_0^x (-\log t)^{\delta}\,dt\,dx
+ \delta\int_0^1 \frac{1}{x}\int_0^x (-\log t)^{\delta-1}\,dt\,dx
= (\tau+1)\int_0^1 (-\log t)^{\delta}\int_t^1 \frac{dx}{x}\,dt
+ \delta\int_0^1 (-\log t)^{\delta-1}\int_t^1 \frac{dx}{x}\,dt
= (\tau+1)\int_0^1 (-\log t)^{\delta+1}\,dt + \delta\int_0^1 (-\log t)^{\delta}\,dt
= (\tau+1)\Gamma(\delta+2) + \delta\Gamma(\delta+1) < \infty.
\]


ii)

The second part is easily verified using the following argument:
\[
\sigma^2(K) = \int_0^1 K^2(u)\,du = \int_0^1 u^{2\tau}(-\log u)^{2\delta}\,du
= \frac{\Gamma(2\delta+1)}{(2\tau+1)^{2\delta+1}} \le \Gamma(2\delta+1) < \infty,
\]
since τ ≥ 0.
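Both integral identities above lend themselves to a quick symbolic sanity check; the sketch below uses the hypothetical parameter values τ = 1/2 and δ = 2 (any τ ≥ 0 and δ > 1 would do).

```python
import sympy as sp

t, u, x = sp.symbols('t u x', positive=True)
tau, delta = sp.Rational(1, 2), 2        # illustrative values with delta > 1

# part i): int_0^1 f(x) dx = (tau + 1) Gamma(delta + 2) + delta Gamma(delta + 1)
f = ((tau + 1) * sp.integrate((-sp.log(t))**delta, (t, 0, x))
     + delta * sp.integrate((-sp.log(t))**(delta - 1), (t, 0, x))) / x
rhs_i = (tau + 1) * sp.gamma(delta + 2) + delta * sp.gamma(delta + 1)
print(sp.simplify(sp.integrate(f, (x, 0, 1)) - rhs_i) == 0)          # True

# part ii): sigma^2(K) = Gamma(2 delta + 1) / (2 tau + 1)^(2 delta + 1)
sigma2 = sp.integrate(u**(2 * tau) * (-sp.log(u))**(2 * delta), (u, 0, 1))
rhs_ii = sp.gamma(2 * delta + 1) / (2 * tau + 1)**(2 * delta + 1)
print(sp.simplify(sigma2 - rhs_ii) == 0)                             # True
```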


A similar argument shows that I_{12} = O((log(k+1))^δ/(k+1)).

Concerning the term I_2 we find that
\[
I_2 \le \int_{\log(k+1)}^{\infty} z^{\delta} e^{-z}\,dz
= \frac{(\log(k+1))^{\delta}}{k+1} + \delta\int_{\log(k+1)}^{\infty} z^{\delta-1} e^{-z}\,dz
= \frac{(\log(k+1))^{\delta}}{k+1}\left(1 + \frac{k+1}{(\log(k+1))^{\delta}}\,\delta\int_{\log(k+1)}^{\infty} z^{\delta-1} e^{-z}\,dz\right).
\]
If we can show that (k+1)(log(k+1))^{−δ} δ∫_{log(k+1)}^∞ z^{δ−1}e^{−z} dz → 0 as k → ∞, then I_2 = O((log k)^δ/k). Using l'Hôpital's rule and Leibniz's rule it follows that
\[
\lim_{x\to\infty} \frac{\delta\int_{\log x}^{\infty} z^{\delta-1} e^{-z}\,dz}{(\log x)^{\delta}/x}
= \lim_{x\to\infty} \frac{-\delta(\log x)^{\delta-1}\, e^{-\log x}\,\frac{1}{x}}{\frac{\delta(\log x)^{\delta-1} - (\log x)^{\delta}}{x^2}}
= \lim_{x\to\infty} \frac{-\delta}{\delta - \log x} = 0.
\]

iv)

The fourth condition is trivially satisfied since
\[
\max_{j\in\{1,\dots,k\}} \left|K\!\left(\frac{j}{k+1}\right)\right| \le (\log(k+1))^{\delta} = o(\sqrt{k}).
\]

v)

This condition is also trivially satisfied since
\[
\int_0^1 |K(u)|\, u^{|\rho|-1-\varepsilon}\,du = \int_0^1 u^{\tau+|\rho|-1-\varepsilon}(-\log u)^{\delta}\,du
= \frac{\Gamma(\delta+1)}{(\tau+|\rho|-\varepsilon)^{\delta+1}} < \infty.
\]


where the Ei are standard exponential random variables and K(u), 0 < u < 1, is a kernel function. Furthermore, let
\[
v_k = \sqrt{\frac{1}{k}\sum_{j=1}^{k} K^2\!\left(\frac{j}{k+1}\right)}. \qquad (2.23)
\]
Then
\[
\frac{\sqrt{k}\left(Z_k - \frac{1}{k}\sum_{j=1}^{k} K\!\left(\frac{j}{k+1}\right)\right)}{v_k} \xrightarrow{D} N(0,1)
\iff \max_{1\le j\le k}\left|K\!\left(\frac{j}{k+1}\right)\right| = o(\sqrt{k}\,v_k) \qquad (2.24)
\]
as k → ∞. If further we have
\[
\frac{1}{k}\sum_{j=1}^{k} K\!\left(\frac{j}{k+1}\right) = \mu(K) + o\!\left(\frac{1}{\sqrt{k}}\right), \qquad v_k \to \sigma(K) > 0, \qquad (2.25)
\]
with μ(K) and σ(K) finite, and
\[
\max_{1\le j\le k}\left|K\!\left(\frac{j}{k+1}\right)\right| = o(\sqrt{k}), \quad \text{as } k\to\infty, \qquad (2.26)
\]
then
\[
\frac{\sqrt{k}\,(Z_k - \mu(K))}{\sigma(K)} \xrightarrow{D} N(0,1) \qquad (2.27)
\]
as k → ∞.
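The opening of this lemma, which defines Z_k, fell on the preceding page of the source; the simulation below assumes the standard form Z_k = (1/k) ∑_{j=1}^k K(j/(k+1)) E_j and illustrates the standardization (2.27) for the kernel of Lemma 2.3.6 (the values of k, the number of replications and τ = δ = 1 are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
k, reps = 2000, 5000
tau, delta = 1.0, 1.0
K = lambda t: t**tau * (-np.log(t))**delta   # kernel of Lemma 2.3.6

w = K(np.arange(1, k + 1) / (k + 1))         # K(j/(k+1)), j = 1, ..., k
E = rng.exponential(size=(reps, k))          # standard exponential variables
Zk = E @ w / k                               # assumed form of Z_k

mean_w = np.mean(w)                          # (1/k) sum K(j/(k+1)) -> mu(K)
vk = np.sqrt(np.mean(w**2))                  # v_k from (2.23) -> sigma(K)
std = np.sqrt(k) * (Zk - mean_w) / vk        # standardization in (2.24)
print(np.mean(std), np.var(std))             # approximately 0 and 1
```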

Lemma 2.4.2. (Goegebeur et al., 2010) Denote by E1, . . . , Ek standard exponential random variables and by U_{1,k} ≤ · · · ≤ U_{k,k} the order statistics of a random sample of size k from U(0,1). Assume that ∫₀¹ |K(u)| du < ∞.

Lemma 2.4.3. (de Haan and Ferreira, 2006) Suppose that for some positive function b,
\[
\lim_{t\to\infty} \frac{f(tx)-f(t)}{b(t)} = \frac{x^{\gamma}-1}{\gamma} \quad \text{for all } x > 0,
\]
where γ is a real parameter. Then for all ε, δ > 0 there is a t0 = t0(ε, δ) such that for t, tx ≥ t0,
\[
\left|\frac{f(tx)-f(t)}{b_0(t)} - \frac{x^{\gamma}-1}{\gamma}\right| \le \varepsilon\, x^{\gamma}\max(x^{\delta}, x^{-\delta}),
\]
where
\[
b_0(t) :=
\begin{cases}
\gamma f(t), & \gamma > 0,\\
-\gamma\,(f(\infty) - f(t)), & \gamma < 0,\\
f(t) - t^{-1}\int_0^t f(s)\,ds, & \gamma = 0.
\end{cases}
\qquad (2.29)
\]

Chapter 3

    Multivariate extreme value theory

In this chapter we introduce the basic limit laws in multivariate extreme value theory. After a transformation of the marginal distribution functions to standard Fréchet margins, we discuss the dependence structure between the variables. This discussion starts with the exponent and spectral measure, before we turn our attention to the max domain of attraction in the multivariate framework and asymptotic independence. This is followed by an introduction to several other dependence measures. The measures we consider are the Pickands dependence function and the pair of dependence measures χ and χ̄. We explain the relation between all these dependence measures and discuss ways of getting from one to the other. Finally we introduce the model of Ledford and Tawn (1997) and make the connection between the coefficient of tail dependence η and the other dependence measures discussed previously.

    3.1 Limit laws

The results we present in this section will be based on two-dimensional spaces. Generalizations to higher dimensional spaces are obvious, but require heavier notation. Suppose (X1, Y1), . . . , (Xn, Yn) are i.i.d. random vectors with distribution function FXY. We define the maximum of a set of vectors of this form as
\[
M_n := (\max(X_1,\dots,X_n),\ \max(Y_1,\dots,Y_n)),
\]
which is simply the vector of componentwise maxima. We start by deriving an important theorem, which is the foundation of our description of the asymptotic distributions that can occur for an appropriately normalized maximum of the form of Mn. Suppose there exist sequences of constants (bn)_{n=1}^∞, (dn)_{n=1}^∞, sequences of positive constants (an)_{n=1}^∞, (cn)_{n=1}^∞, and a distribution function G with nondegenerate marginals such that
\[
\lim_{n\to\infty} P\!\left(\frac{\max(X_1,\dots,X_n) - b_n}{a_n} \le x,\ \frac{\max(Y_1,\dots,Y_n) - d_n}{c_n} \le y\right) = G(x,y) \qquad (3.1)
\]
for all continuity points (x, y) of G. Any limit distribution function G in (3.1) with nondegenerate marginals is called a multivariate extreme value distribution. It follows that
\[
\lim_{n\to\infty} P\!\left(\frac{\max(X_1,\dots,X_n) - b_n}{a_n} \le x\right) = G(x,\infty)
\]


and
\[
\lim_{n\to\infty} P\!\left(\frac{\max(Y_1,\dots,Y_n) - d_n}{c_n} \le y\right) = G(\infty,y),
\]
since (3.1) implies convergence of the marginal distributions. According to Theorem 1.1.2 we can choose the constants an, bn, cn and dn such that for some γ1, γ2 ∈ R we have
\[
G(x,\infty) = \exp\!\left(-(1+\gamma_1 x)^{-1/\gamma_1}\right) \qquad (3.2)
\]
and
\[
G(\infty,y) = \exp\!\left(-(1+\gamma_2 y)^{-1/\gamma_2}\right). \qquad (3.3)
\]

It is relevant to note that G is continuous, since the two marginal distributions of G are continuous.

If we let FX and FY be the two marginal distributions of FXY, and UX and UY be the two corresponding tail quantile functions, then according to Theorem 1.1.2 there are positive functions aX(t) and aY(t) such that
\[
\lim_{t\to\infty} \frac{U_X(tx) - U_X(t)}{a_X(t)} = \frac{x^{\gamma_1}-1}{\gamma_1}, \quad \forall x > 0,
\]
and
\[
\lim_{t\to\infty} \frac{U_Y(tx) - U_Y(t)}{a_Y(t)} = \frac{x^{\gamma_2}-1}{\gamma_2}, \quad \forall x > 0.
\]
Hence
\[
\lim_{n\to\infty} \frac{U_X(nx) - b_n}{a_n} = \frac{x^{\gamma_1}-1}{\gamma_1}
\quad\text{and}\quad
\lim_{n\to\infty} \frac{U_Y(nx) - d_n}{c_n} = \frac{x^{\gamma_2}-1}{\gamma_2},
\]
if we choose the constants an, bn, cn and dn according to Theorem 1.1.2. We easily see that (3.1) can be written as
\[
G(x,y) = \lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\ c_n y + d_n).
\]
If xn → u and yn → v, then by the continuity of G and the monotonicity of FXY we have that
\[
G(u,v) = \lim_{n\to\infty} F_{XY}^n(a_n x_n + b_n,\ c_n y_n + d_n).
\]
Applying this result with
\[
x_n := \frac{U_X(nx) - b_n}{a_n}, \quad x > 0,
\qquad\text{and}\qquad
y_n := \frac{U_Y(ny) - d_n}{c_n}, \quad y > 0,
\]
gives
\[
G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1},\ \frac{y^{\gamma_2}-1}{\gamma_2}\right) = \lim_{n\to\infty} F_{XY}^n(U_X(nx),\ U_Y(ny)).
\]
These results establish the following theorem.


Theorem 3.1.1. (de Haan and Ferreira, 2006) Let (X1, Y1), . . . , (Xn, Yn) be i.i.d. random vectors with distribution function FXY. Suppose there exist sequences of real constants (bn)_{n=1}^∞, (dn)_{n=1}^∞ and positive real constants (an)_{n=1}^∞ and (cn)_{n=1}^∞ such that
\[
\lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\ c_n y + d_n) = G(x,y)
\]
for all continuity points (x, y) of G, and the marginals of G are standardized as in (3.2) and (3.3). Then with FX(x) := FXY(x, ∞), FY(y) := FXY(∞, y) and UX and UY the two corresponding tail quantile functions, we have that
\[
\lim_{n\to\infty} F_{XY}^n(U_X(nx),\ U_Y(ny)) = G_0(x,y) \qquad (3.4)
\]
for all x, y > 0, where
\[
G_0(x,y) := G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1},\ \frac{y^{\gamma_2}-1}{\gamma_2}\right)
\]
and γ1, γ2 are the marginal extreme value indices from (3.2) and (3.3).

Remark 3.1.2. The multivariate extreme value distribution function G((x^{γ1} − 1)/γ1, (y^{γ2} − 1)/γ2) has marginal distributions which are standard Fréchet, i.e. FZ(z) = exp(−1/z), z > 0. This fact simplifies things, because now we only have to discuss the dependence structure between the two variables.
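In practice the transformation to standard Fréchet margins is carried out with the empirical marginal distributions; the following minimal sketch (the helper to_frechet and the rank plug-in are our illustrative choices, not a construction from the text) uses that Z = −1/log U is standard Fréchet when U is uniform on (0,1).

```python
import numpy as np
from scipy.stats import rankdata

def to_frechet(v):
    """Transform a sample to approximately standard Frechet margins:
    if U = F(V) is uniform on (0, 1), then Z = -1/log(U) satisfies
    P(Z <= z) = exp(-1/z), i.e. Z is standard Frechet."""
    u = rankdata(v) / (len(v) + 1.0)   # ranks keep u strictly inside (0, 1)
    return -1.0 / np.log(u)

rng = np.random.default_rng(2)
z = to_frechet(rng.normal(size=100_000))
print(np.median(z), 1 / np.log(2))     # Frechet median is 1/log 2 = 1.4427...
```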

The following corollary, which we state without proof, is obtained from Theorem 3.1.1. For details we refer to de Haan and Ferreira (2006), Corollary 6.1.3 and Corollary 6.1.4.

Corollary 3.1.3. (de Haan and Ferreira, 2006) Under the conditions of Theorem 3.1.1, we have for any (x, y) for which 0 < G0(x, y) < 1, that
\[
\lim_{n\to\infty} n\,\{1 - F_{XY}(U_X(nx),\ U_Y(ny))\} = -\log G_0(x,y) \qquad (3.5)
\]
and
\[
\lim_{t\to\infty} t\,\{1 - F_{XY}(U_X(tx),\ U_Y(ty))\} = -\log G_0(x,y), \qquad (3.6)
\]
where t runs through the real numbers.
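As a small numerical illustration of (3.6), take FXY with independent standard Pareto margins, F(u) = 1 − 1/u, so that UX(t) = UY(t) = t and, anticipating the independence case of Example 3.2.5, −log G0(x, y) = 1/x + 1/y.

```python
# t {1 - F_XY(U_X(tx), U_Y(ty))} -> -log G0(x, y) = 1/x + 1/y for
# independent standard Pareto margins: F_XY(u, v) = (1 - 1/u)(1 - 1/v).
x, y = 2.0, 3.0
for t in [1e1, 1e3, 1e5]:
    Fxy = (1 - 1 / (t * x)) * (1 - 1 / (t * y))
    print(t, t * (1 - Fxy), 1 / x + 1 / y)   # middle column converges to the last
```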

    3.2 The exponent measure and the spectral measure

From Corollary 3.1.3 we can obtain the following useful theorem.

Theorem 3.2.1. (de Haan and Ferreira, 2006) Let FXY and G0 be distribution functions where for x, y > 0 with 0 < G0(x, y) < 1 we have that
\[
\lim_{n\to\infty} n\,\{1 - F_{XY}(U_X(nx),\ U_Y(ny))\} = -\log G_0(x,y),
\]
where UX and UY are the tail quantile functions of the marginals of FXY. Then there are set functions ν, ν1, ν2, . . . defined for all Borel sets A ⊂ R²₊ with
\[
\inf_{(x,y)\in A} \max(x,y) > 0
\]
such that


(i)
\[
\nu_n\{(s,t) \in \mathbb{R}^2_+ : s > x \text{ or } t > y\} = n\,\{1 - F_{XY}(U_X(nx),\ U_Y(ny))\}, \qquad (3.7)
\]
\[
\nu\{(s,t) \in \mathbb{R}^2_+ : s > x \text{ or } t > y\} = -\log G_0(x,y). \qquad (3.8)
\]

(ii) For all a > 0 the set functions ν, ν1, ν2, . . . are finite measures on R²₊ \ [0, a]².

(iii) For each Borel set A ⊂ R²₊ with inf_{(x,y)∈A} max(x, y) > 0 and ν(∂A) = 0,
\[
\lim_{n\to\infty} \nu_n(A) = \nu(A). \qquad (3.9)
\]

Definition 3.2.2. The measure ν from (3.8) is called the exponent measure of the extreme value distribution G0, since
\[
G_0(x,y) = \exp(-\nu(A_{x,y}))
\]
with
\[
A_{x,y} := \{(s,t) \in \mathbb{R}^2_+ : s > x \text{ or } t > y\}.
\]
In the following we let ν(x, y) := ν(A_{x,y}).

An important property of the exponent measure, which will be needed later in this chapter, is that it is homogeneous of order −1, as given in Theorem 3.2.3.

Theorem 3.2.3. (de Haan and Ferreira, 2006) For any Borel set A ⊂ R²₊ with inf_{(x,y)∈A} max(x, y) > 0 and ν(∂A) = 0, and any a > 0,
\[
\nu(aA) = a^{-1}\nu(A),
\]
where aA is the set obtained by multiplying all elements of A by a.
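As a concrete check of this homogeneity property one can take ν(x, y) = ν(A_{x,y}) for the logistic model that appears in Example 3.2.6 below; since A_{ax,ay} = aA_{x,y}, Theorem 3.2.3 predicts ν(ax, ay) = a^{−1}ν(x, y). The value α = 1/2 is illustrative.

```python
import sympy as sp

x, y, a = sp.symbols('x y a', positive=True)
alpha = sp.Rational(1, 2)
nu = (x**(-1 / alpha) + y**(-1 / alpha))**alpha   # logistic exponent measure
scaled = nu.subs({x: a * x, y: a * y}, simultaneous=True)
print(sp.simplify(scaled - nu / a) == 0)          # nu(ax, ay) = nu(x, y)/a
```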

From the exponent measure we can also obtain the spectral measure. The spectral measure arises when we make a one-to-one transformation R²₊ \ {(0,0)} → (0,∞) × [0,c] for some c > 0,
\[
\begin{cases} r = r(x,y), \\ d = d(x,y), \end{cases}
\]
with the property that for all a, x, y > 0, we have
\[
\begin{cases} r(ax, ay) = a\,r(x,y), \\ d(ax, ay) = d(x,y). \end{cases}
\]
We can think of r as a radius and d as an angle or a direction. In this thesis we will only consider the transformation
\[
\begin{cases} r(x,y) = x + y, \\ d(x,y) = \dfrac{x}{x+y}, \end{cases}
\]
in which case the following theorem can be shown to hold.


Theorem 3.2.4. (de Haan and Ferreira, 2006) For each limit distribution G from (3.1), (3.2) and (3.3) there exists a probability distribution (denoted by its distribution function H) concentrated on [0, 1] with mean 1/2, such that for x, y > 0,
\[
G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1},\ \frac{y^{\gamma_2}-1}{\gamma_2}\right) = G_0(x,y)
= \exp\!\left(-2\int_0^1 \left(\frac{\omega}{x} \vee \frac{1-\omega}{y}\right) dH(\omega)\right), \qquad (3.10)
\]
where ω/x ∨ (1 − ω)/y := max(ω/x, (1 − ω)/y).

From (3.10) we see that the limit distributions in (3.1) are characterized solely by the spectral measure H and the marginal extreme value indices. Many more transformations than the one we considered can be chosen in order to construct a spectral measure; in fact there are endless possibilities. The transformation to choose depends on the situation at hand, and in a sense they are all equivalent, since one can be transformed into the other.

From (3.8) and (3.10) we see that the connection between the exponent measure and the spectral measure is given by
\[
\nu(x,y) = 2\int_0^1 \left(\frac{\omega}{x} \vee \frac{1-\omega}{y}\right) dH(\omega).
\]

However, it is not always obvious how to get from one measure to the other using this relation. In case this is not obvious, and G0 is absolutely continuous, we can use a method due to Coles and Tawn (1991) to compute the spectral density from the exponent measure. In the bivariate case, the point masses of H on 0 and 1 are
\[
H(\{0\}) = -\frac{1}{2}\lim_{x\to 0} \frac{\partial\nu}{\partial y}(x,y), \qquad (3.11)
\]
\[
H(\{1\}) = -\frac{1}{2}\lim_{y\to 0} \frac{\partial\nu}{\partial x}(x,y), \qquad (3.12)
\]
and the density for 0 < ω < 1 is given by
\[
h(\omega) = -\frac{1}{2}\left.\frac{\partial^2 \nu(x,y)}{\partial x\,\partial y}\right|_{(\omega,\,1-\omega)}. \qquad (3.13)
\]

    Next we will consider some examples of spectral and exponent measures.

Example 3.2.5. We start by considering two important special cases of H. The first is the distribution function which places a point mass of 1 on ω = 1/2. In this case we obtain
\[
G_0(x,y) = \exp\!\left(-\max(x^{-1}, y^{-1})\right), \quad x, y > 0,
\]
which corresponds to complete dependence between the two variables. Here G0 is not absolutely continuous, so the method discussed above does not apply. The second case is the distribution function which places point mass of 1/2 on both ω = 0 and ω = 1. In this case it follows that
\[
G_0(x,y) = \exp\!\left(-(x^{-1} + y^{-1})\right), \quad x, y > 0,
\]
which corresponds to independence between the two variables. Here G0 is absolutely continuous, though with a spectral measure putting masses of 1/2 at 0 and 1.


Example 3.2.6. The logistic model (Gumbel, 1960a,b), given by
\[
\nu(x,y) = \left(x^{-1/\alpha} + y^{-1/\alpha}\right)^{\alpha}, \quad x, y > 0,\ 0 < \alpha < 1,
\]
is the oldest parametric family of bivariate extreme value dependence structures. It is a versatile model which covers all levels of dependence from independent variables to completely dependent variables. We see that for α → 0 we get
\[
\nu(x,y) = \max(x^{-1}, y^{-1})
\]
and for α → 1 it follows that
\[
\nu(x,y) = x^{-1} + y^{-1},
\]
which corresponds to complete dependence and independence between the variables, respectively. The logistic model does however not allow for asymmetry in the dependence structure, as the variables are exchangeable.

From the exponent measure we can compute the point mass of H at 0,
\[
H(\{0\}) = \frac{1}{2}\lim_{x\to 0}\, y^{-\frac{1}{\alpha}-1}\left(x^{-1/\alpha} + y^{-1/\alpha}\right)^{\alpha-1} = 0,
\]
using (3.11). Because of symmetry the point mass of H at 1 is also 0. The spectral density on (0, 1) can be found using (3.13). We start by finding
\[
\frac{\partial^2\nu(x,y)}{\partial x\,\partial y} = -\frac{1-\alpha}{\alpha}\, x^{-\frac{1}{\alpha}-1}\, y^{-\frac{1}{\alpha}-1}\left(x^{-1/\alpha} + y^{-1/\alpha}\right)^{\alpha-2}.
\]
From this we obtain the spectral density on (0, 1):
\[
h(\omega) = \frac{1}{2}\,\frac{1-\alpha}{\alpha}\,\omega^{-\frac{1}{\alpha}-1}(1-\omega)^{-\frac{1}{\alpha}-1}\left(\omega^{-1/\alpha} + (1-\omega)^{-1/\alpha}\right)^{\alpha-2}.
\]
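The two computations of this example are easy to reproduce symbolically; the sketch below checks (3.11) and (3.13) against the closed forms just derived, for the illustrative value α = 1/2.

```python
import sympy as sp

x, y, w = sp.symbols('x y omega', positive=True)
a = sp.Rational(1, 2)                                   # alpha = 1/2

nu = (x**(-1 / a) + y**(-1 / a))**a                     # logistic exponent measure
H0 = -sp.Rational(1, 2) * sp.limit(sp.diff(nu, y), x, 0, '+')          # (3.11)
h = (-sp.Rational(1, 2) * sp.diff(nu, x, y)).subs({x: w, y: 1 - w})    # (3.13)
h_closed = (sp.Rational(1, 2) * (1 - a) / a * w**(-1 / a - 1)
            * (1 - w)**(-1 / a - 1) * (w**(-1 / a) + (1 - w)**(-1 / a))**(a - 2))
print(H0, sp.simplify(h - h_closed) == 0)               # prints: 0 True
```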

    3.3 Domain of attraction and asymptotic independence

In order to discuss the domain of attraction in the multivariate case we first need to introduce the concept of max stability.

Definition 3.3.1. If there exist sequences of constants (bn)_{n=1}^∞, (dn)_{n=1}^∞ and sequences of positive constants (an)_{n=1}^∞ and (cn)_{n=1}^∞ such that
\[
G^n(a_n x + b_n,\ c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},\ \forall n \in \mathbb{N}, \qquad (3.14)
\]
for some distribution function G, then G belongs to the class of max stable distributions.

With this definition we are now able to discuss the bivariate max domain of attraction.

Definition 3.3.2. Let G : R² → R₊ be a max stable distribution function. A distribution function FXY is said to be in the max domain of attraction of G if there exist sequences of constants (bn)_{n=1}^∞, (dn)_{n=1}^∞ and sequences of positive constants (an)_{n=1}^∞ and (cn)_{n=1}^∞ such that
\[
\lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\ c_n y + d_n) = G(x,y) \qquad (3.15)
\]
for all x, y ∈ R.


Our next proposition shows that the class of max stable distributions and the class of extreme value distributions coincide.

Proposition 3.3.3. A distribution function G is max stable if and only if it is an extreme value distribution.

Proof. Assume G is a max stable distribution. Then by Definition 3.3.1 there exist sequences of constants (bn)_{n=1}^∞, (dn)_{n=1}^∞ and sequences of positive constants (an)_{n=1}^∞ and (cn)_{n=1}^∞ such that
\[
G^n(a_n x + b_n,\ c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},\ \forall n \in \mathbb{N}.
\]
Since then
\[
\lim_{n\to\infty} G^n(a_n x + b_n,\ c_n y + d_n) = G(x,y), \quad \forall x, y \in \mathbb{R},
\]
it follows by Theorem 3.1.1 that G is an extreme value distribution.

Now, assume that G is an extreme value distribution. We can without loss of generality assume that G is of the same form as G0 defined in Theorem 3.1.1. By Definition 3.2.2 and Theorem 3.2.3, it follows that
\[
G^n(nx,\ ny) = \exp(-n\,\nu(nA_{x,y})) = \exp(-\nu(A_{x,y})) = G(x,y), \quad \forall x, y > 0,\ \forall n \in \mathbb{N}.
\]
So G satisfies Definition 3.3.1 with an = cn = n and bn = dn = 0, and is hence a max stable distribution.

    Next we present a theorem which gives some equivalent formulations of the max domain of

    attraction condition.

Theorem 3.3.4. (de Haan and Ferreira, 2006) Let G be a max stable distribution. Let the marginal distribution functions be exp(−(1 + γ1x)^{−1/γ1}) and exp(−(1 + γ2y)^{−1/γ2}), and let H be its spectral measure according to the representation of Theorem 3.2.4. Then

(i) If the distribution function FXY of the random vector (X, Y) with continuous marginal distribution functions FX and FY is in the max domain of attraction of G, then the following equivalent conditions are fulfilled:

(a) With UX and UY being the tail quantile functions of FX and FY, we have for x, y > 0, that
\[
\lim_{t\to\infty} \frac{1 - F_{XY}(U_X(tx),\ U_Y(ty))}{1 - F_{XY}(U_X(t),\ U_Y(t))} = S(x,y) \qquad (3.16)
\]
with
\[
S(x,y) := \frac{\log G\!\left(\frac{x^{\gamma_1}-1}{\gamma_1},\ \frac{y^{\gamma_2}-1}{\gamma_2}\right)}{\log G(0,0)}.
\]

(b) For all r > 1 and all s ∈ [0, 1] that are continuity points of H,
\[
\lim_{t\to\infty} P\!\left(V + W > rt \text{ and } \frac{V}{V+W} \le s \,\Big|\, V + W > t\right) = r^{-1}H(s), \qquad (3.17)
\]
where V := 1/(1 − FX(X)) and W := 1/(1 − FY(Y)).

(ii) Conversely, if the continuous marginal distribution functions FX and FY are in the domain of attraction of exp(−(1 + γ1x)^{−1/γ1}) and exp(−(1 + γ2y)^{−1/γ2}), respectively, and any limit relation (3.16)-(3.17) holds for some positive function S or some distribution function H, then FXY is in the max domain of attraction of G.

We saw in Example 3.2.5 that there exists a special case of the spectral measure where the max stable distribution has independent components. This gives inspiration to the following definition.

Definition 3.3.5. A random vector (X, Y) whose distribution function FXY is in the domain of attraction of a max stable distribution with independent components is said to have the property of asymptotic independence.

From this definition we are able to obtain the following theorem.

Theorem 3.3.6. (de Haan and Ferreira, 2006) Let FXY : R² → R₊ be a probability distribution function. Suppose that its marginal distribution functions FX : R → R₊ and FY : R → R₊ satisfy
\[
\lim_{n\to\infty} F_X^n(a_n x + b_n) = \exp\!\left(-(1+\gamma_1 x)^{-1/\gamma_1}\right)
\]
and
\[
\lim_{n\to\infty} F_Y^n(c_n y + d_n) = \exp\!\left(-(1+\gamma_2 y)^{-1/\gamma_2}\right)
\]
for all x, y for which 1 + γ1x > 0, 1 + γ2y > 0, and where (bn)_{n=1}^∞, (dn)_{n=1}^∞ are sequences of real constants and (an)_{n=1}^∞ and (cn)_{n=1}^∞ are sequences of positive real constants. Let (X, Y) be a random vector with distribution function FXY. If
\[
\lim_{t\to\infty} \frac{P(X > U_X(t),\ Y > U_Y(t))}{P(Y > U_Y(t))} = 0, \qquad (3.18)
\]
then
\[
\lim_{n\to\infty} F_{XY}^n(a_n x + b_n,\ c_n y + d_n) = \exp\!\left(-(1+\gamma_1 x)^{-1/\gamma_1} - (1+\gamma_2 y)^{-1/\gamma_2}\right)
\]
for 1 + γ1x > 0 and 1 + γ2y > 0. Hence X and Y are asymptotically independent. Conversely, asymptotic independence entails (3.18).

Proof. Assume (3.18) holds. Then also
\[
\lim_{t\to\infty} \frac{t\,P(X > U_X(t),\ Y > U_Y(t))}{t\,P(Y > U_Y(t))} = 0.
\]
Using Theorem 1.1.2 (i) and (iii) with x = 0 we find that
\[
\lim_{t\to\infty} t\,P(Y > U_Y(t)) = 1, \qquad (3.19)
\]
and hence
\[
\lim_{t\to\infty} t\,P(X > U_X(t),\ Y > U_Y(t)) = 0.
\]
Because of monotonicity, it follows that
\[
\lim_{t\to\infty} t\,P(X > U_X(tx),\ Y > U_Y(ty)) = 0, \quad \forall x, y > 0,
\]
and then also for the set Ã_{x,y} := {(s, t) ∈ R²₊ : s > x and t > y} we have
\[
\nu(\tilde A_{x,y}) = \lim_{n\to\infty} \nu_n(\tilde A_{x,y}) = \lim_{n\to\infty} n\,P(X > U_X(nx),\ Y > U_Y(ny)) = 0, \quad \forall x, y > 0.
\]
This means that the exponent measure puts its entire mass on the lines x = 0 and y = 0, which in terms of the spectral measure means
\[
H(\{0\}) = \frac{1}{2} \quad\text{and}\quad H(\{1\}) = \frac{1}{2}.
\]
This is equivalent to X and Y being asymptotically independent.

Conversely, assume that X and Y are asymptotically independent. Then
\[
G_0(x,y) = \exp\!\left(-x^{-1} - y^{-1}\right), \quad x, y > 0,
\]
and hence for x = y = 1 we have
\[
G_0(1,1) = \exp(-2).
\]
Using Corollary 3.1.3, this implies that
\[
2 = \lim_{t\to\infty} t\,\big(1 - P(X \le U_X(t),\ Y \le U_Y(t))\big) \qquad (3.20)
\]
\[
= \lim_{t\to\infty} t\,\big(P(X > U_X(t)) + P(Y > U_Y(t)) - P(X > U_X(t),\ Y > U_Y(t))\big). \qquad (3.21)
\]
From Theorem 1.1.2 (i) and (iii) we have lim_{t→∞} t P(X > UX(t)) = lim_{t→∞} t P(Y > UY(t)) = 1, so it follows that
\[
\lim_{t\to\infty} t\,P(X > U_X(t),\ Y > U_Y(t)) = 0,
\]
and hence, by (3.19), we have that
\[
\lim_{t\to\infty} \frac{P(X > U_X(t),\ Y > U_Y(t))}{P(Y > U_Y(t))} = 0.
\]
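The bivariate normal distribution with correlation less than 1 is the classical example satisfying (3.18); the following Monte Carlo sketch tracks the ratio in (3.18) at increasingly extreme marginal quantiles (the sample size and the correlation 0.7 are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 2_000_000, 0.7
x = rng.normal(size=n)
y = r * x + np.sqrt(1 - r**2) * rng.normal(size=n)   # corr(x, y) = 0.7

for t in [10, 100, 1000]:
    p = 1 - 1 / t                                    # P(Y > q_y) ~ 1/t
    qx, qy = np.quantile(x, p), np.quantile(y, p)
    ratio = np.mean((x > qx) & (y > qy)) / (1 / t)   # empirical ratio in (3.18)
    print(t, ratio)                                  # decreases toward 0
```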

    3.4 Pickands dependence function

Whereas the dependence measures we have discussed previously have straightforward generalizations from the bivariate case to the multidimensional case, this is not true for the following dependence measure, which is strictly bivariate. The dependence measure we are going to discuss is related to the function L : R²₊ → R given by
\[
L(x,y) := -\log G_0\!\left(\frac{1}{x},\ \frac{1}{y}\right). \qquad (3.22)
\]
This can also be expressed in terms of the exponent measure as
\[
L(x,y) = \nu\left\{(s,t) \in \mathbb{R}^2_+ : s > \frac{1}{x} \text{ or } t > \frac{1}{y}\right\}
\]


using (3.8), or in terms of the spectral measure as
\[
L(x,y) = 2\int_0^1 \big(\omega x \vee (1-\omega)y\big)\,dH(\omega) \qquad (3.23)
\]
using (3.10). The function L has the following properties, which are easy to derive from the properties of the exponent and spectral measures and are therefore, for brevity, not proven here.

Proposition 3.4.1. (de Haan and Ferreira, 2006) Let L be as defined in (3.22). Then L has the following properties.

(i) Homogeneity of order 1: L(ax, ay) = aL(x, y), for all a, x, y > 0.

(ii) L(x, 0) = L(0, x) = x, for all x > 0.

(iii) x ∨ y ≤ L(x, y) ≤ x + y, for all x, y > 0.

(iv) Let (X, Y) be a random vector with distribution function G0(x, y). If X and Y are independent, then L(x, y) = x + y, for x, y > 0. If X and Y are completely dependent, then L(x, y) = x ∨ y, for x, y > 0.

(v) L is continuous.

(vi) L is convex: L(λ(x1, y1) + (1 − λ)(x2, y2)) ≤ λL(x1, y1) + (1 − λ)L(x2, y2) for all x1, x2, y1, y2 > 0 and λ ∈ [0, 1].

From the function L we can obtain the Pickands dependence function A : [0, 1] → R introduced in Pickands (1981). This function is given by
\[
A(t) := -\log G_0\!\left(\frac{1}{1-t},\ \frac{1}{t}\right) = L(1-t,\ t). \qquad (3.24)
\]
If we let t = y/(x + y) we easily find that
\[
L(x,y) = (x+y)\,A\!\left(\frac{y}{x+y}\right),
\]
and hence Pickands dependence function completely determines the function L.
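For the logistic model of Example 3.2.6 one finds from (3.22) that L(x, y) = (x^{1/α} + y^{1/α})^α, and hence A(t) = ((1 − t)^{1/α} + t^{1/α})^α. The short sketch below evaluates this and checks the bounds implied by Proposition 3.4.1 (iii), which in terms of A read t ∨ (1 − t) ≤ A(t) ≤ 1; the grid and the values of α are illustrative.

```python
import numpy as np

def A_logistic(t, alpha):
    """Pickands dependence function A(t) = L(1 - t, t) of the logistic model,
    with L(x, y) = (x**(1/alpha) + y**(1/alpha))**alpha."""
    return ((1 - t)**(1 / alpha) + t**(1 / alpha))**alpha

t = np.linspace(0.01, 0.99, 99)
for alpha in [0.2, 0.5, 0.9]:
    A = A_logistic(t, alpha)
    # bounds from Proposition 3.4.1 (iii): max(t, 1 - t) <= A(t) <= 1
    assert np.all(A <= 1 + 1e-12) and np.all(A >= np.maximum(t, 1 - t) - 1e-12)
print(A_logistic(0.5, 0.5))   # A(1/2) = 2**(alpha - 1), here 0.7071...
```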

Pickands dependence function can easily be connected to the spectral measure through the function L. If we combine (3.23) and (3.24) we get
\[
A(t) = 2\int_{[0,1]} \big(\omega(1-t) \vee (1-\omega)t\big)\,dH(\omega)
= 2t\int_{[0,t]} (1-\omega)\,dH(\omega) + 2(1-t)\int_{(t,1]} \omega\,dH(\omega).
\]
Since H has mean 1/2 we have that ∫_{[0,1]} ω dH(ω) = ∫_{[0,1]} (1 − ω) dH(ω) = 1/2. Using this it follows that
\[
\int_{(t,1]} \omega\,dH(\omega) = \frac{1}{2} - H([0,t]) + \int_{[0,t]} (1-\omega)\,dH(\omega).
\]
Hence
\[
A(t) = 2\int_{[0,t]} (1-\omega)\,dH(\omega) + (1-t)\big(1 - 2H([0,t])\big).
\]
The term ∫_{[0,t]} (1 − ω) dH(ω) can also be written as
\[
\int_{[0,t]} (1-\omega)\,dH(\omega)
= \int_{[0,t]} \int_{[\omega,1]} du\,dH(\omega)
= \int_{[0,1]} \int_{[0,\,u\wedge t]} dH(\omega)\,du
= \int_{[0,t]} \int_{[0,u]} dH(\omega)\,du + \int_{(t,1]} \int_{[0,t]} dH(\omega)\,du.
\]