parameter estimation for the truncated pareto distribution

Parameter Estimation for the Truncated ParetoDistribution

Inmaculada B. Aban1, Mark M. Meerschaert1 , and Anna K. Panorska2

Department of Mathematics and Statistics, University of Nevada

May 15, 2004

Abstract. The Pareto distribution is a simple model for nonnegative data with a power

law probability tail. In many practical applications, there is a natural upper bound that

truncates the probability tail. This paper develops parameter estimates for the truncated

Pareto distribution, including a simple test to detect whether a given sample is truncated. We

illustrate these methods with applications from finance, hydrology and atmospheric science.

Keywords and phrases: Pareto distribution, truncation, maximum likelihood estimator,

order statistics, tail behavior.

1. INTRODUCTION

A random variable X has a Pareto distribution if P (X > x) = Cx−α for some α > 0 (see

Johnson et al. [25] or Arnold [4]). This distributional model is important in applications

because many data sets are observed to follow a power law probability tail, at least ap-

proximately, for large values of x. Stable distributions with index α and max-stable type II

extreme value distributions with index α are also asymptotically Pareto in their probability

tails, and this fact has been used frequently to develop tail estimators for those distributions.

In some applications, there is a natural upper bound on the probability tail that truncates

the Pareto law. In other cases, there is empirical evidence that a truncated Pareto gives

a better fit to the data. Since both sums (if α < 2) and extremes (for any α > 0) fall in

1Partially supported by NSF grant DMS-01399272Partially supported by NSF grants DMS-0139927 and ATM-0231781

1

2

a different domain of attraction depending on whether a data set follows a Pareto or trun-

cated Pareto distribution, it is also useful to test for possible truncation in a data set with

evidence of power law tails. In this paper, we develop parameter estimates for the truncated

Pareto distribution that are easy to compute, we prove their consistency (and in some cases,

asymptotic normality), and we develop a simple test to distinguish between truncated and

unbounded Pareto distributional models on the basis of data. These methods should be

useful to practitioners in areas of science and engineering where power law probability tails

are prevalent.

Heavy tail random variables are important in applications to finance [16, 17, 24, 28, 31, 32,

34, 38], physics [5, 26, 27, 33, 35], hydrology [2, 8, 9, 23, 29, 44], engineering [36, 40, 41, 45]

and many other fields [1, 18, 42]. Recently, Burroughs and Tebbens [12, 13] have surveyed

evidence of truncated power law distributions in data sets on earthquake magnitudes, forest

fire areas, fault lengths (on Earth and on Venus), and oil and gas field sizes. Earthquake

fault sizes are limited by the physical (or terrain) considerations, see Scholtz and Contreras

[43] or Pacheco et al. [37]. The size of forest fires may be naturally limited by the availability

of fuel and climate, see Malamud et al. [30]. Additional applications of truncated power

law distributions in finance, groundwater hydrology, and atmospheric science are given in

Section 5 of this paper.

Burroughs and Tebbens estimate parameters of the truncated Pareto distribution by least-

squares fitting on a probability plot [11] and by minimizing mean squared error fit on a

plot of the tail distribution function [13]. Minimum variance unbiased estimators for the

parameters of a truncated Pareto law were developed in Beg [6, 7]. Beg [6] also provides

maximum likelihood estimates of the lower truncation parameter, scale and probability of

exceedance for a truncated Pareto distribution. The maximum likelihood estimator of α,

when the lower truncation limit is known was presented in [14], with some recommendations

for the case when the lower truncation limit was not known. This paper develops maximum

likelihood estimators (MLE) for all parameters of a truncated Pareto distribution. Existence

and uniqueness of the MLE are proven under certain conditions that are shown to hold with

probability approaching one as the sample size increases. A simple formula for the MLE is

3

obtained that can be easily computed. Asymptotic normality is established for the estimator

of the tail parameter α, and a simple test is proposed to distinguish between Pareto and

upper-truncated Pareto models. We also consider the case where only the upper tail of the

data fits a truncated Pareto distribution, and we compute the conditional MLE based on

the upper order statistics. These results can be used for robust tail estimation for truncated

models with power law tails (e.g., stable data). Examples from hydrology, climatology and

finance illustrate the utility and practical application of these results.

2. MAXIMUM LIKELIHOOD ESTIMATION FOR THE ENTIRE SAMPLE

A random variable W has a Pareto distribution function if

(2.1) P (W > w) = γα w−α for w ≥ γ > 0, and α > 0.

An upper-truncated Pareto random variable, X, has distribution

(2.2) FX(x) = P (X > x) =γα(x−α − ν−α)

1 − (γ/ν)α

and density

(2.3) fX(x) =αγαx−α−1

1 − (γ/ν)α

for 0 < γ ≤ x ≤ ν < ∞ where γ < ν. Consider a random sample X1, X2, . . . , Xn from this

upper-truncated distribution and let X(1) ≥ X(2) ≥ . . . ≥ X(n) denote their order statistics.

We now derive the maximum likelihood estimator (MLE) for the unknown parameters. We

start by considering the simplest case, where γ and ν are both known.

Theorem 2.1. The MLE α of α for an upper-truncated Pareto defined in (2.2) where γ and

ν are known solves the equation

(2.4)n

α+

n (γ/ν)α ln(γ/ν)

1 − (γ/ν)α−

n∑

i=1

[ln X(i) − ln γ] = 0

Furthermore, α has an asymptotic normal distribution with asymptotic mean α and asymp-

totic variance {1

α2−(

γν

)α [ln(

γν

)]2[1 −

(γν

)α]2

}−1

4

Proof. The likelihood function under this setting is given by

(2.5) L(γ, ν, α) =αn γnα

[1 − (γ/ν)α]n

n∏

i=1

X−α−1i

n∏

i=1

I{0 < γ ≤ Xi ≤ ν < ∞},

where γ < ν. The natural logarithm of this likelihood is

(2.6) −n ln[1 − (γ/ν)α] + n ln α + n α ln γ − (α + 1)n∑

i=1

ln X(i).

for 0 < γ ≤ Xi ≤ ν < ∞, i = 1, . . . , n. Taking the derivative with respect to α, we get

n

α+

n (γ/ν)α ln(γ/ν)

1 − (γ/ν)α−

n∑

i=1

[ln X(i) − ln γ].

Setting the above equation to zero and solving, we get the MLE for α.

To obtain the asymptotic distribution of the MLE α, note that the density in (2.3) may

be written as

fX(x) = exp{B(α)T (x) − A(α)}H(x)

where A(α) = − ln(αγα) + ln[1 − (γ/ν)α], B(α) = α, T (x) = − ln x, and H(x) =

x−1I{0 < γ ≤ x ≤ ν < ∞}. Hence, when γ and ν are fixed and known, this upper-truncated

Pareto model is a one-parameter exponential family of distributions. Applying the results

on the asymptotic properties of the MLE for a one-parameter exponential family (see for

instance Theorem 5.3.5 in [10]), α has an asymptotic normal distribution with an asymptotic

mean of α and an asymptotic variance given by the inverse of the Fisher information. In

this case, the Fisher information is found to be

∂2A(α)

∂α2=

1

α2−(

γν

)α [ln(

γν

)]2[1 −

(γν

)α]2

�

Remark 2.2. The MLE for α presented in equation (2.4) appeared first in [14] for the case,

when the lower truncation limit γ was known. The authors also had recommendations on

the computation of the estimate when γ was unknown.

Next we obtain the MLEs for the parameters of an upper-truncated Pareto where α, γ,

and ν are all unknown.

5

Theorem 2.3. Consider a random sample X1, X2, . . . , Xn from an upper-truncated Pareto

defined in (2.2) where α, γ and ν are unknown. The maximum likelihood estimators for the

parameters in this model are given by

γ = X(n) = min(X1, X2, . . . , Xn), ν = X(1) = max(X1, X2, . . . , Xn)

and α solves the equation

(2.7)n

α+

n [X(n)/X(1)]α ln[X(n)/X(1)]

1 − [X(n)/X(1)]α−

n∑

i=1

[ln X(i) − ln X(n)] = 0

Proof. The likelihood function for this model is given by

L(γ, ν, α) =αn γnα

[1 − (γ/ν)α]n

n∏

i=1

X−α−1i

n∏

i=1

I{γ ≤ Xi ≤ ν}

Note that∏n

i=1 I{γ ≤ Xi ≤ ν} = 1 if and only if X(n) ≥ γ and X(1) ≤ ν. Since the

likelihood function L is an increasing function of γ for γ ≤ X(n) and a decreasing function of

ν for ν ≥ X(1), then for a fixed α, the likelihood is maximized when γ = X(n) and ν = X(1).

Replacing γ and ν by X(n) and X(1), respectively, and taking the natural log of the likelihood,

we get

−n ln[1 − (X(n)/X(1))α] + n ln α + n α ln X(1) − (α + 1)

n∑

i=1

ln X(i).

Taking the partial derivative with respect to α and setting it equal to zero, the MLE for α

is α which solves the equation (2.7). �

When γ and ν are unknown, the upper-truncated Pareto distribution has support that

depends on these unknown parameters, and hence, it is not a member of a multiparameter

exponential family of distributions. Furthermore, it does not satisfy the regularity conditions

necessary to directly apply the asymptotic properties of MLEs to derive the asymptotic

distribution of α. We will instead use a Taylor series expansion to obtain the asymptotic

properties of α. First we need to show that γ and ν are consistent estimators for γ and ν,

respectively.

Lemma 2.4. Under the conditions of Theorem 2.3, γ and ν are consistent estimators of γ

and ν.

6

Proof. The cdf of the minimum, X(n), of a truncated Pareto random sample is

FX(n)(x) = P{X(n) ≤ x} = 1 −

[(γx

)α −(

γν

)α

1 −(

γν

)α]n

For any small ε > 0, P{|X(n) − γ| > ε} = P{X(n) > γ + ε}, and consequently

P{|X(n) − γ| > ε} =

(γ

γ+ε

)α

−(

γν

)α

1 −(

γν

)α

n

→ 0

as n → ∞. Thus, γ = X(n) converges in probability to γ, and hence γ is a consistent

estimator of γ.

Similarly, we can show that ν is a consistent estimator of ν. The cdf of the maximum,

X(1), of a truncated Pareto random sample is

FX(1)(x) = P{X(1) ≤ x} =

[1 −

(γx

)α

1 −(

γν

)α]n

and hence for any small ε > 0,

P{|X(1) − ν| > ε} = P{X(1) ≤ ν − ε} =

[1 −

(γ

ν−ε

)α

1 −(

γν

)α]n

→ 0

as n → ∞. Hence, ν = X(1) converges in probability to ν. �

Now we derive the asymptotic properties of α.

Theorem 2.5. Under the conditions of Theorem 2.3, the MLE α of α has an asymptotic

normal distribution with asymptotic mean α.

Proof. For a given γ and ν, suppose that h ≡ h(γ, ν) solves the equation

(2.8) G =1

h+

(γ/ν)h ln(γ/ν)

1 − (γ/ν)h− 1

n

n∑

i=1

(ln X(i) − ln γ

)= 0

for a fixed γ and ν. Then from Theorem 2.1, α = h(γ0, ν0) where γ0 and ν0 are the true values

of γ and ν, respectively, and by Theorem 2.3, α = h(γ, ν) where γ = X(n) and ν = X(1). By

a Taylor series expansion about γ0 and ν0,

(2.9) α = h(γ, ν) ≈ h(γ0, ν0) + hγ(γ0, ν0)(γ − γ0) + hν(γ0, ν0)(ν − ν0)

7

where hγ(γ0, ν0) and hν(γ0, ν0) are the partial derivatives of h with respect to γ and ν,

respectively, evaluated at γ0 and ν0. By Lemma 2.4, the last two terms on the right side

goes to zero in probability as n → ∞ provided that hγ(γ0, ν0) and hν(γ0, ν0) are bounded.

Applying Slutsky’s Theorem (see for instance [10]), it then follows that α has asymptotic

mean α and converges in distribution to an asymptotic normal distribution.

We finish the proof by showing that the partial derivatives hγ(γ0, ν0) and hν(γ0, ν0) are

bounded. If (2.8) holds then

d G

d γ=

∂ G

∂γ+

∂G

∂h

∂h

∂γ= 0 implying hγ(γ0, ν0) =

∂h

∂γ=

−∂ G/∂γ

∂G/∂h

and similarly, hν = (−∂ G/∂ν)/(∂G/∂h). Since neither ∂ G/∂γ nor ∂ G/∂ν have any singu-

larities for 0 < γ < ν and h > 0, both hγ(γ0, ν0) and hν(γ0, ν0) are bounded if we can show

that ∂G/∂h 6= 0. Let b = γ/ν. In fact we have

(2.10)∂G

∂h=

−(1 − bh)2 + h2bh(ln b)2

h2(1 − bh)2< 0 for every 0 < b < 1 and h > 0

To see this, first note that

(2.11) ex − e−x > 2x for all x > 0.

Now (2.10) is equivalent to

h2bh(ln b)2 < (1 − bh)2

−hbh/2(ln b) < 1 − bh

−h ln b < b−h/2 − bh/2

(2.12)

and the last line is true for all h > 0 using (2.11) where x = (−h/2) ln b. �

Remark 2.6. The asymptotic variance of α is not the same as the asymptotic variance of α

since adjustments need to be made on the asymptotic variance of α because γ and ν were

estimated. Furthermore, the standard theory of obtaining asymptotic variance using the

Fisher information matrix is no longer applicable.

The next result addresses the issue of existence and uniqueness of the MLE for α, obtained

by solving (2.4) in the case where γ and ν are known, or (2.7) in the case where all three

parameters are estimated.

8

Theorem 2.7. The probability that a solution to the MLE equation (2.4) or (2.7) exists

tends to one as n → ∞, and if it exists, the solution is unique.

Proof. First we consider the case where γ and ν are known. In this case the MLE α = h

where h > 0 solves the equation

(2.13) G(h) =1

h+

bh ln b

1 − bh− 1

n

n∑

i=1

(ln X(i) − ln γ

)= 0

where b = γ/ν ∈ (0, 1). Write M = (1/n)∑n

i=1

(ln X(i) − ln γ

)and compute

(2.14) limh→0

G(h) =− ln b

2− M and lim

h→∞G1(h) = 0 − M.

In view of (2.10) we have that ∂G/∂h < 0 for all h > 0 so that G is monotone. Hence,

the solution to G(h) = 0 is unique if it exists, and it exists if and only if M < (− ln b)/2.

As n → ∞, we have M → E{ln(X/γ)} in probability by the law of large numbers, where

W = ln(X/γ) has a truncated exponential distribution with density

g(w) =α exp{−α w}

1 − exp{−α(− ln b)}supported on 0 < w < − ln b. Since the density of W is monotone decreasing, a simple

geometrical argument shows that the mean of W must lie in the left half of the interval

[0,− ln b], and hence EW < −(1/2) ln b. Then P{M < (− ln b)/2} → 1 as n → ∞, so that

the MLE α exists with probability approaching one as n → ∞.

In the case where all three parameters are estimated, the MLE α = h solves

(2.15) G(h) =1

h+

bh ln b

1 − bh− 1

n

n∑

i=1

(ln X(i) − ln X(n)

)= 0

where b = X(n)/X(1) ∈ (0, 1) with probability one. Write M ′ = (1/n)∑n

i=1

(ln X(i) − ln X(n)

)

and compute

(2.16) limh→0

G(h) =− ln b

2− M ′ and lim

h→∞G1(h) = 0 − M ′

as before. Now (2.10) shows that G is monotone, and hence, the solution α = h to G(h) = 0

is unique if it exists, and it exists if and only if M ′ = M+ln(γ/X(n)) < (− ln b)/2. Lemma 2.4

shows that X(n) → γ in probability, and then the continuous mapping theorem implies that

ln(γ/X(n)) → 0 in probability. Since M → E{ln(X/γ)} in probability another application

9

of the continuous mapping theorem shows that M ′ → E{ln(X/γ)} in probability as well,

and the same argument as before shows that the MLE α exists with probability approaching

one as n → ∞. �

3. MAXIMUM LIKELIHOOD ESTIMATION FOR THE TAIL

In many practical applications, a truncated Pareto may be fit to the upper tail of the data,

if for sufficiently large x > 0,

P (X > x) ≈ γα(x−α − ν−α)

1 − (γ/ν)α, 0 < γ ≤ x ≤ ν.

In this case, we estimate the parameters by obtaining the conditional maximum likelihood

estimate based on the r +1 (0 ≤ r < n) largest order statistics representing only the portion

of the tail where the truncated Pareto approximation holds. This extends the well-known

Hill’s estimator [21, 22] to the case of a truncated Pareto distribution.

Theorem 3.1. When X(r) > X(r+1) the conditional maximum likelihood estimator for the

parameters of the upper-truncated Pareto in (2.2) based on the r + 1 largest order statistics

is given by

ν = X(1)

γ = r1/α(X(r+1))[n − (n − r)

(X(r+1)/X(1)

)α]−1/α


(3.1) 0 =r

α+

r(X(r+1)/X(1)

)αln(X(r+1)/X(1)

)

1 −(X(r+1)/X(1)

)α −r∑

i=1

[ln X(i) − ln X(r+1)]

Proof. The proof is similar to that of the Hill’s estimator ([22]). It is convenient to transform

the data, taking Z(i) = (X(n−i+1))−1 so that

G(z) = P (Z ≤ z) =γα(zα − ν−α)

1 − (γ/ν)α

and Z(1) ≥ · · · ≥ Z(n). Since U(i) = G(Z(i)) are (decreasing) order statistics from a uni-

form distribution, E(i) = − ln U(i) are (increasing) order statistics from a unit exponential.

Following [15] pp. 20–21, we let Yi = (n − i + 1)(E(i) − E(i−1)), and hence, one can easily

check {Yi, i = 1, . . . , n} are independent and identically distributed unit exponential. Define

10

Y ∗ = nE(n−r+1) = n(Y1/n + · · · + Yn−r+1/r) so that Yn, . . . , Yn−r+2, Y∗ are mutually inde-

pendent with joint density exp(−y(n) −· · ·− y(n−r+2))p(y∗), where p(y∗) is the density of Y ∗.

Since U(n−r+1) has density K1ur−1(1 − u)n−r it follows that Y ∗ = −n ln U(n−r+1) has density

p(y) = K2 exp(−y/n)r(1 − exp(y/n))n−r where {Kj, j = 1, 2} are positive constants. Use

the fact that

Y(i) = (n− i + 1)[− lnG(Z(i)) + ln G(Z(i−1))] = (n− i + 1)[ln(Zα(i−1) − ν−α)− ln(Zα

(i) − ν−α)],

for i = n − r + 2, . . . , n, and

exp(−Y ∗/n) = G(Z(n−r+1)) =γα(Zα

(n−r+1) − ν−α)

1 − (γ/ν)α,

to obtain the likelihood conditional on the values of the r + 1 smallest order statistics,

Z(n) = z(n), . . . , Z(n−r+1) = z(n−r+1), as

K

[ r∏

i=1

αzα−1(n−i+1)

zα(n−i+1) − ν−α

]exp

[−

r−1∑

i=1

i[ln(zα(n−i) − ν−α) − ln(zα

(n−i+1) − ν−α)]

]

×[γα(zα

(n−r+1) − ν−α)

1 − (γ/ν)α

]r[1 −

γα(zα(n−r+1) − ν−α)

1 − (γ/ν)α

]n−r r∏

i=1

I{1/ν ≤ z(n−i+1) ≤ 1/γ},

where K > 0 does not depend on the data or the parameters and the first product term is

the Jacobian for the transformations defined by Yn, . . . , Yn−r+2, Y∗. Note that

r−1∑

i=1

i[ln(zα(n−i) − ν−α) − ln(zα

(n−i+1) − ν−α) = r ln(zα(n−r+1) − ν−α) −

r∑

i=1

ln(zα(n−i+1) − ν−α)

Next, condition on Z(n−r+1) ≤ d < Z(n−r), which multiplies the conditional likelihood by

a factor of [1 − γα(dα − ν−α)

1 − (γ/ν)α

]n−r[1 −

γα(zα(n−r+1) − ν−α)

1 − (γ/ν)α

]−(n−r)

.

Then the conditional likelihood function simplifies to

K αr γrα[1 − (γd)α]n−r

[1 − (γ/ν)α]n

[ r∏

i=1

zα−1(n−i+1)

] r∏

i=1

I{1/ν ≤ z(n−i+1) ≤ 1/γ}.

Substitute β = (γd)α to obtain

K αr βrd−rα(1 − β)n−r

[1 − β(d ν)−α]n

[ r∏

i=1

zα−1(n−i+1)

] r∏

i=1

I{1/ν ≤ z(n−i+1) ≤ 1/γ}.

11

In terms of the original data, this conditional likelihood becomes

K αr βrDrα(1 − β)n−r

[1 − β(D/ν)α]n

[ r∏

i=1

x−α+1(i)

][ r∏

i=1

x−2(i)

] r∏

i=1

I{γ ≤ x(i) ≤ ν}.

where d = D−1 and the product term

[∏ri=1 x−2

(i)

]is the Jacobian associated with the trans-

formations X(i) = (Z(n−i+1))−1, for i = 1, . . . , r. Note that

∏ri=1 I{γ ≤ x(i) ≤ ν} if and only

if x(1) ≤ ν and x(r) ≥ γ. Consequently, whenever x(1) ≤ ν and x(r) ≥ γ, the conditional

log-likelihood is

ln L = K0+r ln α+r ln β+rα ln D+(n−r) ln(1−β)−n ln(1−β(D/ν)α)−(α+1)r∑

i=1

ln(x(i)),

and we seek the global maximum over the parameter space consisting of all (α, β) for which

α > 0 and 0 < β < 1. Since this conditional log-likelihood is a decreasing function of ν for

ν ≥ X(1), the conditional MLE for ν is ν = X(1). Substituting the ν for ν and taking partial

derivatives of the conditional log-likelihood with respect to α and β, we get

∂ ln L

∂α=

r

α+

nβ(D/X(1))α ln(D/X(1))

1 − β(D/X(1))α−

r∑

i=1

[ln X(i) − ln D]

∂ ln L

∂β=

r

β− n − r

1 − β+

n(D/X(1))α

1 − β(D/X(1))α

(3.2)

We obtain the MLE for α and β by setting these partial derivatives to zero and solving. The

solution to (3.2) depends on the value of D, which will be unknown in practice. Since we

assume X(r) ≥ D > X(r+1) we could estimate D by X(r) or X(r+1) or some point in between.

We use the estimate D = X(r+1) to be consistent with standard usage for Hill’s estimator

[3, 19, 21, 24, 28, 40, 41], MLE for α in a Pareto. Setting the equations in (3.2) to zero and

solving, we find that

β = r[n − (n − r)

(X(r+1)/X(1)

)α]−1


0 =r

α+

r(X(r+1)/X(1)

)αln(X(r+1)/X(1)

)

1 −(X(r+1)/X(1)

)α −r∑

i=1

[ln X(i) − ln X(r+1)].

We complete the proof of the theorem by using γ = Dβ1/α. �

12

Remark 3.2. The conditional maximum likelihood estimator ˆαTP of α given in Theorem 3.1,

is smaller than the Hill’s estimator αH of α. This is easy to see after noticing that the second

term on the right-hand side of equation (3.1) is always negative.

Remark 3.3. In some applications, it may be natural to employ a truncated and possibly

shifted exponential distribution. All of the results in this section and the previous section

also apply to that case, since Y = lnX has a truncated shifted exponential distribution with

P (Y > y) =e−α(y−ln γ) − (γ/ν)α

1 − (γ/ν)αfor ln γ ≤ y ≤ ln ν

if and only if X has a truncated Pareto distribution. If γ = 1 then Y has a truncated

exponential distribution with no shift.

4. Test for Truncation

In this section, we present a simple test to distinguish whether a given data set is more

appropriately modeled by a Pareto or truncated Pareto distribution. The truncated Pareto

distribution is given by (2.2) with 0 < γ < ν < ∞, which reduces to the Pareto distribution

(2.1) in the limit ν = ∞. An asymptotic q-level test (0 < q < 1) for H0 : ν = ∞ (Pareto)

versus Ha : ν < ∞ (truncated Pareto) can be constructed using the following simple result

from extreme value theory.

Lemma 4.1. Suppose that X1, X2, . . . , Xn are independent and identically distributed accord-

ing to a Pareto distribution, and let X(1) ≥ X(2) ≥ . . . ≥ X(n) denote their order statistics.

Then n−1/α X(1)d→ Y as n → ∞ where P (Y ≤ t) = exp{−Ct−α}, and C = γα in (2.1).

Proof. Write

P{n−1/αX(1) ≤ x} = P{X(1) ≤ n1/αx} =

(1 − Cx−α

n

)n

→ exp(−Cx−α

)

which proves the lemma. �

13

The random variable Y has a max-stable or Type II extreme value distribution. The

lemma implies that for large n, the maximum X(1) ≈ n1/α Y and then

P (X(1) ≤ x) ≈ P (n1/αY ≤ x) = P (Y ≤ n−1/α x)

= exp{−C(n−1/α x)−α}

= exp{−n C x−α}.

(4.1)

Then an asymptotic level q test (0 < q < 1) rejects the null hypothesis H0 : ν = ∞

(Pareto) in favor of the alternative Ha : ν < ∞ (truncated Pareto) if X(1) < xq where

q = exp{−n C x−αq }. This test rejects H0 if and only if

(4.2) X(1) <

(nC

− ln q

)1/α

and the p-value of this test is given by

(4.3) p = exp{−n C X−α(1) }.

In practice we use Hill’s estimator [22]

αH =

[r−1

r∑

i=1

{ln X(i) − ln X(r+1)}

]−1

and C = (r/n)(X(r+1))αH

to estimate the parameters C and α. Note that a small p-value in (4.3) indicates that the

Pareto model is not a good fit, and this is not enough in itself to indicate goodness of fit

of the truncated Pareto distribution. Hence the test should always be supplemented by a

graphical check of the data tail, as illustrated in the next section.

We performed Monte Carlo simulations to investigate the small sample property of the

proposed test for truncation. We generated a random sample from a Pareto distribution with

α = 0.5 and C = 1 for varying n ∈ (100, 500, 1000). The values of r were also varied based

on the percentage of the total sample size n. In particular, we considered the largest 10%,

30% and 100% order statistics. We performed the proposed test at 5% level of significance

for each combination of n and r. We repeated this process 1000 times for each case. The

simulated significance level was the proportion of times out of 1000 that the null hypothesis

was rejected. The results of these simulations are summarized in Table 1. The proposed

test is conservative, i.e., the simulated significance level is smaller than the set significance

14

Table 1. Simulated significance level (in %) of the proposed test for trun-

cation using 5% level of significance for varying sample sizes and percent of

largest order statistics used based on 1000 replicates. Pareto random sample

of size n were generated with α = 0.5 and C = 1.

Percent

Sample Size (n) 10% 30% 70% 100%

100 ** 0.3 1.8 2.0

500 1.3 3.8 4.1 4.9

1000 3.3 4.7 5.2 4.8

level. This means that for small sample sizes it will be more difficult to reject the Pareto

assumption in favor of the truncated Pareto. However, the simulated significance level gets

closer to 5% as n and r increase as expected from the asymptotic properties of the maximum

under the Pareto distribution. Simulation results are exactly the same for any values of α

and C, when using the same values of n and r and the same random number seed. To

understand why this is the case, note that we simulate Xi = (C/Ui)1/α where Ui are iid

uniform (0, 1) random variables, and hence X(i) = (C/U(i))1/α. The test rejects the null

hypothesis if

X(1) <

(nCH

− ln q

)1/αH

which is equivalent to

(X(1)

X(r+1)

)αH

=

(U(r+1)

U(1)

)αH/α

<r

− ln q

where

αH

α=

[r−1

r∑

i=1

{ln U(r+1) − ln U(i)}

]−1

so that everything is independent of α and C. On a related note, equation (4.2) is equivalent

to rejecting H0 if Xα(1)/C < n/(− ln q) and this expression is also independent of C and α,

because if X is Pareto with P (X > x) = Cx−α then Z = Xα/C is standard Pareto with

P (Z > z) = z−1.

15

Figure 1. Log-log plot of the largest absolute values of daily price changes

in Amazon, Inc. stock, with best fitting Pareto (dashed line) and truncated

Pareto (solid line) tail distribution.

oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo ooooooooo ooooooo oooooooooooooooooo o o o ooo o ooo

ooo

o

o

o

o

1.5 2.0 2.5 3.0

-8-7

-6-5

-4-3

-2

1.5 2.0 2.5 3.0

-8-7

-6-5

-4-3

-2

ln(x)

ln(P

(X>

x))

1.5 2.0 2.5 3.0

-8-7

-6-5

-4-3

-2

5. Applications

Figure 1 shows the upper tail for a data set of absolute daily price changes in U.S. dollars

for Amazon, Inc. stock from Jan 1, 1998 to June 30, 2003 (n = 1378), plotted on a log-log

scale. The apparent downward curve is characteristic of a truncated Pareto distribution.

Hill’s estimator gives αH = 2.343 and C = 2.203, while the truncated Pareto estimator from

Theorem 3.1 gives αTP = 1.681, γ = .963, ν = 15.53 (estimated C = γα .= 0.936). Both

estimates are based on the upper r = 100 order statistics. The truncation test using formula

(4.3) rejects the null hypothesis of an untruncated Pareto (p = 0.007) which supports the

visual evidence of Figure 1 that the truncated Pareto model gives a better fit. One possible

explanation for the truncated Pareto in this application is that the stock exchange employs

16

Figure 2. Log-log plot of the empirical survival function for absolute differ-

ences of hydraulic conductivity at the MADE site, with best fitting Pareto

(dashed line) and truncated Pareto (solid line) tail distribution.

oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo ooooo ooooooo oooooooooooooooo

oo

o

o

o

o

-2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5

-8-7

-6-5

-4-3

-2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5

-8-7

-6-5

-4-3

ln(x)

ln(P

(X>

x))

-2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5

-8-7

-6-5

-4-3

automatic mechanisms to slow trading in the event of extreme price changes due to automatic

trading by large mutual funds.

Figure 2 shows the upper tail of the survival function for absolute differences from a

series of 2618 measurements of hydraulic conductivity K cm/sec taken at 15 cm intervals in

vertical boreholes at the MAcroDispersion Experiment (MADE) site on the Columbus Air

Force Base in northeastern Mississippi (see Rehfeldt et al. [39]), along with the best fitting

Pareto and truncated Pareto models. The estimated parameters of the truncated Pareto

model based on the 100 upper order statistics are γ = .008, ν = .783, and αTP = 1.196.

The estimated parameters of the Pareto model are C = .0011 and αH = 1.598, so that

the estimated γ = C1/α .= .014. The p-value from (4.3) is .012, strong evidence in favor

of the truncated model. The α estimate from the truncated Pareto model is also in good

17

Table 2. Estimates of truncated Pareto parameters for hydraulic conductiv-

ity at the MADE site (n = 2618) for varying r values. The p-value for the

truncation test is also given.

r + 1 largest Estimates test for truncation

order statistics γ ν α Hill’s α Hill’s C p-value

50 0.0073 0.7828 1.1663 1.9 0.0008 0.04068

100 0.0079 0.7828 1.1958 1.5985 0.0011 0.012

200 0.0057 0.7828 1.0562 1.316 0.002 0.0009

300 0.0042 0.7828 0.9434 1.1496 0.0028 0.0001

agreement with the analysis in Benson et al. [9]. A possible explanation for truncation is

that the depositional process that formed the MADE aquifer created high velocity channels

that account for the largest K values. Over time these high-flow channels are occluded with

sedimentation, truncating the high K values. Another truncation effect comes from the K

measurement process, which averages values over a cylindrical region. The highest K values

occur in a relatively small sector of this cylinder, and are thus attenuated by combination

with the more prevalent lower K values. Table 2 shows how the parameter estimates and

the p-value for the test of truncation (4.2) vary with the number r of upper order statistics

used. As a general guide, the value of r should be chosen on the basis of a log-log plot similar

to Figure 2 so that r is as large as possible so long as the model fit is adequate.

Figure 3 shows the 100 largest observations of total daily precipitation in units of 0.1

mm at Tombstone, Arizona between July 1, 1893 to December 31, 2001 (from Groisman

et al. [20]), along with the best fitting Pareto and truncated Pareto models. Based on the

n = 5216 nonzero observations, the estimated parameters of the truncated Pareto model are

γ = 88.025, ν = 762, and αTP = 2.933. The estimated parameters of the Pareto model are

C = 7.59478 × 107 and αH = 3.813, so that the estimated γ = C1/α .= 117. The p-value

from (4.3) is 0.017, strong evidence in favor of the truncated model. The fact that the tail of

the data curves downward in Figure 3 is also diagnostic for a truncated Pareto model. Note

18

Figure 3. Log-log plot of the empirical survival function for the 100 largest

observations of positive daily total precipitation in Tombstone, AZ between

July 1, 1893 to December 31, 2001 with best fitting Pareto (dotted line),

Pareto with truncated Pareto parameters (dashed line), and truncated Pareto

(solid line) tail distribution.

oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo ooooooooooo o oooo ooooo ooo

ooo

o

o

o

5.5 6.0 6.5 7.0

-9-8

-7-6

-5-4

-3

5.5 6.0 6.5 7.0

-9-8

-7-6

-5-4

-3

5.5 6.0 6.5 7.0

-9-8

-7-6

-5-4

-3

ln(x)

ln(P

(X>

x))

5.5 6.0 6.5 7.0

-9-8

-7-6

-5-4

-3

that since our parameter estimates are based on the nonzero observations, we are actually

modeling the conditional distribution of precipitation given that some precipitation occurs.

Another alternative model, in which the parameter estimates α = 2.933 and C = 88.0252.933

from the truncated Pareto are used in the Pareto, is appropriate if the investigator believes

that precipitation follows a power law that has been truncated by some natural or observa-

tional mechanism. For illustration purposes, we present estimates of the 0.999 quantile for

three models for daily precipitation. For the Pareto model the quantile is estimated to be

714 (71.4 mm or 2.81 in of precipitation), for the truncated Pareto model it is 655, and for

the Pareto model with truncated Pareto parameters we obtain 928. Since precipitation was

19

only recorded on 13% of the days in this period, the 0.999 percentile actually represents the

level of daily precipitation one could expect on any given day with probability (0.13 · 0.001)

or about once every 20 years. This example is typical in that, for extreme upper values, the

truncated Pareto gives the lowest estimate and the Pareto with truncated Pareto parameters

gives the highest.

Acknowledgment. The authors would like to thank Dr. Alexander Gershunov, Scripps

Institution of Oceanography, for his help with obtaining and understanding the precipitation

data in Figure 3, and Dr. Fred Molz of Clemson University for his help understanding the

depositional process and measurement procedures at the MADE site.

References

[1] Adler, R., R. Feldman, and M. Taqqu (1998) A Practical Guide to Heavy Tails. Birkhauser, Boston.

[2] Anderson, P. and M. Meerschaert (1998) Modeling river flows with heavy tails. Water Resour. Res. 34,

2271–2280.

[3] Anderson, P. and M. Meerschaert (1997) Periodic moving averages of random variables with regularly

varying tails. Ann. Statist. 25, 771–785.

[4] Arnold, B. C. (1983) Pareto Distributions. International Co-operative Publishing House, MD, USA.

[5] E. Barkai, R. Metzler and J. Klafter (2000) From continuous time random walks to the fractional

Fokker-Planck equation. Phys. Rev. E 61 132-138.

[6] Beg, M. A. (1981). Estimation of the tail probability of the truncated Pareto distribution. J. Inform.

Optim. Sci. 2, No. 2, 192–198.

[7] Beg, M. A. (1983). Unbiased estimators and tests for truncation and scale parameters. Amer. J. Math.

Management Sci. 3, No. 4, 251–274.

[8] D. Benson, S. Wheatcraft and M. Meerschaert, Application of a fractional advection-dispersion equation.

Water Resources Research 36 (2000), 1403–1412.

[9] D. Benson, R. Schumer, M. Meerschaert and S. Wheatcraft (2001) Fractional dispersion, Levy motions,

and the MADE tracer tests. Transport in Porous Media 42, 211–240.

[10] P. Bickel and K. Doksum (2001) Mathematical Statistics Basic Ideas and Selected Topics Vol. 1, Second

Edition. Prentice Hall, Upper Saddle River.

[11] Burroughs, S.M, and S.F. Tebbens (2001) Upper-truncated power law distributions. Fractals, 9, 209–

222.

[12] Burroughs, S.M, and S.F. Tebbens (2001) Upper-truncated power laws in natural systems. Journal of

Pure and Applied Geophysics, 158, 741–757.

20

[13] Burroughs, S.M, and S.F. Tebbens (2002) The upper-truncated power law applied to earthquake cumu-

lative frequency-magnitude distributions. Bulletin of the Seismological Society of America, 92, No. 8,

2983–2993.

[14] Cohen, A. C., and Whitten, B. J. (1988). Parameter Estimation in reliability and Life Span Models,

Marcel Dekker, New York.

[15] David, H (1981) Order Statistics, 2nd Ed., Wiley, New York.

[16] Embrechts, P., C. Kluppelberg, and T. Mikosch (1997) Modelling extremal events. For insurance and

finance. Applications of Mathematics 33, Springer-Verlag, Berlin.

[17] Fama, E. (1965) The behavior of stock market prices, J. Business, 38, 34–105.

[18] Feller, W. (1971) An Introduction to Probability Theory and Its Applications. Vol. II, 2nd Ed., Wiley,

New York.

[19] Fofack, H. and J. P. Nolan (1999) Tail Behavior, Modes and Other Characteristics of Stable Distribution.

Extremes, 2, no. 1, 39–58.

[20] Groisman, P. Ya., Knight, R. W., Karl, T. R., Easterling, D.R., Sun, B., Lawrimore, J.H. (2004)

Contemporary Changes of the Hydrological Cycle over the Contiguous United States: Trends Derived

from In Situ Observations. Journal of Hydrometeorology, 5, 1, 64–85.

[21] Hall, P (1982) On some simple estimates of an exponent of regular variation. J. Royal Statist. Soc. B

44, 37–42.

[22] Hill, B (1975) A simple general approach to inference about the tail of a distribution. Ann. Statist. 3,

no. 5, 1163–1173.

[23] Hosking, J. and J. Wallis (1987) Parameter and quantile estimation for the generalized Pareto distrib-

ution. Technometrics 29, 339–349.

[24] Jansen, D. and C. de Vries (1991) On the frequency of large stock market returns: Putting booms and

busts into perspective. Review of Econ. and Statist. 73, 18–24.

[25] Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994) Continuous univariate distributions, 2nd edition,

Wiley, New York.

[26] J. Klafter, A. Blumen and M.F. Shlesinger (1987) Stochastic pathways to anomalous diffusion. Phys.

Rev. A 35, 3081–3085.

[27] M. Kotulski (1995) Asymptotic distributions of the continuous time random walks: A probabilistic

approach. J. Statist. Phys. 81 777–792.

[28] Loretan, M. and P. Phillips (1994) Testing the covariance stationarity of heavy tailed time series. J.

Empirical Finance, 1, 211–248.

[29] Lu, S.L. and F.J. Molz (2001) How well are hydraulic conductivity variations approximated by additive

stable processes? Advances in Environmental Research 5, no. 1, 39–45.

21

[30] Malamud, B. D., Morein, G. and Turcotte, D. L. (1998) Forest fires: an example of self-organized critical

behavior, Science, 281, 1840–1842.

[31] Mandelbrot, B. (1963) The variation of certain speculative prices, J. Business, 36, 394–419.

[32] McCulloch, J. (1996) Financial applications of stable distributions. Statistical Methods in Finance:

Handbook of Statistics 14, G. Madfala and C. R. Rao, Eds., Elsevier, Amsterdam, 393–425.

[33] M.M. Meerschaert, D.A. Benson, H.P. Scheffler and P. Becker-Kern (2002) Governing equations and

solutions of anomalous random walk limits, Phys. Rev. E 66 No. 6, 102–105.

[34] Meerschaert, M. and H.P. Scheffler (2003) Portfolio modeling with heavy tailed random vectors, Hand-

book of Heavy-Tailed Distributions in Finance, 595–640, S. T. Rachev, Ed., Elsevier North-Holland,

New York.

[35] Metzler R. and J. Klafter (2000) The random walk’s guide to anomalous diffusion: A fractional dynamics

approach. Phys. Rep. 339, 1–77.

[36] Nikias, C. and M. Shao (1995) Signal Processing with Alpha Stable Distributions and Applications.

Wiley, New York.

[37] Pacheco, J., Scholtz, C.H. and Sykes, L.R. (1992) Changes in frequency-size relationship from small to

large earthquakes, Nature, 355, 71–73.

[38] Rachev, S. and S. Mittnik (2000) Stable Paretian Models in Finance, Wiley, Chichester.

[39] Rehfeldt, K.R., J. M. Boggs, and L. W. Gelhar (1992) Field study of dispersion in a heterogeneous

aquifer. 3: Geostatistical analysis of hydraulic conductivity, Water Resour. Res., 28 (12), 3309-3324.

[40] Resnick, S (1997) Heavy tail modeling and teletraffic data. Ann. Statist. 25, 1805–1869.

[41] Resnick, S. and C. Starica (1995) Consistency of Hill’s estimator for dependent data. J. Appl. Probab.

32, 139–167.

[42] Samorodnitsky, G. and M. Taqqu (1994) Stable non-Gaussian Random Processes. Chapman & Hall,

New York.

[43] Scholz, C.H. and Contreras, J. C. (1998) Mechanics of continental rift architecture, Geology, 26, 967–

970.

[44] R. Schumer, D. Benson, M. Meerschaert and S. Wheatcraft (2001) Eulerian derivation of the fractional

advection-dispersion equation. J. Contaminant Hydrology 38, 69–88.

[45] Uchaikin, V. and V. Zolotarev (1999) Chance and Stability: Stable Distributions and Their Applications.

VSP, Utrecht.

Inmaculada B. Aban, Department of Mathematics and Statistics, MS 084, University of

Nevada, Reno, NV 89557-0045, USA

E-mail address : [email protected]

22

Mark M. Meerschaert, Department of Mathematics and Statistics, MS 084, University

of Nevada, Reno, NV 89557-0045, USA


Anna K. Panorska, Department of Mathematics and Statistics, MS 084, University of

Nevada, Reno, NV 89557-0045, USA


parameter estimation for the truncated pareto distribution

Documents

truncated pareto distribution

tail estimators

tail behavior

uppertruncated pareto

robust tail estimation

theorem mle

x11 for0 x

estimatorof thetail