Parameter Estimation for the Truncated Pareto Distribution
Inmaculada B. Aban¹, Mark M. Meerschaert¹, and Anna K. Panorska²
Department of Mathematics and Statistics, University of Nevada
May 15, 2004
Abstract. The Pareto distribution is a simple model for nonnegative data with a power
law probability tail. In many practical applications, there is a natural upper bound that
truncates the probability tail. This paper develops parameter estimates for the truncated
Pareto distribution, including a simple test to detect whether a given sample is truncated. We
illustrate these methods with applications from finance, hydrology and atmospheric science.
Keywords and phrases: Pareto distribution, truncation, maximum likelihood estimator,
order statistics, tail behavior.
1. INTRODUCTION
A random variable X has a Pareto distribution if $P(X > x) = Cx^{-\alpha}$ for some α > 0 (see
Johnson et al. [25] or Arnold [4]). This distributional model is important in applications
because many data sets are observed to follow a power law probability tail, at least approximately,
for large values of x. Stable distributions with index α and max-stable type II
extreme value distributions with index α are also asymptotically Pareto in their probability
tails, and this fact has been used frequently to develop tail estimators for those distributions.
In some applications, there is a natural upper bound on the probability tail that truncates
the Pareto law. In other cases, there is empirical evidence that a truncated Pareto gives
a better fit to the data. Since both sums (if α < 2) and extremes (for any α > 0) fall in
a different domain of attraction depending on whether a data set follows a Pareto or truncated
Pareto distribution, it is also useful to test for possible truncation in a data set with
evidence of power law tails. In this paper, we develop parameter estimates for the truncated
Pareto distribution that are easy to compute, we prove their consistency (and in some cases,
asymptotic normality), and we develop a simple test to distinguish between truncated and
unbounded Pareto distributional models on the basis of data. These methods should be
useful to practitioners in areas of science and engineering where power law probability tails
are prevalent.
¹ Partially supported by NSF grant DMS-0139927.
² Partially supported by NSF grants DMS-0139927 and ATM-0231781.
Heavy tail random variables are important in applications to finance [16, 17, 24, 28, 31, 32,
34, 38], physics [5, 26, 27, 33, 35], hydrology [2, 8, 9, 23, 29, 44], engineering [36, 40, 41, 45]
and many other fields [1, 18, 42]. Recently, Burroughs and Tebbens [12, 13] have surveyed
evidence of truncated power law distributions in data sets on earthquake magnitudes, forest
fire areas, fault lengths (on Earth and on Venus), and oil and gas field sizes. Earthquake
fault sizes are limited by the physical (or terrain) considerations, see Scholtz and Contreras
[43] or Pacheco et al. [37]. The size of forest fires may be naturally limited by the availability
of fuel and climate, see Malamud et al. [30]. Additional applications of truncated power
law distributions in finance, groundwater hydrology, and atmospheric science are given in
Section 5 of this paper.
Burroughs and Tebbens estimate parameters of the truncated Pareto distribution by least-
squares fitting on a probability plot [11] and by minimizing mean squared error fit on a
plot of the tail distribution function [13]. Minimum variance unbiased estimators for the
parameters of a truncated Pareto law were developed in Beg [6, 7]. Beg [6] also provides
maximum likelihood estimates of the lower truncation parameter, scale and probability of
exceedance for a truncated Pareto distribution. The maximum likelihood estimator of α
when the lower truncation limit is known was presented in [14], with some recommendations
for the case when the lower truncation limit is not known. This paper develops maximum
likelihood estimators (MLE) for all parameters of a truncated Pareto distribution. Existence
and uniqueness of the MLE are proven under certain conditions that are shown to hold with
probability approaching one as the sample size increases. A simple formula for the MLE is
obtained that can be easily computed. Asymptotic normality is established for the estimator
of the tail parameter α, and a simple test is proposed to distinguish between Pareto and
upper-truncated Pareto models. We also consider the case where only the upper tail of the
data fits a truncated Pareto distribution, and we compute the conditional MLE based on
the upper order statistics. These results can be used for robust tail estimation for truncated
models with power law tails (e.g., stable data). Examples from hydrology, climatology and
finance illustrate the utility and practical application of these results.
2. MAXIMUM LIKELIHOOD ESTIMATION FOR THE ENTIRE SAMPLE
A random variable W has a Pareto distribution function if
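$$P(W > w) = \gamma^{\alpha} w^{-\alpha} \quad \text{for } w \ge \gamma > 0 \text{ and } \alpha > 0. \tag{2.1}$$
An upper-truncated Pareto random variable, X, has distribution
$$\bar F_X(x) = P(X > x) = \frac{\gamma^{\alpha}\,(x^{-\alpha} - \nu^{-\alpha})}{1 - (\gamma/\nu)^{\alpha}} \tag{2.2}$$
and density
$$f_X(x) = \frac{\alpha\,\gamma^{\alpha}\,x^{-\alpha-1}}{1 - (\gamma/\nu)^{\alpha}} \tag{2.3}$$
for 0 < γ ≤ x ≤ ν < ∞, where γ < ν. Consider a random sample $X_1, X_2, \dots, X_n$ from this
upper-truncated distribution and let $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(n)}$ denote their order statistics.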
We now derive the maximum likelihood estimator (MLE) for the unknown parameters. We
start by considering the simplest case, where γ and ν are both known.
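As a concrete companion to definitions (2.2) and (2.3), the following minimal sketch evaluates the tail function and draws samples by inverting it. The function names `tp_survival` and `tp_sample` and the parameter values are illustrative assumptions, not part of the paper.

```python
import random

def tp_survival(x, alpha, gamma, nu):
    """P(X > x) for the upper-truncated Pareto in (2.2)."""
    if x <= gamma:
        return 1.0
    if x >= nu:
        return 0.0
    return gamma**alpha * (x**-alpha - nu**-alpha) / (1.0 - (gamma / nu)**alpha)

def tp_sample(n, alpha, gamma, nu, rng):
    """Draw n values by solving P(X > x) = U for x, with U uniform on [0, 1)."""
    b = (gamma / nu)**alpha
    # (gamma / x)^alpha = U * (1 - b) + b, so x = gamma * (U * (1 - b) + b)^(-1/alpha).
    return [gamma * (rng.random() * (1.0 - b) + b) ** (-1.0 / alpha) for _ in range(n)]

# Illustrative parameter values (assumed for this sketch).
rng = random.Random(0)
xs = tp_sample(10_000, alpha=1.5, gamma=1.0, nu=20.0, rng=rng)
frac_above_5 = sum(x > 5.0 for x in xs) / len(xs)
print(min(xs), max(xs), frac_above_5, tp_survival(5.0, 1.5, 1.0, 20.0))
```

The empirical fraction above 5 should be close to the value of (2.2) at x = 5, and all draws fall in [γ, ν].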
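Theorem 2.1. The MLE $\hat\alpha$ of α for an upper-truncated Pareto defined in (2.2), where γ and
ν are known, solves the equation
$$\frac{n}{\hat\alpha} + \frac{n\,(\gamma/\nu)^{\hat\alpha}\,\ln(\gamma/\nu)}{1 - (\gamma/\nu)^{\hat\alpha}} - \sum_{i=1}^{n}\left[\ln X_{(i)} - \ln\gamma\right] = 0. \tag{2.4}$$
Furthermore, $\hat\alpha$ has an asymptotic normal distribution with asymptotic mean α and asymptotic
variance
$$\left\{\frac{1}{\alpha^{2}} - \frac{(\gamma/\nu)^{\alpha}\,[\ln(\gamma/\nu)]^{2}}{[1 - (\gamma/\nu)^{\alpha}]^{2}}\right\}^{-1}.$$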
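Proof. The likelihood function under this setting is given by
$$L(\gamma, \nu, \alpha) = \frac{\alpha^{n}\,\gamma^{n\alpha}}{[1 - (\gamma/\nu)^{\alpha}]^{n}} \prod_{i=1}^{n} X_i^{-\alpha-1} \prod_{i=1}^{n} I\{0 < \gamma \le X_i \le \nu < \infty\}, \tag{2.5}$$
where γ < ν. The natural logarithm of this likelihood is
$$-n \ln[1 - (\gamma/\nu)^{\alpha}] + n \ln\alpha + n\alpha \ln\gamma - (\alpha + 1)\sum_{i=1}^{n} \ln X_{(i)} \tag{2.6}$$
for $0 < \gamma \le X_i \le \nu < \infty$, i = 1, . . . , n. Taking the derivative with respect to α, we get
$$\frac{n}{\alpha} + \frac{n\,(\gamma/\nu)^{\alpha}\,\ln(\gamma/\nu)}{1 - (\gamma/\nu)^{\alpha}} - \sum_{i=1}^{n}\left[\ln X_{(i)} - \ln\gamma\right].$$
Setting this expression equal to zero and solving, we get the MLE for α.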
To obtain the asymptotic distribution of the MLE $\hat\alpha$, note that the density in (2.3) may
be written as
$$f_X(x) = \exp\{B(\alpha)T(x) - A(\alpha)\}\,H(x)$$
where $A(\alpha) = -\ln(\alpha\gamma^{\alpha}) + \ln[1 - (\gamma/\nu)^{\alpha}]$, $B(\alpha) = \alpha$, $T(x) = -\ln x$, and
$H(x) = x^{-1} I\{0 < \gamma \le x \le \nu < \infty\}$. Hence, when γ and ν are fixed and known, this upper-truncated
Pareto model is a one-parameter exponential family of distributions. Applying the results
on the asymptotic properties of the MLE for a one-parameter exponential family (see for
instance Theorem 5.3.5 in [10]), $\hat\alpha$ has an asymptotic normal distribution with an asymptotic
mean of α and an asymptotic variance given by the inverse of the Fisher information. In
this case, the Fisher information is found to be
$$\frac{\partial^{2} A(\alpha)}{\partial\alpha^{2}} = \frac{1}{\alpha^{2}} - \frac{(\gamma/\nu)^{\alpha}\,[\ln(\gamma/\nu)]^{2}}{[1 - (\gamma/\nu)^{\alpha}]^{2}}.$$
□
Remark 2.2. The MLE for α presented in equation (2.4) first appeared in [14] for the case
where the lower truncation limit γ is known. The authors also gave recommendations on
the computation of the estimate when γ is unknown.
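Equation (2.4) has no closed-form solution, so in practice it is solved numerically. The sketch below uses plain bisection on (2.4) divided by n; it relies on the monotonicity and existence condition established later in Theorem 2.7. The function names, the `expm1` rewriting of the middle term (a numerical convenience), the simulated data, and the reading of the Theorem 2.1 variance as a per-observation variance (so that the standard error divides it by n) are all assumptions of this sketch.

```python
import math
import random

def g(h, c, m):
    """(2.4) divided by n, written with c = -ln(gamma/nu) > 0 and m = mean of ln(X_i/gamma)."""
    t = h * c
    # b^h ln(b) / (1 - b^h) equals -c / (e^t - 1); expm1 keeps small t accurate.
    tail = 0.0 if t > 700.0 else c / math.expm1(t)
    return 1.0 / h - tail - m

def mle_alpha_known_bounds(xs, gamma, nu):
    """Solve (2.4) for alpha by bisection when gamma and nu are known."""
    c = math.log(nu / gamma)
    m = sum(math.log(x / gamma) for x in xs) / len(xs)
    if not 0.0 < m < c / 2.0:
        raise ValueError("equation (2.4) has no positive root for this sample")
    lo, hi = 1e-6, 1.0
    while g(hi, c, m) > 0.0:      # g is decreasing in h, so widen until the sign flips
        hi *= 2.0
    for _ in range(200):          # fixed number of halvings
        mid = 0.5 * (lo + hi)
        if g(mid, c, m) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def asymptotic_variance(alpha, gamma, nu):
    """Inverse of the Fisher information given in Theorem 2.1."""
    ba = (gamma / nu) ** alpha
    info = 1.0 / alpha**2 - ba * math.log(gamma / nu) ** 2 / (1.0 - ba) ** 2
    return 1.0 / info

# Simulated data by inverse-CDF sampling; parameter values are arbitrary.
rng = random.Random(1)
alpha0, gamma0, nu0, n = 1.5, 1.0, 20.0, 5000
b0 = (gamma0 / nu0) ** alpha0
xs = [gamma0 * (rng.random() * (1 - b0) + b0) ** (-1 / alpha0) for _ in range(n)]

alpha_hat = mle_alpha_known_bounds(xs, gamma0, nu0)
std_err = math.sqrt(asymptotic_variance(alpha_hat, gamma0, nu0) / n)
print(alpha_hat, std_err)
```

Bisection is chosen here only because the left side of (2.4) is monotone in α; any bracketing root finder would serve equally well.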
Next we obtain the MLEs for the parameters of an upper-truncated Pareto where α, γ,
and ν are all unknown.
Theorem 2.3. Consider a random sample $X_1, X_2, \dots, X_n$ from an upper-truncated Pareto
defined in (2.2) where α, γ and ν are unknown. The maximum likelihood estimators for the
parameters in this model are given by
$$\hat\gamma = X_{(n)} = \min(X_1, X_2, \dots, X_n), \qquad \hat\nu = X_{(1)} = \max(X_1, X_2, \dots, X_n),$$
and $\hat\alpha$ solves the equation
$$\frac{n}{\hat\alpha} + \frac{n\,[X_{(n)}/X_{(1)}]^{\hat\alpha}\,\ln[X_{(n)}/X_{(1)}]}{1 - [X_{(n)}/X_{(1)}]^{\hat\alpha}} - \sum_{i=1}^{n}\left[\ln X_{(i)} - \ln X_{(n)}\right] = 0. \tag{2.7}$$
Proof. The likelihood function for this model is given by
$$L(\gamma, \nu, \alpha) = \frac{\alpha^{n}\,\gamma^{n\alpha}}{[1 - (\gamma/\nu)^{\alpha}]^{n}} \prod_{i=1}^{n} X_i^{-\alpha-1} \prod_{i=1}^{n} I\{\gamma \le X_i \le \nu\}.$$
Note that $\prod_{i=1}^{n} I\{\gamma \le X_i \le \nu\} = 1$ if and only if $X_{(n)} \ge \gamma$ and $X_{(1)} \le \nu$. Since the
likelihood function L is an increasing function of γ for $\gamma \le X_{(n)}$ and a decreasing function of
ν for $\nu \ge X_{(1)}$, for a fixed α the likelihood is maximized when $\gamma = X_{(n)}$ and $\nu = X_{(1)}$.
Replacing γ and ν by $X_{(n)}$ and $X_{(1)}$, respectively, and taking the natural log of the likelihood,
we get
$$-n \ln[1 - (X_{(n)}/X_{(1)})^{\alpha}] + n \ln\alpha + n\alpha \ln X_{(n)} - (\alpha + 1)\sum_{i=1}^{n} \ln X_{(i)}.$$
Taking the partial derivative with respect to α and setting it equal to zero, the MLE for α
is $\hat\alpha$, which solves equation (2.7). □
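Theorem 2.3 translates directly into a three-step computation: take the sample minimum and maximum, then solve (2.7) for α. The following is a minimal sketch under the same assumptions as before (bisection as the root finder, an `expm1` rewriting for numerical stability, simulated data); the function name `truncated_pareto_mle` is hypothetical.

```python
import math
import random

def truncated_pareto_mle(xs):
    """Return (gamma_hat, nu_hat, alpha_hat) following Theorem 2.3."""
    n = len(xs)
    gamma_hat, nu_hat = min(xs), max(xs)
    c = math.log(nu_hat / gamma_hat)                   # -ln(X_(n) / X_(1))
    m = sum(math.log(x / gamma_hat) for x in xs) / n   # mean of ln X_(i) - ln X_(n)

    def g(h):
        # (2.7) divided by n; b^h ln(b) / (1 - b^h) is rewritten as -c / (e^{hc} - 1).
        t = h * c
        return 1.0 / h - (0.0 if t > 700.0 else c / math.expm1(t)) - m

    if not 0.0 < m < c / 2.0:
        raise ValueError("equation (2.7) has no positive root for this sample")
    lo, hi = 1e-6, 1.0
    while g(hi) > 0.0:            # widen the bracket until the sign flips
        hi *= 2.0
    for _ in range(200):          # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
    return gamma_hat, nu_hat, 0.5 * (lo + hi)

# Simulated sample with arbitrary true values alpha = 1.5, gamma = 1, nu = 20.
rng = random.Random(2)
b0 = (1.0 / 20.0) ** 1.5
xs = [(rng.random() * (1 - b0) + b0) ** (-1 / 1.5) for _ in range(5000)]
gamma_hat, nu_hat, alpha_hat = truncated_pareto_mle(xs)
print(gamma_hat, nu_hat, alpha_hat)
```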
When γ and ν are unknown, the upper-truncated Pareto distribution has support that
depends on these unknown parameters, and hence, it is not a member of a multiparameter
exponential family of distributions. Furthermore, it does not satisfy the regularity conditions
necessary to directly apply the asymptotic properties of MLEs to derive the asymptotic
distribution of $\hat\alpha$. We will instead use a Taylor series expansion to obtain the asymptotic
properties of $\hat\alpha$. First we need to show that $\hat\gamma$ and $\hat\nu$ are consistent estimators for γ and ν,
respectively.
Lemma 2.4. Under the conditions of Theorem 2.3, $\hat\gamma$ and $\hat\nu$ are consistent estimators of γ
and ν.
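Proof. The cdf of the minimum, $X_{(n)}$, of a truncated Pareto random sample is
$$F_{X_{(n)}}(x) = P\{X_{(n)} \le x\} = 1 - \left[\frac{(\gamma/x)^{\alpha} - (\gamma/\nu)^{\alpha}}{1 - (\gamma/\nu)^{\alpha}}\right]^{n}.$$
For any small ε > 0, $P\{|X_{(n)} - \gamma| > \varepsilon\} = P\{X_{(n)} > \gamma + \varepsilon\}$, and consequently
$$P\{|X_{(n)} - \gamma| > \varepsilon\} = \left[\frac{\left(\frac{\gamma}{\gamma+\varepsilon}\right)^{\alpha} - (\gamma/\nu)^{\alpha}}{1 - (\gamma/\nu)^{\alpha}}\right]^{n} \to 0$$
as n → ∞. Thus, $\hat\gamma = X_{(n)}$ converges in probability to γ, and hence $\hat\gamma$ is a consistent
estimator of γ.
Similarly, we can show that $\hat\nu$ is a consistent estimator of ν. The cdf of the maximum,
$X_{(1)}$, of a truncated Pareto random sample is
$$F_{X_{(1)}}(x) = P\{X_{(1)} \le x\} = \left[\frac{1 - (\gamma/x)^{\alpha}}{1 - (\gamma/\nu)^{\alpha}}\right]^{n}$$
and hence for any small ε > 0,
$$P\{|X_{(1)} - \nu| > \varepsilon\} = P\{X_{(1)} \le \nu - \varepsilon\} = \left[\frac{1 - \left(\frac{\gamma}{\nu-\varepsilon}\right)^{\alpha}}{1 - (\gamma/\nu)^{\alpha}}\right]^{n} \to 0$$
as n → ∞. Hence, $\hat\nu = X_{(1)}$ converges in probability to ν. □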
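The two closed-form probabilities in this proof can be evaluated directly to see how quickly the sample minimum and maximum approach γ and ν. The sketch below does so for illustrative parameter values; the function names and the chosen values of ε are assumptions.

```python
def prob_min_off(n, eps, alpha, gamma, nu):
    """P{|X_(n) - gamma| > eps} from the proof of Lemma 2.4 (needs gamma + eps < nu)."""
    b = (gamma / nu) ** alpha
    return (((gamma / (gamma + eps)) ** alpha - b) / (1.0 - b)) ** n

def prob_max_off(n, eps, alpha, gamma, nu):
    """P{|X_(1) - nu| > eps} from the proof of Lemma 2.4 (needs nu - eps > gamma)."""
    b = (gamma / nu) ** alpha
    return ((1.0 - (gamma / (nu - eps)) ** alpha) / (1.0 - b)) ** n

# Illustrative values: alpha = 1.5, gamma = 1, nu = 20.
for n in (10, 100, 1000, 10000):
    print(n, prob_min_off(n, 0.05, 1.5, 1.0, 20.0), prob_max_off(n, 0.5, 1.5, 1.0, 20.0))
```

For these values the probability for the maximum decays far more slowly than the one for the minimum, because little probability mass lies near ν.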
Now we derive the asymptotic properties of $\hat\alpha$.
Theorem 2.5. Under the conditions of Theorem 2.3, the MLE $\hat\alpha$ of α has an asymptotic
normal distribution with asymptotic mean α.
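Proof. For a given γ and ν, suppose that h ≡ h(γ, ν) solves the equation
$$G = \frac{1}{h} + \frac{(\gamma/\nu)^{h}\,\ln(\gamma/\nu)}{1 - (\gamma/\nu)^{h}} - \frac{1}{n}\sum_{i=1}^{n}\left(\ln X_{(i)} - \ln\gamma\right) = 0 \tag{2.8}$$
for a fixed γ and ν. Then from Theorem 2.1, the MLE with known bounds, written $\tilde\alpha$ here to
distinguish it from the estimator of Theorem 2.3, satisfies $\tilde\alpha = h(\gamma_0, \nu_0)$, where $\gamma_0$ and $\nu_0$ are
the true values of γ and ν, respectively, and by Theorem 2.3, $\hat\alpha = h(\hat\gamma, \hat\nu)$ where $\hat\gamma = X_{(n)}$
and $\hat\nu = X_{(1)}$. By a Taylor series expansion about $\gamma_0$ and $\nu_0$,
$$\hat\alpha = h(\hat\gamma, \hat\nu) \approx h(\gamma_0, \nu_0) + h_\gamma(\gamma_0, \nu_0)(\hat\gamma - \gamma_0) + h_\nu(\gamma_0, \nu_0)(\hat\nu - \nu_0) \tag{2.9}$$
where $h_\gamma(\gamma_0, \nu_0)$ and $h_\nu(\gamma_0, \nu_0)$ are the partial derivatives of h with respect to γ and ν,
respectively, evaluated at $\gamma_0$ and $\nu_0$. By Lemma 2.4, the last two terms on the right side
go to zero in probability as n → ∞ provided that $h_\gamma(\gamma_0, \nu_0)$ and $h_\nu(\gamma_0, \nu_0)$ are bounded.
Applying Slutsky’s Theorem (see for instance [10]), it then follows that $\hat\alpha$ has asymptotic
mean α and is asymptotically normal.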
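We finish the proof by showing that the partial derivatives $h_\gamma(\gamma_0, \nu_0)$ and $h_\nu(\gamma_0, \nu_0)$ are
bounded. If (2.8) holds then
$$\frac{dG}{d\gamma} = \frac{\partial G}{\partial\gamma} + \frac{\partial G}{\partial h}\,\frac{\partial h}{\partial\gamma} = 0 \quad\text{implying}\quad h_\gamma(\gamma_0, \nu_0) = \frac{\partial h}{\partial\gamma} = \frac{-\partial G/\partial\gamma}{\partial G/\partial h}$$
and similarly, $h_\nu = (-\partial G/\partial\nu)/(\partial G/\partial h)$. Since neither $\partial G/\partial\gamma$ nor $\partial G/\partial\nu$ has any
singularities for 0 < γ < ν and h > 0, both $h_\gamma(\gamma_0, \nu_0)$ and $h_\nu(\gamma_0, \nu_0)$ are bounded if we can show
that $\partial G/\partial h \ne 0$. Let b = γ/ν. In fact we have
$$\frac{\partial G}{\partial h} = \frac{-(1 - b^{h})^{2} + h^{2} b^{h} (\ln b)^{2}}{h^{2}(1 - b^{h})^{2}} < 0 \quad\text{for every } 0 < b < 1 \text{ and } h > 0. \tag{2.10}$$
To see this, first note that
$$e^{x} - e^{-x} > 2x \quad\text{for all } x > 0. \tag{2.11}$$
Now (2.10) is equivalent to each of the following lines in turn:
$$\begin{aligned}
h^{2} b^{h} (\ln b)^{2} &< (1 - b^{h})^{2} \\
-h\, b^{h/2} \ln b &< 1 - b^{h} \\
-h \ln b &< b^{-h/2} - b^{h/2}
\end{aligned} \tag{2.12}$$
and the last line is true for all h > 0 using (2.11) where x = (−h/2) ln b. □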
Remark 2.6. The asymptotic variance of $\hat\alpha$ (the MLE when all three parameters are estimated)
is not the same as the asymptotic variance of $\tilde\alpha$ (the MLE of Theorem 2.1, where γ and ν are
known), since the asymptotic variance of $\tilde\alpha$ needs to be adjusted to account for the estimation
of γ and ν. Furthermore, the standard theory of obtaining asymptotic variance using the
Fisher information matrix is no longer applicable.
The next result addresses the issue of existence and uniqueness of the MLE for α, obtained
by solving (2.4) in the case where γ and ν are known, or (2.7) in the case where all three
parameters are estimated.
Theorem 2.7. The probability that a solution to the MLE equation (2.4) or (2.7) exists
tends to one as n → ∞, and if it exists, the solution is unique.
Proof. First we consider the case where γ and ν are known. In this case the MLE of α is h,
where h > 0 solves the equation
$$G(h) = \frac{1}{h} + \frac{b^{h} \ln b}{1 - b^{h}} - \frac{1}{n}\sum_{i=1}^{n}\left(\ln X_{(i)} - \ln\gamma\right) = 0 \tag{2.13}$$
where b = γ/ν ∈ (0, 1). Write $M = (1/n)\sum_{i=1}^{n}\left(\ln X_{(i)} - \ln\gamma\right)$ and compute
$$\lim_{h \to 0} G(h) = \frac{-\ln b}{2} - M \quad\text{and}\quad \lim_{h \to \infty} G(h) = 0 - M. \tag{2.14}$$
In view of (2.10) we have that ∂G/∂h < 0 for all h > 0 so that G is monotone. Hence,
the solution to G(h) = 0 is unique if it exists, and it exists if and only if M < (− ln b)/2.
As n → ∞, we have M → E{ln(X/γ)} in probability by the law of large numbers, where
W = ln(X/γ) has a truncated exponential distribution with density
$$g(w) = \frac{\alpha \exp\{-\alpha w\}}{1 - \exp\{-\alpha(-\ln b)\}}$$
supported on 0 < w < − ln b. Since the density of W is monotone decreasing, a simple
geometrical argument shows that the mean of W must lie in the left half of the interval
[0, − ln b], and hence EW < −(1/2) ln b. Then P{M < (− ln b)/2} → 1 as n → ∞, so that
the MLE of α exists with probability approaching one as n → ∞.
In the case where all three parameters are estimated, the MLE $\hat\alpha = h$ solves
$$G(h) = \frac{1}{h} + \frac{b^{h} \ln b}{1 - b^{h}} - \frac{1}{n}\sum_{i=1}^{n}\left(\ln X_{(i)} - \ln X_{(n)}\right) = 0 \tag{2.15}$$
where $b = X_{(n)}/X_{(1)} \in (0, 1)$ with probability one. Write $M' = (1/n)\sum_{i=1}^{n}\left(\ln X_{(i)} - \ln X_{(n)}\right)$
and compute
$$\lim_{h \to 0} G(h) = \frac{-\ln b}{2} - M' \quad\text{and}\quad \lim_{h \to \infty} G(h) = 0 - M' \tag{2.16}$$
as before. Now (2.10) shows that G is monotone, and hence, the solution $\hat\alpha = h$ to G(h) = 0
is unique if it exists, and it exists if and only if $M' = M + \ln(\gamma/X_{(n)}) < (-\ln b)/2$. Lemma 2.4
shows that $X_{(n)} \to \gamma$ in probability, and then the continuous mapping theorem implies that
$\ln(\gamma/X_{(n)}) \to 0$ in probability. Since M → E{ln(X/γ)} in probability, another application
of the continuous mapping theorem shows that M′ → E{ln(X/γ)} in probability as well,
and the same argument as before shows that the MLE $\hat\alpha$ exists with probability approaching
one as n → ∞. □
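The existence condition from this proof is easy to check on a given sample before attempting to solve (2.7). The sketch below computes M′ and (−ln b)/2 for two small made-up data sets; the function name `mle_root_exists` and the data are assumptions chosen only to show both outcomes.

```python
import math

def mle_root_exists(xs):
    """Check the condition M' < (-ln b)/2 from the proof of Theorem 2.7."""
    lo, hi = min(xs), max(xs)
    m_prime = sum(math.log(x / lo) for x in xs) / len(xs)
    half_width = 0.5 * math.log(hi / lo)    # (-ln b) / 2 with b = X_(n) / X_(1)
    return m_prime < half_width, m_prime, half_width

# Observations piled near the minimum: the mean log-excess lies in the left half.
print(mle_root_exists([1.0, 1.1, 1.2, 1.5, 2.0, 8.0]))
# Observations piled near the maximum: the condition fails, so (2.7) has no positive root.
print(mle_root_exists([1.0, 6.0, 7.0, 7.5, 7.9, 8.0]))
```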
3. MAXIMUM LIKELIHOOD ESTIMATION FOR THE TAIL
In many practical applications, a truncated Pareto may be fit to the upper tail of the data
if, for sufficiently large x > 0,
$$P(X > x) \approx \frac{\gamma^{\alpha}\,(x^{-\alpha} - \nu^{-\alpha})}{1 - (\gamma/\nu)^{\alpha}}, \qquad 0 < \gamma \le x \le \nu.$$
In this case, we estimate the parameters by obtaining the conditional maximum likelihood
estimate based on the r + 1 (0 ≤ r < n) largest order statistics, representing only the portion
of the tail where the truncated Pareto approximation holds. This extends the well-known
Hill’s estimator [21, 22] to the case of a truncated Pareto distribution.
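Theorem 3.1. When $X_{(r)} > X_{(r+1)}$, the conditional maximum likelihood estimator for the
parameters of the upper-truncated Pareto in (2.2) based on the r + 1 largest order statistics
is given by
$$\hat\nu = X_{(1)}, \qquad \hat\gamma = r^{1/\hat\alpha}\, X_{(r+1)} \left[n - (n - r)\left(X_{(r+1)}/X_{(1)}\right)^{\hat\alpha}\right]^{-1/\hat\alpha},$$
and $\hat\alpha$ solves the equation
$$0 = \frac{r}{\hat\alpha} + \frac{r\left(X_{(r+1)}/X_{(1)}\right)^{\hat\alpha} \ln\left(X_{(r+1)}/X_{(1)}\right)}{1 - \left(X_{(r+1)}/X_{(1)}\right)^{\hat\alpha}} - \sum_{i=1}^{r}\left[\ln X_{(i)} - \ln X_{(r+1)}\right]. \tag{3.1}$$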
Proof. The proof is similar to that of Hill’s estimator ([22]). It is convenient to transform
the data, taking $Z_{(i)} = (X_{(n-i+1)})^{-1}$ so that
$$G(z) = P(Z \le z) = \frac{\gamma^{\alpha}\,(z^{\alpha} - \nu^{-\alpha})}{1 - (\gamma/\nu)^{\alpha}}$$
and $Z_{(1)} \ge \cdots \ge Z_{(n)}$. Since $U_{(i)} = G(Z_{(i)})$ are (decreasing) order statistics from a uniform
distribution, $E_{(i)} = -\ln U_{(i)}$ are (increasing) order statistics from a unit exponential.
Following [15] pp. 20–21, we let $Y_i = (n - i + 1)(E_{(i)} - E_{(i-1)})$, and hence, one can easily
check that $\{Y_i,\ i = 1, \dots, n\}$ are independent and identically distributed unit exponential. Define
$Y^{*} = nE_{(n-r+1)} = n(Y_1/n + \cdots + Y_{n-r+1}/r)$ so that $Y_n, \dots, Y_{n-r+2}, Y^{*}$ are mutually independent
with joint density $\exp(-y_n - \cdots - y_{n-r+2})\,p(y^{*})$, where $p(y^{*})$ is the density of $Y^{*}$.
Since $U_{(n-r+1)}$ has density $K_1 u^{r-1}(1 - u)^{n-r}$ it follows that $Y^{*} = -n \ln U_{(n-r+1)}$ has density
$p(y) = K_2 [\exp(-y/n)]^{r}\,(1 - \exp(-y/n))^{n-r}$ where $\{K_j,\ j = 1, 2\}$ are positive constants. Use
the fact that
$$Y_i = (n - i + 1)\left[-\ln G(Z_{(i)}) + \ln G(Z_{(i-1)})\right] = (n - i + 1)\left[\ln(Z_{(i-1)}^{\alpha} - \nu^{-\alpha}) - \ln(Z_{(i)}^{\alpha} - \nu^{-\alpha})\right]$$
for i = n − r + 2, . . . , n, and
$$\exp(-Y^{*}/n) = G(Z_{(n-r+1)}) = \frac{\gamma^{\alpha}\,(Z_{(n-r+1)}^{\alpha} - \nu^{-\alpha})}{1 - (\gamma/\nu)^{\alpha}},$$
to obtain the likelihood conditional on the values of the r + 1 smallest order statistics,
$Z_{(n)} = z_{(n)}, \dots, Z_{(n-r+1)} = z_{(n-r+1)}$, as
$$\begin{aligned}
& K \left[\prod_{i=1}^{r} \frac{\alpha\, z_{(n-i+1)}^{\alpha-1}}{z_{(n-i+1)}^{\alpha} - \nu^{-\alpha}}\right] \exp\left[-\sum_{i=1}^{r-1} i\left[\ln(z_{(n-i)}^{\alpha} - \nu^{-\alpha}) - \ln(z_{(n-i+1)}^{\alpha} - \nu^{-\alpha})\right]\right] \\
& \quad \times \left[\frac{\gamma^{\alpha}(z_{(n-r+1)}^{\alpha} - \nu^{-\alpha})}{1 - (\gamma/\nu)^{\alpha}}\right]^{r} \left[1 - \frac{\gamma^{\alpha}(z_{(n-r+1)}^{\alpha} - \nu^{-\alpha})}{1 - (\gamma/\nu)^{\alpha}}\right]^{n-r} \prod_{i=1}^{r} I\{1/\nu \le z_{(n-i+1)} \le 1/\gamma\},
\end{aligned}$$
where K > 0 does not depend on the data or the parameters and the first product term is
the Jacobian for the transformations defined by $Y_n, \dots, Y_{n-r+2}, Y^{*}$. Note that
$$\sum_{i=1}^{r-1} i\left[\ln(z_{(n-i)}^{\alpha} - \nu^{-\alpha}) - \ln(z_{(n-i+1)}^{\alpha} - \nu^{-\alpha})\right] = r \ln(z_{(n-r+1)}^{\alpha} - \nu^{-\alpha}) - \sum_{i=1}^{r} \ln(z_{(n-i+1)}^{\alpha} - \nu^{-\alpha}).$$
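Next, condition on $Z_{(n-r+1)} \le d < Z_{(n-r)}$, which multiplies the conditional likelihood by
a factor of
$$\left[1 - \frac{\gamma^{\alpha}(d^{\alpha} - \nu^{-\alpha})}{1 - (\gamma/\nu)^{\alpha}}\right]^{n-r} \left[1 - \frac{\gamma^{\alpha}(z_{(n-r+1)}^{\alpha} - \nu^{-\alpha})}{1 - (\gamma/\nu)^{\alpha}}\right]^{-(n-r)}.$$
Then the conditional likelihood function simplifies to
$$K\, \frac{\alpha^{r}\, \gamma^{r\alpha}\, [1 - (\gamma d)^{\alpha}]^{n-r}}{[1 - (\gamma/\nu)^{\alpha}]^{n}} \left[\prod_{i=1}^{r} z_{(n-i+1)}^{\alpha-1}\right] \prod_{i=1}^{r} I\{1/\nu \le z_{(n-i+1)} \le 1/\gamma\}.$$
Substitute $\beta = (\gamma d)^{\alpha}$ to obtain
$$K\, \frac{\alpha^{r}\, \beta^{r}\, d^{-r\alpha}\, (1 - \beta)^{n-r}}{[1 - \beta (d\nu)^{-\alpha}]^{n}} \left[\prod_{i=1}^{r} z_{(n-i+1)}^{\alpha-1}\right] \prod_{i=1}^{r} I\{1/\nu \le z_{(n-i+1)} \le 1/\gamma\}.$$
In terms of the original data, this conditional likelihood becomes
$$K\, \frac{\alpha^{r}\, \beta^{r}\, D^{r\alpha}\, (1 - \beta)^{n-r}}{[1 - \beta (D/\nu)^{\alpha}]^{n}} \left[\prod_{i=1}^{r} x_{(i)}^{-\alpha+1}\right] \left[\prod_{i=1}^{r} x_{(i)}^{-2}\right] \prod_{i=1}^{r} I\{\gamma \le x_{(i)} \le \nu\},$$
where $d = D^{-1}$ and the product term $\left[\prod_{i=1}^{r} x_{(i)}^{-2}\right]$ is the Jacobian associated with the
transformations $X_{(i)} = (Z_{(n-i+1)})^{-1}$, for i = 1, . . . , r. Note that $\prod_{i=1}^{r} I\{\gamma \le x_{(i)} \le \nu\} = 1$ if and only
if $x_{(1)} \le \nu$ and $x_{(r)} \ge \gamma$. Consequently, whenever $x_{(1)} \le \nu$ and $x_{(r)} \ge \gamma$, the conditional
log-likelihood is
$$\ln L = K_0 + r \ln\alpha + r \ln\beta + r\alpha \ln D + (n - r)\ln(1 - \beta) - n \ln\left(1 - \beta (D/\nu)^{\alpha}\right) - (\alpha + 1)\sum_{i=1}^{r} \ln x_{(i)},$$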
and we seek the global maximum over the parameter space consisting of all (α, β) for which
α > 0 and 0 < β < 1. Since this conditional log-likelihood is a decreasing function of ν for
$\nu \ge X_{(1)}$, the conditional MLE for ν is $\hat\nu = X_{(1)}$. Substituting $\hat\nu$ for ν and taking partial
derivatives of the conditional log-likelihood with respect to α and β, we get
$$\begin{aligned}
\frac{\partial \ln L}{\partial\alpha} &= \frac{r}{\alpha} + \frac{n\beta\,(D/X_{(1)})^{\alpha} \ln(D/X_{(1)})}{1 - \beta\,(D/X_{(1)})^{\alpha}} - \sum_{i=1}^{r}\left[\ln X_{(i)} - \ln D\right] \\
\frac{\partial \ln L}{\partial\beta} &= \frac{r}{\beta} - \frac{n - r}{1 - \beta} + \frac{n\,(D/X_{(1)})^{\alpha}}{1 - \beta\,(D/X_{(1)})^{\alpha}}
\end{aligned} \tag{3.2}$$
We obtain the MLE for α and β by setting these partial derivatives to zero and solving. The
solution to (3.2) depends on the value of D, which will be unknown in practice. Since we
assume $X_{(r)} \ge D > X_{(r+1)}$ we could estimate D by $X_{(r)}$ or $X_{(r+1)}$ or some point in between.
We use the estimate $\hat D = X_{(r+1)}$ to be consistent with standard usage for Hill’s estimator
[3, 19, 21, 24, 28, 40, 41], the MLE for α in a Pareto. Setting the equations in (3.2) to zero and
solving, we find that
$$\hat\beta = r\left[n - (n - r)\left(X_{(r+1)}/X_{(1)}\right)^{\hat\alpha}\right]^{-1}$$
and $\hat\alpha$ solves the equation
$$0 = \frac{r}{\hat\alpha} + \frac{r\left(X_{(r+1)}/X_{(1)}\right)^{\hat\alpha} \ln\left(X_{(r+1)}/X_{(1)}\right)}{1 - \left(X_{(r+1)}/X_{(1)}\right)^{\hat\alpha}} - \sum_{i=1}^{r}\left[\ln X_{(i)} - \ln X_{(r+1)}\right].$$
We complete the proof of the theorem by using $\hat\gamma = \hat D \hat\beta^{1/\hat\alpha}$. □
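The estimator of Theorem 3.1 can be computed in a few lines once (3.1) is solved numerically. The sketch below is one possible implementation under stated assumptions: bisection as the root finder, Hill's estimate as the upper end of the bracket (the left side of (3.1) is negative there), an `expm1` rewriting for stability, and simulated data. The function name `tail_mle` is hypothetical.

```python
import math
import random

def tail_mle(xs, r):
    """Conditional MLE (gamma, nu, alpha) of Theorem 3.1, plus Hill's estimate."""
    n = len(xs)
    top = sorted(xs, reverse=True)[: r + 1]       # X_(1) >= ... >= X_(r+1)
    x1, xr1 = top[0], top[r]
    c = math.log(x1 / xr1)                         # -ln(X_(r+1) / X_(1))
    s = sum(math.log(x / xr1) for x in top[:r]) / r
    if not 0.0 < s < c / 2.0:
        raise ValueError("equation (3.1) has no positive root for this sample")
    hill = 1.0 / s                                 # Hill's estimator, for comparison

    def g(a):
        # (3.1) divided by r, with the middle term written as -c / (e^{ac} - 1).
        t = a * c
        return 1.0 / a - (0.0 if t > 700.0 else c / math.expm1(t)) - s

    lo, hi = 1e-6, hill                            # g(hill) < 0, so the root lies below
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
    alpha = 0.5 * (lo + hi)
    q = (xr1 / x1) ** alpha
    gamma = r ** (1.0 / alpha) * xr1 * (n - (n - r) * q) ** (-1.0 / alpha)
    return gamma, x1, alpha, hill

# Simulated truncated Pareto sample (arbitrary true values alpha=1.5, gamma=1, nu=20).
rng = random.Random(3)
b0 = (1.0 / 20.0) ** 1.5
xs = [(rng.random() * (1 - b0) + b0) ** (-1 / 1.5) for _ in range(20000)]
gamma_hat, nu_hat, alpha_tp, alpha_hill = tail_mle(xs, r=2000)
print(gamma_hat, nu_hat, alpha_tp, alpha_hill)
```

On truncated data the Hill estimate returned for comparison is typically noticeably larger than the truncated Pareto estimate.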
Remark 3.2. The conditional maximum likelihood estimator $\hat\alpha_{TP}$ of α given in Theorem 3.1
is smaller than Hill’s estimator $\hat\alpha_H$ of α. This is easy to see after noticing that the second
term on the right-hand side of equation (3.1) is always negative.
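The step behind this remark can be written out explicitly. In the sketch below, S and ρ are shorthand introduced only for this derivation, and we assume $X_{(1)} > X_{(r+1)}$ so that 0 < ρ < 1:
$$S = \sum_{i=1}^{r}\left[\ln X_{(i)} - \ln X_{(r+1)}\right], \qquad \rho = X_{(r+1)}/X_{(1)}, \qquad \hat\alpha_H = \frac{r}{S}.$$
Rearranging (3.1) gives
$$\frac{r}{\hat\alpha_{TP}} = S - \frac{r\,\rho^{\hat\alpha_{TP}} \ln\rho}{1 - \rho^{\hat\alpha_{TP}}} > S,$$
because ln ρ < 0 makes the subtracted term negative. Hence $\hat\alpha_{TP} < r/S = \hat\alpha_H$.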
Remark 3.3. In some applications, it may be natural to employ a truncated and possibly
shifted exponential distribution. All of the results in this section and the previous section
also apply to that case, since Y = ln X has a truncated shifted exponential distribution with
$$P(Y > y) = \frac{e^{-\alpha(y - \ln\gamma)} - (\gamma/\nu)^{\alpha}}{1 - (\gamma/\nu)^{\alpha}} \quad\text{for } \ln\gamma \le y \le \ln\nu$$
if and only if X has a truncated Pareto distribution. If γ = 1 then Y has a truncated
exponential distribution with no shift.
4. TEST FOR TRUNCATION
In this section, we present a simple test to distinguish whether a given data set is more
appropriately modeled by a Pareto or truncated Pareto distribution. The truncated Pareto
distribution is given by (2.2) with 0 < γ < ν < ∞, which reduces to the Pareto distribution
(2.1) in the limit ν = ∞. An asymptotic q-level test (0 < q < 1) for H0 : ν = ∞ (Pareto)
versus Ha : ν < ∞ (truncated Pareto) can be constructed using the following simple result
from extreme value theory.
Lemma 4.1. Suppose that $X_1, X_2, \dots, X_n$ are independent and identically distributed according
to a Pareto distribution, and let $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(n)}$ denote their order statistics.
Then $n^{-1/\alpha} X_{(1)} \xrightarrow{d} Y$ as n → ∞ where $P(Y \le t) = \exp\{-Ct^{-\alpha}\}$, and $C = \gamma^{\alpha}$ in (2.1).
Proof. Write
$$P\{n^{-1/\alpha} X_{(1)} \le x\} = P\{X_{(1)} \le n^{1/\alpha} x\} = \left(1 - \frac{Cx^{-\alpha}}{n}\right)^{n} \to \exp\left(-Cx^{-\alpha}\right)$$
which proves the lemma. □
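The convergence in this proof can be checked numerically by comparing the exact finite-n probability with its limit. This is a small sketch with illustrative values of x, α and C; the function names are assumptions.

```python
import math

def max_cdf_exact(x, n, alpha, C):
    """P{n^(-1/alpha) X_(1) <= x} for n iid Pareto variables with P(X > t) = C t^(-alpha)."""
    t = n ** (1.0 / alpha) * x
    tail = min(1.0, C * t ** (-alpha))    # the Pareto tail probability cannot exceed 1
    return (1.0 - tail) ** n

def max_cdf_limit(x, alpha, C):
    """Limiting distribution function exp(-C x^(-alpha)) from Lemma 4.1."""
    return math.exp(-C * x ** (-alpha))

for n in (10, 100, 1000, 10000):
    print(n, max_cdf_exact(1.0, n, 0.5, 1.0), max_cdf_limit(1.0, 0.5, 1.0))
```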
The random variable Y has a max-stable or Type II extreme value distribution. The
lemma implies that for large n, the maximum $X_{(1)} \approx n^{1/\alpha} Y$ and then
$$\begin{aligned}
P(X_{(1)} \le x) \approx P(n^{1/\alpha} Y \le x) &= P(Y \le n^{-1/\alpha} x) \\
&= \exp\{-C(n^{-1/\alpha} x)^{-\alpha}\} \\
&= \exp\{-nCx^{-\alpha}\}.
\end{aligned} \tag{4.1}$$
Then an asymptotic level q test (0 < q < 1) rejects the null hypothesis H0 : ν = ∞
(Pareto) in favor of the alternative Ha : ν < ∞ (truncated Pareto) if $X_{(1)} < x_q$ where
$q = \exp\{-nCx_q^{-\alpha}\}$. This test rejects H0 if and only if
$$X_{(1)} < \left(\frac{nC}{-\ln q}\right)^{1/\alpha} \tag{4.2}$$
and the p-value of this test is given by
$$p = \exp\{-nC X_{(1)}^{-\alpha}\}. \tag{4.3}$$
In practice we use Hill’s estimator [22]
$$\hat\alpha_H = \left[r^{-1}\sum_{i=1}^{r}\{\ln X_{(i)} - \ln X_{(r+1)}\}\right]^{-1} \quad\text{and}\quad \hat C = \frac{r}{n}\,(X_{(r+1)})^{\hat\alpha_H}$$
to estimate the parameters C and α. Note that a small p-value in (4.3) indicates that the
Pareto model is not a good fit, and this is not enough in itself to indicate goodness of fit
of the truncated Pareto distribution. Hence the test should always be supplemented by a
graphical check of the data tail, as illustrated in the next section.
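The test reduces to a handful of arithmetic steps, sketched below. The function names are illustrative assumptions. As a sanity check, the p-value formula (4.3) is evaluated with the rounded summary values reported in Section 5 for the Amazon and Tombstone data sets (taking the reported $\hat\nu$ as the sample maximum); since the inputs are rounded, only approximate agreement with the reported p-values of 0.007 and 0.017 should be expected.

```python
import math

def hill_estimates(xs, r):
    """Hill's estimator of alpha and the matching estimate of C from the r + 1 largest values."""
    n = len(xs)
    top = sorted(xs, reverse=True)[: r + 1]
    xr1 = top[r]                                          # X_(r+1)
    alpha_h = r / sum(math.log(x / xr1) for x in top[:r])
    c_h = (r / n) * xr1 ** alpha_h
    return alpha_h, c_h

def truncation_p_value(n, c, alpha, x_max):
    """p-value (4.3) of the test of H0: Pareto against Ha: truncated Pareto."""
    return math.exp(-n * c * x_max ** (-alpha))

def truncation_test(xs, r, q=0.05):
    """Return (p_value, reject) for a level-q test using Hill's estimates."""
    alpha_h, c_h = hill_estimates(xs, r)
    p = truncation_p_value(len(xs), c_h, alpha_h, max(xs))
    return p, p < q                                       # p < q is equivalent to (4.2)

# Summary values as reported in Section 5 (rounded in the text).
p_amazon = truncation_p_value(n=1378, c=2.203, alpha=2.343, x_max=15.53)
p_tombstone = truncation_p_value(n=5216, c=7.59478e7, alpha=3.813, x_max=762.0)
print(p_amazon, p_tombstone)
```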
We performed Monte Carlo simulations to investigate the small sample properties of the
proposed test for truncation. We generated a random sample from a Pareto distribution with
α = 0.5 and C = 1 for varying n ∈ {100, 500, 1000}. The values of r were also varied based
on the percentage of the total sample size n. In particular, we considered the largest 10%,
30%, 70% and 100% order statistics. We performed the proposed test at the 5% level of significance
for each combination of n and r. We repeated this process 1000 times for each case. The
simulated significance level was the proportion of times out of 1000 that the null hypothesis
was rejected. The results of these simulations are summarized in Table 1. The proposed
test is conservative, i.e., the simulated significance level is smaller than the set significance
level. This means that for small sample sizes it will be more difficult to reject the Pareto
assumption in favor of the truncated Pareto. However, the simulated significance level gets
closer to 5% as n and r increase, as expected from the asymptotic properties of the maximum
under the Pareto distribution.

Table 1. Simulated significance level (in %) of the proposed test for truncation
using 5% level of significance for varying sample sizes and percent of
largest order statistics used based on 1000 replicates. Pareto random samples
of size n were generated with α = 0.5 and C = 1.

                      Percent of largest order statistics used
Sample size (n)       10%      30%      70%      100%
100                   **       0.3      1.8      2.0
500                   1.3      3.8      4.1      4.9
1000                  3.3      4.7      5.2      4.8

Simulation results are exactly the same for any values of α
and C, when using the same values of n and r and the same random number seed. To
understand why this is the case, note that we simulate $X_i = (C/U_i)^{1/\alpha}$ where $U_i$ are iid
uniform (0, 1) random variables, and hence $X_{(i)} = (C/U_{(i)})^{1/\alpha}$. The test rejects the null
hypothesis if
$$X_{(1)} < \left(\frac{n \hat C_H}{-\ln q}\right)^{1/\hat\alpha_H}$$
which is equivalent to
$$\left(\frac{X_{(1)}}{X_{(r+1)}}\right)^{\hat\alpha_H} = \left(\frac{U_{(r+1)}}{U_{(1)}}\right)^{\hat\alpha_H/\alpha} < \frac{r}{-\ln q}$$
where
$$\frac{\hat\alpha_H}{\alpha} = \left[r^{-1}\sum_{i=1}^{r}\{\ln U_{(r+1)} - \ln U_{(i)}\}\right]^{-1}$$
so that everything is independent of α and C. On a related note, equation (4.2) is equivalent
to rejecting H0 if $X_{(1)}^{\alpha}/C < n/(-\ln q)$ and this expression is also independent of C and α,
because if X is Pareto with $P(X > x) = Cx^{-\alpha}$ then $Z = X^{\alpha}/C$ is standard Pareto with
$P(Z > z) = z^{-1}$.
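The simulation design described above, including the invariance to α and C under a fixed seed, can be sketched as follows. This is not the authors' code: the function names, the seed, and the use of Python's `random` module are assumptions, and the simulated levels will not match Table 1 digit for digit because the random numbers differ.

```python
import math
import random

def p_value(xs, r):
    """p-value (4.3) with alpha and C replaced by Hill's estimates."""
    n = len(xs)
    top = sorted(xs, reverse=True)[: r + 1]
    xr1 = top[r]
    alpha_h = r / sum(math.log(x / xr1) for x in top[:r])
    c_h = (r / n) * xr1 ** alpha_h
    return math.exp(-n * c_h * top[0] ** (-alpha_h))

def simulated_level(n, r, alpha, C, reps=1000, q=0.05, seed=0):
    """Percentage of Pareto samples for which the test rejects at level q."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # X_i = (C / U_i)^(1/alpha); 1 - random() lies in (0, 1], so no division by zero.
        xs = [(C / (1.0 - rng.random())) ** (1.0 / alpha) for _ in range(n)]
        rejections += p_value(xs, r) < q
    return 100.0 * rejections / reps

# Same seed, different (alpha, C): the rejection decisions coincide.
level_a = simulated_level(100, 30, alpha=0.5, C=1.0, reps=200)
level_b = simulated_level(100, 30, alpha=2.0, C=7.0, reps=200)
print(level_a, level_b)
```

Floating-point rounding could in principle flip a borderline decision between the two runs, but this requires a p-value within rounding error of q.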
Figure 1. Log-log plot of the largest absolute values of daily price changes
in Amazon, Inc. stock, with best fitting Pareto (dashed line) and truncated
Pareto (solid line) tail distribution.
[Figure omitted: horizontal axis ln(x), vertical axis ln(P(X > x)).]
5. APPLICATIONS
Figure 1 shows the upper tail for a data set of absolute daily price changes in U.S. dollars
for Amazon, Inc. stock from Jan 1, 1998 to June 30, 2003 (n = 1378), plotted on a log-log
scale. The apparent downward curve is characteristic of a truncated Pareto distribution.
Hill’s estimator gives $\hat\alpha_H = 2.343$ and $\hat C = 2.203$, while the truncated Pareto estimator from
Theorem 3.1 gives $\hat\alpha_{TP} = 1.681$, $\hat\gamma = 0.963$, $\hat\nu = 15.53$ (estimated $C = \hat\gamma^{\hat\alpha} \approx 0.936$). Both
estimates are based on the upper r = 100 order statistics. The truncation test using formula
(4.3) rejects the null hypothesis of an untruncated Pareto (p = 0.007) which supports the
visual evidence of Figure 1 that the truncated Pareto model gives a better fit. One possible
explanation for the truncated Pareto in this application is that the stock exchange employs
automatic mechanisms to slow trading in the event of extreme price changes due to automatic
trading by large mutual funds.
Figure 2. Log-log plot of the empirical survival function for absolute differences
of hydraulic conductivity at the MADE site, with best fitting Pareto
(dashed line) and truncated Pareto (solid line) tail distribution.
[Figure omitted: horizontal axis ln(x), vertical axis ln(P(X > x)).]
Figure 2 shows the upper tail of the survival function for absolute differences from a
series of 2618 measurements of hydraulic conductivity K (cm/sec) taken at 15 cm intervals in
vertical boreholes at the MAcroDispersion Experiment (MADE) site on the Columbus Air
Force Base in northeastern Mississippi (see Rehfeldt et al. [39]), along with the best fitting
Pareto and truncated Pareto models. The estimated parameters of the truncated Pareto
model based on the 100 upper order statistics are $\hat\gamma = 0.008$, $\hat\nu = 0.783$, and $\hat\alpha_{TP} = 1.196$.
The estimated parameters of the Pareto model are $\hat C = 0.0011$ and $\hat\alpha_H = 1.598$, so that
the estimated $\gamma = C^{1/\alpha} \approx 0.014$. The p-value from (4.3) is 0.012, strong evidence in favor
of the truncated model. The α estimate from the truncated Pareto model is also in good
agreement with the analysis in Benson et al. [9]. A possible explanation for truncation is
that the depositional process that formed the MADE aquifer created high velocity channels
that account for the largest K values. Over time these high-flow channels are occluded with
sedimentation, truncating the high K values. Another truncation effect comes from the K
measurement process, which averages values over a cylindrical region. The highest K values
occur in a relatively small sector of this cylinder, and are thus attenuated by combination
with the more prevalent lower K values. Table 2 shows how the parameter estimates and
the p-value for the test of truncation (4.2) vary with the number r of upper order statistics
used. As a general guide, the value of r should be chosen on the basis of a log-log plot similar
to Figure 2 so that r is as large as possible so long as the model fit is adequate.

Table 2. Estimates of truncated Pareto parameters for hydraulic conductivity
at the MADE site (n = 2618) for varying r values. The p-value for the
truncation test is also given.

r + 1 largest       Estimates                                               Test for truncation
order statistics    γ        ν        α        Hill’s α    Hill’s C        p-value
50                  0.0073   0.7828   1.1663   1.9         0.0008          0.04068
100                 0.0079   0.7828   1.1958   1.5985      0.0011          0.012
200                 0.0057   0.7828   1.0562   1.316       0.002           0.0009
300                 0.0042   0.7828   0.9434   1.1496      0.0028          0.0001

Figure 3 shows the 100 largest observations of total daily precipitation in units of 0.1
mm at Tombstone, Arizona between July 1, 1893 and December 31, 2001 (from Groisman
et al. [20]), along with the best fitting Pareto and truncated Pareto models. Based on the
n = 5216 nonzero observations, the estimated parameters of the truncated Pareto model are
$\hat\gamma = 88.025$, $\hat\nu = 762$, and $\hat\alpha_{TP} = 2.933$. The estimated parameters of the Pareto model are
$\hat C = 7.59478 \times 10^{7}$ and $\hat\alpha_H = 3.813$, so that the estimated $\gamma = C^{1/\alpha} \approx 117$. The p-value
from (4.3) is 0.017, strong evidence in favor of the truncated model. The fact that the tail of
the data curves downward in Figure 3 is also diagnostic for a truncated Pareto model. Note
that since our parameter estimates are based on the nonzero observations, we are actually
modeling the conditional distribution of precipitation given that some precipitation occurs.
Another alternative model, in which the parameter estimates α = 2.933 and $C = 88.025^{2.933}$
from the truncated Pareto are used in the Pareto, is appropriate if the investigator believes
that precipitation follows a power law that has been truncated by some natural or observational
mechanism. For illustration purposes, we present estimates of the 0.999 quantile for
three models for daily precipitation. For the Pareto model the quantile is estimated to be
714 (71.4 mm or 2.81 in of precipitation), for the truncated Pareto model it is 655, and for
the Pareto model with truncated Pareto parameters we obtain 928. Since precipitation was
only recorded on 13% of the days in this period, the 0.999 quantile actually represents the
level of daily precipitation one could expect on any given day with probability (0.13 · 0.001)
or about once every 20 years. This example is typical in that, for extreme upper values, the
truncated Pareto gives the lowest estimate and the Pareto with truncated Pareto parameters
gives the highest.
Figure 3. Log-log plot of the empirical survival function for the 100 largest
observations of positive daily total precipitation in Tombstone, AZ between
July 1, 1893 and December 31, 2001 with best fitting Pareto (dotted line),
Pareto with truncated Pareto parameters (dashed line), and truncated Pareto
(solid line) tail distribution.
[Figure omitted: horizontal axis ln(x), vertical axis ln(P(X > x)).]
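The three quantile estimates quoted above follow from inverting the Pareto tail $Cx^{-\alpha}$ and the truncated Pareto tail (2.2) at tail probability 0.001. The short sketch below reproduces them from the rounded parameter estimates in the text; the function names are illustrative assumptions.

```python
def pareto_quantile(p_tail, alpha, C):
    """x such that C x^(-alpha) = p_tail."""
    return (C / p_tail) ** (1.0 / alpha)

def truncated_pareto_quantile(p_tail, alpha, gamma, nu):
    """x such that the truncated Pareto tail (2.2) equals p_tail."""
    b = (gamma / nu) ** alpha
    return gamma * (b + p_tail * (1.0 - b)) ** (-1.0 / alpha)

p_tail = 0.001                                    # the 0.999 quantile
q_pareto = pareto_quantile(p_tail, 3.813, 7.59478e7)
q_trunc = truncated_pareto_quantile(p_tail, 2.933, 88.025, 762.0)
q_mixed = pareto_quantile(p_tail, 2.933, 88.025 ** 2.933)

# Precipitation occurs on about 13% of days, so the event has daily probability 0.13 * 0.001.
years_between = 1.0 / (0.13 * p_tail) / 365.25

print(round(q_pareto), round(q_trunc), round(q_mixed), round(years_between, 1))
```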
Acknowledgment. The authors would like to thank Dr. Alexander Gershunov, Scripps
Institution of Oceanography, for his help with obtaining and understanding the precipitation
data in Figure 3, and Dr. Fred Molz of Clemson University for his help understanding the
depositional process and measurement procedures at the MADE site.
References
[1] Adler, R., R. Feldman, and M. Taqqu (1998) A Practical Guide to Heavy Tails. Birkhauser, Boston.
[2] Anderson, P. and M. Meerschaert (1998) Modeling river flows with heavy tails. Water Resour. Res. 34,
2271–2280.
[3] Anderson, P. and M. Meerschaert (1997) Periodic moving averages of random variables with regularly
varying tails. Ann. Statist. 25, 771–785.
[4] Arnold, B. C. (1983) Pareto Distributions. International Co-operative Publishing House, MD, USA.
[5] Barkai, E., R. Metzler, and J. Klafter (2000) From continuous time random walks to the fractional
Fokker-Planck equation. Phys. Rev. E 61, 132–138.
[6] Beg, M. A. (1981). Estimation of the tail probability of the truncated Pareto distribution. J. Inform.
Optim. Sci. 2, No. 2, 192–198.
[7] Beg, M. A. (1983). Unbiased estimators and tests for truncation and scale parameters. Amer. J. Math.
Management Sci. 3, No. 4, 251–274.
[8] D. Benson, S. Wheatcraft and M. Meerschaert, Application of a fractional advection-dispersion equation.
Water Resources Research 36 (2000), 1403–1412.
[9] D. Benson, R. Schumer, M. Meerschaert and S. Wheatcraft (2001) Fractional dispersion, Lévy motions,
and the MADE tracer tests. Transport in Porous Media 42, 211–240.
[10] P. Bickel and K. Doksum (2001) Mathematical Statistics Basic Ideas and Selected Topics Vol. 1, Second
Edition. Prentice Hall, Upper Saddle River.
[11] Burroughs, S.M, and S.F. Tebbens (2001) Upper-truncated power law distributions. Fractals, 9,
209–222.
[12] Burroughs, S.M, and S.F. Tebbens (2001) Upper-truncated power laws in natural systems. Journal of
Pure and Applied Geophysics, 158, 741–757.
[13] Burroughs, S.M, and S.F. Tebbens (2002) The upper-truncated power law applied to earthquake
cumulative frequency-magnitude distributions. Bulletin of the Seismological Society of America, 92, No. 8,
2983–2993.
[14] Cohen, A. C., and Whitten, B. J. (1988). Parameter Estimation in reliability and Life Span Models,
Marcel Dekker, New York.
[15] David, H (1981) Order Statistics, 2nd Ed., Wiley, New York.
[16] Embrechts, P., C. Klüppelberg, and T. Mikosch (1997) Modelling Extremal Events for Insurance and
Finance. Applications of Mathematics 33, Springer-Verlag, Berlin.
[17] Fama, E. (1965) The behavior of stock market prices, J. Business, 38, 34–105.
[18] Feller, W. (1971) An Introduction to Probability Theory and Its Applications. Vol. II, 2nd Ed., Wiley,
New York.
[19] Fofack, H. and J. P. Nolan (1999) Tail Behavior, Modes and Other Characteristics of Stable Distribution.
Extremes, 2, no. 1, 39–58.
[20] Groisman, P. Ya., Knight, R. W., Karl, T. R., Easterling, D.R., Sun, B., Lawrimore, J.H. (2004)
Contemporary Changes of the Hydrological Cycle over the Contiguous United States: Trends Derived
from In Situ Observations. Journal of Hydrometeorology, 5, 1, 64–85.
[21] Hall, P (1982) On some simple estimates of an exponent of regular variation. J. Royal Statist. Soc. B
44, 37–42.
[22] Hill, B (1975) A simple general approach to inference about the tail of a distribution. Ann. Statist. 3,
no. 5, 1163–1173.
[23] Hosking, J. and J. Wallis (1987) Parameter and quantile estimation for the generalized Pareto
distribution. Technometrics 29, 339–349.
[24] Jansen, D. and C. de Vries (1991) On the frequency of large stock market returns: Putting booms and
busts into perspective. Review of Econ. and Statist. 73, 18–24.
[25] Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994) Continuous univariate distributions, 2nd edition,
Wiley, New York.
[26] J. Klafter, A. Blumen and M.F. Shlesinger (1987) Stochastic pathways to anomalous diffusion. Phys.
Rev. A 35, 3081–3085.
[27] M. Kotulski (1995) Asymptotic distributions of the continuous time random walks: A probabilistic
approach. J. Statist. Phys. 81, 777–792.
[28] Loretan, M. and P. Phillips (1994) Testing the covariance stationarity of heavy tailed time series. J.
Empirical Finance, 1, 211–248.
[29] Lu, S.L. and F.J. Molz (2001) How well are hydraulic conductivity variations approximated by additive
stable processes? Advances in Environmental Research 5, no. 1, 39–45.
[30] Malamud, B. D., Morein, G. and Turcotte, D. L. (1998) Forest fires: an example of self-organized critical
behavior, Science, 281, 1840–1842.
[31] Mandelbrot, B. (1963) The variation of certain speculative prices, J. Business, 36, 394–419.
[32] McCulloch, J. (1996) Financial applications of stable distributions. Statistical Methods in Finance:
Handbook of Statistics 14, G. S. Maddala and C. R. Rao, Eds., Elsevier, Amsterdam, 393–425.
[33] M.M. Meerschaert, D.A. Benson, H.P. Scheffler and P. Becker-Kern (2002) Governing equations and
solutions of anomalous random walk limits, Phys. Rev. E 66 No. 6, 102–105.
[34] Meerschaert, M. and H.P. Scheffler (2003) Portfolio modeling with heavy tailed random vectors,
Handbook of Heavy-Tailed Distributions in Finance, 595–640, S. T. Rachev, Ed., Elsevier North-Holland,
New York.
[35] Metzler R. and J. Klafter (2000) The random walk’s guide to anomalous diffusion: A fractional dynamics
approach. Phys. Rep. 339, 1–77.
[36] Nikias, C. and M. Shao (1995) Signal Processing with Alpha Stable Distributions and Applications.
Wiley, New York.
[37] Pacheco, J., Scholz, C.H. and Sykes, L.R. (1992) Changes in frequency-size relationship from small to
large earthquakes, Nature, 355, 71–73.
[38] Rachev, S. and S. Mittnik (2000) Stable Paretian Models in Finance, Wiley, Chichester.
[39] Rehfeldt, K.R., J. M. Boggs, and L. W. Gelhar (1992) Field study of dispersion in a heterogeneous
aquifer. 3: Geostatistical analysis of hydraulic conductivity, Water Resour. Res., 28 (12), 3309–3324.
[40] Resnick, S (1997) Heavy tail modeling and teletraffic data. Ann. Statist. 25, 1805–1869.
[41] Resnick, S. and C. Starica (1995) Consistency of Hill’s estimator for dependent data. J. Appl. Probab.
32, 139–167.
[42] Samorodnitsky, G. and M. Taqqu (1994) Stable non-Gaussian Random Processes. Chapman & Hall,
New York.
[43] Scholz, C.H. and Contreras, J. C. (1998) Mechanics of continental rift architecture, Geology, 26,
967–970.
[44] R. Schumer, D. Benson, M. Meerschaert and S. Wheatcraft (2001) Eulerian derivation of the fractional
advection-dispersion equation. J. Contaminant Hydrology 48, 69–88.
[45] Uchaikin, V. and V. Zolotarev (1999) Chance and Stability: Stable Distributions and Their Applications.
VSP, Utrecht.
Inmaculada B. Aban, Department of Mathematics and Statistics, MS 084, University of
Nevada, Reno, NV 89557-0045, USA
E-mail address : [email protected]
Mark M. Meerschaert, Department of Mathematics and Statistics, MS 084, University
of Nevada, Reno, NV 89557-0045, USA
E-mail address : [email protected]
Anna K. Panorska, Department of Mathematics and Statistics, MS 084, University of
Nevada, Reno, NV 89557-0045, USA
E-mail address : [email protected]