Distributions (2) · 2011. 3. 31.

© 2008 Winton


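IV. Triangular Distribution

• Known values
– The minimum (a)
– The mode (b, the most likely value; the peak of the pdf)
– The maximum (c)

• f(x) is a probability density function (area under the curve = 1); the area under the triangle (1) is half the area of its bounding box, which is h(c − a), so

$$h = \frac{2}{c-a}$$

[Figure: the triangular pdf f(x) against x, rising linearly from 0 at x = a to height h at x = b, then falling linearly to 0 at x = c]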


pdf for Triangular Distribution

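• The pdf is given by

$$f(x) = \begin{cases} \dfrac{2(x-a)}{(c-a)(b-a)} & \text{for } a \le x \le b \quad \left(\text{slope} = \dfrac{h}{b-a}\right) \\[2ex] \dfrac{2(c-x)}{(c-a)(c-b)} & \text{for } b \le x \le c \quad \left(\text{slope} = -\dfrac{h}{c-b}\right) \\[2ex] 0 & \text{otherwise} \end{cases}$$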
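The piecewise density can be sketched in a few lines of Python. This is a minimal illustration, assuming a < b < c; the function names `triangular_pdf` and `midpoint_integral` and the parameter values are chosen here for the example and are not part of the slides. The numerical integral is only a sanity check that the area under the curve is 1.

```python
def triangular_pdf(x, a, b, c):
    """Density of the triangular distribution with minimum a, mode b, maximum c.

    Assumes a < b < c (otherwise a denominator below is zero).
    """
    if a <= x <= b:
        return 2.0 * (x - a) / ((c - a) * (b - a))
    if b < x <= c:
        return 2.0 * (c - x) / ((c - a) * (c - b))
    return 0.0


def midpoint_integral(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(f(lo + (i + 0.5) * width) for i in range(n))


a, b, c = 1.0, 2.0, 4.0                      # illustrative values
peak = triangular_pdf(b, a, b, c)            # equals h = 2 / (c - a)
area = midpoint_integral(lambda x: triangular_pdf(x, a, b, c), a, c)
print(peak, area)
```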


Expected Value

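• The expected value is given by

$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx = \int_a^b x \cdot \frac{2(x-a)}{(c-a)(b-a)}\,dx + \int_b^c x \cdot \frac{2(c-x)}{(c-a)(c-b)}\,dx = \frac{a+b+c}{3}$$

• Derivation: with a little work the integrals evaluate to

$$\begin{aligned} E(X) &= h \cdot \left[ \frac{2b^3 - 3ab^2 + a^3}{6(b-a)} + \frac{c^3 - 3cb^2 + 2b^3}{6(c-b)} \right] \\ &= \frac{a^3(c-b) + b^3(a-c) + c^3(b-a)}{3(c-a)(b-a)(c-b)} \\ &= \frac{(a+b+c)(c-a)(b-a)(c-b)}{3(c-a)(b-a)(c-b)} \\ &= \frac{a+b+c}{3} \end{aligned}$$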
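As a quick check of the intermediate expression h[(2b³ − 3ab² + a³)/(6(b − a)) + (c³ − 3cb² + 2b³)/(6(c − b))], take the illustrative values a = 1, b = 2, c = 4, so that h = 2/(c − a) = 2/3. The result agrees with (a + b + c)/3 = 7/3:

$$E(X) = \frac{2}{3}\left[\frac{2(8) - 3(1)(4) + 1}{6(1)} + \frac{64 - 3(4)(4) + 2(8)}{6(2)}\right] = \frac{2}{3}\left[\frac{5}{6} + \frac{32}{12}\right] = \frac{2}{3}\cdot\frac{7}{2} = \frac{7}{3} \approx 2.333$$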


About mean, mode, and median

• For a discrete sample, measures of centrality that are typically determined are the mean, the mode, and the median
• The mean is the average value of the sample and corresponds to E(X)
• The mode corresponds to the point at which the pdf attains its maximum value
– When working with a sample, it is necessary to resort to a histogram (which can be tricky) to estimate the mode of the underlying pdf
• The median simply corresponds to that point at which half of the area under the curve is to the left and half is to the right
• The triangular distribution is typically employed when not much is known about the distribution, but the minimum, mode, and maximum can be estimated.


Sampling From the Triangular Distribution

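• Again this means solving

$$x = \int_{-\infty}^{\text{rsample}} f(z)\,dz$$

for rsample given a random probability x

• Since f(z) is piecewise continuous, its distribution function F(t) is given by

$$F(t) = \int_{-\infty}^{t} f(z)\,dz = \begin{cases} 0 & \text{for } t < a \\[1ex] \displaystyle\int_a^t f(z)\,dz & \text{for } a \le t \le b \quad \text{(left side integral)} \\[2ex] 1 - \displaystyle\int_t^c f(z)\,dz & \text{for } b < t \le c \quad \text{(right side integral)} \\[2ex] 1 & \text{for } t > c \end{cases}$$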


Left and Right Side Integrals

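• For left side a ≤ rsample ≤ b

$$x = \int_a^{\text{rsample}} f(z)\,dz = \int_a^{\text{rsample}} \frac{2(z-a)}{(b-a)(c-a)}\,dz = \left[\frac{z^2 - 2az}{(b-a)(c-a)}\right]_a^{\text{rsample}} = \frac{(\text{rsample} - a)^2}{(b-a)(c-a)}$$

• For right side b ≤ rsample ≤ c

$$x = 1 - \frac{(c - \text{rsample})^2}{(c-b)(c-a)}$$

since

$$\int_{\text{rsample}}^{c} f(z)\,dz = \int_{\text{rsample}}^{c} \frac{2(c-z)}{(c-b)(c-a)}\,dz = \left[\frac{2cz - z^2}{(c-b)(c-a)}\right]_{\text{rsample}}^{c} = \frac{(c - \text{rsample})^2}{(c-b)(c-a)}$$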


Sampling Function

• Since

$$\int_a^b f(z)\,dz = \frac{b-a}{c-a} \quad \text{(area of left triangle)}$$

if x ≤ (b − a)/(c − a), use the left side equation to solve for rsample; otherwise use the right side equation

$$\text{rsample} = \begin{cases} a + \sqrt{(b-a)(c-a)\,x} & \text{for } x \le \dfrac{b-a}{c-a} \\[2ex] c - \sqrt{(c-b)(c-a)(1-x)} & \text{for } x > \dfrac{b-a}{c-a} \end{cases}$$
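The two-branch sampling function translates directly into code. The sketch below assumes a < b < c and uses an arbitrary fixed seed so the run is repeatable; `triangular_sample` and the parameter values a = 1, b = 2, c = 4 are illustrative choices, not something the slides prescribe.

```python
import math
import random


def triangular_sample(x, a, b, c):
    """Map a probability x in [0, 1] to rsample by inverting the distribution function."""
    if x <= (b - a) / (c - a):
        # left side: x = (rsample - a)^2 / ((b - a)(c - a))
        return a + math.sqrt((b - a) * (c - a) * x)
    # right side: x = 1 - (c - rsample)^2 / ((c - b)(c - a))
    return c - math.sqrt((c - b) * (c - a) * (1.0 - x))


rng = random.Random(12345)  # fixed seed, chosen arbitrarily, for repeatability
a, b, c = 1.0, 2.0, 4.0
samples = [triangular_sample(rng.random(), a, b, c) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)   # should be close to (a + b + c) / 3
median = triangular_sample(0.5, a, b, c)    # the median is the sample at x = 0.5
print(sample_mean, median)
```

Feeding uniform random numbers through the function gives triangular variates; Python's standard library also offers `random.triangular(low, high, mode)` for the same purpose.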


Graph of Sampling Function: Triangular Distribution

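• Example:
– The median corresponds to x = 0.5
– For a = 1, b = 2, c = 4
   mean = (a + b + c)/3 = 2.333
   mode = 2
   median = c − √((c − b)(c − a)(0.5)) = 4 − √3 = 2.268

[Figure: the sampling function rsample against x for 0 ≤ x ≤ 1, rising from a at x = 0 through b at x = (b−a)/(c−a) to c at x = 1]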


V. Gamma Distribution

• Background: Gamma Function
– One of a large number of functions related to the exponential function
– Traces back to 18th century work by Euler, in which he was using interpolation methods to define n! for non-integral values
   • dubbed the gamma function by Legendre in a series of books published between 1811 and 1826
– Appears naturally in the study of anti-differentiation
   • Also studied in the context of differential equations when calculating Laplace transforms

• The gamma function is given by

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx \qquad (\alpha > 0)$$


Gamma Function Definition

• The gamma function is given by

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx \qquad (\alpha > 0)$$

• Integrate (by parts) to get Γ(α) = (α − 1)Γ(α − 1) for α > 1
– The base case is

$$\Gamma(1) = \int_0^{\infty} e^{-x}\,dx = 1$$

– When α is a positive integer, Γ(α) = (α − 1)!
– The gamma function is a generalization of the factorial, applying to all α > 0, not just integers
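The recurrence and the factorial connection are easy to confirm numerically with Python's standard `math.gamma`. This is only a spot check at a few values (the non-integer α = 3.7 is an arbitrary choice), not a proof.

```python
import math

# Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1), checked at a non-integer alpha
alpha = 3.7
lhs = math.gamma(alpha)
rhs = (alpha - 1.0) * math.gamma(alpha - 1.0)

# For a positive integer n, Gamma(n) = (n - 1)!
factorial_matches = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 11)
)
print(lhs, rhs, factorial_matches)
```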


Obtaining the Gamma Distribution

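• The gamma distribution is obtained from the gamma function by specifying the pdf

$$f(x) = \begin{cases} k\,x^{\alpha-1} e^{-x/\beta} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{for fixed } \alpha > 0 \text{ and } \beta > 0$$

where the proportionality constant k is chosen so that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$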


Proportionality Constant k

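$$1 = \int_{-\infty}^{\infty} f(x)\,dx = k \int_0^{\infty} x^{\alpha-1} e^{-x/\beta}\,dx = k\beta^{\alpha} \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt \qquad \text{where } x = \beta t$$

so

$$k = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}$$

and f(x) is given by

$$f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \qquad (x > 0)$$

• α is called the shape (or order) parameter
• β is called the scale parameter
– When α = 1, f(x) = (1/β)·e^(−x/β), the exponential distribution with mean β

• In general, E(X) = αβ and σ² = αβ²
– If the mean and standard deviation can be estimated, then α and β can also be determined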
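Solving E(X) = αβ and σ² = αβ² for the parameters gives β = σ²/mean and α = mean/β. The following sketch shows that step alongside the pdf itself; the function names and the inputs (mean 5, standard deviation 2.5) are illustrative assumptions, and `math.gamma` supplies Γ(α).

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta."""
    if x <= 0.0:
        return 0.0
    k = 1.0 / (beta ** alpha * math.gamma(alpha))  # proportionality constant
    return k * x ** (alpha - 1.0) * math.exp(-x / beta)


def gamma_params_from_moments(mean, std_dev):
    """Solve mean = alpha*beta and std_dev**2 = alpha*beta**2 for (alpha, beta)."""
    beta = std_dev ** 2 / mean
    alpha = mean / beta
    return alpha, beta


alpha, beta = gamma_params_from_moments(mean=5.0, std_dev=2.5)  # illustrative inputs
print(alpha, beta, gamma_pdf(5.0, alpha, beta))
```

With these inputs the result is α = 4 and β = 1.25, which reproduces the mean αβ = 5 and the variance αβ² = 6.25.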


Gamma Function Reflection Formula

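$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx \qquad (\alpha > 0)$$

• The general relationship Γ(α) = (α − 1)Γ(α − 1) for α > 1 holds
• It can also be shown that for 0 < α < 1

$$\Gamma(\alpha)\cdot\Gamma(1-\alpha) = \frac{\pi}{\sin(\pi\alpha)}, \qquad \text{so in particular } \Gamma(.5) = \sqrt{\pi}$$

• For 0 < α < 1, 1 + α > 1, so Γ(1 + α) = αΓ(α)
– This gives the reflection formula

$$\Gamma(1-\alpha) = \frac{\pi\cdot\alpha}{\Gamma(1+\alpha)\cdot\sin(\pi\alpha)} \qquad \text{for } 0 < \alpha < 1$$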


Algorithm for Calculating the Natural Logarithm of the Gamma Function

Attributed to Lanczos, C., Journal S.I.A.M. Numerical Analysis, ser. B, vol. 1, p. 86 (1964) and adapted from Numerical Recipes in C by Press, W.H., B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling (Cambridge University Press, 1988)

FUNCTION lngamma(z)
    /* Use the reflection formula for z < 1 (valid for 0 < z < 1) */
    IF z < 1
        z ← 1 - z
        RETURN ln(πz) - (lngamma(1 + z) + ln(sin(πz)))
    ENDIF
    coeff ← 76.18009173, -86.50532033, 24.01409822, -1.231739516, 0.00120858003, -0.00000536382
    /* These values are the (approximate) coefficients for the first 6 terms of
       an infinite series involved in an exact formulation for the gamma function
       credited to Lanczos. They yield an approximation for the variable "a"
       (determined below) which is within |ε| < 2 × 10^-10 of its true value */
    a ← 1
    FOR i ← 1 TO 6
        a ← a + coeff(i)/(i + z - 1)
    ENDFOR
    /* The series is multiplied by √(2π), as in the Lanczos formula; without
       this factor lngamma(1) would not evaluate to 0 */
    RETURN ln(√(2π)·a) - (z + 4.5) + (z - 0.5)ln(z + 4.5)
END
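A runnable Python rendering of the pseudocode might look as follows. It is a sketch rather than a reference implementation: it handles only z > 0, it includes the √(2π) factor of the Lanczos formula explicitly in the return value, and the name `COEFF` is introduced here for the listed coefficients. Python's built-in `math.lgamma` is the practical choice in real code and is handy as a cross-check.

```python
import math

# The six Lanczos coefficients listed in the pseudocode
COEFF = (76.18009173, -86.50532033, 24.01409822,
         -1.231739516, 0.00120858003, -0.00000536382)


def lngamma(z):
    """Natural logarithm of Gamma(z) for z > 0."""
    if z <= 0.0:
        raise ValueError("this sketch only handles z > 0")
    if z < 1.0:
        # Reflection: Gamma(1 - w) = pi*w / (Gamma(1 + w) * sin(pi*w)), with w = 1 - z
        w = 1.0 - z
        return math.log(math.pi * w) - (
            lngamma(1.0 + w) + math.log(math.sin(math.pi * w))
        )
    a = 1.0
    for i, coeff in enumerate(COEFF, start=1):
        a += coeff / (z + i - 1.0)
    return (math.log(math.sqrt(2.0 * math.pi) * a)
            - (z + 4.5) + (z - 0.5) * math.log(z + 4.5))


for z in (0.25, 0.5, 1.0, 2.5, 5.0):
    print(z, math.exp(lngamma(z)), math.gamma(z))
```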


Selected Values of Γ(α)
computed according to the algorithm for ln(Γ(α))

Γ(.25) ≈ 3.62560991
Γ(.5) = √π ≈ 1.77245385
Γ(.75) ≈ 1.22541670
Γ(1) = 0! = 1
Γ(1.25) ≈ 0.90640248
Γ(1.5) = .5Γ(.5) = √π/2 ≈ 0.88622693
Γ(1.75) ≈ 0.91906253
Γ(2) = 1! = 1
Γ(2.25) ≈ 1.13300310
Γ(2.5) = 1.5Γ(1.5) = 3√π/4 ≈ 1.32934039
Γ(2.75) ≈ 1.60835942
Γ(3) = 2! = 2
Γ(3.25) ≈ 2.54925697
Γ(3.5) = 2.5Γ(2.5) = 15√π/8 ≈ 3.32335097
Γ(3.75) ≈ 4.42298841
Γ(4) = 3! = 6
Γ(4.25) ≈ 8.28508514
Γ(4.5) = 3.5Γ(3.5) = 105√π/16 ≈ 11.63172840
Γ(4.75) ≈ 16.58620654
Γ(5) = 4! = 24
Γ(5.25) ≈ 35.21161185
Γ(5.5) = 4.5Γ(4.5) = 945√π/32 ≈ 52.34277778
Γ(5.75) ≈ 78.78448105
Γ(6) = 5! = 120

[Figure: graph of Γ(α) against α for α from 0 to 5 (vertical axis 0 to 10), with the points 0!, 1!, 2!, 3! marked on the curve]


Graph of Gamma pdf
fixed mean αβ = 5 and varying values of α and β

[Figure: gamma pdf f(x) against x for 0 ≤ x ≤ 10 (vertical axis 0 to 0.6), with curves for α=.5, β=10; α=1.5, β=3.3333; α=5, β=1; α=10, β=.5]


Corresponding Distribution Functions and Sampling Functions

[Figure: two panels for the same parameter choices; the distribution functions F(x) for 0 ≤ x ≤ 10, and the corresponding sampling functions rsample (0 to 10) against x for 0 ≤ x ≤ 1]

• The gamma distribution is used to model waiting times or time to complete a task
• It can be shown that if we have exponentially distributed interarrival times with mean 1/λ, the time needed to obtain k changes is distributed according to a gamma distribution with α = k and β = 1/λ


Gamma Distribution Behavior

$$f(x) = \begin{cases} k\,x^{\alpha-1} e^{-x/\beta} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{for fixed } \alpha > 0 \text{ and } \beta > 0$$

• If
– α < 1 then x^(α−1) → ∞ as x → 0
– α = 1 then the distribution is the exponential distribution
– α > 1 then x^(α−1) → 0 as x → 0

• The example showed three basic shapes, each of which is described by the behavior of the derivative f '(x) (slope function) of f(x)
– f '(x) = k[(α − 1)x^(α−2) e^(−x/β) + (−1/β)x^(α−1) e^(−x/β)]

• There are actually 5 cases:
– α < 1: the slope → −∞ as x → 0, since each term is < 0 and each exponent of x is < 0
– α = 1: the slope → −1/β² as x → 0, in accord with the exponential distribution, since k = 1/β, term 1 is 0 and term 2 is −1/β
– 1 < α < 2: the slope → +∞ as x → 0, since the lead term → +∞ and term 2 → 0
– α = 2: the slope → +1/β² as x → 0, since k = 1/β^α = 1/β²
– α > 2: the slope → 0 as x → 0

• In each case the slope → 0 as x → +∞
• The gamma distribution is one which is usually sampled by the accept-reject technique, which means that to get k, the value of Γ(α) must be computed
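The five cases can be spot-checked by evaluating the slope function very close to 0. The sketch below is only a numerical illustration, not a proof of the limits; the choices β = 2, the evaluation point x = 10⁻⁶, and one representative α per case are all arbitrary assumptions, and `gamma_pdf_slope` is a name introduced here.

```python
import math


def gamma_pdf_slope(x, alpha, beta):
    """f'(x) = k[(alpha-1) x^(alpha-2) e^(-x/beta) - (1/beta) x^(alpha-1) e^(-x/beta)], x > 0."""
    k = 1.0 / (beta ** alpha * math.gamma(alpha))
    decay = math.exp(-x / beta)
    return k * ((alpha - 1.0) * x ** (alpha - 2.0) * decay
                - (1.0 / beta) * x ** (alpha - 1.0) * decay)


beta = 2.0
x_near_zero = 1e-6
# One representative alpha for each of the five cases
slopes = {alpha: gamma_pdf_slope(x_near_zero, alpha, beta)
          for alpha in (0.5, 1.0, 1.5, 2.0, 3.0)}
for alpha, slope in slopes.items():
    print(alpha, slope)
```

With β = 2 the slopes near 0 come out strongly negative for α = 0.5, close to −1/β² = −0.25 for α = 1, large and positive for α = 1.5, close to +1/β² = 0.25 for α = 2, and close to 0 for α = 3.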


VI. Erlang Distribution

• Erlang was a Danish telephone engineer who did some of the early work in queuing theory
• For the gamma distribution, when α is an integer i, the distribution is called an Erlang distribution of order i
• The Erlang distribution is used to model phenomena having i stages, each with independent, exponentially distributed service times of mean μ
• Rather than model these separately, an Erlang distribution of order i can be used to model the total service time
– e.g., if an experiment has 3 successive stages, each of which takes an average of 5 minutes (exponentially distributed), then the experiment time can be modeled by taking α = 3 and β = 5 (to get the overall mean of αβ = 15 minutes)
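The three-stage example can be simulated by adding up independent exponential stage times. This is a small Monte Carlo sketch with an arbitrary fixed seed; `erlang_sample` is a name introduced here, and the sample mean and variance should land near αβ = 15 and αβ² = 75 respectively.

```python
import random


def erlang_sample(order, stage_mean, rng):
    """Total time for `order` successive stages, each exponential with mean stage_mean."""
    # random.expovariate takes the rate, i.e. 1 / mean
    return sum(rng.expovariate(1.0 / stage_mean) for _ in range(order))


rng = random.Random(2008)  # fixed seed, chosen arbitrarily, for repeatability
times = [erlang_sample(3, 5.0, rng) for _ in range(50_000)]
mean_time = sum(times) / len(times)
var_time = sum((t - mean_time) ** 2 for t in times) / len(times)
print(mean_time, var_time)
```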


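VII. Weibull Distribution

• For the gamma distribution the idea was to choose k so that

$$k \int_0^{\infty} x^{\alpha-1} e^{-x/\beta}\,dx$$

is equal to 1 and then choose f(x) = k x^(α−1) e^(−x/β)

• A major difficulty with this pdf is that it is not integrable in closed form; however, it is close to being so
• The idea is to render the part under the integral to be in the form e^z dz so that the function will be integrable; i.e., for

$$k \int_0^{\infty} e^{-(x/\beta)^{?}}\, x^{\alpha-1}\,dx$$

we need to determine the ? value so that for z = −(x/β)^?, dz is a constant multiple of x^(α−1) dx
• If z = −(x/β)^α then dz = −(α/β)(x/β)^(α−1) dx and then

$$k \int_0^{\infty} e^{-(x/\beta)^{\alpha}}\, x^{\alpha-1}\,dx = -k\,\frac{\beta^{\alpha}}{\alpha} \int_0^{\infty} e^{-(x/\beta)^{\alpha}} \left(-\frac{\alpha}{\beta}\right)\left(\frac{x}{\beta}\right)^{\alpha-1} dx = -k\,\frac{\beta^{\alpha}}{\alpha} \left[ e^{-(x/\beta)^{\alpha}} \right]_0^{\infty} = k\,\frac{\beta^{\alpha}}{\alpha} = 1 \quad \text{when } k = \frac{\alpha}{\beta^{\alpha}}$$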


Weibull pdf

• The choice of k = α/β^α is the basis for the pdf

$$f(x) = \begin{cases} \dfrac{\alpha}{\beta} \left(\dfrac{x-\delta}{\beta}\right)^{\alpha-1} e^{-\left(\frac{x-\delta}{\beta}\right)^{\alpha}} & \text{for } x \ge \delta \\[2ex] 0 & \text{otherwise} \end{cases}$$

– the effect of δ is to displace the distribution δ along the horizontal axis, so it is often taken to be 0

• With a little work, it can be shown that
– E(X) = δ + βΓ(1 + 1/α) and
– VAR(X) = β²(Γ(1 + 2/α) − Γ²(1 + 1/α))
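The mean and variance formulas only need Γ, so they can be evaluated with `math.gamma`. The helper names below are illustrative; the α = 1 case is a convenient check because the Weibull distribution then reduces to the exponential distribution, with mean β and variance β².

```python
import math


def weibull_mean(alpha, beta, delta=0.0):
    """E(X) = delta + beta * Gamma(1 + 1/alpha)."""
    return delta + beta * math.gamma(1.0 + 1.0 / alpha)


def weibull_variance(alpha, beta):
    """VAR(X) = beta^2 * (Gamma(1 + 2/alpha) - Gamma(1 + 1/alpha)^2); delta does not matter."""
    g1 = math.gamma(1.0 + 1.0 / alpha)
    g2 = math.gamma(1.0 + 2.0 / alpha)
    return beta ** 2 * (g2 - g1 ** 2)


print(weibull_mean(1.0, 2.0), weibull_variance(1.0, 2.0))   # exponential case
print(weibull_mean(2.0, 1.0), weibull_variance(2.0, 1.0))
```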


Sampling

• Since

$$x = \int_{\delta}^{\text{rsample}} f(z)\,dz = \left[ -e^{-\left((z-\delta)/\beta\right)^{\alpha}} \right]_{\delta}^{\text{rsample}} = 1 - e^{-\left((\text{rsample}-\delta)/\beta\right)^{\alpha}}$$

the sample function can be solved for rsample for given values of α, β, δ

• The Weibull distribution is often used to model time until failure (for example, light bulbs may have significant early failure and some have significant long term until failure)
– When α = 1, β = 1/λ, δ = 0 then the Weibull distribution is the exponential distribution
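Solving x = 1 − e^(−((rsample − δ)/β)^α) for rsample gives rsample = δ + β(−ln(1 − x))^(1/α). A minimal sketch of that inversion follows, with a round trip through the distribution function as a consistency check; the function names and parameter values are illustrative, and x is assumed to lie in [0, 1).

```python
import math


def weibull_sample(x, alpha, beta, delta=0.0):
    """Invert x = 1 - exp(-((r - delta)/beta)**alpha) for r, given 0 <= x < 1."""
    return delta + beta * (-math.log(1.0 - x)) ** (1.0 / alpha)


def weibull_cdf(r, alpha, beta, delta=0.0):
    """Distribution function F(r) of the Weibull distribution."""
    if r < delta:
        return 0.0
    return 1.0 - math.exp(-(((r - delta) / beta) ** alpha))


r = weibull_sample(0.3, alpha=1.5, beta=2.0, delta=1.0)
round_trip = weibull_cdf(r, alpha=1.5, beta=2.0, delta=1.0)   # recovers 0.3
exponential_median = weibull_sample(0.5, alpha=1.0, beta=2.0)  # beta * ln 2 when alpha = 1
print(r, round_trip, exponential_median)
```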


Graph
Weibull pdf for fixed β = 1 and varying values of α

[Figure: Weibull pdf f(x) against x for 0 ≤ x ≤ 4 (vertical axis 0 to 4), with curves labeled α=.5, β=1; α=1.5, β=3.3333; α=5, β=1; α=10, β=1]