Validity and application of some continuous distributions
Dr. Md. Monsur Rahman
Professor, Department of Statistics
University of Rajshahi, Rajshahi – 6205
E-mail: [email protected]
Normal distribution
The normal probability function was first discovered by Abraham De Moivre (1667-1754), who in 1733 derived the distribution as the limiting form of the binomial distribution. The same formula was later derived by Carl Friedrich Gauss (1777-1855) in connection with his work on evaluating errors of observation in astronomy. This is why the normal distribution is often referred to as the Gaussian distribution.
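X: Normal Variate
Density:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$
$$E(X) = \mu, \quad Var(X) = \sigma^2$$
Standard Normal Variate:
$$Z = \frac{X-\mu}{\sigma}, \quad X = \mu + \sigma Z$$
[Figure: Normal distribution]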
Properties of Normal distribution
• The normal probability curve is symmetrical about the ordinate at $x = \mu$
• Mean, median and mode of the distribution are equal, and each of these is $\mu$
• The curve has its points of inflection at $x = \mu \pm \sigma$. By a point of inflection, we mean a point at which the concavity changes
• All odd order moments of the distribution about the mean vanish
• The values of $\beta_1$ and $\beta_2$ are 0 and 3 respectively
• $\mu \pm \sigma$ includes about 68.27% of the population
• $\mu \pm 2\sigma$ includes about 95.45% of the population
• $\mu \pm 3\sigma$ includes about 99.73% of the population
Application: Many biological characteristics conform to a Normal distribution, for example heights of adult men and women, blood pressures in a healthy population, RBS levels in blood, etc.
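The three coverage figures in the properties above can be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the function name `coverage` is chosen here for illustration only.

```python
from statistics import NormalDist

def coverage(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for a normal variate X."""
    std = NormalDist()  # standard normal; the answer does not depend on mu or sigma
    return std.cdf(k) - std.cdf(-k)

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma covers {100 * coverage(k):.2f}% of the population")
```

Because the interval is expressed in units of sigma, the same percentages hold for every normal distribution.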
Validity of Normal Distribution for a set of data
Many statistical methods can only be used if the observations follow a Normal distribution. There are several ways of investigating whether observations follow a Normal distribution. With a large sample we can inspect a histogram to see whether it looks like a Normal distribution curve. This does not work well with a small sample; a more reliable method is the normal plot, which is described below.
CDF of X: $F(x)$
CDF of Z: $\Phi(z)$, with $\Phi(-z) = 1 - \Phi(z)$
p quantile of X: $X_p$, which is the solution of $F(X_p) = p$
p quantile of Z: $Z_p$, which is the solution of $\Phi(Z_p) = p$
$$Z_p = \frac{X_p - \mu}{\sigma}, \quad X_p = \mu + \sigma Z_p$$
Dataset: $x_1, x_2, \ldots, x_n$
• Find empirical CDF values
• Arrange the data in ascending order as $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$
• Empirical CDF values are as follows:
$$F(x_{(i)}) = \frac{i - 0.5}{n}, \quad i = 1, 2, \ldots, n$$
• Using the normal table, obtain the values $z_{(i)}$ corresponding to $F(x_{(i)})$
• If the given set of observations follows a normal distribution, the plot of (x, z) should roughly be a straight line, $z = (x - \mu)/\sigma$; the line passes through the point $(\mu, 0)$ and has slope $1/\sigma$.
• Graphical estimates of $\mu$ and $\sigma$ may be obtained.
• If the data do not come from a Normal distribution, we will get a curve of some sort.
2.2  3.6  3.8  4.1  4.7
3.3  3.6  3.8  4.1  4.7
3.3  3.7  3.9  4.2  4.8
3.4  3.8  4.0  4.4  5.0
Table 1: RBS levels (mmol/L) measured in the blood of 20 medical students. Data of Bland (1995), p. 66
Bland, M. (1995): An Introduction to Medical Statistics, second edition, ELBS with Oxford University Press.
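The normal-plot steps described earlier can be sketched for the Table 1 data as follows. The helper names and the use of an ordinary least-squares line to read off "graphical" estimates are assumptions made here for illustration; the slides only say that estimates may be read from the plot.

```python
from statistics import NormalDist

# RBS levels (mmol/L) from Table 1
rbs = [2.2, 3.6, 3.8, 4.1, 4.7, 3.3, 3.6, 3.8, 4.1, 4.7,
       3.3, 3.7, 3.9, 4.2, 4.8, 3.4, 3.8, 4.0, 4.4, 5.0]

def normal_plot_points(data):
    """Return (x_(i), z_(i)) pairs, where z_(i) is the standard normal
    quantile of the empirical CDF value (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    std = NormalDist()
    return [(x, std.inv_cdf((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]

def fit_line(points):
    """Least-squares line of z on x; returns (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    mz = sum(z for _, z in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxz = sum((x - mx) * (z - mz) for x, z in points)
    slope = sxz / sxx
    return slope, mz - slope * mx

points = normal_plot_points(rbs)
slope, intercept = fit_line(points)
sigma_graph = 1 / slope          # the line z = (x - mu)/sigma has slope 1/sigma
mu_graph = -intercept / slope    # the line crosses z = 0 at x = mu
print(f"graphical estimates: mu ~ {mu_graph:.2f}, sigma ~ {sigma_graph:.2f}")
```

Plotting `points` with any plotting tool gives the normal plot itself; a roughly straight pattern supports normality.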
MLE: $\hat{\mu} = 3.92$ mmol/L, $\hat{\sigma} = 0.642$ mmol/L
• Goodness-of-fit test
• We use here the Kolmogorov-Smirnov (KS) test for the given data
• KS statistic = max |CDF_FIT - CDF_EMP|
• For the RBS level data, the calculated KS statistic is KS(cal) = 0.07827
• 5% tabulated value = 0.294
• Conclusion: since KS(cal) is below the tabulated value, the Normal distribution fits the given data well
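As a rough illustration of the KS comparison, the sketch below evaluates max |CDF_FIT - CDF_EMP| for the RBS data, taking CDF_EMP as (i - 0.5)/n at the i-th ordered value. This convention is an assumption: the slides do not say how ties or the empirical CDF were handled in the test, so the number printed here need not reproduce KS(cal) = 0.07827.

```python
from statistics import NormalDist

# RBS levels (mmol/L) from Table 1
rbs = [2.2, 3.6, 3.8, 4.1, 4.7, 3.3, 3.6, 3.8, 4.1, 4.7,
       3.3, 3.7, 3.9, 4.2, 4.8, 3.4, 3.8, 4.0, 4.4, 5.0]
mu_hat, sigma_hat = 3.92, 0.642   # estimates reported on the slides
TABULATED_5_PERCENT = 0.294       # tabulated value quoted on the slides

def ks_statistic(data, fitted_cdf):
    """max |CDF_FIT - CDF_EMP| with CDF_EMP = (i - 0.5)/n (assumed convention)."""
    xs = sorted(data)
    n = len(xs)
    return max(abs(fitted_cdf(x) - (i - 0.5) / n)
               for i, x in enumerate(xs, start=1))

ks = ks_statistic(rbs, NormalDist(mu_hat, sigma_hat).cdf)
print(f"KS = {ks:.4f}, below tabulated value: {ks < TABULATED_5_PERCENT}")
```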
Results
• The estimated proportion of the population having RBS within the normal range (3.9 – 7.8 mmol/L) is about 51%
• The estimated proportion of the population having RBS below the normal range is about 49%
• The estimated proportion of the population having RBS above the normal range is about 0%
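The estimates and the three proportions above can be checked with a short calculation. This is a sketch only: the variable names are chosen here, and using the n - 1 divisor for the standard deviation is an assumption. With these data it gives 0.642, the value reported above, while the divisor n gives about 0.626.

```python
from statistics import NormalDist, mean, pstdev, stdev

# RBS levels (mmol/L) from Table 1
rbs = [2.2, 3.6, 3.8, 4.1, 4.7, 3.3, 3.6, 3.8, 4.1, 4.7,
       3.3, 3.7, 3.9, 4.2, 4.8, 3.4, 3.8, 4.0, 4.4, 5.0]

mu_hat = mean(rbs)
sigma_n = pstdev(rbs)    # divisor n
sigma_n1 = stdev(rbs)    # divisor n - 1

fit = NormalDist(mu_hat, sigma_n1)
low, high = 3.9, 7.8     # normal range for RBS in mmol/L, as quoted above
below = fit.cdf(low)
within = fit.cdf(high) - fit.cdf(low)
above = 1 - fit.cdf(high)

print(f"mu = {mu_hat:.2f}, sigma (n-1) = {sigma_n1:.3f}, sigma (n) = {sigma_n:.3f}")
print(f"below {100 * below:.0f}%, within {100 * within:.0f}%, above {100 * above:.0f}%")
```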
• Two sample case
$$X_{11}, X_{12}, \ldots, X_{1m}$$
$$X_{21}, X_{22}, \ldots, X_{2n}$$
• Empirical CDF values of $X_{(11)}, X_{(12)}, \ldots, X_{(1m)}$ are as follows:
$$F(X_{(1i)}) = \frac{i - 0.5}{m}, \quad i = 1, 2, \ldots, m$$
• Obtain the values $Z_{(1i)}$ corresponding to $F(X_{(1i)})$
• Similarly, values $Z_{(2i)}$ are obtained corresponding to $F(X_{(2i)})$
• If the first set of data comes from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, then the plot of $(X_1, Z_1)$ will roughly be linear and pass through the point $(\mu_1, 0)$ with slope $1/\sigma_1$.
• If the second set of data comes from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$, then the plot of $(X_2, Z_2)$ will roughly be linear and pass through the point $(\mu_2, 0)$ with slope $1/\sigma_2$.
• If the two lines are parallel, the samples have different means but equal variances
• If the two lines coincide, the samples have equal means and equal variances
• If the two lines pass through the same point on the X-axis, the samples have equal means but different variances
Table 2: Burning times (rounded to the nearest tenth of a minute) of two kinds of emergency flares. Data due to Freund and Walpole (1987), p. 530
Brand A: 14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9
Brand B: 15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4
Freund, J.E. and Walpole, R.E. (1987): Mathematical Statistics, fourth edition, Prentice-Hall Inc.
The normal plot of the two samples indicates that both samples come from normal populations with unequal means and variances.
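The two-sample comparison can be sketched numerically by fitting a line to each brand's normal-plot points and comparing the crossing points and slopes. The least-squares fit and the function name `graphical_estimates` are assumptions made for illustration; the slides read these quantities from the plot.

```python
from statistics import NormalDist

# Burning times (minutes) from Table 2
brand_a = [14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9]
brand_b = [15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4]

def graphical_estimates(data):
    """Fit z = (x - mu)/sigma by least squares to the normal-plot points
    and return the implied (mu, sigma)."""
    xs = sorted(data)
    n = len(xs)
    std = NormalDist()
    zs = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = sum(xs) / n, sum(zs) / n
    slope = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
             / sum((x - mx) ** 2 for x in xs))
    intercept = mz - slope * mx
    return -intercept / slope, 1 / slope

mu_a, sigma_a = graphical_estimates(brand_a)
mu_b, sigma_b = graphical_estimates(brand_b)
print(f"Brand A: crosses z = 0 at {mu_a:.2f}, slope {1 / sigma_a:.3f}")
print(f"Brand B: crosses z = 0 at {mu_b:.2f}, slope {1 / sigma_b:.3f}")
```

Different crossing points suggest different means, and different slopes suggest different variances, which is the reading given above for these data.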
Log-normal distribution
In probability theory, a log-normal distribution is a probability distribution of a random variable whose logarithm is normally distributed. If X is a random variable with a normal distribution, then Y = exp(X) has a log-normal distribution; likewise, if Y is log-normally distributed, then X = log(Y) is normally distributed. It is occasionally referred to as the Galton distribution.
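Density:
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log x - \mu}{\sigma}\right)^2\right], \quad x > 0,\ -\infty < \mu < \infty,\ \sigma > 0$$
Mean = $\exp(\mu + \sigma^2/2)$
Variance = $[\exp(\sigma^2) - 1]\exp(2\mu + \sigma^2)$
Median = $\exp(\mu)$
Mode = $\exp(\mu - \sigma^2)$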
[Figure: Log-normal density function, f(x) plotted against x]
Application
Certain physiological measurements, such as blood pressure of adult humans (after separation into male and female subpopulations) and vitamin D level in blood, follow a log-normal distribution. Consequently, reference ranges for measurements in healthy individuals are more accurately estimated by assuming a log-normal distribution than by assuming a symmetric distribution about the mean.
Table 3: Vitamin D levels (ng/ml) measured in the blood of 26 healthy men. Data due to Bland (1995), p. 113
14  25  30  42  54
17  26  31  43  54
20  26  31  46  63
21  26  32  48  67
22  27  35  52  83
24
Bland, M. (1995): An Introduction to Medical Statistics, second edition, ELBS with Oxford University Press.
• MLE: $\hat{\mu} = 3.509$, $\hat{\sigma} = 0.449$ (both for the logarithm of vitamin D level, with vitamin D measured in ng/ml)
• Goodness-of-fit test
• KS statistic = max |CDF_FIT - CDF_EMP|
• For the vitamin D level data, the calculated KS statistic is KS(cal) = 0.0967
• 5% tabulated value = 0.274
• Conclusion: since KS(cal) is below the tabulated value, the lognormal distribution fits the given vitamin D data well
Results
• The estimated proportion of the population having vitamin D level within the normal range (30 – 74 ng/ml) is about 56%
• The estimated proportion of the population having vitamin D level below the normal range is about 40%
• The estimated proportion of the population having vitamin D level above the normal range is about 4%
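The log-normal estimates and proportions can be checked by working with the logarithms of the Table 3 values. This is a sketch under the assumption that natural logarithms and the n - 1 divisor were used; with these choices the computed values agree with the reported 3.509 and 0.449 to within rounding.

```python
from math import log
from statistics import NormalDist, mean, stdev

# Vitamin D levels (ng/ml) from Table 3
vit_d = [14, 25, 30, 42, 54, 17, 26, 31, 43, 54, 20, 26, 31, 46, 63,
         21, 26, 32, 48, 67, 22, 27, 35, 52, 83, 24]

logs = [log(v) for v in vit_d]   # natural logarithms (assumed)
mu_hat = mean(logs)
sigma_hat = stdev(logs)          # divisor n - 1 (assumed)

# If Y is log-normal, log(Y) is normal, so range limits are compared on the log scale
fit = NormalDist(mu_hat, sigma_hat)
low, high = 30, 74               # normal range in ng/ml, as quoted above
below = fit.cdf(log(low))
within = fit.cdf(log(high)) - fit.cdf(log(low))
above = 1 - fit.cdf(log(high))

print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"below {100 * below:.0f}%, within {100 * within:.0f}%, above {100 * above:.0f}%")
```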
Weibull Distribution
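The Weibull distribution is used to analyze lifetime data.
T: Lifetime variable
• Density function:
$$f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right], \quad t > 0,\ \alpha > 0,\ \beta > 0$$
• $\alpha$: Scale parameter (0.632 quantile)
• $\beta$: Shape parameter (<1 or >1 or =1)
• CDF: $F(t) = 1 - \exp[-(t/\alpha)^{\beta}]$
• Reliability (or Survival) function: $R(t) = \exp[-(t/\alpha)^{\beta}]$
• Hazard function: $h(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1}$
• Increasing hazard rate: $h(t)$ increases with $t$ for $\beta > 1$
• Decreasing hazard rate: $h(t)$ decreases with $t$ for $\beta < 1$
• Constant hazard rate: $h(t) = 1/\alpha$ for $\beta = 1$
$$E(T) = \alpha\,\Gamma(1 + 1/\beta)$$
$$V(T) = \alpha^2\left[\Gamma(1 + 2/\beta) - \{\Gamma(1 + 1/\beta)\}^2\right]$$
• $t_p$: p quantile, which is the solution of $F(t_p) = p$
• Accordingly, $t_p = \alpha[-\log(1-p)]^{1/\beta}$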
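The Weibull quantities listed above translate directly into code. The sketch below uses the labels `alpha` (scale) and `beta` (shape); the function names and the numerical values in the example are illustrative assumptions, not taken from the slides.

```python
from math import exp, gamma, log

def weibull_cdf(t, alpha, beta):
    """F(t) = 1 - exp[-(t/alpha)^beta]."""
    return 1.0 - exp(-((t / alpha) ** beta))

def weibull_hazard(t, alpha, beta):
    """h(t) = (beta/alpha) (t/alpha)^(beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)

def weibull_quantile(p, alpha, beta):
    """t_p = alpha [-log(1 - p)]^(1/beta)."""
    return alpha * (-log(1.0 - p)) ** (1.0 / beta)

def weibull_mean(alpha, beta):
    """E(T) = alpha * Gamma(1 + 1/beta)."""
    return alpha * gamma(1.0 + 1.0 / beta)

alpha, beta = 100.0, 2.0   # illustrative values only
print("F(alpha) =", round(weibull_cdf(alpha, alpha, beta), 3))
print("median   =", round(weibull_quantile(0.5, alpha, beta), 2))
```

Evaluating the CDF at t = alpha gives 1 - exp(-1), which is why the scale parameter is described as the 0.632 quantile whatever the shape.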
Exponential distribution
• The Weibull distribution reduces to the exponential distribution when $\beta = 1$
Density function:
$$f(t) = \frac{1}{\alpha}\exp(-t/\alpha), \quad t > 0,\ \alpha > 0$$
• $\alpha$: Scale parameter (0.632 quantile)
• CDF: $F(t) = 1 - \exp(-t/\alpha)$
• Reliability (or Survival) function: $R(t) = \exp(-t/\alpha)$
• Hazard function: $h(t) = 1/\alpha$
$$E(T) = \alpha, \quad Var(T) = \alpha^2$$
• $t_p$: p quantile, which is the solution of $F(t_p) = p$
• Accordingly, $t_p = \alpha[-\log(1-p)]$
[Figures: density and hazard function plots. The red curve is the exponential density; the red line is the exponential hazard function.]
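Validity of Weibull distribution for a set of data
From the Weibull CDF we get
$$\log[-\log(1 - F(t))] = \beta\log(t) - \beta\log(\alpha),$$
that is, $Y = A + \beta X$, where
$$Y = \log[-\log(1 - F(t))], \quad X = \log(t), \quad A = -\beta\log(\alpha)$$
• Ordered lifetimes are: $t_{(1)}, t_{(2)}, \ldots, t_{(n)}$
• $Y_{(i)}$ values are obtained through the empirical CDF values as given below:
$$F(t_{(i)}) = \frac{i - 0.5}{n}, \quad i = 1, 2, \ldots, n$$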
• If the data follow a Weibull distribution with scale parameter $\alpha$ and shape parameter $\beta$, the plot of (X, Y) will roughly be linear with slope $\beta$ and will pass through the point $(\log(\alpha), 0)$.
• Accordingly, the graphical estimates of $\alpha$ and $\beta$ may be obtained.
Table 4: Specimen lives (in hours) of an electrical insulation at temperature 200 °C. Data due to Nelson (1990), p. 154
2520, 2856, 3192, 3192, 3528
Nelson, W. (1990): Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, John Wiley and Sons.
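The Weibull plot for the Table 4 data can be sketched as below. Reading the estimates from a least-squares line is an assumption made for illustration, and the labels `alpha` (scale) and `beta` (shape) are names chosen here.

```python
from math import exp, log

lives = [2520, 2856, 3192, 3192, 3528]   # hours, Table 4

def weibull_plot_points(times):
    """Return (X, Y) = (log t_(i), log[-log(1 - F)]) with F = (i - 0.5)/n."""
    ts = sorted(times)
    n = len(ts)
    return [(log(t), log(-log(1.0 - (i - 0.5) / n)))
            for i, t in enumerate(ts, start=1)]

def graphical_weibull_estimates(times):
    """Fit Y = A + beta*X by least squares; the line meets Y = 0 at X = log(alpha)."""
    pts = weibull_plot_points(times)
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    intercept = my - slope * mx
    return exp(-intercept / slope), slope   # (alpha, beta)

alpha_graph, beta_graph = graphical_weibull_estimates(lives)
print(f"graphical estimates: alpha ~ {alpha_graph:.0f} hours, beta ~ {beta_graph:.1f}")
```

These rough values are the kind of starting point a numerical likelihood maximization can use.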
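• MLE of $\alpha$ and $\beta$
• Log-likelihood function of $\alpha$ and $\beta$ based on observed data $t_1, t_2, \ldots, t_n$:
$$\log L = n\log\left(\frac{\beta}{\alpha}\right) + (\beta - 1)\sum_{i=1}^{n}\log\left(\frac{t_i}{\alpha}\right) - \sum_{i=1}^{n}\left(\frac{t_i}{\alpha}\right)^{\beta}$$
• The MLEs of $\alpha$ and $\beta$ are obtained by maximizing the log-likelihood with respect to $\alpha$ and $\beta$ using a numerical method.
• Graphical estimates may be used as the starting values required by the numerical method
• The MLEs of $\alpha$ and $\beta$ are denoted by $\hat{\alpha}$ and $\hat{\beta}$ respectively.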
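One possible numerical method is sketched below. It assumes the standard profile approach: setting the derivatives of the log-likelihood to zero gives a single equation in the shape parameter, which is solved here by bisection, after which the scale follows in closed form. The slides do not say which numerical method was used, so the function name, bracket and tolerance are illustrative choices.

```python
from math import exp, log

lives = [2520, 2856, 3192, 3192, 3528]   # hours, Table 4

def weibull_mle(times, lo=0.01, hi=200.0, tol=1e-10):
    """Return (alpha_hat, beta_hat) for complete (uncensored) lifetimes."""
    logs = [log(t) for t in times]
    n = len(logs)
    mean_log = sum(logs) / n
    lmax = max(logs)

    def score(beta):
        # Weights proportional to t_i**beta, rescaled by the largest lifetime
        # so that large beta values cannot overflow.
        w = [exp(beta * (lt - lmax)) for lt in logs]
        weighted = sum(wi * lt for wi, lt in zip(w, logs)) / sum(w)
        return weighted - 1.0 / beta - mean_log

    # score() is negative at lo and positive at hi; bisect to the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # alpha^beta = (1/n) * sum(t_i**beta), evaluated on the log scale
    w_sum = sum(exp(beta * (lt - lmax)) for lt in logs)
    alpha = exp(lmax + log(w_sum / n) / beta)
    return alpha, beta

alpha_hat, beta_hat = weibull_mle(lives)
print(f"alpha_hat = {alpha_hat:.2f} hours, beta_hat = {beta_hat:.2f}")
```

A general-purpose optimizer started from the graphical estimates would be an equally reasonable choice.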
For the insulation fluid data given in Table 4 the following results (based on MLEs) are obtained:
$\hat{\alpha} = 3208.49$ hours, S.E.$(\hat{\alpha}) = 142.56$ hours
$\hat{\beta} = 10.61$, S.E.$(\hat{\beta}) = 3.78$
Estimated median life = 3099.548 hours
• ML estimate of R(t): $\hat{R}(t) = \exp[-(t/\hat{\alpha})^{\hat{\beta}}]$
Time (hours):  3000    3500    3700    4000
Reliability:   0.6124  0.0807  0.0107  0.0000311
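The median life and the reliability table follow from the reported estimates by direct substitution. A minimal sketch, with the two MLEs copied from the results above and the function name chosen here:

```python
from math import exp, log

alpha_hat, beta_hat = 3208.49, 10.61   # MLEs reported above

def reliability(t):
    """ML estimate of R(t) = exp[-(t/alpha)^beta]."""
    return exp(-((t / alpha_hat) ** beta_hat))

# Median is the 0.5 quantile: alpha * [-log(1 - 0.5)]^(1/beta)
median_life = alpha_hat * log(2.0) ** (1.0 / beta_hat)

print(f"estimated median life: {median_life:.1f} hours")
for t in (3000, 3500, 3700, 4000):
    print(f"R({t}) = {reliability(t):.4g}")
```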
Weibull versus Exponential Model
• Suppose we want to test whether the exponential or the Weibull model should be accepted for a given set of data
• This is equivalent to testing whether the shape parameter of the Weibull distribution is unity or not, i.e. $H_0: \beta = 1$ vs $H_1: \beta \neq 1$
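• Test procedure (LR test)
• Under $H_0$ the log-likelihood function is
$$l_0 = -n\log(\alpha) - \frac{1}{\alpha}\sum_{i=1}^{n} t_i,$$
which yields $\hat{\alpha} = \frac{1}{n}\sum_{i=1}^{n} t_i$, the MLE of $\alpha$ under $H_0$.
• The maximum of $l_0$ is given by
$$\hat{l}_0 = -n\log(\hat{\alpha}) - \frac{1}{\hat{\alpha}}\sum_{i=1}^{n} t_i$$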
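• Similarly, under $H_1$ the maximum of the log-likelihood is given by
$$\hat{l}_1 = n\log\left(\frac{\hat{\beta}}{\hat{\alpha}}\right) + (\hat{\beta} - 1)\sum_{i=1}^{n}\log\left(\frac{t_i}{\hat{\alpha}}\right) - \sum_{i=1}^{n}\left(\frac{t_i}{\hat{\alpha}}\right)^{\hat{\beta}},$$
where $\hat{\alpha}$ and $\hat{\beta}$ are the MLEs of $\alpha$ and $\beta$ under $H_1$.
• The LR test implies that $2(\hat{l}_1 - \hat{l}_0)$ follows a chi-square distribution with 1 df under $H_0$.
• If $2(\hat{l}_1 - \hat{l}_0) < \chi^2_{(1-a,\,1)}$, where $a$ is the level of significance, accept (use) the exponential model
• If $2(\hat{l}_1 - \hat{l}_0) > \chi^2_{(1-a,\,1)}$, accept (use) the Weibull model
• For the insulation fluid data given in Table 4:
$$2(\hat{l}_1 - \hat{l}_0) = 2(-36.1877 + 45.1293) = 17.88 > \chi^2_{(.95,\,1)} = 3.84$$
Conclusion: The Weibull model may be accepted at the 5% level of significance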
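The likelihood-ratio comparison can be sketched for the Table 4 data as follows. The Weibull estimates are copied from the results reported earlier rather than re-estimated, and 3.841 is the standard upper 5% point of the chi-square distribution with 1 df. The computed log-likelihoods come out close to the values quoted above; small differences in later decimals can arise from rounding of the estimates.

```python
from math import log

lives = [2520, 2856, 3192, 3192, 3528]    # hours, Table 4
alpha_hat, beta_hat = 3208.49, 10.61      # Weibull MLEs reported on the slides
CHI2_95_1DF = 3.841                       # upper 5% point, chi-square with 1 df

def weibull_loglik(times, alpha, beta):
    """n log(beta/alpha) + (beta - 1) sum log(t/alpha) - sum (t/alpha)^beta."""
    n = len(times)
    return (n * log(beta / alpha)
            + (beta - 1.0) * sum(log(t / alpha) for t in times)
            - sum((t / alpha) ** beta for t in times))

# Under H0 (beta = 1) the MLE of the scale is the sample mean
alpha_h0 = sum(lives) / len(lives)
l0 = weibull_loglik(lives, alpha_h0, 1.0)
l1 = weibull_loglik(lives, alpha_hat, beta_hat)
lr = 2.0 * (l1 - l0)

print(f"l0 = {l0:.4f}, l1 = {l1:.4f}, LR statistic = {lr:.2f}")
print("use Weibull model" if lr > CHI2_95_1DF else "use exponential model")
```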
Accelerated Life Testing (ALT) for Weibull Distribution
• Stress: temperature, voltage, load, etc.
• Under the operating (used) stress level, it takes a lot of time to get a sufficient number of failures
• Lifetimes are therefore obtained under high stress levels
• Aim:
(i) To estimate the lifetime distribution under the used stress level, say $S_0$
(ii) To estimate reliability for a specified time under $S_0$
(iii) To estimate quantiles under $S_0$
Sampling scheme (under constant stress testing)
• Divide n components into k groups with numbers of components $n_1, n_2, \ldots, n_k$ respectively, where $n = \sum_{i=1}^{k} n_i$
• $n_i$ components are exposed under stress level $S_i$, $i = 1, 2, \ldots, k$
• $T_{ij}$: j-th lifetime corresponding to $S_i$
• Obtain the equation for the lifetime corresponding to the i-th group
• The equation for the lifetime corresponding to the i-th group:
$$\log[-\log(1 - F_i(t_{ij}))] = \beta_i\log(t_{ij}) - \beta_i\log(\alpha_i),$$
that is, $Y_{ij} = A_i + \beta_i X_{ij}$
• If the data corresponding to the i-th group follow $W(\alpha_i, \beta_i)$, the plot of $(X_i, Y_i)$ will roughly be linear with slope $\beta_i$ and will pass through the point $(\log(\alpha_i), 0)$
• If the k plots are linear and parallel, then the lifetimes under different stress levels are Weibull with common slope $\beta$ and different scales $\alpha_i$, which implies that $\alpha_i$ depends on the stress level $S_i$
• If the k plots are linear but not parallel, the lifetimes under different stress levels are Weibull with different slopes $\beta_i$ and different scales $\alpha_i$, which implies that both $\alpha_i$ and $\beta_i$ depend on the stress levels $S_i$. In this case modeling is difficult.
• For the first case, the relationship between life and stress has to be identified
• Plot log(0.632 quantile) against the stress levels
• If the plot yields a straight line, then the life-stress relationship will be
$$\log(\alpha) = \gamma_0 + \gamma_1 S$$
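• Estimation of $\gamma_0$, $\gamma_1$ & $\beta$ using the ML method
• Likelihood function under stress level $S_i$:
$$L_i = \prod_{j=1}^{n_i}\left[\frac{\beta}{\alpha_i}\left(\frac{t_{ij}}{\alpha_i}\right)^{\beta-1}\exp\left\{-\left(\frac{t_{ij}}{\alpha_i}\right)^{\beta}\right\}\right],$$
where $\alpha_i = \exp(\gamma_0 + \gamma_1 S_i)$
• Total log-likelihood:
$$\log L(\gamma_0, \gamma_1, \beta) = \sum_{i=1}^{k}\log(L_i)$$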
• Using a numerical method, the MLEs of $\gamma_0$, $\gamma_1$ & $\beta$ may be obtained
• The MLE of $\alpha$ at $S_0$, say $\alpha_0$, is obtained through the relationship
$$\hat{\alpha}_0 = \exp(\hat{\gamma}_0 + \hat{\gamma}_1 S_0)$$
• Hence the ML estimate of the Weibull density under the used stress level $S_0$ is obtained. Accordingly, estimates of reliability for a specified time, median life and other desired percentiles may also be obtained
Table 5: Specimen lives (in hours) of an electrical insulation at three temperatures. Data of Nelson (1990), p. 154
200 °C   225 °C   250 °C
2520     816      300
2856     912      324
3192     1296     372
3192     1392     372
3528     1488     444
Nelson, W. (1990): Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, John Wiley and Sons.
The three plots of the data given in Table 5 are roughly linear and parallel, so the lifetimes under the three stress levels are Weibull with a common slope and different scale parameters, which implies that the scale parameters depend on the stress levels.
• Arrhenius life-stress relationship (temperature stress):
$$\log(\alpha) = \gamma_0 + \gamma_1(1/W),$$
where W is the temperature in kelvin
• Temperature in kelvin = temperature in degrees centigrade plus 273.16
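The total log-likelihood for the Arrhenius-Weibull model can be written as a small function of the three parameters. The sketch below only evaluates the function for the Table 5 data; maximizing it is left to whatever numerical optimizer is preferred. The names `gamma0`, `gamma1`, `beta` and the trial values in the example are labels and numbers chosen here for illustration.

```python
from math import exp, log

KELVIN_OFFSET = 273.16   # offset stated on the slides

# Table 5: temperature in degrees centigrade -> lifetimes in hours
table5 = {
    200: [2520, 2856, 3192, 3192, 3528],
    225: [816, 912, 1296, 1392, 1488],
    250: [300, 324, 372, 372, 444],
}
# Arrhenius stress S = 1/W with W in kelvin
groups = [(1.0 / (temp_c + KELVIN_OFFSET), times) for temp_c, times in table5.items()]

def total_log_likelihood(gamma0, gamma1, beta, groups):
    """Sum over stress levels of the Weibull log-likelihood with
    scale alpha_i = exp(gamma0 + gamma1 * S_i) and common shape beta."""
    total = 0.0
    for stress, times in groups:
        alpha_i = exp(gamma0 + gamma1 * stress)
        for t in times:
            total += (log(beta / alpha_i)
                      + (beta - 1.0) * log(t / alpha_i)
                      - (t / alpha_i) ** beta)
    return total

# Evaluate at arbitrary trial values (not estimates)
print(total_log_likelihood(-14.0, 10500.0, 5.0, groups))
```

With a single group, `gamma1 = 0` and `beta = 1`, the function reduces to the exponential log-likelihood, which gives a simple check of the implementation.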
• Results based on MLEs for the data given in Table 5 with respect to the Arrhenius-Weibull model:
$\hat{\gamma}_0 = -13.39707$, $\hat{\gamma}_1 = 10596.9961$, $\hat{\beta} = 0.68566$
$-2\log\hat{L} = 269.9008$
At used stress (180 °C) the following results are obtained:
$\hat{\alpha}_0 = 21754.98$ hours and $\hat{\beta} = 0.68566$
Estimated median lifetime = 12747.08 hours
• Results based on MLEs for the data given in Table 5 with respect to the Arrhenius-Exponential model:
$\hat{\gamma}_0 = -13.17807$, $\hat{\gamma}_1 = 10596.98923$
$-2\log\hat{L} = 273.9037$
At used stress (180 °C) the following results are obtained:
$\hat{\alpha}_0 = 27080.87$ hours
Estimated median lifetime = 18771.03 hours
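The used-stress scale and median lifetime follow from the reported coefficients by substitution into the Arrhenius relationship. The sketch below only repeats that arithmetic with the estimates quoted above; it does not re-fit either model, and the function names are chosen here.

```python
from math import exp, log

KELVIN_OFFSET = 273.16   # offset stated on the slides

def scale_at(temp_c, gamma0, gamma1):
    """alpha = exp(gamma0 + gamma1 / W), with W the temperature in kelvin."""
    return exp(gamma0 + gamma1 / (temp_c + KELVIN_OFFSET))

def median_life(alpha, beta):
    """0.5 quantile of a Weibull(alpha, beta); beta = 1 gives the exponential case."""
    return alpha * log(2.0) ** (1.0 / beta)

# Reported estimates, used stress 180 degrees centigrade
alpha0_weibull = scale_at(180, -13.39707, 10596.9961)
alpha0_expon = scale_at(180, -13.17807, 10596.98923)
median_weibull = median_life(alpha0_weibull, 0.68566)
median_expon = median_life(alpha0_expon, 1.0)

print(f"Weibull:     alpha0 = {alpha0_weibull:.1f} h, median = {median_weibull:.1f} h")
print(f"Exponential: alpha0 = {alpha0_expon:.1f} h, median = {median_expon:.1f} h")
```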
• Weibull versus Exponential Model for ALT
$H_0: \beta = 1$ vs $H_1: \beta \neq 1$
$-2\log\hat{L}_0 = 273.9037$ (for the exponential model)
$-2\log\hat{L}_1 = 269.9008$ (for the Weibull model)
$$2(\log\hat{L}_1 - \log\hat{L}_0) = 4.0029 > 3.84 = \chi^2_{(.95,\,1)}$$
Conclusion: Accept the Weibull model at the 5% level of significance