Page 1: MT2004

MT2004

Olivier GIMENEZ

Telephone: 01334 461827

E-mail: [email protected]

Website: http://www.creem.st-and.ac.uk/olivier/OGimenez.html

Page 2: MT2004

9. Distributions derived from normal distributions

In the previous section, we assumed that the variance of the whole population was known

Unlikely to be the case…

So we need methods for the case where both the mean and the variance of the whole population are unknown

To develop the theory underlying such methods, we first need to introduce some other distributions that are related to the normal distribution

Namely, the $\chi^2$, t and F distributions

Page 3: MT2004

9.1 $\chi^2$ distributions

Upper quantile = value above which some specified proportion of the area of a p.d.f. lies

Page 6: MT2004

The 5% upper quantile of a $\chi^2_5$ distribution is the value $x$ such that $\Pr(\chi^2_5 \geq x) = 0.05$, or alternatively $\Pr(\chi^2_5 \leq x) = 0.95$, i.e. the lower 95% quantile

Page 8: MT2004

The value $x$ such that $\Pr(\chi^2_5 \leq x) = 0.95$ (the lower 95% quantile) is obtained using the R command:

> qchisq(0.95,5) # quantile function, i.e. inverse of the cumulative d.f.
[1] 11.07050
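To make the link between upper and lower quantiles concrete, here is a small R sketch. The variable names (`nu`, `alpha`, `lower95`, `upper5`) are illustrative and not from the slides; it only relies on the `lower.tail` argument of `qchisq` and `pchisq`.

```r
# Upper vs lower quantiles of a chi-squared distribution with 5 d.f.
nu <- 5        # degrees of freedom
alpha <- 0.05  # proportion of the area in the upper tail

lower95 <- qchisq(1 - alpha, nu)                  # lower 95% quantile
upper5  <- qchisq(alpha, nu, lower.tail = FALSE)  # upper 5% quantile

# Both calls return the same point x
print(c(lower95, upper5))

# Check: the area to the right of x is alpha
print(pchisq(lower95, nu, lower.tail = FALSE))
```

Asking for the upper tail directly with `lower.tail = FALSE` avoids having to compute `1 - alpha` by hand.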

Page 9: MT2004

Example: Suppose that X, Y, and Z are coordinates in 3-dimensional space which are independently distributed as N(0,1), with all measurements in cm. What is the probability that the point (X,Y,Z) lies more than 3 cm from the origin?
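The worked solution is not part of this transcript. A possible sketch in R, assuming the standard result that the sum of squares of 3 independent N(0,1) variables follows a $\chi^2_3$ distribution: the point lies more than 3 cm from the origin exactly when $X^2+Y^2+Z^2 > 9$. The simulation part (seed, number of draws) is purely illustrative.

```r
# Pr(X^2 + Y^2 + Z^2 > 9) where X^2 + Y^2 + Z^2 ~ chi-squared with 3 d.f.
p_exact <- pchisq(9, df = 3, lower.tail = FALSE)
print(p_exact)

# Monte Carlo check (arbitrary seed and sample size)
set.seed(1)
n_sim <- 1e5
pts <- matrix(rnorm(3 * n_sim), ncol = 3)  # each row is one point (X, Y, Z)
p_sim <- mean(rowSums(pts^2) > 9)          # proportion further than 3 cm away
print(p_sim)
```

The exact probability is a little under 3%, and the simulated proportion should be close to it.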


Page 15: MT2004

9.2 The F distributions

The 5% upper quantile of an $F_{df_1,df_2}$ distribution is the value $x$ such that $\Pr(F_{df_1,df_2} \geq x) = 0.05$

Use Tables or the R command qf(0.95,df1,df2) (lower 95% quantile)

Page 17: MT2004

So if we have a table with the upper quantiles, we can also get the lower quantiles as follows.

Remember that:

Upper quantile = value above which some specified proportion of the area of a p.d.f. lies

Lower quantile = value below which some specified proportion of the area of a p.d.f. lies

Since $1/X \sim F_{k,n}$ when $X \sim F_{n,k}$, we have $F_{n,k;1-\alpha} = 1/F_{k,n;\alpha}$

i.e. the upper $(1-\alpha)$ quantile of $F_{n,k}$ (or lower $\alpha$ quantile of $F_{n,k}$) is the inverse of the upper $\alpha$ quantile of $F_{k,n}$

Page 20: MT2004

Example: Given that $F_{3,2;0.025} = 39.17$, find $F_{2,3;0.975}$ (i.e. the lower 0.025 = 1-0.975 quantile of the $F_{2,3}$ distribution)

$F_{2,3;0.975} = 1/F_{3,2;0.025} = 1/39.17 = 0.0255$
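The reciprocal relation in this example can be checked numerically. The sketch below uses `qf` only; the variable names are illustrative.

```r
# Upper 2.5% quantile of F(3,2), i.e. F_{3,2;0.025}
upper_32 <- qf(0.025, 3, 2, lower.tail = FALSE)

# Lower 2.5% quantile of F(2,3), i.e. F_{2,3;0.975}
lower_23 <- qf(0.025, 2, 3)

# The second and third values should agree
print(c(upper_32, 1 / upper_32, lower_23))
```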

Page 21: MT2004

R commands

> x = seq(0.01,5,length=200) # grid of x values (assumed; x is not defined on the slide)
> par(mfrow=c(2,1)) # two plots, one above the other
> plot(x,df(x,2,3),xlab="",ylab="",type='l')
> title("pdf F(2,3)")
> plot(x,df(x,3,2),xlab="",ylab="",type='l')
> title("pdf F(3,2)")

Page 22: MT2004

9.3 The t distributions

The shape of the p.d.f. of $t_n$ depends on $n$

It looks like a normal distribution, but with more of the probability in the tails and less in the centre; see e.g. the graph for $t_1$ (top left)

Page 25: MT2004

$t_{n;\alpha}$ is the upper $\alpha$ quantile of the t distribution with $n$ degrees of freedom

Use tables or R, e.g. qt(0.95,8) (=1.859548) gives the lower 95% quantile of the t distribution with 8 degrees of freedom (upper 5% quantile) (qt(0.95,5000) = 1.645158…, close to the N(0,1) value)
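As a rough illustration of how the t quantiles approach the normal quantile as the degrees of freedom grow, one could run the following. The particular degrees of freedom are chosen only for illustration.

```r
# Upper 5% quantiles of t distributions with increasing degrees of freedom
dfs <- c(1, 8, 30, 5000)
t_q <- qt(0.95, dfs)
names(t_q) <- paste0("t_", dfs)
print(round(t_q, 4))

# Corresponding quantile of the standard normal distribution
z_q <- qnorm(0.95)
print(z_q)
```

The quantiles decrease with the degrees of freedom, which reflects the tails of the t distribution getting lighter.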

Page 28: MT2004

10 Using t distributions

To derive the distribution of the statistic for testing hypotheses about the mean of a normal population with unknown variance, we need a key result on the joint distribution of the sample mean and the sample variance

Page 30: MT2004

The quantity $T = (\bar{X} - \mu)/(S/\sqrt{n}) \sim t_{n-1}$ depends on the population mean $\mu$ but not on the unknown variance $\sigma^2$.

So this statistic will be useful to test hypotheses about the mean of normal populations with unknown variance

Page 31: MT2004

10.2 One-sample t-tests and confidence intervals

One sample t-tests:

39 observations on pulse rates (heart beats/minute) of Indigenous Peruvians had sample mean 70.31 and sample variance 90.219.

We assume normality.

Question: at the 1% significance level, could this data set be considered as a random sample from a population with mean 75?

In other words (Step 1 of hypothesis testing strategy):

$H_0: \mu = 75$ against $H_1: \mu \neq 75$

Your turn. Perform step 2 (find a ‘good test statistic’) and step 3 (derive its distribution)

Page 32: MT2004

One sample t-tests:

Step 1: $H_0: \mu = 75$ against $H_1: \mu \neq 75$

Step 2: $\sum X_i/n - \mu_0$ is a good candidate since it takes ‘extreme’ values if $H_1$ is true, and moderate values if $H_0$ is true.

Step 3: under $H_0$, $T = (\bar{X} - \mu_0)/(S/\sqrt{n}) \sim t_{n-1}$

Step 4: it’s a 2-sided test, so we will reject $H_0$ if

$t_{obs} \leq -t_{n-1;\alpha/2}$ or $t_{obs} \geq t_{n-1;\alpha/2}$ (graphical representation)

If one-sided test, $H_1: \mu < \mu_0$, we reject if $t_{obs} \leq -t_{n-1;\alpha}$

If one-sided test, $H_1: \mu > \mu_0$, we reject if $t_{obs} \geq t_{n-1;\alpha}$

Page 34: MT2004

One sample t-tests:

So we will reject if $t_{obs} \geq 2.7045$ or if $t_{obs} \leq -2.7045$

P-value using R:

> tobs = (70.31-75)/sqrt(90.219/39) # observed value of the test statistic
> 2*pt(tobs,38) # (tobs<0 so need to double the c.d.f. of tobs: 2-sided test)
[1] 0.003799049
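The four steps for the pulse-rate example can be put together in a few lines of R. This is only a sketch working from the summary statistics quoted on the slides; the variable names are illustrative.

```r
# One-sample t-test from summary statistics (pulse-rate example)
n     <- 39      # sample size
xbar  <- 70.31   # sample mean
s2    <- 90.219  # sample variance
mu0   <- 75      # mean under H0
alpha <- 0.01    # significance level

tobs <- (xbar - mu0) / sqrt(s2 / n)   # observed test statistic
crit <- qt(1 - alpha / 2, n - 1)      # upper alpha/2 quantile of t with n-1 d.f.
pval <- 2 * pt(-abs(tobs), n - 1)     # 2-sided p-value

reject <- abs(tobs) >= crit
print(c(tobs = tobs, crit = crit, pval = pval))
print(reject)
```

Note that `qt(0.995, 38)` gives a critical value of about 2.71, whereas the 2.7045 quoted above appears to be the tabulated value for 40 degrees of freedom; the conclusion is the same with either.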

Page 35: MT2004

Confidence interval:

We’d like to build a 99% confidence interval for $\mu$, i.e. we’re looking for the values of $\mu$ for which we would accept $H_0$

We know that:

$\Pr\left(-t_{n-1;\alpha/2} \leq \frac{\bar{X}-\mu}{S/\sqrt{n}} \leq t_{n-1;\alpha/2}\right) = 1-\alpha$

So we would accept any value of $\mu$ such that

$\bar{x} - t_{n-1;\alpha/2}\, s/\sqrt{n} \leq \mu \leq \bar{x} + t_{n-1;\alpha/2}\, s/\sqrt{n}$

75 is outside the confidence interval, so we would reject $H_0$ at the 1% significance level

Page 37: MT2004

Confidence interval:

With R, a 95% confidence interval is obtained as follows:

> cil = 70.31 - qt(0.975,38)*sqrt(90.219/39) # lower bound
> ciu = 70.31 + qt(0.975,38)*sqrt(90.219/39) # upper bound
> c(cil,ciu)
[1] 67.23099 73.38901

And the 99% confidence interval is obtained as

> c(70.31 - qt(0.995,38)*sqrt(90.219/39), 70.31 + qt(0.995,38)*sqrt(90.219/39))
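To avoid retyping the formula for each confidence level, the calculation could be wrapped in a small helper. The function `t_ci` below is hypothetical (it is not part of R or of the slides) and works from the sample mean, sample variance and sample size.

```r
# Hypothetical helper: t-based confidence interval from summary statistics
t_ci <- function(xbar, s2, n, level = 0.95) {
  alpha <- 1 - level
  half <- qt(1 - alpha / 2, n - 1) * sqrt(s2 / n)  # half-width of the interval
  c(lower = xbar - half, upper = xbar + half)
}

ci95 <- t_ci(70.31, 90.219, 39, level = 0.95)
ci99 <- t_ci(70.31, 90.219, 39, level = 0.99)
print(ci95)
print(ci99)

# Duality with the 2-sided test: is 75 outside the 99% interval?
outside99 <- 75 < ci99[["lower"]] || 75 > ci99[["upper"]]
print(outside99)
```

Since 75 falls outside the 99% interval, this agrees with rejecting $H_0: \mu = 75$ at the 1% level.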

Page 38: MT2004

10.3 Paired t-tests

Consider two samples of observations $(X_i, Y_i)$

Consider the case where the two measurements $(X_i, Y_i)$ are made on the same unit $i$

We wish to test if the two population means are equal

Example: measurement of left and right wing length of birds

Should not be treated as independent!

Obviously, length of left wing and length of right wing both tend to be large for large birds: dependent measurements

Idea: work with the differences between the two measurements on each unit, i.e. $X_i - Y_i$, in order to go back to a one-sample t-test, e.g.

Page 39: MT2004

Example: corneal thickness in microns for both eyes of patients who have glaucoma in one eye

Glaucoma  488 478 480 426 440 410 458 460

Healthy   484 478 492 444 436 398 464 476

Obviously, the corneal thickness is likely to be similar in the two eyes of any patient: dependent observations

Consider $d_i = \text{glaucoma}_i - \text{healthy}_i$. We will assume that this new random sample is drawn from a normal distribution $N(\mu_d,\sigma^2)$, and we wish to test: $H_0: \mu_d = 0$ vs $H_1: \mu_d \neq 0$

$\sum d_i = -32$; $\sum d_i^2 = 936$, $s^2 = 115.43$ and $t_{7;0.025} = 2.3646$ (see Tables)

$t_{obs} = \bar{d}/\sqrt{s^2/8} = -4/\sqrt{115.43/8} \approx -1.05$

$t_{obs} > -t_{7;0.025}$ and $t_{obs} < t_{7;0.025}$, meaning that $t_{obs}$ is in the region of acceptance of $H_0$

[Figure: p.d.f. of the t distribution with the points $-t_{\alpha/2}$ and $t_{\alpha/2}$ marked]

At the 5% significance level, we fail to reject $H_0$, so there is no evidence of a difference in corneal thickness between the good eye and the diseased eye
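The corneal-thickness example can be reproduced in R from the data on the slide. This sketch computes the statistic by hand on the differences and compares it with the built-in `t.test` with `paired = TRUE`; variable names are illustrative.

```r
# Paired t-test on the corneal thickness data (microns)
glaucoma <- c(488, 478, 480, 426, 440, 410, 458, 460)
healthy  <- c(484, 478, 492, 444, 436, 398, 464, 476)

d <- glaucoma - healthy                      # within-patient differences
tobs <- mean(d) / sqrt(var(d) / length(d))   # one-sample t statistic on d
crit <- qt(0.975, length(d) - 1)             # t_{7;0.025}

print(c(sum_d = sum(d), sum_d2 = sum(d^2), s2 = var(d), tobs = tobs, crit = crit))

# Same test with the built-in function
res <- t.test(glaucoma, healthy, paired = TRUE)
print(res$statistic)
print(res$p.value)
```

Working with the differences reduces the problem to a one-sample t-test with 7 degrees of freedom, which is what `t.test` does internally when `paired = TRUE`.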

Page 42: MT2004

10.4 Two-sample t-tests

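Now, we want to deal with two sets of data and compare, e.g., their means

We consider that the two random samples are drawn from normal distributions with unknown but equal variances.

More formally, $X_1,\dots,X_n \sim N(\mu_X,\sigma^2)$ and $Y_1,\dots,Y_m \sim N(\mu_Y,\sigma^2)$, with the two samples independent of each other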

Page 43: MT2004

We know that the distributions of the sample means of the two samples are:

$\bar{X} \sim N(\mu_X, \sigma^2/n)$ and $\bar{Y} \sim N(\mu_Y, \sigma^2/m)$

so that (using results on sums of normal r.v.’s)

$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\ \sigma^2(1/n + 1/m)\right)$

As usual, we’d like to relate this distribution to a standard normal random variable…

Page 44: MT2004

We have that:

$\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sigma\sqrt{1/n + 1/m}} \sim N(0,1)$

Obviously, if we assume that $\sigma$ is known, we can test hypotheses about the difference in means between the two groups (see the one-sample case: z-test).

But we assume that $\sigma$ is unknown. So we need to do again what we’ve done for the t-test (one-sample test about the mean with unknown variance).

Page 45: MT2004

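More precisely, we first need the distribution of the sample variances.

We note that:

$\frac{(n-1)S_X^2}{\sigma^2} \sim \chi^2_{n-1}$

where $S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$

Similarly, we have that:

$\frac{(m-1)S_Y^2}{\sigma^2} \sim \chi^2_{m-1}$

where $S_Y^2 = \frac{1}{m-1}\sum_{j=1}^{m}(Y_j - \bar{Y})^2$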

Page 47: MT2004

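Putting the two latter results together, we have that, using the additivity of $\chi^2$ r.v.’s (with $S_X^2$ and $S_Y^2$ the two sample variances):

$\frac{(n-1)S_X^2}{\sigma^2} + \frac{(m-1)S_Y^2}{\sigma^2} \sim \chi^2_{n+m-2}$

Note that the above quantity can be written as

$\frac{(n+m-2)S^2}{\sigma^2}$

where:

$S^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}$

is called the pooled sample variance.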
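As a small illustration of the pooled sample variance, the sketch below defines a hypothetical helper `pooled_var` (not a built-in R function) and applies it to the two small samples used in the example later in these slides.

```r
# Hypothetical helper: pooled sample variance of two samples
pooled_var <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # weighted average of the two sample variances, weights n-1 and m-1
  ((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2)
}

x <- c(11, 10, 14, 12, 13)
y <- c(8, 3, 4, 9)

s2 <- pooled_var(x, y)
print(s2)
```

Because $(n-1)S_X^2$ and $(m-1)S_Y^2$ are just the two sums of squared deviations, the pooled variance here is $(10 + 26)/7$.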

Page 48: MT2004

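Remember that we have:

$Z = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sigma\sqrt{1/n + 1/m}} \sim N(0,1)$

So let the test statistic T be

$T = \frac{Z}{\sqrt{\frac{(n+m-2)S^2/\sigma^2}{n+m-2}}}$

with $S^2$ the pooled sample variance, which is actually the ratio of the following distributions:

$\frac{N(0,1)}{\sqrt{\chi^2_{n+m-2}/(n+m-2)}}$

i.e. a t distribution with n+m-2 degrees of freedom!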

Page 50: MT2004

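Now we can see that T can be re-written as follows:

$T = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{S\sqrt{1/n + 1/m}}$

or:

$T = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}\left(\frac{1}{n} + \frac{1}{m}\right)}}$

The quantity T depends on the population means $\mu_X$ and $\mu_Y$ but not on the unknown variance $\sigma^2$.

This statistic is thus useful to test hypotheses about the difference in means between the 2 populations.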

Page 51: MT2004

Example: Consider two random samples from 2 normal distributions:

x = 11 10 14 12 13 and y = 8 3 4 9

Test the hypothesis that the two population means are equal against the alternative hypothesis that they are not.

We wish to test $H_0: \mu_X = \mu_Y$ against $H_1: \mu_X \neq \mu_Y$

$s^2 = (10 + 26)/7 = 36/7$, and $\sum x_i/n = 12$, $\sum y_j/m = 6$

$t_{obs} = (12 - 6)/\sqrt{(36/7)(1/5 + 1/4)} \approx 3.944 > t_{7;0.025} = 2.3646$

There is evidence to reject $H_0$ at the 5% significance level.

In other words, the data suggest that the two population means are different

Page 53: MT2004

Using R:

> x=c(11,10,14,12,13)
> y=c(8,3,4,9)
> # pooled standard deviation:
> pooledsd=sqrt(((5-1)*var(x)+(4-1)*var(y))/(5+4-2))
> # observed value of the test statistic:
> tobs=(mean(x)-mean(y))/(pooledsd*sqrt(1/5+1/4))
> tobs
[1] 3.944053
> # p-value of the 2-sided test
> 2*(1-pt(tobs,5+4-2))
[1] 0.005574311
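The same numbers can be cross-checked with R's built-in `t.test`. Note that `var.equal = TRUE` has to be set explicitly to obtain the pooled-variance test described here; the default is the Welch test, which does not assume equal variances. The object names are illustrative.

```r
x <- c(11, 10, 14, 12, 13)
y <- c(8, 3, 4, 9)

# Pooled-variance two-sample t-test (equal variances assumed)
res <- t.test(x, y, var.equal = TRUE)
print(res$statistic)   # observed t
print(res$parameter)   # degrees of freedom, n + m - 2
print(res$p.value)     # 2-sided p-value

# Default call: Welch test, with different degrees of freedom
welch <- t.test(x, y)
print(welch$parameter)
```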

Page 54: MT2004

10.5 Testing equality of variances

Motivation: to apply the two-sample t-test of Section 10.4, we need to check that the two samples come from normal distributions with the same variance

Consider $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$, two random samples drawn from normal distributions. We also assume independence.

Let $\sigma_X^2$ and $\sigma_Y^2$ be the population variances of the two random samples.

Remember the strategy of hypothesis testing:

Step 1: We wish to test $H_0: \sigma_X^2 = \sigma_Y^2$ vs $H_1: \sigma_X^2 \neq \sigma_Y^2$

Step 2: We need to find a ‘good’ test statistic, i.e. a function of the data that takes ‘extreme’ values if $H_1$ is true, and moderate values if $H_0$ is true.

Page 55: MT2004

We’ve seen that:

$\frac{(n-1)S_X^2}{\sigma_X^2} \sim \chi^2_{n-1}$ and $\frac{(m-1)S_Y^2}{\sigma_Y^2} \sim \chi^2_{m-1}$

So what about the ratio:

$\frac{S_X^2}{S_Y^2}$ ?

Page 56: MT2004

If you work it out a little bit, you get under $H_0: \sigma_X^2 = \sigma_Y^2 = \sigma^2$, the following test statistic:

$\frac{S_X^2}{S_Y^2}$

Under the null hypothesis the terms involving $\sigma^2$ cancel.

If the alternative hypothesis is true, i.e. if $\sigma_X^2 \neq \sigma_Y^2$, then the value of the test statistic above will tend to be small or large depending on whether $\sigma_X^2 < \sigma_Y^2$ or $\sigma_X^2 > \sigma_Y^2$.

Page 57: MT2004

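Step 3: Now we need the distribution of this test statistic under $H_0$.

By definition of an F distribution, we have that:

$\frac{\left[(n-1)S_X^2/\sigma^2\right]/(n-1)}{\left[(m-1)S_Y^2/\sigma^2\right]/(m-1)} \sim F_{n-1,m-1}$

that is:

$\frac{S_X^2}{S_Y^2} \sim F_{n-1,m-1}$

or

$\frac{S_Y^2}{S_X^2} \sim F_{m-1,n-1}$

using the main property of F distributions.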

Page 58: MT2004

Step 4: We will reject the null hypothesis if the observed value of this test statistic is greater than the upper quantile of the appropriate F distribution (using the Tables or program R).

Note that it is enough to compare the larger of the two test statistics described on the previous slide with the upper $\alpha/2$ quantile of the appropriate distribution.

Example: consider two samples, one of size 11 and the other of size 16, from two normal distributions. The sample variance of the first is 20 and the sample variance of the second is 30. At the 5% level, is there evidence to reject the hypothesis that the two populations have the same variance? Note that $F_{15,10;0.025} = 3.522$

Page 59: MT2004

1) We wish to test $H_0: \sigma_X^2 = \sigma_Y^2$ vs $H_1: \sigma_X^2 \neq \sigma_Y^2$, where X has sample size 11 and Y has sample size 16, with respectively $s_X^2 = 20$ and $s_Y^2 = 30$. This is a test of equality of variances.

2) To perform it, we calculate the observed value of the test statistic (the largest one): $f_{obs} = s_Y^2/s_X^2 = 30/20 = 1.5$

3) We need to compare this observed value to the 2.5% upper quantile of an F distribution with 15 and 10 degrees of freedom, i.e. $F_{15,10;0.025}$, which is equal to 3.522

Page 60: MT2004

4) $f_{obs} = 1.5 < F_{15,10;0.025} = 3.522$

5) So there is no evidence to reject the null hypothesis. We fail to reject the equality of variances.

Note: We might now consider testing whether the two population means are different.
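The steps of this example can be sketched in R from the summary statistics alone. The variable names are illustrative, and the 2-sided p-value is obtained here by doubling the upper-tail probability of the larger ratio, which is one common convention rather than something stated on the slides.

```r
# F test of equality of variances from summary statistics
n <- 11; s2x <- 20   # first sample: size and sample variance
m <- 16; s2y <- 30   # second sample: size and sample variance
alpha <- 0.05

# Use the larger of the two ratios; its numerator d.f. comes from
# the sample with the larger variance
if (s2y >= s2x) {
  fobs <- s2y / s2x; df_num <- m - 1; df_den <- n - 1
} else {
  fobs <- s2x / s2y; df_num <- n - 1; df_den <- m - 1
}

crit <- qf(1 - alpha / 2, df_num, df_den)   # F_{15,10;0.025}
pval <- min(1, 2 * pf(fobs, df_num, df_den, lower.tail = FALSE))

print(c(fobs = fobs, crit = crit, pval = pval))
print(fobs > crit)   # TRUE would mean: reject equality of variances
```

With the raw observations available (rather than only the sample variances), the built-in `var.test(x, y)` performs the same kind of test.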