introduction to hypothesis testing 2015 · introduction to hypothesis testing review: logic of...

Post on 16-Apr-2020


Introduction to hypothesis testing

Review: Logic of Hypothesis Tests

• Usually, we test (attempt to falsify) a null hypothesis (H0):
  – includes all possibilities except the prediction in the hypothesis (HA)

• If the hypothesis (HA) is that an experimental treatment has an effect:
  – the null hypothesis is that there is no effect

• Rejecting H0 = evidence that the actual hypothesis (HA) is true

Decision criterion

• How low a probability should make us reject H0?

• If probability is less than significance level (critical p-value, α), then reject H0; otherwise do not reject

• Convention sets significance level: α = 0.05 (5%)

• Arbitrary:
  – other significance levels might be valid. Context specific

Three special types of Hypothesis Tests based on the t distribution

1. The mean of a distribution is different from a constant (one sample t test)

2. The mean difference in pairs of observations is different from a constant (paired t test)

3. Two distributions differ (i.e. the means from two sets of observations do not come from the same distribution of means). Two sample t test.

t statistic

General form of t statistic:

$$t = \frac{St - \theta}{SE}$$

where St is the sample statistic, $\theta$ is the parameter value specified in H0 and SE is the standard error of the sample statistic.

Specific form for population mean:

$$t = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$

where $\mu$ is the value of the mean specified in H0.

Test statistics

• Sampling distributions of t, one for each sample size, when H0 true– use degrees of freedom (df = n - 1)

• Area under each sampling (probability) distribution equals one

• Probabilities of obtaining particular ranges of t when H0 is true


Simple null hypothesis

• Test of hypothesis that population mean equals a particular value (H0: μ = μ0)

• These values may be from literature or other research or legislation


One sample t-test

Populations are fairly stable if the ratio of births to deaths is close to 1.25.

H0: B/D ratio = 1.25
HA: B/D ratio ≠ 1.25

1) Are the B/D ratios for any of these groups = 1.25?

2) Test using a one sample t-test

[Figure: Ourworld data. Mean births-to-deaths ratio, Mean(B_To_D), on a 0 to 4 axis for each Group: Europe, Islamic, NewWorld.]


One sample t-tests

Single population: H0: μ = 0 (or any other pre-specified value: here 1.25)

df = n - 1

$$t = \frac{\bar{y} - \mu}{s_{\bar{y}}} = \frac{\bar{y} - 1.25}{s/\sqrt{n}}$$

Results

1. Box plot
2. Normal approximation
3. Histogram

[Figure: Europe. Box plot, normal approximation and histogram of the B/D ratio (axis 0.9 to 1.7) against probability (0.05 to 0.25).]

More Results

Islamic

Test Mean
Hypothesized Value   1.25
Actual Estimate      3.47825
DF                   15
Std Dev              1.17943

t Test
Test Statistic       7.5570
Prob > |t|           <.0001*
Prob > t             <.0001*
Prob < t             1.0000

[Figure: t distribution, axis -1 to 4.]

New World

Test Mean
Hypothesized Value   1.25
Actual Estimate      3.95091
DF                   20
Std Dev              1.50949

t Test
Test Statistic       8.1995
Prob > |t|           <.0001*
Prob > t             <.0001*
Prob < t             1.0000

[Figure: t distribution, axis -2 to 4.]
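As a rough check on the output above, the two t statistics can be recomputed from the reported summary values. This is a minimal sketch in Python (an assumption: the slides themselves use a statistics package, not Python). The helper name `one_sample_t` is hypothetical, and n is taken to be DF + 1.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """Return (t, df) for a one sample t test computed from summary statistics."""
    se = sd / math.sqrt(n)            # standard error of the mean
    return (mean - mu0) / se, n - 1   # t statistic and degrees of freedom

# Summary values copied from the two output tables; n = DF + 1 is an assumption.
groups = [("Islamic", 3.47825, 1.17943, 16),
          ("New World", 3.95091, 1.50949, 21)]

for name, mean, sd, n in groups:
    t, df = one_sample_t(mean, sd, n, mu0=1.25)
    print(f"{name}: t = {t:.4f}, df = {df}")
```

Both values should land close to the test statistics reported in the tables.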

Even more – a way to present the results

[Figure: Births / deaths (mean with 95% CI) on a 0 to 8 axis for each group, with the H0 value marked.]

Two sample t-test

• Used to compare two populations, each of which has been sampled

• The simplest form of tests among multiple populations

• Example: does the average annual income differ for males and females?
  – H0: income (males) = income (females)

[Figure: Survey2 data. Income (axis 0 to 25) by SEX: Female, Male.]

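H0: μ1 = μ2, i.e. μ1 - μ2 = 0

- independent observations

df = (n1 - 1) + (n2 - 1) = n1 + n2 - 2

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{s_{\bar{y}_1 - \bar{y}_2}} = \frac{\bar{y}_1 - \bar{y}_2}{s_{\bar{y}_1 - \bar{y}_2}}$$

where sp = the pooled standard deviation (more later), and

$$s_{\bar{y}_1 - \bar{y}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Calculation: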

Logic of the two sample t test

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Assume

H0: μ1 = μ2

HA: μ1 > μ2

1) If H0 is true then the null distribution is known (for a set df)

2) If HA is true, we don’t know the distribution but we do know that it is not the null distribution

[Figure: probability of t. One panel shows the central t distribution (H0 true, axis -5 to 5); the other shows a non-central t distribution (HA true, axis extended to 9).]

Assume: H0: μ1 = μ2, 4 df

[Figure: central t distribution (H0 true), probability of t on an axis from -5 to 5.]

t 0.05, 4 df = 2.14

Any t > 2.14 will lead to incorrect rejection of H0

1. This means that the difference between $\bar{y}_1$ and $\bar{y}_2$ is > 2.14 standard errors (pooled)

2. This will happen 5% of the time

Assume: HA: μ1 > μ2, 4 df

[Figure: non-central t distribution (HA true), probability of t on an axis from -5 to 9.]

t 0.05, 4 df = 2.14

Any t < 2.14 will lead to incorrect rejection of HA

1. This means that the difference between $\bar{y}_1$ and $\bar{y}_2$ is < 2.14 standard errors (pooled)

2. The probability that this will happen is dependent on n and the true difference between μ1 and μ2

Results of example

The unequal variance t-test is based on the Satterthwaite adjustment (of degrees of freedom). It is not recommended unless the variance terms are very different and the sample sizes (n) are very different.

What is the conclusion?

[Figure: two panels labelled Difference in Means (axes 0 to 70), and annual income (mean ± SE, axis 0 to 25) by SEX: Female, Male.]

Paired t-tests: the logic

1. Often there is interest in comparisons of observations that can be considered “paired” within a subject or replicate
   a) For example:
      i. A comparison of activity level before and after eating in the same individual
      ii. A comparison of longevity of males vs females, where county is the replicate

2. In such cases there is often benefit in accounting for variance that could be caused by differences among subjects (or replicates)

Paired observations: Paired t-test

H0: μd = 0

where μd is the mean difference between paired observations

$$t = \frac{\bar{d}}{s_{\bar{d}}} = \frac{\bar{d}}{s_d/\sqrt{n_d}}$$

where sd = standard deviation of the sample of differences, and df = n_d - 1, where n_d is the number of pairs

Paired t-test – example II

• Pisaster comes in two colors along the west coast: purple and orange:
  – H0: density of purple per site = density of orange
  – Individual reefs are the replicates of interest
  – Looks like a no brainer

[Figure: Sea star colors, all sites, two sample. Density (axis 0 to 1200) by COLOR: Orange, Purple.]

Results of a 2 sample test

[Figure: Density (95% CI, axis 0 to 1200) by color of sea stars (Orange, Purple), with histograms of NUMBER against count for each COLOR.]

Marginally significant. WHY?

       ¦                Standard
GROUP  ¦ N  Mean        Deviation
-------+--------------------------
Orange ¦ 7  144.71429   101.75086
Purple ¦ 7  457.28571   353.47829

Pooled Variance
Difference in Means        : -312.57143
95.00% Confidence Interval : -615.48591 to -9.65695
t                          : -2.24827
df                         : 12.00000
p-value                    : 0.04413
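The pooled-variance t statistic above can be reproduced from the group summaries alone. The following is a small sketch under stated assumptions: Python is used only for illustration, the function name `pooled_two_sample_t` is hypothetical, and the means, standard deviations and sample sizes are copied from the table.

```python
import math

def pooled_two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two sample t test with a pooled variance, from summary statistics."""
    df = (n1 - 1) + (n2 - 1)
    # Pooled variance: each sample variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, df, se

# Orange vs Purple sea star densities (values from the output table)
t, df, se = pooled_two_sample_t(144.71429, 101.75086, 7,
                                457.28571, 353.47829, 7)
print(f"t = {t:.5f}, df = {df}, SE of difference = {se:.3f}")
```

The large standard error comes mostly from the spread among sites, which is the variability the paired analysis later removes.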

Consider the variability added at the level of replicate (site)

[Figure: Density (axis 0 to 1200) by Site: Govpt, Boat, Stair, Shell Beach, Hazards, Cayucos, PSN.]

Given that observations are paired at the level of site – can this be accounted for?

[Figure: Density (axis 0 to 1200) by COLOR (Orange, Purple) with the sites identified, and Density by SITE for each COLOR (Purple, Orange).]

Paired test: Details of calculation

Site          Purple   Orange   Difference
Govpt         1023     306      717
Boat          585      155      430
Stair         476      143      333
PSN           233      142      91
Cayucos       107      31       76
Hazards       728      222      506
Shell Beach   49       14       35

mean      312.5714
SE diff   97.25882
t         3.21381
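The paired calculation can be sketched directly from the per-site counts. This is an illustrative Python version, not the software used for the slides; the function name `paired_t` is hypothetical, and the data are the seven site values listed above.

```python
import math
import statistics

# Densities per site, in the order Govpt, Boat, Stair, PSN, Cayucos, Hazards, Shell Beach
purple = [1023, 585, 476, 233, 107, 728, 49]
orange = [306, 155, 143, 142, 31, 222, 14]

def paired_t(a, b):
    """Paired t test: mean difference divided by the SE of the differences."""
    d = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(d)
    se_d = statistics.stdev(d) / math.sqrt(len(d))  # stdev uses n - 1
    return mean_d / se_d, len(d) - 1, mean_d, se_d

t, df, mean_d, se_d = paired_t(purple, orange)
print(f"mean = {mean_d:.4f}, SE diff = {se_d:.5f}, t = {t:.5f}, df = {df}")
```

Working on the differences removes the among-site variation, so the same mean difference is judged against a smaller standard error than in the two sample test.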

[Figure: ORANGE and PURPLE values (axis 0 to 1200) by index of case, with a line joining the two values for each site.]

Note slopes – are they the same? Perhaps rates are a better comparison:
1) Convert to rates, or
2) Log transform

Paired test: Details of calculation: use of log transformed data

Site          Purple(log)   Orange(log)   Difference
Govpt         3.0098756     2.4857214     0.524154
Boat          2.7671559     2.1903317     0.576824
Stair         2.677607      2.155336      0.522271
PSN           2.3673559     2.1522883     0.215068
Cayucos       2.0293838     1.4913617     0.538022
Hazards       2.8621314     2.346353      0.515778
Shell Beach   1.6901961     1.146128      0.544068

mean      0.490884
SE diff   0.046604
t         10.53299

[Figure: LORANGE and LPURPLE values (axis 1.0 to 3.5) by index of case, with a line joining the two values for each site.]

Note slopes – much more similar. Indicates that:
1) Purples are more common
   • By a constant ratio – rather than by a constant amount
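The log-transformed version of the paired test can be sketched the same way. Assumptions to note: the slides' log values match base-10 logarithms of the raw counts, so `math.log10` is used here; the variable names are hypothetical and Python is only an illustration.

```python
import math
import statistics

# Raw densities per site (Govpt, Boat, Stair, PSN, Cayucos, Hazards, Shell Beach)
purple = [1023, 585, 476, 233, 107, 728, 49]
orange = [306, 155, 143, 142, 31, 222, 14]

# Difference of base-10 logs for each site, i.e. the log of the purple:orange ratio
log_diff = [math.log10(p) - math.log10(o) for p, o in zip(purple, orange)]

mean_log_diff = statistics.mean(log_diff)
se_log_diff = statistics.stdev(log_diff) / math.sqrt(len(log_diff))
t_log = mean_log_diff / se_log_diff

print(f"mean = {mean_log_diff:.6f}, SE diff = {se_log_diff:.6f}, t = {t_log:.3f}")
```

Because a difference of logs is the log of a ratio, a roughly constant difference across sites corresponds to purple exceeding orange by a roughly constant ratio.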

Review – calculations of t for

• One sample test

$$t = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$

• Two sample test

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

• Paired test

$$t = \frac{\bar{d}}{s_d/\sqrt{n_d}}$$

Calculations of Standard Error

1) One sample t-test

$$SE = \frac{s}{\sqrt{n}}, \qquad s^2 = \frac{SS}{n - 1}$$

2) Paired t-test

$$SE = \frac{s_d}{\sqrt{n_d}}, \qquad s_d^2 = \frac{SS_d}{n_d - 1}$$

3) Two sample t-test (calculation based on pooled variance term)

$$SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p^2 = \frac{SS_1 + SS_2}{(n_1 - 1) + (n_2 - 1)} = \frac{SS_1 + SS_2}{n_1 + n_2 - 2}$$

where SS is the sum of squared deviations from the mean.

Testing statistical null hypotheses

Hypothesis construction


General Hypothesis

• A hypothesis that addresses the general question of interest

Ho: There will be no difference in the density of urchins on vertical vs horizontal surfaces

HA: There will be a difference in the density of urchins on vertical vs horizontal surface

Specific hypotheses

• A hypothesis that represents the specific question addressed in your study. The specifics include:
  – Location of study
  – Time period
  – Replication
  – Simple description of design

Specific Hypothesis

Ho: There will be no difference in the density of (species name) on vertical vs horizontal surfaces based on 10 replicate quadrats for each treatment randomly placed within site A sampled on date B

HA: There will be a difference in the density of (species name) on vertical vs horizontal surfaces based on 10 replicate quadrats for each treatment randomly placed within site A sampled on date B

Note much of this can be placed in the methods section, which would alleviate the need to state these details. However, also note that the hypotheses above are actually what are being tested

Depiction of hypotheses

[Figure: axis of Horizontal Density – Vertical Density of Urchins running from - through 0 to +. H0 (no difference) sits at 0; the likelihood that H0 is incorrect increases in both directions away from 0.]

Depiction of hypotheses: what should the units be?

• Goal
  – To use same units for all assessments – irrespective of species or system
  – To have same set of probabilities based on those units
  – Hence - units should link to estimate of confidence

• Most common form are t-values, which provide an estimate of the difference in mean values calibrated by an estimate of error in the assessment of the mean values

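T-statistic

$$SD = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \bar{X})^2}{N - 1}} \quad \text{and} \quad SE = \frac{SD}{\sqrt{N}}$$

$$T = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$

Example: X = 30, 40, 45, 37

$\bar{X}$ = 38.000
SD = 6.272 (Standard deviation)
SE = 3.136 (Standard error)
N = 4 (Number of replicates)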
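The standard deviation and standard error on this slide can be checked with a few lines of code. This sketch assumes the four example values on the slide are 30, 40, 45 and 37 (read from the garbled digits, and consistent with the reported mean); Python and the variable names are illustrative choices only.

```python
import math
import statistics

x = [30, 40, 45, 37]           # example replicates (assumed reading of the slide)

n = len(x)                     # number of replicates, N
mean_x = statistics.mean(x)    # sample mean
sd_x = statistics.stdev(x)     # standard deviation with N - 1 in the denominator
se_x = sd_x / math.sqrt(n)     # standard error of the mean

print(f"N = {n}, mean = {mean_x:.3f}, SD = {sd_x:.3f}, SE = {se_x:.3f}")
```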

Depiction of hypotheses: what should the units be?

[Figure: the same axis of Horizontal Density – Vertical Density of Urchins, now expressed in units of T = (X̄1 − X̄2)/SE from -3 to 3, with H0 at 0 and the likelihood that H0 is incorrect increasing in both directions.]

T-distribution (central t) is a null probability distribution

• Depicts the probability of obtaining each value of t when the null hypothesis is correct

• One use is to estimate confidence levels

Depiction of hypotheses

H0: There will be no difference in the density of urchins on vertical vs horizontal surfaces

[Figure: central t distribution drawn over the axis of Horizontal Density – Vertical Density of Urchins, in units of T = (X̄1 − X̄2)/SE from -3 to 3 and centred on H0; the central 95% CI is marked.]

Including error yields a confidence interval, e.g. 95% confident that the true t value is between….

HA: There will be a difference in the density of urchins on vertical vs horizontal surface

[Figure: the same distribution with the 95% CI in the centre and 2.5% in each tail (100% CI overall).]

The importance of directionality of the alternative hypothesis (HA)

Consider:

Ho: There will be no difference in the density of urchins on vertical vs horizontal surfaces

HA: There will be a difference in the density of urchins on vertical vs horizontal surfaces

vs

Ho1: Urchin density on horizontal surfaces will be greater than or equal to that on vertical surfaces

HA1: Urchins will be more dense on vertical than on horizontal surfaces

[Figure: Ho1, Urchin density on horizontal surfaces will be greater than or equal to that on vertical surfaces. t distribution over the axis of Horizontal Density – Vertical Density of Urchins (T = (X̄1 − X̄2)/SE, -3 to 3) with 5% in one tail and the 95% CI covering the rest (100% CI overall).]

[Figure: HA1, Urchins will be more dense on vertical than on horizontal surfaces. The same distribution with the 5% region in one tail.]

One vs two tailed hypotheses

HA1: Urchins will be more dense on vertical than on horizontal surfaces

[Figure: one tailed, 5% in one tail and 95% CI.]

HA: There will be a difference in the density of urchins on vertical vs horizontal surface

[Figure: two tailed, 2.5% in each tail and 95% CI.]

1. Which is more interesting?
2. Which is more informed?

One vs two tailed hypotheses

HA1: Urchins will be more dense on vertical than on horizontal surfaces

[Figure: one tailed, 5% in one tail and 95% CI, axis in units of T from -3 to 3.]

HA: There will be a difference in the density of urchins on vertical vs horizontal surface

[Figure: two tailed, 2.5% in each tail and 95% CI, axis in units of T from -3 to 3.]

1. Which is more powerful?

Example

• Replication on horizontal and vertical surfaces = 50 (100 total)

• Mean on Horizontal surfaces = 33.54

• Mean on Vertical Surfaces = 45.31

• Pooled standard deviation = 66.49

$$T = \frac{\bar{X}_h - \bar{X}_v}{SE} = \frac{33.54 - 45.32}{66.49/\sqrt{100}} = -1.79$$

One vs two tailed hypotheses

1. Which is more powerful?

HA1 (one tailed): Urchins will be more dense on vertical than on horizontal surfaces
T = -1.79, p = 0.04

[Figure: one tailed, 5% in one tail and 95% CI, axis in units of T from -3 to 3.]

HA (two tailed): There will be a difference in the density of urchins on vertical vs horizontal surface
T = -1.79, p = 0.08

[Figure: two tailed, 2.5% in each tail and 95% CI, axis in units of T from -3 to 3.]

One vs two tailed hypotheses: conversion to original units

[Figure: the same two distributions with the axis of Horizontal Density – Vertical Density of Urchins in original units: -19.5, -13.3, -6.65, 0, 6.65, 13.3, 19.5.]

HA1 (one tailed): Difference = -11.78, p = 0.04

HA (two tailed): Difference = -11.78, p = 0.08

This is the difference between 1 and 2 tailed hypotheses – make sure you know which you are dealing with

• Always strive for one tailed hypotheses

• Is there a directional prediction (e.g. > or, separately, <)?
  – One tailed

• If not
  – Two tailed

Assumptions of t test

• The t test is a parametric test

• The t statistic only follows t distribution if:
  – variable has normal distribution (normality assumption)
  – two groups have equal population variances (homogeneity of variance assumption)
  – observations are independent or specifically paired (independence assumption)

Normality assumption

• Data in each group are normally distributed

• Checks:
  – Frequency distributions – be careful
  – Boxplots
  – Probability plots
  – formal tests for normality

• Solutions:
  – Transformations
  – Don’t worry run it anyway – just kidding but not entirely

Homogeneity of variance

• Population variances equal in 2 groups

• Checks:
  – subjective comparison of sample variances
  – boxplots
  – F-ratio test of H0: $\sigma_1^2 = \sigma_2^2$

• Solutions
  – Transformations
  – Don’t worry run it anyway – just kidding again but again not entirely

F-test on variances

• H0: $\sigma_1^2 = \sigma_2^2$

• F statistic (F-ratio) = ratio of 2 sample variances
  – $F = s_1^2 / s_2^2$
  – Reject H0 if F is sufficiently < or > 1

• If H0 is true, F-ratio follows F distribution

• Usual logic of statistical test
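As an illustration of the F-ratio, the sketch below applies it to the Orange and Purple sea star standard deviations reported in the two sample output (101.75086 and 353.47829, n = 7 each). Two assumptions are mine rather than the slides': the larger variance is placed in the numerator, which is a common convention, and the function name `variance_ratio` is hypothetical. The code returns only the ratio and its degrees of freedom; the probability would still need to come from an F distribution.

```python
def variance_ratio(sd_a, n_a, sd_b, n_b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# Orange vs Purple sea star densities (standard deviations from the output table)
f_ratio, df_num, df_den = variance_ratio(101.75086, 7, 353.47829, 7)
print(f"F = {f_ratio:.2f} with {df_num} and {df_den} df")
```

A ratio this far from 1 is the kind of departure that motivates the log transformation used in the paired example.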

[Figure: Boxplot of LENGTH (axis 50 to 350), labelling the smallest value, the median, the largest value, and the 25% of values on either side of the median.]

[Figure: histogram of limpet numbers per quadrat (0 to 90) against count (0 to 70).]

[Figure: four boxplot panels: 1. IDEAL, 2. SKEWED, 3. OUTLIERS, 4. UNEQUAL VARIANCES; outliers are marked with asterisks.]

Use of transformations to control departures from normality and homogeneity of variances assumptions

Ourworld: Variance

Group            Pop_1990   Lpop1990
Europe           441        0.17
Islamic          1378       0.30
Newworld         1042       0.34
Greatest ratio   3.12 : 1   2 : 1

[Figure: boxplots of POP_1990 (axis 0 to 200) and LPOP1990 (axis -1 to 3) by GROUP: Europe, Islamic, NewWorld.]

[Figure: probability plots of Pop_1990 on a linear scale (0 to 150) and on a log scale (0.2 to 200).]

Nonparametric tests

• Usually based on ranks of the data

• H0: samples come from populations with identical distributions
  – equal means or medians

• Don’t assume particular underlying distribution of data
  – normal distributions not necessary

• Equal variances and independence still required

• Typically less powerful than parametric tests when the parametric assumptions are met

Mann-Whitney-Wilcoxon test

• Calculates sum of ranks in 2 samples
  – should be similar if H0 is true

• Compares rank sum to sampling distribution of rank sums
  – distribution of rank sums when H0 true

• Equivalent to t test on data transformed to ranks
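The rank-sum step can be sketched as follows. The data are the inside and outside counts from the re-sampling example in the additional slides, used here only for illustration; the helper `midranks` is hypothetical and assigns tied values the average of their ranks, which is a common convention rather than something stated in the slides. The code stops at the rank sums and does not compute a probability.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]

ranks = midranks(inside + outside)          # rank all observations together
rank_sum_inside = sum(ranks[:len(inside)])
rank_sum_outside = sum(ranks[len(inside):])
print(rank_sum_inside, rank_sum_outside)
```

With equal sample sizes, similar rank sums are what H0 predicts; a large gap between them is evidence against it.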

Additional slides

A brief digression to re-sampling theory

Number inside   Number outside
3               10
5               7
2               9
8               12
7               8
Mean 5          Mean 9.2

Traditional evaluation would probably involve a t test: another approach is re-sampling.

Resampling

Treatment   Number
Inside      3
Inside      5
Inside      2
Inside      8
Inside      7
Outside     10
Outside     7
Outside     9
Outside     12
Outside     8

1) Assume both treatments come from the same distribution

2) Resample groups of 5 observations, with replacement, but irrespective of treatment

3) Calculate mean for each group (e.g. 7.6)

4) Repeat many times

5) Calculate differences between pairs of means (remember the null hypothesis is that there is no effect of treatment). This generates a distribution of differences.

Mean 1   Mean 2   Difference
8        7.8      0.2
5.6      8.2      -2.6
6        9        -3
8        5        3
6        6        0
7        8        -1
6        6.8      -0.8
8        7.2      0.8
8        6.6      1.4
7        8.4      -1.4
6        5.4      0.6
7        6.4      0.6
6.4      6.8      -0.4
5        3.4      1.6
6.8      4.8      2
6.4      7.2      -0.8
7.2      8        -0.8
6.4      4.6      1.8
8.4      6        2.4
7.4      6.6      0.8
5.6      8.4      -2.8
8.2      6.2      2
7.8      8.4      -0.6
8.6      6.6      2
6        10.2     -4.2
6.8      5.6      1.2
6.4      7.8      -1.4
7.2      4.8      2.4
6.6      7.2      -0.6
7        5.2      1.8
6.6      9.8      -3.2
8.4      7.8      0.6

[Figure: Distribution of differences. Histogram of the difference in means (axis -10 to 10) for 1000 observations, showing proportion per bar (0.0 to 0.2) and number of observations (0 to 250).]

OK, now what?

Compare distribution of differences to real difference

Number inside   Number outside
3               10
5               7
2               9
8               12
7               8
Mean 5          Mean 9.2

Real difference = 4.2

Estimate likelihood that real difference comes from two similar distributions

Mean 1   Mean 2   Difference   Proportion of differences less than current
10.2     3.6      6.6          1
10       3.8      6.2          0.999
10.2     4.4      5.8          0.998
9.2      3.6      5.6          0.997
9.8      4.8      5            0.996
8.8      4.2      4.6          0.995
9.6      5.2      4.4          0.994
9.8      5.6      4.2          0.993
9.8      5.8      4            0.992
9.4      5.4      4            0.991

And on through 1000 differences

Estimated probability of a difference larger than the real one when the distributions are the same: 0.007 (1 - 0.993)

What are constraints of this sort of approach?
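The resampling steps can be sketched in a few lines. This is an illustrative Python version under stated assumptions, not the procedure used to produce the slides: the function name `resample_p_value`, the seed, and the use of ">=" (with a small tolerance for floating-point ties) when counting resampled differences are my choices. Because the draws are random, the estimated probability will vary from run to run and should only be near the slide's value, not equal to it.

```python
import random
import statistics

def resample_p_value(group_a, group_b, n_iter=1000, seed=1):
    """Estimate P(resampled difference >= observed) under a common distribution."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)      # step 1: ignore treatment labels
    observed = statistics.mean(group_b) - statistics.mean(group_a)
    count = 0
    for _ in range(n_iter):                     # step 4: repeat many times
        # steps 2 and 3: draw two groups with replacement and take their means
        mean_1 = statistics.mean(rng.choices(pooled, k=len(group_b)))
        mean_2 = statistics.mean(rng.choices(pooled, k=len(group_a)))
        # step 5: compare the resampled difference with the real difference
        if mean_1 - mean_2 >= observed - 1e-9:
            count += 1
    return observed, count / n_iter

inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]
observed, p = resample_p_value(inside, outside, n_iter=2000)
print(f"real difference = {observed:.1f}, estimated one-tailed p = {p:.4f}")
```

One constraint is visible in the code itself: with only ten observations the pool of possible resampled means is small and coarse, so the estimate depends on both the sample and the number of iterations.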

T-test vs resampling

Test         P-value
Resampling   0.007
T-test       0.0093

Why the difference?
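For comparison, the t statistic behind the t-test row can be computed from the same ten counts using the pooled sum-of-squares form from the review slide. This is a sketch with hypothetical names (`sum_of_squares`, `pooled_t`); it returns t and df only, since the standard library has no t distribution. The slide's P-value of 0.0093 appears to correspond to a one-tailed probability for this t with 8 df, which is my reading rather than something the slide states.

```python
import math
import statistics

inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]

def sum_of_squares(values):
    """Sum of squared deviations from the sample mean (SS)."""
    m = statistics.mean(values)
    return sum((v - m) ** 2 for v in values)

def pooled_t(a, b):
    """Two sample t with sp^2 = (SS1 + SS2) / (n1 + n2 - 2)."""
    df = len(a) + len(b) - 2
    pooled_sd = math.sqrt((sum_of_squares(a) + sum_of_squares(b)) / df)
    se = pooled_sd * math.sqrt(1 / len(a) + 1 / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se, df

t, df = pooled_t(outside, inside)
print(f"t = {t:.3f}, df = {df}")
```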

Additional examples

Worked example

• Fecundity of predatory gastropods:
  – sample of 37 and 42 egg capsules of Lepsiella from littorinid zone and mussel zone respectively

• Counted number of eggs per capsule

• Null hypothesis:
  – no difference between zones in mean number of eggs per capsule

• Ward & Quinn (1988), qk2002 Box 3.1

• Specify H0 and choose test statistic:

H0: μM = μL, i.e. population mean number of eggs per capsule from both zones are equal

The t statistic is the appropriate test statistic for comparing 2 population means

• Specify a priori significance (probability) level (α):

By convention, use α = 0.05 (5%).

• Collect data, check assumptions, calculate test statistic from sample data:

              Mean    SD     n
Littorinid:   8.70    3.03   37
Mussel:       11.36   2.33   42

t = -5.39, df = 77

• Compare value of t statistic to its sampling distribution, the probability distribution of statistic (for specific df) when H0 is true
  – what is probability of obtaining t value of 5.39 or greater from a t distribution with 77 df?
  – what is probability of taking samples with observed or greater mean difference from 2 populations with same means?

• Probability (from JMP)

P < 0.001

• Look up in t table

P < 0.05

• If probability of obtaining this value or larger is less than α, conclude H0 is “unlikely” to be true and reject it:
  – statistically significant result

• Our probability (<0.001) is less than 0.05 so reject H0:
  – statistically significant result.

• If probability of obtaining this value or larger is greater than α, conclude that H0 is “likely” to be true and do not reject it:
  – statistically non-significant result

Presenting results of t test

• Methods:
  – An independent t test was used to compare the mean number of eggs per capsule from the two zones. Assumptions were checked with….

• Results:
  – The mean number of eggs per capsule from the mussel zone was significantly greater than that from the littorinid zone (t = 5.39, df = 77, P < 0.001; see Fig. 2).