statistical testing i - uni-kiel.de


Post on 16-Oct-2021


Page 1: Statistical Testing I - uni-kiel.de

De gustibus non est disputandum

Statistical Testing I

Page 2: Statistical Testing I - uni-kiel.de

The Pepsi Challenge

"Take the Pepsi Challenge" was the motto of a marketing campaign by the Pepsi-Cola Company in the 1980s. A total of 100 Coca-Cola drinkers were asked to blindly taste unmarked cups of Diet Pepsi and Diet Coke, and to select their favorite. A subsequent Pepsi TV commercial stated:

"... in recent blind taste tests, more than half of all Diet Coke drinkers surveyed said they preferred the taste of Diet Pepsi".

Assume that, out of the 100 Diet Coke drinkers, 56 preferred Diet Pepsi. Would this result support the claim that more than half of all Diet Coke drinkers prefer Diet Pepsi to Diet Coke?

Page 3: Statistical Testing I - uni-kiel.de

"Scientific Method"

"The validity of knowledge is tied to the probability of falsification."

"Scientific propositions can be falsified empirically. On the other hand, unscientific claims are always 'right' and cannot be falsified at all."

Karl Popper (1902-1994)

Page 4: Statistical Testing I - uni-kiel.de

Statistical Testing: New Knowledge Through Falsification

[Diagram: current knowledge (H0) leads, through falsification, to new knowledge (HA).]

Page 5: Statistical Testing I - uni-kiel.de

Decision Making

- Scientific questions are often formulated in the form of mutually exclusive hypotheses (i.e. H0 versus HA) about one or more population parameters.

- A statistical test is a decision rule that allows a researcher to either reject H0 ("statistically significant result") or maintain H0 on the basis of sample data.

Page 6: Statistical Testing I - uni-kiel.de

Statistical Testing: Null Hypothesis

The null hypothesis usually implies the opposite of what a researcher expects (or wishes) to be true. It often represents conservatism or common opinion.

H0: The expected diastolic blood pressure of patients with a particular disease equals that of control individuals.

Page 7: Statistical Testing I - uni-kiel.de

Statistical Testing: Alternative Hypothesis

The alternative hypothesis usually implies what a researcher expects (or wishes) to be true.

The alternative hypothesis is regarded as established when the null hypothesis is rejected.

HA: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.

Page 8: Statistical Testing I - uni-kiel.de

Blood Pressure and Myocardial Infarction

A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg.

H0: µ=µ0   HA: µ≠µ0

Page 9: Statistical Testing I - uni-kiel.de

Statistical Testing: Procedure

- All information from the sample data is collapsed in a single numerical quantity, called the test statistic (T).

- The maintenance region of the test comprises all values of T for which H0 is maintained.

- The rejection region comprises all values of T for which H0 is rejected.

- The maintenance and rejection regions are demarcated by the critical values.

Page 10: Statistical Testing I - uni-kiel.de

Statistical Testing: Procedure

[Figure: distribution of T under H0, with a central maintenance region bounded by two critical values and a rejection region in each tail.]

- T in maintenance region: maintain H0

- T in rejection region: reject H0

Page 11: Statistical Testing I - uni-kiel.de

Statistical Testing: Possible Errors

decision       truth: H0       truth: HA
maintain H0    correct         type II error
reject H0      type I error    correct

A type I error is made when H0 is rejected although it is true.

A type II error is made when H0 is maintained although it is wrong.

Page 12: Statistical Testing I - uni-kiel.de

Statistical Testing: Significance Level

- A statistical test has significance level α if the probability of making a type I error is at most α.

- Before data collection, the critical values of a test are chosen such that the test has a pre-specified significance level (e.g. 0.05).

- The choice of critical values depends upon the pre-specified significance level and the nature of H0, but not the nature of HA.

Page 13: Statistical Testing I - uni-kiel.de

Blood Pressure and Myocardial Infarction

The significance level of a test of H0 versus HA limits the probability of erroneously claiming a difference between the expected DBP of MI patients and a reference value.

H0: µ=µ0   HA: µ≠µ0

Page 14: Statistical Testing I - uni-kiel.de

Statistical Testing: Critical Values

[Figure: density of T under H0 with critical values c_{α/2} and c_{1-α/2}, each cutting off a tail of probability α/2.]

Page 15: Statistical Testing I - uni-kiel.de

One-sample t-Test: Procedure

Random Variable: X∼N(µ,σ²), both parameters unknown

Hypotheses: H0: µ=µ0   HA: µ≠µ0

Test Statistic: $T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$

Rejection Region: T ≤ t_{α/2,n-1} or T ≥ t_{1-α/2,n-1} = -t_{α/2,n-1}, where n-1 is the number of 'degrees of freedom' (ν)

Page 16: Statistical Testing I - uni-kiel.de

Blood Pressure and Myocardial Infarction

A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg. The following DBP values were observed in 9 patients with MI:

92, 87, 79, 87, 99, 82, 74, 83, 103

x̄ = 87.33 mmHg, s = 9.34 mmHg

t = 2.354 ≥ t_{0.975,8} = 2.306, so H0 is rejected at significance level 0.05.
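The calculation on this slide can be retraced in a few lines. The following is a minimal sketch using only the Python standard library; the critical value 2.306 is copied from the slide rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# DBP values (mmHg) of the 9 MI patients, as listed on the slide.
dbp = [92, 87, 79, 87, 99, 82, 74, 83, 103]
mu0 = 80.0      # expected DBP of control individuals under H0 (mmHg)
t_crit = 2.306  # t_{0.975,8}, copied from the slide (not computed here)

n = len(dbp)
mean = statistics.mean(dbp)  # sample mean
sd = statistics.stdev(dbp)   # sample standard deviation (divisor n - 1)
t_obs = (mean - mu0) / (sd / math.sqrt(n))

# Two-sided rejection region: |T| >= t_{0.975, n-1}
reject_h0 = abs(t_obs) >= t_crit

print(f"mean = {mean:.2f} mmHg, s = {sd:.2f} mmHg, t = {t_obs:.3f}")
print("reject H0" if reject_h0 else "maintain H0")
```

Without intermediate rounding, the statistic comes out at about 2.355; the slide's 2.354 is consistent with using the rounded values 87.33 and 9.34.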

Page 17: Statistical Testing I - uni-kiel.de

t-Distribution: Quantiles

[Table: quantiles of the t-distribution]

Page 18: Statistical Testing I - uni-kiel.de

Statistical Testing: Power

- The probability of making a type II error (i.e. to adhere to H0 if, in fact, HA is true) is designated as β.

- The converse probability 1-β, i.e. the probability of avoiding a type II error, is called the power of a test.

- The power of a statistical test depends upon the nature of HA, but not the nature of H0.

Page 19: Statistical Testing I - uni-kiel.de

Statistical Testing: Error Probabilities

decision       truth: H0    truth: HA
maintain H0    ≥ 1-α        β
reject H0      ≤ α          1-β

Page 20: Statistical Testing I - uni-kiel.de

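Statistical Testing: Critical Values

[Figure: densities of T under H0 and HA, with critical values c_{α/2} and c_{1-α/2}; each tail under H0 has area α/2, and β is marked under the HA density.]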

Page 21: Statistical Testing I - uni-kiel.de

Blood Pressure and Myocardial Infarction

H0: µ=80   HA: µ≠80   (σ=10 mmHg, α=0.05)

      µ          Pµ(T≤-2.306 or T≥2.306)
H0    80         0.050   (α)
HA    81 (79)    0.058   (1-β)
HA    85 (75)    0.262   (1-β)
HA    90 (70)    0.748   (1-β)
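The rejection probabilities in this table can be checked approximately by simulation. The sketch below is a Monte Carlo estimate, not the method used for the slide. It assumes n = 9 observations (inferred from the critical value 2.306 = t_{0.975,8}), normally distributed data with σ = 10 mmHg, and an arbitrary seed and number of simulated samples.

```python
import math
import random

def simulated_power(mu, mu0=80.0, sigma=10.0, n=9, t_crit=2.306,
                    n_sim=20000, seed=1):
    """Estimate P_mu(|T| >= t_crit) for the two-sided one-sample t-test."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    rejections = 0
    for _ in range(n_sim):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t = (mean - mu0) / (s / math.sqrt(n))
        if abs(t) >= t_crit:
            rejections += 1
    return rejections / n_sim

# True means taken from the table; mu = 80 corresponds to H0.
power = {mu: simulated_power(mu) for mu in (80, 81, 85, 90)}
for mu, value in power.items():
    print(f"mu = {mu}: P(|T| >= 2.306) is about {value:.3f}")
```

Because the estimates carry simulation error, they should agree with the tabulated values only to roughly two decimal places. By symmetry, the values for µ = 79, 75 and 70 equal those for 81, 85 and 90.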

Page 22: Statistical Testing I - uni-kiel.de

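Statistical Testing: Effect Size and Power

[Figure: densities of T under H0 and HA, with critical values c_{α/2} and c_{1-α/2}, tail areas α/2 under H0, and β marked under the HA density.]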

Page 23: Statistical Testing I - uni-kiel.de

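Statistical Testing: Significance and Power

[Figure: densities of T under H0 and HA for a different significance level α', with critical values c_{α'/2} and c_{1-α'/2}, tail areas α'/2 under H0, and the resulting β' marked under the HA density.]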

Page 24: Statistical Testing I - uni-kiel.de

t-Distribution: Quantiles

[Table: quantiles of the t-distribution]

Page 25: Statistical Testing I - uni-kiel.de

Blood Pressure and Myocardial Infarction

H0: µ=80   HA: µ≠80   (σ=10 mmHg)

      µ          α=0.05    α=0.02: Pµ(T≤-2.896 or T≥2.896)
H0    80         0.050     0.020
HA    81 (79)    0.058     0.024   (1-β)
HA    85 (75)    0.262     0.143   (1-β)
HA    90 (70)    0.748     0.566   (1-β)

Page 26: Statistical Testing I - uni-kiel.de

Alternative Hypotheses: Two-Sided

A two-sided alternative hypothesis does not specify a direction of the expected findings and usually

- reflects a lack of prior knowledge about realistic alternatives to the null hypothesis

- reads "is different from" or "deviates from"

HA: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.

Page 27: Statistical Testing I - uni-kiel.de

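Alternative Hypotheses: Two-Sided

[Figure: densities of T under H0 and HA, with critical values c_{α/2} and c_{1-α/2}, tail areas α/2 under H0, and β; a second possible alternative is marked "HA (?)".]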

Page 28: Statistical Testing I - uni-kiel.de

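Alternative Hypotheses: One-Sided

[Figure: densities of T under H0 and HA with a single critical value c_{1-α}; the tail beyond it has area α under H0, and β is marked under the HA density.]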

Page 29: Statistical Testing I - uni-kiel.de

Alternative Hypotheses: One-Sided

A one-sided alternative hypothesis specifies the direction of the expected findings and usually

- reflects common sense or suitable knowledge from previous scientific experiments

- reads "is larger than", "is heavier than" or "is longer than"

HA: The expected diastolic blood pressure of patients with a particular disease exceeds that of control individuals.

Page 30: Statistical Testing I - uni-kiel.de

Clinical Studies

In a clinical study, researchers often wish to compare the respective probability of therapeutic success between a new medication (πM) and placebo (πP).

H0: πM≤πP   HA: πM>πP

significance level: upper limit for the probability of declaring a useless medication effective

power: probability of recognising an effective medication as effective

Page 31: Statistical Testing I - uni-kiel.de

One-sample t-Test: One-Sided

Random Variable: X∼N(µ,σ²), both parameters unknown

Hypotheses: H0: µ≥µ0, HA: µ<µ0   or   H0: µ≤µ0, HA: µ>µ0

Test Statistic: $T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$

Rejection Region: T ≤ t_{α,n-1}   or   T ≥ t_{1-α,n-1}, respectively

Page 32: Statistical Testing I - uni-kiel.de

t-Distribution: Quantiles

[Table: quantiles of the t-distribution]

Page 33: Statistical Testing I - uni-kiel.de

Blood Pressure and Myocardial Infarction

H0: µ≤80   HA: µ>80   (σ=10 mmHg, α=0.05)

      µ     Pµ(|T|≥2.306)     Pµ(T≥1.860)
H0    80                      0.050
H0    75                      0.005
HA    85    0.262 (1-β)       0.392
HA    90    0.748 (1-β)       0.862

Page 34: Statistical Testing I - uni-kiel.de

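One-sample t-Test: Sample Size

Which sample size, n, is required to detect, at significance level α, a given effect µ-µ0 with power 1-β?

one-sided:

$$n \ge \left( \sigma \cdot \frac{z_{1-\alpha} + z_{1-\beta}}{\mu - \mu_0} \right)^2$$

two-sided:

$$n \ge \left( \sigma \cdot \frac{z_{1-\alpha/2} + z_{1-\beta}}{\mu - \mu_0} \right)^2$$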
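The two sample size formulas translate directly into code. A minimal sketch, assuming the normal quantiles z come from `statistics.NormalDist` in the Python standard library; the function name `required_n` and the example effect of 5 mmHg are illustrative and not taken from the slides.

```python
import math
from statistics import NormalDist

def required_n(effect, sigma, alpha=0.05, power=0.80, two_sided=False):
    """Smallest integer n satisfying n >= (sigma * (z_a + z_b) / effect)^2."""
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)         # z_{1-beta}, because power = 1 - beta
    return math.ceil((sigma * (z_alpha + z_beta) / effect) ** 2)

# Hypothetical example: effect mu - mu0 = 5 mmHg, sigma = 10 mmHg.
n_one_sided = required_n(effect=5, sigma=10)
n_two_sided = required_n(effect=5, sigma=10, two_sided=True)
print(n_one_sided, n_two_sided)
```

With these inputs the formulas give 25 patients for the one-sided test and 32 for the two-sided test, which illustrates that a two-sided alternative requires a larger sample for the same effect and power.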

Page 35: Statistical Testing I - uni-kiel.de

One-sample t-Test: Sample Size (one-sided)

[Figure: required sample size n (axis from 10 to 1000) plotted against the effect µ - µ0 (1 to 5) for σ = 10, α = 0.05 and power 1-β = 0.90, 0.80, 0.70.]

Page 36: Statistical Testing I - uni-kiel.de

One-sample t-Test: Sample Size (two-sided)

[Figure: required sample size n (axis from 10 to 1000) plotted against the effect µ - µ0 (1 to 5) for σ = 10, α = 0.05 and power 1-β = 0.90, 0.80, 0.70.]

Page 37: Statistical Testing I - uni-kiel.de

The Pepsi Challenge

H0: Pepsi does not taste better than Coke (π≤0.5).
HA: Pepsi tastes better than Coke (π>0.5).

$$P(T \ge 59) = \sum_{i=59}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.044$$

$$P(T \ge 58) = \sum_{i=58}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.067$$

c0.05 = 59

Conclusion: The number of Diet Coke drinkers who preferred Diet Pepsi (i.e. 56) was not significantly higher than the number who preferred Diet Coke (i.e. 44).
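The critical value for the Pepsi Challenge can be found by evaluating the binomial tail sums directly. A minimal sketch with the Python standard library; the helper name `upper_tail` is illustrative.

```python
from math import comb

def upper_tail(c, n=100, p=0.5):
    """P(T >= c) for T ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(c, n + 1))

alpha = 0.05
# Smallest c whose upper tail probability under pi = 0.5 does not exceed alpha.
critical = next(c for c in range(101) if upper_tail(c) <= alpha)

print(critical, round(upper_tail(59), 3), round(upper_tail(58), 3))
print("reject H0" if 56 >= critical else "maintain H0")
```

The search stops at 59 because P(T ≥ 58) still exceeds 0.05, so the observed count of 56 lies in the maintenance region.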

Page 38: Statistical Testing I - uni-kiel.de

Statistics and Truth

"No test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis."

Neyman J, Pearson E (1933) Phil Trans R Soc A, 231:289-337

Jerzy Neyman (1894-1981)

Egon Pearson (1895-1980)

Page 39: Statistical Testing I - uni-kiel.de

Statistics and Truth

"It would, therefore, add greatly to the clarity with which the tests of significance are regarded if it were generally understood that the tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as they are contradicted by the data: but that they are never capable of establishing them as certainly true."

Ronald A. Fisher (1890-1962)

Page 40: Statistical Testing I - uni-kiel.de

p Value

[Figure: density of T under H0; the p value is the area beyond the observed value tobs.]

The p value is the probability, given that the null hypothesis is correct, of obtaining the observed value tobs of T or an even less probable one.

Page 41: Statistical Testing I - uni-kiel.de

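p Value: Evidence Against H0

[Figure: scale of p values (1.0, 0.1, 0.01, 0.001, 0.0001) set against the evidence they provide against H0, graded from none through "moderate" and "strong" to "very strong" as p decreases.]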

Page 42: Statistical Testing I - uni-kiel.de

Blood Pressure and Myocardial Infarction

H0: µ=80   HA: µ≠80:   p = P(|T|>2.354) = 0.046

H0: µ≤80   HA: µ>80:   p = P(T>2.354) = 0.023

The Pepsi Challenge

H0: π≤0.5   HA: π>0.5:

$$p = P(X \ge 56) = \sum_{i=56}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.1356$$
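The three p values on this slide can be reproduced numerically. The sketch below uses only the Python standard library; since that library has no t-distribution, the tail probability is obtained by integrating the t density with Simpson's rule, which is an implementation choice made here and not something the slides prescribe. The degrees of freedom (8) follow from the 9 patients in the example.

```python
from math import comb, gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: Simpson's rule on [0, t], then symmetry."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

# Blood pressure example: observed t = 2.354 with 8 degrees of freedom.
p_one_sided = t_upper_tail(2.354, df=8)  # HA: mu > 80
p_two_sided = 2 * p_one_sided            # HA: mu != 80

# Pepsi Challenge: P(X >= 56) for X ~ Binomial(100, 0.5).
p_pepsi = sum(comb(100, i) for i in range(56, 101)) / 2**100

print(f"{p_one_sided:.3f} {p_two_sided:.3f} {p_pepsi:.4f}")
```

The two-sided p value is twice the one-sided one because the t-distribution is symmetric about zero. This is why the same data give p = 0.046 against the two-sided alternative and p = 0.023 against the one-sided one.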

Page 43: Statistical Testing I - uni-kiel.de

Pravastatin and Cardiovascular Disease

major cardiovascular outcome      placebo (n=2078)    Pravastatin (n=2081)    p
non-fatal MI or death from CHD    0.132               0.102                   0.003
CABG or PTCA                      0.188               0.141                   <0.001
Stroke                            0.038               0.026                   0.030

CABG: coronary artery bypass grafting, PTCA: percutaneous transluminal coronary angioplasty

Sacks FM et al. (1996) N Engl J Med 335: 1001-1009

Page 44: Statistical Testing I - uni-kiel.de

Negative Findings

Negative findings are as important as positive findings because they reduce ignorance and may suggest interesting new hypotheses and lines of investigation. They are also necessary to guide future research in the field of interest (publication bias).

Page 45: Statistical Testing I - uni-kiel.de

Summary

- Statistical problems are usually defined as mutually exclusive hypotheses about population parameters.

- Statistical tests are decision rules to either maintain or reject a given null hypothesis on the basis of sample data.

- When performing a statistical test, two types of error can occur through falsely rejecting either the null hypothesis or the alternative hypothesis.

- The probability of making a type I error is limited by the significance level of the test; the probability of avoiding a type II error is called the power of the test.

- The p value is a measure of the discrepancy between the data and the null hypothesis.