statistical testing i - uni-kiel.de

De gustibus non est disputandum ("There is no disputing about taste")

Statistical Testing I

"Take the Pepsi Challenge" was the motto of a marketing campaign by the Pepsi-Cola Company in the 1980s. A total of 100 Coca-Cola drinkers were asked to blindly taste unmarked cups of Diet Pepsi and Diet Coke, and to select their favorite. A subsequent Pepsi TV commercial stated

The Pepsi Challenge

"... in recent blind taste tests, more than half of all Diet Coke drinkers surveyed said they preferred the taste of Diet Pepsi".

Assume that, out of the 100 Diet Coke drinkers, 56 preferred Diet Pepsi. Would this result support the claim that more than half of all Diet Coke drinkers prefer Diet Pepsi to Diet Coke?

"Scientific Method"

"The validity of knowledge is tied to the probability of falsification."

"Scientific propositions can be falsified empirically. On the other hand, unscientific claims are always 'right' and cannot be falsified at all."

Karl Popper (1902-1994)

Statistical Testing

New Knowledge Through Falsification

[Diagram: current knowledge (H0) leads, through falsification, to new knowledge (HA).]

Decision Making

- Scientific questions are often formulated in the form of mutually exclusive hypotheses (i.e. H0 versus HA) about one or more population parameters.

- A statistical test is a decision rule that allows a researcher to either reject H0 ("statistically significant result") or maintain H0 on the basis of sample data.

Statistical Testing: Null Hypothesis

The null hypothesis usually implies the opposite of what a researcher expects (or wishes) to be true. It often represents conservatism or common opinion.

H0: The expected diastolic blood pressure of patients with a particular disease equals that of control individuals.

Statistical Testing: Alternative Hypothesis

The alternative hypothesis usually implies what a researcher expects (or wishes) to be true.

The alternative hypothesis is regarded as established when the null hypothesis is rejected.

HA: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.

Blood Pressure and Myocardial Infarction

A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg.

H0: µ=µ0 HA: µ≠µ0

- All information from the sample data is collapsed in a single numerical quantity, called the test statistic (T).

- The maintenance region of the test comprises all values of T for which H0 is maintained.

- The rejection region comprises all values of T for which H0 is rejected.

- The maintenance and rejection regions are demarcated by the critical values.

Statistical Testing: Procedure

[Figure: axis of the test statistic T, with a central maintenance region flanked by two rejection regions; the boundaries are the critical values.]

[Diagram: under H0, if T falls in the maintenance region, maintain H0; if T falls in the rejection region, reject H0.]

Statistical Testing: Possible Errors

decision \ truth     H0              HA
maintain H0          correct         type II error
reject H0            type I error    correct

A type I error is made when H0 is rejected although it is true.

A type II error is made when H0 is maintained although it is wrong.

Significance Level

- A statistical test has significance level α if the probability of making a type I error is at most α.

- Before data collection, the critical values of a test are chosen such that the test has a pre-specified significance level (e.g. 0.05).

- The choice of critical values depends upon the pre-specified significance level and the nature of H0, but not the nature of HA.

Statistical Testing

The significance level of a test of H0 versus HA limits the probability of erroneously claiming a difference between the expected DBP of MI patients and a reference value.

H0: µ=µ0 HA: µ≠µ0


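Statistical Testing: Critical Values

[Figure: distribution of T under H0 with critical values cα/2 and c1-α/2; each tail beyond a critical value has probability α/2.]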

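One-sample t-Test: Procedure

Random variable: $X \sim N(\mu, \sigma^2)$, both parameters unknown

Hypotheses: $H_0: \mu = \mu_0$ versus $H_A: \mu \ne \mu_0$

Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

Rejection region: $T \le t_{\alpha/2,\,n-1}$ or $T \ge t_{1-\alpha/2,\,n-1} = -t_{\alpha/2,\,n-1}$, where $n-1$ is the number of 'degrees of freedom' ($\nu$)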


A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg. The following DBP values were observed in 9 patients with MI:

92, 87, 79, 87, 99, 82, 74, 83, 103

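$\bar{x} = 87.33$ mmHg, $s = 9.34$ mmHg

$t = 2.354 \ge t_{0.975,\,8} = 2.306$

Since t lies in the rejection region, H0 is rejected at significance level α=0.05.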
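The test statistic for the nine DBP values can be recomputed in a few lines. This is a minimal sketch using only the Python standard library; the critical value 2.306 is taken from the slide rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# DBP values (mmHg) of the 9 MI patients, as listed above
dbp = [92, 87, 79, 87, 99, 82, 74, 83, 103]
mu0 = 80.0        # reference value under H0
t_crit = 2.306    # t_{0.975, 8}, taken from the text, not computed here

n = len(dbp)
x_bar = statistics.mean(dbp)
s = statistics.stdev(dbp)                    # sample SD (divides by n - 1)
t_obs = (x_bar - mu0) / (s / math.sqrt(n))   # one-sample t statistic

reject_h0 = abs(t_obs) >= t_crit             # two-sided rejection region
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, t = {t_obs:.3f}, reject H0: {reject_h0}")
```

Without intermediate rounding the statistic comes out at about 2.355 rather than 2.354; the decision is the same.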

t-Distribution: Quantiles

Statistical Testing: Power

- The probability of making a type II error (i.e. to adhere to H0 if, in fact, HA is true) is designated as β.

- The converse probability 1-β, i.e. the probability of avoiding a type II error, is called the power of a test.

- The power of a statistical test depends upon the nature of HA, but not the nature of H0.

Error Probabilities

decision \ truth     H0        HA
maintain H0          ≥1-α      β
reject H0            ≤α        1-β

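Statistical Testing: Critical Values

[Figure: distributions of T under H0 and HA with critical values cα/2 and c1-α/2; each H0 tail has probability α/2, and β is the probability under HA that T falls in the maintenance region.]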

Statistical Testing

H0: µ=80 HA: µ≠80, σ=10 mmHg, α=0.05

         µ           Pµ(T≤-2.306 or T≥2.306)
H0       80          0.050
HA       81 (79)     0.058  (1-β)
HA       85 (75)     0.262  (1-β)
HA       90 (70)     0.748  (1-β)
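The rejection probabilities of the two-sided test of H0: µ=80 can be approximated by simulation. This is a minimal sketch, assuming n = 9 observations (implied by the critical value 2.306 = t0.975,8, but not stated with the table); the function name `rejection_rate`, the number of repetitions and the seed are illustrative choices.

```python
import math
import random

def rejection_rate(mu, mu0=80.0, sigma=10.0, n=9, t_crit=2.306,
                   reps=20000, seed=1):
    """Estimate P_mu(|T| >= t_crit) by simulating normal samples of size n."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # sample standard deviation with n - 1 in the denominator
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t = (x_bar - mu0) / (s / math.sqrt(n))
        if abs(t) >= t_crit:
            rejections += 1
    return rejections / reps

# mu = 80 estimates the type I error rate; the other values estimate the power
for mu in (80, 81, 85, 90):
    print(mu, round(rejection_rate(mu), 3))
```

The estimates should lie close to the tabulated values, up to simulation error.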

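Statistical Testing: Effect Size and Power

[Figure: distributions of T under H0 and HA with critical values cα/2 and c1-α/2, tail probabilities α/2 and type II error probability β.]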

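Statistical Testing: Significance and Power

[Figure: distributions of T under H0 and HA with critical values cα'/2 and c1-α'/2 for a different significance level α', tail probabilities α'/2 and type II error probability β'.]

t-Distribution: Quantiles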


H0: µ=80 HA: µ≠80, σ=10 mmHg

         µ           α=0.05: Pµ(T≤-2.306 or T≥2.306)    α=0.02: Pµ(T≤-2.896 or T≥2.896)
H0       80          0.050                              0.020
HA       81 (79)     0.058  (1-β)                       0.024  (1-β)
HA       85 (75)     0.262  (1-β)                       0.143  (1-β)
HA       90 (70)     0.748  (1-β)                       0.566  (1-β)

Alternative Hypotheses: Two-Sided

A two-sided alternative hypothesis does not specify a direction of the expected findings and usually

- reflects a lack of prior knowledge about realistic alternatives to the null hypothesis

- reads "is different from" or "deviates from"

HA: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.

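Alternative Hypotheses: Two-Sided

[Figure: distribution of T under H0 with critical values cα/2 and c1-α/2 and tail probabilities α/2; HA may lie on either side of H0, with type II error probability β.]

Alternative Hypotheses: One-Sided

[Figure: distributions of T under H0 and HA with a single critical value c1-α, tail probability α and type II error probability β.]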

Alternative Hypotheses: One-Sided

A one-sided alternative hypothesis specifies the direction of the expected findings and usually

- reflects common sense or suitable knowledge from previous scientific experiments

- reads "is larger than", "is heavier than" or "is longer than"

HA: The expected diastolic blood pressure of patients with a particular disease exceeds that of control individuals.

Clinical Studies

In a clinical study, researchers often wish to compare the respective probability of therapeutic success between a new medication (πM) and placebo (πP).

HA: πM>πP H0: πM≤πP

Significance level: upper limit for the probability of declaring a useless medication effective.

Power: probability of recognising an effective medication as effective.

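One-sample t-Test: One-Sided

Random variable: $X \sim N(\mu, \sigma^2)$, both parameters unknown

Hypotheses: $H_0: \mu \ge \mu_0$ versus $H_A: \mu < \mu_0$, or $H_0: \mu \le \mu_0$ versus $H_A: \mu > \mu_0$

Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

Rejection region: $T \le t_{\alpha,\,n-1}$ (for $H_A: \mu < \mu_0$) or $T \ge t_{1-\alpha,\,n-1}$ (for $H_A: \mu > \mu_0$)

t-Distribution: Quantiles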


H0: µ≤80 HA: µ>80, σ=10 mmHg, α=0.05

         µ       one-sided: Pµ(T≥1.860)     two-sided: Pµ(|T|≥2.306)
H0       80      0.050
H0       75      0.005
HA       85      0.392  (1-β)               0.262  (1-β)
HA       90      0.862  (1-β)               0.748  (1-β)

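One-sample t-Test: Sample Size

Which sample size, n, is required to detect, at significance level α, a given effect µ-µ0 with power 1-β?

one-sided: $n \ge \left(\sigma \cdot \dfrac{z_{1-\alpha} + z_{1-\beta}}{\mu - \mu_0}\right)^2$

two-sided: $n \ge \left(\sigma \cdot \dfrac{z_{1-\alpha/2} + z_{1-\beta}}{\mu - \mu_0}\right)^2$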
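The sample size formulas can be evaluated directly with standard normal quantiles. This is a minimal sketch, assuming the normal-approximation formulas above; the function name `required_n` and its parameter names are illustrative, and the example values (effect 5 mmHg, σ = 10, α = 0.05) are chosen to match the range of the plots.

```python
import math
from statistics import NormalDist

def required_n(effect, sigma, alpha=0.05, power=0.80, two_sided=False):
    """Smallest integer n with n >= (sigma * (z_alpha + z_beta) / effect)^2."""
    z = NormalDist().inv_cdf                 # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)                        # z_{1-beta}, with power = 1 - beta
    return math.ceil((sigma * (z_alpha + z_beta) / effect) ** 2)

# effect mu - mu0 = 5 mmHg, sigma = 10 mmHg, alpha = 0.05, power 0.80
print(required_n(effect=5, sigma=10))                   # one-sided
print(required_n(effect=5, sigma=10, two_sided=True))   # two-sided
```

For the same effect and power, the two-sided test needs the larger sample because z1-α/2 exceeds z1-α.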

One-sample t-Test: Sample Size (one-sided)

[Figure: required sample size n (logarithmic axis, 10 to 1000) plotted against the effect µ-µ0 (1 to 5) for power 1-β = 0.90, 0.80 and 0.70; σ = 10, α = 0.05.]

One-sample t-Test: Sample Size (two-sided)

[Figure: required sample size n (logarithmic axis, 10 to 1000) plotted against the effect µ-µ0 (1 to 5) for power 1-β = 0.90, 0.80 and 0.70; σ = 10, α = 0.05.]

The Pepsi Challenge

H0: Pepsi does not taste better than Coke (π≤0.5). HA: Pepsi tastes better than Coke (π>0.5).

$P(T \ge 58) = \sum_{i=58}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.067$

$P(T \ge 59) = \sum_{i=59}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.044$

$c_{0.05} = 59$

Conclusion: The number of Diet Coke drinkers who preferred Diet Pepsi (i.e. 56) was not significantly higher than the number who preferred Diet Coke (i.e. 44).
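The critical value for the Pepsi Challenge can be found by summing binomial probabilities until the upper tail drops to α or below. This is a minimal sketch using only the standard library; the function names `upper_tail` and `critical_value` are illustrative.

```python
from math import comb

def upper_tail(k, n=100, p=0.5):
    """P(T >= k) for T ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(alpha=0.05, n=100, p=0.5):
    """Smallest c with P(T >= c) <= alpha when pi = p."""
    # c = n + 1 always qualifies (empty sum = 0), so the search terminates
    return next(c for c in range(n + 2) if upper_tail(c, n, p) <= alpha)

c = critical_value()
print("critical value:", c)
print("P(T >= 58) =", round(upper_tail(58), 3))
print("P(T >= 59) =", round(upper_tail(59), 3))
print("reject H0 with 56 preferences:", 56 >= c)
```

Because T is discrete, the attained type I error probability at π = 0.5 stays below the nominal 0.05.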

"No test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis."

Neyman J, Pearson E (1933) Phil Trans R Soc A, 231:289-337

Egon Pearson (1895-1980)

Jerzy Neyman (1894-1981)

Statistics and Truth

"It would, therefore, add greatly to the clarity with which the tests of significance are regarded if it were generally understood that the tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as they are contradicted by the data: but that they are never capable of establishing them as certainly true."

Ronald A. Fisher (1890-1962)

Statistics and Truth

p Value

[Figure: distribution of T under H0; the p value is the tail area beyond the observed value tobs.]

The p value is the probability, when the null hypothesis is correct, of obtaining the observed value tobs of T or an even less probable one.

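p Value: Evidence Against H0

[Figure: scale of p values (1.0, 0.1, 0.01, 0.001, 0.0001) against the evidence against H0, which ranges from none through "moderate" and "strong" to "very strong" as the p value decreases.]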

H0: µ≤80 HA: µ>80

p = P(T>2.354) = 0.023

H0: µ=80 HA: µ≠80

p = P(|T|>2.354) = 0.046

The Pepsi Challenge

H0: π≤0.5 HA: π>0.5

$p = P(X \ge 56) = \sum_{i=56}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.1356$
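The three p values can be reproduced numerically. This is a minimal sketch using only the standard library: the t tail probability is obtained by Simpson integration of the t density with 8 degrees of freedom (a numerical shortcut chosen here, not a method from the slides), and the function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's rule over [0, t] (steps even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

p_one_sided = t_upper_tail(2.354, df=8)   # H0: mu <= 80 versus HA: mu > 80
p_two_sided = 2 * p_one_sided             # H0: mu = 80 versus HA: mu != 80

# Pepsi Challenge: P(X >= 56) for X ~ Binomial(100, 0.5), in exact integers
p_pepsi = sum(math.comb(100, i) for i in range(56, 101)) / 2**100

print(round(p_one_sided, 3), round(p_two_sided, 3), round(p_pepsi, 4))
```

Doubling the one-sided tail is valid here because the t distribution is symmetric about zero.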


Pravastatin and Cardiovascular Disease

major cardiovascular outcome       placebo (n=2078)    Pravastatin (n=2081)    p
non-fatal MI or death from CHD     0.132               0.102                   0.003
CABG or PTCA                       0.188               0.141                   <0.001
Stroke                             0.038               0.026                   0.030

CABG: coronary artery bypass grafting, PTCA: percutaneous transluminal coronary angioplasty

Sacks FM et al. (1996) N Engl J Med 335: 1001–1009

Negative Findings

Negative findings are as important as positive findings because they reduce ignorance and may suggest interesting new hypotheses and lines of investigation. They are also necessary to guide future research in the field of interest (publication bias).

Summary

- Statistical problems are usually defined as mutually exclusive hypotheses about population parameters.

- Statistical tests are decision rules to either maintain or reject a given null hypothesis on the basis of sample data.

- When performing a statistical test, two types of error can occur through falsely rejecting either the null hypothesis or the alternative hypothesis.

- The probability of making a type I error is limited by the significance level of the test; the probability of avoiding a type II error is called the power of the test.

- The p value is a measure of the discrepancy between the data and the null hypothesis.
