statistical testing i - uni-kiel.de
De gustibus non est disputandum
Statistical Testing I
The Pepsi Challenge
"Take the Pepsi Challenge" was the motto of a marketing campaign by the Pepsi-Cola Company in the 1980s. A total of 100 Coca-Cola drinkers were asked to blindly taste unmarked cups of Diet Pepsi and Diet Coke, and to select their favorite. A subsequent Pepsi TV commercial stated: "... in recent blind taste tests, more than half of all Diet Coke drinkers surveyed said they preferred the taste of Diet Pepsi".
Assume that, out of the 100 Diet Coke drinkers, 56 preferred Diet Pepsi. Would this result support the claim that more than half of all Diet Coke drinkers prefer Diet Pepsi to Diet Coke?
"Scientific Method"
"The validity of knowledge is tied to the probability of falsification."
"Scientific propositions can be falsified empirically. On the other hand, unscientific claims are always 'right' and cannot be falsified at all."
Karl Popper (1902-1994)
Statistical Testing: New Knowledge Through Falsification
[Figure: current knowledge (H0) gives way, through falsification, to new knowledge (HA).]
Decision Making
- Scientific questions are often formulated in the form of mutually exclusive hypotheses (i.e. H0 versus HA) about one or more population parameters.
- A statistical test is a decision rule that allows a researcher to either reject H0 ("statistically significant result") or maintain H0 on the basis of sample data.
Statistical Testing: Null Hypothesis
The null hypothesis usually implies the opposite of what a researcher expects (or wishes) to be true. It often represents conservatism or common opinion.
H0: The expected diastolic blood pressure of patients with a particular disease equals that of control individuals.
Statistical Testing: Alternative Hypothesis
The alternative hypothesis usually implies what a researcher expects (or wishes) to be true.
The alternative hypothesis is regarded as established when the null hypothesis is rejected.
HA: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.
Blood Pressure and Myocardial Infarction
A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg.
H0: µ = µ0    HA: µ ≠ µ0
Statistical Testing: Procedure
- All information from the sample data is collapsed in a single numerical quantity, called the test statistic (T).
- The maintenance region of the test comprises all values of T for which H0 is maintained.
- The rejection region comprises all values of T for which H0 is rejected.
- The maintenance and rejection regions are demarcated by the critical values.
[Figure: axis of the test statistic T under H0, with a central maintenance region bounded by two critical values and a rejection region on either side.]
[Figure: decision flow. If T lies in the maintenance region, maintain H0; if T lies in the rejection region, reject H0.]
Statistical Testing: Possible Errors
decision       truth: H0       truth: HA
maintain H0    correct         type II error
reject H0      type I error    correct
A type I error is made when H0 is rejected although it is true.
A type II error is made when H0 is maintained although it is false.
Significance Level
- A statistical test has significance level α if the probability of making a type I error is at most α.
- Before data collection, the critical values of a test are chosen such that the test has a pre-specified significance level (e.g. 0.05).
- The choice of critical values depends upon the pre-specified significance level and the nature of H0, but not the nature of HA.
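The link between critical values and the significance level can be illustrated by simulation. The following is a minimal sketch and not part of the original slides. It assumes a known standard deviation (σ = 10 mmHg), so that the test statistic is standard normal under H0, and borrows the reference value µ0 = 80 mmHg from the blood pressure example. The function names, the seed and the number of repetitions are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist


def critical_values(alpha):
    """Two-sided critical values of a z-test at significance level alpha."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper


def type_one_error_rate(mu0=80.0, sigma=10.0, n=9, alpha=0.05,
                        repetitions=20000, seed=1):
    """Fraction of samples, drawn with H0 true, in which H0 is rejected."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    lower, upper = critical_values(alpha)
    rejections = 0
    for _ in range(repetitions):
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        t = (sample_mean - mu0) / (sigma / n ** 0.5)   # sigma assumed known
        if t <= lower or t >= upper:
            rejections += 1
    return rejections / repetitions


rate = type_one_error_rate()
print("critical values:", critical_values(0.05))
print(f"simulated type I error rate: {rate:.3f}")
```

Because the critical values are fixed before any data are drawn, the simulated rejection rate under H0 should settle close to the pre-specified α = 0.05.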
Blood Pressure and Myocardial Infarction
H0: µ = µ0    HA: µ ≠ µ0
The significance level of a test of H0 versus HA limits the probability of erroneously claiming a difference between the expected DBP of MI patients and a reference value.
Statistical Testing: Critical Values
[Figure: distribution of T under H0 with critical values c_{α/2} and c_{1-α/2}; each tail beyond a critical value has probability α/2.]
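One-sample t-Test: Procedure
Random variable: $X \sim N(\mu, \sigma^2)$, both parameters unknown
Hypotheses: $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$
Test statistic: $T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$
Rejection region: $T \le t_{\alpha/2,\,n-1}$ or $T \ge t_{1-\alpha/2,\,n-1} = -t_{\alpha/2,\,n-1}$, where $\nu = n-1$ is the number of 'degrees of freedom'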
Blood Pressure and Myocardial Infarction
A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg. The following DBP values were observed in 9 patients with MI:
92, 87, 79, 87, 99, 82, 74, 83, 103
$\bar{x} = 87.33$ mmHg, $s = 9.34$ mmHg
$t = 2.354 \ge t_{0.975,\,8} = 2.306$
[Table: quantiles of the t-distribution]
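The calculation for this example can be retraced in a few lines. The sketch below uses only the nine DBP values, the reference value of 80 mmHg and the critical value 2.306 quoted on the slide; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

dbp = [92, 87, 79, 87, 99, 82, 74, 83, 103]   # mmHg, values from the slide
mu0 = 80.0       # reference value under H0
t_crit = 2.306   # t_{0.975, 8}, taken from the slide

n = len(dbp)
x_bar = mean(dbp)
s = stdev(dbp)                        # sample standard deviation (divisor n - 1)
t = (x_bar - mu0) / (s / sqrt(n))     # one-sample t statistic

reject_h0 = abs(t) >= t_crit          # two-sided rejection region
print(f"mean = {x_bar:.2f} mmHg, s = {s:.2f} mmHg, t = {t:.3f}")
print("reject H0" if reject_h0 else "maintain H0")
```

Unrounded arithmetic gives t of about 2.355; the slide's 2.354 results from inserting the rounded mean and standard deviation. Either way T falls in the rejection region, so H0 is rejected at the 0.05 level.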
Statistical Testing: Power
- The probability of making a type II error (i.e. of maintaining H0 when, in fact, HA is true) is designated as β.
- The converse probability 1-β, i.e. the probability of avoiding a type II error, is called the power of a test.
- The power of a statistical test depends upon the nature of HA, but not the nature of H0.
Error Probabilities
decision       truth: H0    truth: HA
maintain H0    ≥1-α         β
reject H0      ≤α           1-β
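Statistical Testing: Critical Values
[Figure: distributions of T under H0 and HA with critical values c_{α/2} and c_{1-α/2}; the two tails under H0 each have probability α/2, and β is marked under HA.]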
Blood Pressure and Myocardial Infarction
H0: µ = 80    HA: µ ≠ 80    (σ = 10 mmHg, α = 0.05)
µ          P_µ(T ≤ -2.306 or T ≥ 2.306)
80         0.050    (H0: α)
81 (79)    0.058    (HA: 1-β)
85 (75)    0.262    (HA: 1-β)
90 (70)    0.748    (HA: 1-β)
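The rejection probabilities in this table can be recomputed numerically. The sketch below is one possible approach and not necessarily the method used for the slides. It assumes n = 9 (as in the blood pressure example) and writes the test statistic as T = (Z + δ)/sqrt(V/ν), with Z standard normal, V chi-square distributed with ν = n - 1 degrees of freedom, and δ = (µ - µ0)/(σ/√n); it then integrates over V with the trapezoidal rule. The function names, step count and integration limit are illustrative choices.

```python
from math import exp, gamma, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal distribution function


def chi2_pdf(v, df):
    """Density of the chi-square distribution (intended for df >= 2)."""
    return v ** (df / 2 - 1) * exp(-v / 2) / (2 ** (df / 2) * gamma(df / 2))


def power_two_sided(mu, mu0=80.0, sigma=10.0, n=9, t_crit=2.306,
                    steps=4000, v_max=80.0):
    """P_mu(T <= -t_crit or T >= t_crit) for the one-sample t-test."""
    df = n - 1
    delta = (mu - mu0) / (sigma / sqrt(n))   # shift of T when the true mean is mu
    h = v_max / steps
    total = 0.0
    for k in range(steps + 1):
        v = k * h
        scale = t_crit * sqrt(v / df)
        # Rejection probability given V = v: upper tail plus lower tail.
        reject = (1 - PHI(scale - delta)) + PHI(-scale - delta)
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal end weights
        total += weight * reject * chi2_pdf(v, df)
    return total * h


for mu in (80, 81, 85, 90):
    print(mu, round(power_two_sided(mu), 3))
```

For µ = 80 the function returns the significance level itself, and the rejection probability grows with the distance between µ and µ0, which is the pattern the table shows.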
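Statistical Testing: Effect Size and Power
[Figure: distributions of T under H0 and HA with critical values c_{α/2} and c_{1-α/2}, tail probabilities α/2 under H0, and β marked under HA.]
Statistical Testing: Significance and Power
[Figure: the same distributions with critical values c_{α'/2} and c_{1-α'/2} for a different significance level α', tail probabilities α'/2, and the resulting β'.]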
Blood Pressure and Myocardial Infarction
H0: µ = 80    HA: µ ≠ 80    (σ = 10 mmHg)
[Table: quantiles of the t-distribution]
µ          α = 0.05: P_µ(T ≤ -2.306 or T ≥ 2.306)    α = 0.02: P_µ(T ≤ -2.896 or T ≥ 2.896)
80         0.050                                     0.020                                     (H0)
81 (79)    0.058                                     0.024                                     (HA: 1-β)
85 (75)    0.262                                     0.143                                     (HA: 1-β)
90 (70)    0.748                                     0.566                                     (HA: 1-β)
Alternative Hypotheses: Two-Sided
A two-sided alternative hypothesis does not specify a direction of the expected findings and usually
- reflects a lack of prior knowledge about realistic alternatives to the null hypothesis
- reads "is different from" or "deviates from"
HA: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.
[Figure: distribution of T under H0 with critical values c_{α/2} and c_{1-α/2} and tail probabilities α/2; HA may lie on either side of H0, with β marked.]
Alternative Hypotheses: One-Sided
A one-sided alternative hypothesis specifies the direction of the expected findings and usually
- reflects common sense or suitable knowledge from previous scientific experiments
- reads "is larger than", "is heavier than" or "is longer than"
HA: The expected diastolic blood pressure of patients with a particular disease exceeds that of control individuals.
[Figure: distributions of T under H0 and HA with a single critical value c_{1-α}; the tail beyond it has probability α under H0, and β is marked under HA.]
Clinical Studies
In a clinical study, researchers often wish to compare the probabilities of therapeutic success of a new medication (πM) and of placebo (πP).
H0: πM ≤ πP    HA: πM > πP
- significance level: upper limit for the probability of declaring a useless medication effective
- power: probability of recognising an effective medication as effective
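One-sample t-Test: One-Sided
Random variable: $X \sim N(\mu, \sigma^2)$, both parameters unknown
Hypotheses: $H_0: \mu \ge \mu_0$ versus $H_A: \mu < \mu_0$, or $H_0: \mu \le \mu_0$ versus $H_A: \mu > \mu_0$
Test statistic: $T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$
Rejection region: $T \le t_{\alpha,\,n-1}$ (for $H_A: \mu < \mu_0$) or $T \ge t_{1-\alpha,\,n-1}$ (for $H_A: \mu > \mu_0$)
[Table: quantiles of the t-distribution]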
Blood Pressure and Myocardial Infarction
H0: µ ≤ 80    HA: µ > 80    (σ = 10 mmHg, α = 0.05)
µ     one-sided: P_µ(T ≥ 1.860)    two-sided: P_µ(|T| ≥ 2.306)
80    0.050                        -                              (H0)
75    0.005                        -                              (H0)
85    0.392                        0.262                          (HA: 1-β)
90    0.862                        0.748                          (HA: 1-β)
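The gain in power from a one-sided test can also be checked by simulation. The sketch below is an illustration rather than the method behind the slide: it assumes n = 9 normally distributed observations with σ = 10 mmHg, uses the one-sided critical value 1.860, and estimates the rejection probability by repeated sampling. The function name, seed and number of repetitions are arbitrary, and the estimates carry Monte Carlo error of roughly ±0.01.

```python
import random
from math import sqrt


def simulated_power(mu, mu0=80.0, sigma=10.0, n=9, t_crit=1.860,
                    repetitions=10000, seed=1):
    """Estimate P_mu(T >= t_crit) for the one-sided one-sample t-test."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    rejections = 0
    for _ in range(repetitions):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        s = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
        t = (x_bar - mu0) / (s / sqrt(n))
        if t >= t_crit:                # upper rejection region only
            rejections += 1
    return rejections / repetitions


estimates = {mu: simulated_power(mu) for mu in (80, 85, 90)}
for mu, estimate in estimates.items():
    print(mu, round(estimate, 3))
```

The estimates for µ = 85 and µ = 90 should lie near the one-sided values 0.392 and 0.862 and above the two-sided values 0.262 and 0.748, at the cost of having essentially no power against µ < 80.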
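One-sample t-Test: Sample Size
Which sample size, n, is required to detect, at significance level α, a given effect µ-µ0 with power 1-β?
one-sided: $n \ge \left( \sigma \cdot \frac{z_{1-\alpha} + z_{1-\beta}}{\mu - \mu_0} \right)^2$
two-sided: $n \ge \left( \sigma \cdot \frac{z_{1-\alpha/2} + z_{1-\beta}}{\mu - \mu_0} \right)^2$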
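The two sample size formulas translate directly into code. The sketch below is a minimal illustration: the function name and its parameters are hypothetical, the standard normal quantiles z come from Python's `statistics.NormalDist`, and the result is rounded up to the next whole number of observations.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(effect, sigma, alpha=0.05, power=0.80, two_sided=False):
    """Smallest n with n >= (sigma * (z_alpha + z_{1-beta}) / effect)^2."""
    z = NormalDist().inv_cdf                  # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)                         # z_{1-beta}, since power = 1 - beta
    return ceil((sigma * (z_alpha + z_beta) / effect) ** 2)


# Example with sigma = 10, alpha = 0.05 and power 0.80:
print(required_sample_size(effect=5, sigma=10))                  # one-sided
print(required_sample_size(effect=5, sigma=10, two_sided=True))  # two-sided
print(required_sample_size(effect=1, sigma=10))                  # smaller effect
```

With these inputs, an effect of 5 calls for 25 observations one-sided and 32 two-sided, while an effect of 1 calls for 619. The required n therefore grows with the inverse square of the effect.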
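One-sample t-Test: Sample Size (one-sided)
[Figure: required sample size n (logarithmic axis, 10 to 1000) against the effect µ-µ0 (1 to 5) for σ = 10, α = 0.05 and power 1-β = 0.90, 0.80, 0.70.]
One-sample t-Test: Sample Size (two-sided)
[Figure: required sample size n (logarithmic axis, 10 to 1000) against the effect µ-µ0 (1 to 5) for σ = 10, α = 0.05 and power 1-β = 0.90, 0.80, 0.70.]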
The Pepsi Challenge
H0: Pepsi does not taste better than Coke (π≤0.5). HA: Pepsi tastes better than Coke (π>0.5).
With T denoting the number of the 100 tasters who prefer Diet Pepsi:
$P(T \ge 59) = \sum_{i=59}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.044$
$P(T \ge 58) = \sum_{i=58}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.067$
c0.05 = 59
Conclusion: The number of Diet Coke drinkers who preferred Diet Pepsi (i.e. 56) was not significantly higher than the number who preferred Diet Coke (i.e. 44).
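The critical value for the Pepsi Challenge can be found by evaluating the binomial tail sums directly. The sketch below uses only the figures given in the slides (n = 100, π = 0.5 under H0, α = 0.05, 56 observed preferences for Diet Pepsi); the function names are illustrative.

```python
from math import comb


def upper_tail(c, n=100, pi=0.5):
    """P(T >= c) when T is binomially distributed with parameters n and pi."""
    return sum(comb(n, i) * pi ** i * (1 - pi) ** (n - i)
               for i in range(c, n + 1))


def critical_value(alpha=0.05, n=100, pi=0.5):
    """Smallest c for which P(T >= c) does not exceed alpha."""
    for c in range(n + 2):          # c = n + 1 gives an empty sum, i.e. 0
        if upper_tail(c, n, pi) <= alpha:
            return c


c = critical_value()
observed = 56
print(f"P(T >= 58) = {upper_tail(58):.3f}")
print(f"P(T >= 59) = {upper_tail(59):.3f}")
print("critical value:", c)
print("reject H0" if observed >= c else "maintain H0")
```

Since rejecting at 58 or more would give a type I error probability above 0.05, the rejection region starts at 59, and the observed count of 56 falls in the maintenance region.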
"No test based upon the theory of probability can by itself provide any valuable evidence of the truth or
falsehood of a hypothesis."
Jerzy Neyman (1894-1981) and Egon Pearson (1895-1980)
Neyman J, Pearson E (1933) Phil Trans R Soc A, 231:289-337
Statistics and Truth
"It would, therefore, add greatly to the clarity with which the tests of significance are regarded if it were generally understood that the tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as they are contradicted by the data: but that they are never capable of establishing them as certainly true."
Ronald A. Fisher (1890-1962)
p Value
The p value is the probability, when the null hypothesis is correct, of obtaining the observed value tobs of T or an even less probable one.
[Figure: distribution of T under H0; the p value is the tail probability beyond the observed value tobs.]
p Value: Evidence Against H0
p value            evidence
1.0 to 0.1         none
0.1 to 0.01        "moderate"
0.01 to 0.001      "strong"
0.001 to 0.0001    "very strong"
Blood Pressure and Myocardial Infarction
H0: µ = 80    HA: µ ≠ 80:    $p = P(|T| > 2.354) = 0.046$
H0: µ ≤ 80    HA: µ > 80:    $p = P(T > 2.354) = 0.023$
The Pepsi Challenge
H0: π ≤ 0.5    HA: π > 0.5:    $p = P(X \ge 56) = \sum_{i=56}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.1356$
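These p values can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions: it integrates the density of the t-distribution with 8 degrees of freedom numerically (Simpson's rule) instead of calling a statistics package, and it evaluates the binomial tail sum exactly. The function names and the number of integration steps are arbitrary choices.

```python
from math import comb, gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    norm = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t_obs, df, steps=2000):
    """P(T > t_obs) for t_obs >= 0; Simpson's rule on [0, t_obs], steps even."""
    h = t_obs / steps
    total = t_pdf(0.0, df) + t_pdf(t_obs, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3       # half the mass lies above zero


# Blood pressure example: t_obs = 2.354 with n - 1 = 8 degrees of freedom.
p_one_sided = t_upper_tail(2.354, df=8)
p_two_sided = 2 * p_one_sided        # the t-distribution is symmetric

# Pepsi Challenge: P(X >= 56) for X ~ Binomial(100, 0.5).
p_pepsi = sum(comb(100, i) for i in range(56, 101)) / 2 ** 100

print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
print(f"Pepsi Challenge p = {p_pepsi:.4f}")
```

The two-sided p value is twice the one-sided one because both tails beyond |tobs| count as "less probable" outcomes.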
Pravastatin and Cardiovascular Disease
major cardiovascular outcome      placebo (n=2078)    Pravastatin (n=2081)    p
non-fatal MI or death from CHD    0.132               0.102                   0.003
CABG or PTCA                      0.188               0.141                   <0.001
Stroke                            0.038               0.026                   0.030
CABG: coronary artery bypass grafting, PTCA: percutaneous transluminal coronary angioplasty
Sacks FM et al. (1996) N Engl J Med 335: 1001–1009
Negative Findings
Negative findings are as important as positive findings because they reduce ignorance and may suggest interesting new hypotheses and lines of investigation. They are also necessary to guide future research in the field of interest (publication bias).
Summary
- Statistical problems are usually defined as mutually exclusive hypotheses about population parameters.
- Statistical tests are decision rules to either maintain or reject a given null hypothesis on the basis of sample data.
- When performing a statistical test, two types of error can occur through falsely rejecting either the null hypothesis or the alternative hypothesis.
- The probability of making a type I error is limited by the significance level of the test; the probability of avoiding a type II error is called the power of the test.
- The p value is a measure of the discrepancy between the data and the null hypothesis.