![Page 1: Can I Have a P-value For That, Please? Christopher J. Miller Associate Director, Biostatistics AstraZeneca, LP chris.miller@astrazeneca.com](https://reader033.vdocuments.net/reader033/viewer/2022051620/56649eea5503460f94bfbe6e/html5/thumbnails/1.jpg)
Can I Have a P-value For That, Please?
Christopher J. Miller
Associate Director, Biostatistics
AstraZeneca, LP
Outline
Definitions
Quiz
Hypothesis testing and Power
no math
philosophy
Things that make no sense to me
testing for differences at baseline
post-hoc power calculations
Biostatistics
A term which ought to mean “statistics for biology” but is now increasingly reserved for medical statistics.
S. Senn
Biostatistician
One who has neither the intellect for mathematics nor the commitment for medicine, but likes to dabble in both.
S. Senn
Biometrics
An alternative name for statistics, especially if applied to the life sciences. The advantage of the name compared to statistics is that the general public does not understand what it means, whereas with statistics the general public thinks it understands what it means.
S. Senn
Quiz time!
A 95% Confidence Interval of (5 to 11) for the population mean implies:
1. The probability that the true mean is between 5 and 11 is 0.95 (95%).
2. Ninety-five percent of the time (for 95% of samples) the interval will include the true mean. Five to 11 is one such interval.
3. Five to 11 covers 95% of the possible values of the true mean.
A 95% Confidence Interval of (5 to 11) for the population mean implies:
1. The probability that the true mean is between 5 and 11 is 0.95 (95%).
2. 95% of the time (for 95% of samples) the interval will include the true mean. Five to 11 is one such interval.
3. Five to 11 covers 95% of the possible values of the true mean.
A p-value < 0.05:
1. Assuming the treatment is not effective, there is less than a 5% chance of obtaining such results.
2. The observed effect from the treatment is so large that there is less than a 5% chance that the treatment truly is no better than placebo.
3. On average, fewer than 5% of placebo-treated patients will do better than active-treated patients.
A p-value < 0.05:
1. Assuming the treatment is not effective, there is less than a 5% chance of obtaining such results.
2. The observed effect from the treatment is so large that there is less than a 5% chance that the treatment truly is no better than placebo.
3. On average, fewer than 5% of placebo-treated patients will do better than active-treated patients.
Thoughts
How many people got both correct?
P-values and confidence intervals are often misinterpreted.
P-values and confidence intervals do not necessarily answer a relevant question.
Misunderstandings lead us to present analyses that are nonsensical.
Hypothesis Testing
Question: Is the average effect of active treatment better than that of placebo?
Null Hypothesis: Assume that there is no effect.
Ho: A = P or A - P = 0 (A and P are the average effects of active treatment and placebo)
Alternative Hypothesis:
Ha: A > P or A - P > 0
Hypothesis testing (cont’d)
Assume Ho is true (true means equal)
Choose an analysis model and study design
Power study
Run an experiment
Collect data
See if you have enough evidence to reject Ho
Ho not false until proven false
Ho is never proven to be true
“not guilty, until proven guilty”
Hypothesis Testing Essentials
Population
Parameters
Probabilities are related to long-run relative frequency of events in a series of trials
Essentials: Population
“A largely theoretical concept which refers to a (sometimes infinite or undefined) totality of observations of interest.”
Example: All potential patients who might use a new drug.
Essentials: Parameters
Used in conjunction with an underlying population
“A function of the values of this population which define their distribution”
Unobservable and unknowable
Nature, God, Truth
Example: Population mean or variance
When similar functions are calculated from a sample, they are called “statistics”.
Essentials: Probabilities and decisions
Parameters cannot have a probability
They are either equal to some value or not
Hypotheses cannot have a probability
They are either true or false
A decision to accept or reject a hypothesis is made indirectly using the probability of the evidence given the hypothesis, rather than vice versa.
Errors in decisions are controlled, on average, based on an assumed series of results.
A 95% Confidence Interval of (5 to 11) for the population mean implies:
1. The probability that the true mean is between 5 and 11 is 0.95 (95%).
2. 95% of the time (for 95% of samples) the interval will include the true mean. This is one such interval.
3. Five to eleven covers 95% of the possible values of the true mean.
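The repeated-sampling reading of a confidence interval (option 2) can be illustrated with a short simulation. This is a minimal sketch and not part of the original slides: the true mean of 8, the known standard deviation of 3, and the sample size of 30 are hypothetical values chosen for illustration, and the interval uses the known-sigma normal formula.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=8.0, sd=3.0, n=30, n_samples=2000, level=0.95, seed=1):
    """Fraction of repeated-sample intervals that contain the true mean.

    All default values are hypothetical; sd is treated as known.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sd / n ** 0.5             # same width for every sample
    hits = 0
    for _ in range(n_samples):
        sample_mean = mean(rng.gauss(true_mean, sd) for _ in range(n))
        # The true mean is fixed; only the interval moves from sample to sample.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_samples

print(f"Share of intervals covering the true mean: {coverage():.3f}")
```

Any single interval, such as 5 to 11, either contains the true mean or it does not; the 95% describes how the procedure behaves over many samples.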
A p-value < 0.05:
1. Assuming the treatment is not effective, there is less than a 5% chance of obtaining such results.
2. The observed effect from the treatment is so large that there is less than a 5% chance that the treatment truly is no better than placebo.
3. On average, only 5% of placebo-treated patients will do better than active-treated patients.
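Option 1 describes a probability of the data under the null hypothesis, which can be sketched by simulating many trials in which the treatment truly does nothing. The group size, the known standard deviation, and the one-sided z-test below are assumptions made for illustration only.

```python
import random
from statistics import NormalDist

def p_value_one_sided(active, placebo, sd):
    """One-sided z-test p-value for Ha: mean(active) > mean(placebo), sd known."""
    n_a, n_p = len(active), len(placebo)
    diff = sum(active) / n_a - sum(placebo) / n_p
    se = sd * (1 / n_a + 1 / n_p) ** 0.5
    return 1 - NormalDist().cdf(diff / se)

def false_positive_rate(n=50, sd=1.0, n_trials=4000, alpha=0.05, seed=2):
    """Share of null trials (no true effect) that still give p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both groups are drawn from the same distribution: Ho is true.
        active = [rng.gauss(0.0, sd) for _ in range(n)]
        placebo = [rng.gauss(0.0, sd) for _ in range(n)]
        if p_value_one_sided(active, placebo, sd) < alpha:
            rejections += 1
    return rejections / n_trials

print(f"Share of null trials with p < 0.05: {false_positive_rate():.3f}")
```

The simulation says nothing about the probability that the treatment is ineffective; it only shows how often "such results" turn up when it is.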
Things that make no sense to me #1
Baseline differences
You’re reporting on a randomized, parallel-group trial. Active versus placebo.
To your dismay, the groups appear to have been “different” at baseline
Mean (SD): 23 (2.3) versus 32 (2.7)
We need a p-value to tell us “how different” they are!
P<0.05 tells us the study is uninterpretable, right?
The test
What is the “deep structure”?
Population?
Parameter of interest?
Long-term process?
Decision rule’s meaning?
Point?
Problem
Test appears to say something about the adequacy of the given allocation, whereas it can only be a test of the allocation procedure.
What are we testing?
Null Hypothesis
The process of randomization will result in balance across treatment groups.
Population
All possible random assignments of patients to treatment.
What are we saying when p<.05?
When comparing 2 drugs after treatment…
the difference is rather large to be caused by chance alone, therefore chance must not be the whole explanation.
Infer that the drugs have an effect on outcome.
Null hypothesis is not true.
When comparing 2 drugs before treatment…
the difference is rather large to be caused by chance alone, therefore chance must not be the whole explanation.
Infer that randomization has not taken place??? …fraud???
Type I error??? …inadequate sample???
Bottom line
The underlying problem is that randomization is, by definition, a chance mechanism!
So, no matter what the p-value is – unless we are willing to accept tampering as a possibility – we need to conclude that something unusual has happened because of CHANCE alone!
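Because the allocation procedure is random by construction, a baseline test applied to properly randomized groups should come out "significant" about 5% of the time by chance alone. The sketch below re-randomizes one fixed cohort many times and counts how often that happens. The cohort of 200 patients, the baseline distribution, and the large-sample z-test are hypothetical choices for illustration, not details from the talk.

```python
import random
from statistics import NormalDist, mean, variance

def baseline_p_value(group_a, group_b):
    """Two-sided large-sample z-test p-value for a difference in baseline means."""
    se = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def share_flagged(n_patients=200, n_allocations=2000, alpha=0.05, seed=3):
    """Share of honest random allocations whose baseline test gives p < alpha."""
    rng = random.Random(seed)
    # One fixed (hypothetical) cohort; only the allocation changes below.
    baseline = [rng.gauss(30.0, 5.0) for _ in range(n_patients)]
    half = n_patients // 2
    flagged = 0
    for _ in range(n_allocations):
        rng.shuffle(baseline)  # a fresh, untampered randomization
        if baseline_p_value(baseline[:half], baseline[half:]) < alpha:
            flagged += 1
    return flagged / n_allocations

print(f"Share of allocations with baseline p < 0.05: {share_flagged():.3f}")
```

Every flagged allocation here is known to be the product of chance, so the p-value carries no information about whether the randomization "worked".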
Further silliness
Baseline imbalance does not necessarily mean that meaningful treatment inferences cannot be made
P-value for baseline test has no relation to the ability to make valid treatment comparisons at the end of the trial.
Solutions
ANCOVA
Answers the question: “If both groups had had average overall baseline values, what treatment difference would we have seen?”
Makes an average allowance for imbalance
Stratification
Allows valid treatment comparison within each stratum.
Need to think of this before the trial if you want to do it correctly.
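The ANCOVA adjustment can be sketched in a few lines. With a common slope, the adjusted treatment difference is the raw difference in outcome means minus the pooled within-group slope times the difference in baseline means. The data below are simulated under stated assumptions: the baseline means and SDs echo the example above, while the slope of 0.8, the true effect of 3, the noise level, and the group size of 40 are hypothetical.

```python
import random
from statistics import mean

def ancova_adjusted_difference(x_active, y_active, x_placebo, y_placebo):
    """Active-minus-placebo difference adjusted for baseline x (common slope)."""
    def within_sums(x, y):
        mx, my = mean(x), mean(y)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx = sum((xi - mx) ** 2 for xi in x)
        return sxy, sxx

    sxy_a, sxx_a = within_sums(x_active, y_active)
    sxy_p, sxx_p = within_sums(x_placebo, y_placebo)
    slope = (sxy_a + sxy_p) / (sxx_a + sxx_p)  # pooled within-group slope
    raw = mean(y_active) - mean(y_placebo)
    return raw - slope * (mean(x_active) - mean(x_placebo))

rng = random.Random(4)
true_effect = 3.0  # hypothetical treatment effect
x_active = [rng.gauss(23.0, 2.3) for _ in range(40)]
x_placebo = [rng.gauss(32.0, 2.7) for _ in range(40)]
# Outcome depends on baseline (slope 0.8) plus treatment plus noise.
y_active = [0.8 * x + true_effect + rng.gauss(0.0, 1.0) for x in x_active]
y_placebo = [0.8 * x + rng.gauss(0.0, 1.0) for x in x_placebo]

raw_difference = mean(y_active) - mean(y_placebo)
adjusted_difference = ancova_adjusted_difference(x_active, y_active, x_placebo, y_placebo)
print(f"raw difference:      {raw_difference:.2f}")
print(f"adjusted difference: {adjusted_difference:.2f}")
```

The raw comparison is dominated by the baseline imbalance, while the adjusted one lands much closer to the effect that was built into the simulation.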
In short…
The fact that baseline tests are commonly performed without much apparent harm is no more of a defense than saying of the policy of treating viruses with antibiotics that most patients recover.
S. Senn
Power
Power
Systems are subject to random variation
otherwise, why would we experiment?
our lives would be simple without it
We try to see through the random variation (noise) and determine the true effect (signal)
Power (cont’d)
How? Well-planned, adequately-powered experiments
Loose definition of power: “The probability that a statistically significant difference will be found when the null hypothesis is false (i.e., when the treatments truly are not equal).”
What Determines Power?
Hypothesis and model
Sample size
Variability among observations
What risk are you willing to take of wrongly rejecting Ho?
How small of a difference among treatments do you need to detect?
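How these ingredients move power can be seen by simulating many trials and counting how often the result is significant. This is a rough sketch under assumptions that are not in the slides: normally distributed outcomes, a two-sided z-test with known sigma, and hypothetical values for the difference, the variability, and the sample size.

```python
import random
from statistics import NormalDist

def simulated_power(delta, sigma, n, alpha=0.05, n_trials=1500, seed=5):
    """Share of simulated two-group trials that reach two-sided significance.

    delta: true difference in means; sigma: known SD; n: patients per group.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5
    significant = 0
    for _ in range(n_trials):
        mean_active = sum(rng.gauss(delta, sigma) for _ in range(n)) / n
        mean_placebo = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        if abs(mean_active - mean_placebo) / se > z_crit:
            significant += 1
    return significant / n_trials

# Hypothetical scenarios: change one ingredient at a time.
scenarios = {
    "baseline (delta=5, sigma=10, n=64)": simulated_power(5.0, 10.0, 64),
    "larger sample (n=128)": simulated_power(5.0, 10.0, 128),
    "more variability (sigma=20)": simulated_power(5.0, 20.0, 64),
    "no true difference (delta=0)": simulated_power(0.0, 10.0, 64),
}
for label, power in scenarios.items():
    print(f"{label}: {power:.2f}")
```

With no true difference, the share of significant trials is the Type I error rate rather than power, which is why the last scenario sits near the chosen alpha.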
Calculating Power
Determine variable of primary interest
mean change from baseline in symptoms
Determine comparison of primary interest and null hypothesis
assume mean active is the same as placebo
Determine analysis method
ANCOVA
Calculating Power (cont’d)
Get an estimate of population variability among experimental units (Sigma)
literature
pilot/previous trials
can be a joke
Determine smallest difference between treatments you would like to detect (Delta)
often a joke
Clinically relevant difference
A somewhat nebulous concept with various conventions used by statisticians in their power calculations and incidentally, therefore, a means by which they drive their medical colleagues to distraction. This is used in the theory of clinical trials, as opposed to the cynically relevant difference, which is used in the practice.
S. Senn
Calculating Power (cont’d)
Determine risk you’re willing to take of wrongly rejecting Ho
Type I error (α)
Decide there’s an effect when there really isn’t one
“false conviction”
set low at 5%, but arbitrary
Calculating Power (cont’d)
Sample size (n) and Power are the only elements left!
[Figure: Power (%), axis from 50 to 100, plotted against Sample Size per Group (n), axis from 50 to 250]
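Once the other elements are fixed, power becomes a function of sample size alone. The sketch below tabulates that relationship with the usual normal approximation for a two-sided, two-group comparison with known sigma. Delta = 4 and Sigma = 10 are hypothetical values picked only so the curve spans roughly the same range as the axes described above; they are not taken from the talk.

```python
from statistics import NormalDist

def power_two_sided(delta, sigma, n, alpha=0.05):
    """Normal-approximation power for a two-sided, two-group comparison.

    delta: difference to detect; sigma: assumed SD; n: patients per group.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma * (2 / n) ** 0.5)  # difference in standard-error units
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

DELTA, SIGMA = 4.0, 10.0  # hypothetical planning values
curve = {n: power_two_sided(DELTA, SIGMA, n) for n in range(50, 251, 50)}
for n, power in curve.items():
    print(f"n per group = {n:3d}   power = {100 * power:5.1f}%")
```

Reading the table in the other direction gives the sample size needed for a target power, which is how the calculation is normally used at the planning stage.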
Summary of power
Power is a function of:
hypothesis being tested
statistical model
sample size
assumed variability of population
risk you’re willing to take
minimum “relevant effect size”
No guarantees
Working definition
Power is the probability of a possible outcome of a potential decision conditional upon an imaginable circumstance given a conceivable value of an algebraic embodiment of an abstract mathematical idea and the strict adherence to an extremely precise rule.
S. Senn
Things that make no sense to me #2
Post-hoc power calculations
Suppose we’ve run a well-designed and adequately-powered study that “fails”
“fails” usually means p>0.05.
We need an excuse.
Post-hoc power calculations
Obviously, the study was underpowered!
assume that the variability was larger than anticipated
the sample size was therefore too small
all other assumptions were fine
What was the “actual power” of this wimpy study?
So, you see, the drug probably does work!…I am just a terrible scientist.
Post-hoc power calculations
How do you pick which assumptions were correct/incorrect when recalculating power?
Arbitrary
Ridiculous to do based on the results of 1 study
A view that I support
“The power of a trial is a useful concept when planning the trial but has little relevance to the interpretation of its results.” (S. Senn)
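The arbitrariness of recalculating power after the fact can be made concrete. In the sketch below, the same "failed" study gets a very different post-hoc power depending on which planning assumption one chooses to revise. Every number is hypothetical, and the normal-approximation formula for a two-sided, two-group comparison is an assumption made for illustration rather than the method used in any particular trial.

```python
from statistics import NormalDist

def power_two_sided(delta, sigma, n, alpha=0.05):
    """Normal-approximation power for a two-sided, two-group comparison."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma * (2 / n) ** 0.5)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

N_PER_GROUP = 64  # hypothetical study size

# (delta, sigma) pairs: the plan, then three after-the-fact stories.
stories = {
    "as planned": (5.0, 10.0),
    "blame the variability": (5.0, 14.0),
    "blame the effect size": (3.0, 10.0),
    "blame both": (3.0, 14.0),
}
recalculated = {
    label: power_two_sided(delta, sigma, N_PER_GROUP)
    for label, (delta, sigma) in stories.items()
}
for label, power in recalculated.items():
    print(f"{label:24s} power = {100 * power:5.1f}%")
```

Nothing in the results of a single study says which story is the right one, so the recalculated figure mostly reflects the excuse that was chosen.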
Conclusion
…Be careful!
References
Lang T, Secic M. How to Report Statistics in Medicine, 1997.
Senn S. Statistical Issues in Drug Development, 1997.