TRANSCRIPT
Geoff Cumming: LAM, Paris 2 (Friday 11 May, 2012) Workshop: The New Statistics in Practice
In this workshop I will discuss how to use the new statistics in practice. Choice of topics will be responsive to the interests of people attending. I could consider various measures, including correlations, proportions, and the standardized effect size Cohen’s d. I could consider a range of simple experimental designs. I will also discuss meta-analysis. ESCI will serve to illustrate many of the ideas, and calculate confidence intervals in the different situations. I will consider statistical power, but will emphasize the advantages of an alternative approach to planning experiments: Precision for planning. This approach calculates the N required for our planned experiment to be likely to give a confidence interval that is not greater than some specified target length. There will be ample time for discussion, and for considering data and situations that are of particular interest to participants.
1
2
The New Statistics in Practice
Geoff Cumming
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University, Melbourne, Australia 3086
[email protected]/psy/staff/cumming.html
LAM, Paris, Talk 2 (Workshop), 11 May 2012
THANKS TO: Claudia Fritz, and: Bruce Thompson, Sue Finch, Robert Maillardet, Ben Ong, Ross Day, Mary Omodei, Jim McLennan, Sheila Crewther, David Crewther, Melanie Murphy, Cathy Faulkner, Pav Kalinowski, Jerry Lai, Debra Hansen, Mary Castellani, Mark Halloran, Kavi Jayasinghe, Mitra Jazayeri, Matthew Page, Leslie Schachte, Anna Snell, Andrew Speirs-Bridge, Eva van der Brugge, Elizabeth Silver, Jacenta Abbott, Sarah Rostron, Amy Antcliffe, Lisa L. Harlow, Dennis Doverspike, Alan Reifman, Joseph S. Rossi, Frank L. Schmidt, Meng-Jia Wu, Fiona Fidler, Neil Thomason, Claire Layman, Gideon Polya, Debra Riegert, Andrea Zekus, Mimi Williams, Lindy Cumming
© G. Cumming 2012
3
Lucky-Noluck, RCTs for new vs old treatment
Lucky (2009) found the new showed a statistically significant advantage over the old: M (difference) = 3.61, SD = 6.97, t(42) = 2.43, p = .02.
Noluck (2009) found no statistically significant difference between the two: M (difference) = 2.23, SD = 7.59, t(34) = 1.25, p = .22.
Conclusion, from NHST? (Different, equivocal, or similar?)
Conclusion, from 95% CIs?
Chapter 1
[Figure: forest plot of the difference between the means (axis from -2 to 8), with 95% CIs; rows labelled Simms (Total N = 44), Collins (Total N = 36), Lucky (Total N = 44), Noluck (Total N = 36).]
4
Combination by meta-analysis (MA) of the Lucky and Noluck results.
The null hypothesis of no difference was rejected, p = .008.
What is your conclusion? Is the new treatment effective?
[Figure: forest plot of the difference between the means (axis from -2 to 8), showing the MA result (Total N = 80), repeated alongside the individual-study plot with rows labelled Simms (Total N = 44), Collins (Total N = 36), Lucky (Total N = 44), Noluck (Total N = 36).]
Chapter 1
5
Three formats, all based on the same data:
1. Null hypothesis significance testing (NHST)
2. Confidence intervals (CI)
3. Meta-analysis (MA), to combine results of two studies.
The CI and MA formats indicate that Similar is the best interpretation. (A comparison of the two studies gives p = .55, so no sign of conflict between them!)
Formats matter! NHST can mislead. CIs can give better understanding and conclusions.
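The two p values quoted for the Lucky-Noluck example (p = .008 for the meta-analysis, p = .55 for the comparison of the two studies) can be approximately reproduced from the reported summary statistics. The sketch below rests on two assumptions that are not stated on the slides: each study's standard error is recovered as mean difference divided by t, and p values use a normal (z) approximation with fixed effect weighting.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics as reported on the Lucky-Noluck slide.
studies = {
    "Lucky": {"mean_diff": 3.61, "t": 2.43},
    "Noluck": {"mean_diff": 2.23, "t": 1.25},
}

# Assumption: recover each study's standard error as mean_diff / t.
se = {name: s["mean_diff"] / s["t"] for name, s in studies.items()}

# Fixed effect meta-analysis: weight each study by 1 / SE^2.
weights = {name: 1 / se[name] ** 2 for name in studies}
total_weight = sum(weights.values())
ma_mean = sum(weights[n] * studies[n]["mean_diff"] for n in studies) / total_weight
ma_se = sqrt(1 / total_weight)

# Assumption: normal (z) approximation for the 95% CI and the p values.
z = NormalDist()
z_crit = z.inv_cdf(0.975)
ma_ci = (ma_mean - z_crit * ma_se, ma_mean + z_crit * ma_se)
p_ma = 2 * (1 - z.cdf(abs(ma_mean / ma_se)))

# Direct comparison of the two studies: is there any sign of conflict?
diff = studies["Lucky"]["mean_diff"] - studies["Noluck"]["mean_diff"]
se_diff = sqrt(se["Lucky"] ** 2 + se["Noluck"] ** 2)
p_diff = 2 * (1 - z.cdf(abs(diff / se_diff)))

print(f"MA: mean = {ma_mean:.2f}, 95% CI [{ma_ci[0]:.2f}, {ma_ci[1]:.2f}], p = {p_ma:.3f}")
print(f"Lucky vs Noluck: difference = {diff:.2f}, p = {p_diff:.2f}")
```

Under these assumptions the combined interval sits well clear of zero while the comparison of the two studies shows no conflict, which is the "Similar" reading the CI and MA formats suggest.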
… just some more messing with your (NHST?) mind…
-2 0 2 4 6 8Difference between the means
-2 0 2 4 6 8Difference between the means
Simms (Total N = 44)
Collins (Total N = 36)
Lucky (Total N = 44)
Noluck (Total N = 36)
MA (Total N = 80)
Chapter 1
6
Lucky-Noluck evidence (statistical cognition)
Email authors of articles in psychology and medical journals.
Ask them to rate: “Results of the two are broadly consistent, or similar”
Ask for comments, classify these as ‘mention NHST’ or no such mention.
Respondents who saw the CI figure:
[Figure: stacked bar chart. Vertical axis: percentage of responses (0% to 100%). Horizontal axis: classification of respondents (Mentioned NHST, Did not mention NHST). Series: Similar, Different.]
Conclude: Even if they see CIs, researchers often think in terms of NHST! Better interpretation if we avoid NHST, and think in terms of intervals!
Chapter 1
Coulson, M., Healey, M., Fidler, F., & Cumming, G. (2010). Confidence intervals permit, but do not guarantee, better inference than statistical significance testing. Frontiers in Quantitative Psychology and Measurement, 1:26, 1-9. tinyurl.com/cisbetter
[Figure: forest plot of the difference between the means (axis from -2 to 8); rows labelled Simms (Total N = 44), Collins (Total N = 36), Lucky (Total N = 44), Noluck (Total N = 36).]
7
Time for a crusade?!
8
The Boots anti-ageing stampede!
April 2009, Queues for ‘No. 7 Protect & Perfect Intense Beauty Serum’
Media reports: “significant clinical improvement in facial wrinkles…”
J. Dermatology, online:
A cosmetic ‘anti-ageing’ product improves photoaged skin: A double-blind, randomized controlled trial
“…statistically significant improvement in facial wrinkles as compared to baseline assessment (p = .013), whereas vehicle-treated skin was not significantly improved (p = .11)”
A highly critical paper in Significance, then a revised article:
“non-significant trend towards clinical improvement… (p = .10)…”
Watson, R. E. B., et al. (2009). British Journal of Dermatology, 161, 419-426.
Chapter 2
9
Lucky-Noluck is everywhere!
“…incorrect procedure… in which researchers conclude that effects differ when one effect is significant (p < .05) but the other is not (p > .05). We reviewed 513 … articles in Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience and found that 78 used the correct procedure and 79 used the incorrect procedure.”
Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature neuroscience, 14, 1105-1107.
10
Effect size
Effect size, the amount of something of interest
Many ES measures are very familiar
An effect size (ES) can be:
A mean, or difference between means
A percentage, or percentage change
A correlation (e.g., Pearson r)
Proportion of variance (R², η², ω²…)
A standardised measure (Cohen’s d, Hedges’ g…)
A regression slope (b or β)
Many other things… (but NOT a p value!)
Chapter 2
11
Types of ESs
ES in original units (e.g., mean, mean difference)
Standardised measure (e.g., Cohen’s d )—can help future MA
Units-free measure (e.g., Pearson r, , R2)
“Effect sizes may be expressed in the original units (e.g., the mean
number of questions answered correctly; kg/month for a regression
slope) and are often most easily understood when reported in original
units. It can often be valuable to report an effect size not only in
original units but also in some standardized or units-free unit (e.g., as
a Cohen’s d value) or a standardized regression weight. Multiple
degree-of-freedom effect-size indicators are often less useful than
effect-size indicators that decompose multiple degree-of-freedom tests
into meaningful one degree-of-freedom effects.” (Publication Manual,
p. 34)
Chapter 2
12
APA Publication Manual, 6th ed: Reporting CIs
From p. 117:
“was also statistically significant… t(177) = 3.51, p < .001, d = 0.65, 95% CI [0.35, 0.95].”
“R² = .25, ∆R² = .04, F(1, 143) = 7.63, p = .006, 95% CI [.13, .37].”
No need to repeat “95% CI” within the same paragraph, if meaning clear:
“… 95% CIs [5.62, 8.31], [-2.43, 4.31], and [-4.29, -3.11], respectively.”
Don’t repeat the units when stating the CI:
“M = 30.5 cm, 99% CI [18.0, 43.0]”
Strangely, no other discipline seems to have a well-recognised format for CI reporting! Even medicine, which has used CIs for 30 years!
My suggestion:
Always use 95%, unless excellent reasons for some other %.
Chapters 1, 2
13
Manual: Reporting CIs in tables and figures
Tables, pp. 125-144:
“When a table includes point estimates, for example, means, correlations, or regression slopes, it should also, where possible, include confidence intervals.”
In a table use either (see examples, pp. 139-144):
- a column of […, …] values, or
- separate columns for the lower limit (LL), and upper limit (UL) values.
Figures, pp. 150-160:
“Figures can be used to illustrate the results… with error bars representing precision of the… estimates”.
“If your graph includes error bars, explain whether they represent standard deviations, standard errors, confidence limits, or ranges.”
Sadly, the only example error bars are SE bars.
Cumming, G., Fidler, F., & Vaux, D. L. (2007). Error bars in experimental biology. Journal of Cell Biology, 177, 7-11. tinyurl.com/errorbars101
Chapters 3, 4
14
CIs and replication: Statistical cognition
Click to indicate 10 ‘plausible’ replication means
Most researchers do reasonably well
BUT they underestimate variability (most think a 95% CI captures 95% of future means)
In fact, on average, captures 83%
A CI tells us what’s likely to happen next time, or what might have been!
A CI is much more informative than a p value
Cumming, G., Williams, J., & Fidler, F. (2004). Replication, and researchers’ understanding of confidence intervals and standard error bars. Understanding Statistics, 3, 299-311.
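The 83% figure can be checked with a short calculation. This is a minimal sketch under assumptions not spelled out on the slide: a normal population, known σ, and a replication with the same N as the original study. The gap between two independent sample means then has standard deviation √2 × SE, while the CI arm is about 1.96 × SE, so SE cancels out.

```python
from math import sqrt
from statistics import NormalDist

# Assumptions: normal population, known sigma, replication uses the same N.
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(0.975)  # arm of a 95% CI, in SE units

# Difference between original and replication means has SD sqrt(2) * SE,
# so the chance the replication mean lands inside the original CI is:
capture = 2 * std_normal.cdf(z_crit / sqrt(2)) - 1

print(f"Average capture of replication means by a 95% CI: {capture:.3f}")
```

The result is noticeably below .95, which is why treating a 95% CI as a 95% prediction interval for the next mean underestimates variability.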
Chapter 5
[Figure: “Where will the next mean fall? An internet experiment.” A mean with its 95% CI; vertical axis: Dependent variable.]
15
Some topics:
Compare two conditions—independent, or paired data
Randomised control trial (RCT)
CI on correlation, r
CI on proportion, P
Cohen’s d, and CI on d
Statistical power
Precision for planning
Meta-analysis
Chapters 6 - 15
The New Statistics: How?
16
CIs on ESs
All our CIs so far have been on means, and have been symmetric (upper arm = lower arm)
But we also need to have CIs for other ESs. Consider:
Proportion
Correlation (Pearson r)
Cohen’s d
CIs on these ESs are, in general, not symmetric, and sometimes can be tricky to calculate
Chapter 14
17
CI on Proportion, P
Proportions lie within [0, 1], for example:
Proportion of patients who, after therapy, no longer meet DSM criteria for the initial diagnosis
Proportion of responses that were errors (may be very low)
Limits at 0 and 1 mean we expect CIs to be asymmetric
Excellent approx CIs:
Altman, D. G., Machin, D., Bryant, T. N., & Gardner, M. J. (2000). Statistics with confidence: Confidence intervals and statistical guidelines (2nd ed.). London: British Medical Journal Books.
Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, 903-916.
Proportions and Diff proportions pages of ESCI Effect sizes
Chapter 14
18
Example use of CIs on proportion, P
Difference between two proportions (instead of χ²)
ES = 17/20 – 11/20 = .30, [.02, .53]
Diff proportions page of ESCI Effect sizes
Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, 903-916.
Access to finance?      Large corporations   Small corporations   Total
Satisfactory access     17                   11                   28
Unsatisfactory access    3                    9                   12
Total                   20                   20                   40
Chapter 14
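The interval [.02, .53] for the difference 17/20 – 11/20 can be reproduced with a score-based method. The sketch below uses Wilson intervals for each proportion and combines them in the style of Newcombe's hybrid score method; whether ESCI uses exactly this method is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(x, n, conf=0.95):
    """Wilson score interval for a single proportion x / n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = x / n
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half

def diff_proportions_ci(x1, n1, x2, n2, conf=0.95):
    """CI for p1 - p2, combining the two Wilson intervals (Newcombe style)."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_ci(x1, n1, conf)
    l2, u2 = wilson_ci(x2, n2, conf)
    diff = p1 - p2
    lower = diff - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return diff, lower, upper

# Large vs small corporations with satisfactory access to finance.
diff, lower, upper = diff_proportions_ci(17, 20, 11, 20)
print(f"ES = {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Note that the interval is asymmetric about .30: the lower arm is longer than the upper arm, as expected for proportions bounded by 0 and 1.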
19
CI on correlation, r
Use Fisher’s r to z transformation
CIs are asymmetric, especially for r near -1 or 1
CIs are shorter when near -1 or 1
CIs may seem surprisingly wide, unless N is large
r to z and Two correlations pages of ESCI chapters 14-15
Correlations and Diff correlations pages of ESCI Effect sizes
Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, 903-916.
Chapter 14
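A minimal sketch of the r to z approach follows. The values r = .3, r = .9 and N = 30 are hypothetical, chosen only to show the asymmetry and the shortening of the CI near 1; the standard error 1/√(N – 3) is the usual large-sample approximation.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def r_ci(r, n, conf=0.95):
    """Approximate CI for Pearson r via Fisher's r to z transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)
    z_r = atanh(r)            # Fisher's z
    se = 1 / sqrt(n - 3)      # approximate SE of z
    # Build a symmetric CI on the z scale, then transform back to r.
    return tanh(z_r - z_crit * se), tanh(z_r + z_crit * se)

# Hypothetical examples: same N, different r.
for r in (0.3, 0.9):
    lo, hi = r_ci(r, n=30)
    print(f"r = {r}: 95% CI [{lo:.2f}, {hi:.2f}], "
          f"lower arm {r - lo:.2f}, upper arm {hi - r:.2f}")
```

The interval is symmetric on the z scale but becomes asymmetric once transformed back to r, with the longer arm pointing toward zero.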
20
Cohen’s d (A standardised effect size)
Cohen’s d is number of SDs by which two conditions differ (a z score)
d picture page of ESCI chapters 10-13
Lots of overlap of the populations!
For d = 0.5 (a medium effect), 69% of E points higher than C mean!
Cohen chose medium = 0.5 as, roughly, a typical, noticeable amount
that is of interest in behavioural and social science
Cohen’s small, medium, large: 0.2, 0.5, 0.8—but arbitrary!
Cohen’s d is the ES (in original units), divided by a suitable SD
Our sample d is a point estimate of the population δ
Chapter 11
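The "69% of E points higher than C mean" figure follows directly from the normal distribution. This sketch assumes normal populations with equal SDs, with the Experimental (E) population shifted up by d SDs relative to Control (C); the function name is hypothetical.

```python
from statistics import NormalDist

def proportion_above_control_mean(d):
    """Proportion of the E population lying above the C mean.

    Assumes normal populations with equal SDs, E shifted up by d SDs.
    """
    return NormalDist().cdf(d)

# Cohen's small, medium and large reference values.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {100 * proportion_above_control_mean(d):.0f}% of E above C mean")
```

The same calculation with d = 0.68 gives about 75%, matching the psychotherapy meta-analysis figure quoted later in the workshop.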
21
Calculating Cohen’s d, for 2 independent groups
Option 1
Use some known or assumed population SD, σ
If σ = 4.0, d = 2.00/4.0 = 0.500
Option 2
Use the SD of the Control group
s = 3.964, d = 2.00/3.964 = 0.505
Option 3 (most commonly used)
Use the pooled within-group SD (as for the t-test)
s = 4.209, d = 2.00/4.209 = 0.475 [Hedges’ g… (!)]
Prefer Option 2 if one condition is a ‘base’ or ‘reference’ condition; and Option 3 if not, especially if sample sizes are small.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530-572.
Control   Experim'l
17        18
22        19
25        29
20        24
19        22
29        29
26        28
22        27

Mean   22.500   24.500
SD      3.964    4.440

diff between means      2.000
pooled within-group s   4.209
Chapter 11
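The three options can be checked against the slide's data with a few lines of code. This is a sketch using only the standard library; σ = 4.0 is the assumed population SD from Option 1, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Data from the slide's two independent groups.
control = [17, 22, 25, 20, 19, 29, 26, 22]
experimental = [18, 19, 29, 24, 22, 29, 28, 27]

diff = mean(experimental) - mean(control)
n_c, n_e = len(control), len(experimental)
s_c, s_e = stdev(control), stdev(experimental)  # sample SDs (n - 1 divisor)

# Pooled within-group SD, as used for the independent-groups t test.
s_pooled = sqrt(((n_c - 1) * s_c**2 + (n_e - 1) * s_e**2) / (n_c + n_e - 2))

sigma = 4.0                # Option 1: known or assumed population SD
d_sigma = diff / sigma
d_control = diff / s_c     # Option 2: SD of the Control group
d_pooled = diff / s_pooled  # Option 3: pooled within-group SD

print(f"Option 1: d = {d_sigma:.3f}")
print(f"Option 2: d = {d_control:.3f}")
print(f"Option 3: d = {d_pooled:.3f}")
```

The same mean difference of 2.0 gives three slightly different d values, which is why a report of d should always state which SD was used as the standardiser.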
22
CIs for Cohen’s d, for 2 independent groups
Option 1: Easy! Just find CI for the diff between means, then divide by σ
Options 2 and 3: Tricky! Need noncentral t distribution! (Chapter 10 and fairytale “How the noncentral t distribution got its hump” tinyurl/noncentralt )
For example, option 3:
Both numerator and denominator have sampling variability, so distribution of d is weird. Noncentral t !
The rubber ruler (!): the SD as an elastic unit of measurement.
d heap and CI for d pages of ESCI chapters 10-13
Or, for an excellent approximate method of calculating CIs for d:
Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217, 15-26.
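Cohen’s d with the pooled within-group SD (option 3):
$$ d = \frac{M_2 - M_1}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}} $$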
Chapter 11
23
Unbiased estimate of δ is dunb
Unfortunately d overestimates δ.
The unbiased estimate of δ is dunb = adjustment factor × d: multiply d by the adjustment factor to get dunb.
Routinely use dunb (sometimes called Hedges’ g, but terminology a mess!)
Data two and Data paired pages of ESCI chapters 5-6
Chapter 11
Degrees of freedom, df   Adjustment factor   Percent bias of d
 2                       0.564               77.2%
 5                       0.841               18.9%
10                       0.923                8.4%
20                       0.962                4.0%
30                       0.975                2.6%
50                       0.985                1.5%
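The table values are consistent with the exact gamma-function form of the small-sample correction. The sketch below assumes that form (the slide's own formula did not survive extraction, so this is a reconstruction rather than the slide's expression), and the function name is hypothetical.

```python
from math import gamma, sqrt

def adjustment_factor(df):
    """Exact small-sample correction factor for d (assumed form).

    d_unb = adjustment_factor(df) * d
    """
    return gamma(df / 2) / (sqrt(df / 2) * gamma((df - 1) / 2))

print("df  factor  percent bias of d")
for df in (2, 5, 10, 20, 30, 50):
    j = adjustment_factor(df)
    bias = 100 * (1 / j - 1)  # how much d overestimates, relative to d_unb
    print(f"{df:>2}  {j:.3f}   {bias:.1f}%")

# Example: the pooled-SD d from the two-group data, with df = 8 + 8 - 2 = 14.
d = 0.475
print(f"d = {d}, d_unb = {adjustment_factor(14) * d:.3f}")
```

The bias shrinks quickly as df grows, so the adjustment matters most for small samples.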
24
Power, and precision
Consider the “sensitivity”, or “informativeness” of our experiment: the power, or precision
NHST world: Statistical power …the chance we’ll find something, if it is there
Estimation world: The MOE (half the width of a CI; the length of one arm) is a measure of precision
How large an N should we use, to get MOE no longer than XX?
Chapters 12, 13
25
Statistical power: I’m ambivalent!
A Type 1 error is rejecting H0, when it is true (Prob = α)
A Type 2 error is failing to reject H0, when there is a true effect (Prob = β)
Power = 1 – β = Prob(reject H0 IF H0 false)
Power is the chance we’ll find an effect, if there is an effect (High power is good!)
At right: Single sample, N = 18, α = .05, δ = .5, power = .52
Power picture page of ESCI chapters 10-13
To calculate power, we need the noncentral t distribution, unless σ is known
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence
intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530–572.
[Figure: probability density against t (axis from -3 to 6), showing the central t distribution (H0 true) and the noncentral t distribution (Ha true).]
Chapter 12
26
Power recommendations
APA Manual:
“Take seriously the statistical power considerations
associated with the tests of hypotheses. …
routinely provide evidence that the study has
sufficient power to detect effects of substantive
interest…” (p. 30)
BUT power values are very rarely reported in
psychology journals
Chapter 12
27
Statistical power
Power depends on:
N, the sample size (larger N, higher power)
An EXACT target ES, the size of effect we’re looking for (larger effect, higher power)
Therefore, to calculate power, need to state the ES. “Our experiment had power of .8 to find difference of 5.0 units on the anxiety scale.” (Use expertise in the field to choose ES.)
Or: “…to find a medium-sized effect (δ = 0.5).”
Other things, notably α, 1 or 2 tails, and the experimental design
Chapter 12
28
Statistical power: Some values
Power two and Power paired pages of ESCI chapters 10-13
Two independent groups, α = .05:
For δ = 0.5 (medium effect), power = .5 if N = 32 for each group
For power = .8, δ = 0.5, need N = 64 (!!) and N = 95 with α = .01
Scope for fudging! (Grant applications, ethics proposals…)
E.g. two independent groups, α = .05, N = 70, then:
For δ = 0.3, 0.4, 0.5 we get power = .42, .65, .84
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd. ed.). New York:
Academic Press.
Software: Gpower tinyurl.com/gpower3
Chapter 12
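The power values above can be roughly checked without the noncentral t distribution by treating σ as known and using the normal distribution. This is only an approximation (the function name is hypothetical, and the slide's values are t-based), so it gives values a little higher than those quoted, especially for small N.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_groups(delta, n_per_group, alpha=0.05):
    """Approximate two-tailed power for two independent groups.

    Assumption: sigma is known, so a z test replaces the t test.
    delta is the population effect size in SD units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n_per_group / 2)  # expected value of the z statistic
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

print(f"delta = 0.5, N = 32: power ~ {approx_power_two_groups(0.5, 32):.2f}")
print(f"delta = 0.5, N = 64: power ~ {approx_power_two_groups(0.5, 64):.2f}")
for delta in (0.3, 0.4, 0.5):
    print(f"delta = {delta}, N = 70: power ~ {approx_power_two_groups(delta, 70):.2f}")
```

The N = 70 rows illustrate the scope for fudging: small changes in the chosen target δ move the claimed power a long way.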
29
Statistical power in psychology: Often so low!
Cohen (1962): In published psychology research, the median power
to find a medium-sized effect is about .5.
Maxwell (2004): It was still about .5.
Our journals (and file drawers) are crammed with Type 2 errors: Results that are statistically nonsignificant (ns) even though there is a real effect!
“One can only speculate on the number of potentially fruitful lines of
investigation which have been abandoned because Type 2 errors
were made…” Cohen (1962)
Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147-163.
Chapter 12
30
Post hoc power: A bad idea!
Calculated after data are obtained. Use obtained d as target
Obtain mean difference of 2.6 anxiety units, maybe power = .35
If we’d found a difference of 7.2 units, post hoc power might be .83
Replicate, and see ‘dance of post hoc power’! Mad!
Simulate two page of ESCI chapters 5-6
Devastatingly criticised as not telling us what we want to know (chance we’ll find an effect of a size chosen to be meaningful)
Merely reflects the outcome of our study. Tells us nothing new.
SPSS, etc, gives post hoc power in its printouts. (NAUGHTY!) Never use this value! (A cop out by the software publishers!)
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19-24.
Chapter 12
31
32
Precision
Power has meaning only in the context of NHST
When using estimation, the corresponding concept is precision, as indexed by MOE
Large MOE, low precision
Small MOE, high precision
APA Manual: “…use calculations based on a chosen target precision (confidence interval width) to determine sample sizes.” (p. 31)
Chapter 13
33
Precision for planning (AIPE, accuracy in parameter estimation)
Calculate what N is required to give:
expected MOE no more than f × σ (so f is like d, a number of SDs)
OR to have a 99% chance MOE is no more than f × σ
‘assurance’ = 99%, expressed as γ = 99
Three Precision pages of ESCI chapters 5-6
Not yet widely used, but highly recommended (No need for H0!)
For example, f = 0.4, two independent groups, need N = 50
And for γ = 99, need N = 65
Such large N, even with such large f !
Chapter 13
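A rough version of the precision-for-planning calculation can be sketched as follows. It assumes σ is known (a z-based shortcut) and covers only the "expected MOE" case, not assurance, so it returns an N slightly below the t-based N = 50 quoted above; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def approx_n_for_moe(f, conf=0.95):
    """Approximate N per group so that MOE <= f * sigma.

    Two independent groups; assumes sigma is known, so
    MOE = z_crit * sigma * sqrt(2 / N). Solving for N gives the result.
    """
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)
    return ceil(2 * (z_crit / f) ** 2)

for f in (0.8, 0.4, 0.2):
    print(f"f = {f}: N per group ~ {approx_n_for_moe(f)}")
```

Halving the target f roughly quadruples the required N, which is the same N-by-4 relationship that applies to halving the SE.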
34
Low power, poor precision!? What can we do?
Informativeness—my general term for quality, size, sensitivity
To increase informativeness (also precision & power):
Choose experimental design to minimise error
Improve the measures, maybe measure twice and average
Target large effect sizes: Six therapy sessions, not two
Use large N (Phew!)—tho’ to halve SE, need to multiply N by 4!
Use Meta-analysis (combine results over experiments)
Yay! …very soon now…
An essential step in research planning, worth great effort!
Brainstorm!
Chapter 12
35
Single experiments—So many problems!
Dance of the p values—so wide!
Power often so low!
CIs often so wide, precision low!
CIs report accurately the uncertainty in data.
But don’t shoot the messenger: it’s a message we need to hear
The solutions:
Increase informativeness of individual studies
Combine results over studies: Meta-analysis
Chapter 7
36
The New Statistics: How?
Estimation: The six-step plan
1. Use estimation thinking. State estimation questions as: “How much…?”, “To what extent…?”, “How many…?”
   • Key to a more quantitative discipline?
2. Identify the ESs that best answer the questions
3. From the data, calculate point and interval estimates (CIs) for those ESs
4. Make a picture, including CIs
5. Interpret
6. Use meta-analytic thinking at every stage
Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217, 15-26.
Chapters 1, 2, 15
37
Meta-analysis: Does psychotherapy work?
Gene Glass (1976), presidential
address to AERA
Combine 375 studies, find overall
average d = 0.68 (medium+)
On average, 75% of patients,
after therapy, are above the
mean of untreated patients
A person initially at the mean,
on average moves to the 75th
percentile.
[Figure: Control (C) and Experimental (E) population distributions on an axis from 50 to 150, with the E distribution shifted by d = 0.68.]
Chapter 7
Hunt: The gripping MA story: How it saved social and behavioural research funding.
And guides choice of medical treatments. And is our best chance for saving the
world!
Hunt, M. (1997). How science takes stock. The story of meta-analysis. New York:
Sage
38
The meta-analysis picture
The forest plot
CIs make this picture possible; p values are irrelevant
ESCI Meta-Analysis
First year students easily grasp the basics
Meta-analysis should appear in the intro stats course!
Effect sizes used in meta-analysis: Means, Cohen’s d, r, others…
Cooper, H. M. (2009). Research synthesis and meta-analysis: A step-by-step
approach (4th ed.). Thousand Oaks, CA: Sage.
Cumming, G. (2006b). Meta-analysis: Pictures that explain how experimental
findings can be integrated. 7th International Conference on Teaching Statistics.
Brazil, July. tinyurl.com/teachma
Chapter 7
39
Meta-analysis: Small or large
SMALL
Combine 2 or 3 results, or studies
LARGE e.g. Cooper’s seven steps
1. Formulate the questions, and scope of the systematic review
2. Search and obtain literature, contact researchers, find grey literature
   Establish selection criteria, read and select studies
3. Code studies, enter ES estimates and coding of study features
4. Choose what to include, and design the analyses
5. Analyse the data. Prefer random effects model.
6. Interpret; draw empirical, theoretical, and applied conclusions.
7. Prepare critical discussion, present the review
8. Receive $1,000,000 and gold medal. Retire early. (Joke, alas.)
Chapter 9
40
Health sciences: The Cochrane Collaboration
Systematic reviews: meta-analytic summaries of research
Freely available over the internet
Publicly available if your country subscribes
2,000+ reviews, aiming for 10,000+
28,000+ people in 100+ countries
Aim to update every two years (!)
Includes some psychology
Will psychology join, or should it do its own thing?
Campbell collaboration (social sciences, some psychology)
www.cochrane.org
Chapter 9
41
Chapter 9
42
43
44
Models for meta-analysis
Fixed effect (FE) model
Assumes every study estimates the same population ES, μ
Assumes studies are homogeneous: they vary only because of sampling variability
Random effects (RE) model
Assumes Study i estimates μi, randomly chosen from N(μ, τ²)
Measures of heterogeneity: Q, τ², I². Study-to-study variation in excess of that expected from sampling variability (cf. dance of the means)
Always (virtually always) choose random effects model
RE and FE weight studies differently, and usually give different results
If heterogeneity low, RE gives same result as FE
Chapter 8
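The difference between the two models can be made concrete with a short sketch. It uses the DerSimonian-Laird estimator for the between-study variance, which is one common choice and an assumption here (the slides do not say which estimator ESCI or CMA uses). The first data set is Lucky and Noluck, with SEs recovered as mean difference divided by t; the second is a hypothetical, deliberately heterogeneous set.

```python
from math import sqrt

def meta_analysis(estimates, ses):
    """Fixed effect and DerSimonian-Laird random effects summaries."""
    w = [1 / se**2 for se in ses]
    fe_mean = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    # Q: weighted squared deviations of study estimates from the FE mean.
    q = sum(wi * (y - fe_mean) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond sampling
    # RE weights add tau^2 to each study's sampling variance.
    w_re = [1 / (se**2 + tau2) for se in ses]
    re_mean = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    return {"FE": fe_mean, "FE_se": sqrt(1 / sum(w)),
            "RE": re_mean, "RE_se": sqrt(1 / sum(w_re)),
            "Q": q, "tau2": tau2, "I2": i2}

# Lucky and Noluck: low heterogeneity, so RE collapses to FE.
homog = meta_analysis([3.61, 2.23], [3.61 / 2.43, 2.23 / 1.25])
# Hypothetical heterogeneous studies (values invented for illustration).
heterog = meta_analysis([0.1, 0.9, 0.5, 1.4], [0.2, 0.2, 0.25, 0.3])

for label, res in (("Lucky-Noluck", homog), ("Hypothetical", heterog)):
    print(f"{label}: FE = {res['FE']:.2f} (SE {res['FE_se']:.2f}), "
          f"RE = {res['RE']:.2f} (SE {res['RE_se']:.2f}), "
          f"Q = {res['Q']:.2f}, tau2 = {res['tau2']:.3f}, I2 = {100 * res['I2']:.0f}%")
```

When the estimated between-study variance is zero the two models agree exactly; when it is positive, RE weights the studies more evenly and reports a larger SE, hence a wider CI.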
45
But there’s more: Moderator analysis
If heterogeneity, look for moderators that may account for it
Simplest: Dichotomous moderator? (e.g., gender)
Subgroups page of ESCI Meta-analysis
Identify moderator, even if no study manipulated that variable!
Meta-analysis can give empirical summaries, but also:
theoretical progress, and
research guidance. Gold!
Example: Peter Wilson, clumsy children, meta-analysis of 50 studies
Identify performance on complex visuospatial tasks as moderator
Conduct empirical study on this moderator
Chapter 9
46
Continuous moderator? Meta-regression
Fletcher & Kerr (2010): Does RTG fade with length of relationship?
Meta-regression of ES values (RTG score) against years, 13 studies
Correlation, not causality. Alternative interpretations?
Chapter 9
[Figure: two panels. (1) d (axis from -0.5 to 2) against relationship length in years (ticks at 1.5, 2, 3, 5, 10, 20, 30, 50). (2) Fisher’s z (axis from -0.4 to 1) against log relationship length (axis from 0 to 1.8).]
47
MA in the Publication Manual
Many mentions, esp. pp. 36-37, 183.
Mainstreaming meta-analysis!
MARS (Meta-Analysis Reporting Standards) pp. 251-252.
A further big advantage of the sixth edition
Cooper, H. (2010). Reporting research in psychology: How to meet Journal Article
Reporting Standards (APA Style). Washington, DC: APA Books.
Chapter 9
48
CMA: Software for meta-analysis
Comprehensive Meta Analysis www.Meta-Analysis.com
Enter ES, and its variance, for each study—in 100+ formats!
CMA calculates weighted combined ES, using FE or RE model
Assess heterogeneity of studies
Explore moderators (ANOVA, or meta-regression)
Forest plot
(Another software option: RevMan, from Cochrane website)
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009).
Introduction to meta-analysis. New York: Wiley.
Chapters 8, 9
49
Assessing possible publication bias
Funnel plot: graph of SE (or
variance) of the ES of a study
(Vertical axis, high values at
top) against ES (Horiz axis).
Do small studies (near
bottom) have large ES? If so,
small studies obtaining small
ES may be missing—not
published. (In the file
drawer!)
In this example: Yes!
Chapter 9
Subgroups page of ESCI Meta-analysis
[Figure: funnel plot of SE (vertical axis) against ES (horizontal axis), with a reference line at “No difference”.]
50
Meta-analytic thinking
1. Think of past literature in meta-analytic terms
2. Think of our study as the next step in that progressively
cumulating meta-analysis
3. Report results so inclusion in future meta-analysis is easy
Report all effect sizes (whether ns or not), in the best way
Manual: “…be sure to include small effect sizes (or statistically
nonsignificant findings)…” p. 32. Nothing in the file drawer!
Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530-572.
Chapters 1, 7, 9
51
The New Statistics: Actually doing it!
The editor says to remove CIs and just give p values! What do we
DO?!
Be strong! The reasons for TNS are compelling, and TNS is the
way of the future. It’s worth persisting!
Manual: “Wherever possible, base discussion and interpretation
of results on point and interval estimates” (p. 34). That’s a
great imprimatur!
The evidence should decide: Cite statistical cognition research.
Numerous scholars have criticised NHST, hardly anyone has
replied!
I add in p values if I must, but I won’t remove CIs (or ESs).
Chapter 15
52
The New Statistics: How?
Estimation: The six-step plan
1. Use estimation thinking. State estimation questions as: “How much…?”, “To what extent…?”, “How many…?”
   • Key to a more quantitative discipline?
2. Identify the ESs that best answer the questions
3. From the data, calculate point and interval estimates (CIs) for those ESs
4. Make a picture, including CIs
5. Interpret
6. Use meta-analytic thinking at every stage
Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217, 15-26.
Chapters 1, 2, 15
53
Queries or comments to: [email protected]
Geoff’s brief radio talk: tinyurl.com/geofftalk
Geoff’s short magazine article: tiny.cc/GeoffConversation
Preface, contents & sample chapter: tinyurl.com/tnschapter7
Dance of the p values: tinyurl.com/danceptrial2
Book info, and ESCI: www.thenewstatistics.com
Hug a confidence interval today!