
Page 1

Geoff Cumming: LAM, Paris 2 (Friday 11 May, 2012) Workshop: The New Statistics in Practice

 

In this workshop I will discuss how to use the new statistics in practice. Choice of topics will be responsive to the interests of people attending. I could consider various measures, including correlations, proportions, and the standardized effect size Cohen’s d. I could consider a range of simple experimental designs. I will also discuss meta-analysis. ESCI will serve to illustrate many of the ideas, and to calculate confidence intervals in the different situations. I will consider statistical power, but will emphasize the advantages of an alternative approach to planning experiments: Precision for planning. This approach calculates the N required for our planned experiment to be likely to give a confidence interval no longer than a specified target length. There will be ample time for discussion, and for considering data and situations that are of particular interest to participants.


Page 2

The New Statistics in Practice

Geoff Cumming

Statistical Cognition Laboratory, School of Psychological Science, La Trobe University, Melbourne, Australia 3086

[email protected]/psy/staff/cumming.html

LAM, Paris, Talk 2 (Workshop), 11 May 2012

THANKS TO: Claudia Fritz, and: Bruce Thompson, Sue Finch, Robert Maillardet, Ben Ong, Ross Day, Mary Omodei, Jim McLennan, Sheila Crewther, David Crewther, Melanie Murphy, Cathy Faulkner, Pav Kalinowski, Jerry Lai, Debra Hansen, Mary Castellani, Mark Halloran, Kavi Jayasinghe, Mitra Jazayeri, Matthew Page, Leslie Schachte, Anna Snell, Andrew Speirs-Bridge, Eva van der Brugge, Elizabeth Silver, Jacenta Abbott, Sarah Rostron, Amy Antcliffe, Lisa L. Harlow, Dennis Doverspike, Alan Reifman, Joseph S. Rossi, Frank L. Schmidt, Meng-Jia Wu, Fiona Fidler, Neil Thomason, Claire Layman, Gideon Polya, Debra Riegert, Andrea Zekus, Mimi Williams, Lindy Cumming

© G. Cumming 2012

Page 3

Lucky-Noluck, RCTs for new vs old treatment

Lucky (2009) found the new showed a statistically significant advantage over the old: M (difference) = 3.61, SD = 6.97, t(42) = 2.43, p = .02.

Noluck (2009) found no statistically significant difference between the two: M (difference) = 2.23, SD = 7.59, t(34) = 1.25, p = .22.

Conclusion, from NHST? (Different, equivocal, or similar?)

Conclusion, from 95% CIs?

Chapter 1

[Figure: Difference between the means, for Simms (Total N = 44), Collins (Total N = 36), Lucky (Total N = 44), and Noluck (Total N = 36)]
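The 95% CIs for this comparison can be recovered from the reported results. A minimal Python sketch, under two assumptions not stated on the slide: the SE of each difference is taken as M / t (so the reported SD is not needed), and the critical t value comes from a series approximation rather than a statistics library. It should give roughly [0.61, 6.61] for Lucky and [-1.40, 5.86] for Noluck: the intervals overlap substantially, even though only one p value is below .05.

```python
from statistics import NormalDist


def t_crit(df, alpha=0.05):
    """Approximate two-tailed critical value of t with df degrees of freedom.

    Series expansion (Abramowitz & Stegun 26.7.5), truncated after three
    terms: about 3 decimal places at df = 10, better for larger df.
    Use scipy.stats.t.ppf instead if SciPy is available.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def ci_from_t(mean_diff, t_obs, df, alpha=0.05):
    """CI on a mean difference, recovering its SE from the reported t (SE = M / t)."""
    se = mean_diff / t_obs
    moe = t_crit(df, alpha) * se  # margin of error: the length of one arm
    return mean_diff - moe, mean_diff + moe


lucky = ci_from_t(3.61, 2.43, 42)
noluck = ci_from_t(2.23, 1.25, 34)
print(f"Lucky:  M = 3.61, 95% CI [{lucky[0]:.2f}, {lucky[1]:.2f}]")
print(f"Noluck: M = 2.23, 95% CI [{noluck[0]:.2f}, {noluck[1]:.2f}]")
```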

Page 4

Combination by meta-analysis (MA) of the Lucky and Noluck results.

The null hypothesis of no difference was rejected, p = .008.

What is your conclusion? Is the new treatment effective?

[Figure: Difference between the means, for MA (Total N = 80), and for Simms (Total N = 44), Collins (Total N = 36), Lucky (Total N = 44), and Noluck (Total N = 36)]

Chapter 1

Page 5

Three formats, all based on the same data:
1. Null hypothesis significance testing (NHST)
2. Confidence intervals (CI)
3. Meta-analysis (MA), to combine results of two studies

The CI and MA formats indicate that Similar is the best interpretation. (A comparison of the two studies gives p = .55, so no sign of conflict between them!)

Formats matter! NHST can mislead. CIs can give better understanding and conclusions.

… just some more messing with your (NHST?) mind…

[Figure: Difference between the means, for Simms (Total N = 44), Collins (Total N = 36), Lucky (Total N = 44), Noluck (Total N = 36), and MA (Total N = 80)]

Chapter 1
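The p = .008 combination (Page 4) and the p = .55 comparison of the two studies can both be reproduced from the reported means and t values. A minimal sketch, assuming a fixed-effect (inverse-variance) combination, SEs recovered as M / t, and normal (z) approximations for both tests; that these choices match the slides' values suggests, but does not confirm, that they are what was used:

```python
import math
from statistics import NormalDist

# (mean difference, t, df) as reported for each study
studies = {"Lucky": (3.61, 2.43, 42), "Noluck": (2.23, 1.25, 34)}

# Estimate and SE for each study; SE recovered as M / t (assumption)
est = {name: (m, m / t) for name, (m, t, _df) in studies.items()}


def two_tailed_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Fixed-effect meta-analysis: weight each study by 1 / SE^2
weights = {name: 1 / se**2 for name, (_m, se) in est.items()}
w_total = sum(weights.values())
ma_mean = sum(weights[name] * est[name][0] for name in est) / w_total
ma_se = math.sqrt(1 / w_total)
ma_p = two_tailed_p(ma_mean / ma_se)

# Do the two studies conflict? Test the difference between their estimates.
(m1, se1), (m2, se2) = est["Lucky"], est["Noluck"]
diff_p = two_tailed_p((m1 - m2) / math.sqrt(se1**2 + se2**2))

print(f"MA: mean difference = {ma_mean:.2f}, p = {ma_p:.3f}")
print(f"Lucky vs Noluck: p = {diff_p:.2f}")
```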

Page 6

Lucky-Noluck evidence (statistical cognition)

Email authors of articles in psychology and medical journals. Ask them to rate: “Results of the two are broadly consistent, or similar”. Ask for comments, and classify these as ‘mention NHST’ or no such mention.

Respondents who saw the CI figure:

Conclusion: Even when they see CIs, researchers often think in terms of NHST! Interpretation is better if they avoid NHST and think in terms of intervals!

[Figure: Percentage of responses rating the two studies Similar or Different, by classification of respondents: Mentioned NHST vs Did not mention NHST]

Chapter 1

Coulson, M., Healey, M., Fidler, F., & Cumming, G. (2010). Confidence intervals permit, but do not guarantee, better inference than statistical significance testing. Frontiers in Quantitative Psychology and Measurement, 1:26, 1-9. tinyurl.com/cisbetter

[Figure: Difference between the means, for Simms (Total N = 44), Collins (Total N = 36), Lucky (Total N = 44), and Noluck (Total N = 36)]

Page 7

Time for a crusade?!

Page 8

The Boots anti-ageing stampede!

April 2009, Queues for ‘No. 7 Protect & Perfect Intense Beauty Serum’

Media reports: “significant clinical improvement in facial wrinkles…”

J. Dermatology, online:

A cosmetic ‘anti-ageing’ product improves photoaged skin: A double-blind, randomized controlled trial

“…statistically significant improvement in facial wrinkles as compared to baseline assessment (p = .013), whereas vehicle-treated skin was not significantly improved (p = .11)”

A highly critical paper in Significance, then a revised article:

“non-significant trend towards clinical improvement… (p = .10)…”

Watson, R. E. B., et al. (2009). British Journal of Dermatology, 161, 419-426.

Chapter 2

Page 9

Lucky-Noluck is everywhere!

“…incorrect procedure… in which researchers conclude that effects differ when one effect is significant (p < .05) but the other is not (p > .05). We reviewed 513 … articles in Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience and found that 78 used the correct procedure and 79 used the incorrect procedure.”

Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience, 14, 1105-1107.

Page 10

Effect size

Effect size, the amount of something of interest

Many ES measures are very familiar

An effect size (ES) can be:

A mean, or difference between means

A percentage, or percentage change

A correlation (e.g., Pearson r)

Proportion of variance (R², η², ω²…)

A standardised measure (Cohen’s d, Hedges g…)

A regression slope (b or β)

Many other things… (but NOT a p value!)

Chapter 2

Page 11

Types of ESs

ES in original units (e.g., mean, mean difference)

Standardised measure (e.g., Cohen’s d): can help future MA

Units-free measure (e.g., Pearson r, , R2)

“Effect sizes may be expressed in the original units (e.g., the mean number of questions answered correctly; kg/month for a regression slope) and are often most easily understood when reported in original units. It can often be valuable to report an effect size not only in original units but also in some standardized or units-free unit (e.g., as a Cohen’s d value) or a standardized regression weight. Multiple degree-of-freedom effect-size indicators are often less useful than effect-size indicators that decompose multiple degree-of-freedom tests into meaningful one degree-of-freedom effects.” (Publication Manual, p. 34)

Chapter 2

Page 12

APA Publication Manual, 6th ed: Reporting CIs

From p. 117:

“was also statistically significant… t(177) = 3.51, p < .001, d = 0.65, 95% CI [0.35, 0.95].”

“R² = .25, ∆R² = .04, F(1, 143) = 7.63, p = .006, 95% CI [.13, .37].”

No need to repeat “95% CI” within the same paragraph, if the meaning is clear:

“… 95% CIs [5.62, 8.31], [-2.43, 4.31], and [-4.29, -3.11], respectively.”

Don’t repeat the units when stating the CI:

“M = 30.5 cm, 99% CI [18.0, 43.0]”

Strangely, no other discipline seems to have a well-recognised format for CI reporting! Even medicine, which has used CIs for 30 years!

My suggestion:

Always use 95%, unless there are excellent reasons for some other %.

Chapters 1, 2

Page 13

Manual: Reporting CIs in tables and figures

Tables, pp. 125-144:

“When a table includes point estimates, for example, means, correlations, or regression slopes, it should also, where possible, include confidence intervals.”

In a table use either (see examples, pp. 139-144):
- a column of […, …] values, or
- separate columns for the lower limit (LL) and upper limit (UL) values.

Figures, pp. 150-160:

“Figures can be used to illustrate the results… with error bars representing precision of the… estimates”.

“If your graph includes error bars, explain whether they represent standard deviations, standard errors, confidence limits, or ranges.”

Sadly, the only example error bars are SE bars.

Cumming, G., Fidler, F., & Vaux, D. L. (2007). Error bars in experimental biology. Journal of Cell Biology, 177, 7-11. tinyurl.com/errorbars101

Chapters 3, 4

Page 14

CIs and replication: Statistical cognition

Click to indicate 10 ‘plausible’ replication means

Most researchers do reasonably well

BUT they underestimate variability (most think a 95% CI captures 95% of future means)

In fact, on average, a 95% CI captures 83% of future means

A CI tells us what’s likely to happen next time, or what might have been!

A CI is much more informative than a p value

Cumming, G., Williams, J., & Fidler, F. (2004). Replication, and researchers’ understanding of confidence intervals and standard error bars. Understanding Statistics, 3, 299-311.

Chapter 5

[Figure: Where will the next mean fall? An internet experiment. Axis: dependent variable, with a 95% CI]
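The 83% figure can be checked directly. A minimal sketch, assuming σ is known so the 95% CI is M ± 1.96σ/√N: the original mean M1 and a replication mean M2 then differ by an amount with SD σ√(2/N), so the chance that M2 falls within M1's CI is 2Φ(1.96/√2) − 1 ≈ .834. The simulation (hypothetical N = 20, μ = 50, σ = 10) should agree to within sampling error:

```python
import math
import random
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # 1.96

# Exact capture percentage when sigma is known
capture_exact = 2 * NormalDist().cdf(z / math.sqrt(2)) - 1


def simulate_capture(n=20, mu=50.0, sigma=10.0, reps=20_000, seed=1):
    """Proportion of replication means falling inside the original 95% CI.
    n, mu and sigma are hypothetical illustration values."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    moe = z * se
    hits = 0
    for _ in range(reps):
        m1 = rng.gauss(mu, se)  # mean of the original experiment
        m2 = rng.gauss(mu, se)  # mean of the replication
        hits += abs(m2 - m1) <= moe
    return hits / reps


print(f"Exact: {capture_exact:.1%}, simulated: {simulate_capture():.1%}")
```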

Page 15

Some topics:

Compare two conditions—independent, or paired data

Randomised control trial (RCT)

CI on correlation, r

CI on proportion, P

Cohen’s d, and CI on d

Statistical power

Precision for planning

Meta-analysis

Chapters 6 - 15

The New Statistics: How?

Page 16

CIs on ESs

All our CIs so far have been on means, and have been symmetric (upper arm = lower arm)

But we also need CIs for other ESs. Consider:

Proportion

Correlation (Pearson r)

Cohen’s d

CIs on these ESs are, in general, not symmetric, and sometimes can be tricky to calculate

Chapter 14

Page 17

CI on Proportion, P

Proportions lie within [0, 1], for example:

Proportion of patients who, after therapy, no longer meet DSM criteria for the initial diagnosis

Proportion of responses that were errors (may be very low)

Limits at 0 and 1 mean we expect CIs to be asymmetric

Excellent approximate CIs: Altman, D. G., Machin, D., Bryant, T. N., & Gardner, M. J. (2000). Statistics with confidence: Confidence intervals and statistical guidelines (2nd ed.). London: British Medical Journal Books.

Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, 903-916.

Proportions and Diff proportions pages of ESCI Effect sizes

Chapter 14
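For a single proportion, one widely used approximate method is the Wilson score interval; this sketch assumes that method (ESCI or Altman et al. may use a different variant). The 17/20 example is from the next slide; 1/50 is a hypothetical low error rate, included to show how strongly the CI becomes asymmetric near 0:

```python
import math
from statistics import NormalDist


def wilson_ci(x, n, alpha=0.05):
    """Wilson score CI for the proportion x / n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom  # pulled toward .5, away from the limits
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 17/20 from the next slide; 1/50 is a hypothetical low error rate
for x, n in [(17, 20), (1, 50)]:
    lo, hi = wilson_ci(x, n)
    print(f"P = {x}/{n} = {x / n:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```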

Page 18

Example use of CIs on proportion, P

Difference between two proportions (instead of χ²)

ES = 17/20 – 11/20 = .30, 95% CI [.02, .53]

Diff proportions page of ESCI Effect sizes

Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, 903-916.

Access to finance?       Large corporations   Small corporations   Total
Satisfactory access      17                   11                   28
Unsatisfactory access    3                    9                    12
Total                    20                   20                   40

Chapter 14
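The CI [.02, .53] can be reproduced with Newcombe's hybrid score method, which combines the Wilson intervals for the two separate proportions. This sketch assumes that method; it matches the slide's limits, but the algorithm ESCI actually uses is not stated here:

```python
import math
from statistics import NormalDist


def wilson_ci(x, n, z):
    """Wilson score CI for x / n, given the critical z."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def newcombe_diff_ci(x1, n1, x2, n2, alpha=0.05):
    """Difference P1 - P2 for independent groups, with Newcombe's CI."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_ci(x1, n1, z)
    l2, u2 = wilson_ci(x2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return d, lower, upper


# Satisfactory access: 17/20 large vs 11/20 small corporations
d, lo, hi = newcombe_diff_ci(17, 20, 11, 20)
print(f"ES = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```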

Page 19

CI on correlation, r

Use Fisher’s r to z transformation

CIs are asymmetric, especially for r near -1 or 1

CIs are shorter when near -1 or 1

CIs may seem surprisingly wide, unless N is large

r to z and Two correlations pages of ESCI chapters 14-15

Correlations and Diff correlations pages of ESCI Effect sizes

Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, 903-916.

Chapter 14
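A minimal sketch of the r to z method: transform r to Fisher's z = atanh(r), whose SE is approximately 1/√(N − 3), build a symmetric CI on the z scale, then transform the limits back with tanh. The r and N values are hypothetical; they show the asymmetry, the shorter CI near 1, and the effect of larger N:

```python
import math
from statistics import NormalDist


def r_ci(r, n, alpha=0.05):
    """CI on Pearson r via Fisher's r to z transformation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = math.atanh(r)  # Fisher's z
    se = 1 / math.sqrt(n - 3)  # approximate SE of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical (r, N) pairs, for illustration only
for r, n in [(0.5, 30), (0.9, 30), (0.5, 300)]:
    lo, hi = r_ci(r, n)
    print(f"r = {r}, N = {n}: 95% CI [{lo:.2f}, {hi:.2f}]")
```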

Page 20

Cohen’s d (A standardised effect size)

Cohen’s d is number of SDs by which two conditions differ (a z score)

d picture page of ESCI chapters 10-13

Lots of overlap of the populations!

For d = 0.5 (a medium effect), 69% of E points higher than C mean!

Cohen chose medium = 0.5 as, roughly, a typical, noticeable amount that is of interest in behavioural and social science

Cohen’s small, medium, large: 0.2, 0.5, 0.8—but arbitrary!

Cohen’s d is the ES (in original units), divided by a suitable SD

Our sample d is a point estimate of the population δ

Chapter 11
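The 69% figure is Φ(0.5), the proportion of a normal distribution lying below a point 0.5 SDs above its mean (sometimes called Cohen's U3). A minimal sketch, assuming normal populations with equal SDs; the same calculation gives the 75% quoted for Glass's d = 0.68:

```python
from statistics import NormalDist


def prop_e_above_c_mean(d):
    """Proportion of the Experimental population above the Control mean,
    assuming normal populations with equal SDs that differ by d SDs (U3)."""
    return NormalDist().cdf(d)


for d in (0.2, 0.5, 0.8, 0.68):
    print(f"d = {d}: {prop_e_above_c_mean(d):.0%} of E points above the C mean")
```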

Page 21

Calculating Cohen’s d, for 2 independent groups

Option 1

Use some known or assumed population SD, σ

If σ = 4.0, d = 2.00/4.0 = 0.500

Option 2

Use the SD of the Control group

s = 3.964, d = 2.00/3.964 = 0.505

Option 3 (most commonly used)

Use the pooled within-group SD (as for the t-test)

s = 4.209, d = 2.00/4.209 = 0.475 [Hedges’ g… (!)]

Prefer Option 2 if one condition is a ‘base’ or ‘reference’ condition, and Option 3 if not, especially if sample sizes are small.

Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530-572.

Control   Experim'l
17        18
22        19
25        29
20        24
19        22
29        29
26        28
22        27

Mean      22.500    24.500
SD        3.964     4.440

Difference between means: 2.000
Pooled within-group s: 4.209

Chapter 11
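The three options can be checked against the data in the table. A minimal sketch; the pairing of Control and Experimental values is read from the flattened table and does not matter for independent groups, and the results reproduce the means, SDs, and d values above:

```python
import math
from statistics import mean, stdev

control = [17, 22, 25, 20, 19, 29, 26, 22]
experimental = [18, 19, 29, 24, 22, 29, 28, 27]

diff = mean(experimental) - mean(control)  # difference between means
s_c, s_e = stdev(control), stdev(experimental)  # sample SDs (N - 1 denominator)
n_c, n_e = len(control), len(experimental)

# Pooled within-group SD, as used for the independent-groups t test
s_pooled = math.sqrt(((n_c - 1) * s_c**2 + (n_e - 1) * s_e**2) / (n_c + n_e - 2))

sigma = 4.0  # Option 1: a known or assumed population SD
print(f"Option 1: d = {diff / sigma:.3f}")
print(f"Option 2: d = {diff / s_c:.3f}")
print(f"Option 3: d = {diff / s_pooled:.3f}")
```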

Page 22

CIs for Cohen’s d, for 2 independent groups

Option 1: Easy! Just find CI for the diff between means, then divide by σ

Options 2 and 3: Tricky! Need noncentral t distribution! (Chapter 10 and fairytale “How the noncentral t distribution got its hump” tinyurl/noncentralt )

For example, option 3:

Both numerator and denominator have sampling variability, so distribution of d is weird. Noncentral t !

The rubber ruler (!): the SD as an elastic unit of measurement.

d heap and CI for d pages of ESCI chapters 10-13

Or, for an excellent approximate method of calculating CIs for d: Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217, 15-26.

$$d = \frac{M_2 - M_1}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}}$$

Chapter 11
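Option 1 is simple enough to sketch in a few lines: with σ known, the CI on the difference uses z, and dividing the estimate and both limits by σ gives the CI on d. The values assume σ = 4.0 is known and use the Page 21 data (difference 2.00, 8 per group); Options 2 and 3 need the noncentral t distribution and are not sketched here:

```python
import math
from statistics import NormalDist


def d_ci_known_sigma(diff, sigma, n1, n2, alpha=0.05):
    """Option 1: CI on d when the population SD, sigma, is known.

    Find the CI on the difference between means (z-based, since sigma is
    known), then divide the estimate and both limits by sigma.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    moe = z * sigma * math.sqrt(1 / n1 + 1 / n2)  # MOE in original units
    return diff / sigma, (diff - moe) / sigma, (diff + moe) / sigma


d, lo, hi = d_ci_known_sigma(2.00, 4.0, 8, 8)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only 8 per group the CI runs from about -0.48 to 1.48, a reminder of how imprecise a single small study is.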

Page 23

Unbiased estimate of δ is dunb

Unfortunately d overestimates δ.

The unbiased estimate of δ is:

Multiply d by the adjustment factor to get dunb.

Routinely use dunb (sometimes called Hedges’ g, but terminology a mess!)

Data two and Data paired pages of ESCI chapters 5-6

Chapter 11

Degrees of freedom (df)   Adjustment factor   Percent bias of d
2                         0.564               77.2%
5                         0.841               18.9%
10                        0.923               8.4%
20                        0.962               4.0%
30                        0.975               2.6%
50                        0.985               1.5%
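The adjustment factors in the table match the standard exact correction J(df) = Γ(df/2) / (√(df/2) Γ((df − 1)/2)), with dunb = J(df) × d; the percent bias of d is 1/J − 1. A minimal sketch that reproduces the table (the function names are illustrative):

```python
import math


def adjustment_factor(df):
    """J(df) = Gamma(df/2) / (sqrt(df/2) * Gamma((df - 1)/2)); d_unb = J(df) * d."""
    log_ratio = math.lgamma(df / 2) - math.lgamma((df - 1) / 2)
    return math.exp(log_ratio) / math.sqrt(df / 2)


def d_unbiased(d, df):
    return adjustment_factor(df) * d


for df in (2, 5, 10, 20, 30, 50):
    j = adjustment_factor(df)
    print(f"df = {df:2d}: factor = {j:.3f}, percent bias of d = {1 / j - 1:.1%}")
```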

Page 24

Power, and precision

Consider the “sensitivity”, or “informativeness”, of our experiment: the power, or precision

NHST world: Statistical power …the chance we’ll find something, if it is there

Estimation world: The MOE (half the width of a CI; the length of one arm) is a measure of precision. How large an N should we use, to get MOE no longer than XX?

Chapters 12, 13

Page 25

Statistical power: I’m ambivalent!

A Type 1 error is rejecting H0 when it is true (Prob = α)

A Type 2 error is failing to reject H0 when there is a true effect (Prob = β)

Power = 1 – β = Prob(reject H0 IF H0 false)

Power is the chance we’ll find an effect, if there is an effect (High power is good!)

At right: Single sample, N = 18, α = .05, δ = .5, power = .52

Power picture page of ESCI chapters 10-13

To calculate power, we need the noncentral t distribution, unless σ is known

Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530–572.

[Figure: Probability density of t (from -3 to 6) when H0 is true (central t) and when Ha is true (noncentral t)]

Chapter 12
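Power can also be estimated without the noncentral t distribution, by simulation: generate many samples from a population with the target effect, run the t test on each, and count rejections. A minimal sketch for the single-sample case on this slide (N = 18, α = .05, two-tailed, δ = 0.5); it assumes normal data and uses a series approximation for the critical t. With 20,000 replications it should land close to .52:

```python
import math
import random
from statistics import NormalDist


def t_crit(df, alpha=0.05):
    """Approximate two-tailed critical t (series from Abramowitz & Stegun 26.7.5)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def simulated_power(n=18, delta=0.5, alpha=0.05, reps=20_000, seed=2012):
    """Proportion of single-sample t tests (H0: mu = 0) that reject H0
    when the population mean is delta SDs above 0."""
    rng = random.Random(seed)
    crit = t_crit(n - 1, alpha)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(delta, 1.0) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        t = m / (s / math.sqrt(n))
        rejections += abs(t) > crit
    return rejections / reps


print(f"Estimated power: {simulated_power():.2f}")
```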

Page 26

Power recommendations

APA Manual:

“Take seriously the statistical power considerations associated with the tests of hypotheses. … routinely provide evidence that the study has sufficient power to detect effects of substantive interest…” (p. 30)

BUT power values are very rarely reported in psychology journals

Chapter 12

Page 27

Statistical power

Power depends on:

N, the sample size (larger N, higher power)

An EXACT target ES, the size of effect we’re looking for (larger effect, higher power). Therefore, to calculate power, we need to state the ES: “Our experiment had power of .8 to find a difference of 5.0 units on the anxiety scale.” (Use expertise in the field to choose the ES.)

Or: “…to find a medium-sized effect (δ = 0.5).”

Other things, notably α, 1 or 2 tails, and the experimental design

Chapter 12

Page 28

Statistical power: Some values

Power two and Power paired pages of ESCI chapters 10-13

Two independent groups, α = .05: For δ = 0.5 (medium effect), power = .5 if N = 32 for each group

For power = .8, δ = 0.5, need N = 64 (!!) and N = 95 with α = .01

Scope for fudging! (Grant applications, ethics proposals…)

E.g. two independent groups, α = .05, N = 70, then:

For δ = 0.3, 0.4, 0.5 we get power = .42, .65, .84

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.

Software: G*Power tinyurl.com/gpower3

Chapter 12
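The values above can be approximated without special software. A rough sketch, assuming two independent groups with N per group and a two-tailed test: it treats the noncentral t as a normal curve centred on the noncentrality parameter δ√(N/2) and ignores the tiny far-tail rejection region. This shortcut reproduces the slide's values to two decimals, but exact noncentral t calculations (as in ESCI or G*Power) should be preferred:

```python
import math
from statistics import NormalDist


def t_crit(df, alpha=0.05):
    """Approximate two-tailed critical t (series from Abramowitz & Stegun 26.7.5)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def approx_power_two_groups(delta, n_per_group, alpha=0.05):
    """Rough two-tailed power for two independent groups (normal shortcut)."""
    ncp = delta * math.sqrt(n_per_group / 2)  # noncentrality parameter
    return NormalDist().cdf(ncp - t_crit(2 * n_per_group - 2, alpha))


for delta in (0.3, 0.4, 0.5):
    print(f"delta = {delta}, N = 70 per group: power ~ {approx_power_two_groups(delta, 70):.2f}")
print(f"delta = 0.5, N = 32: power ~ {approx_power_two_groups(0.5, 32):.2f}")
print(f"delta = 0.5, N = 64: power ~ {approx_power_two_groups(0.5, 64):.2f}")
print(f"delta = 0.5, N = 95, alpha = .01: power ~ {approx_power_two_groups(0.5, 95, 0.01):.2f}")
```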

Page 29

Statistical power in psychology: Often so low!

Cohen (1962): In published psychology research, the median power to find a medium-sized effect is about .5.

Maxwell (2004): It was still about .5.

Our journals (and file drawers) are crammed with Type 2 errors: Results that are statistically nonsignificant (ns) even though there is a real effect!

“One can only speculate on the number of potentially fruitful lines of investigation which have been abandoned because Type 2 errors were made…” Cohen (1962)

Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147-163.

Chapter 12

Page 30

Post hoc power: A bad idea!

Calculated after data are obtained. Use obtained d as target.

Obtain mean difference of 2.6 anxiety units, maybe power = .35

If we’d found a difference of 7.2 units, post hoc power might be .83

Replicate, and see ‘dance of post hoc power’! Mad!

Simulate two page of ESCI chapters 5-6

Devastatingly criticised as not telling us what we want to know (chance we’ll find an effect of a size chosen to be meaningful)

Merely reflects the outcome of our study. Tells us nothing new.

SPSS, etc., gives post hoc power in its printouts. (NAUGHTY!) Never use this value! (A cop-out by the software publishers!)

Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19-24.

Chapter 12
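Why post hoc power tells us nothing new: when the obtained effect is used as the target, post hoc power is a fixed function of the obtained p value (the point made by Hoenig and Heisey). A minimal sketch, simplified to a two-tailed z test (an assumption; t tests behave similarly); p = .05 always gives post hoc power of about .50:

```python
from statistics import NormalDist


def post_hoc_power(p, alpha=0.05):
    """'Observed' power of a two-tailed z test, using the obtained effect
    as the target. It depends only on the obtained p value."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p / 2)  # |z| that produces this p value
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z_obs - z_crit) + nd.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> post hoc power = {post_hoc_power(p):.2f}")
```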

Page 31

Page 32

Precision

Power has meaning only in the context of NHST

When using estimation, the corresponding concept is precision, as indexed by MOE

Large MOE, low precision

Small MOE, high precision

APA Manual: “…use calculations based on a chosen target precision (confidence interval width) to determine sample sizes.” (p. 31)

Chapter 13


Page 33

Precision for planning (AIPE, accuracy in parameter estimation)

Calculate what N is required to give:

expected MOE no more than f × σ (so f is like d, a number of SDs)

OR to have a 99% chance MOE is no more than f × σ

‘assurance’ = 99%, expressed as γ = 99

Three Precision pages of ESCI chapters 5-6

Not yet widely used, but highly recommended (No need for H0!)

For example, f = 0.4, two independent groups, need N = 50

And for γ = 99, need N = 65

Such large N, even with such large f !

Chapter 13
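A sketch of precision for planning for two independent groups, with N per group. It rests on assumptions that are mine, not ESCI's documented algorithm: for the expected case it treats the sample SD as equal to σ; for assurance it uses the γth percentile of s (via the Wilson-Hilferty approximation to chi-square); and it uses a series approximation for critical t. Under these assumptions it gives N = 50 for f = 0.4, and N = 65 with γ = 99, matching the slide:

```python
import math
from statistics import NormalDist


def t_crit(df, alpha=0.05):
    """Approximate two-tailed critical t (series from Abramowitz & Stegun 26.7.5)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the p quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3


def n_for_precision(f, assurance=None, alpha=0.05):
    """Smallest N per group (two independent groups) giving MOE <= f * sigma.

    assurance=None: target the MOE obtained when s equals sigma (assumption).
    assurance=0.99: target the MOE at the 99th percentile of s, so that
    MOE <= f * sigma with probability about .99.
    """
    n = 2
    while True:
        df = 2 * n - 2
        s_over_sigma = 1.0
        if assurance is not None:
            s_over_sigma = math.sqrt(chi2_quantile(assurance, df) / df)
        moe_in_sigmas = t_crit(df, alpha) * math.sqrt(2 / n) * s_over_sigma
        if moe_in_sigmas <= f:
            return n
        n += 1


print(n_for_precision(0.4), n_for_precision(0.4, assurance=0.99))
```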


Page 34

Low power, poor precision!? What can we do?

Informativeness—my general term for quality, size, sensitivity

To increase informativeness (also precision & power):

Choose experimental design to minimise error

Improve the measures, maybe measure twice and average

Target large effect sizes: Six therapy sessions, not two

Use large N (Phew!)—tho’ to halve SE, need to multiply N by 4!

Use Meta-analysis (combine results over experiments)

Yay! …very soon now…

An essential step in research planning, worth great effort!

Brainstorm!

Chapter 12

Page 35

Single experiments—So many problems!

Dance of the p values—so wide!

Power often so low!

CIs often so wide, precision low!

CIs report accurately the uncertainty in data. But don’t shoot the messenger: it’s a message we need to hear

The solutions:

Increase informativeness of individual studies

Combine results over studies: Meta-analysis

Chapter 7

Page 36

The New Statistics: How?

Estimation: The six-step plan

1. Use estimation thinking. State estimation questions as: “How much…?”, “To what extent…?”, “How many…?”
• Key to a more quantitative discipline?
2. Identify the ESs that best answer the questions
3. From the data, calculate point and interval estimates (CIs) for those ESs
4. Make a picture, including CIs
5. Interpret
6. Use meta-analytic thinking at every stage

Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217, 15-26.

Chapters 1, 2, 15

Page 37

Meta-analysis: Does psychotherapy work?

Gene Glass (1976), presidential address to AERA

Combine 375 studies, find overall average d = 0.68 (medium+)

On average, 75% of patients, after therapy, are above the mean of untreated patients

A person initially at the mean, on average, moves to the 75th percentile.

[Figure: Control (C) and Experimental (E) distributions on a scale from 50 to 150, d = 0.680]

Chapter 7

Hunt: The gripping MA story: How it saved social and behavioural research funding. And guides choice of medical treatments. And is our best chance for saving the world!

Hunt, M. (1997). How science takes stock. The story of meta-analysis. New York: Sage

Page 38

The meta-analysis picture

The forest plot

CIs make this picture possible; p values are irrelevant

ESCI Meta-Analysis

First year students easily grasp the basics

Meta-analysis should appear in the intro stats course!

Effect sizes used in meta-analysis: Means, Cohen’s d, r, others…

Cooper, H. M. (2009). Research synthesis and meta-analysis: A step-by-step approach (4th ed.). Thousand Oaks, CA: Sage.

Cumming, G. (2006b). Meta-analysis: Pictures that explain how experimental findings can be integrated. 7th International Conference on Teaching Statistics. Brazil, July. tinyurl.com/teachma

Chapter 7

Page 39

Meta-analysis: Small or large

SMALL

Combine 2 or 3 results, or studies

LARGE e.g. Cooper’s seven steps

1. Formulate the questions, and scope of the systematic review

2. Search and obtain literature, contact researchers, find grey literature. Establish selection criteria, read and select studies

3. Code studies, enter ES estimates and coding of study features

4. Choose what to include, and design the analyses

5. Analyse the data. Prefer random effects model.

6. Interpret; draw empirical, theoretical, and applied conclusions.

7. Prepare critical discussion, present the review

8. Receive $1,000,000 and gold medal. Retire early. (Joke, alas.)

Chapter 9

Page 40

Health sciences: The Cochrane Collaboration

Systematic reviews: meta-analytic summaries of research

Freely available over the internet

Publicly available if your country subscribes

2,000+ reviews, aiming for 10,000+

28,000+ people in 100+ countries

Aim to update every two years (!)

Includes some psychology

Will psychology join, or should it do its own thing?

Campbell collaboration (social sciences, some psychology)

www.cochrane.org

Chapter 9

Page 41

Chapter 9

Page 42

Page 43

Page 44

Models for meta-analysis

Fixed effect (FE) model

Assumes every study estimates the same population ES, so studies are homogeneous: they vary only because of sampling variability

Random effects (RE) model

Assumes Study i estimates μi, randomly chosen from N(μ, τ²)

Measures of heterogeneity: Q, τ², I². Study-to-study variation, in excess of that expected from sampling variability (cf. dance of the means)

Always (virtually always) choose random effects model

RE and FE weight studies differently, and usually give different results

If heterogeneity is low, RE gives nearly the same result as FE

Chapter 8
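The contrast between the two models can be made concrete with a short calculation. The sketch below is a minimal Python illustration, not the code inside ESCI or CMA. It assumes each study supplies an ES and its sampling variance, uses inverse-variance weights for the FE model, and estimates τ² with the DerSimonian-Laird method (one common estimator among several; the slide does not name one). The function name `meta_analyse` is hypothetical.

```python
import math


def meta_analyse(es, var):
    """Sketch of FE and RE (DerSimonian-Laird) meta-analysis.

    Hypothetical helper, not ESCI's or CMA's code.
    es  -- ES estimate from each study
    var -- sampling variance of each estimate (SE squared)
    """
    k = len(es)
    if k < 2:
        raise ValueError("need at least two studies")
    # FE model: inverse-variance weights, one common population ES.
    w = [1 / v for v in var]
    sum_w = sum(w)
    fe = sum(wi * yi for wi, yi in zip(w, es)) / sum_w
    # Q: weighted squared deviations of study ESs from the FE estimate.
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, es))
    df = k - 1
    # tau^2 (DerSimonian-Laird, an assumed choice): between-study variance, truncated at 0.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # RE model: adding tau^2 to every variance makes the weights more equal.
    w_re = [1 / (v + tau2) for v in var]
    re = sum(wi * yi for wi, yi in zip(w_re, es)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    # 95% CI using the normal approximation (z = 1.96).
    ci_re = (re - 1.96 * se_re, re + 1.96 * se_re)
    return {"fe": fe, "q": q, "tau2": tau2, "i2": i2, "re": re, "ci_re": ci_re}
```

With two equally precise studies (variance 0.01 each) at ES = 0 and ES = 1, for instance, Q = 50, τ² = 0.49 and I² = 98%, signalling study-to-study variation far beyond sampling variability. When τ² is estimated as 0, the RE and FE weights coincide and so do the two results.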

But there’s more: Moderator analysis

If heterogeneity, look for moderators that may account for it

Simplest: Dichotomous moderator? (e.g., gender)

Subgroups page of ESCI Meta-analysis

Identify moderator, even if no study manipulated that variable!

Meta-analysis can give empirical summaries, but also:

theoretical progress, and

research guidance. Gold!

Example: Peter Wilson, clumsy children, meta-analysis of 50 studies

Identify performance on complex visuospatial tasks as moderator

Conduct empirical study on this moderator

Chapter 9
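A dichotomous moderator analysis comes down to combining the studies separately within each subgroup, then estimating the difference between the two combined ES values. The sketch below assumes fixed-effect (inverse-variance) weighting within each subgroup for brevity; ESCI's Subgroups page or CMA may use a random effects model instead. The function names and the group labels are hypothetical.

```python
import math


def pooled(es, var):
    """Inverse-variance (FE) combined ES and its SE for one subgroup (hypothetical helper)."""
    w = [1 / v for v in var]
    est = sum(wi * yi for wi, yi in zip(w, es)) / sum(w)
    return est, math.sqrt(1 / sum(w))


def subgroup_difference(es, var, group):
    """Compare combined ES between the two levels of a dichotomous moderator.

    group -- label for each study, e.g. "A" or "B" (hypothetical coding)
    Returns each subgroup's (estimate, SE) and the difference with a 95% CI.
    """
    labels = sorted(set(group))
    if len(labels) != 2:
        raise ValueError("a dichotomous moderator needs exactly two levels")
    results = {}
    for g in labels:
        idx = [i for i, gi in enumerate(group) if gi == g]
        results[g] = pooled([es[i] for i in idx], [var[i] for i in idx])
    (a_est, a_se), (b_est, b_se) = results[labels[0]], results[labels[1]]
    diff = b_est - a_est
    # Subgroups contain different studies, so their estimates are independent.
    se_diff = math.sqrt(a_se ** 2 + b_se ** 2)
    return results, diff, (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
```

If the CI on the difference excludes zero by a clear margin, the moderator plausibly accounts for some of the heterogeneity, even though no single study manipulated it.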

Continuous moderator? Meta-regression

Fletcher & Kerr (2010): Does RTG fade with length of relationship?

Meta-regression of ES values (RTG score) against years, 13 studies

Correlation, not causality. Alternative interpretations?

Chapter 9

[Figure: meta-regression of RTG effect sizes against relationship length, 13 studies. Horizontal axes: relationship length in years (1.5 to 50) and log relationship length. Vertical axes: d and Fisher's z.]
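A meta-regression like this one regresses each study's ES on the moderator, weighting studies by their precision. The sketch below rests on several assumptions: the ES is Fisher's z with variance 1/(n − 3), the predictor is log10 of relationship length (matching the log axis in the figure), and only fixed-effect weights are used. The function name is hypothetical, and the example inputs are made up; they are not Fletcher & Kerr's data.

```python
import math


def meta_regression(z, n, years):
    """Weighted least-squares meta-regression of Fisher's z on log10(years).

    Hypothetical helper. z -- Fisher's z per study; n -- sample size per study,
    so var(z) = 1 / (n - 3); years -- moderator value per study.
    Fixed-effect weights only: a mixed-effects version would add tau^2 to each variance.
    """
    x = [math.log10(y) for y in years]
    w = [ni - 3 for ni in n]  # inverse of var(z)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - x_bar) * (zi - z_bar) for wi, xi, zi in zip(w, x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # With weights equal to known inverse variances, var(slope) = 1 / sxx.
    se_slope = math.sqrt(1 / sxx)
    return intercept, slope, (slope - 1.96 * se_slope, slope + 1.96 * se_slope)
```

For example, `meta_regression([0.8, 0.5, 0.2], [53, 53, 53], [1, 10, 100])` recovers intercept 0.8 and slope −0.3, with a slope CI of roughly −0.50 to −0.10. A negative slope would fit "RTG fades with relationship length", but as the slide says, it is correlation, not causality.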

MA in the Publication Manual

Many mentions, esp. pp. 36-37, 183.

Mainstreaming meta-analysis!

MARS (Meta-Analysis Reporting Standards) pp. 251-252.

A further big advantage of the sixth edition

Cooper, H. (2010). Reporting research in psychology: How to meet Journal Article Reporting Standards (APA Style). Washington, DC: APA Books.

Chapter 9

CMA: Software for meta-analysis

Comprehensive Meta Analysis www.Meta-Analysis.com

Enter ES, and its variance, for each study—in 100+ formats!

CMA calculates weighted combined ES, using FE or RE model

Assess heterogeneity of studies

Explore moderators (ANOVA, or meta-regression)

Forest plot

(Another software option: RevMan, from Cochrane website)

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. New York: Wiley.

Chapters 8, 9

Assessing possible publication bias

Funnel plot: graph of the SE (or variance) of each study's ES (vertical axis, with SE increasing downwards) against ES (horizontal axis). Do small studies (near the bottom) have large ES? If so, small studies obtaining small ES may be missing: not published. (In the file drawer!)

In this example: Yes!

Chapter 9

[Figure: funnel plot from ESCI Meta-analysis (Subgroups page). Horizontal axis: ES. Vertical axis: SE. A line marks no difference.]
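Eyeballing a funnel plot can be supplemented by a formal asymmetry check. The slide does not mention one; the sketch below swaps in Egger's regression test, a widely used technique, purely as an illustration. It regresses ES/SE on 1/SE by ordinary least squares: an intercept well above zero matches the pattern described above, with small studies reporting large ES. The function name is hypothetical, and with few studies the test has low power, so a non-significant intercept is weak evidence that nothing is in the file drawer.

```python
import math


def egger_test(es, se):
    """Egger's regression check for funnel-plot asymmetry (hypothetical helper).

    Regress the standardised effect (ES / SE) on precision (1 / SE) by ordinary
    least squares. Returns the intercept, its SE, and t = intercept / SE,
    to be compared with a t distribution on k - 2 df.
    """
    k = len(es)
    if k < 3:
        raise ValueError("need at least 3 studies")
    y = [e / s for e, s in zip(es, se)]
    x = [1 / s for s in se]
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)  # residual variance
    se_int = math.sqrt(s2 * (1 / k + x_bar ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t
```

The slope estimates the underlying ES for a very precise study; the intercept captures the small-study effect the funnel plot is meant to reveal.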

Meta-analytic thinking

1. Think of past literature in meta-analytic terms

2. Think of our study as the next step in that progressively cumulating meta-analysis

3. Report results so inclusion in future meta-analysis is easy. Report all effect sizes (whether ns or not), in the best way

Manual: “…be sure to include small effect sizes (or statistically nonsignificant findings)…” p. 32. Nothing in the file drawer!

Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530-572.

Chapters 1, 7, 9
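Reporting "in the best way" means giving a future meta-analyst what they need to enter our study directly: for a two-group comparison, at least d, its variance (or SE), and the group sizes. The sketch below computes these from group means, SDs and ns. It assumes independent groups, the pooled SD as standardiser, and the usual large-sample variance formula for d; its normal-approximation CI is cruder than the noncentral t method described by Cumming & Finch (2001). The function name is hypothetical.

```python
import math


def cohens_d_report(m1, s1, n1, m2, s2, n2):
    """Cohen's d for two independent groups, with approximate variance and 95% CI.

    Hypothetical helper. Standardiser: pooled SD. Variance approximation:
    var(d) ~ (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)).
    The CI is a normal approximation; exact CIs use the noncentral t distribution.
    """
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    se = math.sqrt(var_d)
    return {"d": d, "var_d": var_d, "n1": n1, "n2": n2,
            "ci": (d - 1.96 * se, d + 1.96 * se)}
```

With means 10 and 9, both SDs 2, and 20 per group, d = 0.5 and var(d) = 0.103125. Reporting those numbers, whether or not the result is statistically significant, keeps the study out of the file drawer.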

The New Statistics: Actually doing it!

The editor says to remove CIs and just give p values! What do we DO?!

Be strong! The reasons for TNS are compelling, and TNS is the way of the future. It’s worth persisting!

Manual: “Wherever possible, base discussion and interpretation of results on point and interval estimates” (p. 34). That’s a great imprimatur!

The evidence should decide: Cite statistical cognition research.

Numerous scholars have criticised NHST; hardly anyone has replied in its defence!

I add in p values if I must, but I won’t remove CIs (or ESs).

Chapter 15

The New Statistics: How?

Estimation: The six-step plan

1. Use estimation thinking. State estimation questions as: “How much…?”, “To what extent…?”, “How many…?”

• Key to a more quantitative discipline?

2. Identify the ESs that best answer the questions

3. From the data, calculate point and interval estimates (CIs) for those ESs

4. Make a picture, including CIs

5. Interpret

6. Use meta-analytic thinking at every stage

Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217, 15-26.

Chapters 1, 2, 15
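Steps 2 and 3 can be illustrated for one of the ESs the workshop mentions, a correlation. The sketch below assumes a Pearson r from a bivariate normal sample and uses the Fisher's z transformation to get an approximate CI. The function name is hypothetical; `math.atanh`, `math.tanh` and `statistics.NormalDist` are from Python's standard library.

```python
import math
from statistics import NormalDist


def r_with_ci(r, n, level=0.95):
    """Point and interval estimate for a Pearson correlation (hypothetical helper).

    Transforms r to Fisher's z = atanh(r), whose SE is about 1 / sqrt(n - 3),
    builds a symmetric interval on the z scale, then back-transforms with tanh.
    The resulting CI for r is asymmetric unless r = 0.
    """
    if not -1 < r < 1 or n < 4:
        raise ValueError("need -1 < r < 1 and n >= 4")
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return r, (math.tanh(z - crit * se), math.tanh(z + crit * se))
```

For r = .5 with N = 28, the 95% CI runs from about .16 to .74: wide, and asymmetric around .5. That answers the "How much…?" question far more informatively than a p value, and the picture for step 4 is simply r with that interval.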

Queries or comments to: [email protected]

Geoff’s brief radio talk: tinyurl.com/geofftalk

Geoff’s short magazine article: tiny.cc/GeoffConversation

Preface, contents & sample chapter: tinyurl.com/tnschapter7

Dance of the p values: tinyurl.com/danceptrial2

Book info, and ESCI: www.thenewstatistics.com

Hug a confidence interval today!