
© 2012, Richard P. PHELPS. International Test Commission, 8th Conference, Amsterdam, July 2012


The effect of testing on student achievement: 1910-2010

Richard P. PHELPS


Meta-analysis

• A method for summarizing a large research literature with a single, comparable measure.
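As a simple illustration of the idea (not taken from the slides), combining many studies into one comparable measure often reduces to a weighted average of per-study effect sizes; the function and data below are hypothetical:

```python
def weighted_mean_effect(effect_sizes, weights):
    """Combine per-study effect sizes into one summary value,
    weighting each study (for example, by its sample size)."""
    total = sum(weights)
    return sum(d * w for d, w in zip(effect_sizes, weights)) / total

# Three hypothetical studies: effect sizes and sample-size weights
print(weighted_mean_effect([0.4, 0.7, 0.6], [120, 45, 300]))  # ≈ 0.56
```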


The effect of testing on student achievement

• a 12-year-long study

• analyzed close to 700 separate studies and more than 1,600 separate effects

• 2,000 other studies were reviewed but found incomplete or inappropriate

• for lack of time and money, hundreds of other studies will not be reviewed


Looking for studies to include in the meta-analyses

1. Included only those studies that found an effect of testing on student achievement or on teacher instruction…


Studies included in the meta-analyses

2. …when:

• a test is newly introduced, or newly removed
• the quantity of testing is increased or reduced
• test stakes are introduced or increased, or removed or reduced


Studies included in the meta-analyses

3. …plus previous research summaries (e.g.)

• Kulik, Kulik, Bangert-Drowns, & Schwalb (1983-1991) on:
  – mastery testing,
  – frequency of testing, and
  – programs for high-risk university students
• Basol & Johanson (2009) on testing frequency
• Jaekyung Lee (2007) on cross-state studies
• W.J. Haynie (2007) in career-tech ed


Number of studies of effects, by methodology type

Methodology type                                 Number of studies   Number of effects
Quantitative                                     177                 640
Surveys and public opinion polls (US & Canada)   247                 813
Qualitative                                      245                 245
TOTAL                                            669                 1,698


Effect size: Cohen’s d

d = (Y_E − Y_C) / S_pooled

Y_E = mean of the experimental group

Y_C = mean of the control group

S_pooled = pooled standard deviation
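A minimal sketch of this computation (my own example, not from the slides); the sample data are made up:

```python
import math

def cohens_d(experimental, control):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_e, n_c = len(experimental), len(control)
    mean_e = sum(experimental) / n_e
    mean_c = sum(control) / n_c
    # unbiased sample variances
    var_e = sum((x - mean_e) ** 2 for x in experimental) / (n_e - 1)
    var_c = sum((x - mean_c) ** 2 for x in control) / (n_c - 1)
    # pooled standard deviation, weighted by degrees of freedom
    s_pooled = math.sqrt(((n_e - 1) * var_e + (n_c - 1) * var_c) / (n_e + n_c - 2))
    return (mean_e - mean_c) / s_pooled

# hypothetical test-score samples for a tested group and an untested control group
print(cohens_d([78, 85, 90, 72, 88], [70, 75, 80, 68, 77]))
```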


Effect size: Other formulae

d = t * sqrt((n1 + n2) / (n1 * n2))

d = 2r / sqrt(1 − r²)

d = ((Y_E,post − Y_E,pre) − (Y_C,post − Y_C,pre)) / S_pooled,post
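The first two conversions, as a rough sketch (function and variable names are mine, not the presentation's):

```python
import math

def d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to Cohen's d."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))

def d_from_r(r):
    """Convert a correlation coefficient r to Cohen's d."""
    return 2 * r / math.sqrt(1 - r ** 2)

print(d_from_t(2.5, 30, 28))  # e.g., t = 2.5 with group sizes 30 and 28
print(d_from_r(0.30))         # e.g., r = 0.30
```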


Effect size: Interpretation

• d between 0.25 and 0.50: weak effect

• d between 0.50 and 0.75: medium effect

• d greater than 0.75: strong effect
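A trivial helper encoding the thresholds above; how to label values below 0.25 or exactly on a boundary is my assumption, since the slide does not say:

```python
def interpret_d(d):
    """Map an effect size to the labels used in these slides."""
    d = abs(d)
    if d > 0.75:
        return "strong effect"
    if d >= 0.50:
        return "medium effect"
    if d >= 0.25:
        return "weak effect"
    return "negligible"  # below the slide's lowest threshold (my label)

print(interpret_d(0.55))  # "medium effect"
```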


Quantitative studies

(population coverage ≈ 7 million persons)


Quantitative studies: Effect size

• “Bare bones” calculation: d ≈ +0.55 …a medium effect

• Bare-bones effect size adjusted for measurement error: d ≈ +0.71 …a stronger effect

• Using same-study-author aggregation: d ≈ +0.88 …a strong effect
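The slides do not show how the measurement-error adjustment was made; a standard (Hunter–Schmidt-style) attenuation correction divides the observed d by the square root of the outcome measure's reliability, sketched below with a hypothetical reliability value:

```python
import math

def disattenuate_d(d_observed, reliability):
    """Correct an observed effect size for unreliability in the outcome measure:
    d_corrected = d_observed / sqrt(r_yy). Illustrative only; the exact
    adjustment used in this meta-analysis is not specified on the slide."""
    return d_observed / math.sqrt(reliability)

# e.g., an observed d of 0.55 with a hypothetical outcome reliability of 0.60
print(disattenuate_d(0.55, 0.60))  # ≈ 0.71
```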


Which predictors matter?

Treatment group…                                           Mean effect size
…is made aware of performance, and control group is not   +0.98
…receives targeted instruction (e.g., remediation)        +0.96
…is tested with higher stakes than control group          +0.87
…is tested more frequently than control group             +0.85


More Moderators – Source of Test

Source of test          Number of studies   Mean effect size
Researcher or teacher   87                  0.93
National                24                  0.87
Commercial              38                  0.82
State or district       11                  0.72
Total                   160


More Moderators – Sponsor of Test

Sponsor of test   Number of studies   Mean effect size
International     5                   1.02
Local             99                  0.93
National          45                  0.81
State             11                  0.64
Total             160


More Moderators – Study Design

Study design                   Number of studies   Mean effect size
Pre-post                       12                  0.97
Experiment, quasi-experiment   107                 0.94
Multivariate                   26                  0.80
Experiment, posttest only      7                   0.60
Pre-post (with shadow test)    8                   0.58
Total                          160


More Moderators – Scale of Analysis

Scale of analysis   Number of studies   Mean effect size
Aggregated          9                   1.60
Small-scale         118                 0.91
Large-scale         33                  0.57
Total               160


More Moderators – Scale of Administration

Scale of administration   Number of studies   Mean effect size
Classroom                 115                 0.95
Mid-scale                 6                   0.72
Large-scale               39                  0.71
Total                     160


Surveys and opinion polls


Percentage of survey items, by respondent group and type of survey

[Bar chart: percent of survey items from education providers vs. education consumers, shown separately for public opinion polls and program evaluation surveys; vertical axis: percent, 0–50.]


Number and percent of survey items, by test stakes and target group

Test stakes   Number   %     Target group   Number   %
High          507      62    Students       393      46
Medium        184      23    Schools        281      33
Low           33       4     Teachers       116      14
Unknown       89       11    No stakes      64       7
TOTAL         813            TOTAL          854


Opinion polls, by year

• 244 polls between 1958 and 2008, in the U.S. & Canada

• 813 unique question-response combinations

• close to 700,000 individual respondents

[Chart: distribution by year, 1960–2005 (counts on the vertical axis, 0–120).]


Surveys and opinion polls: Regular standardized tests, performance tests

Respondent opinion                         Regular tests    Performance tests
                                           (N ≈ 125), d     (N ≈ 50), d
Achievement is increased                   1.2              1.0
  …weighted by size of study population    1.9              0.5
Instruction is improved                    1.0              1.4
  …weighted by size of study population    0.9              0.9
Tests help align instruction               1.0              1.0
  …weighted by size of study population    0.5              0.9


Qualitative studies: Summary

(One cannot calculate an effect size.)


Qualitative studies, by methodology type

Methodology                                      Number of studies   %
Case study                                       120                 43
Experiment or pre-post study                     21                  7
Interviews (individual or group)                 75                  27
Journal                                          2                   1
Review of official records, documents, reports   33                  12
Research review                                  8                   3
Survey                                           22                  8
TOTAL                                            281                 100


Qualitative studies: Effect on student achievement

Direction of effect   Number of studies   Percent of studies   Percent excluding "positive inferred"
Positive              204                 84                   93
Positive inferred     24                  10
Mixed                 5                   2                    2
No change             8                   3                    4
Negative              3                   1                    1
TOTAL                 244                 100                  100

244 studies conducted in the past century in over 30 countries


Qualitative studies: Testing improves student achievement and teacher instruction

Achievement is improved   Number of studies   %
Yes                       200                 95
Mixed results             1                   <1
No                        10                  5
TOTAL                     211                 100

Instruction is improved   Number of studies   %
Yes                       158                 96
No                        7                   4
TOTAL                     165                 100


Qualitative studies: Variation by rigor and test stakes

                      Level of rigor
Direction of effect   high   medium   low   Total
Positive              95     67       42    204
Positive inferred     10     8        6     24
Mixed                 3      1        1     5
No change             4      3        1     8
Negative              1      1        1     3
TOTAL                 113    80       51    244

                      Test stakes
Direction of effect   high   medium   low   unknown   Total
Positive              133    27       38    6         204
Positive inferred     12     5        7               24
Mixed                 4               1               5
No change             2      1        5               8
Negative              3                               3
TOTAL                 154    33       51    6         244


Qualitative studies: Regular standardized tests and performance tests

Study results                            Regular tests    Performance tests
                                         (N = 176), %     (N = 69), %
Generally positive                       93               95
High-stakes tests                        71               42
High level of study rigor                46               48
Student attitudes toward test positive   60               71
Teacher attitudes toward test positive   55               80
Student achievement improved             95               95
Instruction improved                     92               100
Large-scale testing                      86               68


An enormous research literature

• But assertions that it does not exist at all are common

– Some claims are made by those who oppose standardized testing, and may be wishful thinking

– Others are “firstness” claims


Dismissive research reviews

• With a dismissive research literature review, a researcher assures readers that no other researcher has studied the same topic


Firstness claims

• With a firstness claim, a researcher insists that he or she is the first ever to study a topic


Social costs are enormous

• Research conducted by those without power or celebrity is dismissed: ignored and lost

• Public policies are skewed, based exclusively on the research results of those with power or celebrity

• Society pays again and again for research that has already been done
