What do you know when you know the test results? The meanings of educational assessments

Annual Conference of the International Association for Educational Assessment, Cambridge, UK: September 10th, 2008.

www.dylanwiliam.net

Overview of presentation

What have we learned about assessment?
The importance of (un)reliability
Evolving conceptions of validity
(Mis)uses of assessments for educational accountability
Some prospects for the future

Reliability vs. consistency

Classical measures of reliability
  are meaningful only for groups
  are designed for continuous measures

Scores versus grades
  Scores suffer from spurious accuracy
  Grades suffer from spurious precision

Classification consistency
  A more technically appropriate measure of the reliability of assessment
  Closer to the intuitive meaning of reliability
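One way to make classification consistency concrete is a small simulation. The sketch below is not from the slides: it assumes normally distributed true scores and errors, two parallel forms, and grade bands cut at equal quantiles (all of these are illustrative assumptions, as are the function and parameter names). It reports the share of simulated students who receive the same band on both forms.

```python
import random
from statistics import quantiles


def classification_consistency(reliability, n_bands=4, n_students=20000, seed=0):
    """Share of simulated students given the same band by two parallel forms.

    Assumption: observed score = true score + independent normal error,
    with total variance 1, so the two forms correlate at `reliability`.
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5
    error_sd = (1 - reliability) ** 0.5
    form_a, form_b = [], []
    for _ in range(n_students):
        true_score = rng.gauss(0, true_sd)
        form_a.append(true_score + rng.gauss(0, error_sd))
        form_b.append(true_score + rng.gauss(0, error_sd))

    def bands(scores):
        # n_bands - 1 cut points at equal quantiles of the observed scores
        cuts = quantiles(scores, n=n_bands)
        return [sum(score > cut for cut in cuts) for score in scores]

    same = sum(a == b for a, b in zip(bands(form_a), bands(form_b)))
    return same / n_students


if __name__ == "__main__":
    print(classification_consistency(0.85))
```

Changing `n_bands` shows how the same reliability coefficient translates into different levels of agreement once scores are reported as grades.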

Validity

Traditional definition: a property of assessments
  A test is valid to the extent that it assesses what it purports to assess
  Key properties (content validity)
    Relevance
    Representativeness

“Trinitarian” views of validity
  Content validity
  Criterion-related validity
    Concurrent validity
    Predictive validity
  Construct validity

Predictive validity

[Figure: scatter plot; axes: Predictor, Criterion; predictive validity ~0.4]

Probability of selecting best candidate with coin flip: 0.50
Probability of selecting best candidate with predictor: 0.60
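The link between a validity coefficient and the chance of picking the better of two candidates can be sketched with a standard result for bivariate normal data. The normality assumption and the function name are mine, not the slide's. Under that assumption, a correlation of 0.4 gives roughly 0.63, in the same region as the slide's 0.60.

```python
import math


def prob_pick_better(r):
    """Probability that the candidate with the higher predictor score also
    has the higher criterion score, assuming bivariate normal scores with
    correlation r (an illustrative assumption)."""
    return 0.5 + math.asin(r) / math.pi


if __name__ == "__main__":
    for r in (0.0, 0.4, 0.71, 1.0):
        print(r, round(prob_pick_better(r), 3))
```

With r = 0 the function returns the coin-flip value of 0.5, and with r = 1 it returns 1.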

A practical example

Reliability 0.85
Validity 0.7

                          Should actually be in
                          Set 1   Set 2   Set 3   Set 4
Students placed in Set 1    23       9       3       0
                   Set 2     9      12       6       3
                   Set 3     3       6       7       4
                   Set 4     0       3       4       8

50 out of 100 students end up in the “wrong” set
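The headline figure can be checked directly from the table. The sketch below assumes the two cells left blank on the slide are zero (Set 1 placed / Set 4 actual, and Set 4 placed / Set 1 actual), which is what the stated total of 100 students implies.

```python
# Rows: set the student was placed in; columns: set the student should be in.
# Assumption: the two blank cells on the slide are zeros.
placed_vs_actual = [
    [23, 9, 3, 0],
    [9, 12, 6, 3],
    [3, 6, 7, 4],
    [0, 3, 4, 8],
]

total = sum(sum(row) for row in placed_vs_actual)
# Students on the diagonal were placed in the set they should be in.
correct = sum(placed_vs_actual[i][i] for i in range(len(placed_vs_actual)))
misplaced = total - correct

print(total, correct, misplaced)
```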

A made-up—but real—problem

Consider the following scenario:
  We wish to select candidates for a medical degree programme
  We have an assessment that predicts performance on the programme with a correlation of 0.71
  We have 1000 applicants for 100 places

The logical solution
  Choose the 100 students with the highest score on the predictor.
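To get a feel for how well the "logical solution" works, one can simulate it. This sketch assumes a bivariate normal relationship between predictor and criterion, which the slides do not state, and the function name and defaults are illustrative. It returns the fraction of the 100 selected applicants who are also among the 100 strongest on the criterion.

```python
import random


def simulate_selection(r=0.71, applicants=1000, places=100, seed=1):
    """Select the top `places` applicants on the predictor and report what
    fraction of them are also in the top `places` on the criterion.

    Assumption: predictor and criterion are bivariate normal with correlation r.
    """
    rng = random.Random(seed)
    noise_sd = (1 - r * r) ** 0.5
    pool = []
    for _ in range(applicants):
        predictor = rng.gauss(0, 1)
        criterion = r * predictor + noise_sd * rng.gauss(0, 1)
        pool.append((predictor, criterion))
    chosen = sorted(pool, key=lambda c: c[0], reverse=True)[:places]
    best = sorted(pool, key=lambda c: c[1], reverse=True)[:places]
    return len(set(chosen) & set(best)) / places


if __name__ == "__main__":
    print(simulate_selection())
```

Running it with different seeds gives a sense of how much the overlap varies from one applicant pool to another.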

But what if…

[Figure: scatter plot; axes: Predictor, Criterion; correlation = 0.71]

Effects of hyperselectivity

If there are equal numbers of males and females, and males and females have equal mean scores, but the standard deviation of the male scores is 20% greater than the standard deviation of the female scores:

Percent of cohort selected   Percent of females among those selected
50                           50
40                           49
30                           47
20                           44
10                           39
5                            35
2                            31
1                            28
0.1                          11
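The pattern in the table can be reproduced approximately with a short calculation. The sketch below assumes both score distributions are normal, which the slide does not state, so its output is close to, but not always identical with, the slide's figures. The function name and the bisection search are illustrative choices.

```python
from statistics import NormalDist


def female_share(fraction_selected, male_sd_ratio=1.2):
    """Share of females among those selected when the top `fraction_selected`
    of a cohort is chosen on a single cut score.

    Assumptions: equal numbers and equal means for both groups, normal
    scores, and a male SD that is `male_sd_ratio` times the female SD.
    """
    female = NormalDist(0, 1)
    male = NormalDist(0, male_sd_ratio)

    def selected(cut):
        # Fraction of the whole cohort scoring above the cut
        return 0.5 * (1 - female.cdf(cut)) + 0.5 * (1 - male.cdf(cut))

    # Bisection: selected(cut) falls as the cut rises
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if selected(mid) > fraction_selected:
            lo = mid
        else:
            hi = mid
    cut = (lo + hi) / 2
    return 0.5 * (1 - female.cdf(cut)) / selected(cut)


if __name__ == "__main__":
    for pct in (50, 40, 30, 20, 10, 5, 2, 1, 0.1):
        print(pct, round(100 * female_share(pct / 100)))
```

The tighter the selection, the more the outcome depends on the tails of the distributions, where a modest difference in spread has a large effect.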

Validity

Validity is a property of inferences, not of assessments
  “One validates, not a test, but an interpretation of data arising from a specified procedure” (Cronbach, 1971; emphasis in original)

The phrase “A valid test” is therefore a category error (like “A happy rock”)
  No such thing as a valid (or indeed invalid) assessment
  No such thing as a biased assessment

Reliability is a pre-requisite for validity
  Talking about “reliability and validity” is like talking about “swallows and birds”
  Validity includes reliability

Modern conceptions of validity

Validity subsumes all aspects of assessment quality
  Reliability
  Representativeness (content coverage)
  Relevance
  Predictiveness

But not impact (Popham: right concern, wrong concept)

“Validity is an integrative evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1989 p. 13)

Consequential validity? No such thing!

As has been stressed several times already, it is not that adverse social consequences of test use render the use invalid, but, rather, that adverse social consequences should not be attributable to any source of test invalidity such as construct-irrelevant variance. If the adverse social consequences are empirically traceable to sources of test invalidity, then the validity of the test use is jeopardized. If the social consequences cannot be so traced—or if the validation process can discount sources of test invalidity as the likely determinants, or at least render them less plausible—then the validity of the test use is not overturned. Adverse social consequences associated with valid test interpretation and use may implicate the attributes validly assessed, to be sure, as they function under the existing social conditions of the applied setting, but they are not in themselves indicative of invalidity. (Messick, 1989, pp. 88-89)

Threats to validity

Inadequate reliability

Construct-irrelevant variance
  Differences in scores are caused, in part, by differences not relevant to the construct of interest
  The assessment assesses things it shouldn’t
  The assessment is “too big”

Construct under-representation
  Differences in the construct are not reflected in scores
  The assessment doesn’t assess things it should
  The assessment is “too small”

With clear construct definition all of these are technical—not value—issues
But they interact strongly…

Some practical applications

School effectiveness

Do differences in student achievement outcomes support inferences about school quality?

Key issues:
  Construct-irrelevant variance
    Differences in prior achievement
  Construct under-representation
    Systematic exclusion of important areas of achievement

Construct-irrelevant variance

[Figure: plot of %L2+EM 2007 (0.2 to 1.0) and CVA (960 to 1080)]

Differences in value-added are often insignificant…

(Wilson & Piebalga, 2008)

Middle 50%: differences not significantly different from average

…are usually small…

In England:
  7% of the variability in secondary school examination scores is attributable to the school
  93% of the variability in secondary school examination scores has nothing to do with the school

…and are transient

Goldstein & Leckie, 2008

Ranks of value-added scores predicted 6 years ahead

Construct under-representation

Learning is slower than usually assumed

[Figure: facility (0.00 to 1.00) by age (6 to 12 years) for the item 860+570=?]

Source: Leverhulme Numeracy Research Programme

…so the overlap between cohorts is large

The spread of achievement within each cohort is greater than generally assumed

…but is made worse by the standard procedures of test design

Consider the fate of an item that all 7th grade students answer incorrectly and all 8th grade students answer correctly
  It will not be included in any 7th grade test because it fails to discriminate between 7th graders
  It will not be included in any 8th grade test because it fails to discriminate between 8th graders

Conclusion: any item that really tells you what students are learning will not be included in a standardized test
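The argument can be illustrated with a simple item discrimination statistic. The sketch below uses a simplified upper-lower index (facility in the top-scoring half minus facility in the bottom-scoring half); operational test development typically uses other statistics, and the student data here are invented purely for illustration.

```python
def discrimination_index(responses):
    """Simplified upper-lower discrimination index for one item.

    `responses` is a list of (total_test_score, item_correct) pairs, where
    item_correct is 0 or 1. Returns the item's facility in the top-scoring
    half minus its facility in the bottom-scoring half.
    """
    ranked = sorted(responses, key=lambda pair: pair[0])
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("need at least two responses")

    def facility(group):
        return sum(item for _, item in group) / len(group)

    return facility(ranked[-half:]) - facility(ranked[:half])


# Hypothetical data: total scores are arbitrary; only the item pattern matters.
grade7 = [(score, 0) for score in range(20, 60)]  # every 7th grader is wrong
grade8 = [(score, 1) for score in range(30, 70)]  # every 8th grader is right

print(discrimination_index(grade7))            # no discrimination within grade 7
print(discrimination_index(grade8))            # no discrimination within grade 8
print(discrimination_index(grade7 + grade8))   # positive across the two grades
```

Within each grade the item has no variance, so any within-grade discrimination statistic is zero, even though the item separates the two grades.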

The social consequences of inadequate assessments

The Macnamara Fallacy

The first step is to measure whatever can be easily measured. This is OK as far as it goes.
The second step is to disregard that which can’t easily be measured or to give it an arbitrary quantitative value. This is artificial and misleading.
The third step is to presume that what can’t be measured easily really isn’t important. This is blindness.
The fourth step is to say that what can’t be easily measured really doesn’t exist. This is suicide.

(Handy, 1994 p. 219)

Goodhart’s law (Campbell’s law)

All performance indicators lose their meaning when adopted as policy targets:
  Inflation and money supply
  Railtrack’s performance targets
  National Health Service waiting lists
  National or provincial school achievement targets

The clearer you are about what you want, the more likely you are to get it, but the less likely it is to mean anything

Effects of narrow assessment

Incentives to teach to the test
Focus on some subjects at the expense of others
Focus on some aspects of a subject at the expense of others
Focus on some students at the expense of others (“bubble” students)

Consequences
  Learning that is
    Narrow
    Shallow
    Transient

Written examinations… they have occasioned and made well nigh imperative the use of mechanical and rote methods of teaching; they have occasioned cramming and the most vicious habits of study; they have caused much of the overpressure charged upon schools, some of which is real; they have tempted both teachers and pupils to dishonesty; and last but not least, they have permitted a mechanical method of school supervision.

(White, 1888, pp. 517-518)

The Lake Wobegon effect revisited

“All the women are strong, all the men are good-looking, and all the children are above average.” Garrison Keillor

Achievement of English 16-year-olds

[Figure: percentage achieving 5 A*-C and 5 A*-C + EM, 1995/96 to 2006/07; vertical axis 30 to 70]

[Figure: scores (460 to 560) by year, 1995 to 2006, for PIRLS, PISA(S), PISA(M), PISA(R), TIMSS(M), TIMSS(S)]

What does it all mean?

Reliability requires random sampling from the domain of interest
Increasing reliability requires increasing the size of the sample

Using teacher assessment in certification is attractive:
  Increases reliability (increased test time)
  Increases validity (addresses aspects of construct under-representation)

But problematic
  Lack of trust (“Fox guarding the hen house”)
  Problems of biased inferences (construct-irrelevant variance)
  Can introduce new kinds of construct under-representation (“banking” models of assessment)
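The claim that increasing reliability requires a larger sample can be quantified with the Spearman-Brown formula, a standard result that the slides do not cite. The sketch assumes the added material is parallel to the existing test; the function names and the example figures (0.85 and 0.95) are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`,
    assuming the added material is parallel to the existing test."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def length_factor_needed(current, target):
    """Factor by which the test must be lengthened to move from reliability
    `current` to reliability `target` (Spearman-Brown solved for length)."""
    return target * (1 - current) / (current * (1 - target))


if __name__ == "__main__":
    print(round(spearman_brown(0.85, 2), 3))
    print(round(length_factor_needed(0.85, 0.95), 2))
```

Under these assumptions, moving from 0.85 to 0.95 requires a test more than three times as long, which is one reason the extra assessment time offered by teacher assessment is attractive.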

Improvements in pediatric cardiac surgery

Senning   Transitional   Switch

Early death rate
  Senning 12%
  Transitional 25%

Bull, et al (2000). BMJ, 320, 1168-1173.

Impact on life expectancy

Life expectancy:
  Senning: 46.6 years
  Switch: 62.6 years

The challenge

To design an assessment system that is:
  Distributed
    So that evidence collection is not undertaken entirely at the end
  Synoptic
    So that learning has to accumulate
  Extensive
    So that all important aspects are covered (breadth and depth)
  Manageable
    So that costs are proportionate to benefits
  Trusted
    So that stakeholders have faith in the outcomes

This is not rocket science
It’s much harder than that

The effects of context

The ancient yogis used logs of wood, stones, and ropes to help them practise asanas effectively. Extending this principle, Yogacharya Iyengar invented props which allow asanas to be held easily, and for a longer duration without strain.

Yogacharya Iyengar in setubandha sarvangasana

This version of the posture requires considerable strength in the neck, shoulders, and back requiring years of practice to achieve it. It should not be attempted without supervision.

The effects of context

Beliefs
  about what constitutes learning;
  in the value of competition between students;
  in the value of competition between schools;
  that test results measure school effectiveness;
  about the trustworthiness of numerical data, with bias towards a single number;
  that the key to schools’ effectiveness is strong top-down management;
  that teachers need to be told what to do, or conversely that they have all the answers

In the context of your own assessment system, which beliefs
  are most significant for education reform?
  can be changed?