
Marks for identifying uncertainty:

Stimulation of learning through

Certainty-Based Marking

Tony Gardner-Medwin

Physiology, University College London

www.ucl.ac.uk/LAPT

Cambridge Assessment, June 2009

http://www.ucl.ac.uk/

Starting points (you may agree or disagree!)

• The nature of assessment affects how students learn & think

• Objective tests/exercises can stimulate learning & understanding

• Formative assessment is more important than summative

• Different Q types suit different situations, e.g. T/F, SBA, free text

• Scaling to “% above chance” (%Knowledge) should be universal

• Negative marking can be either really constructive or really awful

• Students & kids can enjoy assessment if it is stimulating, fair, varied, challenging, immediately rewarding, not humiliating -- like a game.

The take home message:

We should reward the acknowledgment of uncertainty

1. How Certainty-Based Marking works

2. How it relates to probability & knowledge

3. How students react & use it

4. CBM as summative assessment

5. Why isn’t it used more?

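Which Certainty Level is Best?

[Figure: mark expected on average (-6 to 3) plotted against "How likely is your answer to be correct?" (0% to 100%), with one line each for C=3 High, C=2 Mid, C=1 Low and No Reply. The guessing range is indicated, and 67% and 80% are marked on the probability axis.]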
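The expected marks in this chart can be reproduced with a short sketch. It assumes the standard LAPT scheme quoted elsewhere in these slides (1, 2, 3 marks for a correct answer and 0, -2, -6 for a wrong one at C=1, 2, 3); the function names are illustrative and not part of the LAPT software.

```python
# Standard LAPT mark scheme as quoted in the slides: (mark if right, mark if wrong).
LAPT_SCHEME = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def expected_mark(p_correct, level, scheme=LAPT_SCHEME):
    """Average mark for an answer that is right with probability p_correct."""
    right, wrong = scheme[level]
    return p_correct * right + (1 - p_correct) * wrong


def best_level(p_correct, scheme=LAPT_SCHEME):
    """Certainty level with the highest expected mark for this probability."""
    return max(scheme, key=lambda level: expected_mark(p_correct, level, scheme))


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.7, 0.75, 0.9):
        marks = {c: round(expected_mark(p, c), 2) for c in LAPT_SCHEME}
        print(f"p={p:.2f}  expected marks={marks}  best C={best_level(p)}")
```

Setting the expected marks of adjacent levels equal gives the crossover points: p = 4p - 2 at p = 2/3 (about 67%) and 4p - 2 = 9p - 6 at p = 0.8 (80%), which matches the two percentages marked on the chart.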

How well do students discriminate reliability ?

[Diagram: a scale of increasing "ignorance", running through knowledge, uncertainty, don't know, misconception, delusion. Two labels describe the scale: "Decreasing certainty about what is true" and "Increasing certainty about something false".]

Ordinary words we use to describe Knowledge

• Knowledge is a function of certainty (confidence, degree of belief)• There are states a lot worse than acknowledged ignorance

"It's not ignorance does so much damage - it's knowin' so derned much that ain't so." attrib J. Billings

“I was gratified to be able to answer promptly, and I did ! - I said I didn't know.” Mark Twain


Student Learning: Principles they readily understand

• You need to know the reliability of your knowledge to use it

• Confident errors are serious, requiring attention to explanations

• Expressing uncertainty when you are uncertain is a good thing

• Confidence is about understanding why things cannot be otherwise, not about personality

• If over- or under-confident, you must calibrate through practice

• Reflection and justification are essential study habits

In evaluation surveys, a majority of students have always said they like CBM, finding it useful and fair.

They asked for it to be included in exams, and after 5 years of exam use at UCL they voted 52% : 30% to retain it (in 2005/6), though this was rejected by the conservative medical establishment.

Why test knowledge? Google makes it so easy to find!

In olden times, you had to rely on your own stored information .... you would make a best choice and “go for it”

School leavers have more sparse (though broader) stored info, but still have a “go for it” culture - to a scary extent! .... responding with an immediate idea & not thinking much

Cheap information (& increased teamwork) require:

1) Identifying things you will get wrong and not Google! “unknown unknowns” rather than “don't knows”

2) Judging reliability and uncertainty correctly

.... setting a threshold for seeking help

.... evaluating conflicting and corroborating information

These lessons are core things that CBM teaches

[Diagram with the labels: Evidence, Nuggets of knowledge, Inference, Network of Understanding, Choice, Certainty (Degree of Belief).]

CBM places greater demands on justification & stimulates connections

To understand = to link correctly the facts that bear on an issue.

Thinking about uncertainty / justification develops understanding of relationships

Using CBM

1. With UCL LAPT software, online or from CD

2. With Moodle - work in progress

3. With commercial software – some progress, more needed!

4. Secure exams, with OMR Cards [Speedwell]

CBM quite closely follows the ideal ignorance measure

The student loses about 3 marks per 'bit' of ignorance - up to a maximum of 3 bits

[Figure: mark assigned (-8 to 2) plotted against lack of knowledge in bits (0 to 4).]

Lack of knowledge [ bits ] = -log2 ( Prob'y assigned to correct choice )
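The ignorance measure can be computed directly from the formula above. The sketch below also includes a straight-line approximation of the marks, which is only one reading of "about 3 marks per bit, up to a maximum of 3 bits", anchored at the top mark of 3; the slides do not give the exact line, so `approx_mark` and its constants are assumptions.

```python
import math


def ignorance_bits(p_assigned_to_correct):
    """Lack of knowledge in bits: -log2 of the probability given to the correct choice."""
    if not 0 < p_assigned_to_correct <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p_assigned_to_correct)


def approx_mark(p_assigned_to_correct, top_mark=3.0, marks_per_bit=3.0, max_bits=3.0):
    """Assumed linear approximation: lose marks_per_bit per bit, capped at max_bits."""
    bits = min(ignorance_bits(p_assigned_to_correct), max_bits)
    return top_mark - marks_per_bit * bits


if __name__ == "__main__":
    for p in (1.0, 0.8, 0.5, 0.25, 0.125):
        print(f"p={p:<5}  bits={ignorance_bits(p):.2f}  approx mark={approx_mark(p):.2f}")
```

Under this reading, full certainty in the correct answer costs nothing and the cap of 3 bits corresponds to the largest penalty in the standard scheme (-6).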

What’s a good mark scheme?

[Figures: five panels, each plotting the mark expected on average against confidence (est'd prob'y correct, 0% to 100%):

• No negative marking: lines for reply and no reply
• Fixed negative marking: +/-1: lines for reply and no reply
• Hevner 1932: lines for high, mid, low and no reply
• Davies 2002: lines for high, mid, low and no reply
• Hassmen & Hunt ‘94: lines for high, mid, low and no reply

Annotations surviving from the panels: 50%; 35%, 55%, 67%, 85%; min, max.]

Gardner-Medwin ’06

The standard LAPT (1,2,3 / 0,-2,-6) scheme seems better than any of these.
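The two simplest panels (no negative marking and fixed +/-1 marking) can be compared with a small sketch. It assumes that a correct answer earns 1 mark and that omitting a reply scores 0; the function names are illustrative.

```python
def expected_if_reply(p_correct, mark_right, mark_wrong):
    """Average mark from answering when the answer is right with probability p_correct."""
    return p_correct * mark_right + (1 - p_correct) * mark_wrong


def should_reply(p_correct, mark_right, mark_wrong, mark_no_reply=0.0):
    """True if answering has a higher expected mark than leaving the question blank.

    mark_no_reply=0 is an assumption of this sketch.
    """
    return expected_if_reply(p_correct, mark_right, mark_wrong) > mark_no_reply


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7):
        no_negative = should_reply(p, mark_right=1, mark_wrong=0)
        fixed_negative = should_reply(p, mark_right=1, mark_wrong=-1)
        print(f"p={p:.1f}  reply (no negative marking)={no_negative}  reply (+/-1)={fixed_negative}")
```

With no negative marking it always pays to reply, however unsure the student is; with +/-1 marking it pays only above 50%. Neither scheme distinguishes degrees of certainty among the answers that are given.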

CBM increases the reliability of exam data

'Reliability' indicates to what extent a score measures something about the student's ability, as opposed to 'luck' or chance.

CBM increases the effective test length

With increased 'Reliability' you don't need so many exam questions to get data of equal quality.

[Figure: Cronbach alpha (reliability) using CBM plotted against Cronbach alpha using % correct, both axes 80% to 95%, for exams with True/False questions.]

To achieve these increases using only % correct would have required on average 58% more questions.
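The link between reliability and test length can be illustrated with the Spearman-Brown prediction formula, a standard psychometric relation. The slides do not say which method produced the 58% figure, so this is a sketch under that assumption, and the reliabilities in the example are hypothetical values rather than the exam data.

```python
def lengthening_factor(reliability_now, reliability_target):
    """Spearman-Brown: factor by which a test must be lengthened to reach the target.

    Both reliabilities must lie strictly between 0 and 1.
    """
    for r in (reliability_now, reliability_target):
        if not 0 < r < 1:
            raise ValueError("reliabilities must be in (0, 1)")
    return (reliability_target * (1 - reliability_now)) / (
        reliability_now * (1 - reliability_target)
    )


if __name__ == "__main__":
    # Hypothetical example: reliability 0.85 with % correct, 0.90 with CBM.
    factor = lengthening_factor(0.85, 0.90)
    print(f"Test would need to be {factor:.2f} times as long "
          f"({(factor - 1) * 100:.0f}% more questions)")
```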

Reliability and efficiency of exams (Quality of data / number of questions) are enhanced with CBM

[Figure: Coef. of Determination (r²), between odd & even numbered Qs in 6 exams (m±sem), shown for the whole class, the bottom 1/3 and the top 1/3, with bars for conventional scores, adj. conf-based scores and their difference (differences all P<0.01). Inset: relative efficiency (adjusted conf-based scores / conventional), m±sem, on a scale of 1 to 3; * P<0.05, ** P<0.01.]

Data from 6 medical student exams (250-300 T/F Qs each, >300 students).

[Figure: Coef. of Determination (r²), between odd & even numbered Qs in 6 exams (m±sem), shown for the whole class, the bottom 1/3 and the top 1/3, with bars for conventional scores, conf-based scores used to predict conventional scores, and their difference (differences all P<0.01). Inset: relative efficiency for predicting conventional scores (adj'd conf-based / conv.), m±sem, on a scale of 1 to 1.8; * P<0.025, ** P<0.01.]

Certainty-based scores predict the conventional score on different Qs better than conventional scores do.
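The odd/even comparison behind these charts can be sketched as a split-half calculation: total each student's marks on the odd-numbered and the even-numbered questions, then take the squared correlation between the two totals. The data layout (one list of per-question marks per student) and the function names are assumptions of this sketch, not the analysis code used for the exams.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary across students")
    return cov / math.sqrt(var_x * var_y)


def split_half_r2(marks_by_student):
    """r² between each student's total on odd-numbered and even-numbered questions."""
    odd_totals = [sum(row[0::2]) for row in marks_by_student]   # Q1, Q3, Q5, ...
    even_totals = [sum(row[1::2]) for row in marks_by_student]  # Q2, Q4, Q6, ...
    return pearson_r(odd_totals, even_totals) ** 2
```

The same function can be applied to conventional marks (1 or 0 per question) and to CBM marks (3, 2, 1, 0, -2 or -6 per question) for the same students, which is the kind of comparison the charts summarise.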

How should one handle students with poor calibration?

Significantly overconfident in exam: 2 students (1%)

e.g. 50% correct @C=1, 59%@C=2, 73%@C=3

Significantly underconfident in exam: 41 students (14%)

e.g. 83% correct @C=1, 89%@C=2, 99%@C=3

Maybe one shouldn’t penalise such students

Adjusted confidence-based score:

Mark the set of answers at each C level as if they were entered at the C level that gives the highest score**.

mean benefit = 1.5% ± 2.1% (median 0.6%)

** (first combining sets if %correct is not in ascending order)
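A minimal sketch of the adjusted score, assuming the standard LAPT marks (1, 2, 3 for correct; 0, -2, -6 for wrong). The pooling step is one reading of "first combining sets if %correct is not in ascending order": adjacent C levels are merged until the proportions correct ascend. The counts in the example (100 answers per level) are hypothetical; only the percentages come from the slide.

```python
# (mark if right, mark if wrong) at each certainty level, as quoted in the slides.
MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def score(correct, total, level):
    """Marks for `total` answers, `correct` of them right, all marked at `level`."""
    right, wrong = MARKS[level]
    return correct * right + (total - correct) * wrong


def raw_score(sets):
    """Ordinary CBM score; sets maps C level -> (number correct, number answered)."""
    return sum(score(c, n, level) for level, (c, n) in sets.items())


def adjusted_score(sets):
    """Mark each set at whichever level gives it the highest score."""
    pooled = []
    for level in sorted(sets):
        correct, total = sets[level]
        if total == 0:
            continue
        pooled.append([correct, total])
        # Merge with the previous set while the proportion correct is not ascending.
        while len(pooled) > 1 and pooled[-2][0] * pooled[-1][1] > pooled[-1][0] * pooled[-2][1]:
            c, n = pooled.pop()
            pooled[-1][0] += c
            pooled[-1][1] += n
    return sum(max(score(c, n, level) for level in MARKS) for c, n in pooled)


if __name__ == "__main__":
    # Underconfident example from the slide: 83% @C=1, 89% @C=2, 99% @C=3.
    underconfident = {1: (83, 100), 2: (89, 100), 3: (99, 100)}
    print(raw_score(underconfident), adjusted_score(underconfident))
```

For this underconfident pattern every set scores best when marked at C=3, since all three proportions correct exceed 80%, so the adjustment removes the penalty for diffidence.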

Scaling CBM scores to be directly comparable with conventional scores

NCOR is based on number correct, scaled so guesses (50% prob’y correct) give on average 0%. (“% Knowledge”)

[Figure: CBS plotted against % correct above chance, both axes 0% to 100%.]

True/False ♦ and SBA □ (5 option) components of a formative test for 345 students were ranked by conventional scores. Then for each decile, mean CBS scores are plotted against % correct above chance (“% knowledge”).

Equivalence of scaled** CBM scores and conventional scores for standard setting.

** $\mathrm{CBS} = \left( \dfrac{\mathrm{Total} - \mathrm{Chance}}{\mathrm{Max} - \mathrm{Chance}} \right)^{p} \times 100\%$, where p = 0.6 for TF, 0.48 for SBA (5 opt)

Gardner-Medwin & Curtin 2007 REAP conference, data from Imperial College
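Both scalings can be sketched in a few lines. This reads the p in the CBS formula as an exponent and leaves the chance-level CBM total as a parameter, because the slides do not spell out how it is computed. Returning 0 for totals at or below chance is an assumption of this sketch (a fractional power of a negative number is not a real number), and the function names are illustrative.

```python
def percent_knowledge(n_correct, n_questions, n_options):
    """NCOR: % correct above chance, so that pure guessing averages 0%."""
    chance = 1 / n_options
    return (n_correct / n_questions - chance) / (1 - chance) * 100


def scaled_cbs(total, chance_total, max_total, p):
    """CBS = ((Total - Chance) / (Max - Chance))^p x 100%.

    p = 0.6 for True/False and 0.48 for 5-option SBA, as given in the slides.
    Totals at or below chance return 0.0 (assumption of this sketch).
    """
    fraction = (total - chance_total) / (max_total - chance_total)
    if fraction <= 0:
        return 0.0
    return fraction ** p * 100


if __name__ == "__main__":
    # Hypothetical True/False test of 100 questions: 75 correct.
    print(percent_knowledge(75, 100, n_options=2))
    # Hypothetical CBM totals: chance level 50 marks, maximum 300 marks.
    print(round(scaled_cbs(total=180, chance_total=50, max_total=300, p=0.6), 1))
```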

Why doesn't everybody already use CBM? - a puzzle

• Enthusiasm was exhausted before the age of 'online'

• Some CBM methods were complex, opaque or non-motivating

• Reluctance to treat certainty as integral to knowledge

• Mistaken worries about 'personality bias'

• Under-rating of self-assessment & practice as learning tools

• Worry that CBM would need new questions

• Worry that CBM would upset standard-setting

• Inertia and vested interests

A few of the names associated with confidence testing in education

• Andrew Ahlgren
• Jim Bruno
• Confucius
• Robert Ebel
• Jack Good
• Kate Hevner
• Darwin Hunt
• Dieudonné Leclercq
• Emir Shuford

London Colleagues:

• Mike Gahan
• David Bender
• Nancy Curtin

“When you know a thing, to hold that you know it. And when you do not know a thing, to allow that you do not know it. This is knowledge.”

“Learning without thought is a waste of time.”

Confucius

We fail if we mark a lucky guess as if it were knowledge.

We fail if we mark misconceptions as no worse than ignorance.

www.ucl.ac.uk/lapt

Lessons from experience with CBM

• Practice is needed before use in exams

• Exams should re-use questions from an open database only very sparingly

• Over-confidence and diffidence are both unhealthy traits that can be moderated by practice to achieve good calibration

• With multi-option questions, students tend (at least initially) to over-estimate reliability

• Standard setting - it is easy (but important!) to scale CBM marks to match familiar scales based on number correct.

Some Questions about CBM!

• Are there problems using it?
• Why doesn't my VLE support CBM?
• Do students need practice?
• Isn't computer marked assessment just factual?
• Does CBM increase retention?
• Do I need new questions?
• What are the best Q types?
• What about school education?
• Is it relevant to my subject, where opinions differ?
• Isn't it bad to encourage guessing?
• What if my only assessments are exams?
• How do I convince an exam board?
• Isn't it right/wrong that really matters?

LAPT: Maths in Medical Science

1st year new medical curriculum: Nov '00 - 312 replies / 330 students

Response to LAPT numeracy exercises in medical 1st year

[Figure: three bar charts of survey responses (% of students). (1) Usefulness rated 1 to 5, from Not Useful to Very Useful. (2) "I think about confidence assessment ...": Every Time, Most of the time, Rarely, Never, No reply. (3) "I sometimes change my answer while thinking about confidence assessment": Disagree 1 to Agree 5.]

Students really take to confidence-based marking

Principles that students seem readily to understand:

• thinking about the basis and reliability of answers can help tie bits of knowledge together (to form “understanding”)

• checking an answer and rereading the question are worthwhile

• sound confidence judgement is a valued intellectual skill in every context, and one they can improve

• immediate feedback while still thinking about the basis of your answer is a hugely valuable study aid

• confident errors are far worse than acknowledged ignorance and are a wake-up call (-6!) to pay attention to explanations

• expressing uncertainty when you are uncertain is a good thing

• both under- and over-confidence are impediments to learning

The correlation, across students, between scores on one set of questions and another is higher for confidence than for simple scores.

But perhaps they are just measuring ability to handle confidence ?

[Figure: three scatter plots of students' scores on one set of questions against their scores on another set.

B. set 2 (simple) against set 1 (simple): R² = 0.735
C. set 2 (confidence) against set 1 (confidence): R² = 0.814
D. set 2 (simple) against set 1 (conf 0.6): R² = 0.776]

No. Confidence scores are better than simple scores at predicting even the conventional scores on a different set of questions. This can only be because they are a statistically more efficient measure of knowledge.

When you know a thing, to hold that you know it.

And when you do not know a thing, to allow that you do not know it.

This is knowledge. Confucius

Known Knowns ... things we know that we know.
Known Unknowns ... things that we know that we don't know.
Unknown Unknowns ... things we do not know we don't know.
Donald Rumsfeld

Will it snow next weekend?

Does a (good) weather forecaster have knowledge? - obviously yes, but expressed through a probability

How can you measure and reward this knowledge? - the origin of CBM >100 years ago.

Does insulin raise blood glucose levels?

Similar, even though the Q is not about a probability. - the probability is your certainty that your answer is right

The key is to have a "proper" or "motivating" reward scheme, which ensures that the person does best by expressing their true level of uncertainty
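One way to see what "motivating" means is to ask which certainty levels are ever the best choice under a given scheme. The sketch below compares the standard LAPT marks quoted in these slides with a hypothetical scheme that has no penalties; the second scheme is invented here purely for contrast, and the check is a consequence of a motivating scheme rather than a formal definition of one.

```python
# (mark if right, mark if wrong) per certainty level.
LAPT = {1: (1, 0), 2: (2, -2), 3: (3, -6)}       # as quoted in the slides
NO_PENALTY = {1: (1, 0), 2: (2, 0), 3: (3, 0)}   # hypothetical, for contrast only


def best_level(p_correct, scheme):
    """Certainty level that maximises the expected mark at this probability."""
    return max(
        scheme,
        key=lambda c: p_correct * scheme[c][0] + (1 - p_correct) * scheme[c][1],
    )


def levels_ever_optimal(scheme, steps=99):
    """Set of levels that are the best choice somewhere on a grid of probabilities."""
    return {best_level(i / (steps + 1), scheme) for i in range(1, steps + 1)}


if __name__ == "__main__":
    print("LAPT:", sorted(levels_ever_optimal(LAPT)))
    print("No penalty:", sorted(levels_ever_optimal(NO_PENALTY)))
```

Under the LAPT marks each level is the best choice over some range of certainty, so reporting honestly pays. Without penalties, claiming the highest certainty is always best, so the reported level carries no information.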

CBM data is a more valid measure of ability

'Validity' means it measures what you want, rather than just something easily measured.

SUMMARY

Why CBM?

• Get students to think more carefully
• Reward recognition of uncertainty, either personal or in a group
• Highlight misconceptions
• Engage students more - the game element of CBM
• Encourage criticism of Qs (intolerance of ambiguity or looseness)
• In general: enhance self-assessment as a learning experience

NB All of the above arise with little or no practice with CBM. The following do require practice:

• More searching diagnostic data
• More valid and reliable assessment data

(But NB with CBM you have conventional assessment data too.)
