Confidence-based assessment, Tony Gardner-Medwin - Physiology, UCL


Post on 21-Jan-2016


Page 1: Confidence-based assessment  Tony Gardner-Medwin -  Physiology, UCL

Confidence-based assessment

Tony Gardner-Medwin - Physiology, UCL

context

confidence assessment as a study tool

confidence assessment in exams

More info: www.ucl.ac.uk/~cusplap

Page 2

INTROSPECTION AND ACTIVE LEARNING IN BIOMEDICAL STUDY, Tony Gardner-Medwin

In the CRUCIFORM

The problems:

Fewer staff, more students, less small-group & practical teaching

Rote learning: students focus on information, not understanding

Poor introspection, concept manipulation, numeracy

Some ways computers can help:

Interactive simulations to develop visual intuition - LABVIEW

Thinking in parallel - TALK (cf. DISCOURSE - see separate demo)

Confidence-based marking to develop introspection - LAPT

Life & Times of guess-who - an illustrative QUIZ

Page 3

TALK & PAGER

PAGER - Pops up messages onto students’ screens on the network

TALK - Shows simultaneous student responses to the tutor/s (cf. DISCOURSE - commercial package)

PAGE - Any new version of a text file pops up on top of students’ work.

WATCH - Up to 80 text messages visible simultaneously within a few secs.

NETWORK - Everyone sees all messages within a few secs.

Page 4

What is Knowledge?

Knowledge depends on degree of belief, or confidence:

knowledge → uncertainty → ignorance → misconception → delusion (increasing nescience)

Nescience $= -\log_2(\text{confidence}^*)$ for the truth of a true proposition: $0$ for knowledge, $1$ for ignorance, $\gg 1$ for misconception and delusion.

Measurement of knowledge requires the eliciting of confidence (or *subjective probability) for the truth of correct statements.

This requires a proper scheme of incentives
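The nescience measure can be computed directly from a stated confidence. A minimal Python sketch; the function name `nescience` and the confidence values attached to each state are illustrative assumptions, not figures from the talk. Note that a 50% guess on a true/false question gives exactly 1 bit.

```python
import math

def nescience(confidence: float) -> float:
    """Nescience in bits: -log2 of the confidence (subjective probability)
    assigned to the truth of a proposition that is in fact true."""
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    # log2(1/p) equals -log2(p) but avoids printing -0.0 when p == 1
    return math.log2(1.0 / confidence)

# Illustrative confidence levels (assumed values, not taken from the slides)
states = {
    "knowledge": 1.0,       # certain that the true statement is true
    "uncertainty": 0.8,     # leaning the right way, but unsure
    "ignorance": 0.5,       # pure guess on a true/false question
    "misconception": 0.1,   # fairly sure the true statement is false
    "delusion": 0.001,      # near-certain the true statement is false
}

for state, p in states.items():
    print(f"{state:<13} confidence={p:<6} nescience={nescience(p):.2f} bits")
```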

Page 5

LAPT confidence-based scoring scheme

Confidence level      1       2       3
Score if correct      1       2       3
Score if incorrect    0      -2      -6
P(correct)           <67%    >67%    >80%
Odds                 <2:1    >2:1    >4:1

[Figures: score vs -log2(subjective probability); subjective expectation of score vs subjective probability (0.5 to 1) for C=1, C=2, C=3]
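The P(correct) thresholds in the table follow from the marks: a student should pick whichever confidence level maximises the expected score for their own subjective probability. A minimal sketch in Python; the names `LAPT_MARKS`, `expected_score` and `best_level` are hypothetical, and only the marks themselves come from the table.

```python
# LAPT marks as tabulated above: {confidence level: (score if correct, score if incorrect)}
LAPT_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

def expected_score(p_correct: float, level: int) -> float:
    """Expected mark for one answer, given subjective probability p_correct."""
    right, wrong = LAPT_MARKS[level]
    return p_correct * right + (1.0 - p_correct) * wrong

def best_level(p_correct: float) -> int:
    """Confidence level maximising the expected mark (ties go to the lower level)."""
    return max(LAPT_MARKS, key=lambda c: (expected_score(p_correct, c), -c))

for p in (0.55, 0.70, 0.90):
    scores = {c: round(expected_score(p, c), 2) for c in LAPT_MARKS}
    print(p, scores, "-> best C =", best_level(p))
```

Setting the expected scores equal gives the crossovers: $p = 4p - 2$ at $p = 2/3$ (C=1 vs C=2) and $4p - 2 = 9p - 6$ at $p = 0.8$ (C=2 vs C=3), matching the 67% and 80% rows of the table.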

Page 6

[Figures: score (-50% to 100%) using ± 1 NEGATIVE MARKING and using the CONFIDENCE SCHEME, and % CORRECT, plotted against increasing student knowledge. Student profiles under ± 1 marking: omits answers; ignorant, answers all; thinks all correct; realistic but can't discriminate; discriminates, omits uncertain answers; perfect knowledge. Under the confidence scheme: omits answers; ignorant, answers all; C=3 for all; C=2 for all; discriminates: C=3, C=1; perfect knowledge.]

Example, 100 T/F Qs: 40% reliable confident answers, P(correct) = 1; 60% uncertain knowledge, P(correct) = 0.6; overall P(correct) = 0.76.
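The worked example on this slide can be checked numerically. A minimal Python sketch computing raw expected totals for this hypothetical student; it does not reproduce the percentage scaling used in the plots, which is not specified here, and assumes the uncertain answers go in at C=1 (the best level at p = 0.6 under the LAPT table).

```python
# Worked example from the slide: 100 T/F questions,
# 40 answered from reliable knowledge (P(correct) = 1),
# 60 answered from uncertain knowledge (P(correct) = 0.6).
N_SURE, P_SURE = 40, 1.0
N_UNSURE, P_UNSURE = 60, 0.6

# Overall probability of a correct answer: (40*1 + 60*0.6) / 100
p_overall = (N_SURE * P_SURE + N_UNSURE * P_UNSURE) / (N_SURE + N_UNSURE)

def pm1(p: float) -> float:
    """Expected mark per answer under +/-1 negative marking."""
    return p * 1 + (1 - p) * -1

# +/-1 marking: answering everything vs omitting the uncertain answers (omission = 0)
pm1_all = N_SURE * pm1(P_SURE) + N_UNSURE * pm1(P_UNSURE)
pm1_omit = N_SURE * pm1(P_SURE)

# LAPT marks: sure answers at C=3 (+3 if correct),
# uncertain answers at C=1 (+1 if correct, 0 if wrong)
conf = N_SURE * 3 * P_SURE + N_UNSURE * (P_UNSURE * 1 + (1 - P_UNSURE) * 0)

print(f"P(correct) overall = {p_overall:.2f}")
print(f"+/-1 marking, answer all: {pm1_all:.0f} / 100")
print(f"+/-1 marking, omit uncertain: {pm1_omit:.0f} / 100")
print(f"Confidence marks: {conf:.0f} / 300")
```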

Page 7

- evaluation next

- basic principle

Page 8

[Figure: UCL LAPT usage (on campus only)]

Page 9

[Figure: "How useful was confidence assessment?" Percentage of respondents (0% to 50%) replying Very useful, Useful, Not useful at all, or giving no reply]

Evaluation study (with K. Issroff)

136 replies (of 210) after the 1st-year medical course

Page 10

How useful were the explanations?

[Figure: percentage of respondents (0% to 60%) replying Very useful, Useful, Not useful at all, or giving no reply]

Page 11

[Figure: "I think about confidence assessment": percentage of respondents (0% to 50%) replying Every time, Most of the time, Rarely, Never, or giving no reply]

[Figure: "I sometimes change my answer while thinking about confidence assessment": percentage of respondents (0% to 30%) on a scale from 1 (Disagree) to 5 (Agree)]

Page 12

Discrimination performance - in-course & exam [331 medical students: 190 F, 141 M]

[Figure: % correct (50% to 100%) at C=1, C=2 and C=3, in-course (i-c) and in the exam for female (exF) and male (exM) students; 5%, 95% percentiles]

Page 13

Principles that students seem readily to understand:

• confident errors are far worse than acknowledged ignorance, and are a wake-up call (-6!) to pay attention to explanations

• expressing uncertainty when you are uncertain is a good thing

• thinking about the basis and reliability of answers can help tie bits of knowledge together (to form “understanding”)

• checking an answer and rereading the question are worthwhile

• sound confidence judgement is a valued intellectual skill in every context, and one they can improve

• both under- and overconfidence are impediments to learning

Page 14

- analysis of exam data

- student evaluation

Page 15

A problem with conventional scoring:

• many answers are based on partial and uncertain knowledge

• these contribute relatively little to the credit

• but a lot to the variance

This is statistically inefficient.

Since we can identify the uncertain answers, we can assess the magnitude of this problem under exam conditions: 331 students, 500 True/False questions.
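Why uncertain answers add little credit but much variance under conventional (±1) marking can be seen from the mean and variance of a single answer's mark. A minimal Python sketch, assuming each answer is an independent event with probability p of being correct (an idealisation for illustration, not the analysis used in the talk); the function name is hypothetical.

```python
def credit_and_variance(p: float) -> tuple[float, float]:
    """Mean and variance of a +/-1 mark for an answer that is correct with probability p."""
    mean = p * 1 + (1 - p) * -1            # = 2p - 1
    second_moment = 1.0                    # mark is +1 or -1, so mark**2 is always 1
    return mean, second_moment - mean ** 2  # variance = 4p(1 - p)

for p in (1.0, 0.9, 0.75, 0.6, 0.5):
    credit, var = credit_and_variance(p)
    print(f"p={p:.2f}  expected credit={credit:+.2f}  variance={var:.2f}")
```

A certain answer (p = 1) earns full credit with zero variance, while an answer at p = 0.6 earns only 0.2 expected credit but carries variance 0.96, close to that of a pure guess.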

Page 16

[Figure A: confidence-based score vs simple score (both 0% to 100%), with points a, b, c, d and the point for 50% correct marked; curve $y = x^{1.67}$. Lines shown: equality (only expected for a pure mix of certain knowledge and total guesses); scores if uncertainty is homogeneous and correctly reported; theoretical scores for homogeneous uncertainty, based on an information theoretic measure.]

Page 17

Breakdown of credit and variance due to uncertainty

[Figure: percentage of answers, correct answers, credit and variance arising from answers @C=1, @C=2 and @C=3, for simple scores and for confidence scores]

Simple scores (scaled conventional scores) were scaled so that chance gives 0% and total knowledge 100% (equivalent to +1 for correct, -1 for incorrect, 0 for omission).

- 65% of the variance came from answers at C=1, but only 18% of the credit.

Confidence scores: these give less weight to uncertain answers, so the variance due to uncertainty is more in proportion to credit, and was reduced by 46% (relative to the variation of student marks).
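The simple-score scaling described above amounts to +1 per correct answer, -1 per incorrect answer and 0 per omission, divided by the number of questions. A minimal Python sketch for true/false questions; the function name and the example counts are hypothetical.

```python
def scaled_simple_score(n_correct: int, n_incorrect: int, n_questions: int) -> float:
    """Conventional T/F score scaled so that chance -> 0 and total knowledge -> 1.

    Equivalent to +1 per correct answer, -1 per incorrect, 0 per omission,
    divided by the number of questions."""
    return (n_correct - n_incorrect) / n_questions

print(scaled_simple_score(250, 250, 500))  # guessing every answer: chance level
print(scaled_simple_score(500, 0, 500))    # total knowledge
print(scaled_simple_score(400, 80, 500))   # 400 right, 80 wrong, 20 omitted
```

Guessing on a T/F question is right half the time, so its expected contribution is zero, which is why chance maps to 0%.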

Page 18

Exam marks are determined by:

1. the student’s knowledge and skills in the subject area

2. the level of difficulty of the questions

3. chance factors in the way questions relate to details of the student’s knowledge

4. chance factors in the way uncertainties are resolved (luck)

The most convincing test of how much an exam measures the student, rather than chance, is to compare marks on one set of questions with marks for the same student on a different set. A good correlation means we are measuring something about the student, not just “noise”.

(1) = “signal” (its measurement is the object of the exam)

(3,4) = “noise” (random factors obscuring the “signal”)

Confidence-based marks improve the “signal-to-noise ratio”

Page 19

The correlation, across students, between scores on one set of questions and another is higher for confidence than for simple scores.

But perhaps they are just measuring ability to handle confidence?

[Figures: score on question set 2 vs score on set 1 (0% to 100%), across students. B. set 2 (simple) vs set 1 (simple): R² = 0.735. C. set 2 (confidence) vs set 1 (confidence): R² = 0.814. D. set 2 (simple) vs set 1 (conf$^{0.6}$): R² = 0.776.]

No. Confidence scores are better than simple scores at predicting even the conventional scores on a different set of questions. This can only be because they are a statistically more efficient measure of knowledge.
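The set-versus-set comparison can be reproduced on any students × questions matrix of marks by repeatedly splitting the questions at random into two halves and correlating students' totals on each half. A minimal Python sketch; the function names, the equal-halves split and the simulated data are assumptions for illustration, not the UCL analysis code or data.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mean_split_half_r2(marks, n_partitions=16, seed=0):
    """marks[s][q] is student s's mark on question q (simple or confidence-based).
    For each random partition of the questions into two halves, total every
    student's marks on each half and compute r^2 across students; return the mean."""
    rng = random.Random(seed)
    n_q = len(marks[0])
    r2_values = []
    for _ in range(n_partitions):
        order = list(range(n_q))
        rng.shuffle(order)
        half1, half2 = order[: n_q // 2], order[n_q // 2 :]
        set1 = [sum(row[q] for q in half1) for row in marks]
        set2 = [sum(row[q] for q in half2) for row in marks]
        r2_values.append(pearson_r(set1, set2) ** 2)
    return sum(r2_values) / len(r2_values)

# Toy demo on simulated +/-1 marks (hypothetical students, not the UCL data)
sim = random.Random(1)
p_correct = [sim.uniform(0.55, 0.95) for _ in range(100)]
marks = [[1 if sim.random() < p else -1 for _ in range(500)] for p in p_correct]
print(f"mean split-half r^2 = {mean_split_half_r2(marks):.3f}")
```

Running the same function on simple marks and on confidence-based marks for the same students gives the kind of comparison summarised in the r² values on this slide.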

Page 20

How should one handle students with poor calibration?

Significantly overconfident: 2 students (1%)

e.g. 50% correct @C=1, 59% @C=2, 73% @C=3

Significantly underconfident: 41 students (14%)

e.g. 83% correct @C=1, 89% @C=2, 99% @C=3

Maybe one shouldn’t penalise such students

Adjusted confidence-based score:

Mark the set of answers at each C level as if they were entered at the C level that gives the highest score.

mean benefit = 1.5% ± 2.1% (median 0.6%)
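The adjustment rule above can be written out directly: group the answers by the confidence level at which they were entered, then re-mark each group at whichever level gives it the highest score. A minimal Python sketch using the LAPT marks; the function names and the example student are hypothetical.

```python
# LAPT marks: {confidence level: (score if correct, score if incorrect)}
LAPT_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

def mark(n_correct: int, n_wrong: int, level: int) -> int:
    """Total mark for a group of answers if all were entered at `level`."""
    right, wrong = LAPT_MARKS[level]
    return n_correct * right + n_wrong * wrong

def confidence_score(answers: dict) -> int:
    """answers: {entered C level: (n_correct, n_wrong)}; marks at the levels entered."""
    return sum(mark(c, w, level) for level, (c, w) in answers.items())

def adjusted_confidence_score(answers: dict) -> int:
    """Re-mark each C-level group as if entered at whichever level scores highest."""
    return sum(max(mark(c, w, alt) for alt in LAPT_MARKS) for c, w in answers.values())

# Hypothetical underconfident student: 90% correct even at C=1
student = {1: (9, 1), 2: (18, 2), 3: (30, 0)}
print(confidence_score(student), adjusted_confidence_score(student))  # 131 153
```

Because the level actually used is always one of the alternatives tried, the adjusted score can never be lower than the unadjusted one; it only helps students whose calibration is off.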

Page 21

[Figure A: confidence-based score vs simple scaled score (both 0% to 100%), with points a, b, c, d and the points for 50% correct and 100% correct marked; curve $y = x^{1.67}$. Lines shown: equality (only expected for a pure mix of certain knowledge and total guesses); scores if uncertainty is homogeneous and correctly reported; theoretical scores for homogeneous uncertainty, based on an information theoretic measure.]

Page 22

[Figure: mean values of r² (0.70 to 0.90) for 16 random partitionings of the 500 questions, score on one set vs score on the other, for simple : simple, conf : conf, conf(adj) : conf(adj), simple : conf, simple : conf(adj)]

                                   simple    conf    conf (adj)
Signal / noise variance ratio       2.8       5.3       4.3
Savings in no. of Qs required        -        48%       35%
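The savings row can be related to the signal/noise ratios. If the signal variance between students stays fixed while the noise variance of a proportional score falls as 1/(number of questions), the ratio grows in proportion to the number of questions, so a method with a higher ratio needs proportionally fewer questions to match simple scoring. A minimal Python sketch under that assumption; the function name is hypothetical and the ratios are taken from the table.

```python
def question_savings(sn_reference: float, sn_new: float) -> float:
    """Fraction of questions saved by a scoring method with signal/noise variance
    ratio sn_new relative to sn_reference, assuming the ratio grows in
    proportion to the number of questions."""
    return 1.0 - sn_reference / sn_new

sn = {"simple": 2.8, "conf": 5.3, "conf (adj)": 4.3}  # values from the table above
for method in ("conf", "conf (adj)"):
    print(f"{method}: {question_savings(sn['simple'], sn[method]):.0%} fewer questions")
```

This reproduces the 35% for adjusted confidence scores and gives about 47% for conf rather than 48%, presumably because the tabulated ratios are rounded.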

Page 23

SUMMARY CONCLUSIONS

• Adjusted confidence scores seem the best scores to use (they don’t discriminate on the basis of the calibration of a person’s confidence judgements, and are also the best predictors of performance on a separate set of questions).

• Reliable discrimination of student knowledge can be achieved with one third fewer questions, compared with conventional scoring.

• Confidence scoring is not only fundamentally fairer (rewarding students who can correctly identify which answers are uncertain) but also more efficient at measuring performance.
