Marks for identifying uncertainty:
Stimulation of learning through
Certainty-Based Marking
Tony Gardner-Medwin
Physiology, University College London
www.ucl.ac.uk/LAPT
Cambridge Assessment, June 2009
Starting points (you may agree or disagree!)
• The nature of assessment affects how students learn & think
• Objective tests/exercises can stimulate learning & understanding
• Formative assessment is more important than summative
• Different Q types suit different situations, e.g. T/F, SBA, free text
• Scaling to “% above chance” (%Knowledge) should be universal
• Negative marking can be either really constructive or really awful
• Students & kids can enjoy assessment if it is stimulating, fair, varied, challenging, immediately rewarding, not humiliating -- like a game.
The take-home message:
We should reward the acknowledgment of uncertainty
1. How Certainty-Based Marking works
2. How it relates to probability & knowledge
3. How students react & use it
4. CBM as summative assessment
5. Why isn’t it used more?
Which Certainty Level is Best?
[Figure: mark expected on average (-6 to 3) plotted against "How likely is your answer to be correct?" (0% to 100%), with one line each for C=1 Low, C=2 Mid, C=3 High and No Reply. The guessing range is marked, as are the 67% and 80% points.]
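The figure can be reproduced from the standard LAPT marks quoted elsewhere in this talk (1, 2, 3 if correct; 0, -2, -6 if wrong). The following is a minimal sketch, not the LAPT software itself; the names `MARKS`, `expected_mark` and `best_level` are hypothetical.

```python
# Standard LAPT marks as quoted in the talk: level -> (mark if right, mark if wrong).
MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

def expected_mark(level, p):
    """Average mark at a certainty level when the answer is right with probability p."""
    right, wrong = MARKS[level]
    return p * right + (1 - p) * wrong

def best_level(p):
    """Certainty level with the highest expected mark (ties go to the lower level)."""
    return max(sorted(MARKS), key=lambda level: expected_mark(level, p))

# Walk through a guess, a fairly sure answer and a very sure answer.
for p in (0.5, 0.7, 0.9):
    level = best_level(p)
    print(p, level, round(expected_mark(level, p), 2))
```

With these marks the best choice moves from C=1 to C=2 to C=3 as the chance of being right rises, which is the behaviour the figure shows.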
How well do students discriminate reliability ?
Ordinary words we use to describe Knowledge
[Figure: a scale of words: knowledge, uncertainty, don't know, misconception, delusion. Labels on the scale: "Decreasing certainty about what is true", "Increasing certainty about something false", and "Increasing 'ignorance'".]
• Knowledge is a function of certainty (confidence, degree of belief)
• There are states a lot worse than acknowledged ignorance
"It's not ignorance does so much damage - it's knowin' so derned much that ain't so." attrib. J. Billings
“I was gratified to be able to answer promptly, and I did! - I said I didn't know.” Mark Twain
Student Learning: Principles they readily understand
• You need to know the reliability of your knowledge to use it
• Confident errors are serious, requiring attention to explanations
• Expressing uncertainty when you are uncertain is a good thing
• Confidence is about understanding why things cannot be otherwise, not about personality
• If over- or under-confident, you must calibrate through practice
• Reflection and justification are essential study habits
In evaluation surveys, a majority of students have always said they like CBM, finding it useful and fair.
They asked for it to be included in exams, and after 5 years of exam use at UCL they voted 52% : 30% to retain it (in 2005/6), though this was rejected by the conservative medical establishment.
Why test knowledge? Google makes it so easy to find!
In olden times, you had to rely on your own stored information .... you would make a best choice and “go for it”
School leavers have more sparse (though broader) stored info, but still have a “go for it” culture - to a scary extent! .... responding with an immediate idea & not thinking much
Cheap information (& increased teamwork) require:
1) Identifying things you will get wrong and not Google! “Unknown unknowns” rather than “don't knows”
2) Judging reliability and uncertainty correctly
.... setting a threshold for seeking help
.... evaluating conflicting and corroborating information
These lessons are core things that CBM teaches
Thinking about uncertainty / justification develops understanding of relationships
[Figure: diagram with the labels EVIDENCE, Nuggets of knowledge, Inference, Network of Understanding, Choice, and Certainty (Degree of Belief).]
CBM places greater demands on justification & stimulates connections
To understand = to link correctly the facts that bear on an issue.
Using CBM
1. With UCL LAPT software, online or from CD
2. With Moodle - work in progress
3. With commercial software – some progress, more needed!
4. Secure exams, with OMR Cards [Speedwell]
CBM quite closely follows the ideal ignorance measure
The student loses about 3 marks per 'bit' of ignorance - up to a maximum of 3 bits
[Figure: mark assigned (axis ticks -8 to 2) plotted against lack of knowledge in bits (0 to 4).]
Lack of knowledge [bits] = -log2 (probability assigned to correct choice)
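To make the "3 marks per bit" statement concrete, here is a small sketch. It computes the ignorance measure from the slide and then applies one possible linear reading of the rule. The starting mark of 3 and the straight-line form are assumptions chosen to match the LAPT range of +3 to -6, not a formula given in the talk. The function names are hypothetical.

```python
import math

def ignorance_bits(p_correct):
    """Lack of knowledge in bits: -log2 of the probability given to the correct choice."""
    return math.log2(1 / p_correct)

def approx_mark(p_correct, top=3.0, per_bit=3.0, max_bits=3.0):
    """Assumed linear reading: lose `per_bit` marks per bit, capped at `max_bits` bits."""
    return top - per_bit * min(ignorance_bits(p_correct), max_bits)

# Probabilities assigned to the correct choice, from certain to badly wrong.
for p in (1.0, 0.5, 0.25, 0.125, 0.01):
    print(p, round(ignorance_bits(p), 2), round(approx_mark(p), 2))
```

Under these assumptions the mark falls from 3 at zero ignorance to -6 at 3 bits and stays there, so a very confident error costs no more than the cap.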
What’s a good mark scheme?
[Figure: panels each plotting mark expected on average against confidence (estimated probability correct, 0% to 100%):
• No negative marking: lines for reply and no reply
• Fixed negative marking, +/-1: lines for reply and no reply
• Hevner 1932: lines for high, mid, low and no reply
• Davies 2002: lines for high, mid, low and no reply
• Hassmen & Hunt ‘94: lines for high, mid, low and no reply
Other labels in the figure: 50%; 35%, 55%, 67%, 85%; min; max; Gardner-Medwin ’06.]
The standard LAPT (1,2,3 / 0,-2,-6) scheme seems better than any of these.
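One way to compare mark schemes is to find the probability at which two options give the same expected mark. The sketch below does this for any pair of (mark if right, mark if wrong) options. The function name `break_even` and the `NO_REPLY` constant are hypothetical, and the only scheme values used are ones stated in this talk.

```python
def break_even(low, high):
    """Probability correct at which two (mark if right, mark if wrong) options tie."""
    (r1, w1), (r2, w2) = low, high
    # Solve p*r1 + (1-p)*w1 == p*r2 + (1-p)*w2 for p.
    return (w1 - w2) / ((r2 - w2) - (r1 - w1))

NO_REPLY = (0, 0)

print(break_even(NO_REPLY, (1, 0)))    # no negative marking: reply vs no reply
print(break_even(NO_REPLY, (1, -1)))   # fixed negative marking +/-1: reply vs no reply
print(break_even((1, 0), (2, -2)))     # LAPT: C=1 vs C=2
print(break_even((2, -2), (3, -6)))    # LAPT: C=2 vs C=3
```

With no negative marking the tie is at 0, so replying is never worse than staying silent. With +/-1 marking the tie is at 50%. The LAPT scheme gives ties at 2/3 and 0.8, matching the 67% and 80% points in the "Which Certainty Level is Best?" figure.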
CBM increases the reliability of exam data
'Reliability' indicates to what extent a score measures something about the student's ability, as opposed to 'luck' or chance.
CBM increases the effective test length
With increased 'Reliability' you don't need so many exam questions to get data of equal quality.
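Reliability in this talk is reported as Cronbach alpha. For readers unfamiliar with it, the sketch below computes alpha with the standard textbook formula from a table of marks. The data layout and the sample marks are hypothetical, for illustration only. It is not a description of how the exam data in the talk were processed.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach alpha, where scores[s][q] is the mark of student s on question q."""
    k = len(scores[0])  # number of questions
    item_vars = [pvariance([row[q] for row in scores]) for q in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical CBM marks for 4 students on 3 questions (illustration only).
marks = [[3, 2, 3], [1, 0, 1], [-2, 0, -6], [3, 3, 2]]
print(round(cronbach_alpha(marks), 3))
```

Alpha approaches 1 when students who do well on one question tend to do well on the others, and falls towards 0 when question marks are unrelated.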
CBM increases the reliability of exam data with True/False Questions
[Figure: Cronbach alpha (reliability), with axes labelled "using % correct" and "using CBM", both running from 80% to 95%.]
To achieve these increases using only % correct would have required on average 58% more questions.
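The talk does not say how the 58% figure was calculated. A standard way to relate reliability to test length is the Spearman-Brown formula, sketched below as an assumption about the kind of calculation involved. The function name and the example reliabilities are illustrative, not data from the talk.

```python
def length_factor(r_old, r_new):
    """Spearman-Brown: factor by which test length must grow to lift reliability."""
    return (r_new * (1 - r_old)) / (r_old * (1 - r_new))

# Illustrative values only: raising reliability from 0.85 to 0.90.
print(round(length_factor(0.85, 0.90), 2))
```

For these illustrative values the factor is about 1.59, meaning roughly 59% more questions of the same quality would be needed.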
Reliability and efficiency of exams (Quality of data / number of questions) are enhanced with CBM
[Figure: two panels, each with bars for whole class, bottom 1/3 and top 1/3:
• Coef. of Determination (r²) between odd & even numbered Qs in 6 exams (m±sem), for conventional scores, adjusted confidence-based scores and their difference; differences all P<0.01
• Relative efficiency (adjusted confidence-based scores / conventional), m±sem, axis from 1 to 3; * P<0.05, ** P<0.01]
Data from 6 medical student exams (250-300 T/F Qs each, >300 students).
[Figure: two panels, each with bars for whole class, bottom 1/3 and top 1/3:
• Coef. of Determination (r²) between odd & even numbered Qs in 6 exams (m±sem), for conventional scores, confidence-based scores predicting conventional scores ("predict conv.") and their difference; differences all P<0.01
• Relative efficiency for predicting conventional scores (adjusted confidence-based / conventional), m±sem, axis from 1 to 1.8; * P<0.025, ** P<0.01]
Certainty-based scores predict the conventional score on different Qs better than conventional scores do.
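The odd/even comparison used in these slides can be sketched as follows: total each student's marks on odd-numbered and even-numbered questions, then take the squared correlation between the two totals across students. The data layout and function names are assumptions. The talk does not give its exact procedure.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def odd_even_r2(scores):
    """scores[s][q]: mark of student s on question q (index 0 is question 1)."""
    odd = [sum(row[0::2]) for row in scores]   # questions 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # questions 2, 4, 6, ...
    return r_squared(odd, even)
```

To test the cross-prediction claim on this slide, one would pass certainty-based totals for one half and conventional totals for the other half to `r_squared`.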
How should one handle students with poor calibration?
Significantly overconfident in exam: 2 students (1%)
e.g. 50% correct @C=1, 59%@C=2, 73%@C=3
Significantly underconfident in exam: 41 students (14%)
e.g. 83% correct @C=1, 89%@C=2, 99%@C=3
Maybe one shouldn’t penalise such students
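A rough calibration check compares the fraction correct at each certainty level with the range in which that level gives the best expected mark under the LAPT scheme (below 67%, 67% to 80%, above 80%). This sketch is not the significance test behind the 1% and 14% figures, which the talk does not describe. The names `BANDS` and `calibration` are hypothetical.

```python
# Ranges of probability correct in which each LAPT level gives the best expected mark.
BANDS = {1: (0.0, 2 / 3), 2: (2 / 3, 0.8), 3: (0.8, 1.0)}

def calibration(level, fraction_correct):
    """Compare the fraction correct at a certainty level with that level's range."""
    low, high = BANDS[level]
    if fraction_correct < low:
        return "overconfident"
    if fraction_correct > high:
        return "underconfident"
    return "calibrated"

# The two example students quoted on the slide.
examples = {
    "overconfident example": {1: 0.50, 2: 0.59, 3: 0.73},
    "underconfident example": {1: 0.83, 2: 0.89, 3: 0.99},
}
for name, results in examples.items():
    print(name, {level: calibration(level, f) for level, f in results.items()})
```

For the first student the C=2 and C=3 answers fall below their ranges. For the second, the C=1 and C=2 answers sit above theirs.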
Adjusted confidence-based score:
Mark the set of answers at each C level as if they were entered at the C level that gives the highest score**.
mean benefit = 1.5% ± 2.1% (median 0.6%)
** (first combining sets if %correct is not in ascending order)
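The adjustment described above can be sketched in a few lines. The talk states the rule but not the procedure, so two parts are assumptions: the combining step is read as pooling neighbouring certainty levels whenever the fraction correct falls, and the marks are taken as the LAPT values. All names are hypothetical.

```python
MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}  # level -> (right, wrong), LAPT scheme

def pool_non_ascending(sets):
    """Merge neighbouring (correct, total) sets until fraction correct never falls."""
    pooled = []
    for correct, total in sets:
        pooled.append((correct, total))
        # Merge backwards while the newest set is less accurate than the one before it.
        while len(pooled) > 1 and (pooled[-1][0] * pooled[-2][1]
                                   < pooled[-2][0] * pooled[-1][1]):
            c2, t2 = pooled.pop()
            c1, t1 = pooled.pop()
            pooled.append((c1 + c2, t1 + t2))
    return pooled

def adjusted_score(by_level):
    """by_level: {level: (correct, total)}; returns the adjusted total mark."""
    sets = [by_level[level] for level in sorted(by_level) if by_level[level][1] > 0]
    total_mark = 0
    for correct, total in pool_non_ascending(sets):
        wrong = total - correct
        # Mark this set as if entered at whichever level scores highest.
        total_mark += max(r * correct + w * wrong for r, w in MARKS.values())
    return total_mark

print(adjusted_score({1: (83, 100), 2: (89, 100), 3: (99, 100)}))
```

For hypothetical counts of 83, 89 and 99 correct out of 100 at C=1, 2 and 3 (the underconfident pattern quoted earlier), every set scores best at C=3. The adjusted total is 639, against 530 for the answers as entered.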
Scaling CBM scores to be directly comparable with conventional scores
NCOR is based on number correct, scaled so guesses (50% prob’y correct) give on average 0%. (“% Knowledge”)
[Figure: CBS plotted against % correct above chance, both axes 0% to 100%.]
True/False ♦ and SBA □ (5 option) components of a formative test for 345 students were ranked by conventional scores. Then for each decile, mean CBS scores are plotted against % correct above chance (“% knowledge”).
Equivalence of **scaled CBM scores and conventional scores for standard setting.
Gardner-Medwin & Curtin 2007, REAP conference; data from Imperial College
** CBS = ( (Total - Chance) / (Max - Chance) )^p × 100%, where p = 0.6 for T/F, 0.48 for SBA (5 options)
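Both scalings on these slides are short formulas, sketched below. The function names are hypothetical. The example assumes that "Chance" for a CBM total means the average total from guessing at C=1; the talk does not spell this out. The talk also does not say how totals below chance are scaled, so the sketch rejects them rather than guess.

```python
def percent_knowledge(fraction_correct, chance):
    """Number-correct score scaled so that pure guessing averages 0%."""
    return (fraction_correct - chance) / (1 - chance) * 100

def scaled_cbm(total, chance, maximum, p):
    """CBS = ((Total - Chance) / (Max - Chance)) ** p * 100, as on the slide."""
    ratio = (total - chance) / (maximum - chance)
    if ratio < 0:
        raise ValueError("totals below chance are not covered by the slide's formula")
    return ratio ** p * 100

# Hypothetical 100-question True/False test: Max = 3 marks per question,
# Chance assumed to be 0.5 marks per question (guessing at C=1).
print(percent_knowledge(0.75, chance=0.5))
print(round(scaled_cbm(total=200, chance=50, maximum=300, p=0.6), 1))
```

The exponent p compresses the CBM scale so that a given CBS value lines up with the same "% knowledge" value on the conventional scale.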
Why doesn't everybody already use CBM? - a puzzle
• Enthusiasm was exhausted before the age of 'online'
• Some CBM methods were complex, opaque or non-motivating
• Reluctance to treat certainty as integral to knowledge
• Mistaken worries about 'personality bias'
• Under-rating of self-assessment & practice as learning tools
• Worry that CBM would need new questions
• Worry that CBM would upset standard-setting
• Inertia and vested interests
A few of the names associated with confidence testing in education
• Andrew Ahlgren• Jim Bruno• Confucius• Robert Ebel• Jack Good
• Kate Hevner• Darwin Hunt• Dieudonné Leclercq• Emir Shuford
London Colleagues: • Mike Gahan• David Bender• Nancy Curtin
“When you know a thing, to hold that you know it. And when you do not know a thing, to allow that you do not know it. This is knowledge.”
“Learning without thought is a waste of time.”
Confucius
We fail if we mark a lucky guess as if it were knowledge.
We fail if we mark misconceptions as no worse than ignorance.
www.ucl.ac.uk/lapt
Lessons from experience with CBM
• Practice is needed before use in exams
• Exams should re-use questions from an open database only very sparingly
• Over-confidence and diffidence are both unhealthy traits that can be moderated by practice to achieve good calibration
• With multi-option questions, students tend (at least initially) to over-estimate reliability
• Standard setting - it is easy (but important!) to scale CBM marks to match familiar scales based on number correct.
Some Questions about CBM!
• Are there problems using it?
• Why doesn't my VLE support CBM?
• Do students need practice?
• Isn't computer marked assessment just factual?
• Does CBM increase retention?
• Do I need new questions?
• What are the best Q types?
• What about school education?
• Is it relevant to my subject, where opinions differ?
• Isn't it bad to encourage guessing?
• What if my only assessments are exams?
• How do I convince an exam board?
• Isn't it right/wrong that really matters?
Response to LAPT numeracy exercises in medical 1st year
LAPT: Maths in Medical Science, 1st year new medical curriculum: Nov '00 - 312 replies / 330 students
[Figure: three bar charts of % of replies:
• Usefulness rating, from 1 (Not Useful) to 5 (Very Useful)
• "I think about confidence assessment": Every Time, Most of the time, Rarely, Never, No reply
• "I sometimes change my answer while thinking about confidence assessment": from Disagree 1 to Agree 5]
Principles that students seem readily to understand:
• thinking about the basis and reliability of answers can help tie bits of knowledge together (to form “understanding”)
• checking an answer and rereading the question are worthwhile
• sound confidence judgement is a valued intellectual skill in every context, and one they can improve
• immediate feedback while still thinking about the basis of your answer is a hugely valuable study aid
• confident errors are far worse than acknowledged ignorance and are a wake-up call (-6!) to pay attention to explanations
• expressing uncertainty when you are uncertain is a good thing
• both under- and over-confidence are impediments to learning
Students really take to confidence-based marking
The correlation, across students, between scores on one set of questions and another is higher for confidence than for simple scores.
But perhaps they are just measuring ability to handle confidence ?
[Figure: three scatter plots comparing students' scores on two sets of questions:
• set 2 (simple) against set 1 (simple), R² = 0.735
• set 2 (confidence) against set 1 (confidence), R² = 0.814
• set 2 (simple) against set 1 (conf^0.6), R² = 0.776]
No. Confidence scores are better than simple scores at predicting even the conventional scores on a different set of questions. This can only be because they are a statistically more efficient measure of knowledge.
When you know a thing, to hold that you know it.
And when you do not know a thing, to allow that you do not know it.
This is knowledge. Confucius
Known Knowns ... things we know that we know.
Known Unknowns ... things that we know that we don't know.
Unknown Unknowns ... things we do not know we don't know.
Donald Rumsfeld
Will it snow next weekend?
Does a (good) weather forecaster have knowledge? - obviously yes, but expressed through a probability
How can you measure and reward this knowledge? - the origin of CBM >100 years ago.
Does insulin raise blood glucose levels?
Similar, even though the Q is not about a probability. - the probability is your certainty that your answer is right
The key is to have a "proper" or "motivating" reward scheme, which ensures that the person does best by expressing their true level of uncertainty
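What "proper" means can be seen by comparing two textbook scoring rules for a forecast such as the snow question. These are swapped in for illustration and are not the LAPT marks. Under a logarithmic rule the best report is your true probability. Under a simple linear rule it pays to exaggerate. All function names are hypothetical.

```python
import math

def expected_log_score(p, q):
    """Expected log2 score when the event has probability p and you report q."""
    return p * math.log2(q) + (1 - p) * math.log2(1 - q)

def expected_linear_score(p, q):
    """Expected score when the mark is simply the probability given to the outcome."""
    return p * q + (1 - p) * (1 - q)

def best_report(score, p, steps=99):
    """Reported probability, on a grid from 0.01 to 0.99, with the highest expected score."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(grid, key=lambda q: score(p, q))

print(best_report(expected_log_score, 0.7))     # honest report scores best
print(best_report(expected_linear_score, 0.7))  # exaggerated report scores best
```

With a true belief of 70%, the logarithmic rule is maximised by reporting 0.7, while the linear rule pushes the report to the top of the grid. A motivating scheme in the sense of this slide behaves like the first rule, not the second.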
CBM data is a more valid measure of ability
'Validity' means it measures what you want, rather than just something easily measured.
Why CBM?
• Get students to think more carefully
• Reward recognition of uncertainty, either personal or in a group
• Highlight misconceptions
• Engage students more - the game element of CBM
• Encourage criticism of Qs (intolerance of ambiguity or looseness)
• In general: enhance self-assessment as a learning experience
NB All of the above arise with little or no practice with CBM. The following do require practice:
• More searching diagnostic data
• More valid and reliable assessment data
(But NB with CBM you have conventional assessment data too.)
SUMMARY