Creating Assessments
AKA how to write a test
All good assessments have three key features:
– Validity
– Reliability
– Usability
Reliability
Next to validity, reliability is the most important characteristic of assessment results.
Why?
1. It provides the consistency to make validity possible.
2. It indicates the degree to which various kinds of generalizations are justifiable.
re·li·a·ble adj. Capable of being relied on; dependable. re·li·a·bil·i·ty or re·li·a·ble·ness n. re·li·a·bly adv.
(American Heritage Dictionary)
Reliability: the consistency of measurement, i.e. how consistent test scores or other assessment results are from one measurement to another.
Which is more reliable?
Classical Test Theory:
X = T + e
Where:
X = observed score
T = "true score"
e = error
X = observed score: The score the student receives on the exam.
T = "true score": What the student "really" knows.
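The relationship X = T + e can be sketched in a few lines of Python. This is only an illustration: the true scores are invented, the error is assumed to be normally distributed, and the function names are hypothetical. The ratio computed at the end (true-score variance divided by observed-score variance) is the usual Classical Test Theory way of expressing reliability; it is not stated on the slides.

```python
import random
import statistics

def simulate_observed_scores(true_scores, error_sd, seed=0):
    """Return observed scores X = T + e.

    Assumption: e is drawn from a normal distribution with mean 0
    and standard deviation error_sd.
    """
    rng = random.Random(seed)
    return [t + rng.gauss(0, error_sd) for t in true_scores]

def reliability_ratio(true_scores, observed_scores):
    """Share of observed-score variance that is true-score variance."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed_scores)

# Invented true scores for ten students.
true_scores = [55, 62, 70, 74, 78, 81, 85, 90, 93, 97]

no_error = simulate_observed_scores(true_scores, error_sd=0)  # X equals T
noisy = simulate_observed_scores(true_scores, error_sd=8)     # X = T + e

print(reliability_ratio(true_scores, no_error))
print(reliability_ratio(true_scores, noisy))
```

With no error the ratio is 1; as the error grows, observed scores tell us less about what the student "really" knows. In a real classroom T is never observed directly, which is why reliability has to be estimated.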
e = error
Error variance is the variability that exists in a set of scores and is due to factors other than the one being assessed.
– Systematic: errors that are consistent.
– Random: errors that have no pattern.
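A small, made-up example may help separate the two kinds of error. The scores and error values below are invented for illustration; the random errors were hand-picked so that they average to zero, which real random error only does approximately.

```python
import statistics

# Invented true scores for four students.
true_scores = [60, 70, 80, 90]

# Hypothetical systematic error: e.g. a mis-keyed item costs every student 5 points.
systematic_errors = [-5, -5, -5, -5]

# Hypothetical random errors: no pattern, hand-picked here to sum to zero.
random_errors = [4, -3, 2, -3]

def observe(scores, errors):
    """Apply X = T + e to each student."""
    return [t + e for t, e in zip(scores, errors)]

with_systematic = observe(true_scores, systematic_errors)
with_random = observe(true_scores, random_errors)

print(statistics.mean(true_scores))      # 75
print(statistics.mean(with_systematic))  # 70
print(statistics.mean(with_random))      # 75
```

The systematic error shifts every score the same way, so the class average is off by the full 5 points. The random errors leave the average alone but move individual students up or down unpredictably.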
e = error
Positive error (i.e. raises score):
– Lucky guesses.
– Items that give clues to the answer.
– Cheating (students, aides, teachers).
e = error
Negative error (i.e. lowers score):
– Not following directions.
– Mismarking items.
– Room climate/atmosphere.
– Hunger, fatigue, illness, "need to go potty".
– Assemblies, ball games, fire drills, etc.
– Break-up of a relationship.
Circle the figures that are half shaded.
Determining Reliability:
– Test-retest method
– Equivalent forms
– Split-half method
– KR-20 method
– Interrater reliability
– Intrarater reliability
Standard Error of Measurement (SEM) = the estimated amount of variation expected in a score.
$SEM = SD\sqrt{1 - r}$
where SD is the standard deviation of the test scores and r is the reliability coefficient.
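As a quick check on the arithmetic, the standard SEM formula, SEM = SD × √(1 − r), can be written as a one-line function. The function name and the example values (SD of 15, reliability of 0.84) are made up for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), with r the reliability coefficient (0 to 1)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: standard deviation of 15 points, reliability of 0.84.
print(round(standard_error_of_measurement(15, 0.84), 2))  # 6.0
```

Note how the formula behaves at the extremes: a perfectly reliable test (r = 1) has an SEM of 0, and a test with no reliability (r = 0) has an SEM equal to the full standard deviation.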
Example: If Sara scored 78 on a standardized test with a SEM of 6 we can be:
– 68% certain her true score is between 72 and 84
– 95% certain her true score is between 66 and 90
– 99% certain her true score is between 60 and 96
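Sara's three ranges are simply her observed score plus or minus one, two, and three SEMs. A minimal sketch, with a hypothetical function name, reproduces them:

```python
def score_band(observed, sem, n_sems):
    """Range of observed score plus or minus n_sems standard errors of measurement."""
    return (observed - n_sems * sem, observed + n_sems * sem)

# Sara: observed score 78, SEM 6. Certainty labels follow the slide.
for label, n_sems in [("68%", 1), ("95%", 2), ("99%", 3)]:
    low, high = score_band(78, 6, n_sems)
    print(f"{label} certain: {low} to {high}")
```

The percentages come from the normal curve, so they assume measurement error is roughly normally distributed.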
Summation of Reliability:
1. Reliability refers to the results and not to the instrument itself.
2. Reliability is a necessary but not sufficient condition for validity.
3. The more reliable the assessment, the better.
Usability
The practical aspects of a test cannot be neglected:
– Ease of administration
– Time
  – Administration
  – Scoring
– Ease of interpretation
– Availability of equivalent forms
– Cost