Evaluation: Testing, Objective-to-Test-Item Matching and Judgments of Worth
EDTEC 540
James Marshall
Session Overview
Evaluation Approaches
Testing – one possible data point in evaluation
Norm-referenced
Criterion-referenced
Objective-to-test-item matching
Measurement error, reliability and validity
Evaluation, typically
Typically, it doesn’t happen!
That said, it should
And it is required for many funded projects
What happened? Were goals and objectives achieved? How can we find that out?
At the end is NOT the only time to measure worth. When else?
Strategies: tests, observations, surveys, chats with managers, look at work, results
Evaluation Approaches
Objectivist
Belief in a reality that can be known and measured.
Prevalent in education and our business.
Objectives-based, deceptively simple.
Establish goals --> set objectives --> tailor instruction to objectives --> judge effectiveness.
Measures are analytical/quantitative in nature.
Examples
Do first-graders know the letters of the alphabet?
Can the new account representative describe the features of each checking account – as defined by the bank?
Others?
Advantages/disadvantages?
Evaluation Approaches
Constructivist
Belief that people construct their own realities.
Advocates believe that truth is a matter of consensus, not measurement against an objective reality.
Evaluation creates detailed descriptions of that which is inside the head of the learner.
Reliance upon open-ended exercises, observation, cases and immersion in the field.
Observation is useful for us, in that IDs build prototypes, conduct formative evaluations, revise and cycle again.
Measures are qualitative in nature.
Examples
Role play exercise to deal with a hostile customer
Theme Park Tycoon – running a theme park for a year
Essay question asking you to describe your understanding of Educational Technology
Advantages/disadvantages?
Evaluation Approaches
Postmodern/Critical
Objectivists proclaim objectivity. Constructivists approve of subjectivity. Postmoderns are social activists.
Focus on questions of power: “Who are you to set objectives for others?”
Use of deconstruction to see what’s inside texts and materials.
Most interested in the hidden curriculum, such as the teaching of traditional gender roles. What does the curriculum teach?
Why should IDs care about this evaluation approach?
Evaluation Frameworks: Kirkpatrick’s Model
Level 4: Does it matter? Does it advance strategy?
Level 3: Are they doing it (objectives) consistently and appropriately?
Level 2: Can they do it (objectives)? Do they show the skills and abilities?
Level 1: Did they like the experience? Satisfaction? Use? Repeat use?
Evaluation Frameworks: CIPP
Context assesses program/product needs, problems or opportunities specific to the project environment.
Input assesses and allocates project resources in order to meet identified needs and objectives, solve problems, and optimize program impact.
Process assesses project implementation.
Product assesses planned and unintended (unforeseen) outcomes, both to keep a project on track and to determine effectiveness or impact.
Types of Tests
Used to evaluate changes in skills and knowledge
Is testing alone sufficient?
Test Types: Norm-Referenced
Compare an individual's performance to the performance of other people.
Require varying item difficulties.
Assume not everybody is going to "get it".
Discern those who "got it" from those who didn't.
Normal Distribution
Test Types: Norm-Referenced
Norm-referenced tests compare the individual to the group.
Accomplished statistically by “norming” the test with large numbers of people.
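To make "comparing the individual to the group" concrete, here is a minimal Python sketch of two common norm-referenced summaries: percentile rank and z-score. The norm group, its scores, and the function names are invented for illustration. Real tests are normed on far larger samples, and a published test may define its percentiles differently.

```python
from statistics import NormalDist, mean, stdev

def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

def z_score(score, norm_scores):
    """Distance from the norm-group mean, in standard deviations."""
    return (score - mean(norm_scores)) / stdev(norm_scores)

def normal_percentile(score, norm_scores):
    """Percentile under an assumed normal model fitted to the norm group."""
    model = NormalDist(mean(norm_scores), stdev(norm_scores))
    return 100 * model.cdf(score)

# Hypothetical norm group (invented scores, illustration only).
norm_group = [420, 450, 480, 500, 510, 530, 550, 570, 600, 640]

print(percentile_rank(570, norm_group))   # 80.0
print(z_score(570, norm_group))           # positive: above the group mean
print(normal_percentile(570, norm_group))
```

Note what the output does and does not say: it locates one person relative to other people, but it says nothing about which skills that person has or lacks.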
Consider: You sat for the GRE and received the following scores.
You need to retake the test. What is your study plan?
570 51 22 28
Test Types: Norm-Referenced
Limitations
Not especially helpful for:
identifying individual skill deficiencies
identifying weaknesses in the instruction
Test Types: Criterion-Referenced
Compares an individual's performance to the acceptable standard of performance for those tasks.
Requires completely specified objectives.
Asks: Can this person do that which has been specified in the objectives?
Results in yes-no decisions about competence.
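A criterion-referenced decision can be sketched in a few lines of Python. The learners, their results, and the required standard of 45 correct items are assumptions made up for this illustration. The point is only that each person is compared to the standard, never to the other test takers.

```python
def meets_criterion(items_correct, items_required):
    """Yes/no competence decision against a fixed standard."""
    return items_correct >= items_required

# Hypothetical results: items answered correctly per learner.
results = {"Learner A": 47, "Learner B": 45, "Learner C": 39}
REQUIRED = 45  # assumed standard, e.g. 45 of 50 items

decisions = {name: meets_criterion(n, REQUIRED) for name, n in results.items()}
print(decisions)
# {'Learner A': True, 'Learner B': True, 'Learner C': False}
```

Nothing in this logic limits how many learners can pass. If instruction works, every decision may be "yes".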
Test Types: Criterion-Referenced
Applications
Diagnosis of individual skill deficiencies
Certification of skills
Evaluation and revision of instruction
Limitations
Tend to focus on specific skills
Results may not reflect general aptitudes
Everyone may get an “A”
Which Test is Which? (NR or CRT)
IQ test
GRE
SDSU Writing Competency
Red Cross Lifesaving Certificate
EDTEC 540 midterm and final exams
Which Test is Which? (NR or CRT)
Give out a CA driver's license
Pick students for Russian lang. training
Determine entrance into medical school
PADI Scuba Certification
Select one EDTEC scholarship recipient
Figure out where to revise a course
Decide which students need remediation
Utility of Test Scores
Selection & screening (before):
mastery of prerequisites -- for remediation/placement
mastery of course objectives -- for acceleration (“testing out”)
Individual diagnosis and prescription (along the way)
Practice (along the way)
Grades & summative scores (at or after the end):
promotion
certification and licensure
Administrative:
course evaluation
trainer accountability
Criterion-referenced Test Items
Objectives
Given a map of the USA with state borders marked, the learner will be able to (lwbat) write the abbreviation for 45 of 50 states in 15 mins.
Given a pair of well-worn shoes, the lwbat identify what's wrong with the shoes and the tools and materials necessary to fix them.
Given a goal, the lwbat write at least two appropriate objectives with proper ABCD parts.
Items
Here is a map of the USA with the states outlined, but no names. Use the state abbreviations and fill them in. You've got 15 mins to get at least 45.
Take a look at this pair of shoes. What problems do you see? What will you need to fix them?
The goal of the instruction is: "IDs will know how to write resumes." Write at least 2 objectives with all four parts.
Matching Test Items to Objectives
Matching ensures validity
Validity is the extent to which the test measures what is important to performance. Does a high score on the test equate to high performance on the job?
The validity of a criterion-referenced test is enhanced when:
objectives match real-world performances (based on solid analysis);
test items match stated objectives (including condition).
Match, or Not?
Given any stocked fruit or vegetable, the Ralphs Grocery Checker will be able to verbally state the code which matches the produce provided with 100% accuracy.
Here is a persimmon from the produce department and the produce code job aid. Please state the produce code for this item. You may examine the persimmon and reference the job aid.
Match, or Not?
Given a tree in need of pruning, the gardener’s apprentice will be able to select the correct tree pruning device, based upon the type of tree presented.
Here is an overgrown elm tree. Please select the appropriate tool with which you will prune the tree.
Match, or Not?
Given a descriptive order for a Café Mocha, including size, caf/decaf, type of milk, the barista will be able to create the drink as specified in the Starbucks Guide to Coffee Creations.
A customer has just ordered a Grande, non-fat, mocha. Please list the ingredients you will need, and describe the steps you would take to create the drink.
Evaluating a Training Program
Consider: Your evaluation uses a criterion-based test to see if the new account representatives can describe the different types of accounts offered by the bank. All representatives were able to meet the specified criteria.
Case closed… or, do you want to know more?
Ideas in Testing
Measurement Error
Validity
Reliability
Measurement Error
Many causes:
mechanical or scoring errors
poor wording (confusing, ambiguous)
poor subject matter, content (validity)
score variation from one time to another (reliability)
score variation from "equivalent" tests
test administration procedure
inter-rater reliability
mood of the student
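Of the causes listed, inter-rater reliability is one that can be checked directly when two people score the same work. The Python sketch below computes simple percent agreement and Cohen's kappa, a commonly used index that corrects agreement for chance. The two raters and their pass/fail ratings are invented for illustration, and kappa is offered as one possible index, not one the course prescribes.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of cases on which the two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of the same eight performances by two raters.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass"]

print(percent_agreement(rater_a, rater_b))        # 0.75
print(round(cohens_kappa(rater_a, rater_b), 3))   # 0.467
```

The gap between 0.75 and 0.467 shows why raw agreement can flatter a scoring procedure: with only two categories, a good share of matches would occur by chance.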
Validity
Does the test assess what's important? Does it really seek out the skill and knowledge linked to the world? (content validity)
Types:
Content Validity (most important to us)
Predictive Validity (e.g. SAT, GRE)
Reliability
Are the scores produced by the test trustworthy and stable over time?
Assessed by:
parallel (equivalent) forms or test-retest
internal consistency
Testing and Evaluation
A Look Ahead:
ED 690 – Procedures of Investigation
Provides introduction to evaluation procedures and methods
Introduces research process, statistical analysis
ED 791A, 791B, 791C
Evaluation sequence most often completed by EDTEC students, instead of writing a thesis
Conduct a full-scale evaluation (design, research, report) for a living, breathing client over a two-semester timeframe