evaluation: testing, objective-to-test-item matching and judgments of worth edtec 540 james marshall

Evaluation: Testing, Objective-to-Test-Item Matching and Judgments of Worth

EDTEC 540

James Marshall

Session Overview

Evaluation approaches

Testing – one possible data point in evaluation
  Norm-referenced
  Criterion-referenced

Objective-to-test-item matching

Measurement error, reliability and validity

Evaluation, typically

Typically, it doesn’t happen!
That said, it should.
And it is required for many funded projects.
What happened? Were goals and objectives achieved? How can we find that out?
At the end is NOT the only time to measure worth. When else?
Strategies: tests, observations, surveys, chats with managers, look at work, results

Evaluation Approaches

Objectivist

Belief in a reality that can be known and measured.
Prevalent in education and our business.
Objectives-based, deceptively simple. Establish goals --> set objectives --> tailor instruction to objectives --> judge effectiveness.
Measures are analytical/quantitative in nature.

Examples
  Do first-graders know the letters of the alphabet?
  Can the new account representative describe the features of each checking account – as defined by the bank?
  Others?

Advantages/disadvantages?


Constructivist

Belief that people construct their own realities. Advocates believe that truth is a matter of consensus, not measurement against an objective reality.
Evaluation creates detailed descriptions of that which is inside the head of the learner.
Reliance upon open-ended exercises, observation, cases and immersion in the field.
Observation is useful for us, in that IDs build prototypes, conduct formative evaluations, revise and cycle again.
Measures are qualitative in nature.

Examples
  Role play exercise to deal with a hostile customer
  Theme Park Tycoon – running a theme park for a year
  Essay question asking you to describe your understanding of Educational Technology

Advantages/disadvantages?


Postmodern/Critical

Objectivists proclaim objectivity. Constructivists approve of subjectivity. Postmoderns are social activists.
Focus on questions of power: “Who are you to set objectives for others?”
Use of deconstruction to see what’s inside texts and materials.
Most interested in the hidden curriculum, such as the teaching of traditional gender roles. What does the curriculum teach?

Why should IDs care about this evaluation approach?

Evaluation Frameworks: Kirkpatrick’s Model

Level 4: Does it matter? Does it advance strategy?

Level 3: Are they doing it (objectives) consistently and appropriately?

Level 2: Can they do it (objectives)? Do they show the skills and abilities?

Level 1: Did they like the experience? Satisfaction? Use? Repeat use?

Evaluation Frameworks: CIPP

Context assesses program/product needs, problems or opportunities specific to the project environment.

Input assesses and allocates project resources in order to meet identified needs and objectives, solve problems, and optimize program impact.

Process assesses project implementation.

Product assesses planned and unintended (unforeseen) outcomes, both to keep a project on track and to determine effectiveness or impact.

Types of Tests

Used to evaluate changes in skills and knowledge

Is testing alone sufficient?

Test Types: Norm-Referenced

Compare an individual's performance to the performance of other people.

Require varying item difficulties. Assume not everybody is going to "get it"

Discern those who "got it" from those who didn't.

Normal Distribution


Norm-referenced tests compare the individual to the group.
Accomplished statistically by “norming” the test with large numbers of people.
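To make the idea of norming concrete, here is a minimal Python sketch that is not part of the original slides. It assumes a small, made-up norming sample (real norming uses large numbers of people) and shows how one raw score becomes a percentile rank and a z-score relative to the group. The function names and all data are hypothetical.

```python
from statistics import mean, stdev


def percentile_rank(raw_score, norm_scores):
    """Percent of the norming sample that scored strictly below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)


def z_score(raw_score, norm_scores):
    """Distance from the group mean, in sample standard deviations."""
    return (raw_score - mean(norm_scores)) / stdev(norm_scores)


# Hypothetical norming sample (assumption: ten made-up raw scores).
norm_scores = [40, 45, 50, 50, 55, 60, 60, 65, 70, 75]

rank = percentile_rank(65, norm_scores)
z = z_score(65, norm_scores)
print(f"percentile rank: {rank:.0f}, z-score: {z:.2f}")
```

The result says where a test-taker stands relative to the group. It does not say which skills were missed.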

Consider: You sat for the GRE and received the following scores.

You need to retake the test. What is your study plan?

570 51 22 28


Limitations

Not especially helpful for:
  identifying individual skill deficiencies
  identifying weaknesses in the instruction

Test Types: Criterion-Referenced

Compares an individual's performance to the acceptable standard of performance for those tasks.

Requires completely specified objectives.
Asks: Can this person do that which has been specified in the objectives?
Results in yes-no decisions about competence.
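A yes-no competence decision can be sketched in a few lines of Python. This is an illustration only: the criterion (45 correct out of 50), the learner names and the scores are all assumptions made up for the example, and `mastery_decision` is a hypothetical helper.

```python
def mastery_decision(correct, criterion):
    """Yes-no decision: did the learner meet the standard set in the objective?"""
    return correct >= criterion


# Hypothetical results on a 50-item test with an assumed criterion of 45 correct.
CRITERION = 45
results = {"Ana": 48, "Ben": 45, "Cal": 39}

decisions = {name: mastery_decision(score, CRITERION) for name, score in results.items()}

for name, passed in decisions.items():
    print(name, "competent" if passed else "not yet competent")
```

Each learner is compared to the standard, not to the other learners, so nothing prevents every learner from passing.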

Test Types: Criterion-Referenced

Applications
  Diagnosis of individual skill deficiencies
  Certification of skills
  Evaluation and revision of instruction

Limitations
  Tend to focus on specific skills
  Results may not reflect general aptitudes
  Everyone may get an “A”

Which Test is Which?

NR or CRT?

IQ test

GRE

SDSU Writing Competency

Red Cross Lifesaving Certificate

EDTEC 540 midterm and final exams

Which Test is Which?

NR or CRT?

Give out a CA driver's license

Pick students for Russian lang. training

Determine entrance into medical school

PADI Scuba Certification

Select one EDTEC scholarship recipient

Figure out where to revise a course

Decide which students need remediation

Utility of Test Scores

Selection & screening (before):
  mastery of prerequisites -- for remediation/placement
  mastery of course objectives -- for acceleration (“testing out”)
Individual diagnosis and prescription (along the way)
Practice (along the way)
Grades & summative scores (at or after the end):
  promotion
  certification and licensure
Administrative:
  course evaluation
  trainer accountability

Criterion-referenced Test Items

Objective: Given a map of the USA with state borders marked, the lwbat write the abbreviation for 45 of 50 states in 15 mins.
Item: Here is a map of the USA with the states outlined -- but no names. Use the state abbreviations and fill them in -- you've got 15 mins to get at least 45.

Objective: Given a pair of well-worn shoes, the lwbat identify what's wrong with the shoes and the tools and materials necessary to fix them.
Item: Take a look at this pair of shoes. What problems do you see? What will you need to fix them?

Objective: Given a goal, the lwbat write at least two appropriate objectives with proper ABCD parts.
Item: The goal of the instruction is: "IDs will know how to write resumes." Write at least 2 objectives with all four parts.

Matching Test Items to Objectives

Matching ensures validity.
Validity is the extent to which the test measures what is important to performance. Does a high score on the test equate to high performance on the job?
The validity of a criterion-referenced test is enhanced when:
  objectives match real-world performances (based on solid analysis);
  test items match stated objectives (including condition).

Match, or Not?

Given any stocked fruit or vegetable, the Ralphs Grocery Checker will be able to verbally state the code which matches the produce provided with 100% accuracy.

Here is a persimmon from the produce department and the produce code job aid. Please state the produce code for this item. You may examine the persimmon and reference the job aid.

Match, or Not?

Given a tree in need of pruning, the gardener’s apprentice will be able to select the correct tree pruning device, based upon the type of tree presented.

Here is an overgrown elm tree. Please select the appropriate tool with which you will prune the tree.

Match, or Not?

Given a descriptive order for a Café Mocha, including size, caf/decaf, type of milk, the barista will be able to create the drink as specified in the Starbucks Guide to Coffee Creations.

A customer has just ordered a Grande, non-fat, mocha. Please list the ingredients you will need, and describe the steps you would take to create the drink.

Evaluating a Training Program

Consider: Your evaluation uses a criterion-based test to see if the new account representatives can describe the different types of accounts offered by the bank. All representatives were able to meet the specified criteria.

Case closed… or, do you want to know more?

Ideas in Testing

Measurement Error
Validity
Reliability

Measurement Error

Many causes:
  mechanical or scoring errors
  poor wording (confusing, ambiguous)
  poor subject matter, content (validity)
  score variation from one time to another (reliability)
  score variation from "equivalent" tests
  test administration procedure
  inter-rater reliability
  mood of the student
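Of the causes listed above, inter-rater reliability is one that can be checked with simple arithmetic. The Python sketch below is an illustration that is not from the slides: it computes plain percent agreement and Cohen's kappa (agreement corrected for chance) for two raters scoring the same performances. The ratings are made up, and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_1, rater_2):
    """Share of performances on which both raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Agreement corrected for chance. Undefined if chance agreement is 1."""
    n = len(rater_1)
    observed = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    labels = counts_1.keys() | counts_2.keys()
    expected = sum(counts_1[label] * counts_2[label] for label in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail ratings of the same eight performances.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"agreement: {agreement:.2f}, kappa: {kappa:.2f}")
```

Kappa comes out lower than raw agreement here because two raters who both pass most learners would agree fairly often by chance alone.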

Validity

Does the test assess what's important? Does it really seek out the skill and knowledge linked to the world? (content validity)

Types:
  Content Validity (most important to us)
  Predictive Validity (e.g. SAT, GRE)

Reliability

Are the scores produced by the test trustworthy and stable over time?

Assessed by:
  parallel (equivalent) forms or test-retest
  internal consistency
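Both approaches can be sketched in Python. This is an illustration under stated assumptions, not a method given in the slides: test-retest (or parallel forms) reliability is estimated here with a Pearson correlation between two sets of scores, and internal consistency with Cronbach's alpha, a commonly used coefficient. All scores are made up and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Correlation between two administrations (or two equivalent forms)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(item_scores):
    """item_scores: one row per learner, one score per test item."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for five learners tested twice.
first_try = [10, 12, 15, 18, 20]
second_try = [11, 12, 14, 19, 20]
r = pearson_r(first_try, second_try)

# Hypothetical right/wrong (1/0) results for four learners on three items.
items = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(items)

print(f"test-retest r: {r:.2f}, alpha: {alpha:.2f}")
```

Values near 1 suggest stable, consistent scores. What counts as high enough depends on the decision the test supports.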

Testing and Evaluation

A Look Ahead:

ED 690 – Procedures of Investigation
  Provides introduction to evaluation procedures and methods
  Introduces research process, statistical analysis

ED 791A, 791B, 791C
  Evaluation sequence most often completed by EDTEC students, instead of writing a thesis
  Conduct a full-scale evaluation (design, research, report) for a living, breathing client over a two-semester timeframe
