Six Major Challenges for Educational and Psychological Testing Practices Ronald K. Hambleton University of Massachusetts at Amherst Annual APA Meeting, New Orleans, Aug. 11, 2006



Six Major Challenges for Educational and Psychological Testing Practices

Ronald K. Hambleton

University of Massachusetts at Amherst

Annual APA Meeting, New Orleans, Aug. 11, 2006


In 1966 (I began my studies at the University of Toronto, in Canada.)

1. Multiple-Choice Tests

2. Relatively Simple Statistics (up to only ANOVA and linear regression)

3. Routine Psychometric Studies Could Be Published

4. Computer Cards/Tapes


In 2006 (40 years later)

1. Wide Array of Item Types

2. Complex Statistical Modeling of Data (IRT, GT, SEM)

3. Standard-Setting, DIF, CBT, CAT, Performance Testing, Automated Scoring and Test Development

4. Laptops, Desktops, Internet


Impossible to predict changes between 1966 and 2006, but a few initial predictions about the next 40 years seem possible because some trends are clear:

1. Wider Uses of Psychological Tests in International Markets

2. Advances in Modeling of Test Data

3. New Item Types/Scoring Are Coming

-High Fidelity Simulations

-Item Algorithms, Item Cloning

-Computer Scoring of Free Responses


State of Affairs Today, cont.:

4. Advances with Computer-Based Tests

5. Improvements in Score Reporting Practices (e.g., simpler, clearer, more informative displays)

6. And, Better Training in Psychometric Methods Is Needed (for Psychologists and Educational Research Specialists)


Two Goals of the Presentation

•Address these six (likely) advances and their impact on educational and psychological testing practices.

•Describe challenges that need to be addressed.


1. Use of Tests in International Markets

• Interest in test translations and test adaptations has increased tremendously in the past 15 years:

--Several IQ and personality tests have been adapted into more than 100 languages.

--Achievement tests for large scale international assessments (PISA, TIMSS) in over 30 languages.


1. Use of Tests in International Markets

--International use of credentialing exams is expanding (e.g., see Microsoft)

--Many high school graduation/college admissions tests are in multiple languages (e.g., see Israel, South Africa, USA).

--Health scientists' “Quality of Life” measures are receiving wide use in many languages and cultures.

--Marketing researchers are doing more.


1. Use of Tests in International Markets

• But there are major misunderstandings about the difficulties of translating and adapting tests from one language and culture to another. (See Hambleton, Merenda, & Spielberger, 2006; ITC Brussels Conference, 2006)


Example 1

Out of sight, out of mind

(Back translated from French)

invisible, insane


Example 2 (IEA Study in Reading)

Are these words similar in meaning?

Pessimistic -- Sanguine


Pessimistic -- Sanguine

Adapted to

Pessimistic -- Optimistic


Example 3 (1995 TIMSS Pilot)

Alex reads his book for 1 hour and then used a book mark to keep his place. How much longer will it take him to finish the book?

A. ½ hour

B. 2 hours

C. 5 hours

D. 10 hours


Common Misunderstandings:

• That most anyone who knows two languages can do the translation.

• That a backward translation design is sufficient. (Need a forward design.)

• That translators, if they have the correct training, can produce a valid instrument in a second language and culture.

• That use of bilinguals to compile empirical evidence is sufficient.


Challenges Ahead:

•Hire qualified translators (and several of them).

•Use forward and backward designs (and newer designs) to review test items.

•Compile empirical evidence to address construct, method, and item bias.


Challenges Ahead, cont.:

• Integrate best methodologies and practices to guide future test adaptation studies.

•Recognize the complexity of the work, so more resources, time, and expertise are available to do the job consistent with ITC and AERA/APA/NCME test standards.


2. Advances in Statistical Modeling of Test and Item Level Data

•IRT models have become popular and for several good reasons—lots of positive features (e.g., model parameter invariance, item and test information).

•Modern measurement theory and practices are now here.
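The "item and test information" feature mentioned above can be illustrated with a minimal Python sketch. It assumes a two-parameter logistic (2PL) model and a scaling constant D = 1.7; neither choice is stated on the slide, and the item parameters in `ITEMS` are hypothetical values picked only for illustration.

```python
import math

D = 1.7  # scaling constant (assumed value)


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b):
    """Information of a 2PL item at ability theta: (D*a)^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)


def total_information(theta, items):
    """Test information is the sum of the item informations at theta."""
    return sum(item_information(theta, a, b) for a, b in items)


# Hypothetical (a, b) pairs: discrimination and difficulty.
ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]

if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        info = total_information(theta, ITEMS)
        # The standard error of the ability estimate is 1 / sqrt(information).
        print(f"theta={theta:+.1f} information={info:.3f} SE={1 / math.sqrt(info):.3f}")
```

Because information depends on ability, the same test measures some examinees more precisely than others, which is what makes targeted test assembly and adaptive testing possible.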


Item Response Functions (4 choice item):

[Figure: response functions for categories k=0, k=1, k=2, k=3; x-axis: Ability (-3 to 3); y-axis: Probability (0.0 to 1.0); item parameters a_i = 1.00, b_i1 = -1.25, b_i2 = -0.25, b_i3 = 1.50]


Graded Response Model:

$$P^*_{ix}(\theta) = \frac{e^{D a_i (\theta - b_{ix})}}{1 + e^{D a_i (\theta - b_{ix})}}, \quad x = 0, 1, \ldots, m_i$$

$$P^*_{i0}(\theta) = 1.0$$

$$P^*_{i(m_i+1)}(\theta) = 0.0$$

$$P_{ix}(\theta) = P^*_{ix}(\theta) - P^*_{i(x+1)}(\theta)$$
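As a rough illustration of the graded response model on this slide, the Python sketch below computes the category probabilities as differences of adjacent cumulative curves. The function name is hypothetical, D = 1.7 is an assumed value for the scaling constant, and the example parameters are the ones printed on the preceding item response function slide (a = 1.00, b = -1.25, -0.25, 1.50).

```python
import math

D = 1.7  # scaling constant (assumed value)


def grm_category_probs(theta, a, thresholds):
    """Return [P_i0, ..., P_im] at ability theta for one graded-response item.

    thresholds holds b_i1 <= ... <= b_im, one per boundary between categories.
    """
    # Cumulative curves: P*_i0 = 1, a logistic curve for x = 1..m, P*_i(m+1) = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-D * a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Category probability is the difference of adjacent cumulative curves.
    return [cumulative[x] - cumulative[x + 1] for x in range(len(thresholds) + 1)]


if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        probs = grm_category_probs(theta, 1.00, [-1.25, -0.25, 1.50])
        print(theta, [round(p, 3) for p in probs])
```

The thresholds must be ordered; otherwise a difference of cumulative curves can go negative.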


Generalized Partial Credit Model:

$$P(x = k \mid \theta) = \frac{\exp\left[\sum_{s=1}^{k} a_i(\theta - b_{is})\right]}{1 + \sum_{r=1}^{m_i} \exp\left[\sum_{s=1}^{r} a_i(\theta - b_{is})\right]}$$
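The generalized partial credit model can be sketched in Python in a few lines. This is an illustration under assumptions rather than the presenter's code: the function name is hypothetical, categories are scored 0 to m, and category 0 is treated as having an empty sum in the exponent, which yields the "1 +" term in the denominator.

```python
import math


def gpcm_category_probs(theta, a, steps):
    """Return [P(x=0), ..., P(x=m)] at ability theta for one GPCM item.

    steps holds the step parameters b_i1, ..., b_im.
    """
    # Exponent for category k is the running sum of a * (theta - b_s), s = 1..k.
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + a * (theta - b))
    # Subtract the largest exponent before exponentiating to avoid overflow.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical three-category item.
    print([round(p, 3) for p in gpcm_category_probs(0.0, 1.0, [-1.0, 1.0])])
```

Unlike the graded response model, the step parameters here need not be ordered for the probabilities to stay valid.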


New IRT Polytomous Response Models

•Partial credit model

•Generalized partial credit model

•Graded response model

•Logistic multidimensional model

•Rating scale models

•Hundreds more models exist!


Many Examples of Successful IRT Applications

•Automated test assembly (targeting)

•Computer-adaptive testing (shorten)

•Detection of potentially biased test items

•Equating (fairness and change)

•Test score reporting (e.g., item mapping) (IRT creates options)
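One IRT-based way to screen for potentially biased items is to compare an item's characteristic curve as calibrated in two groups. The Python sketch below is only an illustration under assumptions: it uses a 2PL model with D = 1.7, takes for granted that both calibrations are already on a common scale, and measures the unsigned area between the two curves over a chosen ability range. The function names and parameter values are hypothetical, and the slide does not say which procedure the presenter had in mind.

```python
import math

D = 1.7  # scaling constant (assumed value)


def icc(theta, a, b):
    """Item characteristic curve under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def icc_area_between_groups(ref, focal, lo=-3.0, hi=3.0, steps=600):
    """Unsigned area between two ICCs on [lo, hi], by the trapezoid rule.

    ref and focal are (a, b) pairs from the reference and focal group calibrations.
    """
    width = (hi - lo) / steps
    gaps = [
        abs(icc(lo + i * width, *ref) - icc(lo + i * width, *focal))
        for i in range(steps + 1)
    ]
    return width * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))


if __name__ == "__main__":
    # Hypothetical item that is half a logit-unit harder in the focal group.
    print(round(icc_area_between_groups((1.0, 0.0), (1.0, 0.5)), 3))
```

A larger area suggests the item behaves differently in the two groups; any flagging threshold would be a policy choice and is not given here.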


Challenges Ahead:

•There are questions of model choice (fit, practicality), and calibration of items with small samples.

• Identifying and handling dependencies in the data (common with new item types).


Challenges Ahead, cont.:

• Establishing invariance of item parameters over subgroups of the population of interest. (e.g., Black, Hispanic, White; Male, Female; state to state, country to country)

• More training is needed for persons to do the IRT applications, read the test manuals, etc.


Ability Estimation [0-1 vs. Testlet Scoring]—See paper by Zenisky, et al., JEM, 2002.

[Figure: scatterplot; x-axis: Polytomously-Scored Ability Estimates (-3 to 3); y-axis: Dichotomously-Scored Ability Estimates (-3 to 3)]


3. Generation of New Item Types

•Lots of “sizzle” here with simulations (e.g., virtual reality, performance tasks) and other item types. But--

--Can new skills be measured?

--Can old skills be measured better?

--What’s the value-added versus the costs of development? Measurement/minute of testing?


Site Planning Vignettes (Bejar, 1991)

Image from NCARB (2000)




Dynamic Problem Solving Simulation (Clauser, et al., 1997)

Image from NBME (2001)


Examples of Advances

•Pioneering research of Bennett and his colleagues with the architectural exams.

•Work of Clauser and Nungester with sequential problem solving tests in medicine.


Immediate, Less Costly, and Useful New Item Formats

•Multiple-Correct Answers

•Short Answer

•Extended Answer (Essay)

•Highlighting Text

•Inserting Text


•Ranking (or Ordering)

•Numerical Responses (Including Multiple)

•“Drag and Drop”

•Sequential Problems


•More than 50 new item formats.

•Complex item stems, sorting tasks, interactive graphics, audio, visual, job aids, sequential problems, joy sticks, touch screens, pattern scoring, and more.


Challenges Ahead:

•An increased commitment to validation of these new item types is needed:

--Face validity is important but not sufficient. Much more empirical validity evidence is needed to support the use of new item types.

--Need to judge increase in test score validity against extra time and costs.


4. Computer-Based Testing

•Advantages are well-known:

--Flexibility in scheduling tests

--Potential for immediate score reporting

--Assessment of higher level thinking with new item types (in principle)

--New test designs (to reduce time)

•Many testing agencies are now on computer.


Computer-Based Test (CBT) Designs

LINEAR, CAT, MULTISTAGE
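Of the three designs, CAT is the one that depends most directly on IRT. A minimal Python sketch of its two core steps follows, under stated assumptions: a 2PL model with D = 1.7, maximum-information item selection, and an expected a posteriori (EAP) ability estimate on a grid with a standard normal prior. The function names and the three-item bank are hypothetical, and operational CATs add content balancing, exposure control, and stopping rules that are not shown.

```python
import math

D = 1.7  # scaling constant (assumed value)


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def information(theta, a, b):
    """2PL item information at theta."""
    p = p_correct(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)


def next_item(theta, bank, used):
    """Index of the unused item with the most information at theta.

    Raises ValueError when every item in the bank has been used.
    """
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: information(theta, *bank[i]))


def eap_estimate(responses, bank, grid_points=61):
    """EAP ability estimate on a grid over [-3, 3] with an N(0, 1) prior.

    responses maps item index -> 1 (correct) or 0 (incorrect).
    """
    grid = [-3.0 + 6.0 * g / (grid_points - 1) for g in range(grid_points)]
    posterior = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for i, u in responses.items():
            p = p_correct(theta, *bank[i])
            weight *= p if u == 1 else 1.0 - p
        posterior.append(weight)
    return sum(t * w for t, w in zip(grid, posterior)) / sum(posterior)


if __name__ == "__main__":
    bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical (a, b) pairs
    first = next_item(0.0, bank, set())
    theta = eap_estimate({first: 1}, bank)  # suppose the first answer is correct
    print(first, round(theta, 3), next_item(theta, bank, {first}))
```

After a correct answer the estimate moves up and a harder item is chosen next, which is how an adaptive test can shorten testing time.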


Fixed Length Multiple Forms (Linear)

•A Single Form (acceptable if volume is low)

•Multiple Parallel Forms

•“Linear on the Fly Tests” (LOFT)


[Figure: Item Bank laid out along a Proficiency Scale (Low to High); items labeled E, M, and H, with + and - marks]


Three-Stage Test Design

Stage 1 (Routing Test)

Stage 2: Easy (E), Medium (M), Hard (H)

Stage 3: E-E, E-M, M-E, M-M, H-M, H-H
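The routing logic of a three-stage design like this one can be sketched in Python. Everything numeric here is an assumption: the slide gives no cut scores, so `low_cut`, `high_cut`, and `cut` are hypothetical number-correct thresholds, and the sketch reads a label such as "E-M" as "Easy at stage 2, then Medium at stage 3".

```python
# Stage-3 modules reachable from each stage-2 module, listed easier first.
STAGE3_OPTIONS = {
    "E": ("E-E", "E-M"),
    "M": ("M-E", "M-M"),
    "H": ("H-M", "H-H"),
}


def route_stage2(stage1_correct, low_cut=4, high_cut=8):
    """Pick the stage-2 module from the number correct on the routing test."""
    if stage1_correct < low_cut:
        return "E"
    if stage1_correct < high_cut:
        return "M"
    return "H"


def route_stage3(stage2_module, stage2_correct, cut=5):
    """Pick the stage-3 module from the stage-2 module and its number correct."""
    easier, harder = STAGE3_OPTIONS[stage2_module]
    return harder if stage2_correct >= cut else easier


if __name__ == "__main__":
    module = route_stage2(stage1_correct=6)
    print(module, route_stage3(module, stage2_correct=7))
```

Operational multistage tests often route on provisional IRT ability estimates rather than raw number-correct scores; number correct is used here only to keep the sketch short.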


Automated Test Construction

•Mimicking test development committees

•Content and statistical considerations, exposure controls

•Operations research methodology, linear programming, IRT

•van der Linden, Luecht, Stocking, and others have advanced the topic
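To make the idea concrete, the Python sketch below assembles a form by combining a statistical target with content constraints. It is a greedy stand-in, not the linear programming approach the slide refers to: items are ranked by 2PL information at a target ability and accepted while their content area still has an open slot. The function names, the `content_quota` structure, the D = 1.7 constant, and the small bank are all hypothetical, and exposure controls are left out.

```python
import math

D = 1.7  # scaling constant (assumed value)


def information(theta, a, b):
    """2PL item information at theta."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)


def assemble_form(bank, target_theta, content_quota):
    """Greedy form assembly.

    bank: list of dicts with keys "id", "a", "b", "content".
    content_quota: dict mapping content area -> number of items required.
    Returns the selected item ids, most informative first.
    """
    remaining = dict(content_quota)
    ranked = sorted(
        bank,
        key=lambda item: information(target_theta, item["a"], item["b"]),
        reverse=True,
    )
    form = []
    for item in ranked:
        if remaining.get(item["content"], 0) > 0:
            form.append(item["id"])
            remaining[item["content"]] -= 1
    return form


if __name__ == "__main__":
    bank = [
        {"id": "A1", "a": 1.0, "b": 0.0, "content": "algebra"},
        {"id": "A2", "a": 1.0, "b": 2.0, "content": "algebra"},
        {"id": "G1", "a": 0.8, "b": 0.1, "content": "geometry"},
        {"id": "G2", "a": 1.2, "b": -0.1, "content": "geometry"},
    ]
    print(assemble_form(bank, 0.0, {"algebra": 1, "geometry": 1}))
```

A greedy pass can miss the best overall combination when constraints interact, which is one reason optimization methods are preferred in practice.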


One Big Challenge: Item Exposure

•Items are exposed to candidates every day that testing is done.

•How serious is item exposure? When present, test score validity is lowered. (e.g., GRE example)


and Moving Averages (Ning Hambleton, 2006)

[Figure: moving-average chart with reference lines at M, M+2*SD, and M-2*SD]
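A moving-average check with an M ± 2*SD band can be sketched as follows. This is an illustration under assumptions, not the cited procedure: M and SD are taken from an initial baseline period of an item's daily proportion-correct values, and a day is flagged when the moving average over the most recent window falls outside the band. The function name, baseline length, and window size are hypothetical.

```python
from statistics import mean, stdev


def flag_exposure(daily_p_values, baseline_days=10, window=5):
    """Return the indices of days whose trailing moving average leaves M ± 2*SD.

    M and SD come from the first baseline_days values (at least two are needed);
    only windows that lie entirely after the baseline are checked.
    """
    baseline = daily_p_values[:baseline_days]
    m, sd = mean(baseline), stdev(baseline)
    flagged = []
    for end in range(baseline_days + window, len(daily_p_values) + 1):
        moving_avg = mean(daily_p_values[end - window:end])
        if abs(moving_avg - m) > 2 * sd:
            flagged.append(end - 1)
    return flagged


if __name__ == "__main__":
    # Hypothetical item: stable near 0.51, then suddenly much easier.
    series = [0.50, 0.52] * 5 + [0.51] * 5 + [0.70] * 5
    print(flag_exposure(series))
```

An item that becomes steadily easier over time is one possible sign that its content has leaked, though drift can have other causes such as changes in the candidate population.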


Example of an Exposed Item


One Big Challenge: Item Exposure

•How can item exposure be detected?

•How much more vulnerable are the performance based tasks?

•How can the tasks be disguised and/or cloned? Impact of even minor revisions on item statistics?

•Can item types be found that may be less susceptible to exposure?


Other Challenges, cont.:

•How to make CBT cost effective for schools?

•Researching other ways to address item exposure: Increasing the size of item banks via cloning, algorithmic item writing, rotating banks, writing items to statistical specs., etc.

•Matching test designs to intended uses of the scores.


5. Improvements in Score Reporting

•Least studied topic in assessment today (do you know of any research?), and one of the most important:

•Lots of evidence that score users are easily confused. (Concept of measurement error is not understood; error bands are confusing.)
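The error band that confuses score users comes from a short classical test theory calculation, sketched below in Python. The formula SEM = SD * sqrt(1 - reliability) is standard; the function names are hypothetical, and the scale values in the example (mean 100, SD 15, reliability 0.91) are illustrative assumptions, not figures from the presentation.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: score standard deviation times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def error_band(score, sd, reliability, z=1.0):
    """Band of z standard errors of measurement around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem


if __name__ == "__main__":
    low, high = error_band(100, sd=15, reliability=0.91)
    print(f"Observed score 100, band {low:.1f} to {high:.1f}")
```

With these assumed values the SEM is about 4.5 points, so a reported 100 is better read as "roughly 95 to 105" than as an exact value.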


Score Reporting

• Critically important topic, and almost no educational research studies available.

• Substantial empirical evidence suggesting that policy-makers, educators, and the public are confused by test score scales and reports. (What are typical IQ scores?)

• Thanks, April Zenisky, for the next slide:


Put the results for both years for a single state together, then list next state

Lots of questions about the axis here


One Promising Advance:

•Placing meaningful points on test score scales--e.g., performance standards, defining skills at selected scores, providing averages, “market basket” concept (e.g., explaining what respondents can do in relation to a collection of test items).


Item Characteristic Curves for an Item Bank

[Figure: item characteristic curves; x-axis: Proficiency Scale (-3 to 3); y-axis: Expected Score (on the 0-1 metric), 0.00 to 1.00; reference line at P=0.65; scale points labeled W, N, P]

Reporting Category    Items    Points
Topic 1               13       16
Topic 2               18       21
Topic 3               9        12
Topic 4               8        11
Topic 5               12       15
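The P=0.65 line in the figure suggests item mapping with a response probability of 0.65: each item is placed at the proficiency where a candidate would be expected to earn 65% of its credit. For a dichotomous item this location has a closed form, sketched below in Python under the assumption of a 2PL model with D = 1.7. The function names and item parameters are hypothetical, and the figure's items may be polytomous, in which case the location would have to be found numerically.

```python
import math

D = 1.7  # scaling constant (assumed value)


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def mapping_location(a, b, rp=0.65):
    """Proficiency at which a 2PL item is answered correctly with probability rp.

    Solving rp = 1 / (1 + exp(-D*a*(theta - b))) for theta gives
    theta = b + ln(rp / (1 - rp)) / (D * a).
    """
    return b + math.log(rp / (1.0 - rp)) / (D * a)


if __name__ == "__main__":
    # Hypothetical item with a = 1.2 and b = -0.4.
    theta = mapping_location(1.2, -0.4)
    print(round(theta, 3), round(p_correct(theta, 1.2, -0.4), 2))
```

Items mapped this way can be listed next to points on the score scale to describe what candidates at each level can typically do.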


Candidate Diagnostic Score Report 1

Candidate Performance: 60

Performance Level: PASSING (Score Range: 75 to 100)

Content / Skill Areas: Candidates in this performance level can [text to be inserted here, text to be inserted here, text to be inserted here, and text to be inserted here]. Many candidates in this performance level do not [insert relevant text here].

Performance Level: NEAR PASSING (Score Range: 65 to 74)

Content / Skill Areas: Candidates in this performance level can [text to be inserted here, text to be inserted here, text to be inserted here, and text to be inserted here]. Many candidates in this performance level do not [insert relevant text here].

Performance Level: WEAKNESSES (Score Range: 55 to 64)

Content / Skill Areas: Candidates in this performance level can [text to be inserted here, text to be inserted here, text to be inserted here, and text to be inserted here]. Many candidates in this performance level do not [insert relevant text here].

Performance Level: MAJOR WEAKNESSES (Score Range: 1 to 54)

Content / Skill Areas: Candidates in this performance level can [text to be inserted here, text to be inserted here, text to be inserted here, and text to be inserted here]. Many candidates in this performance level do not [insert relevant text here].


Diagnostic Score Report No. 2


Challenges:

• Can we develop empirically-based principles to assist in the design of meaningful and useful score scales and reports?

• How can diagnostic reports be enhanced? (e.g., rule space methodology, MIRT, collateral and prior information)


Challenges, cont.:

•Evaluation of new methods for studying score reports: Focus groups, “think aloud” studies, experimental studies, field-tests.

•Need to commit more resources and time to this immensely important topic!


6. Improvement in Training for Specialists and Others

• Major shortage of persons with good psychometric training.

• We need to do a better job in training educators and psychologists to construct and to use tests incorporating recent advances.

--Many Schools of Education and Psychology offer only minimal training.


Challenges:

•What knowledge and skills do modern psychometricians need?

•What do counselors, teachers, and others need to learn about testing and testing practices to increase the validity of test score uses?


Conclusions

• Easy to make the case that the emerging technology (IRT models, computers, item types, etc.) should be used to improve credentialing, selection, achievement, and personality tests: face validity is high.

• At the same time, research on the various advances must be carried out, and AERA-APA-NCME Test Standards followed, to confirm the strengths and weaknesses of these advances.


Conclusions, cont.:

• Innovations and technological advances without supporting research findings and validity evidence are simply “sizzle” and “marketing” and won’t lead, necessarily, to more valid assessments.


Conclusions, cont.:

More important topics to study too:

•Admissions testing

•Cognition and testing

•Hierarchical modeling and analysis of test data


Conclusions, cont.:

•A strong argument has been made here for full employment of psychometricians!

•At the same time, all six topics, and many more, are critical if tests in the 21st century are going to meet the complex informational needs of our society.