Exploring Skills Diagnostic Opportunities at Measured Progress Lou DiBello William Stout Meetings with Measured Progress February 11-12, 2008


TRANSCRIPT

Page 1

Exploring Skills Diagnostic Opportunities at Measured Progress

Lou DiBello
William Stout

Meetings with Measured Progress

February 11-12, 2008

Page 2

Learning Science Research Institute--UIC--Informative Assessment Initiative 2

I. Overview, Goals, Purpose

- Establish a clear conceptual framework and language for understanding and discussing diagnostic assessment
- Identify practical steps for developing diagnostic assessments
- Consider challenges
- Explore possibilities for collaborative work between IAI or AIARE and MP

Page 3

Presentation Agenda

I. Overview, goals, purpose
II. Background
III. Assessment as evidentiary system
IV. Practical steps and challenges
V. Possibilities for collaborative work
VI. Wrap-up

Page 4

II. Background

Page 5

Who we are

- Our primary expertise is theoretical and applied psychometrics
- Our primary interest is broader: to develop the engineering science of diagnostic assessment
- In addition to science and theory, we are focused on practical issues: costs, production, sustainability, scalability, implementation, evaluation, dissemination

Page 6

Who we are: Bill

- Professor Emeritus, Dept. of Statistics, University of Illinois at Urbana-Champaign
- Co-lead of the Informative Assessment Initiative in the Learning Sciences Research Institute, University of Illinois at Chicago
- Co-founder of Applied Informative Assessment Research Enterprises (AIARE), an LLC
- Past director of the ETS External Diagnostic Research Team (the X Team)

Page 7

Who we are: Lou

- Co-lead of the Informative Assessment Initiative
- Research Professor and Associate Director of the Learning Sciences Research Institute, University of Illinois at Chicago
- Co-founder of Applied Informative Assessment Research Enterprises (AIARE), an LLC
- Former director of the ETS Profile Scoring Initiative; contract manager for the X Team

Page 8

Who we are: Bill and Lou

Bill:
- Distinguished psychometrician; past president of the Psychometric Society
- NCME scientific award winner for foundational work in skills diagnostic modeling, dimensionality, and item and test bias detection

Lou:
- Recently served as a research director within the testing industry
- Directed the effort to operationalize diagnostic assessment for a large-scale operational assessment

Page 9

Our Affiliations

- Informative Assessment Initiative (IAI): one of three initiatives that make up the Learning Sciences Research Institute (LSRI) at UIC
  - LSRI is directed by Jim Pellegrino & Susan Goldman
  - The other two initiatives are Cognitive Science and Math and Science Education
- Applied Informative Assessment Research Enterprises (AIARE): a new LLC that owns and licenses the Arpeggio software

Page 10

Joint Work: Bill, Lou (& Louis)

- Pursuing research and development at the forefront of a new skills diagnostic psychometric research area
- Invited co-editors of an upcoming special issue of the Journal of Educational Measurement on skills diagnosis
- Invited co-authors of a foundational paper on psychometric approaches to cognitive diagnostic assessment, just published in the Handbook of Statistics (DiBello, Roussos & Stout, 2007)
- Other publications in refereed academic journals
- Directed numerous research and development projects, both within academia and the private sector

Page 11

III. A View of Assessments as Evidentiary Systems

Page 12

A View of Assessment Design

Assessment as an evidentiary system. Assessment design is deciding: “… how one wants to frame inferences about students, what data one needs to see, how one arranges situations to get the pertinent data, and how one justifies reasoning from the data to inferences about the student.” (Junker)

Page 13

Integrated Classroom or Learning Environment

[Diagram: Curriculum, Instruction, and Assessment as interacting components of the integrated classroom or learning environment]

Page 14

[Diagram: the Assessment component expanded into the assessment triangle (Cognition, Observation, Interpretation), shown within the Integrated Classroom or Learning Environment together with Curriculum and Instruction]

Page 15

Assessment Triangle (Pellegrino et al.)

[Diagram: the assessment triangle, with vertices Cognition, Observation, and Interpretation]

Page 16

Comprehensive View of Assessment

Assessment conceptually involves:
- Cognition
- Curriculum design
- Instruction
- Teaching practice
- Teacher preparation
- Psychometrics
- Assessment design
- Testing industry marketing and implementation

Page 17

Validity—Thinking about Assessment Quality and Value

- Level 1: the test design was soundly based on cognitive principles, both “inner” and “outer”
- Level 2: the test meets quantitatively defined requirements for internal diagnostic quality
- Level 3: independent confirmation, outside the test, demonstrates that test-based diagnostic skills inferences are accurate; includes protocol studies and criterion validity
- Level 4: consequential validity: proper use of the assessment and differential instruction leads to improved teaching and learning

Page 18

Validity Studies

- Level 1: design; expert analyses
- Level 2: internal diagnostic quality; gather data and compute reliability and fit
- Level 3: independent confirmation; includes protocol studies and criterion validity
- Level 4: consequential validity; studies of learning outcomes, teacher practices, teacher preparation

Page 19

Practical Assessment Validity

- Assessment validity provides a conceptual framework for thinking about diagnostics
- Validity studies are expensive, and it is not practical to address many aspects of validity at once. A reasonable strategy is to identify specific validity targets to address as part of diagnostic development and stage them over time

Page 20

IV. Practical Steps and Challenges in Developing Successful Skills Diagnostic Assessments

Page 21

Implementation Paradigm

- Describe the assessment purpose
- Describe a model for the skills space
- Develop and analyze the assessment items
- Specify an appropriate psychometric model linking observable performance to latent skills
- Select statistical methods for model estimation and for evaluating the results
- Develop methods for reporting assessment results to examinees, teachers, and others

Page 22

Walking through the Steps

The next few slides walk through the steps of the Implementation Paradigm:
- Purpose
- Skills space
- Tasks/items
- Formative reports
- Psychometric model: the Fusion Model
- Model calibration: Arpeggio

Page 23

Diagnostic Assessment Purposes

- Provide timely information about students’ learning and understanding
- Support teachers, learners, parents
- Support teacher actions, decisions, planning:
  - track students’ progress toward standards
  - diagnose deficiencies
  - group by skill profiles for instruction and practice
- Curriculum evaluation and planning

Page 24

Skills Framework

- A cognitive diagnostic model (e.g., the Fusion Model) requires item-skills links as input
- The skills framework is the set of skills selected for measurement and reporting
- For K-12 classrooms, the skills must be:
  - aligned with standards and curriculum
  - aligned with teacher actions
  - supportable statistically

Page 25

Q matrix: encodes the skills required for each item (items = rows, skills = columns; here a 7x5 matrix)

    Q =  1 0 0 1 0
         1 0 1 1 0
         0 1 1 1 0
         0 1 1 1 0
         0 1 1 1 0
         0 0 0 0 1
         0 0 0 0 1

For example: item 2 requires skills 1, 3, and 4
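A Q matrix like the one above translates directly into code. The sketch below (plain Python; the names are our own illustration and not part of Arpeggio) stores the 7x5 matrix and looks up which skills an item requires:

```python
# Q matrix: rows = items, columns = skills.
# Q[i][k] == 1 means item i+1 requires skill k+1.
Q = [
    [1, 0, 0, 1, 0],  # item 1 requires skills 1 and 4
    [1, 0, 1, 1, 0],  # item 2 requires skills 1, 3, and 4
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

def skills_for_item(q, item):
    """Return the 1-based skill numbers required by a 1-based item number."""
    return [k + 1 for k, needed in enumerate(q[item - 1]) if needed]

print(skills_for_item(Q, 2))  # [1, 3, 4]
```

This matches the worked example on the slide: item 2's row is 1 0 1 1 0, so skills 1, 3, and 4.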

Page 26

Skills Example: PTS3 Reading

A good starting point for PTS3 Reading skills is:
- Skill 1: Literary
- Skill 2: Informational
- Skill 3: Comprehension & Analysis
- Skill 4: Reading Process & Language Skills

Page 27

PTS3 Math Initial Skills

A good starting point for PTS3 Math skills is:
- Skill 1: Numbers and Operations
- Skill 2: Algebra
- Skill 3: Geometry & Measurement
- Skill 4: Data Analysis & Probability

Page 28

Skills: Practical Constraints

- Alternative skills representations may be supported within the substantive literature
- Theory may suggest that 100 skills influence performance within a particular mathematics test domain, but a 50-minute assessment cannot accurately measure 100 skills, and teachers could not manage 100-skill diagnostic profiles for each student
- Skills must be simultaneously comprehensive, of “coarse” granularity, and aligned with standards, curriculum, and instruction

Page 29

Skills Pragmatics: Focus

- Developing skills frameworks is usually a creative act. A small number of foundational or core skills must be determined that are:
  - important and useful to measure
  - statistically supportable by the assessment
  so that other skills can be ignored with impunity
- Think of this as focusing the assessment design in light of the diagnostic purpose: assumptions about what to measure and what to “ignore”

Page 30

Diagnostic “Score Reports”

- A key component of diagnostic assessment is the “score report,” construed broadly as any and all information presented to users as a result of assessment performance
- A diagnostic assessment reports a profile of scores, such as mastery/nonmastery on each skill
- In addition, the score report can and should include information that promotes better teaching and learning:
  - possible action steps for teacher or learner
  - suggestions to the student for improvement
  - interpretive information

Page 31

Score Reporting Statistics

An Arpeggio analysis produces (as noted in Bill’s Monday presentation):
- item/skill-level parameters
- for each student, a posterior probability of mastery for each skill
- for each student, a classification of master/non-master for each skill, based on the above posterior probability
- the examinee probability distribution on the skill space
- estimates of skill classification accuracy
- fit statistics

The skills profiles are based on the second and third items above
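As a toy illustration of the reporting step only (not the Fusion Model or actual Arpeggio output; the student names and probabilities below are hypothetical), posterior mastery probabilities can be thresholded into the master/non-master profiles a report would display:

```python
# Hypothetical posterior P(mastery) for each of 3 skills, per student.
posterior = {
    "student_a": [0.91, 0.42, 0.77],
    "student_b": [0.15, 0.88, 0.50],
}

def classify(probs, threshold=0.5):
    """Map posterior mastery probabilities to master (True) / non-master (False)."""
    return [p >= threshold for p in probs]

profiles = {name: classify(p) for name, p in posterior.items()}
# e.g. student_a's profile is [True, False, True]: master of skills 1 and 3
```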

Page 32

Skills Classification Accuracy

- The Fusion Model and Arpeggio provide several estimated indices of skills classification accuracy or reliability:
  - CCR: individual skill correct classification rate
  - TCR: test-retest consistency rate (like classical reliability)
  - skill pattern correctness or consistency rates
- As is the case for standard unidimensional IRT reliability, these measures are internal to the model and the data; no external criteria
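One simple model-internal accuracy estimate can be sketched as follows. This is an illustrative assumption, not Arpeggio's actual CCR computation: if a student's posterior mastery probability is p, the thresholded call agrees with the model's own belief with probability max(p, 1 - p), so averaging over students gives an internal estimate of the correct classification rate:

```python
# Illustrative only: a model-internal CCR estimate for one skill,
# not Arpeggio's exact index.
def estimated_ccr(posteriors):
    """Average model-internal probability that the 0.5-threshold call is correct."""
    return sum(max(p, 1.0 - p) for p in posteriors) / len(posteriors)

skill_posteriors = [0.95, 0.88, 0.10, 0.65, 0.03]  # hypothetical students
ccr = estimated_ccr(skill_posteriors)  # (0.95 + 0.88 + 0.90 + 0.65 + 0.97) / 5 = 0.87
```

Note that, as the slide says, this kind of estimate uses no external criterion: it trusts the calibrated model's own posteriors.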

Page 33

Evaluating the Assessment

Once the model is calibrated, we estimate the skills classification accuracy and calculate certain measures of fit that are directly relevant to the diagnostic purpose of the assessment. Both reflect on:
- which skills are selected and their definitions
- skill codings in the Q matrix
- model suitability
- the statistical analysis procedures employed

Page 34

Model-Data Fit

We evaluate model-data fit by computing fit indices directly relevant to the diagnostic purpose. Considering MCMC convergence, item parameter values, and fit, we examine:
- Are the items appropriate and of “good quality”?
- Are the skills framework and Q matrix appropriate?
- Is the test “well designed”: enough good items for each skill; no fatal information-blocking in the Q matrix; good alignment between difficult items and difficult skills; other aspects of good design?
- Are any aspects of the model suspect?

Page 35

V. Possibilities for Collaborative Work

Page 36

Status of Diagnostic Research

- DiBello and Stout have collaborated with other researchers, including Louis Roussos
- Their studies provide a scientific and applied foundation for cognitive diagnostic research
- The IRT-based skills-diagnostic Fusion Model has been developed, along with software called Arpeggio for calibrating the Fusion Model, which employs the Markov Chain Monte Carlo (MCMC) statistical methodology
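To give a flavor of the MCMC methodology mentioned here (a toy example, unrelated to Arpeggio's actual sampler), the sketch below uses a random-walk Metropolis sampler to draw from the posterior of a single mastery probability theta, given k correct responses in n trials under a uniform prior:

```python
import math
import random

def log_posterior(theta, k, n):
    """Log posterior density of theta (up to a constant):
    uniform prior times binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def metropolis(k, n, steps=20000, seed=1):
    """Random-walk Metropolis sampler for theta."""
    rng = random.Random(seed)
    theta, samples = 0.5, []
    for _ in range(steps):
        proposal = theta + rng.gauss(0.0, 0.1)
        log_ratio = log_posterior(proposal, k, n) - log_posterior(theta, k, n)
        # Accept with probability min(1, posterior ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(k=7, n=10)
posterior_mean = sum(samples[5000:]) / len(samples[5000:])  # drop burn-in
# posterior_mean should be near the analytic value (k + 1) / (n + 2) = 8/12
```

Real diagnostic calibration samples a far larger joint space (item parameters plus every examinee's skill pattern), but the accept/reject mechanics are the same in spirit.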

Page 37

“X Team” 3-Year R&D Output

Arpeggio R&D was directed within ETS by DiBello and externally by Stout.

46 research studies:
- 18 studies on modeling issues
- 4 studies on skills-level linking methods
- 4 studies on skills-level reliability
- 2 studies on techniques for data-model fit
- 10 applied studies
- 8 theoretical studies backing the algorithms

Plus:
- 5 descriptions of algorithms and software code
- 12 sets of user documentation

Page 38

Assets and Resources

- An estimated $12M of investment underlies the development of the Arpeggio software system and the underlying theory, research studies, and analyses
- Resources:
  - Informative Assessment Initiative within LSRI-UIC (IAI)
  - Applied Informative Assessment Research Enterprises (AIARE), an LLC
  - ownership of the Arpeggio software and broad rights to license the patent
- Louis Roussos of MP is a major researcher, inventor, collaborator, and developer of Arpeggio

Page 39

Current Status of Arpeggio

- AIARE owns the copyright and trademark to all Arpeggio software and has unconstrained access to the patent rights, including the right to license them to others
- Practical reality: freedom to fashion any agreement that is mutually beneficial to MP and AIARE
- ETS is guaranteed a share of royalties

Page 40

IAI Current Activities (as background)

- NSF project to do formative assessment using established math curricula ($3M, funded)
- IES proposal for classroom assessment ($2M, applied for)
- More grants likely to be applied for concerning skills-level formative and embedded assessments (testing as an integral part of the curricular learning process)
- Upgrade and expand the capabilities of Arpeggio and the Fusion Model (technical grant proposals planned)
- Develop, upgrade, and disseminate the engineering science of diagnostic assessment in educational settings
- Work with testing companies, such as ETS and CTB

Page 41

IAI Project Ideas (some may be of interest to MP)

- Developing specific diagnostic assessments & pilot trials
- The practice of developing lists of skills for diagnostic measurement & reporting
- Assessment-curriculum-instruction linkages
- Diagnostic validity studies
- Foundational and applied psychometric diagnostic research

Page 42

Diagnostic Assessment Design

- Develop diagnostic scoring capability for PTS3 and other existing tests
- Design new diagnostic tests
- Needs and capacity analyses:
  - What market needs exist?
  - How might diagnostic assessment help teachers and learners, directly in the classroom and indirectly through summative or accountability tests?
  - What capacity do teachers and curricula have to incorporate and use diagnostic assessment?

Page 43

Planned Foundational and Applied Diagnostic Psychometric Research

- Diagnostic modeling
- Skills-level assessment accuracy
- Model-data fit
- Computational speed and performance
- Efficacy studies
- Group-level diagnostic survey testing a la NAEP
- Embedded assessments
- Growth modeling

Page 44

Concrete Possibilities with MP

Our proposal is that Measured Progress and we explore possible cooperation that can help MP bring to fruition its strong interest in skills diagnosis. This seems like a superb opportunity to pursue:
- Turn PTS3, in stages, into a skills diagnostic test
- Grants/contracts
- Collaboration on research projects of joint interest
- Explore diagnostic applications to state tests
- AERA/NCME proposals

Page 45

VI. Wrap-up

- We are mapping the dimensions of what we envision as a new engineering science of diagnostic assessment
- Focused on supporting teachers and learners, school districts, and state departments of education
- With due attention to sustainability and scalability, to support commercial and operational success
- As a natural mode of dissemination, we are appealing especially to testing companies interested in assessment products and services that improve teaching and learning

Page 46

Discussion

Next Steps