Exploring Skills Diagnostic Opportunities at Measured Progress
Lou DiBello, William Stout
Meetings with Measured Progress
February 11-12, 2008
Learning Science Research Institute--UIC--Informative Assessment Initiative 2
I. Overview, Goals, Purpose
Establish a clear conceptual framework and language for understanding and discussing diagnostic assessment
Identify practical steps for developing diagnostic assessments
Consider challenges
Explore possibilities for collaborative work between IAI or AIARE and MP
Presentation Agenda
I. Overview, goals, purpose
II. Background
III. Assessment as evidentiary system
IV. Practical Steps and Challenges
V. Possibilities for Collaborative Work
VI. Wrap-up
II. Background
Who we are
Our primary expertise is theoretical and applied psychometrics
Our primary interest is broader: to develop the engineering science of diagnostic assessment
In addition to science and theory, we are focused on practical issues: costs, production, sustainability, scalability, implementation, evaluation, dissemination
Who we are-Bill
Bill is Professor Emeritus, Dept. of Statistics, University of Illinois at Urbana-Champaign
Co-lead of Informative Assessment Initiative in the Learning Sciences Research Institute, University of Illinois at Chicago
Co-founder of Applied Informative Assessment Research Enterprises (AIARE), LLC
Past director of ETS External Diagnostic Research Team (the X Team)
Who we are-Lou
Lou is Co-lead of Informative Assessment Initiative
Research Professor and Associate Director of Learning Sciences Research Institute, University of Illinois at Chicago
Co-founder of Applied Informative Assessment Research Enterprises (AIARE), LLC
Former Director, ETS Profile Scoring Initiative; Contract Manager for the X Team
Who we are-Bill and Lou
Bill:
  Distinguished psychometrician; past president of the Psychometric Society
  NCME scientific award winner for foundational work in skills diagnostic modeling, dimensionality, and item and test bias detection
Lou:
  Recently served as a research director within the testing industry
  Directed effort to operationalize diagnostic assessment for a large scale operational assessment
Our Affiliations
Informative Assessment Initiative (IAI)
  One of three initiatives that make up the Learning Sciences Research Institute (LSRI) at UIC
  LSRI directed by Jim Pellegrino & Susan Goldman
  The other two initiatives are Cognitive Science and Math and Science Education
Applied Informative Assessment Research Enterprises (AIARE): a new LLC that owns and licenses Arpeggio software
Joint Work: Bill, Lou (& Louis)
Pursuing research and development at the forefront of a new skills diagnostic psychometric research area
Invited co-editors of an upcoming special issue of the Journal of Educational Measurement on skills diagnosis
Served as invited co-authors of a foundational paper on psychometric approaches to cognitive diagnostic assessment, just published in the Handbook of Statistics (DiBello, Roussos & Stout, 2007)
Other publications in refereed academic journals
Directed numerous research and development projects, both within academia and private sector
III. A View of Assessments as Evidentiary Systems
A view of Assessment Design
Assessment as Evidentiary System
Assessment design is deciding: “… how one wants to frame inferences about students, what data one needs to see, how one arranges situations to get the pertinent data, and how one justifies reasoning from the data to inferences about the student.” Junker
Integrated Classroom or Learning Environment
Instruction
Curriculum
Assessment
Assessment
Observation
Interpretation
Cognition
Instruction
Curriculum
Integrated Classroom or Learning Environment
Assessment Triangle-Pellegrino et al
Assessment
Observation
Interpretation
Cognition
Comprehensive View of Assessment
Assessment conceptually involves:
  Cognition
  Curriculum design
  Instruction
  Teaching practice
  Teacher preparation
  Psychometrics
  Assessment design
  Testing Industry
  Marketing and Implementation
Validity—Thinking about Assessment Quality and Value
Level 1: test design was soundly based on cognitive principles (“inner” and “outer”)
Level 2: test meets quantitatively defined requirements for internal diagnostic quality
Level 3: independent confirmation, outside the test, demonstrates that test-based diagnostic skills inferences are accurate—includes protocol studies and criterion validity
Level 4: consequential validity: proper use of assessment and differential instruction leads to improved teaching and learning
Validity Studies
Level 1 (design): expert analyses
Level 2 (internal diagnostic quality): gather data and compute reliability and fit
Level 3 (independent confirmation): includes protocol studies and criterion validity
Level 4 (consequential validity): studies of learning outcomes, teacher practices, teacher preparation
Practical Assessment Validity
Assessment validity provides a conceptual framework for thinking about diagnostics
Validity studies are expensive, and it is not practical to address very many of the aspects of validity at once. A reasonable strategy is to identify specific validity targets to address as part of diagnostic development and stage them over time
IV. Practical Steps and Challenges in Developing Successful Skills Diagnostic Assessments
Implementation Paradigm
Describe assessment purpose
Describe a model for the skills space
Develop and analyze the assessment items
Specify an appropriate psychometric model linking observable performance to latent skills
Select statistical methods for model estimation and evaluating the results
Develop methods for reporting assessment results to examinees, teachers, and others
Walking through the Steps
The next few slides walk through the steps of the Implementation Paradigm:
  Purpose
  Skills space
  Tasks/items
  Formative Reports
  Psychometric Model: Fusion Model
  Model calibration: Arpeggio
Diagnostic Assessment Purposes
Provide timely information about students’ learning and understanding
Support teachers, learners, parents
Support teacher actions, decisions, planning
  track students’ progress toward standards
  diagnose deficiencies
  group by skill profiles for instruction and practice
Curriculum evaluation and planning
Skills Framework
A cognitive diagnostic model (e.g. the Fusion Model) requires item-skills links as input
The skills framework = the set of skills selected for measurement and reporting
For K-12 classrooms, the skills must be:
  aligned with standards and curriculum
  aligned with teacher actions
  supportable statistically
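To illustrate how item-skills links enter such a model, here is a minimal sketch of the reduced form of the Fusion Model item response function as it is commonly written in the cognitive diagnosis literature: the probability of a correct response is a baseline value for examinees who have mastered every required skill, multiplied by a penalty for each required skill that is not mastered. The function name and all parameter values are hypothetical, and the full model's additional term for skills outside the item-skills links is omitted here.

```python
def fusion_item_prob(pi_star, r_star, q_row, alpha):
    """Reduced Fusion Model probability of a correct response to one item.

    pi_star: probability of success when all required skills are mastered
    r_star:  per-skill penalties in (0, 1], applied when a required skill
             is NOT mastered
    q_row:   0/1 item-skills links for this item
    alpha:   0/1 mastery profile of the examinee
    """
    prob = pi_star
    for r, q, a in zip(r_star, q_row, alpha):
        if q == 1 and a == 0:  # required but not mastered
            prob *= r
    return prob


# Hypothetical item linked to skills 1, 3, and 4 out of five skills.
q_row = [1, 0, 1, 1, 0]
pi_star = 0.9                        # illustrative value only
r_star = [0.5, 1.0, 0.4, 0.6, 1.0]   # illustrative values only

for alpha in ([1, 1, 1, 1, 1], [1, 0, 1, 0, 0], [0, 0, 0, 0, 0]):
    print(alpha, round(fusion_item_prob(pi_star, r_star, q_row, alpha), 3))
```

Skills that the item does not require (zeros in `q_row`) never change the probability, which is why the item-skills links must be supplied as input.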
Q matrix—encodes the skills required for each item
1 0 0 1 0
1 0 1 1 0
0 1 1 1 0
0 1 1 1 0
0 1 1 1 0
0 0 0 0 1
0 0 0 0 1
Q
Items=rows
Skills=columns
7x5 matrix
For example: Item 2 requires skills 1, 3, and 4
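As a small illustration, the Q matrix above can be stored as a list of rows and queried in both directions. The helper names are hypothetical; the matrix values are the ones shown on the slide.

```python
# Q matrix from the slide: 7 items (rows) by 5 skills (columns).
Q = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]


def skills_for_item(q, item):
    """Return the 1-based skill numbers required by a 1-based item number."""
    return [k + 1 for k, flag in enumerate(q[item - 1]) if flag == 1]


def items_for_skill(q, skill):
    """Return the 1-based item numbers that require a 1-based skill number."""
    return [i + 1 for i, row in enumerate(q) if row[skill - 1] == 1]


print(skills_for_item(Q, 2))  # [1, 3, 4]
print(items_for_skill(Q, 5))  # [6, 7]
```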
Skills Example—PTS3 Reading
A good starting point for PTS3 Reading skills is:
  Skill 1: Literary
  Skill 2: Informational
  Skill 3: Comprehension & Analysis
  Skill 4: Reading Process & Language Skills
PTS3-Math Initial Skills
A good starting point for PTS3 Math skills is:
  Skill 1: Numbers and Operations
  Skill 2: Algebra
  Skill 3: Geometry & Measurement
  Skill 4: Data Analysis & Probability
Skills—Practical Constraints
Alternative skills representations may be supported within the substantive literature
Theory may suggest that 100 skills influence performance within a particular mathematics test domain. A 50-minute assessment cannot accurately measure 100 skills, and teachers could not manage a 100-skill diagnostic profile for each student
Skills must be simultaneously comprehensive, of “coarse” granularity, aligned with standards, curriculum and instruction
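A rough back-of-the-envelope view of this constraint, assuming each skill is reported as mastery/nonmastery so that K skills yield 2^K possible profiles. The even split of testing time across skills is an illustrative simplification, not a claim about how any real test allocates time, and the function names are hypothetical.

```python
def profile_count(num_skills):
    """Number of distinct mastery/nonmastery profiles for num_skills skills."""
    return 2 ** num_skills


def minutes_per_skill(test_minutes, num_skills):
    """Testing time per skill if time were split evenly (a simplification)."""
    return test_minutes / num_skills


# Compare a four-skill framework with a 100-skill one on a 50-minute test.
for k in (4, 100):
    print(k, "skills:", profile_count(k), "profiles,",
          minutes_per_skill(50, k), "minutes per skill")
```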
Skills Pragmatics: Focus
Developing skills frameworks is usually a creative act. A small number of foundational or core skills must be determined that are:
  Important and useful to measure
  Statistically supportable by the assessment
So that other skills can be ignored with impunity
Think of this as focusing the assessment design in light of the diagnostic purpose—assumptions about what to measure and what to “ignore”
Diagnostic “Score Reports”
A key component of diagnostic assessment is the “score report,” construed broadly as any and all information presented to users as a result of assessment performance
A diagnostic assessment reports a profile of scores such as mastery/nonmastery on each skill
In addition, the score report can and should include information that promotes better teaching and learning:
  possible action steps for teacher or learner
  suggestions to student for improvement
  interpretive information
Score Reporting Statistics
An Arpeggio analysis produces (as noted in Bill’s Monday presentation):
  Item/skill level parameters
  For each student, a posterior probability of mastery for each skill
  For each student, a classification of master/non-master for each skill based on the above posterior probability
  Examinee probability distribution on the skill space
  Estimates of skill classification accuracy
  Fit statistics
The skills profiles are based on the 2nd and 3rd items above
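A minimal sketch of how posterior probabilities of mastery could be turned into master/non-master profiles, and how students could then be grouped by profile for instruction. The 0.5 cutoff, the student labels, and the probabilities are all assumptions for illustration; the slides do not state which cutoff Arpeggio uses.

```python
from collections import defaultdict


def classify(posteriors, threshold=0.5):
    """Turn per-skill posterior mastery probabilities into a 0/1 profile.

    The threshold is an assumed cutoff, not a documented Arpeggio setting.
    """
    return tuple(1 if p >= threshold else 0 for p in posteriors)


def group_by_profile(student_posteriors, threshold=0.5):
    """Group student labels by their classified skill profile."""
    groups = defaultdict(list)
    for student, posteriors in student_posteriors.items():
        groups[classify(posteriors, threshold)].append(student)
    return dict(groups)


# Hypothetical posterior probabilities for four skills.
students = {
    "student_a": [0.92, 0.15, 0.70, 0.40],
    "student_b": [0.88, 0.30, 0.65, 0.10],
    "student_c": [0.20, 0.95, 0.55, 0.81],
}

for profile, members in group_by_profile(students).items():
    print(profile, members)
```

Students who share a profile, such as the first two here, are candidates for the same instruction or practice group.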
Skills Classification Accuracy
The Fusion Model and Arpeggio provide several estimated indices of skills classification accuracy or reliability:
  CCR = individual skill correct classification rate
  TCR = test-retest consistency rate (like classical reliability)
  Skill pattern correctness or consistency rates
As is the case for standard unidimensional IRT reliability, these measures are internal to the model and the data (no external criteria)
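One simple way to see what "internal to the model" means for a correct classification rate: if the posterior probabilities are taken at face value, a student classified as a master with posterior p is correctly classified with probability p, and a student classified as a non-master with probability 1 - p. Averaging over students gives a model-based estimate. This is an illustrative sketch under an assumed 0.5 cutoff, not necessarily the procedure Arpeggio uses to compute CCR.

```python
def expected_ccr(posteriors, threshold=0.5):
    """Model-based expected correct classification rate for one skill.

    posteriors: one posterior mastery probability per student.
    Assumes the posteriors are well calibrated and that students are
    classified as masters when the posterior is at or above the threshold.
    """
    total = 0.0
    for p in posteriors:
        # Probability that this student's classification is correct.
        total += p if p >= threshold else 1.0 - p
    return total / len(posteriors)


# Hypothetical posteriors for one skill across three students.
print(round(expected_ccr([0.9, 0.2, 0.6]), 3))
```

Posteriors near 0 or 1 push the estimate toward 1, while posteriors near the cutoff pull it toward 0.5, so the index reflects how decisively the data classify students on that skill.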
Evaluating the assessment
Once the model is calibrated, we estimate the skills classification accuracy and calculate certain measures of fit that are directly relevant to the diagnostic purpose of the assessment. Both reflect on:
  Which skills are selected and their definitions
  Skill codings in Q matrix
  Model suitability
  Statistical analysis procedures employed
Model-Data Fit
We evaluate model-data fit by computing fit indices directly relevant to the diagnostic purpose. Considering MCMC convergence, item parameter values and fit, we examine:
  Are items appropriate and of “good quality”
  Are skills framework and Q matrix appropriate
  Is the test “well designed”: enough good items for each skill; no fatal information-blocking in the Q matrix; good alignment between difficult items and difficult skills; other aspects of good design
  Are any aspects of the model suspect
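Two of the simpler design questions can be checked directly on a Q matrix before any calibration. The sketch below counts items per skill and flags pairs of skills whose columns are identical, since such skills are always required together and the items alone cannot tell them apart. These are illustrative checks chosen for this example, using the 7x5 Q matrix shown earlier; they are not a definition of "information-blocking," which the slides do not spell out.

```python
# The 7x5 Q matrix from the earlier slide (items = rows, skills = columns).
Q = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]


def items_per_skill(q):
    """Count how many items require each skill (column sums)."""
    return [sum(row[k] for row in q) for k in range(len(q[0]))]


def identical_skill_columns(q):
    """Return 1-based skill pairs that are required by exactly the same items."""
    columns = [tuple(row[k] for row in q) for k in range(len(q[0]))]
    pairs = []
    for a in range(len(columns)):
        for b in range(a + 1, len(columns)):
            if columns[a] == columns[b]:
                pairs.append((a + 1, b + 1))
    return pairs


print(items_per_skill(Q))          # [2, 3, 4, 5, 2]
print(identical_skill_columns(Q))  # []
```

Here skills 1 and 5 are each measured by only two items, which is the kind of thin coverage the "enough good items for each skill" question is meant to catch.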
V. Possibilities for Collaborative Work
Status of Diagnostic Research
DiBello and Stout have collaborated with other researchers, including Louis Roussos
Their studies provide a scientific and applied foundation for cognitive diagnostic research
The IRT-based skills-diagnostic Fusion Model has been developed, along with Arpeggio, software that calibrates the Fusion Model using Markov Chain Monte Carlo (MCMC) statistical methodology
“X Team” 3-year R&D output
Arpeggio R&D was directed within ETS by DiBello and externally by Stout
46 Research Studies
  18 studies on modeling issues
  4 studies on skills-level linking methods
  4 studies on skills-level reliability
  2 studies on techniques for data-model fit
  10 applied studies
  8 theoretical studies backing the algorithms
5 Descriptions of Algorithms and software code
12 Sets of user documentation
Assets and Resources
Estimated $12M of investment underlies the development of Arpeggio software system and underlying theory, research studies, analyses
Resources: Informative Assessment Initiative within LSRI-UIC (IAI), Applied Informative Assessment Research Enterprises (AIARE), LLC
Ownership of Arpeggio software and broad rights to license patent
Louis Roussos of MP is a major researcher, inventor, collaborator, developer of Arpeggio
Current Status of Arpeggio
AIARE owns copyright and trademark to all Arpeggio software and has unconstrained access to patent rights, including right to license them to others
Practical reality: freedom to fashion any agreement that is mutually beneficial to MP and AIARE
ETS is guaranteed a share of royalties
IAI Current Activities (as background)
NSF project to do formative assessment using established math curricula ($3M, funded)
IES proposal for classroom assessment ($2M, applied for)
More grants likely to be applied for concerning skills level formative and embedded assessments (testing as integral part of curricular learning process)
Upgrade and expand capabilities of Arpeggio and the Fusion Model (technical grant proposals planned)
Develop, upgrade, and disseminate the engineering science of diagnostic assessment in educational settings
Work with testing companies, such as ETS and CTB
IAI Project Ideas (some may be of interest to MP)
Developing Specific Diagnostic Assessments & Pilot Trials
The Practice of Developing Lists of Skills for Diagnostic Measurement & Reporting
Assessment-Curriculum-Instruction Linkages
Diagnostic Validity Studies
Foundational and Applied Psychometric Diagnostic Research
Diagnostic Assessment Design
Develop diagnostic scoring capability for PTS3 and other existing tests
Design new diagnostic tests
Needs and capacity analyses
  What market needs exist
  How might diagnostic assessment help teachers and learners, directly in the classroom, indirectly through summative or accountability tests
  What capacity do teachers and curricula have to incorporate and use diagnostic assessment
Planned Foundational and Applied Diagnostic Psychometric Research
Diagnostic Modeling
Skills-level assessment accuracy
Model-data Fit
Computational speed and performance
Efficacy Studies
Group-level diagnostic survey testing a la NAEP
Embedded Assessments
Growth Modeling
Concrete Possibilities with MP
Proposal is that Measured Progress and we explore possible cooperation that can help MP bring to fruition its strong interest in skills diagnosis
Seems like a superb opportunity to pursue
  Turn PTS3 in stages into a skills diagnostic test
  Grants/contracts
  Collaboration on research projects of joint interest
  Explore diagnostic applications to state tests
  AERA/NCME proposals
VI. Wrap-up
We are mapping the dimensions of what we envision as a new engineering science of diagnostic assessment
Focused on supporting teachers and learners, school districts, state departments of education
With due attention to sustainability and scalability to support commercial and operational success
As a natural mode of dissemination, we are appealing especially to testing companies interested in assessment products and services that improve teaching and learning
Discussion
Discussion Next Steps