Struggling for meaning in standards-based assessment
Mark Wilson, UC Berkeley
Outline
• What do we mean by “standards-based” assessments?
• Some current solutions to the problem of assessing standards
• An alternative
  – Learning performances
  – Learning progressions
  – Progress variables
What do we mean by “standards-based” assessments?
• What people often think they are getting:
  – A useful result for each standard
    • the “ideal approach”
  – The illusion of “standards-based” assessments
• What they are usually getting:
  – A single result that is somehow related to all, or a subset of, the standards
  – The reality of “standards-based” assessments
How standards-based is “standards-based”?
• “Fidelity”: how well do the assessments match the standards?
• High Fidelity: each standard has its own useable result
• Moderate Fidelity: each standard is represented by at least one item in the assessments
• Low Fidelity: the items match some of the standards
Why can’t each standard be assessed?
[Figure: Fidelity versus cost when total cost is fixed. Axes: number of items, fidelity, $ per item. As the cost per item rises, the number of items we can afford falls, and fidelity falls with it.]
i.e., in the “ideal approach” we need so many items per standard that we can’t afford it.
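The cost squeeze can be made concrete with a back-of-the-envelope sketch. All numbers below (budget, cost per item, number of standards, single-item reliability) are hypothetical, chosen only to illustrate the argument; the Spearman-Brown prophecy formula projects how the reliability of a per-standard score grows with the number of items devoted to that standard.

```python
def spearman_brown(rho_one_item, k):
    """Projected reliability of a k-item score, given the reliability
    of a single item (Spearman-Brown prophecy formula)."""
    return k * rho_one_item / (1 + (k - 1) * rho_one_item)

# Hypothetical numbers, for illustration only:
BUDGET = 5000.0        # total testing budget
COST_PER_ITEM = 25.0   # cost to develop, administer, and score one item
N_STANDARDS = 50       # number of standards to report on
RHO_ONE = 0.2          # assumed reliability of a single item

affordable_items = int(BUDGET // COST_PER_ITEM)       # 200 items in total
items_per_standard = affordable_items // N_STANDARDS  # 4 items per standard
per_standard_reliability = spearman_brown(RHO_ONE, items_per_standard)

# To report a usable result per standard (reliability ~0.8) we would need
# about 16 items per standard, i.e. 800 items: four times the budget.
print(round(per_standard_reliability, 2))
print(16 * N_STANDARDS * COST_PER_ITEM)
```

With these assumed numbers, the four affordable items per standard yield a per-standard reliability around 0.5, too noisy to report, which is exactly the sense in which the “ideal approach” is unaffordable.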
Common Solutions: “Standards-based”
• One (more or less) item per standard
  – not enough for actual assessment of standards
  – also used to provide emphasis among standards (i.e., “gold standards”)
• Sample standards over time
• Assess only a certain subset of the standards
• Validate through “alignment review”
• Decide to have a much smaller set of standards
  – Popham’s “instructionally-sensitive assessments”
E.g. #1
E.g. #2
“Standards-based” assessments
• Do not have high fidelity to standards
• Are what can be afforded
• Still maintain a “threat” effect
  – although the low density of items per standard means that the “threat” on any one standard is low
Thinking about an Alternative
• “A mile wide and an inch deep”
  – the now-classic criticism of US curricula in mathematics and science
• Need for standards to be interpretable by educators, policy-makers, etc.
• Need to enable a long-term view of student growth
• Need to find a more efficient way to use item information than in the “ideal approach”
Learning Performances
• Learning performances: a way of elaborating on content standards by specifying what students should be able to do when they achieve a standard
  – e.g., students should be able to describe phenomena, use models to explain patterns in data, construct scientific explanations, or test hypotheses
  – Reiser (2002), Perkins (1998)
Learning performance example
• Benchmark (AAAS, 1993):
  – [The student will understand that] Individual organisms with certain traits are more likely than others to survive and have offspring
• LP expansion (Reiser et al., 2003):
  – Students identify and represent mathematically the variation on a trait in a population.
  – Students hypothesize the function a trait may serve and explain how some variations of the trait are advantageous in the environment.
  – Students predict, supported with evidence, how the variation on the trait will affect the likelihood that individuals in the population will survive an environmental stress.
Learning progressions
• Learning progressions: descriptions of the successively more sophisticated ways of thinking about an idea that follow one another as students learn
  – also known as learning trajectories, progressions of developmental competence, and profile strands
• More than one path leads to competence
• Need to engage in curriculum debate about which learning progressions are most important
  – Try to choose them so that we end up with fewer standards per grade level
Learning progression examples
• Evolutionary Biology
  – Catley, K., Reiser, B., and Lehrer, R. (2005). Tracing a prospective learning progression for developing understanding of evolution.
• Atomic-Molecular Theory
  – Smith, C., Wiser, M., Anderson, C.W., Krajcik, J., and Coppola, B. (2004). Implications of research on children’s learning for assessment: matter and atomic molecular theory.
• Both available at:
  – http://www7.nationalacademies.org/bota/Test_Design_K-12_Science.html
Progress Variables
• Progress variable: the assessment expression of a learning progression
• Aim is to use what we know about meaningful differences in item difficulty to make the interpretation of the results more efficient
  – Borrow interpretative and psychometric strength from easier and more difficult items, so that we don’t need as many items as the “ideal approach” does.
• Progress variables are a principal component of the BEAR Assessment System (Wilson, 2005; Wilson & Sloane, 2000):
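This “borrowed strength” can be sketched with the simplest item response model, the Rasch model, in which success depends only on the gap between a student’s location and an item’s difficulty. The difficulty values and the student location below are hypothetical, chosen to show items spread along a progress variable.

```python
import math

def rasch_p(theta, b):
    """Rasch model: probability that a student at location theta
    succeeds on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical item difficulties, easiest to hardest along the variable:
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

theta = 0.5  # one student's (assumed) location on the variable
probs = [rasch_p(theta, b) for b in difficulties]

# Success probability falls monotonically as items get harder, so even a
# few items that straddle the student's location pin that location down:
# easier and harder items both contribute information.
print([round(p, 2) for p in probs])
```

Because every calibrated item carries information about where a student sits on the whole variable, fewer items are needed than when each standard must be estimated in isolation.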
The BEAR Assessment System
4 principles: 4 building blocks
Principle 1: Developmental Perspective
Building Block 1: Construct Map
• Developmental perspective
  – the assessment system should be based on a developmental perspective of student learning
• Progress variable
  – a visual metaphor for how the students develop and for how we think their item responses might change
Example: Why things sink and float
[Construct map figure, relating levels of understanding to lessons and assessment activities.
Levels of understanding, from most to least sophisticated:
  – Buoyancy depends on the density of the object relative to the density of the medium.
  – Buoyancy depends on the density of the object.
  – Buoyancy depends on the mass and volume of the object.
  – Buoyancy depends on the volume of the object.
  – Buoyancy depends on the mass of the object.
Lessons: 1: Introduction; 4: Mass; 6: Volume; 7: Mass and Volume; 10: Density of Object; 11: Density of Medium; 12: Relative Density.
Assessment activities: Pretest; Reflective Lessons @4, @6, @7, @10, @11; Post test.]
Principle 2: Match between curriculum and assessment
Building Block 2: Items design
• Instruction & assessment match
  – there must be a match between what is taught and what is assessed
• Items design
  – a set of principles that allows one to observe the students under a set of standard conditions that span the intended range of the item contexts
Example: Why things sink and float
Please answer the following question. Write as much information as you need to explain your answer. Use evidence, examples, and what you have learned to support your explanations.

Why do things sink and float?
Principle 3: Interpretable by teachers
Building Block 3: Outcome space
• Management by teachers
  – teachers must be the managers of the system, and hence must have the tools to use it efficiently and to use the assessment data effectively and appropriately
• Outcome space
  – categories of student responses must make sense to teachers
Example: Why things sink and float
Level  What the Student Knows
RD     Relative Density
D      Density
MV     Mass and Volume
M      Mass
V      Volume
PM     Productive Misconception
UF     Unconventional Feature
OT     Off Target
NR     No Response
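One way such an outcome space feeds into scoring is as an ordered category map. The category codes follow the slide; the numeric scores, and the tie between the single-feature Mass and Volume answers, are assumptions made only for illustration.

```python
# Hypothetical scoring guide for the "Why do things sink and float?"
# question: map each outcome-space category to an ordered score.
OUTCOME_SPACE = {
    "NR": 0,  # No Response
    "OT": 0,  # Off Target
    "UF": 1,  # Unconventional Feature
    "PM": 2,  # Productive Misconception
    "M":  3,  # Mass (single-feature answer)
    "V":  3,  # Volume (single-feature answer)
    "MV": 4,  # Mass and Volume
    "D":  5,  # Density
    "RD": 6,  # Relative Density
}

def score_response(category):
    """Map a teacher's category judgment to an ordered score."""
    return OUTCOME_SPACE[category]

# A class set of categorized responses, scored for the measurement model:
responses = ["M", "PM", "D", "MV", "RD", "V"]
scores = [score_response(c) for c in responses]
print(scores)  # [3, 2, 5, 4, 6, 3]
```

The point of the design is that teachers assign the categories, and the categories, not opaque numbers, carry the meaning; the numeric scores exist only so the measurement model can order them.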
Principle 4: Evidence of quality
Building Block 4: Measurement model
• Evidence of quality– reliability and validity evidence, evidence for fairness
• Measurement model– multidimensional item response models, to provide links over time both longitudinally within cohorts and across cohorts
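A much-simplified sketch of the measurement-model step: locating a student on the progress variable from dichotomously scored responses, using a Rasch model and a grid-search maximum likelihood estimate. The item difficulties and response patterns are hypothetical; an operational system would use multidimensional models and linking designs, as the slide notes.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, difficulties, responses):
    """Log-likelihood of a 0/1 response pattern at location theta."""
    ll = 0.0
    for b, x in zip(difficulties, responses):
        p = rasch_p(theta, b)
        ll += math.log(p if x else 1.0 - p)
    return ll

def estimate_location(difficulties, responses):
    """Grid-search MLE for the student's location on the variable."""
    grid = [i / 100.0 for i in range(-400, 401)]  # theta in [-4, 4]
    return max(grid, key=lambda t: log_likelihood(t, difficulties, responses))

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # items along the variable (assumed)
fall = estimate_location(difficulties, [1, 1, 0, 0, 0])    # 2 of 5 correct
spring = estimate_location(difficulties, [1, 1, 1, 1, 0])  # 4 of 5 correct

# Growth is expressed as movement along the same variable, which is what
# lets results be linked over time, within and across cohorts.
print(fall < spring)  # True
```

Because both testing occasions are reported on the same scale, the difference between the two estimates is directly interpretable as progress along the construct map.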
Example: Evaluate progress of a group
[Figure: group results displayed along the progress variable, with response categories ordered OT, UF, PM, M, V, MV, D, RD]
Example: Evaluate a student’s locations over time
Embedded Assessments
BEAR Assessment System: Principles
• Developmental Perspective: need a framework for communicating meaning
• Match between Instruction and Assessment: need methods of gathering data that are acceptable and useful to all participants
• Interpretable by Teachers: need a way to value what we see in student work
• Evidence of Quality: need a technique for interpreting data that allows meaningful reporting to multiple audiences
In conclusion…
• Achieving meaningful measures is tough under any circumstances,
• but especially so in an accountability situation,
  – where the requirements for accountability and the scale of the evaluation make it very expensive.
• Strategies like learning performances, learning progressions, and progress variables are needed to make meaning possible, and affordable.
References
• American Association for the Advancement of Science (1993). Benchmarks for science literacy. New York: Oxford University Press.
• Catley, K., Reiser, B., and Lehrer, R. (2005). Tracing a prospective learning progression for developing understanding of evolution. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K-12 Science Achievement, Washington, DC. (http://www7.nationalacademies.org/bota/Test_Design_K-12_Science.html)
• Reiser, B.J., Krajcik, J., Moje, E., and Marx, R. (2003). Design strategies for developing science instructional materials. Paper presented at the National Association for Research in Science Teaching Annual Meeting, March, Philadelphia, PA.
• Smith, C., Wiser, M., Anderson, C.W., Krajcik, J., and Coppola, B. (2004). Implications of research on children’s learning for assessment: Matter and atomic molecular theory. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K-12 Science Achievement, Washington, DC. (http://www7.nationalacademies.org/bota/Test_Design_K-12_Science.html)
• Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum Associates.
• Wilson, M., & Bertenthal, M. (Eds.). (2005). Systems for state science assessment. Report of the Committee on Test Design for K-12 Science Achievement. Washington, DC: National Academy Press. (http://books.nap.edu/catalog/11312.html)
• Wilson, M., and Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13(2), 181–208. (http://www.leaonline.com/doi/pdfplus/10.1207/S15324818AME1302_4)