

Inference & Culture Slide 1 April 29, 2003

Argument Substance and Argument Structure in Educational Assessment

Robert J. Mislevy

Department of Measurement, Statistics, & Evaluation

University of Maryland, College Park, MD

April 29, 2003

Presented at Conference on Inference, Culture, and Ordinary Thinking in Dispute Resolution, Benjamin N. Cardozo School of Law, Yeshiva University, New York, New York, April 27-29, 2003. This work builds on research with Linda Steinberg and Russell Almond at Educational Testing Service on the structure of educational assessments.

Slide 2

Central Points

Educational assessment has changed considerably over the last century.

Why? Strikingly different psychological perspectives on the nature of learning and knowledge.

These perspectives can be seen as elaborations of the same argument structure (Wigmore, Toulmin).

Slide 3

Messick (1994) on assessment design:

[B]egin by asking what complex of knowledge, skills, or other attributes should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society.

Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors?

Thus, the nature of the construct guides the selection or construction of relevant tasks as well as the rational development of construct-based scoring criteria and rubrics.

Slide 4

Toulmin's (1958) structure for arguments

Reasoning flows from data (D) to claim (C) by justification of a warrant (W), which in turn is supported by backing (B). The inference may need to be qualified by alternative explanations (A), which may have rebuttal evidence (R) to support them.

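[Figure: Toulmin diagram. Data (D), so claim (C), since warrant (W), on account of backing (B), unless alternative explanation (A), which rebuttal evidence (R) supports.]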
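The structure above can be sketched as a small record type. This is only an illustrative sketch: the class name `ToulminArgument`, its field names, and the `render` method are assumptions introduced here, and the example strings are taken from the Pet Shop argument that the later slides develop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToulminArgument:
    """One Toulmin-style argument; fields follow the diagram's letters."""
    claim: str                         # C
    data: List[str]                    # D (one or more data elements)
    warrant: str                       # W
    backing: Optional[str] = None      # B
    alternative: Optional[str] = None  # A
    rebuttal: Optional[str] = None     # R

    def render(self) -> str:
        """Read the argument aloud using the diagram's link words."""
        text = " and ".join(self.data)
        text += f", so {self.claim}, since {self.warrant}"
        if self.backing:
            text += f", on account of {self.backing}"
        if self.alternative:
            text += f", unless {self.alternative}"
            if self.rebuttal:
                text += f" (which is supported by: {self.rebuttal})"
        return text + "."


example = ToulminArgument(
    claim="Sue has a high value of Analytical Reasoning",
    data=["Sue answered the Pet Shop item correctly"],
    warrant="students high on Analytical Reasoning tend to do well on such puzzles",
    alternative="Sue answered correctly as a result of a lucky guess",
    rebuttal="Sue spent less than 10 seconds on this item",
)
print(example.render())
```

Keeping the alternative explanation and its rebuttal evidence as optional fields mirrors the slide's wording that the inference "may need to be qualified".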

Slide 5

Perspectives on learning and knowledge

Trait/differential (~1900 - )

Behaviorist (~1950 - 1980)

Information-processing (~1970 - )

Sociocultural (~1980 - )

Slide 6

Trait/Differential Perspective

A relatively stable characteristic of a person -- an attribute, enduring process, or disposition -- which is consistently manifested to some degree when relevant, despite considerable variation in the range of settings and circumstances. (Messick, 1989)

Interest in people's differential status on common traits

Useful in selection, prediction, and educational decisions—not so much for instruction

Slide 7

Spearman’s “Theorem of indifference of the indicator”

This means that, for the purpose of indicating the amount of g possessed by a person, any test will do just as well as any other, provided only that its correlation with g is equally high. ...

Another consequence of the indifference of the indicator consists in the significance that should be attached to personal estimates of “intelligence” made by teachers and others. However unlike may be the kinds of observation from which these estimates may have been derived, still insofar as they have a sufficiently broad basis to make the influence of g dominate over that of the s’s [subjects], they will tend to measure precisely the same thing.

Slide 8

An Analytical Reasoning Item: Pet Shop Display

Arturo is planning the parakeet display for his pet shop. He has five parakeets, Alice, Bob, Carla, Diwakar, and Etria. Each is a different color; not necessarily in the same order, they are white, speckled, green, blue, and yellow. Arturo has two cages. The top cage holds three birds, and the bottom cage holds two. The display must meet the following additional conditions:

Alice is in the bottom cage.

Bob is in the top cage and is not speckled.

Carla cannot be in the same cage as the blue parakeet.

Etria is green.

The green parakeet and the speckled parakeet are in the same cage.

If Carla is in the top cage, which of the following must be true?

a) The green parakeet is in the bottom cage.

b) The speckled parakeet is in the bottom cage.

c) Diwakar is in the top cage.

d) Diwakar is in the bottom cage.

e) The blue parakeet is in the top cage.
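The logical structure of this item can be checked by brute force. The sketch below is one possible encoding of the stated conditions, not part of the original item. The names `solutions`, `cage_of`, and `OPTIONS` are assumptions. It enumerates every cage and color assignment, keeps those consistent with the conditions, and reports which answer options hold in every arrangement with Carla in the top cage.

```python
from itertools import combinations, permutations

BIRDS = ["Alice", "Bob", "Carla", "Diwakar", "Etria"]
COLORS = ["white", "speckled", "green", "blue", "yellow"]


def cage_of(cage, color, wanted):
    """Return the cage holding the bird whose color is `wanted`."""
    bird = next(b for b in BIRDS if color[b] == wanted)
    return cage[bird]


def solutions():
    """Yield every (cage, color) assignment satisfying the item's conditions."""
    for top in combinations(BIRDS, 3):        # top cage holds three birds
        cage = {b: ("top" if b in top else "bottom") for b in BIRDS}
        for perm in permutations(COLORS):     # each bird is a different color
            color = dict(zip(BIRDS, perm))
            if (cage["Alice"] == "bottom"
                    and cage["Bob"] == "top"
                    and color["Bob"] != "speckled"
                    and cage["Carla"] != cage_of(cage, color, "blue")
                    and color["Etria"] == "green"
                    and cage_of(cage, color, "green")
                    == cage_of(cage, color, "speckled")):
                yield cage, color


# Each answer option as a predicate over one arrangement.
OPTIONS = {
    "a": lambda cage, color: cage_of(cage, color, "green") == "bottom",
    "b": lambda cage, color: cage_of(cage, color, "speckled") == "bottom",
    "c": lambda cage, color: cage["Diwakar"] == "top",
    "d": lambda cage, color: cage["Diwakar"] == "bottom",
    "e": lambda cage, color: cage_of(cage, color, "blue") == "top",
}

# Restrict to arrangements that meet the question's premise.
carla_top = [(cage, color) for cage, color in solutions()
             if cage["Carla"] == "top"]

# An option "must be true" if it holds in every remaining arrangement.
must_true = sorted(key for key, holds in OPTIONS.items()
                   if all(holds(cage, color) for cage, color in carla_top))
print(must_true)
```

Under this encoding, option (d) is the only one that holds in every consistent arrangement. If Diwakar joined Bob and Carla in the top cage, green Etria and the speckled bird would fill the bottom cage, forcing the blue parakeet into Carla's cage.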

Slide 9

LSAT on AR Items

LSAT's description of AR takes a trait perspective:

"Analytical reasoning items are designed to measure the ability to understand a structure of relationships and to draw conclusions about the structure."

AR items are in the LSAT not because either lawyers or law students routinely have to solve problems just like these in their jobs or their studies, but because there is evidence that students who can solve these kinds of puzzles tend to perform better in law school than students who don't.

[Figure: Toulmin diagram for the Pet Shop item]

C: Sue has a high value of Analytical Reasoning.

W: Students who are high on Analytical Reasoning tend to do well on logical puzzles that query relations that follow from explicit relations and constraints.

B: Empirical studies show high correlations between AR test scores and college grades, open-ended problem solving tasks, and ratings of employees' reasoning skills on the job.

A: Sue answered correctly as a result of a lucky guess.

R: Sue spent less than 10 seconds on this item.

D1: Sue answered the Pet Shop item correctly.

D2: Logical structure and contents of Pet Shop item.

Links: D1 and D2, so C, since W, on account of B, unless A, which R supports.

1) Note that the warrant requires a conjunction of data about the nature of Sue's performance and the nature of the performance situation.

[Figure: the same Toulmin diagram, with each data element expanded into its own sub-argument]

C: Sue has a high value of Analytical Reasoning. (W, B, A, and R as before.)

D1: Sue answered the Pet Shop item correctly.

since W1: Correspondence of darkest mark and keyed response means correct answer.

D11: Sue's marks on the answer sheet for Pet Shop item.

D12: Answer key for the Pet Shop item.

D2: Logical structure and contents of Pet Shop item.

since W2: Elements in schemas for valid AR items.

D22: Particular content of Pet Shop item.

2) A closer look at the “data”:

Must reason from unique work products and item materials, to aspects addressed in the general warrant.
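Warrant W1 (the darkest mark matching the keyed response means a correct answer) amounts to a small scoring rule. The sketch below is hypothetical. The function names, the darkness readings, and the choice of "d" as the keyed response are assumptions made for illustration, not data from the presentation.

```python
from typing import Dict


def darkest_mark(darkness: Dict[str, float]) -> str:
    """Return the option whose answer-sheet bubble is darkest (D11 -> response)."""
    return max(darkness, key=darkness.get)


def score_item(darkness: Dict[str, float], key: str) -> bool:
    """Apply W1: the response is correct if the darkest mark matches the key (D12)."""
    return darkest_mark(darkness) == key


# Hypothetical darkness readings for Sue's answer sheet on one item.
sue_marks = {"a": 0.05, "b": 0.02, "c": 0.10, "d": 0.93, "e": 0.04}

# Hypothetical answer key for the same item.
print(score_item(sue_marks, key="d"))
```

Even this step is an inference from a unique work product, so it inherits its own alternative explanations, such as stray marks or incomplete erasures.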

Slide 13

Multiple pieces of evidence of the same kind

C: Sue has a high value of Analytical Reasoning.

W: Students who are high on Analytical Reasoning tend to do well on logical puzzles that query relations that follow from explicit relations and constraints.

B: ...

A: ...

R: ...

D11: Sue's answer to Item 1, and D21: structure and contents of Item 1

...

D1n: Sue's answer to Item n, and D2n: structure and contents of Item n

Slide 14

Multiple pieces of evidence of different kinds

C: Sue has a high value of Analytical Reasoning.

W1: [[Warrant re logic puzzles]]

A: [[Alternatives re logic puzzles]]

D11: Sue's answer to Item 1, and D12: Structure & content of Pet Shop item

...

Wn: [[Warrant re recommendations]]

A: [[Alternatives re recommendations]]

Dn1: Teacher recommendation about Sue, and Dn2: Conditions of observation for recommendation

A0: ...

Slide 15

Statistical Modeling of Assessment Data

[Figure: a student-model variable θ with distribution p(θ), and observable variables X1, X2, X3 with conditional distributions p(X1 | θ), p(X2 | θ), p(X3 | θ).]

Claims are expressed in terms of values of unobservable variables in the student model (SM), which characterize student knowledge.

Data are modeled as depending probabilistically on SM variables.

Conditional distributions of the data given SM variables are estimated.

Bayes' theorem is used to infer SM variables given the data.
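A minimal numerical sketch of this reasoning follows, assuming a single student-model variable (called theta here) with two hypothetical states and three right/wrong observations that are conditionally independent given theta. The function name `posterior` and all probabilities are made-up illustrations, not estimates from any assessment.

```python
from typing import Dict, List


def posterior(prior: Dict[str, float],
              likelihoods: List[Dict[str, float]]) -> Dict[str, float]:
    """Bayes' theorem for a discrete student-model variable.

    prior[s]          = p(theta = s)
    likelihoods[i][s] = p(X_i = observed value | theta = s)
    Observations are assumed conditionally independent given theta.
    """
    unnormalized = dict(prior)
    for likelihood in likelihoods:
        for state in unnormalized:
            unnormalized[state] *= likelihood[state]
    total = sum(unnormalized.values())
    return {state: value / total for state, value in unnormalized.items()}


# Hypothetical model: two proficiency states and one item type.
prior = {"low": 0.5, "high": 0.5}
p_correct = {"low": 0.3, "high": 0.8}            # p(X = right | theta)
p_wrong = {s: 1.0 - p for s, p in p_correct.items()}

# Observed data: X1 right, X2 right, X3 wrong.
post = posterior(prior, [p_correct, p_correct, p_wrong])
print(post)
```

The conditional distributions play the role of the warrant, and the posterior distribution expresses the qualified claim about the student.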

Slide 16

Behaviorist Perspective

The educational process consists of providing a series of environments that permit the student to learn new behaviors or modify or eliminate existing behaviors and to practice these behaviors to the point that he displays them at some reasonably satisfactory level of competence and regularity under appropriate circumstances. …

The evaluation of the success of instruction and of the student’s learning becomes a matter of placing the student in a sample of situations in which the different learned behaviors may appropriately occur and noting the frequency and accuracy with which they do occur.

D.R. Krathwohl & D.A. Payne, 1971, pp. 17-18.

[Figure: Toulmin diagram for a behaviorist argument, shown on four successive slides with one note each]

C: Sue's probability of correctly answering a 2-digit subtraction problem with borrowing is p.

W: Sampling theory machinery for reasoning from observed proportion of r correct responses in n targeted situations, to true proportion p.

A: [e.g., observational errors, data errors, misclassification of responses or performance situations, distractions, etc.]

D1j: Sue's answer to Item j, and D2j: structure and contents of Item j (one pair for each item j)

Links: D1j and D2j, so C, since W, unless A.

The warrant encompasses definitions of the class of stimulus situations, response classifications, and sampling theory.

The claim addresses the expected value of performance of the targeted kind in the targeted situations.

The task data address the salient features of the stimulus situations (i.e., tasks).

The student data address the salient features of the responses.
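The "sampling theory machinery" in this warrant can be illustrated with the simplest case: estimating a true proportion p from r correct responses in n targeted situations. The sketch below uses the normal-approximation (Wald) interval as one common choice. The function name and the counts r = 14, n = 20 are hypothetical, and the slides do not say which estimator was intended.

```python
import math
from typing import Tuple


def estimate_proportion(r: int, n: int,
                        z: float = 1.96) -> Tuple[float, float, Tuple[float, float]]:
    """Point estimate, standard error, and Wald interval for a true proportion p.

    r: number of correct responses observed
    n: number of targeted situations (items) sampled
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need n > 0 and 0 <= r <= n")
    p_hat = r / n
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    lower = max(0.0, p_hat - z * std_err)
    upper = min(1.0, p_hat + z * std_err)
    return p_hat, std_err, (lower, upper)


# Hypothetical data: Sue answers 14 of 20 sampled borrowing problems correctly.
p_hat, std_err, interval = estimate_proportion(r=14, n=20)
print(p_hat, std_err, interval)
```

The interval only reflects sampling variability. The alternative explanations listed under A, such as misclassified responses or distractions, fall outside what this calculation accounts for.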

Slide 21

The Information-Processing Perspective

Epitomized in Newell and Simon’s (1972) Human Problem Solving.

Examines the procedures by which people acquire, store, and use knowledge to solve problems.

Models problem-solving in terms of the capabilities and the limitations of human thought and memory.

Stresses the importance of knowledge structures, relationships, and procedures in learning domains.

Uses rules, production systems, task decompositions, and means-ends analyses.

Slide 22

Responses consistent with the "subtract smaller from larger" bug

821 - 285 -> 664

885 - 221 -> 664

63 - 15 -> 52

17 - 9 -> 12
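The bug can be written down as a procedure: in each column, take the smaller digit from the larger one and never borrow. The sketch below is one way to express that rule. The function name is an assumption, and it assumes the minuend has at least as many digits as the subtrahend.

```python
def smaller_from_larger(minuend: int, subtrahend: int) -> int:
    """Column-wise subtraction with the bug: never borrow; in each column
    subtract the smaller digit from the larger one."""
    top = str(minuend)
    # Pad the shorter number with leading zeros so the columns line up.
    bottom = str(subtrahend).rjust(len(top), "0")
    digits = [abs(int(a) - int(b)) for a, b in zip(top, bottom)]
    return int("".join(str(d) for d in digits))


# The four responses shown on the slide.
items = [(821, 285), (885, 221), (63, 15), (17, 9)]
for a, b in items:
    print(a, "-", b, "->", smaller_from_larger(a, b))
```

The second item, 885 - 221, needs no borrowing, so the buggy procedure and the correct one give the same answer. Only items whose features call for borrowing can tell the two apart.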

[Figure: two-level Toulmin diagram, shown on three successive slides. Behaviorist-style sub-arguments for item Classes 1 through n feed a higher-level claim about Sue's production rules.]

Sub-argument for Class 1:

C: Sue's probability of answering a Class 1 subtraction problem with borrowing is p1.

W: Sampling theory for items with feature set defining Class 1.

D11j: Sue's answer to Item j, Class 1, and D21j: structure and contents of Item j, Class 1

...

Sub-argument for Class n:

C: Sue's probability of answering a Class n subtraction problem with borrowing is pn.

W: Sampling theory for items with feature set defining Class n.

D1nj: Sue's answer to Item j, Class n, and D2nj: structure and contents of Item j, Class n

Higher-level argument:

C: Sue's configuration of production rules for operating in the domain (knowledge and skill) is K.

W0: Theory about how persons with configurations {K1,...,Km} would be likely to respond to items with different salient features.

Like behaviorist inference at level of behavior in classes of structurally similar tasks.

Patterns among behaviorist claims are data for inferences about unobservable production rules that govern behavior.
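A toy version of this higher-level inference is sketched below, under loudly stated assumptions. Only two candidate configurations are considered: a correct procedure and the "subtract smaller from larger" bug. Each response is assumed to match the configuration's prediction with probability 1 - slip and to mismatch it with probability slip. The slip value, the uniform prior, and all function names are illustrative choices, not the author's model.

```python
from typing import Callable, Dict, List, Tuple


def correct_rule(minuend: int, subtrahend: int) -> int:
    """Prediction of a configuration with correct subtraction rules."""
    return minuend - subtrahend


def smaller_from_larger(minuend: int, subtrahend: int) -> int:
    """Prediction of the buggy configuration: never borrow, subtract the
    smaller digit from the larger one in each column."""
    top = str(minuend)
    bottom = str(subtrahend).rjust(len(top), "0")
    digits = [abs(int(a) - int(b)) for a, b in zip(top, bottom)]
    return int("".join(str(d) for d in digits))


# Candidate configurations K1, K2 (a hypothetical, deliberately tiny set).
CONFIGS: Dict[str, Callable[[int, int], int]] = {
    "correct": correct_rule,
    "smaller-from-larger bug": smaller_from_larger,
}


def posterior_over_configs(
        responses: List[Tuple[Tuple[int, int], int]],
        slip: float = 0.1) -> Dict[str, float]:
    """Posterior over configurations given ((minuend, subtrahend), answer) pairs.

    Simplifying assumption: a response matches a configuration's prediction
    with probability 1 - slip and mismatches it with probability slip.
    """
    weights = {name: 1.0 / len(CONFIGS) for name in CONFIGS}  # uniform prior
    for (a, b), observed in responses:
        for name, rule in CONFIGS.items():
            matches = rule(a, b) == observed
            weights[name] *= (1.0 - slip) if matches else slip
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# The four responses from the "subtract smaller from larger" slide.
responses = [((821, 285), 664), ((885, 221), 664), ((63, 15), 52), ((17, 9), 12)]
post = posterior_over_configs(responses)
print(post)
```

The item 885 - 221 does not discriminate between the two configurations, which is why the warrant W0 has to refer to the salient features of items as well as to the responses.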

[Figure: Toulmin diagram with time-indexed data]

C: Sue's level of troubleshooting skill is K.

W: [theory about strategies and procedures people at various levels of troubleshooting expertise tend to employ when iteratively solving problems in the domain.]

D1,t-2: Sue's actions at time t-2, and D2,t-2: Context after time t-2

D1,t-1: Sue's actions at time t-1, and D2,t-1: Context after time t-1

D1,t: Sue's actions at time t, and D2,t: Context after time t

D1,t+1: Sue's actions at time t+1

...

Assessing inquiry processes: Time dependencies in a troubleshooting task. Past behavior and its consequences become part of the setting for the next action.

Slide 27

The Sociocultural Perspective

Stresses how knowledge is conditioned and constrained by the technologies, information resources, representation systems, and social situations ...

Incorporates explanatory concepts that have proved useful in fields such as ethnography and sociocultural psychology to study collaborative work, … mutual understanding in conversation, and other characteristics of interaction that are relevant to the functional success of the participants’ activities.

Greeno, Collins, & Resnick, 1997, p. 7.

AP Studio Art Portfolios

[Figure: Toulmin diagram, shown on five successive slides with one note each]

C: The level of performance for the Concentration section is K.

W0: [Specification of general rubric to the goals and approach the student describes in the narrative]

B: General rubric

Statements in narrative explaining the concentration, its influences, goals, etc. [linked to the warrant by "tailors"]

D1: Student's learning in the course of carrying out the concentration.

D2: Conditions under which the work was carried out.

D3j: Art piece j in the concentration.

Claim concerns level of performance represented by unique project, in socially-determined general evaluation scheme.

Data from student are (1) works of art and (2) explanation of project goals, approach, rationale.

Student text helps assure performance conditions meet the requirements of the warrant.

Student text contributes to how raters apply general evaluation rubric to this student’s work.

Conversational Competence

[Figure: Toulmin diagram with time-indexed data for Sue and an interlocutor (I)]

C: Sue's level of conversational competence is K.

W: [theory about how people at various levels of conversational competence will behave in contexts with specified features]

D1,t-2: Sue's speech act at time t-2; D3,t-2: I's speech act at time t-2; D2,t-2: Context after time t-2

D1,t-1: Sue's speech act at time t-1; D3,t-1: I's speech act at time t-1; D2,t-1: Context after time t-1

D1,t: Sue's speech act at time t; D3,t: I's speech act at time t; D2,t: Context after time t

D1,t+1: Sue's speech act at time t+1; D3,t+1: I's speech act at time t+1

...

Challenges:

1) Time dependencies.

2) Interlocutor’s behavior affects context, and is required by the warrant for evidence about certain aspects of competence.

3) How constrained? Naturalistic vs. interviewer.

Slide 35

Conclusion

What changes?

Developments in psychology, technology, and social factors (e.g., accommodations) continually place demands on assessment that outstrip familiar forms.

What doesn’t change?

We want to draw inferences about what students know and can do as seen from some perspective; that perspective tells us what kinds of things we need to see them do, in what kinds of situations, to ground those inferences.

Slide 36

Conclusion

We see elaborations, extensions, and specializations of enduring principles of evidentiary reasoning.

We find continued value in tools such as Toulmin diagrams, Wigmore charts, and Bayesian inference networks to understand yesterday's assessments, manage today's, and design the assessments of tomorrow.