Model Specification for Cognitive Assessment
of Proportional Reasoning
Rhiannon Weaver
Department of Statistics
Carnegie Mellon University
Brian Junker
Department of Statistics
Carnegie Mellon University
January 20, 2004
Abstract
In modern psychometric analysis of cognitive assessment, there is a choice between psychometric vs. cognitive
science paradigms for modeling the latent scale. The first involves as few as one continuous latent ability parameter,
while the second focuses on a set of binary latent skills. When the expert cognitive model is qualitatively specified
(e.g., paragraphs describing general trends in observable behavior for a set of developmental stages), interpretation
of responses and latent variables is flexible, and may even be ambiguous. Models incorporating aspects from both
psychometric and cognitive science paradigms can help in exploring response patterns, and refining both the exam
design and the cognitive model. Here we present two such analyses of a pilot study of proportional reasoning. The
first is a Rasch model using binary response coding with a continuous latent trait, which can approximate a set of
developmental stages through careful item design and milestones. The second is a Bayes net model using polytomous
response coding linked compensatorily to a set of latent skills, which allows for a factor-analytic approach to skill
interpretation. We explore each scheme's usefulness in inferring a student's current state of knowledge, directing
program planners, and directing assessment developers toward refining either the qualitative cognitive model or exam
items.
The work presented in this paper was supported in part by NSF grant # ESI-9876538 to the Learning Research and Development
Center, University of Pittsburgh, and in part by a graduate fellowship awarded under NSF VIGRE grant #DMS-9819950. The
authors would also like to thank Gail Baxter, Lou DiBello and Christine Witkowski of the Educational Testing Service in Princeton
NJ for their collaboration in the project.
Contents

1 Introduction
  1.1 Latent Variable Models for Cognitive Diagnosis
  1.2 Proportional Reasoning
  1.3 Analysis of an Assessment Test for Proportional Reasoning
2 Data
  2.1 The Study Design
  2.2 Response Strategy Coding
3 Dichotomous Coding with a Single Latent Trait: Rasch Model
  3.1 Method
  3.2 Results
  3.3 Discussion
4 Polytomous Coding with Many Latent Skills: Bayes Net
  4.1 Method
  4.2 Results
    4.2.1 Classifying Students
    4.2.2 Interpreting Skills through Examination of Logistic Regression Coefficients
  4.3 Goodness of Fit for the Bayes Net
  4.4 Methodological Concerns
    4.4.1 Assessing the Influence of Individual Students
    4.4.2 Contradictory Skills: Model Misspecification or Model Misfit?
    4.4.3 Prior Sensitivity
5 Future Work
6 Suggestions for Future Design
7 Summary
8 References
9 Appendix
  9.1 Exam Forms
  9.2 Conditional Percentage Estimates and Standard Errors: Rasch Model
  9.3 Regression Estimates and Standard Errors: Bayes Net
  9.4 Student K-maps
1 Introduction
1.1 Latent Variable Models for Cognitive Diagnosis
In educational settings, diagnostic assessments serve as barometers of student performance. A cognitive
model is developed which outlines the progression of understanding that a student undergoes as he or
she learns a particular concept. From this cognitive model, the assessment test itself is designed. The
test consists of a set of items that yield observable results based on a student's unobserved position in
the cognitive model. In contrast to a purely evaluative assessment geared toward measuring a student's
mastery of a specific curriculum, diagnostic assessments are also used as teaching tools; they give instructors
feedback on how to adapt teaching styles to help students having trouble, and help program planners to adapt
curricula when evidence shows a general lack of understanding across all students.
Latent variable models are used extensively in cognitive diagnosis, with a range of modeling schemes.
Most schemes can be described through a two-way hierarchical structure (Junker, 1999), with the manifest
variables (ie, observed responses) and task features at the first level, and latent variables (ie, unobserved
student parameters) at the second level. Manifest variables can be coded as dichotomous or polytomous; the
choice of how to represent the observed responses depends on the item design and the detail, or granularity,
coded in the latent space. In modern psychometric analysis, there is a choice between as few as one continu-
ous generic ability parameter for the latent space vs. a larger-dimensional set of traditionally discrete skills.
This distinction has been referred to as a psychometric vs. cognitive science representation of the student's
knowledge (Hunt, 1995).
Junker (1999) surveys a number of models for statistical analysis under these frameworks. He notes
that the main task in model construction is not one of choosing a continuous vs. discrete latent scale, but
in choosing the granularity of the model. This choice is dictated by the specific research situation and
questions of interest. The models Junker surveys can be used as a general guideline or set of building blocks
to accommodate a given cognitive model for a specific situation.
Item Response Theory (IRT) models based on a single latent trait are simpler than multidimensional
models for estimation and inference, but often it is difficult to interpret the ability parameter when directing
instructors toward a student's particular weaknesses. However, when the cognitive model indicates a single
progression of understanding, the latent trait can be used to classify students according to a more detailed
set of developmental stages by relying on careful item design and milestones. This is similar to the approach
of Masters and Forster (1999) and Draney, Pirolli and Wilson (1995). In this scheme, the latent trait can
capture a richer underlying structure based on skills without specifically adding the complexity of skills into
the model.
Skills-based models often focus on breaking a task into a set of known sub-tasks and considering each
sub-task as a skill. Skills can then be deterministically linked to items in a conjunctive way, with possession
of all skills required for a correct response. But it can sometimes be difficult to see how the possession of a
particular skill is related to the response. This is especially true when the responses themselves are strategies
a student employs in constructing their answer to a word problem. In this case, it may make sense to first
explore the relationship between strategies and underlying skills through a broader, compensatory model.
Compensatory models, so named because possession of one latent skill can compensate for the lack of
another, have been explored both for multidimensional latent trait variables (eg, Wilson, Wood and Gibbons,
1983; Muraki and Carlson, 1995) and for discrete latent skill variables (eg, Maris, 1995). Minimally con-
strained models of this type allow for an exploratory factor-analytic interpretation of the latent scale, and
highly constrained models allow for confirmatory factor analysis. This kind of analysis can help in defining
and interpreting a discrete skill space for a broadly specified cognitive model.
Both IRT and compensatory discrete skills models can be useful in situations which do not immediately
fall into either of the psychometric or cognitive science extremes. Such is the case for proportional reasoning,
in which facility is measured through both strategy choice and the implementation of the chosen strategy. Mastering
proportional reasoning involves not only mathematical competency, but also a psychological familiarity
that can be hard to quantify when deconstructing tasks into component steps. This makes it difficult to
model proportional reasoning with a skills-based cognitive science approach. On the other hand, addressing
the psychological facets of proportional reasoning with a psychometric model gives little insight into why
students choose the strategies they do.
1.2 Proportional Reasoning
The term proportional reasoning is used to describe any kind of reasoning that focuses on the relation-
ship between two ratios. Proportional reasoning is involved in everyday mathematical situations such as
calculating the correct dosage for medicine, converting a measurement from one unit to another (pounds
to kilograms, Celsius to Fahrenheit), or calculating estimated arrival times for travel. Generic proportional
reasoning tasks require the student to reason with the equation
    a/b = c/d     (1)
Though proportional reasoning is taught in a mathematics setting and can often be reduced to a simple
linear model y = kx, it is actually a psychological construct: "The essential characteristic of proportional
reasoning is that it focuses on describing, predicting or evaluating the relationship between two relationships
(i.e., a second-order relationship) rather than simply a relationship between two concrete objects (or two
directly perceivable quantities)" (Piaget & Inhelder, 1975, via Lesh, Post and Behr, 1992). Students first
learning proportional reasoning rarely approach the subject from a purely mathematical standpoint. It is an
important concept in middle school mathematics because it is often a student's first exposure to explicitly
modeling these kinds of second-order relationships. To master proportional reasoning, a student must be
able to (Baxter and Witkowski, January 2002):
- Conceive of a multiplicative relationship and possess a notion of change in a relative sense;
- Recognize that when two quantities are changing, the change in one depends on the change in the other (covariance);
- Recognize that while some aspects of the situation change, the ratio relationship remains constant (invariance); and
- Employ an appropriate multiplicative strategy to solve problems.
Solving proportional reasoning tasks often reduces to solving the relationship in equation 1 for a missing
value, as in Figure 1. Such tasks are called Missing Value tasks. However, student strategy in achieving the
correct answer varies. Vergnaud (1983) has developed extensive models for student strategies in proportional
reasoning tasks, and a version of his model for strategies in Missing Value tasks appears in Figure 1.
Typically, the student conceives of two measure spaces, where each space represents one of the units
in the problem (candy, money, inches, etc). The student can then compare differing values within a single
measure space or similar values between measure spaces. Comparisons within measure spaces involve
scale factors, and lead to so-called scalar strategies, whereas comparisons between measure spaces involve
divining the functional relationship between two quantities, and lead to so-called functional strategies. After
making a comparison, students may arrive at the correct answer through an additive method (build up or
count out), or through multiplication or division. A common error is to replace the multiplicative ratio
model with an additive model, substituting subtraction for division. Cross-multiplication is seen as a separate
strategy from other multiplicative strategies because it is procedural in nature; students are taught to set the
problem up as ad = bc and solve; however, it is not clear whether students employing this strategy understand
the second-order relationship entailed in proportional reasoning.
Baxter and Junker (November, 2001) proposed a broad, developmental stages model for proportional
reasoning in middle school students (see Figure 2). This qualitative developmental cognitive model is based
on an extensive literature review and on interviews with subject matter experts. The model is qualitative in
Example: The pizza shop advertises that 3 pizzas will serve about 10 people. How much pizza should I buy
if there will be 50 people at my party?
  Measure-space layout:                     Pizza example:
              Measure 1    Measure 2                    Pizza    People
    Value 1   a            b                            3        10
    Value 2   c            (d)?                         ?        50

  Reasoning        Strategy               Calculation      Solution
  Multiplicative   Functional             b/a = k          d = ck
  Multiplicative   Scalar                 c/a = l          d = bl
  Multiplicative   Cross-Multiplication   ad = bc          d = bc/a
  Additive         Build Up               c/a = l          d = b + ... + b (l times)
  Erroneous        Addition               a - b = c - d    d = b + (c - a)
  Other            Other                  Squaring, Estimation, etc.   Varied

Figure 1: A Vergnaud Model for Missing Value tasks in proportional reasoning.
that it does not explicitly list a quantifiable set of underlying skills that students learn, but instead proposes
stages of development and a general overview of observed student performance in each stage. The stages
are ordered in that they represent a progression of understanding, with higher stages indicating a higher level
of sophistication in proportional reasoning.
Sophistication in this sense embodies an ability first to set up a framework that outlines the correct
relationships between quantities, and then to generalize solution strategies and approach the arithmetic from
a context-free, abstract point of view. Much of this sophistication is evidenced not only by the student's
ability to compute the right answer, but also by the strategy that the student employs in the computation. More
sophisticated strategies are those that indicate more facility with the concepts of proportional reasoning.
Thus the cognitive model outlines expected performance not only in terms of which kinds of tasks the student
will get right or wrong, but also in terms of strategies the students will employ when solving problems.
1.3 Analysis of an Assessment Test for Proportional Reasoning
This paper focuses on data collected from a diagnostic test developed from Baxter and Junker's proposed
cognitive model for proportional reasoning. Section 2 outlines the diagnostic test, study design and data
collection. The overall goal of the study is to describe how the diagnostic assessment test fares in describing
a student's current state of knowledge. This goal is two-fold, in that a well-designed test should be able to:
  I) Qualitative: Young students generally possess a good deal of knowledge about quantity that permits them to answer questions about more and less (e.g., which drink is sweeter?) or fairness (e.g., divide pizza or cookies so everyone gets a fair share).
  II) Early Attempts at Quantifying: Early attempts at quantifying often involve constant additive differences (i.e., a - b = c - d) rather than multiplicative relationships.
  III) Recognition of Multiplicative Relationship: Students have the intuition that a ratio is two numbers that change together, but the change may be additive or multiplicative. They often rely on additive strategies such as build up when multiplicative reasoning is required. Situations involving absolute change are not always distinguishable from situations involving relative change.
  IV) Accommodating Covariance and Invariance: Students begin to develop a multiplicative change model. They recognize that while some quantities may be changing, relationships among the quantities remain invariant. They view a ratio as a single unit to which basic arithmetic operations may be applied. They can typically distinguish situations involving absolute change from those involving relative change. Strategy use is context-specific, and when the numbers are hard these students may resort to additive reasoning in multiplicative situations. Concepts of covariance fail when students are asked to scale up a figure.
  V) Functional and Scalar Relationships: Students recognize the invariant nature of the relationships between pairs of changing quantities. These students have a repertoire of generalizable strategies and they select the most efficient strategy for a given problem. Conceptions of covariance and invariance are well developed.

Figure 2: The proposed cognitive model for development of proportional reasoning (Baxter and Junker,
November 2001).
- Reflect student progress toward targeted learning goals;
- Make clear gaps in students' understanding.
In terms of model development, these goals are achieved by specifying a cognitive model, translating the
cognitive model into a meaningful latent structure, and designing items that pinpoint a student's place on the
chosen latent scale. The latent structure and the response coding can either enhance or frustrate inference
relating to these goals. Sections 3 and 4 outline two different choices of latent structure and response coding:
a univariate continuous latent variable with a dichotomous response coding, and a set of binary latent skills
with polytomous response coding.
The simpler latent structure and coding scheme described in Section 3 can address the first goal, reflect-
ing student progress toward learning goals. Given a set of ordered developmental stages with associated
patterns of observable performance, it is plausible that test items can be designed such that increasing so-
phistication leads to increasing general ability to generate right vs. wrong answers. Classification then is a
simple matter of relative ranking. Almost all of the work for inference on the latent scale is done behind
the scenes: the latent structure gains a meaningful interpretation of developmental stages due to careful
item designs specifically targeted at milestones in development. This scheme is very similar to the profi-
ciency/difficulty scale advocated by Masters and colleagues (Masters and Forster, 1999; Draney, Pirolli and
Wilson, 1995; Masters and Evans, 1986). Though it can be informative relative to an existing set of develop-
mental stages, this scheme relies heavily on a single progression of learning, and limits inference on where
the underlying cognitive model may fail for a particular student.
The analysis described in Section 4, using discrete latent skills and polytomous response coding, can
address the second goal of explaining why students choose particular strategy patterns. With this approach,
we can define developmental stages as clusters of likely skill patterns, but we are not limited to knowledge
a priori of such clusters. This approach addresses more flexibly the problem of defining gaps in student
understanding, and it allows for a broader classification of students according to which skills they pos-
sess. However, with increased complexity comes increased difficulty in interpretation. Skills gain meaning
through both the task types that highlight them and the strategies they elicit. These relationships can be
difficult to define in the mathematical context of the model.
Sections 5, 6 and 7 conclude with a discussion of what our analyses tell us about the process of
developing cognitive diagnostic tests for broadly-defined developmental stages or skills, in terms of defining
and pursuing specific goals in inference, data requirements for pursuing those goals, and the use of pilot
studies to refine and redesign future tests.
2 Data
2.1 The Study Design
Data for this analysis comes from a pilot study conducted in a major urban school district in May 2001
(Baxter and Witkowski, 2002), to support the design of a cognitively diagnostic assessment in proportional
reasoning. Two forms of a test for proportional reasoning were given to 125 students, taken from three
different middle schools. Exam 1 had 14 items, while exam 2 had 15 items.
In general it is very difficult to deduce strategy choice from the student's final answer alone.
Instead, for this analysis the strategies themselves were coded as the observed response. In order to achieve
this, students were interviewed as they took the diagnostic test and were recorded via audiotape. In most
cases, a written transcript of the interview was also provided. The student's numerical answer and written
work were also recorded. From these transcripts, interviews and written answers, a general strategy was
coded by Baxter and Witkowski (2002) for each item.
Seven major kinds of tasks were asked on the forms. These are described in Figure 3. The first three task
types (Missing Value, Similarity, and Comparison) were designed to test overall facility with proportional
reasoning. The last four task types (Equivalence, Invariance, Relative/Absolute, and Covariance) were
designed to pinpoint specific aspects of reasoning involved in a proportional reasoning task. Tables 1 and
2 show the breakdown of task types on the two exam forms. Tasks 1.10 and 2.13 were not scored. Students
did not perform as expected on these tasks, and after inspection it was decided that these tasks were not
strictly proportional reasoning tasks.
The qualitative cognitive model describes observable performance not only in terms of correct vs. in-
correct answer, but also in terms of the specific strategies employed to solve each problem. Because of
this specificity, it was necessary to interview the students during the exam and to record transcripts of their
solution strategies. There is some flexibility in how to code the observed strategies so that the data set will
be rich enough to capture the cognitive model but sparse enough for inference with a relatively small sample
of students.
2.2 Response Strategy Coding
The ability to make inferences on the latent cognitive model depends not only on the specificity of the
item design, but also on how the responses are coded. Due to the small sample size, relatively coarse
codings for each task type were needed. These codings were chosen with the observed performance from
  Missing Value: Students are given three of four values (a, b and c), two of the values are presented as a ratio (a/b), and the task is to determine the missing value d such that a/b = c/d.
  Similarity: A subtype of missing value questions involving scaling. Students are asked to scale the given quantities up or down, preserving similarity or some other quantity modelled as a ratio or proportion (e.g., taste, shape).
  Comparison: Students are presented with four values (a, b, c and d) in the form of two ratios (a/b and c/d). The task is to determine if the two ratios are equivalent.
  Equivalence: A subtype of comparison questions. Students are presented with ratio comparison or equivalent fraction problems in arithmetic symbols without a story context.
  Invariance: Students are presented with a ratio relationship and asked to judge if the ratio relationship is the same or different under a changed condition.
  Relative/Absolute: Students are presented with a scenario in which they are required to make a judgement about a given relationship to solve for an unknown. Students must decide if the relationship to solve is relative (i.e., using constant ratios a/b = c/d) or absolute (i.e., using constant differences a - b = c - d).
  Covariance: Students are asked to rearrange collections of colored chips or other objects while preserving a ratio relationship of colors to numbers. This was done using figures on paper, and thus it was possible for students to come up with a rearrangement that used a different number of chips than was originally present.

Figure 3: Task types and descriptions from the pilot study.
Item Part Type Note
1.1 a,b Invariance Shading task
1.1 c Covariance
1.2 - Missing Value
1.3 - Missing Value
1.4 - Missing Value
1.5 a,b Relative/Absolute
1.6 - Relative/Absolute
1.7 - Similarity Scaling task
1.8 - Missing Value
1.9 - Similarity
1.10 - Invariance Not scored
1.11 b Invariance
1.12 - Equivalence
1.13 - Equivalence
1.14 - Equivalence
Table 1: Items on Exam 1.
Item Part Type Note
2.1 a,b Invariance Shading task
2.2 a Missing Value
2.2 b Missing Value
2.3 - Comparison
2.4 - Missing Value
2.5 - Comparison
2.6 - Invariance
2.7 a Similarity Scaling task
2.8 - Missing Value
2.9 - Similarity
2.10 - Invariance
2.11 a Comparison
2.11 b Similarity
2.12 - Missing Value
2.13 - Context-free MV Not scored
2.14 - Equivalence
2.15 - Equivalence
Table 2: Items on Exam 2.
the cognitive model in mind, as well as the specific aspects of proportional reasoning that the task type
is probing. For example, the cognitive model specifies that students with lower levels of sophistication
often make qualitative comparisons, so the Comparison coding incorporates a qualitative strategy. The
cognitive model also specifies that, when asked to describe the invariant features of a ratio relationship,
students with lower sophistication will rely on surface features as opposed to the ratio. Thus the Invariance
coding incorporates a surface strategy. And although a student may use count up or subtraction methods
on Relative/Absolute tasks, the coding is simply a reflection of the student's recognition of an absolute task
context, as that is the specific aspect of proportional reasoning that Relative/Absolute tasks were designed
to measure.
Figure 4 displays three different strategy codings for Missing Value tasks (similar codings for all task
types are shown in Figures 5 through 10). The initial coding for the data is very rich but is difficult to use
with such a small data set. The Modified Vergnaud coding condenses the strategies into seven general
categories, but retains more detail in the erroneous strategies than the Vergnaud model described in Figure
1. It also condenses Functional and Scalar strategies into one single multiplicative category. This was done
because there were very few ( < 10) instances in the pilot study where students used functional strategies,
but there were a greater number of observed erroneous strategies. Merging functional and scalar strategies
into one simple multiplicative strategy may make it harder to distinguish students in Stage IV vs. Stage V,
whereas breaking erroneous strategies into four separate strata may help in distinguishing between Stage III
and Stage IV students.
Finally, the Valid/Invalid coding is a very coarse measurement of valid vs. invalid strategies. In this
case, a valid strategy is one that will lead to the correct answer if implemented without simple arithmetic
errors. An invalid strategy is one that is not recognizable or that will not lead to a correct answer even if
implemented without simple arithmetic errors. Preliminary data analysis indicated that correctness of answer
was independent of task given strategy choice. The features of each task influenced the student's strategy
choice, and the correctness or incorrectness of the answer was a result of the implementation of the chosen
strategy. Thus, valid vs. invalid strategy choice is closely related to right vs. wrong answer, but is more
stringent toward guessing (invalid strategy which may lead to the right answer).
Within each level (valid or invalid), the strategies still have a measure of order. For instance, though all
are valid strategies, additive strategies are the least sophisticated, followed by cross-multiplication, followed
by multiplicative strategies. Among invalid strategies, a non-recognizable strategy (N/C) is not as sophisticated
as a strategy with a misconception of the covarying relationship, which in turn is not as sophisticated as an
incorrect implementation strategy (i.e., a faulty implementation of an otherwise valid strategy).

[Figure 4 compares three coding schemes for Missing Value tasks. The Initial Coding distinguishes fine-grained strategies (Functional Operator, Scalar Operator, Scalar Decomposition, Factor of Change, Unit Value, Cross-Multiplication, Build Up, Count Out, Incorrect Addition, Product, Quotient, Inverse, Estimation, Erroneous Repeated Addition, Erroneous Build Up, Squaring, Additive/Multiplicative, Partial Multiplicative, Incomplete, Other). The Modified Vergnaud coding collapses these into Multiplicative, Cross-Multiplication, Additive, Incorrect Implementation, Misconceptions, Meaningless Multiplicative, and N/C; the Valid/Invalid coding further collapses the first three of these categories into Valid and the last four into Invalid.]

Figure 4: Three different coding schemes for Missing Value tasks.
The final codings and descriptions for each task type are defined in Figures 5 through 10. Tables 3 and
4 show the breakdown of responses from the pilot study for each item. The specific questions asked on each
item are shown in the Appendix.
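As a concrete illustration of how such a coding can be represented, the sketch below (Python; names and layout are illustrative, not part of the study materials) stores the Missing Value / Similarity codes of Figure 5 and collapses them to the dichotomous valid/invalid response used in Section 3.

    # Missing Value / Similarity strategy codes (Figure 5): codes 0-2 are valid,
    # codes 3-6 are invalid. Illustrative representation only.
    MISSING_VALUE_CODES = {
        0: ("Multiplicative", "valid"),
        1: ("Cross-Multiplication", "valid"),
        2: ("Additive", "valid"),
        3: ("Incorrect Implementation", "invalid"),
        4: ("Misconceptions", "invalid"),
        5: ("Meaningless Multiplicative", "invalid"),
        6: ("Non-Classifiable", "invalid"),
    }

    def to_valid_invalid(code: int) -> int:
        """Collapse a polytomous strategy code to the dichotomous coding (1 = valid, 0 = invalid)."""
        return 1 if MISSING_VALUE_CODES[code][1] == "valid" else 0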
Missing Value / Similarity: Students are given three of four values (a, b, and c), two of the values presented as a ratio (a/b), and are asked to solve for the missing value. Similarity problems are a subtype that specifically ask the students to scale ratios in order to preserve similarity or other quantities (e.g., taste, shape).

  # Name (Abbreviation)                   Valid/Invalid  Description
  0 Multiplicative (Mult.)                Valid          Student uses some form of Functional or Scalar strategy.
  1 Cross-Multiplication (X-Mult.)        Valid          Student uses Cross-Multiplication.
  2 Additive                              Valid          Student uses a Build-Up or Count Out strategy.
  3 Incorrect Implementation (Inc. Imp.)  Invalid        Student displays some knowledge of the covarying relationship but makes a mathematical or conceptual error.
  4 Misconceptions (Misconcep.)           Invalid        Student substitutes the multiplicative covariance relationship with addition, squaring, or a combination of addition and multiplication.
  5 Meaningless Multiplicative (M. Mult.) Invalid        Student chooses two numbers and multiplies or divides them.
  6 Non-Classifiable (N/C)                Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 5: Final coding used for Missing Value and Similarity tasks.
Comparison: Students are presented with two ratios (a/b and c/d) and asked to determine if the ratios are equivalent.

  # Name (Abbreviation)                   Valid/Invalid  Description
  0 Multiplicative (Mult.)                Valid          Student uses some form of Functional or Scalar strategy.
  1 Cross-Multiplication (X-Mult.)        Valid          Student uses Cross-Multiplication.
  2 Additive                              Valid          Student uses a Build-Up or Count Out strategy.
  3 Incorrect Implementation (Inc. Imp.)  Invalid        Student displays some knowledge of the covarying relationship but makes a mathematical or conceptual error.
  4 Qualitative                           Invalid        Student uses a comparison with the words "more" or "less", either with addition or non-computationally.
  5 Non-Classifiable (N/C)                Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 6: Final coding used for Comparison tasks.
Equivalence: Students are presented with three-way comparison problems without a story context.

  # Name (Abbreviation)              Valid/Invalid  Description
  0 Multiplicative (Mult.)           Valid          Student uses some form of Functional or Scalar strategy.
  1 Additive                         Valid          Student uses a Build-Up or Count Out strategy.
  2 Misconceptions (Misconcep.)      Invalid        Student substitutes the multiplicative equality with addition, squaring, or a combination of addition and multiplication.
  3 Non-Computational (Non-Comp.)    Invalid        Student matches surface patterns of the fractions or gives a non-computational reason (e.g. "it just looks right").
  4 Elimination/Guess (Elim./Guess)  Invalid        Student chooses a multiple-choice answer via elimination or guessing.
  5 Non-Classifiable (N/C)           Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 7: Final coding used for Equivalence tasks.
Invariance: Students are presented with a ratio relationship and asked to judge if the relationship changes under a changed condition.

  # Name (Abbreviation)     Valid/Invalid  Description
  0 Ratio                   Valid          Student recognizes an invariant ratio relationship.
  1 Non-Ratio               Invalid        Student mentions a mathematical invariance (either correct or incorrect) that is not a ratio relationship.
  2 Surface                 Invalid        Student focuses on a surface feature of the task (e.g. color, shape).
  3 Non-Classifiable (N/C)  Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 8: Final coding used for Invariance tasks.
Relative/Absolute: Students are presented with common missing value settings but must recognize that the relationship is absolute (a - b = c - d) rather than relative (a/b = c/d).

  # Name (Abbreviation)                 Valid/Invalid  Description
  0 Absolute                            Valid          Student employs a form of absolute reasoning (a - b = c - d).
  1 Erroneous Absolute (Err. Absolute)  Invalid        Student employs a form of absolute reasoning but makes a conceptual error when implementing the absolute strategy.
  2 Relative                            Invalid        Student employs a form of relative reasoning (a/b = c/d).
  3 Non-Classifiable (N/C)              Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 9: Final coding used for Relative/Absolute tasks.
Covariance: Students are asked (on paper) to rearrange collections of colored chips while preserving a ratio relationship of colors to numbers.

  # Name (Abbreviation)            Valid/Invalid  Description
  0 Ratio and Totals (Pres. Both)  Valid          Student preserves the ratio relationship, the overall total number of chips, and the subtotals of red and blue chips.
  1 Ratio Only                     Invalid        Student preserves the ratio relationship, but not the total numbers of chips.
  2 All Totals (Total (RB))        Invalid        Student preserves the total number of chips and the totals for red and blue, but not the ratio relationship.
  3 Overall Total (Overall)        Invalid        Student preserves only the overall total number of chips.
  4 Non-Classifiable (N/C)         Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 10: Final coding used for Covariance tasks.
3 Dichotomous Coding with a Single Latent Trait: Rasch Model
3.1 Method
As a student's developmental stage increases, the student should gain facility in choosing valid strategies
on tasks, regardless of the type of proportional reasoning task or of item characteristics unrelated to proportional
reasoning, such as arithmetic difficulty. Thus, ranking students based on their facility in choosing valid vs.
invalid strategies should also capture patterns of response outlined in the five stages of the cognitive model
from section 1.2. To explore this conjecture, we fitted the Rasch (1980) model to the response variables
    X_ij = 1 if student i chose a valid strategy on item j, and 0 if student i chose an invalid strategy on item j,

and examined changes in strategy choice with respect to increasing latent score. The Rasch model, common
in item response theory (IRT), defines latent variables θ_i and difficulty parameters β_j, such that

    P_j(θ_i) = P(X_ij = 1) = exp(θ_i - β_j) / (1 + exp(θ_i - β_j))     (2)

In this context, the variable θ_i represents the propensity of student i to make a valid strategy choice (X_ij = 1),
whereas the variable β_j measures the difficulty of item j. The model also assumes local independence:
Item Strategies Total
Missing Value Mult. X-Mult Additive Inc. Imp. Misconcep. M. Mult. N/C
1.2 46 2 16 0 0 4 0 68
2.4 34 0 9 0 1 3 2 49
1.4 51 3 11 0 0 8 3 76
2.2a 33 0 21 0 0 1 2 57
2.2b 26 0 0 11 0 8 12 57
2.9 31 1 2 0 11 5 6 56
2.8 28 0 18 0 0 0 11 57
1.3 60 1 2 1 0 1 3 68
1.8 32 6 16 0 0 7 7 68
2.12 12 2 5 17 6 2 13 57
Similarity Mult. X-Mult Additive Inc. Imp. Misconcep. M. Mult. N/C
1.7 5 6 1 2 46 1 7 68
2.7a 5 1 3 3 25 0 20 57
2.11b 30 2 2 1 6 4 12 57
1.9 15 3 0 2 0 0 48 68
Comparison Mult. X-Mult Additive Inc. Imp. Qualitative N/C
2.3 31 1 4 8 5 8 57
2.5 14 0 4 14 12 13 57
2.11a 22 0 0 1 24 10 57
Table 3: Observed strategies by item from the pilot study: Tasks stressing overall facility with proportional
reasoning.
Item Strategies Total
Equivalence Mult. Additive Misconcep. Non-Comp Elim/Guess N/C
1.12 24 1 17 7 2 17 68
1.13 38 2 13 1 0 14 68
1.14 29 7 15 0 2 15 68
2.14 10 4 13 4 4 22 57
2.15 17 3 11 1 2 23 57
Invariance Ratio Non-Ratio Surface N/C
1.1ab 5 6 56 1 68
2.1ab 6 8 43 0 57
1.11b 16 0 0 16 32
2.6 27 0 0 21 48
2.10 34 0 0 23 57
Relative/Absolute Absolute Err. Absolute Relative N/C
1.5 22 3 43 0 68
1.6 39 15 15 5 74
Covariance Pres. Both Ratio Only Total(RB) Overall N/C
1.1c 35 12 9 3 9 68
Table 4: Observed strategies by item from the pilot study: Tasks highlighting singular aspects of proportional
reasoning.
For N students and J items, the X_ij are viewed as conditionally independent given θ_i, β_j, yielding the
likelihood

    ∏_{i=1}^{N} ∏_{j=1}^{J} [P_j(θ_i)]^{X_ij} [1 - P_j(θ_i)]^{1 - X_ij}.
Parameter estimates were obtained via a Markov Chain Monte Carlo (MCMC) method, incorporating
a Metropolis-within-Gibbs sampling scheme. The β_j parameters were given Normal(0, 1) priors. The θ_i
parameters were given Normal(0, σ²) priors, with σ estimated.
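The sampler itself is not spelled out in the text. The following sketch (Python/NumPy) illustrates one way a Metropolis-within-Gibbs scheme of this general form could be implemented; the step size, number of iterations, and the inverse-gamma draw used to update the variance of the θ prior are assumptions made for illustration, not the authors' actual implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def rasch_prob(theta, beta):
        # P(X_ij = 1) = exp(theta_i - beta_j) / (1 + exp(theta_i - beta_j)) for all pairs
        return 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))

    def loglik(X, p):
        # Bernoulli log likelihood, clipped away from 0 and 1 for numerical safety
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return X * np.log(p) + (1 - X) * np.log(1 - p)

    def metropolis_within_gibbs(X, n_iter=2000, step=0.5):
        """Random-walk Metropolis updates of theta and beta within a Gibbs cycle."""
        N, J = X.shape
        theta, beta, sigma2 = np.zeros(N), np.zeros(J), 1.0
        theta_draws, beta_draws = [], []
        for _ in range(n_iter):
            # update theta_i (conditionally independent given beta, so vectorized)
            prop = theta + step * rng.normal(size=N)
            log_acc = (loglik(X, rasch_prob(prop, beta)).sum(axis=1)
                       - loglik(X, rasch_prob(theta, beta)).sum(axis=1)
                       - 0.5 * (prop**2 - theta**2) / sigma2)      # Normal(0, sigma^2) prior
            theta = np.where(np.log(rng.uniform(size=N)) < log_acc, prop, theta)
            # update beta_j
            prop = beta + step * rng.normal(size=J)
            log_acc = (loglik(X, rasch_prob(theta, prop)).sum(axis=0)
                       - loglik(X, rasch_prob(theta, beta)).sum(axis=0)
                       - 0.5 * (prop**2 - beta**2))                # Normal(0, 1) prior
            beta = np.where(np.log(rng.uniform(size=J)) < log_acc, prop, beta)
            # update sigma^2 via a conjugate inverse-gamma draw (hyperprior assumed here)
            sigma2 = (1.0 + 0.5 * np.sum(theta**2)) / rng.gamma(shape=1.0 + N / 2)
            theta_draws.append(theta.copy())
            beta_draws.append(beta.copy())
        return np.array(theta_draws), np.array(beta_draws)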
To check goodness of fit for the underlying Rasch model, response curves for each item were fit as a
function of θ. That is, for item j, we plotted

    P_j(θ) = exp(θ - β_j) / (1 + exp(θ - β_j))

for a range of θ values. Non-parametric curve estimates were obtained by calculating the average response

    Σ_{i ∈ B} X_ij / #{i ∈ B}

for a suitable bin B of students, and plotting this point at the average θ_i value for the bin B.
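For instance, with 15 students per bin as used in Section 3.2, the non-parametric points could be computed as follows (a small illustrative helper, not the authors' code):

    import numpy as np

    def binned_response_curve(x_j, theta_hat, bin_size=15):
        """Average observed response per bin of students ordered by estimated theta.

        Returns the mean theta and the observed proportion of valid strategies
        for each bin; the final bin may hold fewer than bin_size students.
        """
        order = np.argsort(theta_hat)
        bin_theta, bin_prop = [], []
        for start in range(0, len(order), bin_size):
            idx = order[start:start + bin_size]
            bin_theta.append(theta_hat[idx].mean())
            bin_prop.append(x_j[idx].mean())
        return np.array(bin_theta), np.array(bin_prop)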
Outfit statistics were also calculated for each item to test goodness of fit. From Johnson, Cohen and
Junker (1999), the outfit statistic for item j calculated from observed values x for N students is
    T_j(x | θ, β) = Σ_{i=1}^{N} (x_ij - E_ij)² / (N W_ij),

where E_ij is the expected value of X_ij given θ_i and β_j, and W_ij is the variance of X_ij given θ_i and β_j. Posterior
predictive p-values of the statistic can be estimated using the MCMC output. Specifically, let x^1, ..., x^M be
data simulated from the model with corresponding θ and β parameters from steps 1, ..., M of the chain. The
predictive p-value can be approximated as

    p_j ≈ #{s : T_j(x | θ^s, β^s) < T_j(x^s | θ^s, β^s), s = 1, ..., M} / M.
If the value of p_j is small (< 0.05, for example) for a particular item j, then there may be reason for concern
about the model's fit to that particular item.
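In code, the outfit statistic and its posterior predictive p-value can be approximated from the MCMC draws roughly as follows (an illustrative sketch; the variable names are assumptions):

    import numpy as np

    def outfit(x_j, theta, beta_j):
        """Outfit statistic T_j: average squared residual standardized by the Bernoulli variance."""
        p = 1.0 / (1.0 + np.exp(-(theta - beta_j)))   # E_ij under the Rasch model
        w = p * (1.0 - p)                             # W_ij = Var(X_ij | theta_i, beta_j)
        return np.mean((x_j - p) ** 2 / w)

    def outfit_ppp(x_j, theta_draws, beta_j_draws, rng=np.random.default_rng(0)):
        """Share of MCMC draws where the observed outfit is below the outfit of simulated data."""
        count = 0
        for theta, beta_j in zip(theta_draws, beta_j_draws):
            p = 1.0 / (1.0 + np.exp(-(theta - beta_j)))
            x_rep = rng.binomial(1, p)                # replicate data x^s from the model
            count += outfit(x_j, theta, beta_j) < outfit(x_rep, theta, beta_j)
        return count / len(theta_draws)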
The developmental stages were approximated by grouping students according to their estimated θ_i val-
ues. Based on the literature (from Baxter and Junker, November 2001), the majority of middle school
students should be in Stage IV, with smaller proportions of students in Stages III and V, and essentially no
students in Stages I or II. To approximate these, students with θ_i values in the lowest 20% were classified
as Stage III, students in the middle 60% were classified as Stage IV, and students in the highest 20% were
classified as Stage V.
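This classification amounts to cutting the ranked posterior means at the 20th and 80th percentiles, e.g. (illustrative sketch):

    import numpy as np

    def classify_stages(theta_hat):
        """Approximate developmental stages from estimated theta: lowest 20% -> Stage III,
        middle 60% -> Stage IV, highest 20% -> Stage V."""
        lo, hi = np.quantile(theta_hat, [0.20, 0.80])
        return np.where(theta_hat < lo, "Stage III",
                        np.where(theta_hat <= hi, "Stage IV", "Stage V"))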
By monotonicity of the response functions in equation (2), we know that as a student's stage increases,
that student has a higher propensity of picking valid strategies over invalid strategies. However, the Rasch
model does not detail what kind of valid strategies are chosen, or what kind of invalid strategies are chosen.
To explore this, we conditioned on valid vs. invalid strategy choice and studied changes in sophistication
for the sub-types of valid strategies and the sub-types of invalid strategies as approximated stage increases.
In order to get a measure of variability, conditional percentages were computed for approximated stages
based on ranked θ_i values at each step of the Markov chain. This method re-assigned strategy choices to the
three developmental stages in groups by student. The standard errors take into account variability between
students but cannot account for the variability of answers from a single student.
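One way to carry out this computation from the MCMC output is sketched below (an illustration under assumed data structures, not the authors' code): at each draw the students are re-ranked into stages, each stage's pooled responses are restricted to (say) the valid codes, and the spread of the resulting percentages across draws gives the reported standard errors.

    import numpy as np

    def conditional_percentages(theta_draws, codes, valid_codes):
        """Mean and Monte Carlo standard error of strategy percentages by stage.

        theta_draws : (M, N) posterior draws of theta for N students
        codes       : (N, R) strategy codes for one task type (-1 marks no response)
        valid_codes : codes counted as valid; percentages are conditional on validity
        """
        M = theta_draws.shape[0]
        stages = ("Stage III", "Stage IV", "Stage V")
        pct = {st: {c: [] for c in valid_codes} for st in stages}
        for m in range(M):
            lo, hi = np.quantile(theta_draws[m], [0.20, 0.80])
            stage_idx = np.where(theta_draws[m] < lo, 0,
                                 np.where(theta_draws[m] <= hi, 1, 2))
            for s, st in enumerate(stages):
                resp = codes[stage_idx == s].ravel()
                resp = resp[np.isin(resp, valid_codes)]   # condition on a valid choice
                for c in valid_codes:
                    pct[st][c].append(np.mean(resp == c) if resp.size else np.nan)
        return {st: {c: (np.nanmean(v), np.nanstd(v)) for c, v in d.items()}
                for st, d in pct.items()}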
3.2 Results
For this analysis, Markov Chains of 20,000 iterations were run, using candidate distributions tuned to yield
acceptance of between 50% and 70% of the θ and β components at each step. The first 1000 iterations
from each chain were discarded as burn-in. As a diagnostic for assessing convergence, Geweke's (1992)
statistic comparing means of the beginning and end of the chain showed no serious divergences for one
block of parameters. The other block was slow-mixing, and Geweke's diagnostic showed signs of non-stationarity
for all but three of these parameters. However, quantile-quantile plots (first half of remaining observations
vs. second half of remaining observations) for the problematic parameters did not indicate any serious
deviations, and density plots were generally symmetric and unimodal. Estimates of θ_i for each student and
β_j for each item were obtained using posterior means.
Outfit predictive p-values for the 30 items in the pilot study are shown in Table 5. With the exception of
perhaps items 2.15 and 1.6, the outfit statistics do not indicate a lack of fit. Fitted response curves for these
two items are shown in Figure 11. The intervals N were chosen to incorporate 15 observations to each bin,
although the bin corresponding to the highest values may contain fewer than 15 observations. While item
2.15 has a smooth plotted response curve, item 1.6 seems to have an anomalous dip in the plotted response
curve.
Figure 12 shows the ranking of students according to estimated θ_i for valid strategy choice, while Table
6 shows the ranking of items according to the estimated β_j difficulty parameters.
Outfit Outfit
Type Item P-value Type Item P-value
Missing Value 1.2 0.8135 2.5 0.1549
2.4 0.4675 2.11a 0.4939
1.4 0.8282 Equivalence 1.12 0.4275
2.2a 0.8744 1.13 0.2319
2.2b 0.7880 1.14 0.3087
2.9 0.4866 2.14 0.8493
2.8 0.6524 2.15 0.0741
1.3 0.8311 Invariance 1.1ab 0.8872
1.8 0.7647 2.1ab 0.6506
2.12 0.6904 1.11b 0.3323
Similarity 1.7 0.5976 2.6 0.6081
2.7a 0.5964 2.10 0.7538
2.11b 0.6791 Relative/Absolute 1.5 0.4733
1.9 0.7173 1.6 0.0523
Comparison 2.3 0.2214 Covariance 1.1c 0.3784
Table 5: Approximate posterior predictive p-values for outfit statistics calculated from the binary Rasch
Model.
[Figure 11: two panels plotting observed (binned) proportions and the fitted Rasch response curve against θ, one panel for Question 2.15 and one for Question 1.6.]

Figure 11: Observed and fitted response curves for Valid Strategy Choice, items 2.15 and 1.6.
[Figure 12: plot of the ranked posterior mean θ_i values for the 125 students, with the lowest 20% marked Stage III, the middle 60% Stage IV, and the highest 20% Stage V.]

Figure 12: Ranked posterior means θ_i for 125 students. Students are grouped to approximate the three
Developmental Stages from the Cognitive Model.
Difficulty Type Item    Difficulty Type Item
-2.60 Missing Value 2.2a 0.01 Covariance 1.1c
-2.53 Missing Value 1.2 0.08 Invariance 1.11b
-2.39 Missing Value 1.3 0.17 Missing Value 2.2b
-1.97 Missing Value 2.4 0.49 Comparison 2.11a
-1.82 Missing Value 1.4 0.63 Equivalence 2.15
-1.52 Missing Value 2.8 0.67 Equivalence 1.12
-1.38 Missing Value 1.8 0.72 Missing Value 2.12
-0.61 Comparison 2.3 0.81 Comparison 2.5
-0.50 Missing Value 2.9 0.86 Relative/Absolute 1.5
-0.46 Invariance 2.10 1.16 Similarity 1.9
-0.45 Similarity 2.11b 1.19 Equivalence 2.14
-0.33 Equivalence 1.13 1.66 Similarity (II) 1.7
-0.32 Invariance 2.6 1.71 Similarity (II) 2.7a
-0.07 Equivalence 1.14 2.07 Invariance (II) 2.1ab
-0.02 Relative/Absolute 1.6 2.47 Invariance (II) 1.1ab
Table 6: Items ranked from easiest to hardest via difficulty parameter. Similarity (II) indicates a figure
scaling task. Invariance (II) indicates a shading task.
The most obvious pattern in estimated difficulty parameters is the cluster of Missing Value tasks with
low estimates. All but three of the Missing Value tasks fall in the first quartile of ranked parameters.
The cluster of low-ranked Missing Value tasks is very similar in structure. All require the student to scale
up a ratio, and all have at least one integer multiple relationship between compared quantities. Item 2.9
has a slightly higher difficulty parameter than the cluster mentioned above. Though item 2.9 has an integer
multiple relationship, the relationship also happens to be a square (7 to 49), and elicits a higher proportion
of squaring strategies than other items. Item 2.2b, ranked second in difficulty for Missing Value tasks, is the
only missing value task on either test form that asks the students to take a given ratio and scale the numbers
down to obtain a smaller result. Item 2.12 was the most difficult Missing Value task. The ratio relationships
in this item contained common divisors but no integer multiples, and the numbers were very close (9/15 to
12/20). This item elicited many incorrect implementation and incorrect addition strategies.
The most difficult items are categorized as Similarity and Invariance tasks. A (II) beside the Similarity
task type indicates that the task asks students to scale up a figure. This task is specifically mentioned in
the cognitive model as difficult for all but stage V students. Indeed, the figure scaling tasks do appear to
be among the most difficult for students. A (II) beside the Invariance task type indicates that the Invariance
task specifically asks students to describe varying and invariant characteristics of sets of shaded figures.
These shading tasks were the most difficult on the exams; students in general had a hard time discerning the
constant ratio of shaded to unshaded regions in each set of figures. These items elicited a high proportion of
surface strategies.
Table 7 shows the breakdown of valid vs. invalid strategies chosen by students in each approximated
developmental stage. Figures 13 through 15 show bar graphs of the breakdown of percentages for valid
strategies, with a (*) indicating percentage estimates with standard errors above 0.03.
In general, the patterns of response do indicate that Stage V students choose different kinds of valid
strategies than Stage III students. Specifically, students who have higher overall proficiency at choosing valid
strategies tend to choose valid multiplicative strategies more often than students with lower overall proficiency.
The valid strategies chosen by Stage III students are more often build-up strate-
gies. Figures 13-(i) and 13-(ii) indicate that for Missing Value and Similarity tasks cross-multiplication
was a strategy chosen by high-ranked students, whereas Figure 14-(i) shows that for Comparison tasks
cross-multiplication was much less frequently chosen, and it was chosen by lower-ranked students. This
pattern agrees with the expected observed performance outlined in the qualitative cognitive model, as well
as highlighting the procedural nature of cross-multiplication.
Task Type Stage III Stage IV Stage V
Missing Value Valid 59 292 117
Invalid 59 81 5
% Valid 50% 78% 96%
Similarity Valid 1 43 29
Invalid 49 107 21
% Valid 2% 29% 58%
Comparison Valid 3 53 20
Invalid 24 58 13
%Valid 11% 48% 61%
Equivalence Valid 15 70 50
Invalid 51 118 14
%Valid 23% 37% 78%
Invariance Valid 4 51 33
Invalid 46 110 18
% Valid 8% 32% 65%
Relative/Absolute Valid 8 32 21
Invalid 23 50 8
%Valid 26% 39% 72%
Covariance Valid 2 22 11
Invalid 14 16 3
% Valid 13% 58% 79%
Table 7: Valid vs. Invalid responses
[Figure 13: stacked bar charts by approximated stage (St. III, St. IV, St. V) of (i) Missing Value valid strategies (Multiplicative, X-Mult., Additive), (ii) Similarity valid strategies (Multiplicative, X-Mult., Additive), (iii) Missing Value invalid strategies (Inc. Imp., Misconcep., M. Mult., N/C), and (iv) Similarity invalid strategies (Inc. Imp., Misconcep., M. Mult., N/C).]

Figure 13: Valid vs. Invalid Strategies: Missing Value and Similarity Tasks
Turning to invalid strategies, students in lower stages are more likely to choose non-classifiable
strategies. The cognitive model also mentions that students frequently employ incorrect addition strategies
on scaling Similarity tasks. This is reflected in the large percentage of Misconception strategies employed
by students from all three stages (Figures 13-(iii) and 13-(iv)). Consistent with the difficulty parameters and
the cognitive model, students in Stage III found scaling tasks exceedingly difficult, employing the largest
percentages of non-classifiable and meaningless multiplicative strategies.
Also among invalid strategies, the cognitive model predicts that students in lower stages are more likely
to respond with a qualitative strategy on Comparison tasks (Figure 14-(iii)). This is reflected in the per-
centages. Given that an invalid strategy was chosen, for students in Stage III it is most likely that the invalid
strategy is qualitative. Finally, while all of the strategies listed are invalid, as developmental stage increases
students are more likely to choose an invalid strategy that is at least closer to valid (Incorrect implementation,
Misconception) than a non-recognizable strategy.
[Figure 14: stacked bar charts by approximated stage (St. III, St. IV, St. V) of (i) Comparison valid strategies (Multiplicative, X-Mult., Additive), (ii) Equivalence valid strategies (Multiplicative, Additive), (iii) Comparison invalid strategies (Inc. Imp., Qualitative, N/C), and (iv) Equivalence invalid strategies (Misconcep., Non-Comp., Elim/Guess, N/C).]

Figure 14: Valid vs. Invalid Strategies: Comparison and Equivalence Tasks
Invariance, Relative/Absolute and Covariance tasks (Figure 15) had only one valid strategy to employ,
thus the only conditioning that goes beyond the Rasch model is for invalid strategies. These invalid strategies
are all shown in Figure 15. For Invariance tasks, both the non-ratio and non-classifiable patterns behave
as is expected, with students in higher stages choosing the more sophisticated invalid strategies more of-
ten, given that an invalid strategy has been chosen. However, the Surface strategy, notable as a strategy
chosen by lower-stage students, is prominent among Stage V students as well. The shading tasks in partic-
ular elicit a large number of surface strategies. The same pattern is notable for the Relative strategy in
Relative/Absolute tasks. With these tasks, even higher level students sometimes resort to less sophisticated
strategies.
[Figure 15: stacked bar charts by approximated stage (St. III, St. IV, St. V) of (i) Invariance invalid strategies (Non-Ratio, Surface, N/C), (ii) Relative/Absolute invalid strategies (Err. Absolute, Relative, N/C), and (iii) Covariance invalid strategies (Ratio Only, Total(RB), Overall, N/C).]

Figure 15: Invalid Strategies: Invariance, Relative/Absolute and Covariance Tasks
For the Covariance task the standard errors were quite high. There was only one Covariance task asked
and hence few responses. The most notable feature of the breakdown in Figure 15 is the large percentage
of Stage III students who maintained the ratio relationship. However, ignoring the total numbers of chips
appears to make it easier to maintain the ratio relationship; accounting for the totals made doing so more
difficult.
3.3 Discussion
The Rasch model employs a very simple binary coding of the data and models the latent variable as a single
continuous trait. Interpreting the latent scale as a simple ability parameter does not capture the fact that
students with low proficiency choose different kinds of valid strategies than students with high proficiency.
Whether through a polytomous data coding or through specific item design, it is critical to incorporate this information
into the assessment.
Some patterns in strategy choice conditional on valid vs. invalid appear consistent with the cognitive
model. When employing valid strategies, students with low proficiency tend to choose additive strategies
more often than students with higher proficiency. When choosing invalid strategies, students with low profi-
ciency tend to choose non-classifiable strategies more often than students with higher proficiency. Difficulty
parameters also reflect descriptions in the cognitive model; specifically, scaling tasks, which are noted as
difficult for all but stage V students, were among the most difficult for students in the pilot study as well.
On the other hand, some observed patterns are not consistent with the cognitive model. Whether these
inconsistencies are due to a mis-specification of the cognitive model or to a flawed task design is unclear.
For example, although the cognitive model specifies surface features as typical of Stage II and Stage III
students, surface strategies were elicited from Stage IV and V students on shading tasks. It could be that
the specific format of shading elicits surface strategies from more sophisticated students (indicating that the
cognitive model needs to be refined). However, these items are also the first asked on the exams, and may
simply be worded too broadly for students to follow. On Relative/Absolute tasks, the invalid Relative
strategy is prominent in higher ranked students. This could be because higher ranked students have a natural
tendency to view situations as relative, or it could be a priming effect: the Relative/Absolute tasks were
asked after a string of Missing Value tasks, each of which was solved via a ratio.
Based on simple milestones from the cognitive model, such as the reliance on Build-Up vs. Multiplica-
tive strategies and the ability to solve scaling tasks, it may be possible to design an exam for which valid
vs. invalid strategy choice stratifies students quite well into the three developmental stages. Evidence for
the validity of the cognitive model can be seen in a natural, unforced set of patterns that also follow what is
outlined by experts. But in using this kind of Rasch analysis, any connection to the underlying skill patterns
that govern strategy choice must be known beforehand and incorporated into the existing cognitive model.
The latent scale does not provide enough information about underlying skill patterns. For instance, despite
the fact that Missing Value and Comparison tasks are described as comprehensive, incorporating all aspects
of proportional reasoning, these tasks seemed to be easier on the valid/invalid scale than those tasks designed
to highlight only one specific aspect of proportional reasoning, such as the Invariance and Relative/Absolute
tasks. Does this indicate that when solving Missing Value tasks, students rely on concepts and skills other
than those defined or measured by the single-aspect tasks? Do Missing Value, Similarity and Comparison
tasks rely compensatorily rather than conjunctively on the single facets of proportional reasoning? The
Rasch analysis sheds no light on the question. In order to obtain specific information on patterns of skills
and their link to patterns of strategies, polytomous coding needs to be addressed in the model, and the latent
structure needs to be expanded to include sets of discrete skills.
4 Polytomous Coding with Many Latent Skills: Bayes Net
4.1 Method
The Rasch analysis is based on a unidimensional latent variable which can be seen as a single overall
propensity to select valid strategies. The patterns of strategies based on this latent scale seem to follow the
cognitive model, but restructuring the latent space as a set of discrete skills can help pinpoint which aspects
of proportional reasoning a student has mastered. The underlying skill patterns, coupled with specific item
features, drive the students strategy choice.
The qualitative cognitive model for proportional reasoning (Figure 2) does not address the underlying
skills in each of the developmental stages. But the aspects of proportional reasoning suggested by Baxter
and Witkowski (2002) can be adapted to define a working set of skills:
- Covariance: Student understands that the change in one quantity depends on the change in another.
- Relative/Absolute: Student recognizes the fact that the change in quantities is a relative, multiplicative change as opposed to an absolute change.
- Invariance: Student recognizes the presence of a constant ratio relationship.
- Multiplicative Modeling: Student can implement a valid multiplicative strategy (as opposed to a valid additive strategy such as Build-Up).
- Adaptability: Strategy choice is generalizable and efficient for the given numbers in a problem.
These skills are related more to a psychological conceptualization than they are to mathematical imple-
mentation, with the possible exception of Invariance and Multiplicative Modeling, and it is not immediately
clear how to connect strategies with each of these skills. To study the relationships between skills and
strategies, it is necessary to use a polytomous strategy coding as opposed to a simple binary coding of
valid/invalid. For task type j with strategies k = 0, ..., k_j and r = 1, ..., r_j repetitions, we define

    X_ijr = k : Student i chooses strategy k on the r-th task of type j.
A model based on four skills (Covariance, Invariance, Relative/Absolute, and Multiplicative Modeling) is displayed schematically in Figure 16. Although the Adaptability skill defined above certainly plays a role in strategy choice for tasks measuring general ability in proportional reasoning, our data are not coded to distinguish an adaptable strategy from general multiplicative strategies. Measuring adaptability would require that some kind of optimal multiplicative strategy be identified for each item.
[Figure 16 here: the seven task types (Missing Value, Similarity, Comparison, Equivalence, Invariance, Relative/Absolute, Covariance), each with its polytomous strategy coding and number of replications, are linked through additive logistic models to the four binary skill indicators Covariance, Relative/Absolute, Invariance, and Multiplicative Modeling, with strategies grouped as valid or invalid.]
Figure 16: A model linking skills, task types and strategies for the Pilot Data.
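As a concrete illustration of the polytomous coding, the response categories for Missing Value tasks (taken from the coding shown in Figures 16 and 22) might be stored as follows. This is only a minimal sketch in Python, assuming numpy; the dictionary name and the array layout are our own illustration, not the project's actual data format.

```python
import numpy as np

# Strategy codes for Missing Value tasks, following Figures 16 and 22.
MISSING_VALUE_STRATEGIES = {
    0: "Multiplicative",            # valid baseline strategy
    1: "Cross-Multiplication",
    2: "Additive",
    3: "Incorrect Implementation",
    4: "Misconceptions",
    5: "Meaningless Multiplicative",
    6: "Non-Classifiable",
}

# X[i, j, r] = k: student i chose strategy k on the r-th replication of task
# type j.  A toy array for 3 students, one task type, 2 replications:
X = np.array([[[0, 2]],
              [[4, 6]],
              [[0, 0]]])
print(X.shape)   # (3, 1, 2): students x task types x replications
```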
To describe the model mathematically we define latent skill indicators,
$$\alpha_{is} = \begin{cases} 1 & \text{student } i \text{ possesses skill } s \\ 0 & \text{otherwise,} \end{cases}$$
and a design matrix relating skills to tasks,
$$Q_{js} = \begin{cases} 1 & \text{task type } j \text{ highlights skill } s \\ 0 & \text{otherwise.} \end{cases}$$
The model we choose to fit is a kind of Bayes Net (Mislevy, 1995), where possession of a certain skill
relevant to a particular strategy increases the probability of choosing that strategy over other strategies. The
skills act compensatorily; not all skills need be present for the implementation of a certain strategy. Observed
strategy choice is linked in our model to unobserved skills via an additive logistic regression model:
$$\log \frac{P(X_{ijr} = k \mid \boldsymbol{\alpha}_i)}{P(X_{ijr} = 0 \mid \boldsymbol{\alpha}_i)} = \lambda_{jk} + \sum_{s=1}^{S} Q_{js}\, \beta_{sjk}\, \alpha_{is}$$
with parameters:
$\lambda_{jk}$ : baseline log odds of selecting strategy $k$ (versus strategy 0) for task type $j$.
$\beta_{sjk}$ : change in the log odds of choosing strategy $k$ when $\alpha_{is}$ goes from 0 to 1.
In this model, the indicators $Q_{js}$ are fixed in advance, based on predictions from the cognitive model about which skills are relevant to strategies for each task type. The parameters $\lambda_{jk}$ and $\beta_{sjk}$, and the latent skill indicators $\alpha_{is}$, are estimated in fitting the model. The model also assumes a form of local independence: given a student's position $(\alpha_{i1}, \ldots, \alpha_{i4})$ in the latent skill space, that student's responses are viewed as independent.
For identifiability, $\lambda_{j0}$ and $\beta_{sj0}$ are set to 0 for all task types $j$ and skills $s$. For the pilot study, the baseline 0 category represents a multiplicative strategy for Missing Value, Similarity, Comparison, and Equivalence tasks, and the sole valid strategy for each of the remaining three task types.
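To make the link between skills and strategy probabilities concrete, the following minimal Python sketch (assuming numpy) computes the strategy-choice distribution implied by the additive logistic model for one student on one task type. The function name, the array layout, and the toy parameter values are our own illustration, not estimates from the pilot data.

```python
import numpy as np

def strategy_probs(alpha, q_row, lam, beta):
    """Strategy-choice probabilities for one task type under the additive
    logistic model.  alpha: (S,) 0/1 skill indicators for one student;
    q_row: (S,) row of the Q-matrix for this task type; lam: (K+1,) intercepts
    with lam[0] = 0; beta: (S, K+1) skill effects with beta[:, 0] = 0, so that
    category 0 (the valid baseline strategy) anchors the log odds."""
    logits = lam + (q_row * alpha) @ beta   # log odds of each strategy vs. strategy 0
    logits = logits - logits.max()          # stabilize before exponentiating
    p = np.exp(logits)
    return p / p.sum()

# Toy numbers (not the paper's estimates): 4 skills, 3 strategies,
# with only the last two skills linked to this task type.
lam = np.array([0.0, -0.5, 0.3])
beta = np.array([[0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.0, 1.2, -0.8],
                 [0.0, 0.9, -1.1]])
q_row = np.array([0, 0, 1, 1])
print(strategy_probs(np.array([0, 0, 1, 1]), q_row, lam, beta))
```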
Though the skills are given names and descriptions adapted from Baxter and Witkowski (2002), the definition of what each skill measures within the model is determined by the links set in the Q-matrix, by the structure of the task types (especially those task types which highlight single skills), and by inequality constraints set on the model.
The matrix Q indicates which skills are highlighted in the structure and strategy coding of each task
type. The matrix represents a compensatory relationship between latent class and observed response, as
with the linear logistic test model (Fischer, 1973), as opposed to a conjunctive relationship, as with certain
other models (Junker and Sijtsma, 2001; DiBello, Stout and Roussos, 1995). Since different strategies rely
on different sets of skills, an entry of 1 in $Q_{js}$ indicates only that skill $s$ plays a role in determining the probability structure of the strategies in task type $j$. The model is essentially a polytomous version of Maris's (1995) compensatory multiple classification latent class model.
Question Type        Covariance   Relative/Absolute   Invariance   Mult. Model
Missing Value             1               1                 1             1
Similarity                1               1                 1             1
Comparison                1               1                 1             1
Equivalence               0               0                 1             1
Invariance                0               0                 1             0
Rel/Abs                   0               1                 0             0
Covariance                1               0                 0             0
Figure 17: The (compensatory) Q-matrix.
The Q-matrix for the pilot data is shown in Figure 17. For tasks designed to measure overall ability
in proportional reasoning, all skills are highlighted. For those task types designed to pinpoint specific
areas of knowledge, Q is much more sparse. The Q-matrix highlights two skills for Equivalence tasks:
Invariance and Multiplicative Modeling. These skills were chosen because the Invariance skill stresses
the recognition of a ratio relationship and the Multiplicative Modeling skill stresses the employment of a
multiplicative strategy. Covariance and Relative/Absolute skills were seen as more psychological traits than
mathematical, needed for translating a real-world situation into mathematical terms. As the Equivalence
tasks were context-free math tasks with equations already set up, it was theorized that these tasks would
highlight the more mathematical skills.
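For reference, the Q-matrix of Figure 17 can be transcribed directly; the array below follows the entries as described in the text (overall-ability tasks highlight all four skills, Equivalence highlights Invariance and Multiplicative Modeling, and each single-aspect task highlights only its own skill). The variable names are our own.

```python
import numpy as np

TASK_TYPES = ["Missing Value", "Similarity", "Comparison", "Equivalence",
              "Invariance", "Relative/Absolute", "Covariance"]
SKILLS = ["Covariance", "Relative/Absolute", "Invariance", "Mult. Modeling"]

# Rows are task types, columns are skills, as in Figure 17.
Q = np.array([
    [1, 1, 1, 1],  # Missing Value: all skills highlighted
    [1, 1, 1, 1],  # Similarity
    [1, 1, 1, 1],  # Comparison
    [0, 0, 1, 1],  # Equivalence: Invariance and Multiplicative Modeling
    [0, 0, 1, 0],  # Invariance task highlights the Invariance skill
    [0, 1, 0, 0],  # Relative/Absolute task
    [1, 0, 0, 0],  # Covariance task
])
```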
Further constraints must be imposed to induce identifiability in the set of latent skills. This is done with
inequality or monotonicity constraints, which are generally weaker than imposing linear constraints among the parameters. Inequality constraints also help with the interpretation of skills by imposing prior relationships between skills and strategies, highlighting which strategies are more likely given certain sets of skills. For each task type $j$, an inequality constraint can take the form:
$$P(X_{ijr} = k_1 \mid \alpha_{is} = l_1) < P(X_{ijr} = k_2 \mid \alpha_{is} = l_2).$$
This allows us to encode conditions such as:
Strategy $k_1$ is less likely than strategy $k_2$ when skill $s$ is low.
Strategy $k$ is less likely when skill $s$ is low than when skill $s$ is high.
The constraints for the pilot data, shown in Table 8, implement only the first kind of condition, given a low skill level. The inequalities are designed to constrain the model loosely for identifiability while providing little information on specific skill definitions. As such, the interpretation of skills in this model will be highly dependent on the Q-matrix entries for the single-skill task types.
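A minimal sketch of how one of these constraints might be checked follows, reusing the hypothetical strategy_probs helper from the earlier sketch. The paper does not spell out the enforcement mechanism; one common approach in MCMC is simply to reject any proposal whose implied probabilities violate a constraint, and the sketch below is written with that in mind.

```python
def satisfies(probs, k_less, k_more):
    """True when strategy k_less is less probable than strategy k_more."""
    return probs[k_less] < probs[k_more]

# Table 8, Missing Value with Invariance low: P(Multiplicative) < P(Additive).
# With Missing Value codes 0 (Multiplicative) and 2 (Additive), and alpha_low
# denoting a skill pattern in which the Invariance indicator is 0:
# ok = satisfies(strategy_probs(alpha_low, Q[0], lam_mv, beta_mv), 0, 2)
# (lam_mv and beta_mv are hypothetical parameter arrays for Missing Value tasks.)
```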
4.2 Results
Parameter estimation for the pilot data was obtained via MCMC. A set of four skills yields $2^4 = 16$ latent classes. Since the cognitive model is not specific about the development or likelihood of skill patterns, a uniform prior was set on the probability of each of the $2^4$ latent classes. This translates to a uniform prior for each skill indicator $\alpha_{is}$. Normal$(0, \sigma^2 = 3)$ priors were set for each of the unconstrained $\lambda_{jk}$ and $\beta_{sjk}$ parameters, subject to the inequality constraints in Table 8. Candidate distributions were tuned to accept approximately 50% to 70% of proposals during each iteration of the Markov chain.
Chains of 10,000 iterations were run for parameter estimation, with the first 1,000 iterations discarded as burn-in. The $\lambda_{jk}$ and $\beta_{sjk}$ parameters were estimated using posterior means. Student skill patterns were estimated by examining the posterior distribution over the 16 possible skill patterns for each student.
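The text does not give the exact sampler, so the following is only a generic random-walk Metropolis sketch consistent with the description (proposal scales tuned toward a 50-70% acceptance rate, 10,000 iterations with the first 1,000 discarded, posterior means as estimates); the function and argument names are our own.

```python
import numpy as np

def random_walk_metropolis(log_post, x0, scale, n_iter=10_000, burn_in=1_000, seed=0):
    """Generic random-walk Metropolis sampler.  `log_post` should return -inf
    for parameter values that violate the inequality constraints of Table 8,
    so such proposals are automatically rejected."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    draws, accepted = [], 0
    for t in range(n_iter):
        prop = x + rng.normal(scale=scale, size=x.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        if t >= burn_in:
            draws.append(x.copy())
    return np.array(draws), accepted / n_iter   # post-burn-in draws, acceptance rate

# Posterior means, e.g. for a lambda/beta block: draws.mean(axis=0).
# `scale` would be adjusted until the acceptance rate falls in the 50-70% band.
```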
Due to the large number of item parameters and the relatively small sample size, obtaining convergent chains was problematic. Chains for parameters associated with a larger number of responses (for example, Multiplicative strategies on Missing Value tasks) were more stable than those associated with only a few responses (for example, Erroneous Absolute strategies for Relative/Absolute tasks). Posteriors were generally unimodal, with a few instances of possible label switching for parameters associated with fewer observations. The following analyses are subject to large standard errors, and we have limited discussion to the
Task Type Skill Level Constraint
Missing Value Covariance Low P(Misconception) < P(Meaningless Multiplicative)
Missing Value Relative/Absolute Low P(Incorrect Implementation) < P(Misconception)
Missing Value Invariance Low P(Multiplicative) < P(Additive)
Missing Value Mult. Modeling Low P(Multiplicative) < P(Additive)
Similarity Covariance Low P(Misconception) < P(Meaningless Multiplicative)
Similarity Relative/Absolute Low P(Incorrect Implementation) < P(Misconception)
Similarity Invariance Low P(Multiplicative) < P(Incorrect Implementation)
Similarity Mult. Modeling Low P(Multiplicative) < P(Incorrect Implementation)
Comparison Covariance Low P(Incorrect Implementation) < P(Qualitative)
Comparison Relative/Absolute Low P(Additive) < P(Incorrect Implementation)
Comparison Invariance Low P(Multiplicative) < P(Additive)
Comparison Mult. Modeling Low P(Multiplicative) < P(Additive)
Equivalence Invariance Low P(Misconception) < P(Non-Computational)
Equivalence Mult. Modeling Low P(Multiplicative) < P(Additive)
Invariance Invariance Low P(Ratio) < P(Non-Ratio)
Relative/Absolute Relative/Absolute Low P(Absolute) < P(Relative)
Covariance Covariance Low P(Ratio and Totals) < P(Ratio Only)
Table 8: Constraints set on the Discrete Bayes Net model.
strategies that had larger numbers of observations in the data set. The methodology, however, is applicable
in any setting.
4.2.1 Classifying Students
Covariance   Relative/Absolute   Invariance   Multiplicative Modeling   Modal Count   Probabilistic Count
0 0 0 0 3 4.43
0 0 0 1 5 5.69
0 0 1 0 9 8.31
0 0 1 1 10 7.56
0 1 0 0 5 7.12
0 1 0 1 9 9.59
0 1 1 0 13 10.17
0 1 1 1 8 10.38
1 0 0 0 3 5.50
1 0 0 1 6 5.61
1 0 1 0 8 7.14
1 0 1 1 8 6.82
1 1 0 0 6 6.19
1 1 0 1 5 9.77
1 1 1 0 10 8.60
1 1 1 1 17 12.12
Table 9: Number of students (out of 125) classified into each group of skill proficiencies. The modal count
is obtained by assigning each student to the most likely posterior skill pattern, while the probabilistic count
is obtained by summing over the posterior probability distribution for each student.
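The two counts in Table 9 can be computed directly from each student's posterior distribution over the 16 skill patterns. The sketch below is our own illustration (with a made-up posterior matrix), assuming numpy.

```python
import numpy as np

def pattern_counts(post):
    """post: (n_students, 16) posterior probabilities over skill patterns.
    Modal count: each student assigned to his or her most likely pattern.
    Probabilistic count: column sums of the posterior probabilities."""
    modal = np.bincount(post.argmax(axis=1), minlength=post.shape[1])
    probabilistic = post.sum(axis=0)
    return modal, probabilistic

# Toy posterior for 3 students; both counts sum to the number of students.
post = np.random.default_rng(1).dirichlet(np.ones(16), size=3)
modal, prob = pattern_counts(post)
print(modal.sum(), round(prob.sum(), 6))
```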
Four latent skills yield sixteen different skill patterns. These patterns, as well as the number of students
classified into each, are shown in Table 9. While the modal category can be used for explicit categorization
to a set of skill patterns, one can also characterize uncertainty in the classification by looking at the posterior
distribution of skill patterns for each student. To obtain a compact visual representation of this posterior
distribution, we modify a graphical display from computer engineering called a Karnaugh map or k-map
(cf. Wakerly, 2000).
The structure of a k-map is illustrated in Figure 18. For a set of four skills {A, B, C, D}, the k-map displays the skill space as a set of 16 cells in a grid. Adjacent non-diagonal cells differ from each other by only one digit, with top and bottom cells logically adjacent, and left-edge and right-edge cells logically adjacent. The topology is similar to an unrolled toroid, where rectangular regions of adjoining cells correspond to different marginal logical statements about the skills.
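A small sketch of the k-map layout: the helper below (our own, hypothetical) arranges a length-16 vector of skill-pattern probabilities into the 4 x 4 Gray-code grid, assuming the vector is indexed by the binary pattern ABCD.

```python
import numpy as np

GRAY = ["00", "01", "11", "10"]   # axis order used on a k-map

def kmap_grid(prob):
    """Arrange a length-16 vector over patterns (A,B,C,D) into a 4x4 k-map:
    rows indexed by (A,B) and columns by (C,D), both in Gray order, so that
    horizontally or vertically adjacent cells differ in exactly one skill."""
    prob = np.asarray(prob)
    grid = np.empty((4, 4))
    for r, ab in enumerate(GRAY):
        for c, cd in enumerate(GRAY):
            grid[r, c] = prob[int(ab + cd, 2)]   # 'ABCD' read as a binary index
    return grid

# A point mass on pattern 1111 lands in the (row 11, column 11) cell.
p = np.zeros(16)
p[0b1111] = 1.0
print(kmap_grid(p))
```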
[Figure 18 here: a 4 x 4 grid of the 16 binary patterns, with rows indexed by (A, B) and columns by (C, D), both in the order 00, 01, 11, 10; rectangular regions mark marginal statements such as A = 1, B = 1, C = 1, D = 1, B = 1 and C = 1, and B = 0, C = 0, D = 0.]
Figure 18: The structure of a k-map for four skills {A,B,C,D}.
K-maps can be drawn for larger sets of skills, but as the dimension increases, the size and complexity
of the k-map increases. A five-skill k-map consists of two four-skill k-maps embedded in a one-skill k-map
(see Figure 19). A six-skill k-map consists of four four-skill k-maps embedded in a two-skill k-map. K-
map representations of more than six skills are rarely used in computer engineering, as the complexity of
the 1-digit differences in high-dimensional space makes visualization difficult. However, when dealing with
probability distributions centered on a single cell, higher dimensional k-maps may still be useful. If it is
known beforehand that certain skills are easily discriminable, the k-maps can be set up so that comparison
across graphs corresponds to 1-digit differences in those skills. In that case it may still be possible to
concentrate the bulk of the distribution in one region of the k-map.
[Figure 19 here: two four-skill k-maps placed side by side, one for E = 0 and one for E = 1, with rows indexed by (A, B) and columns by (C, D, E).]
Figure 19: A k-map for five skills {A,B,C,D,E}.
Four probabilistic k-maps are illustrated in Figure 20. In the true state space, each student's place is represented by a single 1 in one of the 16 grid cells. The k-maps presented here are probabilistic because the sum over the grid is equal to 1, and the map represents an estimated posterior distribution. Each cell is shaded relative to the probability of the skill pattern, with darker shades indicating higher probability. To make patterns more visible, the map is normalized relative to the mode of the distribution, which is listed at the top of the graph. Thus, shading across students is not comparable.
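For display, each student's 4 x 4 grid can be shaded after dividing by its mode, as just described. The matplotlib sketch below is our own (the function name is hypothetical); the axis labels follow Figure 20, and the input is a 4 x 4 grid such as the one produced by the kmap_grid sketch above.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_probabilistic_kmap(grid, student_id):
    """Shade a 4x4 probabilistic k-map, normalized by its mode so the darkest
    cell is always the modal skill pattern (shading is therefore not
    comparable across students)."""
    grid = np.asarray(grid, dtype=float)
    plt.imshow(grid / grid.max(), cmap="Greys", vmin=0.0, vmax=1.0)
    plt.xticks(range(4), ["00", "01", "11", "10"])
    plt.yticks(range(4), ["00", "01", "11", "10"])
    plt.xlabel("Invariance, Multiplicative Modeling")
    plt.ylabel("Covariance, Relative/Absolute")
    plt.title(f"Student {student_id} : mode = {grid.max():.2f}")
    plt.show()
```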
In Figure 20, student 238 has a very strong peak (probability 0.57) at the k-map cell corresponding to possession of all four skills, with the density falling away quickly at adjoining squares. Student 245, however, has the bulk of the distribution spread across two adjoining squares. The 1-digit difference between these squares corresponds to the Relative/Absolute skill. This indicates that while there is strong certainty that the student possesses the Covariance, Invariance, and Multiplicative Modeling skills, there is still uncertainty regarding the student's possession of relative vs. absolute reasoning. For student 314, the distribution
[Figure 20 here: four probabilistic k-maps, with rows indexed by (Covariance, Relative/Absolute) and columns by (Invariance, Multiplicative Modeling). Panels: Student 238, mode = 0.57; Student 245, mode = 0.41; Student 314, mode = 0.26; Student 220, mode = 0.18.]
Figure 20: The distribution of skill patterns for four different students using a probabilistic k-map.
indicates a high probability for Invariance and a low probability for Multiplicative Modeling, but uncertainty in both the Covariance and Relative/Absolute skills. Student 220 has the most diffuse distribution of the four students shown, with the mass centered on 0 for Invariance but with uncertainty in the remaining three skills.
4.2.2 Interpreting Skills through Examination of Logistic Regression Coefficients
In comparison to the Rasch model, the Bayes net provides flexibility for inference at the student level.
Looking at posterior distributions provides information on specific skill patterns, and diffuse patterns on
one skill axis can also indicate where the model may fail for particular students, or which skills may be
poorly elicited from the test. But while the Bayes net model highlights student-specific inference on skills,
interpretation of the skills is more challenging, because the meanings of skills are derived from constraints
and the compensatory link between skills and strategy choice outlined by the Q-matrix.
One way of assessing the links between skills and strategies is to examine task-specific probabilities of skill possession given strategy response. Define a matrix with $C = 2^4 = 16$ rows and $S = 4$ columns, as given by the left-hand side of Table 9; the entry $\alpha_{cs}$ represents possession of skill $s$ in the $c$-th latent class. For task type $j$, response $k$, and latent class $c$, we compute
$$P_c(j, k) = \frac{\exp\left(\sum_{s=1}^{S} Q_{js}\, \alpha_{cs}\, \beta_{sjk}\right)}{\sum_{l=1}^{C} \exp\left(\sum_{s=1}^{S} Q_{js}\, \alpha_{ls}\, \beta_{sjk}\right)} \qquad (3)$$
If we assume a uniform prior on the $C$ latent classes, then
$$P_c(j, k) = P(\text{latent class } c \mid \text{response } k \text{ to task type } j).$$
When the prior on latent classes is not uniform, $P_c(j, k)$ does not have a simple Bayesian interpretation.
However, it can still be used to assess the information about skills that is provided by strategies for each task type. Estimates of $P_c(j, k)$ can be obtained by substituting the posterior mean estimates of the $\beta_{sjk}$ parameters into equation 3. Across all skill patterns $c$, $P_c(j, k)$ is a probability distribution. This distribution is task-specific because it uses only information from the estimated $\beta_{sjk}$ parameters, and does not take into account any information on the marginal likelihood of the skill patterns (in the form of the latent classes). The variability of the estimate can be obtained through direct approximation using the MCMC output, or through the delta method.
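A minimal sketch of equation 3 in Python (assuming numpy) follows; the function name is our own, `beta_j` would hold the posterior mean $\beta_{sjk}$ estimates for one task type, and the pattern matrix reproduces the left half of Table 9.

```python
import numpy as np

def pcjk(beta_j, q_row, patterns):
    """Task-specific distribution over skill patterns given each response
    (equation 3).  beta_j: (S, K+1) skill effects for task type j with
    beta_j[:, 0] = 0; q_row: (S,) Q-matrix row; patterns: (C, S) 0/1 matrix
    of all skill patterns.  Returns a (C, K+1) array whose k-th column is
    P_c(j, k) over the C patterns."""
    logits = (patterns * q_row) @ beta_j        # numerator exponents
    logits = logits - logits.max(axis=0)        # stabilize each column
    p = np.exp(logits)
    return p / p.sum(axis=0)                    # normalize over latent classes

# The 2^4 = 16 skill patterns in the order of Table 9 (0000, 0001, ..., 1111):
patterns = np.array([[(c >> b) & 1 for b in (3, 2, 1, 0)] for c in range(16)])
```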
The probability distributions in equation 3 also have a useful graphical interpretation using k-maps (see Figure 21). On the logit scale, the $\beta_{sjk}$ term is associated with the two horizontal or vertical lines in the k-map corresponding to the 1-digit differences for skill $s$. A k-map based on the probabilities in equation
3 is essentially a way of visualizing in one graph all of the $\beta_{sjk}$ parameters for task type $j$ and response $k$. We use these k-maps to interpret skills in terms of tasks and strategies by examining which tasks and strategies particular skills load on, similar to interpreting the matrix of factor loadings in a factor analysis.
[Figure 21 here: the 4 x 4 grid of skill patterns, rows indexed by (A, B) and columns by (C, D); arrows mark the horizontal and vertical moves that add or subtract the $\beta_{sjk}$ terms for skills A, B, C, and D on the logit scale, starting from the baseline cell.]
Figure 21: A k-map representation of the conditional probability of skill patterns given task type $j$, response $k$ (logit scale) for four skills {A, B, C, D}. Starting from the baseline category, each horizontal and vertical line on the k-map represents adding or subtracting a $\beta_{sjk}$ term in the exponent of the numerator.
Figures 22 through 28 show the estimated marginal probabilities of response for each category, as well as k-map representations of the estimated task-specific probability of skill patterns given each response ($P_c(j, k)$, $c = 1, \ldots, 16$), for the seven task types. The marginal probabilities give an idea of the frequency of each strategy across all students, while the k-maps show the kinds of skills that are present in students who chose those strategies.
Constraining $\beta_{sjk} = 0$ through the Q-matrix is equivalent to looking at a coarsening of the grid in the k-map;
[Figure 22 here: a bar chart of marginal strategy probabilities for Missing Value tasks and a conditional k-map for each strategy (0. Multiplicative, 1. Cross-Multiplication, 2. Additive, 3. Incorrect Implementation, 4. Misconceptions, 5. Meaningless Multiplicative, 6. Non-Classifiable), with k-map rows indexed by (Covariance, Relative/Absolute) and columns by (Invariance, Multiplicative Modeling).]
Figure 22: Marginal and conditional probabilities: Missing Value
[Figure 23 here: marginal strategy probabilities for Similarity tasks and conditional k-maps for strategies 0 through 6, with the same layout and strategy labels as Figure 22.]
Figure 23: Marginal and conditional probabilities: Similarity
certain one-digit differences will yield no change in the probability. But unconstrained skills which do not impact a response pattern will also coarsen the k-map in the same way. For example, the Q-matrix for Missing Value, Similarity, and Comparison tasks highlights all skills and does not fix any $\beta_{sjk}$ parameters at 0. The multiplicative response for Missing Value tasks (Figure 22) appears to support specific inference in all four skill areas; the darkest cell and its four darkest neighbors correspond to possession of at least three of the four skills. However, the multiplicative response for Similarity tasks (Figure 23) highlights a coarser region corresponding to possession of the Invariance and Relative/Absolute skills, yielding more uncertainty about the Covariance and Multiplicative Modeling skills.
[Figure 24 here: marginal strategy probabilities for Comparison tasks and conditional k-maps for strategies 0. Multiplicative, 1. Cross-Multiplication, 2. Additive, 3. Incorrect Implementation, 4. Qualitative, 5. Non-Classifiable, with the same k-map layout as Figure 22.]
Figure 24: Marginal and conditional probabilities: Comparison
For both Missing Value (Figure 22) and Comparison (