Model Specification for Cognitive Assessment
of Proportional Reasoning
Rhiannon Weaver
Department of Statistics
Carnegie Mellon University
Brian Junker
Department of Statistics
Carnegie Mellon University
January 20, 2004
Abstract
In modern psychometric analysis of cognitive assessment, there is a choice between psychometric vs. cognitive
science paradigms for modeling the latent scale. The first involves as few as one continuous latent ability parameter,
while the second focuses on a set of binary latent skills. When the expert cognitive model is qualitatively specified
(e.g., paragraphs describing general trends in observable behavior for a set of developmental stages), interpretation
of responses and latent variables is flexible, and may even be ambiguous. Models incorporating aspects from both
psychometric and cognitive science paradigms can help in exploring response patterns, and refining both the exam
design and the cognitive model. Here we present two such analyses of a pilot study of proportional reasoning. The
first is a Rasch model using binary response coding with a continuous latent trait, which can approximate a set of
developmental stages through careful item design and milestones. The second is a Bayes net model using polytomous
response coding linked compensatorily to a set of latent skills, which allows for a factor-analytic approach to skill
interpretation. We explore each scheme's usefulness in inferring a student's current state of knowledge, directing
program planners, and directing assessment developers toward refining either the qualitative cognitive model or exam
items.
The work presented in this paper was supported in part by NSF grant # ESI-9876538 to the Learning Research and Development
Center, University of Pittsburgh, and in part by a graduate fellowship awarded under NSF VIGRE grant #DMS-9819950. The
authors would also like to thank Gail Baxter, Lou DiBello and Christine Witkowski of the Educational Testing Service in Princeton
NJ for their collaboration in the project.
Contents

1 Introduction
  1.1 Latent Variable Models for Cognitive Diagnosis
  1.2 Proportional Reasoning
  1.3 Analysis of an Assessment Test for Proportional Reasoning
2 Data
  2.1 The Study Design
  2.2 Response Strategy Coding
3 Dichotomous Coding with a Single Latent Trait: Rasch Model
  3.1 Method
  3.2 Results
  3.3 Discussion
4 Polytomous Coding with Many Latent Skills: Bayes Net
  4.1 Method
  4.2 Results
    4.2.1 Classifying Students
    4.2.2 Interpreting Skills through Examination of Logistic Regression Coefficients
  4.3 Goodness of Fit for the Bayes Net
  4.4 Methodological Concerns
    4.4.1 Assessing the Influence of Individual Students
    4.4.2 Contradictory Skills: Model Misspecification or Model Misfit?
    4.4.3 Prior Sensitivity
5 Future Work
6 Suggestions for Future Design
7 Summary
8 References
9 Appendix
  9.1 Exam Forms
  9.2 Conditional Percentage Estimates and Standard Errors: Rasch Model
  9.3 Regression Estimates and Standard Errors: Bayes Net
  9.4 Student K-maps
1 Introduction
1.1 Latent Variable Models for Cognitive Diagnosis
In educational settings, diagnostic assessments serve as barometers of student performance. A cognitive
model is developed which outlines the progression of understanding that a student undergoes as he or
she learns a particular concept. From this cognitive model, the assessment test itself is designed. The
test consists of a set of items that yield observable results based on a student's unobserved position in
the cognitive model. In contrast to a purely evaluative assessment geared toward measuring a student's
mastery of a specific curriculum, diagnostic assessments are also used as teaching tools; they give instructors
feedback on how to adapt teaching styles to help students having trouble, and help program planners to adapt
curricula when evidence shows a general lack of understanding across all students.
Latent variable models are used extensively in cognitive diagnosis, with a range of modeling schemes.
Most schemes can be described through a two-way hierarchical structure (Junker, 1999), with the manifest
variables (ie, observed responses) and task features at the first level, and latent variables (ie, unobserved
student parameters) at the second level. Manifest variables can be coded as dichotomous or polytomous; the
choice of how to represent the observed responses depends on the item design and the detail, or granularity,
coded in the latent space. In modern psychometric analysis, there is a choice between as few as one continu-
ous generic ability parameter for the latent space vs. a larger-dimensional set of traditionally discrete skills.
This distinction has been referred to as a psychometric vs. cognitive science representation of the student's
knowledge (Hunt, 1995).
Junker (1999) surveys a number of models for statistical analysis under these frameworks. He notes
that the main task in model construction is not one of choosing a continuous vs. discrete latent scale, but
in choosing the granularity of the model. This choice is dictated by the specific research situation and
questions of interest. The models Junker surveys can be used as a general guideline or set of building blocks
to accommodate a given cognitive model for a specific situation.
Item Response Theory (IRT) models based on a single latent trait are simpler than multidimensional
models for estimation and inference, but often it is difficult to interpret the ability parameter when directing
instructors toward a student's particular weaknesses. However, when the cognitive model indicates a single
progression of understanding, the latent trait can be used to classify students according to a more detailed
set of developmental stages by relying on careful item design and milestones. This is similar to the approach
of Masters and Forster (1999) and Draney, Pirolli and Wilson (1995). In this scheme, the latent trait can
capture a richer underlying structure based on skills without specifically adding the complexity of skills into
the model.
Skills-based models often focus on breaking a task into a set of known sub-tasks and considering each
sub-task as a skill. Skills can then be deterministically linked to items in a conjunctive way, with possession
of all skills required for a correct response. But it can sometimes be difficult to see how the possession of a
particular skill is related to the response. This is especially true when the responses themselves are strategies
a student employs in constructing their answer to a word problem. In this case, it may make sense to first
explore the relationship between strategies and underlying skills through a broader, compensatory model.
Compensatory models, so named because possession of one latent skill can compensate for the lack of
another, have been explored both for multidimensional latent trait variables (eg, Wilson, Wood and Gibbons,
1983; Muraki and Carlson, 1995) and for discrete latent skill variables (eg, Maris, 1995). Minimally con-
strained models of this type allow for an exploratory factor-analytic interpretation of the latent scale, and
highly constrained models allow for confirmatory factor analysis. This kind of analysis can help in defining
and interpreting a discrete skill space for a broadly specified cognitive model.
Both IRT and compensatory discrete skills models can be useful in situations which do not immediately
fall into either of the psychometric or cognitive science extremes. Such is the case for proportional reasoning,
in which facility is measured through both strategy choice and the implementation of the chosen strategy. Mastering
proportional reasoning involves not only mathematical competency, but also a psychological familiarity
that can be hard to quantify when deconstructing tasks into component steps. This makes it difficult to
model proportional reasoning with a skills-based cognitive science approach. On the other hand, addressing
the psychological facets of proportional reasoning with a psychometric model gives little insight into why
students choose the strategies they do.
1.2 Proportional Reasoning
The term proportional reasoning is used to describe any kind of reasoning that focuses on the relation-
ship between two ratios. Proportional reasoning is involved in everyday mathematical situations such as
calculating the correct dosage for medicine, converting a measurement from one unit to another (pounds
to kilograms, Celsius to Fahrenheit), or calculating estimated arrival times for travel. Generic proportional
reasoning tasks require the student to reason with the equation
    a/b = c/d     (1)
Though proportional reasoning is taught in a mathematics setting and can often be reduced to a simple
linear model y = kx, it is actually a psychological construct: "The essential characteristic of proportional
reasoning is that it focuses on describing, predicting or evaluating the relationship between two relationships
(i.e., a second-order relationship) rather than simply a relationship between two concrete objects (or two
directly perceivable quantities)" (Piaget & Inhelder, 1975, via Lesh, Post and Behr, 1992). Students first
learning proportional reasoning rarely approach the subject from a purely mathematical standpoint. It is an
important concept in middle school mathematics because it is often a student's first exposure to explicitly
modeling these kinds of second-order relationships. To master proportional reasoning, a student must be
able to (Baxter and Witkowski, January 2002):
- Conceive of a multiplicative relationship and possess a notion of change in a relative sense;
- Recognize that when two quantities are changing, the change in one depends on the change in the other (covariance);
- Recognize that while some aspects of the situation change, the ratio relationship remains constant (invariance); and
- Employ an appropriate multiplicative strategy to solve problems.
Solving proportional reasoning tasks often reduces to solving the relationship in equation 1 for a missing
value, as in Figure 1. Such tasks are called Missing Value tasks. However, student strategy in achieving the
correct answer varies. Vergnaud (1983) has developed extensive models for student strategies in proportional
reasoning tasks, and a version of his model for strategies in Missing Value tasks appears in Figure 1.
Typically, the student conceives of two measure spaces, where each space represents one of the units
in the problem (candy, money, inches, etc). The student can then compare differing values within a single
measure space or similar values between measure spaces. Comparisons within measure spaces involve
scale factors, and lead to so-called scalar strategies, whereas comparisons between measure spaces involve
divining the functional relationship between two quantities, and lead to so-called functional strategies. After
making a comparison, students may arrive at the correct answer through an additive method (build up or
count out), or through multiplication or division. A common error is to replace the multiplicative ratio
model with an additive model, substituting subtraction for division. Cross-multiplication is seen as a separate
strategy from other multiplicative strategies because it is procedural in nature; students are taught to set the
problem up as ad = bc and solve; however, it is not clear whether students employing this strategy understand
the second-order relationship entailed in proportional reasoning.
Baxter and Junker (November, 2001) proposed a broad, developmental stages model for proportional
reasoning in middle school students (see Figure 2). This qualitative developmental cognitive model is based
on an extensive literature review and on interviews with subject matter experts. The model is qualitative in
Example: The pizza shop advertises that 3 pizzas will serve about 10 people. How much pizza should I buy
if there will be 50 people at my party?
  Measure-space layout:                     Pizza example:
              Measure 1    Measure 2                    Pizza    People
    Value 1   a            b                            3        10
    Value 2   c            (d)?                         ?        50

  Reasoning        Strategy               Calculation      Solution
  Multiplicative   Functional             b/a = k          d = ck
  Multiplicative   Scalar                 c/a = l          d = bl
  Multiplicative   Cross-Multiplication   ad = bc          d = bc/a
  Additive         Build Up               c/a = l          d = b + ... + b (l times)
  Erroneous        Addition               a - b = c - d    d = b + (c - a)
  Other            Other                  Squaring, Estimation, etc.   Varied

Figure 1: A Vergnaud Model for Missing Value tasks in proportional reasoning.
that it does not explicitly list a quantifiable set of underlying skills that students learn, but instead proposes
stages of development and a general overview of observed student performance in each stage. The stages
are ordered in that they represent a progression of understanding, with higher stages indicating a higher level
of sophistication in proportional reasoning.
Sophistication in this sense embodies an ability first to set up a framework that outlines the correct
relationships between quantities, and then to generalize solution strategies and approach the arithmetic from
a context-free, abstract point of view. Much of this sophistication is evidenced not only by the student's
ability to compute the right answer, but also by the strategy that the student employs in the computation. More
sophisticated strategies are those that indicate more facility with the concepts of proportional reasoning.
Thus the cognitive model outlines expected performance not only in terms of which kinds of tasks the student
will get right or wrong, but also in terms of strategies the students will employ when solving problems.
1.3 Analysis of an Assessment Test for Proportional Reasoning
This paper focuses on data collected from a diagnostic test developed from Baxter and Junker's proposed
cognitive model for proportional reasoning. Section 2 outlines the diagnostic test, study design and data
collection. The overall goal of the study is to describe how the diagnostic assessment test fares in describing
a student's current state of knowledge. This goal is two-fold, in that a well-designed test should be able to:
  I) Qualitative: Young students generally possess a good deal of knowledge about quantity that permits them to answer questions about more and less (e.g., which drink is sweeter?) or fairness (e.g., divide pizza or cookies so everyone gets a fair share).
  II) Early Attempts at Quantifying: Early attempts at quantifying often involve constant additive differences (i.e., a - b = c - d) rather than multiplicative relationships.
  III) Recognition of Multiplicative Relationship: Students have the intuition that a ratio is two numbers that change together, but the change may be additive or multiplicative. They often rely on additive strategies such as build up when multiplicative reasoning is required. Situations involving absolute change are not always distinguishable from situations involving relative change.
  IV) Accommodating Covariance and Invariance: Students begin to develop a multiplicative change model. They recognize that while some quantities may be changing, relationships among the quantities remain invariant. They view a ratio as a single unit to which basic arithmetic operations may be applied. They can typically distinguish situations involving absolute change from those involving relative change. Strategy use is context-specific, and when the numbers are hard these students may resort to additive reasoning in multiplicative situations. Concepts of covariance fail when students are asked to scale up a figure.
  V) Functional and Scalar Relationships: Students recognize the invariant nature of the relationships between pairs of changing quantities. These students have a repertoire of generalizable strategies and they select the most efficient strategy for a given problem. Conceptions of covariance and invariance are well developed.

Figure 2: The proposed cognitive model for development of proportional reasoning (Baxter and Junker,
November 2001).
- Reflect student progress toward targeted learning goals;
- Make clear gaps in students' understanding.
In terms of model development, these goals are achieved by specifying a cognitive model, translating the
cognitive model into a meaningful latent structure, and designing items that pinpoint a student's place on the
chosen latent scale. The latent structure and the response coding can either enhance or frustrate inference
relating to these goals. Sections 3 and 4 outline two different choices of latent structure and response coding:
a univariate continuous latent variable with a dichotomous response coding, and a set of binary latent skills
with polytomous response coding.
The simpler latent structure and coding scheme described in Section 3 can address the first goal, reflect-
ing student progress toward learning goals. Given a set of ordered developmental stages with associated
patterns of observable performance, it is plausible that test items can be designed such that increasing so-
phistication leads to increasing general ability to generate right vs. wrong answers. Classification then is a
simple matter of relative ranking. Almost all of the work for inference on the latent scale is done behind
the scenes: the latent structure gains a meaningful interpretation of developmental stages due to careful
item designs specifically targeted at milestones in development. This scheme is very similar to the profi-
ciency/difficulty scale advocated by Masters and colleagues (Masters and Forster, 1999; Draney, Pirolli and
Wilson, 1995; Masters and Evans, 1986). Though it can be informative relative to an existing set of develop-
mental stages, this scheme relies heavily on a single progression of learning, and limits inference on where
the underlying cognitive model may fail for a particular student.
The analysis described in Section 4, using discrete latent skills and polytomous response coding, can
address the second goal of explaining why students choose particular strategy patterns. With this approach,
we can define developmental stages as clusters of likely skill patterns, but we are not limited to knowledge
a priori of such clusters. This approach addresses more flexibly the problem of defining gaps in student
understanding, and it allows for a broader classification of students according to which skills they pos-
sess. However, with increased complexity comes increased difficulty in interpretation. Skills gain meaning
through both the task types that highlight them and the strategies they elicit. These relationships can be
difficult to define in the mathematical context of the model.
Sections 5, 6 and 7 conclude with a discussion of what our analyses tell us about the process of
developing cognitive diagnostic tests for broadly-defined developmental stages or skills, in terms of defining
and pursuing specific goals in inference, data requirements for pursuing those goals, and the use of pilot
studies to refine and redesign future tests.
2 Data
2.1 The Study Design
Data for this analysis comes from a pilot study conducted in a major urban school district in May 2001
(Baxter and Witkowski, 2002), to support the design of a cognitively diagnostic assessment in proportional
reasoning. Two forms of a test for proportional reasoning were given to 125 students, taken from three
different middle schools. Exam 1 had 14 items, while exam 2 had 15 items.
In general it is very difficult to deduce strategy choice from the student's final answer alone.
Instead, for this analysis the strategies themselves were coded as the observed response. In order to achieve
this, students were interviewed as they took the diagnostic test and were recorded via audiotape. In most
cases, a written transcript of the interview was also provided. The student's numerical answer and written
work were also recorded. From these transcripts, interviews and written answers, a general strategy was
coded by Baxter and Witkowski (2002) for each item.
Seven major kinds of tasks were asked on the forms. These are described in Figure 3. The first three task
types (Missing Value, Similarity, and Comparison) were designed to test overall facility with proportional
reasoning. The last four task types (Equivalence, Invariance, Relative/Absolute, and Covariance) were
designed to pinpoint specific aspects of reasoning involved in a proportional reasoning task. Tables 1 and
2 show the breakdown of task types on the two exam forms. Tasks 1.10 and 2.13 were not scored. Students
did not perform as expected on these tasks, and after inspection it was decided that these tasks were not
strictly proportional reasoning tasks.
The qualitative cognitive model describes observable performance not only in terms of correct vs. in-
correct answer, but also in terms of the specific strategies employed to solve each problem. Because of
this specificity, it was necessary to interview the students during the exam and to record transcripts of their
solution strategies. There is some flexibility in how to code the observed strategies so that the data set will
be rich enough to capture the cognitive model but sparse enough for inference with a relatively small sample
of students.
2.2 Response Strategy Coding
The ability to make inferences on the latent cognitive model depends not only on the specificity of the
item design, but also on how the responses are coded. Due to the small sample size, relatively coarse
codings for each task type were needed. These codings were chosen with the observed performance from
  Missing Value: Students are given three of four values (a, b and c), two of the values are presented as a ratio (a/b), and the task is to determine the missing value d such that a/b = c/d.
  Similarity: A subtype of missing value questions involving scaling. Students are asked to scale the given quantities up or down, preserving similarity or some other quantity modelled as a ratio or proportion (e.g., taste, shape).
  Comparison: Students are presented with four values (a, b, c and d) in the form of two ratios (a/b and c/d). The task is to determine if the two ratios are equivalent.
  Equivalence: A subtype of comparison questions. Students are presented with ratio comparison or equivalent fraction problems in arithmetic symbols without a story context.
  Invariance: Students are presented with a ratio relationship and asked to judge if the ratio relationship is the same or different under a changed condition.
  Relative/Absolute: Students are presented with a scenario in which they are required to make a judgement about a given relationship to solve for an unknown. Students must decide if the relationship to solve is relative (i.e., using constant ratios a/b = c/d) or absolute (i.e., using constant differences a - b = c - d).
  Covariance: Students are asked to rearrange collections of colored chips or other objects while preserving a ratio relationship of colors to numbers. This was done using figures on paper, and thus it was possible for students to come up with a rearrangement that used a different number of chips than was originally present.

Figure 3: Task types and descriptions from the pilot study.
Item Part Type Note
1.1 a,b Invariance Shading task
1.1 c Covariance
1.2 - Missing Value
1.3 - Missing Value
1.4 - Missing Value
1.5 a,b Relative/Absolute
1.6 - Relative/Absolute
1.7 - Similarity Scaling task
1.8 - Missing Value
1.9 - Similarity
1.10 - Invariance Not scored
1.11 b Invariance
1.12 - Equivalence
1.13 - Equivalence
1.14 - Equivalence
Table 1: Items on Exam 1.
Item Part Type Note
2.1 a,b Invariance Shading task
2.2 a Missing Value
2.2 b Missing Value
2.3 - Comparison
2.4 - Missing Value
2.5 - Comparison
2.6 - Invariance
2.7 a Similarity Scaling task
2.8 - Missing Value
2.9 - Similarity
2.10 - Invariance
2.11 a Comparison
2.11 b Similarity
2.12 - Missing Value
2.13 - Context-free MV Not scored
2.14 - Equivalence
2.15 - Equivalence
Table 2: Items on Exam 2.
the cognitive model in mind, as well as the specific aspects of proportional reasoning that the task type
is probing. For example, the cognitive model specifies that students with lower levels of sophistication
often make qualitative comparisons, so the Comparison coding incorporates a qualitative strategy. The
cognitive model also specifies that, when asked to describe the invariant features of a ratio relationship,
students with lower sophistication will rely on surface features as opposed to the ratio. Thus the Invariance
coding incorporates a surface strategy. And although a student may use count up or subtraction methods
on Relative/Absolute tasks, the coding is simply a reflection of the student's recognition of an absolute task
context, as that is the specific aspect of proportional reasoning that Relative/Absolute tasks were designed
to measure.
Figure 4 displays three different strategy codings for Missing Value tasks (similar codings for all task
types are shown in Figures 5 through 10). The initial coding for the data is very rich but is difficult to use
with such a small data set. The Modified Vergnaud coding condenses the strategies into seven general
categories, but retains more detail in the erroneous strategies than the Vergnaud model described in Figure
1. It also condenses Functional and Scalar strategies into one single multiplicative category. This was done
because there were very few ( < 10) instances in the pilot study where students used functional strategies,
but there were a greater number of observed erroneous strategies. Merging functional and scalar strategies
into one simple multiplicative strategy may make it harder to distinguish students in Stage IV vs. Stage V,
whereas breaking erroneous strategies into four separate strata may help in distinguishing between Stage III
and Stage IV students.
Finally, the Valid/Invalid coding is a very coarse measurement of valid vs. invalid strategies. In this
case, a valid strategy is one that will lead to the correct answer if implemented without simple arithmetic
errors. An invalid strategy is one that is not recognizable or that will not lead to a correct answer even if
implemented without simple arithmetic errors. Preliminary data analysis indicated that correctness of answer
was independent of task given strategy choice. The features of each task influenced the student's strategy
choice, and the correctness or incorrectness of the answer was a result of the implementation of the chosen
strategy. Thus, valid vs. invalid strategy choice is closely related to right vs. wrong answer, but is more
stringent toward guessing (invalid strategy which may lead to the right answer).
Within each level (valid or invalid), the strategies still have a measure of order. For instance, though all
are valid strategies, additive strategies are the least sophisticated, followed by cross-multiplication, followed
by multiplicative strategies. Among invalid strategies, a non-recognizable strategy (N/C) is not as sophisticated
as a strategy with a misconception of the covarying relationship, which in turn is not as sophisticated as an
incorrect implementation strategy (i.e., a faulty implementation of an otherwise valid strategy).

[Figure 4 compares three coding schemes for Missing Value tasks. The Initial Coding distinguishes fine-grained strategies (Functional Operator, Scalar Operator, Scalar Decomposition, Factor of Change, Unit Value, Cross-Multiplication, Build Up, Count Out, Incorrect Addition, Product, Quotient, Inverse, Estimation, Erroneous Repeated Addition, Erroneous Build Up, Squaring, Additive/Multiplicative, Partial Multiplicative, Incomplete, Other). The Modified Vergnaud coding collapses these into Multiplicative, Cross-Multiplication, Additive, Incorrect Implementation, Misconceptions, Meaningless Multiplicative, and N/C; the Valid/Invalid coding further collapses the first three of these categories into Valid and the last four into Invalid.]

Figure 4: Three different coding schemes for Missing Value tasks.
The final codings and descriptions for each task type are defined in Figures 5 through 10. Tables 3 and
4 show the breakdown of responses from the pilot study for each item. The specific questions asked on each
item are shown in the Appendix.
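As a concrete illustration of how such a coding can be represented, the sketch below (Python; names and layout are illustrative, not part of the study materials) stores the Missing Value / Similarity codes of Figure 5 and collapses them to the dichotomous valid/invalid response used in Section 3.

    # Missing Value / Similarity strategy codes (Figure 5): codes 0-2 are valid,
    # codes 3-6 are invalid. Illustrative representation only.
    MISSING_VALUE_CODES = {
        0: ("Multiplicative", "valid"),
        1: ("Cross-Multiplication", "valid"),
        2: ("Additive", "valid"),
        3: ("Incorrect Implementation", "invalid"),
        4: ("Misconceptions", "invalid"),
        5: ("Meaningless Multiplicative", "invalid"),
        6: ("Non-Classifiable", "invalid"),
    }

    def to_valid_invalid(code: int) -> int:
        """Collapse a polytomous strategy code to the dichotomous coding (1 = valid, 0 = invalid)."""
        return 1 if MISSING_VALUE_CODES[code][1] == "valid" else 0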
Missing Value / Similarity: Students are given three of four values (a, b, and c), two of the values presented as a ratio (a/b), and are asked to solve for the missing value. Similarity problems are a subtype that specifically ask the students to scale ratios in order to preserve similarity or other quantities (e.g., taste, shape).

  # Name (Abbreviation)                   Valid/Invalid  Description
  0 Multiplicative (Mult.)                Valid          Student uses some form of Functional or Scalar strategy.
  1 Cross-Multiplication (X-Mult.)        Valid          Student uses Cross-Multiplication.
  2 Additive                              Valid          Student uses a Build-Up or Count Out strategy.
  3 Incorrect Implementation (Inc. Imp.)  Invalid        Student displays some knowledge of the covarying relationship but makes a mathematical or conceptual error.
  4 Misconceptions (Misconcep.)           Invalid        Student substitutes the multiplicative covariance relationship with addition, squaring, or a combination of addition and multiplication.
  5 Meaningless Multiplicative (M. Mult.) Invalid        Student chooses two numbers and multiplies or divides them.
  6 Non-Classifiable (N/C)                Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 5: Final coding used for Missing Value and Similarity tasks.
Comparison: Students are presented with two ratios (a/b and c/d) and asked to determine if the ratios are equivalent.

  # Name (Abbreviation)                   Valid/Invalid  Description
  0 Multiplicative (Mult.)                Valid          Student uses some form of Functional or Scalar strategy.
  1 Cross-Multiplication (X-Mult.)        Valid          Student uses Cross-Multiplication.
  2 Additive                              Valid          Student uses a Build-Up or Count Out strategy.
  3 Incorrect Implementation (Inc. Imp.)  Invalid        Student displays some knowledge of the covarying relationship but makes a mathematical or conceptual error.
  4 Qualitative                           Invalid        Student uses a comparison with the words "more" or "less", either with addition or non-computationally.
  5 Non-Classifiable (N/C)                Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 6: Final coding used for Comparison tasks.
Equivalence: Students are presented with three-way comparison problems without a story context.

  # Name (Abbreviation)              Valid/Invalid  Description
  0 Multiplicative (Mult.)           Valid          Student uses some form of Functional or Scalar strategy.
  1 Additive                         Valid          Student uses a Build-Up or Count Out strategy.
  2 Misconceptions (Misconcep.)      Invalid        Student substitutes the multiplicative equality with addition, squaring, or a combination of addition and multiplication.
  3 Non-Computational (Non-Comp.)    Invalid        Student matches surface patterns of the fractions or gives a non-computational reason (e.g. "it just looks right").
  4 Elimination/Guess (Elim./Guess)  Invalid        Student chooses a multiple-choice answer via elimination or guessing.
  5 Non-Classifiable (N/C)           Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 7: Final coding used for Equivalence tasks.
Invariance: Students are presented with a ratio relationship and asked to judge if the relationship changes under a changed condition.

  # Name (Abbreviation)     Valid/Invalid  Description
  0 Ratio                   Valid          Student recognizes an invariant ratio relationship.
  1 Non-Ratio               Invalid        Student mentions a mathematical invariance (either correct or incorrect) that is not a ratio relationship.
  2 Surface                 Invalid        Student focuses on a surface feature of the task (e.g. color, shape).
  3 Non-Classifiable (N/C)  Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 8: Final coding used for Invariance tasks.
Relative/Absolute: Students are presented with common missing value settings but must recognize that the relationship is absolute (a - b = c - d) rather than relative (a/b = c/d).

  # Name (Abbreviation)                 Valid/Invalid  Description
  0 Absolute                            Valid          Student employs a form of absolute reasoning (a - b = c - d).
  1 Erroneous Absolute (Err. Absolute)  Invalid        Student employs a form of absolute reasoning but makes a conceptual error when implementing the absolute strategy.
  2 Relative                            Invalid        Student employs a form of relative reasoning (a/b = c/d).
  3 Non-Classifiable (N/C)              Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 9: Final coding used for Relative/Absolute tasks.
Covariance: Students are asked (on paper) to rearrange collections of colored chips while preserving a ratio relationship of colors to numbers.

  # Name (Abbreviation)            Valid/Invalid  Description
  0 Ratio and Totals (Pres. Both)  Valid          Student preserves the ratio relationship, the overall total number of chips, and the subtotals of red and blue chips.
  1 Ratio Only                     Invalid        Student preserves the ratio relationship, but not the total numbers of chips.
  2 All Totals (Total (RB))        Invalid        Student preserves the total number of chips and the totals for red and blue, but not the ratio relationship.
  3 Overall Total (Overall)        Invalid        Student preserves only the overall total number of chips.
  4 Non-Classifiable (N/C)         Invalid        Student uses a non-classifiable or non-recognizable strategy.

Figure 10: Final coding used for Covariance tasks.
3 Dichotomous Coding with a Single Latent Trait: Rasch Model
3.1 Method
As a student's developmental stage increases, the student should gain facility in choosing valid strategies
on tasks, regardless of the type of proportional reasoning task or of item characteristics unrelated to proportional
reasoning, such as arithmetic difficulty. Thus, ranking students based on their facility in choosing valid vs.
invalid strategies should also capture patterns of response outlined in the five stages of the cognitive model
from section 1.2. To explore this conjecture, we fitted the Rasch (1980) model to the response variables
    X_ij = 1 if student i chose a valid strategy on item j, and 0 if student i chose an invalid strategy on item j,

and examined changes in strategy choice with respect to increasing latent score. The Rasch model, common
in item response theory (IRT), defines latent variables θ_i and difficulty parameters β_j, such that

    P_j(θ_i) = P(X_ij = 1) = exp(θ_i - β_j) / (1 + exp(θ_i - β_j))     (2)

In this context, the variable θ_i represents the propensity of student i to make a valid strategy choice (X_ij = 1),
whereas the variable β_j measures the difficulty of item j. The model also assumes local independence:
Item Strategies Total
Missing Value Mult. X-Mult Additive Inc. Imp. Misconcep. M. Mult. N/C
1.2 46 2 16 0 0 4 0 68
2.4 34 0 9 0 1 3 2 49
1.4 51 3 11 0 0 8 3 76
2.2a 33 0 21 0 0 1 2 57
2.2b 26 0 0 11 0 8 12 57
2.9 31 1 2 0 11 5 6 56
2.8 28 0 18 0 0 0 11 57
1.3 60 1 2 1 0 1 3 68
1.8 32 6 16 0 0 7 7 68
2.12 12 2 5 17 6 2 13 57
Similarity Mult. X-Mult Additive Inc. Imp. Misconcep. M. Mult. N/C
1.7 5 6 1 2 46 1 7 68
2.7a 5 1 3 3 25 0 20 57
2.11b 30 2 2 1 6 4 12 57
1.9 15 3 0 2 0 0 48 68
Comparison Mult. X-Mult Additive Inc. Imp. Qualitative N/C
2.3 31 1 4 8 5 8 57
2.5 14 0 4 14 12 13 57
2.11a 22 0 0 1 24 10 57
Table 3: Observed strategies by item from the pilot study: Tasks stressing overall facility with proportional
reasoning.
Item Strategies Total
Equivalence Mult. Additive Misconcep. Non-Comp Elim/Guess N/C
1.12 24 1 17 7 2 17 68
1.13 38 2 13 1 0 14 68
1.14 29 7 15 0 2 15 68
2.14 10 4 13 4 4 22 57
2.15 17 3 11 1 2 23 57
Invariance Ratio Non-Ratio Surface N/C
1.1ab 5 6 56 1 68
2.1ab 6 8 43 0 57
1.11b 16 0 0 16 32
2.6 27 0 0 21 48
2.10 34 0 0 23 57
Relative/Absolute Absolute Err. Absolute Relative N/C
1.5 22 3 43 0 68
1.6 39 15 15 5 74
Covariance Pres. Both Ratio Only Total(RB) Overall N/C
1.1c 35 12 9 3 9 68
Table 4: Observed strategies by item from the pilot study: Tasks highlighting singular aspects of proportional
reasoning.
For N students and J items, the X_ij are viewed as conditionally independent given θ_i, β_j, yielding the
likelihood

    ∏_{i=1}^{N} ∏_{j=1}^{J} [P_j(θ_i)]^{X_ij} [1 - P_j(θ_i)]^{1 - X_ij}.
Parameter estimates were obtained via a Markov Chain Monte Carlo (MCMC) method, incorporating
a Metropolis-within-Gibbs sampling scheme. The β_j parameters were given Normal(0, 1) priors. The θ_i
parameters were given Normal(0, σ²) priors, with σ estimated.
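The sampler itself is not spelled out in the text. The following sketch (Python/NumPy) illustrates one way a Metropolis-within-Gibbs scheme of this general form could be implemented; the step size, number of iterations, and the inverse-gamma draw used to update the variance of the θ prior are assumptions made for illustration, not the authors' actual implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def rasch_prob(theta, beta):
        # P(X_ij = 1) = exp(theta_i - beta_j) / (1 + exp(theta_i - beta_j)) for all pairs
        return 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))

    def loglik(X, p):
        # Bernoulli log likelihood, clipped away from 0 and 1 for numerical safety
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return X * np.log(p) + (1 - X) * np.log(1 - p)

    def metropolis_within_gibbs(X, n_iter=2000, step=0.5):
        """Random-walk Metropolis updates of theta and beta within a Gibbs cycle."""
        N, J = X.shape
        theta, beta, sigma2 = np.zeros(N), np.zeros(J), 1.0
        theta_draws, beta_draws = [], []
        for _ in range(n_iter):
            # update theta_i (conditionally independent given beta, so vectorized)
            prop = theta + step * rng.normal(size=N)
            log_acc = (loglik(X, rasch_prob(prop, beta)).sum(axis=1)
                       - loglik(X, rasch_prob(theta, beta)).sum(axis=1)
                       - 0.5 * (prop**2 - theta**2) / sigma2)      # Normal(0, sigma^2) prior
            theta = np.where(np.log(rng.uniform(size=N)) < log_acc, prop, theta)
            # update beta_j
            prop = beta + step * rng.normal(size=J)
            log_acc = (loglik(X, rasch_prob(theta, prop)).sum(axis=0)
                       - loglik(X, rasch_prob(theta, beta)).sum(axis=0)
                       - 0.5 * (prop**2 - beta**2))                # Normal(0, 1) prior
            beta = np.where(np.log(rng.uniform(size=J)) < log_acc, prop, beta)
            # update sigma^2 via a conjugate inverse-gamma draw (hyperprior assumed here)
            sigma2 = (1.0 + 0.5 * np.sum(theta**2)) / rng.gamma(shape=1.0 + N / 2)
            theta_draws.append(theta.copy())
            beta_draws.append(beta.copy())
        return np.array(theta_draws), np.array(beta_draws)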
To check goodness of fit for the underlying Rasch model, response curves for each item were fit as a
function of θ. That is, for item j, we plotted

    P_j(θ) = exp(θ - β_j) / (1 + exp(θ - β_j))

for a range of θ values. Non-parametric curve estimates were obtained by calculating the average response

    Σ_{i ∈ B} X_ij / #{i ∈ B}

for a suitable bin B of students, and plotting this point at the average θ_i value for the bin B.
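For instance, with 15 students per bin as used in Section 3.2, the non-parametric points could be computed as follows (a small illustrative helper, not the authors' code):

    import numpy as np

    def binned_response_curve(x_j, theta_hat, bin_size=15):
        """Average observed response per bin of students ordered by estimated theta.

        Returns the mean theta and the observed proportion of valid strategies
        for each bin; the final bin may hold fewer than bin_size students.
        """
        order = np.argsort(theta_hat)
        bin_theta, bin_prop = [], []
        for start in range(0, len(order), bin_size):
            idx = order[start:start + bin_size]
            bin_theta.append(theta_hat[idx].mean())
            bin_prop.append(x_j[idx].mean())
        return np.array(bin_theta), np.array(bin_prop)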
Outfit statistics were also calculated for each item to test goodness of fit. From Johnson, Cohen and
Junker (1999), the outfit statistic for item j calculated from observed values x for N students is
    T_j(x | θ, β) = Σ_{i=1}^{N} (x_ij - E_ij)² / (N W_ij),

where E_ij is the expected value of X_ij given θ_i and β_j, and W_ij is the variance of X_ij given θ_i and β_j. Posterior
predictive p-values of the statistic can be estimated using the MCMC output. Specifically, let x^1, ..., x^M be
data simulated from the model with corresponding θ and β parameters from steps 1, ..., M of the chain. The
predictive p-value can be approximated as

    p_j ≈ #{s : T_j(x | θ^s, β^s) < T_j(x^s | θ^s, β^s), s = 1, ..., M} / M.
If the value of p_j is small (< 0.05, for example) for a particular item j, then there may be reason for concern
about the model's fit to that particular item.
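In code, the outfit statistic and its posterior predictive p-value can be approximated from the MCMC draws roughly as follows (an illustrative sketch; the variable names are assumptions):

    import numpy as np

    def outfit(x_j, theta, beta_j):
        """Outfit statistic T_j: average squared residual standardized by the Bernoulli variance."""
        p = 1.0 / (1.0 + np.exp(-(theta - beta_j)))   # E_ij under the Rasch model
        w = p * (1.0 - p)                             # W_ij = Var(X_ij | theta_i, beta_j)
        return np.mean((x_j - p) ** 2 / w)

    def outfit_ppp(x_j, theta_draws, beta_j_draws, rng=np.random.default_rng(0)):
        """Share of MCMC draws where the observed outfit is below the outfit of simulated data."""
        count = 0
        for theta, beta_j in zip(theta_draws, beta_j_draws):
            p = 1.0 / (1.0 + np.exp(-(theta - beta_j)))
            x_rep = rng.binomial(1, p)                # replicate data x^s from the model
            count += outfit(x_j, theta, beta_j) < outfit(x_rep, theta, beta_j)
        return count / len(theta_draws)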
The developmental stages were approximated by grouping students according to their estimated θ_i val-
ues. Based on the literature (from Baxter and Junker, November 2001), the majority of middle school
students should be in Stage IV, with smaller proportions of students in Stages III and V, and essentially no
students in Stages I or II. To approximate these, students with θ_i values in the lowest 20% were classified
as Stage III, students in the middle 60% were classified as Stage IV, and students in the highest 20% were
classified as Stage V.
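This classification amounts to cutting the ranked posterior means at the 20th and 80th percentiles, e.g. (illustrative sketch):

    import numpy as np

    def classify_stages(theta_hat):
        """Approximate developmental stages from estimated theta: lowest 20% -> Stage III,
        middle 60% -> Stage IV, highest 20% -> Stage V."""
        lo, hi = np.quantile(theta_hat, [0.20, 0.80])
        return np.where(theta_hat < lo, "Stage III",
                        np.where(theta_hat <= hi, "Stage IV", "Stage V"))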
By monotonicity of the response functions in equation (2), we know that as a student's stage increases,
that student has a higher propensity of picking valid strategies over invalid strategies. However, the Rasch
model does not detail what kind of valid strategies are chosen, or what kind of invalid strategies are chosen.
To explore this, we conditioned on valid vs. invalid strategy choice and studied changes in sophistication
for the sub-types of valid strategies and the sub-types of invalid strategies as approximated stage increases.
In order to get a measure of variability, conditional percentages were computed for approximated stages
based on ranked θ_i values at each step of the Markov chain. This method re-assigned strategy choices to the
three developmental stages in groups by student. The standard errors take into account variability between
students but cannot account for the variability of answers from a single student.
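One way to carry out this computation from the MCMC output is sketched below (an illustration under assumed data structures, not the authors' code): at each draw the students are re-ranked into stages, each stage's pooled responses are restricted to (say) the valid codes, and the spread of the resulting percentages across draws gives the reported standard errors.

    import numpy as np

    def conditional_percentages(theta_draws, codes, valid_codes):
        """Mean and Monte Carlo standard error of strategy percentages by stage.

        theta_draws : (M, N) posterior draws of theta for N students
        codes       : (N, R) strategy codes for one task type (-1 marks no response)
        valid_codes : codes counted as valid; percentages are conditional on validity
        """
        M = theta_draws.shape[0]
        stages = ("Stage III", "Stage IV", "Stage V")
        pct = {st: {c: [] for c in valid_codes} for st in stages}
        for m in range(M):
            lo, hi = np.quantile(theta_draws[m], [0.20, 0.80])
            stage_idx = np.where(theta_draws[m] < lo, 0,
                                 np.where(theta_draws[m] <= hi, 1, 2))
            for s, st in enumerate(stages):
                resp = codes[stage_idx == s].ravel()
                resp = resp[np.isin(resp, valid_codes)]   # condition on a valid choice
                for c in valid_codes:
                    pct[st][c].append(np.mean(resp == c) if resp.size else np.nan)
        return {st: {c: (np.nanmean(v), np.nanstd(v)) for c, v in d.items()}
                for st, d in pct.items()}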
3.2 Results
For this analysis, Markov Chains of 20,000 iterations were run, using candidate distributions tuned to yield
acceptance of between 50% and 70% of the θ and β components at each step. The first 1000 iterations
from each chain were discarded as burn-in. As a diagnostic for assessing convergence, Geweke's (1992)
statistic comparing means of the beginning and end of the chain showed no serious divergences for one
block of parameters. The other block was slow-mixing, and Geweke's diagnostic showed signs of non-stationarity
for all but three of these parameters. However, quantile-quantile plots (first half of remaining observations
vs. second half of remaining observations) for the problematic parameters did not indicate any serious
deviations, and density plots were generally symmetric and unimodal. Estimates of θ_i for each student and
β_j for each item were obtained using posterior means.
Outfit predictive p-values for the 30 items in the pilot study are shown in Table 5. With the exception of
perhaps items 2.15 and 1.6, the outfit statistics do not indicate a lack of fit. Fitted response curves for these
two items are shown in Figure 11. The intervals N were chosen to incorporate 15 observations to each bin,
although the bin corresponding to the highest values may contain fewer than 15 observations. While item
2.15 has a smooth plotted response curve, item 1.6 seems to have an anomalous dip in the plotted response
curve.
Figure 12 shows the ranking of students according to estimated θ_i for valid strategy choice, while Table
6 shows the ranking of items according to the estimated β_j difficulty parameters.
Outfit Outfit
Type Item P-value Type Item P-value
Missing Value 1.2 0.8135 2.5 0.1549
2.4 0.4675 2.11a 0.4939
1.4 0.8282 Equivalence 1.12 0.4275
2.2a 0.8744 1.13 0.2319
2.2b 0.7880 1.14 0.3087
2.9 0.4866 2.14 0.8493
2.8 0.6524 2.15 0.0741
1.3 0.8311 Invariance 1.1ab 0.8872
1.8 0.7647 2.1ab 0.6506
2.12 0.6904 1.11b 0.3323
Similarity 1.7 0.5976 2.6 0.6081
2.7a 0.5964 2.10 0.7538
2.11b 0.6791 Relative/Absolute 1.5 0.4733
1.9 0.7173 1.6 0.0523
Comparison 2.3 0.2214 Covariance 1.1c 0.3784
Table 5: Approximate posterior predictive p-values for outfit statistics calculated from the binary Rasch
Model.
[Figure 11: two panels plotting observed (binned) proportions and the fitted Rasch response curve against θ, one panel for Question 2.15 and one for Question 1.6.]

Figure 11: Observed and fitted response curves for Valid Strategy Choice, items 2.15 and 1.6.
[Figure 12: plot of the ranked posterior mean θ_i values for the 125 students, with the lowest 20% marked Stage III, the middle 60% Stage IV, and the highest 20% Stage V.]

Figure 12: Ranked posterior means θ_i for 125 students. Students are grouped to approximate the three
Developmental Stages from the Cognitive Model.
Difficulty Type Item    Difficulty Type Item
-2.60 Missing Value 2.2a 0.01 Covariance 1.1c
-2.53 Missing Value 1.2 0.08 Invariance 1.11b
-2.39 Missing Value 1.3 0.17 Missing Value 2.2b
-1.97 Missing Value 2.4 0.49 Comparison 2.11a
-1.82 Missing Value 1.4 0.63 Equivalence 2.15
-1.52 Missing Value 2.8 0.67 Equivalence 1.12
-1.38 Missing Value 1.8 0.72 Missing Value 2.12
-0.61 Comparison 2.3 0.81 Comparison 2.5
-0.50 Missing Value 2.9 0.86 Relative/Absolute 1.5
-0.46 Invariance 2.10 1.16 Similarity 1.9
-0.45 Similarity 2.11b 1.19 Equivalence 2.14
-0.33 Equivalence 1.13 1.66 Similarity (II) 1.7
-0.32 Invariance 2.6 1.71 Similarity (II) 2.7a
-0.07 Equivalence 1.14 2.07 Invariance (II) 2.1ab
-0.02 Relative/Absolute 1.6 2.47 Invariance (II) 1.1ab
Table 6: Items ranked from easiest to hardest via difficulty parameter. Similarity (II) indicates a figure
scaling task. Invariance (II) indicates a shading task.
The most obvious pattern in estimated difficulty parameters is the cluster of Missing Value tasks with
low estimates. All but three of the Missing Value tasks fall in the first quartile of ranked parameters.
The cluster of low-ranked Missing Value tasks is very similar in structure. All require the student to scale
up a ratio, and all have at least one integer multiple relationship between compared quantities. Item 2.9
has a slightly higher difficulty parameter than the cluster mentioned above. Though item 2.9 has an integer
multiple relationship, the relationship also happens to be a square (7 to 49), and elicits a higher proportion
of squaring strategies than other items. Item 2.2b, ranked second in difficulty for Missing Value tasks, is the
only missing value task on either test form that asks the students to take a given ratio and scale the numbers
down to obtain a smaller result. Item 2.12 was the most difficult Missing Value task. The ratio relationships
in this item contained common divisors but no integer multiples, and the numbers were very close (9/15 to
12/20). This item elicited many incorrect implementation and incorrect addition strategies.
The most difficult items are categorized as Similarity and Invariance tasks. A (II) beside the Similarity
task type indicates that the task asks students to scale up a figure. This task is specifically mentioned in
the cognitive model as difficult for all but stage V students. Indeed, the figure scaling tasks do appear to
be among the most difficult for students. A (II) beside the Invariance task type indicates that the Invariance
task specifically asks students to describe varying and invariant characteristics of sets of shaded figures.
These shading tasks were the most difficult on the exams; students in general had a hard time discerning the
constant ratio of shaded to unshaded regions in each set of figures. These items elicited a high proportion of
surface strategies.
Table 7 shows the breakdown of valid vs. invalid strategies chosen by students in each approximated
developmental stage. Figures 13 through 15 show bar graphs of the breakdown of percentages for valid
strategies, with a (*) indicating percentage estimates with standard errors above 0.03.
In general, the patterns of response do indicate that Stage V students choose different kinds of valid
strategies than Stage III students. Specifically, students who have higher overall proficiency at choosing valid
strategies tend to choose valid multiplicative strategies more often than students with lower overall proficiency.
The valid strategies chosen by Stage III students are more often build-up strate-
gies. Figures 13-(i) and 13-(ii) indicate that for Missing Value and Similarity tasks cross-multiplication
was a strategy chosen by high-ranked students, whereas Figure 14-(i) shows that for Comparison tasks
cross-multiplication was much less frequently chosen, and it was chosen by lower-ranked students. This
pattern agrees with the expected observed performance outlined in the qualitative cognitive model, as well
as highlighting the procedural nature of cross-multiplication.
Task Type Stage III Stage IV Stage V
Missing Value Valid 59 292 117
Invalid 59 81 5
% Valid 50% 78% 96%
Similarity Valid 1 43 29
Invalid 49 107 21
% Valid 2% 29% 58%
Comparison Valid 3 53 20
Invalid 24 58 13
%Valid 11% 48% 61%
Equivalence Valid 15 70 50
Invalid 51 118 14
%Valid 23% 37% 78%
Invariance Valid 4 51 33
Invalid 46 110 18
% Valid 8% 32% 65%
Relative/Absolute Valid 8 32 21
Invalid 23 50 8
%Valid 26% 39% 72%
Covariance Valid 2 22 11
Invalid 14 16 3
% Valid 13% 58% 79%
Table 7: Valid vs. Invalid responses
[Figure 13: stacked bar charts by approximated stage (St. III, St. IV, St. V) of (i) Missing Value valid strategies (Multiplicative, X-Mult., Additive), (ii) Similarity valid strategies (Multiplicative, X-Mult., Additive), (iii) Missing Value invalid strategies (Inc. Imp., Misconcep., M. Mult., N/C), and (iv) Similarity invalid strategies (Inc. Imp., Misconcep., M. Mult., N/C).]

Figure 13: Valid vs. Invalid Strategies: Missing Value and Similarity Tasks
Turning to invalid strategies, students in lower stages are more likely to choose non-classifiable
strategies. The cognitive model also mentions that students frequently employ incorrect addition strategies
on scaling Similarity tasks. This is reflected in the large percentage of Misconception strategies employed
by students from all three stages (Figures 13-(iii) and 13-(iv)). Consistent with the difficulty parameters and
the cognitive model, students in Stage III found scaling tasks exceedingly difficult, employing the largest
percentages of non-classifiable and meaningless multiplicative strategies.
Also among invalid strategies, the cognitive model predicts that students in lower stages are more likely
to respond with a qualitative strategy on Comparison tasks (Figure 14-(iii)). This is reflected in the per-
centages. Given that an invalid strategy was chosen, for students in Stage III it is most likely that the invalid
strategy is qualitative. Finally, while all of the strategies listed are invalid, as developmental stage increases
students are more likely to choose an invalid strategy that is at least closer to valid (Incorrect implementation,
Misconception) than a non-recognizable strategy.
[Figure 14: stacked bar charts by approximated stage (St. III, St. IV, St. V) of (i) Comparison valid strategies (Multiplicative, X-Mult., Additive), (ii) Equivalence valid strategies (Multiplicative, Additive), (iii) Comparison invalid strategies (Inc. Imp., Qualitative, N/C), and (iv) Equivalence invalid strategies (Misconcep., Non-Comp., Elim/Guess, N/C).]

Figure 14: Valid vs. Invalid Strategies: Comparison and Equivalence Tasks
Invariance, Relative/Absolute and Covariance tasks (Figure 15) had only one valid strategy to employ,
thus the only conditioning that goes beyond the Rasch model is for invalid strategies. These invalid strategies
are all shown in Figure 15. For Invariance tasks, both the non-ratio and non-classifiable patterns behave
as is expected, with students in higher stages choosing the more sophisticated invalid strategies more of-
ten, given that an invalid strategy has been chosen. However, the Surface strategy, notable as a strategy
chosen by lower-stage students, is prominent among Stage V students as well. The shading tasks in partic-
ular elicit a large number of surface strategies. The same pattern is notable for the Relative strategy in
Relative/Absolute tasks. With these tasks, even higher level students sometimes resort to less sophisticated
strategies.
[Figure 15: stacked bar charts by approximated stage (St. III, St. IV, St. V) of (i) Invariance invalid strategies (Non-Ratio, Surface, N/C), (ii) Relative/Absolute invalid strategies (Err. Absolute, Relative, N/C), and (iii) Covariance invalid strategies (Ratio Only, Total(RB), Overall, N/C).]

Figure 15: Invalid Strategies: Invariance, Relative/Absolute and Covariance Tasks
For the Covariance task the standard errors were quite high. There was only one Covariance task asked
and hence few responses. The most notable feature of the breakdown in Figure 15 is the large percentage
of Stage III students who maintained the ratio relationship. However, ignoring the total numbers of chips
appears to make it easier to maintain the ratio relationship; accounting for the totals made doing so more
difficult.
3.3 Discussion
The Rasch model employs a very simple binary coding of the data and models the latent variable as a single
continuous trait. Interpreting the latent scale as a simple ability parameter does not capture the fact that
students with low proficiency choose different kinds of valid strategies than students with high proficiency.
Whether through a polytomous data coding or through specific item design, it is critical to incorporate this information
into the assessment.
Some patterns in strategy choice conditional on valid vs. invalid appear consistent with the cognitive
model. When employing valid strategies, students with low proficiency tend to choose additive strategies
more often than students with higher proficiency. When choosing invalid strategies, students with low profi-
ciency tend to choose non-classifiable strategies more often than students with higher proficiency. Difficulty
parameters also reflect descriptions in the cognitive model; specifically, scaling tasks, which are noted as
difficult for all but stage V students, were among the most difficult for students in the pilot study as well.
On the other hand, some observed patterns are not consistent with the cognitive model. Whether these
inconsistencies are due to a mis-specification of the cognitive model or to a flawed task design is unclear.
For example, although the cognitive model specifies surface features as typical of Stage II and Stage III
students, surface strategies were elicited from Stage IV and V students on shading tasks. It could be that
the specific format of shading elicits surface strategies from more sophisticated students (indicating that the
cognitive model needs to be refined). However, these items are also the first asked on the exams, and may
simply be worded too broadly for students to follow. On Relative/Absolute tasks, the invalid Relative
strategy is prominent in higher ranked students. This could be because higher ranked students have a natural
tendency to view situations as relative, or it could be a priming effect: the Relative/Absolute tasks were
asked after a string of Missing Value tasks, each of which was solved via a ratio.
Based on simple milestones from the cognitive model, such as the reliance on Build-Up vs. Multiplica-
tive strategies and the ability to solve scaling tasks, it may be possible to design an exam for which valid
vs. invalid strategy choice stratifies students quite well into the three developmental stages. Evidence for
the validity of the cognitive model can be seen in a natural, unforced set of patterns that also follow what is
outlined by experts. But in using this kind of Rasch analysis, any connection to the underlying skill patterns
that govern strategy choice must be known beforehand and incorporated into the existing cognitive model.
The latent scale does not provide enough information about underlying skill patterns. For instance, despite
the fact that Missing Value and Comparison tasks are described as comprehensive, incorporating all aspects
of proportional reasoning, these tasks seemed to be easier on the valid/invalid scale than those tasks designed
to highlight only one specific aspect of proportional reasoning, such as the Invariance and Relative/Absolute
tasks. Does this indicate that when solving Missing Value tasks, students rely on concepts and skills other
than those defined or measured by the single-aspect tasks? Do Missing Value, Similarity and Comparison
tasks rely compensatorily rather than conjunctively on the single facets of proportional reasoning? The
Rasch analysis sheds no light on the question. In order to obtain specific information on patterns of skills
and their link to patterns of strategies, polytomous coding needs to be addressed in the model, and the latent
structure needs to be expanded to include sets of discrete skills.
4 Polytomous Coding with Many Latent Skills: Bayes Net
4.1 Method
The Rasch analysis is based on a unidimensional latent variable which can be seen as a single overall
propensity to select valid strategies. The patterns of strategies based on this latent scale seem to follow the
cognitive model, but restructuring the latent space as a set of discrete skills can help pinpoint which aspects
of proportional reasoning a student has mastered. The underlying skill patterns, coupled with specific item
features, drive the students strategy choice.
The qualitative cognitive model for proportional reasoning (Figure 2) does not address the underlying
skills in each of the developmental stages. But the aspects of proportional reasoning suggested by Baxter
and Witkowski (2002) can be adapted to define a working set of skills:
- Covariance: Student understands that the change in one quantity depends on the change in another.
- Relative/Absolute: Student recognizes the fact that the change in quantities is a relative, multiplicative change as opposed to an absolute change.
- Invariance: Student recognizes the presence of a constant ratio relationship.
- Multiplicative Modeling: Student can implement a valid multiplicative strategy (as opposed to a valid additive strategy such as Build-Up).
- Adaptability: Strategy choice is generalizable and efficient for the given numbers in a problem.
These skills are related more to a psychological conceptualization than they are to mathematical imple-
mentation, with the possible exception of Invariance and Multiplicative Modeling, and it is not immediately
clear how to connect strategies with each of these skills. To study the relationships between skills and
strategies, it is necessary to use a polytomous strategy coding as opposed to a simple binary coding of
valid/invalid. For task type j with strategies k = 0, ..., k_j and r = 1, ..., r_j repetitions, we define

    X_ijr = k : Student i chooses strategy k on the r-th task of type j.
A model based on four skills (Covariance, Invariance, Relative/Absolute, and Multiplicative Modeling) is displayed schematically in Figure 16. Although the Adaptability skill defined above certainly plays a role in strategy choice for tasks measuring general ability in proportional reasoning, our data are not coded to distinguish an adaptable strategy from general multiplicative strategies. Measuring adaptability would require that some kind of optimal multiplicative strategy be identified for each item.
[Figure 16 here: the seven task types (Missing Value, Similarity, Comparison, Equivalence, Invariance, Relative/Absolute, Covariance), each with its polytomous strategy coding and number of replications, are linked through additive logistic models to the four binary skill indicators Covariance, Relative/Absolute, Invariance, and Multiplicative Modeling, with strategies grouped as valid or invalid.]
Figure 16: A model linking skills, task types and strategies for the Pilot Data.
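As a concrete illustration of the polytomous coding, the response categories for Missing Value tasks (taken from the coding shown in Figures 16 and 22) might be stored as follows. This is only a minimal sketch in Python, assuming numpy; the dictionary name and the array layout are our own illustration, not the project's actual data format.

```python
import numpy as np

# Strategy codes for Missing Value tasks, following Figures 16 and 22.
MISSING_VALUE_STRATEGIES = {
    0: "Multiplicative",            # valid baseline strategy
    1: "Cross-Multiplication",
    2: "Additive",
    3: "Incorrect Implementation",
    4: "Misconceptions",
    5: "Meaningless Multiplicative",
    6: "Non-Classifiable",
}

# X[i, j, r] = k: student i chose strategy k on the r-th replication of task
# type j.  A toy array for 3 students, one task type, 2 replications:
X = np.array([[[0, 2]],
              [[4, 6]],
              [[0, 0]]])
print(X.shape)   # (3, 1, 2): students x task types x replications
```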
To describe the model mathematically we define latent skill indicators,
$$\alpha_{is} = \begin{cases} 1 & \text{student } i \text{ possesses skill } s \\ 0 & \text{otherwise,} \end{cases}$$
and a design matrix relating skills to tasks,
$$Q_{js} = \begin{cases} 1 & \text{task type } j \text{ highlights skill } s \\ 0 & \text{otherwise.} \end{cases}$$
The model we choose to fit is a kind of Bayes Net (Mislevy, 1995), where possession of a certain skill
relevant to a particular strategy increases the probability of choosing that strategy over other strategies. The
skills act compensatorily; not all skills need be present for the implementation of a certain strategy. Observed
strategy choice is linked in our model to unobserved skills via an additive logistic regression model:
$$\log \frac{P(X_{ijr} = k \mid \boldsymbol{\alpha}_i)}{P(X_{ijr} = 0 \mid \boldsymbol{\alpha}_i)} = \lambda_{jk} + \sum_{s=1}^{S} Q_{js}\, \beta_{sjk}\, \alpha_{is}$$
with parameters:
$\lambda_{jk}$ : baseline log odds of selecting strategy $k$ (versus strategy 0) for task type $j$.
$\beta_{sjk}$ : change in the log odds of choosing strategy $k$ when $\alpha_{is}$ goes from 0 to 1.
In this model, the indicators $Q_{js}$ are fixed in advance, based on predictions from the cognitive model about which skills are relevant to strategies for each task type. The parameters $\lambda_{jk}$ and $\beta_{sjk}$, and the latent skill indicators $\alpha_{is}$, are estimated in fitting the model. The model also assumes a form of local independence: given a student's position $(\alpha_{i1}, \ldots, \alpha_{i4})$ in the latent skill space, that student's responses are viewed as independent.
For identifiability, $\lambda_{j0}$ and $\beta_{sj0}$ are set to 0 for all task types $j$ and skills $s$. For the pilot study, the baseline 0 category represents a multiplicative strategy for Missing Value, Similarity, Comparison, and Equivalence tasks, and the sole valid strategy for each of the remaining three task types.
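To make the link between skills and strategy probabilities concrete, the following minimal Python sketch (assuming numpy) computes the strategy-choice distribution implied by the additive logistic model for one student on one task type. The function name, the array layout, and the toy parameter values are our own illustration, not estimates from the pilot data.

```python
import numpy as np

def strategy_probs(alpha, q_row, lam, beta):
    """Strategy-choice probabilities for one task type under the additive
    logistic model.  alpha: (S,) 0/1 skill indicators for one student;
    q_row: (S,) row of the Q-matrix for this task type; lam: (K+1,) intercepts
    with lam[0] = 0; beta: (S, K+1) skill effects with beta[:, 0] = 0, so that
    category 0 (the valid baseline strategy) anchors the log odds."""
    logits = lam + (q_row * alpha) @ beta   # log odds of each strategy vs. strategy 0
    logits = logits - logits.max()          # stabilize before exponentiating
    p = np.exp(logits)
    return p / p.sum()

# Toy numbers (not the paper's estimates): 4 skills, 3 strategies,
# with only the last two skills linked to this task type.
lam = np.array([0.0, -0.5, 0.3])
beta = np.array([[0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.0, 1.2, -0.8],
                 [0.0, 0.9, -1.1]])
q_row = np.array([0, 0, 1, 1])
print(strategy_probs(np.array([0, 0, 1, 1]), q_row, lam, beta))
```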
Though the skills are given names and descriptions adapted from Baxter and Witkowski (2002), the definition of what each skill measures within the model is determined by the links set in the Q-matrix, by the structure of the task types (especially those task types which highlight single skills), and by inequality constraints set on the model.
The matrix Q indicates which skills are highlighted in the structure and strategy coding of each task
type. The matrix represents a compensatory relationship between latent class and observed response, as
with the linear logistic test model (Fischer, 1973), as opposed to a conjunctive relationship, as with certain
other models (Junker and Sijtsma, 2001; DiBello, Stout and Roussos, 1995). Since different strategies rely
on different sets of skills, an entry of 1 in $Q_{js}$ indicates only that skill $s$ plays a role in determining the probability structure of the strategies in task type $j$. The model is essentially a polytomous version of Maris's (1995) compensatory multiple classification latent class model.
Question Type        Covariance   Relative/Absolute   Invariance   Mult. Model
Missing Value             1               1                 1             1
Similarity                1               1                 1             1
Comparison                1               1                 1             1
Equivalence               0               0                 1             1
Invariance                0               0                 1             0
Rel/Abs                   0               1                 0             0
Covariance                1               0                 0             0
Figure 17: The (compensatory) Q-matrix.
The Q-matrix for the pilot data is shown in Figure 17. For tasks designed to measure overall ability
in proportional reasoning, all skills are highlighted. For those task types designed to pinpoint specific
areas of knowledge, Q is much more sparse. The Q-matrix highlights two skills for Equivalence tasks:
Invariance and Multiplicative Modeling. These skills were chosen because the Invariance skill stresses
the recognition of a ratio relationship and the Multiplicative Modeling skill stresses the employment of a
multiplicative strategy. Covariance and Relative/Absolute skills were seen as more psychological traits than
mathematical, needed for translating a real-world situation into mathematical terms. As the Equivalence
tasks were context-free math tasks with equations already set up, it was theorized that these tasks would
highlight the more mathematical skills.
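For reference, the Q-matrix of Figure 17 can be transcribed directly; the array below follows the entries as described in the text (overall-ability tasks highlight all four skills, Equivalence highlights Invariance and Multiplicative Modeling, and each single-aspect task highlights only its own skill). The variable names are our own.

```python
import numpy as np

TASK_TYPES = ["Missing Value", "Similarity", "Comparison", "Equivalence",
              "Invariance", "Relative/Absolute", "Covariance"]
SKILLS = ["Covariance", "Relative/Absolute", "Invariance", "Mult. Modeling"]

# Rows are task types, columns are skills, as in Figure 17.
Q = np.array([
    [1, 1, 1, 1],  # Missing Value: all skills highlighted
    [1, 1, 1, 1],  # Similarity
    [1, 1, 1, 1],  # Comparison
    [0, 0, 1, 1],  # Equivalence: Invariance and Multiplicative Modeling
    [0, 0, 1, 0],  # Invariance task highlights the Invariance skill
    [0, 1, 0, 0],  # Relative/Absolute task
    [1, 0, 0, 0],  # Covariance task
])
```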
Further constraints must be imposed to induce identifiability in the set of latent skills. This is done with
inequality or monotonicity constraints, which are generally weaker than imposing linear constraints among the parameters. Inequality constraints also help with the interpretation of skills by imposing prior relationships between skills and strategies, highlighting which strategies are more likely given certain sets of skills. For each task type $j$, an inequality constraint can take the form:
$$P(X_{ijr} = k_1 \mid \alpha_{is} = l_1) < P(X_{ijr} = k_2 \mid \alpha_{is} = l_2).$$
This allows us to encode conditions such as:
Strategy $k_1$ is less likely than strategy $k_2$ when skill $s$ is low.
Strategy $k$ is less likely when skill $s$ is low than when skill $s$ is high.
The constraints for the pilot data, shown in Table 8, implement only the first kind of condition, given a low skill level. The inequalities are designed to constrain the model loosely for identifiability while providing little information on specific skill definitions. As such, the interpretation of skills in this model will be highly dependent on the Q-matrix entries for the single-skill task types.
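A minimal sketch of how one of these constraints might be checked follows, reusing the hypothetical strategy_probs helper from the earlier sketch. The paper does not spell out the enforcement mechanism; one common approach in MCMC is simply to reject any proposal whose implied probabilities violate a constraint, and the sketch below is written with that in mind.

```python
def satisfies(probs, k_less, k_more):
    """True when strategy k_less is less probable than strategy k_more."""
    return probs[k_less] < probs[k_more]

# Table 8, Missing Value with Invariance low: P(Multiplicative) < P(Additive).
# With Missing Value codes 0 (Multiplicative) and 2 (Additive), and alpha_low
# denoting a skill pattern in which the Invariance indicator is 0:
# ok = satisfies(strategy_probs(alpha_low, Q[0], lam_mv, beta_mv), 0, 2)
# (lam_mv and beta_mv are hypothetical parameter arrays for Missing Value tasks.)
```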
4.2 Results
Parameter estimation for the pilot data was obtained via MCMC. A set of four skills yields $2^4 = 16$ latent classes. Since the cognitive model is not specific about the development or likelihood of skill patterns, a uniform prior was set on the probability of each of the $2^4$ latent classes. This translates to a uniform prior for each skill indicator $\alpha_{is}$. Normal$(0, \sigma^2 = 3)$ priors were set for each of the unconstrained $\lambda_{jk}$ and $\beta_{sjk}$ parameters, subject to the inequality constraints in Table 8. Candidate distributions were tuned to accept approximately 50% to 70% of proposals during each iteration of the Markov chain.
Chains of 10,000 iterations were run for parameter estimation, with the first 1,000 iterations discarded as burn-in. The $\lambda_{jk}$ and $\beta_{sjk}$ parameters were estimated using posterior means. Student skill patterns were estimated by examining the posterior distribution over the 16 possible skill patterns for each student.
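The text does not give the exact sampler, so the following is only a generic random-walk Metropolis sketch consistent with the description (proposal scales tuned toward a 50-70% acceptance rate, 10,000 iterations with the first 1,000 discarded, posterior means as estimates); the function and argument names are our own.

```python
import numpy as np

def random_walk_metropolis(log_post, x0, scale, n_iter=10_000, burn_in=1_000, seed=0):
    """Generic random-walk Metropolis sampler.  `log_post` should return -inf
    for parameter values that violate the inequality constraints of Table 8,
    so such proposals are automatically rejected."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    draws, accepted = [], 0
    for t in range(n_iter):
        prop = x + rng.normal(scale=scale, size=x.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        if t >= burn_in:
            draws.append(x.copy())
    return np.array(draws), accepted / n_iter   # post-burn-in draws, acceptance rate

# Posterior means, e.g. for a lambda/beta block: draws.mean(axis=0).
# `scale` would be adjusted until the acceptance rate falls in the 50-70% band.
```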
Due to the large number of item parameters and the relatively small sample size, obtaining convergent chains was problematic. Chains for parameters associated with a larger number of responses (for example, Multiplicative strategies on Missing Value tasks) were more stable than those associated with only a few responses (for example, Erroneous Absolute strategies for Relative/Absolute tasks). Posteriors were generally unimodal, with a few instances of possible label switching for parameters associated with fewer observations. The following analyses are subject to large standard errors, and we have limited discussion to the
Task Type Skill Level Constraint
Missing Value Covariance Low P(Misconception) < P(Meaningless Multiplicative)
Missing Value Relative/Absolute Low P(Incorrect Implementation) < P(Misconception)
Missing Value Invariance Low P(Multiplicative) < P(Additive)
Missing Value Mult. Modeling Low P(Multiplicative) < P(Additive)
Similarity Covariance Low P(Misconception) < P(Meaningless Multiplicative)
Similarity Relative/Absolute Low P(Incorrect Implementation) < P(Misconception)
Similarity Invariance Low P(Multiplicative) < P(Incorrect Implementation)
Similarity Mult. Modeling Low P(Multiplicative) < P(Incorrect Implementation)
Comparison Covariance Low P(Incorrect Implementation) < P(Qualitative)
Comparison Relative/Absolute Low P(Additive) < P(Incorrect Implementation)
Comparison Invariance Low P(Multiplicative) < P(Additive)
Comparison Mult. Modeling Low P(Multiplicative) < P(Additive)
Equivalence Invariance Low P(Misconception) < P(Non-Computational)
Equivalence Mult. Modeling Low P(Multiplicative) < P(Additive)
Invariance Invariance Low P(Ratio) < P(Non-Ratio)
Relative/Absolute Relative/Absolute Low P(Absolute) < P(Relative)
Covariance Covariance Low P(Ratio and Totals) < P(Ratio Only)
Table 8: Constraints set on the Discrete Bayes Net model.
strategies that had larger numbers of observations in the data set. The methodology, however, is applicable
in any setting.
4.2.1 Classifying Students
Covariance   Relative/Absolute   Invariance   Multiplicative Modeling   Modal Count   Probabilistic Count
0 0 0 0 3 4.43
0 0 0 1 5 5.69
0 0 1 0 9 8.31
0 0 1 1 10 7.56
0 1 0 0 5 7.12
0 1 0 1 9 9.59
0 1 1 0 13 10.17
0 1 1 1 8 10.38
1 0 0 0 3 5.50
1 0 0 1 6 5.61
1 0 1 0 8 7.14
1 0 1 1 8 6.82
1 1 0 0 6 6.19
1 1 0 1 5 9.77
1 1 1 0 10 8.60
1 1 1 1 17 12.12
Table 9: Number of students (out of 125) classified into each group of skill proficiencies. The modal count
is obtained by assigning each student to the most likely posterior skill pattern, while the probabilistic count
is obtained by summing over the posterior probability distribution for each student.
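The two counts in Table 9 can be computed directly from each student's posterior distribution over the 16 skill patterns. The sketch below is our own illustration (with a made-up posterior matrix), assuming numpy.

```python
import numpy as np

def pattern_counts(post):
    """post: (n_students, 16) posterior probabilities over skill patterns.
    Modal count: each student assigned to his or her most likely pattern.
    Probabilistic count: column sums of the posterior probabilities."""
    modal = np.bincount(post.argmax(axis=1), minlength=post.shape[1])
    probabilistic = post.sum(axis=0)
    return modal, probabilistic

# Toy posterior for 3 students; both counts sum to the number of students.
post = np.random.default_rng(1).dirichlet(np.ones(16), size=3)
modal, prob = pattern_counts(post)
print(modal.sum(), round(prob.sum(), 6))
```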
Four latent skills yield sixteen different skill patterns. These patterns, as well as the number of students
classified into each, are shown in Table 9. While the modal category can be used for explicit categorization
to a set of skill patterns, one can also characterize uncertainty in the classification by looking at the posterior
distribution of skill patterns for each student. To obtain a compact visual representation of this posterior
distribution, we modify a graphical display from computer engineering called a Karnaugh map or k-map
(cf. Wakerly, 2000).
The structure of a k-map is illustrated in Figure 18. For a set of four skills {A, B, C, D}, the k-map displays the skill space as a set of 16 cells in a grid. Adjacent non-diagonal cells differ from each other by only one digit, with top and bottom cells logically adjacent, and left-edge and right-edge cells logically adjacent. The topology is similar to an unrolled toroid, where rectangular regions of adjoining cells correspond to different marginal logical statements about the skills.
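A small sketch of the k-map layout: the helper below (our own, hypothetical) arranges a length-16 vector of skill-pattern probabilities into the 4 x 4 Gray-code grid, assuming the vector is indexed by the binary pattern ABCD.

```python
import numpy as np

GRAY = ["00", "01", "11", "10"]   # axis order used on a k-map

def kmap_grid(prob):
    """Arrange a length-16 vector over patterns (A,B,C,D) into a 4x4 k-map:
    rows indexed by (A,B) and columns by (C,D), both in Gray order, so that
    horizontally or vertically adjacent cells differ in exactly one skill."""
    prob = np.asarray(prob)
    grid = np.empty((4, 4))
    for r, ab in enumerate(GRAY):
        for c, cd in enumerate(GRAY):
            grid[r, c] = prob[int(ab + cd, 2)]   # 'ABCD' read as a binary index
    return grid

# A point mass on pattern 1111 lands in the (row 11, column 11) cell.
p = np.zeros(16)
p[0b1111] = 1.0
print(kmap_grid(p))
```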
[Figure 18 here: a 4 x 4 grid of the 16 binary patterns, with rows indexed by (A, B) and columns by (C, D), both in the order 00, 01, 11, 10; rectangular regions mark marginal statements such as A = 1, B = 1, C = 1, D = 1, B = 1 and C = 1, and B = 0, C = 0, D = 0.]
Figure 18: The structure of a k-map for four skills {A,B,C,D}.
K-maps can be drawn for larger sets of skills, but as the dimension increases, the size and complexity
of the k-map increases. A five-skill k-map consists of two four-skill k-maps embedded in a one-skill k-map
(see Figure 19). A six-skill k-map consists of four four-skill k-maps embedded in a two-skill k-map. K-
map representations of more than six skills are rarely used in computer engineering, as the complexity of
the 1-digit differences in high-dimensional space makes visualization difficult. However, when dealing with
probability distributions centered on a single cell, higher dimensional k-maps may still be useful. If it is
known beforehand that certain skills are easily discriminable, the k-maps can be set up so that comparison
across graphs corresponds to 1-digit differences in those skills. In that case it may still be possible to
concentrate the bulk of the distribution in one region of the k-map.
[Figure 19 here: two four-skill k-maps placed side by side, one for E = 0 and one for E = 1, with rows indexed by (A, B) and columns by (C, D, E).]
Figure 19: A k-map for five skills {A,B,C,D,E}.
Four probabilistic k-maps are illustrated in Figure 20. In the true state space, each student's place is represented by a single 1 in one of the 16 grid cells. The k-maps presented here are probabilistic because the sum over the grid is equal to 1, and the map represents an estimated posterior distribution. Each cell is shaded relative to the probability of the skill pattern, with darker shades indicating higher probability. To make patterns more visible, the map is normalized relative to the mode of the distribution, which is listed at the top of the graph. Thus, shading across students is not comparable.
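For display, each student's 4 x 4 grid can be shaded after dividing by its mode, as just described. The matplotlib sketch below is our own (the function name is hypothetical); the axis labels follow Figure 20, and the input is a 4 x 4 grid such as the one produced by the kmap_grid sketch above.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_probabilistic_kmap(grid, student_id):
    """Shade a 4x4 probabilistic k-map, normalized by its mode so the darkest
    cell is always the modal skill pattern (shading is therefore not
    comparable across students)."""
    grid = np.asarray(grid, dtype=float)
    plt.imshow(grid / grid.max(), cmap="Greys", vmin=0.0, vmax=1.0)
    plt.xticks(range(4), ["00", "01", "11", "10"])
    plt.yticks(range(4), ["00", "01", "11", "10"])
    plt.xlabel("Invariance, Multiplicative Modeling")
    plt.ylabel("Covariance, Relative/Absolute")
    plt.title(f"Student {student_id} : mode = {grid.max():.2f}")
    plt.show()
```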
In Figure 20, student 238 has a very strong peak (probability 0.57) at the k-map cell corresponding to possession of all four skills, with the density falling away quickly at adjoining squares. Student 245, however, has the bulk of the distribution spread across two adjoining squares. The 1-digit difference between these squares corresponds to the Relative/Absolute skill. This indicates that while there is strong certainty that the student possesses the Covariance, Invariance, and Multiplicative Modeling skills, there is still uncertainty regarding the student's possession of relative vs. absolute reasoning. For student 314, the distribution
[Figure 20 here: four probabilistic k-maps, with rows indexed by (Covariance, Relative/Absolute) and columns by (Invariance, Multiplicative Modeling). Panels: Student 238, mode = 0.57; Student 245, mode = 0.41; Student 314, mode = 0.26; Student 220, mode = 0.18.]
Figure 20: The distribution of skill patterns for four different students using a probabilistic k-map.
indicates a high probability for Invariance and a low probability for Multiplicative Modeling, but uncertainty in both the Covariance and Relative/Absolute skills. Student 220 has the most diffuse distribution of the four students shown, with the mass centered on 0 for Invariance but with uncertainty in the remaining three skills.
4.2.2 Interpreting Skills through Examination of Logistic Regression Coefficients
In comparison to the Rasch model, the Bayes net provides flexibility for inference at the student level.
Looking at posterior distributions provides information on specific skill patterns, and diffuse patterns on
one skill axis can also indicate where the model may fail for particular students, or which skills may be
poorly elicited from the test. But while the Bayes net model highlights student-specific inference on skills,
interpretation of the skills is more challenging, because the meanings of skills are derived from constraints
and the compensatory link between skills and strategy choice outlined by the Q-matrix.
One way of assessing the links between skills and strategies is to examine task-specific probabilities of skill possession given strategy response. Define a matrix with $C = 2^4 = 16$ rows and $S = 4$ columns, as given by the left-hand side of Table 9; the entry $\alpha_{cs}$ represents possession of skill $s$ in the $c$-th latent class. For task type $j$, response $k$, and latent class $c$, we compute
$$P_c(j, k) = \frac{\exp\left(\sum_{s=1}^{S} Q_{js}\, \alpha_{cs}\, \beta_{sjk}\right)}{\sum_{l=1}^{C} \exp\left(\sum_{s=1}^{S} Q_{js}\, \alpha_{ls}\, \beta_{sjk}\right)} \qquad (3)$$
If we assume a uniform prior on the $C$ latent classes, then
$$P_c(j, k) = P(\text{latent class } c \mid \text{response } k \text{ to task type } j).$$
When the prior on latent classes is not uniform, $P_c(j, k)$ does not have a simple Bayesian interpretation.
However, it can still be used to assess the information about skills that is provided by strategies for each task type. Estimates of $P_c(j, k)$ can be obtained by substituting the posterior mean estimates of the $\beta_{sjk}$ parameters into equation 3. Across all skill patterns $c$, $P_c(j, k)$ is a probability distribution. This distribution is task-specific because it uses only information from the estimated $\beta_{sjk}$ parameters, and does not take into account any information on the marginal likelihood of the skill patterns (in the form of the latent classes). The variability of the estimate can be obtained through direct approximation using the MCMC output, or through the delta method.
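A minimal sketch of equation 3 in Python (assuming numpy) follows; the function name is our own, `beta_j` would hold the posterior mean $\beta_{sjk}$ estimates for one task type, and the pattern matrix reproduces the left half of Table 9.

```python
import numpy as np

def pcjk(beta_j, q_row, patterns):
    """Task-specific distribution over skill patterns given each response
    (equation 3).  beta_j: (S, K+1) skill effects for task type j with
    beta_j[:, 0] = 0; q_row: (S,) Q-matrix row; patterns: (C, S) 0/1 matrix
    of all skill patterns.  Returns a (C, K+1) array whose k-th column is
    P_c(j, k) over the C patterns."""
    logits = (patterns * q_row) @ beta_j        # numerator exponents
    logits = logits - logits.max(axis=0)        # stabilize each column
    p = np.exp(logits)
    return p / p.sum(axis=0)                    # normalize over latent classes

# The 2^4 = 16 skill patterns in the order of Table 9 (0000, 0001, ..., 1111):
patterns = np.array([[(c >> b) & 1 for b in (3, 2, 1, 0)] for c in range(16)])
```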
The probability distributions in equation 3 also have a useful graphical interpretation using k-maps (see Figure 21). On the logit scale, the $\beta_{sjk}$ term is associated with the two horizontal or vertical lines in the k-map corresponding to the 1-digit differences for skill $s$. A k-map based on the probabilities in equation
3 is essentially a way of visualizing in one graph all of the $\beta_{sjk}$ parameters for task type $j$ and response $k$. We use these k-maps to interpret skills in terms of tasks and strategies by examining which tasks and strategies particular skills load on, similar to interpreting the matrix of factor loadings in a factor analysis.
[Figure 21 here: the 4 x 4 grid of skill patterns, rows indexed by (A, B) and columns by (C, D); arrows mark the horizontal and vertical moves that add or subtract the $\beta_{sjk}$ terms for skills A, B, C, and D on the logit scale, starting from the baseline cell.]
Figure 21: A k-map representation of the conditional probability of skill patterns given task type $j$, response $k$ (logit scale) for four skills {A, B, C, D}. Starting from the baseline category, each horizontal and vertical line on the k-map represents adding or subtracting a $\beta_{sjk}$ term in the exponent of the numerator.
Figures 22 through 28 show the estimated marginal probabilities of response for each category, as well as k-map representations of the estimated task-specific probability of skill patterns given each response ($P_c(j, k)$, $c = 1, \ldots, 16$), for the seven task types. The marginal probabilities give an idea of the frequency of each strategy across all students, while the k-maps show the kinds of skills that are present in students who chose those strategies.
Constraining $\beta_{sjk} = 0$ through the Q-matrix is equivalent to looking at a coarsening of the grid in the k-map;
[Figure 22 here: a bar chart of marginal strategy probabilities for Missing Value tasks and a conditional k-map for each strategy (0. Multiplicative, 1. Cross-Multiplication, 2. Additive, 3. Incorrect Implementation, 4. Misconceptions, 5. Meaningless Multiplicative, 6. Non-Classifiable), with k-map rows indexed by (Covariance, Relative/Absolute) and columns by (Invariance, Multiplicative Modeling).]
Figure 22: Marginal and conditional probabilities: Missing Value
[Figure 23 here: marginal strategy probabilities for Similarity tasks and conditional k-maps for strategies 0 through 6, with the same layout and strategy labels as Figure 22.]
Figure 23: Marginal and conditional probabilities: Similarity
certain one-digit differences will yield no change in the probability. But unconstrained skills which do not impact a response pattern will also coarsen the k-map in the same way. For example, the Q-matrix for Missing Value, Similarity, and Comparison tasks highlights all skills and does not fix any $\beta_{sjk}$ parameters at 0. The multiplicative response for Missing Value tasks (Figure 22) appears to support specific inference in all four skill areas; the darkest cell and its four darkest neighbors correspond to possession of at least three of the four skills. However, the multiplicative response for Similarity tasks (Figure 23) highlights a coarser region corresponding to possession of the Invariance and Relative/Absolute skills, yielding more uncertainty about the Covariance and Multiplicative Modeling skills.
[Figure 24 here: marginal strategy probabilities for Comparison tasks and conditional k-maps for strategies 0. Multiplicative, 1. Cross-Multiplication, 2. Additive, 3. Incorrect Implementation, 4. Qualitative, 5. Non-Classifiable, with the same k-map layout as Figure 22.]
Figure 24: Marginal and conditional probabilities: Comparison
For both Missing Value (Figure 22) and Comparison (