
Model Specification for Cognitive Assessment of Proportional Reasoning*

Rhiannon Weaver, Department of Statistics, Carnegie Mellon University
Brian Junker, Department of Statistics, Carnegie Mellon University

January 20, 2004

Abstract

In modern psychometric analysis of cognitive assessment, there is a choice between psychometric vs. cognitive science paradigms for modeling the latent scale. The first involves as few as one continuous latent ability parameter, while the second focuses on a set of binary latent skills. When the expert cognitive model is qualitatively specified (e.g., paragraphs describing general trends in observable behavior for a set of developmental stages), interpretation of responses and latent variables is flexible, and may even be ambiguous. Models incorporating aspects from both psychometric and cognitive science paradigms can help in exploring response patterns and in refining both the exam design and the cognitive model. Here we present two such analyses of a pilot study of proportional reasoning. The first is a Rasch model using binary response coding with a continuous latent trait, which can approximate a set of developmental stages through careful item design and milestones. The second is a Bayes net model using polytomous response coding linked compensatorily to a set of latent skills, which allows for a factor-analytic approach to skill interpretation. We explore each scheme's usefulness in inferring a student's current state of knowledge, directing program planners, and directing assessment developers toward refining either the qualitative cognitive model or exam items.

* The work presented in this paper was supported in part by NSF grant #ESI-9876538 to the Learning Research and Development Center, University of Pittsburgh, and in part by a graduate fellowship awarded under NSF VIGRE grant #DMS-9819950. The authors would also like to thank Gail Baxter, Lou DiBello and Christine Witkowski of the Educational Testing Service in Princeton, NJ for their collaboration in the project.

Contents

1 Introduction
  1.1 Latent Variable Models for Cognitive Diagnosis
  1.2 Proportional Reasoning
  1.3 Analysis of an Assessment Test for Proportional Reasoning
2 Data
  2.1 The Study Design
  2.2 Response Strategy Coding
3 Dichotomous Coding with a Single Latent Trait: Rasch Model
  3.1 Method
  3.2 Results
  3.3 Discussion
4 Polytomous Coding with Many Latent Skills: Bayes Net
  4.1 Method
  4.2 Results
    4.2.1 Classifying Students
    4.2.2 Interpreting Skills through Examination of Logistic Regression Coefficients
  4.3 Goodness of Fit for the Bayes Net
  4.4 Methodological Concerns
    4.4.1 Assessing the Influence of Individual Students
    4.4.2 Contradictory Skills: Model Misspecification or Model Misfit?
    4.4.3 Prior Sensitivity
5 Future Work
6 Suggestions for Future Design
7 Summary
8 References
9 Appendix
  9.1 Exam Forms
  9.2 Conditional Percentage Estimates and Standard Errors: Rasch Model
  9.3 Regression Estimates and Standard Errors: Bayes Net
  9.4 Student K-maps

1 Introduction

1.1 Latent Variable Models for Cognitive Diagnosis

In educational settings, diagnostic assessments serve as barometers of student performance. A cognitive model is developed which outlines the progression of understanding that a student undergoes as he or she learns a particular concept. From this cognitive model, the assessment test itself is designed. The test consists of a set of items that yield observable results based on a student's unobserved position in the cognitive model. In contrast to a purely evaluative assessment geared toward measuring a student's mastery of a specific curriculum, diagnostic assessments are also used as teaching tools; they give instructors feedback on how to adapt teaching styles to help students having trouble, and help program planners to adapt curricula when evidence shows a general lack of understanding across all students.

Latent variable models are used extensively in cognitive diagnosis, with a range of modeling schemes. Most schemes can be described through a two-way hierarchical structure (Junker, 1999), with the manifest variables (i.e., observed responses) and task features at the first level, and latent variables (i.e., unobserved student parameters) at the second level. Manifest variables can be coded as dichotomous or polytomous; the choice of how to represent the observed responses depends on the item design and the detail, or granularity, coded in the latent space. In modern psychometric analysis, there is a choice between as few as one continuous generic ability parameter for the latent space vs. a larger-dimensional set of traditionally discrete skills. This distinction has been referred to as a psychometric vs. cognitive science representation of the student's knowledge (Hunt, 1995).

Junker (1999) surveys a number of models for statistical analysis under these frameworks. He notes that the main task in model construction is not one of choosing a continuous vs. discrete latent scale, but of choosing the granularity of the model. This choice is dictated by the specific research situation and questions of interest. The models Junker surveys can be used as a general guideline or set of building blocks to accommodate a given cognitive model for a specific situation.

Item Response Theory (IRT) models based on a single latent trait are simpler than multidimensional models for estimation and inference, but often it is difficult to interpret the ability parameter when directing instructors toward a student's particular weaknesses. However, when the cognitive model indicates a single progression of understanding, the latent trait can be used to classify students according to a more detailed set of developmental stages by relying on careful item design and milestones. This is similar to the approach of Masters and Forster (1999) and Draney, Pirolli and Wilson (1995). In this scheme, the latent trait can capture a richer underlying structure based on skills without specifically adding the complexity of skills into the model.

Skills-based models often focus on breaking a task into a set of known sub-tasks and considering each sub-task as a skill. Skills can then be deterministically linked to items in a conjunctive way, with possession of all skills required for a correct response. But it can sometimes be difficult to see how the possession of a particular skill is related to the response. This is especially true when the responses themselves are strategies a student employs in constructing an answer to a word problem. In this case, it may make sense to first explore the relationship between strategies and underlying skills through a broader, compensatory model.

Compensatory models, so named because possession of one latent skill can compensate for the lack of another, have been explored both for multidimensional latent trait variables (e.g., Wilson, Wood and Gibbons, 1983; Muraki and Carlson, 1995) and for discrete latent skill variables (e.g., Maris, 1995). Minimally constrained models of this type allow for an exploratory factor-analytic interpretation of the latent scale, and highly constrained models allow for confirmatory factor analysis. This kind of analysis can help in defining and interpreting a discrete skill space for a broadly specified cognitive model.

Both IRT and compensatory discrete skills models can be useful in situations which do not immediately fall into either the psychometric or the cognitive science extreme. Such is the case for proportional reasoning, in which facility is measured through both strategy choice and implementation of strategy choice. Mastering proportional reasoning involves not only mathematical competency, but also a psychological familiarity that can be hard to quantify when deconstructing tasks into component steps. This makes it difficult to model proportional reasoning with a skills-based cognitive science approach. On the other hand, addressing the psychological facets of proportional reasoning with a psychometric model gives little insight into why students choose the strategies they do.

1.2 Proportional Reasoning

The term proportional reasoning describes any kind of reasoning that focuses on the relationship between two ratios. Proportional reasoning is involved in everyday mathematical situations such as calculating the correct dosage for medicine, converting a measurement from one unit to another (pounds to kilograms, Celsius to Fahrenheit), or calculating estimated arrival times for travel. Generic proportional reasoning tasks require the student to reason with the equation

    a/b = c/d    (1)

Though proportional reasoning is taught in a mathematics setting and can often be reduced to a simple linear model y = kx, it is actually a psychological construct: "The essential characteristic of proportional reasoning is that it focuses on describing, predicting or evaluating the relationship between two relationships (i.e., a second-order relationship) rather than simply a relationship between two concrete objects (or two directly perceivable quantities)" (Piaget & Inhelder, 1975, via Lesh, Post and Behr, 1992). Students first learning proportional reasoning rarely approach the subject from a purely mathematical standpoint. It is an important concept in middle school mathematics because it is often a student's first exposure to explicitly modeling these kinds of second-order relationships. To master proportional reasoning, a student must be able to (Baxter and Witkowski, January 2002):

- conceive of a multiplicative relationship and possess a notion of change in a relative sense;
- recognize that when two quantities are changing, the change in one depends on the change in the other (covariance);
- recognize that while some aspects of the situation change, the ratio relationship remains constant (invariance); and
- employ an appropriate multiplicative strategy to solve problems.

Solving proportional reasoning tasks often reduces to solving the relationship in equation (1) for a missing value, as in Figure 1. Such tasks are called Missing Value tasks. However, student strategy in achieving the correct answer varies. Vergnaud (1983) has developed extensive models for student strategies in proportional reasoning tasks, and a version of his model for strategies in Missing Value tasks appears in Figure 1.

Typically, the student conceives of two measure spaces, where each space represents one of the units in the problem (candy, money, inches, etc.). The student can then compare differing values within a single measure space or similar values between measure spaces. Comparisons within measure spaces involve scale factors and lead to so-called scalar strategies, whereas comparisons between measure spaces involve divining the functional relationship between two quantities and lead to so-called functional strategies. After making a comparison, students may arrive at the correct answer through an additive method (build up or count out), or through multiplication or division. A common error is to replace the multiplicative ratio model with an additive model, substituting subtraction for division. Cross-multiplication is seen as a separate strategy from other multiplicative strategies because it is procedural in nature; students are taught to set the problem up as ad = bc and solve, but it is not clear whether students employing this strategy understand the second-order relationship entailed in proportional reasoning.
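To make these distinctions concrete, consider the pizza example from Figure 1, with a = 10 people, b = 3 pizzas, c = 50 people, and d the unknown number of pizzas. A scalar comparison stays within the people measure space: 50 people is l = 50/10 = 5 times 10 people, so d = 3 x 5 = 15 pizzas. A functional comparison works between measure spaces: k = 3/10 of a pizza per person, so d = 50 x 3/10 = 15. Cross-multiplication sets 10d = 3 x 50 and solves d = 150/10 = 15. A build-up strategy reaches the same answer additively (3 pizzas for each group of 10 people: 3 + 3 + 3 + 3 + 3 = 15), while the erroneous additive strategy preserves the difference rather than the ratio (10 - 3 = 7, so d = 50 - 7 = 43 pizzas).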

Baxter and Junker (November, 2001) proposed a broad developmental stages model for proportional reasoning in middle school students (see Figure 2). This qualitative developmental cognitive model is based on an extensive literature review and on interviews with subject matter experts.

Example: The pizza shop advertises that 3 pizzas will serve about 10 people. How much pizza should I buy if there will be 50 people at my party?

                Measure 1    Measure 2
    Value 1         a            b
    Value 2         c          (d) = ?

In the example, a = 10 people, b = 3 pizzas, c = 50 people, and d is the unknown number of pizzas.

    Reasoning        Strategy                Calculation      Solution
    Multiplicative   Functional              b/a = k          d = ck
    Multiplicative   Scalar                  c/a = l          d = bl
    Multiplicative   Cross-Multiplication    ad = bc          d = bc/a
    Additive         Build Up                c/a = l          d = b + ... + b (l times)
    Erroneous        Addition                a - b = c - d    d = b + (c - a)
    Other            Other                   Squaring, Estimation, etc.    Varied

Figure 1: A Vergnaud model for Missing Value tasks in proportional reasoning.

The model is qualitative in that it does not explicitly list a quantifiable set of underlying skills that students learn, but instead proposes stages of development and a general overview of observed student performance in each stage. The stages are ordered in that they represent a progression of understanding, with higher stages indicating a higher level of sophistication in proportional reasoning.

Sophistication in this sense embodies an ability first to set up a framework that outlines the correct relationships between quantities, and then to generalize solution strategies and approach the arithmetic from a context-free, abstract point of view. Much of this sophistication is evidenced not only by the student's ability to compute the right answer, but by the strategy that the student employs in the computation. More sophisticated strategies are those that indicate more facility with the concepts of proportional reasoning. Thus the cognitive model outlines expected performance not only in terms of which kinds of tasks the student will get right or wrong, but also in terms of the strategies students will employ when solving problems.

1.3 Analysis of an Assessment Test for Proportional Reasoning

This paper focuses on data collected from a diagnostic test developed from Baxter and Junker's proposed cognitive model for proportional reasoning. Section 2 outlines the diagnostic test, study design and data collection. The overall goal of the study is to describe how the diagnostic assessment test fares in describing a student's current state of knowledge. This goal is two-fold, in that a well-designed test should be able to:

- reflect student progress toward targeted learning goals, and
- make clear gaps in students' understanding.

I) Qualitative: Young students generally possess a good deal of knowledge about quantity that permits them to answer questions about more and less (e.g., which drink is sweeter?) or fairness (e.g., divide pizza or cookies so everyone gets a fair share).

II) Early Attempts at Quantifying: Early attempts at quantifying often involve constant additive differences (i.e., a - b = c - d) rather than multiplicative relationships.

III) Recognition of Multiplicative Relationship: Students have the intuition that a ratio is two numbers that change together, but the change may be additive or multiplicative. They often rely on additive strategies such as build up when multiplicative reasoning is required. Situations involving absolute change are not always distinguishable from situations involving relative change.

IV) Accommodating Covariance and Invariance: Students begin to develop a multiplicative change model. They recognize that while some quantities may be changing, relationships among the quantities remain invariant. They view a ratio as a single unit to which basic arithmetic operations may be applied. They can typically distinguish situations involving absolute change from those involving relative change. Strategy use is context-specific, and when the numbers are hard these students may resort to additive reasoning in multiplicative situations. Concepts of covariance fail when students are asked to scale up a figure.

V) Functional and Scalar Relationships: Students recognize the invariant nature of the relationships between pairs of changing quantities. These students have a repertoire of generalizable strategies and they select the most efficient strategy for a given problem. Conceptions of covariance and invariance are well developed.

Figure 2: The proposed cognitive model for development of proportional reasoning (Baxter and Junker, November 2001).

In terms of model development, these goals are achieved by specifying a cognitive model, translating the cognitive model into a meaningful latent structure, and designing items that pinpoint a student's place on the chosen latent scale. The latent structure and the response coding can either enhance or frustrate inference relating to these goals. Sections 3 and 4 outline two different choices of latent structure and response coding: a univariate continuous latent variable with dichotomous response coding, and a set of binary latent skills with polytomous response coding.

The simpler latent structure and coding scheme described in Section 3 can address the first goal, reflecting student progress toward learning goals. Given a set of ordered developmental stages with associated patterns of observable performance, it is plausible that test items can be designed such that increasing sophistication leads to increasing general ability to generate right vs. wrong answers. Classification then is a simple matter of relative ranking. Almost all of the work for inference on the latent scale is done behind the scenes: the latent structure gains a meaningful interpretation of developmental stages through careful item designs specifically targeted at milestones in development. This scheme is very similar to the proficiency/difficulty scale advocated by Masters and colleagues (Masters and Forster, 1999; Draney, Pirolli and Wilson, 1995; Masters and Evans, 1986). Though it can be informative relative to an existing set of developmental stages, this scheme relies heavily on a single progression of learning, and limits inference on where the underlying cognitive model may fail for a particular student.

The analysis described in Section 4, using discrete latent skills and polytomous response coding, can address the second goal of explaining why students choose particular strategy patterns. With this approach, we can define developmental stages as clusters of likely skill patterns, but we are not limited to a priori knowledge of such clusters. This approach addresses more flexibly the problem of defining gaps in student understanding, and it allows for a broader classification of students according to which skills they possess. However, with increased complexity comes increased difficulty in interpretation. Skills gain meaning through both the task types that highlight them and the strategies they elicit. These relationships can be difficult to define in the mathematical context of the model.

Sections 5, 6 and 7 conclude with a discussion of what our analyses tell us about the process of developing cognitive diagnostic tests for broadly-defined developmental stages or skills, in terms of defining and pursuing specific goals in inference, data requirements for pursuing those goals, and the use of pilot studies to refine and redesign future tests.

2 Data

2.1 The Study Design

Data for this analysis come from a pilot study conducted in a major urban school district in May 2001 (Baxter and Witkowski, 2002) to support the design of a cognitively diagnostic assessment in proportional reasoning. Two forms of a test for proportional reasoning were given to 125 students drawn from three different middle schools. Exam 1 had 14 items, while Exam 2 had 15 items.

In general it is very difficult to deduce strategy choice from the student's final answer alone. Instead, for this analysis the strategies themselves were coded as the observed response. To achieve this, students were interviewed as they took the diagnostic test and were recorded via audiotape. In most cases, a written transcript of the interview was also provided. The student's numerical answer and written work were also recorded. From these transcripts, interviews and written answers, a general strategy was coded by Baxter and Witkowski (2002) for each item.

Seven major kinds of tasks were asked on the forms. These are described in Figure 3. The first three task types (Missing Value, Similarity, and Comparison) were designed to test overall facility with proportional reasoning. The last four task types (Equivalence, Invariance, Relative/Absolute, and Covariance) were designed to pinpoint specific aspects of reasoning involved in a proportional reasoning task. Tables 1 and 2 show the breakdown of task types on the two exam forms. Tasks 1.10 and 2.13 were not scored: students did not perform as expected on these tasks, and after inspection it was decided that they were not strictly proportional reasoning tasks.

The qualitative cognitive model describes observable performance not only in terms of correct vs. incorrect answers, but also in terms of the specific strategies employed to solve each problem. Because of this specificity, it was necessary to interview the students during the exam and to record transcripts of their solution strategies. There is some flexibility in how to code the observed strategies so that the data set will be rich enough to capture the cognitive model but sparse enough for inference with a relatively small sample of students.

2.2 Response Strategy Coding

The ability to make inferences on the latent cognitive model depends not only on the specificity of the item design, but also on how the responses are coded. Due to the small sample size, relatively coarse codings for each task type were needed.

Missing Value: Students are given three of four values (a, b and c), two of the values presented as a ratio (a/b), and the task is to determine the missing value d such that a/b = c/d.

Similarity: A subtype of missing value questions involving scaling. Students are asked to scale the given quantities up or down, preserving similarity or some other quantity modelled as a ratio or proportion (e.g., taste, shape).

Comparison: Students are presented with four values (a, b, c and d) in the form of two ratios (a/b and c/d). The task is to determine if the two ratios are equivalent.

Equivalence: A subtype of comparison questions. Students are presented with ratio comparison or equivalent fraction problems in arithmetic symbols without a story context.

Invariance: Students are presented with a ratio relationship and asked to judge if the ratio relationship is the same or different under a changed condition.

Relative/Absolute: Students are presented with a scenario in which they are required to make a judgement about a given relationship to solve for an unknown. Students must decide if the relationship to solve is relative (i.e., using constant ratios a/b = c/d) or absolute (i.e., using constant differences a - b = c - d).

Covariance: Students are asked to rearrange collections of colored chips or other objects while preserving a ratio relationship of colors to numbers. This was done using figures on paper, and thus it was possible for students to come up with a rearrangement that used a different number of chips than was originally present.

Figure 3: Task types and descriptions from the pilot study.

Item   Part   Type                Note
1.1    a,b    Invariance          Shading task
1.1    c      Covariance
1.2    -      Missing Value
1.3    -      Missing Value
1.4    -      Missing Value
1.5    a,b    Relative/Absolute
1.6    -      Relative/Absolute
1.7    -      Similarity          Scaling task
1.8    -      Missing Value
1.9    -      Similarity
1.10   -      Invariance          Not scored
1.11   b      Invariance
1.12   -      Equivalence
1.13   -      Equivalence
1.14   -      Equivalence

Table 1: Items on Exam 1.

Item   Part   Type                Note
2.1    a,b    Invariance          Shading task
2.2    a      Missing Value
2.2    b      Missing Value
2.3    -      Comparison
2.4    -      Missing Value
2.5    -      Comparison
2.6    -      Invariance
2.7    a      Similarity          Scaling task
2.8    -      Missing Value
2.9    -      Similarity
2.10   -      Invariance
2.11   a      Comparison
2.11   b      Similarity
2.12   -      Missing Value
2.13   -      Context-free MV     Not scored
2.14   -      Equivalence
2.15   -      Equivalence

Table 2: Items on Exam 2.

These codings were chosen with the observed performance described in the cognitive model in mind, as well as the specific aspects of proportional reasoning that each task type probes. For example, the cognitive model specifies that students with lower levels of sophistication often make qualitative comparisons, so the Comparison coding incorporates a Qualitative strategy. The cognitive model also specifies that, when asked to describe the invariant features of a ratio relationship, students with lower sophistication will rely on surface features as opposed to the ratio; thus the Invariance coding incorporates a Surface strategy. And although a student may use count-up or subtraction methods on Relative/Absolute tasks, the coding simply reflects the student's recognition of an absolute task context, as that is the specific aspect of proportional reasoning that Relative/Absolute tasks were designed to measure.

Figure 4 displays three different strategy codings for Missing Value tasks (similar codings for all task types are shown in Figures 5 through 10). The initial coding for the data is very rich but is difficult to use with such a small data set. The Modified Vergnaud coding condenses the strategies into seven general categories, but retains more detail in the erroneous strategies than the Vergnaud model described in Figure 1. It also condenses Functional and Scalar strategies into one single multiplicative category. This was done because there were very few (< 10) instances in the pilot study where students used functional strategies, but there were a greater number of observed erroneous strategies. Merging functional and scalar strategies into one simple multiplicative strategy may make it harder to distinguish students in Stage IV vs. Stage V, whereas breaking erroneous strategies into four separate strata may help in distinguishing between Stage III and Stage IV students.

Finally, the Valid/Invalid coding is a very coarse measurement of valid vs. invalid strategies. In this case, a valid strategy is one that will lead to the correct answer if implemented without simple arithmetic errors. An invalid strategy is one that is not recognizable or that will not lead to a correct answer even if implemented without simple arithmetic errors. Preliminary data analysis indicated that correctness of answer was independent of task given strategy choice: the features of each task influenced the student's strategy choice, and the correctness or incorrectness of the answer was a result of the implementation of the chosen strategy. Thus, valid vs. invalid strategy choice is closely related to right vs. wrong answer, but is more stringent toward guessing (an invalid strategy may still lead to the right answer).

Within each level (valid or invalid), the strategies still have a measure of order. For instance, though all are valid strategies, additive strategies are the least sophisticated, followed by cross-multiplication, followed by multiplicative strategies. Among invalid strategies, a non-recognizable strategy (N/C) is not as sophisticated as a strategy with a misconception of the covarying relationship, which in turn is not as sophisticated as an incorrect implementation strategy (i.e., a faulty implementation of an otherwise valid strategy).

Figure 4: Three different coding schemes for Missing Value tasks. The Initial Coding distinguishes fine-grained strategies (Functional Operator, Scalar Operator, Scalar Decomposition, Factor of Change, Unit Value, Unit Value*, Cross-Multiplication*, Build Up, Build Up*, Count Out, Incorrect Addition, Product, Quotient, Inverse, Estimation, Err. Repeated Addn, Err. Build Up, Squaring, Additive/Mult., Partial, Incomplete, Other). The Modified Vergnaud coding condenses these into seven categories (Multiplicative, Cross-Multiplication, Additive, Incorrect Implementation, Misconceptions, Meaningless Multiplicative, Non-Classifiable), which the Valid/Invalid coding further collapses to Valid (Multiplicative, Cross-Multiplication, Additive) vs. Invalid (the remaining four).
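The Valid/Invalid coding is then a many-to-one collapse of the Modified Vergnaud categories. As a minimal sketch using the code numbers from Figure 5 (the dictionary and function names below are our own illustration, not the authors' code):

# Modified Vergnaud codes for Missing Value / Similarity tasks (see Figure 5)
MV_CODES = {
    0: ("Multiplicative", "valid"),
    1: ("Cross-Multiplication", "valid"),
    2: ("Additive", "valid"),
    3: ("Incorrect Implementation", "invalid"),
    4: ("Misconceptions", "invalid"),
    5: ("Meaningless Multiplicative", "invalid"),
    6: ("Non-Classifiable", "invalid"),
}

def dichotomize(code):
    """Collapse a Modified Vergnaud strategy code to the binary Valid/Invalid coding."""
    return 1 if MV_CODES[code][1] == "valid" else 0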

The final codings and descriptions for each task type are defined in Figures 5 through 10. Tables 3 and 4 show the breakdown of responses from the pilot study for each item. The specific questions asked on each item are shown in the Appendix.

Missing Value / Similarity: Students are given three of four values (a, b and c), two of the values presented as a ratio (a/b), and are asked to solve for the missing value. Similarity problems are a subtype that specifically ask the students to scale ratios in order to preserve similarity or other quantities (e.g., taste, shape).

0. Multiplicative (Mult.), Valid: Student uses some form of Functional or Scalar strategy.
1. Cross-Multiplication (X-Mult.), Valid: Student uses Cross-Multiplication.
2. Additive, Valid: Student uses a Build-Up or Count Out strategy.
3. Incorrect Implementation (Inc. Imp.), Invalid: Student displays some knowledge of the covarying relationship but makes a mathematical or conceptual error.
4. Misconceptions (Misconcep.), Invalid: Student substitutes the multiplicative covariance relationship with addition, squaring, or a combination of addition and multiplication.
5. Meaningless Multiplicative (M. Mult.), Invalid: Student chooses two numbers and multiplies or divides them.
6. Non-Classifiable (N/C), Invalid: Student uses a non-classifiable or non-recognizable strategy.

Figure 5: Final coding used for Missing Value and Similarity tasks.

Comparison: Students are presented with two ratios (a/b and c/d) and asked to determine if the ratios are equivalent.

0. Multiplicative (Mult.), Valid: Student uses some form of Functional or Scalar strategy.
1. Cross-Multiplication (X-Mult.), Valid: Student uses Cross-Multiplication.
2. Additive, Valid: Student uses a Build-Up or Count Out strategy.
3. Incorrect Implementation (Inc. Imp.), Invalid: Student displays some knowledge of the covarying relationship but makes a mathematical or conceptual error.
4. Qualitative, Invalid: Student uses a comparison with the words "more" or "less", either with addition or non-computationally.
5. Non-Classifiable (N/C), Invalid: Student uses a non-classifiable or non-recognizable strategy.

Figure 6: Final coding used for Comparison tasks.

Equivalence: Students are presented with three-way comparison problems without a story context.

0. Multiplicative (Mult.), Valid: Student uses some form of Functional or Scalar strategy.
1. Additive, Valid: Student uses a Build-Up or Count Out strategy.
2. Misconceptions (Misconcep.), Invalid: Student substitutes the multiplicative equality with addition, squaring, or a combination of addition and multiplication.
3. Non-Computational (Non-Comp.), Invalid: Student matches surface patterns of the fractions or gives a non-computational reason (e.g., "it just looks right").
4. Elimination/Guess (Elim./Guess), Invalid: Student chooses a multiple-choice answer via elimination or guessing.
5. Non-Classifiable (N/C), Invalid: Student uses a non-classifiable or non-recognizable strategy.

Figure 7: Final coding used for Equivalence tasks.

Invariance: Students are presented with a ratio relationship and asked to judge if the relationship changes under a changed condition.

0. Ratio, Valid: Student recognizes an invariant ratio relationship.
1. Non-Ratio, Invalid: Student mentions a mathematical invariance (either correct or incorrect) that is not a ratio relationship.
2. Surface, Invalid: Student focuses on a surface feature of the task (e.g., color, shape).
3. Non-Classifiable (N/C), Invalid: Student uses a non-classifiable or non-recognizable strategy.

Figure 8: Final coding used for Invariance tasks.

Relative/Absolute: Students are presented with common missing value settings but must recognize that the relationship is absolute (a - b = c - d) rather than relative (a/b = c/d).

0. Absolute, Valid: Student employs a form of absolute reasoning (a - b = c - d).
1. Erroneous Absolute (Err. Absolute), Invalid: Student employs a form of absolute reasoning but makes a conceptual error when implementing the absolute strategy.
2. Relative, Invalid: Student employs a form of relative reasoning (a/b = c/d).
3. Non-Classifiable (N/C), Invalid: Student uses a non-classifiable or non-recognizable strategy.

Figure 9: Final coding used for Relative/Absolute tasks.

Covariance: Students are asked (on paper) to rearrange collections of colored chips while preserving a ratio relationship of colors to numbers.

0. Ratio and Totals (Pres. Both), Valid: Student preserves the ratio relationship, the overall total number of chips, and the subtotals of red and blue chips.
1. Ratio Only, Invalid: Student preserves the ratio relationship, but not the total numbers of chips.
2. All Totals (Total (RB)), Invalid: Student preserves the total number of chips and the totals for red and blue, but not the ratio relationship.
3. Overall Total (Overall), Invalid: Student preserves only the overall total number of chips.
4. Non-Classifiable (N/C), Invalid: Student uses a non-classifiable or non-recognizable strategy.

Figure 10: Final coding used for Covariance tasks.

3 Dichotomous Coding with a Single Latent Trait: Rasch Model

3.1 Method

As a student's developmental stage increases, the student should gain facility in choosing valid strategies, regardless of the type of proportional reasoning task or of item characteristics unrelated to proportional reasoning, such as arithmetic difficulty. Thus, ranking students based on their facility in choosing valid vs. invalid strategies should also capture the patterns of response outlined in the five stages of the cognitive model from Section 1.2. To explore this conjecture, we fitted the Rasch (1980) model to the response variables

    X_ij = 1 if student i chose a valid strategy on item j,
           0 if student i chose an invalid strategy on item j,

and examined changes in strategy choice with respect to increasing latent score. The Rasch model, common in item response theory (IRT), defines latent variables θ_i and difficulty parameters β_j such that

    P_j(θ_i) = P(X_ij = 1) = e^(θ_i - β_j) / (1 + e^(θ_i - β_j))    (2)

In this context, the variable θ_i represents the propensity of student i to make a valid strategy choice (X_ij = 1), whereas the variable β_j measures the difficulty of item j.

Missing Value   Mult.  X-Mult  Additive  Inc. Imp.  Misconcep.  M. Mult.  N/C   Total
1.2               46      2       16         0          0           4       0     68
2.4               34      0        9         0          1           3       2     49
1.4               51      3       11         0          0           8       3     76
2.2a              33      0       21         0          0           1       2     57
2.2b              26      0        0        11          0           8      12     57
2.9               31      1        2         0         11           5       6     56
2.8               28      0       18         0          0           0      11     57
1.3               60      1        2         1          0           1       3     68
1.8               32      6       16         0          0           7       7     68
2.12              12      2        5        17          6           2      13     57

Similarity      Mult.  X-Mult  Additive  Inc. Imp.  Misconcep.  M. Mult.  N/C   Total
1.7                5      6        1         2         46           1       7     68
2.7a               5      1        3         3         25           0      20     57
2.11b             30      2        2         1          6           4      12     57
1.9               15      3        0         2          0           0      48     68

Comparison      Mult.  X-Mult  Additive  Inc. Imp.  Qualitative  N/C   Total
2.3               31      1        4         8           5          8     57
2.5               14      0        4        14          12         13     57
2.11a             22      0        0         1          24         10     57

Table 3: Observed strategies by item from the pilot study: tasks stressing overall facility with proportional reasoning.

Equivalence        Mult.  Additive  Misconcep.  Non-Comp  Elim/Guess  N/C   Total
1.12                 24       1         17          7          2        17     68
1.13                 38       2         13          1          0        14     68
1.14                 29       7         15          0          2        15     68
2.14                 10       4         13          4          4        22     57
2.15                 17       3         11          1          2        23     57

Invariance         Ratio  Non-Ratio  Surface  N/C   Total
1.1ab                  5       6         56     1     68
2.1ab                  6       8         43     0     57
1.11b                 16       0          0    16     32
2.6                   27       0          0    21     48
2.10                  34       0          0    23     57

Relative/Absolute  Absolute  Err. Absolute  Relative  N/C   Total
1.5                    22          3            43      0     68
1.6                    39         15            15      5     74

Covariance         Pres. Both  Ratio Only  Total (RB)  Overall  N/C   Total
1.1c                   35          12           9          3      9     68

Table 4: Observed strategies by item from the pilot study: tasks highlighting singular aspects of proportional reasoning.

The model also assumes local independence: for N students and J items, the X_ij are viewed as conditionally independent given θ_i and β_j, yielding the likelihood

    ∏_{i=1}^{N} ∏_{j=1}^{J} [P_j(θ_i)]^(X_ij) [1 - P_j(θ_i)]^(1 - X_ij)

Parameter estimates were obtained via a Markov chain Monte Carlo (MCMC) method incorporating a Metropolis-within-Gibbs sampling scheme. The β_j parameters were given Normal(0, 1) priors. The θ_i parameters were given Normal(0, σ²) priors, with σ² estimated.
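The paper does not give sampler code; the following minimal Python sketch illustrates a Metropolis-within-Gibbs scheme of this kind (the function name rasch_mwg, the fixed proposal scale prop_sd, and the inverse-gamma hyperprior used to update σ² are our assumptions, not details from the paper):

import numpy as np

def rasch_mwg(X, n_iter=20000, prop_sd=0.5, seed=0):
    """Metropolis-within-Gibbs sketch for the Rasch model in equation (2).
    X: complete N x J matrix of 0/1 valid-strategy indicators (the pilot data,
    with its two exam forms, would also need missing-response handling)."""
    rng = np.random.default_rng(seed)
    N, J = X.shape
    theta = np.zeros(N)   # student propensities theta_i
    beta = np.zeros(J)    # item difficulties beta_j
    sigma2 = 1.0          # prior variance of theta, updated below
    draws = []

    def loglik_student(th, x_row):
        eta = th - beta
        return np.sum(x_row * eta - np.logaddexp(0.0, eta))

    def loglik_item(b, x_col):
        eta = theta - b
        return np.sum(x_col * eta - np.logaddexp(0.0, eta))

    for _ in range(n_iter):
        for i in range(N):    # Metropolis step for theta_i, Normal(0, sigma2) prior
            cand = theta[i] + prop_sd * rng.standard_normal()
            log_r = (loglik_student(cand, X[i]) - loglik_student(theta[i], X[i])
                     - 0.5 * (cand**2 - theta[i]**2) / sigma2)
            if np.log(rng.uniform()) < log_r:
                theta[i] = cand
        for j in range(J):    # Metropolis step for beta_j, Normal(0, 1) prior
            cand = beta[j] + prop_sd * rng.standard_normal()
            log_r = (loglik_item(cand, X[:, j]) - loglik_item(beta[j], X[:, j])
                     - 0.5 * (cand**2 - beta[j]**2))
            if np.log(rng.uniform()) < log_r:
                beta[j] = cand
        # Gibbs step for sigma2 under an assumed InverseGamma(1, 1) hyperprior
        sigma2 = 1.0 / rng.gamma(1.0 + N / 2.0, 1.0 / (1.0 + 0.5 * theta @ theta))
        draws.append((theta.copy(), beta.copy(), sigma2))
    return draws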

To check goodness of fit for the underlying Rasch model, response curves for each item were fitted as a function of θ. That is, for item j, we plotted

    P_j(θ) = e^(θ - β_j) / (1 + e^(θ - β_j))

for a range of θ values, with β_j fixed at its estimate. Non-parametric curve estimates were obtained by calculating the average response

    Σ_{i in B} X_ij / #{i in B}

for a suitable bin B of students with adjacent estimated θ values, and plotting this point at the average estimated θ_i for the bin.
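As a concrete sketch of the binned estimate (theta_hat denotes posterior-mean ability estimates; the bin size of 15 matches the choice reported in Section 3.2):

import numpy as np

def response_curve_points(theta_hat, x_j, bin_size=15):
    """Nonparametric response curve for one item: order students by estimated
    theta, average the 0/1 responses within consecutive bins of ~15 students,
    and return (mean theta in bin, observed proportion valid) pairs."""
    order = np.argsort(theta_hat)
    pts = []
    for k in range(0, len(order), bin_size):
        idx = order[k:k + bin_size]  # the top bin may hold fewer than bin_size students
        pts.append((theta_hat[idx].mean(), x_j[idx].mean()))
    return pts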

Outfit statistics were also calculated for each item to test goodness of fit. From Johnson, Cohen and Junker (1999), the outfit statistic for item j calculated from observed values x for N students is

    T_j(x | θ, β) = Σ_{i=1}^{N} (x_ij - E_ij)² / (N W_ij)

where E_ij is the expected value of X_ij given θ_i and β_j, and W_ij is the variance of X_ij given θ_i and β_j. Posterior predictive p-values of the statistic can be estimated using the MCMC output. Specifically, let x*_1, ..., x*_M be data simulated from the model with corresponding θ and β parameters from steps 1, ..., M of the chain. The predictive p-value can be approximated as

    p_j ≈ #{s : T_j(x | θ^(s), β^(s)) < T_j(x*_s | θ^(s), β^(s)), s = 1, ..., M} / M

If the value of p_j is small (< 0.05, for example) for a particular item j, then there may be reason for concern about the model's fit to that particular item.
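For a binary item, E_ij = P_j(θ_i) and W_ij = P_j(θ_i)(1 - P_j(θ_i)), so the predictive p-values can be approximated directly from the chain. A sketch (array names are our own):

import numpy as np

def outfit_pvalues(X, theta_draws, beta_draws, seed=0):
    """Approximate posterior predictive p-values for the item outfit statistic.
    X: N x J observed 0/1 matrix; theta_draws: M x N; beta_draws: M x J."""
    rng = np.random.default_rng(seed)
    M, (N, J) = theta_draws.shape[0], X.shape
    exceed = np.zeros(J)
    for s in range(M):
        p = 1.0 / (1.0 + np.exp(-(theta_draws[s][:, None] - beta_draws[s][None, :])))
        w = p * (1.0 - p)                        # W_ij: variance of X_ij
        x_rep = rng.uniform(size=(N, J)) < p     # replicate data x*_s from the model
        t_obs = ((X - p) ** 2 / (N * w)).sum(axis=0)
        t_rep = ((x_rep - p) ** 2 / (N * w)).sum(axis=0)
        exceed += t_obs < t_rep
    return exceed / M   # small p_j (e.g. < 0.05) flags possible misfit for item j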

The developmental stages were approximated by grouping students according to their estimated θ_i values. Based on the literature (Baxter and Junker, November 2001), the majority of middle school students should be in Stage IV, with smaller proportions of students in Stages III and V, and essentially no students in Stages I or II. To approximate these proportions, students with θ_i values in the lowest 20% were classified as Stage III, students in the middle 60% were classified as Stage IV, and students in the highest 20% were classified as Stage V.
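In code, this quantile-based classification is one line per cut; a minimal sketch:

import numpy as np

def classify_stages(theta_hat):
    """Approximate stages from estimated abilities: lowest 20% -> Stage III,
    middle 60% -> Stage IV, highest 20% -> Stage V."""
    lo, hi = np.quantile(theta_hat, [0.20, 0.80])
    return np.where(theta_hat < lo, "III", np.where(theta_hat > hi, "V", "IV"))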

By monotonicity of the response functions in equation (2), we know that as a student's stage increases, that student has a higher propensity to pick valid strategies over invalid strategies. However, the Rasch model does not detail what kind of valid strategies are chosen, or what kind of invalid strategies are chosen. To explore this, we conditioned on valid vs. invalid strategy choice and studied changes in sophistication for the sub-types of valid strategies and the sub-types of invalid strategies as approximated stage increases. In order to get a measure of variability, conditional percentages were computed for approximated stages based on ranked θ_i values at each step of the Markov chain. This method re-assigns strategy choices to the three developmental stages in groups by student. The standard errors take into account variability between students but cannot account for the variability of answers from a single student.
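A sketch of the per-draw tabulation (strat holds integer strategy codes and valid the corresponding validity flags, both N x J; missing responses are ignored here for simplicity):

import numpy as np

def stage_conditional_pcts(theta, strat, valid, n_codes):
    """For one MCMC draw of theta: re-rank students into the three approximated
    stages, then tabulate strategy codes conditional on a valid choice."""
    lo, hi = np.quantile(theta, [0.20, 0.80])
    stage = np.where(theta < lo, 3, np.where(theta > hi, 5, 4))
    pcts = {}
    for s in (3, 4, 5):
        codes = strat[stage == s][valid[stage == s]]   # valid strategy codes, stage s
        pcts[s] = np.bincount(codes, minlength=n_codes) / max(len(codes), 1)
    return pcts
# Averaging these percentages over all MCMC draws gives the reported conditional
# percentages; their standard deviations across draws give the standard errors.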

3.2 Results

For this analysis, Markov chains of 20,000 iterations were run, using candidate distributions tuned to yield acceptance of between 50% and 70% of the θ and β components at each step. The first 1,000 iterations from each chain were discarded as burn-in. As a diagnostic for assessing convergence, Geweke's (1992) statistic comparing means of the beginning and end of the chain showed no serious divergences for the θ parameters. The β parameters were slow-mixing, and Geweke's diagnostic showed signs of non-stationarity for all but three of these parameters. However, quantile-quantile plots (first half of remaining observations vs. second half of remaining observations) for the problematic parameters did not indicate any serious deviations, and density plots were generally symmetric and unimodal. Estimates of θ_i for each student and β_j for each item were obtained using posterior means.

Outfit predictive p-values for the 30 items in the pilot study are shown in Table 5. With the possible exception of items 2.15 and 1.6, the outfit statistics do not indicate a lack of fit. Fitted response curves for these two items are shown in Figure 11. The bins were chosen to contain 15 observations each, although the bin corresponding to the highest θ values may contain fewer than 15 observations. While item 2.15 has a smooth plotted response curve, item 1.6 shows an anomalous dip in its plotted response curve.

Figure 12 shows the ranking of students according to estimated θ_i for valid strategy choice, while Table 6 shows the ranking of items according to the estimated β_j difficulty parameters.

Type             Item    Outfit p-value      Type                Item    Outfit p-value
Missing Value    1.2     0.8135              Comparison          2.5     0.1549
                 2.4     0.4675                                  2.11a   0.4939
                 1.4     0.8282              Equivalence         1.12    0.4275
                 2.2a    0.8744                                  1.13    0.2319
                 2.2b    0.7880                                  1.14    0.3087
                 2.9     0.4866                                  2.14    0.8493
                 2.8     0.6524                                  2.15    0.0741
                 1.3     0.8311              Invariance          1.1ab   0.8872
                 1.8     0.7647                                  2.1ab   0.6506
                 2.12    0.6904                                  1.11b   0.3323
Similarity       1.7     0.5976                                  2.6     0.6081
                 2.7a    0.5964                                  2.10    0.7538
                 2.11b   0.6791              Relative/Absolute   1.5     0.4733
                 1.9     0.7173                                  1.6     0.0523
Comparison       2.3     0.2214              Covariance          1.1c    0.3784

Table 5: Approximate posterior predictive p-values for outfit statistics calculated from the binary Rasch model.

Figure 11: Observed and fitted response curves for valid strategy choice, items 2.15 (estimated β = 0.633) and 1.6 (estimated β = -0.022). Each panel plots the fitted curve P_j(θ) against θ, with X marks at the binned observed proportions.

Figure 12: Ranked posterior means of θ_i for the 125 students, grouped to approximate the three developmental stages from the cognitive model: lowest 20% Stage III, middle 60% Stage IV, highest 20% Stage V.

β        Type                Item        β      Type                Item
-2.60    Missing Value       2.2a        0.01   Covariance          1.1c
-2.53    Missing Value       1.2         0.08   Invariance          1.11b
-2.39    Missing Value       1.3         0.17   Missing Value       2.2b
-1.97    Missing Value       2.4         0.49   Comparison          2.11a
-1.82    Missing Value       1.4         0.63   Equivalence         2.15
-1.52    Missing Value       2.8         0.67   Equivalence         1.12
-1.38    Missing Value       1.8         0.72   Missing Value       2.12
-0.61    Comparison          2.3         0.81   Comparison          2.5
-0.50    Missing Value       2.9         0.86   Relative/Absolute   1.5
-0.46    Invariance          2.10        1.16   Similarity          1.9
-0.45    Similarity          2.11b       1.19   Equivalence         2.14
-0.33    Equivalence         1.13        1.66   Similarity (II)     1.7
-0.32    Invariance          2.6         1.71   Similarity (II)     2.7a
-0.07    Equivalence         1.14        2.07   Invariance (II)     2.1ab
-0.02    Relative/Absolute   1.6         2.47   Invariance (II)     1.1ab

Table 6: Items ranked from easiest to hardest via estimated difficulty parameter β. Similarity (II) indicates a figure scaling task; Invariance (II) indicates a shading task.

The most obvious pattern in the estimated difficulty parameters is the cluster of Missing Value tasks with low estimates: all but three of the Missing Value tasks fall in the first quartile of the ranked β parameters. The low-ranked Missing Value tasks are very similar in structure. All require the student to scale up a ratio, and all have at least one integer multiple relationship between compared quantities. Item 2.9 has a slightly higher difficulty parameter than this cluster; though it has an integer multiple relationship, the relationship also happens to be a square (7 to 49), and the item elicits a higher proportion of squaring strategies than other items. Item 2.2b, ranked second in difficulty among Missing Value tasks, is the only missing value task on either test form that asks students to take a given ratio and scale the numbers down to obtain a smaller result. Item 2.12 was the most difficult Missing Value task. The ratio relationships in this item contained common divisors but no integer multiples, and the numbers were very close (9/15 to 12/20). This item elicited many incorrect implementation and incorrect addition strategies.

The most difficult items are categorized as Similarity and Invariance tasks. A (II) beside the Similarity task type indicates that the task asks students to scale up a figure. This task is specifically mentioned in the cognitive model as difficult for all but Stage V students, and indeed the figure scaling tasks do appear to be among the most difficult for students. A (II) beside the Invariance task type indicates that the task specifically asks students to describe varying and invariant characteristics of sets of shaded figures. These shading tasks were the most difficult on the exams; students in general had a hard time discerning the constant ratio of shaded to unshaded regions in each set of figures, and these items elicited a high proportion of surface strategies.

Table 7 shows the breakdown of valid vs. invalid strategies chosen by students in each approximated developmental stage. Figures 13 through 15 show bar graphs of the breakdown of percentages for valid strategies, with a (*) indicating percentage estimates with standard errors above 0.03.

In general, the patterns of response do indicate that Stage V students choose different kinds of valid strategies than Stage III students. Specifically, students with higher overall proficiency at choosing valid strategies tend to choose valid multiplicative strategies more than students with lower overall proficiency; the valid strategies chosen by Stage III students are more often build-up strategies. Figures 13-(i) and 13-(ii) indicate that for Missing Value and Similarity tasks cross-multiplication was a strategy chosen by high-ranked students, whereas Figure 14-(i) shows that for Comparison tasks cross-multiplication was much less frequently chosen, and then by lower-ranked students. This pattern agrees with the expected observable performance outlined in the qualitative cognitive model, and highlights the procedural nature of cross-multiplication.

Task Type                      Stage III   Stage IV   Stage V
Missing Value      Valid          59          292        117
                   Invalid        59           81          5
                   % Valid        50%          78%        96%
Similarity         Valid           1           43         29
                   Invalid        49          107         21
                   % Valid         2%          29%        58%
Comparison         Valid           3           53         20
                   Invalid        24           58         13
                   % Valid        11%          48%        61%
Equivalence        Valid          15           70         50
                   Invalid        51          118         14
                   % Valid        23%          37%        78%
Invariance         Valid           4           51         33
                   Invalid        46          110         18
                   % Valid         8%          32%        65%
Relative/Absolute  Valid           8           32         21
                   Invalid        23           50          8
                   % Valid        26%          39%        72%
Covariance         Valid           2           22         11
                   Invalid        14           16          3
                   % Valid        13%          58%        79%

Table 7: Valid vs. invalid responses by approximated developmental stage.

Figure 13: Valid vs. invalid strategies, Missing Value and Similarity tasks. Panels show within-stage percentages (Stages III, IV, V) of (i) Missing Value valid strategies (Multiplicative, X-Mult., Additive); (ii) Similarity valid strategies (Multiplicative, X-Mult., Additive); (iii) Missing Value invalid strategies (Inc. Imp., Misconcep., M. Mult., N/C); and (iv) Similarity invalid strategies (Inc. Imp., Misconcep., M. Mult., N/C).

Turning to invalid strategies, students in lower stages are more likely to choose non-classifiable strategies. The cognitive model also mentions that students frequently employ incorrect addition strategies on scaling Similarity tasks. This is reflected in the large percentage of Misconception strategies employed by students from all three stages (Figures 13-(iii) and 13-(iv)). Consistent with the difficulty parameters and the cognitive model, students in Stage III found scaling tasks exceedingly difficult, employing the largest percentages of non-classifiable and meaningless multiplicative strategies.

Also among invalid strategies, the cognitive model predicts that students in lower stages are more likely to respond with a qualitative strategy on Comparison tasks, and this is reflected in the percentages (Figure 14-(iii)): given that an invalid strategy was chosen, for students in Stage III the invalid strategy is most likely to be qualitative. Finally, while all of the strategies listed are invalid, as developmental stage increases students are more likely to choose an invalid strategy that is at least closer to valid (Incorrect Implementation, Misconception) than a non-recognizable strategy.

Figure 14: Valid vs. invalid strategies, Comparison and Equivalence tasks. Panels show within-stage percentages (Stages III, IV, V) of (i) Comparison valid strategies (Multiplicative, X-Mult., Additive); (ii) Equivalence valid strategies (Multiplicative, Additive); (iii) Comparison invalid strategies (Inc. Imp., Qualitative, N/C); and (iv) Equivalence invalid strategies (Misconcep., Non-Comp., Elim/Guess, N/C).

Invariance, Relative/Absolute and Covariance tasks (Figure 15) had only one valid strategy to employ, so the only conditioning that goes beyond the Rasch model is for invalid strategies. These invalid strategies are all shown in Figure 15. For Invariance tasks, both the Non-Ratio and Non-Classifiable patterns behave as expected, with students in higher stages choosing the more sophisticated invalid strategies more often, given that an invalid strategy has been chosen. However, the Surface strategy, notable as a strategy chosen by lower-stage students, is also prominent among Stage V students; the shading tasks in particular elicit a large number of surface strategies. The same pattern is notable for the Relative strategy on Relative/Absolute tasks. With these tasks, even higher-level students sometimes resort to less sophisticated strategies.

Figure 15: Invalid strategies, Invariance, Relative/Absolute and Covariance tasks. Panels show within-stage percentages (Stages III, IV, V) of (i) Invariance invalid strategies (Non-Ratio, Surface, N/C); (ii) Relative/Absolute invalid strategies (Err. Absolute, Relative, N/C); and (iii) Covariance invalid strategies (Ratio Only, Total (RB), Overall, N/C).

For the Covariance task the standard errors were quite high; only one Covariance task was asked, and hence there were few responses. The most notable feature of the breakdown in Figure 15 is the large percentage of Stage III students who maintained the ratio relationship. However, ignoring the total numbers of chips appears to make it easier to maintain a ratio relationship, while accounting for the totals made it more difficult to do so.

3.3 Discussion

The Rasch model employs a very simple binary coding of the data and models the latent variable as a single continuous trait. Interpreting the latent scale as a simple ability parameter does not capture the fact that students with low proficiency choose different kinds of valid strategies than students with high proficiency. Whether through a polytomous data coding or specific item design, it is critical to incorporate this information into the assessment.

Some patterns in strategy choice, conditional on valid vs. invalid, appear consistent with the cognitive model. When employing valid strategies, students with low proficiency tend to choose additive strategies more often than students with higher proficiency. When choosing invalid strategies, students with low proficiency tend to choose non-classifiable strategies more often than students with higher proficiency. Difficulty parameters also reflect descriptions in the cognitive model; specifically, scaling tasks, which are noted as difficult for all but Stage V students, were among the most difficult for students in the pilot study as well.

On the other hand, some observed patterns are not consistent with the cognitive model. Whether these inconsistencies are due to a mis-specification of the cognitive model or to a flawed task design is unclear. For example, although the cognitive model specifies surface features as typical of Stage II and Stage III students, surface strategies were elicited from Stage IV and V students on shading tasks. It could be that the specific format of shading elicits surface strategies from more sophisticated students (indicating that the cognitive model needs to be refined). However, these items are also the first asked on the exams, and may simply be worded too broadly for students to follow. On Relative/Absolute tasks, the invalid Relative strategy is prominent among higher-ranked students. This could be because higher-ranked students have a natural tendency to view situations as relative, or it could be a priming effect: the Relative/Absolute tasks were asked after a string of Missing Value tasks, each of which was solved via a ratio.

Based on simple milestones from the cognitive model, such as the reliance on Build-Up vs. Multiplicative strategies and the ability to solve scaling tasks, it may be possible to design an exam for which valid vs. invalid strategy choice stratifies students quite well into the three developmental stages. Evidence for the validity of the cognitive model can be seen in a natural, unforced set of patterns that also follows what is outlined by experts. But in using this kind of Rasch analysis, any connection to the underlying skill patterns that govern strategy choice must be known beforehand and incorporated into the existing cognitive model. The latent scale does not provide enough information about underlying skill patterns. For instance, despite the fact that Missing Value and Comparison tasks are described as comprehensive, incorporating all aspects of proportional reasoning, these tasks seemed to be easier on the valid/invalid scale than those tasks designed to highlight only one specific aspect of proportional reasoning, such as the Invariance and Relative/Absolute tasks. Does this indicate that when solving Missing Value tasks, students rely on concepts and skills other than those defined or measured by the single-aspect tasks? Do Missing Value, Similarity and Comparison tasks rely compensatorily rather than conjunctively on the single facets of proportional reasoning? The Rasch analysis sheds no light on the question. In order to obtain specific information on patterns of skills and their link to patterns of strategies, polytomous coding needs to be addressed in the model, and the latent structure needs to be expanded to include sets of discrete skills.

4 Polytomous Coding with Many Latent Skills: Bayes Net

4.1 Method

The Rasch analysis is based on a unidimensional latent variable which can be seen as a single overall propensity to select valid strategies. The patterns of strategies based on this latent scale seem to follow the cognitive model, but restructuring the latent space as a set of discrete skills can help pinpoint which aspects of proportional reasoning a student has mastered. The underlying skill patterns, coupled with specific item features, drive the student's strategy choice.

The qualitative cognitive model for proportional reasoning (Figure 2) does not address the underlying skills in each of the developmental stages. But the aspects of proportional reasoning suggested by Baxter and Witkowski (2002) can be adapted to define a working set of skills:

Covariance: The student understands that the change in one quantity depends on the change in another.

Relative/Absolute: The student recognizes that the change in quantities is a relative, multiplicative change as opposed to an absolute change.

Invariance: The student recognizes the presence of a constant ratio relationship.

Multiplicative Modeling: The student can implement a valid multiplicative strategy (as opposed to a valid additive strategy such as Build-Up).

Adaptability: Strategy choice is generalizable and efficient for the given numbers in a problem.

These skills are related more to a psychological conceptualization than to mathematical implementation, with the possible exception of Invariance and Multiplicative Modeling, and it is not immediately clear how to connect strategies with each of these skills. To study the relationships between skills and strategies, it is necessary to use a polytomous strategy coding as opposed to a simple binary coding of valid/invalid. For task type $j$ with strategies $k = 0, \ldots, k_j$ and repetitions $r = 1, \ldots, r_j$, we define

$$X_{ijr} = k \;:\; \text{student } i \text{ chooses strategy } k \text{ on the } r\text{th task of type } j.$$
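As a concrete illustration (not from the paper), this coding can be stored as ragged arrays indexed by student, task type, and replication; the task names and replication counts follow Figure 16 below, while the randomly filled codes are purely illustrative:

```python
import numpy as np

# Hypothetical container for the polytomous coding X_ijr. Replication
# counts r_j follow Figure 16; strategy-code ranges actually differ by
# task type (0-6 for Missing Value, 0-3 for Invariance, etc.), so the
# uniform 0-6 fill here is for illustration only.
n_reps = {"MissingValue": 10, "Similarity": 4, "Comparison": 3,
          "Equivalence": 5, "Invariance": 5, "RelAbs": 2, "Covariance": 1}
n_students = 125
rng = np.random.default_rng(0)

# X[i][j][r] = k  <=>  student i chose strategy k on the r-th task of type j
X = {i: {j: rng.integers(0, 7, size=r) for j, r in n_reps.items()}
     for i in range(n_students)}
```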

A model based on four skills (Covariance, Invariance, Relative/Absolute and Multiplicative Modeling) is displayed schematically in Figure 16. Although the Adaptability skill defined above certainly plays a role in strategy choice for tasks measuring general ability in proportional reasoning, our data are not coded to distinguish an adaptable strategy from general multiplicative strategies. Measuring adaptability requires that some kind of optimal multiplicative strategy can be identified for each item.

[Figure 16: A model linking skills, task types and strategies for the Pilot Data. Four binary skill indicators (Covariance, Relative/Absolute, Invariance, Multiplicative Modeling) are linked through additive logistic models to the polytomous strategy coding of each task type. The codings and replication counts shown in the figure are:

    Missing Value (10 replications): valid 0 Mult., 1 XMult., 2 Additive; invalid 3 Inc. Imp., 4 Misconcep., 5 M. Mult., 6 N/C
    Similarity (4 replications): valid 0 Mult., 1 XMult., 2 Additive; invalid 3 Inc. Imp., 4 Misconcep., 5 M. Mult., 6 N/C
    Comparison (3 replications): valid 0 Mult., 1 XMult., 2 Additive; invalid 3 Inc. Imp., 4 Qualitative, 5 N/C
    Equivalence (5 replications): valid 0 Mult., 1 Additive; invalid 2 Misconcep., 3 NonComp., 4 Elim/Guess, 5 N/C
    Invariance (5 replications): valid 0 Ratio; invalid 1 NonRatio, 2 Surface, 3 N/C
    Relative/Absolute (2 replications): valid 0 Absolute; invalid 1 Err. Absolute, 2 Relative, 3 N/C
    Covariance (1 replication): valid 0 Pres. Both; invalid 1 Ratio Only, 2 Total (RB), 3 Overall, 4 N/C]

To describe the model mathematically we define latent skill indicators,

$$\alpha_{is} = \begin{cases} 1 & : \text{student } i \text{ possesses skill } s \\ 0 & : \text{else,} \end{cases}$$

and a design matrix relating skills to tasks,

$$Q_{js} = \begin{cases} 1 & : \text{task type } j \text{ highlights skill } s \\ 0 & : \text{else.} \end{cases}$$

The model we choose to fit is a kind of Bayes net (Mislevy, 1995), where possession of a certain skill relevant to a particular strategy increases the probability of choosing that strategy over other strategies. The skills act compensatorily; not all skills need be present for the implementation of a certain strategy. Observed strategy choice is linked in our model to unobserved skills via an additive logistic regression model:

$$\log \frac{P(X_{ijr} = k \mid \boldsymbol{\alpha}_i, j)}{P(X_{ijr} = 0 \mid \boldsymbol{\alpha}_i, j)} = \beta_{jk} + \sum_{s=1}^{S} Q_{js}\,\lambda_{sjk}\,\alpha_{is},$$

with parameters:

$\beta_{jk}$ : log odds of selecting strategy $k$ for task type $j$.

$\lambda_{sjk}$ : change in the log odds of choosing strategy $k$ when $\alpha_{is}$ goes from 0 to 1.

In this model, the indicators $Q_{js}$ are fixed in advance, based on predictions from the cognitive model about which skills are relevant to strategies for each task type. The parameters $\beta_{jk}$ and $\lambda_{sjk}$, and the latent skill indicators $\alpha_{is}$, are estimated in fitting the model. The model also assumes a form of local independence: given a student's position $(\alpha_{i1}, \ldots, \alpha_{i4})$ in the latent skill space, that student's replies are viewed as independent.

For identifiability, $\beta_{j0}$ and $\lambda_{sj0}$ are set to 0 for all task types $j$ and skills $s$. For the pilot study, the baseline 0 category represents a multiplicative strategy for Missing Value, Similarity, Comparison, and Equivalence tasks, and it represents the sole valid strategy for the remaining three task types.
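To make the link concrete, here is a minimal numerical sketch of the model above, assuming numpy; the function and argument names are ours, not the authors'. It computes the strategy probabilities for one task type as a softmax with the baseline category fixed at zero:

```python
import numpy as np

# A minimal sketch of the additive logistic link: strategy
# probabilities for one task type j, with the baseline category k = 0
# fixed at zero for identifiability.
def strategy_probs(alpha_i, q_j, beta_j, lam_j):
    """alpha_i: (S,) binary skills; q_j: (S,) Q-matrix row;
    beta_j: (K,) intercepts with beta_j[0] = 0;
    lam_j: (S, K) skill effects with lam_j[:, 0] = 0."""
    logits = beta_j + (q_j * alpha_i) @ lam_j  # beta_jk + sum_s Q_js lam_sjk alpha_is
    p = np.exp(logits - logits.max())          # numerically stable softmax
    return p / p.sum()

# Local independence: given alpha_i, a student's likelihood is the
# product of strategy_probs(...)[x_ijr] over all task types and replications.
```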

Though the skills are given names and descriptions adapted from Baxter and Witkowski (2002), the definition of what a skill measures within the model is determined by the links set in the Q-matrix, the structure of the task types (especially those task types which highlight single skills), and the inequality constraints set on the model.

The matrix $Q$ indicates which skills are highlighted in the structure and strategy coding of each task type. The matrix represents a compensatory relationship between latent class and observed response, as with the linear logistic test model (Fischer, 1973), as opposed to a conjunctive relationship, as with certain other models (Junker and Sijtsma, 2001; DiBello, Stout and Roussos, 1995). Since different strategies rely on different sets of skills, an entry of 1 in $Q_{js}$ indicates only that skill $s$ plays a role in determining the probability structure of the strategies in task type $j$. The model is essentially a polytomous version of Maris' (1995) compensatory multiple classification latent class model.

    Question Type       Covariance   Relative/Absolute   Invariance   Mult. Model
    Missing Value            1               1                1             1
    Similarity               1               1                1             1
    Comparison               1               1                1             1
    Equivalence              0               0                1             1
    Invariance               0               0                1             0
    Rel/Abs                  0               1                0             0
    Covariance               1               0                0             0

Figure 17: The (compensatory) Q-matrix.

The Q-matrix for the pilot data is shown in Figure 17. For tasks designed to measure overall ability in proportional reasoning, all skills are highlighted. For those task types designed to pinpoint specific areas of knowledge, Q is much more sparse. The Q-matrix highlights two skills for Equivalence tasks: Invariance and Multiplicative Modeling. These skills were chosen because the Invariance skill stresses the recognition of a ratio relationship and the Multiplicative Modeling skill stresses the employment of a multiplicative strategy. Covariance and Relative/Absolute skills were seen as more psychological traits than mathematical ones, needed for translating a real-world situation into mathematical terms. As the Equivalence tasks were context-free math tasks with equations already set up, it was theorized that these tasks would highlight the more mathematical skills.
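For computation, the Q-matrix of Figure 17 (as reconstructed above) can be written directly as an array; this sketch assumes the row and column ordering shown in the figure:

```python
import numpy as np

# The Q-matrix of Figure 17 as an array. Rows: Missing Value, Similarity,
# Comparison, Equivalence, Invariance, Rel/Abs, Covariance. Columns:
# Covariance, Relative/Absolute, Invariance, Multiplicative Modeling.
Q = np.array([
    [1, 1, 1, 1],  # Missing Value: overall task, all skills highlighted
    [1, 1, 1, 1],  # Similarity:    overall task, all skills highlighted
    [1, 1, 1, 1],  # Comparison:    overall task, all skills highlighted
    [0, 0, 1, 1],  # Equivalence:   Invariance and Mult. Modeling only
    [0, 0, 1, 0],  # Invariance:    single-skill task
    [0, 1, 0, 0],  # Rel/Abs:       single-skill task
    [1, 0, 0, 0],  # Covariance:    single-skill task
])
```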

Further constraints must be made to induce identifiability in the set of latent skills. This is done with inequality or monotonicity constraints, which are generally weaker than imposing linear constraints among the parameters. Inequality constraints also help with the interpretation of skills by imposing prior relationships between skills and strategies, highlighting which strategies are more likely given certain sets of skills. For each task type $j$, an inequality constraint can take the form:

$$P(X_{ijr} = k_1 \mid \alpha_{is} = l_1) < P(X_{ijr} = k_2 \mid \alpha_{is} = l_2).$$

This allows us to encode conditions such as:

Strategy $k_1$ is less likely than strategy $k_2$ when skill $s$ is low.

Strategy $k$ is less likely when skill $s$ is low than when skill $s$ is high.

The constraints for the pilot data, shown in Table 8, implement only the first kind of condition, given a low skill level. The inequalities are designed to loosely constrain the model for identifiability but to provide little information on specific skill definitions. As such, the interpretation of skills in this model will be highly dependent on the Q-matrix entries for the single-skill task types.
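A minimal sketch of how such a constraint might be checked during estimation, under the model of Section 4.1 (helper names are ours; the paper does not specify how the constraints were enforced computationally):

```python
import numpy as np

# A sketch of one Table 8 constraint check. A sampler could, for
# example, reject any proposal that violates the inequality.
def satisfies_constraint(beta_j, lam_j, q_j, alpha, k1, k2):
    """True if P(X = k1 | alpha) < P(X = k2 | alpha) under the model."""
    logits = beta_j + (q_j * alpha) @ lam_j
    p = np.exp(logits - logits.max())
    return p[k1] < p[k2]   # normalizing constant cancels in the comparison

# Example: "Missing Value / Invariance / Low: P(Mult.) < P(Additive)"
# corresponds to checking k1 = 0 (Multiplicative) against k2 = 2
# (Additive) with the Invariance entry of alpha set to 0.
```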

4.2 Results

Parameter estimation for the pilot data was obtained via MCMC. A set of four skills yields $2^4 = 16$ latent classes. Since the cognitive model is not specific about the development or likelihood of skill patterns, a uniform prior was set on the probability of each of the $2^4$ latent classes. This translates to a uniform prior for each indicator $\alpha_{is}$. Normal$(0, \sigma^2 = 3)$ priors were set for each of the unconstrained $\beta_{jk}$ and $\lambda_{sjk}$ parameters, subject to the inequality constraints in Table 8. Candidate distributions were tuned to accept approximately 50% to 70% of proposals during each iteration of the Markov chain.

Chains of 10,000 iterations were run for parameter estimation, with the first 1,000 iterations discarded as burn-in. The $\beta_{jk}$ and $\lambda_{sjk}$ parameters were estimated using posterior means. Student skill patterns were estimated by examining the posterior distribution over the set of 16 possible skill patterns for each student.
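The paper does not give its sampler, but a generic random-walk Metropolis sketch of the procedure just described (tuning the candidate scale toward the 50-70% acceptance band, discarding burn-in, and taking posterior means) might look like:

```python
import numpy as np

# A generic sketch, not the authors' sampler.
def tune_scale(scale, accept_rate, lo=0.50, hi=0.70):
    if accept_rate < lo:
        return scale * 0.9   # proposals too bold: shrink steps
    if accept_rate > hi:
        return scale * 1.1   # proposals too timid: widen steps
    return scale

def run_chain(log_post, x0, n_iter=10_000, burn=1_000, seed=1):
    rng = np.random.default_rng(seed)
    x, scale, accepts = np.asarray(x0, float), 1.0, 0
    draws = np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = x + scale * rng.standard_normal(x.size)
        if np.log(rng.uniform()) < log_post(prop) - log_post(x):
            x, accepts = prop, accepts + 1
        draws[t] = x
        if (t + 1) % 100 == 0:               # re-tune every 100 iterations
            scale = tune_scale(scale, accepts / 100)
            accepts = 0
    return draws[burn:].mean(axis=0)         # posterior means after burn-in
```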

Due to the large number of item parameters and the relatively small sample size, obtaining convergent chains was problematic. Chains for parameters associated with a larger number of responses (for example, Multiplicative strategies on Missing Value tasks) were more stable than those associated with only a few responses (for example, Erroneous Absolute strategies for Relative/Absolute tasks). Posteriors were generally unimodal, with a few incidences of possible label switching for parameters associated with fewer observations. The following analyses are subject to large standard errors, and we have limited discussion to the strategies that had larger numbers of observations in the data set. The methodology, however, is applicable in any setting.

    Task Type          Skill              Level   Constraint
    Missing Value      Covariance         Low     P(Misconception) < P(Meaningless Multiplicative)
    Missing Value      Relative/Absolute  Low     P(Incorrect Implementation) < P(Misconception)
    Missing Value      Invariance         Low     P(Multiplicative) < P(Additive)
    Missing Value      Mult. Modeling     Low     P(Multiplicative) < P(Additive)
    Similarity         Covariance         Low     P(Misconception) < P(Meaningless Multiplicative)
    Similarity         Relative/Absolute  Low     P(Incorrect Implementation) < P(Misconception)
    Similarity         Invariance         Low     P(Multiplicative) < P(Incorrect Implementation)
    Similarity         Mult. Modeling     Low     P(Multiplicative) < P(Incorrect Implementation)
    Comparison         Covariance         Low     P(Incorrect Implementation) < P(Qualitative)
    Comparison         Relative/Absolute  Low     P(Additive) < P(Incorrect Implementation)
    Comparison         Invariance         Low     P(Multiplicative) < P(Additive)
    Comparison         Mult. Modeling     Low     P(Multiplicative) < P(Additive)
    Equivalence        Invariance         Low     P(Misconception) < P(Non-Computational)
    Equivalence        Mult. Modeling     Low     P(Multiplicative) < P(Additive)
    Invariance         Invariance         Low     P(Ratio) < P(Non-Ratio)
    Relative/Absolute  Relative/Absolute  Low     P(Absolute) < P(Relative)
    Covariance         Covariance         Low     P(Ratio and Totals) < P(Ratio Only)

Table 8: Constraints set on the discrete Bayes net model.


    4.2.1 Classifying Students

    Covariance   Relative/Absolute   Invariance   Mult. Modeling   Modal Count   Probabilistic Count
        0                0                0              0               3               4.43
        0                0                0              1               5               5.69
        0                0                1              0               9               8.31
        0                0                1              1              10               7.56
        0                1                0              0               5               7.12
        0                1                0              1               9               9.59
        0                1                1              0              13              10.17
        0                1                1              1               8              10.38
        1                0                0              0               3               5.50
        1                0                0              1               6               5.61
        1                0                1              0               8               7.14
        1                0                1              1               8               6.82
        1                1                0              0               6               6.19
        1                1                0              1               5               9.77
        1                1                1              0              10               8.60
        1                1                1              1              17              12.12

Table 9: Number of students (out of 125) classified into each group of skill proficiencies. The modal count is obtained by assigning each student to the most likely posterior skill pattern, while the probabilistic count is obtained by summing over the posterior probability distribution for each student.
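A short sketch of how the two count columns in Table 9 can be computed from a posterior matrix (names are ours; the 125-student posterior here is randomly generated for illustration):

```python
import numpy as np

# post[i, c] = posterior probability that student i has skill pattern c
# (16 patterns, 125 students).
def pattern_counts(post):
    modal = np.bincount(post.argmax(axis=1), minlength=post.shape[1])
    probabilistic = post.sum(axis=0)   # expected count per pattern
    return modal, probabilistic

rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(16), size=125)   # illustrative posterior only
modal, prob = pattern_counts(post)
assert modal.sum() == 125 and np.isclose(prob.sum(), 125)
```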

Four latent skills yield sixteen different skill patterns. These patterns, as well as the number of students classified into each, are shown in Table 9. While the modal category can be used for explicit categorization into a set of skill patterns, one can also characterize uncertainty in the classification by looking at the posterior distribution of skill patterns for each student. To obtain a compact visual representation of this posterior distribution, we modify a graphical display from computer engineering called a Karnaugh map or k-map (cf. Wakerly, 2000).

The structure of a k-map is illustrated in Figure 18. For a set of four skills {A, B, C, D}, the k-map displays the skill space as a set of 16 cells in a grid. Adjacent non-diagonal cells differ from each other by only one digit, with top and bottom cells logically adjacent, and left-edge and right-edge cells logically adjacent. The topology is similar to an unrolled toroid, where rectangular regions of adjoining cells correspond to different marginal logical statements about the skills.
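The Gray-code layout underlying Figure 18 is easy to generate and check programmatically; this short sketch builds the four-skill cell grid and verifies the one-digit adjacency property, including the wrap-around edges:

```python
# Rows and columns follow the 2-bit Gray code 00, 01, 11, 10, so that
# adjacent cells (wrapping at the edges, as on a toroid) differ in
# exactly one skill bit.
gray = ["00", "01", "11", "10"]
kmap = [[ab + cd for cd in gray] for ab in gray]   # cell = skill bits A,B,C,D

for r in range(4):                                 # verify one-digit adjacency
    for c in range(4):
        for cell2 in (kmap[(r + 1) % 4][c], kmap[r][(c + 1) % 4]):
            diff = sum(x != y for x, y in zip(kmap[r][c], cell2))
            assert diff == 1
```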

[Figure 18: The structure of a k-map for four skills {A, B, C, D}. Rows and columns are indexed by the Gray code 00, 01, 11, 10, and rectangular regions of cells correspond to marginal statements such as A = 1, B = 1, C = 1, D = 1, B = 1 and C = 1, or B = 0, C = 0, D = 0.]

K-maps can be drawn for larger sets of skills, but as the dimension increases, the size and complexity of the k-map increases. A five-skill k-map consists of two four-skill k-maps embedded in a one-skill k-map (see Figure 19). A six-skill k-map consists of four four-skill k-maps embedded in a two-skill k-map. K-map representations of more than six skills are rarely used in computer engineering, as the complexity of the one-digit differences in high-dimensional space makes visualization difficult. However, when dealing with probability distributions centered on a single cell, higher-dimensional k-maps may still be useful. If it is known beforehand that certain skills are easily discriminable, the k-maps can be set up so that comparison across graphs corresponds to one-digit differences in those skills. In that case it may still be possible to concentrate the bulk of the distribution in one region of the k-map.

[Figure 19: A k-map for five skills {A, B, C, D, E}. Two four-skill k-maps, one for E = 0 and one for E = 1, are embedded side by side in a one-skill k-map.]

Four probabilistic k-maps are illustrated in Figure 20. In the true state space, each student's place is represented by a single 1 in one of the 16 grid cells. The k-maps presented here are "probabilistic" because they represent estimated posterior distributions, so the values sum to 1 over the grid. Each region is shaded relative to the probability of the skill pattern, with darker shades indicating higher probability. To make patterns more visible, the map is normalized relative to the mode of the distribution, which is listed at the top of the graph. Thus, shading across students is not comparable.
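A sketch of the display transformation just described: place a student's 16 posterior pattern probabilities on the Gray-coded grid and normalize the shading by the modal probability (the posterior vector here is randomly generated for illustration):

```python
import numpy as np

# Arrange a length-16 posterior (indexed by the 4-bit pattern ABCD) on
# the Gray-coded grid and normalize by the mode, as in Figure 20.
gray = [0b00, 0b01, 0b11, 0b10]

def to_kmap(post_i):
    grid = np.array([[post_i[(ab << 2) | cd] for cd in gray] for ab in gray])
    return grid / grid.max()   # shading relative to the modal probability

rng = np.random.default_rng(0)
shading = to_kmap(rng.dirichlet(np.ones(16)))   # illustrative posterior only
assert shading.max() == 1.0
```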

In Figure 20, student 238 has a very strong peak (probability 0.57) at the k-map cell corresponding to possession of all four skills, with the density falling away quickly at adjoining cells. Student 245, however, has the bulk of the distribution spread across two adjoining cells. The one-digit difference between these cells corresponds to the Relative/Absolute skill. This indicates that while there is strong certainty that the student possesses the Covariance, Invariance, and Multiplicative Modeling skills, there is still uncertainty regarding the student's possession of relative vs. absolute reasoning.

[Figure 20: The distribution of skill patterns for four different students using a probabilistic k-map. Panels (axes: Invariance/Multiplicative Modeling vs. Covariance/Relative-Absolute): Student 238, mode = 0.57; Student 245, mode = 0.41; Student 314, mode = 0.26; Student 220, mode = 0.18.]

For student 314, the distribution indicates a high probability of possessing Invariance and a low probability of possessing Multiplicative Modeling, but uncertainty in both the Covariance and Relative/Absolute skills. Student 220 has the most diffuse distribution of the four students shown, with the mass centered on 0 for Invariance but with uncertainty in the remaining three skills.

4.2.2 Interpreting Skills through Examination of Logistic Regression Coefficients

In comparison to the Rasch model, the Bayes net provides flexibility for inference at the student level. Looking at posterior distributions provides information on specific skill patterns, and diffuse patterns on one skill axis can also indicate where the model may fail for particular students, or which skills may be poorly elicited by the test. But while the Bayes net model highlights student-specific inference on skills, interpretation of the skills is more challenging, because the meanings of skills are derived from constraints and the compensatory link between skills and strategy choice outlined by the Q-matrix.

One way of assessing the links between skills and strategies is to examine task-specific probabilities of skill possession given strategy response. Define a matrix with $C = 2^4 = 16$ rows and $S = 4$ columns, as given by the left-hand side of Table 9; the entry $\alpha_{cs}$ represents the possession status of skill $s$ in the $c$-th latent class. For task type $j$, response $k$, and a latent class $c$, we compute

$$P_c(j, k) = \frac{\exp\left(\sum_{s=1}^{S} Q_{js}\,\alpha_{cs}\,\lambda_{sjk}\right)}{\sum_{l=1}^{C} \exp\left(\sum_{s=1}^{S} Q_{js}\,\alpha_{ls}\,\lambda_{sjk}\right)}. \qquad (3)$$

If we assume a uniform prior on the $C$ latent classes, then

$$P_c(j, k) = P(\text{latent class } c \mid \text{response } k \text{ to task type } j).$$

When the prior on latent classes is not uniform, $P_c(j,k)$ does not have a simple Bayesian interpretation. However, it can still be used to assess the information about skills that is provided by strategies for each task type. Estimates of $P_c(j,k)$ can be obtained by substituting the posterior mean estimates for the $\lambda_{sjk}$ parameters into equation 3. Across all skill patterns $c$, $P_c(j,k)$ is a probability distribution. This distribution is considered task-specific because it uses only information from the estimated $\lambda_{sjk}$ parameters, and does not take into account any information on the marginal likelihood of the skill patterns (in the form of the latent classes). The variability of the estimate can be obtained through direct approximation using the MCMC output, or through the delta method.
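Equation 3 is straightforward to evaluate once posterior-mean estimates of the $\lambda_{sjk}$ are available; this sketch (variable names ours) enumerates the 16 skill patterns and computes the task-specific distribution for one response:

```python
import numpy as np
from itertools import product

# A sketch of equation 3: the task-specific probability of each of the
# C = 16 skill patterns, given response k to task type j, using
# posterior-mean estimates of the lambda parameters.
def p_class_given_response(q_j, lam_j, k):
    """q_j: (4,) Q-matrix row; lam_j: (4, K) posterior means; k: response."""
    patterns = np.array(list(product([0, 1], repeat=4)))  # alpha_cs, C x S
    expo = np.exp(patterns @ (q_j * lam_j[:, k]))         # numerators of eq. 3
    return expo / expo.sum()                              # sums to 1 over classes
```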

The probability distributions in equation 3 also have a useful graphical interpretation using k-maps (see Figure 21). On the logit scale, the $\lambda_{sjk}$ term is associated with the two horizontal or vertical lines in the k-map corresponding to the one-digit differences for skill $s$. A k-map based on the probabilities in equation 3 is essentially a way of visualizing in one graph all of the $\lambda_{sjk}$ parameters for task type $j$, response $k$. We use these k-maps to interpret skills in terms of tasks and strategies by examining which tasks and strategies particular skills load on, similar to interpreting the matrix of factor loadings in a factor analysis.

[Figure 21: A k-map representation of the conditional probability of skill patterns given task type $j$, response $k$ (logit scale) for four skills {A, B, C, D}. Starting from the baseline category, each horizontal and vertical line on the k-map represents adding or subtracting a $\lambda_{sjk}$ term in the exponent of the numerator.]

Figures 22 through 28 show the estimated marginal probabilities of response for each category, as well as k-map representations of the estimated task-specific probability of skill patterns given each response ($P_c(j,k)$, $c = 1, \ldots, 16$), for the seven task types. The marginal probabilities give an idea of the frequency of each strategy across all students, while the k-map shows the kinds of skills present in students who chose that strategy.

  • 0 1 2 3 4 5 6

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    Marginal Probability

    Missing Value

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    0. Multiplicative

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    1. CrossMultiplication

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    2. Additive

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    3. Incorrect Implementation

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    4. Misconceptions

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    5. Meaningless Multiplicative

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    6. NonClassifiable

    Figure 22: Marginal and conditional probabilities: Missing Value

    46

  • 0 1 2 3 4 5 6

    0.0

    0.1

    0.2

    0.3

    Marginal Probability

    Similarity

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    0. Multiplicative

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    1. CrossMultiplication

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    2. Additive

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    3. Incorrect Implementation

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    4. Misconceptions

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    5. Meaningless Multiplicative

    Invariance, Multiplicative Modeling

    Co

    varia

    nce

    , R

    ela

    tive

    /Ab

    solu

    te

    00

    01

    11

    10

    00 01 11 10

    6. NonClassifiable

    Figure 23: Marginal and conditional probabilities: Similarity

    47

Constraining $\lambda_{sjk} = 0$ with the Q-matrix is equivalent to looking at a coarsening of the grid in the k-map; certain one-digit differences will yield no change in the probability. But unconstrained skills which do not impact a response pattern will also coarsen the k-map in the same way. For example, the Q-matrix for Missing Value, Similarity and Comparison tasks highlights all skills and does not fix any $\lambda_{sjk}$ parameters at 0. The multiplicative response for Missing Value tasks (Figure 22) appears to support specific inference in all four skill areas; the darkest cell and its four darkest neighbors correspond to possession of at least three of the four skills. However, the multiplicative response for Similarity tasks (Figure 23) highlights a coarser region corresponding to possession of the Invariance and Relative/Absolute skills, while yielding more uncertainty about the Covariance and Multiplicative Modeling skills.

[Figure 24: Marginal and conditional probabilities: Comparison. A bar chart of marginal strategy probabilities and one k-map per strategy (axes: Invariance/Multiplicative Modeling vs. Covariance/Relative-Absolute) for responses 0. Multiplicative, 1. Cross-Multiplication, 2. Additive, 3. Incorrect Implementation, 4. Qualitative, 5. Non-Classifiable.]

    For both Missing Value (Figure 22) and Comparison (