

Reconnecting interpretation to reasoning through individual differences

Keith Stenning and Richard Cox
Edinburgh University, Edinburgh, Scotland, UK

Computational theories of mind assume that participants interpret information and then reason from those interpretations. Research on interpretation in deductive reasoning has claimed to show that subjects’ interpretation of single syllogistic premises in an “immediate inference” task is radically different from their interpretation of pairs of the same premises in syllogistic reasoning tasks (Newstead, 1989, 1995; Roberts, Newstead, & Griggs, 2001). Narrow appeal to particular Gricean implicatures in this work fails to bridge the gap. Grice’s theory, taken as a broad framework for credulous discourse processing in which participants construct speakers’ “intended models” of discourses, can reconcile these results, purchasing continuity of interpretation through variety of logical treatments. We present exploratory experimental data on immediate inference and subsequent syllogistic reasoning. Systematic patterns of interpretation driven by two factors (whether the subject’s model of the discourse is credulous, and their degree of reliance on information packaging) are shown to transcend particular quantifier inferences and to drive systematic differences in subjects’ subsequent syllogistic reasoning. We conclude that most participants do not understand deductive tasks as experimenters intend, and just as there is no single logical model of reasoning, so there is no reason to expect a single “fundamental human reasoning mechanism”.

Correspondence should be addressed to Keith Stenning, Human Communication Research Centre, Edinburgh University, 2, Buccleuch Place, Edinburgh EH8 9LW, Scotland, UK. Email: [email protected]

© 200X The Experimental Psychology Society
http://www.tandf.co.uk/journals/pp/17470218.html DOI:10.1080/17470210500198759

THE QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 200X, XX (X), 1–30

Computational theories of mind assume that subjects impose interpretations on information and then reason from those interpretations, possibly going through cycles of interpretation and reasoning. When subjects in the “immediate inference” task (Newstead, 1989) are presented with the information that All A are B and are asked about what follows, a substantial proportion of subjects respond that it follows that All B are A. Similarly, given Some A are B, substantial numbers state that Some A are not B follows. And again, given All A are B, substantial numbers state that Some A are B is false. Theories of discourse processing offer explanations about why these “implicatures” might be drawn (Grice, 1975). When the same subjects are introduced to syllogistic reasoning, they are presented with just one more premise from the same range of forms and are asked what now follows. One might expect subjects to adopt the same interpretations of the same English sentences in the second task as in the first, and one might expect them to apply the appropriate reasoning processes. Since the interpretations of the sentences would be the same, one might expect the appropriate reasoning processes to be the same. Yet several studies have now shown that on assumptions prevalent in the psychology of reasoning, it is very hard to reconcile subjects’ conclusions on the one-sentence (immediate inference) and two-sentence (syllogistic) tasks (Newstead, 1995; Roberts, Newstead, & Griggs, 2001). The conclusion of these studies is that the two-sentence task presents such calculative difficulties that subjects resort to quite different interpretations of the quantifiers.

We see this conclusion as pessimistic with regard to the relevance of these tasks to general reasoning and also as quite unnecessary on the evidence. Subjects can be seen as generally continuing the defeasible processes by which they normally interpret discourse, from the first task to the next. But seeing continuity between the one-sentence and two-sentence tasks within each subject requires alternative competence models for different subjects. Only by appreciating the individual differences between participants can we grasp the homogeneity of their interpretative procedures across the two tasks.

The proposal that subjects in these tasks should be modelled as doing defeasible reasoning has notably been made by Oaksford and Chater (see Oaksford & Chater, 2001, for a recent review), who have argued over the course of the last 15 years that the selection task, conditional reasoning, and syllogistic reasoning are understood by subjects as inductive tasks for which the appropriate “computational level” competence model is probability theory. Bonnefon (2004) relates mental models theory to a range of approaches using default logics to model human reasoning and provides useful references to earlier appeals to defeasible reasoning in the psychological literature. Although it is true that mental models theory makes periodic appeal to the defeasibility of reasoning, Stenning and van Lambalgen (2004, in press) show how failure to separate competence models for different construals of the tasks leads to little but confusion.

We share much with Oaksford and Chater (2001) and with Bonnefon (2004). We agree that the classical logical competence model is an insufficient basis for modelling most subjects in these tasks, and that their reasoning is often defeasible. We share their doubts that probability theory can be more than a theorists’ tool as a computational level model and suspect that qualitative nonmonotonic logical models are both more insightful as competence models and nearer to the performance models that subjects actually use to reason with.

We also differ on a number of dimensions, first and foremost in believing that a range of competence models is needed for explaining participants’ qualitatively different construals of reasoning tasks. Different logics are required for modelling different task construals. This richness of interpretation has to be acknowledged whether or not probabilistic information gain plays a role in subjects’ reasoning.

In our discussion below of the source-founding model (Stenning & Yule, 1997) and its application to the present data, we return to other theories’ stance toward individual differences. That discussion argues that construing mental models theory as a single “fundamental human reasoning mechanism”, when it clearly has to be stretched to model all kinds of different reasoning tasks, has blunted curiosity about the qualitative differences between subjects’ performances. It has also led to spurious arguments dismissing rule-based accounts of the suppression task. Stenning and van Lambalgen (in press) show that Byrne’s (1989) claims that rules cannot be operating because they are suppressed in some circumstances turn out to be made with regard to logical rules inappropriate to the subjects’ construal of the task. Logical modelling has the great virtue of requiring analysis of coherent construals of reasoning tasks, one model for each construal, a good starting point for cognitive analysis, which surely has to start from subjects’ construal of the task.

Our interpretative approach was originally developed by showing that a single task (Wason’s, 1968, selection task) evokes multiple interpretational conflicts for participants. These conflicts arise because the overwhelmingly most likely initial interpretation of the rule in that task is nonclassical, nontruthfunctional, and nonmonotonic, and these properties conflict with the nature of the task: to test the truth of a defeasible generalization by examination of cases (Stenning & van Lambalgen, 2004).


Subjects’ qualitatively different responses can be understood as attempts, variously successful, to resolve these conflicts. The effects of deontic variations of material and instructions in this task can be understood in terms of their different logical interpretations and different task demands. A formal default logic model of this understanding of conditionals is given by Stenning and van Lambalgen (in press) in the context of modelling Byrne’s (1989) suppression task. Far from producing evidence against subjects’ reasoning by logical rules, this task illustrates how subjects employ the same nonclassical default logic that underlies their problems in the selection task, to accommodate the conflicting statements presented in the suppression task. The particular logic proposed is also shown to be neurally implementable and extraordinarily efficient, properties that make it a good candidate for at least some of the functions of what have come to be known as “System 1” implicit reasoning processes (Evans, 2003; Stanovich, 1999). This logic also offers the possibility of a smooth articulation with System 2 processes modelled by classical logic.

These proposals present a modern logical view of the relation between task, materials, context, content, and logical form. On this view, to get an appropriate logical system in which to reason for some purpose, a wide range of logical parameters have to be set by clues from instructions, materials, context, and content in order to assign logical form, semantics, and the relevant concept of validity, which alone can fix what counts as “correct” reasoning.

Here, we assume the same broad defeasible logical framework in an empirical exploration of the range of defeasible understandings of the one- and two-sentence tasks (the immediate inference task and syllogisms). Tasks that appear to invoke disparate and irreconcilable behaviour can in fact be seen to invoke quite consistent interpretations, though different interpretations in different subjects. Appreciation of the variety of logics focuses attention on individual differences and makes modelling group data not generally justifiable. If subjects are aiming to do different tasks with different normative standards and different valid conclusions, then aggregate models of their “accuracy” of reasoning run the danger of being entirely misleading. Group models are only justified by unargued assumptions about the uniqueness of classical logical interpretation and the uniformity of the “fundamental human reasoning mechanism”, or by generalized appeals to probability theory as a competence model.

The plan of the paper is as follows. The next section reviews the experimental work that has produced the evidence of disparities in interpretation between the immediate inference and syllogistic tasks, and it draws out from that work some of the assumptions that underlie its conclusions. The following section examines the relations between these experiments and logical and linguistic theories of discourse interpretation (Gricean approaches and default logics), arguing that although Grice has been widely appealed to in these papers, his theory has been torn from its context and only narrowly applied as a theory of (an aberrant classical) logical form rather than a theory of cooperative communication. The fourth section presents an experiment that collects and analyses immediate inference data in ways suggested by the logical framework, and it follows this by eliciting syllogistic reasoning from the same subjects. An exploratory statistical model of conclusion term order is then designed to reveal connections between patterns of interpretation and patterns of reasoning. A second experiment provides a replication and allows the analysis of conclusion term order to be reconnected to the analysis of reasoning accuracy. Finally, the General Discussion draws implications for empirical strategy in investigating interpretation and reasoning.

Experimental studies of interpretation’s relation to reasoning

Newstead (1985) was among the first to study the relation between quantifier interpretation and reasoning in detail. This paper presents two tools for studying the interpretation of syllogistic quantifiers: the immediate inference task and the diagrammatic task. In the first, sentential, task, Newstead presented a premise to be considered true and a candidate conclusion sentence and asked subjects: “Is the conclusion sentence definitely true? Or else false?” (p. 652). The second tool for the measurement of interpretation was based on Euler diagrams. This task required subjects to consider the five possible relationships between two sets presented as circle diagrams and to indicate which of the four quantified statements were true with respect to them.
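The diagrammatic task can be made concrete with a small sketch. The block below is our own illustration rather than Newstead's materials: the relation names and the tiny example sets are assumptions chosen for readability, and the quantifiers are given their classical readings over non-empty sets. It lists, for each of the five relations between two sets, which of the four quantified statements are classically true.

```python
# Five possible relations between two non-empty sets A and B, each
# represented by one small concrete pair of sets (illustrative values only).
RELATIONS = {
    "identity": ({1, 2}, {1, 2}),
    "A inside B": ({1}, {1, 2}),
    "B inside A": ({1, 2}, {1}),
    "overlap": ({1, 2}, {2, 3}),
    "disjoint": ({1}, {2}),
}

# Classical truth conditions for the four syllogistic quantifiers.
QUANTIFIERS = {
    "All A are B": lambda a, b: a <= b,
    "Some A are B": lambda a, b: bool(a & b),
    "No A are B": lambda a, b: not (a & b),
    "Some A are not B": lambda a, b: bool(a - b),
}


def true_statements(relation):
    """Return the quantified statements that are true of the named relation."""
    a, b = RELATIONS[relation]
    return [s for s, holds in QUANTIFIERS.items() if holds(a, b)]


if __name__ == "__main__":
    for name in RELATIONS:
        print(f"{name}: {true_statements(name)}")
```

On the classical reading, Some A are B is true of four of the five diagrams, including identity and "A inside B". A subject who restricts some to the overlap diagram, or to the diagrams in which Some A are not B also holds, is departing from this table in the direction that the Gricean measures discussed in this section are meant to detect.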

The study found, in both immediate inference and graphical tasks, some support for both conversion theory (e.g., Chapman & Chapman, 1959), in which subjects treat logically asymmetrical statements as reversible (e.g., All A are B as implying All B are A), and for Grice’s (1975) theory of implicatures in which, as illustrated in the examples cited in the first paragraph, participants make extra inferences from the assumed cooperativeness and omniscience of the speaker. These theories are discussed in the next section. Politzer (1990) improved on methods and had more success at modelling immediate inference along Gricean lines but did not collect syllogistic data.

Newstead (1995) repeated the immediate-inference experiment of his earlier paper and ran the same participants on syllogisms. This paper focused on the question of whether Gricean interpretation as evidenced, for example, by the drawing of the implicature from Some A are B to Some A are not B is responsible for errors of syllogistic reasoning. An analysis of the then existing literature showed that the expected errors in syllogistic reasoning are quite rare. Four new experiments were presented. In Experiment 2, three “measures of Gricean interpretation” were computed: (a) the graphical criterion of choosing the set intersection or disjoint diagrams for some or some...not statements; (b) the frequency with which some was taken to imply some...not; and (c) the frequency with which all implied either the falsity or logical independence of some, and the frequency with which no implied either the falsity or logical independence of some...not.

These three alternative measures of Gricean interpretation did not even correlate with each other. Newstead (1995) concluded: “Surprisingly, since these are supposedly different measures of the same thing, the correlations were small, non-significant, and in one case actually negative. This suggests that the three measures may not be tapping the same thing”. There was also little correlation between any of these three measures of Gricean interpretation and conclusions drawn in syllogistic reasoning, and the conclusion was drawn that subjects change their interpretation of the quantified statements when going from the one-sentence to the two-sentence tasks.

Roberts, Newstead, and Griggs (2001), perhaps because of the earlier difficulty in producing stable empirical indices of Gricean interpretation, adopted a top-down approach to the problem. The paper works through a total of 13 combinations of “logical”, heuristic, Gricean, and “conversion” interpretations of the quantifiers and produces complete tables of which conclusions, if any, follow under which interpretations, for all 64 syllogism premise pairs. These models of interpretation are then fitted to some existing datasets (Dickstein, 1978; Johnson-Laird & Steedman, 1978, combined with Johnson-Laird & Bara, 1984) and one new dataset of the kinds of conclusion that subjects draw from each pair of syllogism premises.
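For readers unfamiliar with the problem space, the figure of 64 premise pairs can be reproduced by enumeration. The sketch below is our own illustration and assumes the usual convention that the first premise relates the end term A to the middle term B, and the second relates B to the end term C; the function and variable names are hypothetical.

```python
from itertools import product

QUANTIFIERS = ("All", "Some", "No", "Some...not")
# Each premise may present its two terms in either order.
FIRST_PREMISE_ORDERS = (("A", "B"), ("B", "A"))
SECOND_PREMISE_ORDERS = (("B", "C"), ("C", "B"))


def sentence(quantifier, subject, predicate):
    """Render one quantified premise as an English sentence."""
    if quantifier == "Some...not":
        return f"Some {subject} are not {predicate}"
    return f"{quantifier} {subject} are {predicate}"


# 4 quantifiers x 2 term orders for each of the two premises.
PREMISE_PAIRS = [
    (sentence(q1, *order1), sentence(q2, *order2))
    for q1, order1, q2, order2 in product(
        QUANTIFIERS, FIRST_PREMISE_ORDERS, QUANTIFIERS, SECOND_PREMISE_ORDERS
    )
]

if __name__ == "__main__":
    print(len(PREMISE_PAIRS))
```

Each interpretation considered in such a table assigns to every one of these pairs either a set of conclusions or the response that nothing follows.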

When these 13 interpretations are fitted to the three datasets, simple Gricean interpretations fit poorly to all three. Heuristics such as atmosphere and matching do poorly on the earlier datasets but rather well on the new one. The best fits are generally of conversion and “reversible” interpretations (the latter being a particular subcategory of conversion) either with or without elements of Gricean interpretation. These different models are rather poorly distinguished by the data. It is a bane of the study of the syllogism that remarkably simple heuristics get a high proportion of classically correct answers, and all models of representation and processing share a large core of predictions. Only by lumping all three datasets together is any significant separation of interpretative models possible.

The authors repeat Newstead’s (1995) earlier conclusion that the extra complexity of syllogistic reasoning leads to the abandonment of the simple Gricean interpretations elicited by the immediate inference tasks, in favour of different interpretations for the syllogism task. They see this as the result of an attempt by subjects to simplify the necessary calculations.

Seeing deductive tasks as discourse interpretation

This psychological literature reduces both Gricean interpretation and issues of the conversion or reversibility of statements’ interpretations to exercises in modulating classical logical forms for sentences. But Grice’s theory is not a theory for which there is some “classical logic of conversation”, which can be translated from standard logic by the addition of conjuncts to classical logical forms. Grice’s theory is about defeasible inferences drawn during the cooperative activity of producing/understanding a certain kind of discourse in which the hearer attempts to identify the speaker’s intended model of their discourse, using general knowledge and assumptions about cooperativity and omniscience, as well as general maxims that cover this cooperative activity. We call this general discourse processing goal credulous reasoning: the goal is to construct (and believe) a model of what one is told. This technical term is not pejorative. It contrasts with the sceptical attitude to discourse adopted in classical logical proof, where conclusions must be true in all models (not merely an intended one).

Roberts et al.’s (2001, p. 174) own calculations of what conclusions will follow from what interpretations illustrate the contrast between their narrow and our broad interpretations of Grice. Roberts et al. consider the syllogism All B are A/Some C are B and observe that with Gricean interpretations it may get encoded either with Set A identical to Set C, or with the two sets intersecting with nonintersecting subsets. They then argue that “with the outcome sets made explicit, a problem for anyone adopting these interpretations becomes apparent. Gricean interpretations affect not only the encoding of a problem, but also its decoding of the final outcomes so that conclusions can be generated. . . . The assumption of the mutual exclusivity of some and all during encoding will result in a contradiction on decoding.”

They then entertain several possible complex resolutions to this dilemma. But this argument supposes that the hearer treats the output models like models of classical logical conclusions true in all models of the premises, and that now, from the point of view of a speaker concluding from this construction, Some A are B is inconsistent with All A are B, not merely that it would be misleadingly uninformative to say the former when the latter is true in the intended model. In Grice’s theory, the aim of processing determines the semantics of the representations, and in giving a competence model, one should not forget this and return to a classical interpretation in midstream. The credulous process that Grice described is defeasible and therefore nonmonotonic. Inferences that might be made at the end of Premise 1 may not be made after Premise 2. The construal of the task and the interpretation mechanism are perfectly homogeneous across tasks, though they have the effect that implicatures arising at one point may get cancelled at another.

Likewise, conversion and reversibility are given no theoretical basis. They are simply offered as observations from earlier experiments with no justification beyond the resulting simplification of reasoning. Yet there are well-developed logical frameworks for understanding issues of reversibility in the process of discourse comprehension. Closed-world reasoning is the general label for this reasoning, which can be treated technically within default logics. Far from being some arbitrary syntactic operation, or simply an attempt to find an easier problem, this reasoning is another example of Gricean credulous interpretation. It is not hard to see informally that default logical models for discourse can lead to conversion. The rough principle is: “Only add to the model what is necessary to understand what has been said.” So, if a discourse begins “All A are B”, then at this stage we get a model in which the only way that something can get to be B is because it is A, and in such a model it will be true that “All B are A”. This is transparently a case of Gricean interpretation via the maxims of quantity and relevance: “say enough and not too much”. It is true that the closed-world reasoning much modelled by default logics in AI was not discussed by Grice. But Roberts et al. (2001) explicitly deny that there is any connection between Gricean interpretation and conversion/reversible interpretation (p. 184). This denial is perhaps a reflection of a general disinclination to treat deductive reasoning materials as discourses.
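The informal argument that closed-world reasoning yields conversion can be illustrated with a toy computation. The following is a minimal sketch under our own simplifying assumptions, not the default logic referred to in the text: a model is a list of individuals, each a set of properties; the intended model is built by introducing one individual with the topic property and adding only what the premises force; and sceptical consequence is checked by brute force over all small models. All function names are hypothetical.

```python
from itertools import product


def all_are(model, x, y):
    """'All x are y' in a model given as a collection of property sets."""
    return all(y in individual for individual in model if x in individual)


def minimal_model(rules, seed):
    """Closed-world sketch: one individual with the seed property, plus only
    the properties forced by the (antecedent, consequent) rules."""
    individual = {seed}
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in individual and consequent not in individual:
                individual.add(consequent)
                changed = True
    return [individual]


def holds_in_all_small_models(premise, conclusion, props=("A", "B"), size=2):
    """Sceptical check: is the conclusion true in every model of the premise
    with between 1 and `size` individuals?"""
    types = [
        frozenset(p for p, keep in zip(props, bits) if keep)
        for bits in product([False, True], repeat=len(props))
    ]
    for n in range(1, size + 1):
        for model in product(types, repeat=n):
            if premise(model) and not conclusion(model):
                return False
    return True


if __name__ == "__main__":
    intended = minimal_model([("A", "B")], seed="A")
    print(all_are(intended, "B", "A"))
    print(holds_in_all_small_models(
        lambda m: all_are(m, "A", "B"),
        lambda m: all_are(m, "B", "A"),
    ))
```

In the single intended model built from “All A are B”, the only B is the A that was introduced, so All B are A holds. The brute-force check finds a counter-model (an individual that is B but not A), so the converse is not a sceptical consequence. The contrast between the two results is the contrast between credulous and sceptical construals of the same premise.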

This close association of Gricean implicature and reversible interpretation points up the fact that credulous discourse processing encompasses a wide range of interpretational inferences. Another class relevant to syllogistic reasoning is anaphora resolution. Consider a subject disposed to construct a single intended model for the discourse: Some A are B/Some B are C. The speaker is likely to construct a model of the first sentence in which some As are Bs. When the second sentence arrives, what anaphoric relations is such a speaker likely to impose? The example immediately makes it clear that such syllogisms are anaphorically problematic, because the only shared term is used first as a predicate and then in a noun phrase. The second sentence’s subject has the indefinite quantifier “some”, and two indefinite phrases in a discourse are prone to be interpreted as introducing nonidentical referents. This line of reasoning might yield the response that “it’s different Bs that are C than the ones that are A”, indicating that the subject has constructed a model with some As that are B and some other Bs that are C, and so nothing may follow about the relation between A and C. On the other hand, the general pragmatic forces toward integration of the only information available pull in the other direction toward the response “since there is a shared reference to B, and the speaker intends me to find some connection, then the Bs that are C must be intended to be the same Bs as the ones that are A”. Such reasoning leads to an intended model in which some As are Cs. So even within a generally credulous “construct-the-intended-model” construal of the task, subjects may draw a variety of conclusions. Indeed, closed-world reasoning also requires modelling in logics with several subtly different degrees and kinds of closure of the world. One might object that it is unnatural to treat these premises as having anaphoric relations. The subjects would undoubtedly agree. However, if they have a credulous interpretation of the task, resolving these relations is a goal that is forced upon them, and a little Socratic tutoring does elicit exactly these kinds of commentaries.

The fact that Gricean “forward” implicatures such as that from Some A are B to Some A are not B are the result of the same credulous goals as are reversible closed-world inferences from All A are B to All B are A suggests an explanation of why the former show up strongly in the one-sentence immediate inference task, but are not well supported in the data from the two-sentence syllogistic task, whereas the latter are common in both tasks. For example, when Some A are B is followed by a second premise, say All B are C, the former’s implicature Some A are not B is not relevant to the available candidates for syllogistic conclusions connecting A and C. However, when All B are A as a first premise is followed by No C are B, the former’s reversible interpretation in which all B are A is relevant to possible conclusions about the relation between A and C. Reversibility, viewed as credulous closed-world reasoning, just yields more relevant connections between end terms in syllogisms than do simple forward implicatures.

The underdetermination even of credulous interpretations clearly presents a degrees-of-freedom problem for data analysis. Some have claimed that once interpretation is taken seriously, empirical constraint is impossible. But fortunately this shift toward seeing the task as discourse interpretation strongly suggests a change of experimental programme, which can provide considerably more empirical constraint. At a methodological level, Roberts et al. (2001, p. 177) make the assumption that there is a sufficiently dominant interpretation and reasoning process, shared among a large enough subgroup of subjects, persisting throughout their performance on 64 syllogisms, that fitting models to group data makes sense. Their eventual conclusion is that they have gone about as far as it is possible to go without resort to individual difference methodology, but even this concession appears to assume that any individual differences will be details to be hung on the common model. It might be convenient, but why should this be so?

Once the subjects are seen as attempting different tasks, the a priori attractiveness of fitting group models to data is diminished. For example, the authors note that whereas the best fitting group model for the older datasets is either reversible or Gricean reversible, the best fitting group model to their own new data is heuristic (matching). They suggest this is probably because of lower skill levels in the new subjects. So different populations of subjects emerge fitted to uniform but quite different group models of the task as one goes from one population of undergraduate student subjects to another. Is it not much more plausible that the groups contain different proportions of heterogeneous subjects, and that at least some of these heterogeneous kinds of subject are represented in all groups? At worst, it is perfectly possible for group data to produce models that do not fit any individual at all.

When it comes to syllogistic reasoning data, there is an issue about how to conduct exploratory analysis. We here investigate both accuracy of reasoning (on a classical logical criterion) and conclusion term ordering as measures of reasoning performance that might be expected to be sensitive to the interpretative differences just described. Global accuracy analysis allows comparison with earlier literature, but has several problems. As mentioned above, there is the conceptual problem that if participants have different understandings of the task, then accuracy should be assessed on appropriate competence models for their understanding. It is true that one of the problems with studying syllogisms is that many of the possible credulous models of syllogism understanding diverge from the classical sceptical model on only a few problems. There are good logical reasons. The syllogism is an unusual fragment of logic, which allows application of only slightly strategically modified processes of credulous discourse processing to achieve sceptical reasoning. This is because, as we shall see presently, it permits generation of classically valid conclusions through the identification of “critical individuals”, that is, individuals fully specified for all three properties (see Stenning & Yule, 1997). So the conceptual problem may not be as severe here as in, say, the selection task. However, the fact that reasoning can be done in terms of the identification of critical individuals turns out to mean that conclusion term order provides an alternative measure of reasoning, which is more closely related to process and to accuracy than at first appears and has important methodological advantages for exploration. Exploratory statistical models of accuracy require the modelling of nine possible responses: eight conclusion types plus “no valid conclusion”. Conclusion term order is binary. In the end what we want is not models of global accuracy but models of reasoning process.

Conclusion term order is an obvious measure of syllogistic reasoning to relate to the reversibility of interpretation (or its refusal). When subjects decide to make an ac or a ca conclusion. Term ordering in conclusions has been studied extensively since Johnson-Laird and Steedman (1978) popularized the conclusion construction (as opposed to evaluation) task, thus making these data available. Stenning and Yule’s (1997) source-founding model is the widest coverage and most accurate model of conclusion term ordering available, and here we build on that model to analyse individual differences in our subjects’ conclusion term ordering. The source-founding model conceives of the classical logical task as finding types of individual that must exist in all models of the premises. Equally it conceives of the credulous task as finding types of individual that constitute the speaker’s intended model. The source-founding model proposes that subjects identify a source premise (one that entails the existence of the identified type of individual) and draw a conclusion by adding the end term of the other premise on to the end of this premise. So conclusion term order is decided by choice of source premise, and choice of source premise, along with some minor quantifier adjustments in a few cases, determines “accuracy”.


Stenning and Yule (1997) used a version of the syllogistic task that provides data about the ordering of the middle term during reasoning. Instead of being asked to draw a conclusion that drops the middle term, subjects were asked to do a logically closely related task of describing, in terms of all three properties, a type of individual that must exist if the premises are true. So, given the premises All A are B and Some C are not B the subject might conclude, for example: a C that is not B and is not A. Note that any ordering of the terms in such descriptions is equally logically valid. Subjects were instructed that if no critical type of individual was entailed, then they should respond “no valid conclusion” (NVC).

The source-founding model is shown to be considerably more accurate than mental models theory’s predictions, and it explains the new data on three-term ordering generated by the novel task (cf. Stenning & Yule, 1997, Table 11). Intuitively, the source premise identifies the type of individual that the problem is about. In our example syllogism All A are B. Some C are not B the second premise is source as it entails the existence of the Cs that are not B on which the conclusion Some C are not A is founded. Stenning and Yule noticed that subjects overwhelmingly draw their conclusions by adding the end term of the other premise to the tail of the source premise (removing the middle term). Of the possible three-term orders in their task, the model identifies one third that should be more common than the other two thirds of orders. In fact, between 70% and 90% of all responses fall in this predicted third of orders, for each of the four figures, in comparison with 60%, 44%, 72%, and 8.7% for mental models predictions. The improvement in fit is mostly due to the source-founding model’s prediction of bac and bca term orders, which are observed to be the commonest of all. Mental models theory rules these orders out since it is organized around the principle that the middle term must be got into medial position to allow a “mental cancellation” operation. We therefore propose to incorporate our individual differences by extending the source-founding model.

Subjects tend to draw conclusions founded on the source premise, as identified by the classical competence model. However, the crux of the theory as a processing model is obviously the heuristics that subjects use to identify the source premise. This point has been misunderstood. For example, Chater and Oaksford (1999, p. 237) assume that the theory is that subjects identify the source premise using the classical logical competence model, and so they conclude that the theory cannot be applied to invalid syllogisms, to conclusions drawn by implicature, or to quantifiers such as few, most, many. However, this is just to misunderstand the theory. Although the experimenter’s initial evidence for source founding had to be based on identification of source premises estimated from a competence model, when it comes to constructing process models of participants’ reasoning, a variety of heuristics suggest themselves on which a processing theory might be built. Some were proposed in the original paper and are refined here, and of course such heuristics can cover invalid syllogisms and inferences by implicature. It is quite natural to extend the central discourse processing idea that subjects attempt to anchor their reasoning on an established individual (or set) to quantifiers such as many and most. Stenning and Yule (1997) show that the heuristics “select unique existential premises as source” and “select unique positive premises as source”, applied in that order, approximate to classical logical competence and account for a large proportion of subjects’ reasoning. The same heuristics approximate to credulous individual identification. So the source-founding model is a “shell” process model, which abstracts over different logics and representations and in which different strategies can be expressed by changing the heuristics for source premise identification. At this level of analysis, there are many parallels between the source-founding model and Oaksford and Chater’s (2001) probability heuristics model (especially the attachment heuristic), however different is their general philosophy.
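The two source-identification heuristics and the rule for building a conclusion can be sketched in a few lines of Python. This is only an illustrative reading of the verbal description above, not the authors' implementation: the premise encoding as (quantifier, subject, predicate) tuples, the function names pick_source and term_orders, and the decision to return no source when neither heuristic picks out a unique premise are all assumptions made for the example. Quantifier adjustments are left out, so only term order is predicted.

```python
from typing import Optional, Tuple

# Hypothetical encoding: (quantifier, subject term, predicate term).
Premise = Tuple[str, str, str]

EXISTENTIAL = {"some", "some_not"}   # existential quantifiers
POSITIVE = {"all", "some"}           # positive quantifiers


def pick_source(p1: Premise, p2: Premise) -> Optional[int]:
    """Return 0 or 1 for the source premise, or None if neither heuristic applies.

    Heuristics are tried in order: unique existential premise first,
    then unique positive premise.
    """
    for family in (EXISTENTIAL, POSITIVE):
        flags = [p[0] in family for p in (p1, p2)]
        if flags.count(True) == 1:
            return flags.index(True)
    return None


def term_orders(p1: Premise, p2: Premise, middle: str = "b") -> Optional[Tuple[str, str]]:
    """Predict the three-term order and the two-term conclusion order."""
    src = pick_source(p1, p2)
    if src is None:
        return None
    source, other = (p1, p2)[src], (p1, p2)[1 - src]
    # The end term of the other premise is attached to the tail of the source premise.
    other_end = next(t for t in other[1:] if t != middle)
    three_term = source[1] + source[2] + other_end
    two_term = three_term.replace(middle, "")
    return three_term, two_term


# The example from the text: All A are B. Some C are not B.
print(term_orders(("all", "a", "b"), ("some_not", "c", "b")))
```

For the example syllogism the sketch selects the second premise as source and yields the orders cba and ca, in line with the description "a C that is not B and is not A" and the conclusion Some C are not A.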

In summary, the credulous construction of single intended models for discourse is a family of interpretative processes, which have quite different goals from those of classical logical proof. Given the range of credulous discourse processing available, we here adopt a more appropriately descriptive approach to finding important synoptic patterns of interpretation across immediate inference task questions. These patterns are then used to predict subjects’ syllogistic reasoning, starting from an exploratory statistical model of conclusion term order and using its findings to make and test predictions about subgroups of subjects’ reasoning processes viewed through the source-founding model. Developing specific logical models is a task for later papers.

EXPERIMENT

The history of our experimentation is that an initial experiment yielded a much more complex regression model of the relations between individual differences in interpretation and subsequent syllogistic reasoning than had been anticipated. This meant that it was not possible to reserve a test subset of those data on which to test the model, so a second experiment of exactly the same form as the first was conducted on a new group of subjects drawn from the same student population. In what follows we refer to the first experiment’s data as the “development dataset” and the second as the “test dataset”. We report the results in parallel for the reader’s convenience.

The experiment includes both an immediate inference task and the subsequent syllogistic task performed by the same subjects. We analyse and discuss the former before introducing the latter because the exploratory analysis of interpretations in immediate inference feeds into the modelling of the subsequent reasoning data.

IMMEDIATE INFERENCE

Stenning and Cox (1995) studied both sentential immediate inference and diagrammatic tasks and showed that with an appreciation of their different semantics, their measures could be brought into some correspondence. In this paper we focus on sentential measures of interpretation. We first collected a set of sentential immediate inference data on the syllogistic quantifiers, analogous to Newstead (1995, Exp. 2). We analyse these data initially from a descriptive standpoint. What are subjects’ naive logical intuitions, as gathered in the immediate inference task, like? Are general patterns to be found?

Method

First-year undergraduate students who had not been exposed to formal logic teaching were given questionnaires about their interpretation of the quantifiers all, no, some, some . . . not.

Participants

Participants yielding the development dataset were 101 undergraduate psychology students at the University of Edinburgh. They were tested during a lecture on cognitive psychology. These students were drawn from a wide range of departments across the entire university with a predominance of social science faculty students. Few of these students had received any formal logical training at secondary school. None of the students had taken logic courses in the university. The second set of participants (N = 62) who yielded the test dataset were from the same introductory psychology class in the following year at the same point in their courses. As for the development dataset, the participants’ quantifier interpretations were tested during a lecture on cognitive psychology.

Materials and procedure

Immediate inference (II) questionnaire. The task used was similar to that described by Newstead (1995, Exp. 2), as described above. As in Newstead’s study, the questionnaire consisted of four pages. At the top of each page one of the four standard quantified statements was displayed: All As are Bs; No As are Bs; Some As are Bs; and Some As are not Bs. These were the premise statements. Beneath these premise statements the four quantified statements were listed (All As are Bs, etc.) and the converses of these (All Bs are As, etc.). These were the candidate conclusion statements. Alongside the eight response statements were response options “T” (true), “F” (false), and “CT” (can’t tell; possibly true and possibly false). The order of the four stimulus statement pages was randomized across subjects.

Participants were instructed:

This is a study of the way people draw conclusions from information. On each of the following pages there is a statement at the top of the page. An example is “All As are Bs”. Assume that these statements at the top of the page are true and that there are both As and Bs.

Below each statement is a line. Below the line are some more statements. For each of the statements below the line, decide whether you believe it is true, false, or one can’t tell (because either is possible), given the truth of the sentence at the top of the page. Indicate your belief by circling ONE of either “T” (true), “F” (false), or “Can’t tell”.

Examples:

• If you believe that “No As are Bs” must be true given the true statement “All As are Bs” then circle T.
• If you believe that “Some As are not Bs” must be false given the true statement “No As are Bs” then circle F.
• If you believe that “No As are Bs” could be true or could be false given the true statement “No As are Bs”, then circle CT.

Again, please note that you should interpret “some” to mean “at least one and possibly all”.

Participants were allowed as much time as they needed to complete the tasks (approximately 10 minutes).

Results

Table A1 (Appendix A) shows the proportion of “true”, “false”, and “can’t tell” responses to each quantifier, along with the responses correct according to the logical model with the no-empty-sets axiom. In Table A1, Newstead’s (1989) results are shown if the results of the present study differ by more than .07 from those reported by Newstead (1989, Table 2, p. 86).
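The "responses correct according to the logical model with the no-empty-sets axiom" can be derived mechanically by enumerating small models. The Python sketch below is one way to do this and is not taken from the paper: it assumes that a model is a non-empty set of individual types over the two properties A and B, that both A and B must have at least one member, and that a conclusion is "T" if true in every model of the premise, "F" if false in every such model, and "CT" otherwise. The quantifier labels and function names are hypothetical.

```python
from itertools import combinations, product

QUANTIFIERS = ("all", "no", "some", "some_not")
# An individual type is a pair (is_A, is_B).
TYPES = list(product((False, True), repeat=2))


def holds(quantifier, s, p, model):
    """Truth of 'quantifier S are P' in a model; s and p index into a type."""
    if quantifier == "all":
        return all(t[p] for t in model if t[s])
    if quantifier == "some":
        return any(t[s] and t[p] for t in model)
    if quantifier == "no":
        return not any(t[s] and t[p] for t in model)
    return any(t[s] and not t[p] for t in model)  # some_not


# All models in which there are both As and Bs (no-empty-sets axiom).
MODELS = [
    m
    for size in range(1, len(TYPES) + 1)
    for m in combinations(TYPES, size)
    if any(t[0] for t in m) and any(t[1] for t in m)
]


def answer(premise_q, conclusion_q, converse):
    """Classically correct response (T, F, or CT) to one questionnaire item."""
    satisfying = [m for m in MODELS if holds(premise_q, 0, 1, m)]
    s, p = (1, 0) if converse else (0, 1)
    values = {holds(conclusion_q, s, p, m) for m in satisfying}
    if values == {True}:
        return "T"
    if values == {False}:
        return "F"
    return "CT"


# Answer key for all 4 premises x 8 candidate conclusions.
KEY = {
    (pq, cq, conv): answer(pq, cq, conv)
    for pq in QUANTIFIERS
    for cq in QUANTIFIERS
    for conv in (False, True)
}
print(len(KEY), KEY[("all", "all", True)], KEY[("some", "some", True)])
```

Under these assumptions, All B are A is scored "CT" given All A are B, while Some B are A is scored "T" given Some A are B, which are the two out-of-place items discussed in the text.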

In Table A1, primed conclusion quantifiers (e.g., A′) represent the converse conditions (e.g., All Bs are As). The introduction of the “Can’t tell” response option in the current study resulted in a marked lowering of conversion and Gricean errors of interpretation compared to the results of Newstead (1989). Of course, these absolute differences are not of great interest in themselves. Their real significance can only be appreciated through the analyses of subjects’ concept of validity, which they enable (presented below).

The data shown in Table A1 and Table 1 are very similar to those of a previous study with a different sample (Stenning & Cox, 1995).

Participant profiles

Although grouped data provide a comparison to earlier work, and it is possible to examine piecemeal correlations between answers to different questions, our real interest is in finding patterns of interpretation characterizing a participant’s interpretative scheme. There is a large space of possible patterns of responses to the immediate inference questions (3^32) and therefore a considerable discovery problem in finding useful descriptions. We approached this problem by exploring students’ use of the CT response and their response to subject predicate inversion. Credulous interpretation should generally reduce CT responses, and closed-world varieties will generally increase acceptance of inverted conclusions.

Table 1. Development dataset: Numbers of subjects making each number of the two kinds of errors possible on QAB:QBA? questions

            T/F for CT
CT for T/F      0    1    2    3    4    5    6    7    8  Tot
0               2    5    5    5    8    5    2    5    8   45
1               2    0    1    1    3    3    0    2    0   12
2               0    1    2    1    0    1    0    0    1    6
3               0    2    1    1    1    0    2    0    0    7
4               0    0    0    0    0    1    0    0    1    2
5               0    0    2    2    2    0    0    0    0    6
6               0    2    3    1    3    0    0    0    0    9
7               2    1    1    0    0    0    0    0    0    4
8               7    3    0    0    0    0    0    0    0   10
Totals         13   14   15   11   17   10    4    7   10  101

Note: Can’t tell (CT) for True or False (T/F) errors are judgements that the conclusion is logically independent when it is in fact dependent; T/F for CT errors are judgements that logically independent conclusions are logically dependent.

Initial investigation revealed the expected substantial group of students who rarely respond CT and who overaccept inverted conclusions according to the classical logical measure. Substantial numbers of this group never responded CT to any question with transposed subject and predicate. Not only would these subjects respond, for example, T when given Some A are B and asked whether Some B are A, but they would also respond the same way when given All A are B and asked whether All B are A.

An early contrasting observation was that a substantial group of subjects would overuse the CT response (by the classical standard). That is, they responded CT whenever they were asked a question in which subject and predicate were transposed. Not only would these students respond CT when given, for example, All A are B and asked whether All B are A, but they would respond the same way when given Some A are B and asked whether Some B are A. This last pattern is particularly distinctive since it does not arise from either any obvious Gricean interpretation or reversible interpretation. Whereas “illicit conversion” has been discussed since Aristotle, this pattern of responding has remained unremarked in the literature. Table 1 shows the distributions of numbers of these two general patterns of response.

Here are two quite different dimensions of divergence of interpretation from the classical logical model, which need to be investigated in parallel. For convenience, we label a tendency to respond CT where T or F is correct as hesitancy and a tendency to respond either T or F where CT is correct as rashness. Note that these are empirical terms close to the data, which will turn out to have complex relations to theoretical concepts such as credulous and sceptical reasoning. Rashness and hesitancy can potentially be exhibited both when the conclusion sentence preserves subject/predicate (henceforth in place) and when it changes (henceforth out of place), potentially yielding four dimensions of classification. As a matter of observation, no participants had strong tendencies to be hesitant on in-place questions, thus reducing the space to three dimensions. These results therefore suggest a scheme for insightful abstraction over the sentential response data.

Setting thresholds on the number of CT responses required to qualify as hesitant and on the number of T or F responses required to qualify as rash can be done both within Q AB questions and Q BA questions. This reduces the space to three binary dimensions. Hierarchical log-linear modelling (e.g., Stevens, 2001) revealed that three second-order terms (rashness on in-place questions by rashness out of place; rashness out of place by hesitancy out of place; rashness in place by hesitancy out of place) made statistically significant contributions to a model of the data. The technique also permitted the cut-off points on each dimension used to categorize participants to be iteratively adjusted until residuals were minimized. The selected cut-off points were 0, ≥1 responses for rashness on in-place items; <5, ≥5 for rashness on out-of-place items; and <6, ≥6 for hesitancy on out-of-place items. The figures on the first line of Table 2 show the number of subjects assigned to the three binary dimensions. Of the participants, 23 are hesitant on out-of-place items, 70 are rash on in-place items, and 31 are rash on out-of-place items.
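As a rough check on the out-of-place cut-offs, the cell counts of Table 1 can be dichotomized directly. The Python sketch below assumes that the rows of Table 1 (number of CT-for-T/F errors) are the counts used for hesitancy on out-of-place items and that the columns (number of T/F-for-CT errors) are the counts used for rashness on out-of-place items; that mapping is an interpretation of the table, not something the text states in these terms. The variable names are illustrative.

```python
# Cell counts from Table 1 (development dataset).
# Row index = number of CT-for-T/F errors, column index = number of T/F-for-CT errors.
TABLE1 = [
    [2, 5, 5, 5, 8, 5, 2, 5, 8],
    [2, 0, 1, 1, 3, 3, 0, 2, 0],
    [0, 1, 2, 1, 0, 1, 0, 0, 1],
    [0, 2, 1, 1, 1, 0, 2, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 2, 2, 2, 0, 0, 0, 0],
    [0, 2, 3, 1, 3, 0, 0, 0, 0],
    [2, 1, 1, 0, 0, 0, 0, 0, 0],
    [7, 3, 0, 0, 0, 0, 0, 0, 0],
]

RASH_OP_CUTOFF = 5      # rash out of place: 5 or more T/F-for-CT errors
HESITANT_OP_CUTOFF = 6  # hesitant out of place: 6 or more CT-for-T/F errors

# counts[(hesitant, rash)] = number of participants in that cell of the 2 x 2 split.
counts = {(h, r): 0 for h in (False, True) for r in (False, True)}
for ct_errors, row in enumerate(TABLE1):
    for tf_errors, n in enumerate(row):
        hesitant = ct_errors >= HESITANT_OP_CUTOFF
        rash = tf_errors >= RASH_OP_CUTOFF
        counts[(hesitant, rash)] += n

hesitant_total = counts[(True, False)] + counts[(True, True)]
rash_total = counts[(False, True)] + counts[(True, True)]
print(hesitant_total, rash_total, counts[(True, True)])
```

With these cut-offs the split reproduces the 23 hesitant and 31 rash participants reported for out-of-place items, and no participant falls in both categories.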

A total of 92% of participants fall into four groups. A total of 23 participants are neither rash nor hesitant on either in-place or out-of-place questions, in general complying with the classical competence model. A total of 24 participants are just rash on in-place questions. However, both of the other substantial groups make two kinds of error. The largest group (29) are rash on both kinds of question. The fourth group (17) consists of participants who are both rash on in-place questions and hesitant on out-of-place questions.

Several generalizations can be made about associations between these three dimensions. Participants who are hesitant on out-of-place items are not rash on out-of-place items, and vice versa. (Note that it is logically possible to be both rash and hesitant on out-of-place items since the two sets of eight items contributing to each dimension are disjoint.) Participants who are rash on out-of-place items are rash on in-place items but not vice versa. Participants who are hesitant on out-of-place items are just as likely if not more so to be rash on in-place items as are participants who are not hesitant.
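The marginal counts and the associations just described can be recomputed from the eight development-dataset cells of Table 2. The Python sketch below is only a bookkeeping aid: the cell counts are those of Table 2, while the tuple encoding (hesitant on OP, rash on IP, rash on OP) and the helper names are assumptions introduced for the example.

```python
# Development-dataset cells of Table 2.
# Key: (hesitant on OP, rash on IP, rash on OP).
DEV = {
    (False, False, False): 23,
    (False, False, True): 2,
    (False, True, False): 24,
    (False, True, True): 29,
    (True, False, False): 6,
    (True, False, True): 0,
    (True, True, False): 17,
    (True, True, True): 0,
}


def marginal(cells, dim):
    """Number of participants classified positively on one binary dimension."""
    return sum(n for key, n in cells.items() if key[dim])


def rate_rash_ip(cells, hesitant):
    """Proportion rash on in-place items among hesitant (or non-hesitant) participants."""
    group = {k: n for k, n in cells.items() if k[0] == hesitant}
    return marginal(group, 1) / sum(group.values())


total = sum(DEV.values())
four_largest = sum(sorted(DEV.values(), reverse=True)[:4])

print(marginal(DEV, 0), marginal(DEV, 1), marginal(DEV, 2))   # hesitant OP, rash IP, rash OP
print(round(100 * four_largest / total))                      # share in the four largest groups
print(sum(n for k, n in DEV.items() if k[0] and k[2]))        # hesitant and rash out of place
print(sum(n for k, n in DEV.items() if k[2] and not k[1]))    # rash OP but not rash IP
print(round(rate_rash_ip(DEV, True), 2), round(rate_rash_ip(DEV, False), 2))
```

The last line compares the rate of in-place rashness among hesitant and non-hesitant participants, which is the comparison made in the final sentence of the paragraph above.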

Test dataset: Interpretation

The assignment of participants to rashness/hesitancy categories in the test dataset was made in the same way as for the previous data and is displayed in Table 2.

These data show a generally similar distribution of patterns of interpretation to that in the development dataset.

Discussion

Typically, the data used in the literature to examine interpretation and to explain reasoning patterns have consisted of responses to particular inferences. The field has concentrated on specific errors, especially on errors of commission (e.g., illicit conversion, or Gricean implicatures), but has not noticed errors of omission (e.g., failing to conclude from Some A are B that Some B are A). Our analysis of the immediate inference data shows that there are strong response tendencies, which generalize across particular logical inferences and are most strongly driven by subject/predicate relations between premise and conclusion, and by whether conclusions are logically dependent or independent. What is more, these two factors interact strongly but in complex ways in determining responses. To take one example, one half of which has greatly occupied the literature, participants who commit the fallacy of illicit conversion of all (discussed above) tend not to omit to validly convert some. Participants who fail to validly convert some do not tend to commit the fallacy of converting all. With conclusions that preserve subject/predicate structures, omission errors are hardly ever made, while many commission errors are made by participants who never make them in assessing subject/predicate inverted conclusions.

Many of these findings can be understood in the context of our discussion of the discourse interpretations available. Rashness on in-place questions corresponds to the commission of the paradigm Gricean implicatures: for example, concluding Some A are not B from Some A are B, or vice versa, and similarly with claiming that All A are B is false given Some A are B or that No A are B is false when given Some A are not B. These are transfers of a credulous model of the task. A large proportion of the participants exhibit this tendency (78/101). This is consonant with earlier findings.

Table 2. Numbers of subjects classified by hesitancy on out-of-place questions, rashness on in-place questions and rashness on out-of-place questions

                        Not hesitant on OP                      Hesitant on OP
                        Not rash on IP      Rash on IP          Not rash on IP      Rash on IP
                        Not rash  Rash      Not rash  Rash      Not rash  Rash      Not rash  Rash
                        on OP     on OP     on OP     on OP     on OP     on OP     on OP     on OP
Developmental dataset   23        2         24        29        6         0         17        0
Test dataset            12        3         4         17        13        2         5         6
Syll.                   8         0         7         15        4         0         6         0

Note: OP = out-of-place questions. IP = in-place questions. Syll. = developmental subsample of subjects that completed the syllogistic reasoning task.

What rashness on out-of-place questions signifies is an intriguing question, another part of the pattern so far ignored. As the discussion above indicated, at least some of these errors could be generated by credulous closed-world reasoning. There is no reason in Grice’s theory why change in subject/predicate assignment in the conclusion should affect the drawing of implicatures. Or, to turn the question around: why should participants who are rash in place but not out of place refuse credulous conclusions when subject/predicate structures are altered?

One possibility for explaining why participants do draw both in-place and out-of-place implicatures is a representational explanation. Participants who are rash regardless of subject/predicate structure may be pursuing a representational strategy that removes subject/predicate information (for example, a graphical approach such as Euler’s circles), combined with applying a credulous model of the task. Participants who are rash on in-place but not on out-of-place questions might then be interpreted as having a credulous model, but being guided by the way the speaker has designed the discourse structure, they do not draw out-of-place implicatures. The two rashnesses (in-place and out-of-place) are significantly, if weakly, positively correlated (r = .40, p < .001). There are almost no participants (2 of 101) who are rash on out-of-place questions but not on in-place questions, and this is as this explanation might predict. The subject/predicate structure of out-of-place questions might block implicatures in a credulous model, but it is hard to see why someone applying a credulous model to out-of-place items would not apply it to the in-place items.

What are we to make of hesitancy, the striking novel empirical observation here? Hesitant participants draw too few inferences on out-of-place problems (by a classical logic benchmark), not too many. On the face of it they seem unlikely candidates for credulous approaches. However, about a quarter of them simultaneously behave rashly on in-place questions. Hesitancy on in-place questions barely occurs in the data. Here we merely suggest some speculative interpretations and bear in mind that it is possible that this is a heterogeneous group.

In natural language, structures such as subject and predicate, placement of negation, and indeed ordering of premises are strongly related to credulous approaches to communication. Even when they do not affect the truth conditions of sentences, they focus the credulous hearer onto the speaker’s intended model. In contrast, classical logic, a formalism developed for sceptical adversarial proof, actually gets rid of subject/predicate in favour of function argument organization. One sees this focusing through subject/predicate structure in operation in figural effects in syllogistic reasoning. The rare counterfigural syllogisms (those whose only valid conclusions reverse the subject/predicate status of the premise terms) are solved by few participants. For example, for the syllogism No A are B/All B are C in our data presented below, about 75% of subjects draw invalid ac conclusions, and very few find the valid conclusion Some C are not A. Simply reversing the subject/predicate structure of the first premise, which puts the term required to be subject of the conclusion into subject position in its premise, leads 45% of participants to get the correct ca term order, even though there is still a competing candidate subject term in the other premise.

Linguists refer to the focusing effects of structures where they do not change truth conditions as information packaging (e.g., Vallduvi, 1992). So one interpretation of hesitancy is that hesitant participants with credulous models of the discourse rely heavily on information packaging to guide their reconstruction of the speaker’s intended model. The information-packaging structures particularly relevant to the syllogism are subject/predicate, term order, and the placement of existential and positive premises. Subjects with credulous models of the discourse who are attempting to construct the speaker’s intended model around a particular type of individual on the basis of these packagings will prefer conclusions whose subject terms occur as participants in positive existential first premises. The detailed rationale for these latter effects is discussed below in the context of a model of conclusion term ordering.

If hesitant participants with credulous models of the discourse rely on such information packaging to guide them, then they will avoid applying closed-world reasoning to universal premises (which automatically changes subject/predicate organization). They will be particularly influenced by premise order and the placement of negatives. There will be many credulous inferences that they will not draw, even though they are adopting a generally credulous attitude to the discourse: their “blind faith” in the speaker’s credulous information packaging will avert many fallacies, but it will also lead to refusal of classically valid inferences.

We offer this tentative interpretation acknowledging that not all hesitant participants are rash in place, so they may be a heterogeneous group. Furthermore, a deeper theoretical analysis must take on the differences between things and properties, terms and predicates, in participants’ representations and processing. Whether participants think in terms of processing individuals and their properties, or in terms of sets, is an issue closely related to information packaging.

Back at the data, hesitancy on out-of-place questions is significantly negatively correlated with rashness on out-of-place questions (r = −.50, p < .001), presumably because rashness on out-of-place questions indicates an indifference to information packaging rather than because it indicates a credulous model of communication. These results provide evidence that we can usefully distinguish contrasts between credulous/adversarial models of communication on the one hand, and of the uses that participants may make of information packaging on the other.

In summary, merely exploring the data for large-scale patterns of interpretation has revealed several striking but previously unnoticed patterns, which transcend particular quantifiers. We are no longer constrained to looking for correspondences between single implicatures in interpretation tasks and reasoning tasks, any of which may be disrupted by many extraneous factors. Instead we can look for systematic influences of credulous models and information packaging on the task of reasoning.

THE SYLLOGISTIC REASONING TASK

Method

Participants

A few days after the immediate inference task, a subset (N = 40) of the participants did a syllogistic reasoning task and were paid for their participation. The numbers of these participants classified by their interpretation data, in each of the rashness and hesitancy classifications, are shown in Table 2.

These 40 subjects were then given the full set of 64 syllogisms. We refer to these data as the development dataset. Given each pair of premises, participants were asked whether there was any conclusion of the form quantifier ac or quantifier ca (where the possible quantifiers are all, some, no, some . . . not) that must be true whenever the premises are true. If not, they were instructed to respond no valid conclusion (NVC).

The second set of 62 participants who yielded the test dataset of interpretation data shown in Table 2 also did an identical syllogistic reasoning task to yield the reasoning test dataset. This dataset is modelled in Table 4 (see later).

Materials and procedure

A categorial syllogism consists of two premises, which relate three terms (a, b, and c), one of which (the middle term, b) occurs in both premises, while the other two (the end terms, a and c) each occur in only one premise: a is the end term in the first premise, and c is the end term in the second premise.

There are four moods or premise types, distinguished by the quantifiers “all”, “some”, “none”, and “some . . . not”. The quantifiers all and none are universal. The quantifiers some and some . . . not are existential. There are four possible arrangements of terms in the two premises, known as figures. We also make use of the term diagonal figures to refer to the first pair of figures (ab/bc and ba/cb) and symmetric figures to refer to the second pair (ab/cb and ba/bc). Since each premise can be in one of four moods, and each premise pair can have one of four figures, there are 4 × 4 × 4 = 64 different syllogisms. Of these, 27 have at least some valid conclusion under the assumption that the sets are nonempty; 37 have no such conclusion.
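The count of 64 syllogisms, and the split into problems with and without a valid conclusion, can be reproduced by brute force. The Python sketch below is an illustration under stated assumptions rather than the procedure used in the study: it treats a model as a non-empty set of individual types over the three properties a, b, and c, requires each of the three sets to be non-empty, and counts a conclusion (any of the four quantifiers, in ac or ca order) as valid if it holds in every model of the premises. All names are hypothetical.

```python
from itertools import combinations, product

QUANTIFIERS = ("all", "some", "no", "some_not")
# (subject, predicate) of the first and second premise in each figure.
FIGURES = {
    "ab/bc": (("a", "b"), ("b", "c")),
    "ba/cb": (("b", "a"), ("c", "b")),
    "ab/cb": (("a", "b"), ("c", "b")),
    "ba/bc": (("b", "a"), ("b", "c")),
}
TERM = {"a": 0, "b": 1, "c": 2}
# An individual type is a triple (is_a, is_b, is_c).
TYPES = list(product((False, True), repeat=3))


def holds(quantifier, subject, predicate, model):
    """Truth of 'quantifier subject are predicate' in a model."""
    s, p = TERM[subject], TERM[predicate]
    if quantifier == "all":
        return all(t[p] for t in model if t[s])
    if quantifier == "some":
        return any(t[s] and t[p] for t in model)
    if quantifier == "no":
        return not any(t[s] and t[p] for t in model)
    return any(t[s] and not t[p] for t in model)  # some_not


# All models in which a, b, and c are each non-empty.
MODELS = [
    m
    for size in range(1, len(TYPES) + 1)
    for m in combinations(TYPES, size)
    if all(any(t[i] for t in m) for i in range(3))
]


def valid_conclusions(q1, q2, figure):
    """Conclusions about a and c that hold in every model of the two premises."""
    (s1, p1), (s2, p2) = FIGURES[figure]
    satisfying = [m for m in MODELS if holds(q1, s1, p1, m) and holds(q2, s2, p2, m)]
    return [
        (q, s, p)
        for q in QUANTIFIERS
        for s, p in (("a", "c"), ("c", "a"))
        if all(holds(q, s, p, m) for m in satisfying)
    ]


syllogisms = list(product(QUANTIFIERS, QUANTIFIERS, FIGURES))
with_valid = [s for s in syllogisms if valid_conclusions(*s)]
print(len(syllogisms), len(with_valid), len(syllogisms) - len(with_valid))
```

The same function gives the answer key against which a construction-task response can be checked; for the example All A are B, Some C are not B (figure ab/cb) it returns the single conclusion Some C are not A.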

All problems were presented with the abstract terms a, b, and c. Participants were allowed to work through the problems in their own time and took between 40 minutes and an hour. The order of problems was randomized for each subject.

Results

Reasoning accuracy (development dataset)

Participants’ conclusions were scored for accuracy as defined in classical logic with the no-empty-sets assumption: on a problem with a valid conclusion (a VC problem), they were scored 1 if their conclusion was a valid conclusion, otherwise 0; on NVC problems they were scored 1 for responding “NVC”, otherwise 0.
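The scoring rule can be written as a small function. This is a sketch under assumptions: responses are represented either as the string "NVC" or as a (quantifier, subject, predicate) tuple, and the set of valid conclusions for each problem is taken as given; both representations are introduced here for illustration only.

```python
def score_response(response, valid_conclusions):
    """Return 1 for a correct response and 0 otherwise.

    response: a (quantifier, subject, predicate) tuple, or the string "NVC".
    valid_conclusions: the set of classically valid conclusions for the
    problem; an empty set marks an NVC problem.
    """
    if not valid_conclusions:  # NVC problem: only "NVC" is correct
        return 1 if response == "NVC" else 0
    # VC problem: any one of the valid conclusions is correct
    return 1 if response in valid_conclusions else 0


# All A are B / All B are C, with the nonempty-sets assumption.
barbara = {("all", "a", "c"), ("some", "a", "c"), ("some", "c", "a")}
print(score_response(("all", "a", "c"), barbara))  # valid conclusion
print(score_response("NVC", barbara))              # hesitant error
print(score_response("NVC", set()))                # correct on an NVC problem
```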

Three separate analyses of variance (ANOVAs) were conducted, with score as the dependent variable: (a) validity (whether the problem had a valid conclusion or not, VAL) by rashness in place (RI) by hesitancy out of place (HO); (b) validity by rashness out of place (RO) by hesitancy out of place; and (c) validity by rashness in place by rashness out of place, with these factors as the dichotomized independent variables.

In each case, validity had a significant main effect on accuracy: Syllogisms with valid conclusions are solved more accurately by all participants than syllogisms without valid conclusions: RI by RO by VAL analysis, main effect of VAL, F(1, 36) = 20.33, MSE = 0.03, p < .001; RI by HO by VAL analysis, main effect of VAL, F(1, 36) = 12.46, MSE = 0.03, p = .001; RO by HO by VAL analysis, main effect of VAL, F(1, 36) = 7.42, MSE = 0.03, p = .01.

Rash in place and rash out of place each have significant main effects on conclusion accuracy: Either kind of rashness is associated with lower reasoning accuracy: RI by RO by VAL analysis, main effect of RI, F(1, 36) = 7.48, MSE = 0.06, p = .01; RI by HO by VAL analysis, main effect of RI, F(1, 36) = 8.48, MSE = 0.06, p < .01; RO by HO by VAL analysis, main effect of RO, F(1, 36) = 6.97, MSE = 0.06, p = .012.

The only significant interaction is between validity, rashness in place, and rashness out of place: RI by RO by VAL analysis, interaction of RI by RO by VAL, F(1, 36) = 6.35, MSE = 0.03, p = .016. This interaction, shown in Table 3, is of the form that participants perform roughly equally well on problems with valid conclusions, but only those who are neither rash in place nor rash out of place are comparably accurate on problems without valid conclusions. All other effects failed to reach significance.
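The shape of this interaction can be read off the cell means reported in Table 3. The sketch below is only descriptive arithmetic on those eight means; it takes unweighted averages of cells (an assumption, since cell sizes are not given here) and is not a reanalysis of the ANOVA.

```python
# Cell means from Table 3, keyed by (validity, rash out of place, rash in place).
means = {
    ("VC", False, False): 0.63, ("VC", False, True): 0.55,
    ("VC", True, False): 0.56, ("VC", True, True): 0.54,
    ("NVC", False, False): 0.60, ("NVC", False, True): 0.23,
    ("NVC", True, False): 0.35, ("NVC", True, True): 0.32,
}


def spread(validity):
    """Range of the four group means for one problem type."""
    cells = [m for (v, _, _), m in means.items() if v == validity]
    return round(max(cells) - min(cells), 2)


def advantage_of_neither(validity):
    """'Neither rash' cell minus the unweighted mean of the other three cells."""
    neither = means[(validity, False, False)]
    others = [m for (v, op, ip), m in means.items()
              if v == validity and (op or ip)]
    return round(neither - sum(others) / len(others), 2)


for validity in ("VC", "NVC"):
    print(validity, spread(validity), advantage_of_neither(validity))
```

The groups differ little on VC problems, while on NVC problems the participants who are rash in neither sense stand well apart from the rest.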

Table 3. Development dataset: The interaction of validity of problem with subjects’ rashness on out-of-place problems and rashness on in-place problems in determining mean reasoning accuracy

                 Valid conclusion problems      No valid conclusion problems
                 Not rash IP     Rash IP        Not rash IP     Rash IP
Not rash OP      .63             .55            .60             .23
Rash OP          .56             .54            .35             .32

Note: OP = out of place. IP = in place.

Discussion of syllogism accuracy results

The main effect of validity accords with all the data in the literature: Problems without valid conclusions are more difficult than problems with valid conclusions. Rashness of either kind is associated with lower accuracy across all problems, a result consistent with rash participants having credulous understandings of the task. But rashness of either kind is not singly associated with more inaccuracy on NVC problems. It seems that a simple association between credulous models of the task and failure on NVC problems does not hold. This is related to Newstead’s finding that although Gricean interpretation is common, it is not necessarily associated with drawing too many syllogistic inferences. However, we observed above that credulous approaches cover a wide range of specific logics and kinds of inference, and that disturbance of information packaging might save participants from invalid conclusions even though those conclusions follow from credulous implicatures. Some construals of the task might also reject deductively invalid Gricean conclusions for unrelated reasons (see, e.g., Oaksford & Chater’s, 1994, rational choice model).

There is an association between participants having both kinds of rashness and their reasoning accuracy on NVC problems. Essentially all rash out-of-place participants are also rash in place (though not vice versa). Many rash in-place participants are also hesitant on out-of-place items, or are simply neither rash nor hesitant on these items. If the former participants are being saved from overinference by their sensitivity to information packaging, then this might explain why rashness in place alone is insufficient to cause overinference on NVC problems, and why participants who are rash on both in-place and out-of-place questions do overinfer on NVC problems.

We return to the relations between reasoning accuracy and information packaging after we have developed a model of conclusion term ordering, which will allow a process-oriented analysis of reasoning accuracy.

Conclusion term ordering (development dataset)

Participants can draw conclusions in either of two term orders: ac or ca. This choice reflects participants’ active information packaging of conclusions as influenced by problem structure. We now develop a statistical model for the effects of problem structure and individual differences in interpretation on the term orders of conclusions that subjects draw. The statistical framework is logistic regression: Structural problem variables and individual difference variables contribute to an equation predicting the probability of the reasoner drawing an ac conclusion, as opposed to a ca conclusion.

The term ordering data from both VC and NVC problems were modelled. Classically invalid conclusions generally show similar patterns of term ordering to those for valid conclusions. Participants’ NVC responses, lacking term order, were discarded. Since doubly rash participants draw more conclusions from NVC problems, we included validity as an independent variable to check whether any effects are mediated through validity.

We seek a model that will identify the factors determining participants’ choice of end-term order in drawing a conclusion. Participants could seize on any structural asymmetry in the problem and, on this basis, arrive at either term order. Most simply, participants might choose to order their conclusion’s terms in the order of their occurrence in the premises, leading always to ac conclusions. Or they might seize on a structural asymmetry, such as the placement of a unique quantifier or the grammatical status of a term (subject or predicate), to order their conclusion’s terms. For example, if just one of the premises contains a subject end term, this structure defines an asymmetry that could be the basis for putting the subject end term as subject of the conclusion (or for that matter as predicate). If both or neither premise contains such an end term, then this source of asymmetry cannot operate, and similarly for any other structural feature of problems.

The structural factors that we investigated were as follows: the sequence of premises, which is reflected by the intercept term of the regression model; the grammar of a problem, which encodes end term grammatical information (in Figure 1, AB BC, grammar is scored +1; in Figure 2, BA CB, grammar is scored −1; in Figures 3 and 4, grammar is scored 0); the presence of a unique existential premise (either some or some_not; this was encoded +1 if it was in Premise 1, −1 if it was in Premise 2, and 0 if there were either two identical existential premises or none); and similarly the presence of a unique all premise, a unique no premise, a unique some_not premise, and a unique negative premise.
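The coding scheme just described can be sketched as a function from a problem to its structural predictors. The names below (`encode`, `unique_position`) are hypothetical, and one simplification is an assumption of this sketch: a problem with some in one premise and some_not in the other is given existential = 0, whereas the model reported later treats that case with a joint existential variable plus a distinguishing some_not variable.

```python
EXISTENTIAL = {"some", "some_not"}
NEGATIVE = {"no", "some_not"}
# Grammar score by figure: +1 for Figure 1, -1 for Figure 2, 0 for Figures 3 and 4.
GRAMMAR = {"ab/bc": 1, "ba/cb": -1, "ab/cb": 0, "ba/bc": 0}


def unique_position(in_class, q1, q2):
    """+1 if only Premise 1 is in the class, -1 if only Premise 2, else 0."""
    return int(in_class(q1)) - int(in_class(q2))


def encode(q1, q2, figure):
    """Structural predictors for one syllogism (intercept not included)."""
    return {
        "grammar": GRAMMAR[figure],
        # Simplifying assumption: two different existentials score 0 here.
        "existential": unique_position(lambda q: q in EXISTENTIAL, q1, q2),
        "all": unique_position(lambda q: q == "all", q1, q2),
        "no": unique_position(lambda q: q == "no", q1, q2),
        "some_not": unique_position(lambda q: q == "some_not", q1, q2),
        "negative": unique_position(lambda q: q in NEGATIVE, q1, q2),
    }


print(encode("all", "some", "ab/bc"))
print(encode("some_not", "no", "ba/cb"))
```

Because every quantifier variable is a difference between the two premises, problems with repeated quantifiers score 0 on all of them and are left to the intercept and grammar.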

Our aim was to reveal any patterns of individual differences in quantifier interpretation that predicted reasoning behaviour, specifically conclusion term order. The individual differences of interpretation investigated were as defined by the quantifier interpretation data described above: hesitancy out of place on subject/predicate reversed questions; rashness in place on subject/predicate preserved questions; and rashness out of place on subject/predicate reversed questions. Hesitancy-out-of-place scores ranged from 0 to 8 (mean = 2.30). Rashness-in-place scores ranged from 0 to 4 (mean = 1.68); rashness-out-of-place scores ranged from 0 to 8 (mean = 3.90).

Stepwise methods in regression analysis provide a useful tool for exploratory research in new areas (Hosmer & Lemeshow, 1989), where the focus is upon initially descriptive model building. The SPSS backwards elimination algorithm was used, since, compared to forward entry methods, backward elimination is less likely to exclude predictor variables that are relevant but that are involved in suppressor effects (Menard, 1995). A logistic regression model with independent variables was selected from the range described, predicting as dependent variable the proportion of ac conclusions on each of the 64 syllogism problems. All possible two- and three-way interactions between the structural variables were investigated. All two- and three-way interactions between hesitancy, rashness in place and rashness out of place, and structural problem variables and pairs of them were investigated.

Two abstract classifications of quantifiers were explored: positive versus negative; universal versus existential. The best models resulted from separating the variables for the universal quantifiers all and no, and the negative quantifiers no and some_not, but having a joint variable for the existentials (some and some_not) plus a distinguishing variable some_not. This arrangement has the effect that for syllogisms with both some and some_not the variables existential and some_not both have the value 1. Note that such problems are NVC problems.

The logistic regression model of conclusion term order was developed on the development dataset and was tested on the test dataset. All variables included contribute significantly to fit, and adding any of the other variables fails to improve fit significantly, model χ²(27) = 360.513, p < .0001. The model correctly predicts the term order of 68% of conclusions of the dataset on which it was developed; the model’s χ²(27) = 620.37, p < .001. The same model correctly classified marginally more of the test dataset (71% compared to 68% in Experiment 1). The classification of the two datasets by this model is shown in Table 4, and the model’s parameters in Table B1 (Appendix B). Examination of the residuals in Tables C1 and C2 (Appendix C) suggests that the largest discrepancies in fit to the structure of problems are in predicting conclusions for problems in Figures 3 and 4 with some_not as Premise 1.
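The overall classification rates quoted here follow directly from the counts of observed against predicted term orders in Table 4. The sketch below simply recomputes the overall percentage correct from those counts; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Counts from Table 4, indexed as counts[observed][predicted].
development = {"ac": {"ac": 729, "ca": 257}, "ca": {"ac": 318, "ca": 511}}
test = {"ac": {"ac": 1274, "ca": 323}, "ca": {"ac": 474, "ca": 677}}


def overall_percent_correct(counts):
    """Percentage of conclusions whose term order the model predicts correctly."""
    correct = sum(counts[order][order] for order in counts)
    total = sum(sum(row.values()) for row in counts.values())
    return 100 * correct / total


print(round(overall_percent_correct(development), 2))
print(round(overall_percent_correct(test), 1))
```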

Table 4. Development and test dataset fits: The logistic regression model’s classification of conclusion term order (ac and ca)

                Development dataset               Test dataset
                Predicted                         Predicted
Observed        ac      ca      % Correct         ac       ca      % Correct
ac              729     257     75.66             1,274    323     79.8
ca              318     511     63.21             474      677     58.8
Overall                         68.32                              71.0

Before moving to an analysis of the model, we begin with a comparison of development and test dataset fits. The effects of grammar, all, and existential on term ordering are all significant and similarly signed as in the previous data. The interaction between existential and some_not is nonsignificant in the new data. The interaction between all and validity is similar but only marginally significant in these data. The main effect of hesitancy on out-of-place questions is roughly halved in size but is still significant. Its interaction with grammar is similar and significant. Its interaction with grammar and none is similar and still significant. The interaction between rashness in place and some_not is similar and significant, but its interaction with some_not and grammar is nonsignificant in these data. The interaction between rashness on out-of-place questions and none is similar and significant in the new data. On the whole, the model of the old data fits the new data rather well. The interactions with some_not are less well supported than those with no. The remainder of our discussion focuses on the fit to the development dataset.

The model shows that several structural and individual difference variables, and interactions between them, contribute significantly to the determination of conclusion term order. The positive constant in the equation (coefficient = 0.2098) reflects the overall bias towards ac conclusions seen in Table 4 and observed in all published studies. The end term from the first premise tends to occur as the subject of the conclusion. The end term from the second premise occurs as the predicate of the conclusion. Although the intercept constant marginally fails to reach significance, we see that specific subgroups of subjects exhibit more strongly this tendency to place terms in the order they occur in premises.

Of the structural variables, grammar makes the greatest contribution to fit. Figure 1 problems produce ac conclusions, and Figure 2 ca conclusions. This effect can be summarized by saying that where end terms are of different grammatical category (i.e., in Figures 1 and 2), they tend to preserve those categories in conclusions. This effect again accords with all other studies. Note that this effect preserves one particular aspect of a problem’s information packaging in conclusions.

No systematic analysis of quantifiers’ effects on conclusion term order has ever been conducted before. The overall main effects of the quantifiers’ positions on conclusion term order are summarized in Table 5. We observed above that Stenning and Yule’s (1997) source-founding model of conclusion term order assumes that participants use heuristics to determine the source premise on which to found conclusions. The most important heuristic is that any unique existential premise is the source premise. The regression model shows that after grammar, a variable built into the foundations of the source-founding model, the most powerful structural determinant of term order is the existential variable: Unique existential quantifiers identify source premises, and their end terms become the subjects of conclusions. This effect works powerfully when the existential is in either premise, but more strongly when it is in the second, where it operates against the congruence of conclusion terms with premise order captured by the intercept.

Because of the way the quantifier variables are defined, the interaction between existential and some_not applies only in problems with some_not in Premise 1 and some in Premise 2. In these problems, the negative coefficient means that there is an added preference to take the positive quantifier as source (against premise order), as the source-founding heuristics suggest.

The source-founding model’s heuristics of preferring existential to universal and positive to negative premises as source combine to make no premises the least preferred sources. Our results show that no, alone among the quantifiers, does not have any main effect of placing its end term in subject position of the conclusion. All the other quantifiers’ Premise 1 net coefficients are positive, and all the Premise 2 net coefficients are negative (see Table 5). This means that each quantifier except no tends to put its end term into subject position, though to different extents.

Table 5. Development dataset: Summary of contributions to Z of the quantifiers’ premise position, in the logistic regression model

              All         Some        None    Some_not
Premise 1     0.5012      0.7112      -       0.0228
Premise 2     −0.9762     −1.1777     -       −1.0511

The sizes of the coefficients for each quantifier in the two premises vary. The negative Premise 2 coefficients are larger in absolute value than the positive Premise 1 coefficients, so unique quantifiers have the overall impact of producing a tendency toward ca conclusions, except when the second quantifier is no, and especially when the first quantifier is no or some_not. Note that it is a consequence of the way that the quantifier variables are defined that they do not affect problems with repeated quantifiers, and so this result means that these problems with repeated quantifiers have a relatively greater tendency for ac conclusions. When there is no quantifier asymmetry, premise order is used to break symmetry.
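To make the sizes of these contributions more concrete, the linear predictor Z of a logistic regression can be converted to a probability of an ac conclusion with the standard logistic function, p = 1 / (1 + exp(−Z)). The sketch below does this for the intercept and the Table 5 contributions only. It deliberately ignores grammar, validity, and all individual difference terms, so the probabilities it prints are illustrative rather than the fitted model's predictions for any actual problem.

```python
import math

INTERCEPT = 0.2098  # positive constant reported for the development dataset
# Contributions to Z from Table 5 (no main effect is reported for none).
CONTRIBUTION = {
    "all": {1: 0.5012, 2: -0.9762},
    "some": {1: 0.7112, 2: -1.1777},
    "some_not": {1: 0.0228, 2: -1.0511},
}


def p_ac(z):
    """Standard logistic function: probability of an ac conclusion given Z."""
    return 1.0 / (1.0 + math.exp(-z))


def p_ac_unique(quantifier, premise):
    """Illustrative p(ac) from intercept plus one quantifier contribution only."""
    return p_ac(INTERCEPT + CONTRIBUTION[quantifier][premise])


print(round(p_ac(INTERCEPT), 3))  # baseline bias toward ac
for quantifier in CONTRIBUTION:
    print(quantifier,
          round(p_ac_unique(quantifier, 1), 3),
          round(p_ac_unique(quantifier, 2), 3))
```

On this partial reading, each quantifier in Premise 2 pulls the probability of an ac conclusion below one half, which is the sense in which the Premise 2 contributions outweigh premise order.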

The only significant effect of whether a problem has valid conclusions is an interaction with all, whose effect is diminished in NVC problems. It is reassuring that the existence of valid conclusions plays little role in determining conclusion term order, indicating that similar processes determine term order whether participants are drawing conclusions validly or in error. Again the evidence in the literature is that term order effects in NVC problems are similar to those in VC problems.

To summarize the structural effects, when the grammatical categories of the two end terms are different (i.e., in diagonal Figures 1 and 2), terms strongly tend to preserve their category in conclusions (Figure 1 = ac; Figure 2 = ca). Quantifiers except no tend to make their premise’s end terms into subjects, and more so when in Premise 2. These tendencies generally work against the overall tendency just noted to put the terms in premise order. Some_not acts mainly when in Premise 2.

On top of these structural effects across all subjects, the individual differences between subjects differentially affect conclusion term ordering. The only main effect of any individual difference variable is that hesitancy increases the general tendency toward ac responding throughout. Hesitancy interacts with grammar. Although hesitant participants have a greater tendency to conclude ac, preserving the premise ordering of terms, they are also more likely to override premise order by grammar in Figure 2 and draw a ca conclusion.

Although no is alone among the quantifiers in having no main effect, it interacts with both hesitancy and rashness out of place in three-way interactions with grammar. Without going into the full detail, these interactions are consistent with the view that hesitant participants are more sensitive, and rash out-of-place participants less sensitive, to the logical influence of no on conclusion term order noted above.

The source-founding model and individual differences

This exploratory regression model is rather complex. It clearly demonstrates that there are systematic effects of patterns of quantifier interpretation on participants’ term ordering when they reason. These patterns are grossly comprehensible in terms of differential sensitivities to information packaging. However, the complexity of the model makes it hard to digest, and we want to relate conclusion term order to reasoning processes. Stenning and Yule’s (1997) source-founding model aids interpretation. The main goal of this modelling is to understand the nature of the interactions between the negative quantifiers and the individual differences in quantifier interpretation.

The source-founding model was described earlier. Here we supplement the model to account for the interactions between interpretation patterns (particularly hesitancy and rashness out of place) and the quantifiers (particularly the negative quantifiers). In this model, the main locus of operation of effects is on the choice of source premise. Hesitant participants’ preference for maintaining premise order in their conclusions constitutes a preference for choosing the first premise as source, thereby processing the discourse sequentially. Rash out-of-place participants are less sequential. To further understand the interactions between participants’ patterns of interpretation and negative quantifiers, we first need to examine some logical generalizations about no and then to re-represent the data in a way that brings out strategies for source premise choice.

The relation of no to classically valid conclusion term ordering is logically as well as empirically unique. For the range of problems where logic determines the subject/predicate structure of valid conclusions (i.e., problems with valid conclusions in only one term order), we can ask how their quantifiers influence that structure logically. It turns out that in all such problems with no, the end term of the no premise always can become the predicate of the conclusion. In other words, the end term of a no premise is never obligatorily the subject of the only valid conclusion. No other quantifier has this simple logical relationship to conclusion structure. There is a tendency for end terms of all premises to become subjects of valid conclusions, but there are exceptions. Some and some_not premises each contribute roughly equal numbers of subject and predicate terms to uniquely ordered valid conclusions. As far as we know this generalization has not been noted before. This logical generalization is reflected in Stenning and Yule’s (1997) source-founding model of reasoning. The two heuristics (pick existentials over universals and then positives over negatives) determine that no premises are never the source identified by the algorithm. They may be alternative sources but they are never the source picked by these heuristics.

In order to help understand the interactions between interpretation pattern and term order in reasoning, we need a representation of the data that emphasizes the influence of premise ordering on source identification. Accordingly, we can re-represent the problems as pairs that are related by premise reordering (e.g., All A are B/All B are C and All B are A/All C are B are such a pair). The 27 problems with valid conclusions are composed of 13 such pairs and one singleton, which is symmetrical about this ordering (All B are A. All B are C: reordering these premises merely leads to reassigning the end terms).
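The premise-reordering relation can be sketched as a function on problems. The representation below (a triple of two quantifiers and a figure label) and the name `partner` are illustrative assumptions. The sketch pairs up all 64 problems rather than only the 27 with valid conclusions, so the counts it prints are for the full set, not the 13 pairs and one singleton described above.

```python
QUANTIFIERS = ("all", "some", "no", "some_not")
FIGURES = ("ab/bc", "ba/cb", "ab/cb", "ba/bc")
# Swapping the premises and relabelling a <-> c exchanges the two diagonal
# figures and leaves each symmetric figure unchanged.
REORDERED_FIGURE = {"ab/bc": "ba/cb", "ba/cb": "ab/bc",
                    "ab/cb": "ab/cb", "ba/bc": "ba/bc"}


def partner(problem):
    """The problem obtained by presenting the same premises in the other order."""
    q1, q2, figure = problem
    return (q2, q1, REORDERED_FIGURE[figure])


problems = [(q1, q2, f) for q1 in QUANTIFIERS for q2 in QUANTIFIERS for f in FIGURES]
singletons = [p for p in problems if partner(p) == p]
pairs = {frozenset((p, partner(p))) for p in problems if partner(p) != p}

# All A are B / All B are C pairs with All B are A / All C are B.
print(partner(("all", "all", "ab/bc")))
# All B are A / All B are C is its own partner.
print(partner(("all", "all", "ba/bc")))
print(len(pairs), len(singletons))
```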

The source identification heuristics of the model provide a criterion for defining canonical and noncanonical orderings of premises for each of these 13 problem pairs. We call the problem with the source premise first (the one identified by the heuristics above) canonical and the other member of the problem pair noncanonical. There is one all/all problem pair that has to be ordered by its grammar (All AB. All BC is the canonical problem of this pair). This definition of canonical problems according to the source premise identifying heuristics means that no premises are always second premises in canonical problems.
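The two source-identifying heuristics (existential over universal, then positive over negative) and the resulting notion of canonicality can be sketched as follows. This is one reading of the heuristics as described in the text, with hypothetical function names; the all/all pair, which the text resolves by grammar, is left as a tie.

```python
QUANTIFIERS = ("all", "some", "no", "some_not")
EXISTENTIAL = {"some", "some_not"}
POSITIVE = {"all", "some"}


def source_premise(q1, q2):
    """Return 1 or 2 for the premise picked as source, or None on a tie."""
    # Apply the heuristics in priority order; the first one that
    # distinguishes the two premises decides.
    for preferred in (EXISTENTIAL, POSITIVE):
        first, second = q1 in preferred, q2 in preferred
        if first != second:
            return 1 if first else 2
    return None  # identical quantifiers: not resolved by these heuristics


def is_canonical(q1, q2):
    """True if the source premise comes first; None if the heuristics tie."""
    source = source_premise(q1, q2)
    return None if source is None else source == 1


print(source_premise("some_not", "all"))  # existential beats universal
print(source_premise("no", "all"))        # positive beats negative
print(source_premise("all", "all"))       # tie, resolved by grammar in the text
```

Under this reading a no premise is never selected as source, so in any canonical problem containing no, the no premise is Premise 2.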

Having defined canonicality in terms of the source premise identifying heuristics and observed the peculiar logical properties of no, we now apply canonicality to understanding the relation between term order and individual differences in processes of drawing conclusions. In canonical problems, participants read a source-founding premise first, and if they have strong tendencies to sequentially construct their representations, then we can expect that they will find canonical problems easier. If, on the other hand, they are rather indifferent to the surface order of arrival, canonicality of problem should have less impact. As we have already seen, hesitant participants are in general much more susceptible to premise order in choosing conclusion term order.

Hesitancy interacts with no and with no and grammar together. Taking the three-way interaction first, because the regression model’s significant term for the hesitancy by grammar by no interaction is with no in Premise 2, it affects only canonical problems. Overall, hesitant participants draw even more premise-ordered conclusions from canonical no problems than they do from canonical problems in general. When premise order and the properties of no line up with the source-selecting heuristics, the two have an intensifying effect on hesitant participants’ conclusion term ordering.

Next, we consider the interaction between rashness out of place and no. Because the regression model’s significant terms for the rashness out of place by no interaction are with no in Premise 1, they affect only noncanonical problems. Because the coefficient’s sign is positive, it increases the number of ac conclusions, which are here conclusions with terms ordered noncanonically. So rash out-of-place participants here tend to treat the no premise like the other quantifiers.

Finally, canonicality can help us to understand the interactions between rash in place, some_not, and the three-way interaction of both with grammar. The significant terms of both two- and three-way interactions are with some_not in Premise 2, and so these effects are on noncanonical problems (save for a single exception that we return to later). Even though in Figure 2 problems the some_not end term is subject, rash-in-place participants still prefer the other quantifier as source, thus showing an unusual indifference here to grammatical structure.

So with all these interactions between individual differences in interpretation and negative quantifiers in determining term ordering in reasoning, hesitant participants tend to be more influenced by premise order and by no in their choice of source premise. Rash participants are less affected by premise order and the negativeness of quantifiers than are other participants.

Canonicality can also help us to understand the part that interpretation differences play in determining reasoning accuracy as mediated by conclusion term ordering. However, for such analysis we need to pool the data from the development and test datasets.

With the larger dataset it is possible to pursue the question of whether interpretation patterns’ effects on reasoning accuracy can be shown to be mediated through their effects on conclusion term order. Experiment 1 showed that interpretation patterns had gross effects on reasoning accuracy, but the concept of canonicality and the source-founding model can help us to explore relations between interpretation and reasoning accuracy in a much more articulated way. If the source-founding model is correct, the different weights given by different subject groups to the factors determining source premise should influence reasoning accuracy as well as conclusion term order. Certain information packagings will hide certain conclusions from certain subgroups of participants. Canonicality provides a way of analysing the effect of premise order on reasoning by controlling all the other factors influencing choice of source premise. Generally, if any group finds canonical problems easier than their noncanonical counterparts, that means that premise order is playing some instrumental role in those participants’ reasoning, because these pairs of problems differ only by their premise order.

From the statistical model of conclusion term ordering, we know that hesitant participants are more influenced in their conclusion term order by premise order than are rash participants. Similarly, from the interpretation task, we know that rash-out-of-place participants are particularly indifferent to term order in drawing implicatures. Are these participants duly more, or less, affected in their term ordering and reasoning accuracy by canonicality of problem? An especially informative place to look for interactions between premise order and the source premise identifying heuristics is the exceptional problems for which the crude version of the heuristics constructs invalid conclusions.

There is just one problem pair (out of 13) for which the heuristic of adding the end term of the non-source premise (as identified by the source premise identifying heuristics) to the end of the source premise rules out drawing a valid conclusion. This is the Figure 4 pair: Some B are not A/All B are C (canonical); and All B are A/Some B are not C (noncanonical). The conclusion-drawing heuristic (attach the end term of the non-source premise to the end of the source premise and remove the middle term) applied to these problems yields the conclusion Some A are not C for the canonical first problem and Some C are not A for the noncanonical second problem, whereas the valid conclusions are the other way round. Note that this is because to get the valid conclusion the heuristic requires adjustment to the negations, as it initially yields a negated predicate as subject term.
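The crude conclusion-drawing heuristic can be sketched in a few lines. The reading implemented here is an assumption made for illustration: the conclusion takes its quantifier from the source premise, the source premise's end term as subject, and the other premise's end term as predicate. Premises are represented as (quantifier, subject, predicate) triples, and the source premise is passed in by the caller.

```python
def end_term(premise, middle="b"):
    """The term of a premise that is not the middle term."""
    _, subject, predicate = premise
    return predicate if subject == middle else subject


def crude_conclusion(source, other):
    """Source quantifier, source end term as subject, other end term as predicate."""
    return (source[0], end_term(source), end_term(other))


TEMPLATES = {
    "all": "All {} are {}",
    "some": "Some {} are {}",
    "no": "No {} are {}",
    "some_not": "Some {} are not {}",
}


def render(conclusion):
    quantifier, subject, predicate = conclusion
    return TEMPLATES[quantifier].format(subject.upper(), predicate.upper())


# Figure 4 pair, where the crude heuristic misses the valid conclusions.
print(render(crude_conclusion(("some_not", "b", "a"), ("all", "b", "c"))))
print(render(crude_conclusion(("some_not", "b", "c"), ("all", "b", "a"))))
# Figure 3 canonical problem with the same quantifiers: Some A are not B / All C are B.
print(render(crude_conclusion(("some_not", "a", "b"), ("all", "c", "b"))))
```

For the two Figure 4 problems the sketch reproduces the conclusions the text attributes to the crude heuristic, which are the reverse of the valid ones.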

This pair of problems therefore provides an interesting test case for interactions between interpretation patterns and reasoning accuracy, as mediated by conclusion term ordering. We can usefully compare this problem pair with two other pairs. The Figure 3 problem pair that has the same quantifiers has valid conclusions, which are found by the source premise identifying and conclusion-drawing heuristics: Some A are not B/All C are B (canonical) and All A are B/Some C are not B (noncanonical). The simple heuristics work on these problems because the end terms are both subjects, just as they fail to work in the Figure 4 example because the end terms are both predicates. The second insightful comparison is with the Figure 1/2 pair where canonicality is resolved only by grammar: All AB/All BC (canonical) and All BA/All CB (noncanonical). Since grammar is a powerful resolver of conclusion term order for all interpretation groups, and it does not interact with all, this problem pair provides a control. The obvious interpretation dimension to explore is hesitant versus rash out of place, since these are negatively correlated and identify two almost disjoint groups.

Canonicality interacting with the source premise identifying heuristics makes rather precise predictions about hesitant and rash out-of-place participants’ reasoning accuracy for these three pairs of problems. The all/all problem pair should show a reasoning accuracy advantage for the canonical problem over the noncanonical problem for both interpretation groups, because grammar overrides premise order in determining choice of source.

Hesitant participants, for whom premise order has a strong influence on choice of source, should show a strong canonicality advantage for the some_not/all problem pair in Figure 3. Rash out-of-place subjects, who are not much influenced by premise order, should show little canonicality effect on this pair of problems. But for the some_not/all problem pair in Figure 4, hesitant participants should show a reverse canonicality effect because determining source by premise order gives the right conclusion term order in the anticanonical problem and the wrong one in the canonical problem. Again, rash out-of-place participants should show little canonicality effect here.

The data from Experiments 1 and 2 were pooled. This yielded 17 hesitant participants, 43 rash out-of-place participants, and 42 participants who were neither (Table 2). One participant was excluded from the last group due to missing data.

An ANOVA was conducted with the within-group factors canonicality (two levels) and problem pair (three levels), and the group factor (hesitant vs. rash out of place). The dependent variable was reasoning accuracy adjusted by subtracting, from the participant’s score on the problem, the mean score on that problem of the group of participants who were neither hesitant nor rash out of place. This was done to remove some of the effects of absolute difficulty of problem. The results showed that there was no main effect of canonicality of problem and no main effect of subject group. There was a significant interaction between canonicality and problem pair, F(z) = 3.68, p = .028, and between subject group, canonicality, and problem pair, F(z) = 3.59, p = .031. The means for the three-way interaction appear in Table 6.
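The adjustment of the dependent variable can be sketched as follows. This is only an illustration under stated assumptions, not the authors’ analysis script: the function name `adjusted_scores`, the data layout, and the toy scores are hypothetical, and the sign is chosen so that a positive value means more accurate than the reference group, as in the note to Table 6.

```python
from statistics import mean

def adjusted_scores(scores, reference_ids):
    """Subtract the reference-group mean from each participant's score.

    scores: {participant_id: {problem: accuracy}} (hypothetical layout)
    reference_ids: ids of participants who are neither hesitant nor
                   rash out of place
    Returns adjusted scores for the non-reference participants only.
    """
    problems = {p for s in scores.values() for p in s}
    # Mean accuracy of the reference group on each problem
    ref_mean = {
        p: mean(scores[i][p] for i in reference_ids if p in scores[i])
        for p in problems
    }
    return {
        pid: {p: v - ref_mean[p] for p, v in s.items()}
        for pid, s in scores.items()
        if pid not in reference_ids
    }

# Hypothetical toy data (1 = correct, 0 = incorrect)
scores = {
    "ref1": {"P1": 1, "P2": 0},
    "ref2": {"P1": 1, "P2": 1},
    "hes1": {"P1": 1, "P2": 0},
}
adj = adjusted_scores(scores, reference_ids={"ref1", "ref2"})
print(adj["hes1"])
```

Centring each problem on the reference group removes absolute problem difficulty, so the remaining variation reflects how a group departs from its peers on that problem.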

The interaction is of the form that when grammar identifies source premise, both hesitant and rash participants are equal to their nonhesitant, nonrash peers for the canonically ordered problem, but both suffer roughly equally when the premise order is anticanonical. When grammar does not identify source, rash participants suffer relative to their nonrash, nonhesitant peers regardless of the premise order. However, hesitant participants now show a sensitivity to whether premise order defines source accurately. In the standard problem where the source identified by the model’s heuristics can be used to make the simple construction of the correct conclusion, hesitant participants actually outperform their nonhesitant, nonrash peers on the canonically ordered member of the pair, but underperform them on the noncanonical member. In the exceptional problem where the source identified by the heuristic cannot be so used, they underperform their peers on the canonically ordered problem but outperform them on the noncanonical member of the pair. These results are consistent with the idea that hesitant participants tend to identify Premise 1 as source, whereas rash out-of-place participants are less affected by premise order and more by grammar and the quantifier attributes that drive the heuristics. Note that whereas there are no global differences in accuracy, there are large and opposite effects on groups of problems for different subject groups.
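The pattern just described can be checked by taking, for each problem pair, the canonical minus the noncanonical adjusted mean reported in Table 6. The sketch below is illustrative only: the dictionary layout, the pair labels, and the function name `canonicality_advantage` are assumptions introduced here, while the means are transcribed from Table 6.

```python
# Adjusted mean accuracy from Table 6 as (canonical, noncanonical) per pair.
# Pair labels are descriptive names chosen for this sketch.
TABLE6 = {
    "hesitant": {
        "all/all (grammar)": (-0.02, -0.23),
        "some_not/all (Figure 3)": (0.11, -0.16),
        "some_not/all (Figure 4)": (-0.22, 0.12),
    },
    "rash out-of-place": {
        "all/all (grammar)": (-0.09, -0.22),
        "some_not/all (Figure 3)": (-0.16, -0.19),
        "some_not/all (Figure 4)": (-0.10, -0.17),
    },
}

def canonicality_advantage(table):
    """Canonical minus noncanonical adjusted mean, per group and pair."""
    return {
        group: {pair: round(canon - noncanon, 2)
                for pair, (canon, noncanon) in pairs.items()}
        for group, pairs in table.items()
    }

advantage = canonicality_advantage(TABLE6)
for group, pairs in advantage.items():
    for pair, value in pairs.items():
        print(f"{group:18s} {pair:26s} {value:+.2f}")
```

On these means the hesitant group shows a positive advantage for the Figure 3 pair and a negative one for the Figure 4 pair, while the rash out-of-place differences stay close to zero on both, which is the form of the three-way interaction reported above.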


Summary of experimental results

The regression models’ fits to both sets of data show that groups of participants classified by their interpretation of quantifiers exhibit radically different patterns of term ordering. Even if we take some of the most powerful effects known in the syllogistic reasoning literature (such as the effect of Figure 1 vs. Figure 2 on term order), the model allows us to find subgroups of participants and subgroups of problems that systematically fail to show these effects, or even show reversals of them. Group data, in fact, are highly misleading here.

We treated Grice’s (1975) theory as a broad framework for a range of credulous reasoning processes and therefore adopted an exploratory approach to yield two dimensions for classifying participants’ interpretations. We looked for effects of patterns of interpretations on patterns of reasoning. Highly structured patterns of interpretation do have systematic effects on participants’ reasoning. The novel observations of individual differences in interpretation affecting processes of reasoning can be accommodated smoothly into the source-founding model. Indeed, without the model it would be hard to give an overall picture of how interpretation affects reasoning.

An exploratory approach throws up new empirical phenomena to explain. The hesitant are an important group of hitherto unnoticed participants who underinfer and who are saved from classical fallacies by their sensitivities to information packaging. Rash out-of-place participants play counterpoint to them. The distinctive logical and psychological properties of no have also not been remarked on before. Exploratory results are messy and complicated. We could present more examples of subgroups of participants behaving oppositely on subgroups of problems but space forbids. Such messy explorations are necessary if premature dismissals of theories are to be avoided.

GENERAL DISCUSSION

Where do these findings leave our interpretational approach to human reasoning more generally? Participants have a variety of credulous interpretations of these tasks which they carry over from reasoning in the one-sentence to the two-sentence task. The source-founding model is a useful abstraction over many details of interpretation and representation, which allows differences in strategies to be explored. Rashness and hesitancy are likewise coarse concepts close to the data. We have not offered logical models

Table 6. Combined development and test datasets: Canonicality advantage on reasoning accuracy scores

                                                   Subject group
                                           Hesitant         Rash-out-of-place
Problem pair                               M       S.E.     M       S.E.
All AB. All BC. So, All AC                 -.02    .09      -.09    .06
All BA. All CB. So, All CA                 -.23    .12      -.22    .08
Some A not B. All CB. So, Some A not C     +.11    .12      -.16    .08
All AB. Some C not B. So, Some C not A     -.16    .12      -.19    .08
Some B not A. All BC. So, Some C not A     -.22    .12      -.10    .08
All BA. Some B not C. So, Some A not C     +.12    .12      -.17    .08

Note: Adjusted for nonhesitant, non-rash-out-of-place mean scores, for three canonical/noncanonical problem pairs and for hesitant and rash-out-of-place subject groups. The top member of each problem pair is canonical. A positive value means the participants are more accurate than nonhesitant, non-rash-out-of-place subjects; a negative value means they are less accurate.


here, but rather evidence that a variety is necessary, and that they should cross tasks (see Stenning & van Lambalgen, in press, for the detailed development and neural implementation of such a default logical model). We agree with Newstead, Roberts, and Griggs that Gricean interpretations play important roles (several roles: Grice does not provide a single interpretation or processing model) in the immediate inference task but have provided evidence that these roles continue into syllogistic reasoning. We agree with Oaksford and Chater (2001) that the reasoning processes of most participants do not correspond to deductions in classical logic, but we would point out that their resort to classical probability theory assumes that classical logic is the underlying logic of all subjects. However, we disagree with both groups of researchers in that we believe that group models are unjustified and misleading.

The family of default logical models of these processes that we envisage are, however, closely related to Oaksford and Chater’s (2001) computational-level probabilistic models. Indeed, they might be seen as offering qualitative models of reasoning about likelihoods through their databases of defeasible conditionals. If this can be substantiated, then it would bring additional benefits, because these logics offer plausible process models. Our approach does reveal how close credulous models are to the classical sceptical one in the domain of the syllogism, and the source-founding model particularly offers a way of expressing this continuity through strategic variations.

Mental models theorists have recently claimed that their theory encompasses defeasible reasoning, and, as mentioned in the Introduction, Bonnefon (2004) has shown that mental models theory lies somewhere between sceptical and classical reasoning. It is obvious that if mental models theory can model many different logical systems (at least classical, modal, probability, and nonmonotonic reasoning have been claimed), the system is in theory capable of modelling individual differences. Furthermore, it is clear on purely logical grounds that if allowed access to the full panoply of set theory in its metalanguage (as it seems to be), just about any reasoning can be described as manipulations on models. However, then the interesting systems can be captured in proof theories. Our complaint is with the effect that claiming a “single fundamental human reasoning mechanism” has on the investigation of individual variety. From the earliest papers (e.g., Johnson-Laird & Steedman, 1978), mental models theory described many subjects’ reasoning as “failure to search for alternative models in which the candidate conclusion is invalid”. But such patterns just are defeasible reasoning to what the subject takes to be “intended models”. Is the subject failing to reason classically or succeeding in reasoning defeasibly? When competence theories are entwined with performance theories, the question does not seem to arise.

How is one to view these individual differences between subjects’ interpretation and reasoning patterns? Are they traits that persist as the same subject travels across all contexts? Or are they more subtle differences in which contexts evoke different kinds of reasoning from subjects? We favour the latter view, believing that most participants have access to most reasoning systems in some context or other. Subjects perhaps differ mainly in which situations evoke which systems, and especially in their strategic flexibility in choosing appropriate systems in the abstracted tasks of the reasoning laboratory and in formal education. Taking subjects’ own interpretations seriously in evaluating their reasoning does not entail that all interpretations be treated as equally appropriate or justified. A beginning has been made on studying issues of reasoning styles by investigating individual differences in diagrammatic reasoning and learning (Cox, 1999; Stenning, 2002; Stenning, Cox, & Oberlander, 1995). To find out whether this approach is fruitful will require an empirical focus on what is common in the same subject’s reasoning as she or he faces different tasks in different contexts.

Original manuscript received 14 June 2004

Accepted revision received 24 May 2005



REFERENCES

Bonnefon, J.-F. (2004). Reinstatement, floating conclusions, and the credulity of Mental Models Theory. Cognitive Science, 28(4), 621–631.

Byrne, R. M. J. (1989). Suppressing valid inferences with conditionals. Cognition, 31, 61–83.

Chapman, K. J., & Chapman, J. P. (1959). The atmosphere effect reexamined. Journal of Experimental Psychology, 58, 220–256.

Chater, N., & Oaksford, M. (1999). The probability heuristics model of syllogistic reasoning. Cognitive Psychology, 38, 191–258.

Cox, R. (1999). Representation construction, externalised cognition and individual differences. Learning and Instruction, 9, 343–363.

Dickstein (1978).

Evans, J. (2002). Logic and human reasoning: An assessment of the deduction paradigm. Psychological Bulletin, 128, 978–996.

Evans, J. (2003). In two minds: Dual-process accounts of reasoning. Trends in Cognitive Sciences, 7(10), 454–459.

Gernsbacher, M. A. (Ed.). (1994). Handbook of psycholinguistics. New York: Academic Press.

Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics: Vol. 3. Speech acts. London: Academic Press.

Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York: John Wiley & Sons.

Johnson-Laird & Bara (1984).

Johnson-Laird, P., & Steedman, M. (1978). The psychology of syllogisms. Cognitive Psychology, 10, 64–99.

Menard, S. (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications.

Newstead, S. (1989). Interpretational errors in syllogistic reasoning. Journal of Memory and Language, 28, 78–91.

Newstead (1985).

Newstead, S. (1995). Gricean implicatures and syllogistic reasoning. Journal of Memory and Language, 34, 644–664.

Oaksford, M., & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608–631.

Oaksford, M., & Chater, N. (2001). The probabilistic approach to human reasoning. Trends in Cognitive Sciences, 5(8), 349–357.

Politzer, G. (1990). Immediate deduction between quantified sentences. In K. J. Gilhooly, M. T. G. Keane, R. H. Logie, & G. Erdos (Eds.), Lines of thinking: Vol. 1. Representation, reasoning, analogy and decision making. London: John Wiley & Sons.

Roberts, M. J., Newstead, S. E., & Griggs, R. A. (2001). Quantifier interpretation and syllogistic reasoning. Thinking and Reasoning, 7(2), 173–204.

Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Stenning, K. (2002). Seeing reason: Language and image in learning to think. Oxford, UK: Oxford University Press.

Stenning, K., & Cox, R. (1995). Attitudes to logical independence: Traits in quantifier interpretation. In J. D. Moore & J. Fain Lehman (Eds.), Proceedings of the 17th Annual Conference of the Cognitive Science Society (pp. 742–747). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Stenning, K., Cox, R., & Oberlander, J. (1995). Contrasting the cognitive effects of graphical and sentential logic teaching: Reasoning, representation and individual differences. Language and Cognitive Processes, 10(3/4), 333–354.

Stenning, K., & Oberlander, J. (1995). A cognitive theory of graphical and linguistic reasoning: Logic and implementation. Cognitive Science, 19, 97–140.

Stenning, K., & van Lambalgen, M. (2004). A little logic goes a long way: Basing experiment on semantic theory in the cognitive science of conditional reasoning. Cognitive Science.

Stenning, K., & van Lambalgen, M. (in press). Semantic interpretation as computation in nonmonotonic logic: The real meaning of the suppression task. Cognitive Science.

Stenning, K., & Yule, P. (1997). Image and language in human reasoning: A syllogistic illustration. Cognitive Psychology, 34, 109–159.

Stevens, J. P. (2001). Applied multivariate statistics for the social sciences (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Vallduví, E. (1992). The informational component. New York: Garland.

Wason, P. C. (1965). The contexts of plausible denial. Journal of Verbal Learning and Verbal Behaviour, 4, 7–11.


APPENDIX A

Response accuracy—immediate inference task

Table A1. Development dataset: Proportion of subjects responding “True”, “False”, and “Can’t tell” to immediate inference questions, along with correct responses and missing data (no response)

                                               Premise
                              ALL            NO             SOME           SOME NOT
Response       Conclusion     Prop.  Newst.  Prop.  Newst.  Prop.  Newst.  Prop.  Newst.

TRUE           All            0.99           0.01           0.09           0.06
               All′           0.33   0.57    0.02           0.07           0.03
               No             0.00           0.94           0.02           0.07
               No′            0.03           0.52   0.80    0.01           0.05
               Some           0.83           0.02           0.97           0.35   0.83
               Some′          0.67   0.87    0.03           0.69   0.87    0.27   0.77
               Some...not     0.03           0.76           0.45   0.93    0.95
               Some...not′    0.15   0.47    34     0.77    0.34   0.83    0.50   0.90

FALSE          All            0.01           0.94           0.34           0.87
               All′           0.15           0.64           0.26           0.46
               No             0.99           0.01           0.96           0.49
               No′            0.68           0.06           0.67           0.34
               Some           0.13           0.93           0.01           0.07
               Some′          0.07           0.54           0.01           0.05
               Some...not     0.92           0.14           0.05           0.00
               Some...not′    0.34           0.10           0.07           0.03

CAN’T TELL     All            0.00           0.00           0.55           0.06
               All′           0.52           0.27           0.64           0.47
               No             0.01           0.00           0.00           0.43
               No′            0.29           0.36           0.30           0.56
               Some           0.03           0.00           0.00           0.54
               Some′          0.26           0.37           0.28           0.64
               Some...not     0.05           0.04           0.48           0.03
               Some...not′    0.51           0.32           0.57           0.42

MISSING DATA   All            0.00           0.05           0.02           0.01
               All′           0.00           0.07           0.03           0.04
               No             0.00           0.05           0.02           0.01
               No′            0.01           0.06           0.02           0.05
               Some           0.00           0.05           0.02           0.04
               Some′          0.00           0.06           0.02           0.04
               Some...not     0.00           0.06           0.02           0.02
               Some...not′                   0.06           0.02           0.05

CORRECT        All            T              F              CT             F
               All′           CT             F              CT             CT
               No             F              T              F              CT
               No′            F              T              F              CT
               Some           T              F              T              CT
               Some′          T              F              T              CT
               Some...not     F              T              CT             T
               Some...not′    CT             T              CT             CT

Note: Primed conclusion quantifiers (e.g., A′) represent the converse conditions (e.g., ALL Bs are As). Newst. = Newstead’s (1989) results, given if the results of the present study differ by more than .07 from those reported by Newstead (1989, Table 2, page 86).
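As a reading aid, the CORRECT rows of Table A1 can be treated as a scoring key for a single response. The following is a minimal sketch under stated assumptions: the names `PREMISES`, `CORRECT`, and `is_correct` are hypothetical, the key is transcribed from the table rather than derived from a logic, and a trailing apostrophe stands for the primed (converse) conclusion.

```python
# Premise quantifiers, in the column order of Table A1
PREMISES = ("ALL", "NO", "SOME", "SOME NOT")

# Scoring key transcribed from the CORRECT rows of Table A1.
# T = true, F = false, CT = can't tell; "X'" is the converse conclusion.
CORRECT = {
    "All":         ("T",  "F", "CT", "F"),
    "All'":        ("CT", "F", "CT", "CT"),
    "No":          ("F",  "T", "F",  "CT"),
    "No'":         ("F",  "T", "F",  "CT"),
    "Some":        ("T",  "F", "T",  "CT"),
    "Some'":       ("T",  "F", "T",  "CT"),
    "Some...not":  ("F",  "T", "CT", "T"),
    "Some...not'": ("CT", "T", "CT", "CT"),
}

def is_correct(premise, conclusion, response):
    """Return True if `response` matches the key for this premise/conclusion."""
    return CORRECT[conclusion][PREMISES.index(premise)] == response

# Answering "True" to "All B are A" given "All A are B" is scored as an error
print(is_correct("ALL", "All'", "T"))
# Answering "Can't tell" to "Some A are not B" given "Some A are B" is correct
print(is_correct("SOME", "Some...not", "CT"))
```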


APPENDIX B

Logistic regression models of ac conclusion probability for development and test datasets

Table B1. The logistic regression model of subjects’ probability of ac conclusion: Development and test data

Columns, given separately for the model development data and the model test data: Variable, B, SE, Wald, df, Sig, Exp(B).

[The body of Table B1 was typeset rotated across two pages and its numerical entries cannot be recovered from this copy. Only the variables and their levels, in order of appearance, are listed here.]

Grammar
    Figure 1
    Figure 2
All
    All in Premise 1
    All in Premise 2
Existential
    Existential in Premise 1
    Existential in Premise 2
Existential (i.e., Some) × Some_not
    Some_not in Premise 1
    Some_not in Premise 2
All × Validity
    All in Premise 1 by Invalid
    All in Premise 2 by Invalid
Hesitant-out-of-place
Hesitant-out-of-place × Grammar
    Figure 1
    Figure 2
Hesitant-out-of-place × None
    None in Premise 1
    None in Premise 2
Hesitant-out-of-place × None × Grammar
    None in Premise 1 in Figure 1
    None in Premise 1 in Figure 2
    None in Premise 2 in Figure 1
    None in Premise 2 in Figure 2
Rash-out-of-place × None
    None in Premise 1
    None in Premise 2
Rash-in-place × Some_not
    Some_not in Premise 1
    Some_not in Premise 2
Rash-in-place × Some_not × Grammar
    Some_not in Premise 1 in Figure 1
    Some_not in Premise 1 in Figure 2
    Some_not in Premise 2 in Figure 1
    Some_not in Premise 2 in Figure 2
Constant

Note: Boldface indicates a level of a variable with significant effect in the development dataset.
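For readers less familiar with the columns of Table B1, the relation between a coefficient B, its odds ratio Exp(B), and a predicted probability of an ac conclusion can be sketched as below. This is a generic illustration of logistic regression, not a reproduction of the fitted model: the predictor names and coefficient values are hypothetical and are not taken from Table B1.

```python
import math

def odds_ratio(b):
    """Exp(B): multiplicative change in the odds for a one-unit change."""
    return math.exp(b)

def predicted_probability(constant, coefficients, predictors):
    """Probability from a logistic model: p = 1 / (1 + exp(-logit))."""
    logit = constant + sum(
        coefficients[name] * value for name, value in predictors.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))

# HYPOTHETICAL coefficients for two dummy-coded predictors
coef = {"figure_1": 1.0, "figure_2": -1.0}

# A problem in Figure 1 (figure_1 = 1, figure_2 = 0), with a constant of 0
p = predicted_probability(0.0, coef, {"figure_1": 1, "figure_2": 0})
print(f"odds ratio for figure_1: {odds_ratio(coef['figure_1']):.3f}")
print(f"predicted probability of an ac conclusion: {p:.3f}")
```

A positive B gives Exp(B) above 1 and raises the predicted probability of an ac conclusion; a negative B gives Exp(B) below 1 and lowers it.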


APPENDIX C

Conclusion term order data by problem

Table C1. Development dataset: Mean proportion ac conclusions, residuals of model’s predictions, and number of observations, by syllogism

                                              Premise 1
               All                 Some                No                  Some not
Premise 2      Mean  Resid   N     Mean  Resid   N     Mean  Resid   N     Mean  Resid   N

Figure 1 ABBC
All            0.90   0.05  39     0.84   0.03  38     0.69  -0.11  35     0.81   0.06  36
Some           0.74  -0.01  31     0.67  -0.18  24     0.63  -0.13  32     0.84   0.05  25
No             0.86   0.02  37     0.97   0.11  29     0.79  -0.06  14     0.80  -0.01  25
Some_not       0.75   0.09  32     0.79   0.01  24     0.75   0.07  16     0.76  -0.08  17

Figure 2 BACB
All            0.27  -0.10  37     0.30  -0.02  30     0.24  -0.06  38     0.24   0.00  33
Some           0.21  -0.04  39     0.31  -0.05  26     0.35   0.11  34     0.21  -0.07  24
No             0.40   0.04  35     0.45   0.05  29     0.38   0.02  13     0.27  -0.04  22
Some_not       0.25   0.08  32     0.35   0.07  23     0.25   0.07  24     0.33  -0.03  21

Figure 3 ABCB
All            0.79   0.16  28     0.75   0.17  32     0.63   0.08  40     0.65   0.17  37
Some           0.36  -0.12  28     0.60  -0.03  20     0.50   0.01  34     0.67   0.14  24
No             0.58  -0.04  36     0.65  -0.01  31     0.77   0.13  13     0.48  -0.10  21
Some_not       0.23  -0.15  31     0.27  -0.25  26     0.38  -0.01  21     0.47  -0.15  17

Figure 4 BABC
All            0.70   0.08  37     0.47  -0.10  38     0.54   0.00  35     0.29  -0.19  38
Some           0.53   0.05  36     0.68   0.05  19     0.41  -0.07  32     0.64   0.11  25
No             0.65   0.03  31     0.65  -0.01  31     0.67   0.04  15     0.41  -0.16  22
Some_not       0.54   0.02  37     0.48  -0.04  25     0.46   0.08  24     0.76   0.15  17


Table C2. Test dataset: Mean proportion ac conclusions, residuals of model’s predictions, and number of observations, by syllogism

                                              Premise 1
               All                 Some                No                  Some not
Premise 2      Mean  Resid   N     Mean  Resid   N     Mean  Resid   N     Mean  Resid   N

Figure 1 ABBC
All            0.93   0.05  61     0.85   0.03  62     0.80   0.00  51     0.84   0.03  57
Some           0.88   0.05  51     0.86  -0.02  37     0.63  -0.08  41     0.76  -0.07  34
No             0.89   0.01  61     0.90   0.03  52     0.73  -0.15  26     0.88   0.05  41
Some_not       0.84   0.05  58     0.82  -0.04  34     0.64  -0.07  25     0.77  -0.11  31

Figure 2 BACB
All            0.26  -0.07  62     0.21  -0.08  52     0.32   0.03  61     0.19  -0.02  51
Some           0.15   0.00  60     0.30  -0.02  36     0.27   0.07  49     0.15  -0.09  38
No             0.35  -0.06  47     0.47   0.10  49     0.36   0.03  22     0.41   0.12  31
Some_not       0.28   0.05  54     0.25  -0.01  29     0.17  -0.04  29     0.34  -0.03  30

Figure 3 ABCB
All            0.85   0.19  48     0.75   0.09  53     0.66   0.06  57     0.67   0.20  51
Some           0.42  -0.18  53     0.71   0.05  33     0.29  -0.19  51     0.64   0.05  28
No             0.53  -0.20  58     0.72   0.00  51     0.65  -0.05  17     0.56  -0.06  31
Some_not       0.34  -0.11  53     0.58  -0.07  32     0.36  -0.06  26     0.65  -0.04  28

Figure 4 BABC
All            0.92   0.22  52     0.53  -0.06  58     0.55  -0.02  51     0.25  -0.26  54
Some           0.69   0.24  59     0.76   0.05  24     0.40  -0.05  47     0.51  -0.09  35
No             0.70  -0.01  44     0.70  -0.03  50     0.92   0.22  13     0.62  -0.01  29
Some_not       0.75   0.06  57     0.69   0.03  29     0.59   0.17  29     0.84   0.14  25
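The cells of Table C2 can be aggregated to see the size of the Figure 1 versus Figure 2 effect on term order. The sketch below is illustrative only: it uses just the cells with All in Premise 1, the names `weighted_mean` and `implied_prediction` are hypothetical, and it assumes that a residual is the observed mean minus the model’s prediction, which the table does not state.

```python
# (mean proportion of ac conclusions, residual, N) for Premise 1 = All,
# transcribed from Table C2; rows are Premise 2 = All, Some, No, Some_not.
TABLE_C2_ALL = {
    "Figure 1": [(0.93, 0.05, 61), (0.88, 0.05, 51),
                 (0.89, 0.01, 61), (0.84, 0.05, 58)],
    "Figure 2": [(0.26, -0.07, 62), (0.15, 0.00, 60),
                 (0.35, -0.06, 47), (0.28, 0.05, 54)],
}

def weighted_mean(cells):
    """N-weighted mean proportion of ac conclusions over a set of cells."""
    total_n = sum(n for _, _, n in cells)
    return sum(m * n for m, _, n in cells) / total_n

def implied_prediction(mean, resid):
    """Model prediction implied by a cell.

    ASSUMPTION: residual = observed mean - predicted value.
    """
    return mean - resid

fig1 = weighted_mean(TABLE_C2_ALL["Figure 1"])
fig2 = weighted_mean(TABLE_C2_ALL["Figure 2"])
print(f"Figure 1: {fig1:.2f}  Figure 2: {fig2:.2f}")
```

Weighting by N matters because the number of observations differs between cells, so a plain average of the cell means would give the sparser cells too much influence.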
