5/24/2018 Measuring Deductive Reasoning
21
MEASURING REASONING ABILITY
OLIVER WILHELM
DEDUCTIVE AND INDUCTIVE REASONING
Reasoning is a thinking activity that is of crucial importance throughout our lives. Consequently, the ability to reason is of central importance in all major theories of intelligence structure. Whenever we think about the causes of events and actions, when we pursue discourse, when we evaluate assumptions and expectations based on our prior knowledge, and when we develop ideas and plans, the ability to reason is pivotal.
The verb "reason" is associated with various highly overlapping meanings. Justifying and supporting concepts and ideas is as important as convincing others through good reasons and the discovery of conclusions through the analysis of discourse. In modern psychology, usually two to three forms of reasoning are distinguished. In deductive reasoning, we derive a conclusion that is necessarily true if the premises are true. In inductive reasoning, we try to infer information by increasing the semantic content when proceeding from the premises to the conclusion. Sometimes, a third form of reasoning is distinguished (Magnani, 2001). In abductive reasoning, we reason from a fact to the action that has caused it. Abductive reasoning has not been thoroughly investigated in intelligence research, and we can consider abductive reasoning to be a subset and mixture of inductive and deductive reasoning. In the remainder of this chapter, abductive reasoning will not be discussed.
In deduction, the premises necessarily entail or imply the conclusion. It is impossible that the premises are true and that the conclusion is false. Three perspectives on deduction can be distinguished. From a syntactic perspective, the relation between premises and conclusion is derivable independent of the instantiation of the premises. The criterion for the correctness of an argument is its derivability from the premises. From a semantic perspective, the conclusion is true in any possible model of the premises. The criterion for the correctness of an argument is its validity. From a pragmatic perspective, there is a learned or acquired relation between premises and conclusion that has no logical necessity. The criterion to assess the quality of an argument is its utility.
These perspectives cannot be applied to induction because the criteria to assess conclusions must be different. Carnap's formalization has attracted considerable attention when it comes to distinguishing forms of induction. Carnap (1971) classifies inductive arguments as enumerative and eliminative. In enumerative induction, the premises assert something as true of a finite number of specific objects or subjects, and the conclusion infers that what is true for the finite number is true of all such objects or subjects. In eliminative induction, confirmation proceeds by falsifying competing alternative hypotheses. The problem with induction is that we cannot prove for any inductive inference
373
21-Wilhelm.qxd 9/8/2004 5:09 PM Page 373
with true premises that the inference provides us with a true conclusion (Stegmüller, 1996).
Nevertheless, induction is of crucial importance in science whenever we talk about discovery. However, the testing of theories is a completely deductive enterprise. Induction and deduction hence both have their place in science, and the ability to draw good inductive and deductive inferences is of major importance in real life.
Historically, logic was primarily established through Aristotle. Although Aristotle viewed logic as the proper form of scientific investigation, he used the term as equivalent to verbal reasoning. The syllogistic form of reasoning, as established through Aristotle, dominated logic up until the middle of the 19th century. Throughout the second half of the 19th century, there was a rapid development of logic as a scientific discipline. Philosophers such as George Boole (1847) and Gottlob Frege (1879) started to develop formalizations of deductive logic as a language that went beyond the idea that logic should reflect common sense and sound reasoning. In a nutshell, logic was the manipulation of symbols by virtue of a set of rules. The logical truth of an argument was hence no longer assessed by agreement with some experts or through acceptance by common sense. Whether logical reasoning was correct could then be assessed by agreement with a calculus. In our historical excursion, we need to note, however, that George Boole did believe that the laws of thinking and the rules of logic are equivalent, and John Stuart Mill thought that the rules of logic are generalizations of forms of conclusions considered true by humans.
Apparently, from the early days of logic to now, the puzzle remains that although humans invented logic, they are not able or willing to follow its standards in all instances. Humans are vulnerable to errors in reasoning and do not proceed consistently in deriving conclusions. The research on biases, contents, and strategies in reasoning has a long tradition in psychology. For example, Störring (1908) investigated thought processes in syllogistic reasoning and distinguished various strategies, Wilkins (1929) manipulated test content and observed effects on test properties, and Woodworth and Sells (1935) conducted outstanding research on a particular bias in syllogistic reasoning labeled the atmosphere effect.
In contemporary psychological research on reasoning, so-called dual-process theories dominate. In these theories, an associative, heuristic, implicit, experiential, and intuitive system of information processing is contrasted with a second rule-based, analytical, explicit, and rational system (Epstein, 1994; Evans, 1989; Hammond, 1996; Sloman, 1996; Stanovich, 1999). Most of the biases found in reasoning, judgment, and decision making can be located within the first system. A reasoning competence and propensity to think rationally can be located within the second system. In considering individual differences in reasoning ability, the interest is primarily on differences within the second system. Most of the differences could reflect individual differences in available resources for the computational work to be accomplished to obtain a correct response. An additional source of individual differences might be the probability with which individuals deliberately use the second system when responding to specific problems.
The discussion of individual differences in reasoning ability starts with the assertion that there are individual differences in the ability to reason according to some rational standard. Humans can be rational in principle, but they fail to a varying degree in practice. The principle governing this rationality is that people accept inferences as valid if there is no mental representation contradicting the conclusion (Johnson-Laird & Byrne, 1993; Stanovich, 1999). Individual differences from this perspective primarily arise from restrictions in the ability to create and manipulate mental representations. In other words, depending on our cognitive apparatus, we are able to find a good, or the correct, answer to some reasoning problems but not to other, more difficult problems.

In measuring reasoning ability, it is consequently assumed that individuals can think rationally but that there are individual differences in how well people can do so.
THOUGHT PROCESSES IN REASONING
There are several competing theories for the description and explanation of reasoning processes. The theories are distinguished by the broadness of the phenomena they can explain
374 HANDBOOK OF UNDERSTANDING AND MEASURING INTELLIGENCE
and how profound the proposed explanations are. They are also different with respect to how much experimental research was done to investigate them and how much supportive evidence was collected. The theory of mental models (Johnson-Laird & Byrne, 1991) is one outstanding effort in describing and explaining what people do when they reason, and this theory will be described in more detail after briefly reviewing more specific accounts of deductive and inductive reasoning, respectively.
Besides many more specific accounts of reasoning, the mental logic approach to reasoning has many adherents and was applied to a broad range of reasoning problems (Rips, 1994). According to mental logic theories, individuals apply schemata of inference when they reason. Errors in reasoning occur when inference schemata are unavailable, corrupted, or cannot be applied. More complex inferences are accomplished by compiling several elemental schemata. The inference schemata in various mental logic theories are different from each other, from logical terms in natural language, and from logical terms in formal logic. The psychology of proof by Rips (1994) is the most elaborated and sophisticated theory of mental logic. However, the mental model theory covers a broader range of phenomena than mental logic accounts do. In addition, the experimental support seems to be in favor of the mental models theory. Finally, both sets of theories are closely related to each other, the major difference being that the mental model approach deals with reasoning on the semantic level, whereas mental logic theories investigate reasoning on the syntactic level.
Analogical reasoning is a subset of inductive thinking that has received considerable attention in cognitive psychology. For example, Holyoak and Thagard (1997) developed a multiconstraint theory of analogical reasoning. Three constraints are claimed to create coherence in analogical thought: similarity between the concepts involved; structural parallels (specifically, isomorphism) between the functions in the source and target domains; and guidance by the reasoner's goals. This work was recently extended. Hummel and Holyoak (2003) developed a symbolic connectionist model of relational inference. The theory suggests that distributed symbolic representations are the basis of relational reasoning in working memory.
There is no doubt substantial promise in extending these accounts of inductive thinking to available reasoning measures. So far, there is not enough experimental evidence available allowing derivation of predictions of item difficulties (but see Andrews & Halford, 2002), and there is not enough variability in the application of the theories to allow a broad application in predicting psychometric properties of reasoning tests in general. To illustrate the character and promise of theories of reasoning processes, I will limit the exposition to the mental model theory. It is hoped that the future will bring an integration of theories of inductive and deductive reasoning along with strong links to theories of working memory.
The mental model theory has been extensively applied to deductive reasoning (Johnson-Laird, 2001; Johnson-Laird & Byrne, 1991) and inductive thinking (Johnson-Laird, 1994b). Briefly, mental model theory views thinking as the manipulation of models (Craik, 1943). These models are analogous representations, meaning that the structure of the models corresponds to what they represent. Each entity is represented by an individual token in a model. Properties of and relations between entities are represented by properties of and relations between tokens, respectively. Negations of atomic propositions are represented as annotations of tokens. Information can be represented implicitly, and the implicit status of a model is part of the representation. If necessary, implicit representations can be fleshed out by simple mechanisms. The epistemic status of a model is represented as a propositional annotation in the model.
A major determinant of the difficulty of reasoning tasks is the number of mental models that are compatible with the premises. The premises "A is left of B. B is left of C. C is left of D. D is left of E." can be easily integrated into one mental model:
A B C D E
This mental model supports conclusions such as "C is left of E." However, the premises "A is left of B. B is left of C. C is left of E. D is
left of E." call for the construction of two mental models. The first mental model places C left of D, whereas the second mental model places D left of C.
Model 1: A B C D E
Model 2: A B D C E
Both models are compatible with the premises. Generally, the more mental models that are compatible with the premises of a reasoning task, the harder the task will be. This prediction has been confirmed with a wide variety of reasoning tasks, including syllogisms, spatial and temporal reasoning, propositional reasoning, and probabilistic reasoning (Johnson-Laird, 1994a; Johnson-Laird & Byrne, 1991). In established measures of reasoning ability, it is hard or impossible to specify the nature and number of mental models a given item calls for (Yang & Johnson-Laird, 2001). This is because test construction is usually driven by applying psychometric criteria and not by creating indicators through the strictly theory-driven derivation from a cognitive model of thought processes. In specifically constructed measures, on the other hand, the nature and number of mental models that participants need to construct in order to solve an item correctly can be manipulated. The empirical study presented later in this chapter mixes measures with and without explicit manipulation of the number of mental models required for successful solution.
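The determinate and indeterminate premise sets above can be checked mechanically. The following sketch (an illustration, not part of the chapter's study) tests which left-to-right arrangements of the tokens are consistent with a set of "left of" premises; note that exhaustive enumeration lists every consistent arrangement, whereas the theory counts only the models reasoners typically construct.

```python
from itertools import permutations

def consistent(order, premises):
    """True if every premise 'x is left of y' holds in the left-to-right order."""
    pos = {token: i for i, token in enumerate(order)}
    return all(pos[x] < pos[y] for x, y in premises)

# Determinate premises: A<B, B<C, C<D, D<E -- exactly one arrangement fits.
determinate = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
models = [o for o in permutations("ABCDE") if consistent(o, determinate)]
print(models)  # -> [('A', 'B', 'C', 'D', 'E')]

# Indeterminate premises: A<B, B<C, C<E, D<E -- both models from the text fit.
indeterminate = [("A", "B"), ("B", "C"), ("C", "E"), ("D", "E")]
assert consistent(tuple("ABCDE"), indeterminate)  # Model 1
assert consistent(tuple("ABDCE"), indeterminate)  # Model 2
```

The first premise set admits a single arrangement, whereas the second admits several, which is the source of its greater difficulty on the mental model account.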
Inductive and deductive reasoning processes go through the same three stages of information processing. In the first stage, the premises are understood. Knowledge in general and literacy in dealing with the stimuli are critical in building a representation of the problem. Frequently, the problem will be verbal, and hence reading comprehension will be an important aspect of the creation of representations. However, it is well known that strategies can have an effect on encoding. In solving syllogisms, subgroups of individuals might follow different strategies for creating an initial representation of problem content (Ford, 1995; Stenning & Oberlander, 1995; Sternberg & Turner, 1981). As a result, specific groups of items are hard for one subgroup but not for another, whereas for a second group of items, the reverse is true.
In the second stage, a parsimonious description of the constructed model(s) is attempted. If the task is deductive reasoning, the resulting construction should include something that was not explicitly evident in the premises. Technically, no meaning is created in deduction; it is all implicit in the premises. Experientially, deductive conclusions do not seem to be completely obvious and apparent. If no such conclusion can be found, the answer to the problem can be that there is no conclusion to the problem. If the task is inductive reasoning, the resulting construction allows a conclusion that increases the semantic information of the premises. Hence, a tentative hypothesis is constructed that implies a semantically stronger description than is evident in the premises. However, if background knowledge is operating besides the premises, an inductive problem might turn into an enthymeme: a deduction in which not all premises are explicit. Many of the so-called inductive tasks used in intelligence research technically might well be classified as enthymemes. Frequently used number-series problems could qualify as enthymemes. If the premises of such a number-series task are explicitly stated (for example, as "Continue the number series 1, 3, 5, 7, 9, 11 by one more number," "The operations you can use are +, −, /, and *, and all results are positive integers," and "Rules indicate regularities in proceeding through the number series, and these regularities can include rule-based changes to the rule"), there might be just one option that meaningfully continues the sequence: 13.
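The induced rule in this number-series item ("add 2") can be made explicit in a short sketch; the function name is illustrative and covers only the constant-difference case, which is all the example requires.

```python
def next_by_constant_difference(seq):
    """Continue a series whose successive differences are constant.
    The rule ('add 2' here) is never stated among the premises; the
    solver must induce it, which is what makes such items enthymeme-like."""
    diffs = {b - a for a, b in zip(seq, seq[1:])}
    if len(diffs) != 1:
        raise ValueError("differences are not constant; a richer rule is needed")
    return seq[-1] + diffs.pop()

print(next_by_constant_difference([1, 3, 5, 7, 9, 11]))  # -> 13
```

Once the rule is stated explicitly, the continuation follows deductively, which is exactly the point of the enthymeme analysis above.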
In the third stage, models are evaluated, maintained, modified, or rejected. If the task is deductive reasoning, counterexamples to tentative conclusions are searched for. If no counterexample can be found, the conclusion is produced. If a counterexample is found, the process goes back to Stage 2. If the task is inductive reasoning, the conclusion adds information to the premises. The conclusion should be consistent with the premises and background knowledge. Obviously, inductive conclusions are not necessarily true. If an induction turns out to be wrong, either the premises are false or the induction was too strong. If a deduction turns out to be wrong, the premises must be false.

Evidently, only the third stage is specific to inductive and deductive reasoning, respectively.
However, errors in answering reasoning problems can be located at any of the three stages. The relevance of the third stage as a primary source of errors can be debated. Johnson-Laird (1985) argues that the search for counterexamples is crucial for individual differences, yet Handley, Dennis, Evans, and Capon (2000) argue that individuals rarely engage in a search for counterexamples. Psychometrically, syllogisms and spatial relational tasks that do not rely on a search for counterexamples are as good as or better measures of reasoning ability than items that require such a search (Wilhelm & Conrad, 1998).
Theories about reasoning processes in general and the mental model theory in particular have been widely and successfully applied to reasoning problems. Few of these applications have considered problems from psychometric reasoning tasks (but see Yang & Johnson-Laird, 2001). We will now discuss the status of reasoning ability in various models of the structure of intelligence, as assessed by psychometric reasoning tasks, and then turn to formal and empirical classifications of reasoning measures. Ideally, a general theory of reasoning processes should govern test construction and confirmatory data analysis. In practice, theories of reasoning processes have rarely been considered when creating and using psychometric reasoning tasks.
REASONING IN VARIOUS MODELS OF THE STRUCTURE OF INTELLIGENCE
Binet's original definition of intelligence focused on abilities of sensation, perception, and reasoning, but this definition was modified several times and ended up defining intelligence as the ability to adapt to novel situations (Binet, 1903, 1905, 1907). Structurally, Binet's as well as Ebbinghaus's (1895) earlier investigations do not fall within the realm of factor-analytic work, and consequently, they have been rarely discussed in this context.
Spearman's invention of tetrad analysis as a means to assess the rank of correlation matrices was the starting point of factor-analytic work (Krueger & Spearman, 1906; Spearman, 1904). Spearman's definition of general intelligence (g) focuses on the role of educing correlates and relations. The ability to educe relations and correlates is best reflected in reasoning measures. Other intelligence measures are characterized by varying proximity to the general factor. Reasoning measures are expected to have high g loadings and low proportions of specific variance. The g factor is said to be precisely defined and the core construct of human abilities (Jensen, 1998; but see Chapter 16, this volume). There are several more or less strict interpretations of the g factor theory (Horn & Noll, 1997). In its strictest form, one core process is causal for all communalities in individual differences. In a much more relaxed form of the theory, a general factor is supposed to capture the correlations between oblique first- or second-order factors. With respect to reasoning, Spearman (1923) considered inductive and deductive reasoning to be forms of syllogisms. Although Spearman (1927) did not exclude the option of a reasoning-specific group factor besides g, performance on reasoning measures was assumed to be primarily limited by "mental energy," or g.
The controversy around Spearman's theory was initially focused on statistical and methodological issues, and it was in the context of new statistical developments that Thurstone contributed his theory of primary mental abilities. Thurstone's initial work on the structure of intelligence (1938) was substantially modified and improved by Thurstone and Thurstone (1941). In the later work, the primary factors of Space, Number, Verbal Comprehension, Verbal Fluency, Memory, Perceptual Speed, and Reasoning are distinguished. The initial distinction between inductive and deductive reasoning was abandoned, and the associated variances were allocated to Reasoning, Verbal Comprehension, Number, and Space. The Reasoning factor is marked mostly by inductive tasks. Several of the other factors have substantial loadings from reasoning tasks. In a sample of eighth-grade students, the Reasoning factor is the factor with the highest loading on a second-order factor. Further elaboration of deductive measures by creating better indicators, as suggested by the Thurstones, was attempted only by the research groups surrounding Colberg (Colberg, Nester, & Cormier, 1982; Colberg, Nester, & Trattner, 1985) and Guilford.
Guilford's contribution to the measurement of reasoning ability is mostly in constructing and
popularizing reasoning measures. The structure-of-intellect (SOI) theory (Guilford, 1956, 1967) is mostly to be credited for its heuristic value in including some of what was previously no-man's-land into intelligence research. For the present purposes, the focus is on reasoning ability exclusively, and Guilford's major contributions to this topic can be located prior to the specification of the SOI theory. On the basis of a mixture of literature review, construction of specific tests, and empirical investigations of the structure of reasoning, Guilford proposed initially three, later four, reasoning factors (Guilford, Christensen, Kettner, Green, & Hertzka, 1954; Guilford, Comrey, Green, & Christensen, 1950; Guilford, Green, & Christensen, 1951; Hertzka, Guilford, Christensen, & Berger, 1954). These four factors (General Reasoning, Thurstone's Induction, Commonalities, and Deduction) are hard to separate conceptually and empirically. Specifically, the first three factors are very similar on the task level, and empirically, inductive tasks load on all three of these reasoning factors. The deduction factor is marked weakly with tasks that are hard to distinguish from tasks assigned to other reasoning factors. The tasks popularized by Guilford are still in use today (Ekstrom, French, & Harman, 1976), but many measures are available that are much better conceptually and psychometrically.
The Berlin Intelligence Structure model (Jäger, Süß, & Beauducel, 1997; see Chapter 18, this volume) is a bimodal hierarchical perspective on cognitive abilities. Intelligence tasks are classified with respect to a content facet and an operation facet. On the content facet, Verbal, Quantitative, and Spatial intelligence are distinguished. On the operation facet, Creativity/Fluency, Memory, Processing Speed, and Reasoning are distinguished. The model has a surface similarity with Guilford's SOI theory but avoids some of the technical pitfalls of Guilford's model. The Reasoning factor on the operation facet is defined as information processing in tasks that require availability and manipulation of complex information. The processing thus reflects reasoning and judgment abilities. The Reasoning factor is defined across the content facet, and consequently, there are verbal, spatial, and numerical reasoning tasks.
In an epochal effort, Carroll (1993) summarized and reanalyzed factor-analytic studies of human cognitive abilities. The result of this work is an elaborated hierarchical theory that postulates a general factor, g, at the highest level. On a second level, broad ability factors are distinguished. The proposed abilities are fluid intelligence (Gf), crystallized intelligence (Gc), general memory and learning, broad visual perception, broad auditory perception, broad retrieval ability, broad cognitive speediness, and processing speed. Fluid intelligence is largely identified by three reasoning abilities distinguished on the lowest stratum of Carroll's theory. The three reasoning factors are Sequential Reasoning, Induction, and Quantitative Reasoning. The Sequential Reasoning factor is measured by tasks that require participants to reason from premises, rules, or conditions to conclusions that properly and necessarily follow from them. In the remainder of this chapter, the terms sequential reasoning and deductive reasoning will be used interchangeably. The Induction factor is measured by tasks that provide individuals with materials that are governed by some rules, principles, similarities, or dissimilarities. Participants are supposed to detect and infer those features of the stimuli and apply the inferred rule. The Quantitative Reasoning factor is measured by tasks that ask the participant to reason with concepts involving numerical or mathematical relations. Figure 21.1 presents the classification of reasoning tasks according to Carroll (1993, p. 210).
The theory developed by Cattell and Horn (Horn & Noll, 1994, 1997) is very closely related to Carroll's theory. In fact, Carroll's theory is based more on Cattell's and Horn's work than the other way round. Their investigation of human cognitive capabilities was focused on five kinds of evidence in its development: first, structural evidence as expressed in the covariation of performances; second, developmental change through the life span; third, neurocognitive evidence; fourth, achievement evidence as expressed in the prediction of criteria involving cognitive effort; and fifth, behavioral-genetic evidence. Major differences between the three-stratum theory from Carroll and the Gf-Gc theory from Horn and Cattell are the lack of a general factor in the Cattell-Horn framework because, according to Horn and Noll
(1994), there is no unifying principle and hence no sufficient reason for specification of a general factor. However, for the present purposes, the proposed structure and interpretation of reasoning ability is of major importance. Horn and Noll interpret fluid intelligence as inductive and deductive reasoning that is critical in understanding relations among stimuli, comprehending implications, and drawing inferences. Horn and Noll (1997) also speak about conjunctive and disjunctive reasoning, but supposedly, these two forms fall under inductive and deductive reasoning. The Cattell-Horn theory assumes that both inductive and deductive reasoning tasks can have verbal as well as spatial content (Horn & Cattell, 1967). This idea can be extended, and both Gf and Gc can be measured with a broader variety of contents (Beauducel, Brocke, & Liepmann, 2001). In terms of the structure of reasoning ability, there is little difference between Carroll's theory, on the one side, and the Cattell-Horn framework, on the other. The major difference is the postulation of a separate quantitative factor in the latter model, whereas Carroll subsumes quantitative reasoning under fluid intelligence.
Based on available psychometric reasoning tasks, reasoning ability has a central place in all of the above-discussed theories of the structure of intelligence. However, the manifold of available measures might still reflect a biased selection from all possible reasoning tests. The two following sections on formal and empirical classifications should contribute to deepening our understanding of reasoning measures and reasoning ability.
FORMAL CLASSIFICATIONS OF REASONING
There is certainly no lack of reasoning measures. Carroll (1993) lists a very broad variety of available reasoning tasks, and more, similar tests could be developed without major problems. Kyllonen and Christal (1990) summarize the situation as follows:
Since Spearman (1923) reasoning has been defined as an abstract, high-level process, eluding precise definition. Development of good tests of reasoning ability has been almost an art form, owing more to empirical trial-and-error than to systematic delineation of the requirements such tests must satisfy. (p. 426)
Although empirical evidence indicates that some measures are better indicators of reasoning ability than others, the theoretical knowledge about which measure is good for what reasons is still very limited. In addition, scientists and practitioners are left with little advice from test authors as to why a specific test has the form it has. It is easy to find two reasoning tests that are said to measure the same ability but that are vastly different in terms of their features, attributes, and requirements.
Compared to this bottom-up approach of test construction, a top-down approach could facilitate construction and evaluation of measures. There are four aspects of such a top-down approach that will be discussed subsequently: operation, content, instantiation and nonreasoning requirements, and vulnerability to reasoning strategies.
[Figure 21.1 depicts gf as a higher-order factor over Sequential Reasoning, Inductive, and Quantitative factors, with first-stratum task classes including analogies, odd elements, matrix tasks, multiple exemplars, quantitative tasks, series tasks, rule discovery, general verbal reasoning, linear syllogisms, and categorical syllogisms.]

Figure 21.1 Carroll's Higher-Order Model of Fluid Intelligence (Reasoning)
The first aspect to consider in the classification of reasoning measures is the formal operational requirement. Reasoning tasks can call for inductive and deductive inferences, and among various tests for fluid intelligence, there are additional tests that primarily call for judgment, decision making, and planning. In focusing on inductive and deductive reasoning, the distinction is that in inductive reasoning, individuals create semantic information; as a result, the inferences are not necessarily true. In deductive reasoning, however, individuals maintain semantic information and derive inferences that are necessarily true if the premises are true. Tasks that are commonly classified as requiring broad visualization (Carroll, 1993) usually satisfy the definition of deductive reasoning. However, the visualization demand of such tasks is pivotal and paramount (Lohman, 1996), and such tasks will consequently be excluded from further discussion.
A second aspect to consider in the classification of reasoning measures is the content of tasks. Tasks can have many contents, but the vast majority of reasoning measures employ figural, quantitative, or verbal stimuli. Many tasks also represent a mixture of contents. For example, arithmetic reasoning tasks can be both verbal and quantitative. Experimental manipulations of the content of measures are desirable to understand the structure of reasoning ability more profoundly.
A third aspect of relevance in classifying measures of reasoning ability has to do with the instantiation of reasoning problems. Reasoning problems have an underlying formal structure. If we decide to construct a measure of reasoning ability, we instantiate this general form and have a variety of options in doing so. In choosing between these options, essentially we go through a decision tree. A first choice might be to use either concrete or abstract forms of reasoning problems. In the abstract branch, we might choose between a nonsense instantiation and a variable instantiation. In the case of syllogistic reasoning tests, a nonsense instantiation might be "All Gekus are Lemis. All Lemis are Filop." A variable instantiation of the same underlying logical form could be "All A are B. All B are C." In the concrete branch of the decision tree, prior knowledge is of crucial importance. Instantiations of reasoning problems can either conform or not with our prior knowledge. Nonconforming instantiations can either be counterfactual or impossible. A counterfactual instantiation could be "All psychologists are Canadian. All Canadians drive Porsches." An impossible instantiation could be "All cats are dogs. All dogs are birds." In the branch that includes instantiations that conform to prior knowledge, we can distinguish factual and possible instantiations. A factual instantiation could be "All cats are mammals. All mammals have chromosomes." A possible instantiation could be "All white cars in this garage are fast. All fast cars in this garage run out of petrol."

It is well established that the form of the
instantiation has substantial effects on the diffi-culty of structurally identical reasoning tasks(Klauer, Musch, & Naumer, 2000). It is alsoknown that the form of the instantiation of rea-soning tasks has some influence on the psycho-metric properties of reasoning tasks (Gilinsky& Judd, 1993). Abstract instantiations mightinduce test anxiety in some individuals becausethey look like formulas. Aside from this possi-ble negative effect, abstract instantiations mightbe a good format for reasoning tasks. Instan-tiations that do not conform to prior knowledgeare likely to be less good forms of reasoningproblems because there is an apparent conflictbetween prior knowledge and the requiredthought processes. It is likely that some indi-viduals are better able than others to abstractfrom their prior knowledge. However, such anabstraction would not be covered by a measure-ment intention that aims at assessing the abilityto reason deductively. Instantiations that actu-ally reflect prior knowledge are not good forms
for reasoning problems because rather than rea-soning, the easiest way to a solution is to recallthe actual knowledge. Some of the most widelyused tests of deductive reasoning are impossi-ble instantiations. The psychometric differencesbetween measures instantiated in a differentway are likely to be not trivial.
The final aspect of a classification of reasoning measures discussed here deals with the vulnerability of a task to reasoning strategies. In measuring reasoning ability, as with most other abilities, it is assumed that all individuals
380 HANDBOOK OF UNDERSTANDING AND MEASURING INTELLIGENCE
21-Wilhelm.qxd 9/8/2004 5:09 PM Page 380
approach the problems in the same way. Some individuals are more successful than others because they have more of the required ability. Consequently, it is implicitly assumed that individuals at the very top of the ability distribution proceed roughly in the same way through a reasoning test as individuals at the very bottom of the distribution. If a subgroup of participants chooses a different approach to work on a given test, the consequence is that the test is measuring different abilities for different subgroups. For syllogistic reasoning, it is known that there are two or three subgroups of individuals who approach syllogistic reasoning tests differently. Depending on which strategy is chosen, different items are easy and hard, respectively (Ford, 1995). Knowledge about strategies in reasoning is limited (but see Schaeken, de Vooght, Vandierendonck, & d'Ydewalle, 2000), and the role of strategies in established reasoning measures has barely been investigated.
The actual reasoning tasks that have been used in experimental investigations of reasoning processes and in psychometric studies of reasoning ability have little to no overlap in surface features. However, there is now good evidence (Stanovich, 1999) that reasoning problems, as they have been used in cognitive psychology, are moderately correlated with reasoning measures as they have been used in individual-differences research. The experimentally used tasks have been thoroughly investigated, and we now know a lot about the ongoing thought processes involved in these tasks. One important conclusion from this research is that the instantiations of reasoning problems are, for the most part, appropriate to elicit the intended reasoning processes (Shafir & Le Boeuf, 2002; Stanovich, 1999). However, there are pervasive reliability issues because frequently, only a few such problems are used in any given experiment. Conversely, we do not know a lot about ongoing thought processes in established measures of reasoning ability as used in psychometric research. However, we do know a lot about their structure (Carroll, 1993), their relations with other measures of maximal behavior (Carroll, 1993; Jäger et al., 1997; Kyllonen & Christal, 1990), and their validity for the prediction of real-life criteria (Schmidt & Hunter, 1998). Both sets of reasoning tasks can and should be used when studying reasoning ability. The benefits would be mutual. For example, differences in correlations between various individual reasoning items as used in cognitive research and latent variables from reasoning ability tests might reveal important differences between the experimental tasks. Similarly, variability in the difficulties of items from standard psychometric reasoning tests might be explained by applying various theories of reasoning processes, like the mental model theory that was sketched above.
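The reliability concern, that only a few problems of each type are used in a given experiment, can be illustrated with the Spearman-Brown prophecy formula. The sketch below (in Python; the specific reliability values are illustrative, not taken from any study cited here) shows how sharply reliability drops when a scale is shortened:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened (or shortened)
    by length_factor, under classical test theory assumptions."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# A 20-item reasoning scale with reliability .80, cut down to the
# 4 items typical of an experiment (length_factor = 4/20 = 0.2):
short = spearman_brown(0.80, 4 / 20)
print(round(short, 2))  # 0.44
```

At this level of reliability, correlations with other measures are severely attenuated, which is one reason experimental reasoning tasks correlate only moderately with psychometric reasoning tests.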
EMPIRICAL CLASSIFICATIONS OF REASONING MEASURES
In psychology, inductive reasoning has frequently been equated with proceeding from specific premises to general conclusions. Conversely, deductive reasoning has frequently been equated with proceeding from general premises to specific conclusions. This definition can still be found in textbooks, but it is outdated. There are inductive arguments proceeding from general premises to specific conclusions, and there are deductive arguments proceeding from specific premises to general conclusions. For example, the argument "Almost all Swedes are blond. Jan is a Swede. Therefore Jan is blond." is an inductive argument that violates the above definition, and the argument "Jan is a Swede. Jan is blond. Therefore some Swedes are blond." is a deductive argument that also violates the above definition.
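The semantic criterion introduced earlier, that a deductive conclusion must be true in every model of the premises, can be checked mechanically for toy arguments like these. The sketch below (Python; the predicate names and the three-element domain are illustrative) brute-forces every model of a small universe:

```python
from itertools import product

DOMAIN = ("jan", "x", "y")  # a tiny universe; "jan" plays the role of Jan

def all_models():
    """Enumerate every assignment of the predicates Swede and blond
    (True/False for each individual) over the small domain."""
    for bits in product([False, True], repeat=2 * len(DOMAIN)):
        swede = dict(zip(DOMAIN, bits[:len(DOMAIN)]))
        blond = dict(zip(DOMAIN, bits[len(DOMAIN):]))
        yield swede, blond

def entails(premises, conclusion):
    """Deductively valid iff the conclusion holds in every model
    in which all premises hold."""
    return all(conclusion(s, b) for s, b in all_models()
               if all(p(s, b) for p in premises))

# "Jan is a Swede. Jan is blond. Therefore some Swedes are blond."
premises = [lambda s, b: s["jan"], lambda s, b: b["jan"]]
some_swedes_blond = lambda s, b: any(s[i] and b[i] for i in DOMAIN)
print(entails(premises, some_swedes_blond))  # True: specific-to-general, yet deductive

# By contrast, "therefore all Swedes are blond" is not entailed:
all_swedes_blond = lambda s, b: all(b[i] for i in DOMAIN if s[i])
print(entails(premises, all_swedes_blond))  # False: a counterexample model exists
```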
According to Colberg et al. (1982), most established reasoning tests confound the direction of inference (general or specific premises and general or specific conclusions) with deductive and inductive reasoning tasks. By constructing specific deductive and inductive reasoning tasks (Colberg et al., 1985), they present correlational evidence that seems to support the unity of inductive and deductive reasoning tasks. However, the reliability of the measures is very low; the applied method of disattenuating correlations is not satisfying; and, most important, Shye (1988) reclassifies their tasks and finds support for a distinction between rule-inferring and rule-applying tasks (see Chapter 18, this volume). In the initial classification and
construction of tasks (Colberg et al., 1985), tests have been labeled as inductive when in fact they were probabilistic. Probabilistic tasks can, in principle, be deductive (Johnson-Laird, 1994a; Johnson-Laird, Legrenzi, Girotto, Legrenzi, & Caverni, 1999), and the probabilistic tasks used (Colberg et al., 1985) were in fact deductive tasks. What was shown by Colberg (Colberg et al., 1982, 1985), then, was the unity of some forms of deductive reasoning tasks, and what Shye demonstrated was that task classification is a sensitive business and that rule-applying tasks, as constructed by Colberg et al., fall into the periphery of a multidimensional scaling, with rule-inferring/inductive reasoning at the center of the solution.

The most sophisticated, ambitious, and advanced attempt to propose factors of reasoning ability comes from Carroll (1993). Carroll discusses the structure of reasoning ability, bearing in mind several objections and difficulties. Among those objections are that (a) reasoning tests are frequently complex, requiring both inductive and deductive thought processes; (b) reasoning measures are often short and administered under timed conditions; (c) reasoning tests are usually not carefully constructed and analyzed on the item level; (d) inductive and deductive reasoning processes are learned and developed together; and (e) many reasoning measures involve language, quantitative, or spatial skills to an unknown extent.
Carroll (1993) asserts that his proposal of the three reasoning factors, Induction, Deduction, and Quantitative Reasoning, is preliminary for several reasons (but see Carroll, 1989). First, in many of the reanalyzed studies, only one reasoning factor emerged. This is simply due to the fact that such studies frequently did not include a sufficient number of reasoning tests to examine the structure of reasoning ability. Second, of the 37 out of 176 data sets with more than one reasoning factor, most came from studies that were never intended and designed to investigate the structure of reasoning ability. Third, those studies intended to investigate the structure of reasoning ability included insufficient numbers of reasoning measures. Other problems with investigating the structure of reasoning ability include variations in time pressure across tests and studies, variations in scoring procedures, variations in instructing participants, and, most important, individual measures that are classified post hoc rather than a priori.

In carefully examining Tables 6.1 and 6.2 from Carroll (1993), it is apparent that the deductive reasoning tasks are frequently verbal. Content for the inductive reasoning tasks is more diverse but tends to be figural-spatial. The last reasoning factor is rather unequivocally a quantitative factor. An explanation of the data in Carroll as indicating a distinction between inductive, deductive, and quantitative reasoning competes with an explanation that distinguishes between verbal, figural-spatial, and quantitative content. Inspection of Carroll's reanalysis of individual data sets is compatible with an interpretation of the factor labeled as general sequential reasoning or deductive reasoning as a verbal reasoning factor. The inductive reasoning factor, on the other hand, could reflect figural-spatial reasoning. The quantitative reasoning factor apparently reflects numerical or quantitative reasoning. Compatible with this interpretation is that the deductive reasoning factor can frequently not be distinguished from a verbal factor and tends to have high loadings on a higher-order crystallized factor. In accord with the interpretation of the inductive reasoning factor, the figural-spatial reasoning processes measured with the associated tasks tend to be highly associated with a higher-order fluid reasoning factor. In line with this theorizing, the induction factor has the highest loading on g of all Stratum 1 factors. The deductive reasoning factor ranks only 10th among these loadings. The mean loading of induction on g is .57, whereas the mean loading of deductive reasoning is only .41. Besides the difference in the average magnitude of loadings, there is a higher dispersion of g loadings among the deductive tasks. Similarly, the fluid intelligence factor, Gf, is best defined by induction in Carroll's reanalysis. Gf is defined by induction 19 times, with an average loading of .64. Deductive reasoning defined Gf only 6 times, with an average loading of .55. On the other hand, deductive reasoning appears among the variables defining crystallized intelligence. Deductive reasoning defined the Gc factor 7 times, with an average loading of .69. Induction does not appear on the list of Stratum 1 abilities defining crystallized
intelligence. Finally, deductive reasoning appears 8 times, with an average loading of .70, on a factor labeled 2H, reflecting a mixture of fluid and crystallized intelligence. Induction, on the other hand, appeared only twice, with an average loading of .41.
Given these considerations, the proposal of reasoning ability as being composed of inductive, deductive, and quantitative reasoning competes with a proposal of verbal, figural-spatial, and quantitative reasoning. To investigate possible structures of reasoning ability, one should include tasks that allow for comparison between several competing theories. There are basically five theories competing as explanations for the structure of reasoning ability:
1. a general reasoning factor accounting for the communality of reasoning tasks varying with respect to content (verbal, quantitative, figural-spatial) and operation (inductive, deductive);

2. two correlated factors for inductive and deductive reasoning, respectively, without the specification of any content factors;

3. three correlated factors for verbal, quantitative, and figural-spatial reasoning, without distinguishing inductive and deductive reasoning processes;

4. a general reasoning factor along with nested and completely orthogonal factors for verbal and quantitative reasoning but no figural-spatial factor; and

5. two correlated factors for inductive and deductive reasoning along with completely orthogonal content factors for verbal and quantitative reasoning and again no figural-spatial factor.
For the evaluation of these models, it is important to avoid a confound between content and process on the task level. A second crucial aspect for exploring the structure of reasoning ability is to select appropriate tasks to measure the intended constructs. This is particularly hard in the domain of deductive reasoning. Following the above-presented definition of inductive and deductive reasoning, it is very difficult to find adequate measures of figural-spatial deductive reasoning. In fact, only 7 of all the tasks described in Carroll (1993) can be classified as deductive figural-spatial tasks. Moreover, these tasks frequently represent a mixture with other demands. For example, ship-destination tasks have quantitative demands; match problems, plotting, and route planning have visualization demands. In classifying 90 German intelligence tasks, Wilhelm (2000) could not find a single deductive figural-spatial measure.
To test the structure of reasoning ability, Wilhelm (2000) selected reasoning measures based on their cognitive demands and the content involved. In addressing the above-mentioned criticisms of existent reasoning tasks, several reasoning tasks were newly constructed. The following 12 measures were included in the study (D and I denote deductive and inductive reasoning; F, N, and V stand for figural, numerical, and verbal content, respectively).
DF1 (Electric Circuits): Positive and negative signals travel through various switches. The resulting signal has to be indicated. The number and kind of switches and the number of signals are varied (Gitomer, 1988; Kyllonen & Stephens, 1990).

DF2 (Spatial Relations): Spatial orientation of symbols is presented pairwise. The spatial orientation of two symbols that were not presented together can be derived from the pairwise presentations (Byrne & Johnson-Laird, 1989).

DN1 (Solving Equations): A series of equations is presented. Participants can derive the values of variables deductively. Items vary by the number of variables and the difficulty of the relation. A difficult sample item is "A plus B is C plus D. B plus C is 2*A. A plus D is 2*B. A + B is 11. A + C is 9."

DN2 (Arithmetic Reasoning): Participants provide free responses to short verbally stated arithmetic problems from a real-life context.

DV1 (Propositions): Acts of a hypothetical machine are described, and the correct conclusion has to be deduced. The number of mental models, the logical relation, and negation are varied in this multiple-choice test (Wilhelm & McKnight, 2002). A simple sample item is as follows: "If the lever moves and the valve closes, then the interrupter is switched. The lever moves. The valve closes."

DV2 (Syllogisms): Verbally phrased quantified premises are presented in which the number of mental models is varied by manipulating the figure and quantifier (Wilhelm & McKnight, 2002). A sample item is as follows: "No big shield is red. All round shields are big."

IF1 (Figural Classifications): Participants are asked to find the one pictorial figure that does not belong with four other figures, based on various attributes of the figures.

IF2 (Matrices): Based on trends in the rows and columns of 3*3 matrices, the figure that belongs in a specified cell has to be selected from several distractors.

IN1 (Number Series): Rule-ordered series of numbers are to be continued by two elements. The difficulty of the rule that has to be detected is varied.

IN2 (Unfitting Number): In a series of numbers, the one that does not fit has to be identified.

IV1 (Verbal Analogies): Analogies as they are frequently used in intelligence research. The general form of the multiple-choice items is "? is to B as C is to ?". The vocabulary of these double analogies is simple (i.e., participants are familiar with all terms), and the difficulty of the relationship is varied.

IV2 (Word Meanings): In this multiple-choice test, participants should identify a word that means approximately the same thing as a given word.
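The DN1 sample item above is a genuinely deductive task: the values of all four variables follow necessarily from the premises. A brute-force check (Python; the search range is an assumption made only to keep the enumeration finite) confirms that exactly one assignment satisfies all five equations:

```python
from itertools import product

# The five constraints of the DN1 sample item:
# A + B = C + D;  B + C = 2A;  A + D = 2B;  A + B = 11;  A + C = 9
def satisfies(a, b, c, d):
    return (a + b == c + d and b + c == 2 * a and
            a + d == 2 * b and a + b == 11 and a + c == 9)

solutions = [combo for combo in product(range(12), repeat=4)
             if satisfies(*combo)]
print(solutions)  # [(5, 6, 4, 7)], i.e., A=5, B=6, C=4, D=7
```

Algebraically, A + B = 11 and A + C = 9 together with B + C = 2A force 20 - 2A = 2A, so A = 5, and the remaining values follow by substitution.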
A total of 279 high school students with a mean age of 17.7 years and a standard deviation of 1.2 years completed all tests and several criterion measures. All tests were analyzed separately with item response theory models. For all tests, a two-parameter model assuming dispersion in item discrimination was superior to a Rasch model. The estimated person parameters from these two-parameter models were subsequently analyzed. For participants who got either all answers wrong or all answers right, person parameters were interpolated. Some of the reliabilities of the tasks are not satisfying. Coefficient omega (McDonald, 1985) for IF1 and IF2 is only .50 and .51, respectively. The overall test length of the individual measures might be responsible for these suboptimal results.
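The two-parameter model referred to here describes the probability of a correct response as a logistic function of the person parameter, with an item difficulty and an item discrimination parameter. A minimal sketch (Python; the parameter values are invented for illustration):

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic (2PL) item response function:
    theta = person ability, a = item discrimination, b = item difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two items of equal difficulty (b = 0) but unequal discrimination:
for theta in (-1.0, 0.0, 1.0):
    weak = p_correct(theta, a=0.5, b=0.0)
    strong = p_correct(theta, a=2.0, b=0.0)
    print(f"theta = {theta:+.1f}   a = 0.5: {weak:.2f}   a = 2.0: {strong:.2f}")
```

The Rasch model is the special case in which discrimination is held constant across items; the superiority of the 2PL model in this study means the items differed reliably in how sharply they separated high- and low-ability participants.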
The core research question in the present context is which of the above-specified models provides the best fit to the data. The one-factor model simply specifies one latent reasoning factor with loadings from all indicators. The two-factor model specifies two correlated latent factors: one factor with loadings on all the inductive tasks, the other factor with loadings on all the deductive tasks. The correlation between both factors is estimated freely. The three-factor model specifies three correlated content factors: a verbal factor with loadings from all the verbal tasks, a quantitative factor with loadings from all the quantitative tasks, and a figural-spatial factor with loadings on all the figural-spatial tasks. The fourth model specifies a general reasoning factor and two orthogonal nested factors: one for the four verbal tasks and the other for the four quantitative tasks. The fifth model specifies an inductive reasoning factor with loadings from all inductive reasoning tasks and, likewise, a deductive reasoning factor with loadings from all the deductive reasoning tasks. In addition, the two content factors as in the fourth model are specified. The two reasoning factors are correlated, but the three content factors are not. Generally, there are, of course, other possible model architectures (see Chapter 14, this volume). However, the above-mentioned models provide a test of competing theories for the structure of reasoning ability. The last two models mentioned above specify content factors for the verbal and quantitative tasks only. For the figural-spatial tasks, such a content factor might not be necessary because such tasks have been said to require decontextualized reasoning, and observed individual differences do not reflect specific prior knowledge (Ackerman, 1989, 1996; Undheim & Gustafsson, 1987). Models with and without a first-order factor of figural-spatial reasoning, as specified in the current context, are nested and can be compared inferentially (see Chapter 14, this volume).

Table 21.1 summarizes the fit of the five confirmatory factor analyses. Comparing the general factor model with a model that specifies two correlated factors of inductive and deductive reasoning reveals that there is no advantage in estimating the correlation between inductive and deductive reasoning freely (as opposed to restricting this correlation to unity). Indeed, the correlation between both factors in Model 2 is estimated to be exactly 1. Consequently, when comparing these two models, the
general factor model is the better explanation of the data because it is more parsimonious than the two-factor model. However, neither model provides acceptable fit.
A model specifying three correlated group factors for content does substantially better in explaining the data. Although there is still room to improve fit, the model represents an acceptable explanation of the data. Given that the model is completely derived from theory, it can serve as a good starting point for future investigations. Comparing the two models with completely orthogonal content factors again demonstrates the superiority of the model that postulates the unity of inductive and deductive reasoning. In this data set, inductive and deductive reasoning are perfectly correlated. Introducing a distinction between both factors is unnecessary and consequently does not improve model fit. Both models are substantially better than the initial one- and two-factor models. However, one of the loadings on the verbal factor is nonsignificant and negative in sign. Given this departure from the theoretical expectation of positive and significant loadings, and keeping in mind interpretative issues with group factors in nested factor models (see Chapter 14, this volume), the best solution seems to be accepting the model based on the content factors. In this model, there are three content-related reasoning factors, each of them subsuming inductive and deductive reasoning tasks. In the current study, the model with correlated group factors is equivalent to a second-order factor model. In this model, the correlations between factors are captured by a higher-order factor. This model is presented in Figure 21.2. The two content factors, Verbal and Quantitative Reasoning, reflect deductive and inductive reasoning with verbal and quantitative material, respectively. Due to the relevance of task content, it can be expected that the Verbal and the Quantitative Reasoning factors predict different aspects of criteria such as school grades, achievement, and the like. The loading of the Figural Reasoning factor on fluid intelligence is freely estimated to be 1. Not only are g and Gf very highly or perfectly correlated (Gustafsson, 1983), but the same is true between figural-spatial reasoning and fluid intelligence. Consequently, the current analysis extends Undheim and Gustafsson's (1987) work to a lower stratum. It is a replicated finding that Gf is the Stratum 2 factor with the highest loading on g (Carroll, 1993). It has also been argued that this relation might be perfect (Gustafsson, 1983; Undheim & Gustafsson, 1987; but see Chapter 18, this volume). Figural-spatial reasoning, in turn, has the highest loading on fluid intelligence, and in the data presented in this chapter, the relation between figural-spatial reasoning and the factor labeled fluid intelligence is perfect. Hence, if we do want to measure g with a single task, we should select a task of figural-spatial reasoning. Matrices tasks have been considered particularly good measures of Gf and g. Spearman (1938) suggested the Matrices test from Penrose and Raven (1936), as well as the inductive figural measure from Line (1931), as the single best indicators of g. The latter test is less prominent than the Matrices test, but variants of it can be found in various intelligence
Table 21.1 Fit Statistics of Five Competing Structural Explanations of Reasoning Ability

Model    g      Ind. Ded.   Cont.   g & Cont.   Ind. Ded. & Cont.
χ²       121.2  121.2       84.8    73.3        72.0
df       54     53          51      46          45
p
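The model comparisons described in the text are chi-square difference tests between nested models. With the fit values from Table 21.1, the two comparisons discussed, general factor versus inductive/deductive, with and without content factors, can be reproduced (Python sketch; the critical value 3.84 is the standard chi-square quantile for df = 1 at alpha = .05):

```python
# Fit values (chi-square, df) from Table 21.1.
fits = {
    "g":                 (121.2, 54),
    "Ind. Ded.":         (121.2, 53),
    "g & Cont.":         (73.3, 46),
    "Ind. Ded. & Cont.": (72.0, 45),
}
CHI2_CRIT = {1: 3.84}  # critical value, alpha = .05

def diff_test(restricted, general):
    """Chi-square difference test: does freeing parameters
    (restricted -> general) significantly improve fit?"""
    chi_r, df_r = fits[restricted]
    chi_g, df_g = fits[general]
    d_chi, d_df = chi_r - chi_g, df_r - df_g
    return d_chi, d_df, d_chi > CHI2_CRIT[d_df]

print(diff_test("g", "Ind. Ded."))  # (0.0, 1, False)
print(diff_test("g & Cont.", "Ind. Ded. & Cont."))  # about 1.3, 1 df, not significant
```

In both pairs, freeing the induction-deduction correlation gains one degree of freedom worth of flexibility but essentially no misfit reduction, which is the statistical basis for preferring the more parsimonious models.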
tests. Although it is not good practice to measure rather general constructs with single tasks, there is certainly evidence suggesting that, if need be, this sole task should be a figural-spatial reasoning measure. Whether such a task is classified as inductive or deductive is not important for that purpose.
Frequently, the composition of intelligence batteries is not well balanced, in the sense that there are many indicators for one intelligence construct but few or no tests for other intelligence constructs. In such cases (e.g., Roberts et al., 2000), the overall solution can be dominated by tasks other than fluid intelligence tasks. As a result, figural-spatial reasoning tasks might not be the best selection in these cases to reflect the g factor of such a battery.
When interpreting the results from this study, it is important to keep in mind that the differences between the various models were not that big. With different tasks and different participants, it is possible that different results would emerge. The present results are preliminary and in need of replication and extension. The most important result from the study reported above is that in a critical test aimed at assessing a distinction between inductive and deductive reasoning, no such distinction could be found. Latent factors of inductive and deductive reasoning are perfectly correlated in several models. The result of a unity of inductive and deductive reasoning was also obtained with multidimensional scaling, exploratory factor analysis, and tetrad analysis. It is important to note that this result emerged while considering the desiderata for future research provided by Carroll (1993, p. 232). Specifically, the present tasks were selected or constructed based on a careful review of the individual-differences and cognitive literature on the topic, the items were analyzed with item response theory models, and the scales were analyzed with confirmatory factor analyses. The current tests include several new reasoning measures that are based on and informed by cognitive psychology.
WORKING MEMORY AND REASONING
There have been several attempts to explain reasoning ability in terms of other abilities that are considered more basic and tractable. Specifically, working memory has been proposed as the major limiting factor for human reasoning (Kyllonen & Christal, 1990; Süß, Oberauer, Wittmann, Wilhelm, & Schulze, 2002). The working definition of working memory has been that any task that requires individuals to simultaneously store and process information can be considered a working memory task (Kyllonen & Christal, 1990). This definition has been criticized because it seems to include all reasoning measures. The definition has also been criticized because its notions of storage and processing are imprecise and fuzzy (see Chapter 22, this volume). A critique of the working memory = reasoning hypothesis can also focus on the problem of the reduction of one construct
[Figure 21.2 shows the path diagram of this model: the first-order factors Verbal, Quantitative, and Figural load on a second-order Gf factor (loadings of .83, .84, and 1.00, respectively); the 12 indicators IV2, DV1, DV2, IN1, IN2, IV1, IF2, DF1, DF2, DN1, DN2, and IF1 have loadings between .33 and .73 on their respective first-order factors.]

Figure 21.2 Higher-Order Model of Fluid Intelligence (Reasoning)
in need of explanation through another one (Deary, 2001) that is not doing any better. However, this critique is unjustified for several reasons.
1. It is easy to construct and create working memory tasks. Many tasks that satisfy the above definition work in the sense that they correlate highly with other working memory measures, reasoning, Gf, and g. In addition, it is easy and straightforward to manipulate the difficulty of a working memory item by manipulating the storage demand, the processing demand, or the time available for storage, processing, or both. These manipulations account for a large amount of variance in task difficulty in almost all cases.
2. There is an enormous corpus of research on working memory and on processes in working memory in cognitive psychology (Conway, Jarrold, Kane, Miyake, & Towse, in press; Miyake & Shah, 1999). It is fruitful to derive knowledge and hypotheses about individual differences in cognition from this body of research.
3. In the sense of reducing working memory to biological substrates, intensive and very productive research has linked working memory functioning to the frontal lobes and investigated the role of various physiological parameters in cognitive functioning (Kane & Engle, 2002; see Chapter 9, this volume, for a review of research linking reasoning to various neuropsychological parameters). Hence, the equation of working memory with reasoning is complemented by relating working memory to the frontal lobes and other characteristics and features of the brain.
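The first point above, that working memory tasks with separately manipulable storage and processing demands are easy to construct, can be made concrete with a generator for complex-span-style items. The sketch below (Python; the letter set, operand ranges, and task format are invented for illustration, not taken from any published test) varies storage load and processing load independently:

```python
import random

def make_span_item(storage_load: int, processing_load: int, seed: int = 0):
    """Generate one complex-span-style item: letters to store, each
    followed by an arithmetic claim to verify while remembering them.
    storage_load = number of letters; processing_load = operands per sum."""
    rng = random.Random(seed)
    letters = rng.sample("BCDFGHJKLMNPQRSTVWXZ", storage_load)
    trials = []
    for letter in letters:
        operands = [rng.randint(1, 9) for _ in range(processing_load)]
        claimed = sum(operands) + rng.choice([0, 1])  # some claims are off by one
        trials.append((letter, operands, claimed, claimed == sum(operands)))
    return trials

for letter, operands, claimed, is_true in make_span_item(4, 3, seed=7):
    print(f"remember {letter}; verify {' + '.join(map(str, operands))} = {claimed} ({is_true})")
```

Raising storage_load taxes storage, raising processing_load taxes processing, and presenting the trials under time pressure taxes both, which is exactly the kind of difficulty manipulation described above.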
The strength of the relation found between latent factors of working memory and reasoning varies substantially, fluctuating between a low of .6 (Engle, 2002; Engle, Tuholski, Laughlin, & Conway, 1999; Kane et al., 2004) and a high of nearly 1 (Kyllonen, 1996). In the discussion of the strength of the relation, several sources that could cause an underestimation or an overestimation should be kept in mind.
1. The relation should be assessed on the level of latent factors because this is the level of major interest when it comes to assessing psychological constructs. There should be more than three indicators of sufficient psychometric quality for each construct to allow an evaluation of the measurement models on both sides.
2. Depending on the task selection and the breadth of the definition of both constructs, the specification of more than one factor on both sides might be necessary (Oberauer, Süß, Wilhelm, & Wittmann, 2003).
3. The definition of constructs and task classes is a difficult issue. Classifying anything that requires simultaneous storage and processing as a working memory task could turn out to be overinclusive. Restricting fluid intelligence to figural-spatial reasoning measures is likely to be underinclusive. The comments on tasks of reasoning ability presented in this chapter, as well as similar comments on what constitutes a good working memory task (see Chapters 5 and 22, this volume), might be a good starting point for the definition of task classes.
4. Content variation in the operationalization of both constructs can have an influence on the magnitude of the relation. When assessing reasoning ability, one is well advised to use several tasks with verbal, figural, and quantitative content. The same is true for working memory. This chapter provided some evidence for the content distinction on the reasoning side. Similar evidence for the working memory side is evident in structural models that posit content-specific factors of working memory (Kane et al., 2004; Kyllonen, 1996; Oberauer, Süß, Schulze, Wilhelm, & Wittmann, 2000). Relating working memory tasks of one content with reasoning tasks of another content causes one to underestimate the true relation.
5. A mono-operation bias should be avoided in assessing both constructs. Using only complex span tasks or only dual tasks to assess working memory functioning does not do justice to the much more general nature of the construct (Oberauer et al., 2000). Task class-specific factors or task-specific strategies might have an effect on the estimated relation.
6. Reasoning measures, like other intelligence tasks, are frequently administered under time constraints. Timed and untimed reasoning
ability are not perfectly correlated (Wilhelm & Schulze, 2002). Similarly, working memory tasks frequently have timed aspects (Ackerman, Beier, & Boyle, 2003). For example, there might be only a limited time to execute a process before the next stimulus appears, there might be a timed rate of stimulus presentation, and the like. Common speed variance could inflate the correlation between working memory and reasoning.
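How common speed variance can inflate an observed correlation is easy to demonstrate with a hypothetical simulation (all loadings here are assumptions for illustration): when two timed measures both pick up the same speed factor, their correlation rises above what the shared ability alone would produce.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # simulated examinees

ability = rng.normal(size=n)  # shared reasoning/working memory ability
speed = rng.normal(size=n)    # mental speed, independent of ability here

# Untimed measures reflect ability plus error.
wm_untimed = ability + rng.normal(size=n)
reasoning_untimed = ability + rng.normal(size=n)

# Timed measures additionally load on the common speed factor.
wm_timed = ability + 0.7 * speed + rng.normal(size=n)
reasoning_timed = ability + 0.7 * speed + rng.normal(size=n)

r_untimed = np.corrcoef(wm_untimed, reasoning_untimed)[0, 1]
r_timed = np.corrcoef(wm_timed, reasoning_timed)[0, 1]
print(f"untimed: r = {r_untimed:.2f}")  # reflects shared ability only
print(f"timed:   r = {r_timed:.2f}")    # inflated by common speed variance
```

The inflation grows with the speed loading; with a loading of zero the two correlations coincide, which is why disentangling speed from capacity matters when interpreting timed-task correlations.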
The assumption that working memory is a critical ingredient of success on reasoning tasks is compatible with experimental evidence and theories from cognitive psychology. The ability to successfully create and manipulate mental representations has been argued to be the critical ingredient in reasoning. Whether the necessary representations can be created and manipulated depends crucially on working memory. This prediction has gained strong support from correlational studies relating working memory and reasoning. If the individual differences in reasoning ability and working memory turn out to be roughly the same, the evidence supporting the predictive validity of reasoning ability and fluid intelligence applies to working memory capacity, too. After careful consideration of costs and benefits, it might be sensible to use more tractable working memory tasks for many practical purposes.
SUMMARY AND CONCLUSIONS
The fruitful avenue to future research on measuring and understanding reasoning ability is characterized by (a) more theoretically motivated work on the processes and resources involved in reasoning and (b) the use of confirmatory methods on the item and test level to investigate meaningful measurement and structural models. The major result of efforts directed that way would be a more profound understanding of important thought processes and an improved construction and design of measures of reasoning ability. A side product of such efforts will be generative item production and theoretically derived assumptions about psychometric properties of items and tests. Another side product would be the option to develop more appropriate means of altering reasoning ability. There are several very interesting attempts to develop training methods for reasoning ability, and the initial results are encouraging in some cases (Klauer, 1990, 2001). Although it was not possible to discriminate between inductive and deductive reasoning psychometrically, it could be possible that appropriate training causes differential gains in both forms of reasoning. The cognitive processes in inductive and deductive reasoning tasks might be different, but the individual differences we can observe on adequate measures are not. This does not exclude the option that both thought processes might be affected by different interventions.
REFERENCES
Ackerman, P. L. (1989). Abilities, elementary information processes, and other sights to see at the zoo. In R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.), Abilities, motivation, and methodology: The Minnesota symposium on learning and individual differences (Vol. 10, pp. 280–293). Hillsdale, NJ: Lawrence Erlbaum.
Ackerman, P. L. (1996). A theory of adult intellectual development: Process, personality, interests, and knowledge. Intelligence, 22, 229–259.
Ackerman, P. L., Beier, M. E., & Boyle, M. D. (2003). Individual differences in working memory within a nomological network of cognitive and perceptual speed abilities. Journal of Experimental Psychology: General, 131, 567–589.
Andrews, G., & Halford, G. S. (2002). A cognitive complexity metric applied to cognitive development. Cognitive Psychology, 45, 153–219.
Beauducel, A., Brocke, B., & Liepmann, D. (2001). Perspectives on fluid and crystallized intelligence: Facets for verbal, numerical, and figural intelligence. Personality and Individual Differences, 30, 977–994.
Binet, A. (1903). L'étude expérimentale de l'intelligence [Experimental studies of intelligence]. Paris: Schleicher Frères.
Binet, A. (1905). À propos de la mesure de l'intelligence [On the subject of measuring intelligence]. L'Année Psychologique, 12, 69–82.
Binet, A. (1907). La psychologie du raisonnement [The psychology of reasoning]. Paris: Alcan.
388 HANDBOOK OF UNDERSTANDING AND MEASURING INTELLIGENCE
Boole, G. (1847). The mathematical analysis of logic: Being an essay towards a calculus of deductive reasoning. Cambridge, UK: Macmillan, Barclay, and Macmillan.
Byrne, R. M. J., & Johnson-Laird, P. N. (1989). Spatial reasoning. Journal of Memory and Language, 28, 564–575.
Carnap, R. (1971). Logical foundations of probability. Chicago: University of Chicago Press.
Carroll, J. B. (1989). Factor analysis since Spearman: Where do we stand? What do we know? In R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.), Abilities, motivation, and methodology: The Minnesota symposium on learning and individual differences (Vol. 10, pp. 43–70). Hillsdale, NJ: Lawrence Erlbaum.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, MA: Cambridge University Press.
Colberg, M., Nester, M. A., & Cormier, S. M. (1982). Inductive reasoning in psychometrics: A philosophical corrective. Intelligence, 6, 139–164.
Colberg, M., Nester, M. A., & Trattner, M. H. (1985). Convergence of the inductive and deductive models in the measurement of reasoning abilities. Journal of Applied Psychology, 70, 681–694.
Conway, A. R. A., Jarrold, C., Kane, M., Miyake, A., & Towse, J. (in press). Variation in working memory. Oxford, UK: Oxford University Press.
Craik, K. (1943). The nature of explanation. Cambridge, MA: Cambridge University Press.
Deary, I. J. (2001). Human intelligence differences: Towards a combined experimental-differential approach. Trends in Cognitive Science, 5, 164–170.
Ebbinghaus, H. (1895). Über eine neue Methode zur Prüfung geistiger Fähigkeiten und ihre Anwendung bei Schulkindern [On a new method to test mental abilities and its application with schoolchildren]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 13, 401–459.
Ekstrom, R. B., French, J. W., & Harman, H. H. (1976). Manual for kit of factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service.
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11, 19–23.
Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory and general fluid intelligence: A latent variable approach. Journal of Experimental Psychology: General, 128, 309–331.
Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49, 709–724.
Evans, J. St. B. T. (1989). Bias in human reasoning: Causes and consequences. Hove, UK: Lawrence Erlbaum.
Ford, M. (1995). Two modes of mental representation and problem solution in syllogistic reasoning. Cognition, 51, 1–71.
Frege, G. (1879). Begriffsschrift: Eine der arithmetischen nachgebildete Formelsprache des reinen Denkens [Begriffsschrift: A formula language modeled upon that of arithmetic, for pure thought]. Halle a. S.: L. Nebert.
Gilinsky, A. S., & Judd, B. B. (1993). Working memory and bias in reasoning across the life span. Psychology and Aging, 9, 356–371.
Gitomer, D. H. (1988). Individual differences in technical troubleshooting. Human Performance, 1, 111–131.
Guilford, J. P. (1956). The structure of intellect. Psychological Bulletin, 53, 267–293.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Guilford, J. P., Christensen, P. R., Kettner, N. W., Green, R. F., & Hertzka, A. F. (1954). A factor analytic study of Navy reasoning tests with the Air Force Aircrew Classification Battery. Educational and Psychological Measurement, 14, 301–325.
Guilford, J. P., Comrey, A. L., Green, R. F., & Christensen, P. R. (1950). A factor-analytic study on reasoning abilities: I. Hypotheses and description of tests. Reports from the Psychological Laboratory, University of Southern California, Los Angeles.
Guilford, J. P., Green, R. F., & Christensen, P. R. (1951). A factor-analytic study on reasoning abilities: II. Administration of tests and analysis of results. Reports from the Psychological Laboratory, University of Southern California, Los Angeles.
Gustafsson, J.-E. (1983). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179–203.
Hammond, K. R. (1996). Human judgment and social policy: Irreducible uncertainty, inevitable error, unavoidable injustice. Oxford, UK: Oxford University Press.
Handley, S. J., Dennis, I., Evans, J. St. B. T., & Capon, A. (2000). Individual differences and the search for counter-examples in reasoning. In W. Schaeken, A. Vandierendonck, & G. de Vooght (Eds.), Deductive reasoning and strategies (pp. 241–266). Hillsdale, NJ: Lawrence Erlbaum.
Hertzka, A. F., Guilford, J. P., Christensen, P. R., & Berger, R. M. (1954). A factor analytic study of evaluative abilities. Educational and Psychological Measurement, 14, 581–597.
Holyoak, K. J., & Thagard, P. (1997). The analogical mind. American Psychologist, 52, 35–44.
Horn, J. L., & Cattell, R. B. (1967). Age differences in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.
Horn, J. L., & Noll, J. (1994). A system for understanding cognitive capabilities: A theory and the evidence on which it is based. In D. K. Detterman (Ed.), Current topics in human intelligence: Vol. 4. Theories of intelligence (pp. 151–203). Norwood, NJ: Ablex.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 53–92). New York: Guilford.
Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110, 220–264.
Jäger, A. O., Süß, H.-M., & Beauducel, A. (1997). Berliner Intelligenzstruktur-Test [Berlin Intelligence Structure test]. Göttingen: Hogrefe.
Jensen, A. R. (1998). The g factor: The science of mental ability. London: Praeger.
Johnson-Laird, P. N. (1985). Deductive reasoning ability. In R. J. Sternberg (Ed.), Human abilities: An information-processing approach (pp. 173–194). New York: Freeman.
Johnson-Laird, P. N. (1994a). Mental models and probabilistic thinking. Cognition, 50, 189–209.
Johnson-Laird, P. N. (1994b). A model theory of induction. International Studies in the Philosophy of Science, 8, 5–29.
Johnson-Laird, P. N. (2001). Mental models and deduction. Trends in Cognitive Science, 5, 434–442.
Johnson-Laird, P. N., & Byrne, R. M. J. (1991). Deduction. Hove, UK: Lawrence Erlbaum.
Johnson-Laird, P. N., & Byrne, R. M. J. (1993). Models and deductive rationality. In K. Manktelow & D. Over (Eds.), Rationality: Psychological and philosophical perspectives (pp. 177–210). London: Routledge.
Johnson-Laird, P. N., Legrenzi, P., Girotto, V., Legrenzi, M. S., & Caverni, J. P. (1999). Naïve probability: A mental model theory of extensional reasoning. Psychological Review, 106, 62–88.
Kane, M. J., & Engle, R. W. (2002). The role of prefrontal cortex in working-memory capacity, executive attention, and general fluid intelligence: An individual-differences perspective. Psychonomic Bulletin & Review, 9, 637–671.
Kane, M. J., Hambrick, D. Z., Tuholski, S. W., Wilhelm, O., Payne, T. W., & Engle, R. W. (2004). The generality of working-memory capacity: A latent-variable approach to verbal and visuo-spatial memory span and reasoning. Journal of Experimental Psychology: General, 133, 189–217.
Klauer, K. C., Musch, J., & Naumer, B. (2000). On belief bias in syllogistic reasoning. Psychological Review, 107, 852–884.
Klauer, K. J. (1990). A process theory of inductive reasoning tested by the teaching of domain-specific thinking strategies. European Journal of Psychology of Education, 5, 191–206.
Klauer, K. J. (2001). Handbuch kognitives Training [Handbook of cognitive training]. Toronto: Hogrefe.
Krueger, F., & Spearman, C. (1906). Die Korrelation zwischen verschiedenen geistigen Leistungsfähigkeiten [The correlation between different mental abilities]. Zeitschrift für Psychologie, 44, 50–114.
Kyllonen, P. C. (1996). Is working memory capacity Spearman's g? In I. Dennis & P. Tapsfield (Eds.), Human abilities: Their nature and measurement (pp. 49–75). Mahwah, NJ: Lawrence Erlbaum.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389–433.
Kyllonen, P. C., & Stephens, D. L. (1990). Cognitive abilities as determinants of success in acquiring logic skill. Learning and Individual Differences, 2, 129–160.
Line, W. (1931). The growth of visual perception in children. British Journal of Psychology, 15.
Lohman, D. F. (1996). Spatial ability and g. In I. Dennis & P. Tapsfield (Eds.), Human abilities:
Their nature and measurement (pp. 97–116). Mahwah, NJ: Lawrence Erlbaum.
Magnani, L. (2001). Abduction, reason, and science: Processes of discovery and explanation. Dordrecht, the Netherlands: Kluwer Academic.
McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum.
Miyake, A., & Shah, P. (1999). Models of working memory: Mechanisms of active maintenance and executive control. New York: Cambridge University Press.
Oberauer, K., Süß, H.-M., Schulze, R., Wilhelm, O., & Wittmann, W. W. (2000). Working memory capacity: Facets of a cognitive ability construct. Personality and Individual Differences, 29, 1017–1045.
Oberauer, K., Süß, H.-M., Wilhelm, O., & Wittmann, W. W. (2003). The multiple faces of working memory: Storage, processing, supervision, and coordination. Intelligence, 31, 167–193.
Penrose, L. S., & Raven, J. C. (1936). A new series of perceptual tests: Preliminary communication. British Journal of Medical Psychology, 16, 97–104.
Rips, L. J. (1994). The psychology of proof: Deductive reasoning in human thinking. Cambridge: MIT Press.
Roberts, R. D., Goff, G. N., Anjoul, F., Kyllonen, P. C., Pallier, G., & Stankov, L. (2000). The Armed Services Vocational Aptitude Battery: Not much more than acculturated learning (Gc)? Learning and Individual Differences, 12, 81–103.
Schaeken, W., de Vooght, G., Vandierendonck, A., & d'Ydewalle, G. (Eds.). (2000). Deductive reasoning and strategies. New York: Lawrence Erlbaum.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Shafir, E., & Le Boeuf, R. A. (2002). Rationality. Annual Review of Psychology, 53, 491–517.
Shye, S. (1988). Inductive and deductive reasoning: A structural reanalysis of ability tests. Journal of Applied Psychology, 73, 308–311.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3–22.
Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201–293.
Spearman, C. (1923). The nature of intelligence and the principles of cognition. London: Macmillan.
Spearman, C. (1927). The abilities of man: Their nature and measurement. New York: AMS.
Spearman, C. (1938). Measurement of intelligence. Scientia, 64, 75–82.
Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning. Mahwah, NJ: Lawrence Erlbaum.
Stegmüller, W. (1996). Das Problem der Induktion: Humes Herausforderung und moderne Antworten [The problem of induction: Hume's challenge and modern answers]. Darmstadt: Wissenschaftliche Buchgesellschaft.
Stenning, K., & Oberlander, J. (1995). A cognitive theory of graphical and linguistic reasoning: Logic and implementation. Cognitive Science, 19, 97–140.
Sternberg, R. J., & Turner, M. E. (1981). Components of syllogistic reasoning. Acta Psychologica, 47, 245–265.
Störring, G. (1908). Experimentelle Untersuchungen über einfache Schlussprozesse [Experimental studies on simple inference processes]. Archiv für die gesamte Psychologie, 11, 1–127.
Süß, H.-M., Oberauer, K., Wittmann, W. W., Wilhelm, O., & Schulze, R. (2002). Working memory capacity explains reasoning ability and a little bit more. Intelligence, 30, 261–288.
Thurstone, L. L. (1938). Primary mental abilities. Chicago: University of Chicago Press.
Thurstone, L. L., & Thurstone, T. G. (1941). Factorial studies of intelligence. Chicago: University of Chicago Press.
Undheim, J. O., & Gustafsson, J.-E. (1987). The hierarchical organization of cognitive abilities: Restoring general intelligence through the use of linear structural relations. Multivariate Behavioral Research, 22, 149–171.
Wilhelm, O. (2000). Psychologie des schlussfolgernden Denkens: Differentialpsychologische Prüfung von Strukturüberlegungen [Psychology of reasoning: Testing structural theories]. Hamburg: Dr. Kovac.
Wilhelm, O., & Conrad, W. (1998). Entwicklung und Erprobung von Tests zur Erfassung des logischen Denkens [Development and evaluation of deductive reasoning tests]. Diagnostica, 44, 71–83.
Wilhelm, O., & McKnight, P. E. (2002). Ability and achievement testing on the World Wide Web. In
B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.), Online social sciences (pp. 151–181). Toronto: Hogrefe.
Wilhelm, O., & Schulze, R. (2002). The relation of speeded and unspeeded reasoning with mental speed. Intelligence, 30, 537–554.
Wilkins, M. C. (1929). The effect of changed material on ability to do formal syllogistic reasoning. Archives of Psychology, 16(102).
Woodworth, R. S., & Sells, S. B. (1935). An atmosphere effect in formal syllogistic reasoning. Journal of Experimental Psychology, 18, 451–460.
Yang, Y., & Johnson-Laird, P. N. (2001). Mental models and logical reasoning problems in the GRE. Journal of Experimental Psychology: Applied, 7, 308–316.