
  • 5/24/2018 Measuring Deductive Reasoning

    1/20

    21

    MEASURING REASONING ABILITY

    OLIVER WILHELM

    DEDUCTIVE AND INDUCTIVE REASONING

    Reasoning is a thinking activity that is of crucial importance throughout our lives. Consequently, the ability to reason is of central importance in all major theories of intelligence structure. Whenever we think about the causes of events and actions, when we pursue discourse, when we evaluate assumptions and expectations based on our prior knowledge, and when we develop ideas and plans, the ability to reason is pivotal.

    The verb reason is associated with various highly overlapping meanings. Justifying and supporting concepts and ideas is as important as convincing others through good reasons and the discovery of conclusions through the analysis of discourse. In modern psychology, usually two to three forms of reasoning are distinguished. In deductive reasoning, we derive a conclusion that is necessarily true if the premises are true. In inductive reasoning, we try to infer information by increasing the semantic content when proceeding from the premises to the conclusion. Sometimes, a third form of reasoning is distinguished (Magnani, 2001). In abductive reasoning, we reason from a fact to the action that has caused it. Abductive reasoning has not been thoroughly investigated in intelligence research, and we can consider abductive reasoning to be a subset and mixture of inductive and deductive reasoning. In the remainder of this chapter, abductive reasoning will not be discussed.

    In deduction, the premises necessarily entail or imply the conclusion. It is impossible that the premises are true and that the conclusion is false. Three perspectives on deduction can be distinguished. From a syntactic perspective, the relation between premises and conclusion is derivable independent of the instantiation of the premises. The criterion for the correctness of an argument is its derivability from the premises. From a semantic perspective, the conclusion is true in any possible model of the premises. The criterion for the correctness of an argument is its validity. From a pragmatic perspective, there is a learned or acquired relation between premises and conclusion that has no logical necessity. The criterion to assess the quality of an argument is its utility.
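    The semantic perspective can be made concrete with a small sketch: enumerate every truth assignment (every possible model) of the premises and check that the conclusion holds in each. This is my own illustrative toy, not anything from the chapter; the `entails` helper and the propositional encoding are assumptions for the example.

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Semantic check of deductive validity: the conclusion must be
    true in every truth assignment (model) that makes all premises true.
    Premises and the conclusion are functions from an assignment dict to bool."""
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False  # a model of the premises falsifies the conclusion
    return True

# Modus ponens: from p and (p -> q), q follows.
premises = [lambda m: m["p"], lambda m: (not m["p"]) or m["q"]]
print(entails(premises, lambda m: m["q"], ["p", "q"]))   # True
# Affirming the consequent: from q and (p -> q), p does not follow.
premises2 = [lambda m: m["q"], lambda m: (not m["p"]) or m["q"]]
print(entails(premises2, lambda m: m["p"], ["p", "q"]))  # False
```

    The brute-force search over models is exactly the semantic criterion in miniature: an argument is valid when no model of the premises falsifies the conclusion.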

    These perspectives cannot be applied to induction because the criteria to assess conclusions must be different. Carnap's formalization has attracted considerable attention when it comes to distinguishing forms of induction. Carnap (1971) classifies inductive arguments as enumerative and eliminative. In enumerative induction, the premises assert something as true of a finite number of specific objects or subjects, and the conclusion infers that what is true for the finite number is true of all such objects or subjects. In eliminative induction, confirmation proceeds by falsifying competing alternative hypotheses. The problem with induction is that we cannot prove for any inductive inference with true premises that the inference provides us with a true conclusion (Stegmüller, 1996).

    373

    21-Wilhelm.qxd 9/8/2004 5:09 PM Page 373

    Nevertheless, induction is of crucial importance in science whenever we talk about discovery. However, the testing of theories is a completely deductive enterprise. Induction and deduction hence both have their place in science, and the ability to draw good inductive and deductive inferences is of major importance in real life.

    Historically, logic was primarily established through Aristotle. Although Aristotle viewed logic as the proper form of scientific investigation, he used the term as equivalent to verbal reasoning. The syllogistic form of reasoning, as established through Aristotle, dominated logic up until the middle of the 19th century. Throughout the second half of the 19th century, there was a rapid development of logic as a scientific discipline. Philosophers such as George Boole (1847) and Gottlob Frege (1879) started to develop formalizations of deductive logic as a language that went beyond the idea that logic should reflect common sense and sound reasoning. In a nutshell, logic was the manipulation of symbols by virtue of a set of rules. The logical truth of an argument was hence no longer assessed by agreement with some experts or through acceptance by common sense. Whether logical reasoning was correct could then be assessed by agreement with a calculus. In our historical excursion, we need to note, however, that George Boole did believe that the laws of thinking and the rules of logic are equivalent, and John Stuart Mill thought that the rules of logic are generalizations of forms of conclusions considered true by humans.

    Apparently, from the early days of logic to now, the puzzle remains that although humans invented logic, they are not able or willing to follow its standards in all instances. Humans are vulnerable to errors in reasoning and do not proceed consistently in deriving conclusions. The research on biases, contents, and strategies in reasoning has a long tradition in psychology. For example, Störring (1908) investigated thought processes in syllogistic reasoning and distinguished various strategies, Wilkins (1929) manipulated test content and observed effects on test properties, and Woodworth and Sells (1935) conducted outstanding research on a particular bias in syllogistic reasoning labeled the atmosphere effect.

    In contemporary psychological research on reasoning, so-called dual-process theories dominate. In these theories, an associative, heuristic, implicit, experiential, and intuitive system of information processing is contrasted with a second rule-based, analytical, explicit, and rational system (Epstein, 1994; Evans, 1989; Hammond, 1996; Sloman, 1996; Stanovich, 1999). Most of the biases found in reasoning, judgment, and decision making can be located within the first system. A reasoning competence and propensity to think rationally can be located within the second system. In considering individual differences in reasoning ability, the interest is primarily on differences within the second system. Most of the differences could reflect individual differences in available resources for the computational work to be accomplished to obtain a correct response. An additional source of individual differences might be the probability with which individuals deliberately use the second system when responding to specific problems.

    The discussion of individual differences in reasoning ability starts with the assertion that there are individual differences in the ability to reason according to some rational standard. Humans can be rational in principle, but they fail to a varying degree in practice. The principle governing this rationality is that people accept inferences as valid if there is no mental representation contradicting the conclusion (Johnson-Laird & Byrne, 1993; Stanovich, 1999). Individual differences from this perspective primarily arise from restrictions in the ability to create and manipulate mental representations. In other words, depending on our cognitive apparatus, we are able to find a good, or the correct, answer to some reasoning problems but not to other, more difficult problems.

    In measuring reasoning ability, it is consequently assumed that individuals can think rationally but that there are individual differences in how well people can do so.

    THOUGHT PROCESSES IN REASONING

    There are several competing theories for the description and explanation of reasoning processes. The theories are distinguished by the broadness of the phenomena they can explain and how profound the proposed explanations are. They are also different with respect to how much experimental research was done to investigate them and how much supportive evidence was collected. The theory of mental models (Johnson-Laird & Byrne, 1991) is one outstanding effort in describing and explaining what people do when they reason, and this theory will be described in more detail after briefly reviewing more specific accounts of deductive and inductive reasoning, respectively.

    374 HANDBOOK OF UNDERSTANDING AND MEASURING INTELLIGENCE

    Besides many more specific accounts of reasoning, the mental logic approach to reasoning has many adherents and was applied to a broad range of reasoning problems (Rips, 1994). According to mental logic theories, individuals apply schemata of inference when they reason. Errors in reasoning occur when inference schemata are unavailable, corrupted, or cannot be applied. More complex inferences are accomplished by compiling several elemental schemata. The inference schemata in various mental logic theories are different from each other, from logical terms in natural language, and from logical terms in formal logic. The psychology of proof by Rips (1994) is the most elaborated and sophisticated theory of mental logic. However, the mental model theory covers a broader range of phenomena than mental logic accounts do. In addition, the experimental support seems to be in favor of the mental models theory. Finally, both sets of theories are closely related to each other, the major difference being that the mental model approach deals with reasoning on the semantic level, whereas mental logic theories investigate reasoning on the syntactic level.

    Analogical reasoning is a subset of inductive thinking that has received considerable attention in cognitive psychology. For example, Holyoak and Thagard (1997) developed a multiconstraint theory of analogical reasoning. Three constraints are claimed to create coherence in analogical thought: similarity between the concepts involved; structural parallels (specifically, isomorphism) between the functions in the source and target domains; and guidance by the reasoner's goals. This work was recently extended. Hummel and Holyoak (2003) developed a symbolic connectionist model of relational inference. The theory suggests that distributed symbolic representations are the basis of relational reasoning in working memory.

    There is no doubt substantial promise in extending these accounts of inductive thinking to available reasoning measures. So far, there is not enough experimental evidence available allowing derivation of predictions of item difficulties (but see Andrews & Halford, 2002), and there is not enough variability in the application of the theories to allow a broad application in predicting psychometric properties of reasoning tests in general. To illustrate the character and promise of theories of reasoning processes, I will limit the exposition to the mental model theory. It is hoped that the future will bring an integration of theories of inductive and deductive reasoning along with strong links to theories of working memory.

    The mental model theory has been extensively applied to deductive reasoning (Johnson-Laird, 2001; Johnson-Laird & Byrne, 1991) and inductive thinking (Johnson-Laird, 1994b). Briefly, mental model theory views thinking as the manipulation of models (Craik, 1943). These models are analogous representations, meaning that the structure of the models corresponds to what they represent. Each entity is represented by an individual token in a model. Properties of and relations between entities are represented by properties of and relations between tokens, respectively. Negations of atomic propositions are represented as annotations of tokens. Information can be represented implicitly, and the implicit status of a model is part of the representation. If necessary, implicit representations can be fleshed out by simple mechanisms. The epistemic status of a model is represented as a propositional annotation in the model.

    A major determinant of the difficulty of reasoning tasks is the number of mental models that are compatible with the premises. The premises "A is left of B. B is left of C. C is left of D. D is left of E." can be easily integrated into one mental model:

    A B C D E

    This mental model supports conclusions such as "C is left of E." However, the premises "A is left of B. B is left of C. C is left of E. D is left of E." call for the construction of two mental models. The first mental model places C left of D, whereas the second mental model places D left of C.

    Model 1: A B C D E

    Model 2: A B D C E

    Both models are compatible with the premises. Generally, the more mental models that are compatible with the premises of a reasoning task, the harder the task will be. This prediction has been confirmed with a wide variety of reasoning tasks, including syllogisms, spatial and temporal reasoning, propositional reasoning, and probabilistic reasoning (Johnson-Laird, 1994a; Johnson-Laird & Byrne, 1991). In established measures of reasoning ability, it is hard or impossible to specify the nature and number of mental models a given item calls for (Yang & Johnson-Laird, 2001). This is because test construction is usually driven by applying psychometric criteria and not by creating indicators through the strictly theory-driven derivation from a cognitive model of thought processes. In specifically constructed measures, on the other hand, the nature and number of mental models that participants need to construct in order to solve an item correctly can be manipulated. The empirical study presented later in this chapter mixes measures with and without explicit manipulation of the number of mental models required for successful solution.
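    The indeterminacy that drives model count can be illustrated with a brute-force sketch (my own illustration, not part of the chapter): generate every left-to-right arrangement of the tokens and keep those consistent with the "left of" premises. Note that exhaustive enumeration also returns arrangements that differ only in where D stands relative to A and B, which the two models in the text do not represent separately; either way, the second premise set admits more than one arrangement, which is the theory's point.

```python
from itertools import permutations

def consistent_orderings(entities, premises):
    """Return every left-to-right arrangement of the entities that is
    consistent with 'X is left of Y' premises, by brute-force enumeration."""
    result = []
    for order in permutations(entities):
        pos = {e: i for i, e in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in premises):
            result.append("".join(order))
    return result

# Determinate premises: A<B, B<C, C<D, D<E -> exactly one arrangement.
one_model = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
# Indeterminate premises: A<B, B<C, C<E, D<E -> several arrangements.
two_model = [("A", "B"), ("B", "C"), ("C", "E"), ("D", "E")]

print(consistent_orderings("ABCDE", one_model))  # ['ABCDE']
print(consistent_orderings("ABCDE", two_model))  # includes 'ABCDE' and 'ABDCE'
```

    The number of surviving arrangements is a rough proxy for the number of mental models, and hence for predicted item difficulty.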

    Inductive and deductive reasoning processes go through the same three stages of information processing. In the first stage, the premises are understood. Knowledge in general and literacy in dealing with the stimuli are critical in building a representation of the problem. Frequently, the problem will be verbal, and hence reading comprehension will be an important aspect of the creation of representations. However, it is well known that strategies can have an effect on encoding. In solving syllogisms, subgroups of individuals might follow different strategies for creating an initial representation of problem content (Ford, 1995; Stenning & Oberlander, 1995; Sternberg & Turner, 1981). As a result, specific groups of items are hard for one subgroup but not for another, whereas for a second group of items, the reverse is true.

    In the second stage, a parsimonious description of the constructed model(s) is attempted. If the task is deductive reasoning, the resulting construction should include something that was not explicitly evident in the premises. Technically, no meaning is created in deduction. It is all implicit in the premises. Experientially, deductive conclusions do not seem to be completely obvious and apparent. If no such conclusion can be found, the answer to the problem can be that there is no conclusion to the problem. If the task is inductive reasoning, the resulting construction allows a conclusion that increases the semantic information of the premises. Hence, a tentative hypothesis is constructed that implies a semantically stronger description than evident in the premises. However, if background knowledge is operating besides the premises, an inductive problem might turn into an enthymeme, a deduction in which not all premises are explicit. Many of the so-called inductive tasks used in intelligence research technically might well be classified as enthymemes. Frequently used number-series problems could qualify as enthymemes. If the premises of such a number-series task are explicitly stated (for example, "Continue the number series 1, 3, 5, 7, 9, 11 by one more number" and "The operations you can use are +, −, /, and *, and all results are positive integers"), and rules are indicating regularities in proceeding through the number series, and these regularities can include rule-based changes to the rule, there might be just one option that meaningfully continues the sequence: 13.
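    The point that an explicitly constrained number series behaves like a deduction can be sketched as follows. The helper `candidate_rules` is my own invention, and only two hypothesis families (add a constant, multiply by a constant) are tried; the stated constraints then filter the continuations.

```python
def candidate_rules(series):
    """Infer simple recurrence rules a number series obeys and return
    (rule name, parameter, next value) triples. A toy stand-in for the
    richer rule space the chapter alludes to."""
    rules = []
    diffs = {b - a for a, b in zip(series, series[1:])}
    if len(diffs) == 1:
        d = diffs.pop()
        rules.append(("add", d, series[-1] + d))
    if all(a != 0 for a in series[:-1]):
        ratios = {b / a for a, b in zip(series, series[1:])}
        if len(ratios) == 1:
            r = ratios.pop()
            rules.append(("multiply", r, series[-1] * r))
    # The explicit constraints make the task deductive: keep only
    # continuations that are positive integers.
    return [(name, p, nxt) for name, p, nxt in rules
            if nxt == int(nxt) and nxt > 0]

print(candidate_rules([1, 3, 5, 7, 9, 11]))  # [('add', 2, 13)]
```

    With the premises and operation set made explicit, exactly one continuation survives, which is why such "inductive" items can technically behave as enthymematic deductions.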

    In the third stage, models are evaluated, maintained, modified, or rejected. If the task is deductive reasoning, counterexamples to tentative conclusions are searched for. If no counterexample can be found, the conclusion is produced. If a counterexample is found, the process goes back to Stage 2. If the task is inductive reasoning, the conclusion adds information to the premises. The conclusion should be consistent with the premises and background knowledge. Obviously, inductive conclusions are not necessarily true. If an induction turns out to be wrong, either the premises are false or the induction was too strong. If a deduction turns out to be wrong, the premises must be false.

    Evidently, only the third stage is specific to inductive and deductive reasoning, respectively.


    However, errors in answering reasoning problems can be located at any of the three stages. The relevance of the third stage as a primary source of errors can be debated. Johnson-Laird (1985) argues that the search for counterexamples is crucial for individual differences, yet Handley, Dennis, Evans, and Capon (2000) argue that individuals rarely engage in a search for counterexamples. Psychometrically, syllogisms and spatial relational tasks that do not rely on a search for counterexamples are as good as or better measures of reasoning ability than items that require such a search (Wilhelm & Conrad, 1998).

    Theories about reasoning processes in general and the mental model theory in particular have been widely and successfully applied to reasoning problems. Few of these applications have considered problems from psychometric reasoning tasks (but see Yang & Johnson-Laird, 2001). We will now discuss the status of reasoning ability in various models of the structure of intelligence, as assessed by psychometric reasoning tasks, and then turn to formal and empirical classifications of reasoning measures. Ideally, a general theory of reasoning processes should govern test construction and confirmatory data analysis. In practice, theories of reasoning processes have rarely been considered when creating and using psychometric reasoning tasks.

    REASONING IN VARIOUS MODELS OF THE STRUCTURE OF INTELLIGENCE

    Binet's original definition of intelligence focused on abilities of sensation, perception, and reasoning, but this definition was modified several times and ended up defining intelligence as the ability to adapt to novel situations (Binet, 1903, 1905, 1907). Structurally, Binet's as well as Ebbinghaus's (1895) earlier investigations do not fall within the realm of factor-analytic work, and consequently, they have been rarely discussed in this context.

    Spearman's invention of tetrad analysis as a means to assess the rank of correlation matrices was the starting point of factor-analytic work (Krueger & Spearman, 1906; Spearman, 1904). Spearman's definition of general intelligence (g) focuses on the role of educing correlates and relations. The ability to educe relations and correlates is best reflected in reasoning measures.

    Other intelligence measures are characterized by varying proximity to the general factor. Reasoning measures are expected to have high g loadings and low proportions of specific variance. The g factor is said to be precisely defined and the core construct of human abilities (Jensen, 1998; but see Chapter 16, this volume). There are several more or less strict interpretations of the g factor theory (Horn & Noll, 1997). In its strictest form, one core process is causal for all communalities in individual differences. In a much more relaxed form of the theory, a general factor is supposed to capture the correlations between oblique first- or second-order factors. With respect to reasoning, Spearman (1923) considered inductive and deductive reasoning to be forms of syllogisms. Although Spearman (1927) did not exclude the option of a reasoning-specific group factor besides g, performance on reasoning measures was assumed to be primarily limited by mental energy, or g.

    The controversy around Spearman's theory was initially focused on statistical and methodological issues, and it was in the context of new statistical developments that Thurstone contributed his theory of primary mental abilities. Thurstone's initial work on the structure of intelligence (1938) was substantially modified and improved by Thurstone and Thurstone (1941). In the later work, the primary factors of Space, Number, Verbal Comprehension, Verbal Fluency, Memory, Perceptual Speed, and Reasoning are distinguished. The initial distinction between inductive and deductive reasoning was abandoned, and the associated variances were allocated to Reasoning, Verbal Comprehension, Number, and Space. The Reasoning factor is marked mostly by inductive tasks. Several of the other factors have substantial loadings from reasoning tasks. In a sample of eighth-grade students, the Reasoning factor is the factor with the highest loading on a second-order factor. Further elaboration of deductive measures by creating better indicators, as suggested by the Thurstones, was attempted only by the research groups surrounding Colberg (Colberg, Nester, & Cormier, 1982; Colberg, Nester, & Trattner, 1985) and Guilford.

    Guilford's contribution to the measurement of reasoning ability is mostly in constructing and popularizing reasoning measures. The structure-of-intellect (SOI) theory (Guilford, 1956, 1967) is mostly to be credited for its heuristic value in including some of what was previously no-man's-land into intelligence research. For the present purposes, the focus is on reasoning ability exclusively, and Guilford's major contributions to this topic can be located prior to the specification of the SOI theory. On the basis of a mixture of literature review, construction of specific tests, and empirical investigations of the structure of reasoning, Guilford proposed initially three, later four, reasoning factors (Guilford, Christensen, Kettner, Green, & Hertzka, 1954; Guilford, Comrey, Green, & Christensen, 1950; Guilford, Green, & Christensen, 1951; Hertzka, Guilford, Christensen, & Berger, 1954). These four factors (General Reasoning, Thurstone's Induction, Commonalities, and Deduction) are hard to separate conceptually and empirically. Specifically, the first three factors are very similar on the task level, and empirically, inductive tasks load on all three of these reasoning factors. The Deduction factor is marked weakly with tasks that are hard to distinguish from tasks assigned to other reasoning factors. The tasks popularized by Guilford are still in use today (Ekstrom, French, & Harman, 1976), but many measures are available that are much better conceptually and psychometrically.

    The Berlin Intelligence Structure model (Jäger, Süß, & Beauducel, 1997; see Chapter 18, this volume) is a bimodal hierarchical perspective on cognitive abilities. Intelligence tasks are classified with respect to a content facet and an operation facet. On the content facet, Verbal, Quantitative, and Spatial intelligence are distinguished. On the operation facet, Creativity/Fluency, Memory, Processing Speed, and Reasoning are distinguished. The model has a surface similarity with Guilford's SOI theory but avoids some of the technical pitfalls of Guilford's model. The Reasoning factor on the operation facet is defined as information processing in tasks that require availability and manipulation of complex information. The processing thus reflects reasoning and judgment abilities. The Reasoning factor is defined across the content facet, and consequently, there are verbal, spatial, and numerical reasoning tasks.

    In an epochal effort, Carroll (1993) summarized and reanalyzed factor-analytic studies of human cognitive abilities. The result of this work is an elaborated hierarchical theory that postulates a general factor, g, at the highest level. On a second level, broad ability factors are distinguished. The proposed abilities are fluid intelligence (Gf), crystallized intelligence (Gc), general memory and learning, broad visual perception, broad auditory perception, broad retrieval ability, broad cognitive speediness, and processing speed. Fluid intelligence is largely identified by three reasoning abilities distinguished on the lowest stratum of Carroll's theory. The three reasoning factors are Sequential Reasoning, Induction, and Quantitative Reasoning. The Sequential Reasoning factor is measured by tasks that require participants to reason from premises, rules, or conditions to conclusions that properly and necessarily follow from them. In the remainder of this chapter, the terms sequential reasoning and deductive reasoning will be used interchangeably. The Induction factor is measured by tasks that provide individuals with materials that are governed by some rules, principles, similarities, or dissimilarities. Participants are supposed to detect and infer those features of the stimuli and apply the inferred rule. The Quantitative Reasoning factor is measured by tasks that ask the participant to reason with concepts involving numerical or mathematical relations. Figure 21.1 presents the classification of reasoning tasks according to Carroll (1993, p. 210).

    The theory developed by Cattell and Horn (Horn & Noll, 1994, 1997) is very closely related to Carroll's theory. In fact, Carroll's theory is more based on Cattell's and Horn's work than the other way round. Their investigation of human cognitive capabilities was focused on five kinds of evidence in its development: first, structural evidence as expressed in the covariation of performances; second, developmental change through the life span; third, neurocognitive evidence; fourth, achievement evidence as expressed in the prediction of criteria involving cognitive effort; and fifth, behavioral-genetic evidence. A major difference between the three-stratum theory from Carroll and the Gf-Gc theory from Horn and Cattell is the lack of a general factor in the Cattell-Horn framework because, according to Horn and Noll (1994), there is no unifying principle and hence no sufficient reason for specification of a general factor. However, for the present purposes, the proposed structure and interpretation of reasoning ability is of major importance. Horn and Noll interpret fluid intelligence as inductive and deductive reasoning that is critical in understanding relations among stimuli, comprehending implications, and drawing inferences. Horn and Noll (1997) also speak about conjunctive and disjunctive reasoning, but supposedly, these two forms fall under inductive and deductive reasoning. The Cattell-Horn theory assumes that both inductive and deductive reasoning tasks can have verbal as well as spatial content (Horn & Cattell, 1967). This idea can be extended, and both Gf and Gc can be measured with a broader variety of contents (Beauducel, Brocke, & Liepmann, 2001). In terms of the structure of reasoning ability, there is little difference between Carroll's theory, on the one side, and the Cattell-Horn framework, on the other. The major difference is the postulation of a separate quantitative factor in the latter model, whereas Carroll subsumes quantitative reasoning under fluid intelligence.

    Based on available psychometric reasoning tasks, reasoning ability has a central place in all of the above-discussed theories of the structure of intelligence. However, the manifold of available measures might still reflect a biased selection from all possible reasoning tests. The two following sections on formal and empirical classifications should contribute to deepening our understanding of reasoning measures and reasoning ability.

    FORMAL CLASSIFICATIONS OF REASONING

    There is certainly no lack of reasoning measures. Carroll (1993) lists a very broad variety of available reasoning tasks, and more similar tests could be developed without major problems. Kyllonen and Christal (1990) summarize the situation as follows:

        Since Spearman (1923) reasoning has been defined as an abstract, high-level process, eluding precise definition. Development of good tests of reasoning ability has been almost an art form, owing more to empirical trial-and-error than to systematic delineation of the requirements such tests must satisfy. (p. 426)

    Although empirical evidence indicates that some measures are better indicators of reasoning ability than others, the theoretical knowledge about which measure is good for what reasons is still very limited. In addition, scientists and practitioners are left with little advice from test authors as to why a specific test has the form it has. It is easy to find two reasoning tests that are said to measure the same ability but that are vastly different in terms of their features, attributes, and requirements.

    Compared to this bottom-up approach of test construction, a top-down approach could facilitate construction and evaluation of measures. There are four aspects of such a top-down approach that will be discussed subsequently: operation, content, instantiation and nonreasoning requirements, and vulnerability to reasoning strategies.


    [Figure: Gf at the apex, branching into Sequential Reasoning (general verbal reasoning, linear syllogisms, categorical syllogisms), Induction (analogies, odd elements, matrix tasks, multiple exemplars, series tasks, rule discovery), and Quantitative Reasoning (quantitative tasks).]

    Figure 21.1 Carroll's Higher-Order Model of Fluid Intelligence (Reasoning)


    The first aspect to consider in the classification of reasoning measures is the formal operational requirement. Reasoning tasks can call for inductive and deductive inferences, and among various tests for fluid intelligence, there are additional tests that primarily call for judgment, decision making, and planning. In focusing on inductive and deductive reasoning, the distinction is that in inductive reasoning, individuals create semantic information; as a result, the inferences are not necessarily true. In deductive reasoning, however, individuals maintain semantic information and derive inferences that are necessarily true if the premises are true. Tasks that are commonly classified as requiring broad visualization (Carroll, 1993) usually satisfy the definition of deductive reasoning. However, the visualization demand of such tasks is pivotal and paramount (Lohman, 1996), and such tasks will consequently be excluded from further discussion.

    A second aspect to consider in the classification of reasoning measures is the content of tasks. Tasks can have many contents, but the vast majority of reasoning measures employ figural, quantitative, or verbal stimuli. Many tasks also represent a mixture of contents. For example, arithmetic reasoning tasks can be both verbal and quantitative. Experimental manipulations of the content of measures are desirable to understand the structure of reasoning ability more profoundly.

    A third aspect of relevance in classifying measures of reasoning ability has to do with the instantiation of reasoning problems. Reasoning problems have an underlying formal structure. If we decide to construct a measure of reasoning ability, we instantiate this general form and have a variety of options in doing so. In choosing between these options, we essentially go through a decision tree. A first choice might be to use either concrete or abstract forms of reasoning problems. In the abstract branch, we might choose between a nonsense instantiation and a variable instantiation. In the case of syllogistic reasoning tests, a nonsense instantiation might be "All Gekus are Lemis. All Lemis are Filop." A variable instantiation of the same underlying logical form could be "All A are B. All B are C." In the concrete branch of the decision tree, prior knowledge is of crucial importance. Instantiations of reasoning problems can either conform or not conform to our prior knowledge. Nonconforming instantiations can be either counterfactual or impossible. A counterfactual instantiation could be "All psychologists are Canadian. All Canadians drive Porsches." An impossible instantiation could be "All cats are dogs. All dogs are birds." In the branch that includes instantiations that conform to prior knowledge, we can distinguish factual and possible instantiations. A factual instantiation could be "All cats are mammals. All mammals have chromosomes." A possible instantiation could be "All white cars in this garage are fast. All fast cars in this garage run out of petrol."

    It is well established that the form of the instantiation has substantial effects on the difficulty of structurally identical reasoning tasks (Klauer, Musch, & Naumer, 2000). It is also known that the form of the instantiation of reasoning tasks has some influence on the psychometric properties of reasoning tasks (Gilinsky & Judd, 1993). Abstract instantiations might induce test anxiety in some individuals because they look like formulas. Aside from this possible negative effect, abstract instantiations might be a good format for reasoning tasks. Instantiations that do not conform to prior knowledge are likely to be less suitable forms of reasoning problems because there is an apparent conflict between prior knowledge and the required thought processes. It is likely that some individuals are better able than others to abstract from their prior knowledge. However, such an abstraction would not be covered by a measurement intention that aims at assessing the ability to reason deductively. Instantiations that actually reflect prior knowledge are not good forms for reasoning problems because, rather than reasoning, the easiest way to a solution is to recall the actual knowledge. Some of the most widely used tests of deductive reasoning are impossible instantiations. The psychometric differences between measures instantiated in different ways are likely to be nontrivial.

    The final aspect of a classification of reasoning measures discussed here deals with the vulnerability of a task to reasoning strategies. In measuring reasoning ability, like most other abilities, it is assumed that all individuals

    380 HANDBOOK OF UNDERSTANDING AND MEASURING INTELLIGENCE



    approach the problems in the same way. Some individuals are more successful than others because they have more of the required ability. Consequently, it is implicitly assumed that individuals at the very top of the ability distribution proceed through a reasoning test in roughly the same way as individuals at the very bottom of the distribution. If a subgroup of participants chooses a different approach to a given test, the consequence is that the test measures different abilities for different subgroups. For syllogistic reasoning, it is known that there are two or three subgroups of individuals who approach syllogistic reasoning tests differently. Depending on which strategy is chosen, different items are easy and hard, respectively (Ford, 1995). Knowledge about strategies in reasoning is limited (but see Schaeken, De Vooght, Vandierendonck, & d'Ydewalle, 2000), and the role of strategies in established reasoning measures has barely been investigated.

    The actual reasoning tasks that have been used in experimental investigations of reasoning processes and in psychometric studies of reasoning ability have little to no overlap in surface features. However, there is now good evidence (Stanovich, 1999) that reasoning problems as they have been used in cognitive psychology are moderately correlated with reasoning measures as they have been used in individual-differences research. The experimentally used tasks have been thoroughly investigated, and we now know a lot about the thought processes involved in them. One important conclusion from this research is that the instantiations of reasoning problems are, for the most part, appropriate to elicit the intended reasoning processes (Shafir & LeBoeuf, 2002; Stanovich, 1999). However, there are pervasive reliability issues because frequently only a few such problems are used in any given experiment. Conversely, we do not know much about the ongoing thought processes in established measures of reasoning ability as used in psychometric research. However, we do know a lot about their structure (Carroll, 1993), their relations with other measures of maximal behavior (Carroll, 1993; Jäger et al., 1997; Kyllonen & Christal, 1990), and their validity for the prediction of real-life criteria (Schmidt & Hunter, 1998). Both sets of reasoning tasks can and should be used when studying reasoning ability. The benefits would be mutual. For example, differences in correlations between individual reasoning items as used in cognitive research and latent variables from reasoning ability tests might reveal important differences between the experimental tasks. Similarly, variability in the difficulties of items from standard psychometric reasoning tests could possibly be explained by applying various theories of reasoning processes, such as the mental model theory sketched above.

    EMPIRICAL CLASSIFICATIONS OF REASONING MEASURES

    In psychology, inductive reasoning has frequently been equated with proceeding from specific premises to general conclusions. Conversely, deductive reasoning has frequently been equated with proceeding from general premises to specific conclusions. This definition can still be found in textbooks, but it is outdated. There are inductive arguments proceeding from general premises to specific conclusions, and there are deductive arguments proceeding from specific premises to general conclusions. For example, the argument "Almost all Swedes are blond. Jan is a Swede. Therefore, Jan is blond." is an inductive argument that violates the above definition, and the argument "Jan is a Swede. Jan is blond. Therefore, some Swedes are blond." is a deductive argument that also violates the above definition.

    According to Colberg et al. (1982), most established reasoning tests confound the direction of inference (general or specific premises and general or specific conclusions) with deductive and inductive reasoning tasks. By constructing specific deductive and inductive reasoning tasks (Colberg et al., 1985), they present correlational evidence that seems to support the unity of inductive and deductive reasoning tasks. However, the reliability of the measures is very low; the applied method of disattenuating correlations is not satisfying; and, most important, Shye (1988) reclassifies their tasks and finds support for a distinction between rule-inferring and rule-applying tasks (see Chapter 18, this volume). In the initial classification and

    Measuring Reasoning Ability 381



    construction of tasks (Colberg et al., 1985), tests have been labeled as inductive when in fact they were probabilistic. Probabilistic tasks can, in principle, be deductive (Johnson-Laird, 1994a; Johnson-Laird, Legrenzi, Girotto, Legrenzi, & Caverni, 1999), and the probabilistic tasks used (Colberg et al., 1985) were in fact deductive tasks. What was shown by Colberg (Colberg et al., 1982, 1985), then, was the unity of some forms of deductive reasoning tasks, and what Shye demonstrated was that task classification is a sensitive business and that rule-applying tasks, as constructed by Colberg et al., fall into the periphery of a multidimensional scaling solution, with rule-inferring/inductive reasoning at its center.

    The most sophisticated, ambitious, and

    advanced attempt to propose factors of reasoning ability comes from Carroll (1993). Carroll discusses the structure of reasoning ability, bearing in mind several objections and difficulties. Among those objections are that (a) reasoning tests are frequently complex, requiring both inductive and deductive thought processes; (b) reasoning measures are often short and administered under timed conditions; (c) reasoning tests are usually not carefully constructed and analyzed at the item level; (d) inductive and deductive reasoning processes are learned and developed together; and (e) many reasoning measures involve language, quantitative, or spatial skills to an unknown degree.

    Carroll (1993) asserts that his proposal of the three reasoning factors (Induction, Deduction, and Quantitative Reasoning) is preliminary for several reasons (but see Carroll, 1989). First, in many of the reanalyzed studies, only one reasoning factor emerged. This is simply due to the fact that there was frequently not a sufficient

    number of reasoning tests included to examine the structure of reasoning ability in such studies. Second, of the 37 out of 176 data sets with more than one reasoning factor, most were from studies never intended or designed to investigate the structure of reasoning ability. Third, even those studies intended to investigate the structure of reasoning ability included insufficient numbers of reasoning measures. Other problems with investigating the structure of reasoning ability include variations in time pressure across tests and studies, variations in scoring procedures, variations in instructing participants, and, most important, individual measures that are classified post hoc rather than a priori.

    In carefully examining Tables 6.1 and 6.2

    from Carroll (1993), it is apparent that the deductive reasoning tasks are frequently verbal. Content for the inductive reasoning tasks is more diverse but tends to be figural-spatial. The last reasoning factor is rather unequivocally a quantitative factor. An explanation of the data in Carroll as indicating a distinction between inductive, deductive, and quantitative reasoning competes with an explanation that distinguishes between verbal, figural-spatial, and quantitative content. Inspection of Carroll's reanalysis of individual data sets is compatible with an interpretation of the factor labeled as general sequential reasoning or deductive reasoning as a verbal reasoning factor. The inductive reasoning factor, on the other hand, could reflect figural-spatial reasoning. The quantitative reasoning factor apparently reflects numerical or quantitative reasoning. Compatible with this interpretation is that the deductive reasoning factor frequently cannot be distinguished from a verbal factor and tends to have high loadings on a higher-order crystallized factor. In accord with the interpretation of the inductive reasoning factor, the figural-spatial reasoning processes measured with the associated tasks tend to be highly associated with a higher-order fluid reasoning factor. In line with this theorizing, the induction factor has the highest loading on g of all Stratum 1 factors. The deductive reasoning factor ranks only 10th among these loadings. The mean loading of induction on g is .57, whereas the mean loading of deductive reasoning is only .41. Besides the difference in the average magnitude of loadings, there is a higher dispersion of g loadings among the deductive tasks. Similarly, the fluid intelligence factor, Gf, is best defined by induction in Carroll's reanalysis. Gf is defined by induction 19 times, with an average loading of .64. Deductive reasoning defined Gf only 6 times, with an average loading of .55. On the other hand, deductive reasoning appears among the variables defining crystallized intelligence. Deductive reasoning defined the Gc factor 7 times, with an average loading of .69. Induction does not appear on the list of Stratum 1 abilities defining crystallized




    intelligence. Finally, deductive reasoning appears 8 times, with an average loading of .70, on a factor labeled 2H, reflecting a mixture of fluid and crystallized intelligence. Induction, on the other hand, appeared only twice, with an average loading of .41.

    Given these considerations, the proposal of reasoning ability as composed of inductive, deductive, and quantitative reasoning competes with a proposal of verbal, figural-spatial, and quantitative reasoning. To investigate possible structures of reasoning ability, one should include tasks that allow for a comparison between several competing theories. There are basically five theories competing as explanations for the structure of reasoning ability:

    1. a general reasoning factor accounting for the communality of reasoning tasks varying with respect to content (verbal, quantitative, figural-spatial) and operation (inductive, deductive);

    2. two correlated factors for inductive and deductive reasoning, respectively, without the specification of any content factors;

    3. three correlated factors for verbal, quantitative, and figural-spatial reasoning, without distinguishing inductive and deductive reasoning processes;

    4. a general reasoning factor along with nested and completely orthogonal factors for verbal and quantitative reasoning but no figural-spatial factor; and

    5. two correlated factors for inductive and deductive reasoning along with completely orthogonal content factors for verbal and quantitative reasoning and again no figural-spatial factor.

    For the evaluation of these models, it is important to avoid a confound between content and process at the task level. A second crucial aspect for exploring the structure of reasoning ability is to select appropriate tasks to measure the intended constructs. This is particularly hard in the domain of deductive reasoning. Following the above-presented definition of inductive and deductive reasoning, it is very difficult to find adequate measures of figural-spatial deductive reasoning. In fact, only 7 of all the tasks described in Carroll (1993) can be classified as deductive figural-spatial tasks. Even these tasks frequently represent a mixture with other demands. For example, ship-destination tasks have quantitative demands; match problems, plotting, and route planning have visualization demands. In classifying 90 German intelligence tasks, Wilhelm (2000) could not find a single deductive figural-spatial measure.

    To test the structure of reasoning ability, Wilhelm (2000) selected reasoning measures based on their cognitive demands and the content involved. In addressing the above-mentioned criticisms of existing reasoning tasks, several reasoning tasks were newly constructed. The following 12 measures were included in the study (D and I denote deductive and inductive reasoning; F, N, and V stand for figural, numerical, and verbal content, respectively).

    DF1 (Electric Circuits): Positive and negative signals travel through various switches. The resulting signal has to be indicated. The number and kind of switches and the number of signals are varied (Gitomer, 1988; Kyllonen & Stephens, 1990).

    DF2 (Spatial Relations): The spatial orientation of symbols is presented pairwise. The spatial orientation of two symbols that were not presented together can be derived from the pairwise presentations (Byrne & Johnson-Laird, 1989).

    DN1 (Solving Equations): A series of equations is presented. Participants can derive the values of variables deductively. Items vary by the number of variables and the difficulty of the relations. A difficult sample item is "A plus B is C plus D. B plus C is 2*A. A plus D is 2*B. A + B is 11. A + C is 9."

    DN2 (Arithmetic Reasoning): Participants provide free responses to short verbally stated arithmetic problems from a real-life context.

    DV1 (Propositions): Acts of a hypothetical machine are described, and the correct conclusion has to be deduced. The number of mental models, the logical relation, and negation are varied in this multiple-choice test (Wilhelm & McKnight, 2002). A simple sample item is as follows: "If the lever moves and the valve closes, then the interrupter is switched. The lever moves. The valve closes."

    DV2 (Syllogisms): Verbally phrased quantitative premises are presented in which the number of mental models is varied by manipulating the figure and quantifier (Wilhelm & McKnight, 2002). A sample item is as follows: "No big shield is red. All round shields are big."

    IF1 (Figural Classifications): Participants are asked to find the one pictorial figure that does not belong with four other figures, based on various attributes of the figures.

    IF2 (Matrices): Based on trends in the rows and columns of 3*3 matrices, a figure that belongs in a specified cell has to be selected from several distractors.

    IN1 (Number Series): Rule-ordered series of numbers are to be continued by two elements. The difficulty of the rule that has to be detected is varied.

    IN2 (Unfitting Number): In a series of numbers, the one that does not fit has to be identified.

    IV1 (Verbal Analogies): Analogies as they are frequently used in intelligence research. The general form of the multiple-choice items is "? is to B as C is to ?". The vocabulary of these double analogies is simple (i.e., participants are familiar with all terms), and the difficulty of the relationship is varied.

    IV2 (Word Meanings): In this multiple-choice test, participants should identify the word that means approximately the same thing as a given word.
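    The difficult DN1 sample item can be solved mechanically. A brute-force sketch over small nonnegative integers (assuming, for illustration, that a small-integer solution is intended) recovers a unique assignment:

    ```python
    from itertools import product

    # Constraints of the DN1 sample item:
    # A + B = C + D, B + C = 2A, A + D = 2B, A + B = 11, A + C = 9
    solutions = [
        (a, b, c, d)
        for a, b, c, d in product(range(21), repeat=4)
        if a + b == c + d and b + c == 2 * a and a + d == 2 * b
        and a + b == 11 and a + c == 9
    ]
    print(solutions)  # [(5, 6, 4, 7)]
    ```

    Working through the constraints by hand shows why the solution is unique: A + B = 11 and A + C = 9 give B = 11 - A and C = 9 - A, so B + C = 2A forces 20 - 2A = 2A, hence A = 5, B = 6, C = 4, and D = 7.
    
    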

    A total of 279 high school students with a mean age of 17.7 years and a standard deviation of 1.2 years completed all tests and several criterion measures. All tests were analyzed separately with item response theory models. For every test, a two-parameter model allowing dispersion in item discriminations was superior to a Rasch model. The estimated person parameters from these two-parameter models were subsequently analyzed. For participants who got either all answers wrong or all answers right, person parameters were interpolated. Some of the reliabilities of the tasks are not satisfying: coefficient omega (McDonald, 1985) is only .50 for IF1 and .51 for IF2. The short overall test length of the individual measures might be responsible for these suboptimal results.

    The core research question in the present context is which of the above-specified models provides the best fit to the data. A one-factor model simply specifies one latent reasoning factor with loadings from all indicators. A two-factor model specifies two correlated latent factors: one factor with loadings on all the inductive tasks, the other with loadings on all the deductive tasks. The correlation between the two factors is estimated freely. The three-factor model specifies three correlated content factors: a verbal factor with loadings from all the verbal tasks, a quantitative factor with loadings from all the quantitative tasks, and a figural-spatial factor with loadings from all the figural-spatial tasks. The fourth model specifies a general reasoning factor and two orthogonal nested factors, one for the four verbal tasks and the other for the four quantitative tasks. The fifth model specifies an inductive reasoning factor with loadings from all inductive reasoning tasks and, likewise, a deductive reasoning factor with loadings from all deductive reasoning tasks. In addition, the two content factors from the fourth model are specified. The two reasoning factors are correlated; the content factors are orthogonal to each other and to the reasoning factors. Generally, there are, of course, other possible model architectures (see Chapter 14, this volume). However, the above-mentioned models provide a test of competing theories of the structure of reasoning ability. The last two models specify content factors for the verbal and quantitative tasks only. For the figural-spatial tasks, such a content factor might not be necessary because such tasks have been said to require decontextualized reasoning, and observed individual differences on them do not reflect specific prior knowledge (Ackerman, 1989, 1996; Undheim & Gustafsson, 1987). Models with and without a first-order factor of figural-spatial reasoning, as specified in the current context, are nested and can be compared inferentially (see Chapter 14, this volume).

    Table 21.1 summarizes the fit of the five

    confirmatory factor analyses. Comparing the general factor model with a model that specifies two correlated factors of inductive and deductive reasoning reveals that there is no advantage in estimating the correlation between inductive and deductive reasoning freely (as opposed to restricting this correlation to unity). Indeed, the correlation between the two factors in Model 2 is estimated to be exactly 1. Consequently, when comparing these two models, the




    general factor model is the better explanation of the data because it is more parsimonious than the two-factor model. However, neither model provides acceptable fit.
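    The comparison just described is a nested-model χ² difference test: the one-factor model equals the two-factor model with the factor correlation fixed to 1, so the fit decrement caused by that restriction can be evaluated against the χ² distribution. A sketch with the values reported in Table 21.1 (the critical value is the standard χ²(1) cutoff at α = .05):

    ```python
    # χ² and df for the general-factor model and the model with correlated
    # inductive and deductive factors, as reported in Table 21.1.
    chisq_g, df_g = 121.2, 54
    chisq_id, df_id = 121.2, 53

    delta_chisq = chisq_g - chisq_id  # the restriction costs no fit at all
    delta_df = df_g - df_id

    CHI2_CRIT_DF1_05 = 3.841  # critical value of χ²(1) at α = .05

    # Freeing the inductive-deductive correlation would be warranted only if
    # the χ² difference exceeded the critical value; here it clearly does not.
    freeing_helps = delta_chisq > CHI2_CRIT_DF1_05
    print(delta_chisq, delta_df, freeing_helps)  # 0.0 1 False
    ```

    A χ² difference of exactly zero for one degree of freedom is the formal expression of the perfect correlation between the inductive and deductive factors.
    
    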

    A model specifying three correlated group factors for content does substantially better in explaining the data. Although there is still room to improve fit, the model represents an acceptable explanation of the data. Given that the model is completely derived from theory, it can serve as a good starting point for future investigations. Comparing the two models with completely orthogonal content factors again demonstrates the superiority of the model that postulates the unity of inductive and deductive reasoning. In this data set, inductive and deductive reasoning are perfectly correlated. Introducing a distinction between the two factors is unnecessary and consequently does not improve model fit. Both models are substantially better than the initial one- and two-factor models. However, one of the loadings on the verbal factor is negative in sign and not significant. Given this departure from the theoretical expectation of positive and significant loadings, and keeping in mind interpretative issues with group factors in nested factor models (see Chapter 14, this volume), the best solution seems to be accepting the model based on the content factors. In this model, there are three content-related reasoning factors, each of them subsuming inductive and deductive reasoning tasks. In the current study, the model with correlated group factors is equivalent to a second-order factor model, in which the correlations between factors are captured by a higher-order factor. This model is presented in Figure 21.2. The two content factors, Verbal and Quantitative Reasoning, reflect deductive and inductive reasoning with verbal and quantitative material, respectively. Due to the relevance of task content, it can be expected that the Verbal and Quantitative Reasoning factors predict different aspects of criteria such as school grades, achievement, and the like. The loading of the Figural Reasoning factor on fluid intelligence is freely estimated to be 1. Not only are g and Gf very highly or perfectly correlated (Gustafsson, 1983), but the same holds for figural-spatial reasoning and fluid intelligence. Consequently, the current analysis extends Undheim and Gustafsson's (1987) work to a lower stratum. It is a replicated finding that Gf is the Stratum 2 factor with the highest loading on g (Carroll, 1993). It has also been argued that this relation might be perfect (Gustafsson, 1983; Undheim & Gustafsson, 1987; but see Chapter 18, this volume). Figural-spatial reasoning, in turn, has the highest loading on fluid intelligence, and in the data presented in this chapter, the relation between figural-spatial reasoning and the factor labeled fluid intelligence is perfect. Hence, if we do want to measure g with a single task, we should select a task of figural-spatial reasoning. Matrices tasks have been considered particularly good measures of Gf and g. Spearman (1938) suggested the Matrices test from Penrose and Raven (1936), as well as the inductive figural measure from Line (1931), as the single best indicators of g. The latter test is less prominent than the Matrices test, but variants of it can be found in various intelligence


    Table 21.1 Fit Statistics of Five Competing Structural Explanations of Reasoning Ability

        Model                    χ²      df
        g                       121.2    54
        Ind. & Ded.             121.2    53
        Content                  84.8    51
        g & Content              73.3    46
        Ind. & Ded. & Content    72.0    45


    tests. Although it is not good practice to measure rather general constructs with single tasks, there is certainly evidence suggesting that, if need be, this sole task should be a figural-spatial reasoning measure. Whether such a task is classified as inductive or deductive is not important for that purpose.

    Frequently, the composition of intelligence batteries is not well balanced, in the sense that there are many indicators for one intelligence construct but few or no tests for other intelligence constructs. In such cases (e.g., Roberts et al., 2000), the overall solution can be dominated by tasks other than fluid intelligence tasks. As a result, figural-spatial reasoning tasks might not be the best selection in such cases to reflect the g factor of the battery.

    When interpreting the results of this study, it is important to keep in mind that the differences between the various models were not that big. With different tasks and different participants, it is possible that different results would emerge. The present results are preliminary and in need of replication and extension. The most important result of the study reported above is that in a critical test aimed at assessing a distinction between inductive and deductive reasoning, no such distinction could be found. Latent factors of inductive and deductive reasoning are perfectly correlated in several models. The result of a unity of inductive and deductive reasoning was also obtained with multidimensional scaling, exploratory factor analysis, and tetrad analysis. It is important to note that this result emerged while honoring the desiderata for future research provided by Carroll (1993, p. 232). Specifically, the present tasks were selected or constructed based on a careful review of the individual-differences and cognitive literature on the topic, the items were analyzed with latent item response theory models, and the scales were analyzed with confirmatory factor analyses. The current tests include several new reasoning measures that are based on and informed by cognitive psychology.

    WORKING MEMORY AND REASONING

    There have been several attempts to explain reasoning ability in terms of other abilities that are considered more basic and tractable. Specifically, working memory has been proposed as the major limiting factor for human reasoning (Kyllonen & Christal, 1990; Süß, Oberauer, Wittmann, Wilhelm, & Schulze, 2002). The working definition of working memory has been that any task that requires individuals to simultaneously store and process information can be considered a working memory task (Kyllonen & Christal, 1990). This definition has been criticized because it seems to include all reasoning measures. The definition has also been criticized because its notions of storage and processing are imprecise and fuzzy (see Chapter 22, this volume). A critique of the working memory = reasoning hypothesis can also focus on the problem of the reduction of one construct


    [Figure 21.2 shows the second-order model: a fluid intelligence factor (gf) above the Verbal, Quantitative, and Figural Reasoning factors (factor loadings .83, .84, and 1.00, with the loading of 1.00 belonging to the Figural factor), and the 12 tasks IV2, DV1, DV2, IN1, IN2, IV1, IF2, DF1, DF2, DN1, DN2, and IF1 loading on their respective content factors with loadings between .33 and .73.]

    Figure 21.2 Higher-Order Model of Fluid Intelligence (Reasoning)



    in need of explanation through another one (Deary, 2001) that is not doing any better. However, this critique is unjustified for several reasons.

    1. It is easy to construct and create working memory tasks. Many tasks that satisfy the above definition work, in the sense that they correlate highly with other working memory measures, reasoning, Gf, and g. In addition, it is easy and straightforward to manipulate the difficulty of a working memory item by manipulating the storage demand, the processing demand, or the time available for storage, processing, or both. These manipulations account for a large amount of the variance in task difficulty in almost all cases.

    2. There is an enormous corpus of research on working memory and on processes in working memory in cognitive psychology (Conway, Jarrold, Kane, Miyake, & Towse, in press; Miyake & Shah, 1999). It is fruitful to derive knowledge and hypotheses about individual differences in cognition from this body of research.

    3. In the sense of a reduction of working memory to biological substrates, intensive and very productive research has linked working memory functioning to the frontal lobes and investigated the role of various physiological parameters in cognitive functioning (Kane & Engle, 2002; see Chapter 9, this volume, for a review of research linking reasoning to various neuropsychological parameters). Hence, the equation of working memory with reasoning is complemented by relating working memory to the frontal lobes and other characteristics and features of the brain.
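    The manipulability noted in the first point can be made concrete with a sketch of a generator for complex-span trials in which storage and processing demands are separate knobs. The task format and all parameter names here are hypothetical, loosely following the operation-span paradigm rather than any measure from the chapter:

    ```python
    import random

    def operation_span_trial(storage_load, operand_max, seed=0):
        """Generate one hypothetical complex-span trial: each to-be-stored
        letter is preceded by an arithmetic verification problem.

        storage_load -> number of letters to remember (storage demand)
        operand_max  -> size of the operands (processing demand)
        """
        rng = random.Random(seed)
        letters = rng.sample("BCDFGHJKLMNPQRSTVWXZ", storage_load)
        problems = []
        for _ in range(storage_load):
            a, b = rng.randint(1, operand_max), rng.randint(1, operand_max)
            claimed = a + b + rng.choice([-1, 0, 1])  # sometimes false
            problems.append((f"{a} + {b} = {claimed}?", claimed == a + b))
        return problems, letters

    problems, letters = operation_span_trial(storage_load=4, operand_max=9)
    print(len(problems), len(letters))  # 4 4
    ```

    Raising `storage_load` lengthens the memory list, raising `operand_max` makes each interleaved verification harder, and a presentation deadline per problem would add the time-pressure manipulation mentioned in the text.
    
    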

    The strength of the relation found between latent factors of working memory and reasoning varies substantially, fluctuating between a low of about .6 (Engle, 2002; Engle, Tuholski, Laughlin, & Conway, 1999; Kane et al., 2004) and a high of nearly 1 (Kyllonen, 1996). In discussing the strength of this relation, several sources that could cause an underestimation or an overestimation should be kept in mind.

    1. The relation should be assessed at the level of latent factors because this is the level of major interest when it comes to assessing psychological constructs. There should be more than three indicators of sufficient psychometric quality for each construct to allow an evaluation of the measurement models on both sides.

    2. Depending on the task selection and the breadth of the definition of both constructs, the specification of more than one factor on both sides might be necessary (Oberauer, Süß, Wilhelm, & Wittmann, 2003).

    3. The definition of constructs and task classes is a difficult issue. Classifying anything that requires simultaneous storage and processing as a working memory task could turn out to be overinclusive. Restricting fluid intelligence to figural-spatial reasoning measures is likely to be underinclusive. The comments on tasks of reasoning ability presented in this chapter, as well as similar comments on what constitutes a good working memory task (see Chapters 5 and 22, this volume), might be a good starting point for the definition of task classes.

    4. Content variation in the operationalization of both constructs can influence the magnitude of the relation. When assessing reasoning ability, one is well advised to use several tasks with verbal, figural, and quantitative content. The same is true for working memory. This chapter provided some evidence for the content distinction on the reasoning side. Similar evidence on the working memory side is apparent in structural models that posit content-specific factors of working memory (Kane et al., 2004; Kyllonen, 1996; Oberauer, Süß, Schulze, Wilhelm, & Wittmann, 2000). Relating working memory tasks of one content with reasoning tasks of another content causes one to underestimate the true relation.

    5. A mono-operation bias should be avoided in assessing both constructs. Using only complex span tasks or only dual tasks to assess working memory functioning does not do justice to the much more general nature of the construct (Oberauer et al., 2000). Task-class-specific factors or task-specific strategies might have an effect on the estimated relation.

    6. Reasoning measures, like other intelligence tasks, are frequently administered under time constraints. Timed and untimed reasoning ability are not perfectly correlated (Wilhelm & Schulze, 2002). Similarly, working memory tasks frequently have timed aspects (Ackerman, Beier, & Boyle, 2003). For example, there might be only a limited time to execute a process before the next stimulus appears, there might be a timed rate of stimulus presentation, and the like. Common speed variance could inflate the correlation between working memory and reasoning.
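    The attenuation concern raised above can be made concrete with a short simulation. The sketch below is illustrative and not from the chapter: the true latent correlation of .8 and the single-task reliabilities of .7 are made-up values chosen only to show how measurement error depresses an observed correlation and how Spearman's classical correction for attenuation recovers the latent value.

```python
# Illustrative simulation (hypothetical values): measurement error attenuates
# the observed working-memory/reasoning correlation; Spearman's correction
# for attenuation, r_obs / sqrt(rel_x * rel_y), recovers the latent value.
import math
import random

random.seed(42)
n = 200_000
r_true = 0.8  # assumed latent WM-reasoning correlation (made up for the demo)
rel = 0.7     # assumed reliability of each single observed task

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Latent factors with correlation r_true.
wm = [random.gauss(0, 1) for _ in range(n)]
reasoning = [r_true * w + math.sqrt(1 - r_true**2) * random.gauss(0, 1)
             for w in wm]

# Observed scores: latent signal diluted by measurement error.
obs_wm = [math.sqrt(rel) * w + math.sqrt(1 - rel) * random.gauss(0, 1)
          for w in wm]
obs_re = [math.sqrt(rel) * r + math.sqrt(1 - rel) * random.gauss(0, 1)
          for r in reasoning]

r_obs = pearson(obs_wm, obs_re)             # attenuated: about r_true * rel
r_corrected = r_obs / math.sqrt(rel * rel)  # correction: about r_true again
print(round(r_obs, 2), round(r_corrected, 2))
```

With these values the observed correlation lands near .56 even though the latent correlation is .8, which is why the text insists on latent-factor modeling with several reliable indicators per construct rather than single-task correlations.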

    The assumption that working memory is a critical ingredient of success on reasoning tasks is compatible with experimental evidence and theories from cognitive psychology. The ability to successfully create and manipulate mental representations was argued to be the critical ingredient in reasoning. Whether the necessary representations can be created and manipulated depends crucially on working memory. This prediction has gained strong support from the correlational studies relating working memory and reasoning. If the individual differences in reasoning ability and working memory turn out to be roughly the same, the evidence supporting the predictive validity of reasoning ability and fluid intelligence applies to working memory capacity, too. After careful consideration of costs and benefits, it might be sensible to use the more tractable working memory tasks for many practical purposes.

    SUMMARY AND CONCLUSIONS

    The fruitful avenue for future research on measuring and understanding reasoning ability is characterized by (a) more theoretically motivated work on the processes and resources involved in reasoning and (b) the use of confirmatory methods at the item and test level to investigate meaningful measurement and structural models. The major result of efforts directed this way would be a more profound understanding of important thought processes and an improved construction and design of measures of reasoning ability. A side product of such efforts will be generative item production and theoretically derived assumptions about the psychometric properties of items and tests. Another side product would be the option to develop more appropriate means of altering reasoning ability. There are several very interesting attempts to develop training methods for reasoning ability, and the initial results are encouraging in some cases (Klauer, 1990, 2001). Although it was not possible to discriminate between inductive and deductive reasoning psychometrically, it could be possible that appropriate training causes differential gains in both forms of reasoning. The cognitive processes in inductive and deductive reasoning tasks might be different, but the individual differences we can observe on adequate measures are not. This does not exclude the option that both thought processes might be affected by different interventions.

    REFERENCES

    Ackerman, P. L. (1989). Abilities, elementary information processes, and other sights to see at the zoo. In R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.), Abilities, motivation, and methodology: The Minnesota symposium on learning and individual differences (Vol. 10, pp. 280–293). Hillsdale, NJ: Lawrence Erlbaum.

    Ackerman, P. L. (1996). A theory of adult intellectual development: Process, personality, interests, and knowledge. Intelligence, 22, 229–259.

    Ackerman, P. L., Beier, M. E., & Boyle, M. D. (2003). Individual differences in working memory within a nomological network of cognitive and perceptual speed abilities. Journal of Experimental Psychology: General, 131, 567–589.

    Andrews, G., & Halford, G. S. (2002). A cognitive complexity metric applied to cognitive development. Cognitive Psychology, 45, 153–219.

    Beauducel, A., Brocke, B., & Liepmann, D. (2001). Perspectives on fluid and crystallized intelligence: Facets for verbal, numerical, and figural intelligence. Personality and Individual Differences, 30, 977–994.

    Binet, A. (1903). L'étude expérimentale de l'intelligence [Experimental studies of intelligence]. Paris: Schleicher Frères.

    Binet, A. (1905). À propos de la mesure de l'intelligence [On the subject of measuring intelligence]. Année Psychologique, 12, 69–82.

    Binet, A. (1907). La psychologie du raisonnement [The psychology of reasoning]. Paris: Alcan.


    Boole, G. (1847). The mathematical analysis of logic: Being an essay towards a calculus of deductive reasoning. Cambridge, UK: Macmillan, Barclay, and Macmillan.

    Byrne, R. M. J., & Johnson-Laird, P. N. (1989). Spatial reasoning. Journal of Memory and Language, 28, 564–575.

    Carnap, R. (1971). Logical foundations of probability. Chicago: University of Chicago Press.

    Carroll, J. B. (1989). Factor analysis since Spearman: Where do we stand? What do we know? In R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.), Abilities, motivation, and methodology: The Minnesota symposium on learning and individual differences (Vol. 10, pp. 43–70). Hillsdale, NJ: Lawrence Erlbaum.

    Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, UK: Cambridge University Press.

    Colberg, M., Nester, M. A., & Cormier, S. M. (1982). Inductive reasoning in psychometrics: A philosophical corrective. Intelligence, 6, 139–164.

    Colberg, M., Nester, M. A., & Trattner, M. H. (1985). Convergence of the inductive and deductive models in the measurement of reasoning abilities. Journal of Applied Psychology, 70, 681–694.

    Conway, A. R. A., Jarrold, C., Kane, M., Miyake, A., & Towse, J. (in press). Variation in working memory. Oxford, UK: Oxford University Press.

    Craik, K. (1943). The nature of explanation. Cambridge, UK: Cambridge University Press.

    Deary, I. J. (2001). Human intelligence differences: Towards a combined experimental-differential approach. Trends in Cognitive Sciences, 5, 164–170.

    Ebbinghaus, H. (1895). Über eine neue Methode zur Prüfung geistiger Fähigkeiten und ihre Anwendung bei Schulkindern [On a new method to test mental abilities and its application with schoolchildren]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 13, 401–459.

    Ekstrom, R. B., French, J. W., & Harman, H. H. (1976). Manual for kit of factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service.

    Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11, 19–23.

    Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory and general fluid intelligence: A latent variable approach. Journal of Experimental Psychology: General, 128, 309–331.

    Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49, 709–724.

    Evans, J. St. B. T. (1989). Bias in human reasoning: Causes and consequences. Hove, UK: Lawrence Erlbaum.

    Ford, M. (1995). Two modes of mental representation and problem solution in syllogistic reasoning. Cognition, 51, 1–71.

    Frege, G. (1879). Begriffsschrift: Eine der arithmetischen nachgebildete Formelsprache des reinen Denkens [Begriffsschrift: A formula language modeled upon that of arithmetic, for pure thought]. Halle a.S.: L. Nebert.

    Gilinsky, A. S., & Judd, B. B. (1993). Working memory and bias in reasoning across the life span. Psychology and Aging, 9, 356–371.

    Gitomer, D. H. (1988). Individual differences in technical troubleshooting. Human Performance, 1, 111–131.

    Guilford, J. P. (1956). The structure of intellect. Psychological Bulletin, 53, 267–293.

    Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.

    Guilford, J. P., Christensen, P. R., Kettner, N. W., Green, R. F., & Hertzka, A. F. (1954). A factor analytic study of Navy reasoning tests with the Air Force Aircrew Classification Battery. Educational and Psychological Measurement, 14, 301–325.

    Guilford, J. P., Comrey, A. L., Green, R. F., & Christensen, P. R. (1950). A factor-analytic study on reasoning abilities: I. Hypotheses and description of tests. Reports from the Psychological Laboratory, University of Southern California, Los Angeles.

    Guilford, J. P., Green, R. F., & Christensen, P. R. (1951). A factor-analytic study on reasoning abilities: II. Administration of tests and analysis of results. Reports from the Psychological Laboratory, University of Southern California, Los Angeles.

    Gustafsson, J.-E. (1983). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179–203.

    Hammond, K. R. (1996). Human judgment and social policy: Irreducible uncertainty, inevitable error, unavoidable injustice. Oxford, UK: Oxford University Press.


    Handley, S. J., Dennis, I., Evans, J. St. B. T., & Capon, A. (2000). Individual differences and the search for counter-examples in reasoning. In W. Schaeken, A. Vandierendonck, & G. de Vooght (Eds.), Deductive reasoning and strategies (pp. 241–266). Hillsdale, NJ: Lawrence Erlbaum.

    Hertzka, A. F., Guilford, J. P., Christensen, P. R., & Berger, R. M. (1954). A factor analytic study of evaluative abilities. Educational and Psychological Measurement, 14, 581–597.

    Holyoak, K. J., & Thagard, P. (1997). The analogical mind. American Psychologist, 52, 35–44.

    Horn, J. L., & Cattell, R. B. (1967). Age differences in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.

    Horn, J. L., & Noll, J. (1994). A system for understanding cognitive capabilities: A theory and the evidence on which it is based. In D. K. Detterman (Ed.), Current topics in human intelligence: Vol. 4. Theories of intelligence (pp. 151–203). Norwood, NJ: Ablex.

    Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 53–92). New York: Guilford.

    Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110, 220–264.

    Jäger, A. O., Süß, H.-M., & Beauducel, A. (1997). Berliner Intelligenzstruktur Test [Berlin Intelligence Structure test]. Göttingen: Hogrefe.

    Jensen, A. R. (1998). The g factor: The science of mental ability. London: Praeger.

    Johnson-Laird, P. N. (1985). Deductive reasoning ability. In R. J. Sternberg (Ed.), Human abilities: An information-processing approach (pp. 173–194). New York: Freeman.

    Johnson-Laird, P. N. (1994a). Mental models and probabilistic thinking. Cognition, 50, 189–209.

    Johnson-Laird, P. N. (1994b). A model theory of induction. International Studies in the Philosophy of Science, 8, 5–29.

    Johnson-Laird, P. N. (2001). Mental models and deduction. Trends in Cognitive Sciences, 5, 434–442.

    Johnson-Laird, P. N., & Byrne, R. M. J. (1991). Deduction. Hove, UK: Lawrence Erlbaum.

    Johnson-Laird, P. N., & Byrne, R. M. J. (1993). Models and deductive rationality. In K. Manktelow & D. Over (Eds.), Rationality: Psychological and philosophical perspectives (pp. 177–210). London: Routledge.

    Johnson-Laird, P. N., Legrenzi, P., Girotto, V., Legrenzi, M. S., & Caverni, J. P. (1999). Naïve probability: A mental model theory of extensional reasoning. Psychological Review, 106, 62–88.

    Kane, M. J., & Engle, R. W. (2002). The role of prefrontal cortex in working-memory capacity, executive attention, and general fluid intelligence: An individual-differences perspective. Psychonomic Bulletin & Review, 9, 637–671.

    Kane, M. J., Hambrick, D. Z., Tuholski, S. W., Wilhelm, O., Payne, T. W., & Engle, R. W. (2004). The generality of working-memory capacity: A latent-variable approach to verbal and visuo-spatial memory span and reasoning. Journal of Experimental Psychology: General, 133, 189–217.

    Klauer, K. C., Musch, J., & Naumer, B. (2000). On belief bias in syllogistic reasoning. Psychological Review, 107, 852–884.

    Klauer, K. J. (1990). A process theory of inductive reasoning tested by the teaching of domain-specific thinking strategies. European Journal of Psychology of Education, 5, 191–206.

    Klauer, K. J. (2001). Handbuch kognitives Training [Handbook of cognitive training]. Toronto: Hogrefe.

    Krueger, F., & Spearman, C. (1906). Die Korrelation zwischen verschiedenen geistigen Leistungsfähigkeiten [The correlation between different mental abilities]. Zeitschrift für Psychologie, 44, 50–114.

    Kyllonen, P. C. (1996). Is working memory capacity Spearman's g? In I. Dennis & P. Tapsfield (Eds.), Human abilities: Their nature and measurement (pp. 49–75). Mahwah, NJ: Lawrence Erlbaum.

    Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389–433.

    Kyllonen, P. C., & Stephens, D. L. (1990). Cognitive abilities as determinants of success in acquiring logic skill. Learning and Individual Differences, 2, 129–160.

    Line, W. (1931). The growth of visual perception in children. British Journal of Psychology, 15.

    Lohman, D. F. (1996). Spatial ability and g. In I. Dennis & P. Tapsfield (Eds.), Human abilities: Their nature and measurement (pp. 97–116). Mahwah, NJ: Lawrence Erlbaum.

    Magnani, L. (2001). Abduction, reason, and science: Processes of discovery and explanation. Dordrecht, the Netherlands: Kluwer Academic.

    McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum.

    Miyake, A., & Shah, P. (1999). Models of working memory: Mechanisms of active maintenance and executive control. New York: Cambridge University Press.

    Oberauer, K., Süß, H.-M., Schulze, R., Wilhelm, O., & Wittmann, W. W. (2000). Working memory capacity: Facets of a cognitive ability construct. Personality and Individual Differences, 29, 1017–1045.

    Oberauer, K., Süß, H.-M., Wilhelm, O., & Wittmann, W. W. (2003). The multiple faces of working memory: Storage, processing, supervision, and coordination. Intelligence, 31, 167–193.

    Penrose, L. S., & Raven, J. C. (1936). A new series of perceptual tests: Preliminary communication. British Journal of Medical Psychology, 16, 97–104.

    Rips, L. J. (1994). The psychology of proof: Deductive reasoning in human thinking. Cambridge, MA: MIT Press.

    Roberts, R. D., Goff, G. N., Anjoul, F., Kyllonen, P. C., Pallier, G., & Stankov, L. (2000). The Armed Services Vocational Aptitude Battery: Not much more than acculturated learning (Gc)? Learning and Individual Differences, 12, 81–103.

    Schaeken, W., de Vooght, G., Vandierendonck, A., & d'Ydewalle, G. (Eds.). (2000). Deductive reasoning and strategies. New York: Lawrence Erlbaum.

    Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.

    Shafir, E., & Le Boeuf, R. A. (2002). Rationality. Annual Review of Psychology, 53, 491–517.

    Shye, S. (1988). Inductive and deductive reasoning: A structural reanalysis of ability tests. Journal of Applied Psychology, 73, 308–311.

    Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3–22.

    Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201–293.

    Spearman, C. (1923). The nature of intelligence and the principles of cognition. London: Macmillan.

    Spearman, C. (1927). The abilities of man: Their nature and measurement. New York: AMS.

    Spearman, C. (1938). Measurement of intelligence. Scientia, 64, 75–82.

    Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning. Mahwah, NJ: Lawrence Erlbaum.

    Stegmüller, W. (1996). Das Problem der Induktion: Humes Herausforderung und moderne Antworten [The problem of induction: Hume's challenge and modern answers]. Darmstadt: Wissenschaftliche Buchgesellschaft.

    Stenning, K., & Oberlander, J. (1995). A cognitive theory of graphical and linguistic reasoning: Logic and implementation. Cognitive Science, 19, 97–140.

    Sternberg, R. J., & Turner, M. E. (1981). Components of syllogistic reasoning. Acta Psychologica, 47, 245–265.

    Störring, G. (1908). Experimentelle Untersuchungen über einfache Schlussprozesse [Experimental studies on simple inference processes]. Archiv für die gesamte Psychologie, 11, 1–27.

    Süß, H.-M., Oberauer, K., Wittmann, W. W., Wilhelm, O., & Schulze, R. (2002). Working memory capacity explains reasoning ability and a little bit more. Intelligence, 30, 261–288.

    Thurstone, L. L. (1938). Primary mental abilities. Chicago: University of Chicago Press.

    Thurstone, L. L., & Thurstone, T. G. (1941). Factorial studies of intelligence. Chicago: University of Chicago Press.

    Undheim, J. O., & Gustafsson, J.-E. (1987). The hierarchical organization of cognitive abilities: Restoring general intelligence through the use of linear structural relations. Multivariate Behavioral Research, 22, 149–171.

    Wilhelm, O. (2000). Psychologie des schlussfolgernden Denkens: Differentialpsychologische Prüfung von Strukturüberlegungen [Psychology of reasoning: Testing structural theories]. Hamburg: Dr. Kovač.

    Wilhelm, O., & Conrad, W. (1998). Entwicklung und Erprobung von Tests zur Erfassung des logischen Denkens [Development and evaluation of deductive reasoning tests]. Diagnostica, 44, 71–83.

    Wilhelm, O., & McKnight, P. E. (2002). Ability and achievement testing on the World Wide Web. In B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.), Online social sciences (pp. 151–181). Toronto: Hogrefe.

    Wilhelm, O., & Schulze, R. (2002). The relation of speeded and unspeeded reasoning with mental speed. Intelligence, 30, 537–554.

    Wilkins, M. C. (1929). The effect of changed material on ability to do formal syllogistic reasoning. Archives of Psychology, 16(102).

    Woodworth, R. S., & Sells, S. B. (1935). An atmosphere effect in formal syllogistic reasoning. Journal of Experimental Psychology, 18, 451–460.

    Yang, Y., & Johnson-Laird, P. N. (2001). Mental models and logical reasoning problems in the GRE. Journal of Experimental Psychology: Applied, 7, 308–316.
