Human causal induction: A glimpse at the whole picture
European Journal of Cognitive Psychology
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/pecp20

Human causal induction: A glimpse at the whole picture
José Perales & Andrés Catena, Universidad de Granada, Spain
Published online: 10 Sep 2010.

To cite this article: José Perales & Andrés Catena (2006). Human causal induction: A glimpse at the whole picture. European Journal of Cognitive Psychology, 18(2), 277-320. DOI: 10.1080/09541440540000167
To link to this article: http://dx.doi.org/10.1080/09541440540000167
Human causal induction:
A glimpse at the whole picture
José C. Perales and Andrés Catena
Universidad de Granada, Spain
In the present work, the most relevant evidence in the causal learning literature is reviewed and a general cognitive architecture based on the available corpus of experimental data is proposed. However, contrary to algorithms formulated in the Bayesian nets framework, this architecture is not assumed to optimise the usefulness of the available information in order to induce the underlying causal structure as a whole. Instead, human reasoners seem to rely heavily on local clues and previous knowledge to discriminate between spurious and truly causal covariations, and piece those relations together only when they are demanded to do so. Bayesian networks and AI algorithms for causal inference are nonetheless considered valuable tools to identify the main computational goals of causal induction processes and to define the problems any intelligent causal inference system must solve.
Causal induction is one of the pillars of intelligent behaviour. As pointed out by
Newsome (2003), except for domains such as logic and mathematics, causal
knowledge is a prerequisite for effective reasoning and problem solving, and
allows humans to manipulate, predict, and understand the world in an adaptive
way. However, the field still lacks a unifying framework to integrate the
available experimental evidence. This work is intended to contribute to clarify
the goals of research on causal induction, the extent to which the data available
to date have contributed to achieve those goals, and how new lines of research
could help to fill the gaps in our corpus of knowledge.
The plan of the paper is as follows. First, we define the computational goals
of causal induction. In order to do so, we will present an introduction to causal
Bayes' nets (for detailed reviews, see Pearl, 2000; Spirtes, Glymour, &
Scheines, 2000), a normative framework born in the field of Artificial
Intelligence (AI) that has greatly contributed to unifying views on the
relationship between empirical, verifiable evidence and causal constructs. In
relation to psychological research, this approach can be useful for establishing
a computational reference against which to interpret human behaviour. In
addition, we will briefly discuss some questions relative to the definition of
causes that can be helpful to circumscribe the scope of this work.

Correspondence should be addressed to José C. Perales-López, Department of Psychology,
University of Granada, Campus Cartuja s/n, 18071, Granada, Spain. Email: [email protected]

We would like to thank Dave Lagnado and David Shanks for their many helpful comments. This
research was supported by the Spanish Ministry of Education's programme Becas Postdoctorales en
España y el Extranjero: Convocatoria 2003, for the first author, and by a MCyT (Ministerio de
Ciencia y Tecnología) grant (BSO2003-03723) to the first and the second authors.

© 2006 Psychology Press Ltd
http://www.psypress.com/ecp DOI: 10.1080/09541440540000167

EUROPEAN JOURNAL OF COGNITIVE PSYCHOLOGY
2006, 18 (2), 277-320
The second part of the paper provides a general view of how humans achieve
the computational goals described in the preceding section. First, we list the
different sources of information that reasoners use to make a decision on the
existence or nonexistence of a causal link (clues to causal structure). And,
second, we briefly review the evidence about how people integrate covariation
information when estimating the degree to which a putative cause and an effect
are related (estimating causal strength).
The third section focuses on the main questions related to the ontogeny of
causal reasoning. Although developmental issues are not the core of the present
work, we will try at least to identify those developmental questions relevant to the
main arguments presented in the preceding sections. Our strongest claims are the
existence of some developmental dependency between basic learning mechan-
isms and abstract causal inference strategies, and the importance of intervention
in the emergence of abstract principles of causal induction. The existence of
biological preparedness to learn in some areas and some innate content-specific
information are not denied, but we remain unconvinced by those positions that
postulate the existence of a unitary and innate "causal inference" module.
Finally, the fourth section puts forward a general cognitive architecture of
adult causal induction, based on the experimental evidence available to date.
That architecture is based on the idea that (1) associative and rule-based
mechanisms are just descriptions of different hierarchical levels in the system
responsible for covariation computation, and (2) covariation estimates are
integrated with other clues present in the environment and with pre-stored
general-purpose and domain-specific knowledge in order to decide whether or
not the observed covariations signal the presence of hidden causal links. This
architecture does not strictly respond to rational normativity criteria, but is
highly adaptive at adjusting the reasoner's knowledge and behaviour to the
actual causal texture of the world.
COMPUTATIONAL GOALS OF CAUSAL INDUCTION:
THE CAUSAL BAYES' NETS APPROACH
Causal reasoning is the broad term that refers both to the accumulation of new
evidence from which to infer the existence of a causal link (Cheng, 1993) and to
the application of previously acquired causal knowledge when thinking, attri-
buting, or making decisions. Let us imagine a student interested in knowing
whether drinking coffee improves his or her concentration when studying. With
that aim in mind, he will probably drink coffee for some days, until he has a
clear idea of whether or not there is any consistent improvement in concen-
tration. On the other hand, that same student could be interested in finding an
explanation for the fact that one particular night he felt unusually concentrated.
If our student had been drinking coffee, he could attribute his good state of
concentration to the coffee without collecting any more information. Finally, the
same student could try to imagine what his state of concentration would be if he
had not drunk coffee.
Our first example points to the importance of covariation in generating,
confirming or disconfirming causal beliefs (a psychological function that is
usually called causal induction, or, in a more restricted way, causal learning)
from the observation of a number of individual instances. The second example
stresses the relevance of previous knowledge about specific causal mechanisms
in causal attribution. And finally, in the third example our student engages in
counterfactual thinking (thinking about what would have happened if a given
precedent circumstance had been different). Some authors (see, for example,
Mackie, 1974) have even claimed that counterfactuals form the definitional
basis of causation and, in fact, that is the case in some legal definitions of
causation and responsibility (Hart & Honoré, 1959/1985). The most general
view in psychology, however, is that causal knowledge drives counterfactual
thinking rather than the other way round (Pearl, 2000; Sloman & Lagnado,
2005). In accordance with this view, there have been recent proposals to use
counterfactual statements (instead of causal strength judgements) to assess
causal knowledge in naïve reasoners (Buehner & Cheng, 1997; Buehner, Cheng,
& Clifford, 2003).
In this work, we will focus on causal induction, under the assumption that
both causal attribution and counterfactual reasoning require applying the
knowledge previously acquired through causal induction processes. Studies on
causal attribution (see Ahn & Kalish, 2000; Kelley, 1967; Schultz, 1982; White,
1995, 2000; Wimer & Kelley, 1982, for milestone works on the issue), and
counterfactual reasoning (Byrne, 1997; German & Nichols, 2003; Roese, 1994;
Walsh & Byrne, 2004) are definitely necessary for understanding how people
think about causes and effects. However, our main interest is the psychological
and epistemological problem of how causal knowledge is acquired from non-
causal previous experience. The aim of the present work is to formulate a
general architecture of causal induction to translate noncausal empirical input
(mainly correlational information, combined with other sources of information,
as will be discussed later) into causal knowledge. A related yet different issue is
how and for what purpose that knowledge is subsequently used.
As illustrated by the previous example, causal induction requires in most
cases some understanding of causal mechanisms to have a causal hypothesis to
begin with, and, at the same time, covariation between the cause and the effect is
important to learn about new causal mechanisms. However, despite the inter-
dependence between the notions of covariation and mechanism, one of the two
types of knowledge has always been given epistemological priority in causal
reasoning theories (see Ahn & Bailenson, 1996; Cheng, 1993; Fugelsang &
Thompson, 2003). On the one hand, covariation-based theories generally neglect
the distinction between covariation and causation; on the other hand,
mechanism-based theories have difficulty defining exactly what a
mechanism is and, most importantly, they do not specify how causal mechan-
isms are learnt in the first place. Some authors have claimed that causal
knowledge in some domains is innate (Keil, 1995; Leslie & Keeble, 1987;
Spelke, Breinlinger, Macomber, & Jacobson, 1992). However, even advocates
of this approach acknowledge that not all knowledge of causal mechanisms can
be derived from that innate knowledge.
In recent years a number of theorists (Glymour, 1998; Glymour, Scheines,
Spirtes, & Kelly, 1987; Pearl, 1988, 2000) have abandoned this fruitless
dichotomy and have tried to develop algorithmic tools to infer causal structures
in the absence of previous knowledge from covariational evidence with a
minimum set of assumptions. The assumption that differentiates this approach
from covariational theories is rather simple: There are things in the world that
act as causes and effects, linked by causal powers, although those powers are not
directly accessible to our senses (Cheng, 1997). This assumption neither
presupposes nor requires any specific knowledge about any particular causal
mechanism, as mechanism-based theories propose; it just specifies that causal
powers exist and need to be unveiled.
Dependencies among variables are represented in this framework by means
of graphs containing two types of elements: (1) a finite set of nodes standing for
a set of variables whose possible values correspond to particular states of those
variables in the world, and (2) a number of directional links (edges) representing
dependency relationships among those variables. If every edge between each
two nodes in a graph is directional (that is, if it is directed from one node to
another), and there are no closed loops in the graph (that is, there are no chains
of edges starting and finishing at the same node), the graph is called a directed
acyclic graph (DAG). The notation used to refer to the nodes in a DAG is that of
kinship relations. All the nodes in any chain finishing at a generic node A are the
ancestors of A, and all the nodes in any chain departing from A are descendants
of A. In addition, the nodes linked to A by means of a single edge pointing to A are the
parents of A, and the nodes A points to by means of a single edge are children of
A.
The intuitive power of these graphs relies on their capacity to represent causal
systems. In a causal DAG, edges represent direct causal links. Additionally, the
so-called Markov condition specifies that the joint probability distribution
describing the statistical relationships among the variables in the structure can
be factorised in such a way that the value of each variable is independent of all
of its nondescendants, conditional on its parents. Therefore, given an exhaustive
set of n variables (X1, X2, X3, . . ., Xn) in a DAG, the Markov condition can be
expressed as follows:

P(X1, X2, X3, . . ., Xn) = Π P[Xi | parents(Xi)]     (1)

where the product Π runs over i = 1, 2, 3, . . ., n.
P(X1, X2, X3 . . ., Xn) represents the joint probability distribution of the
variables in the structure, and P[Xi|parents(Xi)] represents the probability dis-
tribution of each variable Xi conditional on its parents. If the DAG stands for a
causal structure, the Markov condition follows from assuming that the value of a
variable is exclusively determined by the value of its direct causes (parents).
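The factorisation in Equation 1 can be made concrete with a minimal Python sketch. The three-variable network, its conditional probability tables, and every numerical value below are hypothetical illustrations of our own choosing, not values taken from the article:

```python
import itertools

# Hypothetical common-cause DAG: A -> B and A -> C. Names and
# probabilities are illustrative only.
parents = {"A": [], "B": ["A"], "C": ["A"]}

# P(variable = True | values of its parents)
cpt = {
    "A": {(): 0.3},                      # P(A)
    "B": {(True,): 0.8, (False,): 0.1},  # P(B | A)
    "C": {(True,): 0.7, (False,): 0.2},  # P(C | A)
}

def joint(assignment):
    """Joint probability as the product of P(Xi | parents(Xi)) (Equation 1)."""
    p = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

# A valid factorisation must sum to 1 over all eight assignments.
total = sum(
    joint(dict(zip("ABC", values)))
    for values in itertools.product([True, False], repeat=3)
)
```

Here `joint({"A": True, "B": True, "C": True})` evaluates to 0.3 × 0.8 × 0.7 = 0.168, and `total` comes out as 1, as the Markov factorisation requires.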
Each causal DAG or causal Bayes' net generates a pattern of statistical
dependencies between the variables in the structure. Let us imagine a causal
structure as the one depicted in the left panel of Figure 1 (common cause model).
With a certain parameterisation, and assuming that A, B, and C are discrete
events, that model can indicate, for instance, that A increases the probability
with which B and C occur (for example, smoking increases the probability of
heart disease and lung cancer). In that case A, B, and C are marginally
dependent on each other, but B and C become independent once conditioned on A.
In the causal chain model (right panel), A, B, and C are marginally dependent on
each other, and A and C become independent if conditioned on B. In more
technical terms, A screens off the relationship between B and C in the common
cause model, and B screens off the relationship between A and C in the causal
chain model (Reichenbach, 1956).
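The screening-off pattern just described can be checked arithmetically for the common cause model. The sketch below uses hypothetical parameter values of our own choosing (e.g., P(A) = 0.3); only the qualitative pattern, marginal dependence between B and C that disappears once A is held fixed, follows from the structure itself:

```python
# Common-cause model A -> B, A -> C with illustrative parameters.
pA, pB_A, pB_nA, pC_A, pC_nA = 0.3, 0.8, 0.1, 0.7, 0.2

# Marginals of B and C, summing A out.
pB = pA * pB_A + (1 - pA) * pB_nA          # 0.31
pC = pA * pC_A + (1 - pA) * pC_nA          # 0.35

# Marginal joint P(B, C): B and C covary through their common cause A...
pBC = pA * pB_A * pC_A + (1 - pA) * pB_nA * pC_nA   # 0.182, not pB * pC

# ...but conditional on A the joint factorises exactly: A screens off B from C.
pBC_given_A = pB_A * pC_A                  # equals P(B|A) * P(C|A)
marginally_dependent = abs(pBC - pB * pC) > 1e-9
```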
Therefore, if a causal structure is known, it is possible to predict the
dependency-independency pattern shown by the variables in the structure. A
more interesting question is whether the opposite operation is also viable.
The answer is to some degree positive. A causal Bayes' net produces a
pattern of dependencies-independencies, but, at the same time, that pattern can
arise from more than one causal Bayes' net. Bayes' nets that produce the same
pattern of probabilistic dependencies are called Markov equivalent. It can be
demonstrated that, if an exhaustive set of Markov equivalent nets have common
features, those common features (but not the differential features) can be
recovered from the observable pattern (Pearl, 2000).

Figure 1. Models representing a causal fork (left panel) and a causal chain (right panel). A, B, and
C stand for observable variables, and arrows represent causal influence links among them.
The causal Bayes' nets framework has inspired recent psychological theories.
According to Gopnik et al. (2004), recovering a pattern of causal relations from
a pattern of statistical regularities is a problem (epistemologically) analogous to
the one the visual system needs to solve to infer the 3-D structure of the world
from the limited 2-D information from the retina. These authors maintain that
even very young children "have a causal learning system . . . that recovers
causal facts by making implicit assumptions about the causal structure of the
environment and the relation between the environment and evidence" (p. 4).
Causal reasoning theories inspired by Bayes' nets seem to make the strong
claim that humans have a unitary and innate system to recover causal structure
from patterns of statistical regularity. Moreover, in some cases it has also been
suggested that humans are able to apply holistic induction methods (e.g.,
Bayesian methods; see Tenenbaum & Griffiths, 2003). However, as we will try
to show later, both strong claims are probably unrealistic. First, there are no
definitive proofs of the innateness and unity of the causal learning system. And
second, the approach of adults to causal reasoning tasks seems to be heuristic
rather than holistic (Gigerenzer & Todd, 1999).
Given the computational complexity of the operations they carry out, it is
extremely unlikely that humans use algorithms such as the ones proposed in AI to
recover causal structure, but it can be sensibly assumed that humans apply
simple strategies over local clues to accomplish the same goals. In this context,
the causal Bayes' nets framework has been helpful to identify what those goals
are: first, to determine where causal links are, and second, to determine the exact
features of those links. Borrowing Danks' (2002) terminology (see also
Tenenbaum & Griffiths, 2001), we will refer to these two different aspects of
causal reasoning as causal structure and causal strength estimation,
respectively.
A BRIEF DIGRESSION ON THE NATURE OF CAUSES
Before analysing how humans achieve the two main computational goals
identified in the previous section, we will briefly introduce some considerations
on the nature of causes. This digression is relevant to understanding our
probabilistic approach to causal induction.
In physical views of the world, causes are deterministic (except at the level of
subatomic particles). Let us imagine a circuit in which two switches are con-
nected serially to a bulb. In that simple scenario, if the two switches (S1 and S2)
are on, the light will be on; and if any of the switches is off, the light will be off.
The two causes are individually necessary and jointly sufficient to produce the
effect, and there is no room for indetermination. Imagine now that the same bulb
is also connected to a second circuit, invisible to an external observer, that
randomly switches the light on and off, independently of the state of the first
circuit. In this second case, neither S1 nor S2 are necessary for the effect to
occur, but they are still joint causes of that effect. The same case occurs in other
scenarios in which unknown variables feed into the variables included in the
causal scenario under scrutiny. For example, scientists have concluded that
smoking causes cancer in spite of the fact that many smokers never develop
cancer, and many cancer sufferers have never smoked. Thus, even if we accept
that causes are deterministic and necessity and sufficiency are necessary com-
ponents of the definition of a cause, causality can manifest itself under the form
of probabilistic regularity; what is more important, humans can infer causality
from that probabilistic evidence.
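The two-circuit bulb scenario can be simulated directly. In this sketch the hidden circuit fires on 20% of trials, an arbitrary rate chosen for illustration; the point is that a deterministic mechanism combined with an unobserved cause yields merely probabilistic observable regularities:

```python
import random

def light_on(s1, s2, rng):
    """Bulb state: deterministic serial circuit OR an unseen random circuit."""
    visible = s1 and s2            # S1 and S2 jointly sufficient
    hidden = rng.random() < 0.2    # hidden circuit, fires at random
    return visible or hidden

rng = random.Random(0)
trials = 10_000
# With only S1 closed the light still comes on sometimes, so S2 is no
# longer observably necessary for the effect.
p_on_s1_only = sum(light_on(True, False, rng) for _ in range(trials)) / trials
# p_on_s1_only falls near 0.2, the hidden circuit's firing rate.
```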
A different, yet related, issue is whether humans understand causality in
probabilistic terms. The meaning of causal statements has been scrutinised by
philosophers (see Harré & Madden, 1975; Hart & Honoré, 1959/1985; Suppes,
1970), but has only very recently entered psychological research. Psychological
theories on the meaning and representation of causal knowledge are especially
important to ascertain how evidence from different sources (for example verbal
learning and direct experience) is integrated, and how humans use causal
information to solve problems and make predictions and deductive inferences.
Preliminary evidence (Goldvarg & Johnson-Laird, 2001; see also Cummins,
1995, 1998) seems to show that humans interpret and represent causal clauses in
a nonprobabilistic manner, or, in other words, that the everyday meaning of
causality is not probabilistic. As anticipated by Hume (1739/1978), people seem
to treat chance as a case of a hidden cause.
Causal Bayes' nets are, in principle (in the absence of a specific para-
meterisation), blind to the deterministic or probabilistic nature of the regularities
on which they are based. The proposed algorithms use patterns of dependency/
independency among the variables in the scenario to specify, when possible,
what dependency relations are manifestations of an underlying causal link, given
the assumptions commented earlier. Similarly, when generating a causal
hypothesis from the available evidence in the absence of previous knowledge,
humans do not know in advance whether the relationship to assess is determi-
nistic or probabilistic, and must then be prepared to derive conclusions from
probabilistic evidence (Cheng, 1993).
The fact that causality manifests itself in a probabilistic manner implies that
accumulation of evidence is crucial for deriving strong conclusions from it,
which means that confidence in our own causal statements will increase as the
evidence confirming those statements grows. In some cases, however, reasoners
have previous knowledge of the nature of the relationships present in a certain
scenario, although they do not know the exact pattern of those relations. For
instance, if I try to set the alarm of my new watch, I will be confident that I have
reached a solution as soon as I corroborate that pressing one of the buttons given
a certain configuration of the other three produces a change in the alarm time
display. In principle, a single "trial" or a small number of them (showing a
certain pattern of dependencies/independencies) is sufficient to induce part of a
causal structure. Such fast learning is possible by virtue of my previous
knowledge on the functioning of electronic devices, in which, in normal con-
ditions, all relationships are deterministic. Interestingly, even young infants
seem capable of this sort of fast induction (Gopnik et al., 2004), and it has even
been proposed that in certain domains it is supported by innate domain-specific
knowledge (for example, in the field of mechanics).
However, inducing the existence of a certain rule from a single instance or a
small number of them has to do with the detection of regularities rather than
with causal induction itself. For the sake of parsimony, we will assume that
causal induction mechanisms process all regularities (either deterministic or
probabilistic) in the same way. We are aware that the bulk of research in causal
induction has focused on probabilistic learning preparations and evidence
accumulation, and evidence accrual and causal induction have remained
somewhat confounded (see Buehner et al., 2003, for a similar argument). Evi-
dence emerging from other paradigms is still sparse (see, for example, Cohen,
Rundell, Spellman, & Cashon, 1999), and this must be kept in mind when
considering the evidence relative to contingency (statistical regularity) as a clue
to causality in a later section of this work.
ACHIEVING THE COMPUTATIONAL GOALS OF CAUSAL REASONING:
A REVIEW OF EXPERIMENTAL DATA
The world is full of statistical regularities: Birds sing every day before the sun
rises, clouds appear in the sky before it starts raining, children are punished
when they misbehave, and so on; but only a subset of those statistical asso-
ciations are due to the existence of an underlying causal link. In addition, we can
bind several causal links involving common variables together, and generate a
causal structure. These two aspects of causal cognition (differentiating causal
relations from spurious covariations, and generating complex causal structures)
are related to the first computational goal identified in the Bayes' nets frame-
work: causal structure induction. This goal can be defined as building the
skeleton formed by the causal forces that determine the appearance of covari-
ations in a given scenario.
On the other hand, in order to fully understand what data can be expected from
a causal structure, it is necessary to establish the values of the parameters that
determine the relationship between each event and its causes. In general terms, we
can refer to this second goal as causal function induction, or, in the simplest
case, when the two variables are binary and the probability of the effect is directly
or inversely related to the probability of the cause, as causal strength estimation.
Causal structure inference and causal strength estimation are thus the two
main computational aspects of causal reasoning, and AI algorithms are very
effective tools to accomplish these goals. In constraint-based methods, such as
TETRAD (Scheines, Spirtes, Glymour, Meek, & Richardson, 1998), statistical
dependencies are used to construct, step-by-step, the several Markov-equivalent
graphs compatible with the data. In Bayesian methods (Steyvers, Tenenbaum,
Wagenmakers, & Blum, 2003; Tenenbaum & Griffiths, 2001), on the other
hand, all possible graphs comprising the observed set of variables (G1, G2, G3,
. . ., Gn; i = 1, 2, 3, . . ., n) are considered simultaneously, and are assigned prior
probabilities, P(Gi), according to previous beliefs. The likelihood of the
observed data pattern given each graph, P(D|Gi), is then computed, and, finally,
the posterior probability of each graph given the observed data, P(Gi|D), which
is initially unknown, is obtained according to Bayes' rule. Given a sufficient
number of trials, constraint-based methods and Bayesian methods yield very
similar results, as they both tend to select the set of Markov equivalent graphs
compatible with the observed data.
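Bayes' rule over candidate graphs can be sketched in a few lines. The three candidate structures and all probabilities below are hypothetical placeholders; in practice the likelihoods P(D|Gi) would be computed from the observed data under each graph's parameterisation:

```python
# Hypothetical candidate graphs with prior probabilities P(Gi).
priors = {"A->B": 0.5, "B->A": 0.3, "no link": 0.2}
# Placeholder likelihoods of the observed data under each graph, P(D|Gi).
likelihoods = {"A->B": 0.04, "B->A": 0.01, "no link": 0.005}

# Bayes' rule: P(Gi|D) = P(Gi) * P(D|Gi) / sum over j of P(Gj) * P(D|Gj)
evidence = sum(priors[g] * likelihoods[g] for g in priors)
posterior = {g: priors[g] * likelihoods[g] / evidence for g in priors}

best = max(posterior, key=posterior.get)  # graph favoured by the data
```

With these placeholder numbers the posterior concentrates on the A->B graph, illustrating how priors and data-driven likelihoods jointly select a structure.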
These methods have three properties in common: They (1) are information
optimisers, in the sense that they extract all the possible implications from the
available (statistical) information, (2) are intended to recover the causal structure
as a whole, and (3) estimate the precise parameters determining the shape of
causal functions a posteriori, when at least part of the causal structure is already
known. In the following section we will try to show that human reasoning
strategies differ from those algorithms precisely in these three aspects. Instead of
optimising the usefulness of available information, humans probably make use
of local clues to make individual decisions on whether certain observed corre-
lations are due to a causal link or not, and then test their hypotheses under
conditions they interpret as adequate to do so. SubsequentlyÐif necessaryÐ
they piece several links together to construct a complex causal model of the
situation. Finally, causal strength estimation is temporally intertwined with and
tightly linked to causal structure inference. The final result, however, need not
be very different from the one produced by artificial algorithms: A causal orga-
nisation of events that mostly coincides with their actual causal structure.
CLUES TO CAUSAL STRUCTURE1
Strictly speaking, causal structure inference per se has been the focus of very
limited attention in psychological research. Only a few empirical works have
tried to identify the conditions under which causal links are singled out from the
broader set of verifiable covariations (see Gopnik et al., 2004; Lagnado &
1The organisation of information in this section has been borrowed from Sloman and Lagnado
(2004). Its contents, however, have been comprehensively expanded in order to include new
empirical evidence.
Sloman, 2004; Schulz & Gopnik, 2004; Sloman & Lagnado, 2004; Steyvers et
al., 2003). In most other cases only a small set of candidate causes in a pre-
established structure are provided for evaluation. Next, we will present a brief
review of the different sources of information that have been shown to be used
by naïve reasoners to make decisions on causal structure.
Covariation and control
As noted above, discovering the existence of a previously unknown causal link
between two events implies realising that those events covary. The most usual
statistic used to measure the degree of covariation between two discrete events is
contingency (ΔP), which is defined as the difference between the probability of
the effect given the cause and the probability of the effect in the absence of the
cause:

ΔP = P(e|c) − P(e|¬c)

If the task is divided into discrete trials, P(e|c) and P(e|¬c) can be computed
from the frequencies of the four trial types resulting from combining the
presence/absence of the two events:

ΔP = a/(a + b) − c/(c + d)     (2)
where a stands for the frequency of the trials in which both the cause and the
effect are present, b for that of the trials in which the cause is present and the
effect is absent, c for that of the trials in which the cause is absent and the effect
is present, and d for that of the trials in which both the cause and the effect are
absent.
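Equation 2 is straightforward to compute from the four trial-type frequencies. A minimal sketch (the trial counts are hypothetical):

```python
def delta_p(a, b, c, d):
    """Contingency: Delta-P = P(e|c) - P(e|~c) = a/(a+b) - c/(c+d).

    a: cause present, effect present    b: cause present, effect absent
    c: cause absent, effect present     d: cause absent, effect absent
    """
    return a / (a + b) - c / (c + d)

# Effect on 15 of 20 cause-present trials and 5 of 20 cause-absent trials:
dp = delta_p(a=15, b=5, c=5, d=15)   # 0.75 - 0.25 = 0.5
```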
As we will discuss later, when people are asked to judge the degree to which
a cause and an effect are causally related in this type of task, their estimates are
highly correlated with ΔP (see Allan, 1993; De Houwer & Beckers, 2002; Lober
& Shanks, 2000; Wasserman, Kao, Van Hamme, Katagiri, & Young, 1996).
Once a reasoner has observed that two events covary, and in order to decide
whether or not that covariation is indicative of a causal link, it is crucial to assess
the circumstances under which that covariation has been computed, and consider
the presence of alternative potential causes.
Covariation can be explained by the existence of a common ancestor of the
two events. However, such spurious covariations vanish when covariation is
computed in a context in which the common cause is held constant. For
example, if smoking is the only common cause of lung cancer and high blood
pressure, the marginal covariation between lung cancer and high blood pressure
will disappear if it is computed exclusively over a sample of smokers, or over a
sample of nonsmokers. Consequently, covariation between a putative cause and
286 PERALES AND CATENA
Downloaded by [York University Libraries] at 23:31 18 November 2014
an effect can be indicative of a causal link only if all other potential causes of the
effect are controlled.
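The smoking example can be simulated to show why conditionalisation matters. The sketch below assumes an invented generative model (all probabilities are illustrative) in which smoking is the only common cause of lung cancer and high blood pressure; the marginal contingency between the two is clearly positive, but it vanishes within the focal sets of smokers and of nonsmokers:

```python
import random

random.seed(0)

# Invented generative model: smoking is the only common cause of lung
# cancer and high blood pressure; there is no direct link between them.
population = []
for _ in range(100_000):
    smoker = random.random() < 0.3
    cancer = random.random() < (0.40 if smoker else 0.05)
    pressure = random.random() < (0.60 if smoker else 0.20)
    population.append((smoker, cancer, pressure))

def dp_cancer_given_pressure(sample):
    """ΔP of lung cancer given high blood pressure within a sample."""
    with_p = [cancer for _, cancer, pressure in sample if pressure]
    without_p = [cancer for _, cancer, pressure in sample if not pressure]
    return sum(with_p) / len(with_p) - sum(without_p) / len(without_p)

print(dp_cancer_given_pressure(population))              # clearly positive: spurious
print(dp_cancer_given_pressure(
    [x for x in population if x[0]]))                    # near 0 within smokers
print(dp_cancer_given_pressure(
    [x for x in population if not x[0]]))                # near 0 within nonsmokers
```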
The Probabilistic Contrast Model (Cheng & Holyoak, 1995; Cheng &
Novick, 1992) maintains that people estimate causal strength by computing
contingency in a set of instances in which all other known potential causes of the
effect are held constant (focal set). In accordance with this proposal, there is an
impressive corpus of data showing that naïve reasoners compute covariation
conditionally when asked to draw causal conclusions from statistical information,
and are more confident in judgements made under controlled circumstances. In
other words, people actually apply the principle of conditionalisation (see
Gopnik, Sobel, Schulz, & Glymour, 2001; Spellman, 1996, for examples), which
is convincingly demonstrated by the extensive literature on cue interaction
effects in causal learning.2 Extensive reviews of the importance of cue inter-
action effects and their relation to normative standards can be found in Cheng
(1997) and Shanks (1995).
Control, however, does not seem to be the only inferential normative prin-
ciple people apply. Even when covariation is computed under controlled con-
ditions, there are situations in which reasoners do not take covariation as an
index of causality. For instance, as shown by Wu and Cheng (1999), when
contingency is computed over the presence of a deterministic cause of the effect
(and therefore ΔP = 1 - 1 = 0), the perceived causal status of a candidate
generative cause remains undetermined. Let us imagine that a reasoner wants to
find out whether a fertiliser is effective at making the plants in a greenhouse
bloom. All plants have been fertilised with substance A, which is a constant
cause in the background, and half the plants are also fertilised with substance B,
which is the candidate cause. An indetermination case would occur if all the
plants in the greenhouse bloom (a situation equivalent to the A+, AB+ design
present in most blocking experiments). As the effect of the causes in the
background is maximal (all plants bloom), the candidate cause (B) does not have
room to show whether it has any effect itself or not. Importantly, sensitivity to
maximality has been reported in a number of human studies (see De Houwer,
Beckers, & Glautier, 2002).
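This indeterminacy can be sketched in a few lines, anticipating the causal power formula, pi = ΔP/(1 - P(e|~i)), presented later in the article (Equation 5); the probabilities below are illustrative:

```python
def causal_power(p_e_given_i, p_e_given_not_i):
    """Generative power of candidate i: ΔP / (1 - P(e|~i)).
    Returns None in the ceiling case, where the background causes
    already produce the effect on every trial and power is 0/0."""
    delta_p = p_e_given_i - p_e_given_not_i
    denominator = 1.0 - p_e_given_not_i
    if denominator == 0.0:
        return None  # indeterminate: no room for i to show its effect
    return delta_p / denominator

# Greenhouse example: fertiliser A alone makes every plant bloom, so for
# candidate B both probabilities are 1 and B's status is undecidable.
print(causal_power(1.0, 1.0))  # None
# With a non-saturated background, the same ΔP of 0 is informative:
print(causal_power(0.5, 0.5))  # 0.0
```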
In summary, a number of studies have convincingly shown that humans make
use of covariation to identify the possible presence of causal links, and are
sensitive at least to two principles of causal inference: conditionalisation and
ceiling effect avoidance. It is still a matter of discussion whether sensitivity to
2 Cue interaction or cue competition effects refer to experimental preparations in which the
accrual of evidence about the relationship between a given predictor or candidate cause (A) and an
outcome or effect (+) is hindered by the presence of a known predictor (B). For example, in blocking
experiments participants are shown a number of B+ trials. Subsequently, they are presented with a
series of AB+ trials. The usual result is that the predictive or causal value of A is diminished by the
presence of B, which has been previously learnt to be a reliable predictor of +.
HUMAN CAUSAL INDUCTION 287
the principle of conditionalisation is a controlled, purposeful strategy, as
maintained by the inferential approach3 (De Houwer & Beckers, 2003; Wald-
mann & Hagmayer, 2001), or the consequence of a competitive process in an
associative device (Allan, 1993; Dickinson & Burke, 1996). In contrast,
sensitivity to the ceiling effect cannot be interpreted in purely associative
terms, and has been claimed to be strong evidence in favour of the inferential
approach.
Intervention
As noted above, most results described in causal learning literature arise from
experiments in which participants passively receive the relevant information
about covariations among the critical variables in a very restricted scenario and
are asked to make a judgement on the strength of a single target relation.
Moreover, the basic structure of the causal scenario is often provided by means
of the task instructions.
Unfortunately, these situations are inadequate to test how people discover
brand new causal links from empirical evidence (Glymour, 1998; Newsome,
2003). In order to ascertain how people unveil the causal structure underlying a
pattern of covariations, a more realistic experimental preparation is required, in
which a variety of events or variables are present, no causal roles are pre-
assigned to some of those variables, and people can interact with the system and
receive feedback about the effect of their interventions.
Some previous data arising from a variety of experimental paradigms show
that intervention is a very powerful tool for learning, and that humans could be
especially well prepared to learn from the consequences of their acts. For
example, Haggard, Clark, and Kalogeras (2002) demonstrated that learners
perceive the sensory consequences of their voluntary movements as occurring
earlier than they actually did, which was interpreted as a proof that the central
nervous system applies a specific neural mechanism to produce intentional
binding of actions and their effects in conscious awareness. In addition, some
studies (Chatlosh, Neunaber, & Wasserman, 1985; Wasserman, Chatlosh, &
Neunaber, 1983) have shown that contingency estimation is more accurate in
free-operant paradigms than in observational situations.
Lagnado and Sloman (2004; see also Sloman & Lagnado, 2004) have shown
that people can actually learn a causal structure more efficiently through active
intervention than through passive observation (a similar pattern of results has
3 The inferential approach maintains that conditionalisation and cue interaction result from the
application of reasoning rules based on general-purpose normative knowledge. The application of
such rules is supposed to be effortful and to demand general executive resources. However, a detailed
algorithmic description of this approach has not yet been provided. In this context, specific
predictions can be derived by using the causal Bayes' nets approach, although it must be borne in
mind that such an approach does not take cognitive boundary limitations into account (algorithms
for causal inference in the causal Bayes' nets approach are information usefulness optimisers).
been reported by Steyvers et al., 2003). In Lagnado and Sloman's Experiment 1
participants were asked to describe the causal structure underlying the pattern of
covariation shown by two cues (temperature and pressure, both of which could
take the values high and low) and an outcome (the launch of a rocket, which
could take the values present and absent). The correct structure underlying the
observed covariations was Temperature → Pressure → Launch (or an equivalent
structure). In the observational condition the participants just observed several
instances resulting from that structure. In the interventional condition, on the
contrary, people were allowed to set the value of either the variable temperature
or the variable pressure and then see what happened with the value of the other
variables. As expected, results showed that interveners chose the correct model
more often than observers.
The advantageous effect of intervention seems to be a very robust one. But, in
addition, these experiments provide some clues to ascertain where that advan-
tage arises from. First, intervention actually alters the causal structure that the
observed pattern of statistical dependencies stems from. In graphical terms, the
moment a variable is set by an external manipulation, it is
necessary (1) to introduce an external influence in the causal DAG (the "do"
operator, in Pearl's, 2000, terms), and (2) to disconnect the manipulated variable
from its parents (graph surgery). This modification ensures control (as defined in
the previous section) and, therefore, if a statistical dependency between two
variables in the system disappears after the intervention, that dependency must
be interpreted as spurious or mediated by the manipulated variable. These
informational differences could allow the reasoner to discriminate between
Markov equivalent structures.
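The logic of graph surgery can be illustrated with a small simulation, assuming a hypothetical chain Temperature → Pressure → Launch with invented transmission probabilities. Under observation, Temperature and Pressure covary; after intervening on Pressure (which severs it from its parent), that dependency vanishes, revealing that Temperature is not downstream of Pressure, while the downstream dependency of Launch survives:

```python
import random

random.seed(1)

# Invented chain Temperature -> Pressure -> Launch, with 0.9
# transmission probability on each link (binary variables).
def trial(do_pressure=None):
    temperature = random.random() < 0.5
    if do_pressure is None:
        pressure = temperature if random.random() < 0.9 else not temperature
    else:
        # graph surgery: the intervention severs Pressure from its parent
        pressure = do_pressure
    launch = pressure if random.random() < 0.9 else not pressure
    return temperature, pressure, launch

def dependency(pairs):
    """ΔP of y given x over a list of (x, y) pairs."""
    y_when_x = [y for x, y in pairs if x]
    y_when_not_x = [y for x, y in pairs if not x]
    return sum(y_when_x) / len(y_when_x) - sum(y_when_not_x) / len(y_when_not_x)

observed = [trial() for _ in range(20_000)]
intervened = [trial(do_pressure=random.random() < 0.5) for _ in range(20_000)]

# Observation: Temperature and Pressure covary strongly (around 0.8)...
print(dependency([(p, t) for t, p, _ in observed]))
# ...but after intervening on Pressure the dependency vanishes (near 0)...
print(dependency([(p, t) for t, p, _ in intervened]))
# ...while Launch still depends on Pressure (around 0.8): it is downstream.
print(dependency([(p, l) for _, p, l in intervened]))
```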
Nevertheless, informational differences between observational and interventional
situations, as captured by the "do" operator and graph surgery, are
not the only explanation of the facilitating effect of intervention. Lagnado and
Sloman's (2004) Experiment 2 demonstrated that when the observational group
was equated (yoked) to the interventional group with respect to the amount and
type of information received, a certain (nonsignificant) improvement in the
observational group was detected but, still, the interveners' performance was
significantly better than that of the observers. Moreover, the advantage of
intervention did not disappear when interventions were forced instead of free
(Exp. 3).
In summary, using intervention implies an advantage for causal structure
inference in three senses, compared to mere observation: First, it triggers some
biological preparation to bind one's responses to the outcomes consistently
occurring after them (Haggard et al., 2002); second, it ensures control of
potential causes alternative to the candidate one; and, third, it generates temporal
order clues that are not always present in mere observations. In this last sense,
the time heuristic hypothesis maintains that one of the advantages of intervention
relies on the fact that "whenever an intervener makes an intervention they
experience the putative cause before any of its possible effects" (Lagnado &
Sloman, 2004, p. 47). However, other factors also mediating the advantage of
intervention (i.e., attention focusing and intentional hypothesis testing) are not
discarded either.
A final issue related to the role of intervention in causal reasoning is how
interventions are selected. In a given scenario, the number of possible manip-
ulations is at least as large as the number of variables in the scenario; so, how is
an intervention selected among the many possible ones? Steyvers et al. (2003,
Exp. 2) showed that in a situation in which people did not receive any
information beyond the pattern of statistical dependencies/independencies among
three multilevel variables, the probability of selecting the correct structure was
.18 (an optimal Bayesian learner would reach a hit rate of .50). That probability
subsequently rose to .33 when participants were allowed to make a single
intervention (the optimal hit rate was 1.0 in this case). Most importantly, when
location of the preferred intervention was analysed, it was observed that inter-
vention was strongly guided by the model hypothesised by the reasoner in the
prior observational phase, and that the variables that were most likely to be
intervened on were those that had been previously identified as causes. In
general terms, people were more likely to intervene on the common cause in
common cause models (A ← B → C), on one of the two possible causes in
common effect models (A → B ← C), on one of the two linked causes in chain
models (A → B → C), and on the single cause in single-link models (A/B → C).
This result can be interpreted as stemming from naïve reasoners' general
tendency to test hypotheses on individual causal links on that structure, rather than
on complete causal structures.
In conclusion, the evidence described in this section provides a clear picture
of the role of intervention in intuitive causal structure learning. If any outcome is
observed to consistently correlate with any changes introduced by the reasoner
in the value of some variable, that outcome is almost immediately interpreted as
a consequence of the manipulation. In consequence, manipulation is the pre-
ferred strategy to confirm or disconfirm hypotheses formulated on the basis of
mere observation. In addition, those hypotheses are located at the level of
individual links and determine where interventions are more likely to be made.
Time and order
In general terms, effects closely follow their causes. Consequently, also gen-
erally speaking, anything occurring before A or long after A is unlikely to be an
effect of A; hence, temporal order and temporal contiguity can be used as
powerful heuristics to make a decision on the viability of a causal link. This
general principle, however, has important exceptions. Both the temporal order
and the contiguity principles are violated in many daily life situations. For
example, the relationship between symptoms and diseases is learnt backwards:
Learners have access to information on the symptoms before a diagnosis is
provided, but they still assign their correct causal roles to the cause (the disease)
and the effects (the symptoms). These effect-to-cause learning situations are
referred to as diagnostic tasks, whereas standard, cause-effect situations are
called predictive tasks (Waldmann & Holyoak, 1992). In general terms, pre-
dictive inferences are easier and more accurate than diagnostic ones (Gigerenzer
& Hoffrage, 1995), but, still, there are many daily life situations (and also
experimental ones; see Waldmann and colleagues' works) in which reasoners
learn causal relations backwards.
As detailed earlier, people compute causal strength by controlling potential
alternative causes. Therefore, if a causal scenario is composed of two temporally
preceding cues and an outcome, the requirement to hold one of the cues constant
in order to compute covariation conditionally depends on the causal role
assigned to the cues and the outcome. If the cues are interpreted as potential
causes and the outcome as an effect, one of the cues needs to be controlled while
the potential influence of the other one is assessed, and cue interaction will arise.
On the contrary, if the two cues are interpreted as possible effects of the out-
come, there is no need to compute covariation conditionally (and no cue
interaction effect will occur).
Several studies have shown that cue interaction is sensitive to causal direc-
tionality (Waldmann, 2000, 2001; Waldmann & Hagmayer, 2001; Waldmann &
Holyoak, 1992, 1997), whereas others have failed to show that dependency
effect, and have reported the existence of cue interaction between temporally
precedent cues, independently of the causal role assigned to them (Cobos,
López, Cano, Almaraz, & Shanks, 2002; Shanks & López, 1996). Recent work
has pointed out the importance of processing effort at explaining this contra-
dictory pattern of results. According to this hypothesis, replicating the depen-
dency of cue interaction on causal directionality requires reasoners to be
carefully instructed to build a mental model of the task (usually, either a
common-cause model, or a common-effect model; Waldmann & Martignon,
1998). Seemingly, the elaboration and application of a causal model consume
executive resources, and therefore, sensitivity of causal interaction to abstract
causal knowledge will depend on the availability of such resources (De Houwer
& Beckers, 2003; Waldmann & Hagmayer, 2001).
It is important to note, however, that cue interaction effects can occur even in
situations in which the complexity of the task makes the application of an
abstract model almost unviable (Aitken, Larkin, & Dickinson, 2000, 2001;
Dickinson & Burke, 1996; Larkin, Aitken, & Dickinson, 1998), which indicates
that causal interaction effects themselves do not necessarily depend on top-down
influences (see, however, De Houwer, 2002). The claim that knowledge-based
processes superimpose but do not completely disable basic, time-sensitive
learning mechanisms is in accordance with data from mediated learning para-
digms (Perales, Catena, & Maldonado, 2004).
In summary, experimental evidence seems to show that basic learning
mechanisms incorporate temporal order as an important clue to intuitively
represent causal structure. Abstract knowledge can be used to override the effect
of temporal order, and to impose a knowledge-based structure over the observed
data, but with a cost in terms of processing resources.
The same principle can be applied to the effect of temporal contiguity.
Relationships between causes and delayed effects are more difficult to learn than
temporally contiguous relations, both for humans (see Shanks & Dickinson,
1991) and animals (see Damianopoulos, 1967; Jones, 1962). This is consistent
with the idea that, in order to get associated, two stimuli need to be simul-
taneously active in working memory, but representations in working memory
tend to decay with time (Terry & Wagner, 1975).
However, humans are aware that effects can show up long after the cause was
present. In these cases, the connection between the cause and the effect is a long
chain of contiguously connected events, but that chain remains hidden for the
observer. Probably, in the absence of previous knowledge about that mechanism,
humans would be unable to learn those relationships, unless the relevant data
were artificially brought together.
Nonetheless, humans normally possess a rich background of causal know-
ledge, which, combined with episodic retrieval processes, allows them to search
backward from the occurrence of the effect to the potential cause that generated
it. Buehner and May (2003) have demonstrated that the effect of delay on causal
estimates is mediated by previous assumptions on the mechanism that accounts
for the link between the putative cause and the effect. The detrimental influence
of delay on learning was clearly attenuated by instructions that encouraged
people to interpret the cause as having delayed effects. In a similar vein, Hag-
mayer and Waldmann (2002) showed that the temporal assumptions held by the
participants in their experiments strongly guided the choice of appropriate
statistical indicators of causality by structuring the event stream, by selecting the
potential causes among a set of competing candidates, and by influencing the
level of aggregation of events.
In summary, temporal order and contiguity are important clues to structure.
First, the direction of some causal links in a Bayesian net cannot be ascer-
tained on the basis of statistical evidence alone. Temporal order can be used
to disambiguate such relations, and as preliminary evidence seems to show,
can even mislead people to ignore more informative statistical clues. Simi-
larly, order can also be useful to define a focal set to compute the degree of
covariation between a cue and an outcome conditionally. As many experi-
ments have shown, interaction between effects often occurs when effects pre-
cede their causes during learning. Under constrained circumstances, however,
humans can apply knowledge-based constraints to override the effect of tem-
poral order, and impose a different structure over the available covariational
data.
Second, contiguity can determine which candidates are assessed when linking
causes to effects. Again, however, the effect of contiguity on the selection of
putative causes depends on whether or not the reasoner has previous knowledge of the
causal mechanism leading to the observed effect. The application of such
knowledge requires previous learning, and the involvement of episodic retrieval
processes to bring to awareness potential causes that appeared long before the
effect. In the absence of such previous knowledge, or when episodic retrieval
processes are not fully developed (for example, children younger than 3-5 years;
Perner, 2001) the effects of contiguity are expected to emerge more straight-
forwardly.
Pre-stored knowledge
Causal reasoning theories have focused on specifying how causal knowledge is
acquired from data, without the involvement of previous knowledge. Despite the
fact that in most daily life causal reasoning tasks reasoners have at least some
knowledge about the crucial events, theories have paid much less attention to the
processes responsible for integrating new evidence and previous beliefs.
On the one hand, humans have general causal knowledge that comprises
intuitive and training-based normative principles of causal induction. We will
refer to this kind of knowledge as general-purpose inductive principles. For
instance, experimental psychologists have declarative knowledge about the
conditions necessary to experimentally test causal hypotheses. Similarly, naïve
reasoners also know and apply some of these induction principles in daily life
causal reasoning. Although these principles are not necessarily explicit, their
pervasive application across the individual's learning history should lead to the
progressive accrual of declarative knowledge about them.
On the other hand, reasoners also have specific knowledge about the mechanisms
linking specific events or categories of events. For example, we discard the
changes in the dial of a barometer as causing weather changes because we know
that physical phenomena in the scale of the barometer do not interact with events
in the scale of weather changes. In this case, a decision on the plausibility of a
causal mechanism is made on the basis of previous knowledge about the lack of
a link between very broad categories of objects. Part of this knowledge can be
innate, or, at least, biologically facilitated, whereas knowledge of links between
specific events is necessarily based on the individual's learning history.
In any case, the difference between narrowly and broadly defined events is
quantitative rather than qualitative. Individuals' bases of causal knowledge are
formed by vast nets of propositions linking events defined at different levels of
abstraction, and, consequently, one can hold a certain belief at a certain level
(for instance, "smoking causes lung cancer"), and a different belief at a
different level ("smoking cigars does not cause lung cancer"). In accordance with
this idea, the coherence hypothesis (Cheng & Lien, 1995; Lien & Cheng, 2000)
maintains that even when there is no causal knowledge about the connection
between two specific single events, a structural decision can be made on the
basis of whether or not the two events belong to categories that are causally
connected. In addition, Cheng and Lien demonstrated that causal categories
during the acquisition of causal links tend to be defined at a level of abstraction
that maximises the level of covariation between the cause and the effect. If
causal categories are defined too broadly ("inhaling fumes causes lung cancer")
or too narrowly ("smoking Fortuna cigarettes causes lung cancer") during
learning, that learning process can lead to wrong conclusions. For instance, in
the overinclusive case, a learner could conclude that inhaling eucalyptus vapours
causes cancer, whereas in the underinclusive case, a learner could conclude that
smoking Camel cigarettes does not cause cancer.
The coherence hypothesis thus maintains that causal category learning and
covariation learning are engaged in an incremental loop of knowledge
acquisition. First, covariation determines the level of inclusiveness of causal
categories; second, causal categories modulate the acquisition of new causal
knowledge from covariation between specific events; and third, causal know-
ledge acquired this way contributes to redefine the limits of causal categories.
Importantly, a "mechanism" and a specific link are not necessarily two
different types of representation, but just two propositions of essentially the same
nature binding concepts in different abstraction levels. However, the integration
of new evidence with previous knowledge is not simply additive. Instead, pre-
vious knowledge about links between categories of events is used to determine
whether an observed covariation can be interpreted as causal or not.
From an opposite perspective, Fugelsang and Thompson (2003) have tried to
demonstrate that causal beliefs based on knowledge of mechanisms and causal
beliefs based on covariation are dissociable. In their first experiment, they
generated two supposedly different types of beliefs in a group of participants.
Reasoners in the covariation-based belief condition were instructed to hold a
strong or a weak belief, exclusively based on the information received about the
previously observed level of covariation between the two target events. In the
mechanism-based belief condition, on the contrary, participants were informed
about the plausibility or implausibility of a causal mechanism. The two key
variables were thus belief modality (covariation based, mechanism based) and
belief level (low, high). Subsequently, reasoners were exposed to two levels of
contingency (.9, .1) between the two target events. Mechanism implausibility (in
the mechanism-based, low-belief condition) was expected to preclude the
integration of new covariational evidence with previous beliefs. The expected
interaction, however, did not reach the significance level, and new covariational
information had a significant effect on causal judgements in all conditions. In
two subsequent experiments belief level, but not belief modality, was manipu-
lated, and results showed that low previous beliefs diminished (but not com-
pletely precluded) the effect of new covariational evidence.
Fugelsang and Thompson's (2003) experiments thus failed to show the
hypothesised dissociability between mechanism- and covariation-based causal
beliefs. Quite the opposite, they showed a very high degree of interchangeability
and integration between previous causal beliefs and new covariational evidence.
Results are then more compatible with the idea that causal beliefs, at different
levels of abstraction, are represented in a common base of declarative know-
ledge. When a new covariation is found, the two covarying events (a and b) are
immediately classified as members of previously known causal categories (A
and B). If there is not any level at which A and B are plausibly related, the
covariation will be interpreted as spurious. However, if A and B are possibly
related at some level, any evidence confirming that relation will be incorporated
into previous beliefs at that level (Busemeyer, 1991), and any new evidence
disconfirming that relation will contribute to refine the scope and definition of
categories.
In summary, knowledge on causal categories can determine a priori what is
and what is not likely to be a cause. In any context, there is an almost infinite
number of close and distant potential causes for an observed effect, and that
knowledge is crucial to select a manageable set of candidates. In other words,
plausibility knowledge restricts the set of candidates for which covariation is
considered (White, 1995). In doing so, not only does it single a candidate cause
out, but it also determines the alternative causal factors that will enter the focal set
when computing covariation conditionally. In that sense, general-purpose
inductive principles and domain-specific causal knowledge are not only compatible,
but also interact with and depend on each other to make adaptive behaviour possible.
Summary
As posed by Cheng (1997, 2000), humans know that causal powers are intrinsic
properties of effective causes. In other words, people need to assume that
causality exists in the world in order to make sense of it. However, the key
question is not whether people believe in causation or not, but how people come
to know where causal relations are.
The Bayesian networks' approach has provided a tool for answering that
question: a set of algorithms that achieve a certain level of success at recovering
causal structures from statistical patterns of data. By default, these algorithms
make use only of the pattern of statistical dependencies/independencies present
in the data, and not of time, order, intervention, or pre-stored knowledge. Those
clues can be further used to select one among several equally predictive structures.
Algorithms developed in the field of Bayesian networks are information
optimisers, in the sense that they go as far as possible with the available
statistical information. Humans, however, do not seem to be as good as com-
puters at optimising the utility of the available statistical information, in part
because representing a whole pattern of dependencies, even in very simple
causal scenarios, is beyond any reasonable estimate of human memory and
attention capabilities. In fact, humans do not seem to be very good either at
maximising the utility of screening off information in cognitively manageable
situations (Lagnado & Sloman, 2002; see also Barsalou, Sloman, & Chaigneau,
2002).
Instead, reasoners seem to heavily rely on local clues to decide whether a
single link under assessment exists or not. Previous knowledge can modulate
their influence, by picking a covariation as causally plausible or discarding it as
implausible, determining what factors will be controlled in order to compute
contingency, replacing the order of appearance of the cause and the effect in
diagnostic tasks by their true causal order, and diminishing the relevance of
contiguity when evaluating the relationship between distant elements in a causal
chain. In case there is no relevant previous knowledge, decisions on causal
structure will rely more heavily on evidential clues. These clues are only
imperfect indicators of causation and can lead to the formation of erroneous
causal concepts in some cases but they ensure a certain level of concordance
between the organism's behaviour and the causal structure of the environment.
ESTIMATING CAUSAL STRENGTH
The Power PC theory
A causal Bayes' net is the skeleton of a causal scenario, but, in order to get the
whole picture, it is necessary to assign to every link the parameters that deter-
mine the probability distribution of each child (effect) conditional on the set of
possible values of its parents (causes).
Although the variables in the model can be either discrete or continuous,
learning of causal functions between continuous and multilevel variables has
been the focus of very limited attention (Busemeyer, Byun, Delosh, &
McDaniel, 1997; White, 2001). More often, empirical work has tried to
ascertain how people compute the degree to which two binary variables repre-
senting a putative cause and a possible effect (almost always, two discrete events
that can be present or absent) are causally related, in the presence or absence of
alternative causes.
Let us imagine two causes (A and B) pointing to a common effect (C). These
three events can be related in different ways. For example, A and B could be two
serial switches in a circuit, and C a bulb connected to that circuit, in which case
both A and B must be on for C to be on; or they could be two parallel switches,
in which case either A or B must be on for C to be on. In other cases, the effects
of A and B could be probabilistic and additive. For instance, it could be assumed
that the different causes of lung cancer simply add to each other to produce the
effect (the parallel circuit is indeed an example of an additive deterministic
model).
296 PERALES AND CATENA
The additive model is the simplest way to represent the influence of two
discrete probabilistic causes on an effect. According to the Power PC theory
(Cheng, 1997), in the absence of contradictory evidence, humans explain every
effect in the world as the result of the addition of its causes. When evidence
clearly contradicts this model, interactive causal power is computed instead
(Cheng, 2000).
In terms of the Bayesian nets framework, the Power PC theory is a theory
about a specific parameterisation of causal structures, and about how humans
compute those parameters. Specifically, it assumes that when assessing the
causal power of a candidate cause (i), in the presence of unknown alternative
causes (a) of the same effect (e), naïve reasoners explain the occurrence of e in
the presence of i as resulting from the union of two probabilities: (1) the
probability of e being caused by i, and (2) the probability of e being caused by a
(the composite of unknown alternative causes in the background).
P(e/i) = pi + paP(a/i) − pipaP(a/i) (3)
where pa and pi are the probabilities with which a and i, respectively, generate
the occurrence of e by themselves (their causal powers). Equivalently, the probability of e in
the absence of i is explained as:
P(e/~i) = paP(a/~i) (4)
Applying probability calculus, and assuming that (1) i does not interact with a to
cause e, (2) i and a are independent (control condition), and (3) the causal powers pa and pi are constant properties of a and i, it follows that:

pi = ΔP / [1 − P(e/~i)] (5)
In other words, the causal power of a generative cause i can be computed
from the observed contingency, ΔP, between the candidate cause and the effect.
An equivalent argument can be used to derive the power of a preventative cause
(qi), in which case it follows that:
qi = −ΔP / P(e/~i) (6)
As a psychological theory, the Power PC theory assumes that naïve reasoners
apply the mental operations necessary to compute powers in natural environ-
ments. However, the theory does not specify how p and q (henceforth, generi-
cally p) are exactly computed. p describes the expected shape of the mapping
between the input provided to the reasoner (the frequencies of a, b, c, and d type
trials) and the output of the mechanism in charge of computing causal strength,
but not the algorithm that carries out that operation. The core claim of the theory
is that if the information necessary for computing power is adequately provided
and perceived, and if the probe question used to induce a causal estimate is
adequately understood, the judgement made by the reasoner should closely
conform to p.
Since the Power PC theory was first presented, a number of studies have tried
to test it, both on the basis of reanalyses of previous data (Allan, 2003) and new
empirical evidence (Lober & Shanks, 2000; Perales & Shanks, 2003; Vallée-
Tourangeau et al., 1998; White, 2003c). In all of these studies the statistical
relationship between a single candidate cause and an effect (the combination of
trial type frequencies) was manipulated over a constant background in order to
generate different conditions in which the Power PC theory's predictions dif-
fered from predictions of other models; judgements were elicited using
standard causal questions; and, finally, covariational information was
provided in a trial-by-trial sequential format.
In four of these studies, the Power PC theory was disconfirmed. However,
Buehner et al. (2003; see also Buehner & Cheng, 1997) have claimed that
deviations from p in these studies can be explained by a combination of pro-
cedural factors. First, trial-by-trial tasks are highly demanding in terms of
attention and memory load, and can hinder the computation of conditional
probabilities P(e/c) and P(e/~c), necessary to compute power. Second, the
question probe with which the causal judgement is normally elicited in these
works is ambiguous with respect to the context to which the question itself
applies (the learning context, or an ideal context in which alternative causes
have been removed). And third, in some of these studies no measures were taken
to ensure that final judgements were at asymptote and that reasoners were maximally
confident in them. Put the other way around, testing the Power PC theory
requires that (1) judgements are asymptotic and not conflated with con-
fidence, (2) the type of probe question used to elicit them has a clear normative
reference, and (3) the input necessary to compute powers is provided in a format
that allows adequate comprehension and representation.
In most causal learning experiments people are asked to estimate just the
extent to which the candidate cause generates the effect (i.e., ``to what degree
does [i] cause [e]?''). According to Buehner et al. (2003) some people could
interpret that they are required to estimate the influence of the candidate cause in
the same context in which it was first observed (in which case ΔP would be the
normative reference to contrast judgements against), whereas some other people
could assume that the question is about the effect of the candidate cause in a
context in which alternative causes are absent (in which case p would be the
adequate reference).
The recommended probe question is worded as follows: ``Imagine a sample of 10
new [instances] in which we know for sure that [the effect] would not appear
unless [the candidate cause] is present. If [the candidate cause] were introduced,
how many of them would show [the effect]?'' Henceforth, we will refer to this
type of question as the causal counterfactual judgement. To date, however, this
judgement has been used very scarcely (Buehner et al., 2003; Perales, Shanks, &
Castro, 2005). In addition, in two experiments of Buehner et al.'s (2003) study
the standard trial-by-trial presentation format was replaced by a simultaneous
format, in which both the instances in which the cause was present, and those in
which it was absent were present at the same time on a single sheet. Other minor
modifications included stressing the fulfilment of the conditions for the analysis
underlying the derivation of p from contingency to apply (for instance, the
background equivalence of the cause-present and cause-absent instances in the
sample).
Alternatives to Power PC
Causal counterfactual questions are far from being universally accepted as the
best way to assess naïve reasoners' causal beliefs, and they have been used too
scarcely to draw reliable conclusions from them. In addition, Buehner et al.'s
(2003) results are still controversial (Perales et al., 2005). Therefore, the main
corpus of results on causal strength estimation still stems from those works in
which standard causal questions were used. Trial-by-trial presentations have
been used, for instance, in Buehner et al. (2003), Lober and Shanks (2000),
Perales and Shanks (2003), ValleÂe-Tourangeau et al. (1998), Wasserman et al.
(1996), White (2003b), and Anderson and Sheu (1995). Comprehensive studies
with summarised presentations are Levin, Wasserman, and Kao (1993), Mandel
and Lehman (1998), and White (2003b).
Independently of what model is favoured by the analysis of the results
from these studies, there is a series of trends that have been found to be con-
sistent enough to be taken into account by any theory that intends to be fully
explanatory. First, both ΔP manipulation across conditions in which p is held
constant, and p manipulation across conditions in which ΔP is held constant,
have a direct effect on causal judgements (Lober & Shanks, 2000; Perales &
Shanks, 2003). Second, the four trial types are attributed different subjective
weights (a > b > c > d), which has been shown both with direct methods of
assessment and by estimating the effect of orthogonal manipulations on final
causal judgements (Levin et al., 1993; Mandel & Lehman, 1998; White,
2003a, 2003b, 2003c). Third, and consequently, information contributing to
P(e/c) is attributed more weight than information contributing to P(e/~c)
(Lober & Shanks, 2000; Mandel & Lehman, 1998; Perales & Shanks, 2003),
and judgements correlate with P(e) across conditions in which ΔP is 0. And
finally, several studies with trial-by-trial presentations (Wasserman et al.,
1996; White, 2003b) have reported an effect of P(c) across conditions in
which P(e/c) and P(e/~c) are held constant.
Two types of algorithmic models have been proposed to account for the way
in which covariational information is used for causal estimation (see Allan,
1993; De Houwer & Beckers, 2002; Shanks, Holyoak, & Medin, 1996). Rule-
based models are based on the idea that humans keep track of the frequencies of
the different types of trials or the conditional probabilities they are exposed to
during the task, and combine those probabilities or frequencies according to a
given rule or statistic. ΔP-like models assume that people contrast estimates of
P(e/c) and P(e/~c), in a weighted or unweighted manner, in order to compute the
degree of contingency between the cause and the effect (Cheng & Holyoak,
1995; Cheng & Novick, 1992). By contrast, ΔD-like or linear combination
models assume that naïve reasoners compare evidence confirming the existence
of a generative causal link, or disconfirming the existence of a preventative
causal link (a and d type trials), against evidence disconfirming the existence of
a generative link, or confirming the existence of a preventative link (b and c type
trials), in a nonprobabilistic manner. These models (for instance, Catena, Mal-
donado, & Candido, 1998; White, 2003a, 2003b, 2003c) also assume that dif-
ferent trial types are given different evidential weights in confirming or
disconfirming a previous hypothesis. Weighting can arise both from inter-
individual factors (for example, some people considering a certain trial type as
confirmatory whereas others consider it disconfirmatory; White, 2003a) and
intraindividual factors (for example, people focusing more on some trial types
than on others; Maldonado, Catena, Candido, & Garcia, 1999; Mandel & Lehman,
1998).
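The difference between the two families of rules can be sketched in a few lines of code. This is our own illustration: the function names and the particular weight values are hypothetical, chosen only to respect the a > b > c > d ordering reported in the literature.

```python
# ΔP-like rule: contrast between the two conditional probabilities.
def delta_p(a, b, c, d):
    return a / (a + b) - c / (c + d)

# Weighted ΔD-like rule: confirming trials (a, d) against disconfirming
# trials (b, c), each trial type carrying its own subjective weight.
def weighted_delta_d(a, b, c, d, weights=(4.0, 3.0, 2.0, 1.0)):
    wa, wb, wc, wd = weights  # hypothetical values; only wa > wb > wc > wd matters
    confirm = wa * a + wd * d
    disconfirm = wb * b + wc * c
    return (confirm - disconfirm) / (confirm + disconfirm)
```

With equal weights the second rule reduces to the unweighted ΔD statistic, (a + d − b − c)/n; unequal weights let the same frequencies pull the judgement towards the more heavily weighted trial types.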
Associative models,4 on the other hand, assume that causal links are learnt in
the same way as other associations, by means of a mechanism that accumulates
associative strength in the link between the mental representations (nodes) of the
cue (or cause) and the outcome (or effect). The most widely known associative
model is the Rescorla–Wagner rule (Rescorla & Wagner, 1972):

ΔVi = αβ(λ − ΣVi−1) (7)
According to this model, the increment in the associative strength between a
cue and an outcome (ΔVi) is a multiplicative function of the saliences of the cue
and the outcome (α and β), and of the difference between the maximum associative
strength an outcome can recruit (λ, the asymptotic learning level) and the
associative strength recruited by all the cues present on the current trial (ΣVi−1).
The increment in associative strength on a given trial is a function of the degree
4Although it has never been formally stated, most authors seem to assume that associative
mechanisms operate automatically, are mainly data driven, and are thus insensitive to top-down
influences. By contrast, rule-based mechanisms are supposed to reflect the operation of purposeful
reasoning strategies, are demanding in terms of executive load, and are sensitive to top-down
influences.
to which the outcome is unpredicted by the cues presented on that trial. α and β
are assumed to take values between 0 and 1; λ is 1 on trials in which the
outcome is present and 0 on trials in which it is absent. An absent cue
cannot recruit associative strength on trials in which it is absent, although other
cues present on those trials (i.e., the context) can, and these compete with the
target cue for associative strength (determined by λ).

If the Rescorla–Wagner rule is restricted in such a way that β always takes
the same value (independently of whether the outcome is present or absent in the
current trial), its predictions are equal to ΔP at asymptote (Chapman & Robbins,
1990). Therefore, the restricted version of Rescorla–Wagner can be rejected on
the same empirical basis as ΔP. The unrestricted version does not fit the pattern
of results either, as it requires a given parameter order (βoutcome < βno-outcome) to
account for the positive effect of p on estimates when ΔP is held constant in
positive contingency tasks, and the opposite order of parameters (βoutcome >
βno-outcome) in negative contingency tasks. That change is justifiable in comparisons
across experiments, but it is not when negative and positive contingencies
are included in the same experiment with the same cover story and
the same materials for all conditions (Lober & Shanks, 2000).
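The asymptotic equivalence between the restricted rule and ΔP can be checked with a short simulation. The sketch below is our own illustration, assuming a single target cue plus an always-present context and an expected-update (batch) formulation of the rule; the function name and parameter values are not from the literature.

```python
# Expected-update Rescorla-Wagner with a single beta for outcome and
# no-outcome trials. Cue C is present on a + b trials; context X is present
# on all trials. lambda = 1 when the outcome occurs, 0 otherwise.

def rw_asymptote(a, b, c, d, lr=0.05, n_iter=20_000):
    n = a + b + c + d
    v_c = v_x = 0.0
    for _ in range(n_iter):
        # Cue-present trials: both C and X are updated towards lambda.
        d_present = lr * (a * (1 - (v_c + v_x)) + b * (0 - (v_c + v_x))) / n
        # Cue-absent trials: only the context X is updated.
        d_absent = lr * (c * (1 - v_x) + d * (0 - v_x)) / n
        v_c += d_present
        v_x += d_present + d_absent
    return v_c, v_x
```

With frequencies (6, 2, 2, 6), the cue's strength converges to ΔP = .5 and the context's to P(e/~c) = .25, in line with Chapman and Robbins' (1990) derivation.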
The Rescorla–Wagner model has been modified in several ways to eliminate
its inadequacies in explaining human and animal data (Miller, Barnet, & Gra-
hame, 1995; Siegel & Allan, 1996; Van-Hamme & Wasserman, 1994). How-
ever, among those modified algorithms the only one that can account for the
family of effects described above (with the exception of the cause-density effect)
is Pearce's (1987) model. This model is based on the assumption that any set of
cues presented during the task is represented as a single configuration, and
recruits associative strength as such; it adopts a slightly modified version of
the Rescorla–Wagner rule for associative strength updating (for a detailed
description of the model and its asymptotic derivations, see Perales & Shanks,
2003).
A recent meta-analysis of the most significant causal learning experiments
with sequential presentations (Perales et al., 2005) has shown that weighted ΔD-like rules are the most predictive among rule-based models, whereas
Pearce's is the best-fitting associative model. In addition, global goodness-of-fit
analyses showed that weighted ΔD is significantly more predictive than Pearce's
model. However, it seems increasingly clear that rule-based and
associative mechanisms could actually complement each other. We propose here
that associative and rule-based mechanisms are just descriptions of different
levels of processing in the mechanism in charge of computing covariation.
In any case, it has not been convincingly shown that people are able to intuitively
carry out the kind of normative analysis proposed by the Power PC theory.
Seemingly, covariation is computed in a non-normative yet adaptive way and
then incorporated into previously hypothesised causal structures. Covariation is
used to confirm or disconfirm the existence of a causal link in a given structure,
and to determine how and to what extent variables in that structure are related. It
is important to note, however, that the result of applying weighted ΔD to
covariation computation is in fact very close to normative prescriptions.
Although deviations from p are predicted in some specific situations, the degree
of correlation between judgements predicted by weighted ΔD and those
predicted by p across the 114 experimental conditions in Perales et al. (2005)
was r = .95.
SOME QUESTIONS ON THE DEVELOPMENT OF CAUSAL INDUCTION
Naïve reasoners are able to integrate covariational information with previous
knowledge to discover new causal links. That ability is made possible by con-
tent-specific knowledge about causal mechanisms and the application of general-
purpose principles of causal induction, as discussed previously. Although
developmental issues are not the core of this article, we will try to formulate
some questions on the emergence of such knowledge that are relevant for the
understanding of causal induction in adults.
By the age of 4, children have already acquired a rich body of causal knowledge
in the domains of biology, psychology, and physics (Spelke et al., 1992; Sperber,
Premack, & Premack, 1995). On this basis, some authors have claimed that
children have innate substantive knowledge of causal domains (Keil, 1995), or
apply content-specific mechanisms for the accrual of knowledge about these
domains (Leslie & Keeble, 1987). However, this innatist approach is not sup-
posed to explain how causal knowledge outside these domains is
acquired, and, most importantly, how causal knowledge is refined and integrated
inside them. Ultimately, children must also be able to use cross-domain causal
inductive principles to behave in a causally coherent way, independently of the
content of the task they are dealing with.
In any case, it is undeniable that humans are equipped with a primary
general-purpose learning system (either associative or computational; Gallistel,
2002) that allows them to capture at least part of the causal structure of the
environment. For example, connectionist models incorporating simple associa-
tive rules are simple yet powerful tools to partially recover both causal structure and
causal strength from covariation. Cheng (1997) has shown that, in tasks in which
the sets of potential causes are nested, the Rescorla–Wagner rule's predictions
coincide at asymptote with conditional ΔP. As long as temporal order and
causal order match, computing contingency conditionally is consistent with the
principle of control, which, in turn, is one of the ways in which conditional
dependencies/independencies can be used to induce or discard the existence of
causal links.
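The conditional-contingency computation mentioned here can be sketched as follows (our own illustration; the trial encoding and function name are assumptions): ΔP between a target cue T and the effect is computed only over trials on which an alternative cause A is held constant, which implements the principle of control.

```python
# Conditional ΔP: contingency between target cue T and effect E, computed
# only over trials on which the alternative cause A takes a fixed level.
# Each trial is a tuple (t, a, e) of 0/1 values.

def conditional_delta_p(trials, a_level):
    sub = [(t, e) for t, a, e in trials if a == a_level]
    e_given_t = [e for t, e in sub if t]
    e_given_not_t = [e for t, e in sub if not t]
    return (sum(e_given_t) / len(e_given_t)
            - sum(e_given_not_t) / len(e_given_not_t))
```

Because the contrast is computed within a fixed level of A, any influence of A on the effect is removed from the estimate for T.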
Cue interaction studies, in both humans and animals (see Shanks, 1995), have
shown that the application of a simple learning mechanism can be rather
successful at selecting a potential cause among a number of candidates. Most
associative mechanisms use temporal order and contiguity to discriminate
between events that must be represented at the input level and events that must
be represented at the output level. However, some authors have recently shown
that basic learning mechanisms must be able to codify not only order and
statistical association, but also the precise timing of the events occurring in each
trial during the task (Kirkpatrick & Church, 1998; Savastano & Miller, 1998).
Therefore, independently of what particular model is favoured by current evi-
dence (see Mowrer & Klein, 2001, for a review), the kind of basic learning
mechanism that conditioning seems to depend on (in both human and nonhuman
animals) is capable of successfully using order, timing, and cue competition
principles to identify potential causes in the environment.
In addition, the associative view is not completely incompatible with a
moderate innatist approach (and with content-related specificity). Even though
associative mechanisms are domain unspecific, the idea that there exists some
predisposition to learn more readily and effectively in some domains than in
others is accepted by most associationists. The biological preparedness
hypothesis (Seligman, 1971) states that some objects are, by virtue of a species'
evolution, more likely to be grouped together and to become associated with other
categories of objects. For example, humans and nonhuman primates are
more likely to associate experienced and vicarious aversive unconditioned sti-
muli with pictures of snakes than with pictures of flowers or mushrooms (Öhman &
Mineka, 2003). Similarly, facilitated associations have also been discovered
between new flavours and delayed sickness feelings across a number of species
(taste aversion learning; see Schafe & Bernstein, 1996). As noted by an
anonymous reviewer, the preparedness hypothesis formulated in conditioning
literature is not far from the domain specific content approach defended by some
developmental psychologists (see, for example, Carey, 2001). The only addi-
tional specification that is necessary to make both approaches compatible is
assuming that ``content-specificity'' refers to categories of representations that
are more easily formed and linked among them than others, rather than to
essentially different and biologically separated heritable learning systems; or,
more simply stated, that content specificity feeds into a general-purpose learning
system. In tune with this idea, the canalisation hypothesis (Cummins, Cummins,
& Poirier, 2003; Dellarosa-Cummins & Cummins, 1999) maintains that rapid
learning from poor stimuli about some contents (which is taken as evidence for
advocating a modular-innatist approach to learning and reasoning development)
can be a consequence of biasing learning/acquisition processes in ways that
favour the development of concepts and cognitive functions that proved
adaptive to an organism's ancestors. This implies that ``criticisms of innate
cognitive modules [in a strong sense] are not ipso facto criticisms of evolu-
tionary explanations of cognitive capacities'' (p. B38, clause between brackets
added). From this perspective, the existence of innate concepts (representational
nativism) and innate computational models is not strongly denied, but it is not
considered necessary for explaining the available data either.
In any case, basic mechanisms for the detection of statistical regularities or
associations, even constrained by facilitating content-related mechanisms,
cannot encompass the rich complexity of causal reasoning. There is extensive
evidence that even young children can discriminate between true causes and
mere predictors by virtue of the application of induction operations. In principle,
what differentiates causal relations from correlations is the knowledge that in
causal relations there must be some sort of transmission from cause to effect.
The idea that causal power or causal transmission exists in the world and needs
to be unveiled is the core principle of causal induction, and the starting point for
the causal Bayes' nets approach and the Power PC theory (Cheng, 1997; Pearl,
2000). However, it is not so clear where that essential principle, by definition
lacking a clear sensory input, emerges from (nonetheless, it is important to
note here that this knowledge does not refer to any particular causal mechanism
or category of objects, but to the more abstract or ``Kantian'' interpretation of
the world as causal).
The first possibility is that causal power is directly ``perceived'' in some
specific domains when some conditions are met, without the involvement of
reasoning or learning, and then generalised to other domains. In launching
experiments (Michotte, 1963), for instance, one shape moves towards another,
which moves upon contact, as in a collision. In this situation, adults report that they
see the first shape set the second in motion, although there is no real physical
interaction between the two shapes. Importantly, even 7-month-old infants are
sensitive to this type of illusion (Leslie & Keeble, 1987; Schlottmann, Allen,
Linderoth, & Hesketh, 2002).
It is still a matter of discussion whether this effect is dependent on an innate
perceptual mechanism or on experience. For some authors, perceptual causality
is innate and modular (Leslie & Keeble, 1987); for others, perceptual causality
meets the automaticity and encapsulation criteria for considering it modular, but
this module can develop across time (Scholl & Leslie, 1999; Scholl &
Tremoulet, 2000); a third group of authors denies both the innate and modular
nature of perceptual causality, and considers it an experience-dependent
phenomenon (Cohen & Oakes, 1993; Oakes, 1994; Rakison & Oakes, 2003);
and finally, according to a fourth group of authors several processes feed into the
subjective experience of perceptual causality, and some components can be
modular (Schlottmann, 2000).
Nevertheless, independently of the origin and modular or nonmodular nature
of perceptual causality, there are reasons to believe that it cannot be the only
source for acquiring the concept of power. First, because perceptual causality
can indeed interfere with that acquisition process (Schlottmann, 1999) in situa-
tions in which the appearance of causality does not correspond to the actual
causal structure of the events in the current scenario. ``Eventually, children can
learn that perceptual causality can be illusory, and subjugate it to a mechanism-
based criterion for causality'' (2000, p. 441), which means that, logically, the
concept of causal power cannot emerge only from perceptual causality. And
second, because perceptual causality has been reported almost exclusively in the
domain of mechanical systems and no clear cognitive mechanism has been
proposed to account for its generalisation to other domains.
As mentioned in a previous section, a second possibility is the existence of a
``content-independent causal inference module'' in which the concept of causal
transmission is innately hard-wired. This position is consistent with the idea that
``computational adaptation can be content-general . . . yet still functionally
specialized'' (Duchaine, Cosmides, & Tooby, 2001); that is, that there exist
modules in the brain that operate over a multiplicity of contents, yet always
solve the structural problems at which they are specialised. The systems
underlying conditioning (Gallistel & Gibbon, 2000) and probabilistic reasoning
(Brase, Cosmides, & Tooby, 1997) have been cited as examples of such
modules.
It has been shown that 3-year-olds can already make use of conditional
dependencies/independencies revealed in a minimum set of trials to select or
discard causal candidates (Gopnik et al., 2001, 2004), and they can plan a
completely new intervention on a putative cause to produce or stop an effect
(thus showing that their behaviour involves true causal understanding; Schulz &
Gopnik, 2004), after merely observing a set of statistical dependencies/
independencies. Accordingly, Gopnik and her collaborators have proposed that
even very young children have a cognitive system that recovers causal facts by
making implicit assumptions about the causal structure of the environment and
the relations between the environment and evidence. The origin of such implicit
assumptions is not clearly specified, but an analogy is drawn between the way the
causal learning system recovers the causal structure of the world from covari-
ation and the way the visual system recovers tridimensionality from 2-D pro-
jections on the retinas (apparently implying that such a system is modular). This
assertion is, however, difficult to reconcile with evidence emerging from adult
studies (De Houwer & Beckers, 2003; see also Waldmann & Hagmayer, 2001;
Waldmann & Martignon, 1998) showing that the application of general-purpose
causal induction principles is effortful and time consuming, and strongly
interferes with other concurrent tasks.
In the two previously described approaches, the origin of the general dis-
tinction between causal and noncausal links ultimately relies on innateness.
However, there is a third possibility that does not rely on (although it does not
deny) the existence of core innate knowledge. Piaget (1927/1969) proposed that
the appreciation of causal necessity begins with feeling the efficacy of our own
actions, which is subsequently projected onto external objects. This idea points
to intervention as a key element in causal cognition development, but, probably,
it also underestimates its importance. As formulated, the perception of the efficacy
of our actions is similar to perceptual causality (we cannot help interpreting the
events occurring immediately after our actions as effects of them). Yet, inter-
vention could also play a central role in facilitating the acquisition of the concept
of ``cause'' and the general features of causes, as differentiated from the concept
of ``predictor'' (some primitive and intuitive version of what we call causal
power or causal transmission). As noted in a previous section, interventions
actually change the structure of a causal scenario (as represented by graph
surgery and the introduction of the ``do'' operator). In consequence, there are
predictive relations that survive intervention whereas others do not, and only the
latter signal the existence of a causal relationship. Intervention can thus be
regarded as the key to identifying other features also manifested by causes, such as the
clues to causal structure enumerated earlier (including the patterns of depen-
dence/independence normally associated with causes and effects that Gopnik and
her colleagues refer to in their recent works).
Thus, it can be assumed that in the early stages of cognitive development
associative mechanisms, more or less enhanced and constrained by innate
facilitatory mechanisms, play the role of signalling where interventions are
likely to be effective, and intervention allows the ascertainment of whether a
candidate is a cause or not, in a more or less reflexive way early in development
and more purposefully later (we have seen earlier that adults' interventions are
indeed directed to those variables identified as possible causes in a previous
observational phase; Steyvers et al., 2003). Although the link between stimulus–
stimulus associative learning and intervention has never been properly studied,
even young children appear to show a general tendency to manipulate consistent
predictors of desirable outcomes. In fact, this remains a basic question to be
explored and consequently a potential source of new hypotheses for experi-
mentation. Our general prediction is that those associatively singled out events
``attract'' manipulative behaviour, and that learning from the correlational patterns
emerging from such behaviour has a special status in causal cognition development
(maybe by virtue of canalisation).
In an evolutionary and developmental sense, manipulative efficiency is
related to a previous psychological function that assigns motivational and
emotional value to classically conditioned stimuli (Lovibond, 1983), in a way
that confers on them reinforcing or aversive properties. However, there is a
qualitative developmental gap between pursuing the occurrence of desirable
events because of their conditioned hedonic value, and being able to pursue the
occurrence of those same events because they are expected to cause a desirable
effect. The latter ability cannot be explained by simple conditioning and, from our point
of view, must be based on the identification of some of the consistent features
that are associated with efficient causes.
In summary, we propose that the most basic principle of causal reasoning,
the distinction between mere predictiveness or co-occurrence and causal
power, can emerge gradually. If this presumption is right, manipulation must
play a central role in that process. The acquisition of the concept of causal
power can be understood as the progressive abstraction of the differences
between predictive events and actively intervened events. Later on, general
principles of causal induction and general features of effective causes (i.e., the
local clues that signal the presence of a causal link) are progressively general-
ised, and, at the same time, their influence is increasingly modulated by pre-
viously acquired causal knowledge. As with other cognitive
capacities, causal learning in early childhood is essentially an empirical rather
than a formal activity, which leads to prototypical inductive errors in specific
cases (Koslowski & Masnick, 2002; Siegler, 1983; Vosniadou & Brewer, 1992;
see also Vosniadou, 1994). For instance, Heyman, Phillips, and Gelman (2003)
have shown that young children (5- to 7-year-olds) generalise physical princi-
ples across ontological kinds (animate vs. inanimate entities) but also show
sensitivity to those ontological kinds in their projections, an effect absent in
adults. In addition, 5-year-olds are more likely to project principles from ani-
mate to inanimate objects than vice versa, which is in clear contradiction with
adult physical knowledge. Direct experience, verbal learning (including formal
education; Gelman, Hollander, Star, & Heyman, 2000), and even nonverbal
communication (Brooks, Hanauer, & Frye, 2001), contribute to the increasing
importance of top-down factors in causal reasoning (Amsel, Goodman, Savoie,
& Clark, 1996).
On the other hand, as noted earlier, general induction processes can be
innately biased to operate over certain categories of stimuli selected from the
flow of events. In that sense, the existence of content-specific knowledge is not
incompatible with general purpose induction principles. In fact, early innate
content knowledge can be useful in providing the kind of constraints on initial
causal hypotheses necessary for the application of such principles (i.e., both for
selecting causal candidates, and for controlling nontarget extraneous factors).
Finally, in parallel to the development of executive strategies, the causal
exploration of the world through intervention becomes more formal and sys-
tematic. Confronted with a complex causal scenario, in which several variables
can produce an effect, reasoners in their late childhood and early adolescence
systematically manipulate each potential factor across the values of the other
factors in order to isolate the individual and interactive influence of each of them
on the value of the outcome. Younger reasoners, in contrast, proceed in a more
random, arbitrary way (Inhelder & Piaget, 1958).
To recap, it is proposed here that an innate and unitary causal learning system
is not strictly necessary for the development of causal reasoning, although the
innate nature of learning facilitation mechanisms in some domains and per-
ceptual causation are not denied. A unitary regularity-detection learning system
can be regarded as the basis on which the primitives of causal reasoning are
built. However, that system by itself (even if enhanced by content-related
preparedness) cannot give rise to the emergent features of developing causal
reasoning abilities. Tentatively, we have proposed that intervention can play an
important role in acquiring the concept of causal power as something essentially
different from mere predictive validity. From a rational point of view (as noted in
an earlier section), the value of intervention relies on the fact that it changes the
structure of the causal scenario under scrutiny and ensures control, which means
that the biological facilitation of learning through intervention is indeed
rationally justified. Therefore, from this perspective causal reasoning abilities
emerge from the interaction between general-purpose learning mechanisms and
content-specific knowledge "modules" that enhance them. Further development
of episodic memory and executive function, verbal instruction, and the accrual of
a progressively richer and more refined base of causal knowledge are responsible
for causal reasoning development through childhood and adolescence.
We are aware that the evidence supporting the gradual and learning-based
emergence of the concept of causal power and causal induction principles we
have proposed here is still sparse, but so is the evidence against it. To date, the
existence of a general-purpose causal induction innate module is based exclu-
sively on the demonstration that young infants show causal induction abilities
that cannot be accounted for by basic associative learning. However, pre-
cociousness is not definitive proof of innateness. Three-year-olds already have
an extensive history of successes and failures in interacting with the world, and
their environment offers plenty of opportunities to learn that not all predictive
relations remain predictive when an intervention by oneself or another person is
introduced.
Our hypothesis is thus aimed at inspiring new research on the precise timing and
conditions for the development of general-purpose causal induction principles,
and on the role of intervention in the emergence of those principles.
A GLIMPSE AT THE WHOLE PICTURE
In the final section of this work, we intend to compose a general cognitive
architecture for causal reasoning based on the empirical evidence available to
date. That tentative architecture is summarised in Figure 2. The upper and lower
sections of the figure represent inputs to the processes involved in causal rea-
soning, whereas the middle part represents the internal representations generated
by those processes. This architecture does not aim to be exhaustive, and parts
of it are still underdefined (for instance, how do the different clues to causal
structure interact, and which of them takes precedence when two of them
contradict each other?). However, it serves to place the main current research
lines in relation to one another, and casts some light on the path for new
research efforts to follow.
Covariation computation is the central element of this set of processes, as it is
necessary both for causal structure inference and for causal strength estimation.
Here we will consider only learning from probabilistic evidence, although the
structure can be extended to deal with deterministic causes. That is possible
because specific top-down influences feed into the processes that select infor-
mation for covariation computation (grey arrow). As noted earlier, specific
knowledge on the nature of the mechanism in operation in the scenario under
scrutiny is necessary for taking a limited number of trials as representative of
general rules (as in the example of the alarm watch).
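As a purely illustrative sketch (our notation, not a commitment made by any of the cited models), the unconditional contingency ΔP over the four standard cell frequencies amounts to:

```python
def delta_p(a, b, c, d):
    """Unconditional contingency (Delta-P) from the four 2x2 cell frequencies:
    a: cause present, effect present    b: cause present, effect absent
    c: cause absent,  effect present    d: cause absent,  effect absent
    """
    p_effect_given_cause = a / (a + b)        # P(E | C)
    p_effect_given_no_cause = c / (c + d)     # P(E | ~C)
    return p_effect_given_cause - p_effect_given_no_cause

# A positive contingency: the effect is more frequent when the cause is present.
print(delta_p(16, 4, 4, 16))  # close to 0.6 (0.8 - 0.2)
```

Everything downstream in the architecture, from structure decisions to strength estimation, can consume this single quantity, which is why covariation computation occupies the centre of Figure 2.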
Most studies on covariation computation from noisy information have tried to
provide evidence to solve the controversy between rule-based and associative
models. To date, the most convincing results in favour of the associative
approach are those reported by Dickinson and his collaborators (Aitken et al.,
2000, 2001; Dickinson & Burke, 1996; Larkin et al., 1998). In this series of
studies, participants were first trained with a compound (AB+), and then with
one of the elements of the compound (A+ or A−). Judgements on B were
shown to be affected by the elemental phase (retrospective revaluation) only
when the two elements of the compound (A and B) had been consistently
associated with each other in the first phase of the study. This result is predicted
by associative theories that claim that the representation of the target cue (B)
needs to be associatively activated by the competing cue (A) during the second
phase of the task for retrospective revaluation effects to take place.
Figure 2. General architecture of causal reasoning. The upper and lower sectors stand for the
informational inputs to the causal reasoning process. The middle sector represents the intermediate
products (representations) of those processes. Arrows represent information transformations and
modulations.
This interpretation, however, has been recently challenged (Beckers, De Houwer, &
Miller, 2004). According to this second interpretation, the complexity of the
designs used in these studies prevents reasoners from remembering the relevant
trials presented in the first stage of the task, and thus from computing covariation
conditionally (which is necessary for retrospective revaluation to occur).
On the other hand, evidence from simpler situations clearly favours a rule-
based account of covariation estimation. As noted in an earlier
section, a recent meta-analysis of the most relevant studies in which the trial-by-
trial causal learning procedure has been used (Perales et al., 2005) shows that a
simple rule integrating the frequencies of the four trial types in a weighted
manner (Busemeyer, 1991) is, globally, the most predictive of the models
proposed to date (see also White, 2003b). In addition, a number of studies have
shown that reasoners' inferences on the maximality of the outcome, on the
additivity of the effects of the competing cues, and on the presence or absence of
the blocked cue during the first stage of a blocking design (De Houwer, 2002;
De Houwer & Beckers, 2003; De Houwer et al., 2002) modulate cue competition
effects.
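The weighted-rule idea can be sketched as follows; the weights here are purely illustrative (the fitted models estimate them freely, typically recovering the ordering wa > wb > wc > wd):

```python
def weighted_evidence(a, b, c, d, w=(1.0, 0.8, 0.6, 0.4)):
    """Weighted linear integration of the four trial-type frequencies.
    Cells a and d count as evidence for a causal link, b and c as evidence
    against it; the weight ordering reflects the differential impact of the
    four cells that judgement studies typically report."""
    wa, wb, wc, wd = w
    evidence = wa * a - wb * b - wc * c + wd * d
    total = wa * a + wb * b + wc * c + wd * d
    return evidence / total  # normalised to the -1..+1 judgement range

print(round(weighted_evidence(16, 4, 4, 16), 3))  # 0.6 for this table
```

With equal weights the rule reduces to a simple difference of confirmatory and disconfirmatory frequencies; the unequal weights are what let it capture the cell-impact asymmetries found empirically.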
Taken together, the available evidence seems to demonstrate the involvement
of effortful, deliberate inferential reasoning processes in covariation estimation
in causal reasoning tasks, which is much more compatible with a rule-based
account of causal learning than with associative models. In all of these studies,
however, the main dependent variable has been a global predictive or causal
judgement. The interpretation of results is not that clear in studies in which other
responses were collected. For instance, a recent unpublished study by Allan,
Siegel, and Tangen (2004) showed that trial-by-trial outcome
expectancy measures, analysed using uncontaminated signal detection theory
indices (d′ and a bias index), did not show the biases (i.e., the outcome-density
and cue-density effects) that have been consistently reported in experiments
in which global predictive and causal judgements were collected.
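For readers unfamiliar with these indices, the standard equal-variance signal detection decomposition separates discriminability from response bias as below (our sketch; we use the criterion c as the bias index, which is one common choice, not necessarily the one Allan et al. reported):

```python
from statistics import NormalDist

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Equal-variance signal detection indices for trial-by-trial predictions.
    d' indexes sensitivity (how well outcome occurrence is discriminated);
    c indexes response bias. Density manipulations that inflate judgements
    without improving discrimination should move c while leaving d' intact."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

d, c = sdt_indices(hits=40, misses=10, false_alarms=10, correct_rejections=40)
print(round(d, 2))  # about 1.68, with no response bias in this example
```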
Although evidence is still only partial and preliminary, it already shows that
some of the effects that have been interpreted as contrary to an associative
account of causal learning do not take place during learning, but during the
elaboration of a judgement. Evidence accrual during training can be regarded as
a mostly automatic, data-driven, and probably associative process, whereas
computation processes can be responsible for information integration when a
judgement is required (Matute, Vegas, & De Marez, 2002). Therefore, associa-
tive and rule-based mechanisms could well define two different hierarchically
related levels in the set of mechanisms computing covariation. According to our
proposal, the formation of the categories representing the four trial types, and the
computation of their relative frequencies in the task (a, b, c, and d; necessary for
the application of a rule), reflect the operation of an associative mechanism.
In other words, an associative mechanism could be responsible for forming and
strengthening the representations of the four trial types and those associative
representations can be subsequently used to compute a composite value of
perceived contingency. Evidence on the associative nature of frequency esti-
mations can arise from studies (such as the ones we are currently carrying out in
our laboratory) showing that manipulating the degree of attention to the cause or the
effect in causal learning tasks can differentially alter the frequency estimations
of the four trial types (a, b, c, and d).
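The bookkeeping this two-level proposal assumes, an associative stage that categorises each trial and accumulates the four frequencies before any rule applies, amounts to something like the following (our own sketch):

```python
from collections import Counter

def trial_type_frequencies(trials):
    """Map each (cause, effect) observation onto its standard cell label and
    accumulate counts: a = C&E, b = C&~E, c = ~C&E, d = ~C&~E."""
    label = {(1, 1): "a", (1, 0): "b", (0, 1): "c", (0, 0): "d"}
    counts = Counter(label[pair] for pair in trials)
    return [counts[cell] for cell in "abcd"]

stream = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0)]
print(trial_type_frequencies(stream))  # [2, 1, 1, 2]
```

On our proposal, attentional manipulations would distort these counts before any rule-based computation takes place, which is exactly what the frequency-estimation studies described above are designed to detect.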
In this architecture, all processing is considered to be bottom-up until the
trial-type frequency estimation level. From that level on, further processing can
be influenced by previous knowledge in a top-down manner. However, as noted
earlier, top-down influences imply reorganising the information represented at
this level, and are costly in terms of attentional and executive load. In more
specific terms, once the frequencies of the different trial types have been
computed, it is necessary to determine which of them are considered relevant for
contingency estimation, in accordance with general inferential principles
specifying that covariation must be computed conditionally (Spellman, 1996;
White, 2002), and ceiling effect situations must be avoided (Wu & Cheng,
1999). Only sets of instances in which certain factors are constantly present or
absent and the base rate of the effect is submaximal must be taken into account.
At this level, content-specific knowledge has a role in signalling which factors in
the environment need to be controlled, and may determine which events in the task
are potential causes and which of them are effects (Waldmann & Holyoak, 1992).
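These two constraints, conditionalising on constant values of other factors and discarding strata at ceiling, can be made concrete in a short sketch (our own formalisation, with hypothetical factor names C, E, and K):

```python
def conditional_delta_p(trials, cause, effect, control):
    """Delta-P for cause -> effect, computed separately within each stratum in
    which the control factor is held constant. Strata where the effect occurs
    on every cause-absent trial are skipped: contingency is uninformative at
    ceiling. Trials are dicts with 0/1 entries per factor."""
    estimates = {}
    for level in {t[control] for t in trials}:
        stratum = [t for t in trials if t[control] == level]
        present = [t[effect] for t in stratum if t[cause] == 1]
        absent = [t[effect] for t in stratum if t[cause] == 0]
        if not present or not absent:
            continue  # cause not varied within this stratum: uninformative
        if all(absent):
            continue  # effect base rate is maximal: ceiling, skip stratum
        estimates[level] = sum(present) / len(present) - sum(absent) / len(absent)
    return estimates

trials = (
    [{"C": 1, "E": 1, "K": 0}] * 8 + [{"C": 1, "E": 0, "K": 0}] * 2 +
    [{"C": 0, "E": 1, "K": 0}] * 2 + [{"C": 0, "E": 0, "K": 0}] * 8 +
    [{"C": 1, "E": 1, "K": 1}] * 5 + [{"C": 0, "E": 1, "K": 1}] * 5
)
print(conditional_delta_p(trials, "C", "E", "K"))  # only the K=1 ceiling stratum is excluded
```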
For the sake of parsimony, we have also assumed that, independently of how
covariation between a potential cause and an effect is computed, the result of that
process is equivalently available for further causal processing. On the one hand,
the reasoner needs to determine whether that covariation is different from zero
(which depends not only on its absolute value, but also on the number of
instances and the reliability of the source from which the information stems). In
case the covariation is considered reliable, it still has to be decided whether it
results from the existence of a causal link or is spurious. And, on the
other hand, if the covariation is interpreted as causal, its specific value can be
used to directly or indirectly determine the strength of the causal relationship.
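The first of these decisions, whether the covariation is reliably different from zero, depends on sample size as well as on the covariation's absolute value. A normative stand-in (the paper makes no claim that reasoners compute this statistic) is the Pearson chi-square for the 2×2 table:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 cause/effect table. The same Delta-P
    yields a larger statistic (more 'reliable' covariation) with more data."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Identical Delta-P (0.2) but a tenfold difference in sample size:
small = chi_square_2x2(6, 4, 4, 6)       # n = 20
large = chi_square_2x2(60, 40, 40, 60)   # n = 200
print(small < 3.84 < large)  # True: only the larger sample clears alpha = .05
```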
Several studies have shown that naïve reasoners intuitively discriminate
covariation from causation (Fugelsang & Thompson, 2003, Exp. 2; Wu &
Cheng, 1999). According to our proposal, for covariation to be interpreted as
causal, it has to be combined with the available information on order and
contiguity and with previous categorical knowledge on the plausibility of a causal
mechanism linking the candidate cause and the effect. As noted in an earlier
section, previous causal knowledge can also modulate the use of time and order
as informative clues to causal structure. Additionally, it must also be considered
whether the covariation has emerged from an intervention or just from an
observational situation.
For purely illustrative reasons, knowledge on causal categories/mechanisms
has been represented separately from the evaluated causal link in Figure 2;
however, links between causal categories and causal links between single events
have, from our point of view, a common representational basis. The bidirec-
tional relation between the two levels stands for the incremental loop of
knowledge acquisition described previously: causal categories modulate the
acquisition of new causal knowledge from covariation between specific events;
but, simultaneously, causal knowledge acquired in this way contributes to
refining causal categories. In this framework, a general causal mechanism and a
specific causal link are not necessarily two different types of representation, but
just two propositions of essentially the same nature linking concepts at different
abstraction levels.
To conclude, our proposal is strongly based on the available evidence
(thoroughly reported in the first two sections of this work), but new predictions
can also be experimentally tested. For example, it has been proposed that
expectancy measures and, consequently, trial type frequencies, are insensitive to
top-down influences. A new paradigm under development in our laboratory
allows us to analyse reaction times to online expectancy questions, and to
identify the factors that influence those reaction times. Similarly, trial-type
estimations can also be analysed to test whether or not they show any sign of
underlying associative processes.
In addition, the proposed architecture has a clear hierarchical structure, which
implies that the logic of dissociations–associations can be applied to test it.
Specifically, any factor influencing the lower part of the system (conditioned
expectancies) is expected to also influence the levels above it, whereas it is in
principle possible to find factors that influence the upper levels of the archi-
tecture without having any effect on the levels below them. For example, it can
be tested whether the biases usually detected in contingency estimations (e.g.,
cue- and outcome-density biases) are also shown by trial-by-trial predictive
accuracy measures and trial-type frequency estimations.
And finally, we have only listed the different sources of information
reasoners use to make structural decisions (time and order, manipulation, con-
tingency and control, and previous knowledge; at the top of Figure 2), but little
is known about how they interact, and when one takes precedence over another
when they point in different directions. The most exciting studies in the field at
the moment are those in which the traditional causal learning paradigm is being
abandoned or reformulated, and new tasks are proposed to study the importance
of these systematically neglected factors.
FINAL REMARKS
The proposed general architecture for causal reasoning is based on the idea that
(1) associative- and rule-based mechanisms are just descriptions of different
hierarchical levels in the system responsible for covariation computation, and
(2) covariation estimates are integrated with other clues present in the
environment, and with pre-stored general-purpose and domain-specific know-
ledge in order to decide whether or not the observed covariations signal the
presence of hidden causal links. This architecture neither optimises the use of
all the available information nor strictly conforms to rational normativity
criteria, but it is highly adaptive in adjusting the reasoner's knowledge and
behaviour to the actual causal texture of the world.
Nonhuman animals and newborns are probably equipped with only part of
this architecture. Basic learning mechanisms ensure a certain level of adaptation
to the causal structure of the environment, as long as they incorporate cue
competition principles that avoid confounding causal factors and noncausal
predictors, and are sensitive to the order and timing of events in the world.
Additionally, we also postulate that interventions are driven by stimuli that have
been associatively signalled to be potential causes. By means of these
mechanisms, and the possibility of learning from the patterns emerging from
intervention, the basic distinction between effective causes and other types of
correlates can be learnt. Subsequently, content-independent principles of causal
reasoning can be discriminatively learnt, even in young children, from the
monitoring of those patterns of dependence/independence resulting from their
own responses.
The more evolved parts of the proposed architecture are dependent on
declarative and episodic memory, and on executive processes, and are therefore
assumed to appear in parallel to the development of those cognitive systems.
The final system is a complex set of interrelated processes in charge of gen-
erating and manipulating the different types of representations needed to build
an interrelated set of causal propositions that can be used further to predict,
manipulate, and understand the events in the world.
Original manuscript received June 2004
Revised manuscript received October 2004
PrEview proof published online October 2005
REFERENCES
Ahn, W. K., & Bailenson, J. (1996). Causal attribution as a search for underlying mechanisms: An
explanation of the conjunction fallacy and the discounting principle. Cognitive Psychology, 31,
82–123.
Ahn, W. K., & Kalish, C. W. (2000). The role of mechanism beliefs in causal reasoning. In F. C. Keil
& R. A. Wilson (Eds.), Explanation and cognition (pp. 199–226). Cambridge, MA: MIT Press.
Aitken, M. R. F., Larkin, M. J. W., & Dickinson, A. (2000). Super-learning of causal judgements.
Quarterly Journal of Experimental Psychology, 53B, 59–81.
Aitken, M. R. F., Larkin, M. J. W., & Dickinson, A. (2001). Re-examination of the role of within-
compound associations in the retrospective revaluation of causal judgements. Quarterly Journal
of Experimental Psychology, 54B, 27–51.
Allan, L. (2003). Assessing Power-PC. Learning & Behavior, 31, 192–204.
Allan, L. G. (1993). Human contingency judgments: Rule based or associative? Psychological
Bulletin, 114, 435–448.
Allan, L. G., Siegel, S., & Tangen, J. M. (2004, May). A signal detection analysis of contingency
data. Communication presented at the First Special Interest Meeting on Human Contingency
Learning, Le Lignely, Belgium.
Amsel, E., Goodman, G., Savoie, D., & Clark, M. (1996). The development of reasoning about
causal and noncausal influences on levers. Child Development, 67, 1624–1646.
Anderson, J. R., & Sheu, C. (1995). Causal inferences as perceptual judgments. Memory and
Cognition, 23, 510–524.
Barsalou, L. W., Sloman, S. A., & Chaigneau, S. E. (2002). The HIPE theory of function. In L.
Carlson & E. van der Zee (Eds.), Representing functional features for language and space:
Insights from perception, categorization, and development. New York: Oxford University
Press.
Beckers, T., De Houwer, J., & Miller, R. R. (2004, May). Outcome additivity and outcome max-
imality as independent modulators of blocking in human and rat causal learning. Communication
presented at the First Special Interest Meeting on Human Contingency Learning, Le Lignely,
Belgium.
Brase, G. L., Cosmides, L., & Tooby, J. (1998). Individuation, counting, and statistical inference:
The role of frequency and whole-object representations in judgment under uncertainty. Journal of
Experimental Psychology: General, 127, 3–21.
Brooks, P. J., Hanauer, J. B., & Frye, D. (2001). Training 3-year-olds in rule-based causal reasoning.
British Journal of Developmental Psychology, 19, 573–595.
Buehner, M. J., & Cheng, P. W. (1997). Causal induction: The Power PC theory versus the Rescorla–
Wagner model. In M. G. Shafto & P. Langley (Eds.), Proceedings of the 19th annual conference
of the Cognitive Science Society (pp. 55–60). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Buehner, M. J., Cheng, P. W., & Clifford, D. (2003). From covariation to causation: A test of the
assumption of causal power. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 29, 1119–1140.
Buehner, M. J., & May, J. (2003). Rethinking temporal contiguity and the judgement of causality:
Effects of prior knowledge, experience, and reinforcement procedure. Quarterly Journal of
Experimental Psychology, 56A, 865–890.
Busemeyer, J. R. (1991). Intuitive statistical estimation. In N. H. Anderson (Ed.), Contributions to
information integration theory (pp. 187–215). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Busemeyer, J. R., Byun, E., Delosh, E. L., & McDaniel, M. A. (1997). Learning functional relations
based on experience with input–output pairs by humans and artificial neural networks. In K.
Lamberts & D. R. Shanks (Eds.), Knowledge, concepts and categories: Studies in cognition (pp.
405–437). Cambridge, MA: MIT Press.
Byrne, R. M. J. (1997). Cognitive processes in counterfactual thinking about what might have been.
In The psychology of learning and motivation: Advances in research and theory (Vol. 37, pp.
105–154). San Diego, CA: Academic Press.
Carey, S. (2001). On the very possibility of discontinuities in conceptual development. In E. Dupoux
(Ed.), Language, brain, and cognitive development: Essays in honor of Jacques Mehler (pp.
303–324). Cambridge, MA: MIT Press.
Catena, A., Maldonado, A., & Cándido, A. (1998). The effect of frequency of judgement and the type
of trials on covariation learning. Journal of Experimental Psychology: Human Perception and
Performance, 24, 481–495.
Chapman, G. B., & Robbins, S. J. (1990). Cue interaction in human contingency judgment. Memory
and Cognition, 18, 537–545.
Chatlosh, D. L., Neunaber, D. J., & Wasserman, E. A. (1985). Response–outcome contingency:
Behavioral and judgmental effects of appetitive and aversive outcomes with college students.
Learning and Motivation, 16, 1–34.
Cheng, P. W. (1993). Separating causal laws from casual facts: Pressing the limits of statistical
relevance. In D. L. Medin (Ed.), Advances in research and theory: Vol. 30. The psychology of
learning and motivation (pp. 215–264). San Diego, CA: Academic Press.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review,
104, 367–405.
Cheng, P. W. (2000). Causality in the mind: Estimating contextual and conjunctive power. In F. C.
Keil & R. A. Wilson (Eds.), Explanation and cognition (pp. 227–253). Cambridge, MA: MIT
Press.
Cheng, P. W., & Holyoak, K. J. (1995). Complex adaptive systems as intuitive statisticians: Caus-
ality, contingency, and prediction. In H. L. Roitblat & J. A. Meyer (Eds.), Comparative
approaches to cognitive science: Complex adaptive systems (pp. 271–302). Cambridge, MA:
MIT Press.
Cheng, P. W., & Lien, Y. (1995). The role of coherence in differentiating genuine from spurious
causes. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidisciplinary
debate (pp. 463–494). New York: Clarendon Press/Oxford University Press.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological
Review, 99, 365–382.
Cobos, P. L., López, F. J., Cano, A., Almaraz, J., & Shanks, D. R. (2002). Mechanisms of predictive
and diagnostic causal induction. Journal of Experimental Psychology: Animal Behavior Pro-
cesses, 28, 331–346.
Cohen, L. B., & Oakes, L. M. (1993). How infants perceive a simple causal event. Developmental
Psychology, 29, 421–433.
Cohen, L. B., Rundell, L. J., Spellman, B. A., & Cashon, C. H. (1999). Infants' perception of causal
chains. Psychological Science, 10, 412–418.
Cummins, D., Cummins, R., & Poirier, P. (2003). Cognitive evolutionary psychology without
representational nativism. Journal of Experimental and Theoretical Artificial Intelligence, 15,
143–159.
Cummins, D. D. (1995). Naive theories of causal deduction. Memory and Cognition, 23, 646–685.
Cummins, D. D. (1998). The pragmatics of causal inference. In M. A. Gernsbacher & S. J. Derry
(Eds.), Proceedings of the 20th annual conference of the Cognitive Science Society (pp. 9–14).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Damianopoulos, E. N. (1967). S–R contiguity and delay of reinforcement as critical parameters in
classical aversive conditioning. Psychological Review, 74, 420–427.
Danks, D. J. (2002). The epistemology of causal judgment. Dissertation Abstracts International:
Section A. Humanities and Social Sciences, 63, 212.
De Houwer, J. (2002). Forward blocking depends on retrospective inferences about the presence of
the blocked cue during the elemental phase. Memory and Cognition, 30, 24–33.
De Houwer, J., & Beckers, T. (2002). A review of recent developments in research and theories on
human contingency learning. Quarterly Journal of Experimental Psychology, 55B, 289–310.
De Houwer, J., & Beckers, T. (2003). Secondary task difficulty modulates forward blocking in human
contingency learning. Quarterly Journal of Experimental Psychology, 56B, 345–357.
De Houwer, J., Beckers, T., & Glautier, S. (2002). Outcome and cue properties modulate blocking.
Quarterly Journal of Experimental Psychology, 55A, 965–985.
Dellarosa-Cummins, D., & Cummins, R. (1999). Biological preparedness and evolutionary expla-
nation. Cognition, 73, B37–B53.
Dickinson, A., & Burke, J. (1996). Within-compound associations mediate the retrospective reva-
luation of causality judgements. Quarterly Journal of Experimental Psychology, 49B, 60–80.
Duchaine, B., Cosmides, L., & Tooby, J. (2001). Evolutionary psychology and the brain. Current
Opinion in Neurobiology, 11, 225–230.
Fugelsang, J. A., & Thompson, V. A. (2003). A dual process model of belief and evidence inter-
actions in causal reasoning. Memory and Cognition, 31, 800–815.
Gallistel, C. R. (2002). Frequency, contingency and the information processing theory of con-
ditioning. In P. Sedlmeier (Ed.), Frequency processing and cognition (pp. 153–171). London:
Oxford University Press.
Gallistel, C. R., & Gibbon, J. (2000). Time, rate, and conditioning. Psychological Review, 107,
289–344.
Gelman, S. A., Hollander, M., Star, J., & Heyman, G. D. (2000). The role of language in the
construction of kinds. In D. Medin (Ed.), Advances in research and theory: Vol. 39. The psy-
chology of learning and motivation (pp. 201–263). San Diego, CA: Academic Press.
German, T. P., & Nichols, S. (2003). Children's counterfactual inferences about long and short causal
chains. Developmental Science, 6, 514–523.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction:
Frequency formats. Psychological Review, 102, 684–704.
Gigerenzer, G., & Todd, P. M. (1999). Simple heuristics that make us smart. New York: Oxford
University Press.
Glymour, C. (1998). Learning causes: Psychological explanations of causal explanation. Minds and
Machines, 8, 39–60.
Glymour, C., Scheines, R., Spirtes, P., & Kelly, K. (1987). Discovering causal structure. London:
Academic Press.
Goldvarg, E., & Johnson-Laird, P. N. (2001). Naïve causality: A mental model theory of causal
meaning and reasoning. Cognitive Science, 25, 565–610.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of
causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3–32.
Gopnik, A., Sobel, D. M., Schulz, L. E., & Glymour, C. (2001). Causal learning mechanisms in very
young children: Two-, three-, and four-year-olds infer causal relations from patterns of variation
and covariation. Developmental Psychology, 37, 620–629.
Haggard, P., Clark, S., & Kalogeras, J. (2002). Voluntary action and conscious awareness. Nature
Neuroscience, 5, 382–385.
Hagmayer, Y., & Waldmann, M. R. (2002). How temporal assumptions influence causal judgments.
Memory and Cognition, 30, 1128–1137.
Harré, R., & Madden, E. H. (1975). Causal powers. Oxford, UK: Blackwell.
Hart, H. L., & Honoré, A. M. (1985). Causation in the law (2nd ed.). Oxford, UK: Clarendon Press.
(Original work published 1959.)
Heyman, G. D., Phillips, A. T., & Gelman, S. A. (2003). Children's reasoning about physics within
and across ontological kinds. Cognition, 89, 43–61.
Hume, D. (1978). A treatise of human nature. Oxford, UK: Oxford University Press. (Original work
published 1739.)
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to adolescence: An
essay on the construction of formal operational structures. Oxford, UK: Basic Books.
Jones, J. E. (1962). Contiguity and reinforcement in relation to CS–UCS intervals in classical
aversive conditioning. Psychological Review, 69, 176–186.
Keil, F. C. (1995). The growth of causal understandings of natural kinds. In D. Sperber, D. Premack,
& A. J. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 234–262). New York:
Clarendon Press/Oxford University Press.
Kelley, H. H. (1967). Attribution theory in social psychology. Nebraska Symposium on Motivation,
15, 192–238.
Kirkpatrick, K., & Church, R. M. (1998). Are separate theories of conditioning and timing necessary?
Behavioural Processes, 44, 163–182.
Koslowski, B., & Masnick, A. (2002). The development of causal reasoning. In U. Goswami (Ed.),
Blackwell handbook of childhood cognitive development. Malden, MA: Blackwell.
Lagnado, D., & Sloman, S. A. (2002). Learning causal structure. In Proceedings of the 24th annual
conference of the Cognitive Science Society (pp. 560–565). Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.
316 PERALES AND CATENA
Lagnado, D., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 856–876.
Larkin, M. J. W., Aitken, M. R. F., & Dickinson, A. (1998). Retrospective revaluation of causal judgments under positive and negative contingencies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1331–1352.
Leslie, A. M., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25, 265–288.
Levin, I. P., Wasserman, E. A., & Kao, S. F. (1993). Multiple methods of examining biased information use in contingency judgments. Organizational Behavior and Human Decision Processes, 55, 228–250.
Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87–137.
Lober, K., & Shanks, D. R. (2000). Is causal induction based on causal power? Critique of Cheng (1997). Psychological Review, 107, 195–212.
Lovibond, P. F. (1983). Facilitation of instrumental behavior by a Pavlovian appetitive conditioned stimulus. Journal of Experimental Psychology: Animal Behavior Processes, 9, 225–247.
Mackie, J. L. (1974). The cement of the universe: A study of causation. Oxford, UK: Oxford University Press.
Maldonado, A., Catena, A., Candido, A., & Garcia, I. (1999). The belief revision model: Asymmetrical effects of noncontingency on human covariation learning. Animal Learning and Behavior, 27, 168–180.
Mandel, D. R., & Lehman, D. R. (1998). Integration of contingency information in judgments of cause, covariation, and probability. Journal of Experimental Psychology: General, 127, 269–285.
Matute, H., Vegas, S., & de Márez, P. J. (2002). Flexible use of recent information in causal and predictive judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 714–725.
Michotte, A. (1963). The perception of causality. Oxford, UK: Basic Books.
Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla–Wagner model. Psychological Bulletin, 117, 363–386.
Mowrer, R. R., & Klein, S. B. (Eds.). (2001). Handbook of contemporary learning theories. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Newsome, G. L. (2003). The debate between current versions of covariation and mechanism approaches to causal inference. Philosophical Psychology, 16, 87–107.
Oakes, L. M. (1994). Development of infants' use of continuity cues in their perception of causality. Developmental Psychology, 30, 869–870.
Öhman, A., & Mineka, S. (2003). The malicious serpent: Snakes as a prototypical stimulus for an evolved module of fear. Current Directions in Psychological Science, 12, 5–9.
Pearce, J. M. (1987). A model for stimulus generalization in Pavlovian conditioning. Psychological Review, 94, 61–73.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, reasoning, and inference. New York: Cambridge University Press.
Perales, J. C., Catena, A., & Maldonado, A. (2004). Inferring non-observed correlations from causal scenarios: The role of causal knowledge. Learning and Motivation, 35, 115–135.
Perales, J. C., & Shanks, D. R. (2003). Normative and descriptive accounts of the influence of power and contingency on causal judgement. Quarterly Journal of Experimental Psychology, 56A, 977–1007.
Perales, J. C., Shanks, D. R., & Castro, L. (2005). Formal models of causal learning: A review and synthesis. Manuscript submitted for publication.
Perner, J. (2001). Episodic memory: Essential distinctions and developmental implications. In C. Moore & K. Lemmon (Eds.), The self in time: Developmental perspectives (pp. 181–202). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Piaget, J. (1969). The child's conception of physical causality. Totowa, NJ: Littlefield, Adams & Co. (Original work published 1927.)
Rakison, D. H., & Oakes, L. M. (Eds.). (2003). Early category and concept development: Making sense of the blooming, buzzing confusion. London: Oxford University Press.
Reichenbach, H. (1956). The direction of time. Berkeley, CA: University of California Press.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research (pp. 64–99). New York: Appleton-Century-Crofts.
Roese, N. J. (1994). The functional basis of counterfactual thinking. Journal of Personality and Social Psychology, 66, 805–818.
Savastano, H. I., & Miller, R. R. (1998). Time as content in Pavlovian conditioning. Behavioural Processes, 44, 147–162.
Schafe, G., & Bernstein, I. (1996). Taste aversion learning. In E. Capaldi (Ed.), Why we eat what we eat: The psychology of eating (pp. 31–51). Washington, DC: American Psychological Association.
Scheines, R., Spirtes, P., Glymour, C., Meek, C., & Richardson, T. (1998). The TETRAD project: Constraint based aids to causal model specification. Multivariate Behavioral Research, 33, 65–117.
Schlottmann, A. (1999). Seeing it happen and knowing how it works: How children understand the relation between perceptual causality and knowledge of underlying mechanism. Developmental Psychology, 35, 303–317.
Schlottmann, A. (2000). Is perception of causality modular? Trends in Cognitive Sciences, 4, 441–442.
Schlottmann, A., Allen, D., Linderoth, C., & Hesketh, S. (2002). Perceptual causality in children. Child Development, 73, 1656–1677.
Scholl, B. J., & Leslie, A. M. (1999). Modularity, development and "theory of mind". Mind and Language, 14, 131–153.
Scholl, B. J., & Tremoulet, P. D. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4, 299–309.
Schultz, T. R. (1982). Rules of causal attribution. Monographs of the Society for Research in Child Development, 47(1, Serial No. 194).
Schulz, L. E., & Gopnik, A. (2004). Causal learning across domains. Developmental Psychology, 40, 162–176.
Seligman, M. E. (1971). Phobias and preparedness. Behavior Therapy, 2, 307–320.
Shanks, D. (1995). The psychology of associative learning. New York: Cambridge University Press.
Shanks, D. R., & Dickinson, A. (1991). Instrumental judgment and performance under variations in action–outcome contingency and contiguity. Memory and Cognition, 19, 353–360.
Shanks, D. R., Holyoak, K., & Medin, D. L. (Eds.). (1996). Causal learning. San Diego, CA: Academic Press.
Shanks, D. R., & López, F. J. (1996). Causal order does not affect cue selection in human associative learning. Memory and Cognition, 24, 511–522.
Siegel, S., & Allan, L. G. (1996). The widespread influence of the Rescorla–Wagner model. Psychonomic Bulletin and Review, 3, 314–321.
Siegler, R. S. (1983). How knowledge influences learning. American Scientist, 71, 631–638.
Sloman, S. A., & Lagnado, D. (2004). Causal invariance in reasoning and learning. In B. Ross (Ed.), The psychology of learning and motivation, Vol. 44 (pp. 287–325). San Diego, CA: Academic Press.
Sloman, S., & Lagnado, D. (2005). Do we "do"? Cognitive Science, 29, 5–39.
Spelke, E. S., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge. Psychological Review, 99, 605–632.
Spellman, B. A. (1996). Conditionalizing causality. In D. R. Shanks & K. Holyoak (Eds.), Causal learning (pp. 167–206). San Diego, CA: Academic Press.
Sperber, D., Premack, D., & Premack, A. J. (Eds.). (1995). Causal cognition: A multidisciplinary debate. New York: Clarendon Press/Oxford University Press.
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search. Cambridge, MA: MIT Press.
Steyvers, M., Tenenbaum, J., Wagenmakers, E. J., & Blum, B. (2003). Inferring causal networks from observations and interventions. Cognitive Science, 27, 453–489.
Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland.
Tenenbaum, J. B., & Griffiths, T. L. (2001). Structure learning in human causal induction. In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, Vol. 13. Cambridge, MA: MIT Press.
Terry, W. S., & Wagner, A. R. (1975). Short-term memory for "surprising" versus "expected" unconditioned stimuli in Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 1, 122–133.
Vallée-Tourangeau, F., Murphy, R. A., Drew, S., & Baker, A. G. (1998). Judging the importance of constant and variable candidate causes: A test of the Power-PC theory. Quarterly Journal of Experimental Psychology, 51A, 65–84.
Van Hamme, L. J., & Wasserman, E. A. (1994). Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements. Learning and Motivation, 25, 127–151.
Vosniadou, S. (1994). Capturing and modeling the process of conceptual change. Learning and Instruction, 4, 45–69.
Vosniadou, S., & Brewer, W. F. (1992). Mental models of the earth: A study of conceptual change in childhood. Cognitive Psychology, 24, 535–585.
Waldmann, M. R. (2000). Competition among causes but not effects in predictive and diagnostic learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 53–76.
Waldmann, M. R. (2001). Predictive versus diagnostic causal learning: Evidence from an overshadowing paradigm. Psychonomic Bulletin and Review, 8, 600–608.
Waldmann, M. R., & Hagmayer, Y. (2001). Estimating causal strength: The role of structural knowledge and processing effort. Cognition, 82, 27–58.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.
Waldmann, M. R., & Holyoak, K. J. (1997). Determining whether causal order affects cue selection in human contingency learning: Comments on Shanks and López (1996). Memory and Cognition, 25, 125–134.
Waldmann, M., & Martignon, L. (1998). A Bayesian network model of causal learning. In M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the 20th annual conference of the Cognitive Science Society (pp. 1102–1107). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Walsh, C. R., & Byrne, R. M. J. (2004). Counterfactual thinking: The temporal order effect. Memory and Cognition, 32, 369–378.
Wasserman, E. A., Chatlosh, D. L., & Neunaber, D. J. (1983). Perception of causal relations in humans: Factors affecting judgments of response–outcome contingencies under free-operant procedures. Learning and Motivation, 14, 406–432.
Wasserman, E. A., Kao, S. F., Van Hamme, L., Katagiri, M., & Young, M. E. (1996). Causation and association. In D. R. Shanks & K. Holyoak (Eds.), Causal learning (pp. 207–264). San Diego, CA: Academic Press.
White, P. A. (1995). Use of prior beliefs in the assignment of causal roles: Causal powers versus regularity-based accounts. Memory and Cognition, 23, 243–254.
White, P. A. (2000). Causal attribution and Mill's methods of experimental inquiry: Past, present, and prospect. British Journal of Social Psychology, 39, 429–447.
White, P. A. (2001). Causal judgments about relations between multilevel variables. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 499–513.
White, P. A. (2002). Causal judgement from contingency information: Judging interactions between two causal candidates. Quarterly Journal of Experimental Psychology, 55A, 819–838.
White, P. A. (2003a). Causal judgement as evaluation of evidence: The use of confirmatory and disconfirmatory information. Quarterly Journal of Experimental Psychology, 56A, 491–513.
White, P. A. (2003b). Effects of wording and stimulus format on the use of contingency information in causal judgment. Memory and Cognition, 31, 231–242.
White, P. A. (2003c). Making causal judgments from the proportion of confirming instances: The pCI rule. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 710–727.
Wimer, S., & Kelley, H. H. (1982). An investigation of the dimensions of causal attribution. Journal of Personality and Social Psychology, 43, 1142–1162.
Wu, M., & Cheng, P. W. (1999). Why causation need not follow from statistical association: Boundary conditions for the evaluation of generative and preventive causal powers. Psychological Science, 10, 92–97.