Human causal induction: A glimpse at the whole picture
European Journal of Cognitive Psychology
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/pecp20

Human causal induction: A glimpse at the whole picture
José Perales & Andrés Catena, Universidad de Granada, Spain
Published online: 10 Sep 2010.

To cite this article: José Perales & Andrés Catena (2006). Human causal induction: A glimpse at the whole picture. European Journal of Cognitive Psychology, 18(2), 277-320. DOI: 10.1080/09541440540000167
To link to this article: http://dx.doi.org/10.1080/09541440540000167
Human causal induction:
A glimpse at the whole picture
José C. Perales and Andrés Catena
Universidad de Granada, Spain
In the present work, the most relevant evidence in the causal learning literature is reviewed and a general cognitive architecture based on the available corpus of experimental data is proposed. However, contrary to algorithms formulated in the Bayesian nets framework, this architecture is not assumed to optimise the usefulness of the available information in order to induce the underlying causal structure as a whole. Instead, human reasoners seem to rely heavily on local clues and previous knowledge to discriminate between spurious and truly causal covariations, and piece those relations together only when they are demanded to do so. Bayesian networks and AI algorithms for causal inference are nonetheless considered valuable tools to identify the main computational goals of causal induction processes and to define the problems any intelligent causal inference system must solve.
Causal induction is one of the pillars of intelligent behaviour. As pointed out by
Newsome (2003), except for domains such as logic and mathematics, causal
knowledge is a prerequisite for effective reasoning and problem solving, and
allows humans to manipulate, predict, and understand the world in an adaptive
way. However, the field still lacks a unifying framework to integrate the
available experimental evidence. This work is intended to contribute to clarify
the goals of research on causal induction, the extent to which the data available
to date have contributed to achieve those goals, and how new lines of research
could help to fill the gaps in our corpus of knowledge.
The plan of the paper is as follows. First, we define the computational goals
of causal induction. In order to do so, we will present an introduction to causal
Bayes' nets (for detailed reviews, see Pearl, 2000; Spirtes, Glymour, &
Scheines, 2000), a normative framework born in the field of Artificial
Intelligence (AI) that has greatly contributed to unifying views on the
relationship between empirical, verifiable evidence and causal constructs. In
relation to psychological research, this approach can be useful for establishing
a computational reference against which to interpret human behaviour. In
addition, we will briefly discuss some questions relative to the definition of
causes that can be helpful to circumscribe the scope of this work.

Correspondence should be addressed to José C. Perales-López, Department of Psychology,
University of Granada, Campus Cartuja s/n, 18071, Granada, Spain. Email: [email protected]

We would like to thank Dave Lagnado and David Shanks for their many helpful comments. This
research was supported by the Spanish Ministry of Education's programme Becas Postdoctorales en
España y el Extranjero: Convocatoria 2003, for the first author, and by a MCyT (Ministerio de
Ciencia y Tecnología) grant (BSO2003-03723) to the first and the second authors.

© 2006 Psychology Press Ltd
http://www.psypress.com/ecp DOI: 10.1080/09541440540000167

EUROPEAN JOURNAL OF COGNITIVE PSYCHOLOGY
2006, 18 (2), 277-320
The second part of the paper provides a general view of how humans achieve
the computational goals described in the preceding section. First, we list the
different sources of information that reasoners use to make a decision on the
existence or nonexistence of a causal link (clues to causal structure). And,
second, we briefly review the evidence about how people integrate covariation
information when estimating the degree to which a putative cause and an effect
are related (estimating causal strength).
The third section focuses on the main questions related to the ontogeny of
causal reasoning. Although developmental issues are not the core of the present
work, we will try at least to identify those developmental questions relevant to the
main arguments presented in the preceding sections. Our strongest claims are the
existence of some developmental dependency between basic learning mechan-
isms and abstract causal inference strategies, and the importance of intervention
in the emergence of abstract principles of causal induction. The existence of
biological preparedness to learn in some areas and some innate content-specific
information are not denied, but we remain unconvinced by those positions that
postulate the existence of a unitary and innate "causal inference" module.
Finally, the fourth section puts forward a general cognitive architecture of
adult causal induction, based on the experimental evidence available to date.
That architecture is based on the idea that (1) associative and rule-based
mechanisms are just descriptions of different hierarchical levels in the system
responsible for covariation computation, and (2) covariation estimates are
integrated with other clues present in the environment and with pre-stored
general-purpose and domain-specific knowledge in order to decide whether or
not the observed covariations signal the presence of hidden causal links. This
architecture does not strictly respond to rational normativity criteria, but is
highly adaptive at adjusting the reasoner's knowledge and behaviour to the
actual causal texture of the world.
COMPUTATIONAL GOALS OF CAUSAL INDUCTION:
THE CAUSAL BAYES' NETS APPROACH
Causal reasoning is the broad term that refers both to the accumulation of new
evidence from which to infer the existence of a causal link (Cheng, 1993) and to
the application of previously acquired causal knowledge when thinking, attri-
buting, or making decisions. Let us imagine a student interested in knowing
whether drinking coffee improves his or her concentration when studying. With
that aim in mind, he will probably drink coffee for some days, until he has a
clear idea of whether or not there is any consistent improvement in concen-
tration. On the other hand, that same student could be interested in finding an
explanation for the fact that one particular night he felt unusually concentrated.
If our student had been drinking coffee, he could attribute his good state of
concentration to the coffee without collecting any more information. Finally, the
same student could try to imagine what his state of concentration would be if he
had not drunk coffee.
Our first example points to the importance of covariation in generating,
confirming or disconfirming causal beliefs (a psychological function that is
usually called causal induction, or, in a more restricted way, causal learning)
from the observation of a number of individual instances. The second example
stresses the relevance of previous knowledge about specific causal mechanisms
in causal attribution. And finally, in the third example our student engages in
counterfactual thinking (thinking about what would have happened if a given
precedent circumstance had been different). Some authors (see, for example,
Mackie, 1974) have even claimed that counterfactuals form the definitional
basis of causation and, in fact, that is the case in some legal definitions of
causation and responsibility (Hart & Honoré, 1959/1985). The most general
view in psychology, however, is that causal knowledge drives counterfactual
thinking rather than the other way round (Pearl, 2000; Sloman & Lagnado,
2005). In accordance with this view, there have been recent proposals to use
counterfactual statements (instead of causal strength judgements) to assess
causal knowledge in naïve reasoners (Buehner & Cheng, 1997; Buehner, Cheng,
& Clifford, 2003).
In this work, we will focus on causal induction, under the assumption that
both causal attribution and counterfactual reasoning require applying the
knowledge previously acquired through causal induction processes. Studies on
causal attribution (see Ahn & Kalish, 2000; Kelley, 1967; Schultz, 1982; White,
1995, 2000; Wimer & Kelley, 1982, for milestone works on the issue), and
counterfactual reasoning (Byrne, 1997; German & Nichols, 2003; Roese, 1994;
Walsh & Byrne, 2004) are definitely necessary for understanding how people
think about causes and effects. However, our main interest is the psychological
and epistemological problem of how causal knowledge is acquired from non-
causal previous experience. The aim of the present work is to formulate a
general architecture of causal induction to translate noncausal empirical input
(mainly correlational information, combined with other sources of information,
as will be discussed later) into causal knowledge. A related yet different issue is
how and for what purpose that knowledge is subsequently used.
As illustrated by the previous example, causal induction requires in most
cases some understanding of causal mechanisms to have a causal hypothesis to
begin with, and, at the same time, covariation between the cause and the effect is
important to learn about new causal mechanisms. However, despite the inter-
dependence between the notions of covariation and mechanism, one of the two
types of knowledge has always been given epistemological priority in causal
reasoning theories (see Ahn & Bailenson, 1996; Cheng, 1993; Fugelsang &
Thompson, 2003). On the one hand, covariation-based theories generally neglect
the distinction between covariation and causation; on the other hand,
mechanism-based theories have difficulty defining exactly what a
mechanism is and, most importantly, they do not specify how causal mechan-
isms are learnt in the first place. Some authors have claimed that causal
knowledge in some domains is innate (Keil, 1995; Leslie & Keeble, 1987;
Spelke, Breinlinger, Macomber, & Jacobson, 1992). However, even advocates
of this approach acknowledge that not all knowledge of causal mechanisms can
be derived from that innate knowledge.
In recent years a number of theorists (Glymour, 1998; Glymour, Scheines,
Spirtes, & Kelly, 1987; Pearl, 1988, 2000) have abandoned this fruitless
dichotomy and have tried to develop algorithmic tools to infer causal structures
in the absence of previous knowledge from covariational evidence with a
minimum set of assumptions. The assumption that differentiates this approach
from covariational theories is rather simple: There are things in the world that
act as causes and effects, linked by causal powers, although those powers are not
directly accessible to our senses (Cheng, 1997). This assumption neither
presupposes nor requires any specific knowledge about any particular causal
mechanism, as mechanism-based theories propose; it just specifies that causal
powers exist and need to be unveiled.
Dependencies among variables are represented in this framework by means
of graphs containing two types of elements: (1) a finite set of nodes standing for
a set of variables whose possible values correspond to particular states of those
variables in the world, and (2) a number of directional links (edges) representing
dependency relationships among those variables. If every edge between each
two nodes in a graph is directional (that is, if it is directed from one node to
another), and there are no closed loops in the graph (that is, there are no chains
of edges starting and finishing at the same node), the graph is called a directed
acyclic graph (DAG). The notation used to refer to the nodes in a DAG is that of
kinship relations. All the nodes in any chain finishing at a generic node A are the
ancestors of A, and all the nodes in any chain departing from A are descendants
of A. In addition, the nodes linked to A by means of a single edge pointing to A are the
parents of A, and the nodes A points to by means of a single edge are children of
A.
The intuitive power of these graphs relies on their capacity to represent causal
systems. In a causal DAG, edges represent direct causal links. Additionally, the
so-called Markov condition specifies that the joint probability distribution
describing the statistical relationships among the variables in the structure can
be factorised in such a way that the value of each variable is independent of all
of its nondescendants, conditional on its parents. Therefore, given an exhaustive
set of n variables (X1, X2, X3, . . ., Xn) in a DAG, the Markov condition can be
expressed as follows:

P(X1, X2, X3, . . ., Xn) = Π P[Xi | parents(Xi)]     (1)

where the product Π runs over i = 1, 2, 3, . . ., n.
P(X1, X2, X3 . . ., Xn) represents the joint probability distribution of the
variables in the structure, and P[Xi|parents(Xi)] represents the probability dis-
tribution of each variable Xi conditional on its parents. If the DAG stands for a
causal structure, the Markov condition follows from assuming that the value of a
variable is exclusively determined by the value of its direct causes (parents).
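The factorisation in Equation 1 can be made concrete with a minimal Python sketch. The three-variable network, its conditional probability tables, and every numerical value below are hypothetical illustrations of our own choosing, not values taken from the article:

```python
import itertools

# Hypothetical common-cause DAG: A -> B and A -> C. Names and
# probabilities are illustrative only.
parents = {"A": [], "B": ["A"], "C": ["A"]}

# P(variable = True | values of its parents)
cpt = {
    "A": {(): 0.3},                      # P(A)
    "B": {(True,): 0.8, (False,): 0.1},  # P(B | A)
    "C": {(True,): 0.7, (False,): 0.2},  # P(C | A)
}

def joint(assignment):
    """Joint probability as the product of P(Xi | parents(Xi)) (Equation 1)."""
    p = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

# A valid factorisation must sum to 1 over all eight assignments.
total = sum(
    joint(dict(zip("ABC", values)))
    for values in itertools.product([True, False], repeat=3)
)
```

Here `joint({"A": True, "B": True, "C": True})` evaluates to 0.3 × 0.8 × 0.7 = 0.168, and `total` comes out as 1, as the Markov factorisation requires.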
Each causal DAG or causal Bayes' net generates a pattern of statistical
dependencies between the variables in the structure. Let us imagine a causal
structure as the one depicted in the left panel of Figure 1 (common cause model).
With a certain parameterisation, and assuming that A, B, and C are discrete
events, that model can indicate, for instance, that A increases the probability
with which B and C occur (for example, smoking increases the probability of
heart disease and lung cancer). In that case A, B, and C are marginally
dependent on each other, but B and C become independent once conditioned on A.
In the causal chain model (right panel), A, B, and C are marginally dependent on
each other, and A and C become independent if conditioned on B. In more
technical terms, A screens off the relationship between B and C in the common
cause model, and B screens off the relationship between A and C in the causal
chain model (Reichenbach, 1956).
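The screening-off pattern just described can be checked arithmetically for the common cause model. The sketch below uses hypothetical parameter values of our own choosing (e.g., P(A) = 0.3); only the qualitative pattern, marginal dependence between B and C that disappears once A is held fixed, follows from the structure itself:

```python
# Common-cause model A -> B, A -> C with illustrative parameters.
pA, pB_A, pB_nA, pC_A, pC_nA = 0.3, 0.8, 0.1, 0.7, 0.2

# Marginals of B and C, summing A out.
pB = pA * pB_A + (1 - pA) * pB_nA          # 0.31
pC = pA * pC_A + (1 - pA) * pC_nA          # 0.35

# Marginal joint P(B, C): B and C covary through their common cause A...
pBC = pA * pB_A * pC_A + (1 - pA) * pB_nA * pC_nA   # 0.182, not pB * pC

# ...but conditional on A the joint factorises exactly: A screens off B from C.
pBC_given_A = pB_A * pC_A                  # equals P(B|A) * P(C|A)
marginally_dependent = abs(pBC - pB * pC) > 1e-9
```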
Therefore, if a causal structure is known, it is possible to predict the
dependency-independency pattern shown by the variables in the structure. A
more interesting question is whether the opposite operation is also viable.
The answer is to some degree positive. A causal Bayes' net produces a
pattern of dependencies-independencies, but, at the same time, that pattern can
arise from more than one causal Bayes' net. Bayes' nets that produce the same
pattern of probabilistic dependencies are called Markov equivalent. It can be
demonstrated that, if an exhaustive set of Markov equivalent nets have common
features, those common features (but not the differential features) can be
recovered from the observable pattern (Pearl, 2000).

Figure 1. Models representing a causal fork (left panel) and a causal chain (right panel). A, B, and
C stand for observable variables, and arrows represent causal influence links among them.
The causal Bayes' nets framework has inspired recent psychological theories.
According to Gopnik et al. (2004), recovering a pattern of causal relations from
a pattern of statistical regularities is a problem (epistemologically) analogous to
the one the visual system needs to solve to infer the 3-D structure of the world
from the limited 2-D information from the retina. These authors maintain that
even very young children "have a causal learning system . . . that recovers
causal facts by making implicit assumptions about the causal structure of the
environment and the relation between the environment and evidence" (p. 4).
Causal reasoning theories inspired by Bayes' nets seem to make the strong
claim that humans have a unitary and innate system to recover causal structure
from patterns of statistical regularity. Moreover, in some cases it has also been
suggested that humans are able to apply holistic induction methods (e.g.,
Bayesian methods; see Tenenbaum & Griffiths, 2003). However, as we will try
to show later, both strong claims are probably unrealistic. First, there are no
definitive proofs of the innateness and unity of the causal learning system. And
second, the approach of adults to causal reasoning tasks seems to be heuristic
rather than holistic (Gigerenzer & Todd, 1999).
Given the computational complexity of the operations they carry out, it is
extremely unlikely that humans use algorithms such as the ones proposed in AI to
recover causal structure, but it can be sensibly assumed that humans apply
simple strategies over local clues to accomplish the same goals. In this context,
the causal Bayes' nets framework has been helpful to identify what those goals
are: first, to determine where causal links are, and second, to determine the exact
features of those links. Borrowing Danks' (2002) terminology (see also
Tenenbaum & Griffiths, 2001), we will refer to these two different aspects of
causal reasoning as causal structure and causal strength estimation,
respectively.
A BRIEF DIGRESSION ON THE NATURE OF CAUSES
Before analysing how humans achieve the two main computational goals
identified in the previous section, we will briefly introduce some considerations
on the nature of causes. This digression is relevant to understanding our
probabilistic approach to causal induction.
In physical views of the world, causes are deterministic (except at the level of
subatomic particles). Let us imagine a circuit in which two switches are con-
nected serially to a bulb. In that simple scenario, if the two switches (S1 and S2)
are on, the light will be on; and if any of the switches is off, the light will be off.
The two causes are individually necessary and jointly sufficient to produce the
effect, and there is no room for indetermination. Imagine now that the same bulb
is also connected to a second circuit, invisible to an external observer, that
randomly switches the light on and off, independently of the state of the first
circuit. In this second case, neither S1 nor S2 are necessary for the effect to
occur, but they are still joint causes of that effect. The same case occurs in other
scenarios in which unknown variables feed into the variables included in the
causal scenario under scrutiny. For example, scientists have concluded that
smoking causes cancer in spite of the fact that many smokers never develop
cancer, and many cancer sufferers have never smoked. Thus, even if we accept
that causes are deterministic and necessity and sufficiency are necessary com-
ponents of the definition of a cause, causality can manifest itself under the form
of probabilistic regularity; what is more important, humans can infer causality
from that probabilistic evidence.
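The two-circuit bulb scenario can be simulated directly. In this sketch the hidden circuit fires on 20% of trials, an arbitrary rate chosen for illustration; the point is that a deterministic mechanism combined with an unobserved cause yields merely probabilistic observable regularities:

```python
import random

def light_on(s1, s2, rng):
    """Bulb state: deterministic serial circuit OR an unseen random circuit."""
    visible = s1 and s2            # S1 and S2 jointly sufficient
    hidden = rng.random() < 0.2    # hidden circuit, fires at random
    return visible or hidden

rng = random.Random(0)
trials = 10_000
# With only S1 closed the light still comes on sometimes, so S2 is no
# longer observably necessary for the effect.
p_on_s1_only = sum(light_on(True, False, rng) for _ in range(trials)) / trials
# p_on_s1_only falls near 0.2, the hidden circuit's firing rate.
```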
A different, yet related, issue is whether humans understand causality in
probabilistic terms. The meaning of causal statements has been scrutinised by
philosophers (see Harré & Madden, 1975; Hart & Honoré, 1959/1985; Suppes,
1970), but has only very recently entered psychological research. Psychological
theories on the meaning and representation of causal knowledge are especially
important to ascertain how evidence from different sources (for example verbal
learning and direct experience) is integrated, and how humans use causal
information to solve problems and make predictions and deductive inferences.
Preliminary evidence (Goldvarg & Johnson-Laird, 2001; see also Cummins,
1995, 1998) seems to show that humans interpret and represent causal clauses in
a nonprobabilistic manner, or, in other words, that the everyday meaning of
causality is not probabilistic. As anticipated by Hume (1739/1978), people seem
to treat chance as a case of a hidden cause.
Causal Bayes' nets are, in principle (in the absence of a specific para-
meterisation), blind to the deterministic or probabilistic nature of the regularities
on which they are based. The proposed algorithms use patterns of dependency/
independency among the variables in the scenario to specify, when possible,
what dependency relations are manifestations of an underlying causal link, given
the assumptions commented earlier. Similarly, when generating a causal
hypothesis from the available evidence in the absence of previous knowledge,
humans do not know in advance whether the relationship to assess is determi-
nistic or probabilistic, and must then be prepared to derive conclusions from
probabilistic evidence (Cheng, 1993).
The fact that causality manifests itself in a probabilistic manner implies that
accumulation of evidence is crucial for deriving strong conclusions from it,
which means that confidence in our own causal statements will increase as the
evidence confirming those statements grows. In some cases, however, reasoners
have previous knowledge of the nature of the relationships present in a certain
scenario, although they do not know the exact pattern of those relations. For
instance, if I try to set the alarm of my new watch, I will be confident that I have
reached a solution as soon as I corroborate that pressing one of the buttons given
a certain configuration of the other three produces a change in the alarm time
display. In principle, a single "trial" or a small number of them (showing a
certain pattern of dependencies/independencies) is sufficient to induce part of a
causal structure. Such fast learning is possible by virtue of my previous
knowledge on the functioning of electronic devices, in which, in normal con-
ditions, all relationships are deterministic. Interestingly, even young infants
seem capable of this sort of fast induction (Gopnik et al., 2004), and it has even
been proposed that in certain domains it is supported by innate domain-specific
knowledge (for example, in the field of mechanics).
However, inducing the existence of a certain rule from a single instance or a
small number of them has to do with the detection of regularities rather than
with causal induction itself. For the sake of parsimony, we will assume that
causal induction mechanisms process all regularities (either deterministic or
probabilistic) in the same way. We are aware that the bulk of research in causal
induction has focused on probabilistic learning preparations and evidence
accumulation, and evidence accrual and causal induction have remained
somewhat confounded (see Buehner et al., 2003, for a similar argument). Evi-
dence emerging from other paradigms is still sparse (see, for example, Cohen,
Rundell, Spellman, & Cashon, 1999), and this must be kept in mind when
considering the evidence relative to contingency (statistical regularity) as a clue
to causality in a later section of this work.
ACHIEVING THE COMPUTATIONAL GOALS OF CAUSAL REASONING:
A REVIEW OF EXPERIMENTAL DATA
The world is full of statistical regularities: Birds sing every day before the sun
rises, clouds appear in the sky before it starts raining, children are punished
when they misbehave, and so on; but only a subset of those statistical asso-
ciations are due to the existence of an underlying causal link. In addition, we can
bind several causal links involving common variables together, and generate a
causal structure. These two aspects of causal cognition (differentiating causal
relations from spurious covariations, and generating complex causal structures)
are related to the first computational goal identified in the Bayes' nets frame-
work: causal structure induction. This goal can be defined as building the
skeleton formed by the causal forces that determine the appearance of covari-
ations in a given scenario.
On the other hand, in order to fully understand what data can be expected from
a causal structure, it is necessary to establish the values of the parameters that
determine the relationship between each event and its causes. In general terms, we
can refer to this second goal as causal function induction, or, in the simplest
case, when the two variables are binary and the probability of the effect is directly
or inversely related to the probability of the cause, as causal strength estimation.
Causal structure inference and causal strength estimation are thus the two
main computational aspects of causal reasoning, and AI algorithms are very
effective tools to accomplish these goals. In constraint-based methods, such as
TETRAD (Scheines, Spirtes, Glymour, Meek, & Richardson, 1998), statistical
dependencies are used to construct, step-by-step, the several Markov-equivalent
graphs compatible with the data. In Bayesian methods (Steyvers, Tenenbaum,
Wagenmakers, & Blum, 2003; Tenenbaum & Griffiths, 2001), on the other
hand, all possible graphs comprising the observed set of variables (G1, G2, G3,
. . ., Gn; i = 1, 2, 3, . . ., n) are considered simultaneously, and are assigned prior
probabilities, P(Gi), according to previous beliefs. The likelihood of the
observed data pattern given each graph, P(D|Gi), is then computed, and, finally,
the posterior probability of each graph given the observed data, P(Gi|D), which
is initially unknown, is obtained according to Bayes' rule. Given a sufficient
number of trials, constraint-based methods and Bayesian methods yield very
similar results, as they both tend to select the set of Markov equivalent graphs
compatible with the observed data.
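Bayes' rule over candidate graphs can be sketched in a few lines. The three candidate structures and all probabilities below are hypothetical placeholders; in practice the likelihoods P(D|Gi) would be computed from the observed data under each graph's parameterisation:

```python
# Hypothetical candidate graphs with prior probabilities P(Gi).
priors = {"A->B": 0.5, "B->A": 0.3, "no link": 0.2}
# Placeholder likelihoods of the observed data under each graph, P(D|Gi).
likelihoods = {"A->B": 0.04, "B->A": 0.01, "no link": 0.005}

# Bayes' rule: P(Gi|D) = P(Gi) * P(D|Gi) / sum over j of P(Gj) * P(D|Gj)
evidence = sum(priors[g] * likelihoods[g] for g in priors)
posterior = {g: priors[g] * likelihoods[g] / evidence for g in priors}

best = max(posterior, key=posterior.get)  # graph favoured by the data
```

With these placeholder numbers the posterior concentrates on the A->B graph, illustrating how priors and data-driven likelihoods jointly select a structure.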
These methods have three properties in common: They (1) are information
optimisers, in the sense that they extract all the possible implications from the
available (statistical) information, (2) are intended to recover the causal structure
as a whole, and (3) estimate the precise parameters determining the shape of
causal functions a posteriori, when at least part of the causal structure is already
known. In the following section we will try to show that human reasoning
strategies differ from those algorithms precisely in these three aspects. Instead of
optimising the usefulness of available information, humans probably make use
of local clues to make individual decisions on whether certain observed corre-
lations are due to a causal link or not, and then test their hypotheses under
conditions they interpret as adequate to do so. SubsequentlyÐif necessaryÐ
they piece several links together to construct a complex causal model of the
situation. Finally, causal strength estimation is temporally intertwined with and
tightly linked to causal structure inference. The final result, however, need not
be very different from the one produced by artificial algorithms: A causal orga-
nisation of events that mostly coincides with their actual causal structure.
CLUES TO CAUSAL STRUCTURE1
Strictly speaking, causal structure inference per se has been the focus of very
limited attention in psychological research. Only a few empirical works have
tried to identify the conditions under which causal links are singled out from the
broader set of verifiable covariations (see Gopnik et al., 2004; Lagnado &
1The organisation of information in this section has been borrowed from Sloman and Lagnado
(2004). Its contents, however, have been comprehensively expanded in order to include new
empirical evidence.
Sloman, 2004; Schulz & Gopnik, 2004; Sloman & Lagnado, 2004; Steyvers et
al., 2003). In most other cases only a small set of candidate causes in a pre-
established structure are provided for evaluation. Next, we will present a brief
review of the different sources of information that have been shown to be used
by naïve reasoners to make decisions on causal structure.
Covariation and control
As noted above, discovering the existence of a previously unknown causal link
between two events implies realising that those events covary. The most usual
statistic used to measure the degree of covariation between two discrete events is
contingency (ΔP), which is defined as the difference between the probability of
the effect given the cause and the probability of the effect in the absence of the
cause:

ΔP = P(e|c) − P(e|¬c)

If the task is divided into discrete trials, P(e|c) and P(e|¬c) can be computed
from the frequencies of the four trial types resulting from combining the
presence/absence of the two events:

ΔP = a/(a + b) − c/(c + d)     (2)
where a stands for the frequency of the trials in which both the cause and the
effect are present, b for that of the trials in which the cause is present and the
effect is absent, c for that of the trials in which the cause is absent and the effect
is present, and d for that of the trials in which both the cause and the effect are
absent.
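Equation 2 is straightforward to compute from the four trial-type frequencies. A minimal sketch (the trial counts are hypothetical):

```python
def delta_p(a, b, c, d):
    """Contingency: Delta-P = P(e|c) - P(e|~c) = a/(a+b) - c/(c+d).

    a: cause present, effect present    b: cause present, effect absent
    c: cause absent, effect present     d: cause absent, effect absent
    """
    return a / (a + b) - c / (c + d)

# Effect on 15 of 20 cause-present trials and 5 of 20 cause-absent trials:
dp = delta_p(a=15, b=5, c=5, d=15)   # 0.75 - 0.25 = 0.5
```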
As we will discuss later, when people are asked to judge the degree to which
a cause and an effect are causally related in this type of task, their estimates are
highly correlated with ΔP (see Allan, 1993; De Houwer & Beckers, 2002; Lober
& Shanks, 2000; Wasserman, Kao, Van Hamme, Katagiri, & Young, 1996).
Once a reasoner has observed that two events covary, and in order to decide
whether or not that covariation is indicative of a causal link, it is crucial to assess
the circumstances under which that covariation has been computed, and consider
the presence of alternative potential causes.
Covariation can be explained by the existence of a common ancestor of the
two events. However, such spurious covariations vanish when covariation is
computed in a context in which the common cause is held constant. For
example, if smoking is the only common cause of lung cancer and high blood
pressure, the marginal covariation between lung cancer and high blood pressure
will disappear if it is computed exclusively over a sample of smokers, or over a
sample of nonsmokers. Consequently, covariation between a putative cause and
286 PERALES AND CATENA
Downloaded by [York University Libraries] at 23:31 18 November 2014
an effect can be indicative of a causal link only if all other potential causes of the
effect are controlled.
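The smoking example can be simulated to show why conditionalisation matters. The sketch below assumes an invented generative model (all probabilities are illustrative) in which smoking is the only common cause of lung cancer and high blood pressure; the marginal contingency between the two is clearly positive, but it vanishes within the focal sets of smokers and of nonsmokers:

```python
import random

random.seed(0)

# Invented generative model: smoking is the only common cause of lung
# cancer and high blood pressure; there is no direct link between them.
population = []
for _ in range(100_000):
    smoker = random.random() < 0.3
    cancer = random.random() < (0.40 if smoker else 0.05)
    pressure = random.random() < (0.60 if smoker else 0.20)
    population.append((smoker, cancer, pressure))

def dp_cancer_given_pressure(sample):
    """ΔP of lung cancer given high blood pressure within a sample."""
    with_p = [cancer for _, cancer, pressure in sample if pressure]
    without_p = [cancer for _, cancer, pressure in sample if not pressure]
    return sum(with_p) / len(with_p) - sum(without_p) / len(without_p)

print(dp_cancer_given_pressure(population))              # clearly positive: spurious
print(dp_cancer_given_pressure(
    [x for x in population if x[0]]))                    # near 0 within smokers
print(dp_cancer_given_pressure(
    [x for x in population if not x[0]]))                # near 0 within nonsmokers
```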
The Probabilistic Contrast Model (Cheng & Holyoak, 1995; Cheng &
Novick, 1992) maintains that people estimate causal strength by computing
contingency in a set of instances in which all other known potential causes of the
effect are held constant (focal set). In accordance with this proposal, there is an
impressive corpus of data showing that naïve reasoners compute covariation
conditionally when asked to draw causal conclusions from statistical information,
and are more confident in judgements made under controlled circumstances. In
other words, people actually apply the principle of conditionalisation (see
Gopnik, Sobel, Schulz, & Glymour, 2001; Spellman, 1996, for examples), which
is convincingly demonstrated by the extensive literature on cue interaction
effects in causal learning.2 Extensive reviews of the importance of cue inter-
action effects and their relation to normative standards can be found in Cheng
(1997) and Shanks (1995).
Control, however, does not seem to be the only inferential normative prin-
ciple people apply. Even when covariation is computed under controlled con-
ditions, there are situations in which reasoners do not take covariation as an
index of causality. For instance, as shown by Wu and Cheng (1999), when
contingency is computed over the presence of a deterministic cause of the effect
(and therefore ΔP = 1 - 1 = 0), the perceived causal status of a candidate
generative cause remains undetermined. Let us imagine that a reasoner wants to
find out whether a fertiliser is effective at making the plants in a greenhouse
bloom. All plants have been fertilised with substance A, which is a constant
cause in the background, and half the plants are also fertilised with substance B,
which is the candidate cause. An indetermination case would occur if all the
plants in the greenhouse bloom (a situation equivalent to the A+, AB+ design
present in most blocking experiments). As the effect of the causes in the
background is maximal (all plants bloom), the candidate cause (B) does not have
room to show whether it has any effect itself or not. Importantly, sensitivity to
maximality has been reported in a number of human studies (see De Houwer,
Beckers, & Glautier, 2002).
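This indeterminacy can be sketched in a few lines, anticipating the causal power formula, pi = ΔP/(1 - P(e|~i)), presented later in the article (Equation 5); the probabilities below are illustrative:

```python
def causal_power(p_e_given_i, p_e_given_not_i):
    """Generative power of candidate i: ΔP / (1 - P(e|~i)).
    Returns None in the ceiling case, where the background causes
    already produce the effect on every trial and power is 0/0."""
    delta_p = p_e_given_i - p_e_given_not_i
    denominator = 1.0 - p_e_given_not_i
    if denominator == 0.0:
        return None  # indeterminate: no room for i to show its effect
    return delta_p / denominator

# Greenhouse example: fertiliser A alone makes every plant bloom, so for
# candidate B both probabilities are 1 and B's status is undecidable.
print(causal_power(1.0, 1.0))  # None
# With a non-saturated background, the same ΔP of 0 is informative:
print(causal_power(0.5, 0.5))  # 0.0
```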
In summary, a number of studies have convincingly shown that humans make
use of covariation to identify the possible presence of causal links, and are
sensitive at least to two principles of causal inference: conditionalisation and
ceiling effect avoidance. It is still a matter of discussion whether sensitivity to
2 Cue interaction or cue competition effects refer to experimental preparations in which the
accrual of evidence about the relationship between a given predictor or candidate cause (A) and an
outcome or effect (+) is hindered by the presence of a known predictor (B). For example, in blocking
experiments participants are shown a number of B+ trials. Subsequently, they are presented with a
series of AB+ trials. The usual result is that the predictive or causal value of A is diminished by the
presence of B, which has been previously learnt to be a reliable predictor of +.
HUMAN CAUSAL INDUCTION 287
the principle of conditionalisation is a controlled, purposeful strategy, as
maintained by the inferential approach3 (De Houwer & Beckers, 2003; Wald-
mann & Hagmayer, 2001), or the consequence of a competitive process in an
associative device (Allan, 1993; Dickinson & Burke, 1996). In contrast,
sensitivity to the ceiling effect cannot be interpreted in purely associative
terms, and has been claimed to be strong evidence in favour of the inferential
approach.
Intervention
As noted above, most results described in causal learning literature arise from
experiments in which participants passively receive the relevant information
about covariations among the critical variables in a very restricted scenario and
are asked to make a judgement on the strength of a single target relation.
Moreover, the basic structure of the causal scenario is often provided by means
of the task instructions.
Unfortunately, these situations are inadequate to test how people discover
brand new causal links from empirical evidence (Glymour, 1998; Newsome,
2003). In order to ascertain how people unveil the causal structure underlying a
pattern of covariations, a more realistic experimental preparation is required, in
which a variety of events or variables are present, no causal roles are pre-
assigned to some of those variables, and people can interact with the system and
receive feedback about the effect of their interventions.
Some previous data arising from a variety of experimental paradigms show
that intervention is a very powerful tool for learning, and that humans could be
especially well prepared to learn from the consequences of their acts. For
example, Haggard, Clark, and Kalogeras (2002) demonstrated that learners
perceive the sensory consequences of their voluntary movements as occurring
earlier than they actually did, which was interpreted as a proof that the central
nervous system applies a specific neural mechanism to produce intentional
binding of actions and their effects in conscious awareness. In addition, some
studies (Chatlosh, Neunaber, & Wasserman, 1985; Wasserman, Chatlosh, &
Neunaber, 1983) have shown that contingency estimation is more accurate in
free-operant paradigms than in observational situations.
Lagnado and Sloman (2004; see also Sloman & Lagnado, 2004) have shown
that people can actually learn a causal structure more efficiently through active
intervention than through passive observation (a similar pattern of results has
3 The inferential approach maintains that conditionalisation and cue interaction result from the
application of reasoning rules based on general-purpose normative knowledge. The application of
such rules is supposed to be effortful and to demand general executive resources. However, a detailed
algorithmic description of this approach has not yet been provided. In this context, specific
predictions can be derived by using the causal Bayes' nets approach, although it must be borne in
mind that such an approach does not take cognitive boundary limitations into account (algorithms
for causal inference in the causal Bayes' nets approach are information usefulness optimisers).
been reported by Steyvers et al., 2003). In Lagnado and Sloman's Experiment 1
participants were asked to describe the causal structure underlying the pattern of
covariation shown by two cues (temperature and pressure, both of which could
take the values high and low) and an outcome (the launch of a rocket, which
could take the values present and absent). The correct structure underlying the
observed covariations was Temperature → Pressure → Launch (or an equivalent
structure). In the observational condition the participants just observed several
instances resulting from that structure. In the interventional condition, on the
contrary, people were allowed to set the value of either the variable temperature
or the variable pressure and then see what happened with the value of the other
variables. As expected, results showed that interveners chose the correct model
more often than observers.
The advantageous effect of intervention seems to be a very robust one. But, in
addition, these experiments provide some clues to ascertain where that advan-
tage arises from. First, intervention actually alters the causal structure that the
observed pattern of statistical dependencies stems from. In graphical terms, the
moment a variable is set by an external manipulation, it is
necessary (1) to introduce an external influence in the causal DAG (the "do"
operator, in Pearl's, 2000, terms), and (2) to disconnect the manipulated variable
from its parents (graph surgery). This modification ensures control (as defined in
the previous section) and, therefore, if a statistical dependency between two
variables in the system disappears after the intervention, that dependency must
be interpreted as spurious or mediated by the manipulated variable. These
informational differences could allow the reasoner to discriminate between
Markov equivalent structures.
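The logic of graph surgery can be illustrated with a small simulation, assuming a hypothetical chain Temperature → Pressure → Launch with invented transmission probabilities. Under observation, Temperature and Pressure covary; after intervening on Pressure (which severs it from its parent), that dependency vanishes, revealing that Temperature is not downstream of Pressure, while the downstream dependency of Launch survives:

```python
import random

random.seed(1)

# Invented chain Temperature -> Pressure -> Launch, with 0.9
# transmission probability on each link (binary variables).
def trial(do_pressure=None):
    temperature = random.random() < 0.5
    if do_pressure is None:
        pressure = temperature if random.random() < 0.9 else not temperature
    else:
        # graph surgery: the intervention severs Pressure from its parent
        pressure = do_pressure
    launch = pressure if random.random() < 0.9 else not pressure
    return temperature, pressure, launch

def dependency(pairs):
    """ΔP of y given x over a list of (x, y) pairs."""
    y_when_x = [y for x, y in pairs if x]
    y_when_not_x = [y for x, y in pairs if not x]
    return sum(y_when_x) / len(y_when_x) - sum(y_when_not_x) / len(y_when_not_x)

observed = [trial() for _ in range(20_000)]
intervened = [trial(do_pressure=random.random() < 0.5) for _ in range(20_000)]

# Observation: Temperature and Pressure covary strongly (around 0.8)...
print(dependency([(p, t) for t, p, _ in observed]))
# ...but after intervening on Pressure the dependency vanishes (near 0)...
print(dependency([(p, t) for t, p, _ in intervened]))
# ...while Launch still depends on Pressure (around 0.8): it is downstream.
print(dependency([(p, l) for _, p, l in intervened]))
```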
Nevertheless, informational differences between observational and interventional
situations, as captured by the "do" operator and graph surgery, are
not the only explanation of the facilitating effect of intervention. Lagnado and
Sloman's (2004) Experiment 2 demonstrated that when the observational group
was equated (yoked) to the interventional group with respect to the amount and
type of information received, a certain (nonsignificant) improvement in the
observational group was detected but, still, the interveners' performance was
significantly better than that of the observers. Moreover, the advantage of
intervention did not disappear when interventions were forced instead of free
(Exp. 3).
In summary, using intervention implies an advantage for causal structure
inference in three senses, compared to mere observation: First, it triggers some
biological preparation to bind one's responses to the outcomes consistently
occurring after them (Haggard et al., 2002); second, it ensures control of
potential causes alternative to the candidate one; and, third, it generates temporal
order clues that are not always present in mere observations. In this last sense,
the time heuristic hypothesis maintains that one of the advantages of intervention
relies on the fact that "whenever an intervener makes an intervention they
experience the putative cause before any of its possible effects" (Lagnado &
Sloman, 2004, p. 47). However, other factors also mediating the advantage of
intervention (i.e., attention focusing and intentional hypothesis testing) are not
discarded either.
A final issue related to the role of intervention in causal reasoning is how
interventions are selected. In a given scenario, the number of possible manip-
ulations is at least as large as the number of variables in the scenario; so, how is
an intervention selected among the many possible ones? Steyvers et al. (2003,
Exp. 2) showed that in a situation in which people did not receive any
information beyond the pattern of statistical dependencies/independencies among
three multilevel variables, the probability of selecting the correct structure was
.18 (an optimal Bayesian learner would reach a hit rate of .50). That probability
subsequently rose to .33 when participants were allowed to make a single
intervention (the optimal hit rate was 1.0 in this case). Most importantly, when
location of the preferred intervention was analysed, it was observed that inter-
vention was strongly guided by the model hypothesised by the reasoner in the
prior observational phase, and that the variables that were most likely to be
intervened on were those that had been previously identified as causes. In
general terms, people were more likely to intervene on the common cause in
common cause models (A ← B → C), on one of the two possible causes in
common effect models (A → B ← C), on one of the two linked causes in chain
models (A → B → C), and on the single cause in single-link models (A/B → C).
This result can be interpreted as stemming from naïve reasoners' general
tendency to test hypotheses on individual causal links on that structure, rather than
on complete causal structures.
In conclusion, the evidence described in this section provides a clear picture
of the role of intervention in intuitive causal structure learning. If any outcome is
observed to consistently correlate with any changes introduced by the reasoner
in the value of some variable, that outcome is almost immediately interpreted as
a consequence of the manipulation. In consequence, manipulation is the pre-
ferred strategy to confirm or disconfirm hypotheses formulated on the basis of
mere observation. In addition, those hypotheses are located at the level of
individual links and determine where interventions are more likely to be made.
Time and order
In general terms, effects closely follow their causes. Consequently, also gen-
erally speaking, anything occurring before A or long after A is unlikely to be an
effect of A; hence, temporal order and temporal contiguity can be used as
powerful heuristics to make a decision on the viability of a causal link. This
general principle, however, has important exceptions. Both the temporal order
and the contiguity principles are violated in many daily life situations. For
example, the relationship between symptoms and diseases is learnt backwards:
Learners have access to information on the symptoms before a diagnosis is
provided, but they still assign their correct causal roles to the cause (the disease)
and the effects (the symptoms). These effect-to-cause learning situations are
referred to as diagnostic tasks, whereas standard, cause-effect situations are
called predictive tasks (Waldmann & Holyoak, 1992). In general terms, pre-
dictive inferences are easier and more accurate than diagnostic ones (Gigerenzer
& Hoffrage, 1995), but, still, there are many daily life situations (and also
experimental ones; see Waldmann and colleagues' works) in which reasoners
learn causal relations backwards.
As detailed earlier, people compute causal strength by controlling potential
alternative causes. Therefore, if a causal scenario is composed of two temporally
preceding cues and an outcome, the requirement to hold one of the cues constant
in order to compute covariation conditionally depends on the causal role
assigned to the cues and the outcome. If the cues are interpreted as potential
causes and the outcome as an effect, one of the cues needs to be controlled while
the potential influence of the other one is assessed, and cue interaction will arise.
On the contrary, if the two cues are interpreted as possible effects of the out-
come, there is no need to compute covariation conditionally (and no cue
interaction effect will occur).
Several studies have shown that cue interaction is sensitive to causal direc-
tionality (Waldmann, 2000, 2001; Waldmann & Hagmayer, 2001; Waldmann &
Holyoak, 1992, 1997), whereas others have failed to show that dependency
effect, and have reported the existence of cue interaction between temporally
precedent cues, independently of the causal role assigned to them (Cobos,
López, Cano, Almaraz, & Shanks, 2002; Shanks & López, 1996). Recent work
has pointed out the importance of processing effort at explaining this contra-
dictory pattern of results. According to this hypothesis, replicating the depen-
dency of cue interaction on causal directionality requires reasoners to be
carefully instructed to build a mental model of the task (usually, either a
common-cause model, or a common-effect model; Waldmann & Martignon,
1998). Seemingly, the elaboration and application of a causal model consume
executive resources, and therefore, sensitivity of causal interaction to abstract
causal knowledge will depend on the availability of such resources (De Houwer
& Beckers, 2003; Waldmann & Hagmayer, 2001).
It is important to note, however, that cue interaction effects can occur even in
situations in which the complexity of the task makes the application of an
abstract model almost unviable (Aitken, Larkin, & Dickinson, 2000, 2001;
Dickinson & Burke, 1996; Larkin, Aitken, & Dickinson, 1998), which indicates
that causal interaction effects themselves do not necessarily depend on top-down
influences (see, however, De Houwer, 2002). The claim that knowledge-based
processes superimpose but do not completely disable basic, time-sensitive
learning mechanisms is in accordance with data from mediated learning para-
digms (Perales, Catena, & Maldonado, 2004).
In summary, experimental evidence seems to show that basic learning
mechanisms incorporate temporal order as an important clue to intuitively
represent causal structure. Abstract knowledge can be used to override the effect
of temporal order, and to impose a knowledge-based structure over the observed
data, but with a cost in terms of processing resources.
The same principle can be applied to the effect of temporal contiguity.
Relationships between causes and delayed effects are more difficult to learn than
temporally contiguous relations, both for humans (see Shanks & Dickinson,
1991) and animals (see Damianopoulos, 1967; Jones, 1962). This is consistent
with the idea that, in order to get associated, two stimuli need to be simul-
taneously active in working memory, but representations in working memory
tend to decay with time (Terry & Wagner, 1975).
However, humans are aware that effects can show up long after the cause was
present. In these cases, the connection between the cause and the effect is a long
chain of contiguously connected events, but that chain remains hidden for the
observer. Probably, in the absence of previous knowledge about that mechanism,
humans would be unable to learn those relationships, unless the relevant data
were artificially brought together.
Nonetheless, humans normally possess a rich background of causal know-
ledge, which, combined with episodic retrieval processes, allows them to search
backward from the occurrence of the effect to the potential cause that generated
it. Buehner and May (2003) have demonstrated that the effect of delay on causal
estimates is mediated by previous assumptions on the mechanism that accounts
for the link between the putative cause and the effect. The detrimental influence
of delay on learning was clearly attenuated by instructions that encouraged
people to interpret the cause as having delayed effects. In a similar vein, Hag-
mayer and Waldmann (2002) showed that the temporal assumptions held by the
participants in their experiments strongly guided the choice of appropriate
statistical indicators of causality by structuring the event stream, by selecting the
potential causes among a set of competing candidates, and by influencing the
level of aggregation of events.
In summary, temporal order and contiguity are important clues to structure.
First, the direction of some causal links in a Bayesian net cannot be ascer-
tained on the basis of statistical evidence alone. Temporal order can be used
to disambiguate such relations, and as preliminary evidence seems to show,
can even mislead people to ignore more informative statistical clues. Simi-
larly, order can also be useful to define a focal set to compute the degree of
covariation between a cue and an outcome conditionally. As many experi-
ments have shown, interaction between effects often occurs when effects pre-
cede their causes during learning. Under constrained circumstances, however,
humans can apply knowledge-based constraints to override the effect of tem-
poral order, and impose a different structure over the available covariational
data.
Second, contiguity can determine which candidates are assessed when linking
causes to effects. Again, however, the effect of contiguity on the selection of
putative causes depends on whether or not the reasoner has previous knowledge of the
causal mechanism leading to the observed effect. The application of such
knowledge requires previous learning, and the involvement of episodic retrieval
processes to bring to awareness potential causes that appeared long before the
effect. In the absence of such previous knowledge, or when episodic retrieval
processes are not fully developed (for example, children younger than 3-5 years;
Perner, 2001) the effects of contiguity are expected to emerge more straight-
forwardly.
Pre-stored knowledge
Causal reasoning theories have focused on specifying how causal knowledge is
acquired from data, without the involvement of previous knowledge. Despite the
fact that in most daily life causal reasoning tasks reasoners have at least some
knowledge about the crucial events, theories have paid much less attention to the
processes responsible for integrating new evidence and previous beliefs.
On the one hand, humans have general causal knowledge that comprises
intuitive and training-based normative principles of causal induction. We will
refer to this kind of knowledge as general-purpose inductive principles. For
instance, experimental psychologists have declarative knowledge about the
conditions necessary to experimentally test causal hypotheses. Similarly, naïve
reasoners also know and apply some of these induction principles in daily life
causal reasoning. Although these principles are not necessarily explicit, their
pervasive application across the individual's learning history should lead to the
progressive accrual of declarative knowledge about them.
On the other hand, reasoners also have specific knowledge about the mechanisms
linking specific events or categories of events. For example, we discard the
changes in the dial of a barometer as causing weather changes because we know
that physical phenomena in the scale of the barometer do not interact with events
in the scale of weather changes. In this case, a decision on the plausibility of a
causal mechanism is made on the basis of previous knowledge about the lack of
a link between very broad categories of objects. Part of this knowledge can be
innate, or, at least, biologically facilitated, whereas knowledge of links between
specific events is necessarily based on the individual's learning history.
In any case, the difference between narrowly and broadly defined events is
quantitative rather than qualitative. Individuals' bases of causal knowledge are
formed by vast nets of propositions linking events defined at different levels of
abstraction, and, consequently, one can hold a certain belief at a certain level
(for instance, "smoking causes lung cancer"), and a different belief at a
different level ("smoking cigars does not cause lung cancer"). In accordance with
this idea, the coherence hypothesis (Cheng & Lien, 1995; Lien & Cheng, 2000)
maintains that even when there is no causal knowledge about the connection
between two specific single events, a structural decision can be made on the
basis of whether or not the two events belong to categories that are causally
connected. In addition, Cheng and Lien demonstrated that causal categories
during the acquisition of causal links tend to be defined at a level of abstraction
that maximises the level of covariation between the cause and the effect. If
causal categories are defined too broadly ("inhaling fumes causes lung cancer")
or too narrowly ("smoking Fortuna cigarettes causes lung cancer") during
learning, that learning process can lead to wrong conclusions. For instance, in
the overinclusive case, a learner could conclude that inhaling eucalyptus vapours
causes cancer, whereas in the underinclusive case, a learner could conclude that
smoking Camel cigarettes does not cause cancer.
The coherence hypothesis thus maintains that causal category learning and
covariation learning are engaged in an incremental loop of knowledge
acquisition. First, covariation determines the level of inclusiveness of causal
categories; second, causal categories modulate the acquisition of new causal
knowledge from covariation between specific events; and third, causal know-
ledge acquired this way contributes to redefine the limits of causal categories.
Importantly, a "mechanism" and a specific link are not necessarily two
different types of representation, but just two propositions of essentially the same
nature binding concepts in different abstraction levels. However, the integration
of new evidence with previous knowledge is not simply additive. Instead, pre-
vious knowledge about links between categories of events is used to determine
whether an observed covariation can be interpreted as causal or not.
From an opposite perspective, Fugelsang and Thompson (2003) have tried to
demonstrate that causal beliefs based on knowledge of mechanisms and causal
beliefs based on covariation are dissociable. In their first experiment, they
generated two supposedly different types of beliefs in a group of participants.
Reasoners in the covariation-based belief condition were instructed to hold a
strong or a weak belief, exclusively based on the information received about the
previously observed level of covariation between the two target events. In the
mechanism-based belief condition, on the contrary, participants were informed
about the plausibility or implausibility of a causal mechanism. The two key
variables were thus belief modality (covariation based, mechanism based) and
belief level (low, high). Subsequently, reasoners were exposed to two levels of
contingency (.9, .1) between the two target events. Mechanism implausibility (in
the mechanism-based, low-belief condition) was expected to preclude the
integration of new covariational evidence with previous beliefs. The expected
interaction, however, did not reach the significance level, and new covariational
information had a significant effect on causal judgements in all conditions. In
two subsequent experiments belief level, but not belief modality, was manipu-
lated, and results showed that low previous beliefs diminished (but not com-
pletely precluded) the effect of new covariational evidence.
Fugelsang and Thompson's (2003) experiments thus failed to show the
hypothesised dissociability between mechanism- and covariation-based causal
beliefs. Quite the opposite, they showed a very high degree of interchangeability
and integration between previous causal beliefs and new covariational evidence.
Results are then more compatible with the idea that causal beliefs, at different
levels of abstraction, are represented in a common base of declarative know-
ledge. When a new covariation is found, the two covarying events (a and b) are
immediately classified as members of previously known causal categories (A
and B). If there is not any level at which A and B are plausibly related, the
covariation will be interpreted as spurious. However, if A and B are possibly
related at some level, any evidence confirming that relation will be incorporated
into previous beliefs at that level (Busemeyer, 1991), and any new evidence
disconfirming that relation will contribute to refine the scope and definition of
categories.
In summary, knowledge on causal categories can determine a priori what is
and what is not likely to be a cause. In any context, there is an almost infinite
number of close and distant potential causes for an observed effect, and that
knowledge is crucial to select a manageable set of candidates. In other words,
plausibility knowledge restricts the set of candidates for which covariation is
considered (White, 1995). In doing so, not only does it single a candidate cause
out, but it also determines the alternative causal factors that will enter the focal set
when computing covariation conditionally. In that sense, general-purpose
inductive principles and domain-specific causal knowledge are not only compatible,
but also interact with and depend on each other to make adaptive behaviour possible.
Summary
As posed by Cheng (1997, 2000), humans know that causal powers are intrinsic
properties of effective causes. In other words, people need to assume that
causality exists in the world in order to make sense of it. However, the key
question is not whether people believe in causation or not, but how people come
to know where causal relations are.
The Bayesian networks' approach has provided a tool for answering that
question: a set of algorithms that achieve a certain level of success at recovering
causal structures from statistical patterns of data. By default, these algorithms
make use only of the pattern of statistical dependencies/independencies present
in the data, and not of time, order, intervention, or pre-stored knowledge. Those
clues can be further used to select one among several equally predictive structures.
Algorithms developed in the field of Bayesian networks are information
optimisers, in the sense that they go as far as possible with the available
statistical information. Humans, however, do not seem to be as good as com-
puters at optimising the utility of the available statistical information, in part
because representing a whole pattern of dependencies, even in very simple
causal scenarios, is beyond any reasonable estimate of human memory and
attention capabilities. In fact, humans do not seem to be very good either at
maximising the utility of screening off information in cognitively manageable
situations (Lagnado & Sloman, 2002; see also Barsalou, Sloman, & Chaigneau,
2002).
Instead, reasoners seem to heavily rely on local clues to decide whether a
single link under assessment exists or not. Previous knowledge can modulate
their influence, by picking a covariation as causally plausible or discarding it as
implausible, determining what factors will be controlled in order to compute
contingency, replacing the order of appearance of the cause and the effect in
diagnostic tasks by their true causal order, and diminishing the relevance of
contiguity when evaluating the relationship between distant elements in a causal
chain. In case there is no relevant previous knowledge, decisions on causal
structure will rely more heavily on evidential clues. These clues are only
imperfect indicators of causation and can lead to the formation of erroneous
causal concepts in some cases but they ensure a certain level of concordance
between the organism's behaviour and the causal structure of the environment.
ESTIMATING CAUSAL STRENGTH
The Power PC theory
A causal Bayes' net is the skeleton of a causal scenario, but, in order to get the
whole picture, it is necessary to assign to every link the parameters that deter-
mine the probability distribution of each child (effect) conditional on the set of
possible values of its parents (causes).
Although the variables in the model can be either discrete or continuous,
learning of causal functions between continuous and multilevel variables has
been the focus of very limited attention (Busemeyer, Byun, Delosh, &
McDaniel, 1997; White, 2001). More often, empirical work has tried to
ascertain how people compute the degree to which two binary variables repre-
senting a putative cause and a possible effect (almost always, two discrete events
that can be present or absent) are causally related, in the presence or absence of
alternative causes.
Let us imagine two causes (A and B) pointing to a common effect (C). These
three events can be related in different ways. For example, A and B could be two
serial switches in a circuit, and C a bulb connected to that circuit, in which case
both A and B must be on for C to be on; or they could be two parallel switches,
in which case either A or B must be on for C to be on. In other cases, the effects
of A and B could be probabilistic and additive. For instance, it could be assumed
that the different causes of lung cancer simply add to each other to produce the
effect (the parallel circuit is indeed an example of an additive deterministic
model).
296 PERALES AND CATENA
The additive model is the simplest way to represent the influence of two
discrete probabilistic causes on an effect. According to the Power PC theory
(Cheng, 1997), in the absence of contradictory evidence, humans explain every
effect in the world as the result of the addition of its causes. When evidence
clearly contradicts this model, interactive causal power is computed instead
(Cheng, 2000).
In terms of the Bayesian nets framework, the Power PC theory is a theory
about a specific parameterisation of causal structures, and about how humans
compute those parameters. Specifically, it assumes that when assessing the
causal power of a candidate cause (i), in the presence of unknown alternative
causes (a) of the same effect (e), naïve reasoners explain the occurrence of e in
the presence of i as resulting from the union of two probabilities: (1) the
probability of e being caused by i, and (2) the probability of e being caused by a
(the composite of unknown alternative causes in the background).
P(e/i) = pi + paP(a/i) − pipaP(a/i) (3)
where pa and pi are the probabilities with which a and i, respectively, generate
the occurrence of e by themselves (their causal powers). Equivalently, the probability of e in
the absence of i is explained as:
P(e/~i) = paP(a/~i) (4)
Applying probability calculus, and assuming that (1) i does not interact with a to
cause e, (2) i and a are independent (control condition), and (3) the causal powers pa and pi are constant properties of a and i, it follows that:

pi = ΔP / [1 − P(e/~i)] (5)
In other words, the causal power of a generative cause i can be computed
from the observed contingency, ΔP, between the candidate cause and the effect.
An equivalent argument can be used to derive the power of a preventative cause
(qi), in which case it follows that:
qi = −ΔP / P(e/~i) (6)
As a psychological theory, the Power PC theory assumes that naïve reasoners
apply the mental operations necessary to compute powers in natural environ-
ments. However, the theory does not specify how p and q (henceforth, generi-
cally p) are exactly computed. p describes the expected shape of the mapping
between the input provided to the reasoner (the frequencies of a, b, c, and d type
trials) and the output of the mechanism in charge of computing causal strength,
but not the algorithm that carries out that operation. The core claim of the theory
is that if the information necessary for computing power is adequately provided
and perceived, and if the probe question used to induce a causal estimate is
adequately understood, the judgement made by the reasoner should closely
conform to p.
Since the Power PC theory was first presented, a number of studies have tried
to test it, both on the basis of reanalyses of previous data (Allan, 2003) and new
empirical evidence (Lober & Shanks, 2000; Perales & Shanks, 2003; Vallée-
Tourangeau et al., 1998; White, 2003c). In all of these studies the statistical
relationship between a single candidate cause and an effect (the combination of
trial type frequencies) was manipulated over a constant background in order to
generate different conditions in which the Power PC theory's predictions dif-
fered from predictions of other models; judgements were elicited using
standard causal questions; and, finally, covariational information was
provided in a trial-by-trial sequential format.
In four of these studies, the Power PC theory was disconfirmed. However,
Buehner et al. (2003; see also Buehner & Cheng, 1997) have claimed that
deviations from p in these studies can be explained by a combination of pro-
cedural factors. First, trial-by-trial tasks are highly demanding in terms of
attention and memory load, and can hinder the computation of conditional
probabilities P(e/c) and P(e/~c), necessary to compute power. Second, the
question probe with which the causal judgement is normally elicited in these
works is ambiguous with respect to the context to which the question itself
applies (the learning context, or an ideal context in which alternative causes
have been removed). And third, in some of these studies no measures were taken
to ensure that final judgements were at asymptote and that reasoners were maximally
confident in them. Put the other way around, testing the Power PC theory
requires that (1) judgements are asymptotic and not conflated with con-
fidence, (2) the type of probe question used to elicit them has a clear normative
reference, and (3) the input necessary to compute powers is provided in a format
that allows adequate comprehension and representation.
In most causal learning experiments people are asked to estimate just the
extent to which the candidate cause generates the effect (i.e., ``to what degree
does [i] cause [e]?''). According to Buehner et al. (2003) some people could
interpret that they are required to estimate the influence of the candidate cause in
the same context in which it was first observed (in which case ΔP would be the
normative reference to contrast judgements against), whereas some other people
could assume that the question is about the effect of the candidate cause in a
context in which alternative causes are absent (in which case p would be the
adequate reference).
The recommended probe question is worded as follows: ``Imagine a sample of 10
new [instances] in which we know for sure that [the effect] would not appear
unless [the candidate cause] is present. If [the candidate cause] were introduced,
how many of them would show [the effect]?'' Henceforth, we will refer to this
type of question as the causal counterfactual judgement. To date, however, this
judgement has been used very scarcely (Buehner et al., 2003; Perales, Shanks, &
Castro, 2005). In addition, in two experiments of Buehner et al.'s (2003) study
the standard trial-by-trial presentation format was replaced by a simultaneous
format, in which both the instances in which the cause was present, and those in
which it was absent were present at the same time on a single sheet. Other minor
modifications included stressing the fulfilment of the conditions for the analysis
underlying the derivation of p from contingency to apply (for instance, the
background equivalence of the cause-present and cause-absent instances in the
sample).
Alternatives to Power PC
Causal counterfactual questions are far from being universally accepted as the
best way to assess naïve reasoners' causal beliefs, and they have been used too
scarcely to draw reliable conclusions from them. In addition, Buehner et al.'s
(2003) results are still controversial (Perales et al., 2005). Therefore, the main
corpus of results on causal strength estimation still stems from those works in
which standard causal questions were used. Trial-by-trial presentations have
been used, for instance, in Buehner et al. (2003), Lober and Shanks (2000),
Perales and Shanks (2003), ValleÂe-Tourangeau et al. (1998), Wasserman et al.
(1996), White (2003b), and Anderson and Sheu (1995). Comprehensive studies
with summarised presentations are Levin, Wasserman, and Kao (1993), Mandel
and Lehman (1998), and White (2003b).
Independently of what model is favoured by the analysis of the results
from these studies, there is a series of trends that have been found to be con-
sistent enough to be taken into account by any theory that intends to be fully
explanatory. First, both ΔP manipulation across conditions in which p is held
constant, and p manipulation across conditions in which ΔP is held constant,
have a direct effect on causal judgements (Lober & Shanks, 2000; Perales &
Shanks, 2003). Second, the four trial types are attributed different subjective
weights (a > b > c > d), which has been shown both with direct methods of
assessment and by estimating the effect of orthogonal manipulations on final
causal judgements (Levin et al., 1993; Mandel & Lehman, 1998; White,
2003a, 2003b, 2003c). Third, and consequently, information contributing to
P(e/c) is attributed more weight than information contributing to P(e/~c)
(Lober & Shanks, 2000; Mandel & Lehman, 1998; Perales & Shanks, 2003),
and judgements correlate with P(e) across conditions in which ΔP is 0. And
finally, several studies with trial-by-trial presentations (Wasserman et al.,
1996; White, 2003b) have reported an effect of P(c) across conditions in
which P(e/c) and P(e/~c) are held constant.
Two types of algorithmic models have been proposed to account for the way
in which covariational information is used for causal estimation (see Allan,
1993; De Houwer & Beckers, 2002; Shanks, Holyoak, & Medin, 1996). Rule-
based models are based on the idea that humans keep track of the frequencies of
the different types of trials or the conditional probabilities they are exposed to
during the task, and combine those probabilities or frequencies according to a
given rule or statistic. ΔP-like models assume that people contrast estimates of
P(e/c) and P(e/~c), in a weighted or unweighted manner, in order to compute the
degree of contingency between the cause and the effect (Cheng & Holyoak,
1995; Cheng & Novick, 1992). By contrast, ΔD-like or linear combination
models assume that naïve reasoners compare evidence confirming the existence
of a generative causal link, or disconfirming the existence of a preventative
causal link (a and d type trials), against evidence disconfirming the existence of
a generative link, or confirming the existence of a preventative link (b and c type
trials), in a nonprobabilistic manner. These models (for instance, Catena, Mal-
donado, & Candido, 1998; White, 2003a, 2003b, 2003c) also assume that dif-
ferent trial types are given different evidential weights in confirming or
disconfirming a previous hypothesis. Weighting can arise both from inter-
individual factors (for example, some people considering a certain trial type as
confirmatory whereas others consider it disconfirmatory; White, 2003a) and
intraindividual factors (for example, people focusing more on some trial types
than on others; Maldonado, Catena, Candido, & Garcia, 1999; Mandel & Lehman,
1998).
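The difference between the two families of rules can be sketched in a few lines of code. This is our own illustration: the function names and the particular weight values are hypothetical, chosen only to respect the a > b > c > d ordering reported in the literature.

```python
# ΔP-like rule: contrast between the two conditional probabilities.
def delta_p(a, b, c, d):
    return a / (a + b) - c / (c + d)

# Weighted ΔD-like rule: confirming trials (a, d) against disconfirming
# trials (b, c), each trial type carrying its own subjective weight.
def weighted_delta_d(a, b, c, d, weights=(4.0, 3.0, 2.0, 1.0)):
    wa, wb, wc, wd = weights  # hypothetical values; only wa > wb > wc > wd matters
    confirm = wa * a + wd * d
    disconfirm = wb * b + wc * c
    return (confirm - disconfirm) / (confirm + disconfirm)
```

With equal weights the second rule reduces to the unweighted ΔD statistic, (a + d − b − c)/n; unequal weights let the same frequencies pull the judgement towards the more heavily weighted trial types.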
Associative models,4 on the other hand, assume that causal links are learnt in
the same way as other associations, by means of a mechanism that accumulates
associative strength in the link between the mental representations (nodes) of the
cue (or cause) and the outcome (or effect). The most widely known associative
model is the Rescorla–Wagner rule (Rescorla & Wagner, 1972):

ΔVi = αβ(λ − ΣVi−1) (7)
According to this model, the increment in the associative strength between a
cue and an outcome (ΔVi) is a multiplicative function of the saliences of the cue
and the outcome (α and β), and of the difference between the maximum associative
strength an outcome can recruit (λ, the asymptotic learning level) and the
associative strength recruited by all the cues present on the current trial (ΣVi−1).
The increment in associative strength on a given trial is a function of the degree
4Although it has never been formally stated, most authors seem to assume that associative
mechanisms operate automatically, are mainly data driven, and are thus insensitive to top-down
influences. By contrast, rule-based mechanisms are supposed to reflect the operation of purposeful
reasoning strategies, are demanding in terms of executive load, and are sensitive to top-down
influences.
to which the outcome is unpredicted by the cues presented on that trial. α and β
are assumed to take values between 0 and 1; λ is 1 on trials in which the
outcome is present and 0 on trials in which it is absent. An absent cue
cannot recruit associative strength on trials in which it is absent, although other
cues present on those trials (i.e., the context) can, and these compete with the
target cue for associative strength (determined by λ).

If the Rescorla–Wagner rule is restricted in such a way that β always takes
the same value (independently of whether the outcome is present or absent in the
current trial), its predictions are equal to ΔP at asymptote (Chapman & Robbins,
1990). Therefore, the restricted version of Rescorla–Wagner can be rejected on
the same empirical basis as ΔP. The unrestricted version does not fit the pattern
of results either, as it requires a given parameter order (βoutcome < βno-outcome) to
account for the positive effect of p on estimates when ΔP is held constant in
positive contingency tasks, and the opposite order of parameters (βoutcome >
βno-outcome) in negative contingency tasks. That change is justifiable in comparisons
across experiments, but it is not when negative and positive contingencies
are included in the same experiment with the same cover story and
the same materials for all conditions (Lober & Shanks, 2000).
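The asymptotic equivalence between the restricted rule and ΔP can be checked with a short simulation. The sketch below is our own illustration, assuming a single target cue plus an always-present context and an expected-update (batch) formulation of the rule; the function name and parameter values are not from the literature.

```python
# Expected-update Rescorla-Wagner with a single beta for outcome and
# no-outcome trials. Cue C is present on a + b trials; context X is present
# on all trials. lambda = 1 when the outcome occurs, 0 otherwise.

def rw_asymptote(a, b, c, d, lr=0.05, n_iter=20_000):
    n = a + b + c + d
    v_c = v_x = 0.0
    for _ in range(n_iter):
        # Cue-present trials: both C and X are updated towards lambda.
        d_present = lr * (a * (1 - (v_c + v_x)) + b * (0 - (v_c + v_x))) / n
        # Cue-absent trials: only the context X is updated.
        d_absent = lr * (c * (1 - v_x) + d * (0 - v_x)) / n
        v_c += d_present
        v_x += d_present + d_absent
    return v_c, v_x
```

With frequencies (6, 2, 2, 6), the cue's strength converges to ΔP = .5 and the context's to P(e/~c) = .25, in line with Chapman and Robbins' (1990) derivation.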
The Rescorla–Wagner model has been modified in several ways to eliminate
its inadequacies in explaining human and animal data (Miller, Barnet, & Gra-
hame, 1995; Siegel & Allan, 1996; Van-Hamme & Wasserman, 1994). How-
ever, among those modified algorithms the only one that can account for the
family of effects described above (with the exception of the cause-density effect)
is Pearce's (1987) model. This model is based on the assumption that any set of
cues presented during the task is represented as a single configuration, and
recruits associative strength as such; it adopts a slightly modified version of
the Rescorla–Wagner rule for associative strength updating (for a detailed
description of the model and its asymptotic derivations, see Perales & Shanks,
2003).
A recent meta-analysis of the most significant causal learning experiments
with sequential presentations (Perales et al., 2005) has shown that weighted ΔD-like rules are the most predictive among rule-based models, whereas
Pearce's is the best-fitting associative model. In addition, global goodness-of-fit
analyses showed that weighted ΔD is significantly more predictive than Pearce's
model. However, it seems increasingly clear that rule-based and
associative mechanisms could actually complement each other. We propose here
that associative and rule-based mechanisms are just descriptions of different
levels of processing in the mechanism in charge of computing covariation.
In any case, it has not been convincingly shown that people are able to intuitively
carry out the kind of normative analysis proposed by the Power PC theory.
Seemingly, covariation is computed in a non-normative yet adaptive way and
then incorporated into previously hypothesised causal structures. Covariation is
used to confirm or disconfirm the existence of a causal link in a given structure,
and to determine how and to what extent variables in that structure are related. It
is important to note, however, that the result of applying weighted ΔD to
covariation computation is in fact very close to normative prescriptions.
Although deviations from p are predicted in some specific situations, the degree
of correlation between judgements predicted by weighted ΔD and those
predicted by p across the 114 experimental conditions in Perales et al. (2005)
was r = .95.
SOME QUESTIONS ON THE DEVELOPMENT OF CAUSAL INDUCTION
Naïve reasoners are able to integrate covariational information with previous
knowledge to discover new causal links. That ability is made possible by con-
tent-specific knowledge about causal mechanisms and the application of general-
purpose principles of causal induction, as discussed previously. Although
developmental issues are not the core of this article, we will try to formulate
some questions on the emergence of such knowledge that are relevant for the
understanding of causal induction in adults.
By the age of 4, children have already acquired a rich body of causal knowledge
in the domains of biology, psychology, and physics (Spelke et al., 1992; Sperber,
Premack, & Premack, 1995). On this basis, some authors have claimed that
children have innate substantive knowledge of causal domains (Keil, 1995), or
apply content-specific mechanisms for the accrual of knowledge about these
domains (Leslie & Keeble, 1987). However, this innatist approach is not sup-
posed to explain how causal knowledge outside these domains is
acquired, and, most importantly, how causal knowledge is refined and integrated
inside them. Ultimately, children must also be able to use cross-domain causal
inductive principles to behave in a causally coherent way, independently of the
content of the task they are dealing with.
In any case, it is undeniable that humans are equipped with a primary
general-purpose learning system (either associative or computational; Gallistel,
2002) that allows them to capture at least part of the causal structure of the
environment. For example, connectionist models incorporating simple associa-
tive rules are simple yet powerful tools to partially recover both causal structure and
causal strength from covariation. Cheng (1997) has shown that, in tasks in which
the sets of potential causes are nested, the Rescorla–Wagner rule's predictions
coincide at asymptote with conditional ΔP. As long as temporal order and
causal order match, computing contingency conditionally is consistent with the
principle of control, which, in turn, is one of the ways in which conditional
dependencies/independencies can be used to induce or discard the existence of
causal links.
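The conditional-contingency computation mentioned here can be sketched as follows (our own illustration; the trial encoding and function name are assumptions): ΔP between a target cue T and the effect is computed only over trials on which an alternative cause A is held constant, which implements the principle of control.

```python
# Conditional ΔP: contingency between target cue T and effect E, computed
# only over trials on which the alternative cause A takes a fixed level.
# Each trial is a tuple (t, a, e) of 0/1 values.

def conditional_delta_p(trials, a_level):
    sub = [(t, e) for t, a, e in trials if a == a_level]
    e_given_t = [e for t, e in sub if t]
    e_given_not_t = [e for t, e in sub if not t]
    return (sum(e_given_t) / len(e_given_t)
            - sum(e_given_not_t) / len(e_given_not_t))
```

Because the contrast is computed within a fixed level of A, any influence of A on the effect is removed from the estimate for T.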
Cue interaction studies, in both humans and animals (see Shanks, 1995), have
shown that the application of a simple learning mechanism can be rather
successful at selecting a potential cause among a number of candidates. Most
associative mechanisms use temporal order and contiguity to discriminate
between events that must be represented at the input level and events that must
be represented at the output level. However, some authors have recently shown
that basic learning mechanisms must be able to codify not only order and
statistical association, but also the precise timing of the events occurring in each
trial during the task (Kirkpatrick & Church, 1998; Savastano & Miller, 1998).
Therefore, independently of what particular model is favoured by current evi-
dence (see Mowrer & Klein, 2001, for a review), the kind of basic learning
mechanism that conditioning seems to depend on (in both human and nonhuman
animals) is capable of successfully using order, timing, and cue competition
principles to identify potential causes in the environment.
In addition, the associative view is not completely incompatible with a
moderate innatist approach (and with content-related specificity). Even though
associative mechanisms are domain unspecific, the idea that there exists some
predisposition to learn more readily and effectively in some domains than in
others is accepted by most associationists. The biological preparedness
hypothesis (Seligman, 1971) states that some objects are, by virtue of a species'
evolution, more likely to be grouped together and to become associated with other
categories of objects. For example, humans and nonhuman primates are
more likely to associate experienced and vicarious aversive unconditioned sti-
muli with pictures of snakes than with pictures of flowers or mushrooms (Öhman &
Mineka, 2003). Similarly, facilitated associations have also been discovered
between new flavours and delayed sickness feelings across a number of species
(taste aversion learning; see Schafe & Bernstein, 1996). As noted by an
anonymous reviewer, the preparedness hypothesis formulated in conditioning
literature is not far from the domain specific content approach defended by some
developmental psychologists (see, for example, Carey, 2001). The only addi-
tional specification that is necessary to make both approaches compatible is
assuming that ``content-specificity'' refers to categories of representations that
are more easily formed and linked among them than others, rather than to
essentially different and biologically separated heritable learning systems; or,
more simply stated, that content specificity feeds into a general-purpose learning
system. In tune with this idea, the canalisation hypothesis (Cummins, Cummins,
& Poirier, 2003; Dellarosa-Cummins & Cummins, 1999) maintains that rapid
learning from poor stimuli about some contents (which is taken as evidence for
advocating a modular-innatist approach to learning and reasoning development)
can be a consequence of biasing learning/acquisition processes in ways that
favour the development of concepts and cognitive functions that proved
adaptive to an organism's ancestors. This implies that ``criticisms of innate
cognitive modules [in a strong sense] are not ipso facto criticisms of evolu-
tionary explanations of cognitive capacities'' (p. B38, clause between brackets
added). From this perspective, the existence of innate concepts (representational
nativism) and innate computational models is not strongly denied, but it is not
considered necessary for explaining the available data either.
In any case, basic mechanisms for the detection of statistical regularities or
associations, even constrained by facilitating content-related mechanisms,
cannot encompass the rich complexity of causal reasoning. There is extensive
evidence that even young children can discriminate between true causes and
mere predictors by virtue of the application of induction operations. In principle,
what differentiates causal relations from correlations is the knowledge that in
causal relations there must be some sort of transmission from cause to effect.
The idea that causal power or causal transmission exists in the world and needs
to be unveiled is the core principle of causal induction, and the starting point for
the causal Bayes' nets approach and the Power PC theory (Cheng, 1997; Pearl,
2000). However, it is not so clear where that essential principle, by definition
lacking a clear sensory input, emerges from (nonetheless, it is important to
note here that this knowledge does not refer to any particular causal mechanism
or category of objects, but to the more abstract or ``Kantian'' interpretation of
the world as causal).
The first possibility is that causal power is directly ``perceived'' in some
specific domains when some conditions are met, without the involvement of
reasoning or learning, and then generalised to other domains. In launching
experiments (Michotte, 1963), for instance, one shape moves towards another,
which moves upon contact, as in a collision. In this situation, adults report that they
see the first shape set the second in motion, although there is no real physical
interaction between the two shapes. Importantly, even 7-month-old infants are
sensitive to this type of illusion (Leslie & Keeble, 1987; Schlottmann, Allen,
Linderoth, & Hesketh, 2002).
It is still a matter of discussion whether this effect is dependent on an innate
perceptual mechanism or on experience. For some authors, perceptual causality
is innate and modular (Leslie & Keeble, 1987); for others, perceptual causality
meets the automaticity and encapsulation criteria for considering it modular, but
this module can develop across time (Scholl & Leslie, 1999; Scholl &
Tremoulet, 2000); a third group of authors denies both the innate and modular
nature of perceptual causality, and considers it an experience-dependent
phenomenon (Cohen & Oakes, 1993; Oakes, 1994; Rakison & Oakes, 2003);
and finally, according to a fourth group of authors several processes feed into the
subjective experience of perceptual causality, and some components can be
modular (Schlottmann, 2000).
Nevertheless, independently of the origin and modular or nonmodular nature
of perceptual causality, there are reasons to believe that it cannot be the only
source for acquiring the concept of power. First, because perceptual causality
can indeed interfere with that acquisition process (Schlottmann, 1999) in situa-
tions in which the appearance of causality does not correspond to the actual
causal structure of the events in the current scenario. ``Eventually, children can
learn that perceptual causality can be illusory, and subjugate it to a mechanism-
based criterion for causality'' (2000, p. 441), which means that, logically, the
concept of causal power cannot emerge only from perceptual causality. And
second, because perceptual causality has been reported almost exclusively in the
domain of mechanical systems and no clear cognitive mechanism has been
proposed to account for its generalisation to other domains.
As mentioned in a previous section, a second possibility is the existence of a
``content-independent causal inference module'' in which the concept of causal
transmission is innately hard-wired. This position is consistent with the idea that
``computational adaptation can be content-general . . . yet still functionally
specialized'' (Duchaine, Cosmides, & Tooby, 2001); that is, that there exist
modules in the brain that operate over a multiplicity of contents, yet always
solve the structural problems at which they are specialised. The systems
underlying conditioning (Gallistel & Gibbon, 2000) and probabilistic reasoning
(Brase, Cosmides, & Tooby, 1997) have been cited as examples of such
modules.
It has been shown that 3-year-olds can already make use of conditional
dependencies/independencies revealed in a minimum set of trials to select or
discard causal candidates (Gopnik et al., 2001, 2004), and they can plan a
completely new intervention on a putative cause to produce or stop an effect
(thus showing that their behaviour involves true causal understanding; Schulz &
Gopnik, 2004), after merely observing a set of statistical dependencies/
independencies. Accordingly, Gopnik and her collaborators have proposed that
even very young children have a cognitive system that recovers causal facts by
making implicit assumptions about the causal structure of the environment and
the relations between the environment and evidence. The origin of such implicit
assumptions is not clearly specified, but an analogy is drawn between the way the
causal learning system recovers the causal structure of the world from covari-
ation and the way the visual system recovers tridimensionality from 2-D pro-
jections on the retinas (apparently implying that such a system is modular). This
assertion is, however, difficult to reconcile with evidence emerging from adult
studies (De Houwer & Beckers, 2003; see also Waldmann & Hagmayer, 2001;
Waldmann & Martignon, 1998) showing that the application of general-purpose
causal induction principles is effortful and time consuming, and strongly
interferes with other concurrent tasks.
In the two previously described approaches, the origin of the general dis-
tinction between causal and noncausal links ultimately relies on innateness.
However, there is a third possibility that does not rely on (although it does not
deny) the existence of core innate knowledge. Piaget (1927/1969) proposed that
the appreciation of causal necessity begins with feeling the efficacy of our own
actions, which is subsequently projected onto external objects. This idea points
to intervention as a key element in causal cognition development, but, probably,
it also underestimates its importance. As formulated, the perception of the efficacy
of our actions is similar to perceptual causality (we cannot help interpreting the
events occurring immediately after our actions as effects of them). Yet, inter-
vention could also play a central role in facilitating the acquisition of the concept
of ``cause'' and the general features of causes, as differentiated from the concept
of ``predictor'' (some primitive and intuitive version of what we call causal
power or causal transmission). As noted in a previous section, interventions
actually change the structure of a causal scenario (as represented by graph
surgery and the introduction of the ``do'' operator). In consequence, there are
predictive relations that survive intervention whereas others do not, and only the
latter signal the existence of a causal relationship. Intervention can thus be
regarded as the key to identifying other features also manifested by causes, such as the
clues to causal structure enumerated earlier (including the patterns of depen-
dence/independence normally associated with causes and effects that Gopnik and
her colleagues refer to in their recent works).
Thus, it can be assumed that in the early stages of cognitive development
associative mechanisms, more or less enhanced and constrained by innate
facilitatory mechanisms, play the role of signalling where interventions are
likely to be effective, and intervention allows the ascertainment of whether a
candidate is a cause or not, in a more or less reflexive way early in development
and more purposefully later (we have seen earlier that adults' interventions are
indeed directed to those variables identified as possible causes in a previous
observational phase; Steyvers et al., 2003). Although the link between stimulus–
stimulus associative learning and intervention has never been properly studied,
even young children appear to show a general tendency to manipulate consistent
predictors of desirable outcomes. In fact, this remains a basic question to be
explored and consequently a potential source of new hypotheses for experi-
mentation. Our general prediction is that those associatively singled out events
``attract'' manipulative behaviour, and that learning from the correlational patterns
emerging from such behaviour has a special status in causal cognition development
(maybe by virtue of canalisation).
In an evolutionary and developmental sense, manipulative efficiency is
related to a previous psychological function that assigns motivational and
emotional value to classically conditioned stimuli (Lovibond, 1983), in a way
that confers on them reinforcing or aversive properties. However, there is a
qualitative developmental gap between pursuing the occurrence of desirable
events because of their conditioned hedonic value, and being able to pursue the
occurrence of those same events because they are expected to cause a desirable
effect. The latter ability cannot be explained by simple conditioning and, from our point
of view, must be based on the identification of some of the consistent features
that are associated with efficient causes.
In summary, we propose that the most basic principle of causal reasoning,
the distinction between mere predictiveness or co-occurrence and causal
power, can emerge gradually. If this presumption is right, manipulation must
play a central role in that process. The acquisition of the concept of causal
power can be understood as the progressive abstraction of the differences
between predictive events and actively intervened events. Later on, general
principles of causal induction and general features of effective causes (i.e., the
local clues that signal the presence of a causal link) are progressively general-
ised, and, at the same time, their influence is increasingly modulated by pre-
viously acquired causal knowledge. As with other cognitive
capacities, causal learning in early childhood is essentially an empirical rather
than a formal activity, which leads to prototypical inductive errors in specific
cases (Koslowski & Masnick, 2002; Siegler, 1983; Vosniadou & Brewer, 1992;
see also Vosniadou, 1994). For instance, Heyman, Phillips, and Gelman (2003)
have shown that young children (5- to 7-year-olds) generalise physical princi-
ples across ontological kinds (animate vs. inanimate entities) but also show
sensitivity to those ontological kinds in their projections, an effect absent in
adults. In addition, 5-year-olds are more likely to project principles from ani-
mate to inanimate objects than vice versa, which is in clear contradiction with
adult physical knowledge. Direct experience, verbal learning (including formal
education; Gelman, Hollander, Star, & Heyman, 2000), and even nonverbal
communication (Brooks, Hanauer, & Frye, 2001), contribute to the increasing
importance of top-down factors in causal reasoning (Amsel, Goodman, Savoie,
& Clark, 1996).
On the other hand, as noted earlier, general induction processes can be
innately biased to operate over certain categories of stimuli selected from the
flow of events. In that sense, the existence of content-specific knowledge is not
incompatible with general purpose induction principles. In fact, early innate
content knowledge can be useful in providing the kind of constraints on initial
causal hypotheses necessary for the application of such principles (i.e., both for
selecting causal candidates, and for controlling nontarget extraneous factors).
Finally, in parallel to the development of executive strategies, the causal
exploration of the world through intervention becomes more formal and sys-
tematic. Confronted with a complex causal scenario, in which several variables
can produce an effect, reasoners in their late childhood and early adolescence
systematically manipulate each potential factor across the values of the other
factors in order to isolate the individual and interactive influence of each of them
on the value of the outcome. Younger reasoners, in contrast, proceed in a more
random, arbitrary way (Inhelder & Piaget, 1958).
To recap, it is proposed here that an innate and unitary causal learning system
is not strictly necessary for the development of causal reasoning, although the
innate nature of learning facilitation mechanisms in some domains and per-
ceptual causation are not denied. A unitary regularity-detection learning system
can be regarded as the basis on which the primitives of causal reasoning are
built. However, that system by itself (even if enhanced by content-related
preparedness) cannot give rise to the emergent features of developing causal
reasoning abilities. Tentatively, we have proposed that intervention can play an
important role in acquiring the concept of causal power as something essentially
different from mere predictive validity. From a rational point of view (as noted in
an earlier section), the value of intervention relies on the fact that it changes the
structure of the causal scenario under scrutiny and ensures control, which means
that the biological facilitation of learning through intervention is indeed
rationally justified. Therefore, from this perspective causal reasoning abilities
emerge from the interaction between general-purpose learning mechanisms and
content-specific knowledge "modules" that enhance them. Further development
of episodic memory and executive function, verbal instruction, and the accrual of
a progressively richer and more refined base of causal knowledge are responsible
for causal reasoning development through childhood and adolescence.
We are aware that the evidence supporting the gradual and learning-based
emergence of the concept of causal power and causal induction principles we
have proposed here is still sparse, but so is the evidence against it. To date, the
existence of a general-purpose causal induction innate module is based exclu-
sively on the demonstration that young infants show causal induction abilities
that cannot be accounted for by basic associative learning. However, pre-
cociousness is not definitive proof of innateness. Three-year-olds already have
an extensive history of successes and failures in interacting with the world, and
their environment offers plenty of opportunities to learn that not all predictive
relations remain predictive when an intervention by oneself or another person is
introduced.
Our hypothesis is thus aimed at inspiring new research on the precise timing and
conditions for the development of general-purpose causal induction principles,
and on the role of intervention in the emergence of those principles.
A GLIMPSE AT THE WHOLE PICTURE
In the final section of this work, we intend to compose a general cognitive
architecture for causal reasoning based on the empirical evidence available to
date. That tentative architecture is summarised in Figure 2. The upper and lower
sections of the figure represent inputs to the processes involved in causal rea-
soning, whereas the middle part represents the internal representations generated
by those processes. This architecture does not aim to be exhaustive, and parts
of it are still underdefined (for instance, how do the different clues to causal
structure interact, and which of them takes precedence when two of them
contradict each other?). However, it serves to place the main current research
lines in relation to one another, and casts some light on the path for new
research efforts to follow.
Covariation computation is the central element of this set of processes, as it is
necessary both for causal structure inference and for causal strength estimation.
Here we will consider only learning from probabilistic evidence, although the
structure can be extended to deal with deterministic causes. That is possible
because specific top-down influences feed into the processes that select infor-
mation for covariation computation (grey arrow). As noted earlier, specific
knowledge on the nature of the mechanism in operation in the scenario under
scrutiny is necessary for taking a limited number of trials as representative of
general rules (as in the example of the alarm watch).
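As a purely illustrative sketch (our notation, not a commitment made by any of the cited models), the unconditional contingency ΔP over the four standard cell frequencies amounts to:

```python
def delta_p(a, b, c, d):
    """Unconditional contingency (Delta-P) from the four 2x2 cell frequencies:
    a: cause present, effect present    b: cause present, effect absent
    c: cause absent,  effect present    d: cause absent,  effect absent
    """
    p_effect_given_cause = a / (a + b)        # P(E | C)
    p_effect_given_no_cause = c / (c + d)     # P(E | ~C)
    return p_effect_given_cause - p_effect_given_no_cause

# A positive contingency: the effect is more frequent when the cause is present.
print(delta_p(16, 4, 4, 16))  # close to 0.6 (0.8 - 0.2)
```

Everything downstream in the architecture, from structure decisions to strength estimation, can consume this single quantity, which is why covariation computation occupies the centre of Figure 2.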
Most studies on covariation computation from noisy information have tried to
provide evidence to solve the controversy between rule-based and associative
models. To date, the most convincing results in favour of the associative
approach are those reported by Dickinson and his collaborators (Aitken et al.,
2000, 2001; Dickinson & Burke, 1996; Larkin et al., 1998). In this series of
studies, participants were first trained with a compound (AB+), and then with
one of the elements of the compound (A+ or A−). Judgements on B were
shown to be affected by the elemental phase (retrospective revaluation) only
when the two elements of the compound (A and B) had been consistently
associated with each other in the first phase of the study. This result is predicted
by associative theories that claim that the representation of the target cue (B)
needs to be associatively activated by the competing cue (A) during the second
phase of the task for retrospective revaluation effects to take place.
Figure 2. General architecture of causal reasoning. The upper and lower sectors stand for the
informational inputs to the causal reasoning process. The middle sector represents the intermediate
products (representations) of those processes. Arrows represent information transformations and
modulations.
This interpretation, however, has been recently challenged (Beckers, De Houwer, &
Miller, 2004). According to this second interpretation, the complexity of the
designs used in these studies prevents reasoners from remembering the relevant
trials presented in the first stage of the task, and thus from computing covariation
conditionally (which is necessary for retrospective revaluation to occur).
On the other hand, evidence from simpler situations clearly favours a rule-
based account of covariation estimation. As noted in an earlier
section, a recent meta-analysis of the most relevant studies in which the trial-by-
trial causal learning procedure has been used (Perales et al., 2005) shows that a
simple rule integrating the frequencies of the four trial types in a weighted
manner (Busemeyer, 1991) is, globally, the most predictive of the models
proposed to date (see also White, 2003b). In addition, a number of studies have
shown that reasoners' inferences on the maximality of the outcome, on the
additivity of the effects of the competing cues, and on the presence or absence of
the blocked cue during the first stage of a blocking design (De Houwer, 2002;
De Houwer & Beckers, 2003; De Houwer et al., 2002) modulate cue competition
effects.
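The weighted-rule idea can be sketched as follows; the weights here are purely illustrative (the fitted models estimate them freely, typically recovering the ordering wa > wb > wc > wd):

```python
def weighted_evidence(a, b, c, d, w=(1.0, 0.8, 0.6, 0.4)):
    """Weighted linear integration of the four trial-type frequencies.
    Cells a and d count as evidence for a causal link, b and c as evidence
    against it; the weight ordering reflects the differential impact of the
    four cells that judgement studies typically report."""
    wa, wb, wc, wd = w
    evidence = wa * a - wb * b - wc * c + wd * d
    total = wa * a + wb * b + wc * c + wd * d
    return evidence / total  # normalised to the -1..+1 judgement range

print(round(weighted_evidence(16, 4, 4, 16), 3))  # 0.6 for this table
```

With equal weights the rule reduces to a simple difference of confirmatory and disconfirmatory frequencies; the unequal weights are what let it capture the cell-impact asymmetries found empirically.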
Taken together, the available evidence seems to demonstrate the involvement
of effortful, deliberate inferential reasoning processes in covariation estimation
in causal reasoning tasks, which is much more compatible with a rule-based
account of causal learning than with associative models. In all of these studies,
however, the main dependent variable has been a global predictive or causal
judgement. The interpretation of results is not that clear in studies in which other
responses were collected. For instance, a recent unpublished study by Allan,
Siegel, and Tangen (2004) showed that trial-by-trial outcome
expectancy measures, analysed using uncontaminated signal detection theory
indices (d′ and a bias index), did not show the biases (i.e., the outcome-density
and cue-density effects) that have been consistently reported in experiments
in which global predictive and causal judgements were collected.
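For readers unfamiliar with these indices, the standard equal-variance signal detection decomposition separates discriminability from response bias as below (our sketch; we use the criterion c as the bias index, which is one common choice, not necessarily the one Allan et al. reported):

```python
from statistics import NormalDist

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Equal-variance signal detection indices for trial-by-trial predictions.
    d' indexes sensitivity (how well outcome occurrence is discriminated);
    c indexes response bias. Density manipulations that inflate judgements
    without improving discrimination should move c while leaving d' intact."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

d, c = sdt_indices(hits=40, misses=10, false_alarms=10, correct_rejections=40)
print(round(d, 2))  # about 1.68, with no response bias in this example
```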
Although evidence is still only partial and preliminary, it already shows that
some of the effects that have been interpreted as contrary to an associative
account of causal learning do not take place during learning, but during the
elaboration of a judgement. Evidence accrual during training can be regarded as
a mostly automatic, data-driven, and probably associative process, whereas
computation processes can be responsible for information integration when a
judgement is required (Matute, Vegas, & De Marez, 2002). Therefore, associa-
tive and rule-based mechanisms could well define two different hierarchically
related levels in the set of mechanisms computing covariation. According to our
proposal, the formation of the categories representing the four trial types, and the
computation of their relative frequencies in the task (a, b, c, and d; necessary for
the application of a rule), reflect the operation of an associative mechanism.
In other words, an associative mechanism could be responsible for forming and
strengthening the representations of the four trial types and those associative
representations can be subsequently used to compute a composite value of
perceived contingency. Evidence on the associative nature of frequency esti-
mations can arise from studies (such as the ones we are currently carrying out in
our laboratory) showing that manipulating the degree of attention to the cause or the
effect in causal learning tasks can differentially alter the frequency estimations
of the four trial types (a, b, c, and d).
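The bookkeeping this two-level proposal assumes, an associative stage that categorises each trial and accumulates the four frequencies before any rule applies, amounts to something like the following (our own sketch):

```python
from collections import Counter

def trial_type_frequencies(trials):
    """Map each (cause, effect) observation onto its standard cell label and
    accumulate counts: a = C&E, b = C&~E, c = ~C&E, d = ~C&~E."""
    label = {(1, 1): "a", (1, 0): "b", (0, 1): "c", (0, 0): "d"}
    counts = Counter(label[pair] for pair in trials)
    return [counts[cell] for cell in "abcd"]

stream = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0)]
print(trial_type_frequencies(stream))  # [2, 1, 1, 2]
```

On our proposal, attentional manipulations would distort these counts before any rule-based computation takes place, which is exactly what the frequency-estimation studies described above are designed to detect.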
In this architecture, all processing is considered to be bottom-up until the
trial-type frequency estimation level. From that level on, further processing can
be influenced by previous knowledge in a top-down manner. However, as noted
earlier, top-down influences imply reorganising the information represented at
this level, and are costly in terms of attentional and executive load. In more
specific terms, once the frequencies of the different trial types have been
computed, it is necessary to determine which of them are considered relevant for
contingency estimation, in accordance with general inferential principles
specifying that covariation must be computed conditionally (Spellman, 1996;
White, 2002), and ceiling effect situations must be avoided (Wu & Cheng,
1999). Only sets of instances in which certain factors are constantly present or
absent and the base rate of the effect is submaximal must be taken into account.
At this level, content-specific knowledge has a role in signalling which factors in
the environment need to be controlled, and may determine which events in the task
are potential causes and which of them are effects (Waldmann & Holyoak, 1992).
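These two constraints, conditionalising on constant values of other factors and discarding strata at ceiling, can be made concrete in a short sketch (our own formalisation, with hypothetical factor names C, E, and K):

```python
def conditional_delta_p(trials, cause, effect, control):
    """Delta-P for cause -> effect, computed separately within each stratum in
    which the control factor is held constant. Strata where the effect occurs
    on every cause-absent trial are skipped: contingency is uninformative at
    ceiling. Trials are dicts with 0/1 entries per factor."""
    estimates = {}
    for level in {t[control] for t in trials}:
        stratum = [t for t in trials if t[control] == level]
        present = [t[effect] for t in stratum if t[cause] == 1]
        absent = [t[effect] for t in stratum if t[cause] == 0]
        if not present or not absent:
            continue  # cause not varied within this stratum: uninformative
        if all(absent):
            continue  # effect base rate is maximal: ceiling, skip stratum
        estimates[level] = sum(present) / len(present) - sum(absent) / len(absent)
    return estimates

trials = (
    [{"C": 1, "E": 1, "K": 0}] * 8 + [{"C": 1, "E": 0, "K": 0}] * 2 +
    [{"C": 0, "E": 1, "K": 0}] * 2 + [{"C": 0, "E": 0, "K": 0}] * 8 +
    [{"C": 1, "E": 1, "K": 1}] * 5 + [{"C": 0, "E": 1, "K": 1}] * 5
)
print(conditional_delta_p(trials, "C", "E", "K"))  # only the K=1 ceiling stratum is excluded
```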
For the sake of parsimony, we have also assumed that, independently of how
covariation between a potential cause and an effect is computed, the result of that
process is equivalently available for further causal processing. On the one hand,
the reasoner needs to determine whether that covariation is different from zero
(which depends not only on its absolute value, but also on the number of
instances and the reliability of the source from which the information stems). In
case the covariation is considered reliable, it still has to be decided whether it
results from the existence of a causal link or is spurious. And, on the
other hand, if the covariation is interpreted as causal, its specific value can be
used to directly or indirectly determine the strength of the causal relationship.
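The first of these decisions, whether the covariation is reliably different from zero, depends on sample size as well as on the covariation's absolute value. A normative stand-in (the paper makes no claim that reasoners compute this statistic) is the Pearson chi-square for the 2×2 table:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 cause/effect table. The same Delta-P
    yields a larger statistic (more 'reliable' covariation) with more data."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Identical Delta-P (0.2) but a tenfold difference in sample size:
small = chi_square_2x2(6, 4, 4, 6)       # n = 20
large = chi_square_2x2(60, 40, 40, 60)   # n = 200
print(small < 3.84 < large)  # True: only the larger sample clears alpha = .05
```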
Several studies have shown that naïve reasoners intuitively discriminate
covariation from causation (Fugelsang & Thompson, 2003, Exp. 2; Wu &
Cheng, 1999). According to our proposal, for covariation to be interpreted as
causal, it has to be combined with the available information on order and
contiguity and with previous categorical knowledge on the plausibility of a causal
mechanism linking the candidate cause and the effect. As noted in an earlier
section, previous causal knowledge can also modulate the use of time and order
as informative clues to causal structure. Additionally, it must also be considered
whether the covariation has emerged from an intervention or just from an
observational situation.
For purely illustrative reasons, knowledge on causal categories/mechanisms
has been represented separately from the evaluated causal link in Figure 2;
however, links between causal categories and causal links between single events
have, from our point of view, a common representational basis. The bidirec-
tional relation between the two levels stands for the incremental loop of
knowledge acquisition described previously: causal categories modulate the
acquisition of new causal knowledge from covariation between specific events;
but, simultaneously, causal knowledge acquired in this way contributes to
refining causal categories. In this framework, a general causal mechanism and a
specific causal link are not necessarily two different types of representation, but
just two propositions of essentially the same nature linking concepts at different
abstraction levels.
To conclude, our proposal is strongly based on the available evidence
(thoroughly reported in the first two sections of this work), but new predictions
can also be experimentally tested. For example, it has been proposed that
expectancy measures and, consequently, trial type frequencies, are insensitive to
top-down influences. A new paradigm under development in our laboratory
allows us to analyse reaction times to online expectancy questions, and to
identify the factors that influence those reaction times. Similarly, trial-type
estimations can also be analysed to test whether or not they show any sign of
underlying associative processes.
In addition, the proposed architecture has a clear hierarchical structure, which
implies that the logic of dissociations–associations can be applied to test it.
Specifically, any factor influencing the lower part of the system (conditioned
expectancies) is expected to also influence the levels above it, whereas it is in
principle possible to find factors that influence the upper levels of the archi-
tecture without having any effect on the levels below them. For example, it can
be tested whether the biases usually detected in contingency estimations (e.g.,
cue- and outcome-density biases) are also shown by trial-by-trial predictive
accuracy measures and trial-type frequency estimations.
And finally, we have only listed the different sources of information
reasoners use to make structural decisions (time and order, manipulation, con-
tingency and control, and previous knowledge; at the top of Figure 2), but little
is known about how they interact, and when one takes precedence over another
when they point in different directions. The most exciting studies in the field at
the moment are those in which the traditional causal learning paradigm is being
abandoned or reformulated, and new tasks are proposed to study the importance
of these systematically neglected factors.
FINAL REMARKS
The proposed general architecture for causal reasoning is based on the idea that
(1) associative- and rule-based mechanisms are just descriptions of different
hierarchical levels in the system responsible for covariation computation, and
(2) covariation estimates are integrated with other clues present in the
environment, and with pre-stored general-purpose and domain-specific know-
ledge in order to decide whether or not the observed covariations signal the
presence of hidden causal links. This architecture neither optimises the use of
all the available information nor strictly conforms to rational normativity
criteria, but it is highly adaptive in adjusting the reasoner's knowledge and
behaviour to the actual causal texture of the world.
Nonhuman animals and newborns are probably equipped with only part of
this architecture. Basic learning mechanisms ensure a certain level of adaptation
to the causal structure of the environment, as long as they incorporate cue
competition principles that avoid confounding causal factors and noncausal
predictors, and are sensitive to the order and timing of events in the world.
Additionally, we also postulate that interventions are driven by stimuli that have
been associatively signalled to be potential causes. By means of these
mechanisms, and the possibility of learning from the patterns emerging from
intervention, the basic distinction between effective causes and other types of
correlates can be learnt. Subsequently, content-independent principles of causal
reasoning can be discriminatively learnt, even in young children, from the
monitoring of those patterns of dependence/independence resulting from their
own responses.
The more evolved parts of the proposed architecture are dependent on
declarative and episodic memory, and on executive processes, and are therefore
assumed to appear in parallel to the development of those cognitive systems.
The final system is a complex set of interrelated processes in charge of gen-
erating and manipulating the different types of representations needed to build
an interrelated set of causal propositions that can be used further to predict,
manipulate, and understand the events in the world.
Original manuscript received June 2004
Revised manuscript received October 2004
PrEview proof published online October 2005
REFERENCES
Ahn, W. K., & Bailenson, J. (1996). Causal attribution as a search for underlying mechanisms: An
explanation of the conjunction fallacy and the discounting principle. Cognitive Psychology, 31,
82–123.
Ahn, W. K., & Kalish, C. W. (2000). The role of mechanism beliefs in causal reasoning. In F. C. Keil
& R. A. Wilson (Eds.), Explanation and cognition (pp. 199–226). Cambridge, MA: MIT Press.
Aitken, M. R. F., Larkin, M. J. W., & Dickinson, A. (2000). Super-learning of causal judgements.
Quarterly Journal of Experimental Psychology, 53B, 59–81.
Aitken, M. R. F., Larkin, M. J. W., & Dickinson, A. (2001). Re-examination of the role of within-
compound associations in the retrospective revaluation of causal judgements. Quarterly Journal
of Experimental Psychology, 54B, 27–51.
Allan, L. (2003). Assessing Power-PC. Learning & Behavior, 31, 192–204.
Allan, L. G. (1993). Human contingency judgments: Rule based or associative? Psychological
Bulletin, 114, 435–448.
Allan, L. G., Siegel, S., & Tangen, J. M. (2004, May). A signal detection analysis of contingency
data. Communication presented at the First Special Interest Meeting on Human Contingency
Learning, Le Lignely, Belgium.
Amsel, E., Goodman, G., Savoie, D., & Clark, M. (1996). The development of reasoning about
causal and noncausal influences on levers. Child Development, 67, 1624–1646.
Anderson, J. R., & Sheu, C. (1995). Causal inferences as perceptual judgments. Memory and
Cognition, 23, 510–524.
Barsalou, L. W., Sloman, S. A., & Chaigneau, S. E. (2002). The HIPE theory of function. In L.
Carlson & E. van der Zee (Eds.), Representing functional features for language and space:
Insights from perception, categorization, and development. New York: Oxford University
Press.
Beckers, T., De Houwer, J., & Miller, R. R. (2004, May). Outcome additivity and outcome max-
imality as independent modulators of blocking in human and rat causal learning. Communication
presented at the First Special Interest Meeting on Human Contingency Learning, Le Lignely,
Belgium.
Brase, G. L., Cosmides, L., & Tooby, J. (1998). Individuation, counting, and statistical inference:
The role of frequency and whole-object representations in judgment under uncertainty. Journal of
Experimental Psychology: General, 127, 3–21.
Brooks, P. J., Hanauer, J. B., & Frye, D. (2001). Training 3-year-olds in rule-based causal reasoning.
British Journal of Developmental Psychology, 19, 573–595.
Buehner, M. J., & Cheng, P. W. (1997). Causal induction: The Power PC theory versus the Rescorla–
Wagner model. In M. G. Shafto & P. Langley (Eds.), Proceedings of the 19th annual conference
of the Cognitive Science Society (pp. 55–60). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Buehner, M. J., Cheng, P. W., & Clifford, D. (2003). From covariation to causation: A test of the
assumption of causal power. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 29, 1119–1140.
Buehner, M. J., & May, J. (2003). Rethinking temporal contiguity and the judgement of causality:
Effects of prior knowledge, experience, and reinforcement procedure. Quarterly Journal of
Experimental Psychology, 56A, 865–890.
Busemeyer, J. R. (1991). Intuitive statistical estimation. In N. H. Anderson (Ed.), Contributions to
information integration theory (pp. 187–215). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Busemeyer, J. R., Byun, E., Delosh, E. L., & McDaniel, M. A. (1997). Learning functional relations
based on experience with input–output pairs by humans and artificial neural networks. In K.
Lamberts & D. R. Shanks (Eds.), Knowledge, concepts and categories: Studies in cognition (pp.
405–437). Cambridge, MA: MIT Press.
Byrne, R. M. J. (1997). Cognitive processes in counterfactual thinking about what might have been.
In The psychology of learning and motivation: Advances in research and theory (Vol. 37, pp.
105–154). San Diego, CA: Academic Press.
Carey, S. (2001). On the very possibility of discontinuities in conceptual development. In E. Dupoux
(Ed.), Language, brain, and cognitive development: Essays in honor of Jacques Mehler (pp.
303–324). Cambridge, MA: MIT Press.
Catena, A., Maldonado, A., & Cándido, A. (1998). The effect of frequency of judgement and the type
of trials on covariation learning. Journal of Experimental Psychology: Human Perception and
Performance, 24, 481–495.
Chapman, G. B., & Robbins, S. J. (1990). Cue interaction in human contingency judgment. Memory
and Cognition, 18, 537–545.
Chatlosh, D. L., Neunaber, D. J., & Wasserman, E. A. (1985). Response–outcome contingency:
Behavioral and judgmental effects of appetitive and aversive outcomes with college students.
Learning and Motivation, 16, 1–34.
Cheng, P. W. (1993). Separating causal laws from casual facts: Pressing the limits of statistical
relevance. In D. L. Medin (Ed.), Advances in research and theory: Vol. 30. The psychology of
learning and motivation (pp. 215–264). San Diego, CA: Academic Press.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review,
104, 367–405.
Cheng, P. W. (2000). Causality in the mind: Estimating contextual and conjunctive power. In F. C.
Keil & R. A. Wilson (Eds.), Explanation and cognition (pp. 227–253). Cambridge, MA: MIT
Press.
Cheng, P. W., & Holyoak, K. J. (1995). Complex adaptive systems as intuitive statisticians: Caus-
ality, contingency, and prediction. In H. L. Roitblat & J. A. Meyer (Eds.), Comparative
approaches to cognitive science: Complex adaptive systems (pp. 271–302). Cambridge, MA:
MIT Press.
Cheng, P. W., & Lien, Y. (1995). The role of coherence in differentiating genuine from spurious
causes. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidisciplinary
debate (pp. 463–494). New York: Clarendon Press/Oxford University Press.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological
Review, 99, 365–382.
Cobos, P. L., López, F. J., Cano, A., Almaraz, J., & Shanks, D. R. (2002). Mechanisms of predictive
and diagnostic causal induction. Journal of Experimental Psychology: Animal Behavior Pro-
cesses, 28, 331–346.
Cohen, L. B., & Oakes, L. M. (1993). How infants perceive a simple causal event. Developmental
Psychology, 29, 421–433.
Cohen, L. B., Rundell, L. J., Spellman, B. A., & Cashon, C. H. (1999). Infants' perception of causal
chains. Psychological Science, 10, 412–418.
Cummins, D., Cummins, R., & Poirier, P. (2003). Cognitive evolutionary psychology without
representational nativism. Journal of Experimental and Theoretical Artificial Intelligence, 15,
143–159.
Cummins, D. D. (1995). Naive theories of causal deduction. Memory and Cognition, 23, 646–685.
Cummins, D. D. (1998). The pragmatics of causal inference. In M. A. Gernsbacher & S. J. Derry
(Eds.), Proceedings of the 20th annual conference of the Cognitive Science Society (pp. 9–14).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Damianopoulos, E. N. (1967). S–R contiguity and delay of reinforcement as critical parameters in
classical aversive conditioning. Psychological Review, 74, 420–427.
Danks, D. J. (2002). The epistemology of causal judgment. Dissertation Abstracts International:
Section A. Humanities and Social Sciences, 63, 212.
De Houwer, J. (2002). Forward blocking depends on retrospective inferences about the presence of
the blocked cue during the elemental phase. Memory and Cognition, 30, 24–33.
De Houwer, J., & Beckers, T. (2002). A review of recent developments in research and theories on
human contingency learning. Quarterly Journal of Experimental Psychology, 55B, 289–310.
De Houwer, J., & Beckers, T. (2003). Secondary task difficulty modulates forward blocking in human
contingency learning. Quarterly Journal of Experimental Psychology, 56B, 345–357.
De Houwer, J., Beckers, T., & Glautier, S. (2002). Outcome and cue properties modulate blocking.
Quarterly Journal of Experimental Psychology, 55A, 965–985.
Dellarosa-Cummins, D., & Cummins, R. (1999). Biological preparedness and evolutionary expla-
nation. Cognition, 73, B37–B53.
Dickinson, A., & Burke, J. (1996). Within-compound associations mediate the retrospective reva-
luation of causality judgements. Quarterly Journal of Experimental Psychology, 49B, 60–80.
Duchaine, B., Cosmides, L., & Tooby, J. (2001). Evolutionary psychology and the brain. Current
Opinion in Neurobiology, 11, 225–230.
Fugelsang, J. A., & Thompson, V. A. (2003). A dual process model of belief and evidence inter-
actions in causal reasoning. Memory and Cognition, 31, 800–815.
Gallistel, C. R. (2002). Frequency, contingency and the information processing theory of con-
ditioning. In P. Sedlmeier (Ed.), Frequency processing and cognition (pp. 153–171). London:
Oxford University Press.
Gallistel, C. R., & Gibbon, J. (2000). Time, rate, and conditioning. Psychological Review, 107,
289–344.
Gelman, S. A., Hollander, M., Star, J., & Heyman, G. D. (2000). The role of language in the
construction of kinds. In D. Medin (Ed.), Advances in research and theory: Vol. 39. The psy-
chology of learning and motivation (pp. 201–263). San Diego, CA: Academic Press.
German, T. P., & Nichols, S. (2003). Children's counterfactual inferences about long and short causal
chains. Developmental Science, 6, 514–523.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction:
Frequency formats. Psychological Review, 102, 684–704.
Gigerenzer, G., & Todd, P. M. (1999). Simple heuristics that make us smart. New York: Oxford
University Press.
Glymour, C. (1998). Learning causes: Psychological explanations of causal explanation. Minds and
Machines, 8, 39–60.
Glymour, C., Scheines, R., Spirtes, P., & Kelly, K. (1987). Discovering causal structure. London:
Academic Press.
Goldvarg, E., & Johnson-Laird, P. N. (2001). Naïve causality: A mental model theory of causal
meaning and reasoning. Cognitive Science, 25, 565–610.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of
causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3–32.
Gopnik, A., Sobel, D. M., Schulz, L. E., & Glymour, C. (2001). Causal learning mechanisms in very
young children: Two-, three-, and four-year-olds infer causal relations from patterns of variation
and covariation. Developmental Psychology, 37, 620–629.
Haggard, P., Clark, S., & Kalogeras, J. (2002). Voluntary action and conscious awareness. Nature
Neuroscience, 5, 382–385.
Hagmayer, Y., & Waldmann, M. R. (2002). How temporal assumptions influence causal judgments.
Memory and Cognition, 30, 1128–1137.
Harré, R., & Madden, E. H. (1975). Causal powers. Oxford, UK: Blackwell.
Hart, H. L., & Honoré, A. M. (1985). Causation in the law (2nd ed.). Oxford, UK: Clarendon Press.
(Original work published 1959.)
Heyman, G. D., Phillips, A. T., & Gelman, S. A. (2003). Children's reasoning about physics within
and across ontological kinds. Cognition, 89, 43–61.
Hume, D. (1978). A treatise of human nature. Oxford, UK: Oxford University Press. (Original work
published 1739.)
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to adolescence: An
essay on the construction of formal operational structures. Oxford, UK: Basic Books.
Jones, J. E. (1962). Contiguity and reinforcement in relation to CS–UCS intervals in classical
aversive conditioning. Psychological Review, 69, 176–186.
Keil, F. C. (1995). The growth of causal understandings of natural kinds. In D. Sperber, D. Premack,
& A. J. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 234–262). New York:
Clarendon Press/Oxford University Press.
Kelley, H. H. (1967). Attribution theory in social psychology. Nebraska Symposium on Motivation,
15, 192–238.
Kirkpatrick, K., & Church, R. M. (1998). Are separate theories of conditioning and timing necessary?
Behavioural Processes, 44, 163–182.
Koslowski, B., & Masnick, A. (2002). The development of causal reasoning. In U. Goswami (Ed.),
Blackwell handbook of childhood cognitive development. Malden, MA: Blackwell.
Lagnado, D., & Sloman, S. A. (2002). Learning causal structure. In Proceedings of the 24th annual
conference of the Cognitive Science Society (pp. 560–565). Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.
316 PERALES AND CATENA
Lagnado, D., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 856–876.
Larkin, M. J. W., Aitken, M. R. F., & Dickinson, A. (1998). Retrospective revaluation of causal judgments under positive and negative contingencies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1331–1352.
Leslie, A. M., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25, 265–288.
Levin, I. P., Wasserman, E. A., & Kao, S. F. (1993). Multiple methods of examining biased information use in contingency judgments. Organizational Behavior and Human Decision Processes, 55, 228–250.
Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87–137.
Lober, K., & Shanks, D. R. (2000). Is causal induction based on causal power? Critique of Cheng (1997). Psychological Review, 107, 195–212.
Lovibond, P. F. (1983). Facilitation of instrumental behavior by a Pavlovian appetitive conditioned stimulus. Journal of Experimental Psychology: Animal Behavior Processes, 9, 225–247.
Mackie, J. L. (1974). The cement of the universe: A study of causation. Oxford, UK: Oxford University Press.
Maldonado, A., Catena, A., Candido, A., & Garcia, I. (1999). The belief revision model: Asymmetrical effects of noncontingency on human covariation learning. Animal Learning and Behavior, 27, 168–180.
Mandel, D. R., & Lehman, D. R. (1998). Integration of contingency information in judgments of cause, covariation, and probability. Journal of Experimental Psychology: General, 127, 269–285.
Matute, H., Vegas, S., & de Márez, P. J. (2002). Flexible use of recent information in causal and predictive judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 714–725.
Michotte, A. (1963). The perception of causality. Oxford, UK: Basic Books.
Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla–Wagner model. Psychological Bulletin, 117, 363–386.
Mowrer, R. R., & Klein, S. B. (Eds.). (2001). Handbook of contemporary learning theories. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Newsome, G. L. (2003). The debate between current versions of covariation and mechanism approaches to causal inference. Philosophical Psychology, 16, 87–107.
Oakes, L. M. (1994). Development of infants' use of continuity cues in their perception of causality. Developmental Psychology, 30, 869–870.
Öhman, A., & Mineka, S. (2003). The malicious serpent: Snakes as a prototypical stimulus for an evolved module of fear. Current Directions in Psychological Science, 12, 5–9.
Pearce, J. M. (1987). A model for stimulus generalization in Pavlovian conditioning. Psychological Review, 94, 61–73.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, reasoning, and inference. New York: Cambridge University Press.
Perales, J. C., Catena, A., & Maldonado, A. (2004). Inferring non-observed correlations from causal scenarios: The role of causal knowledge. Learning and Motivation, 35, 115–135.
Perales, J. C., & Shanks, D. R. (2003). Normative and descriptive accounts of the influence of power and contingency on causal judgement. Quarterly Journal of Experimental Psychology, 56A, 977–1007.
Perales, J. C., Shanks, D. R., & Castro, L. (2005). Formal models of causal learning: A review and synthesis. Manuscript submitted for publication.
Perner, J. (2001). Episodic memory: Essential distinctions and developmental implications. In C. Moore & K. Lemmon (Eds.), The self in time: Developmental perspectives (pp. 181–202). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Piaget, J. (1969). The child's conception of physical causality. Totowa, NJ: Littlefield, Adams & Co. (Original work published 1927.)
Rakison, D. H., & Oakes, L. M. (Eds.). (2003). Early category and concept development: Making sense of the blooming, buzzing confusion. London: Oxford University Press.
Reichenbach, H. (1956). The direction of time. Berkeley, CA: University of California Press.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research (pp. 64–99). New York: Appleton-Century-Crofts.
Roese, N. J. (1994). The functional basis of counterfactual thinking. Journal of Personality and Social Psychology, 66, 805–818.
Savastano, H. I., & Miller, R. R. (1998). Time as content in Pavlovian conditioning. Behavioural Processes, 44, 147–162.
Schafe, G., & Bernstein, I. (1996). Taste aversion learning. In E. Capaldi (Ed.), Why we eat what we eat: The psychology of eating (pp. 31–51). Washington, DC: American Psychological Association.
Scheines, R., Spirtes, P., Glymour, C., Meek, C., & Richardson, T. (1998). The TETRAD project: Constraint based aids to causal model specification. Multivariate Behavioral Research, 33, 65–117.
Schlottmann, A. (1999). Seeing it happen and knowing how it works: How children understand the relation between perceptual causality and knowledge of underlying mechanism. Developmental Psychology, 35, 303–317.
Schlottmann, A. (2000). Is perception of causality modular? Trends in Cognitive Sciences, 4, 441–442.
Schlottmann, A., Allen, D., Linderoth, C., & Hesketh, S. (2002). Perceptual causality in children. Child Development, 73, 1656–1677.
Scholl, B. J., & Leslie, A. M. (1999). Modularity, development and "theory of mind". Mind and Language, 14, 131–153.
Scholl, B. J., & Tremoulet, P. D. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4, 299–309.
Schultz, T. R. (1982). Rules of causal attribution. Monographs of the Society for Research in Child Development, 47(1, Serial No. 194).
Schulz, L. E., & Gopnik, A. (2004). Causal learning across domains. Developmental Psychology, 40, 162–176.
Seligman, M. E. (1971). Phobias and preparedness. Behavior Therapy, 2, 307–320.
Shanks, D. (1995). The psychology of associative learning. New York: Cambridge University Press.
Shanks, D. R., & Dickinson, A. (1991). Instrumental judgment and performance under variations in action–outcome contingency and contiguity. Memory and Cognition, 19, 353–360.
Shanks, D. R., Holyoak, K., & Medin, D. L. (Eds.). (1996). Causal learning. San Diego, CA: Academic Press.
Shanks, D. R., & López, F. J. (1996). Causal order does not affect cue selection in human associative learning. Memory and Cognition, 24, 511–522.
Siegel, S., & Allan, L. G. (1996). The widespread influence of the Rescorla–Wagner model. Psychonomic Bulletin and Review, 3, 314–321.
Siegler, R. S. (1983). How knowledge influences learning. American Scientist, 71, 631–638.
Sloman, S. A., & Lagnado, D. (2004). Causal invariance in reasoning and learning. In B. Ross (Ed.), The psychology of learning and motivation, Vol. 44 (pp. 287–325). San Diego, CA: Academic Press.
Sloman, S., & Lagnado, D. (2005). Do we "do"? Cognitive Science, 29, 5–39.
Spelke, E. S., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge. Psychological Review, 99, 605–632.
Spellman, B. A. (1996). Conditionalizing causality. In D. R. Shanks & K. Holyoak (Eds.), Causal learning (pp. 167–206). San Diego, CA: Academic Press.
Sperber, D., Premack, D., & Premack, A. J. (Eds.). (1995). Causal cognition: A multidisciplinary debate. New York: Clarendon Press/Oxford University Press.
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search. Cambridge, MA: MIT Press.
Steyvers, M., Tenenbaum, J., Wagenmakers, E. J., & Blum, B. (2003). Inferring causal networks from observations and interventions. Cognitive Science, 27, 453–489.
Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland.
Tenenbaum, J. B., & Griffiths, T. L. (2001). Structure learning in human causal induction. In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, Vol. 13. Cambridge, MA: MIT Press.
Terry, W. S., & Wagner, A. R. (1975). Short-term memory for "surprising" versus "expected" unconditioned stimuli in Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 1, 122–133.
Vallée-Tourangeau, F., Murphy, R. A., Drew, S., & Baker, A. G. (1998). Judging the importance of constant and variable candidate causes: A test of the Power-PC theory. Quarterly Journal of Experimental Psychology, 51A, 65–84.
Van Hamme, L. J., & Wasserman, E. A. (1994). Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements. Learning and Motivation, 25, 127–151.
Vosniadou, S. (1994). Capturing and modeling the process of conceptual change. Learning and Instruction, 4, 45–69.
Vosniadou, S., & Brewer, W. F. (1992). Mental models of the earth: A study of conceptual change in childhood. Cognitive Psychology, 24, 535–585.
Waldmann, M. R. (2000). Competition among causes but not effects in predictive and diagnostic learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 53–76.
Waldmann, M. R. (2001). Predictive versus diagnostic causal learning: Evidence from an overshadowing paradigm. Psychonomic Bulletin and Review, 8, 600–608.
Waldmann, M. R., & Hagmayer, Y. (2001). Estimating causal strength: The role of structural knowledge and processing effort. Cognition, 82, 27–58.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.
Waldmann, M. R., & Holyoak, K. J. (1997). Determining whether causal order affects cue selection in human contingency learning: Comments on Shanks and López (1996). Memory and Cognition, 25, 125–134.
Waldmann, M., & Martignon, L. (1998). A Bayesian network model of causal learning. In M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the 20th annual conference of the Cognitive Science Society (pp. 1102–1107). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Walsh, C. R., & Byrne, R. M. J. (2004). Counterfactual thinking: The temporal order effect. Memory and Cognition, 32, 369–378.
Wasserman, E. A., Chatlosh, D. L., & Neunaber, D. J. (1983). Perception of causal relations in humans: Factors affecting judgments of response–outcome contingencies under free-operant procedures. Learning and Motivation, 14, 406–432.
Wasserman, E. A., Kao, S. F., Van Hamme, L., Katagiri, M., & Young, M. E. (1996). Causation and association. In D. R. Shanks & K. Holyoak (Eds.), Causal learning (pp. 207–264). San Diego, CA: Academic Press.
White, P. A. (1995). Use of prior beliefs in the assignment of causal roles: Causal powers versus regularity-based accounts. Memory and Cognition, 23, 243–254.
White, P. A. (2000). Causal attribution and Mill's methods of experimental inquiry: Past, present, and prospect. British Journal of Social Psychology, 39, 429–447.
White, P. A. (2001). Causal judgments about relations between multilevel variables. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 499–513.
White, P. A. (2002). Causal judgement from contingency information: Judging interactions between two causal candidates. Quarterly Journal of Experimental Psychology, 55A, 819–838.
White, P. A. (2003a). Causal judgement as evaluation of evidence: The use of confirmatory and disconfirmatory information. Quarterly Journal of Experimental Psychology, 56A, 491–513.
White, P. A. (2003b). Effects of wording and stimulus format on the use of contingency information in causal judgment. Memory and Cognition, 31, 231–242.
White, P. A. (2003c). Making causal judgments from the proportion of confirming instances: The pCI rule. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 710–727.
Wimer, S., & Kelley, H. H. (1982). An investigation of the dimensions of causal attribution. Journal of Personality and Social Psychology, 43, 1142–1162.
Wu, M., & Cheng, P. W. (1999). Why causation need not follow from statistical association: Boundary conditions for the evaluation of generative and preventive causal powers. Psychological Science, 10, 92–97.