
CENTER FOR RESEARCH IN LANGUAGE

June 2007 Vol. 19, No. 2

CRL Technical Reports, University of California, San Diego, La Jolla CA 92093-0526 Tel: (858) 534-2536 • E-mail: [email protected] • WWW: http://crl.ucsd.edu/newsletter/current/TechReports/articles.html

TECHNICAL REPORT

The Coordinated Interplay Account of Utterance Comprehension, Attention, and the Use of Scene Information

Pia Knoeferle

Center for Research in Language, University of California, San Diego

EDITOR’S NOTE

The CRL Technical Report replaces the feature article previously published with every issue of the CRL Newsletter. The Newsletter is now limited to announcements and news concerning the CENTER FOR RESEARCH IN LANGUAGE. CRL is a research center at the University of California, San Diego that unites the efforts of fields such as Cognitive Science, Linguistics, Psychology, Computer Science, Sociology, and Philosophy, all of which share an interest in language. The Newsletter can be found at http://crl.ucsd.edu/newsletter/current/TechReports/articles.html. The Technical Reports are also produced and published by CRL and feature papers related to language and cognition (distributed via the World Wide Web). We welcome responses from friends and colleagues at UCSD as well as other institutions. Please visit our web site at http://crl.ucsd.edu.

SUBSCRIPTION INFORMATION If you know of others who would be interested in receiving the Newsletter and the Technical Reports, you may add them to our email subscription list by sending an email to [email protected] with the line "subscribe newsletter <email-address>" in the body of the message (e.g., subscribe newsletter [email protected]). Please forward correspondence to:

John Lewis and Jenny Staab, Editors
Center for Research in Language, 0526
9500 Gilman Drive, University of California, San Diego 92093-0526
Telephone: (858) 534-2536 • E-mail: [email protected]


Back issues of the CRL Technical Reports are available on our website. Papers featured in recent issues include the following: Syntactic Processing in High- and Low-skill Comprehenders Working under Normal and Stressful Conditions Frederic Dick, Department of Cognitive Science, UCSD Morton Ann Gernsbacher, Department of Psychology, University of Wisconsin Rachel R. Robertson, Department of Psychology, Emory University Vol. 14, No. 1, February 2002

Teasing Apart Actions and Objects: A Picture Naming Study Analia L. Arevalo Language & Communicative Disorders, SDSU & UCSD Vol. 14, No. 2, May 2002

The Effects of Linguistic Mediation on the Identification of Environmental Sounds Frederic Dick, Joseph Bussiere and Ayşe Pınar Saygın Department of Cognitive Science and Center for Research in Language, UCSD Vol. 14, No. 3, August 2002

On the Role of the Anterior Superior Temporal Lobe in Language Processing: Hints from Functional Neuroimaging Studies Jenny Staab Language & Communicative Disorders, SDSU & UCSD Vol. 14, No. 4, December 2002

A Phonetic Study of Voiced, Voiceless, and Alternating Stops in Turkish Stephen M. Wilson Neuroscience Interdepartmental Program, UCLA Vol. 15, No. 1, April 2003

New corpora, new tests, and new data for frequency-based corpus comparisons Robert A. Liebscher Cognitive Science, UCSD Vol. 15, No. 2, December 2003

The relationship between language and coverbal gesture in aphasia Eva Schleicher Psychology, University of Vienna & Cognitive Science, UCSD Vol. 16, No. 1, January 2005

In search of Noun-Verb dissociations in aphasia across three processing tasks Analía Arévalo, Suzanne Moineau Language and Communicative Disorders, SDSU & UCSD, Center for Research in Language, UCSD Ayşe Saygin Cognitive Science & Center for Research in Language, UCSD Carl Ludy VA Medical Center Martinez Elizabeth Bates Cognitive Science & Center for Research in Language, UCSD Vol. 17, No. 1, March 2005

Meaning in gestures: What event-related potentials reveal about processes underlying the comprehension of iconic gestures Ying C. Wu Cognitive Science Department, UCSD Vol. 17, No. 2, August 2005

What age of acquisition effects reveal about the nature of phonological processing Rachel I. Mayberry Linguistics Department, UCSD Pamela Witcher School of Communication Sciences & Disorders, McGill University Vol. 15, No. 3, December 2005

Effects of Broca's aphasia and LIPC damage on the use of contextual information in sentence comprehension Eileen R. Cardillo CRL & Institute for Neural Computation, UCSD Kim Plunkett Experimental Psychology, University of Oxford Jennifer Aydelott Psychology, Birkbeck College, University of London Vol. 18, No. 1, June 2006

Avoid ambiguity! (If you can) Victor S. Ferreira Department of Psychology, UCSD Vol. 18, No. 2, December 2006

Arab Sign Languages: A Lexical Comparison Kinda Al-Fityani Department of Communication, UCSD Vol. 19, No. 1, March 2007


The Coordinated Interplay Account of utterance comprehension, attention, and the use of scene information

Pia Knoeferle ([email protected])
Center for Research in Language, University of California, San Diego

Abstract

This paper reviews recent research on the interplay between language comprehension processes, attention to relevant objects and events, and the use of scene information for comprehension. It discusses, in particular, claims that information from scene events is prioritized over stereotypical thematic role knowledge during comprehension, and claims regarding a close temporal coordination between comprehension, visual attention, and the use of scene information. The discussion takes into account findings from different modalities (spoken comprehension versus reading), as well as insights regarding the decay of recently inspected scene information that is no longer present during language comprehension. We consider the implications of these findings for a recently proposed account of the coordinated interplay between utterance processing, attention, and the rapid use of scene information during comprehension.

Introduction

"What is the time course with which we integrate what we see in a scene with a sentence that we read or hear?" is one question that we might ask when studying the use of scene information during language comprehension. Answering this question is of interest in various types of comprehension situations, such as when we read comic books (Carroll, Young, & Guertin, 1992) or newspaper advertisements (Rayner, Rotello, Stewart, Keir, & Duffy, 2001), or inspect scientific diagrams (Feeney, Hola, Liversedge, Findley, & Metcalf, 2000). In addition to reading, this question has also been addressed in spoken comprehension, when carrying out instructions to manipulate objects in the environment (Sedivy, Tanenhaus, Chambers, & Carlson, 1999; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995) or when listening to sentences that describe pictures on a computer screen (e.g., Knoeferle, Crocker, Scheepers, & Pickering, 2005). The above issue has by now been largely resolved, and it is widely accepted that scene information rapidly affects syntactic structure building (e.g., Knoeferle et al., 2005; Spivey, Tanenhaus, Eberhard, & Sedivy, 2002; Tanenhaus et al., 1995) and incremental semantic interpretation (e.g., Sedivy et al., 1999).

In addition to showing that people rapidly use anon-linguistic (e.g., scene) context for online lan-guage comprehension, existing findings also pro-vide evidence for the rapid influence of linguis-tic knowledge and discourse contexts during on-line language comprehension. Prior psycholin-guistic research has, for instance, shown thatstored linguistic and world knowledge rapidly in-fluence the structuring and incremental interpreta-tion of a sentence during reading (e.g., Traxler &Pickering, 1996; Trueswell, Tanenhaus, & Kello,1993). In particular, stored verb-based knowl-edge of who-does-what-to-whom has been foundto have a strong and rapid influence on structuraldisambiguation and incremental thematic role as-signment during reading (e.g., McRae, Ferretti,& Amyote, 1997; McRae, Spivey-Knowlton, &Tanenhaus, 1998; Trueswell, Tanenhaus, & Gar-nsey, 1994). In addition, research on auditorycomprehension has demonstrated its rapid influ-ence on the incremental interpretation of an utter-ance that describes a scene (e.g., Kamide, Scheep-ers, & Altmann, 2003). It has further been shownthat people rapidly use a referential discourse con-

CRL Technical Reports, Vol. 19, No. 2, June 2007

3

Page 4: CENTER FOR RESEARCH IN LANGUAGE · Center for Research in Language, University of California, San Diego EDITOR’S NOTE The CRL Technical Report replaces the feature article previously

text to disambiguate locally structurally ambigu-ous sentences (e.g., Altmann & Steedman, 1988).

Thus, linguistic context and knowledge as well as scene contexts are rapidly used for online language comprehension. Psycholinguistic findings that underscore the importance of these two types of information - linguistic and scene context - receive support from developmental studies showing that both linguistic and non-linguistic information (e.g., information about objects and events in the environment) are important for language acquisition. With respect to the influence of diverse informational sources, the acquisition accounts by Gleitman (1990) and Gillette, Gleitman, Gleitman, and Lederer (1999) emphasize the importance of linguistic knowledge for child language acquisition. At the same time, they acknowledge that no one informational source can account for full language acquisition: "The position I have been urging is that children usually succeed in ferreting out the forms and the meaning of the language just because they can play off these two imperfect and insufficient data bases (the saliently interpretable events and the syntactically interpreted utterances) against each other to derive the best fit between them" (Gleitman, 1990, p. 50). Snow (1977) suggested that early language is largely about objects and events in the immediate environment of a child, thus corroborating this proposal. For the acquisition of more abstract terms, in contrast, Gillette et al. (1999) suggested that information from the immediate scene was less important.

Developmental findings further draw our attention to prerequisites for the successful use of scene information in acquisition: Experimental findings suggest, for instance, that the influence of non-linguistic information (e.g., objects and events) on language acquisition is highly dependent on the time-lock between a child's attention to, and child-directed speech about, these objects and events (e.g., Dunham, Dunham, & Curwin, 1993; Harris, Jones, Brookes, & Grant, 1986; Tomasello & Farrar, 1986). It has been suggested that prerequisites similar to those for child language acquisition also hold for adult language comprehension (e.g., Knoeferle & Crocker, 2006). Findings from Tomasello and Farrar (1986) support the importance of attention for the acquisition of concrete words: when a child's attention was already focused on an object, words referring to it were learned better than words that were presented while trying to redirect the child's attention.

Based on these findings - the importance of both linguistic contexts / knowledge and scene contexts for language comprehension and acquisition, as well as first insights into prerequisites for the use of visual contexts in language acquisition - recent psycholinguistic research has moved on to examine in more detail how precisely scene information influences adult language comprehension (Knoeferle, 2007; Knoeferle & Crocker, 2006, in press; Knoeferle, Habets, Crocker, & Münte, in press).

The present article will present an overview and a discussion of this recent research by Knoeferle and colleagues. A first section below will draw attention to five key issues of their research that are motivated by both psycholinguistic and developmental findings. The remainder of the article will review individual experimental studies that have investigated these five research issues and discuss relevant findings in the context of a first account of the interplay between language-mediated visual attention to objects and events, and the rapid use of scene information for language comprehension (Knoeferle & Crocker, in press).

Five questions on the use of scene events for online language comprehension

When considering the developmental findings that we reviewed above, it is noteworthy that the influence of information about objects and events on language acquisition depended on the time-lock between a child's visual attention to objects and child-directed speech. It is further interesting to note that while both linguistic and non-linguistic (e.g., scene) information influenced comprehension and acquisition, their relative importance is so far unclear. Motivated by these two insights, we highlight five questions on how information from a visual scene (e.g., events) influences adult language comprehension. The first two questions pertain to the time-lock between language-mediated attention to objects / events and the influence of information from these objects / events on comprehension. A further three points draw attention to the relative importance of linguistic knowledge versus scene context.

Q1 Is the time course with which scene context affects utterance comprehension a function of when scene information is identified as relevant by the utterance?


Q2 Is scene context on a par with linguistic context / knowledge in resolving local structural ambiguity?

Q3 Is scene information - when it is clearly relevant for comprehension - potentially prioritized for language comprehension compared with linguistic knowledge?

Q4 Does the absence of an immediate scene or scene events during language comprehension diminish the preferred reliance on depicted events?

Q5 Is a close time-lock between utterance-mediated attention to objects and events a prerequisite for the rapid use of visual context (e.g., information about objects and events) during comprehension?

In the following sections we review in more detail the studies that have addressed the above five issues, and discuss findings that provide insight into when there is (and when there is not) a close temporal coordination between utterance comprehension, attention in a scene, and the use of relevant scene information.

Q1: Do we use scene context time-locked to language-mediated attention?

An important initial finding that relied on what has since been dubbed the "visual world" paradigm was relevant to both the coordination of utterance comprehension with scene processing and the rapid influence of scene information (Tanenhaus et al., 1995). Tanenhaus and colleagues presented people with instructions such as Put the apple on the towel in the box, in which the phrase on the towel can be temporarily analysed as modifier of the first noun phrase, identifying the location of the apple, or analysed as argument of the verb, indicating where to put the apple. As participants carried out the instructions, their eye gaze was monitored, and provided insights into online language comprehension. In a context containing one apple placed on a towel (location) and an empty towel (destination), people mostly inspected the apple when hearing Put the apple. Having heard on the towel, listeners mostly fixated the empty towel, suggesting interpretation of the ambiguous phrase as the destination. In a scene with two apples (of which one was on a towel) and an empty towel, fixation patterns differed from the start of the utterance in comparison with the one-apple context. People's eye movements - as they heard the apple - alternated between the two apples and settled on the apple on the towel shortly after hearing on the towel, indicating interpretation of the phrase on the towel as the location of the apple. The core findings of this research are that utterance interpretation directs attention in the scene (see also Cooper, 1974; Spivey, Tyler, Eberhard, & Tanenhaus, 2001), and that the visual referential context rapidly influences the incremental structuring of the utterance. The rapidity of scene influence was confirmed by the fact that eye movements differed between the two visual context conditions from the onset of the utterance (see also Spivey et al., 2002).

The fixation patterns in the studies by Tanenhaus et al. (1995) do not, however, permit us to determine whether scene information influenced structuring and interpretation of the utterance in a manner that is closely time-locked to, or independent of, when the utterance identified that scene information as relevant. On a first interpretation, comprehension of the utterance (i.e., the apple) directed attention in the scene, and this triggered the construction and use of the appropriate referential context (one apple, two apples). A second possibility is that people acquired the referential context temporally independently of - maybe even prior to - hearing the utterance. They may then have accessed that context much as they would access a prior discourse context. The findings by Tanenhaus et al. are compatible with both interpretations. Since eye movements differed from the start of the utterance between the two contexts, it is impossible to determine precisely whether or not identification of relevant scene information by the utterance was necessary for that scene information to affect structural disambiguation.

My own prior research that has relied on eye tracking in scenes during spoken comprehension has provided important insights into the temporal coordination of utterance-mediated attention and the use of scene information. Knoeferle et al. (2005) have extended insights on the rapid influence of visual contexts on comprehension by investigating comprehension in relatively rich scenes that contained depicted events in addition to objects. They examined the comprehension of subject-verb-object (SVO, see (1a)) and object-verb-subject (OVS, see (1b)) sentences that related to event scenes. While both of these constituent orders are grammatical in German, the subject-initial order is preferred (e.g., Hemforth, 1993). The correct syntactic and thematic relations were ambiguous prior to the sentence-final accusative (SVO, 1a) or nominative (OVS, 1b) case-marked noun phrase.

Using such initially structurally ambiguous sentences (1) while people inspected a related event scene (Fig. 1) permitted the authors to examine whether people would use depicted events that show who-does-what-to-whom for early disambiguation (i.e., prior to hearing the second noun phrase that disambiguated the structural ambiguity through case marking on the determiner of that noun phrase).

The materials were constructed such that early disambiguation through depicted events was possible at the verb, if listeners exploited scene information (a princess washing a pirate; a fencer painting that princess, see Fig. 1). The verb of canonical ambiguous-verb-object sentences (1a) identified the princess as the agent of a washing event (princess-washing-pirate). In contrast, for non-canonical ambiguous-verb-subject sentences (1b), the verb (paints) identified the princess as the patient of a painting event performed by another agent, the fencer. Such depicted role relations make it clear whether the referent of the initially ambiguous noun phrase (the princess) is the agent (SVO) or the patient (OVS), and hence the subject or object of the sentence. If people use verb-mediated depicted events, then we would expect their use of scene events to be reflected in their eye movements to relevant role fillers in the scene. Specifically, after hearing the verb washes and noticing that washing is the relevant event (1a, SVO), they may infer that the pirate is the patient of the event and look at him immediately after hearing the verb and before he is mentioned by the second noun phrase. After hearing paints, in contrast, the role relations make it clear that the princess is undergoing the event, and hence is the object of the sentence, an inference that should result in people rapidly anticipating the correct agent (the fencer) if they rapidly exploit depicted events.

(1) (a) Die Prinzessin wäscht den Pirat.
'The princess (amb.) washes the pirate (ACC).'

(b) Die Prinzessin malt der Fechter.
'The princess (amb.) paints the fencer (NOM).'

Figure 1: Example image for sentences 1a and 1b
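To make the anticipation logic above concrete, the following Python fragment is a purely illustrative sketch (the event encoding and the function are hypothetical, not part of the original study): a verb-matched depicted event determines which role filler a listener could anticipate before the second noun phrase.

```python
# Purely illustrative sketch (not from the original study): how a verb-matched
# depicted event could license anticipation of the upcoming role filler in
# sentences (1a)/(1b). The event encoding and function name are assumptions.

DEPICTED_EVENTS = [
    {"agent": "princess", "action": "wash", "patient": "pirate"},
    {"agent": "fencer", "action": "paint", "patient": "princess"},
]

def anticipated_referent(first_np_referent, verb, events):
    """Role filler a listener might anticipate once the verb mediates an event.

    If the first NP's referent is the agent of the verb-matching event (SVO),
    anticipate its patient; if it is the patient (OVS), anticipate its agent.
    """
    for event in events:
        if event["action"] != verb:
            continue  # this depicted event is not mediated by the verb
        if event["agent"] == first_np_referent:
            return event["patient"]  # 'washes' -> anticipate the pirate
        if event["patient"] == first_np_referent:
            return event["agent"]    # 'paints' -> anticipate the fencer
    return None  # no relevant depicted event, hence no event-based anticipation

print(anticipated_referent("princess", "wash", DEPICTED_EVENTS))   # pirate
print(anticipated_referent("princess", "paint", DEPICTED_EVENTS))  # fencer
```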

Indeed, Knoeferle et al. found more eye movements to the patient (the pirate) than to the agent (the fencer) of the action for SVO sentences (1a), and more inspections of the agent (the fencer) than of the patient (the pirate) for OVS sentences (1b), respectively. This gaze pattern occurred shortly after people had heard the verb and prior to people hearing the disambiguating second noun phrase. The time course of the gaze pattern permitted the authors to interpret this finding as revealing rapid thematic role assignment and structural disambiguation through verb-mediated depicted events. We replicated these findings in another language (English) and for another sentence structure (main clause / reduced relative clause ambiguity) (Knoeferle & Crocker, 2006).

The findings by Knoeferle et al. (2005) thus showed a rapid effect of depicted events on thematic role assignment, as evidenced by eye movements to role fillers in the scene. These findings further suggest a close temporal coordination between when a depicted event is identified as relevant by the utterance (at the verb), and the point in time when that event affects comprehension. This view was supported by the observation that gaze patterns revealed thematic role assignment only after the verb had mediated a depicted event (washing/painting) but with a close temporal coordination to the verb (i.e., post-verbally).

Knoeferle (2007) tested this hypothesis. That study examined thematic role assignment and the structuring of an utterance for two types of German sentences that differ with respect to when the utterance identifies relevant role relations in the scene. If the above "temporal coordination" account is correct, then the time course with which a depicted event can affect structural disambiguation and thematic role assignment depends on when that depicted event is identified as relevant for comprehension by the utterance. For earlier identification of a relevant depicted event, an earlier influence of that event on structure building and thematic role assignment would be expected than for cases where identification of a relevant event takes place comparatively later. Gaze patterns in the scene indeed revealed that the point in time when the utterance identified a relevant depicted event was closely temporally coordinated with the point in time when that depicted event triggered thematic role assignment and structuring of the utterance. An early mediation of the relevant depicted event (before people heard the verb) triggered an earlier thematic role assignment than a comparatively later, verb-based mediation of the relevant depicted event in initially ambiguous sentences (Knoeferle, 2007). The take-home message from this set of studies is that (a) depicted events affect structural disambiguation rapidly, and (b) scene events are used for online comprehension in close temporal coordination with utterance-mediated attention to the events.

Q2: Are scene events on a par with linguistic knowledge for disambiguation?

The above section has reviewed findings that concerned the first question (Q1). In what follows, we will consider findings that pertain to the first two questions (Q1 and Q2). The findings by Knoeferle et al. (2005) lend support to the view of a temporally coordinated interplay between utterance-mediated attention and the use of information about scene events. Their findings and those by Tanenhaus et al. (1995) furthermore provide behavioral evidence for the claim that scene information affects the incremental structuring of initially structurally ambiguous utterances. Utterance-mediated attention in scenes is also known to reflect various other underlying linguistic and non-linguistic processes such as semantic interpretation (Sedivy et al., 1999), thematic interpretation (e.g., Altmann & Kamide, 1999), or visual search (e.g., Spivey et al., 2001). Furthermore, eye movement measures alone do not clarify whether the processes involved in resolving local structural ambiguity through scene information are similar to the processes that are at work when linguistic cues resolve a temporary structural ambiguity.

To better understand the influence of visual contexts (depicted events) on structural revision during spoken sentence comprehension, Knoeferle et al. (in press) conducted event-related potential (ERP) studies. Measures such as ERPs have in the past been used to examine the processing of syntactic violations (e.g., Friederici, Pfeifer, & Hahne, 1993; Hagoort, Brown, & Groothusen, 1993; Osterhout, Holcomb, & Swinney, 1994) and, in particular, the resolution of temporary structural ambiguity through linguistic cues: When linguistic cues triggered structural revision towards a non-canonical structure during reading in the absence of scenes, the difficulty of this revision has typically been associated with a positivity that has a maximum at approximately 600 ms (P600; e.g., beim Graben, Saddy, & Schlesewsky, 2000; Frisch et al., 2002; Matzke et al., 2002).

Based on these findings (beim Graben, Saddy, & Schlesewsky, 2000; Frisch et al., 2002; Matzke et al., 2002), Knoeferle et al. (in press) investigated the structural revision of locally structurally ambiguous German utterances through linguistic cues (e.g., case marking on the determiner of a noun phrase) and through verb-mediated depicted events in sentences such as (1a/b). In the ERP study, sentences such as (1a/b) were presented with the image in Fig. 1, just as in the eye-tracking study. In addition, the sentences from the eye-tracking study (e.g., 1a/b) were presented in the absence of scenes, thus forcing late disambiguation through linguistic cues (case marking on the determiner of the second noun phrase disambiguated (1a) towards SVO and (1b) towards OVS structure). This permitted the authors to examine two issues: A first important question that the authors addressed in the ERP study was whether brain potentials - just as eye movements - would reveal the use of depicted events for structural disambiguation shortly after the verb. Evidence in favor of this view would be a positivity for non-canonical relative to canonical conditions, time-locked to the onset of the verb, in cases where the verb mediates relevant depicted events.

In addition, we wanted to compare disambiguation through depicted events - mediated at the verb - with linguistic disambiguation through the case-marked determiner of the second, post-verbal noun phrase (i.e., when no scenes were present). When the verb does not mediate relevant events (e.g., when no scenes are present during utterance presentation), no early disambiguation through verb-mediated events should be possible. As a result, when no scenes are present, there should be no P600-like component for non-canonical relative to canonical conditions time-locked to the verb. Rather, we should observe late disambiguation on the post-verbal second noun phrase.
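The two predictions can be condensed into a toy sketch (hypothetical and for illustration only; the function and its labels are not from the study's analysis): the locus of the expected positivity for non-canonical relative to canonical sentences depends on whether a relevant scene is co-present.

```python
# Hypothetical summary of the predictions described above (illustration only,
# not the study's analysis code): where a P600-like positivity is expected for
# non-canonical (OVS) relative to canonical (SVO) sentences.

def predicted_p600_locus(scene_present: bool) -> str:
    """Return the region at which disambiguation-related positivity is expected."""
    if scene_present:
        return "verb"            # early disambiguation via verb-mediated depicted events
    return "second noun phrase"  # late disambiguation via case marking

print(predicted_p600_locus(True))   # verb
print(predicted_p600_locus(False))  # second noun phrase
```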

Findings confirmed that depicted events are immediately used for structural disambiguation: When relevant depicted events were available for inspection during utterance presentation, scenes containing depicted events rapidly influenced the disambiguation of a locally structurally ambiguous utterance. This was revealed by an earlier P600 in the auditory event-related potentials, time-locked to the verb that mediated the events, when depicted events were simultaneously present while people listened to the utterance. When no scenes were present, in contrast, and disambiguation of local structural ambiguity towards either an object-verb-subject or a subject-verb-object order could only occur through a case-marked determiner on the second noun phrase, Knoeferle and colleagues found - as expected - no P600 time-locked to the verb. Rather, they replicated previous findings of later disambiguation through cues in the linguistic input (e.g., Matzke et al., 2002): They observed a positivity with a peak at approximately 600 ms, time-locked to the onset of the second noun phrase, in response to linguistic disambiguation towards the non-canonical structure.

With respect to the role of scene information in disambiguation, it is further interesting to note that the distribution of the P600-like component was similar regardless of whether disambiguation was triggered by depicted events at the verb or by linguistic marking on the second noun phrase. This suggests that the kinds of cues (e.g., non-linguistic depicted events versus linguistic cues such as case marking) that trigger disambiguation do not fundamentally modulate the neural correlates underlying disambiguation mechanisms. It appears that scene context (e.g., depicted events) is used on a par with linguistic cues (e.g., case marking) for structural disambiguation.

Q3: Are scene events preferred in situated comprehension over linguistic knowledge?

In fact, are scene events maybe even prioritized over linguistic context and knowledge in situated comprehension? A further finding that provided evidence both for the great importance of depicted events in comprehension and for the close temporal coordination between utterance comprehension and the use of scene information comes from Knoeferle and Crocker (2006). The authors directly compared the importance of verb-mediated world knowledge about likely role fillers (Kamide et al., 2003) with that of verb-mediated depicted events (Knoeferle et al., 2005).

To directly compare these two sources of information, they relied upon characters in a clipart scene that could be identified through a verb in the utterance - relying either on people's world knowledge or, alternatively, on events that the characters were depicted as performing. An example image in the eye-tracking study showed two agents (e.g., a doctor and a cook), each performing an action upon a patient (e.g., the tourist, see Fig. 2). The depicted events provided information about role relations (e.g., doctor-jinxing-tourist and cook-bandaging-tourist). In addition, each agent provided stereotypical thematic role knowledge (e.g., a cook is a stereotypical agent of a cooking action, and a doctor a stereotypical agent of a bandaging action). Sentences always started with an unambiguously accusative case-marked noun phrase referring to a patient role filler (Fig. 2, the tourist).

(2) (a) Den Touristen (ACC) verköstigt gleich der Koch (NOM).
'The tourist (ACC) serves-food-to soon the cook (NOM).'

(b) Den Touristen (ACC) verzaubert gleich der Arzt (NOM).
'The tourist (ACC) jinxes soon the doctor (NOM).'

(c) Den Touristen (ACC) bandagiert gleich der Arzt (NOM).
'The tourist (ACC) bandages soon the doctor (NOM).'

(d) Den Touristen (ACC) bandagiert gleich der Koch (NOM).
'The tourist (ACC) bandages soon the cook (NOM).'

Figure 2: Example image for sentences 2a to 2d

A first condition pair was designed to ensure that depicted events and verb-based stereotypical role knowledge each rapidly influence thematic role assignment when uniquely identified. For the image in Fig. 2 and sentence (2a), accusative case marking on the determiner of the first noun phrase marked the first noun phrase as the patient. This, together with the verb verköstigt ('serves-food-to'), should bias expectations towards an upcoming agent. World knowledge associated with the verb, in particular, identified the cook as the likely agent of a cooking event. The fact that there were no other agents that performed a cooking event or that were plausible agents for such an event made the cook a unique target based on world knowledge. In contrast, in a second condition, another character - the physician - was identified as a unique target by means of the action depiction (rather than world knowledge): When people heard sentence (2b), the verb uniquely identified a (non-stereotypical) agent of the verb as the relevant target (the physician) through a jinxing action that the physician was depicted as performing. No other character was either a plausible agent for a jinxing action or depicted as performing such an action.

A second condition pair contrasted stereotypical knowledge and depicted events: For Fig. 2 and sentences (2c) and (2d), the verb ('bandages') identified two agents as likely: a stereotypical agent (the physician), and a second participant depicted as the agent of a bandaging event (the cook) (Fig. 2). Prior to hearing the second noun phrase, the comprehension system is forced to choose between relying on the immediate event depiction or on stereotypical thematic role knowledge for anticipating an agent in the scene, since these two types of information each suggest a different entity as the likely agent.

Gaze patterns for conditions (2a) and (2b) in Knoeferle and Crocker (2006) showed that people relied on each of these two informational sources when they uniquely identified a relevant scene agent: For condition (2a), this claim was confirmed by more inspections of the stereotypical agent (the cook) than of the other agent in the scene. For sentence (2b), in contrast, people looked more often at the agent depicted as performing a jinxing action (the physician) than at the second agent in the scene (the cook).

In contrast, for conditions (2c) and (2d), we expected no such interaction but rather a main effect of inspections of either the agent that was depicted as performing a non-stereotypical action (cook-bandaging) or the agent that was compatible with expectations based on stereotypical knowledge (physician-bandaging). The interesting question was which agent people would inspect most for conditions (2c) and (2d) shortly after the verb. The authors observed a higher proportion of anticipatory eye movements (prior to the second noun phrase) to the agent depicted as performing the verb action (the cook) in comparison with the agent that was stereotypical for the verb (the doctor). This suggests a strong preference of the comprehension system to rapidly rely on depicted events over stored thematic knowledge for incremental thematic role assignment. Note that the finding of a greater relative priority of depicted events over stereotypical event knowledge does not depend on specifics of the images or on how the event depiction was realized: The image was the same for all four conditions, and findings for (2a) and (2b) clearly showed that people can use each type of information - depicted action and stereotypical event knowledge - equally well provided it is uniquely identified. It is only when the utterance is ambiguous as to which character - the agent depicted as performing a non-stereotypical action or the stereotypical agent - is the most likely agent of the verb that we observed the preference to rely on the depicted events and the associated agent over knowledge of stereotypical events.
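The conflict-resolution pattern suggested by these gaze data can be summarized in a small hypothetical sketch (the data structures and function are illustrative assumptions, not Knoeferle and Crocker's model): when the verb cues both a stereotypical agent and the agent of a depicted event, the depicted-event agent is anticipated.

```python
# Hypothetical sketch (not Knoeferle and Crocker's model): candidate agents
# cued by the verb in (2a)-(2d), with the conflict in (2c)/(2d) resolved in
# favour of the agent of the depicted event, as the gaze data suggest.

DEPICTED_EVENTS = {("cook", "bandage", "tourist"), ("doctor", "jinx", "tourist")}
STEREOTYPICAL_AGENTS = {"cook": "serve-food-to", "doctor": "bandage"}

def anticipated_agent(verb):
    """Agent a listener is predicted to anticipate shortly after the verb."""
    depicted = {agent for (agent, action, _) in DEPICTED_EVENTS if action == verb}
    stereotypical = {a for a, act in STEREOTYPICAL_AGENTS.items() if act == verb}
    if depicted:
        return depicted.pop()       # depicted events take priority (2b, 2c/2d)
    if stereotypical:
        return stereotypical.pop()  # otherwise rely on stereotypical knowledge (2a)
    return None

print(anticipated_agent("serve-food-to"))  # cook   (2a: stereotypical knowledge)
print(anticipated_agent("jinx"))           # doctor (2b: depicted event)
print(anticipated_agent("bandage"))        # cook   (2c/2d: depicted event preferred)
```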


Q4 and Q5: Do recent objects and events rapidly influence spoken comprehension?

Together, the findings that I reviewed above provide strong evidence for a tight temporal coordination between utterance-mediated attention and the use of scene events for comprehension when those events are simultaneously present during comprehension. In addition, they underscore the great importance of scene events for utterance comprehension when scenes are both co-present and relevant to comprehension.

A crucial point in both the coordinated interplay and the rapid use of scene information, however, is the co-presence of an immediate scene: The close time-lock of utterance and attention in a scene has importantly been extended to serial picture-utterance presentation, at least for scenes that contain objects. In Altmann (2004), people inspected an image with a man, a woman, a cake, and a newspaper, and then the screen went blank (see also Richardson & Spivey, 2000; Spivey, Richardson, & Fitneva, 2004). Two seconds later, people heard a sentence that described part of the previously inspected scene (e.g., The man will eat the cake). Once people had heard the verb in the sentence, they rapidly looked at the location on the blank screen where previously there had been a cake. The time course of eye movements in the serial-presentation study closely resembled the time course of gaze patterns in an earlier study (Altmann & Kamide, 1999) with concurrent scene and utterance presentation. The data by Altmann (2004) suggest that even when a prior scene is no longer immediately available, mental representations of the spatial configuration of objects in the scene may remain accessible for comprehension.

It was unclear whether the findings by Altmann (2004) extend to complex scenes that contain agent-action-patient events in addition to objects and characters, and to scene information that is non-stereotypical, or even implausible, as with the depicted events in the studies by Knoeferle and Crocker (2006). Furthermore, even if people rapidly exploit depicted event information when those events are both non-stereotypical and no longer visually present during utterance presentation, it is unclear whether they still prioritize them over long-term knowledge of stereotypical agents for thematic interpretation.

Knoeferle and Crocker (in press) examined these issues by modifying scene presentation. In real life, for instance, scene events may be at a greater disadvantage than entities regarding their co-presence. Real-world actions are often fleeting, completed, and part of the recent past, and may as a result only be briefly available for inspection, and not necessarily concurrently with utterance comprehension. Event participants, however, often remain present for some time after performing an action. To put the depicted-event preference to a test, the authors used the stimuli from Knoeferle and Crocker (2006, Exp. 2), but presented them in sequences of four images so as to briefly depict the actions and then remove them, with only the characters remaining as people listened to the utterance. This presentation was a first, coarse-grained approximation of more dynamic event sequences. It further selectively disadvantaged the depicted events, since the characters remained visible during comprehension and were thus available in the scene for performing future actions.

Gaze patterns confirmed that - even when the depicted actions were absent while the characters remained in the scene - people were able to use either verb-based stereotypical thematic knowledge or non-stereotypical depicted events when the verb uniquely identified these information types as relevant. However, when the verb identified two different agents as relevant, there was no longer a preferential anticipation of the agent that had previously been depicted as performing an action. There was a tendency to anticipate the stereotypical agent more often than the agent of the (non-stereotypical) depicted event, but this effect was not significant. Regarding the temporal coordination, the time course of gaze patterns was similar to the original study, in which the events had been simultaneously present during utterance comprehension.

The analyses of gaze patterns from this study suggest that the visual co-presence of an event is not required to explain the findings by Knoeferle and Crocker (2006). If visual co-presence were crucial, we should not have replicated the finding that people still rely on depicted events with dynamic action presentation, when only the characters were simultaneously present during utterance presentation. On the other hand, if modulating the co-presence of events did not affect the priority of depicted events for comprehension, then the preference to rely on depicted events should also have been replicated with dynamic action presentation - which it was not.


Summary of the experimental findings

The findings from tracking eye gaze on a display during comprehension of a concurrently presented utterance have shown that depicted events rapidly influence structural disambiguation, and that there is a close temporal coordination between the identification of relevant scene information through the utterance, attention to relevant scene information, and the use of the associated scene events for structural disambiguation. Our data further provide evidence for the view that scene events are on a par with linguistic cues for structural disambiguation, and that scene events are even prioritized for comprehension over stereotypical thematic role knowledge.

The close temporal coordination was not disrupted by the absence of recently inspected scene events during utterance comprehension. The preferred reliance on scene events, in contrast, was diminished when scene events were not simultaneously present during utterance presentation while stereotypical thematic role fillers remained in the scene.

This suggests that the close temporal coordination plays a great role in the use of scene information during comprehension. As for the greater relative importance of scene events, our findings suggest that their influence relative to that of other information sources, such as stereotypical thematic role knowledge, decays when non-stereotypical scene events are no longer present during utterance presentation while characters that proffer stereotypical knowledge remain in the scene. We now discuss the findings from the preceding sections regarding their theoretical implications for theories of language comprehension.

Implications for frameworks of language comprehension

Findings from the above studies have provided evidence for a fluid interaction between linguistic processes and the output of the visual perceptual system. To account for these findings, Knoeferle and colleagues have considered existing frameworks of the language system and language processing.

Prior research has suggested that whether utterance and scene can rapidly combine depends to a great extent on the architecture (i.e., the static arrangement) of the systems that perceive and process utterance and scene: Fodor (1983), for instance, postulated strong architectural restrictions on the informational interaction between distinct cognitive systems such as language and vision. He proposed that the mind was organized into transducers, input modules, and a central system. An illustrative sketch of this type of architecture is provided in Figure 3 (see Coltheart, 1999, p. 116). Transducers convert physical stimuli into neural activity that is then passed on to the input modules, which interpret the transduced signal. Examples of input modules are the language and visual perception systems, and their characterizing property is that they are 'modular'. While the concept of modularity is defined by several characteristics, one of the most important properties of a modular system is informational encapsulation. This term means that distinct input modules such as language and visual perception only have access to the output of another module, but cannot influence its internal processes. In Figure 3 this is illustrated by the lack of arrows between the input modules. In consequence, in a Fodorian framework the answer to the question of whether scene information can influence the incremental interpretation of an unfolding utterance is that the output of the visual perceptual system can only combine with the output of the language system. Ongoing scene perception cannot, however, directly influence core linguistic processes that are internal to the language system, such as the incremental syntactic structuring of an utterance. As a result of this assumption, direct consideration of the influence of scene perception on such core comprehension processes is not possible (Knoeferle, 2005).

Figure 3: Schematic outline of a Fodorian organization of the mind (Coltheart, 1999, p. 116)

The findings that we have presented in this paper, as well as existing results in the literature (e.g., Tanenhaus et al., 1995), exclude strictly modular accounts that postulate informational encapsulation of processes internal to the language system (e.g., Frazier & Clifton, 1996). Nonetheless, it appears that under the influence of a Fodorian view of the mind, many psycholinguistic theories of online language comprehension have been developed on the basis of the assumption that the mechanisms underlying language comprehension can be examined in isolation from the perception of scene information. This becomes apparent in the fact that scene information has not been explicitly included in most psycholinguistic frameworks of online language comprehension (e.g., Forster, 1979; Frazier & Fodor, 1979; Frazier & Clifton, 1996; MacDonald, Pearlmutter, & Seidenberg, 1994; Pickering, Traxler, & Crocker, 2000; Townsend & Bever, 2001). Furthermore, the visual perceptual system has not been explicitly included as a cognitive system that might proffer important information for comprehension. We argue that as a result such theories do not make sufficiently detailed predictions for language comprehension in situations in which language is about the scene (i.e., detailing the time course, temporal coordination, and relative importance of scene information).

Only a few interactionist accounts of online sentence comprehension explicitly include scene information as an informational source and characterize an alternative view to Fodorian modularism (e.g., Altmann & Steedman, 1988; Tanenhaus et al., 1995). Interactionist theories and the Referential Theory of sentence processing (Crain & Steedman, 1985; Altmann & Steedman, 1988) account for the findings by Tanenhaus et al. (1995) in settings that contain objects. They do not, however, provide the rich inventory of ontological categories, referential expressions, and mental representations required to account for the influence of depicted events.

To account for the rapid, verb-mediated influence of depicted events, Knoeferle et al. (2005) have adopted the Jackendoff architecture (Jackendoff, 1997, 2002) as a suitable basis for describing the processing mechanisms that underlie online comprehension (see Fig. 4). Jackendoff (2002, p. 198) proposes a parallel constraint-based processing architecture and includes a suitably rich inventory of representations. Interface processors allow information exchange between semantic/conceptual structure and perception or action, permitting us to describe a rapid, incremental interplay between scene information and sentence comprehension. The individual levels in Jackendoff's architecture are modular in the sense of being domain-specific (i.e., their representational vocabulary is specialized), but unlike Fodorian modularity, linguistic structures are linked among themselves and to other cognitive sub-systems (see Fig. 4). The version of modularity advocated by Jackendoff hence permits incremental communication between ongoing processes of the phonological, syntactic, and conceptual systems. Importantly, it also allows incremental information exchange between conceptual structure and perception or action via interface processors. Jackendoff's framework provides an interface at which conceptual structure and the visual system can be linked, and thus provides a means for describing how comprehension proceeds when these different types of information are available and interact (see Knoeferle, 2005).

Figure 4: Schematic outline of the Jackendoff architecture (Jackendoff, 2002, p. 199)
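To make the contrast with the encapsulated architecture in Figure 3 concrete, here is a small, purely illustrative Python sketch (my simplification; the level names and the set of interfaces are assumptions based on the description above, not Jackendoff's formalism): domain-specific levels are linked by interface processors, including one between conceptual structure and visual perception.

```python
# Purely illustrative sketch (my simplification, not Jackendoff's formalism):
# domain-specific levels of representation connected by interface processors,
# including an interface between conceptual structure and visual perception.

INTERFACES = {
    ("phonology", "syntax"),
    ("syntax", "conceptual_structure"),
    ("conceptual_structure", "visual_perception"),  # basis for scene-sentence interplay
    ("conceptual_structure", "action"),
}

def can_exchange(a, b):
    """True if an interface processor links the two levels (either direction)."""
    return (a, b) in INTERFACES or (b, a) in INTERFACES

# Ongoing scene perception can inform conceptual structure while the utterance
# unfolds, which is what the reviewed findings require:
print(can_exchange("visual_perception", "conceptual_structure"))  # True
# In this simplified sketch, visual information reaches syntax only via
# conceptual structure rather than through a direct vision-syntax interface:
print(can_exchange("visual_perception", "syntax"))                 # False
```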

While Jackendoff outlines an architecture for situated comprehension, he does not provide a processing account of the coordinated interplay between utterance comprehension, attention, and the use of scene information. Based on the above findings, and further motivated by insights concerning the importance of both joint attention and scene information in language acquisition (e.g., Harris et al., 1986; Dunham et al., 1993; Snow, 1977), Knoeferle and Crocker (2006) and Knoeferle and Crocker (in press) outlined the Coordinated Interplay Account of scene-sentence interaction in adult utterance comprehension. Briefly recall the findings that the authors had to account for:

F1 Rapid effect of scene objects / events on incremental structure building

F2 Close temporal coordination between the point in time when scene information is identified as relevant by the utterance and when it affects online comprehension.

F3 Scene context is on a par with linguistic cues for structural disambiguation and may even be prioritized for online comprehension

F4 The absence of recently inspected events during language comprehension does not disrupt the close temporal coordination between utterance processing and the influence of those scene events on comprehension

F5 The preferred reliance on scene events is a function of their decay in working memory

We first outline at a general level how the Coordinated Interplay Account by Knoeferle and Crocker operates. Subsequent to that, we will provide two illustrative examples. The Coordinated Interplay Account identifies two basic steps in situated comprehension. Note that we assume these processes overlap modulo informational dependencies. First, comprehension of the unfolding utterance guides attention in the scene. This permits establishing reference to objects (e.g., Tanenhaus et al., 1995) and events (e.g., Knoeferle et al., 2005) and anticipating likely referents (e.g., Altmann & Kamide, 1999). The currently processed word is integrated into the structure and interpretation of the utterance, and the listener derives linguistic expectations in the process. The listener then inspects the scene for objects/events that the current input refers to, and they may further explore the scene, anticipating objects and/or events based on their linguistic and world-knowledge expectations. Upon finding an object/event that is the referent for the current input, reference to that object/event is established by forging a link between the meaning of the referring expression in the input and the designated object/event in the scene/world. In the model, this is achieved through coindexing of referring expressions in the utterance (e.g., nouns and verbs, see Knoeferle et al., 2005) with corresponding scene objects and events (see also Jackendoff, 2002).

In the second step, once attention has shifted to the most relevant object or event, the scene information that has been inspected rapidly influences utterance comprehension (Knoeferle & Crocker, 2006; Knoeferle, 2007). After any revisions, the next word is integrated with the revised interpretation, yielding a new interpretation and derived linguistic expectations. The Coordinated Interplay Account crucially suggests that the close time-lock between comprehension and attention in the scene is at the origin of the relative priority of immediately depicted events over knowledge of stereotypical events in comprehension.
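The following Python sketch is a minimal, hypothetical rendering of these two steps (the data structures, function names, and the coindexing mechanism are simplifying assumptions, not the authors' implementation): each incoming word is integrated, attention shifts to a coindexed object or event, and the inspected scene information then informs or revises the interpretation.

```python
# Minimal, hypothetical sketch of the two processing steps of the Coordinated
# Interplay Account as summarized above; all names and data structures are
# illustrative assumptions, not the authors' implementation.

SCENE = [  # depicted characters and events (cf. Fig. 1)
    {"labels": {"princess"}, "roles": {}},
    {"labels": {"pirate"}, "roles": {}},
    {"labels": {"wash"}, "roles": {"agent": "princess", "patient": "pirate"}},
    {"labels": {"paint"}, "roles": {"agent": "fencer", "patient": "princess"}},
]

def interpret(word, interpretation):
    """Step 1a: integrate the current word into the utterance representation."""
    return interpretation + [word]

def attend(word, scene):
    """Step 1b: shift attention to an object/event the word refers to and
    coindex the referring expression with it (reference is established)."""
    for entity in scene:
        if word in entity["labels"]:
            return entity
    return None  # nothing in the scene matches the current word

def revise(interpretation, attended):
    """Step 2: use the attended scene information (e.g., event role relations)
    to inform, and if necessary revise, the interpretation."""
    if attended and attended["roles"]:
        interpretation.append(("roles-from-scene", dict(attended["roles"])))
    return interpretation

def comprehend(utterance, scene=SCENE):
    interpretation = []
    for word in utterance:                 # incremental, word by word
        interpretation = interpret(word, interpretation)
        attended = attend(word, scene)     # utterance-mediated attention
        interpretation = revise(interpretation, attended)
    return interpretation

print(comprehend(["princess", "paint"]))
# [..., ('roles-from-scene', {'agent': 'fencer', 'patient': 'princess'})]
```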

Findings F4 and F5 furthermore point to an account in which prior scene information can inform subsequent comprehension, but in which objects and events that are no longer present during comprehension also experience some decay. These findings (F4/F5) have motivated a revision of the original Coordinated Interplay Account as outlined above. This revised version differs from the account described above in that it incorporates an explicit working memory from which recently inspected scene information can be accessed for rapid use during online language comprehension. Knoeferle and Crocker (in press) furthermore revised the original mechanism such that representations only available in working memory can still influence subsequent comprehension but are less salient than representations that still receive support from corresponding objects/events that are present during utterance presentation. The revised mechanism thus assumes that entities and events in working memory, which are no longer in the scene, decay. This mechanism permits the authors to account for the diminished priority of recent scene events while it ensures that recent scene events remain available during online comprehension.
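A minimal sketch of this revision, again under stated assumptions (the decay factor and the class are illustrative, not taken from the account's specification), keeps recently inspected scene information in working memory with a salience value that decays once the corresponding event is no longer co-present:

```python
# Purely illustrative sketch (an assumption, not the authors' implementation)
# of the revised account: recently inspected scene information is kept in
# working memory and decays when no longer co-present during the utterance.

DECAY = 0.5  # assumed decay factor per time step without scene support

class WorkingMemoryItem:
    def __init__(self, content):
        self.content = content
        self.salience = 1.0

    def update(self, still_in_scene):
        """Items still supported by the scene keep full salience; items that
        are no longer present decay but remain available for comprehension."""
        if not still_in_scene:
            self.salience *= DECAY

# Example: a depicted bandaging event is removed while the characters remain.
event = WorkingMemoryItem(("cook", "bandage", "tourist"))
cook = WorkingMemoryItem("cook")
for _ in range(2):          # two subsequent time steps of the utterance
    event.update(still_in_scene=False)
    cook.update(still_in_scene=True)
print(event.salience, cook.salience)  # 0.25 1.0: the event is now less salient
```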

I now present an example of how structural disambiguation through depicted events takes place according to the account. Consider example sentence 1a, Die Prinzessin wäscht den Pirat ('The princess washes the pirate'). When people process the first noun phrase, they look for an appropriate referent and establish reference to the relevant entity in the corresponding scene, the princess (Fig. 1). During that time people mostly inspect the princess and probably notice proximal scene information such as the action that she performs. Based on both the linguistic preference for subject-initial constituent order and the scene depiction, people likely interpret the noun phrase 'the princess' as the subject and agent of the sentence. For SVO sentences, people then hear the verb 'washes'. They search for a matching action referent. Their expectation that the princess is the agent is reinforced when the verb matches the action that the princess performs. As a result, people develop expectations about a likely patient of the washing action and anticipate the pirate shortly after the verb more often than the other role-filler (the fencer). For OVS sentences, in contrast, people hear the verb 'paints'. When they try to find a referent, they notice that the action performed by the princess is incongruent with the verb 'paints'. The model assumes that people then engage in a search for an appropriate referent and notice the painting action. They establish reference between the verb 'paints' and the appropriate painting action. Upon inspecting that action, they may notice proximal scene information such as the agent (the fencer) associated with the action and its patient, the princess. The event information can then be used to revise the initial thematic interpretation and syntactic structure of the utterance.
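Under the same simplifying assumptions as above, the disambiguation step for this example might be sketched as follows. The dictionaries, the depicted events, and the matching rule are illustrative stand-ins, not the account's actual representations.

```python
DEPICTED_EVENTS = [
    {"agent": "princess", "action": "wash",  "patient": "pirate"},
    {"agent": "fencer",   "action": "paint", "patient": "princess"},
]

def disambiguate(first_noun: str, verb: str) -> dict:
    """Prefer a subject-first (SVO) reading of the first noun phrase, then let
    the depicted event that the verb refers to confirm it or force revision."""
    interpretation = {"agent": first_noun, "action": verb, "patient": None, "order": "SVO"}
    # Utterance-mediated attention: find the depicted action matching the verb.
    event = next(e for e in DEPICTED_EVENTS if e["action"] == verb)
    if event["agent"] == first_noun:
        # Verb matches the first noun's depicted action: SVO confirmed,
        # and the depicted patient (the pirate) is anticipated.
        interpretation["patient"] = event["patient"]
    else:
        # Mismatch: the inspected event reveals a different agent, so the
        # thematic roles and the structure are revised to OVS.
        interpretation.update(agent=event["agent"], patient=event["patient"], order="OVS")
    return interpretation

print(disambiguate("princess", "wash"))   # SVO: the princess washes the pirate
print(disambiguate("princess", "paint"))  # OVS: the fencer paints the princess
```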

As a further example, consider an outline of how the model accounts for the greater relative priority of depicted events. Consider the example sentence Den Touristen bandagiert gleich der Doktor (literally 'The tourist (accusative object) bandages soon the physician (nominative subject)', i.e., 'The physician will soon bandage the tourist'). Recall that the corresponding scene showed a tourist, a cook bandaging the tourist, and a physician jinxing the tourist (see Fig. 2). When people encounter the first noun phrase Den Touristen ('the tourist'), the meaning of the noun Tourist is accessed and integrated with linguistic constraints (e.g., accusative case marking on the determiner of the noun phrase). People begin to build an interpretation with the tourist as the patient of the action. Most eye movements are directed towards the tourist during referential processing, and the comprehension system establishes reference to the tourist in the scene. People may also explore scene regions close to the tourist, and thus notice that the tourist is the patient of two events: the cook's bandaging of the tourist and the physician's jinxing of the tourist.

People then encounter the verb bandagiert ('bandages'). Its meaning is accessed, and the interpretation that is built consists of a bandaging action of which the tourist is the patient. At this point, verb meaning creates the linguistic expectation of a physician as a stereotypical agent of 'bandages'. The scene events, in contrast, suggest the cook as the agent, since the cook is depicted performing the bandaging action, provided people have perceived this event. Based on verb meaning, people attempt to establish reference between the verb and an appropriate action, prompting them to look at the depicted bandaging action. In the process they also look at proximal objects such as the agent of that event (the cook). Informed by the scene event, the cook is now taken to be the most relevant agent of the bandaging action. For more detailed examples, including how the account deals with the influence of recent scene events, see Knoeferle et al. (2005).
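The competition just described, between the depicted agent (the cook) and the stereotypical agent (the physician), can be pictured as a simple weighting in which a decayed, no-longer-co-present event carries less support. The specific weights and the threshold rule are illustrative assumptions, not estimated parameters of the account.

```python
def choose_agent(depicted_agent: str, stereotypical_agent: str, co_present: bool,
                 depicted_weight: float = 1.0, decayed_weight: float = 0.6,
                 stereotype_weight: float = 0.8) -> str:
    """Return the agent the comprehender settles on at the verb."""
    depicted_support = depicted_weight if co_present else decayed_weight
    # Co-present depicted events outweigh stereotypical knowledge; once the
    # event has decayed in working memory, its priority is diminished.
    return depicted_agent if depicted_support >= stereotype_weight else stereotypical_agent

print(choose_agent("cook", "physician", co_present=True))    # cook
print(choose_agent("cook", "physician", co_present=False))   # physician
```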

Conclusions

To summarize, depicted events influenced spoken comprehension rapidly both when a relevant scene was present and when it was absent. The greater relative priority of scene events over stereotypical thematic role knowledge, however, was diminished when the events were no longer co-present while the characters and their stereotypical affordances were still supported by the immediate visual context.

While the coordinated interplay mechanism and decay in working memory account for the above findings, there are other factors that have not yet been explicitly included in the Coordinated Interplay Account and that may influence the use and relative importance of scene objects and events. One example is the extent of referential success (see Knoeferle & Crocker, 2006, in press). Another is locational or temporal cues in the utterance that clarify the (ir)relevance of the immediate scene. Finally, the availability of social interaction (e.g., gestures) for explicitly directing attention to scene information will likely modulate the extent to which people prefer to rely on scene events.

Imagine a situation in which you sit in front of the television with a friend, and your friend says, “You know, I really loved those cookies at aunt Marge’s yesterday”. At the same time, the TV screen displays a newscaster. In this situation, none of the referring expressions are present on the screen. Furthermore, the utterance refers to an event in the past. Both of these constraints will obviously decrease reliance on information about objects and events on the screen. In the absence of cues that bias against a strong relative importance of scene information, however, the Coordinated Interplay Account predicts that referents, and the associated proximal scene objects/events that one attends to, will play a dominant role in guiding situated comprehension (see Knoeferle & Crocker, in press).

Acknowledgements

This paper was written while the author was on a postdoctoral fellowship awarded by the German Research Foundation (DFG). Thanks go to an anonymous reviewer for her/his comments, and to Roger Levy for helpful comments on the manuscript.

References

Altmann, G. T. M. (2004). Language-mediated eye-movements in the absence of a visual world: the ‘blank screen paradigm’. Cognition, 93, B79–B87.

Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition, 73, 247–264.

Altmann, G. T. M., & Steedman, M. (1988). Interaction with context during human sentence processing. Cognition, 30, 191–238.

Carroll, P. J., Young, J. R., & Guertin, M. S. (1992). Visual analysis of cartoons: a view from the far side. In K. Rayner (Ed.), Eye movements and visual cognition: scene perception and reading (pp. 444–461). New York: Springer-Verlag.

Coltheart, M. (1999). Modularity and cognition. Trends in Cognitive Sciences, 3, 115–120.

Cooper, R. (1974). The control of eye fixation by the meaning of spoken language: a new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6, 84–107.

Crain, S., & Steedman, M. (1985). On not being led up the garden path: the use of context by the psychological parser. In D. Dowty, L. Karttunen, & A. Zwicky (Eds.), Natural language parsing (pp. 320–358). Cambridge: Cambridge University Press.

Dunham, P. J., Dunham, F., & Curwin, A. (1993). Joint-attentional states and lexical acquisition at 18 months. Developmental Psychology, 29, 827–831.

Feeney, A., Hola, A. K. W., Liversedge, S. P., Findley, J. M., & Metcalf, R. (2000). How people extract information from graphs: evidence from a sentence-graph verification paradigm. In M. Anderson, P. Cheng, & V. Haarslev (Eds.), Diagrams (pp. 149–161). Berlin: Springer-Verlag.

Fodor, J. (1983). Modularity of mind. Cambridge, MA: MIT Press.

Forster, K. (1979). Levels of processing and the structure of the language processor. In W. E. Cooper & E. C. T. Walker (Eds.), Sentence processing: psycholinguistic studies presented to Merrill Garrett (pp. 27–85). Hillsdale, NJ: Lawrence Erlbaum.

Frazier, L., & Clifton, C. (1996). Construal. Cambridge, MA: MIT Press.

Frazier, L., & Fodor, J. D. (1979). The sausage machine: a new two-stage parsing model. Cognition, 6, 291–325.

Gillette, J., Gleitman, H., Gleitman, L., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73, 135–176.

Gleitman, L. (1990). The structural sources of verb meanings. Language Acquisition, 1, 3–55.

Harris, M., Jones, D., Brookes, S., & Grant, J. (1986). Relations between the non-verbal context of maternal speech and rate of language development. British Journal of Developmental Psychology, 4, 261–268.

Hemforth, B. (1993). Kognitives Parsing: Repräsentation und Verarbeitung sprachlichen Wissens. Sankt Augustin: Infix-Verlag.

Jackendoff, R. (1997). The architecture of the language faculty. Cambridge, MA: MIT Press.

Jackendoff, R. (2002). Foundations of language: brain, meaning, grammar, evolution. Oxford, UK: Oxford University Press.

Kamide, Y., Scheepers, C., & Altmann, G. T. M. (2003). Integration of syntactic and semantic information in predictive processing: cross-linguistic evidence from German and English. Journal of Psycholinguistic Research, 32, 37–55.

Knoeferle, P. (2005). The role of visual scenes in spoken language comprehension: Evidence from eye-tracking. Doctoral dissertation in Computational Linguistics, Saarland University. http://scidok.sulb.uni-saarland.de/volltexte/2005/438

Knoeferle, P. (2007). Comparing the time-course of processing initially ambiguous and unambiguous German SVO/OVS sentences in depicted events. In R. van Gompel, M. Fischer, W. Murray, & R. Hill (Eds.), Eye movement research: insights into mind and brain. Oxford: Elsevier.

Knoeferle, P., & Crocker, M. W. (2006). The coordinated interplay of scene, utterance, and world knowledge: evidence from eye tracking. Cognitive Science, 30, 481–529.

Knoeferle, P., & Crocker, M. W. (in press). The influence of recent scene events on spoken comprehension: Evidence from eye movements. Journal of Memory and Language.

Knoeferle, P., Crocker, M. W., Scheepers, C., & Pickering, M. J. (2005). The influence of the immediate visual context on incremental thematic role-assignment: Evidence from eye-movements in depicted events. Cognition, 95, 95–127.

Knoeferle, P., Habets, B., Crocker, M. W., & Münte, T. F. (in press). Visual scenes trigger immediate syntactic reanalysis: evidence from ERPs during situated spoken comprehension. Cerebral Cortex.

MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101, 676–703.

McRae, K., Ferretti, T. R., & Amyote, L. (1997). Thematic roles as verb-specific concepts. Language and Cognitive Processes, 12, 137–176.

McRae, K., Spivey-Knowlton, M. J., & Tanenhaus, M. K. (1998). Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language, 38, 283–312.

Pickering, M. J., Traxler, M. J., & Crocker, M. W. (2000). Ambiguity resolution in sentence processing: Evidence against frequency-based accounts. Journal of Memory and Language, 43, 447–475.

Rayner, K., Rotello, C. M., Stewart, A. J., Keir, J., & Duffy, S. A. (2001). Integrating text and pictorial information: eye movements when looking at print advertisements. Journal of Experimental Psychology: Applied, 7, 219–226.

Sedivy, J. C., Tanenhaus, M. K., Chambers, C. G., & Carlson, G. N. (1999). Achieving incremental semantic interpretation through contextual representation. Cognition, 71, 109–148.

Snow, C. (1977). Mothers’ speech research: from input to interaction. In C. Snow & C. A. Ferguson (Eds.), Talking to children: language input and acquisition. Cambridge: Cambridge University Press.

Spivey, M. J., Tanenhaus, M. K., Eberhard, K. M., & Sedivy, J. C. (2002). Eye-movements and spoken language comprehension: effects of visual context on syntactic ambiguity resolution. Cognitive Psychology, 45, 447–481.

Spivey, M. J., Tyler, M. J., Eberhard, K. M., & Tanenhaus, M. K. (2001). Linguistically mediated visual search. Psychological Science, 12, 282–286.

Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 632–634.

Tomasello, M., & Farrar, M. J. (1986). Joint attention and early language. Child Development, 57, 1454–1463.

Townsend, D. J., & Bever, T. G. (2001). Sentence comprehension: the integration of habits and rules. Cambridge, MA: MIT Press.

Traxler, M. J., & Pickering, M. J. (1996). Plausibility and the processing of unbounded dependencies: an eye-tracking study. Journal of Memory and Language, 35, 454–475.

Trueswell, J. C., Tanenhaus, M. K., & Garnsey, S. M. (1994). Semantic influences on parsing: use of thematic role information in syntactic disambiguation. Journal of Memory and Language, 33, 285–318.

Trueswell, J. C., Tanenhaus, M. K., & Kello, C. (1993). Verb-specific constraints in sentence processing: separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 528–553.