sfb 1102 a5: distributing referential information across ...masta/ss16/10_surprisalinvwp.pdf · a5:...

68
SFB 1102 A5: Distributing referential information across modalities Christine Ankener & Mirjana Sekicki PhD Students Maria Staudte Principal Investigator SS16 - (Embodied) Language Comprehension Maria Staudte

Upload: truongminh

Post on 30-Aug-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

SFB 1102

A5: Distributing referential information across modalities

Christine Ankener & Mirjana Sekicki PhD Students

Maria Staudte Principal Investigator

SS16 - (Embodied) Language Comprehension

Maria Staudte

Prediction & Surprisal

2

Maria Staudte Project A5

“train.”“The

boy will eat the...”

time

Prediction Surprisal

P=0.1

P=0.3

P=0.6

S=?

Figure 1: Anticipation of and surprisal on “train”: Based on the context up to “train”, several objectsare anticipated/predicted, the cake being the most probable object. Due to the object anticipation,predictability of the word “train” is likely to rise, too. Once the target word “train” is uttered, the surprisalon this word can be also be estimated empirically.

expectation that the listener forms about upcoming referents, which is not identical to the information-theoretic term predictability of a word since a) a word may well be predictable but not anticipated becauseit is not co-present in the scene, for instance, and b) anticipation does not necessarily occur on the lexicallevel but firstly targets objects and possibly their semantic category.

This project will connect the VWP to an information-theoretic account of language processing in orderto quantify the contributions of visual and other non-verbal information sources to the predictability andsurprisal of a word — and vice versa. One challenge in this approach is to relate the anticipation for aconcrete object to the information-theoretic predictability of and the surprisal on a specific word. Thisproject serves to investigate this relationship and quantify how such extra-sentential context influencessurprisal.

Specifically, we predict that, similar to the selectional restrictions of a verb, the visual context as wellas referential speaker gaze will be used to constrain the set of possible continuations, i.e., to reduceentropy for the remaining utterance and raise the predictability of certain continuations (Hale, 2006),which in turn reduces the surprisal of the actual word. Consider the illustration in Fig. 1 which visualizeshow both concepts, predictability and surprisal, refer to different stages in the comprehension processand how both can be estimated. The utterance up to “The boy will eat the” as well as the visual sceneshowing the speaker and some edible and non-edible objects is considered as context for the upcomingword “train”. A traditional information-theoretic approach would use the spoken utterance by itself tocompute the predictability for the word “train” based on the likelihood that the word “train” occurs after“The boy will eat the”. Naturally this likelihood is very low.

However, results from the VWP suggest that providing a co-present object, here the train, can in-crease the perceived likelihood of the train to be among the upcoming spoken references — even if thecloze probability of that word (indexing purely linguistic predictability) would otherwise be low. Fixationproportions on the displayed objects prior to mentioning the target word indicate which objects the lis-tener anticipates to come up and to which extent. Fig. 1 exemplifies these proportions and shows howanticipation may index the predictability of an object/word1 from the listener’s point of view (i.e., with herlinguistic competence and the available visual context). As is also shown in Fig. 1, the surprisal of thetarget word “train” should be observable once it is uttered. A traditional information-theoretic approachcomputes the log-probability of this word but, as we have just argued, it is unclear what the precise

1It is unclear whether a specific word or a set of words (“train” or “toy train” or “train set” ) or, for instance, an abstract set offeatures related to the object is actually predicted by the listener (Rommers et al., 2013).

121

Reasoning

• Understanding information -> cognitive load

• Less predictable input -> higher cognitive load

• Surprisal (Hale 2001) reliable linking function between predictability and cognitive load

• But: Readings times (indexing cognitive load) also correlate with Entropy Reduction (Frank 2013)

Sample Stimulus

4

t-Shirtsock

arm chair

The woman describes soon the t-Shirtsock

arm chair

“plausible”“possible”

“anomalous”

“possible”

or “sanity check”

The woman irons soon the

Classical measures of cognitive load

5

Plausibility Ratings, Cloze

Reading self-paced

Reading eye-trackedLexical Decision

Plausibility Ratings

Plausibility rating on a 7 - point Likert scale

Intructions: „Lies dir die folgenden Sätze aufmerksam durch und bestimme anschließend, wie plausibel diese inhaltlich für dich sind. Die Skala reicht von 1 (sehr plausibel) bis zu 7 (gar nicht plausibel, bzw. schwer bis nicht vorstellbar). Hierbei wäre ein nicht vorstellbarer Satzinhalt z.B. 'Der Mann fährt das Brot.' eine 7.“

Offline Pre-Tests II

Forced choice cloze task

• Instructions: „Im Folgenden findest du unvollständige Sätze. Deine Aufgabe ist es, spontan und ohne zu lange darüber nachzudenken das Objekt aus den Optionen auszuwählen, das dir am passendsten erscheint, um die Lücke zu ersetzen.“

anomalous less plausible plausible0.4% 13% 84%

Self-paced reading: per word

Die | Frau | bügelt | gleich | das | T-shirt | in | der | Washküche. |

• design: 2 (verbs) x 3 (objects)

• 36 experimental sentences; 36 fillers

• each filler sentence illustrated a highly predictable context; followed by a simple yes/no content question

• 30 participants; (7 male); age ranging from 20 to 32 (M = 24)

• Button press: Reaction Times

Eye-tracking

• 25 participants; 8 men; age ranging from 18 to 34 (M = 23.16, SD = 4.49)

• SR Research EyeLink 1000+

• 35 filler sentences; simple yes/no content questions

• First-pass / total reading times

Auditory Lexical Decision Task

Die Frau bügelt gleich - [500ms] - das T-shirt.

• audio stimuli, LDT on the target word

• 24 participants; 4 male; age ranging from 18 to 36 (M = 24, SD = 4.43)

• Button press -> reaction time

Classical measures of cognitive load

11

Plausibility Ratings, Cloze

Reading self-paced

Reading eye-trackedLexical Decision

1) iron 2) describe

Object

Reading time on final word

a) t-

shi

rt

a) t-

shi

rt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

Read

ing

time

*1. Self-paced Reading:

Example of Stimuli Presentation:The - woman - irons - soon - the - t-shirt - in - the - laundry room.

1) iron 2) describe

0

2

4

6

a) t−shirt b) sock c) armchair a) t−shirt b) sock c) armchairObject

object

a) t−shirt

b) sock

c) armchair

Rating Scores

*

**

1) iron 2) describe

Plausibility rating scores (95% CI-error bars)

a) t-

shirt

a) t-

shirt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

Ratin

g sc

ores

Low predictability: A comparison of paradigms

Christine Ankener Mirjana Sekicki

P.I.: Dr. Maria Staudte

Saarland University, Germany

Universität des Saarlandes CUNY 2016, Gainesville, FL March 2016

Materials & Offline-Verification

Online - Comparison of Paradigms

• Plausibility & cloze probability assessed by 2 offline pre-tests

=> Questions:• Does plausibility affect cognitive load in

different experimental methods ?

• Can novel measure (ICA) pick up plausibility effects?

=> Contribution:• We compare effects of plausibility manipulation

across different paradigms and modalities

• We measure with ICA to see if it can be used to pick up moderate changes in cognitive load (advantages for future set ups)

Summary & Conclusion

• Plausibility (besides cloze probability & word- frequency) affects cognitive load on lower end of cloze probability scale • Plausibility effect robust across different experimental measures & presentation modalities • Novel ICA measure sensitive towards even moderate differences in plausibility-> ICA’s potential lies in future use for set ups combining cognitive load measure & visual contexts

References:

DeLong, K., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162. Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11, e0146194. Frank, S., L.Otten, G.Galli, & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1 –11. Smith, N. & Levy, R. (2008). Optimal processing times in reading: a formal model and empirical investigation. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (pp. 595–600). COGSCI '08.Van Petten, C., & Luka, B. (2012). Prediction during language comprehension: benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176 –190.

Table 1: Sample items and corresponding offline results.

Item noun condition plausibility cloze probability%M (SD) M (SD)

(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01

(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA

sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.

Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.

Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).

Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.

Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-

2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)

erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.

The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.

Offline Experiments

First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-

Example of Stimuli Presentation:The woman irons soon ___1000ms ___ the t-shirt.

• Significant 3 way interaction (1a vs. 1b & Verb & Rating)

• Significant interaction (1b vs. 1c & Verb)

3. Auditory Lexical Decision Task:

2) describe

Object

ICA

mea

ns

ICA means per condition / right eye

a) t-

shirt

b) s

ock

b) s

ock

a) t-

shirt

*1) iron

Example of Stimuli Presentation:The woman irons soon the t-shirt ____ 2000ms.

• condition c) dropped

4. ICA Listening:

Total reading time on target word

Tota

l rea

ding

tim

e

Object

a) t-

shirt

a) t-

shirt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

1) iron 2) describe

**

Example of Stimuli Presentation:The woman irons soon the t-shirt in the laundry room.

• Sign. difference also in regression from target

• No sign. in first pass

2. Eye-tracking Reading:

1) iron 2) describe

Object

Reading time on final worda)

t- s

hirt

a) t-

shi

rt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

Read

ing

time

*1. Self-paced Reading:

Example of Stimuli Presentation:The - woman - irons - soon - the - t-shirt - in - the - laundry room.

1) iron 2) describe

0

2

4

6

a) t−shirt b) sock c) armchair a) t−shirt b) sock c) armchairObject

object

a) t−shirt

b) sock

c) armchair

Rating Scores

*

**

1) iron 2) describe

Plausibility rating scores (95% CI-error bars)

a) t-

shirt

a) t-

shirt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

Ratin

g sc

ores

Low predictability: A comparison of paradigms

Christine Ankener Mirjana Sekicki

P.I.: Dr. Maria Staudte

Saarland University, Germany

Universität des Saarlandes CUNY 2016, Gainesville, FL March 2016

Materials & Offline-Verification

Online - Comparison of Paradigms

• Plausibility & cloze probability assessed by 2 offline pre-tests

=> Questions:• Does plausibility affect cognitive load in

different experimental methods ?

• Can novel measure (ICA) pick up plausibility effects?

=> Contribution:• We compare effects of plausibility manipulation

across different paradigms and modalities

• We measure with ICA to see if it can be used to pick up moderate changes in cognitive load (advantages for future set ups)

Summary & Conclusion

• Plausibility (besides cloze probability & word- frequency) affects cognitive load on lower end of cloze probability scale • Plausibility effect robust across different experimental measures & presentation modalities • Novel ICA measure sensitive towards even moderate differences in plausibility-> ICA’s potential lies in future use for set ups combining cognitive load measure & visual contexts

References:

DeLong, K., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162. Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11, e0146194. Frank, S., L.Otten, G.Galli, & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1 –11. Smith, N. & Levy, R. (2008). Optimal processing times in reading: a formal model and empirical investigation. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (pp. 595–600). COGSCI '08.Van Petten, C., & Luka, B. (2012). Prediction during language comprehension: benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176 –190.

Table 1: Sample items and corresponding offline results.

Item noun condition plausibility cloze probability%M (SD) M (SD)

(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01

(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA

sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.

Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.

Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).

Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.

Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-

2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)

erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.

The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.

Offline Experiments

First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-

Example of Stimuli Presentation:The woman irons soon ___1000ms ___ the t-shirt.

• Significant 3 way interaction (1a vs. 1b & Verb & Rating)

• Significant interaction (1b vs. 1c & Verb)

3. Auditory Lexical Decision Task:

2) describe

Object

ICA

mea

ns

ICA means per condition / right eye

a) t-

shirt

b) s

ock

b) s

ock

a) t-

shirt

*1) iron

Example of Stimuli Presentation:The woman irons soon the t-shirt ____ 2000ms.

• condition c) dropped

4. ICA Listening:

Total reading time on target word

Tota

l rea

ding

tim

e

Object

a) t-

shirt

a) t-

shirt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

1) iron 2) describe

**

Example of Stimuli Presentation:The woman irons soon the t-shirt in the laundry room.

• Sign. difference also in regression from target

• No sign. in first pass

2. Eye-tracking Reading:1) iron 2) describe

Object

Reading time on final word

a) t-

shi

rt

a) t-

shi

rt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

Read

ing

time

*1. Self-paced Reading:

Example of Stimuli Presentation:The - woman - irons - soon - the - t-shirt - in - the - laundry room.

1) iron 2) describe

0

2

4

6

a) t−shirt b) sock c) armchair a) t−shirt b) sock c) armchairObject

object

a) t−shirt

b) sock

c) armchair

Rating Scores

*

**

1) iron 2) describe

Plausibility rating scores (95% CI-error bars)

a) t-

shirt

a) t-

shirt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

Ratin

g sc

ores

Low predictability: A comparison of paradigms

Christine Ankener Mirjana Sekicki

P.I.: Dr. Maria Staudte

Saarland University, Germany

Universität des Saarlandes CUNY 2016, Gainesville, FL March 2016

Materials & Offline-Verification

Online - Comparison of Paradigms

• Plausibility & cloze probability assessed by 2 offline pre-tests

=> Questions:• Does plausibility affect cognitive load in

different experimental methods ?

• Can novel measure (ICA) pick up plausibility effects?

=> Contribution:• We compare effects of plausibility manipulation

across different paradigms and modalities

• We measure with ICA to see if it can be used to pick up moderate changes in cognitive load (advantages for future set ups)

Summary & Conclusion

• Plausibility (besides cloze probability & word- frequency) affects cognitive load on lower end of cloze probability scale • Plausibility effect robust across different experimental measures & presentation modalities • Novel ICA measure sensitive towards even moderate differences in plausibility-> ICA’s potential lies in future use for set ups combining cognitive load measure & visual contexts

References:

DeLong, K., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162. Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11, e0146194. Frank, S., L.Otten, G.Galli, & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1 –11. Smith, N. & Levy, R. (2008). Optimal processing times in reading: a formal model and empirical investigation. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (pp. 595–600). COGSCI '08.Van Petten, C., & Luka, B. (2012). Prediction during language comprehension: benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176 –190.

Table 1: Sample items and corresponding offline results.

Item noun condition plausibility cloze probability%M (SD) M (SD)

(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01

(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA

sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.

Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.

Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).

Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.

Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-

2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)

erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.

The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.

Offline Experiments

First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-

Example of Stimuli Presentation:The woman irons soon ___1000ms ___ the t-shirt.

• Significant 3 way interaction (1a vs. 1b & Verb & Rating)

• Significant interaction (1b vs. 1c & Verb)

3. Auditory Lexical Decision Task:

2) describe

Object

ICA

mea

ns

ICA means per condition / right eye

a) t-

shirt

b) s

ock

b) s

ock

a) t-

shirt

*1) iron

Example of Stimuli Presentation:The woman irons soon the t-shirt ____ 2000ms.

• condition c) dropped

4. ICA Listening:

Total reading time on target word

Tota

l rea

ding

tim

e

Object

a) t-

shirt

a) t-

shirt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

1) iron 2) describe

**

Example of Stimuli Presentation:The woman irons soon the t-shirt in the laundry room.

• Sign. difference also in regression from target

• No sign. in first pass

2. Eye-tracking Reading:

iron describe1) iron 2) describe

Object

Reading time on final word

a) t-

shi

rt

a) t-

shi

rt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

Read

ing

time

*1. Self-paced Reading:

Example of Stimuli Presentation:The - woman - irons - soon - the - t-shirt - in - the - laundry room.

1) iron 2) describe

0

2

4

6

a) t−shirt b) sock c) armchair a) t−shirt b) sock c) armchairObject

object

a) t−shirt

b) sock

c) armchair

Rating Scores

*

**

1) iron 2) describe

Plausibility rating scores (95% CI-error bars)

a) t-

shirt

a) t-

shirt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

Ratin

g sc

ores

Low predictability: A comparison of paradigms

Christine Ankener Mirjana Sekicki

P.I.: Dr. Maria Staudte

Saarland University, Germany

Universität des Saarlandes CUNY 2016, Gainesville, FL March 2016

Materials & Offline-Verification

Online - Comparison of Paradigms

• Plausibility & cloze probability assessed by 2 offline pre-tests

=> Questions:• Does plausibility affect cognitive load in

different experimental methods ?

• Can novel measure (ICA) pick up plausibility effects?

=> Contribution:• We compare effects of plausibility manipulation

across different paradigms and modalities

• We measure with ICA to see if it can be used to pick up moderate changes in cognitive load (advantages for future set ups)

Summary & Conclusion

• Plausibility (besides cloze probability & word- frequency) affects cognitive load on lower end of cloze probability scale • Plausibility effect robust across different experimental measures & presentation modalities • Novel ICA measure sensitive towards even moderate differences in plausibility-> ICA’s potential lies in future use for set ups combining cognitive load measure & visual contexts

References:

DeLong, K., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162. Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11, e0146194. Frank, S., L.Otten, G.Galli, & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1 –11. Smith, N. & Levy, R. (2008). Optimal processing times in reading: a formal model and empirical investigation. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (pp. 595–600). COGSCI '08.Van Petten, C., & Luka, B. (2012). Prediction during language comprehension: benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176 –190.

Table 1: Sample items and corresponding offline results.

Item noun condition plausibility cloze probability%M (SD) M (SD)

(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01

(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA

sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.

Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.

Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).

Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.

Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-

2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)

erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.

The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.

Offline Experiments

First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-

Example of Stimuli Presentation:The woman irons soon ___1000ms ___ the t-shirt.

• Significant 3 way interaction (1a vs. 1b & Verb & Rating)

• Significant interaction (1b vs. 1c & Verb)

3. Auditory Lexical Decision Task:

2) describe

Object

ICA

mea

ns

ICA means per condition / right eye

a) t-

shirt

b) s

ock

b) s

ock

a) t-

shirt

*1) iron

Example of Stimuli Presentation:The woman irons soon the t-shirt ____ 2000ms.

• condition c) dropped

4. ICA Listening:

Total reading time on target word

Tota

l rea

ding

tim

e

Object

a) t-

shirt

a) t-

shirt

b) s

ock

b) s

ock

c) a

rmch

air

c) a

rmch

air

1) iron 2) describe

**

Example of Stimuli Presentation:The woman irons soon the t-shirt in the laundry room.

• Sign. difference also in regression from target

• No sign. in first pass

2. Eye-tracking Reading: iron describe

Sample Item & PretestsTable 1: Sample items and corresponding offline results.

Item noun condition plausibility cloze probability%M (SD) M (SD)

(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01

(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA

sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.

Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.

Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).

Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.

Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-

2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)

erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.

The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.

Offline Experiments

First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-

12

Introducing ICA

13

Cognitive Load during language processing

ICA during language processing

ICA during language processing in visual context

describe = ironiron sock = describe sock

Visual Context & CL - Studies:

Aim: quantifying the role of visual context in creating predictions about linguistic items

➤ Same set of linguistic stimuli (German) ➤ 2 Experiments (AUDIO & VIS)

➤ Comprehension task

1. AUDIO: 2. VISUAL & AUDIO:

+14

1. Audio ICA study

15

Method • Eye Link II, 250 HZ, binocular • 36 students (20 female) • aged 19 to 46 years (M = 24.72) • 20 items, 4 conditions • audio only • Stimuli presentation: whole sentence

+ 2000ms break (The woman irons soon the t-shirt ____ 2000ms break.)

• Dependent measure: ICA events per 100ms from EACH eye

2. Visual ICA study

16

Method • Eye Link II, 250 HZ, binocular • 36 students -> data from 34 • aged 19 to 38 years (M = M = 23.25) • 20 items, 4 conditions • audio only • Stimuli presentation: visual set 1000

ms in advance /whole sentence + 2000ms break (The woman irons soon the t-shirt ____ 2000ms break.)

• Dependent measure: ICA event per 100ms, new inspections

Cloze & “Entropy”predictability entropy predictability entropy

iron describeaudio

t-shirt 13.67% 1/some with

competitor

t-shirt 0% 1/many no

competitorsock 0.16% sock 0%

visual

t-shirt 94.84% 1/2 t-shirt 25%

1/4 no competitor

sock 5.16% 1/2 with competitor sock 25%

d1 0% . d1 25%

d2 0% . d2 25%

17

Audio

19

Audio study

20

The woman

Audio study

21

The woman irons

describes

Audio study

22

The woman irons soon

describes

Audio study

23

The woman irons soon

describes

600ms

[ iron = describe]

Audio study

24

The woman irons soon the t-shirt.

sock.

describes t-shirt.

sock.

600ms

[ iron = describe]

Audio study

25

The woman irons soon the t-shirt.

sock.

describes t-shirt.

sock.

[ sock = t-shirt ?]

[ sock = t-shirt ]

600ms 600ms

[ iron = describe]

Audio study

Visual

Visual study

The woman

Visual study

The woman irons

describes

Visual study

The woman irons soon

describes

Visual study

The woman irons soon

describes

600ms

[ iron = describe?]

Visual study

The woman irons soon the t-shirt.

sock.

describes t-shirt.

sock.

600ms

[ iron = describe?]

Visual study

The woman irons soon the t-shirt.

sock.

describes t-shirt.

sock.

[ sock > t-shirt ]

[ sock = t-shirt ]

600ms 600ms

Visual study

[ iron = describe?]

study comparison

Surprisal (noun)

35

10

11

12

13

14

15

16

17

18

19

20

21

22

23

Subject Verb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A va

lues Conditions

●● iron t−shirt

iron sock

describe t−shirt

describe sock

Audio

●●

10

11

12

13

14

15

16

17

18

19

20

21

22

23

Subject Verb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A va

lues Conditions

●● iron t−shirt

iron sock

describe t−shirt

describe sock

Visual

*

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

0 10 20 30 40Verb − End of Sentence 600

ICA

mea

ns

VISUAL: Start Verb (verb outlier rem.) for V2 O1 / right eye

describe1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

0 10 20 30 40Verb − End of Sentence 600

ICA

mea

nsAUDIO: Start Verb (verb outlier rem.) for V2 O1 / right eye

describes soon the t-shirt

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

0 10 20 30 40Verb − End of Sentence 600

ICA

mea

ns

VISUAL: Start Verb (verb outlier rem.) for V2 O2 / right eye

describes soon

the sock

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

0 10 20 30 40Verb − End of Sentence 600

ICA

mea

ns

AUDIO: Start Verb (verb outlier rem.) for V2 O2 / right eye

the sockdescribes soon

t-shirt = sock

describes soon

the t-shirt

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

0 10 20 30 40Verb − End of Sentence 600

ICA

mea

ns

VISUAL: Start Verb (verb outlier rem.) for V1 O1 / right eye

the t-shirtirons soon

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

0 10 20 30 40Verb − End of Sentence 600

ICA

mea

ns

VISUAL: Start Verb (verb outlier rem.) for V1 O2 / right eye

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

0 10 20 30 40Verb − End of Sentence 600

ICA

mea

nsAUDIO: Start Verb (verb outlier rem.) for V1 O1 / right eye

irons soon the t-shirt

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

0 10 20 30 40Verb − End of Sentence 600

ICA

mea

ns

AUDIO: Start Verb (verb outlier rem.) for V1 O2 / right eye

the sockirons soon

iron

irons soon the sock

t-shirt < sock

Prediction, Surprisal and UID

38

12

13

14

15

16

17

18

19

20

21

22

Subject Verb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A −

Valu

es Verbs:●● Audio: iron

Audio: describe

Visual: iron

Visual: describe

Visual Context & CL - Results Eye Movements:

• upon hearing the restrictive verb (iron): ➤ less looks to the distractors ➤ more looks to the t-shirt

Target (t-shirt) Competitor (sock) Distractor (non-ironable)

*** *

Inspection probability on target, competitor & distractors while hearing the verb

iron

iron

iron

desc

ribe

desc

ribe

desc

ribe

Eye-differences

40

left eye right eyeverb windowvisual > audio

visual: iron > describe

verb:study interaction

noun window

visual > audio

visual: t-shirt iron < describe

visual: iron t-shirt < sock

audio: iron t-shirt < sock

sock - verb:study interaction

Predicting

Surprisal

?

?

inspection data

Conclusions ICA Audio → ICA Visual

1. ICA Audio:

1. little/no effect of verb for predictions of / surprisal on noun in “language-only”

2. similar to classical measures of cognitive load (CL)

2. ICA Visual:

1. addition of visual context raises CL

2. linguistic predictions sensitive to non-linguistic context:

A. lower CL = lower surprisal on more predicted nouns

B. effects of verb constraint + noun-verb plausibility

41

Prediction as Entropy Reduction

- Does CL spread according to UID?

Entropy Reduction in ICA

• Previous: Different verbs/nouns + identical context

• Now: Identical verbs/nouns + different context

43

Hypothesis:

According to entropy reduction hypothesis & UID, we assume:

verb object noun

1 potential referent

verb object noun

4 potential referents

cogn

itive

load

(IC

A va

lues

)

cogn

itive

load

(IC

A va

lues

)

44

1 possible target

3 possible targets

0 possible targets

“The woman irons soon the t-shirt”

4 possible targets

45

17

18

19

20

21

Subject Verb Objectvariable

Mea

n IC

A −

Valu

es TARGETcondition

0

1

3

4

Sentence

Targets

Target

Targets

Targets

0

4

3

1

Prediction/ER Surprisal

17

18

19

20

21

Subject Verb Objectvariable

Mea

n IC

A −

Valu

es

TARGETcondition

0

1

1 Target

***

18

19

20

Subject Verb Objectvariable

Mea

n IC

A −

Valu

es

TARGETcondition

3

1.

17

18

19

20

Subject Verb Objectvariable

Mea

n IC

A −

Valu

es

TARGETcondition

1

4***

Results (single condition comparisons):0 Targets

1 Target

4 Targets 3 Targets

1 Target

17

18

19

20

21

Subject Verb Objectvariable

Mea

n IC

A −

Valu

es

TARGETcondition

0

3

0 Targets

3 Targets

**

0.0

0.2

0.4

0.6

Distractor1 Distractor2 Distractor3 TargetObjects in Scene

New

insp

ectio

ns

Region

Distractor1

Distractor2

Distractor3

Target

Distribution of new inspections in 1 Target Condition1 possible Target

Targ

et m

entio

ned

Dis

trac

tor 3

Dis

trac

tor 1

Dis

trac

tor 2

New

Insp

ectio

ns

***

**

0.0

0.2

0.4

0.6

Distractor1 Distractor2 Distractor3 Distractor4Objects in Scene

New

insp

ectio

ns

Region

Distractor1

Distractor2

Distractor3

Distractor4

Distribution of new inspections in 0 Targets Condition0 possible Targets

Dis

trac

tor 3

New

Insp

ectio

ns

Dis

trac

tor 4

Dis

trac

tor 2

Dis

trac

tor 1

0.0

0.2

0.4

0.6

Competitor1 Competitor2 Distractor TargetObjects in Scene

New

insp

ectio

ns

Region

Competitor1

Competitor2

Distractor

Target

Distribution of new inspections in 3 Target Condition3 possible Targets

Targ

et m

entio

ned

Dis

trac

tor

Com

petit

or 1

Com

petit

or 2N

ew In

spec

tions

Entropy Reduction - Results Eye Movements:

0.0

0.2

0.4

0.6

Competitor1 Competitor2 Competitor3 TargetObjects in Scene

New

insp

ectio

ns

Region

Competitor1

Competitor2

Competitor3

Target

Distribution of new inspections in 4 Target Condition4 possible Targets

Com

petit

or 3

New

Insp

ectio

ns

Targ

et m

entio

ned

Com

petit

or 2

Com

petit

or 1

0 possible Targets 1 possible Target

3 possible Targets 4 possible Targets

Results• Verb:

• Fewer inspections to predicted objects (1,3)

• No effect on ICA

• Noun:

• Inspections of mentioned object

• Clear ICA/surprisal effect of no. of predicted objects

➡ No Cog. Load for Entropy Reduction → no UID ?

➡ Different type of Cog. Load, not ICA-relevant ?

49

Visual Context + Gaze

Why gaze?

• Speakers look at entities shortly before mentioning them. (Griffin & Bock, 2000; Meyer et al., 1998)

• Listeners rapidly inspect objects as they are mentioned. (Tanenhaus et al., 1995)

• Situated communication: gaze cue an inseparable part of the visual context

• Linguistic and visual context aid prediction of the upcoming contents. Can gaze cue adopt the same function?

Distribution of CL across gaze and linguistic cues

• Is gaze cue part of the context for the spoken referent?

• Is there surprisal on the gaze cue?

• Is there a distribution of surprisal between the gaze and the referent?

52

no gaze referent gaze

1. linguistic context

2. visual context

3. gaze cue

Die Frau bügelt gleich das T-Shirt.

die Socke.beschreibtThe woman irons

describes

soon the t-shirt

the sock

“gleich”

1000ms

300ms

100ms

1000ms

“Die Frau bügelt”

“das T-Shirt.”

350ms

16

17

18

19

20

21

22

23

Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

Referent gaze.Conditions:●● iron t−shirt

iron sock

describe t−shirt

describe sock

16

17

18

19

20

21

22

23

Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

Referent gaze.Conditions:●● iron t−shirt

iron sock

describe t−shirt

describe sock

●●

16

17

18

19

20

21

22

23

Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

No gaze.Conditions:●● iron t−shirt

iron sock

describe t−shirt

describe sock

●●

16

17

18

19

20

21

22

23

Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

No gaze.Conditions:●● iron t−shirt

iron sock

describe t−shirt

describe sock

ICA Results

t-shirt: iron < describe **

sock: iron = describe

iron: t-shirt < sock *

describe: t-shirt = sock

no gaze > ref. gaze *

t-shirt: iron < describe **

sock: iron > describe **

iron: t-shirt < sock ***

describe: t-shirt = sock

Eye-movements. New inspections during the verb.

0.00

0.05

0.10

iron describeVerbs

New

insp

ectio

ns

verb1) iron

2) describe

Verb Window: New inspections to the target

0.00

0.05

0.10

iron describeVerbs

New

insp

ectio

ns

verb1) iron

2) describe

Verb Window: New inspections to the target

0.00

0.03

0.06

0.09

iron describeVerbs

New

insp

ectio

ns

verb1) iron

2) describe

Verb Window: New inspections to the competitor

0.00

0.03

0.06

0.09

iron describeVerbs

New

insp

ectio

ns

verb1) iron

2) describe

Verb Window: New inspections to the competitor

0.0

0.1

0.2

0.3

0.4

iron describeVerbs

New

insp

ectio

ns

verb1) iron

2) describe

Verb Window: New inspections to the distractors

0.0

0.1

0.2

0.3

0.4

iron describeVerbs

New

insp

ectio

ns

verb1) iron

2) describe

Verb Window: New inspections to the distractors

Conclusions

1. Is gaze cue part of the context for the spoken referent?

• Gaze cue is considered as part of the context for the spoken referent.

2. If so, how does it influence the surprisal on the referent?

• (Reliable congruent) gaze cue contributes to the reduction of surprisal on the linguistic referent.

3. Is there surprisal on the gaze cue?

4. Is there a distribution of surprisal between the gaze and the referent?

• No surprisal was induces by the gaze cue itself and thus, no proof found of a distribution of surprisal between the gaze cue and the referent.

17

18

19

20

21

22

Subject Verb Adverb ObjectThe woman irons / describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

Gaze●● no gaze

referent gaze

17

18

19

20

21

22

Subject Verb Adverb ObjectThe woman irons / describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

Gaze●● no gaze

referent gaze

61

Conclusions

1. Is gaze cue part of the context for the spoken referent?

• Gaze cue is considered as part of the context for the spoken referent.

2. If so, how does it influence the surprisal on the referent?

• (Reliable congruent) gaze cue contributes to the reduction of surprisal on the linguistic referent.

3. Is there surprisal on the gaze cue?

4. Is there a distribution of surprisal between the gaze and the referent?

• No surprisal was induces by the gaze cue itself and thus, no proof found of a distribution of surprisal between the gaze cue and the referent.

62

Conclusions

1. Is gaze cue part of the context for the spoken referent?

• Gaze cue is considered as part of the context for the spoken referent.

2. If so, how does it influence the surprisal on the referent?

• (Reliable congruent) gaze cue contributes to the reduction of surprisal on the linguistic referent.

3. Is there surprisal on the gaze cue?

4. Is there a distribution of surprisal between the gaze and the referent?

• No surprisal was induces by the gaze cue itself and thus, no proof found of a distribution of surprisal between the gaze cue and the referent.

16

17

18

19

20

21

22

23

Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

IRON − gaze:●● no gaze

referent gaze

16

17

18

19

20

21

22

23

Subject Verb Adverb ObjectThe woman irons / describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

DESCRIBE − gaze●● no gaze

referent gaze

16

17

18

19

20

21

22

23

Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

IRON − gaze:●● no gaze

referent gaze

16

17

18

19

20

21

22

23

Subject Verb Adverb ObjectThe woman irons / describes soon the t−shirt / the sock.

Mea

n IC

A va

lues

DESCRIBE − gaze●● no gaze

referent gaze

63

Gaze & Surprisal

• Gaze cue is part of the context for the spoken referent.

• (Reliable congruent) gaze cues contribute to reduction of surprisal on linguistic referent

• No surprisal on gaze cue itself

➡ So far no evidence for distribution of surprisal between gaze cue and referent

64

Prediction & Surprisal

65

Maria Staudte Project A5

“train.”“The

boy will eat the...”

time

Prediction Surprisal

P=0.1

P=0.3

P=0.6

S=?

Figure 1: Anticipation of and surprisal on “train”: Based on the context up to “train”, several objectsare anticipated/predicted, the cake being the most probable object. Due to the object anticipation,predictability of the word “train” is likely to rise, too. Once the target word “train” is uttered, the surprisalon this word can be also be estimated empirically.

expectation that the listener forms about upcoming referents, which is not identical to the information-theoretic term predictability of a word since a) a word may well be predictable but not anticipated becauseit is not co-present in the scene, for instance, and b) anticipation does not necessarily occur on the lexicallevel but firstly targets objects and possibly their semantic category.

This project will connect the VWP to an information-theoretic account of language processing in orderto quantify the contributions of visual and other non-verbal information sources to the predictability andsurprisal of a word — and vice versa. One challenge in this approach is to relate the anticipation for aconcrete object to the information-theoretic predictability of and the surprisal on a specific word. Thisproject serves to investigate this relationship and quantify how such extra-sentential context influencessurprisal.

Specifically, we predict that, similar to the selectional restrictions of a verb, the visual context as wellas referential speaker gaze will be used to constrain the set of possible continuations, i.e., to reduceentropy for the remaining utterance and raise the predictability of certain continuations (Hale, 2006),which in turn reduces the surprisal of the actual word. Consider the illustration in Fig. 1 which visualizeshow both concepts, predictability and surprisal, refer to different stages in the comprehension processand how both can be estimated. The utterance up to “The boy will eat the” as well as the visual sceneshowing the speaker and some edible and non-edible objects is considered as context for the upcomingword “train”. A traditional information-theoretic approach would use the spoken utterance by itself tocompute the predictability for the word “train” based on the likelihood that the word “train” occurs after“The boy will eat the”. Naturally this likelihood is very low.

However, results from the VWP suggest that providing a co-present object, here the train, can in-crease the perceived likelihood of the train to be among the upcoming spoken references — even if thecloze probability of that word (indexing purely linguistic predictability) would otherwise be low. Fixationproportions on the displayed objects prior to mentioning the target word indicate which objects the lis-tener anticipates to come up and to which extent. Fig. 1 exemplifies these proportions and shows howanticipation may index the predictability of an object/word1 from the listener’s point of view (i.e., with herlinguistic competence and the available visual context). As is also shown in Fig. 1, the surprisal of thetarget word “train” should be observable once it is uttered. A traditional information-theoretic approachcomputes the log-probability of this word but, as we have just argued, it is unclear what the precise

1It is unclear whether a specific word or a set of words (“train” or “toy train” or “train set” ) or, for instance, an abstract set offeatures related to the object is actually predicted by the listener (Rommers et al., 2013).

121

Entropy Reduction ≠ CL

Wrap-Up• Embodiment (1 + 2)

• Situated & embodied language learning (3 + 4)

• Situated adult language comprehension (& production) (5 + 6)

• Language in Interaction (7 - 9)

• Taking another person into account

• Sending and perceiving bodily signals

• Applications

• Context effects on workload during language processing (10)

66

Questions?

References• Hale, John (2001). A probabilistic early parser as a psycholinguistic

model. Proceedings of the North American association of computational linguistics.

• Frank, Stefan (2013). “Uncertainty reduction as a measure of cognitive load in sentence comprehension”. Topics in Cognitive Science, 5, 475-494.

• Marshall, Sandra P. (2002). The index of cognitive activity: Measuring cognitive workload. In: Proceedings of the conference on Human factors and power plants. IEEE; 7–5.

• Demberg, V., & Sayeed, A. (2016). The Frequency of Rapid Pupil Dilations as a Measure of Linguistic Processing Difficulty. PloS one, 11(1).

68