sfb 1102 a5: distributing referential information across ...masta/ss16/10_surprisalinvwp.pdf · a5:...
TRANSCRIPT
SFB 1102
A5: Distributing referential information across modalities
Christine Ankener & Mirjana Sekicki PhD Students
Maria Staudte Principal Investigator
SS16 - (Embodied) Language Comprehension
Maria Staudte
Prediction & Surprisal
2
Maria Staudte Project A5
“train.”“The
boy will eat the...”
time
Prediction Surprisal
P=0.1
P=0.3
P=0.6
S=?
Figure 1: Anticipation of and surprisal on “train”: Based on the context up to “train”, several objectsare anticipated/predicted, the cake being the most probable object. Due to the object anticipation,predictability of the word “train” is likely to rise, too. Once the target word “train” is uttered, the surprisalon this word can be also be estimated empirically.
expectation that the listener forms about upcoming referents, which is not identical to the information-theoretic term predictability of a word since a) a word may well be predictable but not anticipated becauseit is not co-present in the scene, for instance, and b) anticipation does not necessarily occur on the lexicallevel but firstly targets objects and possibly their semantic category.
This project will connect the VWP to an information-theoretic account of language processing in orderto quantify the contributions of visual and other non-verbal information sources to the predictability andsurprisal of a word — and vice versa. One challenge in this approach is to relate the anticipation for aconcrete object to the information-theoretic predictability of and the surprisal on a specific word. Thisproject serves to investigate this relationship and quantify how such extra-sentential context influencessurprisal.
Specifically, we predict that, similar to the selectional restrictions of a verb, the visual context as wellas referential speaker gaze will be used to constrain the set of possible continuations, i.e., to reduceentropy for the remaining utterance and raise the predictability of certain continuations (Hale, 2006),which in turn reduces the surprisal of the actual word. Consider the illustration in Fig. 1 which visualizeshow both concepts, predictability and surprisal, refer to different stages in the comprehension processand how both can be estimated. The utterance up to “The boy will eat the” as well as the visual sceneshowing the speaker and some edible and non-edible objects is considered as context for the upcomingword “train”. A traditional information-theoretic approach would use the spoken utterance by itself tocompute the predictability for the word “train” based on the likelihood that the word “train” occurs after“The boy will eat the”. Naturally this likelihood is very low.
However, results from the VWP suggest that providing a co-present object, here the train, can in-crease the perceived likelihood of the train to be among the upcoming spoken references — even if thecloze probability of that word (indexing purely linguistic predictability) would otherwise be low. Fixationproportions on the displayed objects prior to mentioning the target word indicate which objects the lis-tener anticipates to come up and to which extent. Fig. 1 exemplifies these proportions and shows howanticipation may index the predictability of an object/word1 from the listener’s point of view (i.e., with herlinguistic competence and the available visual context). As is also shown in Fig. 1, the surprisal of thetarget word “train” should be observable once it is uttered. A traditional information-theoretic approachcomputes the log-probability of this word but, as we have just argued, it is unclear what the precise
1It is unclear whether a specific word or a set of words (“train” or “toy train” or “train set” ) or, for instance, an abstract set offeatures related to the object is actually predicted by the listener (Rommers et al., 2013).
121
Reasoning
• Understanding information -> cognitive load
• Less predictable input -> higher cognitive load
• Surprisal (Hale 2001) reliable linking function between predictability and cognitive load
• But: Readings times (indexing cognitive load) also correlate with Entropy Reduction (Frank 2013)
Sample Stimulus
4
t-Shirtsock
arm chair
The woman describes soon the t-Shirtsock
arm chair
“plausible”“possible”
“anomalous”
“possible”
or “sanity check”
The woman irons soon the
Classical measures of cognitive load
5
Plausibility Ratings, Cloze
Reading self-paced
Reading eye-trackedLexical Decision
Plausibility Ratings
Plausibility rating on a 7 - point Likert scale
Intructions: „Lies dir die folgenden Sätze aufmerksam durch und bestimme anschließend, wie plausibel diese inhaltlich für dich sind. Die Skala reicht von 1 (sehr plausibel) bis zu 7 (gar nicht plausibel, bzw. schwer bis nicht vorstellbar). Hierbei wäre ein nicht vorstellbarer Satzinhalt z.B. 'Der Mann fährt das Brot.' eine 7.“
Offline Pre-Tests II
Forced choice cloze task
• Instructions: „Im Folgenden findest du unvollständige Sätze. Deine Aufgabe ist es, spontan und ohne zu lange darüber nachzudenken das Objekt aus den Optionen auszuwählen, das dir am passendsten erscheint, um die Lücke zu ersetzen.“
anomalous less plausible plausible0.4% 13% 84%
Self-paced reading: per word
Die | Frau | bügelt | gleich | das | T-shirt | in | der | Washküche. |
• design: 2 (verbs) x 3 (objects)
• 36 experimental sentences; 36 fillers
• each filler sentence illustrated a highly predictable context; followed by a simple yes/no content question
• 30 participants; (7 male); age ranging from 20 to 32 (M = 24)
• Button press: Reaction Times
Eye-tracking
• 25 participants; 8 men; age ranging from 18 to 34 (M = 23.16, SD = 4.49)
• SR Research EyeLink 1000+
• 35 filler sentences; simple yes/no content questions
• First-pass / total reading times
Auditory Lexical Decision Task
Die Frau bügelt gleich - [500ms] - das T-shirt.
• audio stimuli, LDT on the target word
• 24 participants; 4 male; age ranging from 18 to 36 (M = 24, SD = 4.43)
• Button press -> reaction time
Classical measures of cognitive load
11
Plausibility Ratings, Cloze
Reading self-paced
Reading eye-trackedLexical Decision
1) iron 2) describe
Object
Reading time on final word
a) t-
shi
rt
a) t-
shi
rt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
Read
ing
time
*1. Self-paced Reading:
Example of Stimuli Presentation:The - woman - irons - soon - the - t-shirt - in - the - laundry room.
1) iron 2) describe
0
2
4
6
a) t−shirt b) sock c) armchair a) t−shirt b) sock c) armchairObject
object
a) t−shirt
b) sock
c) armchair
Rating Scores
*
**
1) iron 2) describe
Plausibility rating scores (95% CI-error bars)
a) t-
shirt
a) t-
shirt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
Ratin
g sc
ores
Low predictability: A comparison of paradigms
Christine Ankener Mirjana Sekicki
P.I.: Dr. Maria Staudte
Saarland University, Germany
Universität des Saarlandes CUNY 2016, Gainesville, FL March 2016
Materials & Offline-Verification
Online - Comparison of Paradigms
• Plausibility & cloze probability assessed by 2 offline pre-tests
=> Questions:• Does plausibility affect cognitive load in
different experimental methods ?
• Can novel measure (ICA) pick up plausibility effects?
=> Contribution:• We compare effects of plausibility manipulation
across different paradigms and modalities
• We measure with ICA to see if it can be used to pick up moderate changes in cognitive load (advantages for future set ups)
Summary & Conclusion
• Plausibility (besides cloze probability & word- frequency) affects cognitive load on lower end of cloze probability scale • Plausibility effect robust across different experimental measures & presentation modalities • Novel ICA measure sensitive towards even moderate differences in plausibility-> ICA’s potential lies in future use for set ups combining cognitive load measure & visual contexts
References:
DeLong, K., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162. Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11, e0146194. Frank, S., L.Otten, G.Galli, & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1 –11. Smith, N. & Levy, R. (2008). Optimal processing times in reading: a formal model and empirical investigation. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (pp. 595–600). COGSCI '08.Van Petten, C., & Luka, B. (2012). Prediction during language comprehension: benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176 –190.
Table 1: Sample items and corresponding offline results.
Item noun condition plausibility cloze probability%M (SD) M (SD)
(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01
(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA
sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.
Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.
Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).
Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.
Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-
2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)
erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.
The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.
Offline Experiments
First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-
Example of Stimuli Presentation:The woman irons soon ___1000ms ___ the t-shirt.
• Significant 3 way interaction (1a vs. 1b & Verb & Rating)
• Significant interaction (1b vs. 1c & Verb)
3. Auditory Lexical Decision Task:
2) describe
Object
ICA
mea
ns
ICA means per condition / right eye
a) t-
shirt
b) s
ock
b) s
ock
a) t-
shirt
*1) iron
Example of Stimuli Presentation:The woman irons soon the t-shirt ____ 2000ms.
• condition c) dropped
4. ICA Listening:
Total reading time on target word
Tota
l rea
ding
tim
e
Object
a) t-
shirt
a) t-
shirt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
1) iron 2) describe
**
Example of Stimuli Presentation:The woman irons soon the t-shirt in the laundry room.
• Sign. difference also in regression from target
• No sign. in first pass
2. Eye-tracking Reading:
1) iron 2) describe
Object
Reading time on final worda)
t- s
hirt
a) t-
shi
rt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
Read
ing
time
*1. Self-paced Reading:
Example of Stimuli Presentation:The - woman - irons - soon - the - t-shirt - in - the - laundry room.
1) iron 2) describe
0
2
4
6
a) t−shirt b) sock c) armchair a) t−shirt b) sock c) armchairObject
object
a) t−shirt
b) sock
c) armchair
Rating Scores
*
**
1) iron 2) describe
Plausibility rating scores (95% CI-error bars)
a) t-
shirt
a) t-
shirt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
Ratin
g sc
ores
Low predictability: A comparison of paradigms
Christine Ankener Mirjana Sekicki
P.I.: Dr. Maria Staudte
Saarland University, Germany
Universität des Saarlandes CUNY 2016, Gainesville, FL March 2016
Materials & Offline-Verification
Online - Comparison of Paradigms
• Plausibility & cloze probability assessed by 2 offline pre-tests
=> Questions:• Does plausibility affect cognitive load in
different experimental methods ?
• Can novel measure (ICA) pick up plausibility effects?
=> Contribution:• We compare effects of plausibility manipulation
across different paradigms and modalities
• We measure with ICA to see if it can be used to pick up moderate changes in cognitive load (advantages for future set ups)
Summary & Conclusion
• Plausibility (besides cloze probability & word- frequency) affects cognitive load on lower end of cloze probability scale • Plausibility effect robust across different experimental measures & presentation modalities • Novel ICA measure sensitive towards even moderate differences in plausibility-> ICA’s potential lies in future use for set ups combining cognitive load measure & visual contexts
References:
DeLong, K., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162. Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11, e0146194. Frank, S., L.Otten, G.Galli, & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1 –11. Smith, N. & Levy, R. (2008). Optimal processing times in reading: a formal model and empirical investigation. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (pp. 595–600). COGSCI '08.Van Petten, C., & Luka, B. (2012). Prediction during language comprehension: benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176 –190.
Table 1: Sample items and corresponding offline results.
Item noun condition plausibility cloze probability%M (SD) M (SD)
(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01
(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA
sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.
Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.
Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).
Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.
Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-
2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)
erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.
The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.
Offline Experiments
First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-
Example of Stimuli Presentation:The woman irons soon ___1000ms ___ the t-shirt.
• Significant 3 way interaction (1a vs. 1b & Verb & Rating)
• Significant interaction (1b vs. 1c & Verb)
3. Auditory Lexical Decision Task:
2) describe
Object
ICA
mea
ns
ICA means per condition / right eye
a) t-
shirt
b) s
ock
b) s
ock
a) t-
shirt
*1) iron
Example of Stimuli Presentation:The woman irons soon the t-shirt ____ 2000ms.
• condition c) dropped
4. ICA Listening:
Total reading time on target word
Tota
l rea
ding
tim
e
Object
a) t-
shirt
a) t-
shirt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
1) iron 2) describe
**
Example of Stimuli Presentation:The woman irons soon the t-shirt in the laundry room.
• Sign. difference also in regression from target
• No sign. in first pass
2. Eye-tracking Reading:1) iron 2) describe
Object
Reading time on final word
a) t-
shi
rt
a) t-
shi
rt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
Read
ing
time
*1. Self-paced Reading:
Example of Stimuli Presentation:The - woman - irons - soon - the - t-shirt - in - the - laundry room.
1) iron 2) describe
0
2
4
6
a) t−shirt b) sock c) armchair a) t−shirt b) sock c) armchairObject
object
a) t−shirt
b) sock
c) armchair
Rating Scores
*
**
1) iron 2) describe
Plausibility rating scores (95% CI-error bars)
a) t-
shirt
a) t-
shirt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
Ratin
g sc
ores
Low predictability: A comparison of paradigms
Christine Ankener Mirjana Sekicki
P.I.: Dr. Maria Staudte
Saarland University, Germany
Universität des Saarlandes CUNY 2016, Gainesville, FL March 2016
Materials & Offline-Verification
Online - Comparison of Paradigms
• Plausibility & cloze probability assessed by 2 offline pre-tests
=> Questions:• Does plausibility affect cognitive load in
different experimental methods ?
• Can novel measure (ICA) pick up plausibility effects?
=> Contribution:• We compare effects of plausibility manipulation
across different paradigms and modalities
• We measure with ICA to see if it can be used to pick up moderate changes in cognitive load (advantages for future set ups)
Summary & Conclusion
• Plausibility (besides cloze probability & word- frequency) affects cognitive load on lower end of cloze probability scale • Plausibility effect robust across different experimental measures & presentation modalities • Novel ICA measure sensitive towards even moderate differences in plausibility-> ICA’s potential lies in future use for set ups combining cognitive load measure & visual contexts
References:
DeLong, K., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162. Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11, e0146194. Frank, S., L.Otten, G.Galli, & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1 –11. Smith, N. & Levy, R. (2008). Optimal processing times in reading: a formal model and empirical investigation. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (pp. 595–600). COGSCI '08.Van Petten, C., & Luka, B. (2012). Prediction during language comprehension: benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176 –190.
Table 1: Sample items and corresponding offline results.
Item noun condition plausibility cloze probability%M (SD) M (SD)
(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01
(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA
sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.
Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.
Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).
Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.
Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-
2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)
erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.
The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.
Offline Experiments
First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-
Example of Stimuli Presentation:The woman irons soon ___1000ms ___ the t-shirt.
• Significant 3 way interaction (1a vs. 1b & Verb & Rating)
• Significant interaction (1b vs. 1c & Verb)
3. Auditory Lexical Decision Task:
2) describe
Object
ICA
mea
ns
ICA means per condition / right eye
a) t-
shirt
b) s
ock
b) s
ock
a) t-
shirt
*1) iron
Example of Stimuli Presentation:The woman irons soon the t-shirt ____ 2000ms.
• condition c) dropped
4. ICA Listening:
Total reading time on target word
Tota
l rea
ding
tim
e
Object
a) t-
shirt
a) t-
shirt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
1) iron 2) describe
**
Example of Stimuli Presentation:The woman irons soon the t-shirt in the laundry room.
• Sign. difference also in regression from target
• No sign. in first pass
2. Eye-tracking Reading:
iron describe1) iron 2) describe
Object
Reading time on final word
a) t-
shi
rt
a) t-
shi
rt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
Read
ing
time
*1. Self-paced Reading:
Example of Stimuli Presentation:The - woman - irons - soon - the - t-shirt - in - the - laundry room.
1) iron 2) describe
0
2
4
6
a) t−shirt b) sock c) armchair a) t−shirt b) sock c) armchairObject
object
a) t−shirt
b) sock
c) armchair
Rating Scores
*
**
1) iron 2) describe
Plausibility rating scores (95% CI-error bars)
a) t-
shirt
a) t-
shirt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
Ratin
g sc
ores
Low predictability: A comparison of paradigms
Christine Ankener Mirjana Sekicki
P.I.: Dr. Maria Staudte
Saarland University, Germany
Universität des Saarlandes CUNY 2016, Gainesville, FL March 2016
Materials & Offline-Verification
Online - Comparison of Paradigms
• Plausibility & cloze probability assessed by 2 offline pre-tests
=> Questions:• Does plausibility affect cognitive load in
different experimental methods ?
• Can novel measure (ICA) pick up plausibility effects?
=> Contribution:• We compare effects of plausibility manipulation
across different paradigms and modalities
• We measure with ICA to see if it can be used to pick up moderate changes in cognitive load (advantages for future set ups)
Summary & Conclusion
• Plausibility (besides cloze probability & word- frequency) affects cognitive load on lower end of cloze probability scale • Plausibility effect robust across different experimental measures & presentation modalities • Novel ICA measure sensitive towards even moderate differences in plausibility-> ICA’s potential lies in future use for set ups combining cognitive load measure & visual contexts
References:
DeLong, K., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162. Demberg, V., & Sayeed, A. (2016). The frequency of rapid pupil dilations as a measure of linguistic processing difficulty. PLoS ONE, 11, e0146194. Frank, S., L.Otten, G.Galli, & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1 –11. Smith, N. & Levy, R. (2008). Optimal processing times in reading: a formal model and empirical investigation. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (pp. 595–600). COGSCI '08.Van Petten, C., & Luka, B. (2012). Prediction during language comprehension: benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176 –190.
Table 1: Sample items and corresponding offline results.
Item noun condition plausibility cloze probability%M (SD) M (SD)
(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01
(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA
sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.
Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.
Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).
Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.
Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-
2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)
erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.
The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.
Offline Experiments
First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-
Example of Stimuli Presentation:The woman irons soon ___1000ms ___ the t-shirt.
• Significant 3 way interaction (1a vs. 1b & Verb & Rating)
• Significant interaction (1b vs. 1c & Verb)
3. Auditory Lexical Decision Task:
2) describe
Object
ICA
mea
ns
ICA means per condition / right eye
a) t-
shirt
b) s
ock
b) s
ock
a) t-
shirt
*1) iron
Example of Stimuli Presentation:The woman irons soon the t-shirt ____ 2000ms.
• condition c) dropped
4. ICA Listening:
Total reading time on target word
Tota
l rea
ding
tim
e
Object
a) t-
shirt
a) t-
shirt
b) s
ock
b) s
ock
c) a
rmch
air
c) a
rmch
air
1) iron 2) describe
**
Example of Stimuli Presentation:The woman irons soon the t-shirt in the laundry room.
• Sign. difference also in regression from target
• No sign. in first pass
2. Eye-tracking Reading: iron describe
Sample Item & PretestsTable 1: Sample items and corresponding offline results.
Item noun condition plausibility cloze probability%M (SD) M (SD)
(1) The woman irons soon the a) t-shirt plausible 1.12 (0.68) 13.67 (18.06)b) sock possible 2.76 (2.17) 0.16 (00.54)c) armchair anomalous 6.35 (1.43) <.01
(2) The woman describes soon the a) t-shirt possible 1.65 (1.50) NAb) sock possible 1.90 (1.80) NAc) armchair possible 1.89 (1.78) NA
sure sensitivity. Besides traditional methods, such as readingand reaction time measurements, we also applied a relativelynovel measure: the Index of Cognitive Activity (ICA). ICAis able to capture task related pupil jitter, linked to cognitiveload and predictability (Demberg & Sayeed, 2016). One ofour aims was to examine if ICA can be used to assess dif-ferences in cognitive load induced by plausibility and pre-dictability variation. The advantage of ICA lies in the possi-bility to have the listener view things other than the word sheis currently processing. This would enable the exploration ofvisual context effects on cognitive load and predictability.
Experimental MaterialsWe ran 2 offline and 4 online studies, using different mea-sures on a set of 36 stimuli items. The aim was to empiri-cally establish a solid indication of cognitive load as a func-tion of plausibility and (in part) predictability. The influenceof cloze-probability on cognitive load was reduced by usinglow constraining contexts.
Experimental items All items were German independentmain clauses with uniform syntactic structure (NP-V-ADV-NP). The first NP (sentence subject) was always neutral toavoid an impact on plausibility. The verbs which introducedthe strength of constraint on the subsequent noun were re-strictive (1) or non-restrictive (2). The two different verb cat-egories create constraint and no-constraint contexts to elicitstronger (1) or weaker (2) predictions. The adverb (gleich)following the verb, serves as spill-over region in order to giveparticipants more time to form predictions about the upcom-ing noun. Example: Die Frau bugelt / beschreibt gleich dasT�Shirt / die Socke / den Sessel (see Table 1 for details).
Both of our verb categories were then paired with threedifferent nouns (sentence object) of approximately same fre-quency. Noun frequencies were extracted from word listsDeReWo2 of the German research corpus (DeReKo) and heldconstant for nouns within one item in order to exclude animpact of frequency. Plausibility and predictability were as-sessed by two offline studies described below.
Filler items Fillers were plausible sentences of German,similar to condition (1a). Since a high amount of semanti-cally anomalous sentences could artificially increase the tol-
2(DeReKo corpus size > 28 billion words. Source: Korpusbasierte WortgrundformenlisteDeReWo,2012, v-ww-bll-320000g-2012-12-31-1.0, http://www.ids-mannheim.de/derewo)
erance for implausible sentences, no anomalous fillers wereused. Fillers differed, however, from experimental items inlength and syntactic structure to introduce syntactic variationfor the comprehender. Half of the fillers were followed byyes/no comprehension questions.Analysis The statistical analyses of the data collected fromall experiments were conducted using R statistical program-ming environment (R Core Team, 2013) and the lme4 pack-age (Bates, Maechler, Bolker, & Walker, 2015). Statisticalsignificance was assessed by stepwise reduction of the maxi-mal model’s fixed effects structure.
The Verb and Object fixed factors used for the analysesof all the experiments were contrast coded, thus creating thefollowing factors: O1 which denotes the difference betweenobjects (a) vs. (b); O2 which denotes the difference betweenobjects (b) vs. (c); and their interaction with the Verb, namely,VxO1 for the interaction between the Verb and O1, and VxO2for the interaction between the Verb and O2.
Offline Experiments
First, a plausibility rating determined how plausible on aver-age a noun is in its sentence context. Second, a classical sen-tence completion task for cloze probability norming assessedto what extend the verb constraint would also increase pre-dictability (as in cloze probability) of plausible vs. possiblenouns. Both studies were set up using a web form. Answerswere submitted anonymously via the form or using pen andpaper. We controlled for the unique participation of partici-pants.Plausibility ratingMethod 14 independent German native speakers with ageranging from 18 to 58 years participated in the study volun-tarily without reimbursement. One list contained all our stim-uli sentences in randomized order and without fillers, so thateach participant scored every item once. 216 sentences wereused for our 36 items in 6 conditions (see Table 1). In thetask description, participants were asked to rate the sentencesaccording to their plausibility on a 7�point Likert scale. Thescale ranged from very plausible (1) to not plausible / diffi-cult to imagine (7).Results We substantially changed 3 items which elicitedscores that noticeably deviated from the average rating pat-
12
Introducing ICA
13
Cognitive Load during language processing
ICA during language processing
ICA during language processing in visual context
describe = ironiron sock = describe sock
Visual Context & CL - Studies:
Aim: quantifying the role of visual context in creating predictions about linguistic items
➤ Same set of linguistic stimuli (German) ➤ 2 Experiments (AUDIO & VIS)
➤ Comprehension task
1. AUDIO: 2. VISUAL & AUDIO:
+14
1. Audio ICA study
15
Method • Eye Link II, 250 HZ, binocular • 36 students (20 female) • aged 19 to 46 years (M = 24.72) • 20 items, 4 conditions • audio only • Stimuli presentation: whole sentence
+ 2000ms break (The woman irons soon the t-shirt ____ 2000ms break.)
• Dependent measure: ICA events per 100ms from EACH eye
2. Visual ICA study
16
Method • Eye Link II, 250 HZ, binocular • 36 students -> data from 34 • aged 19 to 38 years (M = M = 23.25) • 20 items, 4 conditions • audio only • Stimuli presentation: visual set 1000
ms in advance /whole sentence + 2000ms break (The woman irons soon the t-shirt ____ 2000ms break.)
• Dependent measure: ICA event per 100ms, new inspections
Cloze & “Entropy”predictability entropy predictability entropy
iron describeaudio
t-shirt 13.67% 1/some with
competitor
t-shirt 0% 1/many no
competitorsock 0.16% sock 0%
visual
t-shirt 94.84% 1/2 t-shirt 25%
1/4 no competitor
sock 5.16% 1/2 with competitor sock 25%
d1 0% . d1 25%
d2 0% . d2 25%
17
24
The woman irons soon the t-shirt.
sock.
describes t-shirt.
sock.
600ms
[ iron = describe]
Audio study
25
The woman irons soon the t-shirt.
sock.
describes t-shirt.
sock.
[ sock = t-shirt ?]
[ sock = t-shirt ]
600ms 600ms
[ iron = describe]
Audio study
The woman irons soon the t-shirt.
sock.
describes t-shirt.
sock.
600ms
[ iron = describe?]
Visual study
The woman irons soon the t-shirt.
sock.
describes t-shirt.
sock.
[ sock > t-shirt ]
[ sock = t-shirt ]
600ms 600ms
Visual study
[ iron = describe?]
Surprisal (noun)
35
●
●
●
●
●
●
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Subject Verb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A va
lues Conditions
●● iron t−shirt
iron sock
describe t−shirt
describe sock
Audio
●
●
●●
●
●
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Subject Verb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A va
lues Conditions
●● iron t−shirt
iron sock
describe t−shirt
describe sock
Visual
*
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
0 10 20 30 40Verb − End of Sentence 600
ICA
mea
ns
VISUAL: Start Verb (verb outlier rem.) for V2 O1 / right eye
describe1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
0 10 20 30 40Verb − End of Sentence 600
ICA
mea
nsAUDIO: Start Verb (verb outlier rem.) for V2 O1 / right eye
describes soon the t-shirt
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
0 10 20 30 40Verb − End of Sentence 600
ICA
mea
ns
VISUAL: Start Verb (verb outlier rem.) for V2 O2 / right eye
describes soon
the sock
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
0 10 20 30 40Verb − End of Sentence 600
ICA
mea
ns
AUDIO: Start Verb (verb outlier rem.) for V2 O2 / right eye
the sockdescribes soon
t-shirt = sock
describes soon
the t-shirt
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
0 10 20 30 40Verb − End of Sentence 600
ICA
mea
ns
VISUAL: Start Verb (verb outlier rem.) for V1 O1 / right eye
the t-shirtirons soon
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
0 10 20 30 40Verb − End of Sentence 600
ICA
mea
ns
VISUAL: Start Verb (verb outlier rem.) for V1 O2 / right eye
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
0 10 20 30 40Verb − End of Sentence 600
ICA
mea
nsAUDIO: Start Verb (verb outlier rem.) for V1 O1 / right eye
irons soon the t-shirt
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
0 10 20 30 40Verb − End of Sentence 600
ICA
mea
ns
AUDIO: Start Verb (verb outlier rem.) for V1 O2 / right eye
the sockirons soon
iron
irons soon the sock
t-shirt < sock
Prediction, Surprisal and UID
38
●
●
●
●
●
●
12
13
14
15
16
17
18
19
20
21
22
Subject Verb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A −
Valu
es Verbs:●● Audio: iron
Audio: describe
Visual: iron
Visual: describe
Visual Context & CL - Results Eye Movements:
• upon hearing the restrictive verb (iron): ➤ less looks to the distractors ➤ more looks to the t-shirt
Target (t-shirt) Competitor (sock) Distractor (non-ironable)
*** *
Inspection probability on target, competitor & distractors while hearing the verb
iron
iron
iron
desc
ribe
desc
ribe
desc
ribe
Eye-differences
40
left eye right eyeverb windowvisual > audio
visual: iron > describe
verb:study interaction
noun window
visual > audio
visual: t-shirt iron < describe
visual: iron t-shirt < sock
audio: iron t-shirt < sock
sock - verb:study interaction
Predicting
Surprisal
✓
✓
✓
?
?
inspection data
Conclusions ICA Audio → ICA Visual
1. ICA Audio:
1. little/no effect of verb for predictions of / surprisal on noun in “language-only”
2. similar to classical measures of cognitive load (CL)
2. ICA Visual:
1. addition of visual context raises CL
2. linguistic predictions sensitive to non-linguistic context:
A. lower CL = lower surprisal on more predicted nouns
B. effects of verb constraint + noun-verb plausibility
41
Entropy Reduction in ICA
• Previous: Different verbs/nouns + identical context
• Now: Identical verbs/nouns + different context
43
Hypothesis:
According to entropy reduction hypothesis & UID, we assume:
verb object noun
1 potential referent
verb object noun
4 potential referents
cogn
itive
load
(IC
A va
lues
)
cogn
itive
load
(IC
A va
lues
)
44
❌
❌
❌
1 possible target
❌
3 possible targets
❌
❌
❌
❌
0 possible targets
“The woman irons soon the t-shirt”
4 possible targets
45
17
18
19
20
21
Subject Verb Objectvariable
Mea
n IC
A −
Valu
es TARGETcondition
0
1
3
4
Sentence
Targets
Target
Targets
Targets
0
4
3
1
Prediction/ER Surprisal
17
18
19
20
21
Subject Verb Objectvariable
Mea
n IC
A −
Valu
es
TARGETcondition
0
1
1 Target
***
18
19
20
Subject Verb Objectvariable
Mea
n IC
A −
Valu
es
TARGETcondition
3
1.
17
18
19
20
Subject Verb Objectvariable
Mea
n IC
A −
Valu
es
TARGETcondition
1
4***
Results (single condition comparisons):0 Targets
1 Target
4 Targets 3 Targets
1 Target
17
18
19
20
21
Subject Verb Objectvariable
Mea
n IC
A −
Valu
es
TARGETcondition
0
3
0 Targets
3 Targets
**
0.0
0.2
0.4
0.6
Distractor1 Distractor2 Distractor3 TargetObjects in Scene
New
insp
ectio
ns
Region
Distractor1
Distractor2
Distractor3
Target
Distribution of new inspections in 1 Target Condition1 possible Target
Targ
et m
entio
ned
Dis
trac
tor 3
Dis
trac
tor 1
Dis
trac
tor 2
New
Insp
ectio
ns
***
**
0.0
0.2
0.4
0.6
Distractor1 Distractor2 Distractor3 Distractor4Objects in Scene
New
insp
ectio
ns
Region
Distractor1
Distractor2
Distractor3
Distractor4
Distribution of new inspections in 0 Targets Condition0 possible Targets
Dis
trac
tor 3
New
Insp
ectio
ns
Dis
trac
tor 4
Dis
trac
tor 2
Dis
trac
tor 1
0.0
0.2
0.4
0.6
Competitor1 Competitor2 Distractor TargetObjects in Scene
New
insp
ectio
ns
Region
Competitor1
Competitor2
Distractor
Target
Distribution of new inspections in 3 Target Condition3 possible Targets
Targ
et m
entio
ned
Dis
trac
tor
Com
petit
or 1
Com
petit
or 2N
ew In
spec
tions
Entropy Reduction - Results Eye Movements:
0.0
0.2
0.4
0.6
Competitor1 Competitor2 Competitor3 TargetObjects in Scene
New
insp
ectio
ns
Region
Competitor1
Competitor2
Competitor3
Target
Distribution of new inspections in 4 Target Condition4 possible Targets
Com
petit
or 3
New
Insp
ectio
ns
Targ
et m
entio
ned
Com
petit
or 2
Com
petit
or 1
0 possible Targets 1 possible Target
3 possible Targets 4 possible Targets
Results• Verb:
• Fewer inspections to predicted objects (1,3)
• No effect on ICA
• Noun:
• Inspections of mentioned object
• Clear ICA/surprisal effect of no. of predicted objects
➡ No Cog. Load for Entropy Reduction → no UID ?
➡ Different type of Cog. Load, not ICA-relevant ?
49
Why gaze?
• Speakers look at entities shortly before mentioning them. (Griffin & Bock, 2000; Meyer et al., 1998)
• Listeners rapidly inspect objects as they are mentioned. (Tanenhaus et al., 1995)
• Situated communication: gaze cue an inseparable part of the visual context
• Linguistic and visual context aid prediction of the upcoming contents. Can gaze cue adopt the same function?
Distribution of CL across gaze and linguistic cues
• Is gaze cue part of the context for the spoken referent?
• Is there surprisal on the gaze cue?
• Is there a distribution of surprisal between the gaze and the referent?
52
no gaze referent gaze
1. linguistic context
2. visual context
3. gaze cue
Die Frau bügelt gleich das T-Shirt.
die Socke.beschreibtThe woman irons
describes
soon the t-shirt
the sock
●
●
●
●
●
●
●
●
16
17
18
19
20
21
22
23
Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
Referent gaze.Conditions:●● iron t−shirt
iron sock
describe t−shirt
describe sock
●
●
●
●
●
●
●
●
16
17
18
19
20
21
22
23
Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
Referent gaze.Conditions:●● iron t−shirt
iron sock
describe t−shirt
describe sock
●●
●
●
●
●
●
●
16
17
18
19
20
21
22
23
Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
No gaze.Conditions:●● iron t−shirt
iron sock
describe t−shirt
describe sock
●●
●
●
●
●
●
●
16
17
18
19
20
21
22
23
Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
No gaze.Conditions:●● iron t−shirt
iron sock
describe t−shirt
describe sock
ICA Results
t-shirt: iron < describe **
sock: iron = describe
iron: t-shirt < sock *
describe: t-shirt = sock
no gaze > ref. gaze *
t-shirt: iron < describe **
sock: iron > describe **
iron: t-shirt < sock ***
describe: t-shirt = sock
Eye-movements. New inspections during the verb.
0.00
0.05
0.10
iron describeVerbs
New
insp
ectio
ns
verb1) iron
2) describe
Verb Window: New inspections to the target
0.00
0.05
0.10
iron describeVerbs
New
insp
ectio
ns
verb1) iron
2) describe
Verb Window: New inspections to the target
0.00
0.03
0.06
0.09
iron describeVerbs
New
insp
ectio
ns
verb1) iron
2) describe
Verb Window: New inspections to the competitor
0.00
0.03
0.06
0.09
iron describeVerbs
New
insp
ectio
ns
verb1) iron
2) describe
Verb Window: New inspections to the competitor
0.0
0.1
0.2
0.3
0.4
iron describeVerbs
New
insp
ectio
ns
verb1) iron
2) describe
Verb Window: New inspections to the distractors
0.0
0.1
0.2
0.3
0.4
iron describeVerbs
New
insp
ectio
ns
verb1) iron
2) describe
Verb Window: New inspections to the distractors
Conclusions
1. Is gaze cue part of the context for the spoken referent?
• Gaze cue is considered as part of the context for the spoken referent.
2. If so, how does it influence the surprisal on the referent?
• (Reliable congruent) gaze cue contributes to the reduction of surprisal on the linguistic referent.
3. Is there surprisal on the gaze cue?
4. Is there a distribution of surprisal between the gaze and the referent?
• No surprisal was induces by the gaze cue itself and thus, no proof found of a distribution of surprisal between the gaze cue and the referent.
●
●
●
●
●
●
●
●
17
18
19
20
21
22
Subject Verb Adverb ObjectThe woman irons / describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
Gaze●● no gaze
referent gaze
●
●
●
●
●
●
●
●
17
18
19
20
21
22
Subject Verb Adverb ObjectThe woman irons / describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
Gaze●● no gaze
referent gaze
61
Conclusions
1. Is gaze cue part of the context for the spoken referent?
• Gaze cue is considered as part of the context for the spoken referent.
2. If so, how does it influence the surprisal on the referent?
• (Reliable congruent) gaze cue contributes to the reduction of surprisal on the linguistic referent.
3. Is there surprisal on the gaze cue?
4. Is there a distribution of surprisal between the gaze and the referent?
• No surprisal was induces by the gaze cue itself and thus, no proof found of a distribution of surprisal between the gaze cue and the referent.
62
Conclusions
1. Is gaze cue part of the context for the spoken referent?
• Gaze cue is considered as part of the context for the spoken referent.
2. If so, how does it influence the surprisal on the referent?
• (Reliable congruent) gaze cue contributes to the reduction of surprisal on the linguistic referent.
3. Is there surprisal on the gaze cue?
4. Is there a distribution of surprisal between the gaze and the referent?
• No surprisal was induces by the gaze cue itself and thus, no proof found of a distribution of surprisal between the gaze cue and the referent.
●
●
●
●
●
●
●
●
16
17
18
19
20
21
22
23
Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
IRON − gaze:●● no gaze
referent gaze
●
●
●
●
●
●
●
●
16
17
18
19
20
21
22
23
Subject Verb Adverb ObjectThe woman irons / describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
DESCRIBE − gaze●● no gaze
referent gaze
●
●
●
●
●
●
●
●
16
17
18
19
20
21
22
23
Subject Verb Adverb ObjectThe woman irons/describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
IRON − gaze:●● no gaze
referent gaze
●
●
●
●
●
●
●
●
16
17
18
19
20
21
22
23
Subject Verb Adverb ObjectThe woman irons / describes soon the t−shirt / the sock.
Mea
n IC
A va
lues
DESCRIBE − gaze●● no gaze
referent gaze
63
Gaze & Surprisal
• Gaze cue is part of the context for the spoken referent.
• (Reliable congruent) gaze cues contribute to reduction of surprisal on linguistic referent
• No surprisal on gaze cue itself
➡ So far no evidence for distribution of surprisal between gaze cue and referent
64
Prediction & Surprisal
65
Maria Staudte Project A5
“train.”“The
boy will eat the...”
time
Prediction Surprisal
P=0.1
P=0.3
P=0.6
S=?
Figure 1: Anticipation of and surprisal on “train”: Based on the context up to “train”, several objectsare anticipated/predicted, the cake being the most probable object. Due to the object anticipation,predictability of the word “train” is likely to rise, too. Once the target word “train” is uttered, the surprisalon this word can be also be estimated empirically.
expectation that the listener forms about upcoming referents, which is not identical to the information-theoretic term predictability of a word since a) a word may well be predictable but not anticipated becauseit is not co-present in the scene, for instance, and b) anticipation does not necessarily occur on the lexicallevel but firstly targets objects and possibly their semantic category.
This project will connect the VWP to an information-theoretic account of language processing in orderto quantify the contributions of visual and other non-verbal information sources to the predictability andsurprisal of a word — and vice versa. One challenge in this approach is to relate the anticipation for aconcrete object to the information-theoretic predictability of and the surprisal on a specific word. Thisproject serves to investigate this relationship and quantify how such extra-sentential context influencessurprisal.
Specifically, we predict that, similar to the selectional restrictions of a verb, the visual context as wellas referential speaker gaze will be used to constrain the set of possible continuations, i.e., to reduceentropy for the remaining utterance and raise the predictability of certain continuations (Hale, 2006),which in turn reduces the surprisal of the actual word. Consider the illustration in Fig. 1 which visualizeshow both concepts, predictability and surprisal, refer to different stages in the comprehension processand how both can be estimated. The utterance up to “The boy will eat the” as well as the visual sceneshowing the speaker and some edible and non-edible objects is considered as context for the upcomingword “train”. A traditional information-theoretic approach would use the spoken utterance by itself tocompute the predictability for the word “train” based on the likelihood that the word “train” occurs after“The boy will eat the”. Naturally this likelihood is very low.
However, results from the VWP suggest that providing a co-present object, here the train, can in-crease the perceived likelihood of the train to be among the upcoming spoken references — even if thecloze probability of that word (indexing purely linguistic predictability) would otherwise be low. Fixationproportions on the displayed objects prior to mentioning the target word indicate which objects the lis-tener anticipates to come up and to which extent. Fig. 1 exemplifies these proportions and shows howanticipation may index the predictability of an object/word1 from the listener’s point of view (i.e., with herlinguistic competence and the available visual context). As is also shown in Fig. 1, the surprisal of thetarget word “train” should be observable once it is uttered. A traditional information-theoretic approachcomputes the log-probability of this word but, as we have just argued, it is unclear what the precise
1It is unclear whether a specific word or a set of words (“train” or “toy train” or “train set” ) or, for instance, an abstract set offeatures related to the object is actually predicted by the listener (Rommers et al., 2013).
121
Entropy Reduction ≠ CL
Wrap-Up• Embodiment (1 + 2)
• Situated & embodied language learning (3 + 4)
• Situated adult language comprehension (& production) (5 + 6)
• Language in Interaction (7 - 9)
• Taking another person into account
• Sending and perceiving bodily signals
• Applications
• Context effects on workload during language processing (10)
66
References• Hale, John (2001). A probabilistic early parser as a psycholinguistic
model. Proceedings of the North American association of computational linguistics.
• Frank, Stefan (2013). “Uncertainty reduction as a measure of cognitive load in sentence comprehension”. Topics in Cognitive Science, 5, 475-494.
• Marshall, Sandra P. (2002). The index of cognitive activity: Measuring cognitive workload. In: Proceedings of the conference on Human factors and power plants. IEEE; 7–5.
• Demberg, V., & Sayeed, A. (2016). The Frequency of Rapid Pupil Dilations as a Measure of Linguistic Processing Difficulty. PloS one, 11(1).
68