


[Figure: Information distribution and information packaging — an Image Generator and a Message Generator activate and retrieve nodes in a shared multimodal memory, whose entries (entity, property, part-of) are linked by relations such as left-of, top-of, round, and similarity.]

[Plot: Formulated gesture content by feature (shape, position, shape+position; y-axis 0–100) over memory cycles N–4N; panel a) Full grammar (one clause).]

Image generation and gesture formulation:
- Goal-based reinforcement: (re-)activate salient visuo-spatial representations.
- Retrieval of relevant information: retrieve salient and most activated objects based on the communicative goal.
- Imagistic description packaging: unify and condense retrieved information into an imagistic description.
- Formulation-based reinforcement: reinforce activation of imagistic representations encoded in the formulation process.
- Derive gesture form specification: formulate a gesture form by querying Bayesian decision networks learned from empirical data (Bergmann & Kopp, 2009).

Message generation and speech formulation:
- Goal-based reinforcement: (re-)activate relevant symbolic-propositional representations.
- Retrieval of relevant information: retrieve information based on the communicative goal.
- Preverbal message packaging: pre-packaging of symbolic-propositional representations.
- Formulation-based reinforcement: reinforce activation of symbolic-propositional information contained in the formulator's first formulation suggestion.
- Produce appropriate sentence: LTAG-grammar-based sentence planning (Stone et al., 2003).
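The gesture-form step queries a Bayesian decision network (GNetIc). As a toy illustration of what such a query involves, the sketch below marginalizes a three-node discrete network by enumeration; the variables and probabilities are invented for illustration and are not GNetIc's actual model:

```python
# Toy discrete Bayesian network queried by enumeration.
# Illustrative only: variables and CPTs are invented, NOT GNetIc's network.

p_technique = {  # P(technique | referent shape)
    "round":   {"shaping": 0.7, "pointing": 0.3},
    "angular": {"shaping": 0.5, "pointing": 0.5},
}
p_handshape = {  # P(handshape | technique)
    "shaping":  {"C-hand": 0.8, "index": 0.2},
    "pointing": {"C-hand": 0.1, "index": 0.9},
}

def p_handshape_given_shape(shape: str) -> dict:
    """P(Handshape | Shape), summing out the intermediate technique variable."""
    result = {"C-hand": 0.0, "index": 0.0}
    for tech, pt in p_technique[shape].items():
        for hs, ph in p_handshape[tech].items():
            result[hs] += pt * ph
    return result

dist = p_handshape_given_shape("round")
# C-hand: 0.7*0.8 + 0.3*0.1 = 0.59; index: 0.7*0.2 + 0.3*0.9 = 0.41
```

In a real network the query would also condition on discourse and speaker features; the enumeration principle stays the same.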

A spreading-activation model of dynamic multimodal memory stabilization

CRC 673, Alignment in Communication B1, Speech-gesture alignment

References

Bergmann, K., Kahl, S., & Kopp, S. (2013). Modeling the semantic coordination of speech and gesture under cognitive and linguistic constraints. In: Lecture Notes in Artificial Intelligence. Intelligent Virtual Agents (pp. 203–216). Berlin/Heidelberg: Springer.

Kopp, S., Bergmann, K., & Kahl, S. (2013). A spreading-activation model of the semantic coordination of speech and gesture. Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013), 823–828.


Memory with dynamic activations

Lexicalized syntax tree

Cognitive Constraints for information distribution between speech and gesture:
- Disfluencies/new information: co-verbal gesturing more likely (Bergmann & Kopp, 2006).
- High cognitive load: higher gesture rate (Kita & Davies, 2009).
- High visual-spatial and low verbal skills: higher gesture rate (Hostetter & Alibali, 2007).

Linguistic Constraints for information packaging: cross-language differences in linguistic encoding capabilities (Kita & Özyürek, 2003).

Cognitive and linguistic constraints

Memory cycles: an abstract notion of time for the system, upon which local and global activation-spreading is updated.

Local activation-spreading: within visual-spatial representations (VSR), with c as the number of outgoing links (fan-out effect) and d as the depth within the hierarchical IDT structure (fade-out effect).

Global activation-spreading: towards VSR via supramodal concepts (SMC), and towards symbolic-propositional representations (SPR) via SMCs, with r as random noise (order of magnitude 0.1) and α, which controls the rate of convergence towards the SMC activation.

ACT-R based retrieval probability: with s as the threshold and r as noise in the activation levels (Anderson et al., 2004).
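The retrieval rule is a sigmoid over activation level. A minimal sketch (the threshold and noise values below are illustrative placeholders, not the poster's parameters):

```python
import math

def retrieval_probability(a: float, s: float = 0.5, r: float = 0.1) -> float:
    """ACT-R style retrieval probability (Anderson et al., 2004):
    p = 1 / (1 + exp(-(a - s) / r)), with threshold s and noise r.
    Default s and r are illustrative placeholders."""
    return 1.0 / (1.0 + math.exp(-(a - s) / r))

# At the threshold, retrieval probability is exactly 0.5;
# well above the threshold it approaches 1.
retrieval_probability(0.5)   # 0.5
retrieval_probability(0.9)   # ≈ 0.982
```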

[Plots: Activation over memory cycles (N–4N) for goal-based vs. formulation-based reinforcement across image generation, message generation, gesture formulation (GNetIc), and speech formulation. Formulated gesture content by feature (shape, position, shape+position; 0–100) and redundant vs. non-redundant information (0–200). Panels: a) Full grammar (one clause); b) Limited grammar (two consecutive clauses).]

Local activation-spreading (within VSR):

$a_{t+1} = \dfrac{a_t}{c \cdot d}$

Global activation-spreading (towards SPR and VSR via SMCs):

$a^{spr}_{t+1} = \dfrac{a^{vsr}_t + a^{spr}_t}{2} + \alpha \cdot (a^{smc}_t - a^{spr}_t) + r - 0.1$

$a^{vsr}_{t+1} = \dfrac{a^{vsr}_t + a^{spr}_t}{2} + \alpha \cdot (a^{smc}_t - a^{vsr}_t) + r - 0.1$

ACT-R based retrieval probability:

$p = \dfrac{1}{1 + e^{-(a_t - s)/r}}$
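A minimal sketch of the spreading updates from the Memory cycles section. The α value and the exact noise distribution are assumptions; the poster only states that r is random noise of order 0.1:

```python
import random

ALPHA = 0.3  # assumed convergence rate toward the SMC activation

def local_spread(a_t: float, c: int, d: int) -> float:
    """Local spreading within VSR: a_{t+1} = a_t / (c * d),
    attenuated by fan-out c and hierarchy depth d."""
    return a_t / (c * d)

def global_spread(a_vsr: float, a_spr: float, a_smc: float):
    """Global spreading via supramodal concepts (SMC): the VSR and SPR
    activations are averaged and pulled toward the SMC level, with noise
    r - 0.1 where r is of order 0.1 (here: uniform on [0, 0.2], an
    assumption)."""
    mean = (a_vsr + a_spr) / 2.0
    new_spr = mean + ALPHA * (a_smc - a_spr) + random.uniform(0, 0.2) - 0.1
    new_vsr = mean + ALPHA * (a_smc - a_vsr) + random.uniform(0, 0.2) - 0.1
    return new_vsr, new_spr
```

Iterating `global_spread` drives both modality-specific activations toward the shared SMC activation, which is what couples speech-side and gesture-side retrieval.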

[Figure: Lexicalized syntax tree for "the tower has a window at the top" (S → NP VP, with a PP for "at the top"), paired with the semantic entries entity(lm4_tower, tower), entity(lm4_window, window), at_top(lm4_tower, lm_window).]
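In the spirit of the lexicalized syntax tree figure, here is a minimal sketch of LTAG-style substitution that pairs string fragments with semantic entries. The data structures are simplified stand-ins for illustration, not the actual SPUD/LTAG machinery of Stone et al. (2003):

```python
# Minimal LTAG-flavoured substitution: each elementary tree pairs a surface
# fragment (with open "NP" slots) with semantic predicates; substitution
# fills a slot and accumulates the semantics. Illustrative sketch only.

from dataclasses import dataclass, field

@dataclass
class Tree:
    words: list                                  # tokens, "NP" = open slot
    semantics: list = field(default_factory=list)

    def substitute(self, sub: "Tree") -> "Tree":
        """Fill the first open NP slot with another tree."""
        i = self.words.index("NP")
        return Tree(self.words[:i] + sub.words + self.words[i + 1:],
                    self.semantics + sub.semantics)

has_tree = Tree(["NP", "has", "NP", "at", "the", "top"],
                ["at_top(lm4_tower, lm4_window)"])
tower = Tree(["the", "tower"], ["entity(lm4_tower, tower)"])
window = Tree(["a", "window"], ["entity(lm4_window, window)"])

sentence = has_tree.substitute(tower).substitute(window)
# " ".join(sentence.words) == "the tower has a window at the top"
```

Real LTAG trees carry node-labelled structure and adjunction as well; the point here is only the pairing of syntactic fragments with semantic entries during sentence planning.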

"Effects of cognitive and linguistic constraints on semantic coordination of speech and gesturemodelled with a process of multimodal conceptualization."

Modeling results

Information packaging
Single clause: "There is a round window" (+ shaping gesture)
Two clauses: "There is a window" (+ pointing) ... "it's round" (+ shaping)

Information distribution

Semantic coordination
[Figure: Gesture and speech content classified as redundant, supplementary, or complementary.]

generated from imagistic representations of the referent events. When translocational motion is represented as imagery, certain features of the event, such as the direction of the motion, have to be specified regardless of their significance in the discourse. In the two scenes discussed above, whether the lateral motion was to the right or to the left is not consequential in the plot development and thus this information is not likely to be expressed in speech. However, when the motion is represented as imagery, its direction has to be specified. Thus, the gesture that is generated on the basis of the imagery should regularly encode the direction of the motion based on the visual experience of the stimulus.

The Free Imagery Hypothesis predicts that there is no cross-linguistic difference in the gestural content for both the first and second scenes that we just discussed, but that gestures regularly encode spatial details that may not be verbally expressed. The Lexical Semantics Hypothesis predicts that gestures reflect differences in linguistic encoding possibilities in the three languages, but that gestures do not regularly encode spatial details that are not verbalized.

To obtain a cross-linguistically comparable gesture corpus, narratives in American English, Japanese, and Turkish were collected using the same stimulus. The methodology basically follows that of McNeill (1992).

Method

Participants

Sixteen adult native speakers of American English, 18 adult native speakers of Turkish, and 17 adult native speakers of Japanese participated in the experiment.

Materials

The stimulus was an American animated cartoon, which was about 6 min long. The recurrent theme of the cartoon was a cat's (Sylvester) unsuccessful attempts to catch a bird (Tweetie). For a detailed description of the cartoon, see the appendix of McNeill (1992).

Procedure

Each participant was told that they were participating in a story telling experiment. She/he was instructed to remember the stimulus as well as possible so as to be able to tell a detailed story to a person who did not see the stimulus. Gesture was not mentioned in the instruction. The participant was shown the stimulus on a TV monitor, while the listener waited in another room. Immediately after watching the stimulus, the participant told the story to the listener. No specific instruction was given to the listener except that he/she should pay attention to the story and was allowed to ask questions. Each participant's narration was videotaped.

Effect of limitation in linguistic expressive resources on gestural representations

The first analysis is carried out to investigate how limitation in expressive resources of a given language affects gestural representation. The scene in the stimulus that is selected for the analysis is the Swing Scene. In the Swing Scene, a cat and a bird are across the street from one another in the windows of different high-rises. The cat's building is on the right side of the screen and the bird's building is on the left side of the screen. In an attempt to catch the bird, the cat swings across the street on a rope that we must imagine is attached somewhere in the air above the street. Fig. 1 is the schematic drawing of the event.

In Turkish and Japanese, there is no readily accessible expression that semantically encodes agentive change of location with an arc trajectory. There is no verb that corresponds to the English intransitive verb "to swing" as in "the cat swings across the street". There is no readily accessible paraphrase for it either. (It would be possible to use mathematical terms like "arc" to paraphrase English "swing", such as "fly, drawing an arc," but such a paraphrase would not be a readily accessible one.) Thus, this is not only a lexical gap, but it is also a more general limitation in the expressive resources of the two languages.

This cross-linguistic difference requires that speakers of the three languages differ in their conceptual planning for speaking. Turkish and Japanese speakers have to construe the Swing Event in such a way that the trajectory shape is abstracted out, whereas English speakers' construal of the event can include the arc trajectory. The Interface Hypothesis proposes that the spatio-motoric representation of the event, which manifests itself as gesture, reflects the way the speakers of each language

Fig. 1. The schematic representation of the Swing Event in the stimulus.

S. Kita, A. Ozyurek / Journal of Memory and Language 48 (2003) 16–32 19

package the information about the event. Thus, it is predicted that Turkish and Japanese speakers are more likely to gesturally represent the event without the trajectory shape than American English speakers.

Furthermore, the Interface Hypothesis also predicts that the gestural representation of the event regularly reflects some aspects of the stimulus scene that are not expressed in the accompanying speech. It has been reported that the direction of the lateral movement (i.e., to the left or to the right) in the stimulus is regularly reproduced in the gesture, but rarely in the speech (McCullough, 1993). If the participant sees a movement in the stimulus that goes to the right on the video monitor, she/he is highly likely to gesturally represent the event as a movement to the right from the speaker's point of view. It is predicted that Turkish, Japanese, and American English speakers all regularly represent the lateral direction of the cat's change of location in their gestures, despite the fact that the content of these gestures is also shaped by the information packaging possibility of the respective languages.

Coding

The portion of the narratives in the three languages that referred to the change of location of the cat in the Swing Scene, henceforth the Swing Event, was analyzed. Gestures that expressed horizontal displacement were coded by two coders for the following two form features. First, it was coded whether the trajectory shape is "arc" or "straight". A gesture was coded as "arc" when its trajectory was downward concave (e.g. a semi-circle with the upward "opening," or any arc that is a part of such a semi-circle). A gesture was coded "straight" when it did not include downward concave trajectory. The second formal feature coded was the horizontal direction of the gesture: "left-biased" or "right-biased" or "purely away from the body."

Gestures by three randomly selected speakers from each language were used to check the inter-coder reliability. The nine speakers from the three language groups produced a total of 16 gesture tokens depicting the Swing Event. The two coders agreed on the arc–straight judgement on 94% of the tokens, and on the direction judgement on 87% of the tokens.

Results

Speech

All 16 American English speakers encoded the Swing Event in the speech. All but one used the word "swing" to describe the event. Fifteen (out of 17) Japanese speakers and 17 (out of 18) Turkish speakers encoded the Swing Event in the speech, but none of them lexically encoded the arc-shaped trajectory. Instead, they described the event with a change of location predicate that is trajectory-neutral. In Japanese, the verbs used in the description include "iku" (to go), "tobu" (to jump/fly), "shinobikomu" (to sneak in). In Turkish, the verbs used include "gidiyor" (to go), "uçuyor" (to fly), and "atlıyor" (to jump).

With regard to the coding of the lateral direction of the swing event, none of the speakers of any of the languages used the words "left" or "right."

Gesture

Trajectory shape encoding. Two English, one Turkish, and two Japanese speakers were excluded from this analysis because they either did not mention the target event or did not have a gesture with horizontal dislocation for the event.

The remaining participants were classified into three mutually exclusive categories according to their gestural behavior: those who used, in their description, arc gestures only, those who used both arc gestures and straight gestures, and those who used straight gestures only. Fig. 2 shows the percentage of the participants in the three languages who fell into the three categories. The proportions of the three categories of participants differed across the three languages (χ² test, χ² = 12.167, df = 2, p = .002). The pattern of usage of arc and straight gestures was very similar between Turkish and Japanese speakers. More of the Turkish and Japanese speakers as a group used at least one straight gesture (i.e., the dark bar plus the gray bar in Fig. 2) than the English speakers (Fisher's exact test, one-tailed, p < .001).
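As a quick arithmetic check of the reported statistic: for df = 2 the chi-square survival function reduces to exp(−x/2), so the reported χ² value can be converted to its p-value directly:

```python
import math

# For a chi-square distribution with df = 2, P(X >= x) = exp(-x / 2).
# The reported chi-square statistic of 12.167 with df = 2 therefore gives:
p = math.exp(-12.167 / 2)
# p ≈ 0.0023, consistent with the reported p = .002
```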

Fig. 2. Percentage of participants with the three patterns of usage of arc and straight gestures.


Sebastian Kahl, Kirsten Bergmann and Stefan Kopp

Conceptualization

Sebastian Kahl, Sociable Agents Group, CITEC, Bielefeld University

Web: http://www.glialfire.net
Mail: [email protected]
Phone: +49 521 106 12947

Twitter: @glialfire