For Peer Review Only

The Effects of Addressee Attention on Prosodic Prominence

Journal: Language and Cognitive Processes

Manuscript ID: Draft

Manuscript Type: Prosody in Context

Date Submitted by the Author: n/a

Complete List of Authors: Rosa, Elise; UNC Chapel Hill, Psychology

Finch, Kayla; UNC Chapel Hill, Psychology

Bergeson, Molly; UNC Chapel Hill, Psychology

Arnold, Jennifer; UNC Chapel Hill, Dept. of Psychology

Keywords: Prosody, Attention, audience design

URL: http://mc.manuscriptcentral.com/plcp Email: [email protected]

Title: The Effects of Addressee Attention on Prosodic Prominence

Names of Authors: Elise C. Rosa, Kayla H. Finch, Molly Bergeson, Jennifer E. Arnold

Address of authors: University of North Carolina at Chapel Hill

Short Title: The Effects of Addressee Attention

Correspondence should be addressed to Elise Rosa, University of North Carolina at Chapel Hill,

Department of Psychology, Davie Hall CB #3270, Chapel Hill, NC, 27516, USA Email:

[email protected]

Fax: 919-962-2537

ABSTRACT

How do speakers accommodate distracted listeners? Specifically, how does prosody change

when speakers know that their addressees are multitasking? Speakers might use more

acoustically prominent words for distracted addressees, to ensure that important information is

communicated. Alternatively, speakers might disengage from the task and use less prominent

pronunciations with distracted addressees. A further question is whether prosodic prominence

changes globally or if there are effects specific to the most relevant information. We studied

these effects in two instruction-giving experiments. Speakers instructed listeners to move objects

to locations on a board. In the distraction condition, addressees were also completing a

demanding secondary computer task; in the attentive condition they paid full attention. Results

demonstrated that speakers modify their speech for distracted listeners, and in an instruction-

giving task they specifically use more acoustically prominent (longer) pronunciations for

distracted listeners. This effect was localized to the most task-relevant information: the object to

be moved.

This research was supported by NSF grant BCS-0745627. We gratefully acknowledge the

assistance of Giulia Pancani.

Speakers have numerous choices to make for every message they want to communicate.

They can be concise (Crackers please!) or verbose (Can you please hand me that box of

crackers?). They can specify objects in detail (That box of saltines next to you) or not (that).

They can enunciate words prominently or with a reduced pronunciation. Many of these choices

are related to the information being communicated. Already-known or predictable information is

generally expressed with fewer words, less detail, and reduced pronunciation, whereas new or

important information is referred to with more words, more detail, and acoustically prominent

forms (Arnold, 1998, 2008, 2010; Brown, 1983; Chafe, 1976; Gundel, Hedberg & Zacharski,

1993; Halliday, 1967; Sityaev, 2000). A much-debated issue is whether these choices are made

as a result of the speaker’s knowledge about their addressee’s knowledge or attention – a process

known as audience design (Arnold, Kahn, & Pancani, in press; Horton & Keysar, 1996; Galati

& Brennan, 2010).

In this paper we ask whether people speak differently when their addressee is distracted,

as one type of audience design. For example, if your request for crackers is directed at someone

engaged in a different task, like driving a car, how will your word choice and pronunciation be

affected? There are many dimensions along which you might change your prosody: you might

speak the whole sentence more slowly, or loudly, or you might speak only particular words more

slowly. We focus here on how speakers modify the acoustic prominence of their words,

specifically word duration, but also examine how it co-occurs with other types of linguistic form

variation. Duration is especially interesting because it may vary as a function of the speaker’s

desire to make certain words prominent (Breen, Fedorenko, Wagner, & Gibson, 2010; Ladd,

1996), but also can provide a cue about the speaker’s fluency (Bell, Jurafsky, Fosler-Lussier,

Girand, Gregory et al., 2003), which in turn can affect comprehension (e.g., Arnold, Tanenhaus,

Altmann & Fagnano, 2004; Arnold, Hudson Kam & Tanenhaus, 2007).

It is well established that speakers use language differently for different addressees (e.g.,

Clark, 1996; Clark & Krych, 2004; Galati & Brennan, 2010), and there is good evidence that

audience design impacts lexical choices (Brown-Schmidt & Tanenhaus, 2006; Brennan & Clark,

1996; Gorman et al., 2011; Heller, Gorman & Tanenhaus, in press; Horton & Keysar, 1996).

Speakers refer to objects in conversation using partner-specific terms they’ve developed over the

course of conversation. They also keep track of the physical presence of objects they’re referring

to for themselves and their conversational partners.

However, an ongoing debate concerns the effect of audience design on acoustic

prominence. Some theories suggest that audience design is the primary determinant of the

speaker’s choice to acoustically emphasize some words (Chafe, 1987; Lindblom, 1990). This

account is consistent with the idea that new and unpredictable information tends to be accented

(e.g., Venditti & Hirschberg, 2003), since this information should be less accessible to listeners,

and thus require more explicit input. However, a strong version of this account has found little

support in empirical studies where the speaker and addressee’s knowledge are examined

separately. For example, Bard and colleagues (Bard, Anderson, Aylett, Doherty-Sneddon &

Newlands, 2000; Bard & Aylett, 2005) found that intelligibility was unaffected by numerous

measures of the listener’s knowledge. They proposed the dual process hypothesis (Bard et al.,

2000), in which fast automatic processes allow for the speaker’s memory to affect articulation,

and slower processes incorporate information about the listener for purposes like choosing

pronominal forms. Similarly, Kahn & Arnold (2012, under review-b) found that speakers

shortened nouns that they had recently heard, regardless of their addressee’s experience with the

word.

In contrast, Galati & Brennan (2010) found that words directed at knowledgeable

addressees were rated as less intelligible than those directed at naïve addressees, even though

they did not differ on duration. Arnold et al. (in press) found that speakers in their experiment

did modulate the duration of words in response to addressee behavior, but specifically on a word

associated with utterance planning -- the determiner the (Clark & Wasow, 1998). This suggested

that effects of audience design may be mediated by production-internal processes of utterance

planning.

Whether audience design affects acoustic variation or not, there is abundant evidence that

speakers use longer and more acoustically prominent pronunciations for information that is

harder to retrieve or plan and shorter pronunciations for easy-to-produce words and referents

(Arnold & Watson, 2012, under review; Balota & Chumbley, 1985; Lam & Watson, 2010; Kahn

& Arnold, 2012, under review-a; Bard et al., 2000; Bell, Brenier, Gregory, Girand & Jurafsky,

2009). For example, when speakers are disfluent, saying um, uh, or repeating words, it indicates

they are having speech production difficulty. Words surrounding such disfluent elements also

tend to be longer (Bell et al., 2003).

In sum, previous work suggests that speakers accommodate their listeners’ needs in many

ways, but effects of audience design on acoustic variation are variable. However, the majority of

work on this question has focused on whether speakers adjust their pronunciations in response to

their addressee’s knowledge. The current work instead examines the effects of the listener’s

attentional state. Do speakers modulate the acoustic properties of their speech in response to

visible evidence that their addressee is distracted?

Distraction is a common characteristic of day-to-day life, yet relatively little is known

about how speakers adjust their linguistic form when speaking to distracted addressees. In a

narrative recall study, Pasupathi, Stallworth & Murdoch (1998) found that speakers with

attentive addressees produced more information than those with distracted addressees. Similarly,

Kuhlen & Brennan (2010) found that speakers told narrative jokes with more detail with

attentive rather than distracted addressees, although this effect was weakened when the speaker

expected the addressee to be distracted. Thus, these studies found that speakers provided less

information to distracted addressees.

By contrast, a study by Arnold et al. (in press) suggests that speakers provide more

information for less-attentive addressees. Speakers gave instructions to addressees to place

objects on a board of colored dots, e.g. The teapot goes on yellow. The addressee was either

especially attentive, anticipating the object when possible, or merely normally attentive.

Speakers both used more words and longer pronunciations of the word the with non-anticipating

addressees. Unlike the narrative tasks in which distracted addressees elicited less detail, this task

required the addressee to follow instructions, so increased verbal specificity may have had a

concrete advantage for completing the task. Similar results come from a narrative production

study by Rosa & Arnold (2011), except they found that speakers provided more explicit referring

expressions when they themselves were distracted.

The current study used an instruction-following task similar to Arnold et al. (in press) to

examine the effects of addressee attention at the other end of the spectrum, when addressees are

distracted with a secondary task. If speakers use acoustic prominence to ensure effective

communication, we would expect to see longer words with distracted addressees. If such an

effect is driven by the comprehension needs of the listener, we would expect increased

prominence to be localized to the most central information for the task, that is, the word

describing the target object. Alternatively, if speakers engage more with attentive addressees,

they might decrease acoustic prominence for distracted addressees, as found in narrative recall

tasks for lexical detail (Kuhlen & Brennan, 2010; Pasupathi et al., 1998).

A second question was whether effects of addressee distraction would interact with

known informational predictors of acoustic prominence. One well-known determinant of word

duration is predictability: Predictable words tend to be acoustically reduced (Jurafsky, Bell,

Gregory & Raymond, 2001; Bell et al., 2009; Gahl & Garnsey, 2004), where predictability can

stem from the surrounding words, the prior sentence meaning, or syntactic structure. Likewise,

when the discourse context leads to an expectation of a specific referent, words referring to it

tend to be reduced (Arnold 1998, 2001; Lam & Watson, 2010; Watson, Arnold, & Tanenhaus,

2008). If speakers think that distracted addressees cannot follow predictability cues effectively,

they may resist the usual tendency to reduce predictable information, and thereby show greater

acoustic prominence specifically for predictable words. Alternatively, they may use a simpler

strategy of adjusting their speech for distracted addressees overall, regardless of predictability.

We therefore examined the effects of addressee distraction in two experiments. Both used

the same instruction-giving task. Each trial involved two objects, and speakers always produced

one instruction for each object. In Experiment 1 the target object was the second item in the pair,

meaning that after the first instruction was given, the object in the second instruction was fully

predictable. In Experiment 2, the same target objects were used as the first instruction, so they

were relatively less predictable. Word duration was the main variable of concern in this study,

but the effects of predictability and distraction on lexical choices were also examined.

One of the advantages of this experimental paradigm is that it involved a concrete task,

which provided the speaker with motivation to accommodate the addressee. Another advantage

to this task is that it imposed little to no memory burden on the subject, in contrast to other

studies that used either maps or narratives that only the speaker had viewed (Bard & Aylett,

2005; Bard et al. 2000). As this added burden of “record-keeping” was reduced in our task,

speakers presumably had more resources with which to complete the task, and therefore might be

more capable of considering their listeners in planning their utterances. Additionally, partner-

specific findings or audience design effects are most likely to occur in an interactive dialog

setting (Brown-Schmidt, 2009).

EXPERIMENTS 1 AND 2

We tested how speakers would modify their speech in reaction to the listener’s state of

distraction, in two experiments. The methods and analyses were nearly identical across the two

experiments, so they are reported together.

Method

Participants. Twenty undergraduate students from the University of North Carolina

participated, ten in Experiment 1, and ten in Experiment 2. All participants were native speakers

of English and had normal or corrected-to-normal vision. Participants received course credit for their

participation.

Materials and Design. Target stimuli consisted of 48 physical objects whose names were

matched for number of phonemes, syllables, and frequency. The same target stimuli were used in

Experiments 1 and 2. Filler stimuli (i.e., those objects used for the other instruction) were all one

syllable, and were the same across experiments. Two lists were formed for each experiment, as

participants worked with one attentive addressee and one distracted addressee. The lists were

paired by length, phonemes, and frequency as closely as possible. Targets were presented to each

participant once, either as the second item in a pair (Experiment 1, predictable targets), or as the

first item in a pair (Experiment 2, unpredictable targets). Different study participants performed

the two experiments. Thus, there were 20 participants total with 48 trials each. The order in

which addressees (distracted, attentive) were encountered was counterbalanced across

participants to control for any carry-over effects between the first and second blocks.

Equipment. Stimuli were presented on a computer monitor in a slide-show format, using

PowerPoint. The objects to be moved were stored in containers, and were put on the table in

pairs. Responses were recorded using a headset microphone.

Procedure. Participants worked with two confederates during each of the experiments.

During half of the trials the participant was paired with a distracted confederate, who was

performing a secondary computer task while completing the primary task. During the other half

of the trials the participants worked with a confederate who was not performing any secondary

task. The secondary task was a timed state-labeling game that required the confederate to be

continuously engaged, except when pausing to carry out the instructions. The order of distraction

was counter-balanced between participants.

The primary task was an instruction-giving task. Once the experiment began, participants

would see pictures of two objects appear on a computer screen behind the confederate. The

objects were on colored circles on the screen. The two objects to be moved were placed on the

table, and the participant instructed the confederate to move them to the appropriate colored

circles. The objects appeared on the computer screen one at a time. In Experiment 1 the target

item was the second object to appear, making it entirely predictable. In Experiment 2 the target

object was the first to appear on the screen, making it relatively unpredictable to the

confederate, who could not view the screen. Participants issued verbal instructions to

confederates to move the objects, for example, “Put the fox on the green circle. Now put the cork

on the red circle”. As soon as confederates moved the second object of the pair, the computer

screen was advanced to the next trial.

Analysis. We examined how the distraction manipulation affected the speakers’ choices

in both 1) number of words in the target expression, and 2) the acoustic prominence of their

pronunciations, as measured by the duration of four key regions: a) the latency to begin

speaking, as indexed by the time between the onset of the visual stimulus and the onset of the

first word in the response (excluding disfluencies like uh); b) the determiner the, when produced,

c) the target noun, e.g. fox; and d) the color word, e.g. red.

Duration and latency analyses were restricted to definite noun phrases (‘the koala’) or bare

noun phrases (‘koala’). Disfluent trials were excluded. Out of the 960 trials (48 trials for each of 20

participants), 8.96% of the data in Experiment 1 and 9.58% of the data in Experiment 2 were

excluded from the acoustic analyses by these criteria. There were 437 tokens in the analysis for

Experiment 1, and 434 for Experiment 2. Latency analyses additionally excluded outliers that

were more than 2.5 standard deviations above the mean.
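
The token counts and the latency trimming rule described above can be illustrated with a short calculation. What follows is only a sketch in Python, not the analysis code used in the study (the analyses were run in SAS). It assumes that each experiment contributed 480 trials (ten participants with 48 trials each) and that the latency cutoff is computed from the mean and sample standard deviation of the latencies passed in; the function names are hypothetical.

```python
from statistics import mean, stdev

# Ten participants per experiment, 48 trials each (from the Method section).
TRIALS_PER_EXPERIMENT = 10 * 48


def tokens_retained(exclusion_rate, n_trials=TRIALS_PER_EXPERIMENT):
    """Number of trials left after excluding the given proportion of trials."""
    return round(n_trials * (1 - exclusion_rate))


def trim_slow_latencies(latencies, n_sd=2.5):
    """Drop latencies more than n_sd standard deviations above the mean.

    Assumption: the cutoff uses the mean and sample SD of the values passed in;
    the text does not say whether trimming was done per subject or per condition.
    """
    cutoff = mean(latencies) + n_sd * stdev(latencies)
    return [x for x in latencies if x <= cutoff]


# Exclusion rates reported in the text reproduce the reported token counts.
exp1_tokens = tokens_retained(0.0896)
exp2_tokens = tokens_retained(0.0958)
```

Under these assumptions the reported exclusion rates of 8.96% and 9.58% correspond to the 437 and 434 tokens reported for Experiments 1 and 2.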

Data were analyzed with multilevel linear regressions in SAS using the proc mixed

command. All models included a random intercept for both subject and either item (for the

number of words analysis) or target noun (for the duration analyses). Target noun was used

instead of item because subjects used a different label for the target object than the intended one

on 14% of the trials, and the noun heavily constrains duration. Similar results obtain if analyses

are restricted to the trials where the intended word was used. We also included random slopes for

subject and item/noun by condition, where possible, following the procedure below.

The primary predictor in our model was the current condition (confederate attentive or

distracted). Critically, the acoustic analyses examined this predictor against the backdrop of

numerous control predictors that are expected to affect word duration. We controlled for speech

rate, calculated as the average time per syllable in the response utterance. Other control

predictors indexed characteristics of the preceding and following context (whether the participant

used a determiner, what the target word was preceded and followed by, and, for the color word

analysis, whether the color word was the last word in the sentence). Both lexical and acoustic

analyses included control variables about the experimental design (the current itemset, which

itemset had come first, which condition had come first, the current confederate, and item order).
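
The speech rate control described above is a simple ratio, and a small sketch may make the direction of the measure clear: larger values of time per syllable mean slower speech. The Python below is illustrative only; the function names, the tuple layout, and the example durations are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict
from statistics import mean


def time_per_syllable(utterance_duration_s, n_syllables):
    """Average time per syllable in seconds; larger values mean slower speech."""
    if n_syllables <= 0:
        raise ValueError("n_syllables must be positive")
    return utterance_duration_s / n_syllables


def mean_rate_by_condition(trials):
    """Average the per-trial rate within each condition.

    `trials` is an iterable of (condition, duration_s, n_syllables) tuples;
    this layout is an assumption made for the example.
    """
    rates = defaultdict(list)
    for condition, duration_s, n_syllables in trials:
        rates[condition].append(time_per_syllable(duration_s, n_syllables))
    return {condition: mean(values) for condition, values in rates.items()}


# Hypothetical trials: "Now put the cork on the red circle" has nine syllables.
example_trials = [
    ("attentive", 1.8, 9),
    ("attentive", 2.0, 10),
    ("distracted", 2.7, 9),
]
rates = mean_rate_by_condition(example_trials)
```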

(Table 1 about here)

For each analysis, we used the following procedure: a control model was constructed

first, containing all of the control variables and the random intercepts, but not the critical condition

predictor. Control variables that had a t-value greater than 1.5 were retained. The final

model was then constructed, containing those control variables plus condition. The model was

initially fit using a maximal random effects structure, including random intercepts for subject and

item/noun, and random slopes for subject x condition and item/noun x condition. If the model

did not converge or was not positive definite, we eliminated the random effects one at a time, in

this order: 1) item/noun x condition; 2) subject x condition; 3) item intercept. The variables

included in each model are shown in Table 1.
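
The model-building procedure above can be summarized as two steps: select control predictors from the control model, then fit the final model, simplifying the random effects only when a fit fails. The following Python sketch illustrates that logic under stated assumptions and is not the SAS code used in the study. The `fit` argument is a hypothetical callable standing in for the actual model fitting, the random-effect labels are made up for the example, and the comparison uses the absolute t-value, which the text does not state explicitly.

```python
# Maximal random-effects structure, and the order in which terms are dropped
# when a model fails to converge or is not positive definite (from the text).
FULL_RANDOM_EFFECTS = (
    "subject_intercept",
    "item_intercept",
    "subject_x_condition",
    "item_x_condition",
)
DROP_ORDER = ("item_x_condition", "subject_x_condition", "item_intercept")


def retain_controls(control_t_values, threshold=1.5):
    """Keep control predictors whose t-value exceeds the threshold.

    Assumption: the magnitude of t is used, so negative effects can be retained.
    """
    return [name for name, t in control_t_values.items() if abs(t) > threshold]


def fit_with_backoff(fit, fixed_effects,
                     random_effects=FULL_RANDOM_EFFECTS, drop_order=DROP_ORDER):
    """Fit the maximal model first, then simplify until a fit succeeds.

    `fit` is a hypothetical callable taking (fixed_effects, random_effects) and
    returning a fitted model, or None when the fit is unusable.
    """
    remaining = list(random_effects)
    to_drop = list(drop_order)
    while True:
        model = fit(fixed_effects, tuple(remaining))
        if model is not None:
            return model, tuple(remaining)
        if not to_drop:
            raise RuntimeError("no random-effects structure produced a usable fit")
        remaining.remove(to_drop.pop(0))
```

In use, the retained controls plus condition would be passed as `fixed_effects`, and the returned tuple records which random effects survived for that analysis.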

Results

When the target was predictable (Exp. 1), speakers used more words to describe the

target object when confederates were distracted (mean = 1.68) than when they were attentive (mean = 1.47;

t(477) = -2.39, p < .05). With predictable targets, participants also tended to

speak more quickly in the attentive condition, as measured by time per syllable (t(430) = -2.14,

p = .03). In Experiment 2, there was no effect of condition on the number of words used to

describe unpredictable targets (mean = 1.75), nor on the overall rate of speech.

The critical analyses concerned word duration, where we found an effect of condition on

the target noun in both experiments. Participants with distracted addressees produced target

words with longer durations than did speakers with attentive addressees, resulting in a main

effect of condition for both predictable targets in Experiment 1 (t(432) = -2.74, p < .01) and

unpredictable targets in Experiment 2 (t(423) = -3.21, p < .01). There was also a significant effect

of condition on latency to begin speaking in Experiment 1: latency was longer with distracted

addressees than with attentive addressees (t(410)= -4.60, p<.0001). Condition did not affect

latency for Experiment 2. Analyses of the other two regions revealed no significant effect of

condition on duration of “the” or duration of the color word.

A visual examination of the average durations in Table 2 across the two experiments

shows that the averages are considerably longer in Experiment 2 than Experiment 1. This was

fully expected, given that the time to produce the first instruction may have been influenced by

the need to survey both objects for the trial, as well as the relative unpredictability of the target

object. This contrast was also orthogonal to the goals of the current study, so we did not submit

this comparison to statistical analysis.

(Figure 1 about here)

(Table 2 about here)

Discussion

We found that people speak differently to distracted and attentive addressees, both in terms

of how much information they provided overall, and the acoustic prominence of key words in

their response. In general, distracted addressees elicited longer words and more detailed

utterances. The effect of distraction was robust against variation in the predictability of the target

object, and distracted addressees elicited longer target words than attentive addressees in both

experiments.

A comparison between this study and other studies suggests that speakers can respond to

distracted listeners differently depending on the task demands. In previous studies that required

participants to recall a narrative or tell a joke, speakers provided less information to distracted

listeners, as measured by shorter utterances and less detailed narratives (Kuhlen & Brennan,

2010; Pasupathi et al., 1998). These findings may reflect the social function of narratives and

jokes, as a disinterested listener may change the speaker’s task goals. In the current experiment

the task goals were clear and consistent, and the speaker’s utterances had the function of

instructing the addressee to move the correct object to the right location. This specific set of task

goals may have allowed speakers to assume that greater lexical detail and greater acoustic

prominence would facilitate successful task completion for a distracted listener.

Importantly, the increased duration in the distracted condition occurred specifically on

the target word, and not all regions in the utterance. In Experiment 1, this effect occurred over

and above the tendency for participants to speak faster with attentive addressees. In Experiment

2, there was no general rate change between conditions, yet participants still used longer

durations for distracted addressees. This suggests that speakers were emphasizing words with high

information content for their listeners. The object name was especially critical for the initiation

of the action, which began with selecting the object.

Our results clearly indicate that speakers accommodate distracted addressees by varying

the acoustic prominence of their words. This finding contrasts with other studies in which

duration and intelligibility are frequently unaffected by the addressee’s knowledge (e.g., Bard &

Aylett, 2005; Bard et al., 2000). This difference may have arisen because, in our task,

speakers did not have to keep track of what the addressee knew, as we were manipulating

the addressees’ obvious attention. Our task also made the communicative goal transparent, so

speakers were highly motivated to communicate clearly.

This study did not explicitly test the mechanism underlying the effects of addressee’s

attention, but we can speculatively offer some possibilities. A strong audience design explanation

of the increased object-name duration is that speakers were emphasizing the object’s name to

increase addressee understanding. Under this view, speakers in the distracted condition

recognized that their addressees needed extra help. This realization may have triggered a

speaking mode that provided additional information, which presumably would help the distracted

addressee complete the task. The fact that our durational effects were strongest on the target

noun is consistent with this view, since this is the piece of information most critical for initiating

the response. One question is why distraction had no effect on the color word, which presumably

was also an important piece of information for completing the task. We speculate that color word

duration was relatively stable because color words were repeated throughout the experiment

and thus relatively facilitated. Additionally, color words are at the end of the sentence, so

speakers presumably had more time to plan.

As Galati and Brennan (2010) suggest, this kind of addressee accommodation could be

done with a “one-bit model”. Speakers can calculate once for each block whether the addressee

is distracted, as this information is readily available and continually present, and this one-time

“either/or” decision can inform their speech for the entirety of the block. If this kind of

calculation underlies our effects, it would predict that speakers can accommodate distraction best

when the addressee’s attentional state is fairly constant. Whether speakers can adjust to moment-

by-moment changes in the addressee’s apparent attention is a topic for future research.

An alternate possibility is that the effects of distraction in our study are not the result of

audience design per se, but rather effects that the addressee’s behavior has on the speaker’s own

cognitive processes. For example, the addressee’s distraction may have led the speaker to be

distracted, or at the very least it may have affected the speaker’s ability to plan each utterance.

Words tend to be shorter when planning is facilitated (Bell et al., 2009; Christodoulou & Arnold,

2011; Kahn & Arnold, 2012, under review-a; see Arnold & Watson 2012, under review, for a

review), which means that audience design effects may be mediated by planning effects, as

opposed to an adjustment of speech forms on the basis of a specific representation of the

addressee’s needs. This possibility would be consistent with evidence that speakers choose more

explicit words when distracted (Rosa & Arnold, 2011).

This planning-based account is consistent with findings from a similar experiment,

reported by Arnold et al. (in press). Their experiment used a very similar paradigm, except that

the manipulation consisted of the addressee’s behavior immediately before the second instruction

– specifically, whether the addressee anticipated the target object or not. However, their findings

differ from the ones reported here. That study found that the addressee’s behavior affected the

latency to begin speaking, and the duration of the determiner the, but not the duration of the

target word. Given that the latency and determiner regions are associated with utterance

planning, they interpreted that profile of results as evidence that the anticipation behavior

affected planning processes.

By contrast, the current experiment finds effects of distraction on target word duration,

and less clear effects on the planning regions. Distraction affected latency to speak in

Experiment 1, but not Experiment 2. There was no effect of condition on determiners. This

difference is likely to stem from the nature of the manipulation. Distraction in the current study

was a salient, global manipulation, whereas Arnold et al. (in press) used a transitory

manipulation of anticipation. The salience of addressee distraction – and the fact that it could be

calculated on a one-bit model – may have facilitated the engagement of audience design

processes, in addition to any planning-mediated effects of addressee behavior.

In sum, our findings contribute to mounting evidence that variation in word duration is

affected by the speaker’s perception of the addressee’s behavior and/or mental state. This effect

goes beyond the influence of situational variables like the Lombard effect (Lane & Tranel,

1971). Moreover, we found that distraction has multiple effects, including the lexical specificity

of the utterance, the delay to begin speaking, and the duration of critical words. These findings

contribute to the idea that “audience design” is not a single process; instead, a single

dimension – like word duration – can respond to addressee behavior in multiple ways.

References

Arnold, J. E. (1998). Reference Form and Discourse Patterns. Dissertation, Stanford University.

Arnold, J. E. (2001). The effects of thematic roles on pronoun use and frequency of reference.

Discourse Processes, 31(2), 137-162. doi:10.1207/S15326950DP3102_02

Arnold, J.E., Tanenhaus, M.K., Altmann, R.J., & Fagnano, M. (2004). The Old and Thee, uh, New.

Psychological Science, 15(9), 578-582. doi:10.1111/j.0956-7976.2004.00723.x

Arnold, J.E., Hudson Kam, C., & Tanenhaus, M.K. (2007). If you say thee uh- you're describing

something hard: the on-line attribution of disfluency during reference comprehension.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(5), 914-930.

doi:10.1037/0278-7393.33.5.914

Arnold, J.E. (2008). Reference Production: Production-internal and Addressee-oriented

Processes. Language and Cognitive Processes, 23(4), 495-527.

doi:10.1080/01690960801920099

Arnold, J.E. (2010). How speakers refer: the role of accessibility. Language and Linguistics

Compass, 4(4), 187-203. doi: 10.1111/j.1749-818X.2010.00193.x

Arnold, J.E., Kahn, J.M. & Pancani, G. (in press). Audience Design Affects Acoustic Reduction

Via Production Facilitation. Psychonomic Bulletin & Review.

Arnold, J.E. & Watson, D. (2012, under review). Synthesizing meaning and processing approaches to

prosody: performance matters.

Balota, D. & Chumbley, J. (1985). The locus of word-frequency effects in the pronunciation

task: Lexical access and/or production? Journal of Memory and Language, 24(1), 89-106.

doi:10.1016/0749-596X(85)90017-8

Bard, E.G., Anderson, A.H., Aylett, M., Doherty-Sneddon, G., & Newlands, A. (2000). Controlling

the intelligibility of referring expressions in dialogue. Journal of Memory and Language,

42(1), 1-22. doi: 10.1006/jmla.1999.2667

Bard, E.G. & Aylett, M. (2005). Referential form, duration, and modelling the listener in spoken

dialogue. In J. Trueswell and M. Tanenhaus (Eds.), Approaches to studying world-situated

language use: Bridging the language-as-product and language-as-action traditions. (173-

191). Cambridge: MIT Press.

Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M., & Gildea, D. (2003). Effects of

disfluencies, predictability, and utterance position on word form variation in English

conversation. The Journal of the Acoustical Society of America, 113(2), 1001-1024.

doi: 10.1121/1.1534836

Bell, A., Brenier, J.M., Gregory, M., Girand, C., & Jurafsky, D. (2009). Predictability effects on

durations of content and function words in conversational English. Journal of Memory and

Language, 60(1), 92-111. doi:10.1016/j.jml.2008.06.003

Breen, M., Fedorenko, E., Wagner, M. & Gibson, E. (2010). Acoustic correlates of information

structure. Language and Cognitive Processes, 25(7), 1044-1098.

doi:10.1080/01690965.2010.504378

Brennan, S. E. & Clark, H. H. (1996). Conceptual pacts and lexical choice in

conversation. Journal of Experimental Psychology: Learning, Memory and Cognition,

22(6), 1482-1493. doi: 10.1.1.121.3930

Brown, G. (1983). Prosodic structure and the given/new distinction. In A. Cutler & D.R. Ladd

(Eds.), Prosody: Models and Measurements. (67-77). Springer: Berlin.

Brown-Schmidt, S. (2009). The role of executive function in perspective taking during online

language comprehension. Psychonomic Bulletin & Review, 16(5), 893-900.

doi:10.3758/PBR.16.5.893

Brown-Schmidt, S., & Tanenhaus, M. K. (2006). Watching the eyes when talking about size: An

investigation of message formulation and utterance planning. Journal of Memory and

Language, 54, 592-609. doi: 10.1016/j.jml.2005.12.008

Chafe, W. (1976). Givenness, contrastiveness, definiteness, subjects, topics and point of view. In

C. Li (Ed.), Subject and Topic. New York: Academic Press.

Chafe, W. (1987). Cognitive Constraints on Information Flow. In Russell Tomlin (Ed.),

Coherence and Grounding in Discourse. (21-51). Amsterdam: John Benjamins.

Christodoulou, A. & Arnold, J.E. (2011, September). Utterance planning and articulatory

duration. Poster session presented at the ETAP-2 conference, Montreal, Canada.

Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.

Clark, H. H. & Krych, M. A. (2004). Speaking while monitoring addressees for

understanding. Journal of Memory and Language, 50(1), 62-81.

doi:10.1016/j.jml.2003.08.004

Clark, H.H. & Wasow, T. (1998). Repeating words in spontaneous speech. Cognitive

Psychology, 37(3), 201-242. doi:10.1006/cogp.1998.0693

Gahl, S. & Garnsey, S. (2004). Knowledge of grammar, knowledge of usage: Syntactic

probabilities affect pronunciation variation. Language, 80(4), 748-775. doi: 10.1.1.94.2380

Galati, A. & Brennan, S.E. (2010). Attenuating information in spoken communication: For the

speaker, or for the addressee? Journal of Memory and Language, 62(1), 35-51.

doi:10.1016/j.jml.2009.09.002

Gorman et al. (2011). Memory representations supporting speakers' choice of referring

expression: Effects of category overlap and shared experience. In L. Carlson, C. Hoelscher,

& T.F. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science

Society, Austin, TX: Cognitive Science Society.

Gundel, J., Hedberg, N., & Zacharski, R. (1993). Cognitive status and the form of referring

expressions in discourse. Language, 69(2), 274-307. doi:10.2307/416535

Halliday, M. A. K. (1967). Intonation and Grammar in British English. The Hague: Mouton.

Heller, D., Gorman, K.S., & Tanenhaus, M.K. (in press). To name or to describe: shared

knowledge affects referential form. Topics in Cognitive Science.

Horton, W.S. & Keysar, B. (1996). When do speakers take into account common ground?

Cognition, 59(1), 91-117. doi:10.1016/0010-0277(96)81418-1

Jurafsky, D., Bell, A., Gregory, M., & Raymond, W. (2001). Probabilistic relations between

words: Evidence from reduction in lexical production. In J. Bybee and P. Hopper (Eds.),

Frequency and the emergence of linguistic structure. (229-254). Amsterdam: John

Benjamins.

Kahn, J. & Arnold, J.E. (2012, under review, a). A Processing-Centered Look at the Contribution

of Givenness to Durational Reduction.

Kahn, J. & Arnold, J.E. (2012, under review, b). Speaker-internal processes drive durational

reduction.

Kuhlen, A. K. & Brennan, S. E. (2010). Anticipating Distracted Addressees: How Speakers’

Expectations and Addressees’ Feedback Influence Storytelling. Discourse Processes, 47,

567-587. doi:10.1080/01638530903441339

Ladd, R. (1996). Intonational Phonology. Cambridge: Cambridge University Press.

Lam, T.Q. & Watson, D.G. (2010). Repetition is easy: Why Repeated Referents Have Reduced

Prominence. Memory & Cognition, 38(8), 1137-1146. doi: 10.3758/MC.38.8.1137

Lane, H. & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of

Speech and Hearing Research, 14, 677-709.

Lindblom, B. (1990). Exploring phonetic variation: A sketch of the H and H theory. In W.J.

Hardcastle and A. Marchal (Eds.), Speech production and speech modeling. (403-439).

Dordrecht, The Netherlands: Kluwer.

Pasupathi, M., Stallworth, L.M., & Murdoch, K. (1998). How What We Tell Becomes What We

Know: Listener Effects on Speakers’ Long-Term Memory for Events. Discourse Processes,

26(1), 1-25. doi:10.1080/01638539809545035

Rosa, E.C. & Arnold, J.E. (2011). The role of attention in choice of referring expression. In L.

Carlson, C. Hoelscher, & T.F. Shipley (Eds.), Proceedings of the 33rd Annual Conference of

the Cognitive Science Society, Austin, TX: Cognitive Science Society.

Sityaev, D. (2000). The relationship between accentuation and information status of discourse

referents: A corpus-based study. UCL Working Papers in Linguistics (12).

Venditti, J. J., & Hirschberg, J. (2003). Intonation and discourse processing. Proceedings of

ICPhS 2003, Barcelona, 107-114.

Watson, D.G., Arnold, J.E., & Tanenhaus, M.K. (2008). Tic tac TOE: Effects of predictability

and importance on acoustic prominence in language production. Cognition, 106, 1548-1557.

doi:10.1016/j.cognition.2007.06.009

i The only analysis in which distraction affected determiner duration was for Experiment 2, when the analysis was

limited to items where the speaker used the intended label for the target object. In this analysis, the effect of

condition was marginal (t(248)=-1.92, p=.056).

Table 1

Control variables and random effects in each model. For control variables, dashes mean that the

variable was not significant in the control model and was therefore not included in the final

model. The t-values mark significant effects and the direction of the effect (positive/negative);

N.S. means not significant. Empty boxes indicate the control variables were not included in the

control models. Models were run separately for Experiments 1 and 2.

# words target area | target noun duration | ‘the’ duration | Color duration | Latency duration

Exp. 1 Exp. 2 | Exp. 1 Exp. 2 | Exp. 1 Exp. 2 | Exp. 1 Exp. 2 | Exp. 1 Exp. 2

Itemset: -- 3.34 -- -2.27 -- -- -- -- -- --

Itemset order: -- -- -- 1.90 -- -- -- 3.62 -- --

Item order: -- -2.68 -- -1.67 -- 2.70 -- -1.50 -3.34 -3.87

Condition order: -- -- -- -- -- -- -- -- -- --

Target noun syllables: 6.39 9.37

Rate of speech: 13.93 12.73 7.20 11.31 6.98 4.44 2.82 1.82

Use of determiner: -- -2.26 -- -- N.S. --

Confederate: N.S. -- -- N.S. -- N.S. N.S. N.S. -- --

Preceding word: -- -3.59

Following word: 2.82 2.76

Is color last word: 4.02 9.01

Subject intercept: * * * * * * * * * *

Item/noun intercept: * * * * * * * * *

Subj x cond. slope: * * *

Item/noun x cond. slope: *

Table 2

Mean durations (ms) for each region in each condition

Experiment                    Condition    Latency   ‘the’    Object   Color

Experiment 1: Predictable     Attentive     666.36   105.46   363.63   268.48

Experiment 1: Predictable     Distracted    851.4    112.03   401.42   275.47

Experiment 2: Unpredictable   Attentive    1490.05   212.63   464.83   334.2

Experiment 2: Unpredictable   Distracted   1598.89   210.31   494.28   332.31
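
As a quick arithmetic check on Table 2, the following Python sketch computes the raw Distracted minus Attentive difference in mean duration for each region. The values are copied from the table; the dictionary layout and the function name distraction_differences are illustrative assumptions, and these raw differences are not the mixed-model estimates reported in the text.

```python
# Mean durations (ms) copied from Table 2. The data layout and all names
# here are illustrative assumptions, not part of the original analysis.
TABLE_2 = {
    "Experiment 1": {
        "Attentive": {"latency": 666.36, "the": 105.46, "object": 363.63, "color": 268.48},
        "Distracted": {"latency": 851.4, "the": 112.03, "object": 401.42, "color": 275.47},
    },
    "Experiment 2": {
        "Attentive": {"latency": 1490.05, "the": 212.63, "object": 464.83, "color": 334.2},
        "Distracted": {"latency": 1598.89, "the": 210.31, "object": 494.28, "color": 332.31},
    },
}


def distraction_differences(table):
    """Return Distracted minus Attentive mean duration (ms) for each region."""
    diffs = {}
    for experiment, conditions in table.items():
        attentive = conditions["Attentive"]
        distracted = conditions["Distracted"]
        # Round to two decimals, matching the precision of the table.
        diffs[experiment] = {
            region: round(distracted[region] - attentive[region], 2)
            for region in attentive
        }
    return diffs


if __name__ == "__main__":
    for experiment, regions in distraction_differences(TABLE_2).items():
        print(experiment, regions)
```

For the object (target noun) region this gives raw differences of 37.79 ms in Experiment 1 and 29.45 ms in Experiment 2, whereas the differences for ‘the’ are small (6.57 ms and -2.32 ms).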

Figure 1. Duration of target noun in Experiment 1 (left panel) and Experiment 2 (right panel).
