that was a question? accommodating ... - ur kinder lab
TRANSCRIPT
That was a question? Accommodating variability in intonation interpretation
Andrés Buxó-Lugo & Chigusa Kurumada
Intonation communicates intentions
you don’t choke on a pickle…
‣ Low-level acoustic signal mapped onto intonational representations
New Contrastive
‣ Intonational representations cue intended meanings of utterances in context
(e.g., Bolinger, 1986; Cutler, 1977; Dahan, 2015; Ladd, 1983; Pierrehumbert & Hirschberg, 1990)
you don’t choke on a pickle…
‣ In speech, intonational categories are realized in widely varied acoustics
‣ Linguistic contexts‣ Socio-indexical features‣ Speaking conditions
Puzzles related to variability
New Contrastive
(e.g., Clopper & Smiljanic, 2011; Cole, 2015; Grabe et al., 2005; Green, 2002; Holliday, 2019; Ladd, 2008; Podesva & Callier, 2015; Warren, 2016)
New vs. Contrast
‣ “Red, green, blue. White, gray, black” [New]
‣ “White, green, black. White, gray, black” [Contrast]
(e.g., Buxó-Lugo, Toscano, & Watson, 2018)
(e.g., Buxó-Lugo, Toscano, & Watson, 2018)
New vs. Contrast
‣ “Red, green, blue. White, gray, black” [New]
‣ “White, green, black. White, gray, black” [Contrast]
0
1
2
3
4
dens
ity
50
100
150
200
250
0.25 0.50 0.75Word Duration
Mea
n F0
ContrastNew
0.0000 0.0025 0.0050 0.0075
densityword duration
Mea
n F0
New vs. Contrast
0
1
2
3
4
dens
ity
50
100
150
200
250
0.25 0.50 0.75Word Duration
Mea
n F0
ContrastNew
0.0000 0.0025 0.0050 0.0075
density
‣ “Red, green, blue. White, gray, black” [New]
‣ “White, green, black. White, gray, black” [Contrast]
word duration
(e.g., Buxó-Lugo, Toscano, & Watson, 2018)
Mea
n F0
Accommodating variations
1) Normalization ‣ Interpreting acoustic variations in proportion to a contextually defined baseline e.g., male vs. female have different baseline pitch
2) Adaptation
‣ learning statistical patterns of the input particular to a given context and speaker e.g., individual speakers express the same intonation contour with different combinations of acoustic cues
(Cole, 2015; Dilley & Pitt, 2010; Johnson & Mullenix; 1997; Kraljic & Samuel, 2008; McMurray & Jongman, 2012; Nearey, 1978; Norris et al., 2016, Summerfield, 1981; inter alios)
‣ We investigate normalization and adaptation looking at question vs. statement intonation
‣ (Thought to be) a binary classification with contrasting interpretations
Current study
‣ We investigate normalization and adaptation looking at question vs. statement intonation
‣ (Thought to be) a binary classification with contrasting interpretations
Current study
‣ Study 1: Production: To what extent do native speakers vary in their use of intonation?
‣ Study 2: Comprehension: Can native listeners adapt to speaker-specific variations of intonation?
Study 1: Production
‣ 59 Native speakers of American English (45 Female) ‣ 48 sentences ‣ “It’s X-ing” (e.g., It’s raining)‣ 24 verb types (produced as a question vs. statement once each)Its X ing
Its X ing
QuestionStatement
Statement
Question
175
200
225
250
0.0 0.2 0.4 0.6 0.8
Its X ing
Duration (s)
Section
Mea
n F0
Results 1: Raw acoustic values
0
2
4
6
dens
ity
100
200
300
400
500
0.2 0.3 0.4 0.5 0.6 0.7Last Syllable Duration (s)
Last
Syl
labl
e M
ean
F0
QuestionStatement
0.0000 0.0025 0.0050 0.0075 0.0100
density
Results 1: Raw acoustic values
0
2
4
6
dens
ity
100
200
300
400
500
0.2 0.3 0.4 0.5 0.6 0.7Last Syllable Duration (s)
Last
Syl
labl
e M
ean
F0
QuestionStatement
0.0000 0.0025 0.0050 0.0075 0.0100
density
Results 1: Raw acoustic values
0
2
4
6
dens
ity
100
200
300
400
500
0.2 0.3 0.4 0.5 0.6 0.7Last Syllable Duration (s)
Last
Syl
labl
e M
ean
F0
QuestionStatement
0.0000 0.0025 0.0050 0.0075 0.0100
density
Normalized: Speaker means
0.0
2.5
5.0
7.5
dens
ity
−100
0
100
200
−0.2 −0.1 0.0 0.1 0.2 0.3By Subject Normalized Duration
By S
ubje
ct N
orm
alize
d F0
QuestionStatement
0.000 0.003 0.006 0.009 0.012
density
‣ mean of all sentences produced by a given speaker.
0
1
2
3
4
5
dens
ity
−200
−100
0
100
200
300
−0.2 0.0 0.2 0.4Last Syllable Duration Increase
Last
Syl
labl
e F0
Incr
ease
QuestionStatement
0.000 0.003 0.006 0.009
density
Normalized: Preceding context
‣ mean of the 2nd syllable (It’s X-ing) in the same token
Across talker variability
S47 S98
S33 S34
−0.2−0.1 0.0 0.1 0.2−0.2−0.1 0.0 0.1 0.2
−300−200−100
0100200
−300−200−100
0100200
Last Syllable Duration (s)
Last
Syl
labl
e M
ean
F0
‣ Normalization does not fully resolve the ambiguity → Can listeners adapt to speaker specificity?
Adaptation to speaker’s intonations?
‣ prediction: depending on the patterns of production by a given speaker, ambiguous tokens receive opposing interpretations
S Q
(Kleinschmidt & Jaeger, 2011)
Study 2: Stimuli
step 1 (statement)
step 11 (question)
“It’s moving”
(Kurumada, Brown, & Tanenhaus, 2017)
Pre-exposure (22 trials) ‣ “It’s cooking” sampled from Steps 1-11
‣ 2AFC: “Is this a question or a statement?”
Post-exposure (22 trials) ‣ materials and task identical to the pre-exposure
Exposure (30 trials) with feedback‣ 3 between-subject conditions
Study 2: Design (n=180)
Question-biasing Non-ambiguous Statement-biasing
1 3 5 7 9 111 3 5 7 9 11 1 3 5 7 9 11step
0.00
0.25
0.50
0.75
1.00
1 3 5 7 9 11
PrePost 0.00
0.25
0.50
0.75
1.00
1 3 5 7 9 11
PrePost0.00
0.25
0.50
0.75
1.00
1 3 5 7 9 11
% Q
uest
ion
Res
pons
es
PrePost
most statement-like
most question-like
0.00
0.25
0.50
0.75
1.00
1 3 5 7 9 11
% Q
uest
ion
Res
pons
es
PrePost 0.00
0.25
0.50
0.75
1.00
1 3 5 7 9 11
% Q
uest
ion
Res
pons
es
PrePost
0
1
2
3
4
5
dens
ity
−200
−100
0
100
200
300
−0.2 0.0 0.2 0.4Last Syllable Duration Increase
Last
Syl
labl
e F0
Incr
ease
QuestionStatement
0.000 0.003 0.006 0.009
density
pitc
h in
crea
se (H
z)
duration increase (ms)
Summary
‣ Adapting to acoustics - intonation mappings facilitates reliable interpretations of the speaker intention
Conclusions
0
100
200
300
400
500F0
It’s rain- ing
rise
fall
‣ There is substantial variability in acoustic realizations of intonation contours
Conclusions
0
100
200
300
400
500F0
It’s rain- ing
‣ There is a substantial variability in acoustic realizations of intonation contours
‣ Adaptation leads to better inference over acoustics intonation mapping intended by the given speaker
Thank you!
http://kinderlab.bcs.rochester.edu/
For discussions: T. Florian Jaeger, Xin Xie, HLP Lab, Laura Dilley For R/Praat scripting: Meredith Brown, Dave Kleinschmidt, Xin Xie, For testing and annotation: Sherwin Nourani, Manasvi Chaturvedi, Nicole Vieyto
Funding: The pump primer II, University of Rochester