![Page 1: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/1.jpg)
Connecting Acoustics to Linguistics in Chinese Intonation
Greg Kochanski (Oxford Phonetics)
Chilin Shih (University of Illinois)
Tan Lee (CUHK)
with Hongyan Jing (IBM)
Jiahong Yuan (Cornell)
![Page 2: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/2.jpg)
Questions
• Can we usefully incorporate biomechanics into a phonetics model?
• Can we objectively assign an importance to a syllable?
• Can we write a unified description of F0 for both tone and accent languages?
Goal
Build a mathematical model that takes a sequence of discrete symbols as input and produces a quantitative prediction for f0.
![Page 3: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/3.jpg)
The Challenge
![Page 4: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/4.jpg)
Existing work
Rising?
![Page 5: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/5.jpg)
Basic assumptions used in modeling
• People plan their utterances several syllables in advance.
• People produce speech optimized to communicate with minimal effort.
• A realistic model for the muscles that control f0
![Page 6: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/6.jpg)
Realistic model of muscle control for F0
• We’d like a model of prosody that can apply beyond F0.
![Page 7: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/7.jpg)
People talk nearly as fast as possible.
![Page 8: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/8.jpg)
Speech could be optimal
• Most of what we say is made from bits and pieces we’ve said before.
• There are only 4 (Mandarin) or 6 (Cantonese) tones to combine.
• A speaker has the chance to practice and optimize all the common 3- and 4-tone sequences.
![Page 9: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/9.jpg)
Optimize what?
• People want to minimize effort and/or talk faster
– Chairs, Cars
• People want to minimize the chance that they will be misunderstood.
– Risk = P(misinterpreted) * cost(misinterpreted)
Minimize: Effort + cost*Error
– We allow each syllable to have a different weight, so error is a sum over syllables or words.
– Perhaps cost matches importance.
![Page 10: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/10.jpg)
Effort and Error
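[Equation garbled in extraction: the Effort, G, is an integral over time (dt) of squared terms involving the pitch p. The exact terms and coefficients are not recoverable from this transcript.]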
How does Effort depend on the form of the pitch curve?
Error = mean-squared deviation between the f0 and the templates.
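As a minimal sketch of the quantity being minimized, the code below evaluates Effort + cost*Error for a set of syllables. It assumes Error is the per-syllable mean-squared deviation from a template, summed with one cost per syllable. Effort is passed in as a precomputed number, since its exact form is not specified here. The function and parameter names are hypothetical.

```python
def mean_squared_deviation(f0, template):
    """Mean-squared deviation between sampled f0 and a same-length template."""
    if len(f0) != len(template) or not f0:
        raise ValueError("f0 and template must be non-empty and equal in length")
    return sum((a - b) ** 2 for a, b in zip(f0, template)) / len(f0)


def objective(effort, syllables, costs):
    """Effort + sum over syllables of cost * Error.

    syllables: list of (f0_samples, template_samples) pairs, one per syllable.
    costs: one weight per syllable (the cost of being misinterpreted).
    """
    if len(syllables) != len(costs):
        raise ValueError("need exactly one cost per syllable")
    weighted_error = sum(
        cost * mean_squared_deviation(f0, template)
        for (f0, template), cost in zip(syllables, costs)
    )
    return effort + weighted_error


# Two syllables with made-up f0 samples (Hz) and templates.
example = [
    ([100.0, 110.0], [100.0, 100.0]),  # Error = 50
    ([200.0, 200.0], [190.0, 210.0]),  # Error = 100
]
total = objective(2.0, example, costs=[1.0, 0.5])  # 2 + 1.0*50 + 0.5*100
```

A syllable with a small cost contributes little to the total even when its f0 is far from the template, which is what lets the optimizer sacrifice it.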
![Page 11: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/11.jpg)
Model behavior
• For cost>>1, Error dominates, and pitch matches target.
• For cost<<1, Effort dominates, both speaker and listener accept large deviations, and pitch smoothly interpolates.
• For cost~1, everything compromises.
Cost plays the role of a prosodic strength.
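The three regimes can be reproduced with a toy version of the model. This is a sketch under stated assumptions, not the authors' implementation: Effort is taken to be the sum of squared sample-to-sample pitch steps (an assumed stand-in for the talk's effort term), and Error is the squared deviation from a target at each sample, weighted by a per-sample strength. With those choices the optimum solves a tridiagonal linear system, handled here by the Thomas algorithm. All names are hypothetical.

```python
def optimal_pitch(targets, strengths):
    """Minimize sum((p[t+1]-p[t])**2) + sum(s[t]*(p[t]-y[t])**2) over p.

    targets: target pitch y[t] at each sample.
    strengths: non-negative weight s[t] at each sample (the "cost").
    Setting the gradient to zero gives a tridiagonal system with
    diagonal s[t] + (number of neighbours) and off-diagonals -1.
    """
    n = len(targets)
    if n < 2 or len(strengths) != n:
        raise ValueError("need at least two samples and one strength per sample")
    if min(strengths) < 0 or max(strengths) <= 0:
        raise ValueError("strengths must be >= 0 with at least one > 0")
    diag = [s + (1.0 if i in (0, n - 1) else 2.0) for i, s in enumerate(strengths)]
    rhs = [s * y for s, y in zip(strengths, targets)]
    # Forward elimination (Thomas algorithm, off-diagonals fixed at -1).
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -1.0 / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    # Back substitution.
    p = [0.0] * n
    p[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        p[i] = d[i] - c[i] * p[i + 1]
    return p


# Two 10-sample syllables with flat targets at 100 Hz and 200 Hz (made-up values).
targets = [100.0] * 10 + [200.0] * 10
strong = optimal_pitch(targets, [1000.0] * 20)               # cost >> 1: tracks targets
weak = optimal_pitch(targets, [0.001] * 20)                  # cost << 1: nearly flat
mixed = optimal_pitch(targets, [100.0] * 10 + [0.01] * 10)   # weak second syllable
```

In the `mixed` case the strong first syllable stays on its target while the weak second syllable is dragged toward it and undershoots, which is the sense in which cost behaves like a prosodic strength.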
![Page 12: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/12.jpg)
Another Challenge
[Figure: Tone shapes for tones 1, 2, 3 and 4. Axes: Time (10 ms intervals) vs. F0 (Hz).]
![Page 13: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/13.jpg)
The rest of the model.
• A model is a sequence of targets (used to compute the Error terms).
• Each target has a strength (i.e. the cost of misinterpretation).
• One target per tone.
• Targets are stretched to fit syllable duration.
• Only one phonological rule: 3 3 → 2 3 (third-tone sandhi)
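The target-building steps listed above can be sketched as follows. The single phonological rule is read here as Mandarin third-tone sandhi (a tone 3 directly before another tone 3 is produced as tone 2), applied in one simple pass that ignores prosodic structure. The template shapes are illustrative placeholders based on the conventional five-level tone descriptions (55, 35, 214, 51), not the shapes fitted in this work. All names are hypothetical.

```python
# Illustrative templates on a 1-5 pitch scale (assumed, not fitted values).
TEMPLATES = {1: [5.0, 5.0], 2: [3.0, 5.0], 3: [2.0, 1.0, 4.0], 4: [5.0, 1.0]}


def apply_third_tone_sandhi(tones):
    """Replace a tone 3 that directly precedes another tone 3 with tone 2.

    Simplification: decisions use the underlying tones, so [3, 3, 3]
    becomes [2, 2, 3]; real speech depends on phrasing.
    """
    out = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            out[i] = 2
    return out


def stretch(template, n_samples):
    """Linearly interpolate a template onto n_samples equally spaced points."""
    if n_samples < 2 or len(template) < 2:
        raise ValueError("need at least two samples and two template points")
    last = len(template) - 1
    out = []
    for k in range(n_samples):
        x = k * last / (n_samples - 1)   # position along the template
        i = min(int(x), last - 1)        # left-hand template point
        frac = x - i
        out.append(template[i] * (1.0 - frac) + template[i + 1] * frac)
    return out


def build_targets(tones, durations):
    """One stretched target per syllable; durations are sample counts."""
    surface = apply_third_tone_sandhi(tones)
    return [stretch(TEMPLATES[t], n) for t, n in zip(surface, durations)]


targets = build_targets([3, 3, 4], durations=[5, 8, 6])
```

Each target would then be paired with a strength before the Error terms are computed.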
![Page 14: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/14.jpg)
Model fits for Mandarin Chinese
Tone class (input)
Strength (result)
Inside a word, strength is distributed by the metrical pattern.
![Page 15: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/15.jpg)
What’s the procedure?
Compute the pitch curve as a function of phonological inputs and prosodic strength.
Sequence of tones (phonology)
Prosodic strengths
Predicted F0
Data
Nonlinear least-squares fitting algorithm
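The fitting loop can be illustrated with a toy example. Everything here is an assumption made for illustration: the forward model uses squared pitch steps as the effort term, the "data" are synthetic (generated by the same model with a known strength), and a brute-force search over log-spaced candidates stands in for a proper nonlinear least-squares algorithm. Only one strength is left free. All names are hypothetical.

```python
import math


def predict_f0(targets, strengths):
    """Assumed forward model: minimize squared pitch steps plus
    strength-weighted squared deviation from targets (tridiagonal solve)."""
    n = len(targets)
    diag = [s + (1.0 if i in (0, n - 1) else 2.0) for i, s in enumerate(strengths)]
    rhs = [s * y for s, y in zip(strengths, targets)]
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -1.0 / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    p = [0.0] * n
    p[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        p[i] = d[i] - c[i] * p[i + 1]
    return p


def sum_squared_residuals(predicted, observed):
    return sum((a - b) ** 2 for a, b in zip(predicted, observed))


def fit_second_strength(targets, observed, first_strength, n_first):
    """Search log-spaced strengths for the second syllable; keep the best fit."""
    best_sse, best_strength = None, None
    for k in range(-30, 31):                      # 10**-3 ... 10**3
        candidate = 10.0 ** (k / 10.0)
        strengths = [first_strength] * n_first + [candidate] * (len(targets) - n_first)
        sse = sum_squared_residuals(predict_f0(targets, strengths), observed)
        if best_sse is None or sse < best_sse:
            best_sse, best_strength = sse, candidate
    return best_strength


# Synthetic "data": two 10-sample syllables, second one produced weakly.
targets = [100.0] * 10 + [200.0] * 10
true_strength = 0.05
observed = predict_f0(targets, [10.0] * 10 + [true_strength] * 10)
fitted = fit_second_strength(targets, observed, first_strength=10.0, n_first=10)
```

With real recordings the residuals would not reach zero, and all strengths (plus the template shapes) would be fitted jointly rather than one at a time.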
![Page 16: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/16.jpg)
Model fits to Mandarin Chinese
0.61 free parameters per syllable, 13 Hz RMS error.
![Page 17: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/17.jpg)
Strengths are stable under small changes in the model.
The two models have words defined by different labelers.
This model allows extra freedom: different tones are allowed to define their targets differently
This model allows less freedom: all tones have the same type of target.
![Page 18: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/18.jpg)
Model parameters
Mandarin
Cantonese
Phrasing is marked in speech.
Cantonese data courtesy of Prof. Tan Lee
![Page 19: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/19.jpg)
Metrical patterns inside words
Mandarin
“Normal” segmentation of characters into words.
Random segmentation of characters into words.
Lexical acquisition
![Page 20: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/20.jpg)
Other nice properties
• Strengths are correlated with duration (duration is a proxy for prominence):
– r = 0.40 (sentence final)
– r = 0.27 (non-final)
– >95% confidence
• Strength is correlated with mutual information of neighboring syllables:
– r = -0.175
– >95% confidence
– Sloppy when generating unsurprising syllables, and precise for surprising syllables.
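The r values above are correlation coefficients. As a reminder of how such a number is obtained, here is a minimal sketch of the Pearson correlation between fitted strengths and syllable durations. The sample values are made up for illustration and are not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-syllable values, for illustration only.
strengths = [0.4, 1.2, 0.8, 2.0, 0.6]
durations_ms = [150.0, 210.0, 160.0, 240.0, 200.0]
r = pearson_r(strengths, durations_ms)
```

A positive r means stronger syllables tend to be longer. A negative r, as with mutual information, means strength falls as the neighbouring syllables become more predictable.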
![Page 21: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/21.jpg)
Local Conclusion
• Intonation can be represented as:
– a small set of discrete symbols, in sequence, with
– a per-person or per-style shape for each symbol;
– modulated by a variable prosodic strength.
• One symbol per syllable seems enough.
• The strength parameter seems real
– Similar across languages
– Matches language structure
![Page 22: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/22.jpg)
Q: But does it work for English?
A: Yes, under circumstances where the intonational phonology is simple enough to be obvious.
![Page 23: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/23.jpg)
Reminder: Limitations of f0 and complexity of prosody.
To show the range of information that can be carried by prosody, observe an elegant experiment by Stan Freberg (1950):
The text has virtually no lexical information, but it still tells a story. Even so, it is very hard to label individual words.
![Page 24: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/24.jpg)
English
• Sentences in the form “123-456-7890?”
• Speaker is trying to confirm a single digit.
• Models have just 1.1 parameters per sentence.
![Page 25: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/25.jpg)
The model for English
• There are identical boundary tones on every utterance.
• All target shapes are identical, except the focus.
%X B B B | B A B | B B B B Y%
%X B B B | A B B | B B B B Y%
%X B A B | B B B | B B B B Y%
• Rather simple phonology.
• Accent prominence depends on position in phrase and in utterance.
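The symbol sequences shown above follow a simple pattern that can be generated mechanically. The sketch below assumes a ten-digit number grouped 3-3-4, with `%X` and `Y%` as the boundary tones, `B` for an ordinary digit, and `A` for the digit in focus. The function name and the zero-based `focus_index` are hypothetical.

```python
def focus_pattern(focus_index, groups=(3, 3, 4)):
    """Build the symbol string for a digit string with one focused digit.

    focus_index: zero-based position of the digit being confirmed.
    groups: digits per phrase, separated by "|" in the output.
    """
    total = sum(groups)
    if not 0 <= focus_index < total:
        raise ValueError("focus_index must address one of the digits")
    symbols = ["A" if i == focus_index else "B" for i in range(total)]
    phrases = []
    start = 0
    for size in groups:
        phrases.append(" ".join(symbols[start:start + size]))
        start += size
    return "%X " + " | ".join(phrases) + " Y%"


# The three sequences listed on the slide: focus on the 5th, 4th and 2nd digit.
patterns = [focus_pattern(i) for i in (4, 3, 1)]
```

Only the position of `A` changes between sentences. Everything else in the sequence is shared.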
![Page 26: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/26.jpg)
Model fits well over a range of speeds.
Suppressed phrasing
Low speed
High speed
Merger of accent with boundary tone
![Page 27: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/27.jpg)
Model reproduces nontrivial features of the data and fits well over a range of speeds.
Suppressed phrasing
Low speed
High speed
Merger of accent with boundary tone
![Page 28: Connecting Acoustics to Linguistics in Chinese Intonation Greg Kochanski (Oxford Phonetics) Chilin Shih (University of Illinois) Tan Lee (CUHK) with Hongyan](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649ce65503460f949b455f/html5/thumbnails/28.jpg)
Conclusion
• Physiologically-based models can capture important aspects of speech.
• A very compact representation of behavior.
• It can be applied broadly:
– Two dialects of Chinese
– Some aspects of English
• It raises questions about where the phonetics/phonology boundary actually sits.
• Introduces an objective acoustic measure of prosodic prominence.
• Suggests that the speaker may help the listener segment the speech stream.