CS 4705 Lecture 22: Intonation and Discourse
Posted on 20-Dec-2015
What does prosody convey?
• In general, information about:
– What the speaker is trying to convey
• Is this a statement or a question?
– The speaker’s state
• Is the speaker getting angry or frustrated?
• In dialogue, information about:
– The structure of the dialogue
• Is the user or the system trying to start a new topic?
• Is the speaker talking about given or new information?
– The state of the interaction
• Is the user having trouble being understood?
• Is the user having trouble understanding the system?
Current Trends
• New description schemes (e.g. ToBI)
• Corpus-based research and machine learning
• Emphasis on evaluation of algorithms and systems (NLE ‘00 special issue)
• Investigation of spontaneous speech phenomena and variation in speaking style
• Applications to concept-to-speech (CTS), automatic speech recognition (ASR) and spoken dialogue systems (SDS)
Corpora
• Public and semi-public databases
– ATIS, Switchboard, CallHome, Meetings (NIST/DARPA/LDC)
– TRAINS/TRIPS (U. Rochester), FM Radio (BU), BDC (Harvard, AT&T)
• Private collections
– Acquired for speech or dialogue research (August, KTH; Voicemail, AT&T, IBM)
– Meetings, call centers, operator services, focus group collections
• The Web
– Newscasts, radio
To(nes and)B(reak)I(ndices)
• Developed by prosody researchers in four meetings over 1991-94
• Goals:
– devise common labeling scheme for Standard American English that is robust and reliable
– promote collection of large, prosodically labeled, shareable corpora
• ToBI standards have also been proposed for Japanese, German, Italian, Spanish, British and Australian English, ...
• Minimal ToBI transcription:
– recording of speech
– f0 contour
– ToBI tiers:
• orthographic tier: words
• break-index tier: degrees of juncture (Price et al. ‘89)
• tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80)
• miscellaneous tier: disfluencies, non-speech sounds, etc.
• Online training material available at:
– http://www.ling.ohio-state.edu/phonetics/ToBI/
• Evaluation
– Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of a tonal category, 81% agreement on category label, and 91% agreement on break indices to within one level (Silverman et al. ‘92, Pitrelli et al. ‘94)
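To make the tiers and the three agreement figures concrete, here is a minimal Python sketch. It assumes a simplified, word-aligned record (`ToBIWord`, a hypothetical name) in which each word carries its break index and any tonal labels; real ToBI tones are time-aligned to the f0 contour, and the exact counting procedure used by Silverman et al. and Pitrelli et al. may differ from the one shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToBIWord:
    """One word of a simplified, word-aligned ToBI transcription.

    Assumption: real ToBI tones are time-aligned points on the f0 contour;
    attaching them to words is a simplification made for this sketch.
    """
    word: str                   # orthographic tier
    brk: int                    # break-index tier value (0-4) after the word
    tone: Optional[str] = None  # tonal-tier label(s) on the word, None if none


def pairwise_agreement(a: List[ToBIWord], b: List[ToBIWord]) -> dict:
    """Agreement rates between two labelers' transcriptions of the same words."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty transcriptions of the same words")
    pairs = list(zip(a, b))
    n = len(pairs)
    # 1. Do both labelers agree on whether the word carries any tone at all?
    presence = sum((x.tone is None) == (y.tone is None) for x, y in pairs) / n
    # 2. Among words both labelers marked, do the labels match exactly?
    both = [(x, y) for x, y in pairs if x.tone is not None and y.tone is not None]
    label = sum(x.tone == y.tone for x, y in both) / len(both) if both else 1.0
    # 3. Do the break indices differ by at most one level?
    brk = sum(abs(x.brk - y.brk) <= 1 for x, y in pairs) / n
    return {"presence": presence, "label": label, "break_within_1": brk}


labeler_a = [ToBIWord("I", 1), ToBIWord("need", 1, "H*"),
             ToBIWord("a", 1), ToBIWord("return", 4, "H* L-L%")]
labeler_b = [ToBIWord("I", 1, "H*"), ToBIWord("need", 2, "L+H*"),
             ToBIWord("a", 3), ToBIWord("return", 4, "H* L-L%")]
print(pairwise_agreement(labeler_a, labeler_b))
# {'presence': 0.75, 'label': 0.5, 'break_within_1': 0.75}
```

Label agreement is computed only over words that both labelers marked, so a disagreement about whether a word is accented at all is counted once (under presence) rather than twice.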
Pitch Accent/Prominence in ToBI
• Which items are made intonationally prominent and how?
• Accent type:
– H* simple high (declarative)
– L* simple low (yes-no question)
– L*+H scooped, late rise (uncertainty/incredulity)
– L+H* early rise to stress (contrastive focus)
– H+!H* fall onto stress (implied familiarity)
Functions of Pitch Accent
• Given/new information
– S: Do you need a return ticket?
– U: No, thanks, I don’t need a return.
• Contrast (narrow focus)
– U: No, thanks, I don’t need a RETURN… (I need a time schedule, receipt, …)
• Disambiguation of discourse markers
– S: Now let me get you the train information.
– U: Okay (thanks) vs. Okay… (but I really want…)
Predicting Accent: Is it accented or not?
• Applications: TTS and CTS
• Corpora: read and spontaneous speech
• Features: POS window of 3, sentence position, position within NP, number of syllables, position in complex nominal, inferred given/new status, inferred focus, mutual information
• Results: 75-85% correct, depending on genre
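A sketch of how a few of these features might be computed per word, in Python. The function names are hypothetical, the POS tags are supplied by hand, syllables are estimated by counting vowel groups, and given/new status is approximated by "has this word already appeared in the dialogue"; the position-in-NP, complex-nominal, focus and mutual-information features need a parser or corpus statistics and are left out.

```python
import re
from typing import Dict, List, Set, Tuple


def count_syllables(word: str) -> int:
    """Crude estimate: count vowel groups (a real system would look the
    word up in a pronouncing dictionary)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def accent_features(tagged: List[Tuple[str, str]],
                    history: Set[str]) -> List[Dict[str, object]]:
    """One feature dict per word of a POS-tagged sentence.

    history holds lowercased words already mentioned in the discourse; it
    is a crude proxy for given/new status and is updated in place.
    """
    n = len(tagged)
    feats = []
    for i, (word, pos) in enumerate(tagged):
        prev_pos = tagged[i - 1][1] if i > 0 else "<s>"
        next_pos = tagged[i + 1][1] if i + 1 < n else "</s>"
        feats.append({
            "pos_window": (prev_pos, pos, next_pos),  # POS window of 3
            "sent_position": i / max(1, n - 1),       # 0.0 first .. 1.0 last
            "syllables": count_syllables(word),
            "given": word.lower() in history,         # mentioned before?
        })
        history.add(word.lower())
    return feats


# Hand-tagged (Penn Treebank tags) version of the given/new example above.
system_turn = [("Do", "VBP"), ("you", "PRP"), ("need", "VB"),
               ("a", "DT"), ("return", "NN"), ("ticket", "NN")]
user_turn = [("I", "PRP"), ("do", "VBP"), ("n't", "RB"),
             ("need", "VB"), ("a", "DT"), ("return", "NN")]

history: Set[str] = set()
accent_features(system_turn, history)
for (word, _), f in zip(user_turn, accent_features(user_turn, history)):
    print(word, f["given"], f["syllables"], f["pos_window"])
```

Because the history is shared across turns, the user's "return" comes out as given, which is the cue that would favor deaccenting it in the first reading of the example.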
Prosodic Phrasing in ToBI
• ‘Levels’ of phrasing:
– intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
– intonational phrase: one or more intermediate phrases plus a boundary tone (H% or L%)
• ToBI break-index tier
– 0 no word boundary
– 1 word boundary
– 2 strong juncture with no tonal markings
– 3 intermediate phrase boundary
– 4 intonational phrase boundary
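The phrase hierarchy can be checked mechanically against the break-index tier. The following Python sketch (hypothetical helper names, word-aligned tones as a simplification) nests words into intermediate and intonational phrases from their break indices, and flags places where the break-index and tonal tiers disagree: a 3 or 4 should carry a phrase accent, and a 4 should also carry a boundary tone. It assumes a strict one-to-one correspondence between the two tiers.

```python
import re
from typing import List, Optional, Tuple

Word = Tuple[str, int, Optional[str]]  # (word, break index, tone label or None)


def group_phrases(words: List[Word]) -> List[List[List[str]]]:
    """Nest words into intonational phrases -> intermediate phrases -> words."""
    iphrases: List[List[List[str]]] = []
    ip: List[List[str]] = []
    inter: List[str] = []
    for word, brk, _ in words:
        if not 0 <= brk <= 4:
            raise ValueError(f"invalid break index {brk} after {word!r}")
        inter.append(word)
        if brk >= 3:            # 3 or 4 closes an intermediate phrase
            ip.append(inter)
            inter = []
        if brk == 4:            # 4 also closes the intonational phrase
            iphrases.append(ip)
            ip = []
    if inter:                   # tolerate a missing final 4
        ip.append(inter)
    if ip:
        iphrases.append(ip)
    return iphrases


def tone_break_mismatches(words: List[Word]) -> List[str]:
    """Flag words whose break index and tonal-tier labels disagree."""
    problems = []
    for word, brk, tone in words:
        tone = tone or ""
        has_phrase_accent = re.search(r"[HL]-", tone) is not None
        has_boundary_tone = re.search(r"[HL]%", tone) is not None
        if brk >= 3 and not has_phrase_accent:
            problems.append(f"{word}: break {brk} but no phrase accent")
        if brk == 4 and not has_boundary_tone:
            problems.append(f"{word}: break 4 but no boundary tone")
        if brk < 3 and (has_phrase_accent or has_boundary_tone):
            problems.append(f"{word}: phrase-final tone but break {brk}")
    return problems


utterance = [("You", 1, None), ("should", 1, None), ("buy", 1, "H*"),
             ("the", 1, None), ("ticket", 3, "H* L-"), ("with", 1, None),
             ("the", 1, None), ("discount", 1, "H*"), ("coupon", 4, "H* L-L%")]
print(group_phrases(utterance))
print(tone_break_mismatches(utterance))
```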
Functions of Phrasing
• Disambiguates syntactic constructions, e.g. PP attachment and restrictive vs. non-restrictive relative clauses:
– S: You should buy the ticket with the discount coupon.
– S: The itinerary which I faxed includes deluxe accommodations.
• Disambiguates scope ambiguities, e.g. negation:
– S: You aren’t booked through Rome because of the fare.
• Or modifier scope:
– S: This fare is restricted to retired politicians and civil servants.
Predicting Phrase Boundaries
• Applications: TTS, CTS, ASR
• Corpora: AP news, Penn Treebank, ATIS
• Features: sentence position, sentence length, POS window of 4, location of previous predicted boundary, mutual information, constituent information, dependency structure
• Results: 96% correct
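The "location of previous predicted boundary" feature means decisions are made left to right, each one depending on the last. The Python sketch below shows that structure for a few of the listed features; the feature names are hypothetical, and `toy_classifier` is a hand-written stand-in for a trained classifier, not the model behind the 96% figure.

```python
from typing import Callable, Dict, List

Features = Dict[str, object]


def juncture_features(tags: List[str], i: int, since_last: int) -> Features:
    """Features for the juncture between word i and word i+1."""
    def tag(j: int) -> str:
        return tags[j] if 0 <= j < len(tags) else "<pad>"
    return {
        "pos_window": (tag(i - 1), tag(i), tag(i + 1), tag(i + 2)),  # 4 tags
        "sent_position": i + 1,              # words before the juncture
        "sent_length": len(tags),
        "since_last_boundary": since_last,   # words since previous prediction
    }


def predict_boundaries(tags: List[str],
                       classify: Callable[[Features], bool]) -> List[int]:
    """Left-to-right prediction; each decision sees the previous one."""
    boundaries = []
    since_last = 0
    for i in range(len(tags) - 1):
        since_last += 1
        if classify(juncture_features(tags, i, since_last)):
            boundaries.append(i)   # boundary after word i
            since_last = 0
    return boundaries


def toy_classifier(f: Features) -> bool:
    """Hypothetical stand-in for a trained model: break before a
    preposition or conjunction once at least 3 words have passed."""
    _, _, next_tag, _ = f["pos_window"]
    return next_tag in {"IN", "CC"} and f["since_last_boundary"] >= 3


# "You should buy the ticket with the discount coupon."
coupon = ["PRP", "MD", "VB", "DT", "NN", "IN", "DT", "NN", "NN"]
print(predict_boundaries(coupon, toy_classifier))  # [4]: after "ticket"
```

Swapping in a real model only changes `classify`; the sequential loop is what makes the previous-boundary feature available.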
Contours: Accent + Phrasing
• What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)?
– Speech acts (statements, questions, requests)
S: That’ll be credit card? (L* H- H%)
– Propositional attitude (uncertainty, incredulity)
S: You’d like an evening flight. (L*+H L- H%)
– Speaker affect (anger, happiness, love)
U: I said four SEVEN one! (L+H* L- L%)
– “Personality”
S: Welcome to the Sunshine Travel System.
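The contour notations above follow the phrasing grammar from the ToBI slides: one or more pitch accents and a phrase accent per intermediate phrase, then a final boundary tone. A small Python parser (hypothetical names) makes that grammar explicit. It accepts only the five accent types listed earlier, whereas full ToBI also includes downstepped variants such as !H*.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Accent inventory from the "Pitch Accent/Prominence in ToBI" slide.
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+!H*"}
# Longer labels come first so "L*+H" is not read as "L*" followed by junk.
TOKEN = re.compile(r"L\*\+H|L\+H\*|H\+!H\*|H\*|L\*|[HL]-|[HL]%")


@dataclass
class IntonationalPhrase:
    intermediate: List[Tuple[List[str], str]]  # (pitch accents, phrase accent)
    boundary_tone: str


def tokenize(text: str) -> List[str]:
    """Split a contour string into tone labels; 'L-H%' is two labels."""
    tokens = []
    for chunk in text.split():
        pos = 0
        while pos < len(chunk):
            m = TOKEN.match(chunk, pos)
            if m is None:
                raise ValueError(f"unrecognized tone label in {chunk!r}")
            tokens.append(m.group())
            pos = m.end()
    return tokens


def parse_contour(text: str) -> IntonationalPhrase:
    tokens = tokenize(text)
    if not tokens or not tokens[-1].endswith("%"):
        raise ValueError("contour must end in a boundary tone")
    intermediate: List[Tuple[List[str], str]] = []
    accents: List[str] = []
    for tok in tokens[:-1]:
        if tok in PITCH_ACCENTS:
            accents.append(tok)
        elif tok.endswith("-"):
            if not accents:
                raise ValueError(f"phrase accent {tok} has no pitch accent")
            intermediate.append((accents, tok))
            accents = []
        else:
            raise ValueError(f"boundary tone {tok} before the end of the phrase")
    if accents or not intermediate:
        raise ValueError("last intermediate phrase lacks a phrase accent")
    return IntonationalPhrase(intermediate, tokens[-1])


for contour in ["L* H- H%", "L*+H L- H%", "L+H* L- L%"]:
    print(contour, "->", parse_contour(contour))
```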
Pitch Range and Timing
• Level of speaker engagement
– S: Welcome to InfoTravel. How may I help you?
• Contour interpretation
– S: You can take the L*+H bus from Malpensa to Rome L-H%.
– U: Take the bus. vs. Take the bus!
• Discourse/topic structure
– Topic beginnings have a higher pitch range, are spoken faster, and are preceded by longer pauses
– Topic endings show the opposite pattern
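The topic-structure cues can be combined into a simple detector. This Python sketch is an illustration only: the `Phrase` record, the 0.5 s pause threshold and the two-of-three voting rule are assumptions, and pitch range is measured here as the f0 span in semitones, 12·log2(max/min), which is one of several reasonable measures (the f0 peak would be another).

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Phrase:
    f0: List[float]       # voiced f0 samples in Hz
    pause_before: float   # silence preceding the phrase, in seconds
    syllables: int
    duration: float       # seconds of speech in the phrase


def pitch_range_semitones(f0: List[float]) -> float:
    """Pitch range as the max/min f0 ratio expressed in semitones."""
    return 12 * math.log2(max(f0) / min(f0))


def topic_beginnings(phrases: List[Phrase], min_pause: float = 0.5) -> List[int]:
    """Indices of phrases that look topic-initial relative to the previous
    phrase. Assumed rule: flag when at least 2 of the 3 cues fire."""
    flagged = []
    for i in range(1, len(phrases)):
        prev, cur = phrases[i - 1], phrases[i]
        cues = [
            pitch_range_semitones(cur.f0) > pitch_range_semitones(prev.f0),
            cur.syllables / cur.duration > prev.syllables / prev.duration,
            cur.pause_before >= min_pause,
        ]
        if sum(cues) >= 2:
            flagged.append(i)
    return flagged


phrases = [
    Phrase([180.0, 220.0, 300.0], 0.0, 8, 2.0),
    Phrase([150.0, 160.0, 170.0], 0.2, 6, 2.0),   # narrow range, slower
    Phrase([120.0, 180.0, 240.0], 0.8, 10, 2.0),  # wide range, fast, long pause
]
print(topic_beginnings(phrases))  # [2]
```

Topic endings could be flagged with the mirror-image rule (narrower range, slower rate, shorter following pause).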
Prosody and Speaker Emotion
• What makes an utterance sound angry? Sad?
– How much comes from the lexical information?
– How much from the acoustic/prosodic?
– Does all anger, e.g., sound the same?
• Cahn ‘88 (examples)
Applications
• Text-to-Speech and Concept-to-Speech generation: improve naturalness
• Speech Recognition: identify suprasegmental meaning
• Spoken Dialogue Systems: understand when people are confused, angry
• Audio Browsing: format corpora for browsing and search
Challenges
• We don’t really know what most contours ‘mean’
• Our accent prediction needs better models of given/new status, focus, and grammatical function
• Our phrasing prediction needs better information about, e.g., attachment
• We don’t know much about emotional speech or ‘personality’, both critical to applications