Post on 22-Dec-2015
Siridus
an information state approach to flexible spoken dialogue systems
David Milward, Staffan Larsson
Overview
• Siridus project
• Information State Approach
• Main areas of work
• Some selected topics
  – Trindikit
  – Flexible issue based dialogue management
  – Robust interpretation
• Conclusions
Siridus Project
• EU Framework 5
• Partners
  – Universities of
    • Gothenburg (technical coordinator)
    • Seville
    • Saarland (administrative coordinator)
  – Telefonica
  – Linguamatics
  – (SRI)
• Duration
  – Jan 2000 - Dec 2002
The IS Approach
• considering dialogue in terms of transitions between states
  – structured “information states”
  – can be used to model mental states of the dialogue participants
• advantages
  – theoretical objects/data structures for dialogue analysis
  – perspective for comparison of systems and theories
IS transitions
IS for System
– what are the best structures to represent the state of the dialogue?
  – stacks? sets of feature structures?
  – private beliefs + shared commitments?
– is information in the IS relevant to:
  – recognition/interpretation
  – generation/synthesis
– what is explicitly coded, what is emergent?
  – moves/games emerge as an aggregate of updates
[Figure: IS for System. Labels: Interpret U, Generate U’]
IS inspired systems
• IS seems well suited for building systems between e.g. VoiceXML and full BDI
• Examples: Delfos, GoDiS
• Information structure including
  – lists of feature structures (Delfos)
  – stack of QUD (GoDiS)
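A minimal sketch of the idea, not the TrindiKit or GoDiS implementation: the information state is a record with a private part and a shared part, and each dialogue move is integrated by an update rule. The names `InformationState`, `integrate_ask` and `integrate_answer`, and the field names, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InformationState:
    # Private part: what the system intends to do next (hypothetical field).
    agenda: list = field(default_factory=list)
    # Shared part: accepted commitments and a stack of questions
    # under discussion (QUD), with the topmost question last.
    commitments: dict = field(default_factory=dict)
    qud: list = field(default_factory=list)


def integrate_ask(state, question):
    """Update rule: asking a question pushes it onto QUD."""
    state.qud.append(question)


def integrate_answer(state, value):
    """Update rule: a short answer resolves the topmost question on QUD."""
    if not state.qud:
        return False  # nothing to resolve; a real system might clarify here
    question = state.qud.pop()
    state.commitments[question] = value
    return True


state = InformationState()
integrate_ask(state, "destination")
integrate_answer(state, "Paris")
print(state.commitments)  # {'destination': 'Paris'}
print(state.qud)          # []
```

Nothing in this sketch encodes a "question-answer game" explicitly; the game emerges from the two updates applied in sequence.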
Main areas of work in Siridus
• Dialogue types
• Generating prosody using IS
• Enhancing speech recognition through IS
• Demonstrators
• Toolkit for researchers (Trindikit 2/3)
• Flexible dialogue
• Robust interpretation and reconfigurability of dialogue systems
Dialogue Types
• Natural command languages
  – Quesada, Amores: Cooperation and collaboration in Natural Command Language Dialogues
• Conditional responses
  – Kruijff-Korbayova, Karagjosova, Larsson: Enhancing collaboration with conditional responses in information seeking dialogues
• Clarifications
  – Cooper & Ginzburg: Using Dependent Record Types in Clarification Ellipsis
• Negotiative dialogue
Limited Enquiry Negotiation Dialogue (LEND)
• information seeking dialogues often assume a fixed set of parameters users must fill out
• in human-human corpora, participants often negotiate which parameters to use
• they are batting proposals to and fro
• users need not supply any particular subset of the parameters of an action
• similarly, the system needn’t be indifferent about the best set of parameters to use
  – it may greatly prefer to get a departure-time if it can
• decide on the next question according to maximal utility
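The last point can be sketched as follows. This is an illustrative assumption, not the LEND algorithm itself: each parameter carries a utility estimate, and the system asks about the unfilled parameter with the highest value. The function name and the utility figures are hypothetical.

```python
def next_question(utilities, filled):
    """Return the unfilled parameter with the highest utility, or None."""
    open_params = {p: u for p, u in utilities.items() if p not in filled}
    if not open_params:
        return None
    return max(open_params, key=open_params.get)


# Hypothetical utilities: how much knowing each parameter helps the system.
utilities = {"departure-time": 0.9, "destination": 0.8, "class": 0.2}
filled = {"destination": "Paris"}
print(next_question(utilities, filled))  # departure-time
```

The user remains free to volunteer any parameter; the utilities only determine what the system asks about next.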
Generating Prosody using IS: Do we need it?
• Sample dialogue
  – S: How can I help you?
  – U: I'd like a flight from London to Paris
  – S: When do you want to travel?
  – S: WHEN do you want to travel?
• Default intonation in Text-To-Speech systems:
  – When do you WANT to travel? (Festival, ViaVoice)
  – When DO you want to travel? (Articulator)
  – When do you want TO travel? (AT&T)
Can we fix the intonation for particular sentences/phrases?
No! Intonation is context dependent
U: A train from London to Edinburgh, the first
S: Sorry, do you want to travel on the first of SEPTEMBER or do you want to travel first CLASS?
U: First of September how much is it?
S: SECOND class costs FIFTY pounds. FIRST class costs ONE HUNDRED pounds
Information State and Prosody
• Motivate prosody using abstract notion of information structure partitioning
  – cf. Steedman, Vallduvi, Prague school etc.
• Integration of these concepts into an information state update approach
  – Theme/rheme
    • questions under discussion (QUD)
    • QUD can be implicit, so more general than QA pairs
  – Focus/background
    • comparison of current utterance vs
      – shared commitments from dialogue history
      – alternatives available in the domain
• aim to avoid overgeneration of focus
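One way to read the focus/background comparison, sketched under strong simplifying assumptions: a value is focused only if it is not already a shared commitment and the domain offers alternatives to it. Upper case stands in for a pitch accent, as in the sample dialogues. The function `mark_focus` and the slot/value encoding are hypothetical.

```python
def mark_focus(utterance, commitments, alternatives):
    """Upper-case the values that should carry focus.

    utterance:    list of (slot, value) pairs to be spoken
    commitments:  slot -> value already shared in the dialogue history
    alternatives: slot -> set of values available in the domain
    """
    marked = []
    for slot, value in utterance:
        is_given = commitments.get(slot) == value
        has_alternatives = len(alternatives.get(slot, ())) > 1
        # Focus only new, contrastive material; this limits overgeneration.
        marked.append(value.upper() if has_alternatives and not is_given else value)
    return marked


commitments = {"date": "first of september"}
alternatives = {
    "date": {"first of september", "second of september"},
    "class": {"first", "second"},
    "price": {"fifty", "one hundred"},
}
print(mark_focus([("class", "second"), ("price", "fifty")], commitments, alternatives))
# ['SECOND', 'FIFTY']
```

The date, although it has alternatives in the domain, would stay unaccented because it is already a shared commitment.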
Enhancing speech recognition and synthesis using IS
• For finite state or form based dialogue can associate particular grammars with particular prompts
• But what if you generate your prompts automatically?
  – Set of LMs to cover classes of generated prompts e.g.
    • Prompts expecting an NP reply
  – Mix of grammar based and statistical LMs
    • Statistical LM used for back off
    • or embed grammars in a statistical LM
  – Single statistical LM for the whole domain + rescoring of n-best/lattice according to syntax/semantics/context
Lattice/n-best approach
• Potential sources of information:
  – previous move/dialogue history
      Where do you want to go? ... Boston ...
  – syntactic/semantic coherence of fragments
      switch on the door
  – coherence of fragments relative to each other
      leave ... at 6pm
  – reference resolution
      turn off all heaters vs. turn off hall heater
  – state of the world
      turn on|off the light in the hall
• Combine/contrast sources of evidence to decide
  – most likely utterance
  – when to clarify
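A minimal sketch of combining evidence over an n-best list, assuming each knowledge source is a function that returns a score adjustment for a hypothesis. The weighting scheme, the clarification margin, and all names (`rescore`, `world_state_score`, `light_is_on`) are assumptions made for illustration.

```python
def rescore(nbest, scorers, weights, clarify_margin=0.1):
    """Pick the best hypothesis and decide whether to clarify.

    nbest:   list of (hypothesis, recogniser_score) pairs
    scorers: functions mapping a hypothesis to a score adjustment
    weights: one weight per scorer
    """
    scored = []
    for hyp, base in nbest:
        total = base + sum(w * s(hyp) for s, w in zip(scorers, weights))
        scored.append((total, hyp))
    scored.sort(reverse=True)
    best_score, best = scored[0]
    # Clarify when the runner-up is too close to the winner.
    needs_clarification = len(scored) > 1 and best_score - scored[1][0] < clarify_margin
    return best, needs_clarification


light_is_on = True  # hypothetical state of the world


def world_state_score(hyp):
    """Penalise commands that make no sense given the device state."""
    if hyp.startswith("turn on") and light_is_on:
        return -1.0
    if hyp.startswith("turn off") and not light_is_on:
        return -1.0
    return 0.0


nbest = [("turn on the light in the hall", 0.50),
         ("turn off the light in the hall", 0.48)]
print(rescore(nbest, [world_state_score], [0.5]))
# ('turn off the light in the hall', False)
```

Without the world-state scorer the two hypotheses differ by only 0.02, so the same call with no scorers would return the first hypothesis and request clarification.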
[Figure: overview of toolkits and demonstrators. Labels: TrindiKit, GoDiS, GoDiS-I, GoDiS-A, Travel Agency, VCR manager, IBDM/KOS, ISU approach, Delfos basic system, Delfos framework, DELFOS-NCL, Home manager, ISU approach, ATOS, Home manager]
Demonstrators: Automated telephone operator
• Natural language telephone-based access to company telephone directory/PABX
  – Dial by name
  – Email address
  – Multi-party conference
  – Call transfers
• KQML based message passing
• Delfos dialogue management
• User trials at Telefonica I+D
Demonstrators: Home device control
• Command and control of home devices (following on from the D’Homme Project)
  – Is the lounge TV on
  – Switch the TV to Channel 4
  – Turn all the lights off in the bedrooms
  – I’d like to record a programme
  – Night mode
• Reconfigurability issues: “plug and play” of devices. What happens when we move devices around or add new devices?
• Advanced reference resolution
  – Quesada and Amores: Knowledge based reference resolution for Dialogue Management in a Home Domain Environment
Interpretation in flexible dialogue systems
• System initiative useful in constraining what users are likely to say
• User initiated utterances are more variable and less easy to predict
• Compiling a precise grammar into the recogniser is less likely to give good results
• Want to be able to extract partial results and avoid total failure
Task
• map from IS and recognition lattice/n-best list to a (partial) interpretation
• using IS information after interpretation may be too late
S: what kind of route would you prefer?
U: the quick route pleased
• need to use information in IS to help
  – choose between fragments,
  – find sensible combinations of fragments
• but can we deal with fragments at all?
Self contained fragments
I want to go from London ... to Birmingham
origin = London destination = Birmingham
• No need to put fragments together
• Can extract pieces of semantics independently
• Combination of meaning is via implicit conjunction, i.e.
  origin = London & destination = Birmingham
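Self-contained fragments can be handled by independent phrase spotters, sketched here with regular expressions. The city list and the patterns are illustrative assumptions; the resulting dictionary plays the role of the implicit conjunction.

```python
import re

# Hypothetical domain vocabulary.
CITIES = ["London", "Birmingham", "Boston"]
CITY = "|".join(CITIES)

# One spotter per slot; each fragment is interpreted on its own.
PATTERNS = {
    "origin": re.compile(rf"\bfrom ({CITY})\b"),
    "destination": re.compile(rf"\bto ({CITY})\b"),
}


def spot(utterance):
    """Return slot -> value for every spotter that fires."""
    result = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            result[slot] = match.group(1)
    return result


print(spot("I want to go from London ... to Birmingham"))
# {'origin': 'London', 'destination': 'Birmingham'}
```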
Dependent fragments
I want to leave Birmingham .... at 3 pm
departure-time = 3pm
• Can’t decide between arrival/departure time without considering other fragments
• Can use some keyword spotters - phrase grammars not enough
Context vs. Syntax
S: When do you want to leave?
U: ... 4pm ...U: I’m not sure when I’d like to leave, but I’d like to arrive by 4pm
• Need to be able to override use of keywords + context using syntactic information
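A sketch of that override, with regular expressions standing in for real syntactic attachment (an assumption; a parser would supply this information in practice). If the time phrase attaches to "arrive" or "leave", that decides the slot; otherwise the slot expected from context, here the question the system just asked, is used. All names are hypothetical.

```python
import re

TIME = r"(\d{1,2}\s?(?:am|pm))"

# Stand-ins for syntactic constraints: a time phrase governed by a verb.
SYNTACTIC = [
    ("arrival-time", re.compile(rf"\barrive (?:by |at )?{TIME}")),
    ("departure-time", re.compile(rf"\bleave (?:by |at )?{TIME}")),
]


def interpret_time(utterance, expected_slot):
    """Prefer syntactic evidence; fall back to the contextually expected slot."""
    text = utterance.lower()
    for slot, pattern in SYNTACTIC:
        match = pattern.search(text)
        if match:
            return {slot: match.group(1)}
    match = re.search(TIME, text)
    if match:
        return {expected_slot: match.group(1)}
    return {}


# System has just asked "When do you want to leave?"
print(interpret_time("... 4pm ...", "departure-time"))
# {'departure-time': '4pm'}
print(interpret_time(
    "I'm not sure when I'd like to leave, but I'd like to arrive by 4pm",
    "departure-time"))
# {'arrival-time': '4pm'}
```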
Integrating syntactic constraints
• Phrase spotters go to a particular depth of analysis even if a full parse is possible
• When syntactic information is available should use it
• When not should do as well as keyword/phrase spotting
• Approach:
  – distribute the semantic representation
  – get inside fragments to access particular components e.g. “leave”
  – structural constraints available where necessary
• Allows
  – rules similar to keyword/phrase spotting
  – more specialised rules involving syntactic constraints
Indexed representations (1)
• Can split up any recursive representation into a set of indexed constraints e.g.
John wants to leave Boston
wants(john, leave(john, boston))
e1: wants
e2: john
e3: leave
e4: john
e5: boston
e0: e1(e2,e6)
e6: e3(e4,e5)
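The decomposition above can be reproduced mechanically. The following sketch assumes terms are nested tuples `(functor, arg1, ...)` with atomic functors, and uses a numbering scheme chosen to match the example: atoms are indexed in textual order, then embedded applications, with `e0` reserved for the root. The function name `flatten` is hypothetical.

```python
from itertools import count


def flatten(term):
    """Split a nested term into a dict of indexed constraints."""
    constraints = {}
    counter = count(1)

    def label_atoms(t):
        # First pass: give every atom its own index, left to right.
        if isinstance(t, str):
            idx = f"e{next(counter)}"
            constraints[idx] = t
            return idx
        return tuple(label_atoms(part) for part in t)

    def label_apps(t, is_root):
        # Second pass: give every function application an index.
        if isinstance(t, str):
            return t  # already an index from the first pass
        functor, *args = t
        arg_indices = [label_apps(a, False) for a in args]
        idx = "e0" if is_root else f"e{next(counter)}"
        constraints[idx] = f"{functor}({','.join(arg_indices)})"
        return idx

    label_apps(label_atoms(term), True)
    return constraints


result = flatten(("wants", "john", ("leave", "john", "boston")))
for index in sorted(result):
    print(f"{index}: {result[index]}")
```

A fragmentary input would yield the same kind of constraints with some of them missing, which is what lets fragments and full sentences share one representation.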
Indexed Representations (2)
• Individual constraints provide content and structure
• Uniform representation for utterance fragments or fully connected sentence
  – fragments miss certain constraints, or identity between indices
• Provides a description of the original semantic representation
• semantic chart or lattice is an indexed representation where readings are packed
  – same index -> alternative reading
  – cf. edges with ‘index’: 0-1-np
Achieving compositionality
• What happens when the output semantics needs to be more complex?
  – I want to go to Glasgow on Tuesday and Edinburgh on Wednesday
  – Turn on all the lights except in the kitchen
• Not enough to analyse each fragment independently
• Need to put semantics together
Compositional concept based semantics
What can provide clues for putting things together if syntax is inadequate?
• regular patterns of hesitations and repairs
• function argument structure
• general ontological knowledge
• specific real world knowledge
Example
• Consider
  – the light in the back bedroom
  – the back bedroom light
  – light .... back ... bedroom
• if there is only one light in each bedroom, no need for clarification
• general ontological knowledge might suggest a device in a room
• specific knowledge about the house allows choice of “back bedroom” vs “back light”
Approach
• Connect concepts according to a model of the domain
• Ontological relationships:
  – devices can be in locations
  – lights are devices
  – rooms are locations
• Specific information:
  – b1 is a bedroom
  – b1 orientation is back
  – l1 is a light
  – l1 in b1
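The domain model above can be sketched as two small tables plus a resolver that connects the spotted concepts. This is an illustrative assumption about how the combination might work, not the Siridus implementation; `b2` and `l2` are added hypothetically so that the example has something to disambiguate.

```python
# Ontological relationships: concept -> kind.
ONTOLOGY = {"light": "device", "bedroom": "location"}

# Specific information about the house.
FACTS = {
    "b1": {"type": "bedroom", "orientation": "back"},
    "l1": {"type": "light", "in": "b1"},
    "b2": {"type": "bedroom", "orientation": "front"},  # hypothetical addition
    "l2": {"type": "light", "in": "b2"},                # hypothetical addition
}


def resolve(concepts):
    """Return the devices compatible with an unordered bag of concepts."""
    device_types = [c for c in concepts if ONTOLOGY.get(c) == "device"]
    location_types = [c for c in concepts if ONTOLOGY.get(c) == "location"]
    modifiers = [c for c in concepts if c not in ONTOLOGY]
    matches = []
    for name, props in FACTS.items():
        if props["type"] not in device_types:
            continue
        room = FACTS.get(props.get("in"), {})
        if location_types and room.get("type") not in location_types:
            continue
        # A modifier may describe either the device or the room it is in.
        if all(m in (props.get("orientation"), room.get("orientation"))
               for m in modifiers):
            matches.append(name)
    return matches


print(resolve(["light", "back", "bedroom"]))  # ['l1']
print(resolve(["light", "bedroom"]))          # ['l1', 'l2']
```

A single match means no clarification is needed; several matches signal that the system should ask which one the user meant.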
Conclusions
• Semantic chart provides a convenient knowledge representation for manipulating fragment semantics
• Ontological/domain knowledge used to combine fragments
  – allows recursive combination
  – not just filling in a fixed number of slots in a template
• Allows mapping from IS + word lattice to interpretation without requiring a full parse
• Reconciling traditional syntax and compositional semantics with robust approaches
IS Approach: Conclusions
• a framework in which we can do interesting theoretical work
  – improving prosody/recognition using IS
  – exploration of different dialogue genres
  – better comparison of theories
• a research tool for dialogue developers (TrindiKit)
• systems which are
  – modular
  – reconfigurable
  – between FS/form filling and BDI/planning