Referring to Objects with Spoken and Haptic Modalities
Frédéric LANDRAGIN
Nadia BELLALEM
& Laurent ROMARY
LORIA Laboratory
Nancy, FRANCE
Overview
• Context: Conception phase of the EU IST/MIAMM project (Multidimensional Information Access using Multiple Modalities - with DFKI, TNO, Sony, Canon)
– Study of the design factors for a future haptic PDA-like device
• Background: Interpretation of natural language and spontaneous gestures
– A model of contextual interpretation of multimodal referring expressions in visual contexts
• Objective: To show the possible extension of the model to an interaction mode including tactile and kinesthetic feedback (henceforth “Haptic”)
MIAMM - wheel mode
MIAMM - axis mode
MIAMM - galaxy mode
Reference domains and visual contexts
• The use of perceptual grouping
– « these three objects » { , , }
– « the triangle » { }
– « the two circles » { , }
• The use of salience
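The interplay of grouping and salience above can be sketched in Python. This is only an illustration under assumptions: the `VisualObject` class, the numeric `salience` score, and the `resolve` function are hypothetical names, and the group is assumed to contain one triangle and two circles, as the three example expressions suggest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualObject:
    name: str
    shape: str
    salience: float  # hypothetical score in [0, 1]; higher means more salient


def resolve(domain, shape, count):
    """Return the `count` most salient objects of `shape` in the domain.

    Returns None when the domain cannot satisfy the expression.
    """
    candidates = [obj for obj in domain if obj.shape == shape]
    if len(candidates) < count:
        return None
    ranked = sorted(candidates, key=lambda obj: obj.salience, reverse=True)
    return ranked[:count]


# A perceptual group ("these three objects") acts as the reference domain.
group = [
    VisualObject("t1", "triangle", 0.9),
    VisualObject("c1", "circle", 0.4),
    VisualObject("c2", "circle", 0.6),
]

the_triangle = resolve(group, "triangle", 1)
the_two_circles = resolve(group, "circle", 2)
```

Restricting the search to the group, rather than to every object on screen, is what lets a short expression such as « the triangle » pick out a single referent.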
Multimodal fusion architecture
[Diagram. Labels: referential expression, referential gesture, language module, visual module, task module, dialogue manager, user model, history (×3), request / domains (×3), under-specification, reference domain(s), referent]
Haptics and deixis
• Haptic gestures can take on three classical functions of gesture in man-machine interaction:
– semiotic function: ‘select this object’
– ergotic function: ‘reduce the size of this object’
– epistemic function: ‘save the compliance of this object’
• How can the system identify the function(s)?
– linguistic clues (referential expression, predicate)
– task indications (possibilities linked to a type of object, “affordances”)
• Deictic role: to make the object salient, whatever the function, in order to focus the addressee’s attention on it.
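One possible way to combine the two kinds of clue is to intersect the functions suggested by the predicate with those the task allows for the object type. The sketch below is a hypothetical illustration: the predicate table reuses the three example utterances above, while the object types (`song`, `playlist`) and their affordances are invented for the example.

```python
ALL_FUNCTIONS = frozenset({"semiotic", "ergotic", "epistemic"})

# Linguistic clues: predicate -> gesture functions it suggests
# (taken from the three example utterances on this slide).
PREDICATE_CLUES = {
    "select": frozenset({"semiotic"}),
    "reduce": frozenset({"ergotic"}),
    "save": frozenset({"epistemic"}),
}

# Task indications: hypothetical affordances per object type.
AFFORDANCES = {
    "song": frozenset({"semiotic", "epistemic"}),
    "playlist": frozenset({"semiotic", "ergotic"}),
}


def candidate_functions(predicate, object_type):
    """Intersect what the language suggests with what the task allows.

    An unknown predicate or object type places no restriction.
    """
    from_language = PREDICATE_CLUES.get(predicate, ALL_FUNCTIONS)
    from_task = AFFORDANCES.get(object_type, ALL_FUNCTIONS)
    return from_language & from_task
```

An empty result signals a conflict between the utterance and the task, which a dialogue system could use to trigger a clarification request.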
Haptics and perceptual grouping
• Interest: formalism for focusing on a subset of objects
• Grouping factors:
– objects that have similar haptic (in particular, tactile) properties (shape, consistency, texture)
– objects that have been browsed by the user (the elements of such a group are ordered)
– objects that are stuck together, parts of the same object...
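The first two grouping factors can be sketched as follows. The dictionary-based object representation and the function names are assumptions made for the example, not part of the model described here.

```python
from collections import defaultdict


def group_by_property(objects, prop):
    """Group object ids by a shared haptic property such as texture."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj[prop]].append(obj["id"])
    return dict(groups)


def browsed_group(browse_events):
    """Ordered group of browsed objects: first-touch order, no duplicates."""
    return list(dict.fromkeys(browse_events))


objects = [
    {"id": "a", "texture": "rough"},
    {"id": "b", "texture": "smooth"},
    {"id": "c", "texture": "rough"},
]

by_texture = group_by_property(objects, "texture")
browsed = browsed_group(["b", "a", "b", "c"])
```

Keeping the browsed group as an ordered list, rather than a set, preserves the manipulation order that ordinal expressions rely on.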
Haptics and perceptual domains
• Can visual and tactile perceptions work together?
– simultaneous visual and tactile perception implies the same world of objects (and synchronized feedback)
– a referring expression can be interpreted in the visual context or in the tactile context
• How can the system identify the nature of perception?
– for immediate references, the visual context gives the reference domain and haptics gives the starting point in it
– for iterating references, each type of context can provide the reference domain (so both hypotheses must be tested)
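The two cases above might be sketched as a function that returns the domain hypotheses to test. Everything here is an assumption for illustration: the function name, the list representation of domains, and in particular the choice to model "starting point" by rotating the visual domain so that the touched object comes first.

```python
def candidate_domains(kind, visual_domain, haptic_domain, touched=None):
    """Return the list of reference-domain hypotheses to test.

    kind: "immediate" or "iterating" (hypothetical labels).
    """
    if kind == "immediate":
        # The visual context is the domain; the touched object is where it starts.
        start = visual_domain.index(touched) if touched in visual_domain else 0
        return [visual_domain[start:] + visual_domain[:start]]
    if kind == "iterating":
        # Either context may provide the domain, so both hypotheses are kept.
        return [list(visual_domain), list(haptic_domain)]
    raise ValueError(f"unknown reference kind: {kind!r}")


visual = ["s1", "s2", "s3", "s4"]
haptic = ["s3", "s1"]

immediate = candidate_domains("immediate", visual, haptic, touched="s3")
iterating = candidate_domains("iterating", visual, haptic)
```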
Haptics and dialogue history
• Interpretations that require an ordering within the reference domain: ‘the first one’, ‘the next one’, ‘the last one’
– in visual perception, guiding lines can be helpful (if there are none, an order can always be built from the reading direction)
– in haptic perception, the only criterion may be the manipulation order
• Some referring expressions that do not need an order may be interpreted in the haptic manipulation history
– ‘the big one’ (in the domain of browsed objects)
– ‘them’ (the most pressured objects)
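Interpretation against the manipulation history can be sketched as below. The history is assumed to be a list of object ids in manipulation order, and `sizes` is a hypothetical lookup table; neither comes from the original model.

```python
def resolve_ordinal(expression, history, current=None):
    """Resolve an ordinal expression using manipulation order."""
    if not history:
        return None
    if expression == "the first one":
        return history[0]
    if expression == "the last one":
        return history[-1]
    if expression == "the next one" and current in history:
        i = history.index(current) + 1
        return history[i] if i < len(history) else None
    return None


def the_big_one(history, sizes):
    """Resolve 'the big one' within the domain of browsed objects."""
    return max(history, key=lambda obj: sizes[obj]) if history else None


history = ["s3", "s1", "s7"]               # manipulation order
sizes = {"s3": 2.0, "s1": 5.0, "s7": 1.0}  # hypothetical object sizes
```

Note that `the_big_one` ignores order entirely: it only needs the set of browsed objects as its domain, which matches the second bullet above.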
Architecture for speech-haptic referring
[Diagram. Labels: referential expression, haptic gesture, language module, visuo-tactile module, task module, dialogue manager, user model, history (×3), request / domains (×3), under-specification, reference domain(s), referent]
Summary
• What does not change from deictic to haptic:
– the status of speech and gesture in the architecture
– the distribution of information between speech and gesture
– the need for reference domains
– the use of salience and the use of orderings in domains
– the algorithms for the exploitation of all these notions
• What does change:
– some unchanged notions can have more than one cause
– objects must be browsed to be grouped in a haptic domain
– one aspect of the architecture: the visual perception module becomes the visuo-tactile perception module
Application - MMIL design
• MMIL - Multimodal Interface Language
– Unique communication language within the MIAMM architecture
– XML-based representation embedded in a web service protocol (SOAP)
– Oriented towards multimodal content representation (cf. Joint ACL/SIGSEM and TC37/SC4 initiative)
• Example: VisHapTac notification to Dialogue Manager
Overall structure
[Diagram. Labels: e0, set1, s1, s2, set2, s25, …, description, s2-1, s2-2, s2-3, inFocus, inSelection, Visual haptic state, Participant setting, Sub-divisions]
MMIL format for VisHapTac - 1
<mmilcomponent>
  <event id="e0">
    <evtType>HGState</evtType>
    <visMode>galaxy</visMode>
    <tempSpan
      startPoint="2000-01-20T14:12:06"
      endPoint="2002-01-20T14:12:13"/>
  </event>
  <participant id="set1">
    …
  </participant>
  <relation type="description" source="set1" target="e0"/>
</mmilcomponent>
MMIL format for VisHapTac - 2
<participant id="set1">
  …
  <participant id="s1">
    <Name>Let it be</Name>
  </participant>
  <participant id="set2">
    <individuation>set</individuation>
    <attentionStatus>inFocus</attentionStatus>
    <participant id="s2-1">
      <Name>Lady Madonna</Name>
    </participant>
    …
    <participant id="s2-3">
      <attentionStatus>inSelection</attentionStatus>
      <Name>Revolution 9</Name>
    </participant>
  </participant>
  …
</participant>
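A receiving module could read the attention status of each participant from such a notification. The sketch below uses Python's standard `xml.etree.ElementTree`. It is an illustration only: the embedded fragment omits the elided parts of the slide, spells the element `attentionStatus` consistently (an assumption), and `attention_states` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the slide; the elided ("…") parts are left out.
MMIL_FRAGMENT = """
<participant id="set1">
  <participant id="s1">
    <Name>Let it be</Name>
  </participant>
  <participant id="set2">
    <individuation>set</individuation>
    <attentionStatus>inFocus</attentionStatus>
    <participant id="s2-1">
      <Name>Lady Madonna</Name>
    </participant>
    <participant id="s2-3">
      <attentionStatus>inSelection</attentionStatus>
      <Name>Revolution 9</Name>
    </participant>
  </participant>
</participant>
"""


def attention_states(xml_text):
    """Map participant id -> attention status, for participants that declare one."""
    root = ET.fromstring(xml_text)
    states = {}
    for participant in root.iter("participant"):
        # find() looks at direct children only, so a status nested in a
        # sub-participant is not attributed to its parent.
        status = participant.find("attentionStatus")
        if status is not None:
            states[participant.get("id")] = status.text
    return states


states = attention_states(MMIL_FRAGMENT)
```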
Future work
• Within the dialogue manager module, domains may be compared using a relevance criterion
The way the linguistic constraints of the referring expression apply in the different domains may be such a criterion.
• Validation in the MIAMM architecture
The transition from deictic to haptic may not entail any additional cost for the development of a dialogue system, from either the architecture or the dialogue management point of view.
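As a rough illustration of the relevance criterion suggested in the first point above, candidate domains could be scored by how well the category and number constraints of the referring expression fit them. The scoring values and function names are assumptions made for this sketch, not a result of the project.

```python
def relevance(domain, category, count):
    """Score how well 'the <count> <category>' applies in a domain.

    Hypothetical scale: 1.0 for an exact fit, 0.5 when the domain holds
    more matching objects than required (ambiguous), 0.0 when it holds fewer.
    """
    matches = sum(1 for obj in domain if obj["category"] == category)
    if matches == count:
        return 1.0
    if matches > count:
        return 0.5
    return 0.0


def best_domain(domains, category, count):
    """Return the name of the highest-scoring candidate domain."""
    return max(domains, key=lambda name: relevance(domains[name], category, count))


domains = {
    "visual": [{"category": "circle"}] * 3 + [{"category": "triangle"}],
    "haptic": [{"category": "circle"}] * 2,
}

winner = best_domain(domains, "circle", 2)
```

Here « the two circles » fits the haptic domain exactly but is ambiguous in the visual one, so the haptic hypothesis would be preferred.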