A Fusion Framework for Multimodal Interactive Applications
DESCRIPTION
This research aims to propose a multimodal fusion framework for high-level data fusion between two or more modalities. It takes as input low-level features extracted from different system devices, then analyses them and identifies intrinsic meanings in these data. Extracted meanings are mutually compared to identify complementarities, ambiguities and inconsistencies, to better understand the user's intention when interacting with the system. The whole fusion life cycle is described and evaluated in an OCE environment scenario, where two co-workers interact by voice and movements, which may reveal their intentions. The fusion in this case focuses on combining modalities to capture a context that enhances the user experience.

TRANSCRIPT
A Fusion Framework for Multimodal Interactive Applications
Presented by: Hildeberto Mendonça
Jean-Yves Lionel Lawson, Olga Vybornova,
Benoit Macq, Jean Vanderdonckt

ICMI-MLMI 2009 – Cambridge MA, USA, November 2-6, 2009
Special Session: Fusion Engines for Multimodal Interfaces
November 3, 2009
ICMI-MLMI 2009 – Cambridge MA, USA
Motivations
- How to support multimodal fusion so as to maximize reuse and minimize complexity?
- If there is complexity in multimodal fusion, it should be about the fusion itself.
- What already exists should be reused with minimal adaptation.
- A general life cycle can guarantee a standard treatment for each modality.
Research Goal
To define and develop a multipurpose framework for high-level data fusion in multimodal interactive applications.
Fusion Principles
- Type: parallel + combined = synergistic; each modality is endowed with meanings
- Level: feature (i.e. pattern extraction) + decision (i.e. recognized task)
- Input devices: multiple
- Notation: defined by the developer
- Ambiguity resolution: defined by the developer
- Time representation: both quantitative and qualitative
- Application type: the domain is defined using ontologies
Process
- Recognition: identification of patterns in input signals.
- Segmentation: delimitation of identified areas.
- Meaning extraction: deeper analysis to identify meanings and correlations between segments according to specific domains.
- Annotation: formal description of segments through domain concepts.
Process
- The flow is fixed, but it can start at any point, respecting the sequence.
- Not tied to any particular method: the method is "plugged" in.
- Focus on a good level of analysis, not on real-time processing.
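The four-stage life cycle with pluggable methods might be sketched as follows. This is an illustrative assumption, not the framework's actual API: the class, stage names, and toy methods are invented for the example.

```python
# Hypothetical sketch of the fixed recognition -> segmentation ->
# meaning-extraction -> annotation life cycle, where each stage's
# concrete method is "plugged" into the pipeline as a callable.

STAGES = ["recognition", "segmentation", "meaning_extraction", "annotation"]

class ModalityPipeline:
    def __init__(self, **methods):
        # one callable per stage name; any concrete method can be plugged in
        self.methods = methods

    def run(self, data, start="recognition"):
        # the flow is fixed, but processing may start at any stage,
        # as long as the remaining sequence is respected
        for stage in STAGES[STAGES.index(start):]:
            data = self.methods[stage](data)
        return data

# toy plugged-in methods for a speech modality
speech = ModalityPipeline(
    recognition=lambda sig: sig.lower(),              # pattern identification
    segmentation=lambda s: s.split(),                 # delimit segments (words)
    meaning_extraction=lambda ws: [w for w in ws if w in {"book", "library"}],
    annotation=lambda ms: {"domain_concepts": ms},    # formal description
)
print(speech.run("Maybe I can find a BOOK in the LIBRARY"))
# {'domain_concepts': ['book', 'library']}
```

Because the stage methods are plain callables, swapping a recognizer or annotator does not touch the pipeline itself, which is the reuse property the slides argue for.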
OpenInterface
Fusion Mechanism
- Define a process for each modality and run the processes in parallel.
- Data from each stage is buffered and processed together for the purpose of fusion.
- Agent-oriented: the problem is solved in a distributed fashion.
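The mechanism above (parallel per-modality pipelines publishing buffered stage outputs to a fusion step) could be sketched like this. Everything here is an illustrative assumption, not the actual OpenInterface code; the complementarity rule is a toy stand-in for real meaning comparison.

```python
# Illustrative sketch: each modality buffers its stage outputs with a fusion
# agent, which then compares the buffered meanings across modalities.

from collections import defaultdict

class FusionAgent:
    def __init__(self):
        # buffer: stage name -> list of (modality, data) pairs awaiting fusion
        self.buffer = defaultdict(list)

    def publish(self, modality, stage, data):
        self.buffer[stage].append((modality, data))

    def fuse(self, stage):
        # combine everything buffered at the same stage across modalities
        meanings = {modality: data for modality, data in self.buffer[stage]}
        # toy decision-level rule: modalities are complementary when their
        # extracted concepts overlap
        concept_sets = [set(v) for v in meanings.values()]
        shared = set.intersection(*concept_sets) if concept_sets else set()
        return {"modalities": meanings, "complementary": bool(shared)}

agent = FusionAgent()
agent.publish("speech", "meaning_extraction", ["book", "library"])
agent.publish("movement", "meaning_extraction", ["bookshelves", "library"])
print(agent.fuse("meaning_extraction")["complementary"])  # True
```

Buffering by stage rather than by modality is what lets fusion happen at either the feature or the decision level, as the principles slide states.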
Fusion Mechanism – OpenInterface
OI Modeling Tool
Fusion Mechanism – Instance
Scenario
"Maybe I can find a book about it in the library."
Ronald is moving towards the bookshelves.
Results
- Managed spatial relationships based on the fixed objects in the room.
- Made semantic fusion of events not coinciding in time.
- Achieved good results in speaker identification, with synchronization between image and speech identification.
- Created an open framework to manage fusion between two (in our case) or more modalities (in enhanced future work).
- Designed the system so that each component can run on a separate machine, thanks to a distribution mechanism interchanging data through a TCP/IP network.
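The distribution mechanism mentioned above, where components on separate machines exchange data over TCP/IP, could look roughly like this minimal sketch. The JSON payload shape and the loopback setup are assumptions for illustration, not the project's actual wire protocol.

```python
# Minimal illustration of two components exchanging annotated segments as
# JSON over TCP; 127.0.0.1 stands in for a remote machine.

import json
import socket
import threading

def component_server(sock):
    conn, _ = sock.accept()
    with conn:
        payload = json.loads(conn.recv(4096).decode())  # receive annotated segment
        payload["fused"] = True                         # stand-in for fusion work
        conn.sendall(json.dumps(payload).encode())

server = socket.socket()
server.bind(("127.0.0.1", 0))  # ephemeral port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=component_server, args=(server,))
t.start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(json.dumps({"modality": "speech", "concepts": ["library"]}).encode())
reply = json.loads(client.recv(4096).decode())
client.close()
t.join()
server.close()
print(reply)  # {'modality': 'speech', 'concepts': ['library'], 'fused': True}
```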
Next Steps
- Implement the segmentation and annotation of 3D content.
- Migrate the framework to a real-time implementation.
- Evaluate other methods under the rules of the framework.
- Continuously extend the framework to support other fusion concepts and implementation methods.
Thank you for your attention!