CCM oi: IST project COMIC. Vision and Approach: Results of the First 1.5 Years

Post on 17-Dec-2015






  • Slide 1
  • IST project COMIC: Vision and Approach. Results of the first 1.5 years.
  • Slide 2
  • Vision of COMIC: multimodal interaction will only be accepted by non-expert users if the fundamental cognitive interaction capabilities of human beings are properly taken into account.
  • Slide 3
  • Approach of COMIC: obtain fundamental knowledge on multimodal interaction, i.e. the use of speech, pen, and facial expressions.
  • Slide 4
  • Approach (2): develop new approaches for component technologies, guided by human-factors experiments.
  • Slide 5
  • Approach (3): obtain hands-on experience by building an integrated multimodal demonstrator for bathroom design that combines new approaches for: automatic speech recognition; automatic pen-gesture recognition; fusion; dialogue and action management; fission; output generation combining text, speech, and facial expression; system integration; cognitive knowledge.
  • Slide 6
  • The partners of COMIC: Max Planck Institute for Psycholinguistics (fundamental cognitive research); Max Planck Institute for Biological Cybernetics (fundamental cognitive research); University of Nijmegen (ASR and AGR); University of Sheffield (dialogue and action); University of Edinburgh (fission and output); DFKI (fusion and system integration); ViSoft (graphical part of the demonstrator).
  • Slide 7
  • This presentation: explanation of the demonstrator; results of fundamental cognitive research (multimodal interaction, facial expressions); results of human-factors experiments.
  • Slide 8
  • The COMIC demonstrator: bathroom design for non-expert users. The final goal is to implement 4 phases: (1) input the shape and dimensions of one's own bathroom (pen and speech input); (2) choose the position of sanitary ware (based on templates); (3) conversational dialogue about types of sanitary ware and tiles; (4) 3D view of the negotiated bathroom. The result is taken to an expert salesman, who proceeds from there.
  • Slide 9
  • The COMIC demonstrator, three versions. T12: proof of technical integration of all modules. T24: limited functionality (fixed bathroom, tiles only). T36: full functionality (own bathroom, sanitary ware, tiles).
  • Slide 10
  • T12 Demonstrator
  • Slide 11
  • Fundamental Research on Human-Human Multimodal Interaction
  • Slide 12
  • The SLOT research platform: recording dyadic, natural interactions; a route-negotiation task with road maps; use of electronic pen/ink for drawing routes; elaborate, theory-free coding of the data; systematic manipulation of the available modalities (drawing, visual contact).
  • Slide 13
  • (image-only slide)
  • Slide 14
  • Results: quantitative analysis of turn-taking behaviour; 4x4 dyads, 6 hours of annotated interaction. Normally there is no delay between people's turns. With a one-way mirror, the participant who cannot be seen is slower to take up her turn. This leads to longer silent periods (pauses), which leads to significantly slower communication.
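The pause measurement described above can be sketched in a few lines. This is an illustrative reconstruction, not the project's actual analysis code: turns are assumed to be annotated as (speaker, start, end) tuples in seconds, and the floor-transfer offset is the gap between one speaker's turn end and the other speaker's next turn start (positive = pause, negative = overlap).

```python
# Sketch: compute floor-transfer offsets from annotated turns.
# Turn format (speaker, start, end) is an assumption for illustration.

def turn_gaps(turns):
    """Return gaps (in seconds) at each change of speaker."""
    ordered = sorted(turns, key=lambda t: t[1])   # sort by start time
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[0] != cur[0]:                     # speaker change only
            gaps.append(round(cur[1] - prev[2], 2))  # next start - prev end
    return gaps

# Toy example: B responds promptly, then A is slow to take up her turn.
turns = [("A", 0.0, 2.1), ("B", 2.2, 4.0), ("A", 5.0, 6.5)]
print(turn_gaps(turns))  # [0.1, 1.0]
```

Averaging such gaps per condition (visual contact vs. one-way mirror) is what makes the "longer silent periods" claim quantitative.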
  • Slide 15
  • Possible relevance for HCI: in conversational HCI with a talking head, the user sees the computer's face and might assume that the computer sees his or her face. Speech recognition has a hard time reliably detecting end-of-speech acoustically. Therefore we hypothesize that the user will notice (even more) that the computer responds very slowly.
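Why acoustic end-of-speech detection makes the system slow can be illustrated with a minimal energy-based endpointer. This is a generic sketch, not COMIC's actual ASR front end, and the threshold and hang-over values are invented for illustration: the detector can only declare end-of-speech after the signal has stayed quiet for a full hang-over period, so the system's response is delayed by at least that long.

```python
# Sketch of a simple energy-based end-of-speech detector.
# threshold and hangover_frames are illustrative values, not COMIC's.

def end_of_speech(frame_energies, threshold=0.1, hangover_frames=30):
    """Return the frame index at which end-of-speech is declared,
    or None if the utterance never ends within the given frames."""
    quiet = 0
    in_speech = False
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            in_speech = True   # speech detected; reset the silence counter
            quiet = 0
        elif in_speech:
            quiet += 1
            if quiet >= hangover_frames:
                return i       # declared only after a full quiet hang-over
    return None

# Speech for 50 frames, then silence: the endpoint fires 30 frames
# (roughly 300 ms at a 10 ms frame rate) after the speech actually stopped.
energies = [0.5] * 50 + [0.01] * 60
print(end_of_speech(energies))  # 79
```

That built-in lag, on top of recognition and dialogue processing, is what the user is hypothesized to notice.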
  • Slide 16
  • Fundamental research on facial expressions. Faces do a lot in a conversation: lip motion for speaking; emotional expression (pleasure, surprise, fear); dialogue flow (back-channeling: confusion, comprehension, agreement); co-expression (emphasis and word/topic stress). Most work on avatars focuses exclusively on lip motion for speech. We aim to broaden the capabilities of avatars, allowing for more sophisticated self-expression and more subtle dialogue control. To this end, we use psychophysical knowledge and procedures as a basis for synthesizing human conversational expressions.
  • Slide 17
  • First step: real expressions. We recorded a variety of conversational expressions from several individuals, and then experimentally determined how identifiable and believable those expressions were. In general, we found that: the expressions were easily recognized, even in the complete absence of conversational context (and thus can be useful for back-channeling); the pattern of confusions between expressions indicates potential trouble areas (e.g., thinking was often mistaken for confusion!); these (enacted) expressions were not always recognized or found to be completely sincere (speech might help here).
  • Slide 18
  • Next step: what moves when? We are now performing a fine-grained analysis of the necessary and sufficient components of conversational facial motion. What must move, and when, to produce an identifiable and believable expression?
  • Slide 19
  • Relevance for HCI and eCommerce: psychophysical studies of real expressions offer strong insights into how one can produce identifiable, realistic, and believable conversational expressions. The expansion of avatars' expressive capabilities promises to improve the ease of use of HCI systems.
  • Slide 20
  • Human-factors experiments guiding the technology: the University of Nijmegen investigated input issues (ASR, AGR, fusion); the University of Edinburgh investigated output issues (text, graphics, face, fission).
  • Slide 21
  • Human-factors experiments, exploratory pilot studies: Can users combine pen and speech for entering data about the layout of a room? Do they like it, and what do they prefer? System-driven vs. mixed-initiative dialogues; pen+speech data acquisition and analysis.
  • Slide 22
  • HF experiments, input. Task: study a blueprint and specify it using speech and/or pen. Subjects had to specify the positions and lengths of walls, doors, windows, and sanitary ware. The experiment is directly related to phase 1 of the demonstrator.
  • Slide 23
  • HF experiments, main results: subjects prefer gestures and speech, or gestures only; speech only is not preferred. Subjects show large variation in behaviour, even when restricted to narrowly defined tasks. Subjects prefer mixed-initiative dialogue; system-driven dialogue results in fewer errors, but requires more time.
  • Slide 24
  • HF experiments, speech. Subjects use three types of speech comments. Within task: "here is a wall with width 3 metre 40". Out-of-task, within dialogue: "now I am going to draw the next wall". Out-of-dialogue: "I hope I'm drawing this in the right way...".
  • Slide 25
  • HF experiments, pen: large variation in graphical symbols, deictic gestures, and handwriting.
  • Slide 26
  • Human-factors experiments, output. Fission module: translates abstract dialogue acts into specifications for the output channels. Goal: model the choices made in the COMIC fission module after naturally occurring interactions. Question: what are important natural actions in multimodal dialogue?
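The fission step described above can be sketched as a mapping from one abstract dialogue act to coordinated per-channel specifications. The act names, channel names, and content fields below are invented for illustration; they are not the COMIC module's actual interface.

```python
# Hypothetical sketch of fission: one abstract dialogue act becomes
# coordinated specs for the speech, face, and graphics channels.

def fission(dialogue_act):
    """Translate one abstract dialogue act into per-channel output specs."""
    act, content = dialogue_act["act"], dialogue_act["content"]
    if act == "describe-option":
        return {
            "speech":   {"text": f"This design uses {content['name']}."},
            "face":     {"expression": "neutral", "gaze": "screen"},
            "graphics": {"highlight": content["id"]},  # deixis via display
        }
    if act == "compare-options":
        names = " and ".join(o["name"] for o in content["options"])
        return {
            "speech":   {"text": f"Compare {names}."},
            "face":     {"expression": "interested", "gaze": "screen"},
            "graphics": {"highlight": [o["id"] for o in content["options"]]},
        }
    raise ValueError(f"unknown dialogue act: {act}")

# Invented example act and tile name, for illustration only.
specs = fission({"act": "describe-option",
                 "content": {"id": "tile-17", "name": "Alpina blue"}})
print(specs["graphics"])  # {'highlight': 'tile-17'}
```

The annotation work on the Wizard-of-Oz recordings is what would supply the rules for choosing, per act, which channels carry the content and how the outputs are timed against each other.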
  • Slide 27
  • Human-factors experiments, output: Wizard-of-Oz recordings. Setup of the recordings: subjects (native English speakers, not bathroom-design experts) played the role of a bathroom sales consultant presenting a range of options to the client. Total recordings: 7 interactions, approximately 2.5 hours of video.
  • Slide 28
  • Making use of the recordings: annotation. Focus on scenes where the consultant says things similar to the planned system output, in particular descriptions and comparisons of options. Mark up surface features like those under the control of the fission module, and factors predicted to have an effect on those features.
  • Slide 29
  • Making use of the recordings: using the results. Examine the range of surface features (deictic gestures, prosody, facial expressions, and gaze), both their occurrence and their timing, and the correlation between features and factors such as description vs. comparison, first vs. repeated mention, and positive vs. negative context. Use these results in the development of the fission module.
  • Slide 30
  • Sample comparison: "So they give you a degree of colour, they're slight... they're obviously slightly busier than looking at something like this, but, umm, they're not quite as intense as having a whole block of colour, such as those two."
  • Slide 31
  • Towards T24: presentation by ViSoft.