Realizing the Interactive Speech Interface in a Multi-user Virtual Environment Advisor Tsai-Yen Li Author Chun-Feng Liao NCCU Department of Computer Science Intelligent Media Lab July 2004


Realizing the Interactive Speech Interface in a Multi-user Virtual Environment

Advisor: Tsai-Yen Li

Author: Chun-Feng Liao

NCCU Department of Computer Science, Intelligent Media Lab

July 2004

Agenda

• Introduction
• Related Work
• Dialog Management in MUVE (Multi-user Virtual Environment)
• The Design of XAML-V Dialog Scripting Language
• System Implementation and Design
• Conclusion

(Topics run from more abstract / high level to more concrete / low level.)


Introduction

• Applications of 3D virtual environments and voice user interfaces (VUI) have received significant attention recently.
• Incorporating a VUI into virtual environments can enhance user interaction and immersiveness.
• Most related research does not provide an effective mechanism for multi-user dialog management.

Contributions of this Research

1. Suggest a MUVE dialog model based on the VoiceXML dialog model.
2. Propose a way to integrate a speech interface into a MUVE.
3. XAML-V: extend XAML to provide a speech-enabled interactive animation scripting language.
4. Deal with the implementation problems of XAML-V using software patterns as recipes.

MUVE = Multi-user Virtual Environment


VUI / VE Integration Problems

[McGlashan 95] identified 3 types of Virtual Environment – VUI integration problems:
• Speech Recognition
• Language Understanding
• Interaction Metaphor

Scott McGlashan is the editor-in-chief of W3C VoiceXML 2.0.

Integration Considerations

Client Interface
• Ad hoc [Cernak02]
• VRML - EAI - JSAPI [Wauchope03] [O.Apaydin02] [Descamps01]

Dialog Management
• Database: [Wauchope03]
• IDE: [Cernak02]
• Scripting Language:
- Based on VoiceXML: DialogXML [Nyberg02] and Galatea [Sagayama03]
- Customized: MPML-VR [Descamps01]

VUI Integration

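IMNet – A Client-Server MUVE System

[Figure: clients IMClientA, IMClientB, and IMClientC connect to the IMNet Server. One client sends a message to the server, and the server broadcasts it to the other clients.]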
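The send/broadcast pattern in this diagram can be sketched in a few lines. This is a minimal in-memory illustration and not IMNet's actual code: the class names `RelayServer` and `Client`, their methods, and the sample message are hypothetical, and it assumes the server does not echo a message back to its sender. Real clients would talk to the server over the network.

```python
class Client:
    """Hypothetical stand-in for an IMNet client; it stores what it receives."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, sender, message):
        self.inbox.append((sender, message))


class RelayServer:
    """Hypothetical stand-in for the server: relays each message to all other clients."""

    def __init__(self):
        self.clients = {}

    def register(self, client):
        self.clients[client.name] = client

    def send(self, sender, message):
        # Assumption: broadcast to every registered client except the sender.
        for name, client in self.clients.items():
            if name != sender:
                client.receive(sender, message)


server = RelayServer()
a, b, c = Client("IMClientA"), Client("IMClientB"), Client("IMClientC")
for client in (a, b, c):
    server.register(client)

# IMClientA sends one message; the server broadcasts it to B and C.
server.send("IMClientA", '<AnimItem DEF="Wave"/>')
```

Routing everything through one server keeps the clients simple, which is also why every client sees every message unless something filters them.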

Animation Script Language

• Using high-level scripts to control animated characters is not a new idea.
• AML (Avatar Markup Language) focuses on synchronization of facial expression and voice.
- It lacks the function to extract or modify an existing animation.
• STEP (Scripting Technology for Embodied Persona) can compose new animations from existing animation components.
- It falls short on specifying detailed animation attributes.

XAML (eXtensible Animation Markup Language)

• Describes character animations at various command levels.
• Developers can compose a new animation from existing animation clips.
• The syntax is extensible by providing plug-in modules.

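VoiceXML

• VoiceXML 1.0 was released by the VoiceXML Forum in 2000 and submitted to the W3C the same year.
• Used in interactive telephony applications.
• Based on HTTP, using a form-based dialog model.

[Figure: a VoiceXML client exchanging documents with a server.]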

VoiceXML: An Example

<vxml version="2.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
    </field>
    <block>
      <submit next="http://www.drink.example.com/drink2.asp"/>
    </block>
  </form>
</vxml>
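To make the form-based dialog model concrete, the sketch below walks the example form: it collects a value for each field and then builds the URL that the submit element would request. It is only an illustration under stated assumptions. The function `run_form` and the canned `answers` dictionary are hypothetical; a real VoiceXML interpreter would play the prompt, run speech recognition, follow the full form interpretation algorithm, and actually send the HTTP request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

VXML = """
<vxml version="2.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
    </field>
    <block>
      <submit next="http://www.drink.example.com/drink2.asp"/>
    </block>
  </form>
</vxml>
"""


def run_form(vxml_text, answers):
    """Visit each field, take the canned answer, and build the submit URL."""
    form = ET.fromstring(vxml_text.strip()).find("form")
    prompts = []
    collected = {}
    for field in form.findall("field"):
        name = field.get("name")
        prompts.append(field.findtext("prompt").strip())
        # Assumption: the recognizer result is supplied by the caller.
        collected[name] = answers[name]
    target = form.find("block/submit").get("next")
    # The collected field values travel as query parameters.
    return prompts, target + "?" + urlencode(collected)


prompts, url = run_form(VXML, {"drink": "tea"})
```

Here `url` ends with `?drink=tea`, which is the point of the model: each filled form turns into an HTTP request whose response is the next dialog document.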

VoiceXML Dialog Model Architectural View


Definitions & Notations

• Dialog: exactly two avatars concentrating on interacting with each other.
• Subjects: avatars in a dialog.
• Observers: avatars not in a dialog.
• U: avatars controlled by a human.
• S: avatars controlled by the system.
• Suffix s: subject avatars.
• Suffix i (i = 1, 2, 3, …): observer avatars.

[Figure: subjects Ss and Us in a dialog, watched by observer Ui.]

VoiceXML Dialog Model

• VoiceXML was originally designed for dialogs in telephony systems.
• In most telephony applications only two parties interact.

Problems with the VoiceXML Dialog Model in MUVE (1)

[Figure: conceptually Us talks with Ss, but actually Us talks with the Document Server. Ss, connected through the IMNet Server, asks: "What is the dialog status with Us?"]

Problems with the VoiceXML Dialog Model in MUVE (2)

[Figure: Us1 says "I'm talking with Ss." Conceptually the users talk with Ss, but Us1 and Us2 actually talk with the Document Server. Ss asks "Who is talking with me?" and the VRML Browser asks "What should I draw?"]

Problems with the VoiceXML Dialog Model in MUVE (3)

• Conceptually, Us is having a dialog with Ss.
• Actually, Us is interacting with the Document Server, which carries Ss's dialog script.
• VoiceXML lacks a dialog locking mechanism.
• The unmodified VoiceXML dialog model therefore does not fit a MUVE.

Proposed Dialog Model in MUVE

We enhance the original VoiceXML dialog model to fix these problems:
• Proxy Request
• Dialog Lock
• Dialog State
• Dialog Negotiation

Proxy Request

[Figure: conceptually Us talks with Ss. Actually, Ss proxies the HTTP request to the Document Server on behalf of Us.]

Benefits of the Proxy Request Model

By applying this model in a MUVE we gain the following benefits:
• Us is not aware of the Document Server, which provides the flexibility to switch between different roles.
• Ss is aware of the dialog status with Us.

Dialogs without Dialog Lock

[Figure: avatars B and C both talk to avatar A. Legend: speech output from A; speech input to A.]

• It is impossible for A to accept speech input from multiple avatars at the same time.
• A will be confused if B and C talk to it at the same time.

Dialog Lock

• We suggest that only two avatars can be in a dialog at the same time.
• A Dialog Lock mechanism is used to enforce this constraint.
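One way to picture this constraint is a per-avatar lock that admits a single partner. The sketch below is an assumption-laden illustration, not the mechanism implemented in the thesis: the `DialogLock` class and its `acquire` and `release` methods are hypothetical names, and it ignores the networking that a real MUVE would need.

```python
class DialogLock:
    """Hypothetical per-avatar lock: at most one dialog partner at a time."""

    def __init__(self):
        self.partner = None

    def acquire(self, requester):
        # Grant the lock only if the avatar is not already in a dialog.
        if self.partner is None:
            self.partner = requester
            return True
        return False

    def release(self, requester):
        # Only the current partner may end the dialog.
        if self.partner == requester:
            self.partner = None
            return True
        return False


lock_of_a = DialogLock()
c_granted = lock_of_a.acquire("C")  # C starts a dialog with A
b_granted = lock_of_a.acquire("B")  # B is refused while A talks with C
lock_of_a.release("C")              # the dialog between A and C ends
b_retry = lock_of_a.acquire("B")    # now B can start a dialog with A
```

With the lock in place, B's speech never reaches A as dialog input while A is busy with C, which removes the ambiguity described on the previous slide.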

Dialog with Dialog Lock

[Figure: A is currently in a dialog with C, protected by a Dialog Lock. Dialog scripts flow between A and C, while broadcasting scripts reach B. Legend: speech output from A; speech input to A.]

Dialog States

Initialize a Dialog
1. The requester enters negotiation-state and sends a dialog request message.
2. The responder enters negotiation-state and sends a dialog accept message.
3. The requester enters in-dialog-state and sends a dialog ack message.
4. The responder enters in-dialog-state, fetches the first xaml-v script, and sends it.

Stop a Dialog
1. One side sends an end dialog message and enters not-in-dialog-state.
2. The other side enters not-in-dialog-state.
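The state changes listed on this slide can be written as a small transition table. The following is a sketch under assumptions rather than the thesis implementation: the slide does not say which side performs each step, so the table assumes a requester/responder handshake (request, accept, ack), and the names `State`, `TRANSITIONS`, `Avatar`, and `deliver` are hypothetical. The `start` and `stop` events stand for local user actions.

```python
from enum import Enum


class State(Enum):
    NOT_IN_DIALOG = "not-in-dialog-state"
    NEGOTIATION = "negotiation-state"
    IN_DIALOG = "in-dialog-state"


# (current state, incoming event) -> (next state, message to send or None)
TRANSITIONS = {
    (State.NOT_IN_DIALOG, "start"): (State.NEGOTIATION, "dialog request"),
    (State.NOT_IN_DIALOG, "dialog request"): (State.NEGOTIATION, "dialog accept"),
    (State.NEGOTIATION, "dialog accept"): (State.IN_DIALOG, "dialog ack"),
    (State.NEGOTIATION, "dialog ack"): (State.IN_DIALOG, "first xaml-v script"),
    (State.IN_DIALOG, "stop"): (State.NOT_IN_DIALOG, "end dialog"),
    (State.IN_DIALOG, "end dialog"): (State.NOT_IN_DIALOG, None),
}


class Avatar:
    def __init__(self, name):
        self.name = name
        self.state = State.NOT_IN_DIALOG

    def handle(self, event):
        """Apply one event; events with no transition leave the state unchanged."""
        next_state, reply = TRANSITIONS.get((self.state, event), (self.state, None))
        self.state = next_state
        return reply


def deliver(first_event, sender, receiver):
    """Bounce messages between two avatars until one has nothing to send."""
    log = []
    message = sender.handle(first_event)
    while message is not None:
        log.append((sender.name, message))
        sender, receiver = receiver, sender
        message = sender.handle(message)
    return log


us, ss = Avatar("Us"), Avatar("Ss")
setup_log = deliver("start", us, ss)
states_after_setup = (us.state, ss.state)
teardown_log = deliver("stop", us, ss)
```

Because a dialog request is only accepted from not-in-dialog-state, an avatar that is negotiating or already in a dialog simply ignores further requests in this sketch.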

Summary : Proposed Dialog Model in MUVE

[Figure: summary diagram of the proposed dialog model between Ss and Us.]


The XAML Scripting Language

<AnimItem DEF="WaveWalk" cycle="2000">
  <AnimImport src="Walk"/>
  <AnimItem DEF="SimpleWave" cycle="1000">
    <Node target="r_shoulder">
      <OrientationInterpolator key="…" keyValue="…"/>
    </Node>
  </AnimItem>
</AnimItem>

XAML & XAML-V

[Figure: relationship between XAML and XAML-V.]

XAML-V Features

• Extension of the XAML scripting language.
• Subset of VoiceXML.
• Supports form-level and field-level animations.
• Realizes the concepts discussed in the previous section:
- Dialog negotiation
- Proxy request
- Broadcasting

Nested Plug-in Syntax

Dialog Negotiation

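How XAML-V Realizes the Proxy Request Model

[Figure: message flow among Us, Ss, and the Document Server.]

1. Us sends a proxy request message to Ss.
2. Ss issues the HTTP command: http://xxx/helloFormResponse.jsp?helpType=no%20thanks
3. The Document Server returns the HTTP response to Ss.
4. Ss returns the requested dialog script to Us.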
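The four numbered steps can be sketched as a single call path. This is an illustration under stated assumptions, not the actual XAML-V code: `SystemAvatar`, `proxy_request`, and the `document_server` stub are hypothetical names, the stub replaces a real HTTP fetch so the example needs no network, and the host `xxx` is kept exactly as elided on the slide.

```python
from urllib.parse import quote, urlencode


def document_server(url):
    """Hypothetical stand-in for the HTTP document server (no network I/O)."""
    return "<xaml-v script for " + url + ">"


class SystemAvatar:
    """Ss: knows where the dialog scripts live and proxies requests for Us."""

    def __init__(self, base_url, fetch):
        self.base_url = base_url
        self.fetch = fetch
        self.history = []  # lets Ss track the dialog status with Us

    def proxy_request(self, page, fields):
        # Step 2: issue the HTTP command on behalf of Us.
        query = urlencode(fields, quote_via=quote)  # encodes spaces as %20
        url = self.base_url + "/" + page + "?" + query
        # Step 3: receive the HTTP response from the document server.
        script = self.fetch(url)
        self.history.append(url)
        # Step 4: return the requested dialog script to Us.
        return script


ss = SystemAvatar("http://xxx", document_server)
# Step 1: Us sends a proxy request message to Ss.
script = ss.proxy_request("helloFormResponse.jsp", {"helpType": "no thanks"})
```

Since every request passes through Ss, Ss always knows how far the dialog has progressed, and Us never needs the Document Server's address.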

End Dialog

Summary: Benefits of XAML-V

• Extended from the XAML animation script, XAML-V inherits its strong animation functions.
• Scripts can be dynamically generated by various server-side scripting technologies (e.g., JSP or ASP).
• The dialog model works in a MUVE.


System Design and Implementation

• System Architecture
• XAML-V Component
• Example Scenario
• Video Demo

XAML-V Architecture in MUVE

XAML-V Implementation

• The XAML platform delegates XAML-V scripting elements to the VoicePluginObject.
• Embedded animations are sent back to the Animation Manager.
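The delegation described here might look roughly like the sketch below. Everything in it is assumed for illustration: the script layout is invented and is not actual XAML-V syntax, and `AnimationManager`, `VoicePlugin`, and `run` are hypothetical names that only mimic the roles of the Animation Manager and the VoicePluginObject.

```python
import xml.etree.ElementTree as ET

# Invented script layout for illustration only (not real XAML-V syntax).
SCRIPT = """
<AnimItem DEF="Greeting">
  <form>
    <field name="helpType">
      <prompt>May I help you?</prompt>
      <AnimItem DEF="Wave"/>
    </field>
  </form>
</AnimItem>
"""


class AnimationManager:
    """Hypothetical animation side: records which animations it plays."""

    def __init__(self):
        self.played = []

    def play(self, element):
        self.played.append(element.get("DEF"))


class VoicePlugin:
    """Hypothetical plug-in: handles speech elements, returns embedded animations."""

    def __init__(self):
        self.prompts = []

    def handle(self, element):
        embedded = []
        for node in element.iter():
            if node.tag == "prompt":
                self.prompts.append(node.text.strip())
            elif node.tag == "AnimItem":
                embedded.append(node)
        return embedded


def run(script_text, manager, plugins):
    root = ET.fromstring(script_text.strip())
    manager.play(root)
    for child in root:
        plugin = plugins.get(child.tag)
        if plugin is not None:
            # Delegate the element, then play whatever the plug-in sends back.
            for animation in plugin.handle(child):
                manager.play(animation)


manager, voice = AnimationManager(), VoicePlugin()
run(SCRIPT, manager, {"form": voice})
```

Keeping the plug-ins in a tag-to-handler mapping means the core platform does not need to know about speech at all, which matches the extensibility goal stated for XAML.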

Implementing XAML-V Components with Software Patterns

XAML-V Components Deployment Diagram

Message Monitor

• Intercepts all messages passed by the server.

Client

[Figure: two client scenarios: Us talks to Ss; Us talks to Us.]

Example XAML-V Script

index.jsp

helpFormResponse.jsp

Result: Subject’s View

Result: Observer’s View

Video DEMO

Conclusion

We believe that integrating a speech interface will let users communicate in a more natural way in a MUVE.

In this thesis, we:
• Enhance the MUVE by integrating it with a speech interface.
• Suggest a new dialog model, based on the VoiceXML dialog model, that works properly in a MUVE.
• Design the XAML-V dialog scripting language to realize the suggested dialog model.
• Implement the XAML-V platform using software patterns.

Future Work

• Face animation.
• Consider the range between avatars.
• Consider 3D sound (sound direction and volume).
• Add camera control to XAML-V.
• More sophisticated synchronization between animation and the speech interface.
• Performance optimization.

Q & A

Backup

Research Objective

• Provide a solution for VUI integration.
• Design a dialog management mechanism for a multi-user virtual environment (MUVE).
• Realize such a mechanism.

Solving Synchronization Problems when Establishing a Dialog

• Dialog States
• Synchronization Mechanism
• Time-out

Each client may negotiate with only one other client at a time.

Using Time-out to Prevent Infinite Pending
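A time-out of this kind can be sketched as a deadline attached to the pending negotiation. The code below is only an assumption about how such a rule could work, not the thesis implementation: `NegotiatingClient`, `Negotiation`, and the 5-second default are hypothetical, and the current time is passed in explicitly so the behavior is easy to follow.

```python
class Negotiation:
    """Hypothetical record of a pending negotiation and its deadline."""

    def __init__(self, partner, started_at, timeout):
        self.partner = partner
        self.deadline = started_at + timeout


class NegotiatingClient:
    def __init__(self, timeout=5.0):
        self.timeout = timeout  # assumed value, in seconds
        self.pending = None     # at most one negotiation at a time

    def expire(self, now):
        # Drop a negotiation whose partner never answered in time.
        if self.pending is not None and now >= self.pending.deadline:
            self.pending = None

    def request_dialog(self, partner, now):
        self.expire(now)
        if self.pending is not None:
            return False  # still negotiating with someone else
        self.pending = Negotiation(partner, now, self.timeout)
        return True


client = NegotiatingClient(timeout=5.0)
first = client.request_dialog("B", now=0.0)   # starts negotiating with B
second = client.request_dialog("C", now=2.0)  # refused: B is still pending
third = client.request_dialog("C", now=6.0)   # B timed out, so C may proceed
```

Without the deadline, a partner that never replies would leave the client stuck in negotiation forever and block every later request.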

Protocol Framework

Dialog Lock

XAML-V Interpreter

Input Device

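Please come and show your support

Time: July 14, 2004 (Tue), 10:00 AM
Venue: Computer Center, 2nd-floor conference room
Candidates: 黃培智, 廖峻鋒

Benefits of attending the oral defense:
• Observe how others set up and run their defense, as a reference for your own future defense.
• Learn a great deal from the exchange between the candidates and the professors, and experience the tense atmosphere firsthand.
• Enjoy fine snacks and drinks.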