Speech Technologies and VoiceXML Chun-Feng Liao NCCU Department of Computer Science Intelligent Media Lab [email protected]

Post on 03-Jan-2016

Page 1: Speech Technologies and VoiceXML

Speech Technologies and VoiceXML

Chun-Feng Liao
NCCU Department of Computer Science
Intelligent Media Lab
[email protected]

Page 2: Speech Technologies and VoiceXML

Presentation Agenda

Voice technologies background
• ASR/TTS
Voice browsing with VoiceXML
VoiceXML architecture
VoiceXML programming
Future of VoiceXML
Summary

Page 3: Speech Technologies and VoiceXML

References

[1] Bob Edgar (2001), "The VoiceXML Handbook", NY: CMP Books.
[2] Dave Raggett (2001), "Getting Started with VoiceXML 2.0", W3C.
[3] Sun Microsystems (1998), "Java Speech Grammar Format Specification v1.0", Sun Microsystems.
[4] Chetan Sharma and Jeff Kunins (2002), "VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0", Wiley.
[5] Brian Eberman, Jerry Carter, Darren Meyer, David Goddeau (2002), "Building VoiceXML Browsers with OpenVXI", NY: ACM Press.

Page 4: Speech Technologies and VoiceXML

References

[6] Microsoft (2002), "Speech Technology Overview", http://www.microsoft.com/speech/evaluation/techover/
[7] VoiceGenie Technologies Inc. (2001), "White Paper: Speaking Freely About the VoiceGenie VoiceXML Gateway and the VoiceXML Interpreter", VoiceGenie Technologies Inc.
[8] W3C (2002), "VoiceXML Specification v2.0", W3C.

Page 5: Speech Technologies and VoiceXML

Voice Technologies

In the mid- to late 1990s, personal computers started to become powerful enough to support automatic speech recognition (ASR).

The two key underlying technologies behind these advances are speech recognition (SR) and text-to-speech synthesis (TTS).

Page 6: Speech Technologies and VoiceXML

Speech Recognition

Source: Microsoft Speech.NET Home (http://www.microsoft.com/speech/)

Page 7: Speech Technologies and VoiceXML

Speech Synthesis

Source: Microsoft Speech.NET Home (http://www.microsoft.com/speech/)

Page 8: Speech Technologies and VoiceXML

Pervasive Computing Model

E-business has shifted from a client-server model to a web-centric model.

Once connected to the Internet, one can get any information one wants. But people want a more convenient way to connect to the Internet.

Lou Gerstner, CEO of IBM: the pervasive computing model is a billion people interacting with a million e-businesses with a trillion devices interconnected.

Page 9: Speech Technologies and VoiceXML
Page 10: Speech Technologies and VoiceXML

Voice Browsing

VoiceXML instead of HTML
A voice browser instead of an ordinary web browser
A phone instead of a PC

Page 11: Speech Technologies and VoiceXML

VoiceXML Key Design Issues

Speech Input: speech recognition and DTMF

Speech Output: pre-recorded audio and synthesized speech

Internet: XML, IP, HTTP, SSL, JavaScript

Telephony: call transfer, data passing

Page 12: Speech Technologies and VoiceXML

W3C Voice Browser Working Group

Founded May 1999
60 member companies
Mission: standards group to prepare and review markup languages to enable Internet-based speech applications
http://www.w3.org/Voice

Page 13: Speech Technologies and VoiceXML

VoiceXML Forum

Industry group to promote VoiceXML
550+ member companies
Submitted VoiceXML 1.0 to W3C in May 2000
http://www.voicexml.org

Page 14: Speech Technologies and VoiceXML

• VoiceXML v1.0 (May 2000)
  • VoiceXML Forum
  • Specification submitted to the W3C
• VoiceXML v2.0
  • W3C Voice Browser Working Group
  • 50+ members collaborating
  • Addressed 400+ change requests

Page 15: Speech Technologies and VoiceXML

VoiceXML Overview

A language for specifying voice dialogs.
Voice dialogs use audio prompts and text-to-speech (TTS) for output; touch-tone keys (DTMF) and automatic speech recognition (ASR) for input.
The main input/output device (initially) is the phone.
Leverages the Internet for application development and delivery.
A standard language enables portability. (VoiceXML provides a single, unified dialog description language.)

Page 16: Speech Technologies and VoiceXML

VoiceXML Platform Architecture

Page 17: Speech Technologies and VoiceXML

VoiceXML Platform Architecture-1

Telephone and telephone network: connects the caller's telephone with the telephony server

VoiceXML Gateway
• Voice browser
• Audio input: speech recognition (ASR), touch-tone (DTMF), audio recording
• Audio output: audio playback, speech synthesis (TTS)
• Interface, call controls

Page 18: Speech Technologies and VoiceXML

VoiceXML Platform Architecture-2

VoiceXML documents
• Dialog and flow control
• Client-side scripting (ECMAScript)
• Speech recognition grammar
• Speech synthesis pronunciation control

Document servers (web servers)
• Serve static VoiceXML documents or audio files

Application servers
• Generate VoiceXML documents dynamically
• Server-side application logic
• Connect to a database or a database interface

Page 19: Speech Technologies and VoiceXML

Example: weather.jsp - VoiceXML and JSP

Web server + Servlet/JSP engine (backed by a DB), source of weather.jsp:

    <% user.storePreference("try"); %>
    <form>
      <block>Today's temperature is <%= weather.getTemp() %> degrees.</block>
    </form>

VoiceXML browser, document received:

    <form>
      <block>Today's temperature is 25 degrees.</block>
    </form>
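The same idea, an application server producing the VoiceXML document at request time, can be sketched outside JSP. The following is a minimal Python sketch, not the deck's implementation: the function name `render_weather_vxml`, its parameter, and the surrounding `<vxml>` wrapper are assumptions added for illustration, and the temperature would normally come from the database shown on the slide.

```python
from xml.sax.saxutils import escape


def render_weather_vxml(temperature_c: int) -> str:
    """Return a small VoiceXML document that announces the temperature.

    Hypothetical helper: in the slide, this role is played by weather.jsp.
    """
    # Escape the prompt text so characters such as & or < cannot break the XML.
    prompt = escape(f"Today's temperature is {temperature_c} degrees.")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        f"    <block>{prompt}</block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


if __name__ == "__main__":
    # 25 is the sample value used on the slide.
    print(render_weather_vxml(25))
```

The voice browser only ever sees the rendered document; the server-side logic that chose the number stays on the server, just as with an HTML page.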

Page 20: Speech Technologies and VoiceXML

Voice Gateway

Page 21: Speech Technologies and VoiceXML

Implementations of VoiceXML Gateways

In Taiwan:
• Yes Mobile
• Chunghwa Telecom Laboratories (second-generation voice platform)
• eWings Technologies, Inc.

Free:
• IBM VoiceServerSDK

Open source:
• CMU: OpenVXI

Page 22: Speech Technologies and VoiceXML

[DEMO] A Simple VoiceXML Application

Page 23: Speech Technologies and VoiceXML

DEMO

A simple VoiceXML application to introduce the Department of Computer Science.

Experience shows that building a corresponding HTML version first is helpful.

Page 24: Speech Technologies and VoiceXML

Document

A VoiceXML document defines one or more dialogs

The user is always in one dialog at any time

Each dialog specifies the next dialog to transition to using a URL

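[Figure: doc1.vxml contains Dialog 1 and Dialog 2. Two kinds of transition are shown: "#dialog 2" (to a dialog in the same document) and "http://xyz.com/doc2.vxml" (to another document).]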
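The two transition targets shown above differ only in how the URL resolves against the current document. A minimal Python sketch of that resolution follows; it assumes doc1.vxml lives at http://xyz.com/doc1.vxml and that the second dialog's id is `dialog2`, neither of which is stated on the slide, and `resolve_transition` is a hypothetical helper name.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag, urljoin


def resolve_transition(current_doc_url: str, next_ref: str) -> Tuple[str, Optional[str], bool]:
    """Resolve a dialog's 'next' reference against the current document URL.

    Returns (document URL, dialog id or None, stays in the same document?).
    """
    # A bare fragment such as "#dialog2" resolves to the current document.
    target = urljoin(current_doc_url, next_ref)
    doc_url, dialog_id = urldefrag(target)
    same_document = doc_url == urldefrag(current_doc_url)[0]
    return doc_url, (dialog_id or None), same_document


if __name__ == "__main__":
    base = "http://xyz.com/doc1.vxml"  # assumed location of doc1.vxml
    print(resolve_transition(base, "#dialog2"))
    print(resolve_transition(base, "http://xyz.com/doc2.vxml"))
```

When no dialog id is given, as in the second call, a VoiceXML interpreter starts with the first dialog of the target document.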

Page 25: Speech Technologies and VoiceXML

Dialog

A Dialog describes an interaction between a user and the system

Two kinds of dialogs: form and menu

Page 26: Speech Technologies and VoiceXML

VoiceXML Document Structure.

Page 27: Speech Technologies and VoiceXML

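Form

A form keeps collecting the information for its fields, as defined by the grammar.

    <form>
      <field name="travellers">
        <!-- input: what the caller may say -->
        <grammar mode="voice" src="./number.grxml"/>
        <!-- output: what the caller hears -->
        <prompt>How many are travelling?</prompt>
        <!-- eval: runs once the field has a value -->
        <filled>
          <submit next="http://travel.com/order"/>
        </filled>
      </field>
    </form>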

Page 28: Speech Technologies and VoiceXML

Menu

    <menu id="commands">
      <prompt>What service would you like?</prompt>
      <choice next="/cars">Car hire</choice>
      <choice next="/hotels">Hotel reservations</choice>
      <choice next="/news">Today's news</choice>
    </menu>

A menu is in effect a form without fields.

A menu is a flow-control mechanism: depending on the user's choice, it transitions to a different URL.

Page 29: Speech Technologies and VoiceXML

Submit

Typically used to send results from client to server

Syntax: <submit next="URI" namelist="var1 var2 ..."/>

namelist: specifies which fields are passed on to the next page.

Page 30: Speech Technologies and VoiceXML

Submit, Example

    <form>
      <field name="city">
        <prompt>Where do you want to go to?</prompt>
        <grammar mode="voice" src="./cities.grxml"/>
      </field>
      <field name="travellers">
        <prompt>How many are travelling to <value expr="city"/>?</prompt>
        <grammar mode="voice" src="./number.grxml"/>
      </field>
      <filled>
        Thank you. Your order is now being processed.
        <submit next="http://travel.com/order" namelist="city travellers"/>
      </filled>
    </form>
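To make the effect of namelist concrete, here is a rough Python sketch of the request a voice browser could issue for the submit above. It assumes the default HTTP GET method and two fields named `city` and `travellers` with made-up values; `build_submit_url` is a hypothetical helper, not part of any VoiceXML platform.

```python
from typing import Dict
from urllib.parse import urlencode


def build_submit_url(next_url: str, namelist: str, variables: Dict[str, object]) -> str:
    """Build the GET URL for a <submit>, sending only the variables in namelist."""
    names = namelist.split()  # namelist is a space-separated list of variable names
    missing = [n for n in names if n not in variables]
    if missing:
        raise KeyError(f"undeclared variables in namelist: {missing}")
    # Variables not listed in namelist are deliberately left out of the request.
    query = urlencode([(n, variables[n]) for n in names])
    return f"{next_url}?{query}"


if __name__ == "__main__":
    collected = {"city": "London", "travellers": 3, "scratch": "not sent"}
    print(build_submit_url("http://travel.com/order", "city travellers", collected))
```

The server at the next URL then answers with a new VoiceXML document, in the same way a web server answers an HTML form submission.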

Page 31: Speech Technologies and VoiceXML

Variables

Variables can be manipulated and referenced

• Declare: <field name="user2">
• Assign: <assign name="user1" expr="'peter'"/>
• Clear: <clear namelist="user1 user2"/>
• Reference: How many are travelling to <value expr="city"/>? (no $ prefix is needed when referencing a variable)

Page 32: Speech Technologies and VoiceXML

Variable Scope

Scopes, from outermost to innermost: session, application, document, dialog, and a local scope.

Session variables are "read-only" variables provided by the interpreter context.

The innermost (local) scope is defined by the element containing executable content (<block>, <filled> or an event handler).

The search for a variable name starts in the innermost scope and works outward.
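The lookup order can be imitated with a chain of dictionaries. This is only an illustrative Python sketch: the variable names and values are invented, and a real interpreter implements the scopes inside its ECMAScript engine rather than with `ChainMap`.

```python
from collections import ChainMap
from typing import Any, Dict


def build_scope_chain(
    dialog: Dict[str, Any],
    document: Dict[str, Any],
    application: Dict[str, Any],
    session: Dict[str, Any],
) -> ChainMap:
    """Chain the scopes so that lookups try the innermost scope first."""
    return ChainMap(dialog, document, application, session)


# Hypothetical contents for each scope.
session_vars = {"caller_id": "unknown"}
application_vars = {"greeting": "Welcome"}
document_vars = {"greeting": "Welcome to travel booking", "max_group": 10}
dialog_vars = {"travellers": 3}

scopes = build_scope_chain(dialog_vars, document_vars, application_vars, session_vars)

if __name__ == "__main__":
    # The document-level greeting shadows the application-level one.
    print(scopes["greeting"])
    # Names missing from the inner scopes fall through to the session scope.
    print(scopes["caller_id"])
```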

Page 33: Speech Technologies and VoiceXML

Error Handling: Events

Events are used to signal "unexpected" situations

Events are caught by a <catch> event handler
• <catch event="com.acme.mailreader">...</catch>
• <catch event="nomatch noinput">...</catch>
• Shortcut: <nomatch> is equivalent to <catch event="nomatch">
• Other shortcuts: <noinput>, <error>

Page 34: Speech Technologies and VoiceXML

Events, Example

    <field name="city">
      <prompt>Where do you want to go to?</prompt>
      <grammar mode="voice" src="./cities.grxml"/>
      <nomatch>Please say the city you want to fly to.</nomatch>
    </field>
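How a thrown event finds its handler can be sketched in a few lines of Python. This is a simplification under stated assumptions: it treats the event attribute as a space-separated list and accepts either an exact name or a dotted prefix (so a handler for `error` also catches `error.badfetch`), which is my reading of the VoiceXML 2.0 matching rule; it ignores handler scope and the count attribute, and the handler messages are placeholders.

```python
from typing import List, Optional, Tuple


def handler_matches(catch_events: str, thrown: str) -> bool:
    """True if any name in the catch list equals the event or is a dotted prefix of it."""
    for name in catch_events.split():
        if thrown == name or thrown.startswith(name + "."):
            return True
    return False


def dispatch(handlers: List[Tuple[str, str]], thrown: str) -> Optional[str]:
    """Return the message of the first handler (document order) that matches."""
    for events, message in handlers:
        if handler_matches(events, thrown):
            return message
    return None  # no handler here; a real platform would look in enclosing scopes


# Hypothetical handlers: (event attribute, what the handler says).
HANDLERS = [
    ("nomatch noinput", "Please say the city you want to fly to."),
    ("error", "Sorry, something went wrong."),
]

if __name__ == "__main__":
    print(dispatch(HANDLERS, "noinput"))
    print(dispatch(HANDLERS, "error.badfetch"))
```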

Page 35: Speech Technologies and VoiceXML

Multimodal Web Browsing

XHTML + VoiceXML
SALT

Page 36: Speech Technologies and VoiceXML

[DEMO] Multimodal Browsing

Page 37: Speech Technologies and VoiceXML

Future of the “Voice” web and VoiceXML

VoiceXML 1.0: VoiceXML Forum (2000)
VoiceXML 2.0: W3C (2003, in CR)

W3C:
• Speech synthesis (SSML)
• Speech recognition grammar
• NLP
• Speech semantics
• Pronunciation lexicon [early]
• Call control [early]
• Voice browser interoperation [early]

SALT (Speech Application Language Tags): Microsoft-led (2002)
JSML, JSGF: Sun/SpeechWorks (1999)

VoiceXML 3?

Page 38: Speech Technologies and VoiceXML

Conclusion

Speech is the most natural way for humans to communicate, so it will become an important mode of human-computer interaction (HCI).

VoiceXML has revolutionized the development and deployment of speech recognition and telephony applications.

Page 39: Speech Technologies and VoiceXML

Q & A

Page 40: Speech Technologies and VoiceXML

Backup

Page 41: Speech Technologies and VoiceXML

History of VoiceXML

Source: VoiceXML Forum (http://www.voicexml.org)

Page 42: Speech Technologies and VoiceXML

Show: VoiceXML in Daily Life

Applications

Page 43: Speech Technologies and VoiceXML

Classification of Voice Applications

Basic interactive voice response (IVR)
• Computer: "For stock quotes, press 1. For trading, press 2. …"
• Human: (presses DTMF "1")

Basic speech ASR
• C: "Say the stock name for a price quote."
• H: "Lucent Technologies"

Page 44: Speech Technologies and VoiceXML

Classification of Voice Applications

Advanced speech ASR
• C: "Stock Services, how may I help you?"
• H: "Uh, what's Lucent trading at?"

"Near-natural language" ASR
• C: "How may I help you?"
• H: "Um, yeah, I'd like to get the current price of Lucent Technologies"
• C: "Lucent is up two at sixty eight and a half."
• H: "OK. I want to buy one hundred shares at market price."
• C: "…"

Page 45: Speech Technologies and VoiceXML

Speech Recognition

Capturing speech (analog) signals
Digitizing the sound waves and converting them to basic language units, or phonemes
Constructing words from phonemes, and contextually analyzing the words to ensure correct spelling for words that sound alike (such as write and right)

Page 46: Speech Technologies and VoiceXML

Speech Synthesis

Speech synthesis, or text-to-speech, is the process of converting text into spoken language.
• Breaking down the words into phonemes
• Analyzing for special handling of text such as numbers and currency amounts
• Generating the digital audio for playback
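The "special handling" step is often called text normalization. The Python sketch below is a toy illustration only: it spells out whole numbers from 0 to 99 and whole-dollar amounts, leaves everything else untouched, and says nothing about how any particular TTS engine does this. The function names are invented for the example.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 99."""
    if not 0 <= n <= 99:
        raise ValueError("this toy normalizer only covers 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def _dollars(match: "re.Match[str]") -> str:
    amount = int(match.group(1))
    unit = "dollar" if amount == 1 else "dollars"
    return f"{number_to_words(amount)} {unit}"


def normalize_for_tts(text: str) -> str:
    """Expand whole-dollar amounts and small whole numbers into words."""
    # Currency first, so "$5" becomes "five dollars" rather than "$five".
    text = re.sub(r"\$(\d{1,2})\b", _dollars, text)
    # Then any remaining standalone one- or two-digit numbers.
    return re.sub(r"\b(\d{1,2})\b", lambda m: number_to_words(int(m.group(1))), text)


if __name__ == "__main__":
    print(normalize_for_tts("It is 25 degrees and the fare is $5"))
```

The normalized text is what would then be broken into phonemes and rendered as audio.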

Page 47: Speech Technologies and VoiceXML

VoiceXML Gateway (detail)

Page 48: Speech Technologies and VoiceXML

Programming VoiceXML

Writing a VoiceXML application is programming.

Control constructs are procedural (if-else etc.)

VoiceXML platform iterates through a <form> until values for all field items have been collected

Page 49: Speech Technologies and VoiceXML

VoiceXML System Components

VoiceXML servers serve as integrators of various hardware and software:
• Telecom boards
• PBX
• CT integration
• Speech synthesis (TTS)
• Speech recognition (SR)
• Speech grammars
• Voice biometrics
• Software utilities
• Call centre

Page 50: Speech Technologies and VoiceXML

FIA - Form Interpretation Algorithm

The FIA has a main loop that repeatedly selects a form item and then visits it

The first (in document order) form item, whose field item variable is undefined, is selected

As a result, the user is prompted for each field item in turn
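The select-then-visit loop described above can be sketched as follows. This is a deliberately reduced Python model, not the full algorithm from the VoiceXML specification: it covers only field items, ignores guard conditions, <filled> actions and events, and uses a hypothetical `ask` callback in place of prompting and recognition. The `max_visits` guard is also an addition so the toy loop always terminates.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_form(
    field_names: List[str],
    ask: Callable[[str], Optional[str]],
    max_visits: int = 20,
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Toy FIA main loop: select the first unfilled field, then visit it."""
    values: Dict[str, Optional[str]] = {name: None for name in field_names}
    visited: List[str] = []
    for _ in range(max_visits):  # guard so a silent caller cannot loop forever
        # Select: first field in document order whose variable is still undefined.
        pending = next((n for n in field_names if values[n] is None), None)
        if pending is None:
            break  # every field is filled, so the form is complete
        visited.append(pending)
        # Visit: prompt and listen; None models a failed attempt (no usable input).
        values[pending] = ask(pending)
    return values, visited


if __name__ == "__main__":
    # Scripted caller: the first attempt at the city fails, then both fields succeed.
    answers = iter([None, "London", "3"])
    print(run_form(["city", "travellers"], lambda name: next(answers)))
```

Because selection always restarts from the top of the form, a field that fails to fill is simply visited again, which is why the caller is re-prompted without any explicit loop in the VoiceXML document.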

Page 51: Speech Technologies and VoiceXML

FIA – Form Example

    <form>
      <block>
        <prompt>Where do you want to go to and how many are travelling?</prompt>
      </block>

      <!-- Field item 1 -->
      <field name="city">
        <prompt>Where do you want to go to?</prompt>
        <grammar mode="voice" src="./cities.grxml"/>
      </field>

      <!-- Field item 2 -->
      <field name="travellers">
        <prompt>How many are travelling to your destination?</prompt>
        <grammar mode="voice" src="./number.grxml"/>
      </field>
      <!-- other fields -->
    </form>

Page 52: Speech Technologies and VoiceXML

if, else and elseif

    <form>
      ...
      <filled>
        <if cond="travellers > 10">
          Sorry, we cannot handle groups larger than 10 persons.
          <clear namelist="travellers"/>
        <elseif cond="travellers > 5 &amp;&amp; city == 'London'"/>
          Sorry, we cannot handle groups larger than 5 persons travelling to London.
          <clear namelist="city travellers"/>
        <else/>
          <submit next="http://travel.com/order"/>
        </if>
      </filled>
    </form>
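The branching in this <filled> block is ordinary procedural logic, and it can help to read it as a plain function. The Python sketch below mirrors the three branches; `check_order` and its return shape (a message or the word "submit", plus the list of fields to clear) are invented for illustration and assume the destination field is named `city`.

```python
from typing import List, Tuple


def check_order(travellers: int, city: str) -> Tuple[str, List[str]]:
    """Mirror the if / elseif / else in the <filled> block.

    Returns (what to say or do, which field variables to clear).
    """
    if travellers > 10:
        return "Sorry, we cannot handle groups larger than 10 persons.", ["travellers"]
    elif travellers > 5 and city == "London":
        return (
            "Sorry, we cannot handle groups larger than 5 persons travelling to London.",
            ["city", "travellers"],
        )
    else:
        # Nothing to clear: the order goes to the server.
        return "submit", []


if __name__ == "__main__":
    print(check_order(12, "Paris"))
    print(check_order(6, "London"))
    print(check_order(6, "Paris"))
```

Clearing a field makes its variable undefined again, so the form's main loop selects that field on its next pass and the caller is asked again.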

Page 53: Speech Technologies and VoiceXML

JSML - JSpeech Markup Language

Developed by Sun and SpeechWorks as a markup language for text-to-speech dialogs.

Based on the Java Speech API Markup Language
http://java.sun.com/products/java-media/speech/

Text annotation to provide hints to speech synthesizers
• Aimed at making TTS speech more natural and more understandable

Feature set:
• Hints to word pronunciation
• Hints to phrasing, emphasis, pitch and speaking rate
• "Marker" elements: notifications from the speech synthesizer to applications when a marker is reached

Page 54: Speech Technologies and VoiceXML

JSGF - JSpeech Grammar Format

Developed by Sun and SpeechWorks as a syntax for expressing speech grammars

Based on the Java Speech API Grammar Format
http://java.sun.com/products/java-media/speech/

Page 55: Speech Technologies and VoiceXML

Microsoft's SALT

Speech Application Language Tags
• Microsoft, Cisco, Intel, Comverse, SpeechWorks, Philips

A “lightweight” set of tags designed to be used with HTML and XHTML to enable lightweight telephony applications driven from regular Web documents.

Targeted at supporting multimodal access

Page 56: Speech Technologies and VoiceXML