
Voice XML

Team 1: Matt Ganis, Jonathan Hill, Henry Wong

Anne I. Mannette-Wright


April 8, 2006 Team 1 VoiceXML

Agenda

• History of Voice Applications and VoiceXML
• Related Voice Type Languages
• Advantages of VoiceXML
• Architecture of VoiceXML
• Paper 1
• Paper 2
• Paper 3
• Demonstration
• VoiceXML 2.0
• Differences between VoiceXML 1.0 and 2.0
• The Future – VoiceXML 2.1


History of Voice Applications

• Voice technologies emerged in the 1990s:
– Automatic Speech Recognition (ASR)
  • Small-vocabulary speech recognition problems were solved
– Text-to-Speech (TTS) systems
  • Can generate speech responses on the fly
– Interactive Voice Response (IVR) applications


History of Voice Applications

IVRs became programmable, but programmable IVRs:
– Were difficult to program (call scripting was often vendor-specific), so each vendor had to “reinvent the wheel”
– Did not allow an application to be moved easily from one IVR to another, due to the proprietary nature of IVRs


History of Voice XML

• 1995: AT&T started work on Phone Markup Language (PML)

• Oct. 1998: Motorola developed VoxML (Voice Markup Language)

• Feb. 1999: IBM developed SpeechML technology

• Mar. 1999: the VoiceXML Forum was formed by IBM, AT&T, Lucent, and Motorola
– Its mission was to design a standard dialog design language that developers could use to build conversational applications

• March 2000: the VoiceXML Forum released VoiceXML 1.0 to the general public

• May 2000: VoiceXML 1.0 was submitted to the W3C and published as a W3C Note


W3C Speech Interface Framework

From: McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from www.voicexmlreview.org/Dec2001/features/inside.html


Related Voice Type Languages

• Related to VoiceXML
– Grammar XML (grXML)
  • Provides speech grammars used by speech recognition engines (the XML form of the W3C Speech Recognition Grammar Specification, SRGS)
– Speech Synthesis Markup Language (SSML)
  • The SSML specification is based upon the JSML (JSpeech Markup Language) and JSGF (JSpeech Grammar Format) specifications, which are owned by Sun Microsystems
  • Became a W3C Recommendation (Version 1.0) in September 2004
  • Provides a standardized way of specifying how text is rendered as speech, with tags for pronunciation, tone, inflection, etc.
  • Often embedded in VoiceXML scripts to drive interactive telephony systems
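The following is a minimal sketch of SSML markup embedded in a VoiceXML prompt. It assumes a VoiceXML 2.0 platform with SSML 1.0 support. The airline name and the wording are hypothetical. The synthesizer decides exactly how emphasis, pitch, and rate sound.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="greeting">
    <block>
      <prompt>
        <!-- emphasis: ask the synthesizer to stress a phrase -->
        Welcome to <emphasis level="strong">Example Airlines</emphasis>.
        <!-- break: insert an explicit pause -->
        <break time="500ms"/>
        <!-- phoneme: give an explicit pronunciation in IPA -->
        Today's special is
        <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme> soup.
        <!-- prosody: change pitch and speaking rate -->
        <prosody pitch="high" rate="slow">Please have your booking number ready.</prosody>
      </prompt>
    </block>
  </form>
</vxml>
```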


Related Voice Type Languages

• Related to VoiceXML (continued)
– Call Control XML (CCXML)
  • W3C markup language for controlling telephone calls and telephony equipment; Version 1.0 was still a W3C Working Draft at the time of writing
  • Performs tasks such as setting up conference calls, transferring incoming calls, etc.
  • Works hand in hand with VoiceXML
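This minimal sketch shows how CCXML and VoiceXML divide the work. CCXML handles the call and VoiceXML handles the conversation. The document answers an incoming call, hands the connected caller to a VoiceXML dialog, and exits when the dialog or the call ends. It assumes a CCXML 1.0 interpreter, and the element and event names follow the later CCXML 1.0 Recommendation, so earlier drafts may differ. The dialog file name welcome.vxml is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- An incoming call is ringing: answer it -->
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <!-- Call answered: start a VoiceXML dialog on this connection
         (src is an ECMAScript expression, hence the inner quotes) -->
    <transition event="connection.connected">
      <dialogstart src="'welcome.vxml'"/>
    </transition>
    <!-- The VoiceXML dialog finished, or the caller hung up: end the session -->
    <transition event="dialog.exit">
      <exit/>
    </transition>
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```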


Architecture of VoiceXML

From: Voice eXtensible Markup Language (VoiceXML™) version 1.0, http://www.w3.org/TR/voicexml/


Advantages of Voice XML

• VoiceXML is a markup language that:
– Minimizes client/server interactions by specifying multiple interactions per document
– Shields application authors from low-level, platform-specific details
– Separates user interaction code (in VoiceXML) from service logic (e.g., CGI scripts)
– Promotes service portability across implementation platforms; VoiceXML is a common language for content providers, tool providers, and platform providers
– Is easy to use for simple interactions, yet provides language features to support complex dialogs


Paper 1

• Authored by Bruce Lucas: “VoiceXML for Web-based Distributed Conversational Applications”

• Presents an introduction to VoiceXML

• Comparison to HTML

• Support for Natural Dialogue


Paper 1

• VoiceXML is an XML application, which results in the following benefits:
– Allows the reuse and easy retooling of existing tools for creating, transforming, and parsing XML documents
– Allows VoiceXML to make use of other complementary XML-based standards. Example: Java Speech Markup Language for speech synthesis
• A form is VoiceXML’s basic dialogue unit
– Contains a set of inputs (fields)
– Specifies what to do with the fields after the data is collected
• A field includes a prompt and a specification of what the user is allowed to say


Paper 1 - VoiceXML Code Example

<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Say one of: <enumerate/></prompt>
    <choice next="http://www.sports.example/sports.vxml">Sports scores</choice>
    <choice next="http://www.weather.example/weather.vxml">Weather information</choice>
    <choice next="#login">Log in</choice>
  </menu>

  <form id="login">
    <field name="phone_number" type="phone">
      <prompt>Please say your complete phone number</prompt>
    </field>
    <field name="pin_code" type="digits">
      <prompt>Please say your PIN code</prompt>
    </field>
    <block>
      <submit next="/servlet/login"/>
    </block>
  </form>
</vxml>


Paper 1

• VoiceXML includes built-in field types such as number, digits, phone, date, and time, AND supports user-specified fields defined by grammars

<form>
  <field name="drink">
    <prompt>What would you like to drink?</prompt>
    <grammar>
      coffee | tea | orange juice | milk | nothing
    </grammar>
  </field>
  <field name="sandwich">
    <prompt>What sandwich would you like?</prompt>
    <grammar src="sandwiches.gram"/>
  </field>
  <block>
    <submit next="/servlet/order"/>
  </block>
</form>
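The next sketch shows built-in field types. The application declares a type and writes no grammar of its own. It uses VoiceXML 2.0 syntax rather than the 1.0 syntax of the paper, and it assumes the platform supplies the number, date, time, and boolean built-in grammars, since built-in support can vary by platform. The servlet path /servlet/reserve is hypothetical. If the caller says "no" to the confirmation, all fields are cleared, so the form asks for them again.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="reservation">
    <!-- Each type attribute selects a platform-supplied grammar -->
    <field name="party_size" type="number">
      <prompt>How many people are in your party?</prompt>
    </field>
    <field name="res_date" type="date">
      <prompt>On what date?</prompt>
    </field>
    <field name="res_time" type="time">
      <prompt>At what time?</prompt>
    </field>
    <field name="confirm" type="boolean">
      <prompt>A table for <value expr="party_size"/>. Is that correct?</prompt>
      <filled>
        <if cond="confirm">
          <submit next="/servlet/reserve" namelist="party_size res_date res_time"/>
        <else/>
          <!-- Clearing the fields makes the form visit them again -->
          <clear namelist="party_size res_date res_time confirm"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```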


Paper 1 – The Distributed Model

• VoiceXML provides support for advanced features such as:
– Local validation and processing
– Audio playback and recording
– Context-specific and tapered help, and reusable subdialogues

From: Lucas, Bruce, “VoiceXML for Web-Based Distributed Conversational Applications”, Communications of the ACM, Vol. 43, No. 9, September 2000.
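This minimal sketch uses VoiceXML 2.0 syntax to show three of these client-side features in one form:
- Tapered help: each further request for help gets a more detailed message.
- Local validation: the browser checks the amount in a `<filled>` block, with no server round trip.
- Audio recording: the caller's memo is captured with `<record>`.

The form, its wording, the dollar limits, and the /servlet/transfer path are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="transfer">
    <field name="amount" type="number">
      <prompt>How many dollars would you like to transfer?</prompt>
      <!-- Tapered help: the second and later requests get the longer message -->
      <help count="1">Say a dollar amount.</help>
      <help count="2">
        Say a whole number of dollars between one and five hundred,
        for example, one hundred twenty.
      </help>
      <filled>
        <!-- Local validation in the voice browser -->
        <if cond="Number(amount) &lt; 1 || Number(amount) &gt; 500">
          <prompt>Transfers must be between one and five hundred dollars.</prompt>
          <clear namelist="amount"/>
        </if>
      </filled>
    </field>
    <!-- Audio recording of a short memo -->
    <record name="memo" beep="true" maxtime="10s">
      <prompt>After the beep, record a short memo for this transfer.</prompt>
    </record>
    <block>
      <!-- Recorded audio is sent with POST as multipart/form-data -->
      <submit next="/servlet/transfer" method="post"
              enctype="multipart/form-data" namelist="amount memo"/>
    </block>
  </form>
</vxml>
```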


Paper 1 – VoiceXML compared with HTML

• An HTML document is a single unit, specified by a URI and presented to the user all at once
– A VoiceXML document contains a number of dialogue units (menus or forms) presented sequentially

• HTML has no markup to identify distinct dialogue units
– A VoiceXML document is structured to reflect the sequential nature of the voice medium

• An HTML document is like one single dialogue
– A VoiceXML document is divided into dialogue elements so they can be presented one at a time
– VoiceXML has application logic for sequencing among dialogue units
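This sketch shows one VoiceXML document holding several dialogue units. It uses VoiceXML 2.0 syntax. The units are presented one at a time, and `<goto>` and menu choices sequence between them. A document-level variable carries the account number from one unit to the next, because field values are scoped to their own form. The forms and servlet paths are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Document-scope variable shared by all dialogue units below -->
  <var name="account_no"/>

  <!-- Unit 1: collect the account number, then move on -->
  <form id="get_account">
    <field name="account" type="digits">
      <prompt>Please say your account number.</prompt>
      <filled>
        <assign name="account_no" expr="account"/>
        <goto next="#main_menu"/>
      </filled>
    </field>
  </form>

  <!-- Unit 2: a menu, reached only after unit 1 completes -->
  <menu id="main_menu">
    <prompt>Say balance or transfers.</prompt>
    <choice next="#balance">balance</choice>
    <choice next="#transfers">transfers</choice>
  </menu>

  <!-- Units 3 and 4: hand off to server-side service logic -->
  <form id="balance">
    <block><submit next="/servlet/balance" namelist="account_no"/></block>
  </form>
  <form id="transfers">
    <block><submit next="/servlet/transfers" namelist="account_no"/></block>
  </form>
</vxml>
```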


Paper 1 – Support for Natural Dialogue

• VoiceXML supports “directed” and “mixed initiative” dialogues (C = computer, H = human)
– “Directed” dialogues: the computer directs the conversation at each step by prompting the user for the next piece of information. Example:
  C: On what date do you wish to fly?
  H: May 6th
– “Mixed initiative” dialogues: each participant can take the initiative in leading the conversation. VoiceXML does this by allowing input grammars to be specified at the form level. Example:
  C: How can I help you?
  H: I’d like to fly from New York on May 8th
  C: Where would you like to fly to?
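The mixed-initiative exchange above can be sketched as a VoiceXML 2.0 form. The form has a form-level grammar and an `<initial>` element. The example makes these assumptions:
- flight_request.grxml and city.grxml are hypothetical grammars.
- The form-level grammar returns slots named origin, destination, and travel_date.
- The platform provides the date built-in type.

If the caller says "from New York on May 8th", the form-level grammar fills origin and travel_date in one turn. Only destination is then still unfilled, so the form asks "Where would you like to fly to?".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="book_flight">
    <!-- Form-level grammar: active while any item in this form collects input.
         Hypothetical grammar returning slots origin, destination, travel_date. -->
    <grammar src="flight_request.grxml" type="application/srgs+xml"/>

    <!-- Open question that may fill several fields at once -->
    <initial name="start">
      <prompt>How can I help you?</prompt>
      <nomatch count="1">
        Please say something like: from New York to Boston on May 8th.
      </nomatch>
      <nomatch count="2">
        <!-- Stop asking the open question; fall back to directed prompts -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>

    <!-- Directed prompts, visited only for values not yet supplied -->
    <field name="origin">
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <prompt>Where are you flying from?</prompt>
    </field>
    <field name="destination">
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <prompt>Where would you like to fly to?</prompt>
    </field>
    <field name="travel_date" type="date">
      <prompt>On what date do you wish to fly?</prompt>
    </field>

    <block>
      <submit next="/servlet/flights" namelist="origin destination travel_date"/>
    </block>
  </form>
</vxml>
```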


Paper 2

• Concepts of Programming by Voice
– Motivated by the need to program without typing, thereby preventing repetitive stress injuries (RSI), a common injury among those who spend long hours typing
– Voice-activated software for people with disabilities is a prime motivator of development
– The paper proposes a system that creates an environment for voice-activated programming


Paper 2

• The cost of such software has fallen dramatically:
– $7,500 in 1998
– $100 in 2005
• Products include:
– Dragon NaturallySpeaking
– IBM ViaVoice
– Lernout & Hauspie Voice Xpress


Paper 2

• The authors developed a generator called VocalGenerator using Dragon NaturallySpeaking and Microsoft Visual C++
• Input: a context-free grammar compatible with most programming languages
• Output: an environment in which a program can be written by voice input alone, using syntax-directed voice recognition
• Allows for better recognition and selection of sections of code


Paper 2

• Evaluation of the product
– Programming is faster using a syntax-directed voice recognition system than using a natural-language DVR
– A programmer suffering from repetitive stress injuries will be able to program at a speed sufficient to ‘maintain competitive employment’


Paper 3

• Paper 3 focuses on ‘V-commerce’ through a survey of VoiceXML applications for business communication
• Looks at the inherent risks in human-to-human communication and the challenges these pose to human-to-computer communication
• Examines speech recognition
• Seeks to leverage the predominance of telephone usage globally


Paper 3

• Utilizes the W3C Voice Browser Working Group design criteria, including:
– Consistency
– Interoperability
– Generality
– Internationalization
– Generalization and Readability
– Implementation


Paper 3

• Looks at the potential for a voice-activated Web interface

• Looks at a transactional communication model with six phases:
– Sender has an idea
– Sender transforms the idea into a message
– Sender transmits the message
– Receiver gets the message
– Receiver interprets the message
– Receiver reacts and sends feedback


Paper 3

• Challenges include:
– Unproven business models
– Business process change requirements
– Channel conflicts
– Technology hurdles
– Legal issues
– Security and privacy


Paper 3

• Conclusions
– Speech is natural, flexible, and efficient
– Voice technology will improve
– Voice recognition capabilities will improve
– The intersection of voice recognition, telecom, and Web technologies may lead to a large market for products that take advantage of this intersection


Demo

• Using TellMe Studio (http://studio.tellme.com)
• TellMe Studio provides resources to:
– Build and test your own Internet-powered "phone sites" with nothing but a Web browser and an ordinary telephone, in either of two ways:
  • Type VoiceXML directly into an area called the “Scratchpad”, then call the Studio phone number to preview the code
  • Publish the VoiceXML and audio files on a publicly accessible Web server, point Studio at the URL of the application's "home page", and again call the Studio phone number to preview the application
– Browse and leverage an extensive library of sample code, grammars, audio, and VoiceXML documentation
– Participate in the Voice Web development community through open newsgroups


Demo (Continued)

• This demo, Drink Recipes I, uses one of the “prebuilt” VoiceXML scripts available from the TellMe Studio Code Library
• This version of Drink Recipes:
– Asks the caller for a drink name
– In response, plays back the drink's ingredients list and mixing instructions
– Demonstrates the use of large grammars and how to create data-driven applications


VoiceXML 2.0

From: McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from http://www.voicexmlreview.org/Dec2001/features/inside.html


Differences Between VoiceXML 1.0 and 2.0

The differences between VoiceXML 1.0 and 2.0 fall into three areas:
– Interoperability
– Functional Completeness
– Clarity


VoiceXML 2.0

Interoperability: VoiceXML 2.0 mandates the following formats, so that an application can run on any VoiceXML platform that conforms to the VoiceXML 2.0 specification:
– input: the XML format of the Speech Recognition Grammar Specification (SRGS) for speech and DTMF input; VoiceXML 1.0 did not require any particular speech grammar format
– output: the Speech Synthesis Markup Language (SSML) for text-to-speech and audio output; VoiceXML 1.0 did not use SSML, and its own speech markup elements are not supported in VoiceXML 2.0
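As a sketch of the input requirement, here is the earlier “drink” field with its VoiceXML 1.0-style inline grammar (`coffee | tea | ...`) rewritten in VoiceXML 2.0. The inline grammar now uses the SRGS XML format. With no semantic tags in the grammar, the field receives the recognized words as its value. The /servlet/order path is carried over from the earlier example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="order">
    <field name="drink">
      <prompt>What would you like to drink?</prompt>
      <!-- Inline grammar in SRGS XML form; "drink" is the root rule -->
      <grammar type="application/srgs+xml" version="1.0" mode="voice"
               root="drink" xml:lang="en-US">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>orange juice</item>
            <item>milk</item>
            <item>nothing</item>
          </one-of>
        </rule>
      </grammar>
    </field>
    <block>
      <submit next="/servlet/order" namelist="drink"/>
    </block>
  </form>
</vxml>
```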


VoiceXML 2.0

Interoperability (continued):
– protocol: support for HTTP for fetching documents and resources is required; VoiceXML 1.0 did not require HTTP support
– audio: audio formats recommended for support in VoiceXML 1.0 are now required in VoiceXML 2.0


VoiceXML 2.0

Functional Completeness: VoiceXML 2.0 adds new elements, attributes, and variables so that every key aspect of the cycle of generating system output, interpreting user input, and transitioning from one dialog to another is fully described.

NOTE: VoiceXML 1.0 contained “gaps”, for example in specifying when prompts were played to the user

Some of the new or enhanced elements, variables, and features include:
– application.lastresult$ variable: provides information about the last recognition in the application
– <log> element: generates a debug message
– <throw> and <catch> elements: enhanced to carry more information about the event
– <audio> element: enhanced with an “expr” attribute
– <menu> element: enhanced with an “accept” attribute
– Enhanced support for greater control over universal grammars
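Several of these VoiceXML 2.0 additions can be seen together in the sketch below. The following names are hypothetical:
- the audio file greeting.wav
- the grammar teams.grxml
- the event name lowconfidence
- the servlet paths

The 0.5 confidence threshold is an arbitrary illustrative value, not a recommended setting.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <var name="greeting_file" expr="'greeting.wav'"/>

  <!-- accept="approximate": the caller may say a subset of a choice's words -->
  <menu id="main" accept="approximate">
    <prompt>
      <!-- expr: audio URI computed at run time; the text is the fallback -->
      <audio expr="greeting_file">Welcome.</audio>
      Say one of: <enumerate/>
    </prompt>
    <choice next="#scores">latest sports scores</choice>
    <choice next="#weather">local weather report</choice>
  </menu>

  <form id="scores">
    <!-- Handler for the custom event thrown below; _message holds its message -->
    <catch event="lowconfidence">
      <log>rejected: <value expr="_message"/></log>
      <clear namelist="team"/>
      <reprompt/>
    </catch>
    <field name="team">
      <grammar src="teams.grxml" type="application/srgs+xml"/>
      <prompt>Which team?</prompt>
      <filled>
        <!-- application.lastresult$ describes the most recent recognition -->
        <log>heard <value expr="application.lastresult$.utterance"/>
          with confidence <value expr="application.lastresult$.confidence"/></log>
        <if cond="application.lastresult$.confidence &lt; 0.5">
          <throw event="lowconfidence" message="team recognized with low confidence"/>
        </if>
      </filled>
    </field>
    <block>
      <submit next="/servlet/scores" namelist="team"/>
    </block>
  </form>

  <form id="weather">
    <block>
      <submit next="/servlet/weather"/>
    </block>
  </form>
</vxml>
```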


VoiceXML 2.0

Clarity: VoiceXML 2.0 provides a clear description and interpretation of ALL elements (and their attributes), how they interact with one another, and their expected behavior.

NOTE: VoiceXML 1.0 contains omissions and contradictions in this respect.

Some clarification changes include:
– Subdialogs: the <subdialog> description is clarified
– Root and leaf documents: explicitly defined
– Prompt queueing and input collection: the relationship between the two is clarified
– The relationship between VoiceXML 2.0 and ECMAScript variables is clarified
– Conformance requirements for VoiceXML documents and VoiceXML processors are clarified
– VoiceXML 2.0 is aligned with the Speech Grammar and Speech Synthesis specifications
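Here is a sketch of a reusable subdialog in VoiceXML 2.0 form:
- The caller passes a parameter with `<param>`.
- The subdialog runs in its own execution context.
- `<return>` hands values back as properties of the `<subdialog>` variable.

For brevity the subdialog is a form in the same document (src="#get_pin"). This assumes the platform accepts a same-document reference; the src could equally name a separate document. The form names, the retry logic, and /servlet/account are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Calling dialog -->
  <form id="account_access">
    <!-- Values named in <return> become properties of "result" -->
    <subdialog name="result" src="#get_pin">
      <!-- Overrides the subdialog's own <var name="max_tries"> -->
      <param name="max_tries" expr="3"/>
      <filled>
        <if cond="result.status == 'ok'">
          <var name="caller_pin" expr="result.pin"/>
          <submit next="/servlet/account" namelist="caller_pin"/>
        <else/>
          <prompt>Sorry, your PIN could not be collected.</prompt>
          <disconnect/>
        </if>
      </filled>
    </subdialog>
  </form>

  <!-- Reusable dialog, invoked as a subdialog -->
  <form id="get_pin">
    <var name="max_tries" expr="2"/>
    <var name="tries" expr="0"/>
    <var name="status" expr="'failed'"/>
    <field name="pin" type="digits">
      <prompt>Please say your PIN.</prompt>
      <catch event="nomatch noinput">
        <assign name="tries" expr="tries + 1"/>
        <if cond="tries &gt;= max_tries">
          <!-- Give up and report failure to the caller -->
          <return namelist="status"/>
        </if>
        <reprompt/>
      </catch>
      <filled>
        <assign name="status" expr="'ok'"/>
        <return namelist="status pin"/>
      </filled>
    </field>
  </form>
</vxml>
```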


VoiceXML 2.1

• VoiceXML 2.1 was released by the W3C on June 13, 2005 as a Candidate Recommendation

• VoiceXML 2.1 proposes 8 enhancements to VoiceXML 2.0:
– Referencing grammars dynamically
– Referencing scripts dynamically
– Using <mark> to detect barge-in during prompt playback
– Using <data> to fetch XML without requiring a dialog transition
– Concatenating prompts dynamically using <foreach>
– Recording user utterances while attempting recognition
– Adding namelist to <disconnect>
– Adding type to <transfer>
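Three of these enhancements appear in the data-driven sketch below, similar in spirit to the Drink Recipes demo (it is not the demo's actual code):
- `<data>` fetches an XML list of drinks without a dialog transition.
- `<grammar srcexpr>` references the grammar dynamically.
- `<foreach>` builds the prompt from the list.

The example makes these assumptions:
- drinks.xml has the hypothetical shape shown in the comment.
- The platform's DOM subset supports documentElement, getAttribute, getElementsByTagName, item, and firstChild.data.
- The file names and /servlet/recipe are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="drinks">
    <!-- <data>: fetch XML into a DOM without leaving this dialog.
         Hypothetical drinks.xml:
         <drinks grammar="drinks.grxml"><drink>margarita</drink>...</drinks> -->
    <data name="drink_doc" src="drinks.xml"/>
    <script>
      <![CDATA[
        // Read the grammar URI and the drink names from the fetched DOM
        var root = drink_doc.documentElement;
        var grammar_uri = root.getAttribute("grammar");
        var names = [];
        var nodes = root.getElementsByTagName("drink");
        for (var i = 0; i < nodes.length; i++) {
          names.push(nodes.item(i).firstChild.data);
        }
      ]]>
    </script>
    <field name="drink">
      <!-- Dynamic grammar reference: URI computed at run time -->
      <grammar srcexpr="grammar_uri" type="application/srgs+xml"/>
      <prompt>
        Say one of:
        <!-- <foreach>: one prompt fragment per array element -->
        <foreach item="name" array="names">
          <value expr="name"/>
          <break time="300ms"/>
        </foreach>
      </prompt>
      <filled>
        <submit next="/servlet/recipe" namelist="drink"/>
      </filled>
    </field>
  </form>
</vxml>
```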


References

1. Ali, Sanwar, Albohali, Mohamed, and Wibowo, Kustim, “VoiceXML for Business Applications: A Survey”, First Annual ABIT Conference, May 3-5, 2001, Pittsburgh, Pennsylvania.

2. Arnold, Stephen A., Mark, Leo, and Goldthwaite, John, “Programming by Voice, VocalProgramming”, ASSETS’00, November 13-15, 2000, Arlington, Virginia.

3. Lucas, Bruce, “VoiceXML for Web-based Distributed Conversational Applications”, Communications of the ACM, September 2000, Vol. 43, No. 9, pp. 53-57.

4. Voice eXtensible Markup Language (VoiceXML) version 1.0, http://www.w3.org/TR/voicexml/

5. Voice eXtensible Markup Language (VoiceXML) version 2.0, http://www.w3.org/TR/voicexml/

6. Voice eXtensible Markup Language (VoiceXML) version 2.1, http://www.w3.org/TR/voicexml/

7. https://studio.tellme.com/vxml2/ovw/migrating21.html

8. http://www.voicexmlreview.org/Dec2001/features/inside-full.html

9. McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from www.voicexmlreview.org/Dec2001/features/inside.html
