VoiceXML
Team 1: Matt Ganis, Jonathan Hill, Henry Wong
Anne I. Mannette-Wright
April 8, 2006 Team 1 VoiceXML
Agenda
• History of Voice Applications and VoiceXML
• Related Voice Type Languages
• Advantages of VoiceXML
• Architecture of VoiceXML
• Paper 1
• Paper 2
• Paper 3
• Demonstration
• VoiceXML 2.0
• Differences between VoiceXML 1.0 and 2.0
• The Future – VoiceXML 2.1
History of Voice Applications
• Voice technologies emerged in the 1990s:
– Automatic Speech Recognition (ASR)
• Small-vocabulary speech recognition problems were largely solved
– Text-to-Speech (TTS) systems
• Can generate speech responses on the fly
– Interactive Voice Response (IVR) applications
History of Voice Applications
• IVRs became programmable, but programmable IVRs:
– Were difficult to program (call scripting is often vendor-specific), so each vendor had to "reinvent the wheel"
– Did not allow an application to be moved easily from one IVR to another, due to the proprietary nature of IVRs
History of Voice XML
• 1995: AT&T started work on Phone Markup Language (PML)
• Oct. 1998: Motorola developed VoxML (Voice Markup Language)
• Feb. 1999: IBM developed SpeechML technology
• Mar. 1999: VoiceXML Forum was formed by IBM, AT&T, Lucent, and Motorola
– Mission was to design a standard dialog design language that developers could use to build conversational applications
• Mar. 2000: VoiceXML Forum released VoiceXML 1.0 to the general public
• May 2000: VoiceXML 1.0 submitted to the W3C and published as a W3C Note
W3C Speech Interface Framework
From: McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from www.voicexmlreview.org/Dec2001/features/inside.html
Related Voice Type Languages
• Related to VoiceXML
– Grammar XML (grXML)
• The XML form of the W3C Speech Recognition Grammar Specification (SRGS); provides the speech grammars used by speech recognition engines
– Speech Synthesis Markup Language (SSML)
• The SSML specification is based upon the JSML (Java Speech Markup Language) and JSGF (Java Speech Grammar Format) specifications, which are owned by Sun
• Became a W3C Recommendation (Version 1.0) in September 2004
• Provides a standardized way of specifying how text is rendered as speech, including markup for pronunciation, tone, inflection, etc.
• Often embedded in VoiceXML scripts to drive interactive telephony systems
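To make the SSML bullets concrete, here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`, that builds a small SSML 1.0 document. The element names (`speak`, `emphasis`, `break`, `prosody`) and the namespace come from the SSML 1.0 Recommendation; the helper `build_ssml` and the sample sentences are hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(greeting: str, detail: str) -> str:
    """Return a small SSML 1.0 document as a string (hypothetical helper)."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": "en-US",
    })
    # <emphasis>: ask the synthesizer to stress the greeting.
    ET.SubElement(speak, "emphasis").text = greeting
    # <break>: insert a half-second pause.
    ET.SubElement(speak, "break", {"time": "500ms"})
    # <prosody>: speak the detail at a slower rate.
    ET.SubElement(speak, "prosody", {"rate": "slow"}).text = detail
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome!", "Your balance is ten dollars."))
```

In a VoiceXML 2.0 document the same elements can appear directly inside a `<prompt>`.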
Related Voice Type Languages
• Related to VoiceXML (Continued)
– Call Control XML (CCXML)
• W3C markup language for controlling telephone calls and telephony equipment; Version 1.0 was still a W3C Working Draft in early 2006
• Performs tasks such as setting up conference calls, transferring incoming calls, etc.
• Works hand-in-hand with VoiceXML
Architecture of VoiceXML
From: Voice eXtensible Markup Language (VoiceXML™) Version 1.0, http://www.w3.org/TR/voicexml/
Advantages of Voice XML
• VoiceXML is a markup language that:
– Minimizes client/server interactions by specifying multiple interactions per document
– Shields application authors from low-level, platform-specific details
– Separates user interaction code (in VoiceXML) from service logic (e.g., CGI scripts)
– Promotes service portability across implementation platforms; VoiceXML is a common language for content providers, tool providers, and platform providers
– Is easy to use for simple interactions, yet provides language features to support complex dialogs
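The separation of interaction code from service logic, and the idea of several interactions per document, can be sketched as a server-side handler. A minimal sketch, assuming Python's `xml.etree.ElementTree`; the forecast table, the function names, and the prompts are hypothetical stand-ins for a real CGI script and its data source.

```python
import xml.etree.ElementTree as ET

# Hypothetical data source; a real CGI script might query a database.
FORECASTS = {"boston": "rain, high of 52", "denver": "sunny, high of 68"}


def lookup_forecast(city: str) -> str:
    """Service logic only: knows nothing about voice or markup."""
    return FORECASTS.get(city.lower(), "not available")


def render_vxml(city: str, forecast: str) -> str:
    """Interaction code only: wraps the result in a VoiceXML 2.0 document.

    The form carries two interactions (a spoken report and a yes/no
    question), so the caller hears both without another server round trip.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "report"})
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "prompt").text = f"The forecast for {city} is {forecast}."
    field = ET.SubElement(form, "field", {"name": "again", "type": "boolean"})
    ET.SubElement(field, "prompt").text = "Would you like another city?"
    return ET.tostring(vxml, encoding="unicode")


def handle_request(city: str) -> str:
    """Glue a hypothetical CGI handler might run for a weather request."""
    return render_vxml(city, lookup_forecast(city))


if __name__ == "__main__":
    print(handle_request("Boston"))
```

Only `render_vxml` knows about VoiceXML, so the same `lookup_forecast` could also feed an HTML page.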
Paper 1
• Authored by Bruce Lucas: “VoiceXML for Web-based Distributed Conversational Applications”
• Presents an introduction to VoiceXML
• Comparison to HTML
• Support for Natural Dialogue
Paper 1
• VoiceXML is an XML application, which results in the following benefits:
– Allows the reuse and easy retooling of existing tools for creating, transforming, and parsing XML documents
– Allows VoiceXML to make use of other complementary XML-based standards (example: Java Speech Markup Language for speech synthesis)
• A form is VoiceXML’s basic dialogue unit
– Contains a set of inputs (fields)
– Specifies what to do with the set of fields after the data is collected
• A field includes a prompt and a specification of what the user is allowed to say
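Because VoiceXML is an XML application, a general-purpose XML parser can read it with no voice-specific tooling. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`, that lists each field of a form with its type and prompt; the sample form mirrors the login form in Lucas's example, and `describe_form` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

FORM = """<vxml version="1.0">
  <form id="login">
    <field name="phone_number" type="phone">
      <prompt>Please say your complete phone number</prompt>
    </field>
    <field name="pin_code" type="digits">
      <prompt>Please say your PIN code</prompt>
    </field>
    <block><submit next="/servlet/login"/></block>
  </form>
</vxml>"""


def describe_form(source: str, form_id: str):
    """Return (name, type, prompt) for each field of a form, in document order."""
    root = ET.fromstring(source)
    form = root.find(f"form[@id='{form_id}']")
    if form is None:
        raise KeyError(form_id)
    fields = []
    for field in form.findall("field"):
        prompt = field.findtext("prompt", default="").strip()
        fields.append((field.get("name"), field.get("type"), prompt))
    return fields


if __name__ == "__main__":
    for name, ftype, prompt in describe_form(FORM, "login"):
        print(f"{name} ({ftype}): {prompt}")
```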
Paper 1 - VoiceXML Code Example
<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Say one of: <enumerate/></prompt>
    <choice next="http://www.sports.example/sports.vxml">Sports scores</choice>
    <choice next="http://www.weather.example/weather.vxml">Weather information</choice>
    <choice next="#login">Log in</choice>
  </menu>
  <form id="login">
    <field name="phone_number" type="phone">
      <prompt>Please say your complete phone number</prompt>
    </field>
    <field name="pin_code" type="digits">
      <prompt>Please say your PIN code</prompt>
    </field>
    <block>
      <submit next="/servlet/login"/>
    </block>
  </form>
</vxml>
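The `<menu>` above behaves like a lookup table from spoken phrases to next URIs, with `<enumerate/>` expanding to the list of choice phrases. A rough sketch, assuming exact, case-insensitive matching of the whole choice text; a real interpreter expands `<enumerate/>` with platform-specific wording and matches speech through generated grammars. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MENU = """<menu>
  <prompt>Say one of: <enumerate/></prompt>
  <choice next="http://www.sports.example/sports.vxml">Sports scores</choice>
  <choice next="http://www.weather.example/weather.vxml">Weather information</choice>
  <choice next="#login">Log in</choice>
</menu>"""


def load_choices(source: str) -> dict:
    """Map each normalized choice phrase to its 'next' URI."""
    menu = ET.fromstring(source)
    return {" ".join(c.text.split()).lower(): c.get("next") for c in menu.findall("choice")}


def enumerate_prompt(source: str) -> str:
    """Approximate the prompt once <enumerate/> is replaced by the choices."""
    menu = ET.fromstring(source)
    phrases = [" ".join(c.text.split()) for c in menu.findall("choice")]
    lead = menu.find("prompt").text.strip()  # text before <enumerate/>
    return f"{lead} {', '.join(phrases)}"


def resolve(utterance: str, choices: dict):
    """Return the URI to go to, or None (a real browser would throw nomatch)."""
    return choices.get(" ".join(utterance.split()).lower())
```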
Paper 1
• VoiceXML includes built-in support for common field types (e.g., number, digits, phone, date, and time) and also supports user-defined fields through grammars
<form>
  <field name="drink">
    <prompt>What would you like to drink?</prompt>
    <grammar>
      coffee | tea | orange juice | milk | nothing
    </grammar>
  </field>
  <field name="sandwich">
    <prompt>What sandwich would you like?</prompt>
    <grammar src="sandwiches.gram"/>
  </field>
  <block>
    <submit next="/servlet/order"/>
  </block>
</form>
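The inline grammar `coffee | tea | orange juice | milk | nothing` is a flat list of alternatives, and it decides what the `drink` field can be filled with. A minimal sketch, assuming exact phrase matching after lowercasing and whitespace normalization; real grammar formats (and the external `sandwiches.gram`) support far richer rules, and the helper names are hypothetical.

```python
def parse_alternatives(grammar_text: str) -> list:
    """Split a flat 'a | b | c' grammar into normalized phrases."""
    return [" ".join(alt.split()).lower() for alt in grammar_text.split("|") if alt.strip()]


def fill_field(utterance: str, alternatives: list):
    """Return the matched value, or None to signal a nomatch."""
    said = " ".join(utterance.split()).lower()
    return said if said in alternatives else None


DRINK_GRAMMAR = """
    coffee | tea | orange juice | milk | nothing
"""

if __name__ == "__main__":
    drinks = parse_alternatives(DRINK_GRAMMAR)
    print(fill_field("Orange  Juice", drinks))  # in the grammar: field is filled
    print(fill_field("soda", drinks))           # not in the grammar: nomatch
```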
Paper 1 – The Distributed Model
• VoiceXML provides support for advanced features such as:
– Local validation and processing
– Audio playback and recording
– Support for context-specific and tapered help and reusable subdialogues
From: Lucas, Bruce, “VoiceXML for Web-Based Distributed Conversational Applications”, Communications of the ACM, Vol. 43, No. 9, September 2000.
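Local validation means a check can run inside the voice browser (in VoiceXML, typically a `<filled>` handler that uses `<if>` and `<clear>` to re-prompt) rather than on the server. A minimal sketch of the effect in Python, assuming a hypothetical rule that a PIN is exactly four digits; the rule, the function names, and the retry limit are illustrative, not part of the paper.

```python
def is_valid_pin(heard):
    """Hypothetical validation rule: a PIN is exactly four digits."""
    return heard.isdigit() and len(heard) == 4


def count_submits(answers, validate_locally, max_attempts=3):
    """Count server round trips spent collecting a PIN.

    `answers` stands in for successive recognition results. With local
    validation the browser rejects bad values itself and submits only the
    valid one; without it, every attempt is sent to the server for checking.
    """
    submits = 0
    for attempt, heard in enumerate(answers, start=1):
        if validate_locally:
            if is_valid_pin(heard):
                return 1            # a single submit of an already-checked value
        else:
            submits += 1            # every attempt costs a round trip
            if is_valid_pin(heard):
                return submits
        if attempt >= max_attempts:
            break
    return submits
```

With answers `["12", "12345", "4321"]`, local validation needs one submit where server-side checking needs three.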
Paper 1 – VoiceXML compared with HTML
• An HTML document is a single unit, specified by a URI and presented to the user all at once
– A VoiceXML document contains a number of dialogue units (menus or forms) presented sequentially
• An HTML document has no markup to identify distinct dialogue units
– A VoiceXML document is structured to reflect the sequential nature of the voice medium
• An HTML document is like one single dialogue
– A VoiceXML document is divided into dialogue elements so that they can be presented one at a time
– VoiceXML has application logic for sequencing among dialogue units
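The application logic that sequences dialogue units is, at its core, VoiceXML's form interpretation algorithm, which repeatedly selects the first form item whose variable is still undefined. A much-simplified sketch, assuming each item is a hypothetical (name, prompt) pair and a list of canned answers stands in for recognition; guard conditions, events, and `<filled>` handling are omitted.

```python
def run_form(items, answers):
    """Visit form items in document order until every item is filled.

    items:   list of (name, prompt) pairs, in document order
    answers: iterable of caller responses (stand-in for recognition)
    Returns the prompts played and the filled values.
    """
    values = {name: None for name, _ in items}
    transcript = []
    answers = iter(answers)
    while True:
        # Select: first item whose variable is still undefined.
        pending = [(n, p) for n, p in items if values[n] is None]
        if not pending:
            return transcript, values
        name, prompt = pending[0]
        # Collect: play the prompt, then take the caller's answer.
        transcript.append(prompt)
        try:
            values[name] = next(answers)
        except StopIteration:
            raise RuntimeError(f"no answer for {name!r}") from None
```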
Paper 1 – Support for Natural Dialogue
• VoiceXML supports “directed” and “mixed initiative” dialogues
– “Directed” dialogues: the computer directs the conversation at each step by prompting the user for the next piece of information
Example:
C: On what date do you wish to fly?
H: May 6th
– “Mixed initiative” dialogues: each participant can take the initiative in leading the conversation. VoiceXML does this by allowing input grammars to be specified at the form level
Example:
C: How can I help you?
H: I’d like to fly from New York on May 8th
C: Where would you like to fly to?
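A form-level grammar lets one utterance fill several fields at once; the form then falls back to directed prompts only for the fields still missing. A minimal sketch, assuming a toy grammar made of regular expressions over a small hypothetical city and month vocabulary (not a real VoiceXML grammar); the field names and prompts are illustrative.

```python
import re

# Hypothetical vocabulary standing in for a real form-level grammar.
CITIES = ["new york", "boston", "chicago", "denver"]
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
CITY = "|".join(CITIES)
MONTH = "|".join(MONTHS)

# One pattern per field; a single utterance may match several of them.
SLOT_PATTERNS = {
    "origin": re.compile(rf"\bfrom ({CITY})\b"),
    "destination": re.compile(rf"\bto ({CITY})\b"),
    "date": re.compile(rf"\bon ((?:{MONTH}) \d{{1,2}})(?:st|nd|rd|th)?\b"),
}
PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to fly to?",
    "date": "On what date do you wish to fly?",
}


def fill_from_utterance(utterance):
    """Mixed initiative: fill every field the utterance happens to mention."""
    text = utterance.lower()
    filled = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            filled[slot] = match.group(1)
    return filled


def next_prompts(filled):
    """Directed fallback: prompts for the fields still empty, in form order."""
    return [PROMPTS[slot] for slot in SLOT_PATTERNS if slot not in filled]
```

For the example dialogue, `fill_from_utterance("I'd like to fly from New York on May 8th")` fills the origin and date, so the only remaining prompt asks for the destination.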
Paper 2
• Concepts of Programming by Voice
– Motivated by the need to program without typing, thereby preventing repetitive stress injuries (RSI), a common injury among those who spend long hours typing
– Voice-activated software for the disabled is a prime motivator in development
– Paper proposes a system that creates an environment for voice-activated programming
Paper 2
• The cost of such software has fallen dramatically:
– $7,500 in 1998
– $100 in 2005
• Products include:
– Dragon NaturallySpeaking
– IBM ViaVoice
– Lernout & Hauspie Voice Xpress
Paper 2
• Authors developed a generator called VocalGenerator using Dragon NaturallySpeaking with MS Visual C++
• Input = a context-free grammar of the kind used to define most programming languages
• Output = an environment in which a program can be written by voice input alone, using syntax-directed voice recognition
• Allows for better recognition and selection of sections of code
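The claim that syntax-directed input improves recognition can be illustrated: at each step the grammar permits only a few constructs, so the recognizer only has to distinguish a small active vocabulary. A minimal sketch, assuming a tiny hypothetical grammar that maps spoken keywords to code templates; it is not VocalGenerator, and its grammar format and names are invented for illustration.

```python
import re

# Hypothetical toy grammar: nonterminal -> {spoken keyword: code template}.
# {name} marks a nonterminal that still has to be expanded.
GRAMMAR = {
    "statement": {
        "if": "if {condition}: {statement}",
        "while": "while {condition}: {statement}",
        "print": "print({expression})",
    },
    "condition": {
        "less than": "{expression} < {expression}",
        "equals": "{expression} == {expression}",
    },
    "expression": {"x": "x", "y": "y", "zero": "0"},
}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def active_vocabulary(nonterminal):
    """The only phrases the recognizer needs to listen for at this point."""
    return sorted(GRAMMAR[nonterminal])


def expand(start, spoken):
    """Build code by always expanding the leftmost nonterminal from speech."""
    code = "{" + start + "}"
    words = iter(spoken)
    while True:
        match = PLACEHOLDER.search(code)
        if match is None:
            return code
        choices = GRAMMAR[match.group(1)]
        word = next(words, None)
        if word not in choices:
            raise ValueError(f"heard {word!r}; expected one of {sorted(choices)}")
        code = code[:match.start()] + choices[word] + code[match.end():]
```

Saying "if", "less than", "x", "y", "print", "x" in sequence produces `if x < y: print(x)`, and at each step only the keywords for the current nonterminal are accepted.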
Paper 2
• Evaluation of the product
– Programming is faster using a syntax-directed voice recognition system than a natural-language DVR
– A programmer suffering from repetitive stress injuries will be able to program at a speed sufficient to ‘maintain competitive employment’
Paper 3
• Paper 3 focuses on ‘V-commerce’ through a survey of VoiceXML applications for business communication
• Looks at the inherent risks in human-to-human communication and the challenges these pose to human-to-computer communication
• Examines speech recognition
• Seeks to leverage the predominance of telephone usage globally
Paper 3
• Utilizes the W3C Voice Browser Working Group design criteria, including:
– Consistency
– Interoperability
– Generality
– Internationalization
– Generalization and Readability
– Implementation
Paper 3
• Looks at the potential for voice-activated Web interfaces
• Looks at a transactional communication method with six phases:
– Sender has an idea
– Sender transforms the idea into a message
– Sender transmits the message
– Receiver gets the message
– Receiver interprets the message
– Receiver reacts and sends feedback
Paper 3
• Challenges include:
– Unproven business models
– Business process change requirements
– Channel conflicts
– Technology hurdles
– Legal issues
– Security & privacy
Paper 3
• Conclusions
– Speech is natural, flexible and efficient
– Voice technology will improve
– Voice recognition capabilities will improve
– The intersection of voice recognition, telecom and Web technologies may lead to a large market for products that take advantage of it
Demo
• Using TellMe Studio (http://studio.tellme.com)
• TellMe Studio provides you with resources to:
– Build and test your own Internet-powered "phone sites" with nothing but your Web browser and an ordinary telephone in the following ways:
• Type VoiceXML directly into an area called the “Scratchpad” and then call the phone number to preview the code
• Publish the VoiceXML and audio files on a publicly accessible Web server, point Studio at the URL for your application's "home page", and once again call the Studio phone number to preview the application
– Browse and leverage an extensive library of sample code, grammars, audio, and VoiceXML documentation
– Participate in the Voice Web development community through open newsgroups
Demo (Continued)
• This demo, Drink Recipes I, uses one of the “prebuilt” VoiceXML scripts available from the TellMe Studio Code Library
• This version of Drink Recipes:
– Asks the caller for a drink name
– In response, plays back the drink's ingredients list and mixing instructions
– Demonstrates the use of large grammars and how to create data-driven applications
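The data-driven pattern behind an application like Drink Recipes can be sketched as: keep the recipes in a data table, generate the recognition grammar from the table's keys, and look up the answer after recognition. A minimal sketch, assuming Python's `xml.etree.ElementTree` and the W3C SRGS XML grammar format (`grammar`, `rule`, `one-of`, `item`); the recipe data and function names are hypothetical, and this is not the actual TellMe Studio code.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical recipe table; a real application might hold thousands of rows.
RECIPES = {
    "margarita": "Tequila, triple sec and lime juice. Shake with ice.",
    "mojito": "Rum, lime, mint, sugar and soda water. Muddle, then stir.",
    "screwdriver": "Vodka and orange juice. Build over ice.",
}


def build_grammar(names):
    """Generate an SRGS XML grammar whose root rule accepts any drink name."""
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS, "version": "1.0", "root": "drink",
        "mode": "voice", "xml:lang": "en-US",
    })
    rule = ET.SubElement(grammar, "rule", {"id": "drink", "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for name in sorted(names):
        ET.SubElement(one_of, "item").text = name
    return ET.tostring(grammar, encoding="unicode")


def answer(recognized):
    """Look up the recipe text for a recognized drink name."""
    return RECIPES.get(recognized, "Sorry, I don't have that recipe.")
```

Adding a row to `RECIPES` updates both the grammar and the answers, with no change to the dialog markup.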
VoiceXML 2.0
From: McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from http://www.voicexmlreview.org/Dec2001/features/inside.html
Differences Between VoiceXML 1.0 and 2.0
Differences between VoiceXML 1.0 and 2.0:
– Interoperability
– Functional Completeness
– Clarity
VoiceXML 2.0
Interoperability: VoiceXML 2.0 requires the following formats, so that developers can expect their applications to run on any VoiceXML platform conforming to the VoiceXML 2.0 specification:
– input: the XML format of the Speech Recognition Grammar Specification, for speech and DTMF input; VoiceXML 1.0 did not require any particular speech grammar format
– output: Speech Synthesis Markup Language (SSML) for text-to-speech and audio output; VoiceXML 1.0 did not use SSML, and its own speech markup elements are not supported in VoiceXML 2.0
VoiceXML 2.0
Interoperability (Continued):
– protocol: support for HTTP for fetching documents and resources is required; VoiceXML 1.0 did not require support for HTTP
– audio: audio formats that were only recommended in VoiceXML 1.0 are now required in VoiceXML 2.0
VoiceXML 2.0
Functional Completeness: VoiceXML 2.0 adds new elements, attributes, and variables so that key aspects of the cycle of generating system output, interpreting user input, and transitioning from one dialog to another are fully described.
NOTE: VoiceXML 1.0 contained “gaps”; for example, it did not precisely specify when prompts were played to the user.
Some of the new/enhanced elements, variables and support include:
– application.lastresult$ variable: provides information about the last recognition in the application
– <log> element: generates a debug message
– <throw> and <catch> elements: enhanced to provide more information
– <audio> element: enhanced with an “expr” attribute
– <menu> element: enhanced with an “accept” attribute
– Enhanced support for greater control over universal grammars
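In VoiceXML 2.0, `application.lastresult$` is an array of the most recent recognition results, each with `confidence`, `utterance`, `inputmode`, and `interpretation` properties. A minimal Python model of that structure, assuming a hypothetical policy that accepts keypad input or confident speech and asks the caller to confirm when confidence falls below 0.5; in a real document this logic would live in ECMAScript `cond` expressions, and the threshold is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One entry of application.lastresult$ as described in VoiceXML 2.0."""
    confidence: float       # 0.0 (lowest) to 1.0 (highest)
    utterance: str          # the raw words or DTMF digits recognized
    inputmode: str          # "voice" or "dtmf"
    interpretation: object  # semantic result produced by the grammar


def next_action(lastresult, threshold=0.5):
    """Hypothetical policy for acting on the n-best list.

    Accept keypad input or confident speech, confirm uncertain speech,
    and re-prompt if nothing was recognized.
    """
    if not lastresult:
        return ("reprompt", None)
    best = max(lastresult, key=lambda r: r.confidence)
    if best.inputmode == "dtmf" or best.confidence >= threshold:
        return ("accept", best.interpretation)
    return ("confirm", best.utterance)
```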
VoiceXML 2.0
Clarity: VoiceXML 2.0 provides a clear description and interpretation of ALL elements (and their attributes), how they interact with one another, and their expected behavior.
NOTE: VoiceXML 1.0 contains omissions and contradictions in this respect.
Some clarification changes include:
– Subdialogs: <subdialog> description clarified
– Root and leaf documents explicitly defined
– Prompt queueing and input collection: relationship between the two clarified
– Relationship between VoiceXML 2.0 and ECMAScript variables clarified
– Conformance between VoiceXML documents and VoiceXML processors clarified
– Alignment of VoiceXML 2.0 with the Speech Grammar and Speech Synthesis specifications
VoiceXML 2.1
• VoiceXML 2.1 was published by the W3C as a Candidate Recommendation on June 13, 2005
• VoiceXML 2.1 proposes 8 enhancements to VoiceXML 2.0:
– Referencing grammars dynamically
– Referencing scripts dynamically
– Using <mark> to detect barge-in during prompt playback
– Using <data> to fetch XML without requiring a dialog transition
– Concatenating prompts dynamically using <foreach>
– Recording user utterances while attempting recognition
– Adding namelist to <disconnect>
– Adding type to <transfer>
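In VoiceXML 2.1, `<data>` fetches an XML document and exposes it to ECMAScript as a read-only DOM without leaving the current dialog, and `<foreach>` iterates over an array, for example to queue one prompt per item. A minimal Python analogue of that pattern, assuming a hypothetical stock-quote payload and using ElementTree in place of the browser's DOM; it models the behavior and is not VoiceXML 2.1 markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload a server might return to a <data src="..."> request.
QUOTES_XML = """<quotes>
  <quote symbol="ABC" price="12.50"/>
  <quote symbol="XYZ" price="7.25"/>
</quotes>"""


def fetch_data(payload):
    """Analogue of <data>: parse fetched XML without a dialog transition."""
    return ET.fromstring(payload)


def build_prompt_queue(doc):
    """Analogue of <foreach>: queue one prompt per element."""
    queue = ["Here are your quotes."]
    for quote in doc.findall("quote"):
        queue.append(f"{quote.get('symbol')} is at {quote.get('price')}.")
    return queue


if __name__ == "__main__":
    for prompt in build_prompt_queue(fetch_data(QUOTES_XML)):
        print(prompt)
```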
References
1. Ali, Sanwar, Albohali, Mohamed, Wibowo, Kustim, “VoiceXML for Business Applications: A Survey”, First Annual ABIT Conference, May 3-5, 2001, Pittsburgh, Pennsylvania.
2. Arnold, Stephen A., Mark, Leo and Goldthwaite, John, “Programming by Voice, VocalProgramming”, ASSETS’00, November 13-15, 2000, Arlington, Virginia.
3. Lucas, Bruce, “VoiceXML for Web-based Distributed Conversational Applications”, Communications of the ACM, September 2000, Vol. 43, No. 9, pp. 53-57.
4. Voice eXtensible Markup Language (VoiceXML) Version 1.0, http://www.w3.org/TR/voicexml/
5. Voice eXtensible Markup Language (VoiceXML) Version 2.0, http://www.w3.org/TR/voicexml/
6. Voice eXtensible Markup Language (VoiceXML) Version 2.1, http://www.w3.org/TR/voicexml/
7. https://studio.tellme.com/vxml2/ovw/migrating21.html
8. http://www.voicexmlreview.org/Dec2001/features/inside-full.html
9. McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from www.voicexmlreview.org/Dec2001/features/inside.html