New challenge: telephone
Text To Speech & audio
Speech recognition
VoiceXML
Homework: sign up on studio.tellme.com
Telephone
• Caller to system: speech recognition
  – using grammars (limited vocabulary, general audience, no training)
  – optional use of touch tones (numbers)
• System to caller: recorded audio (wav files) plus TTS (text to speech)
• Limited bandwidth, in comparison to other applications, but very familiar, ubiquitous medium
• 800 long distance, some airline information systems, others?
Problems in context
• Speech recognition: very difficult if
  – no restrictions on speakers
  – grammar for all of English with aim of 'natural language understanding'
• Text to speech: much easier problem (but English is more difficult than more fully phonetic languages like Spanish, or so I've been told)
(More next class)
studio.tellme.com
• Company that provides ‘engine’ for applications
• Provides developing environment
  – We are doing the tellme version of VoiceXML, but it appears to be standard.
• Register as a developer:
  – Provide your own id; assigned a PIN
  – Scratchpad for quick testing
    • Put VoiceXML in ScratchPad place (no audio files)
    • 1-800-555-VXML (8965)
      – SAY id and then PIN.
  – Application URL for projects with multiple files
    • To look at someone else's project, you change your Application URL
      – called pointing your account to a new source.
VoiceXML
• XML document (VXML header)
• VoiceXML has tags for flow-of-control and calculations.
  – Also can use <script> for JavaScript
• Grammars come in different varieties. We will use the tellme way.
  – Grammars are included in CDATA tags to prevent XML interpretation.
  – Many grammars constructed for you.
    • <field name="answer" type="boolean"> … will listen for yes or no.
    • <field name="price" type="currency"> … will listen for currency.
  – <menu> <choice> <choice> for list
VoiceXML basics, continued
• <form> element can contain
  – <block> elements, which can contain <audio>, <goto>, other
  – <field> which can contain
    • <prompt>
    • <grammar> (if not one of built-in grammars)
    • <filled>
• <var> tags can be at different levels (for example, document, block, or higher levels)
• <if> <elseif> <else> tags
• <script> elements for JavaScript (which can also appear in expressions)
VoiceXML basics: typical case
• a form element
  – <field>
    • <prompt>, made up of <audio>, with reference to recorded wav file and backup text
    • <grammar>, if NOT using built-in grammars designated by type attribute of field. This is a CDATA section.
    • <filled> with (follow-on) code using field
    • <catch> for nomatch, noinput cases
Caution
A form contains various elements, including a field.
If a field has a grammar and the grammar is satisfied, control goes to a filled tag.
obligatory…
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <audio src="prompt1.wav">Hello, world</audio>
    </block>
  </form>
</vxml>
Notes: prompt1.wav was recorded using tellme studio. The text inside <audio> is the backup, spoken using TTS, just in case the src file is missing.
Preparation: objects
• JavaScript (and other languages) use classes and objects
• Objects (aka object instances) are declared (created, instantiated) as members of a class
• Objects have
  – properties ('the data')
  – methods (functions that you can use 'on' the objects)
  – static methods
    • Math.random
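The three ideas on this slide (properties, methods, static methods) can be sketched in plain JavaScript. Counter and its members are made up purely for illustration; only Math.random comes from the slide.

```javascript
// Hypothetical example class: in classic JavaScript a constructor
// function plays the role of the class.
function Counter(start) {
  this.count = start;              // property ('the data')
}

// Method: a function you use 'on' an object instance.
Counter.prototype.increment = function () {
  this.count = this.count + 1;
  return this.count;
};

// Static method: called on the class itself, not on an instance.
Counter.describe = function () {
  return 'Counter objects hold a running count';
};

var c = new Counter(5);            // create (instantiate) an object
c.increment();                     // c.count is now 6

// Math.random is a built-in static method: no Math object is created.
var r = Math.random();             // some number with 0 <= r < 1
```

The point to notice is the calling style: c.increment() needs an object, while Counter.describe() and Math.random() are called on the class name.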
Example: tm_date
• var dt = new tm_date; creates a date/time object.
• Use methods to extract/manipulate information held 'in' dt.
  var day = dt.get_day();
• Use static methods supplied to do common tasks:
  var dn = tm_date.to_day_of_week_name(day);
  or directly:
  var dn = tm_date.to_day_of_week_name(dt.get_day());
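tm_date is Tellme's own library, so these lines only run on their platform. As a rough analogue, here is a sketch using the standard JavaScript Date object. dayOfWeekName is a hypothetical helper standing in for tm_date.to_day_of_week_name; it is not part of any library, and the date is chosen arbitrarily.

```javascript
// Hypothetical stand-in for tm_date.to_day_of_week_name.
// Date.prototype.getDay() returns 0 (Sunday) through 6 (Saturday).
function dayOfWeekName(day) {
  var names = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
               'Thursday', 'Friday', 'Saturday'];
  return names[day];
}

var dt = new Date(2004, 0, 15);    // 15 January 2004 (months count from 0)
var day = dt.getDay();             // method called on the object
var dn = dayOfWeekName(day);       // helper applied to the extracted value

// or directly:
var dn2 = dayOfWeekName(dt.getDay());
```

The two-step and the direct form give the same result; the direct form just skips the intermediate variable.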
outline
• Header stuff
• script with external reference
• script (code) encased in CDATA notation
• Form/Block, with text to speech using value produced by script
• Closing stuff
<?xml version="1.0"?>
<vxml version="2.0">
<script src="http://resources.tellme.com/lib/code/tm_date.js"/>
Will make use of date functions
<script>
<![CDATA[
  var dt = new tm_date();
  var monis = tm_date.to_month_name(dt.get_month());
  var dateis = dt.get_date();
  var dayis = tm_date.to_day_of_week_name(dt.get_day());
  var yearis = tm_date.to_year_name(dt.get_full_year());
  // brute force correction from GMT
  var houris = dt.get_hours() - 4;
  var minutesis = dt.get_minutes();
  var whole = 'The date is ' + monis + ' ' + dateis + '. It is ' + dayis + '. The time is ' + houris + ' ' + minutesis;
]]>
</script>
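Subtracting 4 from the GMT hour, as the script does, gives a negative number during the first four hours of the GMT day. One possible refinement is sketched below. localHour is a hypothetical helper, the offset of 4 simply mirrors the slide's value, and the sketch assumes an offset between 0 and 24. It does not adjust the date or day name, which would also shift in those hours.

```javascript
// Hypothetical helper: shift a GMT hour by a fixed offset and keep
// the result in the range 0..23.
function localHour(gmtHour, offsetHours) {
  // Add 24 before taking the remainder so the value never goes negative.
  return (gmtHour - offsetHours + 24) % 24;
}

var houris = localHour(2, 4);      // 2 a.m. GMT becomes hour 22
```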
<form>
<block>Hello.
<value expr="whole"/>
Good bye.
</block>
</form>
</vxml>
Can use block for audio
Example: my family
• Directed responses to 3 family members:
  – Daniel
    • question/response on activities
  – Aviva
    • question/response on number of cranes
  – Esther
    • response
• Calculations (arithmetic) done using variables
• if tags
  – The cond attribute is a condition test.
• limited error handling: exit on no-match event
  – alternative is to repeat prompt, generally using count attribute
<vxml version="2.0">
<form>
  <field name="childid">
    <prompt>
      <audio src="whosthis.wav">Hello. Who is calling?</audio>
    </prompt>
<grammar type="application/x-gsl" mode="voice">
<![CDATA[[[dan daniel (daniel meyer) (dan meyer)] {<childid "daniel">}
[aviva (aviva meyer)] {<childid "aviva">}
[esther (esther minkin) ] {<childid "esther">}
]]]></grammar>
    <catch event="noinput nomatch">
      <audio src="sorry.wav">Sorry. I didn't get that.</audio>
      <exit/>
    </catch>
    <filled>
      <if cond="'daniel'==childid">
        <goto next="#danfollowup"/>
      <elseif cond="'aviva'==childid"/>
        <goto next="#avivafollowup"/>
      <elseif cond="'esther'==childid"/>
        <goto next="#estherfollowup"/>
      <else/>
        <reprompt/>
      </if>
    </filled>
  </field>
</form>
Notes: the <else/> branch never happens, since the grammar returns only these three values. Note the inner, single quote marks around the strings. Note the double ='s for comparison.
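To try the logic of this field outside VoiceXML, the grammar and the <filled> block can be imitated in plain JavaScript. This is only a sketch: interpretChildId and nextForm are hypothetical names, a lookup table stands in for the speech grammar, and the real recognition is done by the platform, not by string matching.

```javascript
// Phrases the grammar accepts, mapped to the value it returns for childid.
var phrases = {
  'dan': 'daniel', 'daniel': 'daniel',
  'daniel meyer': 'daniel', 'dan meyer': 'daniel',
  'aviva': 'aviva', 'aviva meyer': 'aviva',
  'esther': 'esther', 'esther minkin': 'esther'
};

// Returns the childid value, or null to represent the nomatch case.
function interpretChildId(utterance) {
  var key = utterance.toLowerCase();
  return phrases.hasOwnProperty(key) ? phrases[key] : null;
}

// Mirrors the if/elseif chain: == compares, and the quoted strings
// are literals rather than variable names.
function nextForm(childid) {
  if ('daniel' == childid) {
    return '#danfollowup';
  } else if ('aviva' == childid) {
    return '#avivafollowup';
  } else if ('esther' == childid) {
    return '#estherfollowup';
  } else {
    return null;                   // not reached for values the table returns
  }
}
```

For example, nextForm(interpretChildId('Dan Meyer')) evaluates to '#danfollowup', while interpretChildId('Bob') gives null, the case the <catch> handles in the VoiceXML version.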
<form id="danfollowup">
  <field name="today">
    <prompt>
      <audio src="congratsdan.wav">Congratulations on the new job. Did you work on your thesis, or do aikido or jo today?</audio>
    </prompt>
    <grammar type="application/x-gsl" mode="voice">
    <![CDATA[[
      [aikido (i key dough)] {<today "aikido">}
      [thesis (work)] {<today "thesis">}
      [jo (joe)] {<today "jo">}
      [both (all) (everything) ((i key dough) jo)] {<today "both">}
      [none nothing (sort of)] {<today "nothing">}
    ]]]>
    </grammar>
    <catch event="noinput nomatch">
      <audio>I didn't quite understand. Call or send e-mail.</audio>
      <exit/>
    </catch>
    <filled>
      <if cond="today=='aikido'">
        <audio>Some aikido is fine.</audio>
      <elseif cond="today=='thesis'"/>
        <audio>Good, but do other things also.</audio>
      <elseif cond="today=='jo'"/>
        <audio>Don't get hit in the head.</audio>
      <elseif cond="today=='both'"/>
        <audio>Doing some of everything is best.</audio>
      <elseif cond="today=='nothing'"/>
        <audio>You deserve a break, but remember you want to be done by September.</audio>
      <else/>
        <audio>See you soon.</audio>
      </if>
    </filled>
  </field>
  <block>
    <audio>Good bye</audio>
  </block>
</form>
<form id="avivafollowup">
<var name="rest" expr="1000"/>
<field name="bcount" type="number">
<prompt>
<audio src="howmanycranes.wav">Hello, Aviva. How many cranes have you made? </audio>
</prompt>
<grammar type="application/x-gsl" mode="voice" >
<![CDATA[
NATURAL_NUMBER_THRU_9999
]]>
</grammar>
<catch event="noinput nomatch"> <audio src="sorry.wav">Sorry. I didn't get that.</audio> <exit/> </catch>
<filled>
  <assign name="rest" expr="1000-bcount"/>
  <audio> <value expr="rest"/> </audio>
  <audio src="togo.wav"> to go. </audio>
  <if cond="rest &lt; 200">
    <audio src="homestretch.wav">You're in the home stretch</audio>
  <elseif cond="rest &lt; 500"/>
    <audio src="morethanhalf.wav">More than half way</audio>
  <elseif cond="rest &lt; 800"/>
    <audio src="goodstart.wav">Off to a good start</audio>
  <else/>
    <audio> Get a move on </audio>
  </if>
  <audio src="goodbye.wav">Good bye. </audio>
</filled>
</field>
</form>
Note: can't use a literal < inside an attribute value; write &lt; instead.
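The cond and expr attributes hold JavaScript expressions, so the same arithmetic and tests can be tried out in an ordinary script first. The sketch below assumes a standalone .js file, where the less-than sign needs no XML escaping. craneMessage is a hypothetical function name, and the returned string only approximates what the caller would hear.

```javascript
// Hypothetical helper reproducing the form's calculation and branching.
function craneMessage(bcount) {
  var rest = 1000 - bcount;        // same as expr="1000-bcount"
  var msg;
  if (rest < 200) {
    msg = "You're in the home stretch";
  } else if (rest < 500) {
    msg = 'More than half way';
  } else if (rest < 800) {
    msg = 'Off to a good start';
  } else {
    msg = 'Get a move on';
  }
  return rest + ' to go. ' + msg;
}
```

Because the tests run in order, each branch only needs the upper bound: a value that reaches the second test is already known to be 200 or more.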
<form id="estherfollowup">
<block>
<audio >Hello, Mommy. This is all I can do now. </audio>
</block>
</form>
</vxml>
Application logic
• VoiceXML elements (for example, <if> and <var>)
  – Note: more powerful than XSLT: <assign> tag
• JavaScript code in attributes (for example, cond, expr)
• JavaScript code in <script> </script>
  – Encase in CDATA to avoid problems with certain characters
• external JavaScript code, cited using <script src="file address"/>
Class work
• EVERYONE (who hasn't already): sign up at studio.tellme.com tonight
• Design simple application (you may work in groups):
  – Ask one question
  – Detect and respond to each of 2 or 3 answers
  – Use examples here for models
  – All text to speech
• Pick (at least) one and implement.
• (Do this for a short time and then go on to next lecture. Resume after 9pm when minutes are free.)
Homework
• (Majors requirement overdue: there will be a deduction but better late than never.)
• Go to studio.tellme.com & sign up as developer.
  – try examples (using scratch pad)
  – record some voice samples
  – do tellme tutorials
• ALSO try and report on
  – 800 long distance or some other commercial application