SSML extensions for multi-language usage
Davide Bonardo
W3C Workshop on Internationalizing SSML
Crete, 30-31 May 2006
About Loquendo
• R&D of speech technology
• Over 30 years experience (from CSELT laboratories)
• Technologies:
  – TTS (text to speech)
  – ASR (automatic speech recognition) & SV (Speaker Verification)
• Solutions:
  – Easy integration of speech technologies
  – Speech servers (MRCPv1 & v2 protocols)
  – Speech platforms (VoiceXML & CCXML interpreters)
  – Embedded solutions (for many OS and devices)
Ideas for SSML extensions
• <say-as> element
  – Extension of the values for the “interpret-as” attribute
• New element
  – <token>
Proposal 1: <say-as> extension (1/3)
• Problem:
  – How to interpret a part of an input text
  – Different dialog contexts require different interpretations
  – The interpretation could be language dependent
• Many contexts could be defined: SMS, e-mails, news, applications for rescue operations, …
• TTS engines may use context information to activate the best configuration for:
  – reading acronyms
  – expanding abbreviations
  – using customized prosodic phrasing
  – activating a special reading style
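As a rough illustration of how an engine front end might use a context label to pick its normalization settings, the Python sketch below maps two of the contexts named above to hypothetical profiles. The `ContextProfile` fields, the abbreviation tables, and the `normalize` helper are assumptions made for this example; they do not describe any particular TTS engine or anything required by SSML.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextProfile:
    """Hypothetical per-context settings a TTS front end might activate."""
    abbreviations: dict = field(default_factory=dict)  # token -> expansion
    reading_style: str = "neutral"                     # illustrative label only


# Illustrative tables; a real engine would need larger, language-specific ones.
PROFILES = {
    "sms": ContextProfile({"asap": "as soon as possible", "u": "you"}, "casual"),
    "news": ContextProfile({"PM": "Prime Minister"}, "formal"),
}
DEFAULT_PROFILE = ContextProfile()


def normalize(text: str, context: Optional[str] = None) -> str:
    """Expand abbreviations according to the profile selected by `context`."""
    profile = PROFILES.get(context, DEFAULT_PROFILE)

    def expand(match):
        word = match.group(0)
        return profile.abbreviations.get(word, word)

    # Replace whole alphabetic words only; punctuation and spacing are kept.
    return re.sub(r"[A-Za-z]+", expand, text)
```

With these assumed tables, `normalize("I call you asap.", "sms")` expands the abbreviation, while the same call without a context leaves the text unchanged.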
Proposal 1: <say-as> extension (2/3)
Proposal:
• To extend the “interpret-as” attribute with new values, for instance:
  – sms
  – e-mail
  – news
  – banking
  – navigation
  – …
Proposal 1: <say-as> extension (3/3)
Examples
<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-US">
  I call you asap.
  <say-as interpret-as="sms">
    I call you asap
  </say-as>
</speak>

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-GB">
  <say-as interpret-as="sms"> Mtfbwu </say-as>
</speak>
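To make the first example concrete, the following Python sketch shows one way a processor could separate plain text from text wrapped in <say-as>, so that each piece can be handed to the matching context configuration. It uses only the standard library's `xml.etree.ElementTree`; the `segments_with_context` function is a hypothetical helper, and it assumes flat markup (no elements nested inside <say-as>).

```python
import xml.etree.ElementTree as ET

SSML = """<speak version="1.0" xml:lang="en-US">
I call you asap.
<say-as interpret-as="sms">I call you asap</say-as>
</speak>"""


def segments_with_context(ssml):
    """Return (context, text) pairs in document order.

    `context` is the "interpret-as" value of the enclosing <say-as>,
    or None for text outside any <say-as>. Nested markup is not handled.
    """
    root = ET.fromstring(ssml)
    result = []

    def add(context, text):
        # Skip whitespace-only fragments produced by indentation.
        if text and text.strip():
            result.append((context, text.strip()))

    add(None, root.text)
    for child in root:
        if child.tag == "say-as":
            add(child.get("interpret-as"), child.text)
        else:
            add(None, child.text)
        add(None, child.tail)  # text after the child belongs to the parent
    return result
```

For the document above this yields one segment with no context and one tagged `"sms"`, which is where a context-aware engine could apply SMS-specific expansions.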
Proposal 2: New element <token> (1/3)
• Problem 1: activating the correct language knowledge at a specific point in the text
• The “xml:lang” attribute is currently available in the <speak>, <voice>, <p> and <s> elements
• The engine's behavior may differ from element to element:
  – In the root <speak> element, “xml:lang” defines the language of the whole document, but for the engine it involves the selection of a voice
  – In the <voice> element, it is an important recommendation for loading the correct voice
  – In the <p> and <s> elements, it mainly carries language information; the engine, if it is able to, can keep the same voice but use different language knowledge (e.g. phonetic mapping)
• Problem 2: it may be necessary to specify a language change for a text unit smaller than a sentence.
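The element-dependent behavior described above can be sketched in Python as follows. The `ACTION` table simply paraphrases the bullets (voice selection for <speak> and <voice>, a switch of language knowledge for <p> and <s>) and is an assumption about how one engine might react, not a rule taken from the SSML specification. The sketch also shows the usual XML inheritance of “xml:lang” from the nearest ancestor.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed engine reaction per element, paraphrasing the slide.
ACTION = {
    "speak": "select voice",
    "voice": "select voice",
    "p": "switch language knowledge",
    "s": "switch language knowledge",
}


def language_plan(ssml):
    """List (tag, effective language, action) for every element, in document order.

    The effective language is inherited from the nearest ancestor when an
    element has no xml:lang of its own; in that case the action is None.
    """
    plan = []

    def visit(elem, inherited):
        own = elem.get(XML_LANG)
        lang = own or inherited
        action = ACTION.get(elem.tag) if own else None
        plan.append((elem.tag, lang, action))
        for child in elem:
            visit(child, lang)

    visit(ET.fromstring(ssml), None)
    return plan
```

Note that nothing in this table covers a unit smaller than <s>, which is the gap Problem 2 points at.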
Proposal 2: New element <token> (2/3)
Proposal:
• To introduce a new element <token>
• To extend the use of the “xml:lang” attribute to the <token> element
Advantages:
• It is a generic element
• It is extensible
  – Without attributes, it could be used to give information on the segmentation, where needed.
  – With other attributes, it could specify new information for the token (e.g. part of speech)
Proposal 2: New element <token> (3/3)
Examples
<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-US">
  The movie is the product of Italian comic sensation Roberto Benigni, who wore three hats for "La vita è bella": director, co-writer, and star.
</speak>

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-US">
  The movie is the product of Italian comic sensation <token xml:lang="it-IT">Roberto Benigni</token>, who wore three hats for <token xml:lang="it-IT"> "La vita è bella"</token>: director, co-writer, and star.
</speak>
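A processor that understood the proposed <token> element could split such a sentence into language runs before phonetic analysis. The Python sketch below is one possible way to do that with the standard library's `xml.etree.ElementTree`, which does not validate element names and so accepts <token> as-is. The `language_runs` function is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_runs(ssml):
    """Split the text of an SSML document into (language, text) runs."""
    runs = []

    def emit(lang, text):
        # Drop whitespace-only fragments and collapse internal whitespace.
        if text and text.strip():
            runs.append((lang, " ".join(text.split())))

    def visit(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        emit(lang, elem.text)
        for child in elem:
            visit(child, lang)
            emit(lang, child.tail)  # tail text reverts to the parent's language

    visit(ET.fromstring(ssml), None)
    return runs
```

Applied to a sentence marked up like the second example, the Italian name and title come out as separate `it-IT` runs while the surrounding text stays `en-US`, so only those runs would need the Italian phonetic mapping.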
Conclusions
• Proposal 1:
  – To increase the number of “interpret-as” values by identifying new speech contexts
• Proposal 2:
  – To introduce a new element that specifies information (e.g. the language) for a single word, a phrase, and so on.