SSML extensions for multi-language usage

Davide Bonardo

W3C Workshop on Internationalizing SSML

Crete, 30-31 May 2006

About Loquendo

• R&D of speech technology
• Over 30 years of experience (from CSELT laboratories)
• Technologies:
  – TTS (text to speech)
  – ASR (automatic speech recognition) & SV (speaker verification)
• Solutions:
  – Easy integration of speech technologies
  – Speech servers (MRCPv1 & v2 protocols)
  – Speech platforms (VoiceXML & CCXML interpreters)
  – Embedded solutions (for many OSs and devices)

Ideas for SSML extensions

• <say-as> element
  – Extension of the values for the “interpret-as” attribute
• New element
  – <token>

Proposal 1: <say-as> extension (1/3)

• Problem:
  – How to interpret a part of an input text
  – Different dialog contexts require different interpretations
  – The interpretation could be language dependent
• Many contexts could be defined: SMS, e-mail, news, applications for rescue operations, …
• TTS engines may use context information to activate the best configuration for:
  – reading acronyms
  – expanding abbreviations
  – using customized prosodic phrasing
  – activating a special reading style
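
The idea of switching configuration by context can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ReadingProfile` fields, the profile contents and the `profile_for` helper are hypothetical and do not describe any actual engine.

```python
from dataclasses import dataclass, field


@dataclass
class ReadingProfile:
    """Hypothetical bundle of settings an engine could switch per context."""
    spell_acronyms: bool = True                        # read acronyms letter by letter
    abbreviations: dict = field(default_factory=dict)  # abbreviation -> expansion
    style: str = "neutral"                             # label for a reading style


# Illustrative profiles only; the entries are invented for this sketch.
PROFILES = {
    "sms": ReadingProfile(False, {"asap": "as soon as possible"}, "casual"),
    "news": ReadingProfile(True, {}, "newsreader"),
}
DEFAULT_PROFILE = ReadingProfile()


def profile_for(context: str) -> ReadingProfile:
    """Return the profile for a context, falling back to the default one."""
    return PROFILES.get(context, DEFAULT_PROFILE)
```

A context the engine does not know, such as `profile_for("banking")` here, simply falls back to the default profile, so unknown values degrade gracefully.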

Proposal 1: <say-as> extension (2/3)

Proposal:
• Extend the “interpret-as” attribute with new values, for instance:
  – sms
  – e-mail
  – news
  – banking
  – navigation
  – …

Proposal 1: <say-as> extension (3/3)

Examples

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-US">
  I call you asap.
  <say-as interpret-as="sms"> I call you asap </say-as>
</speak>

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-GB">
  <say-as interpret-as="sms"> Mtfbwu </say-as>
</speak>
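
To make the intended effect of the examples concrete, the following Python sketch expands SMS abbreviations only inside `<say-as interpret-as="sms">` regions and leaves the rest of the text untouched. It is a rough illustration under stated assumptions, not how any engine works: the expansion table (including the reading of "Mtfbwu") and the `expand_sms` and `normalize` helpers are hypothetical, and the input is assumed to be passed without the XML declaration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed expansions, invented for illustration.
SMS_EXPANSIONS = {
    "asap": "as soon as possible",
    "mtfbwu": "may the force be with you",
}


def expand_sms(text: str) -> str:
    """Replace known SMS abbreviations, matching whole words case-insensitively."""
    def repl(match):
        word = match.group(0)
        return SMS_EXPANSIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z0-9]+", repl, text)


def normalize(ssml: str) -> str:
    """Return the text to be spoken, expanding only inside sms regions."""
    parts = []

    def walk(elem, in_sms):
        if elem.tag == "say-as" and elem.get("interpret-as") == "sms":
            in_sms = True
        if elem.text:
            parts.append(expand_sms(elem.text) if in_sms else elem.text)
        for child in elem:
            walk(child, in_sms)
            if child.tail:
                # Tail text belongs to the parent, so the parent's flag applies.
                parts.append(expand_sms(child.tail) if in_sms else child.tail)

    walk(ET.fromstring(ssml), False)
    return " ".join(" ".join(parts).split())
```

With the first example above, the sentence outside `<say-as>` keeps "asap" as written, while the copy inside the element is returned as "I call you as soon as possible".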

Proposal 2: New element <token> (1/3)

• Problem 1: activating the correct language knowledge at a specific point of the text
• The “xml:lang” attribute is currently available in the <speak>, <voice>, <p> and <s> elements
• The engine's behavior can differ by element:
  – In the root <speak> element, “xml:lang” defines the language of the whole document, but for the engine it implies the selection of a voice
  – In the <voice> element, it is an important recommendation for loading the correct voice
  – In the <p> and <s> elements, it is mainly language information: the engine, if able to do so, can keep the same voice but use different language knowledge (e.g. phonetic mapping)
• Problem 2: it may be necessary to specify a language change for a text unit smaller than a sentence.
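
The element-dependent behavior described above can be sketched in Python as follows. The policy table is only an assumption that mirrors the bullets on this slide, and the action labels and the `language_actions` helper are hypothetical names, not part of SSML or of any engine API.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree exposes xml:lang (the xml prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed policy, following the slide: <speak> and <voice> drive voice
# selection, while <p> and <s> only switch the language knowledge.
VOICE_ELEMENTS = {"speak", "voice"}
KNOWLEDGE_ELEMENTS = {"p", "s"}


def language_actions(ssml: str):
    """List (element, language, action) for every element carrying xml:lang."""
    actions = []
    for elem in ET.fromstring(ssml).iter():  # document order
        lang = elem.get(XML_LANG)
        if lang is None:
            continue
        if elem.tag in VOICE_ELEMENTS:
            action = "select-voice"
        elif elem.tag in KNOWLEDGE_ELEMENTS:
            action = "switch-language-knowledge"
        else:
            action = "unspecified"
        actions.append((elem.tag, lang, action))
    return actions
```

Because the smallest element in this table is `<s>`, a single foreign word inside a sentence cannot be marked, which is the gap named in Problem 2.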

Proposal 2: New element <token> (2/3)

Proposal:
• Introduce a new element, <token>
• Extend the use of the “xml:lang” attribute to the <token> element

Advantages:
• It is a generic element
• It is extensible
  – Without attributes, it could be used to give information on segmentation, where needed.
  – With other attributes, it could specify new information for the token (e.g. part of speech)

Proposal 2: New element <token> (3/3)

Examples


The movie is the product of Italian comic sensation Roberto Benigni, who wore three hats for "La vita è bella": director, co-writer, and star.

</speak>


The movie is the product of Italian comic sensation <token xml:lang="it-IT">Roberto Benigni</token>, who wore three hats for <token xml:lang="it-IT"> "La vita è bella"</token>: director, co-writer, and star.

</speak>
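
As a rough illustration of what the second example gives the engine, the Python sketch below splits a document into runs of text, each tagged with the language in effect, where the innermost “xml:lang” wins. It assumes an opening `<speak>` tag with `xml:lang="en-US"`, which the extracted slide does not show, and the `language_segments` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree exposes xml:lang (the xml prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_segments(ssml: str):
    """Split the text into (language, text) runs; the innermost xml:lang wins."""
    segments = []

    def add(lang, text):
        if text and text.strip():
            segments.append((lang, " ".join(text.split())))

    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        add(lang, elem.text)
        for child in elem:
            walk(child, lang)
            add(lang, child.tail)  # tail text belongs to the parent element

    walk(ET.fromstring(ssml), None)
    return segments
```

Applied to the second example, the name "Roberto Benigni" and the film title come out as `it-IT` runs, while the surrounding sentence stays in the document language.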

Conclusions

• Proposal 1:
  – Increase the number of “interpret-as” values by identifying new speech contexts
• Proposal 2:
  – Introduce a new element to specify information (e.g. the language) for a single word, a phrase, and so on.
