Printed Documentation
Table Of Contents
Welcome to the LumenVox Speech Recognition Engine......................................1
Release Notes ......................................................................................................2
Version 6.0: .......................................................................................................2
Version 5.0: .......................................................................................................2
Version 4.0: .......................................................................................................2
Programmers Guide..............................................................................................4
Initializing a Speech Port...................................................................................4
C Code ..........................................................................................................4
C++ Code ......................................................................................................4
C Code ..........................................................................................................5
C++ Code ......................................................................................................6
Working with Grammars....................................................................................7
Loading A Grammar ......................................................................................7
C Code ..........................................................................................................7
C++ Code ......................................................................................................7
Activating A Grammar....................................................................................7
C Code ..........................................................................................................8
C++ Code ......................................................................................................8
See Also ........................................................................................................8
Adding Audio.....................................................................................................9
Batched Audio ...............................................................................................9
C Code ..........................................................................................................9
C++ Code ......................................................................................................9
Streaming ......................................................................................................9
C++ Code ......................................................................................................9
C Code ........................................................................................................10
Decoding .........................................................................................................13
C Code ........................................................................................................13
C++ Code ....................................................................................................14
Streaming ....................................................................................................14
Getting The Return Value ............................................................................15
C Code ........................................................................................................15
C++ Code ....................................................................................................15
C Code ........................................................................................................16
C++ Code ....................................................................................................16
See Also ......................................................................................................17
Using the Speech Parse Tree .........................................................................18
Example 1: Print the Tags in the tree...........................................................18
Example 2: Print a structured tree ...............................................................19
See Also ......................................................................................................21
Using the Interpretation Object........................................................................22
C API ...........................................................................................................22
C++ API .......................................................................................................22
Semantic Data Examples ............................................................................22
Example 1: Access Data Directly.................................................................24
C++ Code ....................................................................................................24
C Code ........................................................................................................24
Example 2: Traverse a Semantic Data Structure.........................................24
C Code ........................................................................................................24
Result ..........................................................................................................25
See Also ......................................................................................................26
Shutting Down the Speech Port ......................................................................27
C Code ........................................................................................................27
C++ Code ....................................................................................................27
Gotchas .......................................................................................................27
Example Code.................................................................................................28
A Working Example .....................................................................................28
main.cpp ......................................................................................................29
SimpleRecognizer.h.....................................................................................30
SimpleRecognizer.cpp.................................................................................31
AudioStreamer.h..........................................................................................36
AudioStreamer.cpp ......................................................................................37
HeaderClasses.h .........................................................................................39
SRGS Grammars ............................................................................................43
A Simple Grammar ......................................................................................43
Rule Expansions by Example ......................................................................46
Rule References ..........................................................................................49
Special Rules...............................................................................................51
Tags.............................................................................................................53
Applying Grammar Weights .........................................................................56
SRGS Definitions.........................................................................................58
Example Grammars.....................................................................................65
Semantic Interpretation ...................................................................................68
Intro to Semantic Interpretation....................................................................68
Semantic Interpretation by Example ............................................................70
Getting The Return Value ............................................................................74
Phonemes .......................................................................................................75
Phrases ...........................................................................................................78
BNF Refresher.............................................................................................78
LumenVox SpeechRec API ................................................................................80
Cautions ..........................................................................................................80
LV_SRE C API Functions................................................................................81
LV_SRE.......................................................................................................81
API Functions ..............................................................................................86
LVInterpretation C API Functions..................................................................161
LVInterpretation Summary.........................................................................161
LVSemanticData Summary........................................................................163
API Functions ............................................................................................166
LVParseTree C API functions........................................................................190
API Functions ............................................................................................191
Related APIs..............................................................................................204
LVParseTree Class....................................................................................218
LVGrammar C API Functions........................................................................221
LVGrammar Summary...............................................................................221
API Functions ............................................................................................225
LVSpeechPort Class .....................................................................................261
class LVSpeechPort ..................................................................................261
Methods.....................................................................................................266
LVInterpretation Class...................................................................................334
Intro To LVInterpretation............................................................................334
LVInterpretation: Constructing and Copying ..............................................336
ResultData.................................................................................................337
ResultName...............................................................................................338
Language...................................................................................................339
Mode..........................................................................................................340
TagFormat .................................................................................................341
InputSentence............................................................................................342
GrammarLabel...........................................................................................343
Score .........................................................................................................344
LVSemanticData Class..............................................................................345
LVSemanticObject Class ...........................................................................354
LVSemanticArray Class.............................................................................360
LVParseTree Class .......................................................................................363
LVParseTree Class....................................................................................363
Methods.....................................................................................................366
LVParseTree Inner Classes.......................................................................375
LVGrammar Class.........................................................................................388
class LVGrammar ......................................................................................388
Methods.....................................................................................................393
Callback Functions ........................................................................................426
Logging Callback Function.........................................................................426
Streaming Callback Function.....................................................................427
Grammar Logging Callback Function ........................................................428
Constants ......................................................................................................429
Decoder Flags ...........................................................................................429
Error Codes ...............................................................................................431
Properties ..................................................................................................434
Sound Formats ..........................................................................................440
Standard Grammars ..................................................................................442
Semantic Data Type ..................................................................................443
Semantic Data Print Format.......................................................................444
Stream Parameters....................................................................................445
Environment Variables ..................................................................................449
Environment Variables...............................................................................449
FAQs.................................................................................................................451
FAQs .............................................................................................................451
How to Contact LumenVox LLC........................................................................458
Copyright Information........................................................................................459
Glossary............................................................................................................460
Index .................................................................................................................461
Welcome to the LumenVox Speech Recognition Engine

We strive to make our products as user-friendly as possible, and we value your opinion. If there is something you would like added to the Help system, please email your suggestions to [email protected].
Release Notes

Version 6.0:

Supports n-best results.
Reduced server memory footprint.
Faster recognition algorithm.
Reduced server thread start-up time.
New American English acoustic models with an 8-10% relative improvement in recognition accuracy.
Improved confidence scores.
Global grammars are stored on the server.
Version 5.0:

Support for the Speech Recognition Grammar Specification (SRGS). SRGS grammars are now the official grammar format for the LumenVox Engine. SRGS grammars are powerful probabilistic context-free grammars that allow great flexibility in grammar writing.

Support for the Semantic Interpretation for Speech Recognition (SISR) working draft. Semantic Interpretation makes it easy to transform spoken input into machine-understandable data.
Version 4.0:

A new header file <LV_SRE2.h> is provided for the new C interface functions. It should be used in conjunction with <LV_SRE.h>.

A new header file <LVSpeechPort2.h> is provided. It contains a new C++ wrapper class (with the same name, "class LVSpeechPort") with new methods, and replaces the <LVSpeechPort.h> header.

A new DLL called "LVSpeechPort_stdcall.dll" is included so that programming environments which require standard calls (like VB) can use the SRE engine. The file SREAPI.txt contains a sample interface for use with VB.
Programmers Guide

Initializing a Speech Port

The only thing you must do to initialize a speech port is to have a Speech Engine service running on your machine and call OpenPort.
C Code
HPORT port;
long error_code;

port = LV_SRE_OpenPort2(&error_code, NULL, NULL, 0);

switch (error_code)
{
case LV_OPEN_PORT_FAILED__LICENSES_EXCEEDED:
    printf("licenses exceeded");
    break;
case LV_OPEN_PORT_FAILED__PRIMARY_SERVER_NOT_RESPONDING:
case LV_NO_SERVER_RESPONDING:
    printf("SRE server unavailable");
    break;
case LV_SUCCESS:
    printf("port opened");
    break;
}
C++ Code
LVSpeechPort port;
port.OpenPort();
int error_code = port.GetOpenPortStatus();

switch (error_code)
{
case LV_FAILURE:
    cout << "licenses exceeded";
    break;
case LV_OPEN_PORT_FAILED__PRIMARY_SERVER_NOT_RESPONDING:
case LV_NO_SERVER_RESPONDING:
    cout << "SRE server unavailable";
    break;
case LV_SUCCESS:
    cout << "port opened";
    break;
}
Other things you can do besides opening a port include:
Register logging callback functions
Register multiple servers
Turn on Engine sound file and result logging for application tuning.
C Code
/* a structure to hold logfile info */
typedef struct logdata_s
{
    FILE* file;
    long  message_count;
} logdata_t;

void logdata_callback(const char* message, void* userdata)
{
    logdata_t* mydata = (logdata_t*)userdata;
    fprintf(mydata->file, "%s\n", message);
    ++(mydata->message_count);
}

int init_port(HPORT* port, logdata_t* app_message, logdata_t* log_message)
{
    long error_code;

    /* Register a callback to accept messages from the server
     * or client library, at warning level 3 */
    LV_SRE_RegisterAppLogMsg(logdata_callback, app_message, 3);

    /* point the client library to a local server and a remote server */
    LV_SRE_SetPropertyEx(NULL, PROP_EX_SRE_SERVERS,
        PROP_EX_VALUE_TYPE_STRING, "127.0.0.1,10.0.0.1",
        PROP_EX_TARGET_CLIENT, 0);

    /* open the port, registering a callback to accept messages
     * from the port at warning level 3 */
    *port = LV_SRE_OpenPort2(&error_code, logdata_callback, log_message, 3);

    /* turn on sound and response file logging */
    int save_sound_files = 1;
    LV_SRE_SetPropertyEx(*port, PROP_EX_SAVE_SOUND_FILES,
        PROP_EX_VALUE_TYPE_INT_PTR, &save_sound_files,
        PROP_EX_TARGET_PORT, 0);

    return error_code;
}
C++ Code
// a class to hold logfile info
struct logdata
{
    ofstream file;
    long     message_count;

    static void callback(const char* message, void* userdata)
    {
        logdata* self = (logdata*)userdata;
        self->file << message << endl;
        ++(self->message_count);
    }
};

int init_port(LVSpeechPort& port, logdata* app_message, logdata* log_message)
{
    // Register a callback to accept messages from the server
    // or client library, at warning level 3.
    LVSpeechPort::RegisterAppLogMsg(logdata::callback, app_message, 3);

    // point the client library to a local server and a remote server
    LVSpeechPort::SetClientPropertyEx(PROP_EX_SRE_SERVERS,
        PROP_EX_VALUE_TYPE_STRING, "127.0.0.1,10.0.0.1");

    // open the port, registering a callback to accept messages
    // from the port at warning level 3.
    port.OpenPort(logdata::callback, log_message, 3);

    // turn on sound and response file logging
    int save_sound_files = 1;
    port.SetPropertyEx(PROP_EX_SAVE_SOUND_FILES,
        PROP_EX_VALUE_TYPE_INT_PTR, &save_sound_files);

    return port.GetOpenPortStatus();
}
Working with Grammars
Grammars tell the Speech Recognition Engine what words and phrases can be recognized by the engine, and in what order. The LumenVox grammar format is an implementation of the Speech Recognition Grammar Specification, published by the W3C. A short tutorial on writing SRGS grammars is provided here.
Loading A Grammar
In order to decode audio, there must be at least one grammar loaded. Grammars can be loaded in a variety of ways, a few of which are demonstrated below:
C Code
HPORT hport;

/* Load a grammar into the global (application-level) space, and name it
 * "nav_menu".
 * This grammar will be usable by any speech port on the client machine.
 * Any syntax warnings or error messages will be sent to the
 * application-level logging callback. */
LV_SRE_LoadGlobalGrammar("nav_menu",
    "c:/MyGrammars/top_level_navigation.gram");

/* Load a built-in grammar into the speech port, and name it "yes_no".
 * Syntax error or warning messages will be sent to the port's logging
 * callback. The hport needs to be open first, of course. */
LV_SRE_LoadGrammar(hport, "yes_no", "builtin:grammar/boolean");
C++ Code
LVSpeechPort port;
port.OpenPort();

LVSpeechPort::LoadGlobalGrammar("nav_menu",
    "c:/MyGrammars/top_level_navigation.gram");
port.LoadGrammar("yes_no", "builtin:grammar/boolean");
Activating A Grammar
When a grammar is loaded, it is compiled into a file usable by the Engine. But to use the grammar for a decode you must activate it. You may activate multiple grammars for a single decode; the Engine will tell you which grammar was matched.
C Code
/* Activates the "nav_menu" grammar that was loaded above.
 * Activate searches for a grammar named "nav_menu" in its port, then
 * searches the global space if it can't find it. */
LV_SRE_ActivateGrammar(hport, "nav_menu");
C++ Code
port.ActivateGrammar("nav_menu");
See Also
Grammar Writing Tutorial
Adding Audio
Because the LumenVox Speech Engine is hardware independent, the client application has greater flexibility when collecting the audio data. Once the audio is acquired, the client application should ensure the data is in a supported audio format. The audio must be header-less, otherwise known as "raw" audio format. For example, the standard Windows .wav files have a header which needs to be removed.
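Because .wav headers vary in length (files often carry extra chunks such as LIST or fact), it is safer to walk the RIFF chunk list than to skip a fixed 44 bytes. The helper below is an illustrative sketch, not part of the LumenVox API; it locates the raw PCM samples inside an in-memory little-endian WAV file:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Find the "data" chunk of an in-memory RIFF/WAVE file by walking its
// chunk list. On success, data_offset/data_length describe the raw PCM
// samples, which can then be handed to the Engine as header-less audio.
bool FindPcmData(const std::vector<uint8_t>& wav,
                 size_t& data_offset, size_t& data_length)
{
    // A valid WAV file starts with "RIFF" <size> "WAVE".
    if (wav.size() < 12 ||
        std::memcmp(&wav[0], "RIFF", 4) != 0 ||
        std::memcmp(&wav[8], "WAVE", 4) != 0)
        return false;

    size_t pos = 12;  // first sub-chunk starts after the RIFF header
    while (pos + 8 <= wav.size()) {
        // Chunk sizes are stored little-endian after the 4-byte chunk ID.
        uint32_t chunk_size = wav[pos + 4] | (wav[pos + 5] << 8) |
                              (wav[pos + 6] << 16) |
                              (static_cast<uint32_t>(wav[pos + 7]) << 24);
        if (std::memcmp(&wav[pos], "data", 4) == 0) {
            data_offset = pos + 8;
            data_length = chunk_size;
            return data_offset + data_length <= wav.size();
        }
        // Skip this chunk; chunks are padded to even (word-aligned) sizes.
        pos += 8 + chunk_size + (chunk_size & 1);
    }
    return false;
}
```

The bytes at data_offset can then be passed to LoadVoiceChannel or to the streaming interface as raw audio.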
The audio data is stored in a voice channel. Each speech port has 64 different voice channels, so 64 different audio samples can be stored in a speech port at once, although most applications will only need two: one for the main answer and one for the result of a confirmation yes/no question.
Audio may be entered at once, as a batch decode, or it may be streamed in.
Batched Audio
To get your audio into the port, all you have to do is collect the audio into a buffer and call LoadVoiceChannel:
C Code
void LoadAudio(HPORT hport, void* audio, int audiolength)
{
    LV_SRE_LoadVoiceChannel(hport, 1, audio, audiolength, PCM_16KHZ);
}
C++ Code
void LoadAudio(LVSpeechPort& myPort, void* audio, int audiolength)
{
    myPort.LoadVoiceChannel(1, audio, audiolength, PCM_16KHZ);
}
Streaming
In order to stream audio into the server, there are several parameters to set. We will set them to the most commonly used settings:
C++ Code
// The port gets opened and initialized.
LVSpeechPort port;
port.OpenPort();
// ...

// let the port detect beginning and end of speech,
// and handle the speech decoding automatically
port.StreamSetParameter(STREAM_PARM_DETECT_BARGE_IN, 1);
port.StreamSetParameter(STREAM_PARM_DETECT_END_OF_SPEECH, 1);
port.StreamSetParameter(STREAM_PARM_AUTO_DECODE, 1);

// pick a voice channel to record audio and send responses to.
port.StreamSetParameter(STREAM_PARM_VOICE_CHANNEL, 1);

// If you wish to use your activated SRGS grammars, the grammar set
// must be LV_ACTIVE_GRAMMAR_SET
port.StreamSetParameter(STREAM_PARM_GRAMMAR_SET, LV_ACTIVE_GRAMMAR_SET);
C Code
LV_SRE_StreamSetParameter(hport, STREAM_PARM_DETECT_BARGE_IN, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_DETECT_END_OF_SPEECH, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_AUTO_DECODE, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_VOICE_CHANNEL, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_GRAMMAR_SET, LV_ACTIVE_GRAMMAR_SET);
The rest of this example will be in C++; the C version is left as an exercise for the reader. Suppose we have an interface that intermittently provides audio to us. For simplicity, assume it always sends audio as u-Law 8 kHz:
typedef bool (*AudioStreamCallback)(char* audio_chunk, int audio_length,
                                    void* user_data);

class AudioStreamer
{
public:
    // Non-blocking function. Sends audio through the callback function
    // at regular intervals on a separate thread. It will stop sending
    // audio if the callback returns "false".
    void StartStream(AudioStreamCallback cb, void* user_data);

    // The audio thread will stop sending audio through the callback if
    // StopStream is called. When StopStream returns, the audio thread
    // is no longer sending.
    void StopStream();

    // constructors, destructors, hardware hooks, etc.
    // ...
};
The speech port also has a callback mechanism for letting the user know what state of processing it is in.
typedef void (*StreamStateChangeFn)(long new_state, unsigned long total_bytes, unsigned long recorded_bytes, void* user_data);
We can connect our speech port and the audio streamer together by way of their callbacks.
struct SimpleRecognizer
{
    LVSpeechPort port;
    AudioStreamer audio;
};

bool AudioCB(char* audio_chunk, int audio_length, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    self->port.StreamSendData(audio_chunk, audio_length);
    return true;
}

static void PortCB(long new_state, unsigned long total_bytes,
                   unsigned long recorded_bytes, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    switch (new_state)
    {
    case STREAM_STATUS_READY:
        self->audio.StartStream(AudioCB, self);
        break;
    case STREAM_STATUS_STOPPED:
    case STREAM_STATUS_END_SPEECH:
        self->audio.StopStream();
        // retrieve answers: we will define this later
        break;
    case STREAM_STATUS_BARGE_IN:
        // stop playing prompt
        break;
    }
}
All that remains is to plug the PortCB function into the port.
SimpleRecognizer reco;

// initialize the speech port and the audio streamer
// ...

// start the stream.
reco.port.StreamSetStateChangeCallBack(PortCB, &reco);
reco.port.StreamSetParameter(STREAM_PARM_SOUND_FORMAT, ULAW_8KHZ);

// StreamStart will put the port into the STREAM_STATUS_READY state, which
// will trigger the audio streamer to start sending audio to the port.
reco.port.StreamStart();
Decoding
Once grammars have been activated and the speech port is receiving audio, the decode process can begin. The decode process sends audio and grammars to the Engine to be parsed and interpreted for meaning.
Batched Audio
With audio that is dropped directly into a speech port's voice channel, the user can explicitly call Decode and wait for results to come back:
C Code
HPORT hport;

/* Let the port decide if the audio is suited for the MODEL_MALE or
 * MODEL_FEMALE acoustic models. Otherwise, two decodes will be performed,
 * and the port will choose afterward */
int choose_model = 1;
LV_SRE_SetPropertyEx(NULL, PROP_EX_CHOOSE_MODEL,
    PROP_EX_VALUE_TYPE_INT_PTR, &choose_model,
    PROP_EX_TARGET_CLIENT, 0);

/* If you wish to use the LumenVox Semantic Interpretation process,
 * this flag needs to be present. */
unsigned long flags = LV_DECODE_SEMANTIC_INTERPRETATION;

/* voice_channel is wherever you loaded the audio */
int voice_channel = 1;

/* You should use LV_ACTIVE_GRAMMAR_SET if you are using SRGS grammars.
 * It is the grammar set that holds all of your active grammars. */
int grammar_set = LV_ACTIVE_GRAMMAR_SET;

/* wait a max of 3 seconds before abandoning hope for the Engine
 * to return an answer */
int timeout = 3000;

LV_SRE_Decode(hport, voice_channel, grammar_set, flags);
int code = LV_SRE_WaitForEngineToIdle(hport, timeout, voice_channel);

if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    /* process the answers contained in the voice channel */
}
C++ Code
LVSpeechPort port;

int choose_model = 1;
LVSpeechPort::SetClientPropertyEx(PROP_EX_CHOOSE_MODEL,
    PROP_EX_VALUE_TYPE_INT_PTR, &choose_model);

unsigned long flags = LV_DECODE_SEMANTIC_INTERPRETATION;
int voice_channel = 1;
int grammar_set = LV_ACTIVE_GRAMMAR_SET;
int timeout = 3000;

port.Decode(voice_channel, grammar_set, flags);
int code = port.WaitForEngineToIdle(timeout, voice_channel);

if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    /* process the answers contained in the voice channel */
}
Streaming
If you are streaming the audio into the speech port, you can elect to have the speech port handle the decode process automatically, as we did in the section on adding audio when we wrote the line:
port.StreamSetParameter(STREAM_PARM_AUTO_DECODE, 1);
In order to wait for the Engine to return with results, we need to modify our callback function:
void ProcessResults(SimpleRecognizer* reco)
{
    int voice_channel = 1; // the channel set with STREAM_PARM_VOICE_CHANNEL
    reco->audio.StopStream();
    int code = reco->port.WaitForEngineToIdle(3000, voice_channel);
    if (code == LV_TIME_OUT)
    {
        /* do some clean up and exit */
    }
    else
    {
        /* process the answers contained in the voice channel */
    }
}

static void PortCB(long new_state, unsigned long total_bytes,
                   unsigned long recorded_bytes, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    switch (new_state)
    {
    case STREAM_STATUS_READY:
        self->audio.StartStream(AudioCB, self);
        break;
    case STREAM_STATUS_STOPPED:
    case STREAM_STATUS_END_SPEECH:
        ProcessResults(self);
        break;
    case STREAM_STATUS_BARGE_IN:
        // stop playing prompt
        break;
    }
}
Getting The Return Value
If WaitForEngineToIdle returns successfully, you can grab answers out of the port. If you are using the semantic interpretation processor, you retrieve LVInterpretation objects.
C Code
if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    int num_interp = LV_SRE_GetNumberOfInterpretations(hport, voice_channel);
    for (int i = 0; i < num_interp; ++i)
    {
        printf("interpretation %i:\n", i);
        H_SI interp = LV_SRE_CreateInterpretation(hport, voice_channel, i);
        const char* grammar = LVInterpretation_GetGrammarLabel(interp);
        int score = LVInterpretation_GetScore(interp);
        printf("utterance matched grammar %s with confidence %i\n",
               grammar, score);

        /* See "Using Semantic Data" for examples of handling the semantic
         * data contained in this interpretation object */

        /* release the interpretation handle when finished with it */
        LVInterpretation_Release(interp);
    }
}
C++ Code
if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    int num_interp = port.GetNumberOfInterpretations(voice_channel);
    for (int i = 0; i < num_interp; ++i)
    {
        cout << "interpretation " << i << ":" << endl;
        LVInterpretation interp = port.GetInterpretation(voice_channel, i);
        const char* grammar = interp.GrammarLabel();
        int score = interp.Score();
        cout << "utterance matched grammar " << grammar
             << " with confidence " << score << endl;

        // See "Using Semantic Data" for examples of handling the semantic
        // data contained in this interpretation object
    }
}
If you are not using semantic interpretation, you can receive LVParseTree objects from the Engine.
C Code
if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    int num_parses = LV_SRE_GetNumberOfParses(hport, voice_channel);
    for (int i = 0; i < num_parses; ++i)
    {
        printf("interpretation %i:\n", i);
        H_PARSE_TREE parse = LV_SRE_CreateParseTree(hport, voice_channel, i);

        /* See "Using the Parse Tree" for examples of handling the
         * parse tree */

        /* release the parse tree when finished with it */
        LVParseTree_Release(parse);
    }
}
C++ Code
if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    int num_parses = port.GetNumberOfParses(voice_channel);
    for (int i = 0; i < num_parses; ++i)
    {
        cout << "interpretation " << i << ":" << endl;
        LVParseTree parse = port.GetParseTree(voice_channel, i);
        // See "Using the Parse Tree" to see how to handle
        // the parse tree by example
    }
}
See Also
Using Semantic Data
Using the Parse Tree
Using the Speech Parse Tree
#include <LV_SRE_ParseTree.h>
A ParseTree represents a sentence diagram of engine output, according to the SRGS grammar that was matched. Information about the tree is accessed through iterators.
Here are a few code examples to show how information can be accessed from the speech parse tree. In every example, the active grammar will be:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <XML>; //a made up tag format.
root $PhoneNumber;
$Digit = one {1} | two {2} | three {3} | four {4} | five {5} | six {6} | seven {7} | eight {8} | nine {9} | (zero | oh) {0};
$AreaCode = [area code | one] {<AREA_CODE>} $Digit<3> {</AREA_CODE>};
$PhoneNumber = [$AreaCode] {<PHONE>} $Digit<7> {</PHONE>};
And the decoded sentence will be "area code eight five eight seven oh seven oh seven oh seven". If you do not understand how to write an SRGS grammar, read the tutorial now.
Example 1: Print the Tags in the tree
C++ API
#include <LV_SRE_ParseTree.h>
#include <iostream>
using namespace std;
void PrintTags(LVParseTree& Tree)
{
    LVParseTree::Iterator Itr = Tree.Begin();
    LVParseTree::Iterator End = Tree.End();
    for (; Itr != End; ++Itr)
    {
        if (Itr->IsTag())
        {
            cout << Itr->Text() << "\n";
        }
    }
}
C API
#include <LV_SRE_ParseTree.h>
void PrintTags(H_PARSE_TREE Tree)
{
    H_PARSE_TREE_NODE N;
    H_PARSE_TREE_ITR Itr;
    Itr = LVParseTree_CreateIteratorBegin(Tree);
    for (; !LVParseTree_Iterator_IsPastEnd(Itr); LVParseTree_Iterator_Advance(Itr))
    {
        N = LVParseTree_Iterator_GetNode(Itr);
        if (LVParseTree_Node_IsTag(N))
        {
            printf("%s ", LVParseTree_Node_GetLabel(N));
        }
    }
    LVParseTree_Iterator_Release(Itr);
}
Result
"<AREA_CODE> 8 5 8 </AREA_CODE> <PHONE> 7 0 7 0 7 0 7 </PHONE>"
Example 2: Print a structured tree
C++ API
#include <LV_SRE_ParseTree.h>
#include <iostream>

using namespace std;

void PrintNode(LVParseTree::Node& N)
{
    for (int i = 0; i < N.Level(); ++i)
        cout << " ";
    if (N.IsTerminal())
        cout << "\"" << N.Text() << "\"\n";
    if (N.IsTag())
        cout << "{ " << N.Text() << " }\n";
    if (N.IsRule())
    {
        cout << "$" << N.RuleName() << ":\n";
        LVParseTree::ChildrenIterator Itr = N.ChildrenBegin();
        LVParseTree::ChildrenIterator End = N.ChildrenEnd();
        for (; Itr != End; ++Itr)
            PrintNode(*Itr);
    }
}

void PrintTree(LVParseTree& Tree)
{
    PrintNode(Tree.Root());
}
C API
#include <LV_SRE_ParseTree.h>
#include <stdio.h>

void PrintNode(H_PARSE_TREE_NODE N)
{
    H_PARSE_TREE_CHILDREN_ITR I;
    int i;
    for (i = 0; i < LVParseTree_Node_GetLevel(N); ++i)
        printf(" ");
    if (LVParseTree_Node_IsTerminal(N))
        printf("\"%s\"\n", LVParseTree_Node_GetText(N));
    if (LVParseTree_Node_IsTag(N))
        printf("{ %s }\n", LVParseTree_Node_GetText(N));
    if (LVParseTree_Node_IsRule(N))
    {
        printf("$%s:\n", LVParseTree_Node_GetRuleName(N));
        I = LVParseTree_Node_CreateChildrenIterator(N);
        while (!LVParseTree_ChildrenIterator_IsPastEnd(I))
        {
            PrintNode(LVParseTree_ChildrenIterator_GetNode(I));
            LVParseTree_ChildrenIterator_Advance(I);
        }
        LVParseTree_ChildrenIterator_Release(I);
    }
}

void PrintTree(H_PARSE_TREE Tree)
{
    PrintNode(LVParseTree_GetRoot(Tree));
}
Result:
$PhoneNumber:
 $AreaCode:
  "AREA"
  "CODE"
  { <AREA_CODE> }
  $Digit:
   "EIGHT"
   { 8 }
  $Digit:
   "FIVE"
   { 5 }
  $Digit:
   "EIGHT"
   { 8 }
  { </AREA_CODE> }
 { <PHONE> }
 $Digit:
  "SEVEN"
  { 7 }
 $Digit:
  "OH"
  { 0 }
 $Digit:
  "SEVEN"
  { 7 }
 $Digit:
  "OH"
  { 0 }
 $Digit:
  "SEVEN"
  { 7 }
 $Digit:
  "OH"
  { 0 }
 $Digit:
  "SEVEN"
  { 7 }
 { </PHONE> }
See Also
LVParseTree C API
LVParseTree C++ API
Using the Interpretation Object
#include <LV_SRE_Semantic.h>
When the speech port executes your semantic interpretation tags, the output is an ECMAScript (JavaScript) object. LumenVox provides a C and C++ API for examining this object. When the speech port has finished its decode, and processed the resulting parse tree and tags, you may request an interpretation object. The interpretation object contains information about the decode -- confidence score, matching grammar, etc -- plus a single semantic data object.
C API
H_SI interpretation = LV_SRE_CreateInterpretation(hport, voicechannel, index);
/* the name of the active grammar that matched this interpretation */
const char* grammar = LVInterpretation_GetGrammarLabel(interpretation);
/* the SRE's confidence in this interpretation */
int confidence = LVInterpretation_GetScore(interpretation);
/* the sentence that the SRE decoded */
const char* sentence = LVInterpretation_GetInputSentence(interpretation);
/* the object returned by the semantic interpretation process */
H_SI_DATA result_data = LVInterpretation_GetResultData(interpretation);
C++ API
LVInterpretation interpretation = port.GetInterpretation(voicechannel, index);
const char* grammar = interpretation.GrammarLabel();
int confidence = interpretation.Score();
const char* sentence = interpretation.InputSentence();
LVSemanticData result_data = interpretation.ResultData();
Semantic Data Examples
In the following examples, the grammar will be:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <lumenvox/1.0>; //This line tells the engine how to interpret the grammar's tags.
//Currently, only "lumenvox/1.0" or "semantics/1.0" is supported.
root $small_number_and_text;

$base = (one:"1"|two:"2"|three:"3"|four:"4"|five:"5"|six:"6"|seven:"7"|eight:"8"|nine:"9")
        { $ = parseInt($) };
$teen = ten:"10"|eleven:"11"|twelve:"12"|thirteen:"13"|fourteen:"14"|fifteen:"15"|
        sixteen:"16"|seventeen:"17"|eighteen:"18"|nineteen:"19" { $ = parseInt($) };
$twenty_to_ninetynine = (twenty:"20"|thirty:"30"|forty:"40"|fifty:"50"|sixty:"60"|
        seventy:"70"|eighty:"80"|ninety:"90") { $ = parseInt($) } [$base { $ += $base }];
$tens = ($base|$teen|$twenty_to_ninetynine) { $ = $$ };
$hundred = ([a] hundred {$ = 100} | $base hundred {$ = 100 * $base});
$small_number = $hundred {$ = $$} [[and] $tens {$ += $$}] | $tens { $ = $$ };
$small_number_and_text = $small_number { $.number = $$; $.text = $$$.text };
And the input sentence will be "four hundred and six". If you do not understand how SRGS grammars are written, or how the semantic interpretation process works, please read the SRGS Grammar and/or Semantic Interpretation tutorials now.
The result of the semantic interpretation process on the input sentence is an ECMAScript object that looks like this:
small_number_and_text :          // return value of type SI_TYPE_OBJECT
{
  number: 406,                   // property of type SI_TYPE_INT
  text: "four hundred and six"   // property of type SI_TYPE_STRING
}
Example 1: Access Data Directly
If we knew that our application would always be receiving an object containing an integer property named "number", and a string property named "text", we could write code to retrieve the data as follows:
C++ Code
LVSemanticObject result_obj = interpretation.ResultData().GetSemanticObject();
int number = result_obj["number"].GetInt();
const char* text = result_obj["text"].GetString();
C Code
H_SI_DATA result = LVInterpretation_GetResultData(interpretation);
H_SI_DATA number_container = LVSemanticObject_GetPropertyValue(result, "number");
int number = LVSemanticData_GetInt(number_container);
H_SI_DATA text_container = LVSemanticObject_GetPropertyValue(result, "text");
const char* text = LVSemanticData_GetString(text_container);
Example 2: Traverse a Semantic Data Structure
The following code prints a generic interpretation object as an XML fragment.
C Code
void PrintDataXML(H_SI_DATA hsi); /* forward declaration; PrintXML calls it */

void PrintXML(H_SI hsi)
{
    const char* result_name = LVInterpretation_GetResultName(hsi);
    printf("<%s>\n", result_name);
    PrintDataXML(LVInterpretation_GetResultData(hsi));
    printf("</%s>\n", result_name);
}

void PrintDataXML(H_SI_DATA hsi)
{
    int i;
    int n;
    const char* property_name;
    H_SI_DATA data;
    switch (LVSemanticData_GetType(hsi))
    {
    case SI_TYPE_BOOL:
        LVSemanticData_GetBool(hsi) ? printf("true\n") : printf("false\n");
        break;
    case SI_TYPE_INT:
        printf("%d\n", LVSemanticData_GetInt(hsi));
        break;
    case SI_TYPE_DOUBLE:
        printf("%f\n", LVSemanticData_GetDouble(hsi));
        break;
    case SI_TYPE_STRING:
        printf("%s\n", LVSemanticData_GetString(hsi));
        break;
    case SI_TYPE_OBJECT:
        n = LVSemanticObject_GetNumberOfProperties(hsi);
        for (i = 0; i < n; i++)
        {
            property_name = LVSemanticObject_GetPropertyName(hsi, i);
            data = LVSemanticObject_GetPropertyValue(hsi, property_name);
            printf("<%s>\n", property_name);
            PrintDataXML(data);
            printf("</%s>\n", property_name);
        }
        break;
    case SI_TYPE_ARRAY:
        n = LVSemanticArray_GetSize(hsi);
        for (i = 0; i < n; i++)
        {
            data = LVSemanticArray_GetElement(hsi, i);
            printf("<item>\n");
            PrintDataXML(data);
            printf("</item>\n");
        }
        break;
    }
}
Result
<small_number_and_text>
  <number>
    406
  </number>
  <text>
    four hundred and six
  </text>
</small_number_and_text>
See Also
Semantic Interpretation C API
Semantic Interpretation C++ API
Shutting Down the Speech Port
When the speech port is no longer needed, it should be closed. Closing every unneeded speech port frees a licensed port and releases all of the port's resources.
C Code
HPORT hport;
/* open it...do some stuff...close when done */
LV_SRE_ClosePort(hport);
C++ Code
LVSpeechPort Port;
//open it...do some stuff...close when done
Port.ClosePort();
Gotchas
Closing the port may seem trivial, but once you start streaming audio to it from a separate thread, it becomes easy to get wrong. Always disengage your stream from the port completely before you close it.
Example Code
A Working Example
Included in this documentation is a working example that incorporates streaming audio, SRGS grammars, and Semantic Interpretation. It is written in C++, is based on examples throughout this documentation, and compiles under Visual C++ 6.0.
It consists of six files.
main.cpp -- The entry point into the application.
SimpleRecognizer.h -- Definition of a recognizer, backed by LVSpeechPort.
SimpleRecognizer.cpp -- Implementation file.
AudioStreamer.h -- Definition of an object that mimics streaming by reading an audio file.
AudioStreamer.cpp -- Implementation file.
HeaderClasses.h -- Thread code to help implement AudioStreamer.
main.cpp
#include "AudioStreamer.h"
#include "SimpleRecognizer.h"
#include <iostream>

int main()
{
    SimpleRecognizer Reco;
    Reco.LoadGrammar("yesno", "builtin:grammar/boolean");
    AudioStreamer Audio("yesplease.ulaw");
    Reco.Recognize(&Audio, "yesno");
    Reco.WaitUntilDone();
    std::cout << std::endl << Reco.GetResult() << std::endl << std::endl;
    return 0;
}
SimpleRecognizer.h
#ifndef SIMPLE_RECOGNIZER_H
#define SIMPLE_RECOGNIZER_H
#include "AudioStreamer.h"
#include <LVSpeechPort.h>

class SimpleRecognizer
{
public:
    SimpleRecognizer();
    ~SimpleRecognizer();
    void WaitUntilDone();
    void LoadGrammar(const std::string& grammar_name, const std::string& grammar_location);
    void Recognize(AudioStreamer* Stream, const std::string& grammar_name);
    const std::string& GetResult();
private:
    static void PortCB(long NewState, unsigned long TotalBytes,
                       unsigned long RecordedBytes, void* UserData);
    static bool AudioCB(char* audio_data, int audio_data_size, void* user_data);
    bool finished_decode;
    AudioStreamer* AudioThread;
    LVSpeechPort port;
    int voiceChannel;
    void GetAnswers();
    std::string result;
};
#endif//SIMPLE_RECOGNIZER_H
SimpleRecognizer.cpp
#include "SimpleRecognizer.h"
#include <sstream>
//==============================================================================
// callback for messages from the speech port
void logger(const char* msg, void* userdata)
{
    std::cout << msg << std::endl;
}
//==============================================================================
// code to plug LVSemanticData into any standard stream
std::ostream& operator << (std::ostream& os, const LVSemanticData& Data)
{
    int i;
    LVSemanticObject Obj;
    switch (Data.Type())
    {
    case SI_TYPE_BOOL:
        os << Data.GetBool() << "\n";
        break;
    case SI_TYPE_INT:
        os << Data.GetInt() << "\n";
        break;
    case SI_TYPE_DOUBLE:
        os << Data.GetDouble() << "\n";
        break;
    case SI_TYPE_STRING:
        os << Data.GetString() << "\n";
        break;
    case SI_TYPE_OBJECT:
        Obj = Data.GetSemanticObject();
        for (i = 0; i < Obj.NumberOfProperties(); ++i)
        {
            os << "<property name=" << Obj.PropertyName(i) << ">\n";
            os << Obj.PropertyValue(i);
            os << "</property>\n";
        }
        break;
    case SI_TYPE_ARRAY:
        for (i = 0; i < Data.GetSemanticArray().Size(); ++i)
        {
            os << "<element>\n";
            os << Data.GetArray().At(i);
            os << "</element>\n";
        }
        break;
    }
    return os;
}
//==============================================================================
// code to plug LVInterpretation into any standard stream
std::ostream& operator << (std::ostream& os, const LVInterpretation& Interp)
{
    os << "<interpretation grammar=\"" << Interp.GrammarLabel()
       << "\" score=\"" << Interp.Score() << "\">" << std::endl;
    os << "<result name=\"" << Interp.ResultName() << "\">" << std::endl;
    os << Interp.ResultData();
    os << "</result>" << std::endl;
    os << "<input>" << std::endl;
    os << Interp.InputSentence() << std::endl;
    os << "</input>" << std::endl;
    os << "</interpretation>";
    return os;
}
//==============================================================================
void SimpleRecognizer::WaitUntilDone()
{
    while (!finished_decode)
        Sleep(50);
}
//==============================================================================
SimpleRecognizer::SimpleRecognizer()
    : voiceChannel(1), finished_decode(true), AudioThread(NULL)
{
    LVSpeechPort::RegisterAppLogMsg(logger, NULL, 6);
    int v = port.OpenPort(logger, NULL, 6);
    if (v != LV_SUCCESS)
    {
        std::cout << LVSpeechPort::ReturnErrorString(port.GetOpenPortStatus()) << std::endl;
        exit(-1);
    }
    // Turn on frequency based voice activity detector
    port.StreamSetParameter(STREAM_PARM_USE_FREQ_VAD, 1);
    port.StreamSetParameter(STREAM_PARM_DETECT_BARGE_IN, 1);
    port.StreamSetParameter(STREAM_PARM_DETECT_END_OF_SPEECH, 1);
    port.StreamSetParameter(STREAM_PARM_VOICE_CHANNEL, voiceChannel);
    port.StreamSetParameter(STREAM_PARM_GRAMMAR_SET, LV_ACTIVE_GRAMMAR_SET);
    //Let the port handle the decode process
    port.StreamSetParameter(STREAM_PARM_AUTO_DECODE, 1);
    //and use semantic interpretation processor
    port.StreamSetParameter(STREAM_PARM_DECODE_FLAGS, LV_DECODE_SEMANTIC_INTERPRETATION);
    port.StreamSetStateChangeCallBack(PortCB, this);
}
//==============================================================================
SimpleRecognizer::~SimpleRecognizer()
{
    port.ClosePort();
}
//==============================================================================
void SimpleRecognizer::PortCB(long NewState, unsigned long TotalBytes,
                              unsigned long RecordedBytes, void* UserData)
{
    SimpleRecognizer* self = (SimpleRecognizer*)UserData;
    switch (NewState)
    {
    case STREAM_STATUS_END_SPEECH:
        if (!self->finished_decode)
        {
            self->AudioThread->StopStream();
            self->GetAnswers();
            self->finished_decode = true;
        }
        break;
    case STREAM_STATUS_STOPPED:
        if (!self->finished_decode)
        {
            self->AudioThread->StopStream();
            self->GetAnswers();
            self->finished_decode = true;
        }
        break;
    case STREAM_STATUS_NOT_READY:
        break;
    case STREAM_STATUS_READY:
        self->finished_decode = false;
        self->AudioThread->StartStream(AudioCB, self);
        break;
    }
}
//==============================================================================
void SimpleRecognizer::LoadGrammar(const std::string& grammar_name,
                                   const std::string& grammar_location)
{
    port.LoadGrammar(grammar_name.c_str(), grammar_location.c_str());
}
//==============================================================================
bool SimpleRecognizer::AudioCB(char* audio_data, int audio_data_size, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    self->port.StreamSendData(audio_data, audio_data_size);
    return true;
}
//==============================================================================
void SimpleRecognizer::Recognize(AudioStreamer* Audio, const std::string& grammar_name)
{
    finished_decode = false;
    AudioThread = Audio;
    port.DeactivateGrammars(); //clear out old grammars.
    port.ActivateGrammar(grammar_name.c_str());
    port.AddEvent(EVENT_START_DECODE_SEQ);
    port.StreamSetParameter(STREAM_PARM_SOUND_FORMAT, ULAW_8KHZ);
    port.StreamStart();
}
//==============================================================================
void SimpleRecognizer::GetAnswers()
{
    int val;
    val = port.WaitForEngineToIdle(3000, voiceChannel);
    if (val < 0)
    {
        result = "<noanswer/>";
        return;
    }
    //view the results of the decode:
    std::stringstream ss;
    int numInterp = port.GetNumberOfInterpretations(voiceChannel);
    for (int t = 0; t < numInterp; ++t)
    {
        ss << port.GetInterpretation(voiceChannel, t);
    }
    result = ss.str();
}
//==============================================================================
const std::string& SimpleRecognizer::GetResult() { return result; }
//==============================================================================================
AudioStreamer.h
#include "HeaderClasses.h"
#ifndef AUDIO_STREAMER_H
#define AUDIO_STREAMER_H

typedef bool (*AudioStreamCB)(char* audio_chunk, int chunk_size, void* user_data);

/** class AudioStreamer
    Mimics live audio being streamed. It reads audio a bit at a time from a file,
    periodically calling a user provided callback function to transmit the audio.
    It stops transmitting audio when the user callback function returns false.
    If it reaches the end of file before the callback tells it to stop, then it
    just sends silence.
    The audio is assumed to be a headerless u-Law audio file at 8kHz.
**/
class AudioStreamer : Demo::Thread
{
public:
    AudioStreamer(const char* filename);
    void StartStream(AudioStreamCB _cb, void* _user_data);
    void StopStream();
    ~AudioStreamer();
private:
    char* audio_buffer;
    char* end_buffer;
    int audio_buffer_size;
    int increment_ms;
    AudioStreamCB cb;
    void* user_data;
    virtual void ThreadAction();
};
#endif//AUDIO_STREAMER_H
AudioStreamer.cpp
#include "AudioStreamer.h"
#include <stdio.h>
#include <stdlib.h> /* exit */
#include <string.h> /* memset */
#include <fcntl.h>
#include <io.h>
//==================================================================================
AudioStreamer::AudioStreamer(const char* filename)
    : increment_ms(300), audio_buffer_size(0), audio_buffer(NULL)
{
    int audio_handle = _open(filename, _O_BINARY | _O_RDONLY);
    if (audio_handle <= 0)
    {
        printf("could not open audio file %s\n", filename);
        exit(-1);
    }
    audio_buffer_size = _lseek(audio_handle, 0L, SEEK_END);
    _close(audio_handle);
    audio_handle = _open(filename, _O_BINARY | _O_RDONLY);
    audio_buffer = new char[audio_buffer_size];
    _read(audio_handle, audio_buffer, audio_buffer_size);
    _close(audio_handle);
}
//==================================================================================
AudioStreamer::~AudioStreamer()
{
    ThreadStop();
    delete[] audio_buffer;
}
//==================================================================================
void AudioStreamer::StartStream(AudioStreamCB CB, void* UserData)
{
    cb = CB;
    user_data = UserData;
    ThreadActivate();
    ThreadStart();
    printf("audio stream started\n");
}
//==================================================================================
void AudioStreamer::StopStream()
{
    ThreadStop();
    printf("audio stream stopped\n");
}
//==================================================================================
void AudioStreamer::ThreadAction()
{
    printf("audio thread working\n");
    int chunk_size;
    int end_chunk_size;
    char* current_pos = audio_buffer;
    bool feed_more = true;
    chunk_size = 8000 * 1 * increment_ms / 1000;
    end_chunk_size = chunk_size;
    end_buffer = new char[end_chunk_size];
    memset(end_buffer, 0, end_chunk_size);
    while (current_pos != audio_buffer + audio_buffer_size && feed_more && !IsThreadShuttingDown())
    {
        if (current_pos + chunk_size > audio_buffer + audio_buffer_size)
        {
            chunk_size = (audio_buffer + audio_buffer_size) - current_pos;
        }
        feed_more = cb(current_pos, chunk_size, user_data);
        current_pos += chunk_size;
        printf("sending audio\n");
        Sleep(increment_ms);
    }
    while (feed_more && !IsThreadShuttingDown())
    {
        feed_more = cb(end_buffer, end_chunk_size, user_data);
        Sleep(increment_ms);
        printf("sending dead air\n");
    }
    printf("audio thread told to shut down\n");
    delete[] end_buffer;
}
//==================================================================================
HeaderClasses.h
#ifndef HEADER_ONLY_HELPER_CLASSES_DEFINED
#define HEADER_ONLY_HELPER_CLASSES_DEFINED
#include <string>
#include <process.h>
#include <time.h>
#include <sys/types.h>
#include <sys/timeb.h>
#include <Windows.h>
#undef GetObject

namespace Demo
{
//critical section wrapper
class CS
{
public:
    CS() : m_busy(false) { InitializeCriticalSection(&m_cs); }
    virtual ~CS() { DeleteCriticalSection(&m_cs); }
    bool IsBusy() const { return m_busy; } //only valid at time of call
    void Enter()
    {
        EnterCriticalSection(&m_cs);
        m_busy = true;
    }
    void Leave()
    {
        // Be careful, linux allows other non-owner of cs to unlock
        m_busy = false;
        LeaveCriticalSection(&m_cs);
    }
    bool Try()
    {
        if (m_busy) return false;
        Enter();
        return true;
    }
private:
    volatile bool m_busy;
    CRITICAL_SECTION m_cs;
};

//simple way to lock critical section (releases in destructor)
class CSLock
{
public:
    CSLock(CS& cs)
    {
        m_localCs = &cs;
        m_localCs->Enter();
    }
    virtual ~CSLock() { m_localCs->Leave(); }
private:
    CS* m_localCs;
};

//simple windows event wrapper
class Event
{
public:
    Event() { m_event = CreateEvent(NULL, false, false, NULL); }
    virtual ~Event() { CloseHandle(m_event); }
    bool Wait(unsigned int timeout = INFINITE)
    {
        return WaitForSingleObject(m_event, timeout) != WAIT_TIMEOUT;
    }
    bool Reset() { return ResetEvent(m_event) != 0; }
    bool Signal() { return SetEvent(m_event) != 0; }
    bool Try() { return Wait(0); }
private:
    HANDLE m_event;
};

//a thread class. Have your class derive from this one and override the ThreadAction() function.
class Thread
{
    bool Running;
    bool ShuttingDown;
    bool InUserThread;
    HANDLE hThread;
    unsigned int thrdaddr;
    CS CS;
    Event Event;
public:
    Thread()
    {
        Running = false;
        ShuttingDown = true;
        InUserThread = false;
    }
    virtual ~Thread() { ThreadStop(); }
    virtual void ThreadAction() = 0; //derive and override the ThreadAction function
    bool ThreadActivate()
    {
        CSLock L(CS);
        if (Running) return false;
        ShuttingDown = false;
        Running = true;
        InUserThread = false;
        hThread = (HANDLE)_beginthreadex(NULL, 0, CallBackThread, (LPVOID)this, 0, &thrdaddr);
        return true;
    }
    bool ThreadStart()
    {
        CSLock L(CS);
        if (!Running || ShuttingDown || InUserThread) return false;
        Event.Signal();
        return true;
    }
    bool ThreadStop(unsigned long WaitTime = 1000)
    {
        {
            CSLock L(CS);
            if (!Running) return false;
            ShuttingDown = true;
            Event.Signal();
        }
        if (WaitForSingleObject(hThread, WaitTime) == WAIT_TIMEOUT)
            TerminateThread(hThread, 0);
        Sleep(50);
        thrdaddr = 0;
        Running = false;
        return true;
    }
    bool IsThreadRunning() { CSLock L(CS); return Running; }
    bool IsThreadShuttingDown() { CSLock L(CS); return ShuttingDown; }
private:
    static unsigned int __stdcall CallBackThread(void* p)
    {
        ((Thread*)p)->InternalThread();
        return 0;
    }
    void InternalThread()
    {
        while (!ShuttingDown)
        {
            if (Event.Wait(2000))
            {
                {
                    CSLock L(CS);
                    InUserThread = true;
                }
                ThreadAction();
                {
                    CSLock L(CS);
                    InUserThread = false;
                }
            }
        }
        {
            CSLock L(CS);
            Running = false;
        }
    }
};
}//namespace Demo
#endif
SRGS Grammars
A Simple Grammar
We will begin our look at writing SRGS grammars with a simple grammar that lets the engine recognize the words "yes" or "no". Yes or no grammars are the "hello world" of grammar writing.
Example
#ABNF 1.0;
language en-US; //use the American English pronunciation dictionary.
mode voice; //the input for this grammar will be spoken words (as opposed to DTMF)
root $yesorno;
$yes = yes;
$no = no;
$yesorno = $yes | $no;
This grammar contains most of the elements of any grammar you will write. Let's take it apart.
The Grammar Identifier
Any SRGS grammar written in ABNF notation must begin with the line
#ABNF 1.0;
with no additional characters. This line identifies to the LumenVox grammar compiler that the file being read is an ABNF grammar, as opposed to an SRGS XML grammar or another grammar format that may be supported in the future.
The Grammar Header
Following the identifier, a well-formed grammar will contain information about the language the grammar is written in, the expected interaction mode, and the name of the rule where the engine will begin its search (the root rule). In addition, the header may contain one or more tags, and an identifier describing the tag format for this grammar. Tags will be discussed later in this tutorial.
The contents of the grammar header may be in any order, but no header data may occur in the file after the first rule is written.
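For instance, this hypothetical variation of the yes/no grammar's header is equally legal, since the declarations are simply reordered before the first rule:

```
#ABNF 1.0;
mode voice;
root $yesorno;
language en-US;

$yesorno = yes | no;
```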
Comments
ABNF grammars may contain comments anywhere in their body (with the exception of the first line, containing the grammar identifier). The comment format is the same one used by the C, C++, and Java programming languages.
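Both comment styles can appear in a grammar; the sketch below is hypothetical:

```
#ABNF 1.0;
language en-US; // a single-line comment, as in C++
mode voice;
/* a block comment,
   as in C */
root $main;

$main = yes | no;
```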
Rules
The rules of a grammar specify what word combinations the engine may recognize. They are the heart of the grammar. Each rule has a name, appearing on the left hand side of an "=" sign, and a rule expansion, appearing on the right hand side.
The rule name starts with a "$", then a letter followed by additional letters, numbers, or underscore characters.
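A few hypothetical examples of legal and illegal rule names:

```
$digit_2 = two;       // legal: a letter, then letters, numbers, or underscores
$area51 = fifty one;  // legal
// $2digit = two;     illegal: the character after "$" must be a letter
```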
The rule expansion describes to the engine what sequences of words will allow a rule to be matched. An entire grammar is matched if its root rule is matched.
The first rule in the above grammar is matched if the engine detects the word "yes" being spoken. The second rule is matched if the word "no" is detected. The third rule contains a "|" symbol, which is a logical "or" operator. So the third rule is matched if the $yes or $no rules are matched.
Most of the rest of this tutorial will be concerned with writing more and more expressive rule expansions.
How the Speech Engine Uses a Grammar
When the engine begins decoding your audio, it starts at the root rule of the provided grammar, in this case the rule $yesorno. It then steps through all legal expansions, looking for the first words it's allowed to listen for. It moves into the rules $yes and $no, since it's allowed to match against either rule. Since the first words in the rules $yes and $no are "yes" and "no", the engine knows that it is allowed to recognize either word.
If the engine detects "yes" as a possibility, it then looks for the next word it can recognize in the $yes rule. Since there are no more words in the $yes rule, the rule is matched. And since the $yes rule is matched, the $yesorno root rule is matched, so the entire grammar is matched.
Next Rule Expansions
Rule Expansions by Example
Rule expansions are built by combining together small phrases with a number of grammar operations. The operations are
Operation            Example             Description
Alternatives         $rule = $A | $B;    match A or B
Optional Expansion   $rule = $A [$B];    match A, possibly followed by B
Repetition           $rule = $A <7>;     match A 7 times
Rule Alternatives
As we saw in the previous "yes no" grammar, the SRE can be told to accept one or more possibilities by using the rule alternative operator "|".
Example
$toppings = pepperoni | sausage | green peppers;
The above rule is matched by the phrases "pepperoni", "sausage", or "green peppers".
Note that the rule alternative operator is greedy. It collects "peppers" with "green" to form the alternative "green peppers". If you wish to scope the effects of the rule alternative operator, you can use parentheses.
Example
$pizza = (pepperoni | sausage) pizza;
This rule matches "pepperoni pizza" or "sausage pizza". Without the parentheses, it would match "pepperoni" or "sausage pizza".
Optional Expansion
If you wish to make a portion of a rule expansion optional, wrap that portion of the expansion in the optional operator "[ ]".
Example
$yes = yes [please];
This rule matches "yes" or "yes please".
Any of the SRGS operators may be wrapped inside each other, or used in sequence, to create more and more expressive sentences.
Example
$yes = yes [please | thank you];
This rule matches "yes", "yes please", or "yes thank you".
Repetition
If you wish to allow a portion of a rule expansion to be repeated a number of times, you can use the repeat operator "< >". The repeat operator can be used to specify a fixed number of repetitions, or a range of repetitions.
Example
$digit = one | two | three | four | five | six | seven | eight | nine | zero;
$seven_digits = $digit <7>;
$seven_to_ten_digits = $digit <7-10>;
$one_or_more_digit = $digit <1->;
The $seven_digits rule allows any seven digit combination to be recognized. The $seven_to_ten_digits rule allows any seven to ten digit combination to be recognized. The $one_or_more_digit rule allows one or more digits to be recognized.
The repeat operator is tightly binding; it only applies to whatever immediately precedes it. Use parentheses to control how much of a rule expansion it applies to.
Example
$oh_boy1 = oh boy <3>;
$oh_boy2 = (oh boy)<3>;
The rule $oh_boy1 matches "oh boy boy boy", while $oh_boy2 matches "oh boy oh boy oh boy".
Next Rule References
Rule References
You can reference grammar rules inside rule expansions, as we have already seen. You can also reference external grammar files -- or rules within external files -- to create more complex grammars and re-use existing grammar solutions. As an example, suppose you had a simple phone number grammar in a remote location that looked like this:
http://www.mycompany.com/phone_number.gram
#ABNF 1.0;
language en-US;
mode voice;
root $phone_number;
$phone_number = [$area_code] $number;
$digit = one | two | three | four | five | six | seven | eight | nine | zero;
$area_code = [one | area code] $digit<3>;
$number = $digit<7>;
You can use this grammar in another grammar by using its location as a rulename.
#ABNF 1.0;
language en-US;
mode voice;
root $main;
$main = (my | the) [phone] number is $<http://www.mycompany.com/phone_number.gram>;
The above grammar is using the root rule of the phone_number grammar in its $main rule. You can reference grammar files using http, ftp, or your operating system's local or network file paths. When writing grammars that utilize external grammar files, it's usually a good idea to specify a base URI in your grammar header.
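A sketch of a base declaration, following the SRGS ABNF header syntax; the URI is hypothetical, and later references can then be relative to it:

```
#ABNF 1.0;
language en-US;
mode voice;
base <http://www.mycompany.com/>;
root $main;

$main = (my | the) [phone] number is $<phone_number.gram>;
```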
To use a single rule from an external grammar, append the "#" symbol and the rule name to the grammar URI.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $main;
$main = (my | the) area code is $<http://www.mycompany.com/phone_number.gram#area_code>;
In addition to referencing external grammar files, you can also reference any of the LumenVox built-in grammars.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $main;
$main = (my | the) [phone] number is $<builtin:grammar/phone>;
Next Special Rules
Special Rules
In addition to the rules you create, there are several reserved rules that dictate special behaviour for the Speech Engine. These rules are
$NULL
$VOID
$GARBAGE
NULL
The $NULL rule is automatically matched as soon as it is seen. Users rarely need to use the $NULL rule, but it can be useful when creating grammars programmatically. The $NULL rule is illustrated below with standard grammar operations rewritten to use the $NULL rule.
Example 1
$yes = yes [please];

/* Identical rule expansion using the $NULL rule */
$yes = yes (please | $NULL);
Example 2
$oh_boy = (oh boy)<0->;
/* Identical rule expansion using the $NULL rule */
$oh_boy = oh boy $oh_boy | $NULL;
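The equivalence in Example 2 can be checked mechanically. The Python sketch below (illustrative only, not part of the LumenVox API) expands both forms of $oh_boy up to a fixed repetition count and confirms they describe the same strings:

```python
def repeat_form(max_reps):
    """Strings matched by (oh boy)<0->, truncated at max_reps repetitions."""
    return {" ".join(["oh boy"] * k) for k in range(max_reps + 1)}

def recursive_form(depth):
    """Strings matched by $oh_boy = oh boy $oh_boy | $NULL, to a given depth."""
    if depth == 0:
        return {""}            # only the $NULL branch remains
    shorter = recursive_form(depth - 1)
    return {""} | {("oh boy " + s).strip() for s in shorter}

# Both spellings of the rule describe the same language (up to the cutoff).
same = repeat_form(3) == recursive_form(3)
```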
VOID
The $VOID rule invalidates any rule that contains it, and hence any answer that contains it.
Example
#ABNF 1.0; language en-US; mode voice;
root $yesorno;
$yes = yes [please];
$no = no $VOID;
If the engine recognizes the word "no" being spoken with the above grammar, the $VOID rule will invalidate the answer, and the engine will return with no answer.
GARBAGE
The $GARBAGE rule engages the out-of-vocabulary filter of the engine, allowing it to listen for arbitrary phonetic sequences until it hears the next matching word in the grammar. The garbage that was matched will not be returned by the engine.
Example
#ABNF 1.0; language en-US; mode voice;
root $yesorno;
$yes = yes [please];
$no = no $GARBAGE;
The above grammar could allow the user to say "no", "no thank you", or "no you stupid machine" (though we've never heard anyone say that last one).

When using the $GARBAGE rule, keep in mind that engaging the out-of-vocabulary filter can slow down recognition times and even cause additional misrecognitions if used too aggressively. We recommend creating specific "filler" models using grammar rules that match frequently occurring out-of-vocabulary words instead of using the $GARBAGE rule, if possible.
Tags
Tags are special grammar tokens that can contain any information you wish to put in them. Tags are completely ignored when the engine uses your grammar: any time the engine sees a tag in a rule, it skips right over it. What makes tags useful is that when the engine returns the results of a decode, it returns the tags it saw, in the order it saw them, along with the words and rules it recognized. This makes tags a good way to store post-processing information.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $yesorno;

$yes = yes [please] {!{ returnvalue: true }!}; // This is a tag
$no = no [way | thank you] { returnvalue: false }; // Another tag
$yesorno = $yes | $no;
To understand how you might use tags, we need to examine the form of an engine decode response.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $navigate;

$direction = forward | back | backward | left | right;
$number = one | two | three | four | five;
$navigate = (go | move | walk | step) $direction $number (steps | paces | units);
With the above grammar, if the engine recognizes "walk forward three paces", it will return a parse tree, or sentence diagram, that looks like this:
$navigate:
  "walk"
  $direction: "forward"
  $number: "three"
  "paces"
You can read more about the parse tree return type here.
In order to convert the parse tree return type into data useful to your application, you need to walk the tree and convert it into a result your application expects. For instance, your application might expect a result that looks like this:
instruction:[ direction: 1, units: 3, ]
While it is certainly possible to make the conversion, there are disadvantages to interpreting the parse tree directly. One disadvantage is that your application becomes directly dependent on knowing the structure of your grammar; if the form of your grammar changes, your application code will have to change as well. Another disadvantage is that if your application uses multiple grammars (as most do), then you will most likely need a different set of parse tree processing code for each of your grammars.
Instead of manipulating the parse tree directly, you can put the conversion process in your grammar using tags. To do so, you adopt a consistent format for your tags, and a uniform way of processing your tags + parse tree. Then the shape of your grammar does not matter, as long as you process your tags and parse tree in the same way each time.
For this example we will adopt a very simple method for post-processing: we will walk the tree, ignoring anything that is not a tag. We will treat the tags as string data, and concatenate the strings as we see them in the parse tree.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $navigate;
tag-format <my_simple_tag_format>;

$direction = { direction: } ( forward { 1, } | back { 2, } | backward { 2, } | left { 3, } | right { 4, } );
$number = { units: } ( one { 1, } | two { 2, } | three { 3, } | four { 4, } | five { 5, } );
$navigate = { instruction:[ } ( go | move | walk | step) $direction $number (steps | paces | units) { ] };
Now, with the above grammar, when the engine recognizes "walk forward three paces", the parse tree returned will look like:
$navigate: {!{ instruction:[ }!}
  "walk"
  $direction: {!{ direction: }!} "forward" {!{ 1, }!}
  $number: {!{ units: }!} "three" {!{ 3, }!}
  "paces"
  {!{ ] }!}
And when we concatenate the tags we get the result type our application expects.
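This walk-and-concatenate scheme is easy to implement outside the engine. The Python sketch below uses its own encoding of the parse tree (tuples for tags, lists for rules; not the engine's native representation) and collects the tags in order:

```python
# A parse tree node is either a word (str), a tag ("tag", text),
# or a rule: a list whose first element is the rule name.
tree = ["$navigate", ("tag", "instruction:[ "), "walk",
        ["$direction", ("tag", "direction: "), "forward", ("tag", "1, ")],
        ["$number", ("tag", "units: "), "three", ("tag", "3, ")],
        "paces", ("tag", "] ")]

def concat_tags(node, out=None):
    """Depth-first walk: keep tag text, ignore words and rule names."""
    if out is None:
        out = []
    if isinstance(node, tuple) and node[0] == "tag":
        out.append(node[1])
    elif isinstance(node, list):
        for child in node[1:]:      # skip the rule name
            concat_tags(child, out)
    return out

result = "".join(concat_tags(tree)).strip()
```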
Admittedly, this is a very naive tag processing scheme, and as a result it requires a hefty number of tags to accomplish a simple task, but it does achieve the goal we want of processing our tree in a way that is independent of the form of the grammar. As a result, if ever the form of the grammar needs to change, the tags in the grammar can change, too, and the application code can stay the same.
The LumenVox API provides a much more powerful post-processing scheme based on the Semantic Interpretation for Speech Recognition working draft. It is described in detail here.
Applying Grammar Weights
Ultimately, the engine is just a large probability machine. Inside the engine there are huge tables that store probability scores for phonemes and the sounds those phonemes are likely to generate when a person speaks. When the engine decodes audio input, it searches through these tables to find the most likely path through a sequence of phonemes given the audio input. Your SRGS grammars can modify these scores by providing grammar weights.
As an example, suppose we have a grammar that recognizes a person speaking a number that is four digits long.
#ABNF 1.0;
language en-US;
mode voice;
root $number;

$one_digit = zero | one | two | three | four | five | six | seven | eight | nine;
$teens = ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen;
$above_twenty = (twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety) [$one_digit];
$double_digit = $teens | $above_twenty;

$single_digits = $one_digit<4>; //one two three four
$double_digits = $double_digit<2>; //twelve thirty four
$single_double = $one_digit<2> $double_digit; //one two thirty four
$double_single = $double_digit $one_digit<2>; //twelve three four

$number = $single_digits | $double_digits | $single_double | $double_single;
This is a flexible grammar, but if you used it in practice you might be disappointed. You might notice that too often words like "four three" are being misrecognized as "forty". In general, your callers may be speaking a sentence that matches $single_digits 95% of the time, but the engine too frequently returns a result that matches one of the other three rules.
You can help the engine get the right answer more frequently by predisposing it to choose the $single_digits rule. Here is the same grammar with grammar weights applied.
#ABNF 1.0;
language en-US;
mode voice;
root $number;

$one_digit = zero | one | two | three | four | five | six | seven | eight | nine;
$teens = ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen;
$above_twenty = (twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety) [$one_digit];
$double_digit = $teens | $above_twenty;

$single_digits = $one_digit<4>; //one two three four
$double_digits = $double_digit<2>; //twelve thirty four
$single_double = $one_digit<2> $double_digit; //one two thirty four
$double_single = $double_digit $one_digit<2>; //twelve three four

// $single_digits has a 95% chance of being the right rule to match.
// The other rules combine to take up the remaining 5%.
$number = /0.95/ $single_digits | /0.05/ ($double_digits | $single_double | $double_single);

/**********************************************************
 * You could also write the weights as
 * /95/ $single_digits | /5/ ($double_digits | $single_double | $double_single);
 * or
 * /19/ $single_digits | $double_digits | $single_double | $double_single;
 **********************************************************/
Now, in cases where the engine has a borderline decision to make between matching $single_digits or one of the others, it will more frequently choose $single_digits. We weighted the rules 95% to 5% only because we had records of our callers to back up the decision.
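Only the ratio between weights matters, which is why /0.95/ and /0.05/ can equally be written /95/ and /5/. A quick Python sketch of the normalization arithmetic (illustrative only; the engine's internal arithmetic is not documented here):

```python
def normalize(weights):
    """Raw grammar weights become probabilities: each weight / total."""
    total = sum(weights)
    return [w / total for w in weights]

fractional = normalize([0.95, 0.05])   # /0.95/ and /0.05/
integral = normalize([95, 5])          # /95/ and /5/ -- the same 19:1 ratio
```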
Do Not Apply Weights Without Data
Applying grammar weights should never be the first thing you do to your grammar. Initially, you don't really know how often each rule will be matched, so you are better off letting all rules be treated equally. Only after you have a compelling amount of data to suggest that applying grammar weights will help, as we did above, should you apply them. And after you do apply them, you must test their effects on real call data. Badly applied weights are worse than no weights at all.
SRGS Definitions
Interaction Mode
An interaction mode specifies the type of interaction the speech port is to have with a user. An interaction mode can be voice or DTMF.
In a grammar, you specify whether the grammar will be used in a DTMF interaction, or a voice interaction. When grammars are activated in a speech port, only the voice grammars get used to decode speech, and only the DTMF grammars get used to process a DTMF string.
To specify the interaction mode in a grammar, use the following syntax:
ABNF
mode voice; or mode dtmf;
XML
<grammar mode="voice" ...> or <grammar mode="dtmf" ...>
Tag Format
In an SRGS grammar, you may place pieces of data called tags anywhere in a grammar rule. When a rule is matched, the tag is returned to the user in a parse tree, along with the words spoken that caused the rule to match.
A common use for tags is to transform a speaker's sentence into data that your application can understand. The LumenVox speech port is capable of manipulating the tags in your parse tree if they are in a form known as the Semantic Interpretation for Speech Recognition (SISR) tag format. Examples of this tag format can be found in this help file here.
To do any kind of interpretation, you must specify the format your tags are in.
Within the speech port, the following tag format specifiers are acceptable. Currently, both formats tell the engine to perform the same interpretation process, but as other interpretation schemes are adopted, or interpretation schemes are modified, the tag format specifier you decide on will become more important.
semantics/1.0 Use the latest working draft of the SISR, as of this help file's publication.
lumenvox/1.0 Use the working draft of the SISR published on April 1 2003.
lumenvox/1.1 Use the next working draft of the SISR (since this next draft does not exist, this tag format does nothing; it's for example only).
If the tag format of your grammar does not match one of these specifiers, the speech port will not attempt to interpret your tags. You can still use the tag data in the Parse Tree to perform your own interpretation.
To specify the format of the tags in a grammar, use the following syntax:
ABNF
tag-format <lumenvox/1.0>;
XML
<grammar tag-format="semantics/1.0" ...>
Language Identifier
A language identifier specifies the language being spoken to the speech port.
The format of the language identifier follows the convention set out by RFC 3066. In a nutshell, the identifier is either a language and country pair, like "en-US" for United States English, or just a language descriptor, like "fr" for generic French.
Within the speech port, the following language identifiers are acceptable:
"en-US" or "en" Use the LumenVox American English acoustic models and dictionary

"fr-CA" or "fr" Use the LumenVox French acoustic models and dictionary

"es-MX" or "es" Use the LumenVox Spanish acoustic models and dictionary
To specify the language identifier in a grammar, use the following syntax in your grammar:
ABNF
language en-US;
XML
<grammar language="en-US" ... >
Tags
Tags are special tokens in a grammar that are automatically matched whenever they are seen by the Speech Engine. They are usually filled with information useful to the author of the grammar, or to an application using the grammar. Tags may appear in the header or the body of a grammar. When the engine recognizes a rule containing a tag, it returns the tag information along with the rule.
Filling tags with snippets of JavaScript is the basis of the semantic interpretation process.
ABNF
{!{ tag information }!}; // This is a header tag.
// Its contents will be returned if the grammar is matched.

$rule = some text {!{ tag information }!} more text; // This is a tag declared in a rule.
XML
<!-- header tag. Its contents will be returned if the grammar is matched. -->
<tag> tag information </tag>

<rule id="rule">
  some text
  <!-- a tag declared in a rule -->
  <tag> tag information </tag>
  more text
</rule>
Base URI
Declaring a base URI in a grammar tells the engine how to resolve relative path names in the grammar. If no base URI is present, they will be resolved relative to the location of the grammar file. Grammars loaded by buffer should have a base URI if they contain relative path names. Grammars may have multiple base paths, and they are searched in the order provided.
ABNF
base <http://www.mycompany.com/grammars>;
base <http://www.mycompany.com/more_grammars>;
XML
<grammar xml:base="http://www.mycompany.com/grammars" xml:base="http://www.mycompany.com/more_grammars" ... >
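Relative references against a base URI resolve by ordinary URI rules, which you can preview with Python's standard library. This sketch illustrates URI resolution only, not the engine's search behavior; note that urljoin only appends to a base whose path ends in a slash:

```python
from urllib.parse import urljoin

# Base URIs are tried in the order declared in the grammar header.
bases = ["http://www.mycompany.com/grammars/",
         "http://www.mycompany.com/more_grammars/"]

# Candidate resolutions for a relative grammar reference:
candidates = [urljoin(base, "phone_number.gram") for base in bases]
```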
Built-in Grammars
LumenVox provides the built-in grammars expected by VoiceXML users. All of them provide the required output format.
URI | Sample Input | Output

builtin:grammar/boolean | "yes", "no thank you", etc. | "true" or "false"

builtin:grammar/date | "january thirteenth" or "december first two thousand" | "????0113" or "20001201"

builtin:grammar/digits | "one two three four" | "1234"

builtin:grammar/currency | "eighteen dollars and four cents" | "USD18.04"

builtin:grammar/number | "four hundred point five" | "400.5"

builtin:grammar/phone | "area code eight five eight seven oh seven oh seven oh seven" | "8587070707"

builtin:grammar/time | "six o clock" or "five thirty p m" | "0600?" or "0530p"
Example Grammars
phone_number.gram
#ABNF 1.0;
mode voice;
language en-US;
tag-format <lumenvox/1.0>;
// The lumenvox tag format tracks the current working draft of
// the W3C's semantic interpretation proposal.
// 1.0 corresponds to the working draft released on 01 April 2003.
root $PhoneNumber;
/* ONE:"1" is shorthand for
 * ONE {!{ $="1" }!}
 * "$" refers to the current rule being matched ($Digit),
 * so the net effect is that $Digit resolves to a one-digit string
 * after semantic interpretation.
 */
$Digit = ( ONE:"1" | TWO:"2" | THREE:"3" | FOUR:"4" | FIVE:"5" |
           SIX:"6" | SEVEN:"7" | EIGHT:"8" | NINE:"9" | (ZERO | O):"0" );

/* $AreaCode resolves to a three-digit string
 * after semantic interpretation.
 */
$AreaCode = { $ = "" } ( $Digit { $ += $Digit } ) <3>;

/* $Number resolves to a seven-digit string
 * after semantic interpretation.
 */
$Number = { $ = "" } ( $Digit { $ += $$ } ) <7>; // $$ is shorthand for the last rule detected,
                                                 // i.e. $Digit

/* After semantic interpretation,
 * $PhoneNumber resolves to a structure with two member variable strings:
 * areacode (which defaults to "858") and number.
 */
$PhoneNumber = ( [AREA CODE | ONE] $AreaCode { $.areacode = $$ }
                 $Number { $.number = $$ } )
             | ( $Number ) { $.areacode = "858"; $.number = $$ };
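The net effect of the semantic tags above can be mimicked outside the engine. Here is a hypothetical Python sketch of the same interpretation (the function and dictionary names are our own, not LumenVox API names); digits concatenate, and a seven-digit utterance gets the default area code "858":

```python
# Hypothetical stand-ins for the grammar's word-to-digit literals.
DIGITS = {"ONE": "1", "TWO": "2", "THREE": "3", "FOUR": "4", "FIVE": "5",
          "SIX": "6", "SEVEN": "7", "EIGHT": "8", "NINE": "9",
          "ZERO": "0", "O": "0"}

def interpret_phone(words):
    """Mimic $PhoneNumber: return {'areacode': ..., 'number': ...}."""
    digits = "".join(DIGITS[w] for w in words if w in DIGITS)
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]                   # optional leading ONE
    if len(digits) == 10:                     # [AREA CODE] $AreaCode $Number
        return {"areacode": digits[:3], "number": digits[3:]}
    if len(digits) == 7:                      # $Number alone: default area code
        return {"areacode": "858", "number": digits}
    raise ValueError("utterance does not match the grammar")
```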
top_level_navigation.gram
#ABNF 1.0;
mode voice;
language en-US;
tag-format <lumenvox/1.0>;
// The lumenvox tag format tracks the current working draft of
// the W3C's semantic interpretation proposal.
// 1.0 corresponds to the working draft released on 01 April 2003.

root $directive;

$directive = (go back) {$ = "APPLICATION_BACK"}
           | (main menu) {$ = "APPLICATION_TOP"}
           | (goodbye | quit | exit) {$ = "APPLICATION_EXIT"};
Semantic Interpretation
Intro to Semantic Interpretation
When constructing an application using speech recognition, it is often not enough to know what the user said. You have to know what the user meant. In fact, often you don't care whether you heard the user correctly, as long as you got the meaning right. In the speech recognition world, semantic interpretation refers to the process of extracting meaning from what was spoken.
Creating a grammar and examining the parse tree that was generated by a user's speech input is the first step toward semantic interpretation. But sometimes, it is not enough to just read off the values of the tree; significant post processing of the tree is necessary to extract meaning.
As an example, here is an SRGS/ABNF grammar that matches spoken numbers from zero to nine hundred and ninety nine (it is by no means complete; for instance, it cannot recognize "two forty six" for 246):
#ABNF 1.0;
language en-US;
mode voice;
root $small_number;

$base = one | two | three | four | five | six | seven | eight | nine;
$teen = ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen;
$twenty_to_ninetynine = (twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety) [$base];

$tens = $base | $teen | $twenty_to_ninetynine;

$hundred = ([a] hundred | $base hundred);

$small_number = $hundred [[and] $tens] | $tens;
If the engine recognizes "two hundred twelve", then the parse tree looks like this:
$small_number:
  $hundred:
    $base: "TWO"
    "HUNDRED"
  $tens:
    $teen: "TWELVE"
But if your application needs to find out whether the speaker spoke a number larger than 500, then the parse tree alone is not enough; all you have is a structure of words. You need to write code to transform the tree into the number 212, which is meaningful to your application. The logic to do this transformation is going to be tied closely to the grammar's rules. For instance, within the $hundred rule, you have to know that there is an optional $base rule that has to be multiplied by 100. But in the $twenty_to_ninetynine rule, the optional $base has to be added to the total of the number you are building.
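To make the coupling concrete, here is what such a hand-written converter might look like in Python (the tree encoding and function names are our own illustration, not a LumenVox API). Notice how every branch mirrors a specific grammar rule:

```python
ONES = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5,
        "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9}
TEENS = {"TEN": 10, "ELEVEN": 11, "TWELVE": 12, "THIRTEEN": 13,
         "FOURTEEN": 14, "FIFTEEN": 15, "SIXTEEN": 16,
         "SEVENTEEN": 17, "EIGHTEEN": 18, "NINETEEN": 19}
TENS = {"TWENTY": 20, "THIRTY": 30, "FORTY": 40, "FIFTY": 50,
        "SIXTY": 60, "SEVENTY": 70, "EIGHTY": 80, "NINETY": 90}

# A node is ("$rulename", [children]); children are nodes or word strings.
def evaluate(node):
    rule, children = node
    if rule == "$base":
        return ONES[children[0]]
    if rule == "$teen":
        return TEENS[children[0]]
    if rule == "$twenty_to_ninetynine":
        total = TENS[children[0]]
        if len(children) > 1:                 # optional trailing $base: add it
            total += evaluate(children[1])
        return total
    if rule == "$tens":
        return evaluate(children[0])
    if rule == "$hundred":
        # "[a] hundred" is 100; "$base hundred" multiplies the base by 100
        subrules = [c for c in children if isinstance(c, tuple)]
        return 100 * (evaluate(subrules[0]) if subrules else 1)
    if rule == "$small_number":
        return sum(evaluate(c) for c in children if isinstance(c, tuple))
    raise ValueError(rule)

tree = ("$small_number",
        [("$hundred", [("$base", ["TWO"]), "HUNDRED"]),
         ("$tens", [("$teen", ["TWELVE"])])])
```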
Because of the close relationship between a grammar's rules and the semantic interpretation process, it can be convenient to put the semantic interpretation directly into the grammar. This is where grammar tags come into play.
The LumenVox semantic interpretation scheme is an implementation of the W3C's Semantic Interpretation working draft. The W3C will likely make changes to the draft before approving it, and LumenVox will track those changes while maintaining backward compatibility.
The basic idea behind the LumenVox semantic interpretation scheme is this:
1. Each tag contains snippets of ECMAScript code (still popularly known as JavaScript).
2. Each grammar rule can be thought of as a function that executes the ECMAScript code in its tags from left to right, and returns a value based on that executed code.
3. Any other rules that are referenced in a grammar rule are also executed left to right, and any tag that appears after a rule reference may use that rule's return value.
4. Grammar rules are only executed if the recognizer detects something to match the rule.
There are other facets to master, but understanding these four concepts will help you with everything else.
Semantic Interpretation by Example
The details of semantic interpretation will be discussed through example, by editing the numbers grammar from the introduction.
Literals
If you do not need to process any code to provide a return value for a rule, you can just attach a literal to the rule, as follows:
$foo = ($reference1 $reference2 some text):"bar";
Now, when the rule $foo is referenced, it will return the value "bar". Note: If no tags or literals exist in a grammar rule, the rule will just return text corresponding to the spoken words that matched the rule.
Literals can also be attached to individual words or phrases, as in this example:
$base = one:"1"|two:"2"|three:"3"|four:"4"|five:"5"|six:"6"|seven:"7"|eight:"8"|nine:"9";
$teen = ten:"10"|eleven:"11"|twelve:"12"|thirteen:"13"|fourteen:"14"|fifteen:"15" | sixteen:"16"|seventeen:"17"|eighteen:"18"|nineteen:"19";
Now $base and $teen return a numeric representation of the word that matched them. Note: Since a literal is the return value of a grammar rule, only one can be returned per rule. Since we have only one literal per rule alternative, this is no problem.
The Return Value
The return value of a grammar rule is an ECMAScript object named "$". You can build the return value up by writing code in tags that manipulates this symbol. For instance, our $foo rule above is equivalent to writing
$foo = ($reference1 $reference2 some text) { $ = "bar" };
This more meaningful example allows the $twenty_to_ninetynine rule to return a numeric representation of the words it matches.
$twenty_to_ninetynine = (twenty:"20"|thirty:"30"|forty:"40"|fifty:"50"|sixty:"60"|seventy:"70"| eighty:"80"|ninety:"90")[$base {$ = parseInt($) + parseInt($base)}];
In this example, first the return value $ is set to "20" or "30" or "40", etc. Then, if the optional $base rule is matched, its value is added to $. Notice the use of the ECMAScript function parseInt. This is used because literals are always strings, so without parseInt, the addition above would resolve to string concatenation. Since it can be confusing to have a rule that sometimes returns a number and other times returns a string, we will use parseInt in all of our rules:
$base = (one:"1"|two:"2"|three:"3"|four:"4"|five:"5"|six:"6"|seven:"7"|eight:"8"|nine:"9") { $ = parseInt($) };
$teen = (ten:"10"|eleven:"11"|twelve:"12"|thirteen:"13"|fourteen:"14"|fifteen:"15" | sixteen:"16"|seventeen:"17"|eighteen:"18"|nineteen:"19") { $ = parseInt($) };
$twenty_to_ninetynine = (twenty:"20"|thirty:"30"|forty:"40"|fifty:"50"|sixty:"60"|seventy:"70"| eighty:"80"|ninety:"90"){ $ = parseInt($) } [$base { $ += $base }];
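The parseInt point can be seen in miniature: in ECMAScript, "20" + "4" yields the string "204", while parseInt("20") + parseInt("4") yields 24. The Python sketch below (illustrative, using int() as the parseInt analogue) shows the same distinction:

```python
# With strings, "+" concatenates -- the pitfall the grammar must avoid:
as_strings = "20" + "4"            # "204"

# Converting first (the parseInt analogue) gives arithmetic instead:
as_numbers = int("20") + int("4")  # 24

def twenty_to_ninetynine(tens_literal, base_literal=None):
    """Mimic the tagged rule: parse the tens literal, add the optional base."""
    value = int(tens_literal)
    if base_literal is not None:
        value += int(base_literal)
    return value
```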
The "$$" object
So far we have seen that a rule's return value can be referenced by its name after that rule has been matched. Sometimes, when there are lots of rule alternatives in a rule, it can be cumbersome to reference rules by name. Other times, a matched rule can't be referenced at all. For instance, you can never access an external rule reference by name in a tag, because its name is not a valid ECMAScript identifier. For these reasons, the "$$" object exists. The "$$" object is always equal to the return value of the last rule matched. Using the "$$" object, we can write the $tens, $hundred and $small_number rules like this:
$tens = ( $base | $teen | $twenty_to_ninetynine ) { $ = $$ };
$hundred = [a] hundred {$ = 100} | $base hundred {$ = 100 * $$} ;
$small_number = $hundred {$ = $$} [[and] $tens {$ += $$}] | $tens {$ = $$};
Composite return types
Our small numbers grammar now returns an integer named small_number. If that is all we want out of this grammar, then great. Sometimes, however, we want more than one piece of information in a return type. A grammar rule always returns an object type, and object types can have additional properties. Let's say in our grammar we also want to know the text that was spoken, possibly for transcription or for reading the text back to the speaker. Each rule reference $foo also has a corresponding data structure called $foo$ (yes, the W3C working group is aware that it is seriously overworking the dollar symbol), with a property called "text". Also, the text of $$ can be referenced using $$$.text.
The following change to our grammar creates a composite return type containing the text that was spoken, and the numeric representation of that text.
root $small_number_and_text;
$small_number_and_text = $small_number { $.number = $$; $.text = $$$.text };
// Note: use semicolons to separate ECMAScript commands within tags.
Now a successful grammar match returns an object with two member properties, number and text. Here is the grammar in one place:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <lumenvox/1.0>;
// This line tells the engine how to interpret the grammar's tags.
// Currently, only "lumenvox/1.0" or "semantics/1.0" is supported.
root $small_number_and_text;
$base = (one:"1"|two:"2"|three:"3"|four:"4"|five:"5"|six:"6"|seven:"7"|eight:"8"|nine:"9") { $ = parseInt($) };
$teen = (ten:"10"|eleven:"11"|twelve:"12"|thirteen:"13"|fourteen:"14"|fifteen:"15" | sixteen:"16"|seventeen:"17"|eighteen:"18"|nineteen:"19") { $ = parseInt($) };
$twenty_to_ninetynine = (twenty:"20"|thirty:"30"|forty:"40"|fifty:"50"|sixty:"60"|seventy:"70"| eighty:"80"|ninety:"90"){ $ = parseInt($) } [$base { $ += $base }];
$tens = ($base|$teen|$twenty_to_ninetynine) { $ = $$ };
$hundred = ([a] hundred {$ = 100} | $base hundred {$ = 100 * $base});
$small_number = $hundred {$ = $$} [[and] $tens {$ += $$}] | $tens { $ = $$ };
$small_number_and_text = $small_number { $.number = $$; $.text = $$$.text };
Getting The Return Value
So far we have described how to use grammar tags to create a semantic interpretation result. So how do you access that result to use in your application?
LumenVox provides an XML fragment representation of the return type. This conforms to the W3C's proposal for generating XML from semantic interpretation results (except that the XML is not enclosed in a top-level tag). LumenVox also provides an API for accessing the return value as a data structure.
Under the XML scheme, if the engine recognized "four hundred and six" using our example grammar, then the result would look like:
<number> 406 </number> <text> FOUR HUNDRED AND SIX </text>
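Since the fragment has no enclosing top-level tag, you must wrap it in a synthetic root before handing it to a standard XML parser. A Python sketch (the root element name here is arbitrary):

```python
import xml.etree.ElementTree as ET

fragment = "<number> 406 </number> <text> FOUR HUNDRED AND SIX </text>"

# Wrap the fragment in a synthetic root so it parses as one document.
root = ET.fromstring("<interpretation>" + fragment + "</interpretation>")

number = int(root.findtext("number").strip())
text = root.findtext("text").strip()
```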
To access the return value of the semantic interpretation scheme, you must do the following:
1. Set the LV_DECODE_SEMANTIC_INTERPRETATION flag in your decode function call.
2. After decode, get the number of different interpretations that exist using GetNumberOfInterpretations (usually there will only be one, but an ambiguous grammar might return more than one).
3. For each result, get the interpretation result by calling GetInterpretation.
Phonemes
The units of sound the recognition engine actually recognizes are phonemes. All phrase formats are ultimately translated into phonetic spellings for decoding. These phonetic spellings can be directly entered if surrounded by curly braces.
The phonetic alphabet used by the decoder:
Phoneme | Example #1 | Phonetic Spelling #1 | Example #2 | Phonetic Spelling #2
Vowels
AA barn B AA R N top T AA P
AE bat B AE T crab K R AE B
AH what W AH T cut K AH T
AO more M AO R auto AO T OW
AW cow K AW house HH AW S
AX about AX B AW T dial D AY AX L
AXR butter B AH DX AXR career K AXR IH R
AY type T AY P life L AY F
EH check CH EH K mess M EH S
ER church CH ER CH bird B ER D
EY take T EY K hail HH EY L
IH little L IH DX AX L rib R IH B
IX action AE K SH IX N women W IH M IX N
IY team T IY M keep K IY P
OW loan L OW N robe R OW B
OY hoist HH OY S T joy JH OY
UH book B UH K look L UH K
UW flew F L UW who HH UW
Consonants
B web W EH B bear B EH R
CH chair CH EY R statue S T AE CH UW
D reed R IY D dark D AA R K
DH with W IH DH other AH DH ER
DX forty F AO R DX IY butter B AH DX AXR
F four F AO R graph G R AE F
G peg P EH G exam IH G Z AE M
HH halt HH AO L T Jose HH OW Z EY
JH cage K EY JH Jack JH AE K
K coin K OY N back B AE K
L late L EY T really R IH L IY
M lemon L EH M AH N mail M EY L
N night N AY T any EH N IY
NG ring R IH NG ankle AE NG K AH L
P pay P EY beep B IY P
R rest R EH S T prior P R AY ER
S sit S IH T bass B AE S
SH blush B L AH SH sure SH UH R
T raft R AE F T taped T EY P T
TH three TH R IY youth Y UW TH
V van V AE N river R IH V AXR
W swap S W AA P wing W IH NG
Y yes Y EH S year Y IY R
Z arms AA R M Z blaze B L EY Z
ZH Asian EY ZH AH N genre ZH AA N R AH
Phrases
The phrase is what the decoder attempts to match to speech.
A phrase can be in one or more of the following formats.
One or more words. Examples: "California" "how do I"

BNF format. Example: "[that's] (right | correct)" - that's right, that's correct, right, or correct

Raw phonemes (enclosed in curly braces {}). Example: "{Y EH S P L IY Z}" - yes please

Combination of the above formats. Example: "is that ( correct | {R AY T} )" - is that correct or is that right
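When you enter raw phonemes in curly braces, every symbol must come from the phonetic alphabet listed in the Phonemes section. A small illustrative Python check (not part of the engine):

```python
PHONEMES = {
    # vowels
    "AA", "AE", "AH", "AO", "AW", "AX", "AXR", "AY", "EH", "ER",
    "EY", "IH", "IX", "IY", "OW", "OY", "UH", "UW",
    # consonants
    "B", "CH", "D", "DH", "DX", "F", "G", "HH", "JH", "K", "L", "M",
    "N", "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH",
}

def valid_phonetic_spelling(phrase):
    """Check a curly-brace phoneme string such as '{Y EH S P L IY Z}'."""
    if not (phrase.startswith("{") and phrase.endswith("}")):
        return False
    symbols = phrase[1:-1].split()
    return bool(symbols) and all(s in PHONEMES for s in symbols)
```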
The engine has an internal dictionary of approximately 120,000 words. There is also a robust phonetic speller for words not found in the dictionary. The only valid punctuation marks are the apostrophe (') and the dash. Dashes should be used for multiple words that should be looked up in the internal dictionary as a single word, an example being new-orleans. If the multiple words do not exist in the dictionary, the dashes will be replaced by spaces and the words will be looked up in the dictionary separately.
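The dash behavior can be sketched as a lookup with a fallback. The toy dictionary and pronunciations below are purely illustrative, standing in for the engine's much larger internal dictionary:

```python
# Toy stand-in for the engine's ~120,000-word internal dictionary;
# the pronunciations here are illustrative only.
DICTIONARY = {"NEW-ORLEANS": "N UW AO R L IY N Z",
              "SAN": "S AE N",
              "DIEGO": "D IY EY G OW"}

def look_up(word):
    """Whole dashed word first; otherwise split on dashes and look up parts."""
    word = word.upper()
    if word in DICTIONARY:
        return [DICTIONARY[word]]
    if "-" in word:
        return [p for part in word.split("-") for p in look_up(part)]
    raise KeyError(word)  # a real engine would fall back to its phonetic speller
```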
BNF Refresher
BNF is an acronym for "Backus Naur Form". We use only terminal symbols. The pipe "|" is an OR operator, and square brackets "[ ]" surround optional words. Parentheses clarify order of operation and nesting. Here are some examples.
( (I would like to speak | Please connect me ) with ) John Doe [please] translates to these variations:
1. I WOULD LIKE TO SPEAK WITH JOHN DOE PLEASE
2. PLEASE CONNECT ME WITH JOHN DOE PLEASE
3. I WOULD LIKE TO SPEAK WITH JOHN DOE
4. PLEASE CONNECT ME WITH JOHN DOE
I ( want | need ) [ to ( know | hear ) ] [ the ] directions [ to ]
1. I WANT TO KNOW THE DIRECTIONS TO
2. I NEED TO KNOW THE DIRECTIONS TO
3. I WANT TO HEAR THE DIRECTIONS TO
4. I NEED TO HEAR THE DIRECTIONS TO
5. I WANT THE DIRECTIONS TO
6. I NEED THE DIRECTIONS TO
7. I WANT TO KNOW DIRECTIONS TO
8. I NEED TO KNOW DIRECTIONS TO
9. I WANT TO HEAR DIRECTIONS TO
10. I NEED TO HEAR DIRECTIONS TO
11. I WANT DIRECTIONS TO
12. I NEED DIRECTIONS TO
13. I WANT TO KNOW THE DIRECTIONS
14. I NEED TO KNOW THE DIRECTIONS
15. I WANT TO HEAR THE DIRECTIONS
16. I NEED TO HEAR THE DIRECTIONS
17. I WANT THE DIRECTIONS
18. I NEED THE DIRECTIONS
19. I WANT TO KNOW DIRECTIONS
20. I NEED TO KNOW DIRECTIONS
21. I WANT TO HEAR DIRECTIONS
22. I NEED TO HEAR DIRECTIONS
23. I WANT DIRECTIONS
24. I NEED DIRECTIONS
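These expansions can be generated mechanically. The Python sketch below (our own illustration) models a phrase as nested sequence/alternative/optional nodes and enumerates every variation; it produces the 24 sentences of the second example, though not necessarily in the listed order:

```python
from itertools import product

def expand(node):
    """node: a word (str), ('alt', [...]), ('seq', [...]), or ('opt', node)."""
    if isinstance(node, str):
        return [node]
    kind, body = node
    if kind == "alt":
        return [s for child in body for s in expand(child)]
    if kind == "opt":
        return expand(body) + [""]          # with the element, or without it
    if kind == "seq":
        parts = [expand(child) for child in body]
        return [" ".join(w for w in combo if w) for combo in product(*parts)]
    raise ValueError(kind)

# I ( want | need ) [ to ( know | hear ) ] [ the ] directions [ to ]
phrase = ("seq", ["I",
                  ("alt", ["WANT", "NEED"]),
                  ("opt", ("seq", ["TO", ("alt", ["KNOW", "HEAR"])])),
                  ("opt", "THE"),
                  "DIRECTIONS",
                  ("opt", "TO")])

variations = expand(phrase)
```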
LumenVox SpeechRec API Cautions
Calling LV_SRE functions using the same HPORT in different threads at the same time can have unexpected results.
Calling LVSpeechPort methods using the same LVSpeechPort object in different threads at the same time can have unexpected results.
Win32
The environment variable LVLANG specifies the location of the Lang subdirectory. The installation package will create this variable. If the client application needs to relocate the Lang subdirectory or the API was not installed using the installation package, the client application must make sure LVLANG has the correct location of the Lang subdirectory.
LVLANG\Dict is used to store static data files (primarily the language model files for the engine, which contain acoustic models and dictionaries).
LVLANG\Responses is used to store run-time created files (the Engine's call files which contain all the details of each recognition - audio data, grammar, recognized text, etc.). A sub-directory will be created for each day's data.
Linux
LVLANG is hard-coded to /usr/LumenVox/Dict by default and is used to store static data files (primarily the language model files for the Speech Engine, which contain acoustic models and dictionaries).
LVRESPONSE is hard-coded to /var/LumenVox/Responses by default and is used to store run-time created files (the Speech Engine call files which contain all the details of each recognition - audio data, grammar, recognized text, etc). A sub-directory will be created for each day's data.
The client application can create or modify either (or both) of these two environment variables to use custom locations if desired.
LV_SRE C API Functions
LV_SRE
The following C API is exported from the LVSpeechPort DLL. For C++ programmers, these functions are wrapped in the LVSpeechPort class.
Port Management Functions
int LV_SRE_ClosePort(HPORT hport);
int LV_SRE_Decode(HPORT hport, int VoiceChannel, int grammarset, unsigned int flags);
int LV_SRE_GetVoiceChannelData(HPORT hport, int VoiceChannel, short** PCM, unsigned int Samples);
int LV_SRE_LoadVoiceChannel(HPORT hport, int VoiceChannel, void* M, int Length, SOUND_FORMAT Format, const char* SoundFileName);
HPORT LV_SRE_OpenPort(ExportLogMsg Log, void* p, int verbosity);
void LV_SRE_RegisterAppLogMsg(ExportLogMsg Log, void* p, int NewMsgVerbosity);
const char* LV_SRE_ReturnErrorString(int ReturnCode);
int LV_SRE_SetProperty(HPORT hport, int property, int Value);
int LV_SRE_SetProperty(HPORT hport, int property, int valuetype, void* pvalue, int target, int ndx);
int LV_SRE_WaitForEngineToIdle(HPORT hport, int voicechannel, int ms);
int LV_SRE_WaitForDecode(HPORT hport, int voicechannel);
Streaming API Functions
int LV_SRE_StreamStart(HPORT hport);
int LV_SRE_StreamSendData(HPORT hport, void* SoundData, int SoundDataLength);
int LV_SRE_StreamGetStatus(HPORT hport);
int LV_SRE_StreamGetLength(HPORT hport);
int LV_SRE_StreamSetStateChangeCallBack(HPORT hport, LV_SRE_StreamStateChangeFn* fn, void* UserData);
void LV_SRE_StreamStateChangeFn(long NewState, unsigned long TotalBytes, unsigned long RecordedBytes, void* UserData);
int LV_SRE_StreamStop(HPORT hport);
int LV_SRE_StreamCancel(HPORT hport);
int LV_SRE_StreamSetParameter(HPORT hport, int StreamParameter, unsigned long StreamParameterValue);
int LV_SRE_StreamGetParameter(HPORT hport, int StreamParameter, unsigned long* StreamParameterValue);
int LV_SRE_StreamSetParameterToDefault(HPORT hport, int StreamParameter);
SRGS Grammar Functions
int LV_SRE_LoadGrammar(HPORT hport, const char* GrammarLabel, const char* GrammarLocation);
int LV_SRE_LoadGrammarIdx(HPORT hport, int GrammarIndex, const char* GrammarLocation);
int LV_SRE_LoadGlobalGrammar(const char* GrammarLabel, const char* GrammarLocation);
int LV_SRE_LoadGrammarFromBuffer(HPORT hport, const char* GrammarLabel, const char* GrammarContents);
int LV_SRE_LoadGrammarFromBufferIdx(HPORT hport, int GrammarIndex, const char* GrammarContents);
int LV_SRE_LoadGlobalGrammarFromBuffer(const char* GrammarLabel, const char* GrammarContents);
int LV_SRE_LoadGrammarFromObject(HPORT hport, const char* GrammarLabel, HGRAMMAR hgrammar);
int LV_SRE_LoadGrammarFromObjectIdx(HPORT hport, int GrammarIdx, HGRAMMAR hgrammar);
int LV_SRE_LoadGlobalGrammarFromObject(const char* GrammarLabel, HGRAMMAR hgrammar);
int LV_SRE_UnloadGrammar(HPORT hport, const char* GrammarLabel);
int LV_SRE_UnloadGrammarIdx(HPORT hport, int GrammarIndex);
int LV_SRE_UnloadGlobalGrammar(const char* GrammarLabel);
int LV_SRE_UnloadGrammars(HPORT hport);
int LV_SRE_UnloadGlobalGrammars(void);
int LV_SRE_IsGrammarLoaded(HPORT hport, const char* GrammarLabel);
int LV_SRE_IsGrammarLoadedIdx(HPORT hport, int GrammarIndex);
int LV_SRE_IsGlobalGrammarLoaded(const char* GrammarLabel);
int LV_SRE_ActivateGrammar(HPORT hport, const char* GrammarLabel);
int LV_SRE_ActivateGrammarIdx(HPORT hport, int GrammarIndex);
int LV_SRE_ActivateGlobalGrammar(HPORT hport, const char* GrammarLabel);
int LV_SRE_DeactivateGrammar(HPORT hport, const char* GrammarLabel);
int LV_SRE_DeactivateGrammarIdx(HPORT hport, int GrammarIndex);
int LV_SRE_DeactivateGrammars(HPORT hport);
SRGS Result Functions
int LV_SRE_GetNumberOfParses(HPORT hport, int VoiceChannel);
const char* LV_SRE_GetParseTreeString(HPORT hport, int VoiceChannel, int index);
H_PARSE_TREE LV_SRE_CreateParseTree(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetNumberOfInterpretations(HPORT hport, int VoiceChannel);
const char* LV_SRE_GetInterpretationString(HPORT hport, int VoiceChannel, int index);
H_SI LV_SRE_CreateInterpretation(HPORT hport, int VoiceChannel, int index);
N-Best Result Functions
int LV_SRE_GetNumberOfNBestAlternatives(HPORT hport, int VoiceChannel);
int LV_SRE_SwitchToNBestAlternative(HPORT hport, int VoiceChannel, int index);
Concept-Phrase Grammar Functions (for backward compatibility)
int LV_SRE_AddPhrase(HPORT hport, int GrammarSet, const char* Concept, const char* Phrase);
int LV_SRE_LoadStandardGrammar(HPORT hport, int grammarset, int defaultgrammar);
int LV_SRE_ResetGrammar(HPORT hport, int GrammarSet);
const char* LV_SRE_GetConcept(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetConceptScore(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetNumberOfConceptsReturned(HPORT hport, int VoiceChannel);
int LV_SRE_GetPhonemesDecoded(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetPhraseDecoded(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetRawTextDecoded(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_RemoveConcept(HPORT hport, int GrammarSet, const char* Concept);
API Functions
LV_SRE_OpenPort
Opens the speech port and initializes a connection to the Speech Engine.
Functions
HPORT LV_SRE_OpenPort(ExportLogMsg Log, void* p, int verbosity);
HPORT LV_SRE_OpenPort2(unsigned long* error_code, ExportLogMsg Log, void* p, int verbosity);
Return Values
Note: the returned handle is used by most other API functions, and must be closed by calling LV_SRE_ClosePort.
Non-NULL
Port initialized successfully.
NULL
Licensing has been exceeded. There are too many ports active.
Parameters
Log
Pointer to a function which will receive logging information from the object.
p
A void pointer to client application-defined data. This data will be passed into the ExportLogMsg function to identify the calling port.
verbosity
range: 0 - 6
0 - minimal logging info
6 - maximum logging info
error_code
A pointer that receives an error code indicating why the port failed to open.
Error Code Return Values for OpenPort2
LV_SUCCESS
The port opened successfully
LV_NO_SERVER_RESPONDING or LV_OPEN_PORT_FAILED__PRIMARY_SERVER_NOT_RESPONDING
The client could not find a server to request a licensed port from.
LV_OPEN_PORT_FAILED__LICENSES_EXCEEDED
The primary server has too many ports connected for the number of licenses it has to give out.
Remarks
This method activates the speech port object. The recognition engine will begin initializing when this function is called. Control will return to the application immediately.
p is passed into the ExportLogMsg function to enable client-application-defined behavior.
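As an illustration, a logging callback might use the p pointer to tag each message with application data. The callback shape below (message string, user pointer, verbosity level) is an assumption made for this sketch; consult the SDK header for the actual ExportLogMsg typedef.

```c
#include <stdio.h>
#include <string.h>

/* Assumed application-side context; the real ExportLogMsg signature comes
   from the SDK header, and this shape is illustrative only. */
typedef struct {
    char tag[32];    /* identifies the calling port or application */
    char last[256];  /* last formatted log line, kept for inspection */
} AppLogContext;

/* Format each engine message with the tag recovered from the p pointer,
   then emit it to stderr. */
static void my_log_callback(const char *msg, void *p, int level)
{
    AppLogContext *ctx = (AppLogContext *)p;
    snprintf(ctx->last, sizeof ctx->last, "[%s:%d] %s", ctx->tag, level, msg);
    fputs(ctx->last, stderr);
    fputc('\n', stderr);
}
```

A port opened with this callback and a pointer to its AppLogContext would then see every engine message prefixed with that port's tag.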
See Also
Logging Callback Function
LV_SRE_ClosePort
LVSpeechPort::OpenPort
LV_SRE_ClosePort
Closes the port, and releases its resources.
int LV_SRE_ClosePort(HPORT hport);
Return Values
LV_SUCCESS
No errors; the port has successfully shut down.
LV_FAILURE
The port was unable to shut down.
LV_INVALID_HPORT
The port was never successfully opened, or was already closed.
Note:
Closing a port frees it from counting against the number of ports allowed by your license. Close every port that is no longer needed.
See Also
LV_SRE_OpenPort
LVSpeechPort::ClosePort
LV_SRE_RegisterAppLogMsg
Registers an application-level log message callback.
void LV_SRE_RegisterAppLogMsg(ExportLogMsg Log, void* p, int verbosity);
Return Values
none.
Parameters
Log
Pointer to a function which will receive logging information.
p
A void pointer to application-defined data. This data will be passed into the ExportLogMsg function to identify the application.
verbosity
range: 0 - 6
0 - minimal logging info
6 - maximum logging info
Remarks
This callback exists in addition to the per-port log message callback because some log messages are generated that are not associated with any one port.
There currently is no equivalent in LVSpeechPort.
See Also
Logging Callback Function
LV_SRE_ActivateGrammar functions
If you wish to use an SRGS grammar for decode, you need to activate it. Activating a grammar puts it in the multi-grammar grammarset called LV_ACTIVE_GRAMMAR_SET. The grammars that were activated can then be used for a decode by specifying LV_ACTIVE_GRAMMAR_SET as the grammarset parameter in a call to Decode, or by setting the STREAM_PARM_GRAMMAR_SET equal to the LV_ACTIVE_GRAMMAR_SET before calling StreamStart. The reason for this mechanism is to maintain backward compatibility with previous APIs.
When ActivateGrammar is called, the grammar is first searched for among the speech port's loaded grammars. If it cannot be found there, the collection of application-level grammars is searched. If you wish to explicitly activate an application-level grammar, use LV_SRE_ActivateGlobalGrammar.
Functions
int LV_SRE_ActivateGrammar(HPORT hport, const char* gram_name);
int LV_SRE_ActivateGrammarIdx(HPORT hport, int gram_name);
Parameters
hport
The handle of the speech port for which you are activating the grammar.
gram_name
The identifier for the grammar being activated. This is the same identifier that was given to the grammar when it was loaded. This can be a string, or an integer ID if you use the *Idx version of the function call. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
Return Values
LV_SUCCESS
No errors; this grammar is now active.
LV_GRAMMAR_LOADING_ERROR
This grammar could not be activated, because it was not found in the speech port's set of loaded grammars.
Remarks
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
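The load-activate-decode flow described above can be sketched as follows. The stubs and the LV_ACTIVE_GRAMMAR_SET placeholder value stand in for the real SDK declarations, which come from the LV_SRE header; error handling is trimmed to the essentials.

```c
#include <assert.h>

/* Hypothetical stand-ins for the SDK declarations, so this sketch compiles
   on its own; real programs include the LV_SRE header instead. */
typedef void *HPORT;
#define LV_SUCCESS 0
#define LV_ACTIVE_GRAMMAR_SET (-1)   /* placeholder; use the SDK's constant */
static int LV_SRE_LoadGrammar(HPORT h, const char *label, const char *loc)
{ (void)h; (void)label; (void)loc; return LV_SUCCESS; }
static int LV_SRE_ActivateGrammar(HPORT h, const char *label)
{ (void)h; (void)label; return LV_SUCCESS; }
static int LV_SRE_Decode(HPORT h, int chan, int gset, unsigned int flags)
{ (void)h; (void)chan; (void)gset; (void)flags; return LV_SUCCESS; }

/* Load a grammar, place it in the active set, and decode channel 0
   against LV_ACTIVE_GRAMMAR_SET. */
static int decode_with_grammar(HPORT port, const char *label, const char *uri,
                               unsigned int flags)
{
    int rc = LV_SRE_LoadGrammar(port, label, uri);
    if (rc != LV_SUCCESS) return rc;
    rc = LV_SRE_ActivateGrammar(port, label);
    if (rc != LV_SUCCESS) return rc;
    return LV_SRE_Decode(port, 0, LV_ACTIVE_GRAMMAR_SET, flags);
}
```

Several grammars can be activated before the Decode call; all of them are then considered together as the active grammar set.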
See Also
LV_SRE_DeactivateGrammar functions
LV_SRE_ActivateGlobalGrammar
LVSpeechPort::ActivateGrammar functions (C++ API)
LV_SRE_ActivateGlobalGrammar
You only need to use this function if you have a grammar in the speech port with the same name as a grammar in the global space, and you wish to activate the global grammar.
Function
int LV_SRE_ActivateGlobalGrammar(HPORT hport,const char* gram_name);
Parameters
hport
The handle of the speech port for which you are activating the grammar.
gram_name
The identifier for the grammar being activated. This is the same identifier that was given to the grammar when it was loaded.
Return Values
LV_SUCCESS
No errors; this grammar is now active.
LV_FAILURE
This grammar could not be activated, because it was not found in the application-level set of grammars.
Remarks
Since LV_SRE_ActivateGrammar searches the speech port's loaded grammars, and then searches the application level grammars, you only need to use LV_SRE_ActivateGlobalGrammar if there is a name conflict between your local and app-level grammars, and you need to activate the app-level one.
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
See Also
LV_SRE_ActivateGrammar functions
LV_SRE_DeactivateGrammar functions
LVSpeechPort::ActivateGlobalGrammar (C++ API)
LV_SRE_DeactivateGrammar functions
These functions remove a grammar from the set of active grammars. The last function clears the entire active grammar set.
Functions
int LV_SRE_DeactivateGrammar(HPORT hport, const char* gram_name);
int LV_SRE_DeactivateGrammarIdx(HPORT hport, int gram_name);
int LV_SRE_DeactivateGrammars(HPORT hport);
Parameters
hport
The handle of the speech port for which you are deactivating the grammar.
gram_name
The identifier for the grammar being deactivated. This is the same identifier that was given to the grammar when it was loaded. This can be a string, or an integer ID if you use the *Idx version of the function call. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
Return Values
LV_SUCCESS
No errors; this grammar is no longer active.
LV_FAILURE
This grammar could not be deactivated, because it was never successfully activated.
See Also
LV_SRE_ActivateGrammar functions
LV_SRE_ActivateGlobalGrammar
LVSpeechPort::DeactivateGrammar (C++ API)
LV_SRE_LoadGrammar functions
Before you can use a grammar, you must load it into the speech port's collection of grammars, or you must load it into the collection of application-level (global) grammars. When you load a grammar, it is compiled for use in the LumenVox Speech Engine.
These functions load an SRGS grammar that will be usable by a single speech port object.
Functions
int LV_SRE_LoadGrammar(HPORT hport, const char* gram_name, const char* gram_location);
int LV_SRE_LoadGrammarIdx(HPORT hport, int gram_name, const char* gram_location);
int LV_SRE_LoadGrammarFromBuffer(HPORT hport, const char* gram_name, const char* gram_contents);
int LV_SRE_LoadGrammarFromBufferIdx(HPORT hport, int gram_name, const char* gram_contents);
int LV_SRE_LoadGrammarFromObject(HPORT hport, const char* gram_name, HGRAMMAR gram_handle);
int LV_SRE_LoadGrammarFromObjectIdx(HPORT hport, int gram_name, HGRAMMAR gram_handle);
Parameters
hport
The handle for the speech port you are loading the grammar into.
gram_name
The identifier for the grammar being loaded. Whenever you activate, deactivate, or unload, this is the identifier you will use. This can be a string, or an integer ID if you use the *Idx version of the function call. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
gram_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
gram_contents
A null terminated string containing the contents of a valid SRGS grammar file.
gram_handle
A handle for an LVGrammar object, created by LVGrammar_Create
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
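A client will typically treat a syntax warning as usable and the two error codes as fatal for that grammar. A minimal sketch of that check, with assumed numeric values standing in for the SDK's real constants:

```c
#include <assert.h>

/* Hypothetical numeric values for the SDK's return codes; the real
   definitions come from the LV_SRE header. */
#define LV_SUCCESS                 0
#define LV_GRAMMAR_SYNTAX_WARNING  1
#define LV_GRAMMAR_SYNTAX_ERROR   (-2)
#define LV_GRAMMAR_LOADING_ERROR  (-3)

/* A grammar is usable after loading if the compiler fully accepted it or
   only warned; syntax and loading errors mean it cannot be decoded with. */
static int grammar_is_usable(int load_rc)
{
    return load_rc == LV_SUCCESS || load_rc == LV_GRAMMAR_SYNTAX_WARNING;
}
```

On a warning, the client would usually also inspect the logging callback output (priority 1) to see what was non-conforming.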
See Also
LV_SRE_UnloadGrammar functions
LV_SRE_IsGrammarLoaded functions
LV_SRE_LoadGlobalGrammar functions
LVSpeechPort::LoadGrammar functions (C++ API)
LV_SRE_UnloadGrammar functions
These functions remove a loaded grammar from a speech port object. The last function removes all loaded grammars from the speech port.
Functions
int LV_SRE_UnloadGrammar(HPORT hport, const char* gram_name);
int LV_SRE_UnloadGrammarIdx(HPORT hport, int gram_name);
int LV_SRE_UnloadGrammars(HPORT hport);
Parameters
hport
The handle for the speech port from which you are unloading the grammar.
gram_name
The identifier for the grammar being unloaded. This is the same identifier you gave the grammar when you loaded it. It can be a null terminated string, or an integer if you use the *Idx version of the method.
Return Values
LV_SUCCESS
No errors; this grammar is removed.
LV_FAILURE
The grammar was not present. Nothing was removed.
Remarks
Grammars that were activated and then unloaded are still active; they must be explicitly deactivated.
See Also
LV_SRE_IsGrammarLoaded functions
LV_SRE_UnloadGlobalGrammar functions
LV_SRE_LoadGrammar functions
LVSpeechPort::UnloadGrammar functions (C++ API)
LV_SRE_UnloadGlobalGrammar
These functions remove a loaded grammar from the application-level space of grammars. The second function removes all application-level grammars.
Functions
int LV_SRE_UnloadGlobalGrammar(const char* gram_name);
void LV_SRE_UnloadGlobalGrammars(void);
Parameters
gram_name
The identifier for the grammar being unloaded. This is the same identifier you gave the grammar when you loaded it.
Return Values
LV_SUCCESS
No errors; this grammar is removed.
LV_GLOBAL_GRAMMAR_TRANSACTION_ERROR
Failed to unload the grammar on all servers.
LV_GLOBAL_GRAMMAR_TRANSACTION_PARTIAL_ERROR
Failed to unload the grammar on some of the servers.
Remarks
A global grammar is unloaded from the server only when unload has been called for every label associated with that grammar.
See Also
LV_SRE_UnloadGrammar functions
LV_SRE_IsGlobalGrammarLoaded functions
LV_SRE_LoadGlobalGrammar functions
LVSpeechPort::UnloadGlobalGrammar functions (C++ API)
LV_SRE_LoadGlobalGrammar functions
When a global grammar is loaded, the grammar is sent to the server. Subsequent decode requests then contain only global grammar IDs rather than the actual grammars, which avoids network transport overhead for large grammars.
A global grammar is associated with the client process that loads it. All speech ports belonging to that client have access to the grammar; different client processes, however, do not share global grammars with each other.
Generally, the lifetime of a global grammar is controlled by the load and unload functions. However, if a client process terminates without unloading its global grammars, the server must release the unused grammars itself: it periodically checks whether each client process is still alive, and once it detects that a client process has been inactive for more than 10 minutes, it removes all grammars associated with that process.
In a multi-threaded program, it is safe to access global grammars in a read-only fashion on multiple threads simultaneously, for instance querying whether a global grammar is loaded or decoding with global grammars. When loading or unloading takes place, such as unloading a global grammar while another thread is decoding with it, it is the user's responsibility to prevent race conditions.
Functions
int LV_SRE_LoadGlobalGrammar(const char* gram_name, const char* gram_location);
int LV_SRE_LoadGlobalGrammarFromBuffer(const char* gram_name, const char* gram_contents);
int LV_SRE_LoadGlobalGrammarFromObject(const char* gram_name, HGRAMMAR gram_handle);
Parameters
gram_name
The identifier for the grammar being loaded. Whenever you activate, deactivate, or unload, this is the identifier you will use.
gram_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
gram_contents
A null terminated string containing the contents of a valid SRGS grammar file.
gram_handle
A handle for an LVGrammar object, created by LVGrammar_Create
Return Values
LV_SUCCESS
No errors; this grammar is now ready to use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready for use.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
LV_GLOBAL_GRAMMAR_TRANSACTION_ERROR
Failed to send the grammar to all servers.
LV_GLOBAL_GRAMMAR_TRANSACTION_PARTIAL_ERROR
Failed to send the grammar to some of the servers.
Remarks
Detailed error and warning messages are sent to the LVSpeechPort application-level logging callback function at priorities 0 and 1, respectively.
Users can load the same grammar under different labels; only one instance of that grammar is created on the server.
See Also
LV_SRE_LoadGrammar functions
LV_SRE_IsGlobalGrammarLoaded functions
LV_SRE_UnloadGlobalGrammar functions
LVSpeechPort::LoadGlobalGrammar functions (C++ API)
LV_SRE_IsGrammarLoaded functions
Functions
int LV_SRE_IsGrammarLoaded(HPORT hport, const char* gram_name);
int LV_SRE_IsGrammarLoadedIdx(HPORT hport, int gram_name);
Parameters
hport
The port being queried for gram_name.
gram_name
The identifier for the grammar being queried. This is the same identifier you gave the grammar when you loaded it.
Return Values
1 if a grammar was found with the label gram_name among the speech port's loaded grammars; 0 otherwise.
Remarks
Note: This function only tells you if a grammar with the name gram_name is loaded. It does not tell you if there are two identical grammar bodies loaded.
See Also
LV_SRE_UnloadGrammar functions
LV_SRE_IsGlobalGrammarLoaded
LV_SRE_LoadGrammar functions
LVSpeechPort::IsGrammarLoaded functions (C++ API)
LV_SRE_IsGlobalGrammarLoaded
Function
int LV_SRE_IsGlobalGrammarLoaded(const char* gram_name);
Parameters
gram_name
The identifier for the grammar being queried. This is the same identifier you gave the grammar when you loaded it.
Return Values
1 if a grammar was found with the label gram_name in the space of application-level grammars; 0 otherwise.
Remarks
Note: This function only tells you if a grammar with the name gram_name is loaded. It does not tell you if there are two identical grammar bodies loaded.
See Also
LV_SRE_UnloadGlobalGrammar
LV_SRE_IsGrammarLoaded functions
LV_SRE_LoadGlobalGrammar functions
LVSpeechPort::IsGlobalGrammarLoaded functions (C++ API)
LV_SRE_AddPhrase
Adds a phrase to a new or existing concept.
int LV_SRE_AddPhrase(HPORT hport, int GrammarSet, const char* Concept , const char* Phrase);
Return Values
LV_SUCCESS
No errors; the phrase was added to the concept.
LV_BAD_HPORT
The engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set is out of range.
LV_GRAMMAR_SYNTAX_ERROR or LV_GRAMMAR_SYNTAX_WARNING
The phrase entered has bad syntax, such as mismatched parentheses.
Parameters
GrammarSet
The grammar set to add the phrase to. Integer value between 0 and 63, inclusive.
Concept
The concept to add the phrase to. Null-terminated string.
Phrase
The new phrase.
Remarks
The concept can be new or existing; if the concept is new, the call automatically creates it with the single phrase.
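For instance, a small yes/no concept grammar can be built one phrase at a time. The stub for LV_SRE_AddPhrase stands in for the real SDK function so the sketch is self-contained, and the concept and phrase strings are illustrative:

```c
#include <assert.h>

/* Hypothetical stand-ins for the SDK declarations; real code includes the
   LV_SRE header instead. */
typedef void *HPORT;
#define LV_SUCCESS 0
static int LV_SRE_AddPhrase(HPORT h, int gset, const char *concept, const char *phrase)
{ (void)h; (void)gset; (void)concept; (void)phrase; return LV_SUCCESS; }

/* Build a yes/no grammar in one grammar set: each AddPhrase call either
   creates the concept or appends another phrase to it. */
static int build_yes_no(HPORT port, int gset)
{
    static const char *yes[] = { "yes", "yeah", "yep" };
    static const char *no[]  = { "no", "nope" };
    for (int i = 0; i < 3; i++)
        if (LV_SRE_AddPhrase(port, gset, "YES", yes[i]) != LV_SUCCESS)
            return -1;
    for (int i = 0; i < 2; i++)
        if (LV_SRE_AddPhrase(port, gset, "NO", no[i]) != LV_SUCCESS)
            return -1;
    return LV_SUCCESS;
}
```

A decode against this grammar set would then return the concept label ("YES" or "NO") rather than the literal phrase spoken.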
See Also
Phrase Formats
Phonemes
LVSpeechPort::AddPhrase
LV_SRE_RemoveConcept
Removes a concept and all of its phrases.
int LV_SRE_RemoveConcept(HPORT hport, int GrammarSet, const char* Concept);
Return Values
LV_SUCCESS
No errors; the concept and all of its phrases are removed from the grammar set.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set specified is outside the valid range.
LV_BAD_HPORT
The engine is no longer running. This is the result of an LV_SRE_ClosePort call or an unrecoverable engine error.
Parameters
GrammarSet
The grammar set to remove the concept from. Valid range: 0 to 63.
Concept
The existing concept to remove. Null-terminated string.
See Also
LVSpeechPort::RemoveConcept
LV_SRE_ResetGrammar
Removes all concepts from a grammar.
int LV_SRE_ResetGrammar(HPORT hport, int GrammarSet);
Return Values
LV_SUCCESS
No errors; grammar reset.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set value is out of expected range (0-63).
See Also
LVSpeechPort::ResetGrammar
LV_SRE_LoadStandardGrammar
Standard grammars are deprecated in favor of SRGS built-in grammars.
Loads a standard, pre-defined grammar to easily recognize and format numbers, monetary figures or digits.
int LV_SRE_LoadStandardGrammar(HPORT hport,int GrammarSet, int StdGrammar);
Return Values
LV_SUCCESS
No errors; the standard grammar is loaded.
LV_STANDARD_GRAMMAR_OUT_OF_RANGE
The standard grammar value is not a recognized grammar type.
LV_GRAMMAR_SET_OUT_OF_RANGE
The standard grammar was loaded into a set that is not in range.
Parameters
GrammarSet
The grammar set the standard grammar is being loaded into. Valid range: 0 to 63.
StdGrammar
The standard grammars are:
1. GRAMMAR_DIGITS String of single digits like a phone number or pin code.
2. GRAMMAR_MONEY Monetary value (only implemented for SRGS decodes).
3. GRAMMAR_NUMERIC Numeric value like 12,000, 24.45, or 35.
4. GRAMMAR_SPELLING Alphabet letters for spelling (not implemented).
5. GRAMMAR_ALPHA_NUMERIC (Not implemented).
6. GRAMMAR_DATE Date values (only implemented for SRGS decodes).
7. GRAMMAR_NONE Clears out the standard grammar, without clearing out any phrases that were added. ResetGrammar( ) will clear out the entire grammar.
Remarks
The client application can load only one standard grammar, but can add any number of concepts with AddPhrase. This is not true, however, if you use SRGS grammars. The correct way to augment a standard SRGS grammar is to load a grammar to a different location, and then activate both. When a standard grammar is loaded, the decoder will return the number, dollar amount, or digit string as either a single concept or a single interpretation string, depending on whether SRGS is used.
As an example, suppose the client application loads GRAMMAR_NUMERIC and also adds the concept and phrase "Widgets". If the sound data contained the speech "twelve widgets", the decoder would return two concepts: the first is the string "12" and the second the string "Widgets". If the speech was "one thousand one hundred and twenty nine Widgets seven point two Widgets", the decoder would return four concepts: "1129", "Widgets", "7.2", and "Widgets".
However, if you use SRGS, this is not what happens. To get this sort of functionality in the SRGS setting, you would create a grammar that looks like the following:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $how_many_widgets;

$how_many_widgets = $<builtin:grammar/number> widgets {$=$$;};
In this case you wouldn't bother using LoadStandardGrammar() at all, since the standard number grammar will get loaded when you load this grammar. The return type would be an interpretation string representing the number that was recognized, like "1129" or "7.2". The word "widgets" would not be returned in this grammar.
See Also
Standard Grammars
LVSpeechPort::LoadStandardGrammar
LV_SRE_LoadVoiceChannel
Loads the audio data into the specified voice channel prior to a call to LV_SRE_Decode (which decodes the audio data).
int LV_SRE_LoadVoiceChannel(HPORT hport, int VoiceChannel, void* M, int Length, SOUND_FORMAT Format);
Return Values
LV_SUCCESS
No errors; the voice channel audio successfully loaded.
LV_BAD_HPORT
The engine is no longer running. This is the result of an LV_SRE_ClosePort call or an unrecoverable engine error.
LV_FAILURE
Sound format was incorrectly specified.
Parameters
VoiceChannel
Accepted values 0 through 63.
M
Pointer to audio data.
Length
Memory size in bytes of the audio data.
Format
The audio data sound format.
Remarks
Each speech port supports 64 separate voice channels. Each channel has its own separate storage for decode data, so once the call is made, the client application can release its own copy of the audio. LV_SRE_LoadVoiceChannel will accept the audio data and prepare it for decoding.
See Also
LVSpeechPort::LoadVoiceChannel
LV_SRE_Decode
Processes the voice channel audio data against the active grammar.
int LV_SRE_Decode(HPORT hport,int VoiceChannel,int grammarset,unsigned int flags);
Return Values
Zero (0) or greater indicates success.
A negative result indicates a specific error.
Parameters
VoiceChannel
The voice channel to process.
GrammarSet
The grammar to use to process.
Flags (bitwise OR flags to set desired options)
LV_DECODE_BLOCK - Decode will not return until it has finished.
LV_DECODE_GENDER_MALE - Gender identifier.
LV_DECODE_GENDER_FEMALE – Gender identifier.
LV_DECODE_FIRST_TIME_USER – Reset caller weights in Recognition Engine (not implemented).
LV_DECODE_USE_OOV - Use the Out-Of-Vocabulary filter (OOV) during decode.
Remarks
If LV_DECODE_BLOCK is set, LV_SRE_Decode will not return until it has finished processing the data.
If LV_DECODE_BLOCK is not set, LV_SRE_Decode returns immediately (but continues processing the data on a separate thread), and the client application can continue its own work. Calling other LVSpeechPort methods may block until the Decode is finished. Once the client application is ready to check for results, it can call either 1) LV_SRE_GetNumberOfConceptsReturned, or 2) LV_SRE_WaitForEngineToIdle and then LV_SRE_GetNumberOfConceptsReturned. LV_SRE_WaitForEngineToIdle will only wait for a specified time, and returns regardless of whether LV_SRE_Decode is finished, whereas LV_SRE_GetNumberOfConceptsReturned will block until the Decode is finished.
LV_DECODE_GENDER_FEMALE and LV_DECODE_GENDER_MALE identify which gender acoustic model to use. If these flags are not specified, the engine automatically decodes each audio file against both gender models. While this slows the engine by requiring two decodes, evaluating against both models has a very significant positive effect on recognition accuracy. Since the engine is multithreaded, unless CPU loads are a serious issue, do not use these flags.
On an error, call LV_SRE_ReturnErrorString with the negative result from LV_SRE_Decode to get a description of the error.
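The non-blocking pattern described in the remarks can be sketched as follows. The stubs stand in for the real SDK functions, and the parameter order for LV_SRE_WaitForEngineToIdle here follows the function summary (hport, voicechannel, ms):

```c
#include <assert.h>

/* Hypothetical stand-ins for the SDK declarations; real code includes the
   LV_SRE header instead. */
typedef void *HPORT;
#define LV_SUCCESS  0
#define LV_TIME_OUT 1   /* placeholder value; use the SDK's constant */
static int LV_SRE_WaitForEngineToIdle(HPORT h, int chan, int ms)
{ (void)h; (void)chan; (void)ms; return LV_SUCCESS; }
static int LV_SRE_GetNumberOfConceptsReturned(HPORT h, int chan)
{ (void)h; (void)chan; return 2; /* stub: pretend two concepts decoded */ }

/* After a non-blocking Decode: wait up to budget_ms for the channel to go
   idle, then fetch the concept count; returns -1 if the wait timed out. */
static int await_results(HPORT port, int chan, int budget_ms)
{
    if (LV_SRE_WaitForEngineToIdle(port, chan, budget_ms) != LV_SUCCESS)
        return -1;   /* still decoding; the caller may retry or abandon */
    return LV_SRE_GetNumberOfConceptsReturned(port, chan);
}
```

Calling LV_SRE_GetNumberOfConceptsReturned directly, without the wait, would instead block the caller until the decode finishes.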
See Also
LV_SpeechPort::Decode
LV_SRE_WaitForEngineToIdle
(Deprecated in favor of LV_SRE_WaitForDecode)
Blocks the client application until the port is idle (not decoding).
int LV_SRE_WaitForEngineToIdle(HPORT hport, int MillisecondsToWait, int VoiceChannel);
Return Values
LV_SUCCESS
No errors or timeout; the engine is now idle.
LV_TIME_OUT
WaitForEngineToIdle's timeout was reached before the engine became idle.
Parameters
MillisecondsToWait
The number of milliseconds to wait before returning if the Speech Port does not become idle.
VoiceChannel
Which VoiceChannel to wait on, -1 waits on all the voice channels for the port.
Remarks
This function is deprecated in favor of LV_SRE_WaitForDecode. To achieve the same behavior as LV_SRE_WaitForDecode, use property PROP_EX_DECODE_TIMEOUT, and set MillisecondsToWait to TIMEOUT_INFINITE.
Some of the LV_SRE functions run asynchronously, in particular LV_SRE_Decode. LV_SRE_WaitForEngineToIdle is primarily useful when LV_SRE_Decode is called without LV_DECODE_BLOCK. In this case, LV_SRE_Decode returns immediately, but continues processing the voice channel's audio data in a separate thread. Since client applications will eventually need the results, the clients need a way to query the port to see if LV_SRE_Decode has finished. LV_SRE_WaitForEngineToIdle will wait the specified time for the engine to idle; check the return value to ensure the engine is idle, indicating that decode results are available.
LV_SRE_WaitForEngineToIdle is also useful to ensure the engine has finished initializing, prior to calls to LV_SRE_Decode.
See Also
LV_SRE_Decode
LVSpeechPort::WaitForEngineToIdle
LV_SRE_WaitForDecode
LV_SRE_GetNumberOfInterpretations
Returns the number of semantic interpretation results that were generated by the previous decode.
Function
int LV_SRE_GetNumberOfInterpretations(HPORT hport, int voicechannel)
Parameters
hport
A handle to the speech port.
voicechannel
The audio channel holding the decoded audio.
See Also
LV_SRE_CreateInterpretation
LV_SRE_GetInterpretationString
LVSpeechPort::GetNumberOfInterpretations (C++ API)
LV_SRE_CreateInterpretation
Returns a handle to a data structure representing the results of the semantic interpretation process. The handle must be released with LVInterpretation_Release when you are finished with it.
Function
H_SI LV_SRE_CreateInterpretation (HPORT hport, int voicechannel, int index)
Parameters
hport
A handle to the speech port
voicechannel
The channel that the decode took place on.
index
An utterance could give rise to multiple interpretations, particularly if the grammars involved are ambiguous. index ranges from 0 to LV_SRE_GetNumberOfInterpretations - 1.
Return Value
The return type is a handle to an interpretation object. The object is a representation of the ECMAScript object made by the matching grammar, using the Semantic Interpretation for Speech Recognition process. It also contains additional information such as the confidence score, matching grammar label, and the input sentence.
Remarks
The H_SI handle can be manipulated using the functions prefixed by "LVInterpretation_".
See Also
LV_SRE_GetNumberOfInterpretations
LV_SRE_GetInterpretationString
LVInterpretation C API
LVParseTree::GetInterpretation (C++ API)
LV_SRE_GetInterpretationString
Provides the user with a string representation of the semantic interpretation result data.
Function
const char* LV_SRE_GetInterpretationString(HPORT hport, int voicechannel, int index)
Parameters
hport
A handle to the speech port
voicechannel
The channel containing the decoded audio
index
A value between 0 and LV_SRE_GetNumberOfInterpretations -1
Remarks
Logically, the interpretation string is the same as the result data contained in a semantic interpretation object.
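For example, all interpretation strings from the last decode can be retrieved in a loop. This is a minimal sketch; it assumes port is an open HPORT and vc is the voice channel that was decoded:

```c
int i;
int total = LV_SRE_GetNumberOfInterpretations(port, vc);
for (i = 0; i < total; ++i)
{
    /* index runs from 0 to LV_SRE_GetNumberOfInterpretations - 1 */
    const char* s = LV_SRE_GetInterpretationString(port, vc, i);
    if (s != NULL)
        printf("interpretation %d: %s\n", i, s);
}
```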
See Also
LV_SRE_GetNumberOfInterpretations
LV_SRE_CreateInterpretation
LVSpeechPort::GetInterpretationString (C++ API)
LV_SRE_GetNumberOfParses
Returns the number of parse trees that were generated by the previous decode.
Function
int LV_SRE_GetNumberOfParses(HPORT hport, int voicechannel)
Parameters
hport
A handle to the speech port.
voicechannel
The audio channel holding the decoded audio.
See Also
LV_SRE_CreateParseTree
LV_SRE_GetParseTreeString
Speech Parse Tree Introduction
LVSpeechPort::GetNumberOfParses (C++ API)
LV_SRE_CreateParseTree
Provides the user with a handle to a speech parse tree, representing the sentence structure of what was decoded by the Speech Engine, according to the active grammars. You must release the handle with LVParseTree_Release when you are finished with it.
Function
H_PARSE_TREE LV_SRE_CreateParseTree(HPORT hport, int voicechannel, int index)
Parameters
hport
The handle to the speech port.
voicechannel
The audio channel containing the input audio
index
It is possible to have more than one parse tree for an utterance (for instance if the grammar is ambiguous); this is the index of the tree
Return Value
A handle to a parse tree. The parse tree handle is manipulated with functions having the prefix "LVParseTree_".
Remark
Logically, a parse tree and the parse string returned to the user are the same. However, a speech parse tree makes it easy to search the parse tree for useful information.
See Also
LV_SRE_GetNumberOfParses
LV_SRE_GetParseTreeString
Parse Tree Introduction
LVParseTree C API
LVSpeechPort::GetParseTree (C++ API)
LV_SRE_GetParseTreeString
Provides the user with a string representation of a speech parse tree.
Function
const char* LV_SRE_GetParseTreeString(HPORT hport, int voicechannel, int index)
Parameters
hport
The handle to the speech port.
voicechannel
The audio channel containing the input audio
index
It is possible to have more than one parse tree possibility (for instance if the grammar is ambiguous); this is the index of the tree
Remark
Logically, a speech parse tree and the parse string returned to the user are the same. However, a speech parse tree makes it easy to search the parse tree for useful information. The parse tree string is based on the examples provided by the W3C SRGS specification
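A minimal sketch of retrieving every parse tree string after a decode; port and vc are assumed to be an open HPORT and the decoded voice channel:

```c
int i;
int parses = LV_SRE_GetNumberOfParses(port, vc);
for (i = 0; i < parses; ++i)
{
    const char* tree = LV_SRE_GetParseTreeString(port, vc, i);
    if (tree != NULL)
        printf("parse %d: %s\n", i, tree);
}
```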
See Also
LV_SRE_GetNumberOfParses
LV_SRE_CreateParseTree
Parse Tree Introduction
LVSpeechPort::GetParseTreeString (C++ API)
LV_SRE_GetNumberOfConceptsReturned
Returns the number of concepts found in the last call to LV_SRE_Decode.
int LV_SRE_GetNumberOfConceptsReturned(HPORT hport,int VoiceChannel);
Return Values
The number of concepts found for this voice channel.
Parameters
VoiceChannel
The voice channel processed by LV_SRE_Decode.
See Also
LVSpeechPort::GetNumberOfConceptsReturned
LV_SRE_GetConcept
Returns one concept found in the last call to LV_SRE_Decode.
const char* LV_SRE_GetConcept(HPORT hport,int VoiceChannel, int Index);
Return Values
A null-terminated string representing the matched concept.
NULL indicates that Index was outside the possible range.
Parameters
VoiceChannel
The voice channel processed by LV_SRE_Decode.
Index
The recognition position of the concept, between 0 and (LV_SRE_GetNumberOfConceptsReturned - 1), inclusive.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine would return the concepts highlighted:
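A minimal sketch of walking the matched concepts together with their confidence scores (port and vc are assumed to come from an earlier LV_SRE_Decode):

```c
int i;
int count = LV_SRE_GetNumberOfConceptsReturned(port, vc);
for (i = 0; i < count; ++i)
{
    const char* concept = LV_SRE_GetConcept(port, vc, i);
    int score = LV_SRE_GetConceptScore(port, vc, i); /* 0 to 1000 */
    if (concept != NULL)
        printf("concept %d: %s (score %d)\n", i, concept, score);
}
```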
See Also
LVSpeechPort::GetConcept
LV_SRE_GetConceptScore
Returns the confidence score of a concept found in the last call to LV_SRE_Decode.
int LV_SRE_GetConceptScore(HPORT hport,int VoiceChannel, int Index);
Return Values
The confidence score of the matched concept. The range of possible values is 0 to 1000.
Parameters
VoiceChannel
The voice channel processed by LV_SRE_Decode.
Index
The recognition position of the concept, between 0 and (LV_SRE_GetNumberOfConceptsReturned - 1), inclusive.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the scores highlighted:
See Also
LV_SpeechPort::GetConceptScore
LV_SRE_GetPhraseDecoded
Returns the decoded phrase (with BNF formatting retained) found in the last call to LV_SRE_Decode.
const char* LV_SRE_GetPhraseDecoded(HPORT hport, int VoiceChannel, int Index);
Return Values
A null-terminated static string containing the decoded phrase.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the phrase to decode.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the phrases highlighted:
The main difference between LV_SRE_GetPhraseDecoded and LV_SRE_GetRawTextDecoded is in BNF formatting. LV_SRE_GetPhraseDecoded returns the decoded phrase as it was entered into the grammar. If the phrase contains BNF formatting, with selections, options, grouping, etc., then the return value preserves that formatting. LV_SRE_GetRawTextDecoded returns the decoded phrase after BNF formatting has been removed. Thus, LV_SRE_GetRawTextDecoded will return the phrase as a list of the words actually recognized, rather than the phrase as it was entered into the grammar.
See Also
LV_SRE_GetPhonemesDecoded
LV_SRE_GetRawTextDecoded
LVSpeechPort::GetPhraseDecoded
LV_SRE_GetPhonemesDecoded
Returns the actual phonemes found in a call to LV_SRE_Decode.
const char* LV_SRE_GetPhonemesDecoded(HPORT hport,int VoiceChannel, int Index);
Return Values
A null-terminated static string of the decoded phonemes.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded phonemes.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the phonemes highlighted:
See Also
LV_SRE_GetPhraseDecoded
LV_SRE_GetRawTextDecoded
LVSpeechPort::GetPhonemes
LV_SRE_GetRawTextDecoded
Returns the decoded raw text (without BNF formatting) found in the last call to Decode.
const char* LV_SRE_GetRawTextDecoded(HPORT hport,int VoiceChannel, int Index);
Return Values
A null-terminated string representing the decoded raw text.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded raw text.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the raw text highlighted:
The main difference between LV_SRE_GetPhraseDecoded and LV_SRE_GetRawTextDecoded is in BNF formatting. LV_SRE_GetPhraseDecoded returns the decoded phrase as it was entered into the grammar. If the phrase contains BNF formatting, with selections, options, grouping, etc., then the return value preserves that formatting. LV_SRE_GetRawTextDecoded returns the decoded phrase after BNF formatting has been removed. Thus, LV_SRE_GetRawTextDecoded will return the phrase as a list of the words actually recognized, rather than the phrase as it was entered into the grammar.
See Also
LV_SRE_GetPhonemes
LV_SRE_GetPhraseDecoded
LVSpeechPort::GetRawTextDecoded
LV_SRE_GetVoiceChannelData
Sets the pointers to the voice channel's copy of the original preprocessed audio data.
int LV_SRE_GetVoiceChannelData(HPORT hport, int VoiceChannel, short** PCM, unsigned int* Samples);
Return Values
LV_SUCCESS
No errors; PCM and Samples have been successfully set.
LV_SOUND_CHANNEL_OUT_OF_RANGE
The voice channel specified is outside the valid range; possible values are 0-63, inclusive.
LV_BAD_HPORT
The Speech Engine is no longer running. This is the result of a ClosePort call or an unrecoverable Speech Engine error.
Parameters
VoiceChannel
The voice channel to process.
PCM
A pointer to a pointer that will be set to the post-processed audio data.
Samples
A pointer to an integer that will be set to the number of samples.
See Also
LVSpeechPort::GetVoiceChannelData
LV_SRE_ReturnErrorString
Returns a description of an error code.
const char* LV_SRE_ReturnErrorString(int ReturnCode);
Return Values
A null-terminated static string describing the error code.
Parameters
ReturnCode
The error code.
Remarks
If the error code is an invalid error code, "Invalid Error Code" is returned.
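A typical error-handling sketch; result is assumed to hold the return code of a failed LV_SRE call:

```c
if (result != LV_SUCCESS)
{
    fprintf(stderr, "LumenVox error %d: %s\n",
            result, LV_SRE_ReturnErrorString(result));
}
```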
See Also
LVSpeechPort::ReturnErrorString
LV_SRE_SetProperty
Sets various properties on the port.
int LV_SRE_SetProperty(HPORT hport, PROPERTIES Property, int Value);
Return Values
LV_SUCCESS
No errors; Property is set to Value.
LV_BAD_HPORT
hport was invalid.
LV_NOT_A_VALID_PROPERTY_VALUE
Value is invalid for the given property.
Parameters
HPort
The port's handle.
Property
Which property to modify.
Value
Property-dependent.
Remarks
Currently, only PROP_SAVE_SOUND_FILES is implemented; setting Value to 1 will cause the port to save request and answer files to disk; setting Value to 0 turns this feature off. The request and answer files are invaluable for troubleshooting and tuning applications, but will quickly fill up a hard drive.
See Also
Properties
LVSpeechPort::SetProperty
LV_SRE_SetPropertyEx
Sets various properties for a port, client, sound channel, or grammar.
int LV_SRE_SetPropertyEx(HPORT hport, int propertyname, int valuetype, void* pvalue, int target, int index);
Return Values
LV_SUCCESS
No errors; property is set to the value pointed to by pvalue.
LV_INVALID_PROPERTY
The property does not exist.
LV_INVALID_PROPERTY_VALUE
The property value is invalid for the designated property (e.g. out of range).
LV_INVALID_PROPERTY_TARGET
The property cannot be set for the specified target.
LV_INVALID_PROPERTY_VALUE_TYPE
The property's type is incompatible with the declared type.
LV_INVALID_PROPERTY_TARGET_IDX
The target's index (grammar set, voicechannel) is out of range for this property.
Note: If more than one error occurs, which error code is returned is undefined.
Parameters
propertyname
Which property to modify.
valuetype
The value type of the property being set. Legal values are:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
PROP_EX_VALUE_TYPE_STRING
PROP_EX_VALUE_TYPE_FLOAT_PTR
Each property has a set of legal value types. See Properties.
pvalue
A pointer to the new value for propertyname. pvalue will be reinterpreted according to the value type provided.
target
The portion of the API that this property is set for. Legal values are:
PROP_EX_TARGET_PORT -- pvalue affects an entire speech port object
PROP_EX_TARGET_CHANNEL -- pvalue affects one voice channel in the speech port. The channel is specified by index.
PROP_EX_TARGET_GRAMMAR -- pvalue affects one grammar set in the speech port. The set is specified by index.
PROP_EX_TARGET_CLIENT -- pvalue is global, and affects all ports on the client.
Remarks
See Properties for a list of modifiable properties.
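As a sketch, setting an integer-valued property on the whole port might look like the following. The property name and value here are illustrative only; consult Properties for the value types each property accepts:

```c
int timeout_ms = 5000; /* illustrative value */
int rc = LV_SRE_SetPropertyEx(port,
                              PROP_EX_DECODE_TIMEOUT,  /* propertyname */
                              PROP_EX_VALUE_TYPE_INT,  /* valuetype */
                              &timeout_ms,             /* pvalue */
                              PROP_EX_TARGET_PORT,     /* target */
                              0);                      /* index */
if (rc != LV_SUCCESS)
    fprintf(stderr, "%s\n", LV_SRE_ReturnErrorString(rc));
```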
See Also
Properties
LVSpeechPort::SetPropertyEx (C++ API)
LV_SRE_StreamStart
Sets up a new stream.
int LV_SRE_StreamStart(HPORT hport);
Return Values
LV_SUCCESS
Stream set up.
LV_FAILURE
Parameters incorrectly set.
Parameters
HPort
The port's handle.
Remarks
Call this function to set up a new stream. You need to call this function after calling LV_SRE_StreamStop, LV_SRE_StreamCancel, or after end-of-speech has been detected on the previous utterance.
See Also
LV_SRE_StreamSetParameter
LV_SRE_StreamStop
LV_SRE_StreamCancel
LV_SRE_StreamSendData
Sends a buffer of sound data to the stream.
int LV_SRE_StreamSendData(HPORT hport, void* SoundData, int SoundDataLength);
Return Values
LV_SUCCESS
Data accepted
LV_FAILURE
Stream not active or NULL sound data.
Parameters
HPort
The port's handle.
SoundData
Pointer to the memory buffer containing sound data.
SoundDataLength
Length in bytes of sound data.
Remarks
Used to do the actual streaming. Call this function with each sound data buffer. This call copies the sound data to an internal buffer and returns immediately. Processing of sound data takes place on a background thread.
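A sketch of the full streaming lifecycle. read_audio is a hypothetical application function that fills a buffer and returns the number of bytes read:

```c
short buffer[1024];
int bytes;

if (LV_SRE_StreamStart(port) == LV_SUCCESS)
{
    /* feed audio to the stream chunk by chunk */
    while ((bytes = read_audio(buffer, sizeof(buffer))) > 0)
    {
        if (LV_SRE_StreamSendData(port, buffer, bytes) != LV_SUCCESS)
            break; /* stream no longer active */
    }
    /* end the stream; the data is loaded into the voice channel */
    LV_SRE_StreamStop(port);
}
```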
See Also
LV_SRE_StreamSetStateChangeCallBack
LV_SRE_StreamGetStatus
LV_SRE_StreamGetStatus
Returns the status of the stream.
int LV_SRE_StreamGetStatus(HPORT hport);
Return Values
Returns a stream status define. See Stream Status.
Parameters
HPort
The port's handle.
Remarks
Call this function to check the current state of the stream.
See Also
LV_SRE_StreamSetStateChangeCallBack
LV_SRE_StreamGetLength
Returns length of sound data in stream buffer.
int LV_SRE_StreamGetLength(HPORT hport);
Return Values
Number of bytes in internal buffer for sound stream.
Parameters
HPort
The port's handle.
Remarks
This is the total number of bytes streamed. It does not include bytes sent before barge-in is detected (if STREAM_PARM_DETECT_BARGE_IN is active). This can be useful if the application wants to stop a post-barge-in stream after a certain amount of time (for example, to limit user speech to 10 seconds).
See Also
LV_SRE_StreamSetStateChangeCallBack
LV_SRE_StreamSetStateChangeCallBack
Sets up a callback to receive state change notifications for a stream.
int LV_SRE_StreamSetStateChangeCallBack(HPORT hport, LV_SRE_StreamStateChangeFn* fn, void* UserData);
Return Values
LV_SUCCESS
LV_BAD_HPORT
Parameters
HPort
The port's handle.
fn
Pointer to the callback function that receives state change updates. See Stream Callback.
UserData
Application defined data sent back in callback.
Remarks
Each time a stream's status changes, this callback will be called.
See Also
LV_SRE_StreamStateChangeFn
LV_SRE_StreamGetStatus
LV_SRE_StreamStop
Stops the stream and loads the sound channel with the streamed data.
int LV_SRE_StreamStop(HPORT hport);
Return Values
LV_SUCCESS
LV_BAD_HPORT
LV_FAILURE Stream not active.
Parameters
HPort
The port's handle.
Remarks
This function ends streaming and puts streamed data into the voice channel defined with the STREAM_PARM_VOICE_CHANNEL parameter. If the STREAM_PARM_AUTO_DECODE parameter is active, the decode will begin (non-blocking) when this function is called.
See Also
LV_SRE_StreamSetParameter
LV_SRE_StreamCancel
Stream Parameters
LV_SRE_StreamCancel
Stops the stream; the sound data is discarded.
int LV_SRE_StreamCancel(HPORT hport);
Return Values
LV_SUCCESS
LV_BAD_HPORT
LV_FAILURE Stream not active.
Parameters
HPort
The port's handle.
Remarks
This kills the stream. It can be called to cancel a stream (particularly auto-decode streams) in order to start a new stream.
See Also
LV_SRE_StreamStop
LV_SRE_StreamSetParameter
Sets a new value for a stream property.
int LV_SRE_StreamSetParameter(HPORT hport, int StreamParameter, unsigned long StreamParameterValue);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY StreamParameter does not exist.
LV_INVALID_PROPERTY_VALUE StreamParameterValue is out of range for the stream parameter.
Parameters
HPort
The port's handle.
StreamParameter
Stream parameter to change. See Stream Parameters.
StreamParameterValue
New stream parameter value.
Remarks
Sets a stream parameter value.
See Also
LV_SRE_StreamGetParameter
LV_SRE_StreamSetParameterToDefault
Stream Parameters
LV_SRE_StreamGetParameter
Gets the current value of a stream property.
int LV_SRE_StreamGetParameter(HPORT hport, int StreamParameter, unsigned long* StreamParameterValue);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY StreamParameter does not exist.
Parameters
HPort
The port's handle.
StreamParameter
The stream parameter to retrieve. See Stream Parameters.
StreamParameterValue
Receives the current value of the stream parameter.
Remarks
Gets the current value of a stream parameter.
See Also
LV_SRE_StreamSetParameter
LV_SRE_StreamSetParameterToDefault
Stream Parameters
LV_SRE_StreamSetParameterToDefault
Sets a stream property to its default value.
int LV_SRE_StreamSetParameterToDefault(HPORT hport, int StreamParameter);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY Stream parameter does not exist.
Parameters
HPort
The port's handle.
StreamParameter
Stream parameter to reset. See Stream Parameters.
Remarks
Sets a stream parameter value back to default value.
See Also
LV_SRE_StreamGetParameter
LV_SRE_StreamSetParameter
Stream Parameters
LV_SRE_GetNumberOfNBestAlternatives
Returns the number of n-best alternatives found by the engine.
int LV_SRE_GetNumberOfNBestAlternatives(HPORT hport, int voicechannel);
Return Values
The number of n-best alternatives. It will always be less than or equal to the value set for PROP_EX_MAX_NBEST_RETURNED.
Parameters
HPort
The port's handle.
voicechannel
The channel containing the decoded audio.
See Also
PROP_EX_MAX_NBEST_RETURNED
LV_SRE_SwitchToNBestAlternative
LVSpeechPort::GetNumberOfNBestAlternatives
LV_SRE_SwitchToNBestAlternative
Switches the n-best alternative that is viewable. After this function call, subsequent result retrieval functions, such as LV_SRE_CreateInterpretation, will return results from this n-best alternative.
int LV_SRE_SwitchToNBestAlternative(HPORT hport, int voicechannel, int index);
Return Values
LV_SUCCESS
LV_FAILURE The index is not valid.
Parameters
HPort
The port's handle.
voicechannel
The channel containing the decoded audio.
index
The index of the n-best alternative to switch to. It may be any value in the range [0, LV_SRE_GetNumberOfNBestAlternatives).
Remarks
Each alternative represents a distinct sentence. However, since some sentences can have multiple interpretations or multiple parses, it is possible that for some alternatives you will have multiple parse tree or interpretation objects returned. For this reason, you should get all results out as follows:
int nbest_count;
int nbest_total = LV_SRE_GetNumberOfNBestAlternatives(port, vc);
int interp_count;
for (nbest_count = 0; nbest_count < nbest_total; ++nbest_count)
{
    LV_SRE_SwitchToNBestAlternative(port, vc, nbest_count);
    int interp_total = LV_SRE_GetNumberOfInterpretations(port, vc);
    for (interp_count = 0; interp_count < interp_total; ++interp_count)
    {
        H_SI interp = LV_SRE_CreateInterpretation(port, vc, interp_count);
        /* do something with the interp */
        LVInterpretation_Release(interp);
    }
}
Even though more than one interpretation can live in a single n-best result, the same interpretation will not live in more than one n-best result. The lower scoring interpretations are pruned out.
See Also
LV_SRE_GetNumberOfNBestAlternatives
LVSpeechPort::SwitchToNBestAlternative
LV_SRE_WaitForDecode
Blocks the client application until the decode is finished.
int LV_SRE_WaitForDecode(HPORT hport, int VoiceChannel);
Return Values
LV_SUCCESS
No errors or timeout; the decode interaction is finished.
LV_TIME_OUT
The timeout value associated with PROP_EX_DECODE_TIMEOUT was exceeded before a result was returned from the Speech Engine. The decode was dropped from the Engine, and the LVSpeechPort may now start a new decode request.
Parameters
VoiceChannel
Which voice channel to wait on. Setting VoiceChannel equal to -1 causes a wait on all the voice channels for the port.
Remarks
Some of the API functions run asynchronously, in particular LV_SRE_Decode. LV_SRE_WaitForDecode is primarily useful when LV_SRE_Decode is called without LV_DECODE_BLOCK. In this case, LV_SRE_Decode returns immediately, but continues processing the voice channel's audio data in a separate thread. Since client applications will eventually need the results, the clients need a way to query the port to see if LV_SRE_Decode has finished. LV_SRE_WaitForDecode will wait the specified time (determined by the value of PROP_EX_DECODE_TIMEOUT) for the engine to idle; check the return value to ensure the decode interaction is finished before attempting to retrieve answers from the speech port.
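The pattern above can be sketched as follows; the LV_SRE_Decode arguments are elided because they are documented on its own reference page:

```c
/* start a non-blocking decode (no LV_DECODE_BLOCK flag) */
LV_SRE_Decode(port, vc, /* ... grammar set and flags ... */);

/* block until the decode finishes or PROP_EX_DECODE_TIMEOUT elapses */
if (LV_SRE_WaitForDecode(port, vc) == LV_SUCCESS)
{
    /* results are ready, e.g. LV_SRE_GetNumberOfInterpretations */
}
else
{
    /* LV_TIME_OUT: the decode was dropped; a new decode may start */
}
```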
See Also
PROP_EX_DECODE_TIMEOUT
LV_SRE_Decode
LVSpeechPort::WaitForDecode
LVInterpretation C API Functions
LVInterpretation Summary
The LVInterpretation object contains a fully processed decode result. It includes
The raw input the Speech Engine recognized
The name of the grammar that was matched
A confidence score for the interpretation
The semantic data object -- the result of processing the input sentence against the matching grammar, and executing the semantic tags in the sentence's parse tree
Include <LVSpeechPort.h> or <LV_SRE_Semantic.h>.
Return Type Function Description
H_SI LVInterpretation_Create (void) Creates an empty LVInterpretation handle.
H_SI LVInterpretation_CreateFromCopy (H_SI other)
Create a copy of another LVInterpretation handle
void LVInterpretation_Release(H_SI hsi) Destroys the LVInterpretation handle
H_SI_DATA LVInterpretation_GetResultData (H_SI hsi)
The result object, representing the end product of the semantic interpretation process.
const char* LVInterpretation_GetResultName (H_SI hsi)
The name of the result data, according to the matching grammar.
const char* LVInterpretation_GetGrammarLabel (H_SI hsi)
Returns the name of the grammar as it was provided to the speech port.
const char* LVInterpretation_GetMode (H_SI hsi) Returns the interaction mode for this interpretation.
const char* LVInterpretation_GetLanguage (H_SI hsi)
Returns the language identifier for this interpretation.
const char* LVInterpretation_GetInputSentence (H_SI hsi)
The sentence that generated this interpretation.
int LVInterpretation_GetScore (H_SI hsi) Confidence score for this interpretation.
const char* LVInterpretation_GetTagFormat (H_SI hsi)
The tag format (interpretation scheme) that created the semantic data object.
LVSemanticData Summary
An LVSemanticData object is the result of the semantic interpretation process. A user's spoken input is combined with a grammar containing semantic tag instructions to create a compound object. An LVSemanticData object can be one of the following types:
SI_TYPE_INT -- A simple integer value
SI_TYPE_DOUBLE -- A double precision floating point value
SI_TYPE_BOOL -- An integer that is either 1 or 0
SI_TYPE_STRING -- A null-terminated character array.
SI_TYPE_OBJECT -- A structure containing one or more property-value pairs.
SI_TYPE_ARRAY -- An indexed collection of values.
SI_TYPE_NULL -- A null object.
Return Type Function Description
H_SI_DATA LVSemanticData_CreateFromCopy(H_SI_DATA other) Creates a new object from an old one. The new one will need to be released when no longer in use.
const char*
LVSemanticData_Print(H_SI_DATA data, int format) Prints the data in XML or ECMAScript formats.
int LVSemanticData_GetType(H_SI_DATA data) Returns the type of the data.
const char*
LVSemanticData_GetString(H_SI_DATA data) If the data is of type SI_TYPE_STRING, returns the string contents.
int LVSemanticData_GetInt(H_SI_DATA data) If the data is of type SI_TYPE_INT, returns the integer.
double LVSemanticData_GetDouble(H_SI_DATA data) If the data is of type SI_TYPE_DOUBLE, returns the double.
int LVSemanticData_GetBool(H_SI_DATA data) If the data is of type SI_TYPE_BOOL, returns a 1 for true, 0 for false
int LVSemanticObject_GetNumberOfProperties(H_SI_DATA data)
If the data is of type SI_TYPE_OBJECT, returns the number of properties (member data) it contains.
const char*
LVSemanticObject_GetPropertyName(H_SI_DATA data, int i)
If the data is of type SI_TYPE_OBJECT, returns the name of the ith property
int LVSemanticObject_PropertyExists(H_SI_DATA data, const char* prop_name)
If the data is of type SI_TYPE_OBJECT, returns 1 if the object contains a value named prop_name, 0 otherwise.
H_SI_DATA LVSemanticObject_GetPropertyValue(H_SI_DATA data, const char* prop_name)
If the data is of type SI_TYPE_OBJECT, returns the member data named prop_name.
int LVSemanticArray_GetSize(H_SI_DATA data) If the data is of type SI_TYPE_ARRAY, returns the number of elements in the array.
H_SI_DATA LVSemanticArray_GetElement(H_SI_DATA data, int i)
If the data is of type SI_TYPE_ARRAY, returns the ith element in the array.
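The accessors above can be combined to walk an arbitrary semantic data value. This sketch descends into objects and arrays and prints scalar leaves; it uses only the functions listed in the table:

```c
void PrintSemanticData(H_SI_DATA data)
{
    int i;
    switch (LVSemanticData_GetType(data))
    {
    case SI_TYPE_INT:
        printf("%d\n", LVSemanticData_GetInt(data));
        break;
    case SI_TYPE_DOUBLE:
        printf("%f\n", LVSemanticData_GetDouble(data));
        break;
    case SI_TYPE_BOOL:
        printf("%s\n", LVSemanticData_GetBool(data) ? "true" : "false");
        break;
    case SI_TYPE_STRING:
        printf("%s\n", LVSemanticData_GetString(data));
        break;
    case SI_TYPE_OBJECT:
        for (i = 0; i < LVSemanticObject_GetNumberOfProperties(data); ++i)
        {
            const char* name = LVSemanticObject_GetPropertyName(data, i);
            printf("%s: ", name);
            PrintSemanticData(LVSemanticObject_GetPropertyValue(data, name));
        }
        break;
    case SI_TYPE_ARRAY:
        for (i = 0; i < LVSemanticArray_GetSize(data); ++i)
            PrintSemanticData(LVSemanticArray_GetElement(data, i));
        break;
    default: /* SI_TYPE_NULL */
        printf("null\n");
        break;
    }
}
```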
API Functions
LVInterpretation: Creating, Copying and Releasing
LVInterpretation objects are fully copyable.
Functions
H_SI LVInterpretation_Create(void)
H_SI LVInterpretation_CreateFromCopy(H_SI other_si)
void LVInterpretation_Copy(H_SI hsi, H_SI other_si)
void LVInterpretation_Release(H_SI hsi)
Parameters
hsi
The interpretation handle being copied into, or being released
other_si
The interpretation handle whose contents are being copied.
Remarks
Any new handle given to you via Create or CreateFromCopy must be released. Also, any handle given to you by the speech port through LV_SRE_CreateInterpretation must be released.
Example
HPORT Port;
H_SI Interp;
/* open the port and do a decode */
/* ... */
/* when the decode is finished, grab an interpretation handle */
Interp = LV_SRE_CreateInterpretation(Port, voicechannel, index);
/* start using the interpretation data */
/* ... */
/* when you are done with it, release it */
LVInterpretation_Release(Interp);
See Also
Constructing, Copying and Destroying an LVInterpretation Object (C++ API)
LVInterpretation_GetResultData
Returns a handle for the semantic data generated by user input and a matching grammar. The returned handle does not allocate any additional memory, so do not release it.
Function
H_SI_DATA LVInterpretation_GetResultData(H_SI hsi)
Returns
A handle to the results of a semantic interpretation process.
Parameters
hsi
An interpretation handle.
Remarks
The semantic data handle provided to the user via this function is owned by the interpretation handle hsi. It will be released when hsi is released.
See Also
LVSemanticData C API
LVInterpretation::ResultData (C++ API)
LVInterpretation_GetResultName
Returns the name of the result data for this interpretation. The result name is usually the root rule of the matching grammar for this interpretation.
Function
const char* LVInterpretation_GetResultName (H_SI hsi)
Parameters
hsi
An interpretation handle.
See Also
LVInterpretation::ResultName (C++ API)
LVInterpretation_GetLanguage
Returns the language identifier of the grammar that generated this interpretation.
Function
const char* LVInterpretation_GetLanguage(H_SI hsi)
Parameters
hsi
An interpretation handle.
Returns
An RFC 3066 language identifier, such as "en-US" for United States English, or "fr" for French.
See Also
LVInterpretation::Language ( C++ API )
LVInterpretation_GetMode
Returns the interaction mode that created the interpretation.
Function
const char* LVInterpretation_GetMode(H_SI hsi)
Parameters
hsi
An interpretation handle.
Returns
"voice" or "dtmf"
See Also
LVInterpretation::Mode (C++ API)
LVInterpretation_GetInputSentence
Returns the input that was fed to the matching grammar to create this interpretation. It may represent the speech the engine recognized, or a DTMF sequence.
Function
const char* LVInterpretation_GetInputSentence(H_SI hsi)
Parameters
hsi
An interpretation handle
See Also
LVInterpretation::InputSentence (C++ API)
LVInterpretation_GetGrammarLabel
Returns the name of the grammar that generated this interpretation.
Function
const char* LVInterpretation_GetGrammarLabel (H_SI hsi)
Parameters
hsi
An interpretation handle.
Remarks
LVInterpretation_GetGrammarLabel will always return the name of one of the grammars you activated for decode. If the active grammar had an integer label, then the returned label will be a string representation of that integer.
See Also
LVInterpretation::GrammarLabel ( C++ API )
LVInterpretation_GetScore
Returns a confidence score for this interpretation.
Function
int LVInterpretation_GetScore(H_SI hsi)
Parameters
hsi
An interpretation handle
Returns
A number between 0 and 1000. Higher numbers indicate greater confidence by the speech port in this interpretation.
See Also
LVInterpretation::Score (C++ API)
LVInterpretation_GetTagFormat
Returns the name of the tag format declared in the matching grammar for this interpretation. The tag format determines the semantic interpretation scheme.
Function
const char* LVInterpretation_GetTagFormat(H_SI hsi)
Parameters
hsi
An interpretation handle.
See Also
LVInterpretation::TagFormat (C++ API)
LVSemanticData_Release
Releases the memory used by an H_SI_DATA handle.
Function
void LVSemanticData_Release(H_SI_DATA h_si_data)
Parameters
h_si_data
Semantic Data Handle.
LVSemanticData_CreateFromCopy
Copies the contents of another handle into a new handle and returns the new handle. This function allocates memory for the new handle, so the user is required to release the new handle.
H_SI_DATA LVSemanticData_CreateFromCopy(H_SI_DATA h_si_data)
Return Value
Non-zero
Successful.
NULL
Copying failed.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_Print
Returns a string describing the contents of a semantic data handle. The function can return XML or ECMAScript formatted text.
const char* LVSemanticData_Print(H_SI_DATA h_si_data, int format)
Return Values
A pointer to the string containing the printed representation.
Parameters
h_si_data
Semantic data handle.
format
The format type.
Remark
The string contents are stored with the semantic data handle, and will be released when the handle is released.
LVSemanticData_GetType
Returns the underlying data type of a given H_SI_DATA handle.
Function
int LVSemanticData_GetType(H_SI_DATA h_si_data)
Return Value
One of seven semantic data types.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_GetString
Returns the string contained in a given handle. This function assumes that the handle is of type SI_TYPE_STRING. If the user passes in a handle of any other type, this function always returns NULL.
Function
const char* LVSemanticData_GetString(H_SI_DATA h_si_data)
Return Values
NULL
Either the handle is not of type SI_TYPE_STRING, or some error occurred.
Other
A pointer to a buffer containing the string.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_GetDouble
Returns the double-precision floating point value contained in the given semantic data handle. This function assumes that the handle is of type SI_TYPE_DOUBLE. If the user passes in a handle of any other type, this function always returns 0.0.
Function
double LVSemanticData_GetDouble(H_SI_DATA h_si_data)
Return Values
A double.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_GetInt
Returns the integer value contained in a given semantic data handle. This function assumes that the handle is of type SI_TYPE_INT. If the user passes in a handle of any other type, this function always returns 0.
Function
int LVSemanticData_GetInt(H_SI_DATA h_si_data)
Return Values
An integer value.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_GetBool
Returns an integer value contained in a given handle. A non-zero value represents true, and zero represents false. This function assumes that the semantic data handle being passed in is of type SI_TYPE_BOOL. If the user passes in a handle of any other type, this function always returns false.
Function
int LVSemanticData_GetBool(H_SI_DATA h_si_data)
Return Values
An integer value.
Parameters
h_si_data
Semantic data handle.
LVSemanticObject_GetNumberOfProperties
If a semantic data handle is of type SI_TYPE_OBJECT, this function returns the number of elements in the object. Otherwise, it returns -1.
Function
int LVSemanticObject_GetNumberOfProperties(H_SI_DATA h_si_data)
Return Value
The number of elements in the object.
Parameters
h_si_data
Semantic data handle.
LVSemanticObject_GetPropertyName
If a handle is of type SI_TYPE_OBJECT, this function returns the name of a property of the object. Otherwise this function returns NULL. Usually, the user obtains the number of properties by calling LVSemanticObject_GetNumberOfProperties, then gets each property name in sequence.
Function
const char* LVSemanticObject_GetPropertyName(H_SI_DATA h_si_data, int index)
Return Values
Non-NULL pointer
A pointer to a buffer containing the name of the property specified by index.
NULL
Either the handle is not of SI_TYPE_OBJECT type, or the index exceeds the total number of properties in this object.
Parameters
h_si_data
Semantic data handle.
index
The index of the property you are inspecting. Indexing begins at 0. If the index is greater than or equal to the value returned by LVSemanticObject_GetNumberOfProperties, this function will return NULL.
LVSemanticObject_GetPropertyValue
If the handle is of SI_TYPE_OBJECT type, this function returns the handle to the semantic data associated with the property name in the object. If the handle is not of SI_TYPE_OBJECT type, this function always returns NULL. This function does not allocate memory for the returned handle, so do not try to release it.
Function
H_SI_DATA LVSemanticObject_GetPropertyValue(H_SI_DATA h_si_data, const char *property_name)
Return Values
Non-zero value
A handle to the semantic data associated with the property name in the object.
NULL
The property name does not exist in the object, or the handle is not of SI_TYPE_OBJECT type.
Parameters
h_si_data
Semantic data handle.
property_name
A string containing the property name.
LVSemanticObject_PropertyExists
If a handle is of SI_TYPE_OBJECT type, this function returns a boolean value indicating whether the property name exists in the object. If the handle is not of SI_TYPE_OBJECT type, this function always returns false.
Function
int LVSemanticObject_PropertyExists(H_SI_DATA h_si_data, const char *property_name)
Return Values
1
The property name exists in the object.
0
The property name does not exist in the object, or the handle is not of SI_TYPE_OBJECT type.
Parameters
h_si_data
A semantic data handle.
property_name
A string containing the property name.
LVSemanticArray_GetSize
If a handle is of SI_TYPE_ARRAY type, this function returns the number of elements in the array. Otherwise this function returns -1.
Function
int LVSemanticArray_GetSize(H_SI_DATA h_si_data)
Return Values
Non-negative value
The number of elements in the array.
-1
Either the handle is not of SI_TYPE_ARRAY type, or some error occurred.
Parameters
h_si_data
Semantic data handle.
LVSemanticArray_GetElement
If the handle is of SI_TYPE_ARRAY type, this function returns a handle to the semantic data specified by the index. If the handle is not of SI_TYPE_ARRAY type, this function always returns NULL. This function does not allocate memory for the new handle, so do not try to release it.
Function
H_SI_DATA LVSemanticArray_GetElement(H_SI_DATA h_si_data, int index)
Return Values
Non-zero value
The handle to the semantic data specified by the index in the array.
0
The index exceeds the number of elements, or the handle is not of SI_TYPE_ARRAY type.
Parameters
h_si_data
Semantic data handle.
index
The index begins at 0. If the index is greater than or equal to the value returned by LVSemanticArray_GetSize, this function will return NULL.
LVParseTree C API functions
API Functions
Creating, Copying and Releasing a LVParseTree Handle
LVParseTree objects are fully copyable and assignable.
Functions
H_PARSE_TREE LVParseTree_Create()
H_PARSE_TREE LVParseTree_CreateFromCopy(H_PARSE_TREE Other)
void LVParseTree_Copy (H_PARSE_TREE Tree, H_PARSE_TREE Other)
void LVParseTree_Release (H_PARSE_TREE Tree)
Parameters
Tree
A handle to a parse tree being released or copied into.
Other
A handle to a parse tree being copied.
Remarks
CreateFromCopy and Copy both perform deep copies on the handles in question. Both handles will have to be released after either function call to release all allocated memory. Tree handles given to the user via LV_SRE_CreateParseTree must also be released.
Example
HPORT Port;
// open the port and do a decode
// ...
// when the decode is finished, grab a parse tree handle
H_PARSE_TREE Tree = LV_SRE_CreateParseTree(Port, voicechannel, index);
// start using the tree
// ...
// When you are done with it, release it.
LVParseTree_Release(Tree);
See Also
Constructing, Copying and Destroying an LVParseTree Object (C++ API)
LVParseTree_GetGrammarLabel
Returns the name of the grammar that generated this tree.
Function
const char* LVParseTree_GetGrammarLabel (H_PARSE_TREE Tree)
Parameters
Tree
A handle to the parse tree.
Remarks
LVParseTree_GetGrammarLabel( ) will always return the name of one of the grammars you activated for decode. If the active grammar had an integer label, then the returned label will be a string representation of that integer.
See Also
LVParseTree::GrammarLabel ( C++ API )
LVParseTree_GetLanguage
Returns the language identifier of the grammar that generated this tree.
Function
const char* LVParseTree_GetLanguage(H_PARSE_TREE Tree)
Parameters
Tree
A handle to a parse tree.
Returns
An RFC 3066 language identifier, such as "en-US" for United States English, or "fr" for French.
See Also
LVParseTree::Language ( C++ API )
LVParseTree_GetMode
Returns the interaction mode that created the tree.
Function
const char* LVParseTree_GetMode(H_PARSE_TREE Tree)
Parameters
Tree
A handle to a parse tree.
Returns
"voice" or "dtmf"
See Also
LVParseTree::Mode (C++ API)
LVParseTree_GetTagFormat
Returns the name of the tag format declared in the matching grammar for this tree.
Function
const char* LVParseTree_GetTagFormat(H_PARSE_TREE Tree)
Parameters
Tree
A handle to a parse tree
See Also
LVParseTree::TagFormat (C++ API)
LVParseTree_GetRoot
Gets the root parse tree node.
Function
H_PARSE_TREE_NODE LVParseTree_GetRoot(H_PARSE_TREE Tree);
Parameters
Tree
Handle to a parse tree.
Return Values
An H_PARSE_TREE_NODE handle representing the top-level rule of the matching grammar.
Remarks
This node will always be a rule node (i.e., it will always satisfy LVParseTree_Node_IsRule(root) == 1). If the matching grammar specified a root rule, then this node will always represent that rule.
See Also
LVParseTree::Root ( C++ API )
LVParseTree_CreateIteratorBegin and LVParseTree_CreateIteratorEnd
LVParseTree_CreateIteratorBegin and LVParseTree_CreateIteratorEnd provide iterators for visiting every node in the tree in a top-to-bottom, left-to-right descent. They are also the basis for the Tag and Terminal iterators.
Functions
H_PARSE_TREE_ITR LVParseTree_CreateIteratorBegin(H_PARSE_TREE Tree)
H_PARSE_TREE_ITR LVParseTree_CreateIteratorEnd(H_PARSE_TREE Tree)
Parameters
Tree
Handle to a parse tree.
Example
The following code prints out every node in a parse tree.
H_PARSE_TREE_ITR Itr;
H_PARSE_TREE_ITR End;
H_PARSE_TREE_NODE Node;

Itr = LVParseTree_CreateIteratorBegin(Tree);
End = LVParseTree_CreateIteratorEnd(Tree);

while (!LVParseTree_Iterator_AreEqual(Itr, End))
{
    Node = LVParseTree_Iterator_GetNode(Itr);
    for (int i = 0; i < LVParseTree_Node_GetLevel(Node); ++i)
        printf("\t");
    if (LVParseTree_Node_IsRule(Node))
        printf("$%s:\n", LVParseTree_Node_GetRuleName(Node));
    if (LVParseTree_Node_IsTag(Node))
        printf("{%s}\n", LVParseTree_Node_GetText(Node));
    if (LVParseTree_Node_IsTerminal(Node))
        printf("\"%s\"\n", LVParseTree_Node_GetText(Node));
    LVParseTree_Iterator_Advance(Itr);
}
LVParseTree_Iterator_Release(Itr);
LVParseTree_Iterator_Release(End);

/* Note: Node handles don't get released; they are part of the tree,
   and the tree releases them when it gets released. */
If the grammar was the top level navigation example grammar, and the engine recognized "go back", then the above code would print out:
$directive:
	"go"
	"back"
	{$ = "APPLICATION_BACK"}
See Also
LVParseTree::Begin and LVParseTree::End (C++ API)
LVParseTree_CreateTerminalIteratorBegin and LVParseTree_CreateTerminalIteratorEnd
LVParseTree_CreateTerminalIteratorBegin and LVParseTree_CreateTerminalIteratorEnd provide access to the "terminals" of the tree. Terminals are the words and phrases in your grammar, so a TerminalIterator gives you access to the exact words the SRE heard a speaker say to match a grammar, in the order that the SRE heard those words.
Functions
H_PARSE_TREE_TERMINAL_ITR LVParseTree_CreateTerminalIteratorBegin(H_PARSE_TREE Tree)
H_PARSE_TREE_TERMINAL_ITR LVParseTree_CreateTerminalIteratorEnd(H_PARSE_TREE Tree)
Parameters
Tree
Handle to a parse tree.
Example
The following code prints out the sentence the SRE heard, with a word-level confidence score attached to each word.
H_PARSE_TREE_TERMINAL_ITR Itr;
H_PARSE_TREE_TERMINAL_ITR End;
H_PARSE_TREE_NODE Node;

Itr = LVParseTree_CreateTerminalIteratorBegin(Tree);
End = LVParseTree_CreateTerminalIteratorEnd(Tree);

while (!LVParseTree_TerminalIterator_AreEqual(Itr, End))
{
    Node = LVParseTree_TerminalIterator_GetNode(Itr);
    printf("\"%s\":(%i)\n", LVParseTree_Node_GetText(Node),
           LVParseTree_Node_GetScore(Node));
    LVParseTree_TerminalIterator_Advance(Itr);
}
printf("\n");
LVParseTree_TerminalIterator_Release(Itr);
LVParseTree_TerminalIterator_Release(End);

/* Note: Node handles don't get released; they are part of the tree,
   and the tree releases them when it gets released. */
So if the grammar being used was the top level navigation example grammar, and the SRE recognized "go back", then the output of the above code might look like:
"go":(850) "back":(901)
See Also
LVParseTree::TerminalsBegin and LVParseTree::TerminalsEnd (C++ API)
LVParseTree_CreateTagIteratorBegin and LVParseTree_CreateTagIteratorEnd
LVParseTree_CreateTagIteratorBegin and LVParseTree_CreateTagIteratorEnd provide iterators for visiting the tags in the tree's body.
Functions
H_PARSE_TREE_TAG_ITR LVParseTree_CreateTagIteratorBegin(H_PARSE_TREE Tree)
H_PARSE_TREE_TAG_ITR LVParseTree_CreateTagIteratorEnd(H_PARSE_TREE Tree)
Parameters
Tree
Handle to a parse tree.
Example
The following code prints out every tag in a parse tree.
H_PARSE_TREE_TAG_ITR Itr;
H_PARSE_TREE_TAG_ITR End;
H_PARSE_TREE_NODE Node;

Itr = LVParseTree_CreateTagIteratorBegin(Tree);
End = LVParseTree_CreateTagIteratorEnd(Tree);

while (!LVParseTree_TagIterator_AreEqual(Itr, End))
{
    Node = LVParseTree_TagIterator_GetNode(Itr);
    printf("%s;\n", LVParseTree_Node_GetText(Node));
    LVParseTree_TagIterator_Advance(Itr);
}

LVParseTree_TagIterator_Release(Itr);
LVParseTree_TagIterator_Release(End);

/* Note: Node handles don't get released; they are part of the tree,
   and the tree releases them when it gets released. */
If the grammar was the top level navigation example grammar, and the engine recognized "go back", then the above code would print out:
$ = "APPLICATION_BACK";
Remark
The TagIterator does not visit the tags in a tree's header. Use LVParseTree_GetHeaderTag to access the contents of those tags.
See Also
LVParseTree::TagsBegin and LVParseTree::TagsEnd (C++ API)
Related APIs
LVParseTree_Node API
An LVParseTree is made out of Node objects. Each node represents a word, rule, or tag that was seen by the engine as it decoded an utterance against the matching grammar.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function
H_PARSE_TREE_NODE LVParseTree_Node_GetParent (H_PARSE_TREE_NODE Node)
H_PARSE_TREE_CHILDREN_ITR LVParseTree_Node_CreateChildrenIteratorBegin(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_CHILDREN_ITR LVParseTree_Node_CreateChildrenIteratorEnd(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_ITR LVParseTree_Node_CreateIteratorBegin(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_ITR LVParseTree_Node_CreateIteratorEnd(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_TERMINAL_ITR LVParseTree_Node_CreateTerminalIteratorBegin(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_TERMINAL_ITR LVParseTree_Node_CreateTerminalIteratorEnd(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_TAG_ITR LVParseTree_Node_CreateTagIteratorBegin(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_TAG_ITR LVParseTree_Node_CreateTagIteratorEnd(H_PARSE_TREE_NODE Node)
int LVParseTree_Node_IsRule (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_IsTerminal (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_IsTag (H_PARSE_TREE_NODE Node)
const char* LVParseTree_Node_GetText (H_PARSE_TREE_NODE Node)
const char* LVParseTree_Node_GetPhonemes (H_PARSE_TREE_NODE Node)
const char* LVParseTree_Node_GetRuleName (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_GetScore (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_GetStartTime (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_GetEndTime (H_PARSE_TREE_NODE Node)
LVParseTree_Iterator C API
An LVParseTree_Iterator object traverses a parse tree in a top-to-bottom, left-to-right fashion (sometimes called a pre-order or LL traversal). You can get an iterator over a subtree rooted at a Node by calling:
LVParseTree_Node_CreateIteratorBegin(H_PARSE_TREE_NODE Node)
LVParseTree_Node_CreateIteratorEnd(H_PARSE_TREE_NODE Node)
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
H_PARSE_TREE_ITR LVParseTree_Iterator_Create(void)
Creates a blank Iterator; it does not point at anything.
H_PARSE_TREE_ITR LVParseTree_Iterator_CreateFromCopy(H_PARSE_TREE_ITR Other)
Creates a new Iterator from another. Both Iterators will need to be released when no longer needed.
void LVParseTree_Iterator_Copy(H_PARSE_TREE_ITR Iterator, H_PARSE_TREE_ITR Other)
Copies the data from one handle into another.
void LVParseTree_Iterator_Release(H_PARSE_TREE_ITR Iterator)
Releases the memory allocated to the Iterator handle.
void LVParseTree_Iterator_Advance(H_PARSE_TREE_ITR Iterator) Advances the Iterator one position.
H_PARSE_TREE_NODE LVParseTree_Iterator_GetNode(H_PARSE_TREE_ITR Iterator)
Provides access to a node in the parse tree.
int
LVParseTree_Iterator_AreEqual(H_PARSE_TREE_ITR Iterator1, H_PARSE_TREE_ITR Iterator2)
Tests equality with another Iterator. Two Iterators are equal if they are pointing to the same node in a parse tree.
LVParseTree_ChildrenIterator C API
An LVParseTree_ChildrenIterator object traverses the immediate children of a rule node, from left to right. You get a ChildrenIterator object from a Node by calling
LVParseTree_Node_CreateChildrenIteratorBegin(H_PARSE_TREE_NODE Node)
LVParseTree_Node_CreateChildrenIteratorEnd(H_PARSE_TREE_NODE Node)
With these iterators, you can traverse the immediate children of Node.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function
H_PARSE_TREE_CHILDREN_ITR LVParseTree_ChildrenIterator_Create(void)
H_PARSE_TREE_CHILDREN_ITR LVParseTree_ChildrenIterator_CreateFromCopy (H_PARSE_TREE_CHILDREN_ITR Other)
void LVParseTree_ChildrenIterator_Copy(H_PARSE_TREE_CHILDREN_ITR Itr, H_PARSE_TREE_CHILDREN_ITR Other)
void LVParseTree_ChildrenIterator_Release(H_PARSE_TREE_CHILDREN_ITR Itr)
void LVParseTree_ChildrenIterator_Advance(H_PARSE_TREE_CHILDREN_ITR Itr)
H_PARSE_TREE_NODE LVParseTree_ChildrenIterator_GetNode(H_PARSE_TREE_CHILDREN_ITR Itr)
int
LVParseTree_ChildrenIterator_AreEqual(H_PARSE_TREE_CHILDREN_ITR Itr1, H_PARSE_TREE_CHILDREN_ITR Itr2)
LVParseTree_TerminalIterator C API
An LVParseTree_TerminalIterator object is an adaptation of the standard LVParseTree_Iterator. It only visits the nodes in a tree that are terminals. You can get a TerminalIterator from a Node by calling:
LVParseTree_Node_CreateTerminalIteratorBegin(H_PARSE_TREE_NODE Node)
LVParseTree_Node_CreateTerminalIteratorEnd(H_PARSE_TREE_NODE Node)
With these iterators, you can visit all of the terminal nodes in the subtree rooted by Node.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function
H_PARSE_TREE_TERMINAL_ITR LVParseTree_TerminalIterator_Create(void)
H_PARSE_TREE_TERMINAL_ITR LVParseTree_TerminalIterator_CreateFromCopy (H_PARSE_TREE_TERMINAL_ITR Other)
void LVParseTree_TerminalIterator_Copy(H_PARSE_TREE_TERMINAL_ITR Itr, H_PARSE_TREE_TERMINAL_ITR Other)
void LVParseTree_TerminalIterator_Release(H_PARSE_TREE_TERMINAL_ITR Itr)
void LVParseTree_TerminalIterator_Advance(H_PARSE_TREE_TERMINAL_ITR Itr)
H_PARSE_TREE_NODE LVParseTree_TerminalIterator_GetNode(H_PARSE_TREE_TERMINAL_ITR Itr)
int
LVParseTree_TerminalIterator_AreEqual(H_PARSE_TREE_TERMINAL_ITR Itr1, H_PARSE_TREE_TERMINAL_ITR Itr2)
LVParseTree_TagIterator C API
An LVParseTree_TagIterator object is an adaptation of the standard LVParseTree_Iterator. It only visits the nodes in a tree that are tags. You can get a tag iterator from a Node by calling:
LVParseTree_Node_CreateTagIteratorBegin(H_PARSE_TREE_NODE Node)
LVParseTree_Node_CreateTagIteratorEnd(H_PARSE_TREE_NODE Node)
With these iterators, you can traverse all of the tags in the subtree rooted by Node.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
H_PARSE_TREE_TAG_ITR LVParseTree_TagIterator_Create(void)
Creates a blank iterator; it does not point at anything.
H_PARSE_TREE_TAG_ITR LVParseTree_TagIterator_CreateFromCopy (H_PARSE_TREE_TAG_ITR Other)
Creates a new iterator from another. Both iterators will need to be released when no longer needed.
void LVParseTree_TagIterator_Copy(H_PARSE_TREE_TAG_ITR Itr, H_PARSE_TREE_TAG_ITR Other)
Copies the data from one handle into another.
void LVParseTree_TagIterator_Release(H_PARSE_TREE_TAG_ITR Itr) Releases the memory allocated to the iterator handle.
void LVParseTree_TagIterator_Advance(H_PARSE_TREE_TAG_ITR Itr) Advances the iterator one position.
H_PARSE_TREE_NODE LVParseTree_TagIterator_GetNode(H_PARSE_TREE_TAG_ITR Itr)
Provides access to a node in the parse tree.
int
LVParseTree_TagIterator_AreEqual(H_PARSE_TREE_TAG_ITR Itr1, H_PARSE_TREE_TAG_ITR Itr2)
Tests equality with another iterator. Two iterators are equal if they are pointing to the same node in a parse tree.
LVParseTree Class
The following C API is exported from "LV_SRE_ParseTree.h". An LVParseTree class that wraps this API is available for C++ programmers.
See Also Using the Parse Tree Tutorial
Return Type Function Description
H_PARSE_TREE LVParseTree_Create(void) Constructs an LVParseTree object.
H_PARSE_TREE LVParseTree_CreateFromCopy(H_PARSE_TREE Other)
Copy constructor
void LVParseTree_Copy(H_PARSE_TREE Tree, H_PARSE_TREE Other)
Assignment operator
void LVParseTree_Release (H_PARSE_TREE Tree) Destroys the LVParseTree object
H_PARSE_TREE_NODE LVParseTree_GetRoot (H_PARSE_TREE Tree)
Provides access to the root node of the parse tree.
H_PARSE_TREE_ITR LVParseTree_CreateIteratorBegin (H_PARSE_TREE Tree)
Provides an iterator that walks each node in the tree in a top-to-bottom, left-to-right fashion
H_PARSE_TREE_ITR LVParseTree_CreateIteratorEnd (H_PARSE_TREE Tree)
Marks the end of traversal for the parse tree iterator
H_PARSE_TREE_TERMINAL_ITR LVParseTree_CreateTerminalIteratorBegin (H_PARSE_TREE Tree)
Traverses the terminals of the parse tree (words).
H_PARSE_TREE_TERMINAL_ITR LVParseTree_CreateTerminalIteratorEnd (H_PARSE_TREE Tree)
Marks the end of traversal for the TerminalIterator.
H_PARSE_TREE_TAG_ITR LVParseTree_CreateTagIteratorBegin (H_PARSE_TREE Tree)
Traverses the tags in the parse tree (semantic data).
H_PARSE_TREE_TAG_ITR LVParseTree_CreateTagIteratorEnd (H_PARSE_TREE Tree)
Marks the end of traversal for the TagIterator
const char* LVParseTree_GetTagFormat (H_PARSE_TREE Tree)
Returns the tag format, as described by the grammar that this tree matched (e.g. "lumenvox/1.0" or "semantics/1.0")
int LVParseTree_GetNumberOfTagsInHeader (H_PARSE_TREE Tree)
Returns the number of tags (semantic data) that were defined in the matching grammar's header.
const char* LVParseTree_GetHeaderTag (H_PARSE_TREE Tree, int i)
Returns the ith header tag from the matching grammar.
const char* LVParseTree_GetGrammarLabel (H_PARSE_TREE Tree)
Returns the name of the matching grammar that was provided to the speech port when it was loaded.
const char* LVParseTree_GetMode (H_PARSE_TREE Tree)
Returns the mode of the utterance decode that created this tree: "voice" or "dtmf"
const char* LVParseTree_GetLanguage (H_PARSE_TREE Tree )
Returns the language of the matching grammar (e.g. "en-US" or "es-MX")
LVGrammar C API Functions
LVGrammar Summary
The LVGrammar API allows you to manipulate a context-free grammar object that can be used in the engine to recognize speech.
Use <LVSpeechPort.h> or <LV_SRE_Grammar.h>
Return Type Function Description
HGRAMMAR LVGrammar_Create() Constructs a grammar object.
HGRAMMAR LVGrammar_CreateFromCopy (HGRAMMAR other)
Constructs a grammar object by copying an existing one.
void LVGrammar_Copy (HGRAMMAR hgram, HGRAMMAR other)
Copies the object referenced by other into the object referenced by hgram.
void LVGrammar_Release (HGRAMMAR hgram)
Destroys the grammar object.
int LVGrammar_Reset (HGRAMMAR hgram) Resets a grammar object.
void LVGrammar_RegisterLoggingCallback (HGRAMMAR hgram, GrammarLogCB Log, void* UserData)
Registers a callback so the object can report warnings and errors to the grammar author.
int LVGrammar_SaveCompiledGrammar (HGRAMMAR hgram, const char* filename)
Saves the grammar object to a binary file.
int LVGrammar_LoadCompiledGrammar (HGRAMMAR hgram, const char* filename)
Loads the grammar object from a binary file.
int LVGrammar_LoadGrammar (HGRAMMAR hgram, const char* uri)
Loads a grammar from a location specified by the "uri" argument.
int LVGrammar_LoadGrammarFromBuffer (HGRAMMAR hgram, const char* buffer)
Loads a grammar from a null terminated string containing the contents of the grammar.
int LVGrammar_AddRule (HGRAMMAR hgram, const char* left_hand_side, const char* right_hand_side)
Inserts a new rule into the grammar.
int LVGrammar_RemoveRule (HGRAMMAR hgram, const char* left_hand_side)
Removes a rule from the grammar.
int LVGrammar_SetRoot (HGRAMMAR hgram, const char* root)
Sets a starting rule for the grammar.
void LVGrammar_SetMode (HGRAMMAR hgram, const char* mode)
Declares the mode of the grammar (the style of decode to be processed). Legal arguments are "voice" or "dtmf".
const char* LVGrammar_GetMode (HGRAMMAR hgram)
Retrieves the mode of the grammar.
void LVGrammar_SetLanguage (HGRAMMAR hgram, const char* language)
Specify the language of this grammar as a language/country code pair. Legal arguments include "en-US" and "es-MX".
const char* LVGrammar_GetLanguage (HGRAMMAR hgram)
Retrieve the language setting of the grammar.
int
LVGrammar_SetTagFormat (HGRAMMAR hgram, const char* tag_format)
Identify the tag format of the grammar. To use the LumenVox semantic interpretation, the tag format must be "lumenvox/1.0" or "semantics/1.0".
const char* LVGrammar_GetTagFormat (HGRAMMAR hgram)
Retrieve the tag format of the grammar.
int LVGrammar_GetNumberOfMetaData (HGRAMMAR hgram)
Retrieves the number of metadata entries in the grammar.
const char* LVGrammar_GetMetaDataKey (HGRAMMAR hgram, int index)
Returns the key of the metadata entry indicated by the index.
const char* LVGrammar_GetMetaDataValue (HGRAMMAR hgram, int index)
Returns the value of the metadata entry indicated by the index.
int
LVGrammar_ParseSentence (HGRAMMAR hgram, const char* sentence)
Uses the grammar to parse a sentence.
int LVGrammar_GetNumberOfParses (HGRAMMAR hgram)
Returns the number of parses created by the most recent LVGrammar_ParseSentence call.
H_PARSE_TREE LVGrammar_CreateParseTree (HGRAMMAR hgram, int index)
Returns the parse tree handle indicated by the index.
int LVGrammar_InterpretParses (HGRAMMAR hgram)
Generates interpretations from the parse trees created by the most recent LVGrammar_ParseSentence call.
int LVGrammar_GetNumberOfInterpretations (HGRAMMAR hgram)
Returns the number of interpretations created by the most recent LVGrammar_InterpretParses call.
H_SI LVGrammar_CreateInterpretation (HGRAMMAR hgram, int index)
Returns the semantic interpretation handle indicated by the index
API Functions
LVGrammar_AddRule
Adds a rule to a grammar object.
Function
int LVGrammar_AddRule(HGRAMMAR hgram, const char* rule_name, const char* rule_definition)
Parameters
hgram
A handle to the grammar.
rule_name
The name of the rule.
rule_definition
The definition of the rule.
Return Values
LV_SUCCESS
No errors; the rule has been successfully added.
LV_GRAMMAR_SYNTAX_WARNING
The new rule was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The new rule was not understandable to the grammar compiler. You will not be able to decode with this grammar.
Example
LVGrammar_AddRule(hgram, "foo", "hello [world]");
Is the same as writing a rule:
$foo = hello [world];
Remarks
New rules must be written in ABNF notation. Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar_RemoveRule
LVGrammar::AddRule (C++ API)
LVGrammar_SetRoot
Identifies one of the grammar rules as the root rule. The root rule is where the engine starts its search.
Function
int LVGrammar_SetRoot(HGRAMMAR hgram, const char* rule_name)
Parameters
hgram
A handle to the grammar.
rule_name
The name of the rule.
Example
LVGrammar_SetRoot(hgram, "foo");
Is the same as writing in a grammar:
root $foo;
See Also
LVGrammar::SetRoot (C++ API)
LVGrammar_SetMode
Sets the mode property for the grammar.
Function
int LVGrammar_SetMode(HGRAMMAR hgram, const char* mode)
Parameters
hgram
A handle to the grammar.
mode
The interaction mode of the grammar.
Example
LVGrammar_SetLanguage(hgram, "en-US");
LVGrammar_SetMode(hgram, "voice");
LVGrammar_SetTagFormat(hgram, "lumenvox/1.0");
Is the same as writing in your grammar:
language "en-US; mode "voice"; tag-format <lumenvox/1.0>;
See Also
LVGrammar_GetMode
LVGrammar::SetMode (C++API)
LVGrammar_Create
Creates an empty grammar object, and returns the handle.
Function
HGRAMMAR LVGrammar_Create()
Parameters
Return Values
A handle to the created grammar object.
Remarks
The memory referenced by the returned handle will not be released until the user explicitly calls LVGrammar_Release.
See Also
LVGrammar_Release
LVGrammar_CreateFromCopy
Creates a grammar object by copying another one, and returns the handle.
Function
HGRAMMAR LVGrammar_CreateFromCopy(HGRAMMAR another)
Parameters
another
The grammar object to copy from.
Return Values
A handle to the created grammar object.
Remarks
The memory referenced by the returned handle will not be released until the user explicitly calls LVGrammar_Release.
See Also
LVGrammar_Release
LVGrammar_Copy
Copy one grammar object to another.
Function
int LVGrammar_Copy (HGRAMMAR hgram, HGRAMMAR other)
Parameters
hgram
Destination grammar object handle.
other
Source grammar object handle.
Return Values
LV_SUCCESS
LV_FAILURE
Remarks
This function does not create a new object for the destination handle, so no memory will be allocated. It is the user's responsibility to make sure that the object referenced by the destination handle has already been created before calling this function.
See Also
LVGrammar::operator = (C++ API)
LVGrammar_Reset
Reset a grammar object.
Function
int LVGrammar_Reset (HGRAMMAR hgram)
Parameters
hgram
The handle to the grammar object to be reset.
Return Values
LV_SUCCESS
LV_FAILURE
See Also
LVGrammar::Reset (C++ API)
LVGrammar_Release
Destroy a grammar object.
Function
void LVGrammar_Release (HGRAMMAR hgram)
Parameters
hgram
The handle to the grammar object to be released.
Remarks
Grammar objects created by LVGrammar_Create and LVGrammar_CreateFromCopy must be explicitly destroyed by calling LVGrammar_Release.
See Also
LVGrammar_Create
LVGrammar_CreateFromCopy
LVGrammar_RegisterLoggingCallback
Registers a callback so the object can report warnings and errors to the grammar author via the callback function.
Function
void LVGrammar_RegisterLoggingCallback (HGRAMMAR hgram, GrammarLogCB log, void* userData)
Parameters
hgram
The handle to the grammar object.
log
The logging callback function pointer.
userData
A pointer to user-defined data associated with the grammar object referenced by hgram. It will be passed into the callback function.
Remarks
The callback function must have the signature defined by GrammarLogCB.
See Also
LVGrammar::RegisterLoggingCallback (C++ API)
LVGrammar_SaveCompiledGrammar
Save a grammar object to a binary file.
Function
int LVGrammar_SaveCompiledGrammar (HGRAMMAR hgram, const char* filename)
Parameters
hgram
The handle to a grammar object.
filename
File name.
Return Values
LV_SUCCESS
LV_FAILURE
Remarks
The saved compiled grammar can be later loaded into a grammar object with LVGrammar_LoadCompiledGrammar.
See Also
LVGrammar_LoadCompiledGrammar
LVGrammar::SaveCompiledGrammar (C++ API)
LVGrammar_LoadCompiledGrammar
Load a grammar object from a binary file previously saved by LVGrammar_SaveCompiledGrammar.
Function
int LVGrammar_LoadCompiledGrammar (HGRAMMAR hgram, const char* filename)
Parameters
hgram
The handle to a grammar object.
filename
The name of the file to load the compiled grammar from.
Return Values
LV_SUCCESS
LV_FAILURE
See Also
LVGrammar_SaveCompiledGrammar
LVGrammar::LoadCompiledGrammar (C++ API)
LVGrammar_LoadGrammar
Loads a grammar from a local file, or from a remote file via HTTP or FTP. The grammar can be written in ABNF or XML notation.
Function
int LVGrammar_LoadGrammar(HGRAMMAR hgram, const char* grammar_location)
Parameters
hgram
Handle to a grammar object.
grammar_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar::LoadGrammar (C++ API)
LVGrammar_LoadGrammarFromBuffer
Loads a grammar from a null-terminated string buffer. The grammar can be written in ABNF or XML notation.
Function
int LVGrammar_LoadGrammarFromBuffer(HGRAMMAR hgram, const char* grammar_contents);
Parameters
hgram
Handle to a grammar object.
grammar_contents
A null-terminated string containing the contents of a valid SRGS grammar.
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar::LoadGrammarFromBuffer (C++ API)
LVGrammar_RemoveRule
Remove a rule from a grammar object.
Function
int LVGrammar_RemoveRule(HGRAMMAR hgram, const char* rule_name)
Parameters
hgram
A handle to the grammar.
rule_name
The name of the rule to remove.
Return Values
LV_SUCCESS
No errors; the rule has been successfully removed.
LV_GRAMMAR_SYNTAX_WARNING
The grammar was not fully conforming after the rule was removed, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar was not understandable to the grammar compiler after the rule was removed. You will not be able to decode with this grammar.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar_AddRule
LVGrammar::RemoveRule (C++ API)
LVGrammar_SetLanguage
Set the language for the grammar.
Function
int LVGrammar_SetLanguage(HGRAMMAR hgram, const char* language)
Parameters
hgram
A handle to the grammar.
language
The language identifier for the grammar.
Example
LVGrammar_SetLanguage(hgram, "en-US");
LVGrammar_SetMode(hgram, "voice");
LVGrammar_SetTagFormat(hgram, "lumenvox/1.0");
Is the same as writing in your grammar:
language en-US;
mode voice;
tag-format <lumenvox/1.0>;
See Also
LVGrammar_GetLanguage
LVGrammar::SetLanguage (C++ API)
LVGrammar_SetTagFormat
Set the interpretation tag format of the grammar.
Function
int LVGrammar_SetTagFormat(HGRAMMAR hgram, const char* tag_format)
Parameters
hgram
A handle to the grammar.
tag_format
The grammar's tag format.
Example
LVGrammar_SetLanguage(hgram, "en-US");
LVGrammar_SetMode(hgram, "voice");
LVGrammar_SetTagFormat(hgram, "lumenvox/1.0");
Is the same as writing in your grammar:
language en-US;
mode voice;
tag-format <lumenvox/1.0>;
See Also
LVGrammar_GetTagFormat
LVGrammar::SetTagFormat (C++ API)
LVGrammar_GetMode
Return the mode setting for the grammar.
Function
const char* LVGrammar_GetMode(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
The interaction mode of the grammar.
See Also
LVGrammar_SetMode
LVGrammar::GetMode (C++ API)
LVGrammar_GetLanguage
Return the language setting for the grammar.
Function
const char* LVGrammar_GetLanguage(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
The language identifier of the grammar.
See Also
LVGrammar_SetLanguage
LVGrammar::GetLanguage (C++ API)
LVGrammar_GetTagFormat
Return the interpretation tag format setting for the grammar.
Function
const char* LVGrammar_GetTagFormat(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
The tag format of the grammar.
See Also
LVGrammar_SetTagFormat
LVGrammar::GetTagFormat (C++ API)
LVGrammar_GetNumberOfMetaData
Return the number of meta data entries contained in the grammar.
Function
int LVGrammar_GetNumberOfMetaData(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Example
If the grammar contains the following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = LVGrammar_GetNumberOfMetaData(grammar); // returns 2
const char* key = LVGrammar_GetMetaDataKey(grammar, 0); // returns "description"
const char* value = LVGrammar_GetMetaDataValue(grammar, 1); // returns "05/12/2005"
See Also
LVGrammar_GetMetaDataKey
LVGrammar_GetMetaDataValue
LVGrammar::GetNumberOfMetaData (C++ API)
LVGrammar_GetMetaDataKey
Return the key of the meta data entry indicated by the index.
Function
const char* LVGrammar_GetMetaDataKey(HGRAMMAR hgram, int index)
Parameters
hgram
A handle to the grammar.
index
Index of the meta data. It should be in the range [0, LVGrammar_GetNumberOfMetaData).
Return Values
null
The index is not valid.
non-null
A pointer to the key string.
Example
If the grammar has the following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = LVGrammar_GetNumberOfMetaData(grammar); // returns 2
const char* key = LVGrammar_GetMetaDataKey(grammar, 0); // returns "description"
const char* value = LVGrammar_GetMetaDataValue(grammar, 1); // returns "05/12/2005"
See Also
LVGrammar_GetNumberOfMetaData
LVGrammar_GetMetaDataValue
LVGrammar::GetMetaDataKey (C++ API)
LVGrammar_GetMetaDataValue
Return the value of the meta data entry indicated by the index.
Function
const char* LVGrammar_GetMetaDataValue(HGRAMMAR hgram, int index)
Parameters
hgram
A handle to the grammar.
index
Index of the meta data. It should be in the range [0, LVGrammar_GetNumberOfMetaData).
Return Values
null
The index is not valid.
non-null
A pointer to the value string.
Example
If the grammar has the following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = LVGrammar_GetNumberOfMetaData(grammar); // returns 2
const char* key = LVGrammar_GetMetaDataKey(grammar, 0); // returns "description"
const char* value = LVGrammar_GetMetaDataValue(grammar, 1); // returns "05/12/2005"
See Also
LVGrammar_GetNumberOfMetaData
LVGrammar_GetMetaDataKey
LVGrammar::GetMetaDataValue (C++ API)
LVGrammar_ParseSentence
Use a loaded grammar object to parse a sentence.
Function
int LVGrammar_ParseSentence(HGRAMMAR hgram, const char* sentence)
Parameters
hgram
A handle to the grammar.
sentence
The sentence to parse.
Return Values
0
The sentence is not covered by the grammar.
non-0
The number of distinct parses.
Example
Assume a grammar was defined as:
root $yes_no;
$yes_no = $yes | $no;
$yes = yes [please];
$no = no [thank you];
You can use this grammar to validate sentences as follows:
int count = LVGrammar_ParseSentence(grammar, "no thank you"); // returns 1
count = LVGrammar_ParseSentence(grammar, "no thanks"); // returns 0
Remarks
With this function, you can identify how well a grammar covers your targeted transcript set.
See Also
LVGrammar_GetNumberOfParses
LVGrammar_CreateParseTree
LVGrammar::ParseSentence (C++ API)
LVGrammar_GetNumberOfParses
Return the number of parses created by the most recent call to LVGrammar_ParseSentence.
Function
int LVGrammar_GetNumberOfParses(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
0
The sentence is not covered by the grammar.
non-0
The number of distinct parses.
Remarks
This function can be used after a call to LVGrammar_ParseSentence. It is merely a convenience, as it returns the same value as LVGrammar_ParseSentence.
See Also
LVGrammar_ParseSentence
LVGrammar_CreateParseTree
LVGrammar::NumberOfParses (C++ API)
LVGrammar_CreateParseTree
Return the parse tree handle with the specified index.
Function
H_PARSE_TREE LVGrammar_CreateParseTree(HGRAMMAR hgram, int index)
Parameters
hgram
A handle to the grammar.
index
The index of the parse tree handle to be returned. It should be in the range [0, LVGrammar_GetNumberOfParses).
Return Values
null
The index is not valid.
non-null
The parse tree handle.
Remarks
This function should be used after a call to LVGrammar_ParseSentence.
If the returned handle is not null, you must call LVParseTree_Release to destroy the parse tree object pointed to by the handle.
See Also
LVGrammar_ParseSentence
LVGrammar_GetNumberOfParses
LVGrammar::GetParseTree (C++ API)
LVGrammar_InterpretParses
Generate semantic interpretation results from the parse trees generated by the previous call to LVGrammar_ParseSentence.
Function
int LVGrammar_InterpretParses(HGRAMMAR hgram)
Parameters
hgram
A handle to a grammar.
Return Values
integer (>=0)
Number of available interpretations.
Remarks
Before passing a grammar object handle to this function, you should call LVGrammar_ParseSentence with that handle. Otherwise, the handle does not contain any parse tree information.
See Also
LVGrammar_ParseSentence
LVGrammar_GetNumberOfInterpretations
LVGrammar_CreateInterpretation
LVGrammar::InterpretParses (C++ API)
LVGrammar_GetNumberOfInterpretations
Return the number of semantic interpretations created by the most recent call to LVGrammar_InterpretParses.
Function
int LVGrammar_GetNumberOfInterpretations(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
integer (>=0)
Number of available interpretations.
Remarks
This function can be used after a call to LVGrammar_InterpretParses. It is merely a convenience, as the return value of LVGrammar_InterpretParses provides the same information.
See Also
LVGrammar_InterpretParses
LVGrammar_CreateInterpretation
LVGrammar::GetNumberOfInterpretations (C++ API)
LVGrammar_CreateInterpretation
Returns the semantic interpretation handle indicated by the index.
Function
H_SI LVGrammar_CreateInterpretation (HGRAMMAR hgram, int index)
Parameters
hgram
A handle to the grammar.
index
The index of the interpretation handle to be returned. It should be in the range [0, LVGrammar_GetNumberOfInterpretations).
Return Values
null
The index is not valid.
non-null
The interpretation handle.
Remarks
This function should be used after a call to LVGrammar_InterpretParses. A non-null interpretation handle must be released by calling LVInterpretation_Release when you are done using it.
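The full parse-and-interpret sequence, including the mandatory release, can be sketched as follows (error handling omitted; hgram is a loaded grammar handle, and the sentence is borrowed from the LVGrammar_ParseSentence example):

```
if (LVGrammar_ParseSentence(hgram, "no thank you") > 0) {
    int count = LVGrammar_InterpretParses(hgram);
    for (int i = 0; i < count; ++i) {
        H_SI si = LVGrammar_CreateInterpretation(hgram, i);
        if (si == NULL)
            continue;                  // invalid index
        // ...examine the interpretation here...
        LVInterpretation_Release(si);  // mandatory cleanup
    }
}
```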
See Also
LVGrammar_InterpretParses
LVGrammar_GetNumberOfInterpretations
LVGrammar::GetInterpretation (C++ API)
LVSpeechPort Class
class LVSpeechPort
An LVSpeechPort object represents one Speech Recognition Port and processes its sound data into text; all port instances can process their data in parallel. If the client application is multi-threaded, every thread that needs to process audio data should have its own LVSpeechPort.
Each port has multiple voice channels and grammar sets.
Each voice channel holds raw audio data. Before processing any data, the client application must call LoadVoiceChannel to load the channel. The channel keeps its own copy of this sound data, so the client application can free its copy after the call to LoadVoiceChannel. The voice channel will store the data until the client application loads new data into the channel. This allows the client application to decode the same sound data against different grammars without reloading the data.
The Decode method processes a voice channel against a grammar set, returning the concepts from the grammar set recognized in the channel’s audio data. Multiple voice channels are provided as a convenience, but only one voice channel can decode concurrently per port.
Use <LVSpeechPort.h>
Constructor/Destructors
LVSpeechPort Constructs an LVSpeechPort object.
~LVSpeechPort Closes the speech port object and releases its resources.
Functions
OpenPort Opens the speech port and initializes the SRE.
ClosePort Closes the port, and releases its resources.
Decode Processes the voice channel audio data against the active grammar.
ReturnErrorString Returns a description of an error code.
SetProperty Sets various properties on the port.
SetPropertyEx Sets various properties on various scopes.
SetClientPropertyEx Sets various properties on client process level. (static)
WaitForDecode Blocks the client application until the decode is finished.
WaitForEngineToIdle Blocks the client application until the port is idle (not decoding).
AddPhrase Adds a phrase to a new or existing concept.
GetConcept Returns one concept found in the last call to Decode.
GetConceptScore Returns the confidence score of a concept found in the last call to Decode.
GetNumberOfConceptsReturned Returns the number of concepts found in the last call to Decode.
GetPhonemesDecoded Returns the actual phonemes found in the last call to Decode.
GetPhraseDecoded Returns the decoded phrase (with BNF formatting) found in the last call to Decode.
GetRawTextDecoded Returns the decoded raw text (without BNF formatting) found in the last call to Decode.
GetVoiceChannelData Returns the (original) preprocessed audio data for the voice channel.
LoadStandardGrammar Loads a standard, pre-defined grammar to easily recognize and format numbers, monetary figures or digits.
LoadVoiceChannel Loads the audio data into the specified voice channel prior to a call to Decode (which decodes the audio data).
RemoveConcept Removes a concept and all of its phrases.
ResetGrammar Removes all concepts from a grammar.
StreamStart Sets up a new stream.
StreamSendData Sends a buffer of sound data to the stream.
StreamGetStatus Returns status of stream.
StreamGetLength Returns length of sound data in stream buffer.
StreamSetStateChangeCallBack Sets up a callback to receive state change notifications for a stream.
StreamStop Stops the stream and loads the sound channel with the streamed data.
StreamCancel Stops the stream; the sound data is discarded.
StreamSetParameter Sets a new value for a stream property.
StreamGetParameter Gets the current value of a stream property.
StreamSetParameterToDefault Sets a stream property to its default value.
LoadGrammar functions Loads and compiles an SRGS grammar
UnloadGrammar functions Unloads a grammar from the speech port.
IsGrammarLoaded Checks if a grammar has already been compiled and loaded into the port.
ActivateGrammar functions Activates an SRGS grammar for decoding
DeactivateGrammar functions Removes a grammar from the active grammar set.
GetNumberOfParses Returns the number of parses generated by the decode, according to the active grammars.
GetParseTree Returns a Parse Tree result.
GetParseTreeString Returns a string representation of the parse tree.
GetNumberOfInterpretations Returns the number of interpretations generated by the decode + semantic interpretation process.
GetInterpretation Returns an interpretation result.
GetInterpretationString Returns an XML snippet representation of the interpretation result.
GetNumberOfNBestAlternatives Returns number of n-best alternatives found by the engine.
SwitchToNBestAlternative Set the n-best alternative that is viewable.
Constants
Error Codes Error codes returned by methods.
Properties Property settings for the port.
Sound Formats Sound data format constants.
Standard Grammars Built-in grammar constants.
Methods
LVSpeechPort::LVSpeechPort
Constructs an LVSpeechPort object.
LVSpeechPort(void);
Remarks
Does not automatically open the port.
LVSpeechPort::~LVSpeechPort
Closes the speech port object and releases its resources.
~LVSpeechPort(void)
See Also
ClosePort
LVSpeechPort::OpenPort
Opens the speech port and initializes the Speech Engine.
int OpenPort(ExportLogMsg Log, void* p, int verbosity);
Return Values
LV_SUCCESS
No errors; the port initialized successfully.
LV_FAILURE
Licensing has been exceeded. There are too many LVSpeechPorts active.
LV_SYSTEM_ERROR
The port is already open.
Parameters
Log
Pointer to a function which receives logging information from the LVSpeechPort instance.
p
A pointer to client application-defined data.
verbosity
range: 0 - 6
0 - minimal logging info
6 - maximum logging info
Remarks
This method activates the speech port object. The recognition engine will begin initializing when this function is called. Control will return to the application immediately.
p is passed into the ExportLogMsg function to enable client-application-defined behavior.
See Also
Logging Callback Function
ClosePort
LV_SRE_OpenPort
LVSpeechPort::GetOpenPortStatus
Returns a detailed code about the results of opening the speech port.
int GetOpenPortStatus(void);
Return Values
LV_SUCCESS
The port opened successfully
LV_NO_SERVER_RESPONDING or LV_OPEN_PORT_FAILED__PRIMARY_SERVER_NOT_RESPONDING
The client could not find a server to request a licensed port from.
LV_OPEN_PORT_FAILED__LICENSES_EXCEEDED
The primary server has too many ports connected for the number of licenses it has to give out.
See Also
OpenPort
ClosePort
LV_SRE_OpenPort
LVSpeechPort::ClosePort
Closes the port, and releases its resources.
int ClosePort(void);
Return Values
LV_SUCCESS
No errors; the port has successfully shut down.
LV_FAILURE
The port was unable to shut down.
LV_INVALID_HPORT
The port was never successfully opened, or was already closed.
Note:
Closing a port frees it from counting against the number of ports allowed by your license. Close every port that is no longer needed.
See Also
OpenPort
LV_SRE_ClosePort
LVSpeechPort::Decode
Processes the voice channel audio data against the active grammar.
int Decode(int VoiceChannel, int grammarset, unsigned int flags = 0);
Return Values
Zero (0) or greater indicates success.
A negative result indicates a specific error.
Parameters
VoiceChannel
The voice channel to process.
grammarset
The grammar set to process.
flags (bitwise OR of flags to set desired options)
LV_DECODE_BLOCK - Decode will not return until it has finished.
LV_DECODE_GENDER_MALE - Gender identifier.
LV_DECODE_GENDER_FEMALE - Gender identifier.
LV_DECODE_FIRST_TIME_USER - Reset caller weights in the Recognition Engine (not implemented).
LV_DECODE_USE_OOV - Use the Out-Of-Vocabulary filter (OOV) during decode.
Remarks
If LV_DECODE_BLOCK is set, Decode will not return until it has finished processing the data.
If LV_DECODE_BLOCK is not set, Decode returns immediately (but continues processing the data on a separate thread), and the client application can continue its own work. Calling other LVSpeechPort methods may block until the Decode is finished. Once the client application is ready to check for results, call either 1) GetNumberOfConceptsReturned, or 2) WaitForEngineToIdle and then GetNumberOfConceptsReturned. WaitForEngineToIdle waits only for a specified time and returns regardless of whether Decode is finished, whereas GetNumberOfConceptsReturned blocks until Decode is finished.
LV_DECODE_GENDER_FEMALE and LV_DECODE_GENDER_MALE identify which gender acoustic model to use. If these flags are not specified, the engine automatically decodes each audio file against both gender models. While this slows the engine by requiring two decodes, evaluating against both models has a very significant positive effect on recognition accuracy. Since the engine is multithreaded, unless CPU loads are a serious issue, do not use these flags.
On an error, call ReturnErrorString with the negative result from Decode to get a description of the error.
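The non-blocking workflow above can be sketched as follows. This is a sketch only: the channel and timeout values are placeholders, and the exact signatures of WaitForEngineToIdle, GetNumberOfConceptsReturned, and ReturnErrorString are assumptions here; consult their individual reference pages.

```
// Start the decode without LV_DECODE_BLOCK; control returns immediately.
int result = port.Decode(channel, LV_ACTIVE_GRAMMAR_SET);
if (result < 0)
    printf("Decode failed: %s\n", port.ReturnErrorString(result));

// ...do other work on this thread...

// Bound the wait, then fetch results (GetNumberOfConceptsReturned itself
// blocks until the decode is finished):
port.WaitForEngineToIdle(timeout);
int concepts = port.GetNumberOfConceptsReturned(channel);
```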
See Also
LV_SRE_Decode
LoadGrammar functions
Before you can use a grammar, you must load it into the speech port's collection of grammars, or you must load it into the collection of application-level (global) grammars. When you load a grammar, it is compiled for use in the LumenVox Speech Engine.
These functions load an SRGS grammar that will be usable by a single speech port object.
Functions
int LoadGrammar(const char* gram_name, const char* gram_location);
int LoadGrammar(int gram_name, const char* gram_location);
int LoadGrammarFromBuffer(const char* gram_name, const char* gram_contents);
int LoadGrammarFromBuffer(int gram_name, const char* gram_contents);
int LoadGrammarFromObject(const char* gram_name, LVGrammar& gram_obj);
int LoadGrammarFromObject(int gram_name, LVGrammar& gram_obj);
Parameters
gram_name
The identifier for the grammar being loaded. Whenever you activate, deactivate, or unload, this is the identifier you will use. This can be a string, or an integer ID. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
gram_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
gram_contents
A null terminated string containing the contents of a valid SRGS grammar file.
gram_obj
An LVGrammar object.
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
See Also
LVSpeechPort::UnloadGrammar functions
LVSpeechPort::IsGrammarLoaded functions
LVSpeechPort::LoadGlobalGrammar functions
LV_SRE_LoadGrammar functions (C API)
LoadGlobalGrammar functions
When a global grammar is loaded, it is sent to the server. All subsequent decode requests then contain only global grammar IDs instead of the actual grammars, which avoids network transport overhead for large grammars.
A global grammar is associated with the client process that loads it. All speech ports that belong to that client process have access to that global grammar. However, different client processes do not share global grammars with each other.
Generally, the lifetime of a global grammar is controlled by the load and unload functions. However, in case a user terminates a client process without unloading its global grammars, the server periodically checks whether each client process is still alive so that unused global grammars can be released. Once the server detects that a client process has been inactive for more than 10 minutes, it removes all grammars associated with that process.
In a multi-threaded program, it is safe to access global grammars in a read-only fashion on multiple threads simultaneously; for instance, querying whether a global grammar is loaded, or calling decode with global grammars. When loading or unloading takes place, such as unloading a global grammar while another thread is decoding with that grammar, it is the user's responsibility to prevent race conditions.
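A typical process-level lifecycle might look like this sketch (the label "phone" is illustrative; the URI is taken from the parameter description in this section):

```
// Load once per client process; every speech port in the process can use it.
LVSpeechPort::LoadGlobalGrammar("phone", "http://www.gramsRus.com/phonenumber.gram");

// ...any port in this process may now decode against "phone"...

// Unload when the process no longer needs the grammar.
LVSpeechPort::UnloadGlobalGrammar("phone");
```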
Functions
static int LoadGlobalGrammar (const char* gram_name, const char* gram_location);
static int LoadGlobalGrammarFromBuffer (const char* gram_name, const char* gram_contents);
static int LoadGlobalGrammarFromObject (const char* gram_name, LVGrammar& gram_obj);
Parameters
gram_name
The identifier for the grammar being loaded. Whenever you activate, deactivate, or unload, this is the identifier you will use.
gram_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
gram_contents
A null terminated string containing the contents of a valid SRGS grammar file.
gram_obj
An LVGrammar object.
Return Values
LV_SUCCESS
No errors; this grammar is now ready to use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready for use.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
LV_GLOBAL_GRAMMAR_TRANSACTION_ERROR
Failed to send the grammar to any of the servers.
LV_GLOBAL_GRAMMAR_TRANSACTION_PARTIAL_ERROR
Failed to send the grammar to some of the servers.
Remarks
Detailed error and warning messages are sent to the LVSpeechPort application-level logging callback function at priorities 0 and 1, respectively.
Users can load the same grammar under different labels; only one instance of that grammar is created on the server.
See Also
LVSpeechPort::LoadGrammar functions
LVSpeechPort::IsGlobalGrammarLoaded functions
LVSpeechPort::UnloadGlobalGrammar functions
LV_SRE_LoadGlobalGrammar functions (C API)
UnloadGrammar functions
These functions remove a loaded grammar from a speech port object. The last function removes all loaded grammars from the speech port.
Functions
int UnloadGrammar(const char* gram_name);
int UnloadGrammar(int gram_name);
void UnloadGrammars();
Parameters
gram_name
The identifier for the grammar being unloaded. This is the same identifier you gave the grammar when you loaded it. It can be a null terminated string, or an integer.
Return Values
LV_SUCCESS
No errors; this grammar is removed.
LV_FAILURE
The grammar was not present. Nothing was removed.
Remarks
Grammars that were activated and then unloaded are still active; they must be explicitly deactivated.
See Also
LVSpeechPort::IsGrammarLoaded functions
LVSpeechPort::UnloadGlobalGrammar functions
LVSpeechPort::LoadGrammar functions
LV_SRE_UnloadGrammar functions (C API)
UnloadGlobalGrammar functions
These functions remove a loaded grammar from the application-level set of grammars. The second function removes all application-level grammars.
Functions
static int UnloadGlobalGrammar(const char* gram_name);
static void UnloadGlobalGrammars( );
Parameters
gram_name
The identifier for the grammar being unloaded. This is the same identifier you gave the grammar when you loaded it.
Return Values
LV_SUCCESS
No errors; this grammar is removed.
LV_FAILURE
The grammar was not present. Nothing was removed.
Remarks
A global grammar is unloaded from the server only when the unload functions have been called for every label associated with that grammar.
See Also
LVSpeechPort::UnloadGrammar functions
LVSpeechPort::IsGlobalGrammarLoaded functions
LVSpeechPort::LoadGlobalGrammar functions
LV_SRE_UnloadGlobalGrammar functions (C API)
IsGrammarLoaded functions
Functions
bool IsGrammarLoaded(const char* gram_name);
bool IsGrammarLoaded(int gram_name);
Parameters
gram_name
The identifier for the grammar being queried. This is the same identifier you gave the grammar when you loaded it.
Return Values
true if a grammar with the label gram_name was found in the speech port's set of loaded grammars; false otherwise.
Remarks
Note: This function only tells you if a grammar with the name gram_name is loaded. It does not tell you if there are two identical grammar bodies loaded.
See Also
LVSpeechPort::UnloadGrammar functions
LVSpeechPort::IsGlobalGrammarLoaded functions
LVSpeechPort::LoadGrammar functions
LV_SRE_IsGrammarLoaded functions (C API)
IsGlobalGrammarLoaded
Function
static bool IsGlobalGrammarLoaded(const char* gram_name);
Parameters
gram_name
The identifier for the grammar being queried. This is the same identifier you gave the grammar when you loaded it.
Return Values
true if a grammar was found with the label gram_name in the space of application-level grammars; false otherwise.
Remarks
Note: This function only tells you if a grammar with the name gram_name is loaded. It does not tell you if there are two identical grammar bodies loaded.
See Also
LVSpeechPort::UnloadGlobalGrammar functions
LVSpeechPort::IsGrammarLoaded functions
LVSpeechPort::LoadGlobalGrammar functions
LV_SRE_IsGlobalGrammarLoaded (C API)
ActivateGrammar functions
If you wish to use a speech port's loaded SRGS grammar for a decode, you need to activate it. Activating a grammar puts it in the multi-grammar grammar set called LV_ACTIVE_GRAMMAR_SET. Activated grammars can then be used for a decode by specifying LV_ACTIVE_GRAMMAR_SET as the grammarset parameter in a call to Decode, or by setting STREAM_PARM_GRAMMAR_SET equal to LV_ACTIVE_GRAMMAR_SET before calling StreamStart. This mechanism exists to maintain backward compatibility with previous APIs.
When ActivateGrammar is called, the grammar is first searched for among the speech port's loaded grammars. If it cannot be found there, the collection of application-level grammars is searched. If you wish to explicitly activate an application-level grammar, use ActivateGlobalGrammar.
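A minimal load-activate-decode sequence might look like this sketch (error handling omitted; voice channel 1 is illustrative, and the audio is assumed to have already been loaded with LoadVoiceChannel):

```
port.LoadGrammar("pizza", "c:/grammars/pizza.grxml");
port.ActivateGrammar("pizza");    // joins LV_ACTIVE_GRAMMAR_SET
port.Decode(1, LV_ACTIVE_GRAMMAR_SET, LV_DECODE_BLOCK);
port.DeactivateGrammar("pizza");  // deactivate when no longer needed
```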
Functions
int ActivateGrammar(const char* gram_name);
int ActivateGrammar(int gram_name);
Parameters
gram_name
The identifier for the grammar being activated. This is the same identifier that was given to the grammar when it was loaded. This can be a string, or an integer ID. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
Return Values
LV_SUCCESS
No errors; this grammar is now active.
LV_GRAMMAR_LOADING_ERROR
This grammar could not be activated, because it was not found in the speech port's set of loaded grammars.
Remarks
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
See Also
LVSpeechPort::DeactivateGrammar functions
LVSpeechPort::ActivateGlobalGrammar
LV_SRE_ActivateGrammar functions (C API)
DeactivateGrammar functions
These functions remove a grammar from the set of active grammars. The last function clears the active grammar set.
Functions
int DeactivateGrammar(const char* gram_name);
int DeactivateGrammar(int gram_name);
int DeactivateGrammars();
Parameters
gram_name
The identifier for the grammar being deactivated. This is the same identifier that was given to the grammar when it was loaded. This can be a string, or an integer ID. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
Return Values
LV_SUCCESS
No errors; this grammar is no longer active.
LV_FAILURE
This grammar could not be deactivated, because it was never successfully activated.
See Also
LVSpeechPort::ActivateGrammar functions
LV_SRE_DeactivateGrammar functions (C API)
GetNumberOfInterpretations
Returns the number of semantic interpretation results that were generated by the previous decode.
Function
int GetNumberOfInterpretations(int voicechannel)
Parameters
voicechannel
The audio channel holding the decoded audio.
See Also
LVSpeechPort::GetInterpretation
LVSpeechPort::GetInterpretationString
LV_SRE_GetNumberOfInterpretations (C API)
GetInterpretation
Returns an LVInterpretation object representing the results of the semantic interpretation process.
Function
LVInterpretation GetInterpretation (int voicechannel, int index)
Parameters
voicechannel
The channel that the decode took place on.
index
An utterance could give rise to multiple interpretations, particularly if the grammars involved are ambiguous. index ranges from 0 to GetNumberOfInterpretations - 1.
Return Value
The return type is an interpretation object. The object is a representation of the ECMAScript object made by the matching grammar, using the Semantic Interpretation for Speech Recognition process. It also contains additional information such as the confidence score, matching grammar label, and the input sentence.
See Also
LVSpeechPort::GetNumberOfInterpretations
LVSpeechPort::GetInterpretationString
LVInterpretation C++ API
LV_SRE_CreateInterpretation (C API)
LVSpeechPort::GetNumberOfParses
Returns the number of parse trees that were generated by the previous decode.
Function
int GetNumberOfParses(int voicechannel)
Parameters
voicechannel
The audio channel holding the decoded audio.
See Also
LVSpeechPort::GetParseTree
LVSpeechPort::GetParseTreeString
Parse Tree Introduction
LV_SRE_GetNumberOfParses (C API)
LVSpeechPort::GetParseTree
Provides the user with an LVParseTree object representing the sentence structure of what was decoded by the Speech Engine, according to the active grammars.
Function
LVParseTree GetParseTree(int voicechannel, int index)
Parameters
voicechannel
The audio channel containing the input audio
index
It is possible to have more than one parse tree for an utterance (for instance, if the grammar is ambiguous); this is the index of the tree.
Return Value
A parse tree.
Remark
Logically, a parse tree and the parse string returned to the user are the same. However, an LVParseTree object makes it easy to search the parse tree for useful information.
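A retrieval loop might look like the following sketch, assuming a completed decode on channel vc and an open port named Port (both names are illustrative):

```cpp
// Walk every parse tree produced by the previous decode.
int total = Port.GetNumberOfParses(vc);
for (int i = 0; i < total; ++i)
{
    LVParseTree tree = Port.GetParseTree(vc, i);
    // search the tree for useful information here
}
```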
See Also
LVSpeechPort::GetNumberOfParses
LVSpeechPort::GetParseTreeString
Parse Tree Introduction
LVParseTree C++ API
LV_SRE_CreateParseTree (C API)
LVSpeechPort::GetParseTreeString
Provides the user with a string representation of a speech parse tree.
Function
const char* GetParseTreeString(int voicechannel, int index)
Parameters
voicechannel
The audio channel containing the input audio
index
It is possible to have more than one parse tree possibility (for instance, if the grammar is ambiguous); this is the index of the tree.
Remark
Logically, a speech parse tree and the parse string returned to the user are the same. However, a speech parse tree makes it easy to search the parse tree for useful information. The parse tree string is based on the examples provided by the W3C SRGS specification
See Also
LVSpeechPort::GetNumberOfParses
LVSpeechPort::GetParseTree
Parse Tree Introduction
LV_SRE_GetParseTreeString (C API)
LVSpeechPort::GetNumberOfConceptsReturned
Returns the number of concepts found in the last call to Decode.
int GetNumberOfConceptsReturned(int VoiceChannel);
Return values
The number of concepts found for this voice channel.
Parameters
VoiceChannel
The voice channel processed by Decode.
See Also
LV_SRE_GetNumberOfConceptsReturned
LVSpeechPort::GetConcept
Returns one concept found in the last call to Decode.
const char* GetConcept(int VoiceChannel, int Index);
Return Values
A null-terminated string representing the matched concept.
NULL indicates that Index was outside the possible range.
Parameters
VoiceChannel
The voice channel processed by Decode.
Index
The recognition position of the concept, between 0 and (GetNumberOfConceptsReturned - 1), inclusive.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted concepts.
See Also
LV_SRE_GetConcept
LVSpeechPort::GetConceptScore
Returns the confidence score of a concept found in the last call to Decode.
int GetConceptScore(int VoiceChannel, int Index);
Return Values
The confidence score of the matched concept. The range of possible values is 0 to 1000.
Parameters
VoiceChannel
The voice channel processed by Decode.
Index
The recognition position of the concept, between 0 and (GetNumberOfConceptsReturned - 1), inclusive.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted scores.
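The concept strings and their confidence scores can be retrieved together; a sketch, assuming a completed Decode on channel vc and a port named Port (illustrative names):

```cpp
int n = Port.GetNumberOfConceptsReturned(vc);
for (int i = 0; i < n; ++i)
{
    const char* concept = Port.GetConcept(vc, i);      // matched concept
    int score           = Port.GetConceptScore(vc, i); // 0 to 1000
    // use concept and score here
}
```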
See Also
LV_SRE_GetConceptScore
LVSpeechPort::GetPhonemesDecoded
Returns the actual phonemes found in a call to Decode.
const char* GetPhonemesDecoded(int VoiceChannel, int Index);
Return Values
A null-terminated static string of the decoded phonemes.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded phonemes.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted phonemes.
See Also
GetPhraseDecoded
GetRawTextDecoded
LV_SRE_GetPhonemes
LVSpeechPort::GetPhraseDecoded
Returns the decoded phrase (with BNF formatting) found in the last call to Decode.
const char* GetPhraseDecoded(int VoiceChannel, int Index);
Return Values
A null-terminated string representing the decoded string.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded phrase.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted phrases.
The main difference between LVSpeechPort::GetPhraseDecoded and LVSpeechPort::GetRawTextDecoded is in BNF formatting. LVSpeechPort::GetPhraseDecoded returns the decoded phrase as it was entered into the grammar. If the phrase contains BNF formatting, with selections, options, grouping, etc., then the return value preserves that formatting. LVSpeechPort::GetRawTextDecoded returns the decoded phrase after BNF formatting has been removed. Thus, LVSpeechPort::GetRawTextDecoded will return the phrase as a list of the words actually recognized, rather than the phrase as it was entered into the grammar.
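The difference can be seen side by side in this sketch (Port, vc, and index 0 are illustrative):

```cpp
// Same recognition result, two formats.
const char* phrase = Port.GetPhraseDecoded(vc, 0);  // keeps BNF formatting
const char* raw    = Port.GetRawTextDecoded(vc, 0); // plain recognized words
```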
See Also
GetPhonemesDecoded
GetRawTextDecoded
LV_SRE_GetPhraseDecoded
LVSpeechPort::GetRawTextDecoded
Returns the decoded raw text (without BNF formatting) found in the last call to Decode.
const char* GetRawTextDecoded(int VoiceChannel, int Index);
Return Values
A null-terminated string representing the decoded raw text.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded raw text.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted raw text.
The main difference between GetPhraseDecoded and GetRawTextDecoded is in BNF formatting. GetPhraseDecoded returns the decoded phrase as it was entered into the grammar. If the phrase contains BNF formatting, with selections, options, grouping, etc., then the return value preserves that formatting. GetRawTextDecoded returns the decoded phrase after BNF formatting has been removed. Thus, GetRawTextDecoded will return the phrase as a list of the words actually recognized, rather than the phrase as it was entered into the grammar.
See Also
GetPhonemesDecoded
GetPhraseDecoded
LV_SRE_GetRawTextDecoded
LVSpeechPort::GetVoiceChannelData
Sets the pointers to the voice channel's copy of the original preprocessed audio data.
int GetVoiceChannelData(int VoiceChannel, short** PCM, unsigned int* Samples);
Return Values
LV_SUCCESS
No errors; PCM and Samples have been successfully set.
LV_SOUND_CHANNEL_OUT_OF_RANGE
The voice channel specified is outside the valid range; possible values are 0-63, inclusive.
LV_BAD_HPORT
The Speech Engine is no longer running. This is the result of a ClosePort call or an unrecoverable Speech Engine error.
Parameters
VoiceChannel
The voice channel to process.
PCM
A pointer to a pointer to set to the post-processed audio data.
Samples
A pointer to an integer to set the number of samples.
See Also
LV_SRE_GetVoiceChannelData
LVSpeechPort::LoadStandardGrammar
Standard Grammars are deprecated in favor of SRGS built-in grammars
Loads a standard, pre-defined grammar to easily recognize and format numbers, monetary figures or digits.
int LoadStandardGrammar(int GrammarSet, int StdGrammar);
Return Values
LV_SUCCESS
No errors; the standard grammar is loaded.
LV_STANDARD_GRAMMAR_OUT_OF_RANGE
The standard grammar value is not a recognized grammar type.
LV_GRAMMAR_SET_OUT_OF_RANGE
The GrammarSet value is out of range.
Parameters
GrammarSet
Which grammar set this phrase is being added to. Possible value range 0 - 63.
StdGrammar
The standard grammars are:
1. GRAMMAR_DIGITS String of single digits like a phone number or pin code.
2. GRAMMAR_MONEY Monetary value (only implemented for SRGS decodes).
3. GRAMMAR_NUMERIC Numeric value (like 12,000, 24.45, or 35).
4. GRAMMAR_SPELLING Alphabet letters for spelling (not implemented).
5. GRAMMAR_ALPHA_NUMERIC (Not implemented).
6. GRAMMAR_DATE Date values (only implemented for SRGS decodes).
7. GRAMMAR_NONE Clears out the standard grammar, without clearing out any phrases that were added. ResetGrammar( ) will clear out the entire grammar.
Remarks
The client application can load only one standard grammar, but can add any number of concepts with AddPhrase. This is not true, however, if you use SRGS grammars. The correct way to augment a standard grammar in the SRGS setting is to load a grammar to a different location and then activate both. When a standard grammar is loaded, the decoder will return the number, dollar amount, or digit string as either a single concept or a single interpretation string, depending on whether SRGS is used.
As an example, the client application loads GRAMMAR_NUMERIC and also adds the concept and phrase "Widgets". If the sound data contained the speech "twelve widgets", the decoder will return two concepts: the first is the string "12" and the second the string "Widgets". If the speech was "one thousand one hundred and twenty nine Widgets seven point two Widgets", the decoder would return four concepts: "1129", "Widgets", "7.2" and "Widgets".
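The non-SRGS example above might be set up as follows (a sketch; grammar set 0, the port name, and channel vc are illustrative):

```cpp
// Load the standard numeric grammar and add the "Widgets" concept.
Port.LoadStandardGrammar(0, GRAMMAR_NUMERIC);
Port.AddPhrase(0, "Widgets", "Widgets");
// ... load audio containing "twelve widgets" into channel vc and Decode ...
// Per the example above, GetConcept(vc, 0) would return "12" and
// GetConcept(vc, 1) would return "Widgets".
```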
However, if you use SRGS, this is not what happens. To get this sort of functionality in the SRGS setting, you would create a grammar that looks like the following:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $how_many_widgets;
$how_many_widgets = $<builtin:grammar/number> widgets {$=$$;};
In this case you wouldn't bother using LoadStandardGrammar() at all, since the standard number grammar is loaded when you load this grammar. The result would be an interpretation string representing the number that was recognized, like "1129" or "7.2". The word "widgets" would not be returned by this grammar.
See Also
Standard Grammars
LV_SRE_LoadStandardGrammar
LVSpeechPort::LoadVoiceChannel
Loads the audio data into the specified voice channel prior to a call to Decode (which decodes the audio data).
int LoadVoiceChannel(int VoiceChannel, void* M, int Length, SOUND_FORMAT Format = ULAW_8KHZ);
Return Values
LV_SUCCESS
No errors; the voice channel audio successfully loaded.
LV_BAD_HPORT
The engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
LV_FAILURE
Sound format was incorrectly specified.
Parameters
VoiceChannel
Accepted values 0 through 63.
M
Pointer to audio data.
Length
Memory size in bytes of the audio data.
Format
The audio data sound format.
Remarks
Each LVSpeechPort supports 64 separate voice channels. Each channel has its own separate storage for decode data, so once the call is made, the client application can release its own copy. LoadVoiceChannel will accept the audio data and prepare it for decoding.
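The load-then-decode sequence might look like this sketch (the buffer, its length, and the elided Decode call are placeholders):

```cpp
// audio_buf holds 8 kHz u-law audio; audio_len is its size in bytes.
int res = Port.LoadVoiceChannel(0, audio_buf, audio_len, ULAW_8KHZ);
if (res == LV_SUCCESS)
{
    // The channel now holds its own copy, so the application's buffer
    // can be released before the decode.
    free(audio_buf);
    // ... call Decode on voice channel 0 ...
}
```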
See Also
LV_SRE_LoadVoiceChannel
LVSpeechPort::AddPhrase
Adds a phrase to a new or existing concept.
int AddPhrase(int GrammarSet, const char* Concept , const char* Phrase);
Return Values
LV_SUCCESS
No errors; the phrase was added to the concept.
LV_BAD_HPORT
The engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set is out of range.
LV_GRAMMAR_SYNTAX_ERROR or LV_GRAMMAR_SYNTAX_WARNING
The phrase entered has bad syntax, such as mismatched parentheses.
Parameters
GrammarSet
Which grammar set to add the phrase to. Integer value between 0 and 63, inclusive.
Concept
Which concept to add the phrase to. Null-terminated string.
Phrase
The new phrase.
Remarks
The concept can be new or existing; if it is new, the call will automatically add the new concept with the single phrase.
See Also
Phrase Formats
Phonemes
LV_SRE_AddPhrase
LVSpeechPort::RemoveConcept
Removes a concept and all of its phrases.
int RemoveConcept(int GrammarSet, const char* Concept);
Return Values
LV_SUCCESS
No errors; the concept and all phrases are removed from the grammar set.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set specified is outside the valid range.
LV_BAD_HPORT
The engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
Parameters
GrammarSet
Which grammar set to remove the concept from. Possible value range 0 - 63.
Concept
Existing concept to remove. Null-terminated string.
See Also
LV_SRE_RemoveConcept
LVSpeechPort::ResetGrammar
Removes all concepts from a grammar.
int ResetGrammar(int GrammarSet);
Return Values
LV_SUCCESS
No errors; grammar reset.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set value is out of expected range (0-63).
See Also
LV_SRE_ResetGrammar
LVSpeechPort::ReturnErrorString
Returns a description of an error code.
const char* ReturnErrorString(int ReturnCode);
Return Values
A null-terminated static string describing the error code.
Parameters
ReturnCode
The error code.
Remarks
If the error code is an invalid error code, "Invalid Error Code" is returned.
See Also
LV_SRE_ReturnErrorString
LVSpeechPort::SetProperty
SetProperty is deprecated in favor of using SetPropertyEx.
Sets various properties on the port.
int SetProperty(PROPERTIES Property, int Value);
Return Values
LV_SUCCESS
No errors; Property is set to Value.
LV_NOT_A_VALID_PROPERTY_VALUE
The property value is not valid for the designated property.
Parameters
Property
Which property to modify.
Value
Property-dependent.
Remarks
Currently, only PROP_SAVE_SOUND_FILES is implemented; setting Value to 1 will cause the port to save request and answer files to disk; setting Value to 0 turns this feature off. The request and answer files are invaluable for troubleshooting and tuning applications, but will quickly fill up a hard drive.
See Also
Properties
LV_SRE_SetProperty
SetPropertyEx
LVSpeechPort::SetPropertyEx
Sets various properties for a port, client, soundchannel, or grammar.
int SetPropertyEx(int propertyname, int valuetype, void* pvalue, int target = PROP_EX_TARGET_PORT, int index = 0 );
Return Values
LV_SUCCESS
No errors; property is set to the value pointed to by pvalue.
LV_INVALID_PROPERTY
The property does not exist.
LV_INVALID_PROPERTY_VALUE
The property value is invalid for the designated property (e.g. out of range).
LV_INVALID_PROPERTY_TARGET
The property cannot be set for the specified target.
LV_INVALID_PROPERTY_VALUE_TYPE
The property's type is incompatible with the declared type.
LV_INVALID_PROPERTY_TARGET_IDX
The target's index (grammar set, voicechannel) is out of range for this property.
Note: If more than one error occurs, which error code is returned is undefined.
Parameters
propertyname
Which property to modify.
valuetype
The value type of the property being set. Legal values are:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
PROP_EX_VALUE_TYPE_STRING
PROP_EX_VALUE_TYPE_FLOAT_PTR
Each property has a set of legal value types. See Properties.
pvalue
A pointer to the new value for propertyname. pvalue will be reinterpreted according to the value type provided.
target
The portion of the API that this property is set for. Legal values are:
PROP_EX_TARGET_PORT -- pvalue affects an entire speech port object
PROP_EX_TARGET_CHANNEL -- pvalue affects one voice channel in the speech port. The channel is specified by index.
PROP_EX_TARGET_GRAMMAR -- pvalue affects one grammar set in the speech port. The set is specified by index.
PROP_EX_TARGET_CLIENT -- pvalue is global, and affects all ports on the client.
Remarks
See Properties for a list of modifiable properties.
You can use this function only after opening a port; calling it before opening a port will result in failure. To set a client-scope property, use the static function LVSpeechPort::SetClientPropertyEx.
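A sketch of a port-scope call, using PROP_EX_DECODE_TIMEOUT (a property mentioned elsewhere in this documentation) and assuming it accepts an integer value at port scope; check Properties for the property's actual legal value types:

```cpp
// Set a decode timeout on the whole port (value and type are assumptions).
int timeout_ms = 5000;
int res = Port.SetPropertyEx(PROP_EX_DECODE_TIMEOUT,
                             PROP_EX_VALUE_TYPE_INT,
                             &timeout_ms,
                             PROP_EX_TARGET_PORT);
```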
See Also
Properties
LV_SRE_SetPropertyEx
(static) LVSpeechPort::SetClientPropertyEx
LVSpeechPort::StreamStart
Sets up a new stream.
int StreamStart();
Return Values
LV_SUCCESS
Stream set up.
LV_FAILURE
Parameters incorrectly set.
Remarks
Call this function to set up a new stream. You need to call this function after calling StreamStop, StreamCancel or after end-of-speech has been detected on previous utterance.
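The overall streaming sequence might be sketched as follows (the audio source, buffer, and port name are placeholders):

```cpp
// Direct the stream into voice channel 0, then stream buffers until done.
Port.StreamSetParameter(STREAM_PARM_VOICE_CHANNEL, 0);
Port.StreamStart();
while (have_more_audio())                    // placeholder audio source
{
    Port.StreamSendData(buffer, buffer_len); // copied internally, returns fast
}
Port.StreamStop();                           // loads channel 0 with the data
```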
See Also
StreamSetParameter
StreamStop
StreamCancel
LVSpeechPort::StreamSendData
Send data buffer of sound data to stream.
int StreamSendData(void* SoundData, int SoundDataLength);
Return Values
LV_SUCCESS
Data accepted
LV_FAILURE
Stream not active or NULL sound data.
Parameters
SoundData
Pointer to the memory buffer containing sound data.
SoundDataLength
Length in bytes of sound data.
Remarks
Used to do the actual streaming. Call this function with each sound data buffer. This call copies the sound data to an internal buffer and returns immediately. Processing of the sound data takes place on a background thread.
See Also
StreamSetStateChangeCallBack
StreamGetStatus
LVSpeechPort::StreamGetStatus
Returns status of stream.
int StreamGetStatus();
Return Values
Returns a stream status define. See Stream Status.
Remarks
Called to check the current state of the stream.
See Also
StreamSetStateChangeCallBack
LVSpeechPort::StreamGetLength
Returns length of sound data in stream buffer.
int StreamGetLength();
Return Values
Number of bytes in internal buffer for sound stream.
Remarks
This is the total number of bytes streamed. It does not include bytes sent before barge-in was detected (if STREAM_PARM_DETECT_BARGE_IN is active). This can be useful if the application wants to stop a post-barge-in stream after a certain amount of time (for example, to limit user speech to 10 seconds).
See Also
StreamSetStateChangeCallBack
LVSpeechPort::StreamSetStateChangeCallBack
Sets up a callback to receive state change notifications for a stream.
int StreamSetStateChangeCallBack(LV_SRE_StreamStateChangeFn* fn, void* UserData);
Return Values
LV_SUCCESS
Parameters
fn
Pointer to callback function to receive state change updates. See Stream Callback.
UserData
Application defined data sent back in callback.
Remarks
Each time a stream's status changes, this callback will be called.
See Also
LV_SRE_StreamStateChangeFn
StreamGetStatus
LVSpeechPort::StreamStop
Stops stream and loads sound channel with streamed data.
int StreamStop();
Return Values
LV_SUCCESS
LV_FAILURE Stream not active.
Remarks
This function ends streaming and puts streamed data into the voice channel defined with the STREAM_PARM_VOICE_CHANNEL parameter. If the STREAM_PARM_AUTO_DECODE parameter is active, the decode will begin (non-blocking) when this function is called.
See Also
StreamSetParameter
StreamCancel
Stream Parameters
LVSpeechPort::StreamCancel
Stops stream, sound data is discarded.
int StreamCancel();
Return Values
LV_SUCCESS
LV_FAILURE Stream not active.
Remarks
This kills the stream. It can be called to cancel a stream (particularly an auto-decode type stream) in order to start a new stream.
See Also
StreamStop
LVSpeechPort::StreamSetParameter
Sets a new value for a stream property.
int StreamSetParameter(int StreamParameter, unsigned long StreamParameterValue);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY StreamParameter does not exist.
LV_INVALID_PROPERTY_VALUE StreamParameterValue is out of range for the stream parameter.
Parameters
StreamParameter
Stream parameter to change. See Stream Parameters.
StreamParameterValue
New stream parameter value.
Remarks
Sets a stream parameter value.
See Also
StreamGetParameter
StreamSetParameterToDefault
Stream Parameters
LVSpeechPort::StreamGetParameter
Gets the current value of a stream property.
int StreamGetParameter(int StreamParameter, unsigned long* StreamParameterValue);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY StreamParameter does not exist.
Parameters
StreamParameter
Stream parameter to query. See Stream Parameters.
StreamParameterValue
Pointer that receives the current value of the stream parameter.
Remarks
Gets the current value of a stream parameter.
See Also
StreamSetParameter
StreamSetParameterToDefault
Stream Parameters
LVSpeechPort::StreamSetParameterToDefault
Sets a stream property to its default value.
int StreamSetParameterToDefault(int StreamParameter);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY Stream parameter does not exist.
Parameters
StreamParameter
Stream parameter to reset. See Stream Parameters.
Remarks
Sets a stream parameter value back to its default setting.
See Also
StreamGetParameter
StreamSetParameter
Stream Parameters
LVSpeechPort::WaitForEngineToIdle
(Deprecated in favor of LVSpeechPort::WaitForDecode.)
Blocks the client application until the port is idle (not decoding).
int WaitForEngineToIdle(int MillisecondsToWait, int VoiceChannel = -1);
Return Values
LV_SUCCESS
No errors or timeout; the engine is now idle.
LV_TIME_OUT
WaitForEngineToIdle's timeout was reached before the engine became idle.
Parameters
MillisecondsToWait
The number of milliseconds to wait before returning if the Speech Port does not become idle.
VoiceChannel
Which VoiceChannel to wait on, -1 waits on all voice channels for the port.
Remarks
This function is deprecated in favor of LVSpeechPort::WaitForDecode. To achieve the same behavior as LVSpeechPort::WaitForDecode, use property PROP_EX_DECODE_TIMEOUT, and set MillisecondsToWait to TIMEOUT_INFINITE.
Some of the LVSpeechPort methods run asynchronously, in particular Decode. WaitForEngineToIdle is primarily useful when Decode is called without LV_DECODE_BLOCK. In this case, Decode returns immediately, but continues processing the voice channel's audio data in a separate thread. Since client applications will eventually need the results, they need a way to query the port to see if Decode has finished. WaitForEngineToIdle will wait the specified time for the engine to idle; check the return value to ensure the engine is idle, indicating that decode results are available.
WaitForEngineToIdle is also useful to ensure the LVSpeechPort has finished initializing, prior to calls to Decode.
See Also
Decode
LVSpeechPort::WaitForDecode
LV_SRE_WaitForEngineToIdle
LVSpeechPort::GetNumberOfNBestAlternatives
Returns the number of n-best alternatives found by the engine.
int GetNumberOfNBestAlternatives(int voicechannel);
Return Values
Number of n-best alternatives. It will always be less than or equal to the value set for PROP_EX_MAX_NBEST_RETURNED.
Parameters
voicechannel
The channel containing the decoded audio.
See Also
PROP_EX_MAX_NBEST_RETURNED
LVSpeechPort::SwitchToNBestAlternative
LV_SRE_GetNumberOfNBestAlternatives
LVSpeechPort::SwitchToNBestAlternative
Switches the n-best alternative that is viewable. After this call, subsequent result retrieval functions, such as LVSpeechPort::GetInterpretation, will be bound to this n-best alternative.
int SwitchToNBestAlternative(int voicechannel, int index);
Return Values
LV_SUCCESS
LV_FAILURE The index is not valid.
Parameters
voicechannel
The channel containing the decoded audio.
index
The index of the n-best alternative to switch to. It may be any value in the range [0, LVSpeechPort::GetNumberOfNBestAlternatives).
Remarks
Each alternative represents a distinct sentence. However, since some sentences can have multiple interpretations or multiple parses, some alternatives may return multiple parse tree or interpretation objects. For this reason, it is recommended to retrieve all results as follows:
int nbest_count;
int nbest_total = port.GetNumberOfNBestAlternatives(vc);
int interp_count;
for (nbest_count = 0; nbest_count < nbest_total; ++nbest_count)
{
    port.SwitchToNBestAlternative(vc, nbest_count);
    int interp_total = port.GetNumberOfInterpretations(vc);
    for (interp_count = 0; interp_count < interp_total; ++interp_count)
    {
        LVInterpretation interp = port.GetInterpretation(vc, interp_count);
        /* do something with the interp */
    }
}
Even though more than one interpretation can live in a single n-best result, the same interpretation will not appear in more than one n-best result; the lower-scoring duplicates are pruned out.
See Also
LVSpeechPort::GetNumberOfNBestAlternatives
LV_SRE_SwitchToNBestAlternative
LVSpeechPort::WaitForDecode
Blocks the client application until the decode is finished.
int WaitForDecode(int VoiceChannel);
Return Values
LV_SUCCESS
No errors or timeout; the decode interaction is finished.
LV_TIME_OUT
The timeout value associated with PROP_EX_DECODE_TIMEOUT was exceeded before a result was returned from the Speech Engine. The decode was dropped from the Engine, and the LVSpeechPort may now start a new decode request.
Parameters
VoiceChannel
Which voice channel to wait on. Setting VoiceChannel equal to -1 causes a wait on all the voice channels for the port.
Remarks
Some of the API functions run asynchronously, in particular LVSpeechPort::Decode. LVSpeechPort::WaitForDecode is primarily useful when LVSpeechPort::Decode is called without LV_DECODE_BLOCK. In this case, LVSpeechPort::Decode returns immediately, but continues processing the voice channel's audio data in a separate thread. Since client applications will eventually need the results, they need a way to query the port to see if LVSpeechPort::Decode has finished. LVSpeechPort::WaitForDecode will wait the specified time (determined by the set value of PROP_EX_DECODE_TIMEOUT) for the engine to idle; check the return value to ensure the decode interaction is finished before attempting to retrieve answers from the speech port.
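A sketch of the non-blocking pattern (the Decode parameter list is abbreviated, and Port and vc are illustrative names):

```cpp
// Start a non-blocking decode (no LV_DECODE_BLOCK flag), do other work,
// then wait for the result before reading answers.
Port.Decode(vc, LV_ACTIVE_GRAMMAR_SET /* , flags without LV_DECODE_BLOCK */);
// ... other work while the decode runs on a background thread ...
if (Port.WaitForDecode(vc) == LV_SUCCESS)
{
    int n = Port.GetNumberOfInterpretations(vc);
    // retrieve interpretations here
}
```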
See Also
PROP_EX_DECODE_TIMEOUT
LVSpeechPort::Decode
LV_SRE_WaitForDecode
LVSpeechPort::SetClientPropertyEx
Sets various properties at the scope of the client process.
static int SetClientPropertyEx(int propertyname, int valuetype, void* pvalue);
Return Values
LV_SUCCESS
No errors; property is set to the value pointed to by pvalue.
LV_INVALID_PROPERTY
The property does not exist.
LV_INVALID_PROPERTY_VALUE
The property value is invalid for the designated property (e.g. out of range).
LV_INVALID_PROPERTY_TARGET
The property cannot be set for the specified target.
LV_INVALID_PROPERTY_VALUE_TYPE
The property's type is incompatible with the declared type.
Note: If more than one error occurs, which error code is returned is undefined.
Parameters
propertyname
Which property to modify.
valuetype
The value type of the property being set. Legal values are:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
PROP_EX_VALUE_TYPE_STRING
PROP_EX_VALUE_TYPE_FLOAT_PTR
Each property has a set of legal value types. See Properties.
pvalue
A pointer to the new value for propertyname. pvalue will be reinterpreted according to the value type provided.
Remarks
See Properties for a list of modifiable properties.
A client property can be modified by calling this function even before opening a port.
See Also
Properties
LV_SRE_SetPropertyEx
LVInterpretation Class
Intro To LVInterpretation
Use <LVSpeechPort.h> or <LV_SRE_Semantic.h>
Return Type Function Description
LVInterpretation (void) Constructs an LVInterpretation object.
LVInterpretation (const LVInterpretation& other) Copy constructor
LVInterpretation & operator= (const LVInterpretation& other) Assignment operator
~LVInterpretation(void) Destroys the LVInterpretation object
LVSemanticData & ResultData (void)
The result object, representing the end product of the semantic interpretation process.
const char* ResultName (void)
Returns the name of the result data for this interpretation.
const char* GrammarLabel (void)
Returns the name of the grammar as it was provided to the speech port.
const char* Mode (void) Returns the interaction mode for this answer.
const char* Language (void) Returns the language identifier for this answer.
const char* InputSentence (void) The sentence that generated this interpretation.
int Score (void) Confidence score for this interpretation.
const char* TagFormat (void) The tag format that created the Data object.
LVInterpretation: Constructing and Copying
LVInterpretation objects are fully copyable.
Functions
LVInterpretation(void)
LVInterpretation(const LVInterpretation& other_si)
LVInterpretation& operator=(const LVInterpretation& other_si)
~LVInterpretation()
Parameters
other_si
The interpretation object whose contents are being copied.
Remarks
Example
LVSpeechPort Port;
// open the port and do a decode
// ...
// when the decode is finished, grab an interpretation object
LVInterpretation Interp = Port.GetInterpretation(voicechannel, index);
// start using the interpretation data
// ...
See Also
Creating, Copying and Releasing an LVInterpretation Handle (CAPI)
ResultData
Returns a semantic data object generated by the user input and a matching grammar.
Function
const LVSemanticData& LVInterpretation::ResultData( )
Returns
An object representing the results of the semantic interpretation process.
See Also
LVSemanticData C++ API
LVInterpretation_GetResultData (C API)
ResultName
Returns the name of the result data for this interpretation. The result name is usually the root rule of the matching grammar for this interpretation.
Function
const char* LVInterpretation::ResultName ( )
See Also
LVInterpretation_GetResultName (C API)
Language
Returns the language identifier of the grammar that generated this interpretation.
Function
const char* LVInterpretation::Language( )
Returns
An RFC 3066 language identifier, such as "en-US" for United States English, or "fr" for French.
See Also
LVInterpretation_GetLanguage ( C API )
Mode
Returns the interaction mode that produced this interpretation.
Function
const char* LVInterpretation::Mode()
Returns
"voice" or "dtmf"
See Also
LVInterpretation_GetMode (C API)
TagFormat
Returns the name of the semantic process that created this interpretation.
Function
const char* LVInterpretation::TagFormat()
Returns
tag format identifier
See Also
LVInterpretation_GetTagFormat (C API)
InputSentence
Returns the input that was fed to the matching grammar to create this interpretation. It may represent the speech the Speech Engine recognized, or a dtmf sequence.
Function
const char* LVInterpretation::InputSentence()
See Also
LVInterpretation_GetInputSentence (CAPI)
GrammarLabel
Returns the name of the grammar that generated this interpretation.
Function
const char* LVInterpretation::GrammarLabel ()
Remarks
GrammarLabel will always return the name of one of the grammars you activated for decode. If the active grammar had an integer label, then the returned label will be a string representation of that integer.
See Also
LVInterpretation_GetGrammarLabel ( C API )
Score
Returns the confidence score for this interpretation.
Function
int LVInterpretation::Score()
Returns
A number between 0 and 1000. Higher numbers indicate more confidence by the speech port in this interpretation.
See Also
LVInterpretation_GetScore (C API)
LVSemanticData Class
LVSemanticData
LVSemanticData is the C++ class representing semantic data. Think of an LVSemanticData object as a container holding one of the following items:
A boolean
An integer
A floating point number
A string
A composite object
An array
Return Value Function Description
LVSemanticData( ) Constructor
LVSemanticData (const LVSemanticData& other)
Copy constructor
LVSemanticData& operator= (const LVSemanticData& other)
Assignment operator
~LVSemanticData ( ) Destructor
int Type ( ) Returns the semantic data type contained in this object.
bool GetBool ( ) If the data in this object is of type SI_TYPE_BOOL, returns the boolean value.
int GetInt ( ) If the data in this object is of type SI_TYPE_INT, returns the integer value
double GetDouble ( ) If the data in this object is of type SI_TYPE_DOUBLE, returns the floating point value.
const char* GetString ( ) If the data in this object is of type SI_TYPE_STRING, returns the string value.
LVSemanticObject GetSemanticObject ( ) If the data in this object is of type SI_TYPE_OBJECT, returns the semantic object value.
LVSemanticArray GetSemanticArray ( ) If the data in this object is of type SI_TYPE_ARRAY, returns the semantic array value.
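Putting these accessors together, a caller typically checks Type() before extracting a value. The following is a minimal sketch; the PrintValue helper is hypothetical, and it assumes the SI_TYPE_* constants from the engine headers and the accessors listed above:

```cpp
//Hypothetical helper: prints the value held by an LVSemanticData container.
//Dispatches on Type() so the matching accessor is always called.
void PrintValue(LVSemanticData Data)
{
    switch (Data.Type())
    {
    case SI_TYPE_BOOL:   cout << (Data.GetBool() ? "true" : "false") << endl; break;
    case SI_TYPE_INT:    cout << Data.GetInt() << endl;    break;
    case SI_TYPE_DOUBLE: cout << Data.GetDouble() << endl; break;
    case SI_TYPE_STRING: cout << Data.GetString() << endl; break;
    default: break; //objects and arrays require recursive handling
    }
}
```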
Type
Returns the data type contained in a given LVSemanticData object.
Function
int LVSemanticData::Type( )
Return Value
One of the seven semantic data types: SI_TYPE_NULL, SI_TYPE_BOOL, SI_TYPE_INT, SI_TYPE_DOUBLE, SI_TYPE_STRING, SI_TYPE_OBJECT, or SI_TYPE_ARRAY.
GetBool
Returns a boolean value contained in an LVSemanticData object. This function assumes that the object contains data of type SI_TYPE_BOOL. If the user calls this function when its type is not SI_TYPE_BOOL, the function always returns false.
Function
bool LVSemanticData::GetBool( )
Return Values
A boolean value.
GetInt
Returns the integer value contained in a given semantic data object. This function assumes that the data contained is of type SI_TYPE_INT. If it is not, this function always returns 0.
Function
int LVSemanticData::GetInt( )
Return Values
An integer value.
GetDouble
Returns a double precision floating point value contained in the given semantic data object. This function assumes that the contained data is of type SI_TYPE_DOUBLE . If it is not, this function always returns 0.0.
Function
double LVSemanticData::GetDouble( )
Return Values
A double.
GetString
Returns the string contained in a given LVSemanticData object. This function assumes that the contained data is of type SI_TYPE_STRING. If it is not, this function always returns NULL.
Function
const char* LVSemanticData::GetString( )
Return Values
NULL
Either the contained data is not of type SI_TYPE_STRING, or some error occurred.
Other
A pointer to a buffer containing the string.
GetSemanticObject
If the LVSemanticData object contains an element of type SI_TYPE_OBJECT, this function returns the composite object. Otherwise, it returns an empty object.
Function
LVSemanticObject LVSemanticData::GetSemanticObject ( );
Returns
A semantic object
See Also
LVSemanticObject C++ API
GetSemanticArray
If the LVSemanticData object contains an element of type SI_TYPE_ARRAY, this function returns the array object. Otherwise, it returns an empty array object.
Function
LVSemanticArray LVSemanticData::GetSemanticArray ( );
Returns
A semantic array
See Also
LVSemanticArray C++ API
LVSemanticObject Class
LVSemanticObject
LVSemanticObject represents a composite object. The user can get an LVSemanticObject by calling LVSemanticData::GetSemanticObject().
Return Types Functions Description
LVSemanticObject() Constructor
LVSemanticObject(const LVSemanticObject & other)
Copy constructor
~LVSemanticObject() Destructor
LVSemanticObject& operator = (const LVSemanticObject & other)
Assignment operator
int NumberOfProperties() Returns the number of properties in this object.
const char* PropertyName (int index)
Returns the property name corresponding to index.
LVSemanticData
PropertyValue(const char* property_name) PropertyValue(int index)
Returns the semantic data corresponding to property_name or index.
bool PropertyExists(const char* property_name)
If this object has a property named property_name, this method returns true, otherwise false.
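As a sketch of enumerating a composite object (assuming Data is an LVSemanticData whose Type() is SI_TYPE_OBJECT):

```cpp
LVSemanticObject Obj = Data.GetSemanticObject();
for (int i = 0; i < Obj.NumberOfProperties(); ++i)
{
    //print each property name; check the value's Type() before reading it
    cout << Obj.PropertyName(i) << endl;
    LVSemanticData Value = Obj.PropertyValue(i);
}
```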
NumberOfProperties
Returns the number of properties in this LVSemanticObject
Function
int LVSemanticObject::NumberOfProperties ( )
PropertyName
Returns the name of the ith property (member data) in this object.
Function
const char* LVSemanticObject::PropertyName(int i)
Parameter
i
An index between 0 and NumberOfProperties - 1
PropertyValue
Returns a property (member data) of this object.
Functions
LVSemanticData LVSemanticObject::PropertyValue(const char *property_name)
LVSemanticData LVSemanticObject::PropertyValue(int property_index)
Return Values
Returns a semantic data object. The first returns the object named property_name. The second returns the object corresponding to PropertyName(property_index)
Parameters
property_index
A number between 0 and NumberOfProperties - 1
property_name
A string containing the property name.
PropertyExists
Function
bool LVSemanticObject::PropertyExists(const char *property_name)
Return Values
Returns true if there exists a property of this object named property_name.
Parameters
property_name
A property name.
LVSemanticArray Class
LVSemanticArray
LVSemanticArray represents an array type. You can get an array out of a data type container by calling LVSemanticData::GetSemanticArray().
Return Values Functions Description
LVSemanticArray() Constructor
LVSemanticArray(const LVSemanticArray& other) Copy constructor
LVSemanticArray&
operator=(const LVSemanticArray& other) Assignment Operator
~LVSemanticArray() Destructor
int Size() Returns the number of elements in this array.
LVSemanticData operator [] (int Index)
Return the semantic data indicated by the index. If the Index does not exist, the returned semantic data will have type SI_TYPE_NULL.
LVSemanticData
At(int Index)
Return the semantic data indicated by the index. If the Index does not exist, the returned semantic data will have type SI_TYPE_NULL.
Size
Returns the size of an LVSemanticArray.
Function
int LVSemanticArray::Size( )
Operator [ ] or At
Access elements in an LVSemanticArray the way you would a conventional array.
Functions
LVSemanticData LVSemanticArray::operator [] (int index)
LVSemanticData LVSemanticArray::At(int index)
Example
LVSemanticData myData = myArray[6];
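A loop over the whole array can be sketched with Size() and At() (assuming myArray is a populated LVSemanticArray):

```cpp
for (int i = 0; i < myArray.Size(); ++i)
{
    //At returns data of type SI_TYPE_NULL for an out-of-range index,
    //but this loop stays within bounds
    LVSemanticData Element = myArray.At(i);
}
```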
LVParseTree Class
LVParseTree Class
An LVParseTree object represents the results of a decode using a context free grammar.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
See Also Using the Parse Tree Tutorial
Return Type Function Description
LVParseTree(void) Constructs an LVParseTree object.
LVParseTree(const LVParseTree& other) Copy constructor
LVParseTree& operator=(const LVParseTree& other) Assignment operator
~LVParseTree(void) Destroys the LVParseTree object
LVParseTree::Node Root (void) Provides access to the root node of the parse tree.
LVParseTree::Iterator Begin (void)
Provides an iterator that walks each node in the tree in a top-to-bottom, left-to-right fashion
LVParseTree::Iterator End (void) Marks the end of traversal for the parse tree iterator
LVParseTree::TerminalIterator TerminalsBegin (void) Traverses the terminals of the parse tree (words).
LVParseTree::TerminalIterator TerminalsEnd (void) Marks the end of traversal for the TerminalIterator.
LVParseTree::TagIterator TagsBegin (void) Traverses the tags in the parse tree (semantic data).
LVParseTree::TagIterator TagsEnd (void) Marks the end of traversal for the TagIterator.
const char* TagFormat (void)
Returns the tag format, as described by the grammar that this tree matched (e.g. "lumenvox/1.0" or "semantics/1.0")
int NumberOfTagsInHeader (void)
Returns the number of tags (semantic data) that were defined in the matching grammar's header.
const char* HeaderTag (int i) Returns the ith header tag from the matching grammar.
const char* GrammarLabel (void) Returns the name of the grammar as it was provided to the speech port.
const char* Mode (void) "voice" or "dtmf"
const char* Language (void)
Returns the language of the matching grammar (e.g. "en-US" or "es-MX")
Methods
LVParseTree Construction, Assignment and Destruction
LVParseTree objects are fully copyable and assignable.
Functions
LVParseTree()
LVParseTree(const LVParseTree& Other)
LVParseTree& operator = (const LVParseTree& Other)
~LVParseTree()
Parameters
Other
The LVParseTree object being copied
Remarks
You shouldn't have to worry too much about construction or destruction of an LVParseTree object. When you declare an LVParseTree, an empty tree is created. Just set it equal to the results of a decode, and begin using it.
Example
LVSpeechPort Port;
//open the port and do a decode
//...
//when the decode is finished, grab a parse tree from the speech port
LVParseTree Tree = Port.GetParseTree(voicechannel, index);
//start using the tree. It is valid as long as it is in scope.
See Also
Creating and Releasing an LVParseTree Handle (C API)
LVParseTree::GrammarLabel
Returns the name of the grammar that generated this tree.
Function
const char* GrammarLabel( )
Remarks
GrammarLabel( ) will always return the name of one of the grammars you activated for decode. It will be the name of the grammar that matched the speaker's input, according to the engine. If the active grammar had an integer label, then the returned label will be a string representation of that integer.
See Also
LVParseTree_GetGrammarLabel ( C API )
LVParseTree::Language
Returns the language identifier of the grammar that generated this tree.
Function
const char* Language()
Returns
An RFC 3066 language identifier, such as "en-US" for United States English, or "fr" for French.
See Also
LVParseTree_GetLanguage ( C API )
LVParseTree::Mode
Returns the interaction mode that created the tree.
Function
const char* Mode(void)
Returns
"voice" or "dtmf"
See Also
LVParseTree_GetMode (C API)
LVParseTree::TagFormat
Returns the name of the tag format declared in the matching grammar for this tree.
Function
const char* TagFormat(void)
See Also
LVParseTree_GetTagFormat (C API)
LVParseTree::Root
Gets the root parse tree node.
Function
LVParseTree::Node Root();
Return Values
An LVParseTree::Node object representing the top-level rule of the matching grammar.
Remarks
This node will always be a rule node (i.e. it will always satisfy Tree.Root().IsRule() == true). If the matching grammar specified a root rule, then this node will always represent that rule.
See Also
LVParseTree_GetRoot ( C API )
LVParseTree::Begin and LVParseTree::End
Begin and End provide iterators for visiting every node in the tree in a top-to-bottom, left-to-right descent. This traversal is the basis for the Tag and Terminal iterators.
Functions
LVParseTree::Iterator Begin ()
LVParseTree::Iterator End ()
Example
The following code prints out every node in a parse tree.
LVParseTree::Iterator Itr = Tree.Begin();
LVParseTree::Iterator End = Tree.End();
for (; Itr != End; Itr++)
{
    for (int i = 0; i < Itr->Level(); ++i)
        cout << "\t";
    if (Itr->IsRule())
        cout << "$" << Itr->RuleName() << ":" << endl;
    if (Itr->IsTag())
        cout << "{" << Itr->Text() << "}" << endl;
    if (Itr->IsTerminal())
        cout << "\"" << Itr->Text() << "\"" << endl;
}
If the grammar was the top level navigation example grammar, and the engine recognized "go back", the above code would print out:
$directive:
    "go"
    "back"
    {$ = "APPLICATION_BACK"}
See Also
LVParseTree_GetIteratorBegin and LVParseTree_GetIteratorEnd (C API)
LVParseTree::TerminalsBegin and LVParseTree::TerminalsEnd
TerminalsBegin and TerminalsEnd provide access to the "terminals" of the tree. Terminals are the words and phrases in your grammar, so a TerminalIterator gives you access to the exact words the engine heard a speaker say to match a grammar, in the order that the engine heard those words.
Functions
LVParseTree::TerminalIterator TerminalsBegin()
LVParseTree::TerminalIterator TerminalsEnd()
Example
The following code prints out the sentence the engine heard, with a word-level confidence score attached to each word.
LVParseTree::TerminalIterator Itr = Tree.TerminalsBegin();
LVParseTree::TerminalIterator End = Tree.TerminalsEnd();
for (; Itr != End; ++Itr)
{
    cout << "\"" << Itr->Text() << "\":(" << Itr->Score() << ") ";
}
cout << endl;
So if the grammar being used was the top level navigation example grammar, and the engine recognized "go back", then the output of the above code might look like:
"go":(850) "back":(901)
See Also
LVParseTree_GetTerminalIteratorBegin and LVParseTree_GetTerminalIteratorEnd (C API)
LVParseTree::TagsBegin and LVParseTree::TagsEnd
TagsBegin and TagsEnd provide iterators for visiting the tags in the tree's body.
Functions
LVParseTree::TagIterator TagsBegin ()
LVParseTree::TagIterator TagsEnd ()
Example
The following code prints out every tag in a parse tree.
LVParseTree::TagIterator Itr = Tree.TagsBegin();
LVParseTree::TagIterator End = Tree.TagsEnd();
for (; Itr != End; Itr++)
{
    cout << Itr->Text() << ";" << endl;
}
If the grammar was the top level navigation example grammar, and the engine recognized "go back", the above code would print out:
$ = "APPLICATION_BACK";
Remark
The TagIterator does not visit the tags in a tree's header. Use LVParseTree::HeaderTag to access the contents of those tags.
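The header tags can be read separately with NumberOfTagsInHeader and HeaderTag, as in this sketch (assuming Tree is a populated LVParseTree):

```cpp
//Print the tags declared in the matching grammar's header.
for (int i = 0; i < Tree.NumberOfTagsInHeader(); ++i)
{
    cout << Tree.HeaderTag(i) << endl;
}
```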
See Also
LVParseTree_GetTagIteratorBegin and LVParseTree_GetTagIteratorEnd (C API)
LVParseTree Inner Classes
LVParseTree::Node
An LVParseTree is made out of Node objects. Each node represents a word, rule, or tag that was seen by the engine as it decoded an utterance against the matching grammar.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
Node(void) Constructs an empty node.
Node(const Node& other)
Copy constructor
LVParseTree::Node& operator=(const Node& other)
Assignment operator
~Node(void) destructor
LVParseTree::Node Parent (void)
Provides access to the parent node of this node. Note: the tree's root node has an empty parent.
LVParseTree::ChildrenIterator ChildrenBegin (void)
Traverses the immediate children of this node.
LVParseTree::ChildrenIterator ChildrenEnd (void) Marks the end of traversal for the ChildrenIterator.
LVParseTree::Iterator SubTreeBegin (void)
Provides an iterator that walks each node in the sub tree rooted by this node in a top-to-bottom, left-to-right fashion.
LVParseTree::Iterator SubTreeEnd (void) Marks the end of traversal for the parse tree iterator
LVParseTree::TerminalIterator TerminalsBegin (void)
Traverses the terminals(words) of the subtree rooted by this node.
LVParseTree::TerminalIterator TerminalsEnd (void)
Marks the end of traversal for the TerminalIterator.
LVParseTree::TagIterator TagsBegin (void)
Traverses the tags (semantic data) in the subtree rooted by this node.
LVParseTree::TagIterator TagsEnd (void) Marks the end of traversal for the TagIterator.
bool IsRule (void)
Returns true if this node represents a matched rule in a grammar. Note: rule nodes are the only nodes that can have children. The children of a rule node match the right hand side of the grammar rule that is represented by this node.
bool IsTerminal (void)
Returns true if this node represents a terminal (word) in a grammar. Note: the parent of a terminal node is always a rule in the matching grammar that contains this terminal.
bool IsTag (void)
Returns true if this node represents a tag (semantic data) in a grammar. Note: the parent of a tag node is always a rule in the matching grammar that contains this tag.
const char* Text (void)
For a rule node, this is the partial sentence that caused the rule to match. For a terminal node, this is the word that the node represents. For a tag node, this is the tag data.
const char* Phonemes (void)
For a rule node, this is the phonetic pronunciation of the partial sentence that caused the rule to match. For a terminal node, this is the phonetic pronunciation of the word that was spoken. For a tag node, this is empty.
const char* RuleName (void)
For a rule node, this is the name of the rule being represented. For a tag or terminal node, this is the name of the node's parent.
int Score (void)
For a rule node, this is the confidence of the rule being matched. For a terminal node, this is the confidence of the word being spoken. For a tag node, this is the parent rule's score.
int StartTime (void)
For a rule node, this is the start time of the first word that matched this rule (elapsed time from the start of the utterance, in milliseconds). For a terminal node, this is the start time of the word. For a tag node, this is the start time of the first word after the tag (equivalently, the end time of the last word before the tag).
int EndTime (void)
For a rule node, this is the end time of the last word that matched this rule (elapsed time from the start of the utterance, in milliseconds). For a terminal node, this is the end time of the word. For a tag node, this is the start time of the first word after the tag (equivalently, the end time of the last word before the tag).
LVParseTree::Iterator
An LVParseTree::Iterator object traverses a parse tree in a top-to-bottom, left-to-right fashion (sometimes called a pre-order traversal).
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
Iterator(void) Constructs a blank Iterator; it does not point at anything.
Iterator(const Iterator& other)
Copy constructor.
LVParseTree::Iterator& operator=(const Iterator& other)
Assignment operator.
~Iterator(void) Destructor.
LVParseTree::Iterator& operator ++ (void) pre-increments the iterator (++itr).
LVParseTree::Iterator operator ++ (int) post-increments the iterator (itr++).
const LVParseTree::Node* operator -> (void)
provides pointer-like access to the node the iterator is currently over ( e.g const char* text = itr->Text( ) )
const LVParseTree::Node& operator * (void)
provides access to the node the iterator is currently over (e.g. LVParseTree::Node n = *itr )
bool operator == (const Iterator& other)
Tests equality with another Iterator. Two Iterators are equal if they are pointing to the same node in the same tree.
bool operator != (const Iterator& other)
returns true if and only if the equality operator returns false.
LVParseTree::ChildrenIterator
An LVParseTree::ChildrenIterator Object traverses the immediate children of a rule node, from left to right. You get a ChildrenIterator object from a Node by calling
LVParseTree::Node::ChildrenBegin( )
and
LVParseTree::Node::ChildrenEnd( )
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
ChildrenIterator(void) Constructs a blank ChildrenIterator; it does not point at anything.
ChildrenIterator(const ChildrenIterator& other) Copy constructor.
LVParseTree::ChildrenIterator& operator=(const ChildrenIterator& other) Assignment operator.
~ChildrenIterator(void) Destructor.
LVParseTree::ChildrenIterator& operator ++ (void) pre-increments the iterator (++itr).
LVParseTree::ChildrenIterator operator ++ (int) post-increments the iterator (itr++).
const LVParseTree::Node* operator -> (void) provides pointer-like access to the node the iterator is currently over ( e.g const char* text = itr->Text( ) )
const LVParseTree::Node& operator * (void)
provides access to the node the iterator is currently over (e.g. LVParseTree::Node n = *itr )
bool operator==(const ChildrenIterator& other)
Tests equality with another ChildrenIterator. Two ChildrenIterators are equal if they are pointing to the same node in the same tree. (e.g if itr1 == itr2 do something)
bool operator!=(const ChildrenIterator& other)
returns true if and only if the equality operator returns false.
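As a sketch (assuming Tree is a populated LVParseTree as in the earlier examples), the immediate children of the root rule can be visited like this:

```cpp
LVParseTree::Node Root = Tree.Root();
LVParseTree::ChildrenIterator Itr = Root.ChildrenBegin();
LVParseTree::ChildrenIterator End = Root.ChildrenEnd();
for (; Itr != End; ++Itr)
{
    //each child is a word, tag, or sub-rule on the right hand side
    //of the root rule
    cout << Itr->Text() << endl;
}
```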
LVParseTree::TerminalIterator
An LVParseTree::TerminalIterator object is an adaptation of the standard LVParseTree::Iterator. It only visits the nodes in a tree that are terminals. You get a TerminalIterator by calling:
LVParseTree::Node::TerminalsBegin( ) LVParseTree::Node::TerminalsEnd( )
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
TerminalIterator(void)
Constructs a blank TerminalIterator; it does not point at anything.
TerminalIterator(const TerminalIterator& other)
Copy constructor.
LVParseTree::TerminalIterator& operator=(const TerminalIterator& other)
Assignment operator.
~TerminalIterator(void) Destructor.
LVParseTree::TerminalIterator& operator ++ (void) pre-increments the iterator (++itr).
LVParseTree::TerminalIterator operator ++ (int) post-increments the iterator (itr++).
const LVParseTree::Node* operator -> (void)
provides pointer-like access to the node the iterator is currently over ( e.g const char* text = itr->Text( ) )
const LVParseTree::Node& operator * (void)
provides access to the node the iterator is currently over (e.g. LVParseTree::Node n = *itr )
bool operator==(const TerminalIterator& other)
Tests equality with another TerminalIterator. Two TerminalIterators are equal if they are pointing to the same node in the same tree. (e.g if itr1 == itr2 do something)
bool operator!=(const TerminalIterator& other)
returns true if and only if the equality operator returns false.
LVParseTree::TagIterator
An LVParseTree::TagIterator object is an adaptation of the standard LVParseTree::Iterator. It only visits the nodes in a tree that are tags. You get a TagIterator by calling:
LVParseTree::Node::TagsBegin( ) LVParseTree::Node::TagsEnd( )
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
TagIterator(void) Constructs a blank TagIterator; it does not point at anything.
TagIterator(const TagIterator& other)
Copy constructor.
LVParseTree::TagIterator& operator=(const TagIterator& other)
Assignment operator.
~TagIterator(void) Destructor.
LVParseTree::TagIterator& operator ++ (void) pre-increments the iterator (++itr).
LVParseTree::TagIterator operator ++ (int) post-increments the iterator (itr++).
const LVParseTree::Node* operator -> (void) provides pointer-like access to the node the iterator is currently over ( e.g const char* text = itr->Text( ) )
const LVParseTree::Node&
operator * (void)
provides access to the node the iterator is currently over (e.g. LVParseTree::Node n = *itr )
bool operator==(const TagIterator& other)
Tests equality with another TagIterator. Two TagIterators are equal if they are pointing to the same node in the same tree. (e.g if itr1 == itr2 do something)
bool operator!=(const TagIterator& other)
returns true if and only if the equality operator returns false.
LVGrammar Class
class LVGrammar
An LVGrammar object represents a context-free grammar that can be used in the Speech Engine to recognize speech. An LVGrammar object can also be used to test the functionality of a grammar by processing transcripts.
Use <LVSpeechPort.h> or <LV_SRE_Grammar.h>
Return Type Function Description
LVGrammar (void) Constructs an LVGrammar object.
LVGrammar (GrammarLogCB log, void* userdata)
Constructs an LVGrammar object, with an initial logging function.
LVGrammar (const LVGrammar& other)
Copy constructor.
~LVGrammar (void) Destroys the LVGrammar object.
LVGrammar& operator = (const LVGrammar& other) Assignment operator
void
RegisterLoggingCallback (GrammarLogCB log, void* userdata)
Registers a callback so the object can report warnings and errors to the grammar author.
int Reset (void) Resets a grammar object.
int SaveCompiledGrammar (const char* filename)
Save the grammar object to a binary file.
int LoadCompiledGrammar (const char* filename)
Load the grammar object from a binary file
HGRAMMAR GetHGrammar (void) Returns the underlying object handle.
int LoadGrammar (const char* location)
Loads a grammar from the location specified by the location argument.
int LoadGrammarFromBuffer (const char* contents)
Loads a grammar from a null terminated string containing the contents of the grammar.
int AddRule (const char* rulename, const char* definition)
Inserts a new rule into the grammar.
int RemoveRule (const char* rulename)
Removes a rule from the grammar.
int SetRoot (const char* rulename) Sets a starting rule for the grammar.
void SetMode (const char* mode)
Declare the mode of grammar (the style of decode to be processed). Legal arguments are "voice" or "dtmf".
const char* GetMode (void) Return the interaction mode of the grammar.
void SetLanguage (const char* language)
Specify the language of this grammar as a language/country code pair. Legal arguments include "en-US" and "es-MX".
const char* GetLanguage (void)
Return the language setting of the grammar.
void SetTagFormat (const char* tag_format)
Identify the tag format of the grammar. To use the LumenVox semantic interpretation, the tag format must be "lumenvox/1.0" or "semantics/1.0".
const char* GetTagFormat (void)
Return the tag format setting of the grammar.
int GetNumberOfMetaData (void)
Returns the number of meta data entries in the grammar.
const char* GetMetaDataKey (int index)
Returns the key of the meta data entry with the specified index.
const char* GetMetaDataValue (int index)
Returns the value of the meta data entry with the specified index.
int ParseSentence (const char* sentence)
Use the grammar to parse a sentence.
int NumberOfParses (void)
Returns the number of parses created by the most recent ParseSentence call.
LVParseTree GetParseTree (int index)
Returns the parse tree object created with a specified index
int InterpretParses (void)
Generates interpretations from the parse trees created by the most recent ParseSentence call.
int GetNumberOfInterpretations (void)
Returns the number of interpretations created by the most recent InterpretParses call.
LVInterpretation GetInterpretation (int index)
Returns the semantic interpretation with the specified index
Methods
LVGrammar Constructor/Destructor
Functions
LVGrammar()
LVGrammar(GrammarLogCB log, void* userdata)
LVGrammar(const LVGrammar& other)
~LVGrammar()
Parameters
log
Error/warning reporting callback function pointer.
userdata
A pointer to user-defined data that will be passed into the callback function.
other
Existing grammar object.
Remarks
The callback function must have the signature defined by GrammarLogCB.
See Also
LVGrammar_Create (C API)
LVGrammar_CreateFromCopy (C API)
LVGrammar_Release (C API)
LVGrammar::operator =
Assignment operator.
Function
LVGrammar& operator = (const LVGrammar& other)
Parameters
other
Existing grammar object.
See Also
LVGrammar_Copy (C API)
LVGrammar::RegisterLoggingCallback
Registers a callback so the object can report warnings and errors to the grammar author via the callback function.
Function
void RegisterLoggingCallback (GrammarLogCB log, void* userData)
Parameters
log
The logging callback function pointer.
userdata
A pointer to user-defined data associated with the grammar object. It will be passed into the callback function.
Remarks
The callback function must have the signature defined by GrammarLogCB.
See Also
LVGrammar_RegisterLoggingCallback (C API)
LVGrammar::Reset
Reset a grammar object.
Function
int Reset (void)
Return Values
LV_SUCCESS
LV_FAILURE
See Also
LVGrammar_Reset (C API)
LVGrammar::SaveCompiledGrammar
Save a grammar object to a binary file.
Function
int SaveCompiledGrammar (const char* filename)
Parameters
filename
File name.
Return Values
LV_SUCCESS
LV_FAILURE
Remarks
The saved compiled grammar can later be loaded into a grammar object with LVGrammar::LoadCompiledGrammar.
See Also
LVGrammar::LoadCompiledGrammar
LVGrammar_SaveCompiledGrammar (C API)
LVGrammar::LoadCompiledGrammar
Load a grammar object from a binary file previously saved by LVGrammar::SaveCompiledGrammar.
Function
int LoadCompiledGrammar (const char* filename)
Parameters
filename
File name.
Return Values
LV_SUCCESS
LV_FAILURE
See Also
LVGrammar::SaveCompiledGrammar
LVGrammar_LoadCompiledGrammar (C API)
LVGrammar::GetHGrammar
Returns the underlying grammar object handle.
Function
HGRAMMAR GetHGrammar (void)
Return Values
A pointer to the underlying grammar object.
Remarks
The LVGrammar class is a thin wrapper around the grammar object handle HGRAMMAR.
See Also
HGRAMMAR
LVGrammar::LoadGrammar
Loads a grammar from a local file, or from a remote file via HTTP or FTP. The grammar can be written in ABNF or XML notation.
Function
int LoadGrammar(const char* grammar_location)
Parameters
grammar_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2"
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar_LoadGrammar (C API)
LVGrammar::LoadGrammarFromBuffer
Loads a grammar from a null-terminated string buffer. The grammar can be written in ABNF or XML notation.
Function
int LoadGrammarFromBuffer(const char* grammar_contents);
Parameters
grammar_contents
A null-terminated string containing the contents of a valid SRGS grammar.
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar_LoadGrammarFromBuffer (C API)
LVGrammar::AddRule
Add a rule to a grammar object.
Function
int AddRule(const char* rule_name, const char* rule_definition)
Parameters
rule_name
The name of the rule
rule_definition
The definition of the rule
Return Values
LV_SUCCESS
No errors; the rule has been successfully added.
LV_GRAMMAR_SYNTAX_WARNING
The new rule was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The new rule was not understandable to the grammar compiler. You will not be able to decode with this grammar.
Example
grammar.AddRule("foo", "hello [world]");
Is the same as writing a rule:
$foo = hello [world];
Remarks
New rules must be written in ABNF notation. Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar::RemoveRule
LVGrammar_AddRule (C API)
LVGrammar::RemoveRule
Remove a rule from a grammar object.
Function
int RemoveRule(const char* rule_name)
Parameters
rule_name
The name of the rule
Return Values
LV_SUCCESS
No errors; the rule has been successfully removed.
LV_GRAMMAR_SYNTAX_WARNING
The resulting grammar was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The resulting grammar was not understandable to the grammar compiler. You will not be able to decode with this grammar.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar::AddRule
LVGrammar_RemoveRule (C API)
LVGrammar::SetRoot
Identifies one of the grammar rules as the root rule. The root rule is where the engine starts its search.
Function
int SetRoot(const char* rule_name)
Parameters
rule_name
The name of the rule.
Example
grammar.SetRoot("foo");
Is the same as writing in a grammar:
root $foo;
See Also
LVGrammar_SetRoot (C API)
LVGrammar::SetMode
Set the mode property for the grammar.
Function
int SetMode(const char* mode)
Parameters
mode
The interaction mode of the grammar.
Example
grammar.SetLanguage("en-US");
grammar.SetMode("voice");
grammar.SetTagFormat("lumenvox/1.0");
Is the same as writing in your grammar:
language "en-US";
mode "voice";
tag-format <lumenvox/1.0>;
See Also
LVGrammar::GetMode
LVGrammar_SetMode (C API)
LVGrammar::SetLanguage
Set the language for the grammar.
Function
int SetLanguage(const char* language)
Parameters
language
The language identifier for the grammar
Example
grammar.SetLanguage("en-US");
grammar.SetMode("voice");
grammar.SetTagFormat("lumenvox/1.0");
Is the same as writing in your grammar:
language "en-US";
mode "voice";
tag-format <lumenvox/1.0>;
See Also
LVGrammar::GetLanguage
LVGrammar_SetLanguage (C API)
LVGrammar::SetTagFormat
Set the interpretation tag format of the grammar.
Function
int SetTagFormat(const char* tag_format)
Parameters
tag_format
The grammar's tag format.
Example
grammar.SetLanguage("en-US");
grammar.SetMode("voice");
grammar.SetTagFormat("lumenvox/1.0");
Is the same as writing in your grammar:
language "en-US";
mode "voice";
tag-format <lumenvox/1.0>;
See Also
LVGrammar::GetTagFormat
LVGrammar_SetTagFormat (C API)
LVGrammar::GetMode
Return the mode setting for the grammar.
Function
const char* GetMode(void)
Return Values
The interaction mode of the grammar.
See Also
LVGrammar::SetMode
LVGrammar_GetMode (C API)
LVGrammar::GetLanguage
Return the language setting for the grammar.
Function
const char* GetLanguage(void)
Return Values
The language identifier of the grammar.
See Also
LVGrammar::SetLanguage
LVGrammar_GetLanguage (C API)
LVGrammar::GetTagFormat
Return the interpretation tag format setting for the grammar.
Function
const char* GetTagFormat(void)
Return Values
The tag format of the grammar.
See Also
LVGrammar::SetTagFormat
LVGrammar_GetTagFormat (C API)
LVGrammar::GetNumberOfMetaData
Return the number of metadata entries contained in the grammar.
Function
int GetNumberOfMetaData(void)
Example
If the grammar has following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = grammar.GetNumberOfMetaData(); // returns 2
const char* key = grammar.GetMetaDataKey(0); // returns "description"
const char* value = grammar.GetMetaDataValue(1); // returns "05/12/2005"
See Also
LVGrammar::GetMetaDataKey
LVGrammar::GetMetaDataValue
LVGrammar_GetNumberOfMetaData (C API)
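Assuming only the three calls documented above, a bounds-checked loop over all metadata entries might look like the following sketch. MockGrammar is a hypothetical stand-in used so the example is self-contained; a real application would call these methods on a loaded LVGrammar from the SDK.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an LVGrammar that has parsed two meta
// declarations; it only mirrors the three documented calls.
struct MockGrammar {
    std::vector<std::pair<std::string, std::string>> meta{
        {"description", "example grammar"}, {"date", "05/12/2005"}};
    int GetNumberOfMetaData() const { return static_cast<int>(meta.size()); }
    const char* GetMetaDataKey(int i) const {
        return (i >= 0 && i < GetNumberOfMetaData()) ? meta[i].first.c_str() : nullptr;
    }
    const char* GetMetaDataValue(int i) const {
        return (i >= 0 && i < GetNumberOfMetaData()) ? meta[i].second.c_str() : nullptr;
    }
};

// Collect every key/value pair, staying inside [0, GetNumberOfMetaData).
std::vector<std::pair<std::string, std::string>> AllMetaData(const MockGrammar& g) {
    std::vector<std::pair<std::string, std::string>> out;
    for (int i = 0; i < g.GetNumberOfMetaData(); ++i)
        out.emplace_back(g.GetMetaDataKey(i), g.GetMetaDataValue(i));
    return out;
}
```

Keeping the index inside the documented half-open range avoids the null returns described for GetMetaDataKey and GetMetaDataValue.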
LVGrammar::GetMetaDataKey
Return the key of the metadata entry indicated by the index.
Function
const char* GetMetaDataKey(int index)
Parameters
index
Index of the metadata entry. It should be in the range [0, LVGrammar::GetNumberOfMetaData).
Return Values
null
The index is not valid.
non-null
A pointer to the key string.
Example
If the grammar has following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = grammar.GetNumberOfMetaData(); // returns 2
const char* key = grammar.GetMetaDataKey(0); // returns "description"
const char* value = grammar.GetMetaDataValue(1); // returns "05/12/2005"
See Also
LVGrammar::GetNumberOfMetaData
LVGrammar::GetMetaDataValue
LVGrammar_GetMetaDataKey (C API)
LVGrammar::GetMetaDataValue
Return the value of the metadata entry indicated by the index.
Function
const char* GetMetaDataValue(int index)
Parameters
index
Index of the metadata entry. It should be in the range [0, LVGrammar::GetNumberOfMetaData).
Return Values
null
The index is not valid.
non-null
A pointer to the value string.
Example
If the grammar has following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = grammar.GetNumberOfMetaData(); // returns 2
const char* key = grammar.GetMetaDataKey(0); // returns "description"
const char* value = grammar.GetMetaDataValue(1); // returns "05/12/2005"
See Also
LVGrammar::GetNumberOfMetaData
LVGrammar::GetMetaDataKey
LVGrammar_GetMetaDataValue (C API)
LVGrammar::ParseSentence
Use a loaded grammar object to parse a sentence.
Function
int ParseSentence(const char* sentence)
Parameters
sentence
The sentence to parse.
Return Values
0
The sentence is not covered by the grammar.
non-0
The number of distinct parses.
Example
Assume a grammar was defined as:
root $yes_no;
$yes_no = $yes | $no;
$yes = yes [please];
$no = no [thank you];
You can use this grammar to validate sentences as follows:
int count1 = grammar.ParseSentence("no thank you"); // returns 1
int count2 = grammar.ParseSentence("no thanks"); // returns 0
Remarks
With this function, you can identify how well a grammar covers your targeted transcript set.
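The coverage check described above can be sketched as a loop over a transcript set. YesNoGrammar here is a hypothetical stand-in that hard-codes the yes/no grammar from the example, so the sketch is self-contained; a real application would call ParseSentence on a loaded LVGrammar.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded yes/no grammar; the real
// LVGrammar::ParseSentence is provided by the SDK.
struct YesNoGrammar {
    int ParseSentence(const std::string& s) const {
        return (s == "yes" || s == "yes please" ||
                s == "no"  || s == "no thank you") ? 1 : 0;
    }
};

// Fraction of transcripts the grammar covers (parse count > 0).
double Coverage(const YesNoGrammar& g, const std::vector<std::string>& transcripts) {
    if (transcripts.empty()) return 0.0;
    int covered = 0;
    for (const auto& t : transcripts)
        if (g.ParseSentence(t) > 0) ++covered;
    return static_cast<double>(covered) / transcripts.size();
}
```

A low coverage number usually means the grammar needs more optional fillers or alternate phrasings for the utterances callers actually produce.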
See Also
LVGrammar::GetNumberOfParses
LVGrammar::GetParseTree
LVGrammar_ParseSentence (C API)
LVGrammar::GetNumberOfParses
Return the number of parses created by the most recent call of LVGrammar::ParseSentence.
Function
int GetNumberOfParses(void)
Return Values
0
The sentence is not covered by the grammar.
non-0
The number of distinct parses.
Remarks
This function can be used after a call to LVGrammar::ParseSentence. It is provided as a convenience; it returns the same value as LVGrammar::ParseSentence.
See Also
LVGrammar::ParseSentence
LVGrammar::GetParseTree
LVGrammar_GetNumberOfParses (C API)
LVGrammar::GetParseTree
Return the parse tree object with the specified index.
Function
LVParseTree GetParseTree(int index)
Parameters
index
The index of the parse tree handle to be returned. It should be in the range [0, LVGrammar::GetNumberOfParses).
Return Values
null
The index is not valid.
non-null
The parse tree handle.
Remarks
This function should be used after a call to LVGrammar::ParseSentence.
See Also
LVGrammar::ParseSentence
LVGrammar::GetNumberOfParses
LVGrammar_CreateParseTree (C API)
LVGrammar::InterpretParses
Generate semantic interpretation results from parses created by previous calls to LVGrammar::ParseSentence.
Function
int InterpretParses(void)
Return Values
integer (>=0)
Number of available interpretations.
Remarks
Before calling this function, you must first call LVGrammar::ParseSentence on the grammar object; otherwise, the grammar object does not contain any parse tree information.
See Also
LVGrammar::ParseSentence
LVGrammar::GetNumberOfInterpretations
LVGrammar::GetInterpretation
LVGrammar_InterpretParses (C API)
LVGrammar::GetNumberOfInterpretations
Return the number of semantic interpretations created by the most recent call to LVGrammar::InterpretParses.
Function
int GetNumberOfInterpretations(void)
Return Values
integer (>=0)
Number of available interpretations.
Remarks
This function can be used after a call to LVGrammar::InterpretParses. It is provided as a convenience; it returns the same value as LVGrammar::InterpretParses.
See Also
LVGrammar::InterpretParses
LVGrammar::GetInterpretation
LVGrammar_GetNumberOfInterpretations (C API)
LVGrammar::GetInterpretation
Returns the semantic interpretation handle indicated by the index.
Function
LVInterpretation GetInterpretation (int index)
Parameters
index
The index of the interpretation handle to be returned. It should be in the range [0, LVGrammar::GetNumberOfInterpretations).
Return Values
null
The index is not valid.
non-null
The interpretation handle.
Remarks
This function should be used after a call to LVGrammar::InterpretParses.
See Also
LVGrammar::InterpretParses
LVGrammar::GetNumberOfInterpretations
LVGrammar_CreateInterpretation (C API)
Callback Functions
Logging Callback Function
typedef void (*ExportLogMsg)(const char* String, void* p)
The callback function is called by the speech port with informational and error messages. It is the second parameter to LV_SRE_OpenPort and LV_SRE_RegisterAppLogMsg, and the first parameter to LVSpeechPort::OpenPort.
p is a pointer to a user-defined object that can be used to customize behavior when the engine sends logging messages to the callback.
See Also
LV_SRE_OpenPort
LV_SRE_RegisterAppLogMsg
LVSpeechPort::OpenPort
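The user-data pattern described above can be sketched as follows. The typedef is copied from this page; AppLogger and OnLogMsg are hypothetical application-side names chosen for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Signature as documented above.
typedef void (*ExportLogMsg)(const char* String, void* p);

// Hypothetical application-side log sink; the engine never sees this type,
// it only carries the void* back to the callback.
struct AppLogger {
    std::vector<std::string> lines;
};

// Free function with the required signature; recovers the logger from p.
void OnLogMsg(const char* String, void* p) {
    static_cast<AppLogger*>(p)->lines.emplace_back(String);
}
```

The address of an AppLogger would be passed as the user-data argument when opening the port, and every message the engine logs then lands in that instance.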
Streaming Callback Function
typedef void (*LV_SRE_StreamStateChangeFn)(long NewState, unsigned long TotalBytes, unsigned long RecordedBytes, void* UserData)
The callback function is called by the speech port each time the stream status changes. Primarily this is used with streams performing barge-in detection and/or end-of-speech detection, to notify the hardware to stop playing a prompt (barge-in) or to stop recording the user (end-of-speech).
Parameters
NewState
New state of stream. See Stream Status.
TotalBytes
Total bytes streamed (at the point of the stream status change); more sound data may still be in the internal unprocessed queue.
RecordedBytes
Total bytes minus data discarded before barge-in was detected.
UserData
Pointer to application-defined data.
See Also
LV_SRE_StreamSendData
LV_SRE_StreamGetStatus
Grammar Logging Callback Function
typedef void (*GrammarLogCB)(const char* message, int error_level, void* user_data)
The callback function is called by the LVGrammar object when an error or warning is generated during the grammar compilation process. The types of errors which can be passed through the callback via the error_level parameter are:
LV_GRAMMAR_LOADING_ERROR -- the grammar could not be loaded from the location provided.
LV_GRAMMAR_SYNTAX_ERROR -- one or more rules or statements in the grammar were badly formed. The message parameter provides more detailed information.
LV_GRAMMAR_SYNTAX_WARNING -- one or more statements in the grammar were either missing or not strictly conforming to specifications, but the grammar builder was able to recover. The message parameter provides more detailed information.
user_data is a pointer to a user-defined object that can be used to customize behavior when the LVGrammar object sends logging messages through the callback.
See Also
LVGrammar_RegisterLoggingCallback
LVGrammar::RegisterLoggingCallback
Constants
Decoder Flags
The engine accepts several different flags for use when calling LV_SRE_Decode (C API) and LVSpeechPort::Decode (C++ API). The flags can be bitwise OR'd ( "|" ) to customize behavior.
LV_DECODE_BLOCK
Normally, calls to the decode function/method will immediately return to allow the client application to continue working on other tasks while the engine processes the data. This flag blocks the client application until the engine has finished.
LV_DECODE_GENDER_MALE
LV_DECODE_GENDER_FEMALE
LV_DECODE_GENDER_MALE and LV_DECODE_GENDER_FEMALE identify which gender's acoustic model to use during decode. If neither flag is specified, the engine automatically decodes each audio file against both gender models. While this slows the engine by requiring two decodes, evaluating against both models has a very significant positive effect on recognition accuracy. Since the engine is multi-threaded, do not use these flags unless CPU load is a serious issue.
LV_DECODE_FIRST_TIME_USER
Reset caller weights in Recognition Engine (not implemented).
LV_DECODE_USE_OOV
Use the Out-Of-Vocabulary (OOV) filter during decode. The OOV filter, when set, processes each audio file against both the grammar specified by the client application and a special grammar which detects words not in the grammar. If the engine detects OOV words, it will not return them. Generally, the OOV filter slows the engine down without a large gain in accuracy, so client applications should use the filter only if OOV words seem to be a problem.
LV_DECODE_RETURN_EACH_DIGIT
When using standard grammars, a string of digits, a monetary value, etc. is passed back as a single concept. If this flag is used, each digit comes back as a separate concept. (Since each concept has a confidence score, this can be useful for identifying poorly recognized individual digits.)
LV_DECODE_SRGS_GRAMMAR
Normally, you do not need to use this flag. But if you want to use a concept-phrase grammar as an SRGS grammar, and are not using the LV_ACTIVE_GRAMMAR_SET, this flag is necessary.
LV_DECODE_SEMANTIC_INTERPRETATION
This flag tells the decoder to process the parse tree return type for semantic information in the tree's tags.
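The flags above are combined with bitwise OR before being passed to the decode call. The documentation does not list the numeric flag values, so the bit positions below are illustrative assumptions, not the SDK's real constants; in a real application the values come from the LumenVox header.

```cpp
#include <cassert>

// Assumed bit positions, for illustration only; the real values are
// defined by the LumenVox SDK header.
enum DecodeFlag : unsigned {
    LV_DECODE_BLOCK                   = 1u << 0,
    LV_DECODE_GENDER_MALE             = 1u << 1,
    LV_DECODE_GENDER_FEMALE           = 1u << 2,
    LV_DECODE_USE_OOV                 = 1u << 3,
    LV_DECODE_RETURN_EACH_DIGIT       = 1u << 4,
    LV_DECODE_SEMANTIC_INTERPRETATION = 1u << 5,
};

// Flags are combined with bitwise OR ("|")...
unsigned MakeFlags() {
    return LV_DECODE_BLOCK | LV_DECODE_SEMANTIC_INTERPRETATION;
}

// ...and tested with bitwise AND ("&").
bool HasFlag(unsigned flags, DecodeFlag f) { return (flags & f) != 0; }
```

The combined value would then be passed as the flags argument of LV_SRE_Decode or LVSpeechPort::Decode.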
Error Codes
0 LV_SUCCESS No errors.
-1 LV_FAILURE General failure.
-2 LV_SYSTEM_ERROR The speech recognition engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
-4 LV_BAD_SOUND_DATA There was a problem with the sound data.
-5 LV_INVALID_SOUND_FORMAT The sound format value is not one of the allowable formats.
-6 LV_TIME_OUT WaitForEngineToIdle's timeout was reached before the engine became idle. Losing the connection to an engine server during a decode may also return this error code.
-7 LV_GRAMMAR_SET_OUT_OF_RANGE The grammar set value is out of expected range (0-63).
-8 LV_SOUND_CHANNEL_OUT_OF_RANGE The sound channel value is out of the expected range.
-9 LV_STANDARD_GRAMMAR_ALREADY_LOADED Only one standard grammar can be loaded for a grammar set.
-10 LV_STANDARD_GRAMMAR_OUT_OF_RANGE The standard grammar value is not a recognized grammar type.
-11 LV_NOT_A_VALID_PROPERTY_VALUE The property value is not valid for the designated property.
-12 LV_BAD_HPORT The specified port handle is not valid.
-13 LV_NOT_IMPLEMENTED The action was not implemented in the current version.
-14 LV_SOCKETS_ERROR General network communication error.
-15 LV_INVALID_PROPERTY_TARGET The target type used in a call to LV_SRE_SetPropertyEx() is invalid for the property given.
-16 LV_INVALID_PROPERTY_VALUE_TYPE The value type used in a call to LV_SRE_SetPropertyEx() is invalid for the property given.
-17 LV_INVALID_PROPERTY The property supplied in a call to LV_SRE_SetPropertyEx() or LV_SRE_SetProperty() is invalid.
-18 LV_INVALID_PROPERTY_TARGET_NDX When calling LV_SRE_SetPropertyEx() with a target type of PROP_EX_TARGET_CHANNEL or PROP_EX_TARGET_GRAMMAR, the index value was out of range.
-19 LV_STREAM_NOT_ACCEPTED A stream function was called on a stopped stream.
-20 LV_FUNCTION_NOT_FOUND LVSpeechPort_stdcall.dll is a wrapper DLL around LVSpeechPort.dll. If a newer version of the standard call DLL is used, it may not find a function in LVSpeechPort.dll.
-21 LV_STRING_BUFFER_TOO_SMALL The application supplied string buffer was too small.
-22 LV_NO_SERVER_AVAILABLE No engine servers were found to connect to.
-23 LV_GRAMMAR_SYNTAX_WARNING The grammar contained a syntax warning in one or more of its rules or declarations. A specific message from the grammar builder has been logged. The grammar was successfully built, despite the warning.
-24 LV_GRAMMAR_SYNTAX_ERROR The grammar contained a syntax error in one or more of its rules or declarations. A specific message from the grammar builder has been logged. The grammar was not built.
-25 LV_GRAMMAR_LOADING_ERROR The grammar could not be loaded because the specified URL was invalid.
-26 LV_OPEN_PORT_FAILED__LICENSE_EXCEEDED Cannot open the port because the number of ports allowed by the license has been exceeded.
-31 LV_GLOBAL_GRAMMAR_TRANSACTION_PARTIAL_ERROR Global grammar operation failed on some of the servers.
-32 LV_GLOBAL_GRAMMAR_TRANSACTION_ERROR Global grammar operation failed on all servers.
Note:
Not all the error codes are implemented.
Properties
#define PROP_EX_SAVE_SOUND_FILES 2
#define PROP_EX_LANGUAGE 3
#define PROP_EX_SRE_SERVERS 4
#define PROP_EX_CHOOSE_MODEL 8
#define PROP_EX_SET_SERVER_IP 10
#define PROP_EX_SET_SERVER_PORT 11
#define PROP_EX_SEARCH_BEAM_WIDTH 12
#define PROP_EX_CONCEPT_REPETITION_MIN 13
#define PROP_EX_CONCEPT_REPETITION_MAX 14
#define PROP_EX_ENABLE_LATTICE_CONFIDENCE_SCORE 15
#define PROP_EX_MAX_NBEST_RETURNED 16
#define PROP_EX_DECODE_TIMEOUT 17
#define PROP_EX_MOD_SEL_LOW_THLD 18
#define PROP_EX_MOD_SEL_HIGH_THLD 19
PROP_EX_SAVE_SOUND_FILES
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_PORT
Default Value: 1
Save request and answer files to disk.
Setting this property to 1 saves the request and answer files for each call to Decode to LVLANG\Responses (Win32) or LVRESPONSES/Responses (Linux). Setting it to 0 stops saving the files. Turning this property on can quickly fill up a hard drive, but it is invaluable for troubleshooting and tuning the application.
PROP_EX_LANGUAGE
Value Types:
PROP_EX_VALUE_TYPE_STRING
Targets: PROP_EX_TARGET_PORT
Default Value: "AmericanEnglish"
The language model to use for decodes.
PROP_EX_SRE_SERVERS
Value Types:
PROP_EX_VALUE_TYPE_STRING
Targets: PROP_EX_TARGET_CLIENT
Default Value: "127.0.0.1:5000"
The list of Speech Engine servers that will handle decodes for this client: a comma (or semicolon) delimited list of IP addresses (and ports) the client will attempt to connect to. Use a colon to separate IPs and ports. 5000 is the default port.
Example: "127.0.0.1;10.0.0.1:5001;10.10.0.1" The client will attempt to attach to the local machine, port 5000; IP address "10.0.0.1", port 5001; and IP address "10.10.0.1", port 5000.
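The list format above can be sketched as a small parser. This is an application-side illustration of the "ip[:port]" convention with the default port of 5000; the client library does its own parsing internally.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Parse a PROP_EX_SRE_SERVERS-style list ("ip[:port]" entries separated
// by ',' or ';') into host/port pairs, defaulting the port to 5000.
std::vector<std::pair<std::string, int>> ParseServerList(const std::string& list) {
    std::vector<std::pair<std::string, int>> servers;
    std::string entry;
    auto flush = [&] {
        if (entry.empty()) return;
        std::string::size_type colon = entry.find(':');
        if (colon == std::string::npos)
            servers.emplace_back(entry, 5000);  // no port given: default 5000
        else
            servers.emplace_back(entry.substr(0, colon),
                                 std::stoi(entry.substr(colon + 1)));
        entry.clear();
    };
    for (char c : list) {
        if (c == ',' || c == ';') flush();
        else entry += c;
    }
    flush();
    return servers;
}
```

Applied to the documented example string, this yields the three host/port pairs described in the text.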
PROP_EX_SEARCH_BEAM_WIDTH
Value Types:
PROP_EX_VALUE_TYPE_FLOAT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 1e-6
The beam controls how thorough the Speech Engine's search is. Legal values range from 0.0 to 1.0. The smaller the value, the more thorough the search, leading to potentially more accurate but also more time-intensive searches. Use the default at first, and only experiment with this value while tuning your application for speed and accuracy. Make small changes only: for instance, try going from 1e-6 to 1e-9, but not 1e-30.
PROP_EX_CONCEPT_REPETITION_MIN
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_GRAMMAR
Default Value: 1
PROP_EX_CONCEPT_REPETITION_MAX
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_GRAMMAR
Default Value: -1 (infinity)
PROP_EX_CONCEPT_REPETITION_MIN and PROP_EX_CONCEPT_REPETITION_MAX control the repeat count of concepts in a concept/phrase grammar. They have no effect on SRGS grammars. A grammar such as:
concept "topping" = "pepperoni | olives | sausage | onions | peppers"
With MIN=1 MAX=5, is equivalent to an SRGS grammar
root $toppings;
$toppings = $topping<1-5>;
$topping = (pepperoni | olives | sausage | onions | peppers);
PROP_EX_ENABLE_LATTICE_CONFIDENCE_SCORE
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 1
The lattice-based confidence score is slightly slower, but more accurate. Set this property to 0 to turn off the score.
PROP_EX_CHOOSE_MODEL
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
Default Value: 0
If this property is set to 1, then the client will decide which acoustic model is most appropriate for the server to use, based on a frequency analysis of the speaker's voice. Otherwise, two decodes will be done simultaneously, and an answer will be selected based on which model had better "coverage" for the speaker's voice.
PROP_EX_MOD_SEL_LOW_THLD
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 135 Hz
PROP_EX_MOD_SEL_HIGH_THLD
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 155 Hz
When the PROP_EX_CHOOSE_MODEL property is set to 1, the engine uses the pitch of the input audio to determine which acoustic model to use. If the pitch is lower than PROP_EX_MOD_SEL_LOW_THLD, the low pitch model is used, while a pitch higher than PROP_EX_MOD_SEL_HIGH_THLD selects the high pitch model. Any value that falls in between causes the engine to use both models.
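The selection rule above can be restated as a small function, using the documented defaults (low threshold 135 Hz, high threshold 155 Hz). The real decision is made inside the engine; this sketch only restates the rule for clarity.

```cpp
#include <cassert>
#include <string>

// Model selection as described: below the low threshold use the low pitch
// model, above the high threshold use the high pitch model, otherwise both.
std::string SelectModel(int pitch_hz, int low_thld = 135, int high_thld = 155) {
    if (pitch_hz < low_thld)  return "low";
    if (pitch_hz > high_thld) return "high";
    return "both";
}
```

Note that the "both" band between the two thresholds trades some speed for accuracy, just as decoding against both gender models does.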
PROP_EX_MAX_NBEST_RETURNED
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 1
The maximum number of n-best results the engine can return. This property must be an integer greater than or equal to 1.
PROP_EX_DECODE_TIMEOUT
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 1
The timeout value used by the LV_SRE_WaitForDecode and LVSpeechPort::WaitForDecode functions.
Sound Formats
enum SOUND_FORMAT {
    UNK_FORMAT = 0,
    ULAW_8KHZ,
    PCM_8KHZ,
    PCM_16KHZ,
    ALAW_8KHZ,
};
ULAW_8KHZ
µ-law format at 8000 samples per second, 1 byte per sample. One minute of sound occupies approximately 0.5 MB of memory. This is the standard domestic telephone format.
PCM_8KHZ
Pulse code modulated audio at 8000 samples per second, 2 bytes per sample. One minute of sound occupies approximately 1 MB of memory.
PCM_16KHZ
Pulse code modulated audio at 16000 samples per second, 2 bytes per sample. One minute of sound occupies approximately 2 MB of memory. This is the native format of the SRE.
ALAW_8KHZ
A-law format at 8000 samples per second, 1 byte per sample. One minute of sound occupies approximately 0.5 MB of memory. This is the standard international telephone format.
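The per-minute memory figures above follow directly from sample rate and sample width: samples per second x bytes per sample x 60 seconds.

```cpp
#include <cassert>

// Approximate memory use for one minute of audio in a given format.
unsigned long BytesPerMinute(unsigned sample_rate, unsigned bytes_per_sample) {
    return static_cast<unsigned long>(sample_rate) * bytes_per_sample * 60;
}
```

For example, ULAW_8KHZ and ALAW_8KHZ (8000 x 1) come to 480,000 bytes per minute (about 0.5 MB), PCM_8KHZ (8000 x 2) to 960,000 (about 1 MB), and PCM_16KHZ (16000 x 2) to 1,920,000 (about 2 MB), matching the figures quoted above.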
Note:
Support for more formats, in particular the standard Windows wave format, will be added in the near future.
Standard Grammars
These grammars are deprecated in favor of built-in SRGS grammars.
The standard grammars are built-in grammars predefined by LumenVox. Using these grammars will return a single concept, formatted appropriately. Only one standard grammar can be active at a time, and no concepts can be removed from a standard grammar. The client application can, however, add concepts to and remove concepts from the voice channel grammar, which will coexist with the standard grammar.
1 GRAMMAR_DIGITS
A string of single digits, like a phone number or PIN code. In version 4.0, digits use a separate acoustic model and so only the words one, two, three, four, five, six, seven, eight, nine, zero, and oh are recognized. It ignores the application-supplied grammar and cannot currently recognize things like "twenty-five" or "seventeen". This allowed us to obtain an extremely low error rate. The number grammar can be used to mix an application grammar with digit recognition.
2 GRAMMAR_MONEY
Monetary value.
3 GRAMMAR_NUMBER
Numeric value like 12,000, 24.45 or 35.
4 GRAMMAR_LETTERS
Letters of alphabet for spelling (not implemented).
5 GRAMMAR_DATE
Date values (not implemented).
Semantic Data Type
There are seven semantic data types. They are defined as macros in <LV_SRE_Semantic.h>.
SI_TYPE_BOOL
SI_TYPE_INT
SI_TYPE_DOUBLE
SI_TYPE_STRING
SI_TYPE_OBJECT
SI_TYPE_ARRAY
SI_TYPE_NULL
Note: SI_TYPE_NULL is a special type which usually indicates that some error occurred.
Semantic Data Print Format
These macros are used in the SI_DATA_Print() function to specify the printing format.
SI_FORMAT_XML: primitive data types are printed as string literals; objects and arrays are printed as a collection of XML key/value pairs.
SI_FORMAT_ECMA: primitive data types are printed as string literals; objects and arrays are printed as ECMAScript objects.
Stream Parameters
STREAM_PARM_SOUND_FORMAT
Sound format of the stream; uses the SOUND_FORMAT enum.
Default: ULAW_8KHZ

STREAM_PARM_VOICE_CHANNEL
Voice channel to load streamed sound data to.
No default; the application must set this.

STREAM_PARM_GRAMMAR_SET
Grammar set to use with auto-decode type streams.
No default; the application must set this if STREAM_PARM_AUTO_DECODE is active.

STREAM_PARM_DECODE_FLAGS
Decode flags to send with auto-decode type streams.
No default; the application must set this if STREAM_PARM_AUTO_DECODE is active.

STREAM_PARM_USE_COMPRESSION
Use compression internally for sound data. Data sent to the Speech Engine and data stored to disk will be compressed to approximately 10% of normal size; this adds a small amount of load to the CPU.
Default: 0 (off)

STREAM_PARM_DETECT_BARGE_IN
If active, the speech port will discard stream data until barge-in is detected.
Default: 0 (off)

STREAM_PARM_DETECT_END_OF_SPEECH
If active, the port will stop accepting stream data once end-of-speech is detected, and change the stream status to STREAM_STATUS_END_SPEECH. If auto-decode is also active, decoding will begin immediately as well.
Default: 0 (off)

STREAM_PARM_AUTO_DECODE
If active, decoding will start immediately on end-of-speech detection or a call to StopStream(); otherwise the application needs to call Decode to begin decoding.
Default: 0 (off)
STREAM_PARM_BARGE_IN_TIMEOUT
The streaming interface will flag STREAM_STATUS_BARGE_IN_TIMEOUT if no speech was detected in the time frame specified by this property.

STREAM_PARM_END_OF_SPEECH_TIMEOUT
After barge-in, the streaming interface will flag STREAM_STATUS_END_SPEECH_TIMEOUT if it did not detect end-of-speech in the time frame specified by this property.

STREAM_PARM_USE_FREQ_VAD
The LumenVox Speech Engine API provides two Voice Activity Detection (VAD) algorithms: Time-domain VAD (TVAD) and Frequency-domain VAD (FVAD). While TVAD is faster, FVAD has better performance and more flexibility. Set this parameter to 1 to enable FVAD, or 0 to use TVAD. The default value is 1. Note: each algorithm has its own set of parameters; make sure to use the correct parameters in your code. Each VAD parameter below is tagged with the algorithm it works with.

STREAM_PARM_BARGE_IN_BEGIN_DELAY <TVAD>
The number of 1/8-second intervals at the beginning of the stream during which barge-in is limited; during this period a much higher energy level is required to trigger barge-in. This can be useful when echo-cancelled data streamed to the port needs time for convergence.
Default = 4 (0.5 seconds)

STREAM_PARM_BARGE_IN_NOISE_COUNT_LOW_THRESHOLD <TVAD>
Adjusts the signal strength required to trigger barge-in (and end-of-speech); a lower number will trigger barge-in at a lower volume. If dynamic barge-in adjustment is used, this is the initial value.
Default = 55 (optimal for telephony applications)

STREAM_PARM_BARGE_IN_DYNAMIC_ADJUST <TVAD>
Adjusts the volume trigger for barge-in dynamically. This works best when the audio data sent to a port is from the same source, and works better still if the EVENT_START_DECODE_SEQ and EVENT_END_DECODE_SEQ events are sent to the port to signify a change of audio source (for example, a new telephony call beginning).
Default = 1 (on)
STREAM_PARM_VAD_BARGEIN_LVL <FVAD> This is Signal-Noise-Ratio (SNR) threshold. An audio frame will be considered for voice activity only when the SNR metric is higher than this threshold. Lower this parameter for noisy channel, so that it is easier to barge in. The default value is 30. Note: this value is not a measurement in dB. It is just a relative value compared to an internal standard.
STREAM_PARM_VAD_EOS_DELAY <FVAD> End-of-speech delay in ms. The default value is 800ms.
STREAM_PARM_VAD_INIT_TIME <FVAD> The FVAD needs to be initialized properly to optimize the performance. The parameter sets the duration of initialization time at the beginning at each audio stream. The default value is 100ms.
STREAM_PARM_VAD_NOISE_FLOOR <FVAD> An audio frame will be considered for voice activity only when the average energy is higher than this threshold. The default value is 0. This parameter is particularly useful when the echo canceler doesn't work very well. When channel noise, background noise or residual echo causes false barge-in, try to raise this threshold to prevent low energy signal from triggering barge-in. The range is from 0 to 999, but in practice you probably won't need to set it above 200.
STREAM_PARM_VAD_WIND_BACK <FVAD> The length of audio to be wound back at the beginning of voice activity. It helps in the situation of weak speech onset. The resolution of this parameter is 1/8 sec, i.e. 125ms, which means setting this value to 249ms is same as setting it to 125ms. The default value is 250ms.
STREAM_PARM_VAD_BURST_THLD <FVAD> The FVAD algorithm triggers barge-in only after it has observed the duration of voice longer than this threshold. This threshold helps preventing bursting noise from triggering barge-in. The default value is 100ms.
STREAM_PARM_VAD_P2A_THLD <FVAD> An audio frame will be considered for voice activity only when the ratio of peak frequency-band energy to average energy is higher than this threshold. This is a fine-tuning parameter; most users will not need to modify it. The valid range is [0,1000]. The default value is 100.
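As a sketch of how the noise-related parameters above interact, the following tunes a noisy channel using a hypothetical stand-in setter (the real LumenVox call for applying stream parameters is not shown here, and the parameter IDs below are illustrative placeholders):

```c
/* Hypothetical parameter IDs and setter, standing in for the real API. */
enum { PARM_VAD_BARGEIN_LVL, PARM_VAD_NOISE_FLOOR, PARM_COUNT };

static int g_parms[PARM_COUNT];

static void set_stream_param(int parm, int value)
{
    g_parms[parm] = value;  /* a real application would call the SDK here */
}

/* For a noisy channel: lower the SNR threshold so it is easier to barge
 * in, and raise the noise floor so low-energy noise does not trigger
 * false barge-in. */
static void tune_for_noisy_channel(void)
{
    set_stream_param(PARM_VAD_BARGEIN_LVL, 20);  /* default is 30 */
    set_stream_param(PARM_VAD_NOISE_FLOOR, 150); /* default 0; rarely above 200 */
}
```

The two adjustments pull in opposite directions deliberately: the lower SNR threshold keeps barge-in responsive while the raised noise floor filters out low-energy noise.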
Printed Documentation
Stream Status
STREAM_STATUS_NOT_READY
LV_SRE_StreamStart has not been called for this port.
STREAM_STATUS_READY
Stream is ready to accept data.
STREAM_STATUS_BARGE_IN
Only returned if the STREAM_PARM_DETECT_BARGE_IN stream type is set. The engine has determined that speech has started, and stream data is now being stored. (Hardware can stop playing audio when this state is reached.)
STREAM_STATUS_END_SPEECH
Only returned if the STREAM_PARM_DETECT_END_OF_SPEECH stream type is set. The engine has determined that speech has stopped. If the STREAM_PARM_AUTO_DECODE stream type has been set, decoding of the audio data has begun. (Hardware can stop recording audio when this state is reached.)
STREAM_STATUS_STOPPED
Stream has stopped. Call LV_SRE_StreamStart to reset stream.
STREAM_STATUS_BARGE_IN_TIMEOUT
Barge-in was not triggered before timeout. No audio will be sent for decode.
STREAM_STATUS_END_SPEECH_TIMEOUT
End-of-speech was not detected before timeout. Note that streaming will not stop until you call StreamStop or StreamCancel.
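The hardware actions suggested in the status descriptions above can be sketched as a simple dispatch. The enum values below are illustrative stand-ins for the real constants in the LumenVox headers:

```c
/* Illustrative mirrors of the stream statuses described above. */
typedef enum {
    STATUS_READY,
    STATUS_BARGE_IN,
    STATUS_END_SPEECH,
    STATUS_STOPPED
} stream_status;

typedef enum {
    ACTION_NONE,
    ACTION_STOP_PLAYBACK,   /* caller barged in over the prompt */
    ACTION_STOP_RECORDING   /* utterance finished */
} stream_action;

/* Map a status transition to the hardware action the text suggests. */
static stream_action on_status(stream_status s)
{
    switch (s) {
    case STATUS_BARGE_IN:   return ACTION_STOP_PLAYBACK;
    case STATUS_END_SPEECH: return ACTION_STOP_RECORDING;
    default:                return ACTION_NONE;
    }
}
```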
Environment Variables
LV_SRE_CLIENT_CONNECT_IP
A comma (or semicolon) delimited list of IP addresses (and ports) the client will attempt to connect to. If this variable does not exist, the client defaults to IP 127.0.0.1 (the local machine) and port 5000. Use a colon to separate an IP address from its port.
Example: "127.0.0.1;10.0.0.1:5001;10.10.0.1" Client will attempt to attach to the local machine, port 5000; IP address "10.0.0.1" port 5001; and IP address "10.10.0.1" port 5000.
Win32
The following environment variables need to be set up for the LVSpeechPort.Dll to function. The installation program creates these variables.
LVLANG
Location of the dictionary and language files, stored in two subdirectories: Dict and Responses.
LVBIN
Location of LVSpeechPort.Dll.
The following optional environment variables are set up for creating applications with the LVSpeechPort.DLL. See the LVSpeechPortConsole example program.
LVLIB
Location of LVSpeechPort.Lib
LVINCLUDE
Location of LVSpeechPort.h
Linux
The following environment variables can be used to override the default locations used by LVSpeechPort.so and BNF_Dict.so.
LVLANG
Location of the dictionary files, stored in the Dict sub-directory. Default location "/usr/LumenVox".
LVRESPONSE
Location of the answer and response files created at run-time, stored in the Responses sub-directory. Default location "/var/LumenVox".
FAQs
Please email your questions to [email protected].
I cannot get the engine to recognize correctly, or my results have a low confidence.
A good speech recognition application depends on a well-designed grammar. A grammar that contains very similar words (like "bit" and "pit") is inefficient and will hurt accuracy and speed. The engine takes longer as it tests the competing words against the audio, and the resulting match will have a lower confidence because of the very similar alternatives.
What do the confidence scores mean?
The confidence score is a rough measure of how closely the speech matched the phrases in the grammar. The score ranges from 0 - 1000. The higher the score, the higher the estimated probability that the result is correct. Typically, an application designer will use the confidence score to make decisions about the quality of a recognition result. For instance, results over 600 might always be accepted, results between 599 and 200 might trigger a confirmation, and results below 200 might be rejected outright. The thresholds to use depend largely on the grammar that is being used. Along with the grammars themselves, an application's confidence thresholds should be one of the first things to tune.
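Those illustrative thresholds can be expressed as a small decision helper. The cutoffs are the examples from the text, not engine requirements, and should be tuned per grammar:

```c
typedef enum { DECISION_ACCEPT, DECISION_CONFIRM, DECISION_REJECT } decision;

/* Map a 0-1000 confidence score to an application decision, using the
 * example thresholds from the text above. */
static decision classify_confidence(int score)
{
    if (score >= 600) return DECISION_ACCEPT;   /* high confidence */
    if (score >= 200) return DECISION_CONFIRM;  /* ask the caller to confirm */
    return DECISION_REJECT;                     /* re-prompt the caller */
}
```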
Do I need a Dialogic card?
Our engine is hardware-independent, so if the client application can collect the audio and put it into a buffer, the engine can decode it. Which hardware a particular client application needs depends only on the client application.
How much memory does the Speech Engine need?
The memory requirement for running the Speech Engine is mainly determined by the maximum number of decoder threads. The start-up memory usage is about 160MB, including one thread for each acoustic model. After that, each additional thread requires about 20MB. The maximum number of threads is determined by the number of processors. The more processors you have, the more simultaneous threads you can run, and consequently the more memory you
need. In the future, we shall allow users to set the maximum number of threads on the server. Currently, the typical memory requirements for running the engine are:
One processor with one acoustic model and 2 threads: 207MB.
Dual processors with one acoustic model and 4 threads: 247MB.
Quad processors with one acoustic model and 8 threads: 327MB.
How fast does the computer need to be?
This is dependent on the expected density of your application. The Speech Engine can perform about 14 recognitions per minute per 100 megahertz of processor speed. This calculation is based on a 50-item single-word grammar.
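That rule of thumb can be written out as a quick capacity estimate. It is an approximation only, tied to the 50-item single-word grammar measurement above:

```c
/* Approximate recognitions per minute: ~14 per 100 MHz of processor
 * speed, per the rule of thumb stated in the text. */
static int recognitions_per_minute(int cpu_mhz)
{
    return 14 * cpu_mhz / 100;
}
```

For example, a 1 GHz processor would be expected to sustain roughly 140 recognitions per minute on such a grammar.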
What are some ways to increase the recognition accuracy?
Smaller grammars always work better. The practical phrase limit is about 2000, but depending on how easily the words in the grammar can be confused, or the number of branches at any point in the grammar, that number could be anywhere from 1,000 to 10,000.
Longer phrases also work better. When you need to recognize a phrase like "How do I" or "transfer me to", put it in as a single phrase, not individual words. Except when recognizing a single word (like "Yes" or "No"), avoid small single words.
You can use the ABNF format to cover several variations of small words:
"How (do | would | could) (I | we | you)"
Also, attempt to cover all the words you believe a user will speak. If a word or phrase is not in the grammar, the engine will not be able to identify it.
Will the engine handle proper names?
The internal dictionary has thousands of common names. (Around half of the 120,000 words are names). If a name is not in the dictionary, the decoder will use basic rules to phonetically spell any name.
For unknown names, enter the phonetic spelling of the name if the phonetic speller is unable to come up with a good pronunciation. This has been shown to work in the vast majority of cases. The phonetic spelling can be directly
entered as the phrase, if necessary, by enclosing the phoneme characters in curly braces "{ }". See Phonemes.
Can I ask for ticker symbols with your recognition engine?
Speaker-independent recognition systems have a hard time with open spelling, because many letters sound very similar. For example, b, c, d, e, g, p, t, v and z all end with the sound of 'e'. Dictation software allows spelling because it trains on a single person's voice; many of those products also supply a phonetic alphabet system ("Alpha" for A, "Beta" for B, etc.).
In addition, there are more than sixteen thousand ticker symbols. Many of the symbols are very similar in the way they sound when being spelled out, and thus are hard to correct for:
eeee is the symbol for eMachines, Inc.
cccc is the symbol for Concord Career Colleges Inc.
How can I get around this problem?
Limit the tickers you support.
Break the stocks down by category to make grammars smaller. First ask which stock exchange, then ask for the symbol. Have a strategy available to disambiguate symbols until the proper answer is found.
What are the languages currently supported?
We currently support North American English. Spanish is the next language planned.
Does/Can LumenVox support language X?
The short answer is that, yes, LumenVox can localize/customize the products to the extent that we can add in different languages for speech recognition. There are two ways to do it:
The first option is very fast and easy to implement. Phonetically spell the (for example) Spanish words using the English phone set. For example, the Spanish word mañana can be entered {M AO N Y AO N AE}. See Phrases and Phonemes for more information on entering raw phonemes as phrases.
The second option requires a couple of items and more time. Basically, LumenVox needs:
- Lots of audio data in the target language; the amount can vary from 10 hours for male and 10 for female speakers (20 total) for small vocabularies (10-15 words), to as much as can be collected.
- The same audio data, transcribed as text.
- A machine-readable dictionary in the target language.
The first option is quite easy to implement, but loses some accuracy across very large vocabularies because the target language's sound inventory still differs from the English inventory. The second option takes more time and energy to produce, but is quite a bit more accurate.
As a first step, phonetically spell each word so that your organization can test and deploy the application. Then, once you have collected enough audio data, LumenVox can train native language models and quit using the English models entirely.
With some work, LumenVox could adapt the Speech Engine itself so that it displays in a different language, but that is a special case situation.
Why does the engine occasionally recognize my speech in the Female model when I am male?
First, some notes about the "male" and "female" models. The models are entirely statistical: one model encodes a speaker of type 1, and the other encodes a speaker of type 2. It happens that a very useful distinction lies along gender lines (owing mainly to pitch differences between males and females), but there are men who sound like women and women who sound like men. In addition, it is possible that the particular utterance involved simply had better examples in the other model, so the "wrong" model did a better job of recognizing the speech. Because we trained the two separate models using data divided by gender, we named the models according to their gender as a convenience. In fact, the recognizer has no knowledge of which gender the speaker is, only of which model had the best match.
Do not use the engine to classify speakers according to their sex; the engine is not designed or intended to be used to categorize speakers according to personal characteristics, whether the characteristic is age, sex, dialect, or any other attribute. LumenVox takes NO responsibility for issues arising from using the engine in such a manner.
Why does the engine always do two decodes, one in a male model, and one in a female model?
Suppose we have two models, a generic male (MM) and generic female model (MF), as well as a Speaker (S1). S1 says something, and the decoder runs two decodes, one against each model, MM and MF. The results break down as follows (for our purposes, correct means "got the right thing" whether the result is the actual string of words, or the right concept):
Case a: MM has the highest score and the correct answer; MF may or may not return the correct answer.
Case b: MF has the highest score and the correct answer; MM may or may not return the correct answer.
Case c: MM has the highest score but returned the incorrect answer, while MF had the lower score but returned the correct answer.
Case d: MF has the highest score but returned the incorrect answer, while MM had the lower score but returned the correct answer.
Case e: Neither is correct, regardless of score.
For case e, since neither model got the right answer, all we can do is try to make the models better and the system tighter. Cases c and d are the worst case performance; we try to avoid these :). Cases a and b are the hoped-for result, since we get a correct answer. Notice that we never specify which is the "correct model" only the "correct answer". Also, note that for all cases "correct" requires some outside knowledge about which answer was correct. The engine has no such information, and is forced to choose the best answer based on highest score.
The potentially bad results are cases c and d; there, the recognizer picks the wrong answer when it could have gotten the right one had it more knowledge. Fortunately, c and d rarely happen; instead, what we have found is that in cases a and b, the speaker's gender does not always match the gender model that had the best answer. But it doesn't actually matter, since we obtain the correct answer anyway (and we are looking for the answer, not the gender).
Running two decodes (ignoring decode history) allows us to capture each case where, for some reason, the mismatched gender model gets the right answer and the matched one blows up. There are several reasons this might happen: the mismatched model may have better coverage on the acoustics in question,
the speaker's voice could crack, or the speaker could be sucking on helium, etc. Since some people will waffle between the two models, given the above, we are better off running two decodes. If we were to select a particular model based on previous history, we would lose the accuracy gain from running two models and letting the system pick the best result.
In addition, the incidence rate for mismatched, but correct answers is quite a bit higher than the incidence rate for mismatched, incorrect answers, which means running two decodes and picking the best result yields a net gain, even with occasional incorrect answers.
That said, one plausible scenario where a client application might want to cut the second decode is load balancing. If all 48 ports go active at once (or the system is on a slow machine), it might be better to sacrifice some accuracy to handle more customers more quickly. For the systems LumenVox deploys, we haven't had a problem with running two decodes yet; the load-balancing feature is in the short-term pipeline and should be online soon.
If the client application wants to track decode models for a caller, there is no restriction against doing so; load-balancing becomes an issue of deciding how many double decodes the application can handle, and then picking a permanent model for that caller/speaker. One thing not to do is to make the decision after only one utterance; let the double decodes continue for a few rounds (at least three or five) and then pick the model which had the highest score the most (the application will also need to take into account whether the decodes were correct). The gender model flags (LV_DECODE_GENDER_MALE, LV_DECODE_GENDER_FEMALE) for LV_SRE_Decode() and LVSpeechPort::Decode() tell the recognizer which model to use for the decode, thus disabling the dual decodes.
Because there is an accuracy gain from doing both decodes, we recommend letting the system do both decodes for most applications. If load becomes a serious issue, then disable the double-decode system and pick the model the application should use.
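The per-caller model-selection strategy described above can be sketched as a tally: run dual decodes for at least a few rounds, record which model won each time, and only then fix a permanent model. The values 0 and 1 below are illustrative stand-ins for the male/female model flags:

```c
#define MIN_ROUNDS 5  /* never decide after a single utterance */

/* winners[i] is 0 or 1, recording which model scored highest on round i.
 * Returns the model to fix permanently, or -1 if more rounds are needed. */
static int pick_permanent_model(const int winners[], int rounds)
{
    int counts[2] = { 0, 0 };

    if (rounds < MIN_ROUNDS)
        return -1;  /* keep running dual decodes */
    for (int i = 0; i < rounds; i++)
        counts[winners[i]]++;
    return counts[0] >= counts[1] ? 0 : 1;
}
```

As the text notes, a real application should also weigh whether each winning decode was actually correct, not just which model scored higher.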
What is n-best?
Instead of hypothesizing only one sentence, the engine hypothesizes several sentences about what it heard. Usually the top sentence is the highest-scoring sentence; the others are the top alternative sentences, which scored lower. N-best results can be used to craft more intelligent confirmations.
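For example, an application might confirm only when the runner-up hypothesis scores close to the best one. The struct below is an illustrative container, not the engine's actual n-best interface:

```c
/* Illustrative n-best entry; the real API exposes alternatives through
 * its own calls. Results are assumed sorted best-first. */
struct nbest_entry {
    const char *text;
    int score;  /* 0-1000 confidence */
};

/* Confirm when the second-best hypothesis is within `margin` points of
 * the best, i.e. when the engine was nearly as sure about an alternative. */
static int needs_confirmation(const struct nbest_entry *results, int n, int margin)
{
    if (n < 2)
        return 0;  /* no alternative to worry about */
    return results[0].score - results[1].score < margin;
}
```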
Why does the API appear to cause a memory leak?
A common cause of growing memory usage is repeatedly loading grammars without unloading them. A good practice is to unload grammars that will not be used for a while.
Also, please exercise caution when using the C API. Most of the handles created by the API, such as H_SI, H_GRAMMAR, and HPORT, need to be explicitly released after you are done using them.
How to Contact LumenVox LLC
Web site: www.LumenVox.com
Email: [email protected]
Sales: [email protected]
Support: [email protected]
Phone: (858) 707-0707
Fax: (858) 707-7072
LumenVox LLC 3615 Kearny Villa Road, Suite # 202 San Diego, CA 92123
Copyright Information
Copyright 2001, 2002, 2003, 2004, 2005 LumenVox LLC. All rights reserved.
Glossary
C
Concept: The string value returned by the decoder. The decoder can return multiple concepts. A concept represents words or phrases grouped together under a single "heading".
P
Phrase: A word or series of words. Can also include BNF-formatted words and/or pure phonemes.
S
SISR: Semantic Interpretation for Speech Recognition; a companion to SRGS grammars, this working draft describes a process for turning sentences recognized by an ASR into data objects usable by an application.
SRGS: Speech Recognition Grammar Specification; a W3C recommendation for the format of grammars used in a speech recognizer.