Printed Documentation
Table Of Contents
Welcome to the LumenVox Speech Recognition Engine......................................1
Release Notes ......................................................................................................2
Version 6.0: .......................................................................................................2
Version 5.0: .......................................................................................................2
Version 4.0: .......................................................................................................2
Programmers Guide..............................................................................................4
Initializing a Speech Port...................................................................................4
C Code ..........................................................................................................4
C++ Code ......................................................................................................4
C Code ..........................................................................................................5
C++ Code ......................................................................................................6
Working with Grammars....................................................................................7
Loading A Grammar ......................................................................................7
C Code ..........................................................................................................7
C++ Code ......................................................................................................7
Activating A Grammar....................................................................................7
C Code ..........................................................................................................8
C++ Code ......................................................................................................8
See Also ........................................................................................................8
Adding Audio.....................................................................................................9
Batched Audio ...............................................................................................9
C Code ..........................................................................................................9
C++ Code ......................................................................................................9
Streaming ......................................................................................................9
C++ Code ......................................................................................................9
C Code ........................................................................................................10
Decoding .........................................................................................................13
C Code ........................................................................................................13
C++ Code ....................................................................................................14
Streaming ....................................................................................................14
Getting The Return Value ............................................................................15
C Code ........................................................................................................15
C++ Code ....................................................................................................15
C Code ........................................................................................................16
C++ Code ....................................................................................................16
See Also ......................................................................................................17
Using the Speech Parse Tree .........................................................................18
Example 1: Print the Tags in the tree...........................................................18
Example 2: Print a structured tree ...............................................................19
See Also ......................................................................................................21
Using the Interpretation Object........................................................................22
C API ...........................................................................................................22
C++ API .......................................................................................................22
Semantic Data Examples ............................................................................22
Example 1: Access Data Directly.................................................................24
C++ Code ....................................................................................................24
C Code ........................................................................................................24
Example 2: Traverse a Semantic Data Structure.........................................24
C Code ........................................................................................................24
Result ..........................................................................................................25
See Also ......................................................................................................26
Shutting Down the Speech Port ......................................................................27
C Code ........................................................................................................27
C++ Code ....................................................................................................27
Gotchas .......................................................................................................27
Example Code.................................................................................................28
A Working Example .....................................................................................28
main.cpp ......................................................................................................29
SimpleRecognizer.h.....................................................................................30
SimpleRecognizer.cpp.................................................................................31
AudioStreamer.h..........................................................................................36
AudioStreamer.cpp ......................................................................................37
HeaderClasses.h .........................................................................................39
SRGS Grammars ............................................................................................43
A Simple Grammar ......................................................................................43
Rule Expansions by Example ......................................................................46
Rule References ..........................................................................................49
Special Rules...............................................................................................51
Tags.............................................................................................................53
Applying Grammar Weights .........................................................................56
SRGS Definitions.........................................................................................58
Example Grammars.....................................................................................65
Semantic Interpretation ...................................................................................68
Intro to Semantic Interpretation....................................................................68
Semantic Interpretation by Example ............................................................70
Getting The Return Value ............................................................................74
Phonemes .......................................................................................................75
Phrases ...........................................................................................................78
BNF Refresher.............................................................................................78
LumenVox SpeechRec API ................................................................................80
Cautions ..........................................................................................................80
LV_SRE C API Functions................................................................................81
LV_SRE.......................................................................................................81
API Functions ..............................................................................................86
LVInterpretation C API Functions..................................................................161
LVInterpretation Summary.........................................................................161
LVSemanticData Summary........................................................................163
API Functions ............................................................................................166
LVParseTree C API functions........................................................................190
API Functions ............................................................................................191
Related APIs..............................................................................................204
LVParseTree Class....................................................................................218
LVGrammar C API Functions........................................................................221
LVGrammar Summary...............................................................................221
API Functions ............................................................................................225
LVSpeechPort Class .....................................................................................261
class LVSpeechPort ..................................................................................261
Methods.....................................................................................................266
LVInterpretation Class...................................................................................334
Intro To LVInterpretation............................................................................334
LVInterpretation: Constructing and Copying ..............................................336
ResultData.................................................................................................337
ResultName...............................................................................................338
Language...................................................................................................339
Mode..........................................................................................................340
TagFormat .................................................................................................341
InputSentence............................................................................................342
GrammarLabel...........................................................................................343
Score .........................................................................................................344
LVSemanticData Class..............................................................................345
LVSemanticObject Class ...........................................................................354
LVSemanticArray Class.............................................................................360
LVParseTree Class .......................................................................................363
LVParseTree Class....................................................................................363
Methods.....................................................................................................366
LVParseTree Inner Classes.......................................................................375
LVGrammar Class.........................................................................................388
class LVGrammar ......................................................................................388
Methods.....................................................................................................393
Callback Functions ........................................................................................426
Logging Callback Function.........................................................................426
Streaming Callback Function.....................................................................427
Grammar Logging Callback Function ........................................................428
Constants ......................................................................................................429
Decoder Flags ...........................................................................................429
Error Codes ...............................................................................................431
Properties ..................................................................................................434
Sound Formats ..........................................................................................440
Standard Grammars ..................................................................................442
Semantic Data Type ..................................................................................443
Semantic Data Print Format.......................................................................444
Stream Parameters....................................................................................445
Environment Variables ..................................................................................449
Environment Variables...............................................................................449
FAQs.................................................................................................................451
FAQs .............................................................................................................451
How to Contact LumenVox LLC........................................................................458
Copyright Information........................................................................................459
Glossary............................................................................................................460
Index .................................................................................................................461
Welcome to the LumenVox Speech Recognition Engine

We strive to make our products as user-friendly as possible, and we value your opinion. If there is something you would like added to the Help system, please email your suggestions to [email protected].
Release Notes

Version 6.0:

Supports n-best results.
Reduced server memory footprint.
Faster recognition algorithm.
Reduced server thread start-up time.
New American English acoustic models with an 8-10% relative improvement in recognition accuracy.
Improved confidence scores.
Global grammars are stored on the server.
Version 5.0:

Support for the Speech Recognition Grammar Specification (SRGS). SRGS grammars are now the official grammar format for the LumenVox Engine. SRGS grammars are powerful probabilistic context-free grammars that allow great flexibility in grammar writing.

Support for the Semantic Interpretation for Speech Recognition (SISR) working draft. Semantic Interpretation makes it easy to transform spoken input into machine-understandable data.
Version 4.0:

A new header file <LV_SRE2.h> is provided for the new C interface functions. It should be used in conjunction with <LV_SRE.h>.

A new header file <LVSpeechPort2.h> is provided. It contains a new C++ wrapper class (with the same name, "class LVSpeechPort") with new methods, and replaces the <LVSpeechPort.h> header.

A new DLL called "LVSpeechPort_stdcall.dll" is included so that programming environments which require standard calls (like VB) can use the SRE engine. The file SREAPI.txt contains a sample interface for use with VB.
Programmers Guide

Initializing a Speech Port

The only thing you must do to initialize a speech port is to have a Speech Engine service running on your machine and call OpenPort.
C Code
HPORT port;
long error_code;

port = LV_SRE_OpenPort2(&error_code, NULL, NULL, 0);

switch (error_code)
{
case LV_OPEN_PORT_FAILED__LICENSES_EXCEEDED:
    printf("licenses exceeded");
    break;
case LV_OPEN_PORT_FAILED__PRIMARY_SERVER_NOT_RESPONDING:
case LV_NO_SERVER_RESPONDING:
    printf("SRE server unavailable");
    break;
case LV_SUCCESS:
    printf("port opened");
    break;
}
C++ Code
LVSpeechPort port;
port.OpenPort();
int error_code = port.GetOpenPortStatus();

switch (error_code)
{
case LV_FAILURE:
    cout << "licenses exceeded";
    break;
case LV_OPEN_PORT_FAILED__PRIMARY_SERVER_NOT_RESPONDING:
case LV_NO_SERVER_RESPONDING:
    cout << "SRE server unavailable";
    break;
case LV_SUCCESS:
    cout << "port opened";
    break;
}
Other things you can do besides opening a port include:
Register logging callback functions
Register multiple servers
Turn on Engine sound file and result logging for application tuning.
C Code
/* a structure to hold logfile info */
typedef struct logdata_s
{
    FILE* file;
    long  message_count;
} logdata_t;

void logdata_callback(const char* message, void* userdata)
{
    logdata_t* mydata = (logdata_t*)userdata;
    fprintf(mydata->file, "%s\n", message);
    ++(mydata->message_count);
}

int init_port(HPORT* port, logdata_t* app_message, logdata_t* log_message)
{
    long error_code;

    /* Register a callback to accept messages from the server
     * or client library, at warning level 3 */
    LV_SRE_RegisterAppLogMsg(logdata_callback, app_message, 3);

    /* point the client library to a local server and a remote server */
    LV_SRE_SetPropertyEx(NULL, PROP_EX_SRE_SERVERS,
        PROP_EX_VALUE_TYPE_STRING, "127.0.0.1,10.0.0.1",
        PROP_EX_TARGET_CLIENT, 0);

    /* open the port, registering a callback to accept messages
     * from the port at warning level 3 */
    *port = LV_SRE_OpenPort2(&error_code, logdata_callback, log_message, 3);

    /* turn on sound and response file logging */
    int save_sound_files = 1;
    LV_SRE_SetPropertyEx(*port, PROP_EX_SAVE_SOUND_FILES,
        PROP_EX_VALUE_TYPE_INT_PTR, &save_sound_files,
        PROP_EX_TARGET_PORT, 0);

    return error_code;
}
C++ Code
// a class to hold logfile info
struct logdata
{
    ofstream file;
    long     message_count;

    static void callback(const char* message, void* userdata)
    {
        logdata* self = (logdata*)userdata;
        self->file << message << endl;
        ++(self->message_count);
    }
};

int init_port(LVSpeechPort& port, logdata* app_message, logdata* log_message)
{
    // Register a callback to accept messages from the server
    // or client library, at warning level 3.
    LVSpeechPort::RegisterAppLogMsg(logdata::callback, app_message, 3);

    // point the client library to a local server and a remote server
    LVSpeechPort::SetClientPropertyEx(PROP_EX_SRE_SERVERS,
        PROP_EX_VALUE_TYPE_STRING, "127.0.0.1,10.0.0.1");

    // open the port, registering a callback to accept messages
    // from the port at warning level 3.
    port.OpenPort(logdata::callback, log_message, 3);

    // turn on sound and response file logging
    int save_sound_files = 1;
    port.SetPropertyEx(PROP_EX_SAVE_SOUND_FILES,
        PROP_EX_VALUE_TYPE_INT_PTR, &save_sound_files);

    return port.GetOpenPortStatus();
}
Working with Grammars
Grammars tell the Speech Recognition Engine what words and phrases can be recognized by the engine, and in what order. The LumenVox grammar format is an implementation of the Speech Recognition Grammar Specification, published by the W3C. A short tutorial on writing SRGS grammars is provided here.
Loading A Grammar
In order to decode audio, there must be at least one grammar loaded. Grammars can be loaded in a variety of ways, a few of which are demonstrated below:
C Code
HPORT hport;

/* Load a grammar into the global (application-level) space, and name it
 * "nav_menu".
 * This grammar will be usable by any speech port on the client machine.
 * Any syntax warnings or error messages will be sent to the
 * application-level logging callback. */
LV_SRE_LoadGlobalGrammar("nav_menu",
    "c:/MyGrammars/top_level_navigation.gram");

/* Load a built-in grammar into the speech port, and name it "yes_no".
 * Syntax error or warning messages will be sent to the port's logging
 * callback. The hport needs to be open first, of course. */
LV_SRE_LoadGrammar(hport, "yes_no", "builtin:grammar/boolean");
C++ Code
LVSpeechPort port;
port.OpenPort();

LVSpeechPort::LoadGlobalGrammar("nav_menu",
    "c:/MyGrammars/top_level_navigation.gram");
port.LoadGrammar("yes_no", "builtin:grammar/boolean");
Activating A Grammar
When a grammar is loaded, it is compiled into a file usable by the Engine. But to use the grammar for a decode you must activate it. You may activate multiple grammars for a single decode; the Engine will tell you which grammar was matched.
C Code
/* Activates the "nav_menu" grammar that was loaded above.
 * Activate searches for a grammar named "nav_menu" in its port, then
 * searches the global space if it can't find it. */
LV_SRE_ActivateGrammar(hport, "nav_menu");
C++ Code
port.ActivateGrammar("nav_menu");
See Also
Grammar Writing Tutorial
Adding Audio
Because the LumenVox Speech Engine is hardware independent, the client application has greater flexibility when collecting the audio data. Once the audio is acquired, the client application should ensure the data is in a supported audio format. The audio must be header-less, otherwise known as "raw" audio format. For example, the standard Windows .wav files have a header which needs to be removed.
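Because .wav headers vary in length (files often carry extra chunks such as LIST or fact), it is safer to walk the RIFF chunk list than to skip a fixed 44 bytes. The helper below is an illustrative sketch, not part of the LumenVox API; it locates the raw PCM samples inside an in-memory little-endian WAV file:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Find the "data" chunk of an in-memory RIFF/WAVE file by walking its
// chunk list. On success, data_offset/data_length describe the raw PCM
// samples, which can then be handed to the Engine as header-less audio.
bool FindPcmData(const std::vector<uint8_t>& wav,
                 size_t& data_offset, size_t& data_length)
{
    // A valid WAV file starts with "RIFF" <size> "WAVE".
    if (wav.size() < 12 ||
        std::memcmp(&wav[0], "RIFF", 4) != 0 ||
        std::memcmp(&wav[8], "WAVE", 4) != 0)
        return false;

    size_t pos = 12;  // first sub-chunk starts after the RIFF header
    while (pos + 8 <= wav.size()) {
        // Chunk sizes are stored little-endian after the 4-byte chunk ID.
        uint32_t chunk_size = wav[pos + 4] | (wav[pos + 5] << 8) |
                              (wav[pos + 6] << 16) |
                              (static_cast<uint32_t>(wav[pos + 7]) << 24);
        if (std::memcmp(&wav[pos], "data", 4) == 0) {
            data_offset = pos + 8;
            data_length = chunk_size;
            return data_offset + data_length <= wav.size();
        }
        // Skip this chunk; chunks are padded to even (word-aligned) sizes.
        pos += 8 + chunk_size + (chunk_size & 1);
    }
    return false;
}
```

The bytes at data_offset can then be passed to LoadVoiceChannel or to the streaming interface as raw audio.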
The audio data is stored in a voice channel. Each speech port has 64 different voice channels, so 64 different audio samples can be stored in a speech port at once, although most applications will only need two: one for the main answer and one for the result of a confirmation yes/no question.
Audio may be entered at once, as a batch decode, or it may be streamed in.
Batched Audio
To get your audio into the port, all you have to do is collect the audio into a buffer and call LoadVoiceChannel:
C Code
void LoadAudio(HPORT hport, void* audio, int audiolength)
{
    LV_SRE_LoadVoiceChannel(hport, 1, audio, audiolength, PCM_16KHZ);
}
C++ Code
void LoadAudio(LVSpeechPort& myPort, void* audio, int audiolength)
{
    myPort.LoadVoiceChannel(1, audio, audiolength, PCM_16KHZ);
}
Streaming
In order to stream audio into the server, there are several parameters to set. We will set them to the most commonly used settings:
C++ Code
// The port gets opened and initialized.
LVSpeechPort port;
port.OpenPort();
// ...

// let the port detect beginning and end of speech,
// and handle the speech decoding automatically
port.StreamSetParameter(STREAM_PARM_DETECT_BARGE_IN, 1);
port.StreamSetParameter(STREAM_PARM_DETECT_END_OF_SPEECH, 1);
port.StreamSetParameter(STREAM_PARM_AUTO_DECODE, 1);

// pick a voice channel to record audio and send responses to.
port.StreamSetParameter(STREAM_PARM_VOICE_CHANNEL, 1);

// If you wish to use your activated SRGS grammars, the grammar set
// must be LV_ACTIVE_GRAMMAR_SET
port.StreamSetParameter(STREAM_PARM_GRAMMAR_SET, LV_ACTIVE_GRAMMAR_SET);
C Code
LV_SRE_StreamSetParameter(hport, STREAM_PARM_DETECT_BARGE_IN, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_DETECT_END_OF_SPEECH, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_AUTO_DECODE, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_VOICE_CHANNEL, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_GRAMMAR_SET, LV_ACTIVE_GRAMMAR_SET);
The rest of this example will be in C++; the C version is left as an exercise for the reader. Suppose we have an interface that intermittently provides audio to us. For simplicity, assume it always sends audio as u-Law 8 kHz:
typedef bool (*AudioStreamCallback)(char* audio_chunk, int audio_length,
                                    void* user_data);

class AudioStreamer
{
public:
    // Non-blocking function. Sends audio through the callback function
    // at regular intervals on a separate thread. It will stop sending
    // audio if the callback returns "false".
    void StartStream(AudioStreamCallback cb, void* user_data);

    // The audio thread will stop sending audio through the callback if
    // StopStream is called. When StopStream returns, the audio thread
    // is no longer sending.
    void StopStream();

    // constructors, destructors, hardware hooks, etc.
    // ...
};
The speech port also has a callback mechanism for letting the user know what state of processing it is in.
typedef void (*StreamStateChangeFn)(long new_state, unsigned long total_bytes, unsigned long recorded_bytes, void* user_data);
We can connect our speech port and the audio streamer together by way of their callbacks.
struct SimpleRecognizer
{
    LVSpeechPort port;
    AudioStreamer audio;
};

bool AudioCB(char* audio_chunk, int audio_length, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    self->port.StreamSendData(audio_chunk, audio_length);
    return true;
}

static void PortCB(long new_state, unsigned long total_bytes,
                   unsigned long recorded_bytes, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    switch (new_state)
    {
    case STREAM_STATUS_READY:
        self->audio.StartStream(AudioCB, self);
        break;
    case STREAM_STATUS_STOPPED:
    case STREAM_STATUS_END_SPEECH:
        self->audio.StopStream();
        // retrieve answers: we will define this later
        break;
    case STREAM_STATUS_BARGE_IN:
        // stop playing prompt
        break;
    }
}
All that remains is to plug the PortCB function into the port.
SimpleRecognizer reco;

// initialize the speech port and the audio streamer
// ...

// start the stream.
reco.port.StreamSetStateChangeCallBack(PortCB, &reco);
reco.port.StreamSetParameter(STREAM_PARM_SOUND_FORMAT, ULAW_8KHZ);

// StreamStart will put the port into the STREAM_STATUS_READY state, which
// will trigger the audio streamer to start sending audio to the port.
reco.port.StreamStart();
Decoding
Once grammars have been activated and the speech port is receiving audio, the decode process can begin. The decode process sends audio and grammars to the Engine to be parsed and interpreted for meaning.
Batched Audio
With audio that is dropped directly into a speech port's voice channel, the user can explicitly call Decode and wait for results to come back:
C Code
HPORT hport;

/* Let the port decide if the audio is suited for the MODEL_MALE or
 * MODEL_FEMALE acoustic models. Otherwise, two decodes will be performed,
 * and the port will choose afterward */
int choose_model = 1;
LV_SRE_SetPropertyEx(NULL, PROP_EX_CHOOSE_MODEL,
    PROP_EX_VALUE_TYPE_INT_PTR, &choose_model,
    PROP_EX_TARGET_CLIENT, 0);

/* If you wish to use the LumenVox Semantic Interpretation process,
 * this flag needs to be present. */
unsigned long flags = LV_DECODE_SEMANTIC_INTERPRETATION;

/* voice_channel is wherever you loaded the audio */
int voice_channel = 1;

/* You should use LV_ACTIVE_GRAMMAR_SET if you are using SRGS grammars.
 * It is the grammar set that holds all of your active grammars. */
int grammar_set = LV_ACTIVE_GRAMMAR_SET;

/* wait a max of 3 seconds before abandoning hope for the Engine
 * to return an answer */
int timeout = 3000;

LV_SRE_Decode(hport, voice_channel, grammar_set, flags);
int code = LV_SRE_WaitForEngineToIdle(hport, timeout, voice_channel);

if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    /* process the answers contained in the voice channel */
}
C++ Code
LVSpeechPort port;

int choose_model = 1;
LVSpeechPort::SetClientPropertyEx(PROP_EX_CHOOSE_MODEL,
    PROP_EX_VALUE_TYPE_INT_PTR, &choose_model);

unsigned long flags = LV_DECODE_SEMANTIC_INTERPRETATION;
int voice_channel = 1;
int grammar_set = LV_ACTIVE_GRAMMAR_SET;
int timeout = 3000;

port.Decode(voice_channel, grammar_set, flags);
int code = port.WaitForEngineToIdle(timeout, voice_channel);

if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    /* process the answers contained in the voice channel */
}
Streaming
If you are streaming the audio into the speech port, you can elect to have the speech port handle the decode process automatically, as we did in the section on adding audio when we wrote the line:
port.StreamSetParameter(STREAM_PARM_AUTO_DECODE, 1);
In order to wait for the Engine to return with results, we need to modify our callback function:
void ProcessResults(SimpleRecognizer* reco)
{
    int voice_channel = 1; // the channel set with STREAM_PARM_VOICE_CHANNEL
    reco->audio.StopStream();
    int code = reco->port.WaitForEngineToIdle(3000, voice_channel);
    if (code == LV_TIME_OUT)
    {
        /* do some clean up and exit */
    }
    else
    {
        /* process the answers contained in the voice channel */
    }
}

static void PortCB(long new_state, unsigned long total_bytes,
                   unsigned long recorded_bytes, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    switch (new_state)
    {
    case STREAM_STATUS_READY:
        self->audio.StartStream(AudioCB, self);
        break;
    case STREAM_STATUS_STOPPED:
    case STREAM_STATUS_END_SPEECH:
        ProcessResults(self);
        break;
    case STREAM_STATUS_BARGE_IN:
        // stop playing prompt
        break;
    }
}
Getting The Return Value
If WaitForEngineToIdle returns successfully, you can grab answers out of the port. If you are using the semantic interpretation processor, you retrieve LVInterpretation objects.
C Code
if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    int num_interp = LV_SRE_GetNumberOfInterpretations(hport, voice_channel);
    for (int i = 0; i < num_interp; ++i)
    {
        printf("interpretation %i:\n", i);
        H_SI interp = LV_SRE_CreateInterpretation(hport, voice_channel, i);
        const char* grammar = LVInterpretation_GetGrammarLabel(interp);
        int score = LVInterpretation_GetScore(interp);
        printf("utterance matched grammar %s with confidence %i\n",
               grammar, score);

        /* See "Using Semantic Data" for examples of handling the semantic
         * data contained in this interpretation object */

        /* release the interpretation handle when finished with it */
        LVInterpretation_Release(interp);
    }
}
C++ Code
if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    int num_interp = port.GetNumberOfInterpretations(voice_channel);
    for (int i = 0; i < num_interp; ++i)
    {
        cout << "interpretation " << i << ":" << endl;
        LVInterpretation interp = port.GetInterpretation(voice_channel, i);
        const char* grammar = interp.GrammarLabel();
        int score = interp.Score();
        cout << "utterance matched grammar " << grammar
             << " with confidence " << score << endl;

        // See "Using Semantic Data" for examples of handling the semantic
        // data contained in this interpretation object
    }
}
If you are not using semantic interpretation, you can receive LVParseTree objects from the Engine.
C Code
if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    int num_parses = LV_SRE_GetNumberOfParses(hport, voice_channel);
    for (int i = 0; i < num_parses; ++i)
    {
        printf("interpretation %i:\n", i);
        H_PARSE_TREE parse = LV_SRE_CreateParseTree(hport, voice_channel, i);

        /* See "Using the Parse Tree" for examples of handling the
         * parse tree */

        /* release the parse tree when finished with it */
        LVParseTree_Release(parse);
    }
}
C++ Code
if (code == LV_TIME_OUT)
{
    /* do some clean up and exit */
}
else
{
    int num_parses = port.GetNumberOfParses(voice_channel);
    for (int i = 0; i < num_parses; ++i)
    {
        cout << "interpretation " << i << ":" << endl;
        LVParseTree parse = port.GetParseTree(voice_channel, i);
        // See "Using the Parse Tree" to see how to handle
        // the parse tree by example
    }
}
See Also
Using Semantic Data
Using the Parse Tree
Using the Speech Parse Tree
#include <LV_SRE_ParseTree.h>
A ParseTree represents a sentence diagram of engine output, according to the SRGS grammar that was matched. Information about the tree is accessed through iterators.
Here are a few code examples to show how information can be accessed from the speech parse tree. In every example, the active grammar will be:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <XML>; //a made up tag format.
root $PhoneNumber;
$Digit = one {1} | two {2} | three {3} | four {4} | five {5} | six {6} | seven {7} | eight {8} | nine {9} | (zero | oh) {0};
$AreaCode = [area code | one] {<AREA_CODE>} $Digit<3> {</AREA_CODE>};
$PhoneNumber = [$AreaCode] {<PHONE>} $Digit<7> {</PHONE>};
And the decoded sentence will be "area code eight five eight seven oh seven oh seven oh seven". If you do not understand how to write an SRGS grammar, read the tutorial now.
Example 1: Print the Tags in the tree
C++ API
#include <LV_SRE_ParseTree.h>
#include <iostream>
using namespace std;
void PrintTags(LVParseTree& Tree)
{
    LVParseTree::Iterator Itr = Tree.Begin();
    LVParseTree::Iterator End = Tree.End();
    for (; Itr != End; ++Itr)
    {
        if (Itr->IsTag())
        {
            cout << Itr->Text() << "\n";
        }
    }
}
C API
#include <LV_SRE_ParseTree.h>
void PrintTags(H_PARSE_TREE Tree)
{
    H_PARSE_TREE_NODE N;
    H_PARSE_TREE_ITR Itr;
    Itr = LVParseTree_CreateIteratorBegin(Tree);
    for (; !LVParseTree_Iterator_IsPastEnd(Itr); LVParseTree_Iterator_Advance(Itr))
    {
        N = LVParseTree_Iterator_GetNode(Itr);
        if (LVParseTree_Node_IsTag(N))
        {
            printf("%s ", LVParseTree_Node_GetLabel(N));
        }
    }
    LVParseTree_Iterator_Release(Itr);
}
Result
"<AREA_CODE> 8 5 8 </AREA_CODE> <PHONE> 7 0 7 0 7 0 7 </PHONE>"
Example 2: Print a structured tree
C++ API
#include <LV_SRE_ParseTree.h>
#include <iostream>

using namespace std;

void PrintNode(LVParseTree::Node& N)
{
    for (int i = 0; i < N.Level(); ++i)
        cout << " ";
    if (N.IsTerminal())
        cout << "\"" << N.Text() << "\"\n";
    if (N.IsTag())
        cout << "{ " << N.Text() << " }\n";
    if (N.IsRule())
    {
        cout << "$" << N.RuleName() << ":\n";
        LVParseTree::ChildrenIterator Itr = N.ChildrenBegin();
        LVParseTree::ChildrenIterator End = N.ChildrenEnd();
        for (; Itr != End; ++Itr)
            PrintNode(*Itr);
    }
}

void PrintTree(LVParseTree& Tree)
{
    PrintNode(Tree.Root());
}
C API
#include <LV_SRE_ParseTree.h>
#include <stdio.h>

void PrintNode(H_PARSE_TREE_NODE N)
{
    H_PARSE_TREE_CHILDREN_ITR I;
    int i;
    for (i = 0; i < LVParseTree_Node_GetLevel(N); ++i)
        printf(" ");
    if (LVParseTree_Node_IsTerminal(N))
        printf("\"%s\"\n", LVParseTree_Node_GetText(N));
    if (LVParseTree_Node_IsTag(N))
        printf("{ %s }\n", LVParseTree_Node_GetText(N));
    if (LVParseTree_Node_IsRule(N))
    {
        printf("$%s:\n", LVParseTree_Node_GetRuleName(N));
        I = LVParseTree_Node_CreateChildrenIterator(N);
        while (!LVParseTree_ChildrenIterator_IsPastEnd(I))
        {
            PrintNode(LVParseTree_ChildrenIterator_GetNode(I));
            LVParseTree_ChildrenIterator_Advance(I);
        }
        LVParseTree_ChildrenIterator_Release(I);
    }
}

void PrintTree(H_PARSE_TREE Tree)
{
    PrintNode(LVParseTree_GetRoot(Tree));
}
Result:
$PhoneNumber:
 $AreaCode:
  "AREA"
  "CODE"
  { <AREA_CODE> }
  $Digit:
   "EIGHT"
   { 8 }
  $Digit:
   "FIVE"
   { 5 }
  $Digit:
   "EIGHT"
   { 8 }
  { </AREA_CODE> }
 { <PHONE> }
 $Digit:
  "SEVEN"
  { 7 }
 $Digit:
  "OH"
  { 0 }
 $Digit:
  "SEVEN"
  { 7 }
 $Digit:
  "OH"
  { 0 }
 $Digit:
  "SEVEN"
  { 7 }
 $Digit:
  "OH"
  { 0 }
 $Digit:
  "SEVEN"
  { 7 }
 { </PHONE> }
See Also
LVParseTree C API
LVParseTree C++ API
Using the Interpretation Object
#include <LV_SRE_Semantic.h>
When the speech port executes your semantic interpretation tags, the output is an ECMAScript (JavaScript) object. LumenVox provides a C and C++ API for examining this object. When the speech port has finished its decode, and processed the resulting parse tree and tags, you may request an interpretation object. The interpretation object contains information about the decode -- confidence score, matching grammar, etc -- plus a single semantic data object.
C API
H_SI interpretation = LV_SRE_CreateInterpretation(hport, voicechannel, index);
/* the name of the active grammar that matched this interpretation */
const char* grammar = LVInterpretation_GetGrammarLabel(interpretation);
/* the SRE's confidence in this interpretation */
int confidence = LVInterpretation_GetScore(interpretation);
/* the sentence that the SRE decoded */
const char* sentence = LVInterpretation_GetInputSentence(interpretation);
/* the object returned by the semantic interpretation process */
H_SI_DATA result_data = LVInterpretation_GetResultData(interpretation);
C++ API
LVInterpretation interpretation = port.GetInterpretation(voicechannel, index);
const char* grammar = interpretation.GrammarLabel();
int confidence = interpretation.Score();
const char* sentence = interpretation.InputSentence();
LVSemanticData result_data = interpretation.ResultData();
Semantic Data Examples
In the following examples, the grammar will be:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <lumenvox/1.0>; //This line tells the engine how to interpret the grammar's tags.
//Currently, only "lumenvox/1.0" or "semantics/1.0" is supported.
root $small_number_and_text;

$base = (one:"1"|two:"2"|three:"3"|four:"4"|five:"5"|six:"6"|seven:"7"|eight:"8"|nine:"9")
        { $ = parseInt($) };
$teen = ten:"10"|eleven:"11"|twelve:"12"|thirteen:"13"|fourteen:"14"|fifteen:"15"|
        sixteen:"16"|seventeen:"17"|eighteen:"18"|nineteen:"19" { $ = parseInt($) };
$twenty_to_ninetynine = (twenty:"20"|thirty:"30"|forty:"40"|fifty:"50"|sixty:"60"|
        seventy:"70"|eighty:"80"|ninety:"90") { $ = parseInt($) } [$base { $ += $base }];
$tens = ($base|$teen|$twenty_to_ninetynine) { $ = $$ };
$hundred = ([a] hundred {$ = 100} | $base hundred {$ = 100 * $base});
$small_number = $hundred {$ = $$} [[and] $tens {$ += $$}] | $tens { $ = $$ };
$small_number_and_text = $small_number { $.number = $$; $.text = $$$.text };
And the input sentence will be "four hundred and six". If you do not understand how SRGS grammars are written, or how the semantic interpretation process works, please read the SRGS Grammar and/or Semantic Interpretation tutorials now.
The result of the semantic interpretation process on the input sentence is an ECMAScript object that looks like this:
small_number_and_text :          // return value of type SI_TYPE_OBJECT
{
  number: 406,                   // property of type SI_TYPE_INT
  text: "four hundred and six"   // property of type SI_TYPE_STRING
}
Example 1: Access Data Directly
If we knew that our application would always be receiving an object containing an integer property named "number", and a string property named "text", we could write code to retrieve the data as follows:
C++ Code
LVSemanticObject result_obj = interpretation.ResultData().GetSemanticObject();
int number = result_obj["number"].GetInt();
const char* text = result_obj["text"].GetString();
C Code
H_SI_DATA result = LVInterpretation_GetResultData(interpretation);
H_SI_DATA number_container = LVSemanticObject_GetPropertyValue(result, "number");
int number = LVSemanticData_GetInt(number_container);
H_SI_DATA text_container = LVSemanticObject_GetPropertyValue(result, "text");
const char* text = LVSemanticData_GetString(text_container);
Example 2: Traverse a Semantic Data Structure
The following code prints a generic interpretation object as an XML fragment.
C Code
void PrintDataXML(H_SI_DATA hsi); /* forward declaration; PrintXML calls it */

void PrintXML(H_SI hsi)
{
    const char* result_name = LVInterpretation_GetResultName(hsi);
    printf("<%s>\n", result_name);
    PrintDataXML(LVInterpretation_GetResultData(hsi));
    printf("</%s>\n", result_name);
}

void PrintDataXML(H_SI_DATA hsi)
{
    int i;
    int n;
    const char* property_name;
    H_SI_DATA data;
    switch (LVSemanticData_GetType(hsi))
    {
    case SI_TYPE_BOOL:
        LVSemanticData_GetBool(hsi) ? printf("true\n") : printf("false\n");
        break;
    case SI_TYPE_INT:
        printf("%d\n", LVSemanticData_GetInt(hsi));
        break;
    case SI_TYPE_DOUBLE:
        printf("%f\n", LVSemanticData_GetDouble(hsi));
        break;
    case SI_TYPE_STRING:
        printf("%s\n", LVSemanticData_GetString(hsi));
        break;
    case SI_TYPE_OBJECT:
        n = LVSemanticObject_GetNumberOfProperties(hsi);
        for (i = 0; i < n; i++)
        {
            property_name = LVSemanticObject_GetPropertyName(hsi, i);
            data = LVSemanticObject_GetPropertyValue(hsi, property_name);
            printf("<%s>\n", property_name);
            PrintDataXML(data);
            printf("</%s>\n", property_name);
        }
        break;
    case SI_TYPE_ARRAY:
        n = LVSemanticArray_GetSize(hsi);
        for (i = 0; i < n; i++)
        {
            data = LVSemanticArray_GetElement(hsi, i);
            printf("<item>\n");
            PrintDataXML(data);
            printf("</item>\n");
        }
        break;
    }
}
Result
<small_number_and_text>
  <number>
    406
  </number>
  <text>
    four hundred and six
  </text>
</small_number_and_text>
See Also
Semantic Interpretation C API
Semantic Interpretation C++ API
Shutting Down the Speech Port
When the speech port is no longer needed, it should be closed. Closing every unneeded speech port frees a licensed port and releases all of the port's resources.
C Code
HPORT hport;
/* open it...do some stuff...close when done */
LV_SRE_ClosePort(hport);
C++ Code
LVSpeechPort Port;
//open it...do some stuff...close when done
Port.ClosePort();
Gotchas
Closing the port may seem trivial, but once you start streaming audio to it from a separate thread, it becomes easy to get wrong. Always disengage your stream from the port completely before you close it.
Example Code
A Working Example
Included in this documentation is a working example that incorporates streaming audio, SRGS grammars, and Semantic Interpretation. It is written in C++, is based on examples throughout this documentation, and compiles under Visual C++ 6.0.
It consists of six files.
main.cpp -- The entry point into the application.
SimpleRecognizer.h -- Definition of a recognizer, backed by LVSpeechPort.
SimpleRecognizer.cpp -- Implementation file.
AudioStreamer.h -- Definition of an object that mimics streaming by reading an audio file.
AudioStreamer.cpp -- Implementation file.
HeaderClasses.h -- Thread code to help implement AudioStreamer.
main.cpp
#include "AudioStreamer.h"
#include "SimpleRecognizer.h"
#include <iostream>

int main()
{
    SimpleRecognizer Reco;
    Reco.LoadGrammar("yesno", "builtin:grammar/boolean");
    AudioStreamer Audio("yesplease.ulaw");
    Reco.Recognize(&Audio, "yesno");
    Reco.WaitUntilDone();
    std::cout << std::endl << Reco.GetResult() << std::endl << std::endl;
    return 0;
}
SimpleRecognizer.h
#ifndef SIMPLE_RECOGNIZER_H
#define SIMPLE_RECOGNIZER_H
#include "AudioStreamer.h"
#include <LVSpeechPort.h>

class SimpleRecognizer
{
public:
    SimpleRecognizer();
    ~SimpleRecognizer();
    void WaitUntilDone();
    void LoadGrammar(const std::string& grammar_name, const std::string& grammar_location);
    void Recognize(AudioStreamer* Stream, const std::string& grammar_name);
    const std::string& GetResult();
private:
    static void PortCB(long NewState, unsigned long TotalBytes,
                       unsigned long RecordedBytes, void* UserData);
    static bool AudioCB(char* audio_data, int audio_data_size, void* user_data);
    bool finished_decode;
    AudioStreamer* AudioThread;
    LVSpeechPort port;
    int voiceChannel;
    void GetAnswers();
    std::string result;
};
#endif//SIMPLE_RECOGNIZER_H
SimpleRecognizer.cpp
#include "SimpleRecognizer.h"
#include <sstream>
//==============================================================================
// callback for messages from the speech port
void logger(const char* msg, void* userdata)
{
    std::cout << msg << std::endl;
}
//==============================================================================
// code to plug LVSemanticData into any standard stream
std::ostream& operator << (std::ostream& os, const LVSemanticData& Data)
{
    int i;
    LVSemanticObject Obj;
    switch (Data.Type())
    {
    case SI_TYPE_BOOL:
        os << Data.GetBool() << "\n";
        break;
    case SI_TYPE_INT:
        os << Data.GetInt() << "\n";
        break;
    case SI_TYPE_DOUBLE:
        os << Data.GetDouble() << "\n";
        break;
    case SI_TYPE_STRING:
        os << Data.GetString() << "\n";
        break;
    case SI_TYPE_OBJECT:
        Obj = Data.GetSemanticObject();
        for (i = 0; i < Obj.NumberOfProperties(); ++i)
        {
            os << "<property name=" << Obj.PropertyName(i) << ">\n";
            os << Obj.PropertyValue(i);
            os << "</property>\n";
        }
        break;
    case SI_TYPE_ARRAY:
        for (i = 0; i < Data.GetSemanticArray().Size(); ++i)
        {
            os << "<element>\n";
            os << Data.GetArray().At(i);
            os << "</element>\n";
        }
        break;
    }
    return os;
}
//==============================================================================
// code to plug LVInterpretation into any standard stream
std::ostream& operator << (std::ostream& os, const LVInterpretation& Interp)
{
    os << "<interpretation grammar=\"" << Interp.GrammarLabel()
       << "\" score=\"" << Interp.Score() << "\">" << std::endl;
    os << "<result name=\"" << Interp.ResultName() << "\">" << std::endl;
    os << Interp.ResultData();
    os << "</result>" << std::endl;
    os << "<input>" << std::endl;
    os << Interp.InputSentence() << std::endl;
    os << "</input>" << std::endl;
    os << "</interpretation>";
    return os;
}
//==============================================================================
void SimpleRecognizer::WaitUntilDone()
{
    while (!finished_decode)
        Sleep(50);
}
//==============================================================================
SimpleRecognizer::SimpleRecognizer()
    : voiceChannel(1), finished_decode(true), AudioThread(NULL)
{
    LVSpeechPort::RegisterAppLogMsg(logger, NULL, 6);
    int v = port.OpenPort(logger, NULL, 6);
    if (v != LV_SUCCESS)
    {
        std::cout << LVSpeechPort::ReturnErrorString(port.GetOpenPortStatus()) << std::endl;
        exit(-1);
    }
    // Turn on frequency based voice activity detector
    port.StreamSetParameter(STREAM_PARM_USE_FREQ_VAD, 1);
    port.StreamSetParameter(STREAM_PARM_DETECT_BARGE_IN, 1);
    port.StreamSetParameter(STREAM_PARM_DETECT_END_OF_SPEECH, 1);
    port.StreamSetParameter(STREAM_PARM_VOICE_CHANNEL, voiceChannel);
    port.StreamSetParameter(STREAM_PARM_GRAMMAR_SET, LV_ACTIVE_GRAMMAR_SET);
    //Let the port handle the decode process
    port.StreamSetParameter(STREAM_PARM_AUTO_DECODE, 1);
    //and use semantic interpretation processor
    port.StreamSetParameter(STREAM_PARM_DECODE_FLAGS, LV_DECODE_SEMANTIC_INTERPRETATION);
    port.StreamSetStateChangeCallBack(PortCB, this);
}
//==============================================================================
SimpleRecognizer::~SimpleRecognizer()
{
    port.ClosePort();
}
//==============================================================================
void SimpleRecognizer::PortCB(long NewState, unsigned long TotalBytes,
                              unsigned long RecordedBytes, void* UserData)
{
    SimpleRecognizer* self = (SimpleRecognizer*)UserData;
    switch (NewState)
    {
    case STREAM_STATUS_END_SPEECH:
        if (!self->finished_decode)
        {
            self->AudioThread->StopStream();
            self->GetAnswers();
            self->finished_decode = true;
        }
        break;
    case STREAM_STATUS_STOPPED:
        if (!self->finished_decode)
        {
            self->AudioThread->StopStream();
            self->GetAnswers();
            self->finished_decode = true;
        }
        break;
    case STREAM_STATUS_NOT_READY:
        break;
    case STREAM_STATUS_READY:
        self->finished_decode = false;
        self->AudioThread->StartStream(AudioCB, self);
        break;
    }
}
//==============================================================================
void SimpleRecognizer::LoadGrammar(const std::string& grammar_name,
                                   const std::string& grammar_location)
{
    port.LoadGrammar(grammar_name.c_str(), grammar_location.c_str());
}
//==============================================================================
bool SimpleRecognizer::AudioCB(char* audio_data, int audio_data_size, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    self->port.StreamSendData(audio_data, audio_data_size);
    return true;
}
//==============================================================================
void SimpleRecognizer::Recognize(AudioStreamer* Audio, const std::string& grammar_name)
{
    finished_decode = false;
    AudioThread = Audio;
    port.DeactivateGrammars(); //clear out old grammars.
    port.ActivateGrammar(grammar_name.c_str());
    port.AddEvent(EVENT_START_DECODE_SEQ);
    port.StreamSetParameter(STREAM_PARM_SOUND_FORMAT, ULAW_8KHZ);
    port.StreamStart();
}
//==============================================================================
void SimpleRecognizer::GetAnswers()
{
    int val;
    val = port.WaitForEngineToIdle(3000, voiceChannel);
    if (val < 0)
    {
        result = "<noanswer/>";
        return;
    }
    //view the results of the decode:
    std::stringstream ss;
    int numInterp = port.GetNumberOfInterpretations(voiceChannel);
    for (int t = 0; t < numInterp; ++t)
    {
        ss << port.GetInterpretation(voiceChannel, t);
    }
    result = ss.str();
}
//==============================================================================
const std::string& SimpleRecognizer::GetResult() { return result; }
//==============================================================================================
AudioStreamer.h
#include "HeaderClasses.h"
#ifndef AUDIO_STREAMER_H
#define AUDIO_STREAMER_H

typedef bool (*AudioStreamCB)(char* audio_chunk, int chunk_size, void* user_data);

/** class AudioStreamer
    Mimics live audio being streamed. It reads audio a bit at a time from a file,
    periodically calling a user provided callback function to transmit the audio.
    It stops transmitting audio when the user callback function returns false.
    If it reaches the end of file before the callback tells it to stop, then it
    just sends silence.
    The audio is assumed to be a headerless u-Law audio file at 8kHz.
**/
class AudioStreamer : Demo::Thread
{
public:
    AudioStreamer(const char* filename);
    void StartStream(AudioStreamCB _cb, void* _user_data);
    void StopStream();
    ~AudioStreamer();
private:
    char* audio_buffer;
    char* end_buffer;
    int audio_buffer_size;
    int increment_ms;
    AudioStreamCB cb;
    void* user_data;
    virtual void ThreadAction();
};
#endif//AUDIO_STREAMER_H
AudioStreamer.cpp
#include "AudioStreamer.h"
#include <stdio.h>
#include <stdlib.h> /* exit */
#include <string.h> /* memset */
#include <fcntl.h>
#include <io.h>
//==================================================================================
AudioStreamer::AudioStreamer(const char* filename)
    : increment_ms(300), audio_buffer_size(0), audio_buffer(NULL)
{
    int audio_handle = _open(filename, _O_BINARY | _O_RDONLY);
    if (audio_handle <= 0)
    {
        printf("could not open audio file %s\n", filename);
        exit(-1);
    }
    audio_buffer_size = _lseek(audio_handle, 0L, SEEK_END);
    _close(audio_handle);
    audio_handle = _open(filename, _O_BINARY | _O_RDONLY);
    audio_buffer = new char[audio_buffer_size];
    _read(audio_handle, audio_buffer, audio_buffer_size);
    _close(audio_handle);
}
//==================================================================================
AudioStreamer::~AudioStreamer()
{
    ThreadStop();
    delete[] audio_buffer;
}
//==================================================================================
void AudioStreamer::StartStream(AudioStreamCB CB, void* UserData)
{
    cb = CB;
    user_data = UserData;
    ThreadActivate();
    ThreadStart();
    printf("audio stream started\n");
}
//==================================================================================
void AudioStreamer::StopStream()
{
    ThreadStop();
    printf("audio stream stopped\n");
}
//==================================================================================
void AudioStreamer::ThreadAction()
{
    printf("audio thread working\n");
    int chunk_size;
    int end_chunk_size;
    char* current_pos = audio_buffer;
    bool feed_more = true;
    chunk_size = 8000 * 1 * increment_ms / 1000;
    end_chunk_size = chunk_size;
    end_buffer = new char[end_chunk_size];
    memset(end_buffer, 0, end_chunk_size);
    while (current_pos != audio_buffer + audio_buffer_size && feed_more && !IsThreadShuttingDown())
    {
        if (current_pos + chunk_size > audio_buffer + audio_buffer_size)
        {
            chunk_size = (audio_buffer + audio_buffer_size) - current_pos;
        }
        feed_more = cb(current_pos, chunk_size, user_data);
        current_pos += chunk_size;
        printf("sending audio\n");
        Sleep(increment_ms);
    }
    while (feed_more && !IsThreadShuttingDown())
    {
        feed_more = cb(end_buffer, end_chunk_size, user_data);
        Sleep(increment_ms);
        printf("sending dead air\n");
    }
    printf("audio thread told to shut down\n");
    delete[] end_buffer;
}
//==================================================================================
HeaderClasses.h
#ifndef HEADER_ONLY_HELPER_CLASSES_DEFINED
#define HEADER_ONLY_HELPER_CLASSES_DEFINED
#include <string>
#include <process.h>
#include <time.h>
#include <sys/types.h>
#include <sys/timeb.h>
#include <Windows.h>
#undef GetObject

namespace Demo
{
//critical section wrapper
class CS
{
public:
    CS() : m_busy(false) { InitializeCriticalSection(&m_cs); }
    virtual ~CS() { DeleteCriticalSection(&m_cs); }
    bool IsBusy() const { return m_busy; } //only valid at time of call
    void Enter()
    {
        EnterCriticalSection(&m_cs);
        m_busy = true;
    }
    void Leave()
    {
        // Be careful, linux allows other non-owner of cs to unlock
        m_busy = false;
        LeaveCriticalSection(&m_cs);
    }
    bool Try()
    {
        if (m_busy) return false;
        Enter();
        return true;
    }
private:
    volatile bool m_busy;
    CRITICAL_SECTION m_cs;
};

//simple way to lock critical section (releases in destructor)
class CSLock
{
public:
    CSLock(CS& cs)
    {
        m_localCs = &cs;
        m_localCs->Enter();
    }
    virtual ~CSLock() { m_localCs->Leave(); }
private:
    CS* m_localCs;
};

//simple windows event wrapper
class Event
{
public:
    Event() { m_event = CreateEvent(NULL, false, false, NULL); }
    virtual ~Event() { CloseHandle(m_event); }
    bool Wait(unsigned int timeout = INFINITE)
    {
        return WaitForSingleObject(m_event, timeout) != WAIT_TIMEOUT;
    }
    bool Reset() { return ResetEvent(m_event) != 0; }
    bool Signal() { return SetEvent(m_event) != 0; }
    bool Try() { return Wait(0); }
private:
    HANDLE m_event;
};

//a thread class. Have your class derive from this one and override the ThreadAction() function.
class Thread
{
    bool Running;
    bool ShuttingDown;
    bool InUserThread;
    HANDLE hThread;
    unsigned int thrdaddr;
    CS CS;
    Event Event;
public:
    Thread()
    {
        Running = false;
        ShuttingDown = true;
        InUserThread = false;
    }
    virtual ~Thread() { ThreadStop(); }
    virtual void ThreadAction() = 0; //derive and override the ThreadAction function
    bool ThreadActivate()
    {
        CSLock L(CS);
        if (Running) return false;
        ShuttingDown = false;
        Running = true;
        InUserThread = false;
        hThread = (HANDLE)_beginthreadex(NULL, 0, CallBackThread, (LPVOID)this, 0, &thrdaddr);
        return true;
    }
    bool ThreadStart()
    {
        CSLock L(CS);
        if (!Running || ShuttingDown || InUserThread) return false;
        Event.Signal();
        return true;
    }
    bool ThreadStop(unsigned long WaitTime = 1000)
    {
        {
            CSLock L(CS);
            if (!Running) return false;
            ShuttingDown = true;
            Event.Signal();
        }
        if (WaitForSingleObject(hThread, WaitTime) == WAIT_TIMEOUT)
            TerminateThread(hThread, 0);
        Sleep(50);
        thrdaddr = 0;
        Running = false;
        return true;
    }
    bool IsThreadRunning() { CSLock L(CS); return Running; }
    bool IsThreadShuttingDown() { CSLock L(CS); return ShuttingDown; }
private:
    static unsigned int __stdcall CallBackThread(void* p)
    {
        ((Thread*)p)->InternalThread();
        return 0;
    }
    void InternalThread()
    {
        while (!ShuttingDown)
        {
            if (Event.Wait(2000))
            {
                {
                    CSLock L(CS);
                    InUserThread = true;
                }
                ThreadAction();
                {
                    CSLock L(CS);
                    InUserThread = false;
                }
            }
        }
        {
            CSLock L(CS);
            Running = false;
        }
    }
};
}//namespace Demo
#endif
SRGS Grammars
A Simple Grammar
We will begin our look at writing SRGS grammars with a simple grammar that lets the engine recognize the words "yes" or "no". Yes or no grammars are the "hello world" of grammar writing.
Example
#ABNF 1.0;
language en-US; //use the American English pronunciation dictionary.
mode voice; //the input for this grammar will be spoken words (as opposed to DTMF)
root $yesorno;
$yes = yes;
$no = no;
$yesorno = $yes | $no;
This grammar contains most of the elements of any grammar you will write. Let's take it apart.
The Grammar Identifier
Any SRGS grammar written in ABNF notation must begin with the line
#ABNF 1.0;
with no additional characters. This line identifies to the LumenVox grammar compiler that the file being read is an ABNF grammar, as opposed to an SRGS XML grammar or another grammar format that may be supported in the future.
The Grammar Header
Following the identifier, a well-formed grammar will contain information about the language the grammar is written in, the expected interaction mode, and the name of the rule where the engine will begin its search (the root rule). In addition, the header may contain one or more tags, and an identifier describing the tag format for this grammar. Tags will be discussed later in this tutorial.
The contents of the grammar header may be in any order, but no header data may occur in the file after the first rule is written.
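For instance, this hypothetical variation of the yes/no grammar's header is equally legal, since the declarations are simply reordered before the first rule:

```
#ABNF 1.0;
mode voice;
root $yesorno;
language en-US;

$yesorno = yes | no;
```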
Comments
ABNF grammars may contain comments anywhere in their body (with the exception of the first line, containing the grammar identifier). The comment format is the same one used by the C, C++, and Java programming languages.
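Both comment styles can appear in a grammar; the sketch below is hypothetical:

```
#ABNF 1.0;
language en-US; // a single-line comment, as in C++
mode voice;
/* a block comment,
   as in C */
root $main;

$main = yes | no;
```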
Rules
The rules of a grammar specify what word combinations the engine may recognize. They are the heart of the grammar. Each rule has a name, appearing on the left hand side of an "=" sign, and a rule expansion, appearing on the right hand side.
The rule name starts with a "$", then a letter followed by additional letters, numbers, or underscore characters.
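A few hypothetical examples of legal and illegal rule names:

```
$digit_2 = two;       // legal: a letter, then letters, numbers, or underscores
$area51 = fifty one;  // legal
// $2digit = two;     illegal: the character after "$" must be a letter
```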
The rule expansion describes to the engine what sequences of words will allow a rule to be matched. An entire grammar is matched if its root rule is matched.
The first rule in the above grammar is matched if the engine detects the word "yes" being spoken. The second rule is matched if the word "no" is detected. The third rule contains a "|" symbol, which is a logical "or" operator. So the third rule is matched if the $yes or $no rules are matched.
Most of the rest of this tutorial will be concerned with writing more and more expressive rule expansions.
How the Speech Engine Uses a Grammar
When the engine begins decoding your audio, it starts at the root rule of the provided grammar, in this case the rule $yesorno. It then steps through all legal expansions, looking for the first words it's allowed to listen for. It moves into the rules $yes and $no, since it's allowed to match against either rule. Since the first words in the rules $yes and $no are "yes" and "no", the engine knows that it is allowed to recognize either word.
If the engine detects "yes" as a possibility, it then looks for the next word it can recognize in the $yes rule. Since there are no more words in the $yes rule, the rule is matched. And since the $yes rule is matched, the $yesorno root rule is matched, so the entire grammar is matched.
Next Rule Expansions
Rule Expansions by Example
Rule expansions are built by combining together small phrases with a number of grammar operations. The operations are
Operation            Example             Description
Alternatives         $rule = $A | $B;    match A or B
Optional Expansion   $rule = $A [$B];    match A, possibly followed by B
Repetition           $rule = $A <7>;     match A 7 times
Rule Alternatives
As we saw in the previous "yes no" grammar, the SRE can be told to accept one or more possibilities by using the rule alternative operator "|".
Example
$toppings = pepperoni | sausage | green peppers;
The above rule is matched by the phrases "pepperoni", "sausage", or "green peppers".
Note that the rule alternative operator is greedy. It collects "peppers" with "green" to form the alternative "green peppers". If you wish to scope the effects of the rule alternative operator, you can use parentheses.
Example
$pizza = (pepperoni | sausage) pizza;
This rule matches "pepperoni pizza" or "sausage pizza". Without the parentheses, it would match "pepperoni" or "sausage pizza".
Optional Expansion
If you wish to make a portion of a rule expansion optional, wrap that portion of the expansion in the optional operator "[ ]".
Example
$yes = yes [please];
This rule matches "yes" or "yes please".
Any of the SRGS operators may be wrapped inside each other, or used in sequence, to create more and more expressive sentences.
Example
$yes = yes [please | thank you];
This rule matches "yes", "yes please", or "yes thank you".
Repetition
If you wish to allow a portion of a rule expansion to be repeated a number of times, you can use the repeat operator "< >". The repeat operator can be used to specify a fixed number of repetitions, or a range of repetitions.
Example
$digit = one | two | three | four | five | six | seven | eight | nine | zero;
$seven_digits = $digit <7>;
$seven_to_ten_digits = $digit <7-10>;
$one_or_more_digit = $digit <1->;
The $seven_digits rule allows any seven digit combination to be recognized. The $seven_to_ten_digits rule allows any seven to ten digit combination to be recognized. The $one_or_more_digit rule allows one or more digits to be recognized.
The repeat operator is tightly binding; it only applies to whatever immediately precedes it. Use parentheses to control how much of a rule expansion it applies to.
Example
$oh_boy1 = oh boy <3>;
$oh_boy2 = (oh boy)<3>;
The rule $oh_boy1 matches "oh boy boy boy", while $oh_boy2 matches "oh boy oh boy oh boy".
Next Rule References
Rule References
You can reference grammar rules inside rule expansions, as we have already seen. You can also reference external grammar files -- or rules within external files -- to create more complex grammars and re-use existing grammar solutions. As an example, suppose you had a simple phone number grammar in a remote location that looked like this:
http://www.mycompany.com/phone_number.gram
#ABNF 1.0;
language en-US;
mode voice;
root $phone_number;
$phone_number = [$area_code] $number;
$digit = one | two | three | four | five | six | seven | eight | nine | zero;
$area_code = [one | area code] $digit<3>;
$number = $digit<7>;
You can use this grammar in another grammar by using its location as a rulename.
#ABNF 1.0;
language en-US;
mode voice;
root $main;
$main = (my | the) [phone] number is $<http://www.mycompany.com/phone_number.gram>;
The above grammar is using the root rule of the phone_number grammar in its $main rule. You can reference grammar files using http, ftp, or your operating system's local or network file paths. When writing grammars that utilize external grammar files, it's usually a good idea to specify a base URI in your grammar header.
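A sketch of a base declaration, following the SRGS ABNF header syntax; the URI is hypothetical, and later references can then be relative to it:

```
#ABNF 1.0;
language en-US;
mode voice;
base <http://www.mycompany.com/>;
root $main;

$main = (my | the) [phone] number is $<phone_number.gram>;
```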
To use a single rule from an external grammar, append the "#" symbol and the rule name to the grammar URI.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $main;
$main = (my | the) area code is $<http://www.mycompany.com/phone_number.gram#area_code>;
In addition to referencing external grammar files, you can also reference any of the LumenVox built-in grammars.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $main;
$main = (my | the) [phone] number is $<builtin:grammar/phone>;
Next Special Rules
Special Rules
In addition to the rules you create, there are several reserved rules that dictate special behaviour for the Speech Engine. These rules are
$NULL
$VOID
$GARBAGE
NULL
The $NULL rule is automatically matched as soon as it is seen. Users rarely need to use the $NULL rule, but it can be useful when creating grammars programmatically. The $NULL rule is illustrated below with standard grammar operations rewritten to use the $NULL rule.
Example 1
$yes = yes [please];

/* Identical rule expansion using the $NULL rule */
$yes = yes (please | $NULL);
Example 2
$oh_boy = (oh boy)<0->;
/* Identical rule expansion using the $NULL rule */
$oh_boy = oh boy $oh_boy | $NULL;
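The equivalence in Example 2 can be checked mechanically. The Python sketch below (illustrative only, not part of the LumenVox API) expands both forms of $oh_boy up to a fixed repetition count and confirms they describe the same strings:

```python
def repeat_form(max_reps):
    """Strings matched by (oh boy)<0->, truncated at max_reps repetitions."""
    return {" ".join(["oh boy"] * k) for k in range(max_reps + 1)}

def recursive_form(depth):
    """Strings matched by $oh_boy = oh boy $oh_boy | $NULL, to a given depth."""
    if depth == 0:
        return {""}            # only the $NULL branch remains
    shorter = recursive_form(depth - 1)
    return {""} | {("oh boy " + s).strip() for s in shorter}

# Both spellings of the rule describe the same language (up to the cutoff).
same = repeat_form(3) == recursive_form(3)
```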
VOID
The $VOID rule invalidates any rule that contains it, and hence any answer that contains it.
Example
#ABNF 1.0; language en-US; mode voice;
root $yesorno;
$yes = yes [please];
$no = no $VOID;
If the engine recognizes the word "no" being spoken with the above grammar, the $VOID rule will invalidate the answer, and the engine will return with no answer.
GARBAGE
The $GARBAGE rule engages the out-of-vocabulary filter of the engine, allowing it to listen for arbitrary phonetic sequences until it hears the next matching word in the grammar. The garbage that was matched will not be returned by the engine.
Example
#ABNF 1.0; language en-US; mode voice;
root $yesorno;
$yes = yes [please];
$no = no $GARBAGE;
The above grammar could allow the user to say "no", "no thank you", or "no you stupid machine" (though we've never heard anyone say that last one).

When using the $GARBAGE rule, keep in mind that engaging the out-of-vocabulary filter can slow down recognition times and even cause additional misrecognitions if used too aggressively. We recommend creating specific "filler" models using grammar rules that match frequently occurring out-of-vocabulary words instead of using the $GARBAGE rule, if possible.
Tags
Tags are special grammar tokens that can contain any information you wish to put in them. Tags are completely ignored when the engine uses your grammar: any time the engine sees a tag in a rule, it skips right over it. What makes tags useful is that when the engine returns the results of a decode, it returns the tags it saw, in the order it saw them, along with the words and rules it recognized. This makes tags a good way to store post-processing information.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $yesorno;

$yes = yes [please] {!{ returnvalue: true }!}; // This is a tag
$no = no [way | thank you] { returnvalue: false }; // Another tag
$yesorno = $yes | $no;
To understand how you might use tags, we need to examine the form of an engine decode response.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $navigate;

$direction = forward | back | backward | left | right;
$number = one | two | three | four | five;
$navigate = (go | move | walk | step) $direction $number (steps | paces | units);
With the above grammar, if the engine recognizes "walk forward three paces", it will return a parse tree, or sentence diagram, that looks like this:
$navigate:
  "walk"
  $direction: "forward"
  $number: "three"
  "paces"
You can read more about the parse tree return type here.
In order to convert the parse tree return type into data useful to your application, you need to walk the tree and convert it into a result your application expects. For instance, your application might expect a result that looks like this:
instruction:[ direction: 1, units: 3, ]
While it is certainly possible to make the conversion, there are disadvantages to interpreting the parse tree directly. One disadvantage is that your application becomes directly dependent on knowing the structure of your grammar; if the form of your grammar changes, your application code will have to change as well. Another disadvantage is that if your application uses multiple grammars (as most do), then you will most likely need a different set of parse tree processing code for each of your grammars.
Instead of manipulating the parse tree directly, you can put the conversion process in your grammar using tags. To do so, you adopt a consistent format for your tags, and a uniform way of processing your tags + parse tree. Then the shape of your grammar does not matter, as long as you process your tags and parse tree in the same way each time.
For this example we will adopt a very simple method for post-processing: we will walk the tree, ignoring anything that is not a tag. We will treat the tags as string data, and concatenate the strings as we see them in the parse tree.
Example
#ABNF 1.0;
language en-US;
mode voice;
root $navigate;
tag-format <my_simple_tag_format>;

$direction = { direction: } ( forward { 1, } | back { 2, } | backward { 2, } | left { 3, } | right { 4, } );
$number = { units: } ( one { 1, } | two { 2, } | three { 3, } | four { 4, } | five { 5, } );
$navigate = { instruction:[ } ( go | move | walk | step) $direction $number (steps | paces | units) { ] };
Now, with the above grammar, when the engine recognizes "walk forward three paces", the parse tree returned will look like:
$navigate: {!{ instruction:[ }!}
  "walk"
  $direction: {!{ direction: }!} "forward" {!{ 1, }!}
  $number: {!{ units: }!} "three" {!{ 3, }!}
  "paces"
  {!{ ] }!}
And when we concatenate the tags we get the result type our application expects.
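This walk-and-concatenate scheme is easy to implement outside the engine. The Python sketch below uses its own encoding of the parse tree (tuples for tags, lists for rules; not the engine's native representation) and collects the tags in order:

```python
# A parse tree node is either a word (str), a tag ("tag", text),
# or a rule: a list whose first element is the rule name.
tree = ["$navigate", ("tag", "instruction:[ "), "walk",
        ["$direction", ("tag", "direction: "), "forward", ("tag", "1, ")],
        ["$number", ("tag", "units: "), "three", ("tag", "3, ")],
        "paces", ("tag", "] ")]

def concat_tags(node, out=None):
    """Depth-first walk: keep tag text, ignore words and rule names."""
    if out is None:
        out = []
    if isinstance(node, tuple) and node[0] == "tag":
        out.append(node[1])
    elif isinstance(node, list):
        for child in node[1:]:      # skip the rule name
            concat_tags(child, out)
    return out

result = "".join(concat_tags(tree)).strip()
```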
Admittedly, this is a very naive tag processing scheme, and as a result it requires a hefty number of tags to accomplish a simple task, but it does achieve the goal we want of processing our tree in a way that is independent of the form of the grammar. As a result, if ever the form of the grammar needs to change, the tags in the grammar can change, too, and the application code can stay the same.
The LumenVox API provides a much more powerful post-processing scheme based on the Semantic Interpretation for Speech Recognition working draft. It is described in detail here.
Applying Grammar Weights
Ultimately, the engine is just a large probability machine. Inside the engine there are huge tables that store probability scores for phonemes and the sounds those phonemes are likely to generate when a person speaks. When the engine decodes audio input, it searches through these tables to find the most likely path through a sequence of phonemes given the audio input. Your SRGS grammars can modify these scores by providing grammar weights.
As an example, suppose we have a grammar that recognizes a person speaking a number that is four digits long.
#ABNF 1.0;
language en-US;
mode voice;
root $number;

$one_digit = zero | one | two | three | four | five | six | seven | eight | nine;
$teens = ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen;
$above_twenty = (twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety) [$one_digit];
$double_digit = $teens | $above_twenty;

$single_digits = $one_digit<4>; //one two three four
$double_digits = $double_digit<2>; //twelve thirty four
$single_double = $one_digit<2> $double_digit; //one two thirty four
$double_single = $double_digit $one_digit<2>; //twelve three four

$number = $single_digits | $double_digits | $single_double | $double_single;
This is a flexible grammar, but if you used it in practice you might be disappointed. You might notice that too often words like "four three" are being misrecognized as "forty". In general, your callers may be speaking a sentence that matches $single_digits 95% of the time, but the engine too frequently returns a result that matches one of the other three rules.
You can help the engine get the right answer more frequently by predisposing it to choose the $single_digits rule. Here is the same grammar with grammar weights applied.
#ABNF 1.0;
language en-US;
mode voice;
root $number;

$one_digit = zero | one | two | three | four | five | six | seven | eight | nine;
$teens = ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen;
$above_twenty = (twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety) [$one_digit];
$double_digit = $teens | $above_twenty;

$single_digits = $one_digit<4>; //one two three four
$double_digits = $double_digit<2>; //twelve thirty four
$single_double = $one_digit<2> $double_digit; //one two thirty four
$double_single = $double_digit $one_digit<2>; //twelve three four

// $single_digits has a 95% chance of being the right rule to match.
// The other rules combine to take up the remaining 5%.
$number = /0.95/ $single_digits | /0.05/ ($double_digits | $single_double | $double_single);

/**********************************************************
 * You could also write the weights as
 * /95/ $single_digits | /5/ ($double_digits | $single_double | $double_single);
 * or
 * /19/ $single_digits | $double_digits | $single_double | $double_single;
 **********************************************************/
Now, in cases where the engine has a borderline decision to make between matching $single_digits or one of the others, it will more frequently choose $single_digits. We weighted the rules 95% to 5% only because we had records of our callers to back up the decision.
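Only the ratio between weights matters, which is why /0.95/ and /0.05/ can equally be written /95/ and /5/. A quick Python sketch of the normalization arithmetic (illustrative only; the engine's internal arithmetic is not documented here):

```python
def normalize(weights):
    """Raw grammar weights become probabilities: each weight / total."""
    total = sum(weights)
    return [w / total for w in weights]

fractional = normalize([0.95, 0.05])   # /0.95/ and /0.05/
integral = normalize([95, 5])          # /95/ and /5/ -- the same 19:1 ratio
```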
Do Not Apply Weights Without Data
Applying grammar weights should never be the first thing you do to your grammar. Initially, you don't really know how often each rule will be matched, so you are better off letting all rules be treated equally. Only after you have a compelling amount of data to suggest that applying grammar weights will help, as we did above, should you apply them. And after you do apply them, you must test their effects on real call data. Badly applied weights are worse than no weights at all.
SRGS Definitions
Interaction Mode
An interaction mode specifies the type of interaction the speech port is to have with a user. An interaction mode can be voice or DTMF.
In a grammar, you specify whether the grammar will be used in a DTMF interaction, or a voice interaction. When grammars are activated in a speech port, only the voice grammars get used to decode speech, and only the DTMF grammars get used to process a DTMF string.
To specify the interaction mode in a grammar, use the following syntax:
ABNF
mode voice; or mode dtmf;
XML
<grammar mode="voice" ...> or <grammar mode="dtmf" ...>
Tag Format
In an SRGS grammar, you may place pieces of data called tags anywhere in a grammar rule. When a rule is matched, the tag is returned to the user in a parse tree, along with the words spoken that caused the rule to match.
A common use for tags is to transform a speaker's sentence into data that your application can understand. The LumenVox speech port is capable of manipulating the tags in your parse tree if they are in a form known as the Semantic Interpretation for Speech Recognition (SISR) tag format. Examples of this tag format can be found in this help file here.
To do any kind of interpretation, you must specify the format your tags are in.
Within the speech port, the following tag format specifiers are acceptable. Currently, both formats tell the engine to perform the same interpretation process, but as other interpretation schemes are adopted, or interpretation schemes are modified, the tag format specifier you decide on will become more important.
semantics/1.0 Use the latest working draft of the SISR, as of this help file's publication.
lumenvox/1.0 Use the working draft of the SISR published on April 1 2003.
lumenvox/1.1 Use the next working draft of the SISR (since this next draft does not exist, this tag format does nothing; it's for example only).
If the tag format of your grammar does not match one of these specifiers, the speech port will not attempt to interpret your tags. You can still use the tag data in the Parse Tree to perform your own interpretation.
To specify the format of the tags in a grammar, use the following syntax:
ABNF
tag-format <lumenvox/1.0>;
XML
<grammar tag-format="semantics/1.0" ...>
Language Identifier
A language identifier specifies the language being spoken to the speech port.
The format of the language identifier follows the convention set out by RFC 3066. In a nutshell, the identifier is either a language and country pair, like "en-US" for United States English, or just a language descriptor, like "fr" for generic French.
Within the speech port, the following language identifiers are acceptable:
"en-US" or "en" Use the LumenVox American English acoustic models and dictionary

"fr-CA" or "fr" Use the LumenVox French acoustic models and dictionary

"es-MX" or "es" Use the LumenVox Spanish acoustic models and dictionary
To specify the language identifier in a grammar, use the following syntax in your grammar:
ABNF
language en-US;
XML
<grammar language="en-US" ... >
Tags
Tags are special tokens in a grammar that are automatically matched whenever they are seen by the Speech Engine. They are usually filled with information useful to the author of the grammar, or to an application using the grammar. Tags may appear in the header or the body of a grammar. When the engine recognizes a rule containing a tag, it returns the tag information along with the rule.
Filling tags with snippets of JavaScript is the basis of the semantic interpretation process.
ABNF
{!{ tag information }!}; // This is a header tag.
// Its contents will be returned if the grammar is matched.

$rule = some text {!{ tag information }!} more text; // This is a tag declared in a rule.
XML
<!-- header tag. Its contents will be returned if the grammar is matched. -->
<tag> tag information </tag>

<rule id="rule">
  some text
  <!-- a tag declared in a rule -->
  <tag> tag information </tag>
  more text
</rule>
Base URI
Declaring a base URI in a grammar tells the engine how to resolve relative path names in the grammar. If no base URI is present, they will be resolved relative to the location of the grammar file. Grammars loaded by buffer should have a base URI if they contain relative path names. Grammars may have multiple base paths, and they are searched in the order provided.
ABNF
base <http://www.mycompany.com/grammars>;
base <http://www.mycompany.com/more_grammars>;
XML
<grammar xml:base="http://www.mycompany.com/grammars" xml:base="http://www.mycompany.com/more_grammars" ... >
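Relative references against a base URI resolve by ordinary URI rules, which you can preview with Python's standard library. This sketch illustrates URI resolution only, not the engine's search behavior; note that urljoin only appends to a base whose path ends in a slash:

```python
from urllib.parse import urljoin

# Base URIs are tried in the order declared in the grammar header.
bases = ["http://www.mycompany.com/grammars/",
         "http://www.mycompany.com/more_grammars/"]

# Candidate resolutions for a relative grammar reference:
candidates = [urljoin(base, "phone_number.gram") for base in bases]
```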
Built-in Grammars
LumenVox provides the built-in grammars expected by VoiceXML users. All of them provide the required output format.
URI | Sample Input | Output

builtin:grammar/boolean | "yes", "no thank you", etc. | "true" or "false"

builtin:grammar/date | "january thirteenth" or "december first two thousand" | "????0113" or "20001201"

builtin:grammar/digits | "one two three four" | "1234"

builtin:grammar/currency | "eighteen dollars and four cents" | "USD18.04"

builtin:grammar/number | "four hundred point five" | "400.5"

builtin:grammar/phone | "area code eight five eight seven oh seven oh seven oh seven" | "8587070707"

builtin:grammar/time | "six o clock" or "five thirty p m" | "0600?" or "0530p"
Example Grammars
phone_number.gram
#ABNF 1.0;
mode voice;
language en-US;
tag-format <lumenvox/1.0>;
// The lumenvox tag format tracks the current working draft of
// the W3C's semantic interpretation proposal.
// 1.0 corresponds to the working draft released on 01 April 2003.
root $PhoneNumber;
/* ONE:"1" is shorthand for
 * ONE {!{ $="1" }!}
 * "$" refers to the current rule being matched ($Digit),
 * so the net effect is that $Digit resolves to a one-digit string
 * after semantic interpretation.
 */
$Digit = ( ONE:"1" | TWO:"2" | THREE:"3" | FOUR:"4" | FIVE:"5" |
           SIX:"6" | SEVEN:"7" | EIGHT:"8" | NINE:"9" | (ZERO | O):"0" );

/* $AreaCode resolves to a three-digit string
 * after semantic interpretation.
 */
$AreaCode = { $ = "" } ( $Digit { $ += $Digit } ) <3>;

/* $Number resolves to a seven-digit string
 * after semantic interpretation.
 */
$Number = { $ = "" } ( $Digit { $ += $$ } ) <7>; // $$ is shorthand for the last rule detected,
                                                 // i.e. $Digit

/* After semantic interpretation,
 * $PhoneNumber resolves to a structure with two member variable strings:
 * areacode (which defaults to "858") and number.
 */
$PhoneNumber = ( [AREA CODE | ONE] $AreaCode { $.areacode = $$ }
                 $Number { $.number = $$ } )
             | ( $Number ) { $.areacode = "858"; $.number = $$ };
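The net effect of the semantic tags above can be mimicked outside the engine. Here is a hypothetical Python sketch of the same interpretation (the function and dictionary names are our own, not LumenVox API names); digits concatenate, and a seven-digit utterance gets the default area code "858":

```python
# Hypothetical stand-ins for the grammar's word-to-digit literals.
DIGITS = {"ONE": "1", "TWO": "2", "THREE": "3", "FOUR": "4", "FIVE": "5",
          "SIX": "6", "SEVEN": "7", "EIGHT": "8", "NINE": "9",
          "ZERO": "0", "O": "0"}

def interpret_phone(words):
    """Mimic $PhoneNumber: return {'areacode': ..., 'number': ...}."""
    digits = "".join(DIGITS[w] for w in words if w in DIGITS)
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]                   # optional leading ONE
    if len(digits) == 10:                     # [AREA CODE] $AreaCode $Number
        return {"areacode": digits[:3], "number": digits[3:]}
    if len(digits) == 7:                      # $Number alone: default area code
        return {"areacode": "858", "number": digits}
    raise ValueError("utterance does not match the grammar")
```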
top_level_navigation.gram
#ABNF 1.0;
mode voice;
language en-US;
tag-format <lumenvox/1.0>;
// The lumenvox tag format tracks the current working draft of
// the W3C's semantic interpretation proposal.
// 1.0 corresponds to the working draft released on 01 April 2003.

root $directive;

$directive = (go back) {$ = "APPLICATION_BACK"}
           | (main menu) {$ = "APPLICATION_TOP"}
           | (goodbye | quit | exit) {$ = "APPLICATION_EXIT"};
Semantic Interpretation
Intro to Semantic Interpretation
When constructing an application using speech recognition, it is often not enough to know what the user said. You have to know what the user meant. In fact, often you don't care whether you heard the user correctly, as long as you got the meaning right. In the speech recognition world, semantic interpretation refers to the process of extracting meaning from what was spoken.
Creating a grammar and examining the parse tree that was generated by a user's speech input is the first step toward semantic interpretation. But sometimes, it is not enough to just read off the values of the tree; significant post processing of the tree is necessary to extract meaning.
As an example, here is an SRGS/ABNF grammar that matches spoken numbers from zero to nine hundred and ninety nine (it is by no means complete; for instance, it cannot recognize "two forty six" for 246):
#ABNF 1.0;
language en-US;
mode voice;
root $small_number;

$base = one | two | three | four | five | six | seven | eight | nine;
$teen = ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen;
$twenty_to_ninetynine = (twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety) [$base];

$tens = $base | $teen | $twenty_to_ninetynine;

$hundred = ([a] hundred | $base hundred);

$small_number = $hundred [[and] $tens] | $tens;
If the engine recognizes "two hundred twelve", then the parse tree looks like this:
$small_number:
  $hundred:
    $base: "TWO"
    "HUNDRED"
  $tens:
    $teen: "TWELVE"
But if your application needs to find out whether the speaker spoke a number larger than 500, then the parse tree alone is not enough; all you have is a structure of words. You need to write code to transform the tree into the number 212, which is meaningful to your application. The logic to do this transformation is going to be tied closely to the grammar's rules. For instance, within the $hundred rule, you have to know that there is an optional $base rule that has to be multiplied by 100. But in the $twenty_to_ninetynine rule, the optional $base has to be added to the total of the number you are building.
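To make the coupling concrete, here is what such a hand-written converter might look like in Python (the tree encoding and function names are our own illustration, not a LumenVox API). Notice how every branch mirrors a specific grammar rule:

```python
ONES = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5,
        "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9}
TEENS = {"TEN": 10, "ELEVEN": 11, "TWELVE": 12, "THIRTEEN": 13,
         "FOURTEEN": 14, "FIFTEEN": 15, "SIXTEEN": 16,
         "SEVENTEEN": 17, "EIGHTEEN": 18, "NINETEEN": 19}
TENS = {"TWENTY": 20, "THIRTY": 30, "FORTY": 40, "FIFTY": 50,
        "SIXTY": 60, "SEVENTY": 70, "EIGHTY": 80, "NINETY": 90}

# A node is ("$rulename", [children]); children are nodes or word strings.
def evaluate(node):
    rule, children = node
    if rule == "$base":
        return ONES[children[0]]
    if rule == "$teen":
        return TEENS[children[0]]
    if rule == "$twenty_to_ninetynine":
        total = TENS[children[0]]
        if len(children) > 1:                 # optional trailing $base: add it
            total += evaluate(children[1])
        return total
    if rule == "$tens":
        return evaluate(children[0])
    if rule == "$hundred":
        # "[a] hundred" is 100; "$base hundred" multiplies the base by 100
        subrules = [c for c in children if isinstance(c, tuple)]
        return 100 * (evaluate(subrules[0]) if subrules else 1)
    if rule == "$small_number":
        return sum(evaluate(c) for c in children if isinstance(c, tuple))
    raise ValueError(rule)

tree = ("$small_number",
        [("$hundred", [("$base", ["TWO"]), "HUNDRED"]),
         ("$tens", [("$teen", ["TWELVE"])])])
```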
Because of the close relationship between a grammar's rules and the semantic interpretation process, it can be convenient to put the semantic interpretation directly into the grammar. This is where grammar tags come into play.
The LumenVox semantic interpretation scheme is an implementation of the W3C's Semantic Interpretation working draft. The W3C will likely make changes to the draft before approving it, and LumenVox will track those changes while maintaining backward compatibility.
The basic idea behind the LumenVox semantic interpretation scheme is this:
1. Each tag contains snippets of ECMAScript code (still popularly known as JavaScript).
2. Each grammar rule can be thought of as a function that executes the ECMAScript code in its tags from left to right, and returns a value based on that executed code.
3. Any other rules that are referenced in a grammar rule are also executed left to right, and any tag that appears after a rule reference may use that rule's return value.
4. Grammar rules are only executed if the recognizer detects something to match the rule.
There are other facets to master, but understanding these four concepts will help you with everything else.
Semantic Interpretation by Example
The details of semantic interpretation will be discussed through example, by editing the numbers grammar from the introduction.
Literals
If you do not need to process any code to provide a return value for a rule, you can just attach a literal to the rule, as follows:
$foo = ($reference1 $reference2 some text):"bar";
Now, when the rule $foo is referenced, it will return the value "bar". Note: If no tags or literals exist in a grammar rule, the rule will just return text corresponding to the spoken words that matched the rule.
Literals can also be attached to individual words or phrases, as in this example:
$base = one:"1"|two:"2"|three:"3"|four:"4"|five:"5"|six:"6"|seven:"7"|eight:"8"|nine:"9";
$teen = ten:"10"|eleven:"11"|twelve:"12"|thirteen:"13"|fourteen:"14"|fifteen:"15" | sixteen:"16"|seventeen:"17"|eighteen:"18"|nineteen:"19";
Now $base and $teen return a numeric representation of the word that matched them. Note: Since a literal is the return value of a grammar rule, only one can be returned per rule. Since we have only one literal per rule alternative, this is no problem.
The Return Value
The return value of a grammar rule is an ECMAScript object named "$". You can build the return value up by writing code in tags that manipulates this symbol. For instance, our $foo rule above is equivalent to writing
$foo = ($reference1 $reference2 some text) { $ = "bar" };
This more meaningful example allows the $twenty_to_ninetynine rule to return a numeric representation of the words it matches.
$twenty_to_ninetynine = (twenty:"20"|thirty:"30"|forty:"40"|fifty:"50"|sixty:"60"|seventy:"70"| eighty:"80"|ninety:"90")[$base {$ = parseInt($) + parseInt($base)}];
In this example, first the return value $ is set to "20" or "30" or "40", etc. Then, if the optional $base rule is matched, its value is added to $. Notice the use of the ECMAScript function parseInt. This is used because literals are always strings, so without parseInt, the addition above would resolve to string concatenation. Since it can be confusing to have a rule that sometimes returns a number and other times returns a string, we will use parseInt in all of our rules:
$base = (one:"1"|two:"2"|three:"3"|four:"4"|five:"5"|six:"6"|seven:"7"|eight:"8"|nine:"9") { $ = parseInt($) };
$teen = (ten:"10"|eleven:"11"|twelve:"12"|thirteen:"13"|fourteen:"14"|fifteen:"15" | sixteen:"16"|seventeen:"17"|eighteen:"18"|nineteen:"19") { $ = parseInt($) };
$twenty_to_ninetynine = (twenty:"20"|thirty:"30"|forty:"40"|fifty:"50"|sixty:"60"|seventy:"70"| eighty:"80"|ninety:"90"){ $ = parseInt($) } [$base { $ += $base }];
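The parseInt point can be seen in miniature: in ECMAScript, "20" + "4" yields the string "204", while parseInt("20") + parseInt("4") yields 24. The Python sketch below (illustrative, using int() as the parseInt analogue) shows the same distinction:

```python
# With strings, "+" concatenates -- the pitfall the grammar must avoid:
as_strings = "20" + "4"            # "204"

# Converting first (the parseInt analogue) gives arithmetic instead:
as_numbers = int("20") + int("4")  # 24

def twenty_to_ninetynine(tens_literal, base_literal=None):
    """Mimic the tagged rule: parse the tens literal, add the optional base."""
    value = int(tens_literal)
    if base_literal is not None:
        value += int(base_literal)
    return value
```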
The "$$" object
So far we have seen that a rule's return value can be referenced by its name after that rule has been matched. Sometimes, when there are lots of rule alternatives in a rule, it can be cumbersome to reference rules by name. Other times, a matched rule can't be referenced at all. For instance, you can never access an external rule reference by name in a tag, because its name is not a valid ECMAScript identifier. For these reasons, the "$$" object exists. The "$$" object is always equal to the return value of the last rule matched. Using the "$$" object, we can write the $tens, $hundred and $small_number rules like this:
$tens = ( $base | $teen | $twenty_to_ninetynine ) { $ = $$ };
$hundred = [a] hundred {$ = 100} | $base hundred {$ = 100 * $$} ;
$small_number = $hundred {$ = $$} [[and] $tens {$ += $$}] | $tens {$ = $$};
Composite return types
Our small numbers grammar now returns an integer named small_number. If that is all we want out of this grammar, then great. Sometimes, however, we want more than one piece of information in a return type. A grammar rule always returns an object type, and object types can have additional properties. Let's say in our grammar we also want to know the text that was spoken, possibly for transcription or for reading the text back to the speaker. Each rule reference $foo also has a corresponding data structure called $foo$ (yes, the W3C working group is aware that it is seriously overworking the dollar symbol), with a property called "text". Also, the text of $$ can be referenced using $$$.text.
The following change to our grammar creates a composite return type containing the text that was spoken, and the numeric representation of that text.
root $small_number_and_text;
$small_number_and_text = $small_number { $.number = $$; $.text = $$$.text };
// Note: use semicolons to separate ECMAScript commands within tags.
Now a successful grammar match returns an object with two member properties, number and text. Here is the grammar in one place:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <lumenvox/1.0>;
// This line tells the engine how to interpret the grammar's tags.
// Currently, only "lumenvox/1.0" or "semantics/1.0" is supported.
root $small_number_and_text;
$base = (one:"1"|two:"2"|three:"3"|four:"4"|five:"5"|six:"6"|seven:"7"|eight:"8"|nine:"9") { $ = parseInt($) };
$teen = (ten:"10"|eleven:"11"|twelve:"12"|thirteen:"13"|fourteen:"14"|fifteen:"15" | sixteen:"16"|seventeen:"17"|eighteen:"18"|nineteen:"19") { $ = parseInt($) };
$twenty_to_ninetynine = (twenty:"20"|thirty:"30"|forty:"40"|fifty:"50"|sixty:"60"|seventy:"70"| eighty:"80"|ninety:"90"){ $ = parseInt($) } [$base { $ += $base }];
$tens = ($base|$teen|$twenty_to_ninetynine) { $ = $$ };
$hundred = ([a] hundred {$ = 100} | $base hundred {$ = 100 * $base});
$small_number = $hundred {$ = $$} [[and] $tens {$ += $$}] | $tens { $ = $$ };
$small_number_and_text = $small_number { $.number = $$; $.text = $$$.text };
Getting The Return Value
So far we have described how to use grammar tags to create a semantic interpretation result. So how do you access that result to use in your application?
LumenVox provides an XML fragment representation of the return type. This conforms to the W3C's proposal for generating XML from semantic interpretation results (except that the XML is not enclosed in a top-level tag). LumenVox also provides an API for accessing the return value as a data structure.
Under the XML scheme, if the engine recognized "four hundred and six" using our example grammar, then the result would look like:
<number> 406 </number> <text> FOUR HUNDRED AND SIX </text>
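Since the fragment has no enclosing top-level tag, you must wrap it in a synthetic root before handing it to a standard XML parser. A Python sketch (the root element name here is arbitrary):

```python
import xml.etree.ElementTree as ET

fragment = "<number> 406 </number> <text> FOUR HUNDRED AND SIX </text>"

# Wrap the fragment in a synthetic root so it parses as one document.
root = ET.fromstring("<interpretation>" + fragment + "</interpretation>")

number = int(root.findtext("number").strip())
text = root.findtext("text").strip()
```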
To access the return value of the semantic interpretation scheme, you must do the following:
1. Set the LV_DECODE_SEMANTIC_INTERPRETATION flag in your decode function call.
2. After decode, get the number of different interpretations that exist using GetNumberOfInterpretations (usually there will only be one, but an ambiguous grammar might return more than one).
3. For each result, get the interpretation result by calling GetInterpretation.
Phonemes
The units of sound the recognition engine actually recognizes are phonemes. All phrase formats are ultimately translated into phonetic spellings for decoding. These phonetic spellings can be directly entered if surrounded by curly braces.
The phonetic alphabet used by the decoder:
Phoneme | Example #1 | Phonetic Spelling #1 | Example #2 | Phonetic Spelling #2
Vowels
AA barn B AA R N top T AA P
AE bat B AE T crab K R AE B
AH what W AH T cut K AH T
AO more M AO R auto AO T OW
AW cow K AW house HH AW S
AX about AX B AW T dial D AY AX L
AXR butter B AH DX AXR career K AXR IH R
AY type T AY P life L AY F
EH check CH EH K mess M EH S
ER church CH ER CH bird B ER D
EY take T EY K hail HH EY L
IH little L IH DX AX L rib R IH B
IX action AE K SH IX N women W IH M IX N
IY team T IY M keep K IY P
OW loan L OW N robe R OW B
OY hoist HH OY S T joy JH OY
UH book B UH K look L UH K
UW flew F L UW who HH UW
Consonants
B web W EH B bear B EH R
CH chair CH EY R statue S T AE CH UW
D reed R IY D dark D AA R K
DH with W IH DH other AH DH ER
DX forty F AO R DX IY butter B AH DX AXR
F four F AO R graph G R AE F
G peg P EH G exam IH G Z AE M
HH halt HH AO L T Jose HH OW Z EY
JH cage K EY JH Jack JH AE K
K coin K OY N back B AE K
L late L EY T really R IH L IY
M lemon L EH M AH N mail M EY L
N night N AY T any EH N IY
NG ring R IH NG ankle AE NG K AH L
P pay P EY beep B IY P
R rest R EH S T prior P R AY ER
S sit S IH T bass B AE S
SH blush B L AH SH sure SH UH R
T raft R AE F T taped T EY P T
TH three TH R IY youth Y UW TH
V van V AE N river R IH V AXR
W swap S W AA P wing W IH NG
Y yes Y EH S year Y IY R
Z arms AA R M Z blaze B L EY Z
ZH Asian EY ZH AH N genre ZH AA N R AH
Phrases
The phrase is what the decoder attempts to match to speech.
A phrase can be in one or more of the following formats.
One or more words. Examples: "California" "how do I"

BNF format. Example: "[that's] (right | correct)" - that's right, that's correct, right, or correct

Raw phonemes (enclosed in curly braces {}). Example: "{Y EH S P L IY Z}" - yes please

Combination of the above formats. Example: "is that ( correct | {R AY T} )" - is that correct or is that right
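When you enter raw phonemes in curly braces, every symbol must come from the phonetic alphabet listed in the Phonemes section. A small illustrative Python check (not part of the engine):

```python
PHONEMES = {
    # vowels
    "AA", "AE", "AH", "AO", "AW", "AX", "AXR", "AY", "EH", "ER",
    "EY", "IH", "IX", "IY", "OW", "OY", "UH", "UW",
    # consonants
    "B", "CH", "D", "DH", "DX", "F", "G", "HH", "JH", "K", "L", "M",
    "N", "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH",
}

def valid_phonetic_spelling(phrase):
    """Check a curly-brace phoneme string such as '{Y EH S P L IY Z}'."""
    if not (phrase.startswith("{") and phrase.endswith("}")):
        return False
    symbols = phrase[1:-1].split()
    return bool(symbols) and all(s in PHONEMES for s in symbols)
```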
The engine has an internal dictionary of approximately 120,000 words. There is also a robust phonetic speller for words not found in the dictionary. The only valid punctuation marks are the apostrophe (') and the dash. Dashes should be used for multiple words that should be looked up in the internal dictionary as a single word, an example being new-orleans. If the multiple words do not exist in the dictionary, the dashes will be replaced by spaces and the words will be looked up in the dictionary separately.
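The dash behavior can be sketched as a lookup with a fallback. The toy dictionary and pronunciations below are purely illustrative, standing in for the engine's much larger internal dictionary:

```python
# Toy stand-in for the engine's ~120,000-word internal dictionary;
# the pronunciations here are illustrative only.
DICTIONARY = {"NEW-ORLEANS": "N UW AO R L IY N Z",
              "SAN": "S AE N",
              "DIEGO": "D IY EY G OW"}

def look_up(word):
    """Whole dashed word first; otherwise split on dashes and look up parts."""
    word = word.upper()
    if word in DICTIONARY:
        return [DICTIONARY[word]]
    if "-" in word:
        return [p for part in word.split("-") for p in look_up(part)]
    raise KeyError(word)  # a real engine would fall back to its phonetic speller
```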
BNF Refresher
BNF is an acronym for "Backus Naur Form". We use only terminal symbols. The pipe "|" is an OR operator, and square brackets "[ ]" surround optional words. Parentheses clarify order of operation and nesting. Here are some examples.
( (I would like to speak | Please connect me ) with ) John Doe [please] translates to these variations:
1. I WOULD LIKE TO SPEAK WITH JOHN DOE PLEASE
2. PLEASE CONNECT ME WITH JOHN DOE PLEASE
3. I WOULD LIKE TO SPEAK WITH JOHN DOE
4. PLEASE CONNECT ME WITH JOHN DOE
I ( want | need ) [ to ( know | hear ) ] [ the ] directions [ to ]
1. I WANT TO KNOW THE DIRECTIONS TO
2. I NEED TO KNOW THE DIRECTIONS TO
3. I WANT TO HEAR THE DIRECTIONS TO
4. I NEED TO HEAR THE DIRECTIONS TO
5. I WANT THE DIRECTIONS TO
6. I NEED THE DIRECTIONS TO
7. I WANT TO KNOW DIRECTIONS TO
8. I NEED TO KNOW DIRECTIONS TO
9. I WANT TO HEAR DIRECTIONS TO
10. I NEED TO HEAR DIRECTIONS TO
11. I WANT DIRECTIONS TO
12. I NEED DIRECTIONS TO
13. I WANT TO KNOW THE DIRECTIONS
14. I NEED TO KNOW THE DIRECTIONS
15. I WANT TO HEAR THE DIRECTIONS
16. I NEED TO HEAR THE DIRECTIONS
17. I WANT THE DIRECTIONS
18. I NEED THE DIRECTIONS
19. I WANT TO KNOW DIRECTIONS
20. I NEED TO KNOW DIRECTIONS
21. I WANT TO HEAR DIRECTIONS
22. I NEED TO HEAR DIRECTIONS
23. I WANT DIRECTIONS
24. I NEED DIRECTIONS
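These expansions can be generated mechanically. The Python sketch below (our own illustration) models a phrase as nested sequence/alternative/optional nodes and enumerates every variation; it produces the 24 sentences of the second example, though not necessarily in the listed order:

```python
from itertools import product

def expand(node):
    """node: a word (str), ('alt', [...]), ('seq', [...]), or ('opt', node)."""
    if isinstance(node, str):
        return [node]
    kind, body = node
    if kind == "alt":
        return [s for child in body for s in expand(child)]
    if kind == "opt":
        return expand(body) + [""]          # with the element, or without it
    if kind == "seq":
        parts = [expand(child) for child in body]
        return [" ".join(w for w in combo if w) for combo in product(*parts)]
    raise ValueError(kind)

# I ( want | need ) [ to ( know | hear ) ] [ the ] directions [ to ]
phrase = ("seq", ["I",
                  ("alt", ["WANT", "NEED"]),
                  ("opt", ("seq", ["TO", ("alt", ["KNOW", "HEAR"])])),
                  ("opt", "THE"),
                  "DIRECTIONS",
                  ("opt", "TO")])

variations = expand(phrase)
```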
LumenVox SpeechRec API Cautions
Calling LV_SRE functions using the same HPORT in different threads at the same time can have unexpected results.
Calling LVSpeechPort methods using the same LVSpeechPort object in different threads at the same time can have unexpected results.
Win32
The environment variable LVLANG specifies the location of the Lang subdirectory. The installation package will create this variable. If the client application needs to relocate the Lang subdirectory or the API was not installed using the installation package, the client application must make sure LVLANG has the correct location of the Lang subdirectory.
LVLANG\Dict is used to store static data files (primarily the language model files for the engine, which contain acoustic models and dictionaries).
LVLANG\Responses is used to store run-time created files (the Engine's call files which contain all the details of each recognition - audio data, grammar, recognized text, etc.). A sub-directory will be created for each day's data.
Linux
LVLANG is hard-coded to /usr/LumenVox/Dict by default and is used to store static data files (primarily the language model files for the Speech Engine, which contain acoustic models and dictionaries).
LVRESPONSE is hard-coded to /var/LumenVox/Responses by default and is used to store run-time created files (the Speech Engine call files which contain all the details of each recognition - audio data, grammar, recognized text, etc). A sub-directory will be created for each day's data.
The client application can create or modify either (or both) of these two environment variables to use custom locations if desired.
LV_SRE C API Functions
LV_SRE
The following C API is exported from the LVSpeechPort DLL. For C++ programmers, these functions are wrapped in the LVSpeechPort class.
Port Management Functions
int LV_SRE_ClosePort(HPORT hport);
int LV_SRE_Decode(HPORT hport, int VoiceChannel, int grammarset, unsigned int flags);
int LV_SRE_GetVoiceChannelData(HPORT hport, int VoiceChannel, short** PCM, unsigned int Samples);
int LV_SRE_LoadVoiceChannel(HPORT hport, int VoiceChannel, void* M, int Length, SOUND_FORMAT Format, const char* SoundFileName);
HPORT LV_SRE_OpenPort(ExportLogMsg Log, void* p, int verbosity);
void LV_SRE_RegisterAppLogMsg(ExportLogMsg Log, void* p, int NewMsgVerbosity);
const char* LV_SRE_ReturnErrorString(int ReturnCode);
int LV_SRE_SetProperty(HPORT hport, int property, int Value);
int LV_SRE_SetProperty(HPORT hport, int property, int valuetype, void* pvalue, int target, int ndx);
int LV_SRE_WaitForEngineToIdle(HPORT hport, int voicechannel, int ms);
int LV_SRE_WaitForDecode(HPORT hport, int voicechannel);
Streaming API Functions
int LV_SRE_StreamStart(HPORT hport);
int LV_SRE_StreamSendData(HPORT hport, void* SoundData, int SoundDataLength);
int LV_SRE_StreamGetStatus(HPORT hport);
int LV_SRE_StreamGetLength(HPORT hport);
int LV_SRE_StreamSetStateChangeCallBack(HPORT hport, LV_SRE_StreamStateChangeFn* fn, void* UserData);
void LV_SRE_StreamStateChangeFn(long NewState, unsigned long TotalBytes, unsigned long RecordedBytes, void* UserData);
int LV_SRE_StreamStop(HPORT hport);
int LV_SRE_StreamCancel(HPORT hport);
int LV_SRE_StreamSetParameter(HPORT hport, int StreamParameter, unsigned long StreamParameterValue);
int LV_SRE_StreamGetParameter(HPORT hport, int StreamParameter, unsigned long* StreamParameterValue);
int LV_SRE_StreamSetParameterToDefault(HPORT hport, int StreamParameter);
SRGS Grammar Functions
int LV_SRE_LoadGrammar(HPORT hport, const char* GrammarLabel, const char* GrammarLocation);
int LV_SRE_LoadGrammarIdx(HPORT hport, int GrammarIndex, const char* GrammarLocation);
int LV_SRE_LoadGlobalGrammar(const char* GrammarLabel, const char* GrammarLocation);
int LV_SRE_LoadGrammarFromBuffer(HPORT hport, const char* GrammarLabel, const char* GrammarContents);
int LV_SRE_LoadGrammarFromBufferIdx(HPORT hport, int GrammarIndex, const char* GrammarContents);
int LV_SRE_LoadGlobalGrammarFromBuffer(const char* GrammarLabel, const char* GrammarContents);
int LV_SRE_LoadGrammarFromObject(HPORT hport, const char* GrammarLabel, HGRAMMAR hgrammar);
int LV_SRE_LoadGrammarFromObjectIdx(HPORT hport, int GrammarIdx, HGRAMMAR hgrammar);
int LV_SRE_LoadGlobalGrammarFromObject(const char* GrammarLabel, HGRAMMAR hgrammar);
int LV_SRE_UnloadGrammar(HPORT hport, const char* GrammarLabel);
int LV_SRE_UnloadGrammarIdx(HPORT hport, int GrammarIndex);
int LV_SRE_UnloadGlobalGrammar(const char* GrammarLabel);
int LV_SRE_UnloadGrammars(HPORT hport);
int LV_SRE_UnloadGlobalGrammars(void);
int LV_SRE_IsGrammarLoaded(HPORT hport, const char* GrammarLabel);
int LV_SRE_IsGrammarLoadedIdx(HPORT hport, int GrammarIndex);
int LV_SRE_IsGlobalGrammarLoaded(const char* GrammarLabel);
int LV_SRE_ActivateGrammar(HPORT hport, const char* GrammarLabel);
int LV_SRE_ActivateGrammarIdx(HPORT hport, int GrammarIndex);
int LV_SRE_ActivateGlobalGrammar(HPORT hport, const char* GrammarLabel);
int LV_SRE_DeactivateGrammar(HPORT hport, const char* GrammarLabel);
int LV_SRE_DeactivateGrammarIdx(HPORT hport, int GrammarIndex);
int LV_SRE_DeactivateGrammars(HPORT hport);
SRGS Result Functions
int LV_SRE_GetNumberOfParses(HPORT hport, int VoiceChannel);
const char* LV_SRE_GetParseTreeString(HPORT hport, int VoiceChannel, int index);
H_PARSE_TREE LV_SRE_CreateParseTree(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetNumberOfInterpretations(HPORT hport, int VoiceChannel);
const char* LV_SRE_GetInterpretationString(HPORT hport, int VoiceChannel, int index);
H_SI LV_SRE_CreateInterpretation(HPORT hport, int VoiceChannel, int index);
N-Best Result Functions
int LV_SRE_GetNumberOfNBestAlternatives(HPORT hport, int VoiceChannel);
int LV_SRE_SwitchToNBestAlternative(HPORT hport, int VoiceChannel, int index);
Concept-Phrase Grammar Functions (for backward compatibility)
int LV_SRE_AddPhrase(HPORT hport, int GrammarSet, const char* Concept, const char* Phrase);
int LV_SRE_LoadStandardGrammar(HPORT hport, int grammarset, int defaultgrammar);
int LV_SRE_ResetGrammar(HPORT hport, int GrammarSet);
const char* LV_SRE_GetConcept(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetConceptScore(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetNumberOfConceptsReturned(HPORT hport, int VoiceChannel);
int LV_SRE_GetPhonemesDecoded(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetPhraseDecoded(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_GetRawTextDecoded(HPORT hport, int VoiceChannel, int Index);
int LV_SRE_RemoveConcept(HPORT hport, int GrammarSet, const char* Concept);
API Functions
LV_SRE_OpenPort
Opens the speech port and initializes a connection to the Speech Engine.
Functions
HPORT LV_SRE_OpenPort(ExportLogMsg Log, void* p, int verbosity);
HPORT LV_SRE_OpenPort2(unsigned long* error_code, ExportLogMsg Log, void* p, int verbosity);
Return Values
Note: the returned handle is used by most other API functions, and must be closed by calling LV_SRE_ClosePort.
Non-NULL
Port initialized successfully.
NULL
Licensing has been exceeded. There are too many ports active.
Parameters
Log
Pointer to a function which will receive logging information from the object.
p
A void pointer to client application-defined data. This data will be passed into the ExportLogMsg function to identify the calling port.
verbosity
range: 0 - 6
0 - minimal logging info
6 - maximum logging info
error_code
A pointer that receives an error code indicating why the port failed to open.
Error Code Return Values for OpenPort2
LV_SUCCESS
The port opened successfully
LV_NO_SERVER_RESPONDING or LV_OPEN_PORT_FAILED__PRIMARY_SERVER_NOT_RESPONDING
The client could not find a server to request a licensed port from.
LV_OPEN_PORT_FAILED__LICENSES_EXCEEDED
The primary server has too many ports connected for the number of licenses it has to give out.
Remarks
This method activates the speech port object. The recognition engine will begin initializing when this function is called. Control will return to the application immediately.
p is passed into the ExportLogMsg function to enable client-application-defined behavior.
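As an illustration, a logging callback might use the p pointer to tag each message with application data. The callback shape below (message string, user pointer, verbosity level) is an assumption made for this sketch; consult the SDK header for the actual ExportLogMsg typedef.

```c
#include <stdio.h>
#include <string.h>

/* Assumed application-side context; the real ExportLogMsg signature comes
   from the SDK header, and this shape is illustrative only. */
typedef struct {
    char tag[32];    /* identifies the calling port or application */
    char last[256];  /* last formatted log line, kept for inspection */
} AppLogContext;

/* Format each engine message with the tag recovered from the p pointer,
   then emit it to stderr. */
static void my_log_callback(const char *msg, void *p, int level)
{
    AppLogContext *ctx = (AppLogContext *)p;
    snprintf(ctx->last, sizeof ctx->last, "[%s:%d] %s", ctx->tag, level, msg);
    fputs(ctx->last, stderr);
    fputc('\n', stderr);
}
```

A port opened with this callback and a pointer to its AppLogContext would then see every engine message prefixed with that port's tag.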
See Also
Logging Callback Function
LV_SRE_ClosePort
LVSpeechPort::OpenPort
LV_SRE_ClosePort
Closes the port, and releases its resources.
int LV_SRE_ClosePort(HPORT hport);
Return Values
LV_SUCCESS
No errors; the port has successfully shut down.
LV_FAILURE
The port was unable to shut down.
LV_INVALID_HPORT
The port was never successfully opened, or was already closed.
Note:
Closing a port frees it from counting against the number of ports allowed by your license. Close every port that is no longer needed.
See Also
LV_SRE_OpenPort
LVSpeechPort::ClosePort
LV_SRE_RegisterAppLogMsg
Registers an application-level log message callback.
void LV_SRE_RegisterAppLogMsg(ExportLogMsg Log, void* p, int verbosity);
Return Values
none.
Parameters
Log
Pointer to a function which will receive logging information.
p
A void pointer to application-defined data. This data will be passed into the ExportLogMsg function to identify the application.
verbosity
range: 0 - 6
0 - minimal logging info
6 - maximum logging info
Remarks
This callback exists in addition to the per-port log message callback because some log messages are generated that are not associated with any one port.
There currently is no equivalent in LVSpeechPort.
See Also
Logging Callback Function
LV_SRE_ActivateGrammar functions
If you wish to use an SRGS grammar for decode, you need to activate it. Activating a grammar puts it in the multi-grammar grammarset called LV_ACTIVE_GRAMMAR_SET. The grammars that were activated can then be used for a decode by specifying LV_ACTIVE_GRAMMAR_SET as the grammarset parameter in a call to Decode, or by setting the STREAM_PARM_GRAMMAR_SET equal to the LV_ACTIVE_GRAMMAR_SET before calling StreamStart. The reason for this mechanism is to maintain backward compatibility with previous APIs.
When ActivateGrammar is called, the grammar is first searched for among the speech port's loaded grammars. If it cannot be found there, the collection of application-level grammars is searched. If you wish to explicitly activate an application-level grammar, use LV_SRE_ActivateGlobalGrammar.
Functions
int LV_SRE_ActivateGrammar(HPORT hport, const char* gram_name);
int LV_SRE_ActivateGrammarIdx(HPORT hport, int gram_name);
Parameters
hport
The handle of the speech port for which you are activating the grammar.
gram_name
The identifier for the grammar being activated. This is the same identifier that was given to the grammar when it was loaded. This can be a string, or an integer ID if you use the *Idx version of the function call. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
Return Values
LV_SUCCESS
No errors; this grammar is now active.
LV_GRAMMAR_LOADING_ERROR
This grammar could not be activated, because it was not found in the speech port's set of loaded grammars.
Remarks
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
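The load-activate-decode flow described above can be sketched as follows. The stubs and the LV_ACTIVE_GRAMMAR_SET placeholder value stand in for the real SDK declarations, which come from the LV_SRE header; error handling is trimmed to the essentials.

```c
#include <assert.h>

/* Hypothetical stand-ins for the SDK declarations, so this sketch compiles
   on its own; real programs include the LV_SRE header instead. */
typedef void *HPORT;
#define LV_SUCCESS 0
#define LV_ACTIVE_GRAMMAR_SET (-1)   /* placeholder; use the SDK's constant */
static int LV_SRE_LoadGrammar(HPORT h, const char *label, const char *loc)
{ (void)h; (void)label; (void)loc; return LV_SUCCESS; }
static int LV_SRE_ActivateGrammar(HPORT h, const char *label)
{ (void)h; (void)label; return LV_SUCCESS; }
static int LV_SRE_Decode(HPORT h, int chan, int gset, unsigned int flags)
{ (void)h; (void)chan; (void)gset; (void)flags; return LV_SUCCESS; }

/* Load a grammar, place it in the active set, and decode channel 0
   against LV_ACTIVE_GRAMMAR_SET. */
static int decode_with_grammar(HPORT port, const char *label, const char *uri,
                               unsigned int flags)
{
    int rc = LV_SRE_LoadGrammar(port, label, uri);
    if (rc != LV_SUCCESS) return rc;
    rc = LV_SRE_ActivateGrammar(port, label);
    if (rc != LV_SUCCESS) return rc;
    return LV_SRE_Decode(port, 0, LV_ACTIVE_GRAMMAR_SET, flags);
}
```

Several grammars can be activated before the Decode call; all of them are then considered together as the active grammar set.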
See Also
LV_SRE_DeactivateGrammar functions
LV_SRE_ActivateGlobalGrammar
LVSpeechPort::ActivateGrammar functions (C++ API)
LV_SRE_ActivateGlobalGrammar
You only need to use this function if you have a grammar in the speech port with the same name as a grammar in the global space, and you wish to activate the global grammar.
Function
int LV_SRE_ActivateGlobalGrammar(HPORT hport,const char* gram_name);
Parameters
hport
The handle of the speech port for which you are activating the grammar.
gram_name
The identifier for the grammar being activated. This is the same identifier that was given to the grammar when it was loaded.
Return Values
LV_SUCCESS
No errors; this grammar is now active.
LV_FAILURE
This grammar could not be activated, because it was not found in the application-level set of grammars.
Remarks
Since LV_SRE_ActivateGrammar searches the speech port's loaded grammars, and then searches the application level grammars, you only need to use LV_SRE_ActivateGlobalGrammar if there is a name conflict between your local and app-level grammars, and you need to activate the app-level one.
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
See Also
LV_SRE_ActivateGrammar functions
LV_SRE_DeactivateGrammar functions
LVSpeechPort::ActivateGlobalGrammar (C++ API)
LV_SRE_DeactivateGrammar functions
These functions remove a grammar from the set of active grammars. The last function clears the entire active grammar set.
Functions
int LV_SRE_DeactivateGrammar(HPORT hport, const char* gram_name);
int LV_SRE_DeactivateGrammarIdx(HPORT hport, int gram_name);
int LV_SRE_DeactivateGrammars(HPORT hport);
Parameters
hport
The handle of the speech port for which you are deactivating the grammar.
gram_name
The identifier for the grammar being deactivated. This is the same identifier that was given to the grammar when it was loaded. This can be a string, or an integer ID if you use the *Idx version of the function call. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
Return Values
LV_SUCCESS
No errors; this grammar is no longer active.
LV_FAILURE
This grammar could not be deactivated, because it was never successfully activated.
See Also
LV_SRE_ActivateGrammar functions
LV_SRE_ActivateGlobalGrammar
LVSpeechPort::DeactivateGrammar (C++ API)
LV_SRE_LoadGrammar functions
Before you can use a grammar, you must load it into the speech port's collection of grammars, or you must load it into the collection of application-level (global) grammars. When you load a grammar, it is compiled for use in the LumenVox Speech Engine.
These functions load an SRGS grammar that will be usable by a single speech port object.
Functions
int LV_SRE_LoadGrammar(HPORT hport, const char* gram_name, const char* gram_location);
int LV_SRE_LoadGrammarIdx(HPORT hport, int gram_name, const char* gram_location);
int LV_SRE_LoadGrammarFromBuffer(HPORT hport, const char* gram_name, const char* gram_contents);
int LV_SRE_LoadGrammarFromBufferIdx(HPORT hport, int gram_name, const char* gram_contents);
int LV_SRE_LoadGrammarFromObject(HPORT hport, const char* gram_name, HGRAMMAR gram_handle);
int LV_SRE_LoadGrammarFromObjectIdx(HPORT hport, int gram_name, HGRAMMAR gram_handle);
Parameters
hport
The handle for the speech port you are loading the grammar into.
gram_name
The identifier for the grammar being loaded. Whenever you activate, deactivate, or unload, this is the identifier you will use. This can be a string, or an integer ID if you use the *Idx version of the function call. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
gram_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
gram_contents
A null terminated string containing the contents of a valid SRGS grammar file.
gram_handle
A handle for an LVGrammar object, created by LVGrammar_Create
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
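A client will typically treat a syntax warning as usable and the two error codes as fatal for that grammar. A minimal sketch of that check, with assumed numeric values standing in for the SDK's real constants:

```c
#include <assert.h>

/* Hypothetical numeric values for the SDK's return codes; the real
   definitions come from the LV_SRE header. */
#define LV_SUCCESS                 0
#define LV_GRAMMAR_SYNTAX_WARNING  1
#define LV_GRAMMAR_SYNTAX_ERROR   (-2)
#define LV_GRAMMAR_LOADING_ERROR  (-3)

/* A grammar is usable after loading if the compiler fully accepted it or
   only warned; syntax and loading errors mean it cannot be decoded with. */
static int grammar_is_usable(int load_rc)
{
    return load_rc == LV_SUCCESS || load_rc == LV_GRAMMAR_SYNTAX_WARNING;
}
```

On a warning, the client would usually also inspect the logging callback output (priority 1) to see what was non-conforming.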
See Also
LV_SRE_UnloadGrammar functions
LV_SRE_IsGrammarLoaded functions
LV_SRE_LoadGlobalGrammar functions
LVSpeechPort::LoadGrammar functions (C++ API)
LV_SRE_UnloadGrammar functions
These functions remove a loaded grammar from a speech port object. The last function removes all loaded grammars from the speech port.
Functions
int LV_SRE_UnloadGrammar(HPORT hport, const char* gram_name);
int LV_SRE_UnloadGrammarIdx(HPORT hport, int gram_name);
int LV_SRE_UnloadGrammars(HPORT hport);
Parameters
hport
The handle for the speech port from which you are unloading the grammar.
gram_name
The identifier for the grammar being unloaded. This is the same identifier you gave the grammar when you loaded it. It can be a null terminated string, or an integer if you use the *Idx version of the method.
Return Values
LV_SUCCESS
No errors; this grammar is removed.
LV_FAILURE
The grammar was not present. Nothing was removed.
Remarks
Grammars that were activated and then unloaded are still active; they must be explicitly deactivated.
See Also
LV_SRE_IsGrammarLoaded functions
LV_SRE_UnloadGlobalGrammar functions
LV_SRE_LoadGrammar functions
LVSpeechPort::UnloadGrammar functions (C++ API)
LV_SRE_UnloadGlobalGrammar
These functions remove a loaded grammar from the application-level space of grammars. The second function removes all application-level grammars.
Functions
int LV_SRE_UnloadGlobalGrammar(const char* gram_name);
void LV_SRE_UnloadGlobalGrammars(void);
Parameters
gram_name
The identifier for the grammar being unloaded. This is the same identifier you gave the grammar when you loaded it.
Return Values
LV_SUCCESS
No errors; this grammar is removed.
LV_GLOBAL_GRAMMAR_TRANSACTION_ERROR
Failed to unload the grammar on all servers.
LV_GLOBAL_GRAMMAR_TRANSACTION_PARTIAL_ERROR
Failed to unload the grammar on some of the servers.
Remarks
A global grammar is unloaded from the server only when unload has been called for every label associated with that grammar.
See Also
LV_SRE_UnloadGrammar functions
LV_SRE_IsGlobalGrammarLoaded functions
LV_SRE_LoadGlobalGrammar functions
LVSpeechPort::UnloadGlobalGrammar functions (C++ API)
LV_SRE_LoadGlobalGrammar functions
When a global grammar is loaded, the grammar is sent to the server. Subsequent decode requests then contain only global grammar IDs rather than the actual grammars, which avoids network transport overhead for large grammars.
A global grammar is associated with the client process that loads it. All speech ports belonging to that client have access to the grammar; different client processes, however, do not share global grammars with each other.
Generally, the lifetime of a global grammar is controlled by the load and unload functions. However, if a client process terminates without unloading its global grammars, the server must release the unused grammars itself: it periodically checks whether each client process is still alive, and once it detects that a client process has been inactive for more than 10 minutes, it removes all grammars associated with that process.
In a multi-threaded program, it is safe to access global grammars in a read-only fashion on multiple threads simultaneously, for instance querying whether a global grammar is loaded or decoding with global grammars. When loading or unloading takes place, such as unloading a global grammar while another thread is decoding with it, it is the user's responsibility to prevent race conditions.
Functions
int LV_SRE_LoadGlobalGrammar(const char* gram_name, const char* gram_location);
int LV_SRE_LoadGlobalGrammarFromBuffer(const char* gram_name, const char* gram_contents);
int LV_SRE_LoadGlobalGrammarFromObject(const char* gram_name, HGRAMMAR gram_handle);
Parameters
gram_name
The identifier for the grammar being loaded. Whenever you activate, deactivate, or unload, this is the identifier you will use.
gram_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
gram_contents
A null terminated string containing the contents of a valid SRGS grammar file.
gram_handle
A handle for an LVGrammar object, created by LVGrammar_Create
Return Values
LV_SUCCESS
No errors; this grammar is now ready to use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready for use.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
LV_GLOBAL_GRAMMAR_TRANSACTION_ERROR
Failed to send the grammar to all servers.
LV_GLOBAL_GRAMMAR_TRANSACTION_PARTIAL_ERROR
Failed to send the grammar to some of the servers.
Remarks
Detailed error and warning messages are sent to the LVSpeechPort application-level logging callback function at priorities 0 and 1, respectively.
Users can load the same grammar under different labels; only one instance of that grammar is created on the server.
See Also
LV_SRE_LoadGrammar functions
LV_SRE_IsGlobalGrammarLoaded functions
LV_SRE_UnloadGlobalGrammar functions
LVSpeechPort::LoadGlobalGrammar functions (C++ API)
LV_SRE_IsGrammarLoaded functions
Functions
int LV_SRE_IsGrammarLoaded(HPORT hport, const char* gram_name);
int LV_SRE_IsGrammarLoadedIdx(HPORT hport, int gram_name);
Parameters
hport
The port being queried for gram_name.
gram_name
The identifier for the grammar being queried. This is the same identifier you gave the grammar when you loaded it.
Return Values
1 if a grammar was found with the label gram_name among the speech port's loaded grammars; 0 otherwise.
Remarks
Note: This function only tells you if a grammar with the name gram_name is loaded. It does not tell you if there are two identical grammar bodies loaded.
See Also
LV_SRE_UnloadGrammar functions
LV_SRE_IsGlobalGrammarLoaded
LV_SRE_LoadGrammar functions
LVSpeechPort::IsGrammarLoaded functions (C++ API)
LV_SRE_IsGlobalGrammarLoaded
Function
int LV_SRE_IsGlobalGrammarLoaded(const char* gram_name);
Parameters
gram_name
The identifier for the grammar being queried. This is the same identifier you gave the grammar when you loaded it.
Return Values
1 if a grammar was found with the label gram_name in the space of application-level grammars; 0 otherwise.
Remarks
Note: This function only tells you if a grammar with the name gram_name is loaded. It does not tell you if there are two identical grammar bodies loaded.
See Also
LV_SRE_UnloadGlobalGrammar
LV_SRE_IsGrammarLoaded functions
LV_SRE_LoadGlobalGrammar functions
LVSpeechPort::IsGlobalGrammarLoaded functions (C++ API)
LV_SRE_AddPhrase
Adds a phrase to a new or existing concept.
int LV_SRE_AddPhrase(HPORT hport, int GrammarSet, const char* Concept , const char* Phrase);
Return Values
LV_SUCCESS
No errors; the phrase was added to the concept.
LV_BAD_HPORT
The engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set is out of range.
LV_GRAMMAR_SYNTAX_ERROR or LV_GRAMMAR_SYNTAX_WARNING
The phrase entered has bad syntax, such as mismatched parentheses.
Parameters
GrammarSet
The grammar set to add the phrase to. Integer value between 0 and 63, inclusive.
Concept
The concept to add the phrase to. Null-terminated string.
Phrase
The new phrase.
Remarks
The concept can be new or existing; if the concept is new, the call automatically creates it with the single phrase.
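For instance, a small yes/no concept grammar can be built one phrase at a time. The stub for LV_SRE_AddPhrase stands in for the real SDK function so the sketch is self-contained, and the concept and phrase strings are illustrative:

```c
#include <assert.h>

/* Hypothetical stand-ins for the SDK declarations; real code includes the
   LV_SRE header instead. */
typedef void *HPORT;
#define LV_SUCCESS 0
static int LV_SRE_AddPhrase(HPORT h, int gset, const char *concept, const char *phrase)
{ (void)h; (void)gset; (void)concept; (void)phrase; return LV_SUCCESS; }

/* Build a yes/no grammar in one grammar set: each AddPhrase call either
   creates the concept or appends another phrase to it. */
static int build_yes_no(HPORT port, int gset)
{
    static const char *yes[] = { "yes", "yeah", "yep" };
    static const char *no[]  = { "no", "nope" };
    for (int i = 0; i < 3; i++)
        if (LV_SRE_AddPhrase(port, gset, "YES", yes[i]) != LV_SUCCESS)
            return -1;
    for (int i = 0; i < 2; i++)
        if (LV_SRE_AddPhrase(port, gset, "NO", no[i]) != LV_SUCCESS)
            return -1;
    return LV_SUCCESS;
}
```

A decode against this grammar set would then return the concept label ("YES" or "NO") rather than the literal phrase spoken.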
See Also
Phrase Formats
Phonemes
LVSpeechPort::AddPhrase
LV_SRE_RemoveConcept
Removes a concept and all of its phrases.
int LV_SRE_RemoveConcept(HPORT hport, int GrammarSet, const char* Concept);
Return Values
LV_SUCCESS
No errors; the concept and all of its phrases are removed from the grammar set.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set specified is outside the valid range.
LV_BAD_HPORT
The engine is no longer running. This is the result of an LV_SRE_ClosePort call or an unrecoverable engine error.
Parameters
GrammarSet
The grammar set to remove the concept from. Valid range: 0 to 63.
Concept
The existing concept to remove. Null-terminated string.
See Also
LVSpeechPort::RemoveConcept
LV_SRE_ResetGrammar
Removes all concepts from a grammar.
int LV_SRE_ResetGrammar(HPORT hport, int GrammarSet);
Return Values
LV_SUCCESS
No errors; grammar reset.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set value is out of expected range (0-63).
See Also
LVSpeechPort::ResetGrammar
LV_SRE_LoadStandardGrammar
Standard grammars are deprecated in favor of SRGS built-in grammars.
Loads a standard, pre-defined grammar to easily recognize and format numbers, monetary figures or digits.
int LV_SRE_LoadStandardGrammar(HPORT hport,int GrammarSet, int StdGrammar);
Return Values
LV_SUCCESS
No errors; the standard grammar is loaded.
LV_STANDARD_GRAMMAR_OUT_OF_RANGE
The standard grammar value is not a recognized grammar type.
LV_GRAMMAR_SET_OUT_OF_RANGE
The standard grammar was loaded into a set that is not in range.
Parameters
GrammarSet
The grammar set the standard grammar is being loaded into. Valid range: 0 to 63.
StdGrammar
The standard grammars are:
1. GRAMMAR_DIGITS String of single digits like a phone number or pin code.
2. GRAMMAR_MONEY Monetary value (only implemented for SRGS decodes).
3. GRAMMAR_NUMERIC Numeric value like 12,000, 24.45, or 35.
4. GRAMMAR_SPELLING Alphabet letters for spelling (not implemented).
5. GRAMMAR_ALPHA_NUMERIC (Not implemented).
6. GRAMMAR_DATE Date values (only implemented for SRGS decodes).
7. GRAMMAR_NONE Clears out the standard grammar, without clearing out any phrases that were added. ResetGrammar( ) will clear out the entire grammar.
Remarks
The client application can load only one standard grammar, but can add any number of concepts with AddPhrase. This is not true, however, if you use SRGS grammars. The correct way to augment a standard SRGS grammar is to load a grammar to a different location, and then activate both. When a standard grammar is loaded, the decoder will return the number, dollar amount, or digit string as either a single concept or a single interpretation string, depending on whether SRGS is used.
As an example, suppose the client application loads GRAMMAR_NUMERIC and also adds the concept and phrase "Widgets". If the sound data contained the speech "twelve widgets", the decoder would return two concepts: the first is the string "12" and the second the string "Widgets". If the speech was "one thousand one hundred and twenty nine Widgets seven point two Widgets", the decoder would return four concepts: "1129", "Widgets", "7.2", and "Widgets".
However, if you use SRGS, this is not what happens. To get this sort of functionality in the SRGS setting, you would create a grammar that looks like the following:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $how_many_widgets;

$how_many_widgets = $<builtin:grammar/number> widgets {$=$$;};
In this case you wouldn't bother using LoadStandardGrammar() at all, since the standard number grammar will get loaded when you load this grammar. The return type would be an interpretation string representing the number that was recognized, like "1129" or "7.2". The word "widgets" would not be returned in this grammar.
See Also
Standard Grammars
LVSpeechPort::LoadStandardGrammar
LV_SRE_LoadVoiceChannel
Loads the audio data into the specified voice channel prior to a call to LV_SRE_Decode (which decodes the audio data).
int LV_SRE_LoadVoiceChannel(HPORT hport, int VoiceChannel, void* M, int Length, SOUND_FORMAT Format);
Return Values
LV_SUCCESS
No errors; the voice channel audio successfully loaded.
LV_BAD_HPORT
The engine is no longer running. This is the result of an LV_SRE_ClosePort call or an unrecoverable engine error.
LV_FAILURE
Sound format was incorrectly specified.
Parameters
VoiceChannel
Accepted values 0 through 63.
M
Pointer to audio data.
Length
Memory size in bytes of the audio data.
Format
The audio data sound format.
Remarks
Each speech port supports 64 separate voice channels. Each channel has its own separate storage for decode data, so once the call is made, the client application can release its own copy of the audio. LV_SRE_LoadVoiceChannel will accept the audio data and prepare it for decoding.
See Also
LVSpeechPort::LoadVoiceChannel
LV_SRE_Decode
Processes the voice channel audio data against the active grammar.
int LV_SRE_Decode(HPORT hport,int VoiceChannel,int grammarset,unsigned int flags);
Return Values
Zero (0) or greater indicates success.
A negative result indicates a specific error.
Parameters
VoiceChannel
The voice channel to process.
GrammarSet
The grammar to use to process.
Flags (bitwise OR flags to set desired options)
LV_DECODE_BLOCK - Decode will not return until it has finished.
LV_DECODE_GENDER_MALE - Gender identifier.
LV_DECODE_GENDER_FEMALE – Gender identifier.
LV_DECODE_FIRST_TIME_USER – Reset caller weights in Recognition Engine (not implemented).
LV_DECODE_USE_OOV - Use the Out-Of-Vocabulary filter (OOV) during decode.
Remarks
If LV_DECODE_BLOCK is set, LV_SRE_Decode will not return until it has finished processing the data.
If LV_DECODE_BLOCK is not set, LV_SRE_Decode returns immediately (but continues processing the data on a separate thread), and the client application can continue its own work. Calling other LVSpeechPort methods may block until the Decode is finished. Once the client application is ready to check for results, it can call either 1) LV_SRE_GetNumberOfConceptsReturned, or 2) LV_SRE_WaitForEngineToIdle and then LV_SRE_GetNumberOfConceptsReturned. LV_SRE_WaitForEngineToIdle will only wait for a specified time, and returns regardless of whether LV_SRE_Decode is finished, whereas LV_SRE_GetNumberOfConceptsReturned will block until the Decode is finished.
LV_DECODE_GENDER_FEMALE and LV_DECODE_GENDER_MALE identify which gender acoustic model to use. If these flags are not specified, the engine automatically decodes each audio file against both gender models. While this slows the engine by requiring two decodes, evaluating against both models has a very significant positive effect on recognition accuracy. Since the engine is multithreaded, unless CPU loads are a serious issue, do not use these flags.
On an error, call LV_SRE_ReturnErrorString with the negative result from LV_SRE_Decode to get a description of the error.
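The non-blocking pattern described in the remarks can be sketched as follows. The stubs stand in for the real SDK functions, and the parameter order for LV_SRE_WaitForEngineToIdle here follows the function summary (hport, voicechannel, ms):

```c
#include <assert.h>

/* Hypothetical stand-ins for the SDK declarations; real code includes the
   LV_SRE header instead. */
typedef void *HPORT;
#define LV_SUCCESS  0
#define LV_TIME_OUT 1   /* placeholder value; use the SDK's constant */
static int LV_SRE_WaitForEngineToIdle(HPORT h, int chan, int ms)
{ (void)h; (void)chan; (void)ms; return LV_SUCCESS; }
static int LV_SRE_GetNumberOfConceptsReturned(HPORT h, int chan)
{ (void)h; (void)chan; return 2; /* stub: pretend two concepts decoded */ }

/* After a non-blocking Decode: wait up to budget_ms for the channel to go
   idle, then fetch the concept count; returns -1 if the wait timed out. */
static int await_results(HPORT port, int chan, int budget_ms)
{
    if (LV_SRE_WaitForEngineToIdle(port, chan, budget_ms) != LV_SUCCESS)
        return -1;   /* still decoding; the caller may retry or abandon */
    return LV_SRE_GetNumberOfConceptsReturned(port, chan);
}
```

Calling LV_SRE_GetNumberOfConceptsReturned directly, without the wait, would instead block the caller until the decode finishes.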
See Also
LV_SpeechPort::Decode
LV_SRE_WaitForEngineToIdle
(Deprecated in favor of LV_SRE_WaitForDecode)
Blocks the client application until the port is idle (not decoding).
int LV_SRE_WaitForEngineToIdle(HPORT hport, int MillisecondsToWait, int VoiceChannel);
Return Values
LV_SUCCESS
No errors or timeout; the engine is now idle.
LV_TIME_OUT
WaitForEngineToIdle's timeout was reached before the engine became idle.
Parameters
MillisecondsToWait
The number of milliseconds to wait before returning if the Speech Port does not become idle.
VoiceChannel
Which VoiceChannel to wait on, -1 waits on all the voice channels for the port.
Remarks
This function is deprecated in favor of LV_SRE_WaitForDecode. To achieve the same behavior as LV_SRE_WaitForDecode, use property PROP_EX_DECODE_TIMEOUT, and set MillisecondsToWait to TIMEOUT_INFINITE.
Some of the LV_SRE functions run asynchronously, in particular LV_SRE_Decode. LV_SRE_WaitForEngineToIdle is primarily useful when LV_SRE_Decode is called without LV_DECODE_BLOCK. In this case, LV_SRE_Decode returns immediately, but continues processing the voice channel's audio data in a separate thread. Since client applications will eventually need the results, the clients need a way to query the port to see if LV_SRE_Decode has finished. LV_SRE_WaitForEngineToIdle will wait the specified time for the engine to idle; check the return value to ensure the engine is idle, indicating that decode results are available.
LV_SRE_WaitForEngineToIdle is also useful to ensure the engine has finished initializing, prior to calls to LV_SRE_Decode.
See Also
LV_SRE_Decode
LVSpeechPort::WaitForEngineToIdle
LV_SRE_WaitForDecode
LV_SRE_GetNumberOfInterpretations
Returns the number of semantic interpretation results that were generated by the previous decode.
Function
int LV_SRE_GetNumberOfInterpretations(HPORT hport, int voicechannel)
Parameters
hport
A handle to the speech port.
voicechannel
The audio channel holding the decoded audio.
See Also
LV_SRE_CreateInterpretation
LV_SRE_GetInterpretationString
LVSpeechPort::GetNumberOfInterpretations (C++ API)
LV_SRE_CreateInterpretation
Returns a handle to a data structure representing the results of the semantic interpretation process. The handle must be released with LVInterpretation_Release when you are finished with it.
Function
H_SI LV_SRE_CreateInterpretation (HPORT hport, int voicechannel, int index)
Parameters
hport
A handle to the speech port
voicechannel
The channel that the decode took place on.
index
An utterance could give rise to multiple interpretations, particularly if the grammars involved are ambiguous. index ranges from 0 to LV_SRE_GetNumberOfInterpretations - 1.
Return Value
The return type is a handle to an interpretation object. The object is a representation of the ECMAScript object made by the matching grammar, using the Semantic Interpretation for Speech Recognition process. It also contains additional information such as the confidence score, matching grammar label, and the input sentence.
Remarks
The H_SI handle can be manipulated using the functions prefixed by "LVInterpretation_".
See Also
LV_SRE_GetNumberOfInterpretations
LV_SRE_GetInterpretationString
LVInterpretation C API
LVParseTree::GetInterpretation (C++ API)
LV_SRE_GetInterpretationString
Provides the user with a string representation of the semantic interpretation result data.
Function
const char* LV_SRE_GetInterpretationString(HPORT hport, int voicechannel, int index)
Parameters
hport
A handle to the speech port
voicechannel
The channel containing the decoded audio
index
A value between 0 and LV_SRE_GetNumberOfInterpretations -1
Remarks
Logically, the interpretation string is the same as the result data contained in a semantic interpretation object.
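For example, all interpretation strings from the last decode can be retrieved in a loop. This is a minimal sketch; it assumes port is an open HPORT and vc is the voice channel that was decoded:

```c
int i;
int total = LV_SRE_GetNumberOfInterpretations(port, vc);
for (i = 0; i < total; ++i)
{
    /* index runs from 0 to LV_SRE_GetNumberOfInterpretations - 1 */
    const char* s = LV_SRE_GetInterpretationString(port, vc, i);
    if (s != NULL)
        printf("interpretation %d: %s\n", i, s);
}
```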
See Also
LV_SRE_GetNumberOfInterpretations
LV_SRE_CreateInterpretation
LVSpeechPort::GetInterpretationString (C++ API)
LV_SRE_GetNumberOfParses
Returns the number of parse trees that were generated by the previous decode.
Function
int LV_SRE_GetNumberOfParses(HPORT hport, int voicechannel)
Parameters
hport
A handle to the speech port.
voicechannel
The audio channel holding the decoded audio.
See Also
LV_SRE_CreateParseTree
LV_SRE_GetParseTreeString
Speech Parse Tree Introduction
LVSpeechPort::GetNumberOfParses (C++ API)
LV_SRE_CreateParseTree
Provides the user with a handle to a speech parse tree, representing the sentence structure of what was decoded by the Speech Engine, according to the active grammars. You must release the handle with LVParseTree_Release when you are finished with it.
Function
H_PARSE_TREE LV_SRE_CreateParseTree(HPORT hport, int voicechannel, int index)
Parameters
hport
The handle to the speech port.
voicechannel
The audio channel containing the input audio
index
It is possible to have more than one parse tree for an utterance (for instance if the grammar is ambiguous); this is the index of the tree
Return Value
A handle to a parse tree. The parse tree handle is manipulated with functions having the prefix "LVParseTree_".
Remark
Logically, a parse tree and the parse string returned to the user are the same. However, a speech parse tree makes it easy to search the parse tree for useful information.
See Also
LV_SRE_GetNumberOfParses
LV_SRE_GetParseTreeString
Parse Tree Introduction
LVParseTree C API
LVSpeechPort::GetParseTree (C++ API)
LV_SRE_GetParseTreeString
Provides the user with a string representation of a speech parse tree.
Function
const char* LV_SRE_GetParseTreeString(HPORT hport, int voicechannel, int index)
Parameters
hport
The handle to the speech port.
voicechannel
The audio channel containing the input audio
index
It is possible to have more than one parse tree possibility (for instance if the grammar is ambiguous); this is the index of the tree
Remark
Logically, a speech parse tree and the parse string returned to the user are the same. However, a speech parse tree makes it easy to search the parse tree for useful information. The parse tree string is based on the examples provided by the W3C SRGS specification
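A minimal sketch of retrieving every parse tree string after a decode; port and vc are assumed to be an open HPORT and the decoded voice channel:

```c
int i;
int parses = LV_SRE_GetNumberOfParses(port, vc);
for (i = 0; i < parses; ++i)
{
    const char* tree = LV_SRE_GetParseTreeString(port, vc, i);
    if (tree != NULL)
        printf("parse %d: %s\n", i, tree);
}
```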
See Also
LV_SRE_GetNumberOfParses
LV_SRE_CreateParseTree
Parse Tree Introduction
LVSpeechPort::GetParseTreeString (C++ API)
LV_SRE_GetNumberOfConceptsReturned
Returns the number of concepts found in the last call to LV_SRE_Decode.
int LV_SRE_GetNumberOfConceptsReturned(HPORT hport,int VoiceChannel);
Return Values
The number of concepts found for this voice channel.
Parameters
VoiceChannel
The voice channel processed by LV_SRE_Decode.
See Also
LVSpeechPort::GetNumberOfConceptsReturned
LV_SRE_GetConcept
Returns one concept found in the last call to LV_SRE_Decode.
const char* LV_SRE_GetConcept(HPORT hport,int VoiceChannel, int Index);
Return Values
A null-terminated string representing the matched concept.
NULL indicates that Index was outside the possible range.
Parameters
VoiceChannel
The voice channel processed by LV_SRE_Decode.
Index
The recognition position of the concept, between 0 and (LV_SRE_GetNumberOfConceptsReturned - 1), inclusive.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine would return the concepts highlighted:
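A minimal sketch of walking the matched concepts together with their confidence scores (port and vc are assumed to come from an earlier LV_SRE_Decode):

```c
int i;
int count = LV_SRE_GetNumberOfConceptsReturned(port, vc);
for (i = 0; i < count; ++i)
{
    const char* concept = LV_SRE_GetConcept(port, vc, i);
    int score = LV_SRE_GetConceptScore(port, vc, i); /* 0 to 1000 */
    if (concept != NULL)
        printf("concept %d: %s (score %d)\n", i, concept, score);
}
```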
See Also
LVSpeechPort::GetConcept
LV_SRE_GetConceptScore
Returns the confidence score of a concept found in the last call to LV_SRE_Decode.
int LV_SRE_GetConceptScore(HPORT hport,int VoiceChannel, int Index);
Return Values
The confidence score of the matched concept. The range of possible values is 0 to 1000.
Parameters
VoiceChannel
The voice channel processed by LV_SRE_Decode.
Index
The recognition position of the concept, between 0 and (LV_SRE_GetNumberOfConceptsReturned - 1), inclusive.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the scores highlighted:
See Also
LV_SpeechPort::GetConceptScore
LV_SRE_GetPhraseDecoded
Returns the decoded phrase (with BNF formatting retained) found in the last call to LV_SRE_Decode.
const char* LV_SRE_GetPhraseDecoded(HPORT hport, int VoiceChannel, int Index);
Return Values
A null-terminated static string containing the decoded phrase.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the phrase to decode.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the phrases highlighted:
The main difference between LV_SRE_GetPhraseDecoded and LV_SRE_GetRawTextDecoded is in BNF formatting. LV_SRE_GetPhraseDecoded returns the decoded phrase as it was entered into the grammar. If the phrase contains BNF formatting, with selections, options, grouping, etc., then the return value preserves that formatting. LV_SRE_GetRawTextDecoded returns the decoded phrase after BNF formatting has been removed. Thus, LV_SRE_GetRawTextDecoded will return the phrase as a list of the words actually recognized, rather than the phrase as it was entered into the grammar.
See Also
LV_SRE_GetPhonemesDecoded
LV_SRE_GetRawTextDecoded
LVSpeechPort::GetPhraseDecoded
LV_SRE_GetPhonemesDecoded
Returns the actual phonemes found in a call to LV_SRE_Decode.
const char* LV_SRE_GetPhonemesDecoded(HPORT hport,int VoiceChannel, int Index);
Return Values
A null-terminated static string of the decoded phonemes.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded phonemes.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the phonemes highlighted:
See Also
LV_SRE_GetPhraseDecoded
LV_SRE_GetRawTextDecoded
LVSpeechPort::GetPhonemes
LV_SRE_GetRawTextDecoded
Returns the decoded raw text (without BNF formatting) found in the last call to Decode.
const char* LV_SRE_GetRawTextDecoded(HPORT hport,int VoiceChannel, int Index);
Return Values
A null-terminated string representing the decoded raw text.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded raw text.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the raw text highlighted:
The main difference between LV_SRE_GetPhraseDecoded and LV_SRE_GetRawTextDecoded is in BNF formatting. LV_SRE_GetPhraseDecoded returns the decoded phrase as it was entered into the grammar. If the phrase contains BNF formatting, with selections, options, grouping, etc., then the return value preserves that formatting. LV_SRE_GetRawTextDecoded returns the decoded phrase after BNF formatting has been removed. Thus, LV_SRE_GetRawTextDecoded will return the phrase as a list of the words actually recognized, rather than the phrase as it was entered into the grammar.
See Also
LV_SRE_GetPhonemes
LV_SRE_GetPhraseDecoded
LVSpeechPort::GetRawTextDecoded
LV_SRE_GetVoiceChannelData
Sets the pointers to the voice channel's copy of the original preprocessed audio data.
int LV_SRE_GetVoiceChannelData(HPORT hport, int VoiceChannel, short** PCM, unsigned int* Samples);
Return Values
LV_SUCCESS
No errors; PCM and Samples have been successfully set.
LV_SOUND_CHANNEL_OUT_OF_RANGE
The voice channel specified is outside the valid range; possible values are 0-63, inclusive.
LV_BAD_HPORT
The Speech Engine is no longer running. This is the result of a ClosePort call or an unrecoverable Speech Engine error.
Parameters
VoiceChannel
The voice channel to process.
PCM
A pointer to a pointer that will be set to the post-processed audio data.
Samples
A pointer to an integer that will be set to the number of samples.
See Also
LVSpeechPort::GetVoiceChannelData
LV_SRE_ReturnErrorString
Returns a description of an error code.
const char* LV_SRE_ReturnErrorString(int ReturnCode);
Return Values
A null-terminated static string describing the error code.
Parameters
ReturnCode
The error code.
Remarks
If the error code is an invalid error code, "Invalid Error Code" is returned.
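A typical error-handling sketch; result is assumed to hold the return code of a failed LV_SRE call:

```c
if (result != LV_SUCCESS)
{
    fprintf(stderr, "LumenVox error %d: %s\n",
            result, LV_SRE_ReturnErrorString(result));
}
```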
See Also
LVSpeechPort::ReturnErrorString
LV_SRE_SetProperty
Sets various properties on the port.
int LV_SRE_SetProperty(HPORT hport, PROPERTIES Property, int Value);
Return Values
LV_SUCCESS
No errors; Property is set to Value.
LV_BAD_HPORT
hport was invalid.
LV_NOT_A_VALID_PROPERTY_VALUE
Value is invalid for the given property.
Parameters
HPort
The port's handle.
Property
Which property to modify.
Value
Property-dependent.
Remarks
Currently, only PROP_SAVE_SOUND_FILES is implemented; setting Value to 1 will cause the port to save request and answer files to disk; setting Value to 0 turns this feature off. The request and answer files are invaluable for troubleshooting and tuning applications, but will quickly fill up a hard drive.
See Also
Properties
LVSpeechPort::SetProperty
LV_SRE_SetPropertyEx
Sets various properties for a port, client, sound channel, or grammar.
int LV_SRE_SetPropertyEx(HPORT hport, int propertyname, int valuetype, void* pvalue, int target, int index);
Return Values
LV_SUCCESS
No errors; property is set to the value pointed to by pvalue.
LV_INVALID_PROPERTY
The property does not exist.
LV_INVALID_PROPERTY_VALUE
The property value is invalid for the designated property (e.g. out of range).
LV_INVALID_PROPERTY_TARGET
The property cannot be set for the specified target.
LV_INVALID_PROPERTY_VALUE_TYPE
The property's type is incompatible with the declared type.
LV_INVALID_PROPERTY_TARGET_IDX
The target's index (grammar set, voicechannel) is out of range for this property.
Note: If more than one error occurs, which error code is returned is undefined.
Parameters
propertyname
Which property to modify.
valuetype
The value type of the property being set. Legal values are:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
PROP_EX_VALUE_TYPE_STRING
PROP_EX_VALUE_TYPE_FLOAT_PTR
Each property has a set of legal value types. See Properties.
pvalue
A pointer to the new value for propertyname. pvalue will be reinterpreted according to the value type provided.
target
The portion of the API that this property is set for. Legal values are:
PROP_EX_TARGET_PORT -- pvalue affects an entire speech port object
PROP_EX_TARGET_CHANNEL -- pvalue affects one voice channel in the speech port. The channel is specified by index.
PROP_EX_TARGET_GRAMMAR -- pvalue affects one grammar set in the speech port. The set is specified by index.
PROP_EX_TARGET_CLIENT -- pvalue is global, and affects all ports on the client.
Remarks
See Properties for a list of modifiable properties.
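As a sketch, setting an integer-valued property on the whole port might look like the following. The property name and value here are illustrative only; consult Properties for the value types each property accepts:

```c
int timeout_ms = 5000; /* illustrative value */
int rc = LV_SRE_SetPropertyEx(port,
                              PROP_EX_DECODE_TIMEOUT,  /* propertyname */
                              PROP_EX_VALUE_TYPE_INT,  /* valuetype */
                              &timeout_ms,             /* pvalue */
                              PROP_EX_TARGET_PORT,     /* target */
                              0);                      /* index */
if (rc != LV_SUCCESS)
    fprintf(stderr, "%s\n", LV_SRE_ReturnErrorString(rc));
```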
See Also
Properties
LVSpeechPort::SetPropertyEx (C++ API)
LV_SRE_StreamStart
Sets up a new stream.
int LV_SRE_StreamStart(HPORT hport);
Return Values
LV_SUCCESS
Stream set up.
LV_FAILURE
Parameters incorrectly set.
Parameters
HPort
The port's handle.
Remarks
Call this function to set up a new stream. You need to call this function after calling LV_SRE_StreamStop, LV_SRE_StreamCancel, or after end-of-speech has been detected on the previous utterance.
See Also
LV_SRE_StreamSetParameter
LV_SRE_StreamStop
LV_SRE_StreamCancel
LV_SRE_StreamSendData
Sends a buffer of sound data to the stream.
int LV_SRE_StreamSendData(HPORT hport, void* SoundData, int SoundDataLength);
Return Values
LV_SUCCESS
Data accepted
LV_FAILURE
Stream not active or NULL sound data.
Parameters
HPort
The port's handle.
SoundData
Pointer to the memory buffer containing sound data.
SoundDataLength
Length in bytes of sound data.
Remarks
Used to do the actual streaming. Call this function with each sound data buffer. This call copies the sound data to an internal buffer and returns immediately. Processing of sound data takes place on a background thread.
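A sketch of the full streaming lifecycle. read_audio is a hypothetical application function that fills a buffer and returns the number of bytes read:

```c
short buffer[1024];
int bytes;

if (LV_SRE_StreamStart(port) == LV_SUCCESS)
{
    /* feed audio to the stream chunk by chunk */
    while ((bytes = read_audio(buffer, sizeof(buffer))) > 0)
    {
        if (LV_SRE_StreamSendData(port, buffer, bytes) != LV_SUCCESS)
            break; /* stream no longer active */
    }
    /* end the stream; the data is loaded into the voice channel */
    LV_SRE_StreamStop(port);
}
```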
See Also
LV_SRE_StreamSetStateChangeCallBack
LV_SRE_StreamGetStatus
LV_SRE_StreamGetStatus
Returns the status of the stream.
int LV_SRE_StreamGetStatus(HPORT hport);
Return Values
Returns a stream status define. See Stream Status.
Parameters
HPort
The port's handle.
Remarks
Call this function to check the current state of the stream.
See Also
LV_SRE_StreamSetStateChangeCallBack
LV_SRE_StreamGetLength
Returns length of sound data in stream buffer.
int LV_SRE_StreamGetLength(HPORT hport);
Return Values
Number of bytes in internal buffer for sound stream.
Parameters
HPort
The port's handle.
Remarks
This is the total number of bytes streamed. It does not include bytes sent before barge-in is detected (if STREAM_PARM_DETECT_BARGE_IN is active). This can be useful if the application wants to stop a post-barge-in stream after a certain amount of time (for example, to limit user speech to 10 seconds).
See Also
LV_SRE_StreamSetStateChangeCallBack
LV_SRE_StreamSetStateChangeCallBack
Sets up a callback to receive state change notifications for a stream.
int LV_SRE_StreamSetStateChangeCallBack(HPORT hport, LV_SRE_StreamStateChangeFn* fn, void* UserData);
Return Values
LV_SUCCESS
LV_BAD_HPORT
Parameters
HPort
The port's handle.
fn
Pointer to the callback function that receives state change updates. See Stream Callback.
UserData
Application defined data sent back in callback.
Remarks
Each time a stream's status changes, this callback will be called.
See Also
LV_SRE_StreamStateChangeFn
LV_SRE_StreamGetStatus
LV_SRE_StreamStop
Stops the stream and loads the sound channel with the streamed data.
int LV_SRE_StreamStop(HPORT hport);
Return Values
LV_SUCCESS
LV_BAD_HPORT
LV_FAILURE Stream not active.
Parameters
HPort
The port's handle.
Remarks
This function ends streaming and puts streamed data into the voice channel defined with the STREAM_PARM_VOICE_CHANNEL parameter. If the STREAM_PARM_AUTO_DECODE parameter is active, the decode will begin (non-blocking) when this function is called.
See Also
LV_SRE_StreamSetParameter
LV_SRE_StreamCancel
Stream Parameters
LV_SRE_StreamCancel
Stops the stream; the sound data is discarded.
int LV_SRE_StreamCancel(HPORT hport);
Return Values
LV_SUCCESS
LV_BAD_HPORT
LV_FAILURE Stream not active.
Parameters
HPort
The port's handle.
Remarks
This kills the stream. It can be called to cancel a stream (particularly auto-decode streams) in order to start a new stream.
See Also
LV_SRE_StreamStop
LV_SRE_StreamSetParameter
Sets a new value for a stream property.
int LV_SRE_StreamSetParameter(HPORT hport, int StreamParameter, unsigned long StreamParameterValue);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY StreamParameter does not exist.
LV_INVALID_PROPERTY_VALUE StreamParameterValue is out of range for the stream parameter.
Parameters
HPort
The port's handle.
StreamParameter
Stream parameter to change. See Stream Parameters.
StreamParameterValue
New stream parameter value.
Remarks
Sets a stream parameter value.
See Also
LV_SRE_StreamGetParameter
LV_SRE_StreamSetParameterToDefault
Stream Parameters
LV_SRE_StreamGetParameter
Gets the current value of a stream property.
int LV_SRE_StreamGetParameter(HPORT hport, int StreamParameter, unsigned long* StreamParameterValue);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY StreamParameter does not exist.
Parameters
HPort
The port's handle.
StreamParameter
The stream parameter to retrieve. See Stream Parameters.
StreamParameterValue
Receives the current value of the stream parameter.
Remarks
Gets the current value of a stream parameter.
See Also
LV_SRE_StreamSetParameter
LV_SRE_StreamSetParameterToDefault
Stream Parameters
LV_SRE_StreamSetParameterToDefault
Sets a stream property to its default value.
int LV_SRE_StreamSetParameterToDefault(HPORT hport, int StreamParameter);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY Stream parameter does not exist.
Parameters
HPort
The port's handle.
StreamParameter
Stream parameter to reset. See Stream Parameters.
Remarks
Sets a stream parameter value back to default value.
See Also
LV_SRE_StreamGetParameter
LV_SRE_StreamSetParameter
Stream Parameters
LV_SRE_GetNumberOfNBestAlternatives
Returns the number of n-best alternatives found by the engine.
int LV_SRE_GetNumberOfNBestAlternatives(HPORT hport, int voicechannel);
Return Values
The number of n-best alternatives. It will always be less than or equal to the value set for PROP_EX_MAX_NBEST_RETURNED.
Parameters
HPort
The port's handle.
voicechannel
The channel containing the decoded audio.
See Also
PROP_EX_MAX_NBEST_RETURNED
LV_SRE_SwitchToNBestAlternative
LVSpeechPort::GetNumberOfNBestAlternatives
LV_SRE_SwitchToNBestAlternative
Switches the n-best alternative that is viewable. After this function call, subsequent result retrieval functions, such as LV_SRE_CreateInterpretation, will return results from this n-best alternative.
int LV_SRE_SwitchToNBestAlternative(HPORT hport, int voicechannel, int index);
Return Values
LV_SUCCESS
LV_FAILURE The index is not valid.
Parameters
HPort
The port's handle.
voicechannel
The channel containing the decoded audio.
index
The index of the n-best alternative to switch to. It may be any value in the range [0, LV_SRE_GetNumberOfNBestAlternatives).
Remarks
Each alternative represents a distinct sentence. However, since some sentences can have multiple interpretations or multiple parses, it is possible that for some alternatives you will have multiple parse tree or interpretation objects returned. For this reason, you should get all results out as follows:
int nbest_count;
int nbest_total = LV_SRE_GetNumberOfNBestAlternatives(port, vc);
int interp_count;
for (nbest_count = 0; nbest_count < nbest_total; ++nbest_count)
{
    LV_SRE_SwitchToNBestAlternative(port, vc, nbest_count);
    int interp_total = LV_SRE_GetNumberOfInterpretations(port, vc);
    for (interp_count = 0; interp_count < interp_total; ++interp_count)
    {
        H_SI interp = LV_SRE_CreateInterpretation(port, vc, interp_count);
        /* do something with the interp */
        LVInterpretation_Release(interp);
    }
}
Even though more than one interpretation can live in a single n-best result, the same interpretation will not live in more than one n-best result. The lower scoring interpretations are pruned out.
See Also
LV_SRE_GetNumberOfNBestAlternatives
LVSpeechPort::SwitchToNBestAlternative
LV_SRE_WaitForDecode
Blocks the client application until the decode is finished.
int LV_SRE_WaitForDecode(HPORT hport, int VoiceChannel);
Return Values
LV_SUCCESS
No errors or timeout; the decode interaction is finished.
LV_TIME_OUT
The timeout value associated with PROP_EX_DECODE_TIMEOUT was exceeded before a result was returned from the Speech Engine. The decode was dropped from the Engine, and the LVSpeechPort may now start a new decode request.
Parameters
VoiceChannel
Which voice channel to wait on. Setting VoiceChannel equal to -1 causes a wait on all the voice channels for the port.
Remarks
Some of the API functions run asynchronously, in particular LV_SRE_Decode. LV_SRE_WaitForDecode is primarily useful when LV_SRE_Decode is called without LV_DECODE_BLOCK. In this case, LV_SRE_Decode returns immediately, but continues processing the voice channel's audio data in a separate thread. Since client applications will eventually need the results, the clients need a way to query the port to see if LV_SRE_Decode has finished. LV_SRE_WaitForDecode will wait the specified time (determined by the value of PROP_EX_DECODE_TIMEOUT) for the engine to idle; check the return value to ensure the decode interaction is finished before attempting to retrieve answers from the speech port.
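The pattern above can be sketched as follows; the LV_SRE_Decode arguments are elided because they are documented on its own reference page:

```c
/* start a non-blocking decode (no LV_DECODE_BLOCK flag) */
LV_SRE_Decode(port, vc, /* ... grammar set and flags ... */);

/* block until the decode finishes or PROP_EX_DECODE_TIMEOUT elapses */
if (LV_SRE_WaitForDecode(port, vc) == LV_SUCCESS)
{
    /* results are ready, e.g. LV_SRE_GetNumberOfInterpretations */
}
else
{
    /* LV_TIME_OUT: the decode was dropped; a new decode may start */
}
```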
See Also
PROP_EX_DECODE_TIMEOUT
LV_SRE_Decode
LVSpeechPort::WaitForDecode
LVInterpretation C API Functions
LVInterpretation Summary
The LVInterpretation object contains a fully processed decode result. It includes
The raw input the Speech Engine recognized
The name of the grammar that was matched
A confidence score for the interpretation
The semantic data object -- the result of processing the input sentence against the matching grammar, and executing the semantic tags in the sentence's parse tree
Include <LVSpeechPort.h> or <LV_SRE_Semantic.h>.
Return Type Function Description
H_SI LVInterpretation_Create (void) Creates an empty LVInterpretation handle.
H_SI LVInterpretation_CreateFromCopy (H_SI other)
Create a copy of another LVInterpretation handle
void LVInterpretation_Release(H_SI hsi) Destroys the LVInterpretation handle
H_SI_DATA LVInterpretation_GetResultData (H_SI hsi)
The result object, representing the end product of the semantic interpretation process.
const char* LVInterpretation_GetResultName (H_SI hsi)
The name of the result data, according to the matching grammar.
const char* LVInterpretation_GetGrammarLabel (H_SI hsi)
Returns the name of the grammar as it was provided to the speech port.
const char* LVInterpretation_GetMode (H_SI hsi) Returns the interaction mode for this interpretation.
const char* LVInterpretation_GetLanguage (H_SI hsi)
Returns the language identifier for this interpretation.
const char* LVInterpretation_GetInputSentence (H_SI hsi)
The sentence that generated this interpretation.
int LVInterpretation_GetScore (H_SI hsi) Confidence score for this interpretation.
const char* LVInterpretation_GetTagFormat (H_SI hsi)
The tag format (interpretation scheme) that created the semantic data object.
LVSemanticData Summary
An LVSemanticData object is the result of the semantic interpretation process. A user's spoken input is combined with a grammar containing semantic tag instructions to create a compound object. An LVSemanticData object can be one of the following types:
SI_TYPE_INT -- A simple integer value
SI_TYPE_DOUBLE -- A double precision floating point value
SI_TYPE_BOOL -- An integer that is either 1 or 0
SI_TYPE_STRING -- A null-terminated character array.
SI_TYPE_OBJECT -- A structure containing one or more property-value pairs.
SI_TYPE_ARRAY -- An indexed collection of values.
SI_TYPE_NULL -- A null object.
Return Type Function Description
H_SI_DATA LVSemanticData_CreateFromCopy(H_SI_DATA other) Creates a new object from an old one. The new one will need to be released when no longer in use.
const char*
LVSemanticData_Print(H_SI_DATA data, int format) Prints the data in XML or ECMAScript formats.
int LVSemanticData_GetType(H_SI_DATA data) Returns the type of the data.
const char*
LVSemanticData_GetString(H_SI_DATA data) If the data is of type SI_TYPE_STRING, returns the string contents.
int LVSemanticData_GetInt(H_SI_DATA data) If the data is of type SI_TYPE_INT, returns the integer.
double LVSemanticData_GetDouble(H_SI_DATA data) If the data is of type SI_TYPE_DOUBLE, returns the double.
int LVSemanticData_GetBool(H_SI_DATA data) If the data is of type SI_TYPE_BOOL, returns a 1 for true, 0 for false
int LVSemanticObject_GetNumberOfProperties(H_SI_DATA data)
If the data is of type SI_TYPE_OBJECT, returns the number of properties (member data) it contains.
const char*
LVSemanticObject_GetPropertyName(H_SI_DATA data, int i)
If the data is of type SI_TYPE_OBJECT, returns the name of the ith property
int LVSemanticObject_PropertyExists(H_SI_DATA data, const char* prop_name)
If the data is of type SI_TYPE_OBJECT, returns 1 if the object contains a value named prop_name, 0 otherwise.
H_SI_DATA LVSemanticObject_GetPropertyValue(H_SI_DATA data, const char* prop_name)
If the data is of type SI_TYPE_OBJECT, returns the member data named prop_name.
int LVSemanticArray_GetSize(H_SI_DATA data) If the data is of type SI_TYPE_ARRAY, returns the number of elements in the array.
H_SI_DATA LVSemanticArray_GetElement(H_SI_DATA data, int i)
If the data is of type SI_TYPE_ARRAY, returns the ith element in the array.
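The accessors above can be combined to walk an arbitrary semantic data value. This sketch descends into objects and arrays and prints scalar leaves; it uses only the functions listed in the table:

```c
void PrintSemanticData(H_SI_DATA data)
{
    int i;
    switch (LVSemanticData_GetType(data))
    {
    case SI_TYPE_INT:
        printf("%d\n", LVSemanticData_GetInt(data));
        break;
    case SI_TYPE_DOUBLE:
        printf("%f\n", LVSemanticData_GetDouble(data));
        break;
    case SI_TYPE_BOOL:
        printf("%s\n", LVSemanticData_GetBool(data) ? "true" : "false");
        break;
    case SI_TYPE_STRING:
        printf("%s\n", LVSemanticData_GetString(data));
        break;
    case SI_TYPE_OBJECT:
        for (i = 0; i < LVSemanticObject_GetNumberOfProperties(data); ++i)
        {
            const char* name = LVSemanticObject_GetPropertyName(data, i);
            printf("%s: ", name);
            PrintSemanticData(LVSemanticObject_GetPropertyValue(data, name));
        }
        break;
    case SI_TYPE_ARRAY:
        for (i = 0; i < LVSemanticArray_GetSize(data); ++i)
            PrintSemanticData(LVSemanticArray_GetElement(data, i));
        break;
    default: /* SI_TYPE_NULL */
        printf("null\n");
        break;
    }
}
```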
API Functions
LVInterpretation: Creating, Copying and Releasing
LVInterpretation objects are fully copyable.
Functions
H_SI LVInterpretation_Create(void)
H_SI LVInterpretation_CreateFromCopy(H_SI other_si)
void LVInterpretation_Copy(H_SI hsi, H_SI other_si)
void LVInterpretation_Release(H_SI hsi)
Parameters
hsi
The interpretation handle being copied into, or being released
other_si
The interpretation handle whose contents are being copied.
Remarks
Any new handle given to you via Create or CreateFromCopy must be released. Also, any handle given to you by the speech port through LV_SRE_CreateInterpretation must be released.
Example
HPORT Port;
H_SI Interp;
/* open the port and do a decode */
/* ... */
/* when the decode is finished, grab an interpretation handle */
Interp = LV_SRE_CreateInterpretation(Port, voicechannel, index);
/* start using the interpretation data */
/* ... */
/* when you are done with it, release it */
LVInterpretation_Release(Interp);
See Also
Constructing, Copying and Destroying an LVInterpretation Object (C++ API)
LVInterpretation_GetResultData
Returns a handle for the semantic data generated by user input and a matching grammar. The returned handle does not allocate any additional memory, so do not release it.
Function
H_SI_DATA LVInterpretation_GetResultData(H_SI hsi)
Returns
A handle to the results of a semantic interpretation process.
Parameters
hsi
An interpretation handle.
Remarks
The semantic data handle provided to the user via this function is owned by the interpretation handle hsi. It will be released when hsi is released.
See Also
LVSemanticData C API
LVInterpretation::ResultData (C++ API)
LVInterpretation_GetResultName
Returns the name of the result data for this interpretation. The result name is usually the root rule of the matching grammar for this interpretation.
Function
const char* LVInterpretation_GetResultName (H_SI hsi)
Parameters
hsi
An interpretation handle.
See Also
LVInterpretation::ResultName (C++ API)
LVInterpretation_GetLanguage
Returns the language identifier of the grammar that generated this interpretation.
Function
const char* LVInterpretation_GetLanguage(H_SI hsi)
Parameters
hsi
An interpretation handle.
Returns
An RFC 3066 language identifier, such as "en-US" for United States English, or "fr" for French.
See Also
LVInterpretation::Language ( C++ API )
LVInterpretation_GetMode
Returns the interaction mode that created the interpretation.
Function
const char* LVInterpretation_GetMode(H_SI hsi)
Parameters
hsi
An interpretation handle.
Returns
"voice" or "dtmf"
See Also
LVInterpretation::Mode (C++ API)
LVInterpretation_GetInputSentence
Returns the input that was fed to the matching grammar to create this interpretation. It may represent the speech the engine recognized, or a DTMF sequence.
Function
const char* LVInterpretation_GetInputSentence(H_SI hsi)
Parameters
hsi
An interpretation handle
See Also
LVInterpretation::InputSentence (C++ API)
LVInterpretation_GetGrammarLabel
Returns the name of the grammar that generated this interpretation.
Function
const char* LVInterpretation_GetGrammarLabel (H_SI hsi)
Parameters
hsi
An interpretation handle.
Remarks
LVInterpretation_GetGrammarLabel will always return the name of one of the grammars you activated for decode. If the active grammar had an integer label, then the returned label will be a string representation of that integer.
See Also
LVInterpretation::GrammarLabel ( C++ API )
LVInterpretation_GetScore
Returns a confidence score for this interpretation.
Function
int LVInterpretation_GetScore(H_SI hsi)
Parameters
hsi
An interpretation handle
Returns
A number between 0 and 1000. Higher numbers indicate greater confidence by the speech port in this interpretation.
See Also
LVInterpretation::Score (C++ API)
LVInterpretation_GetTagFormat
Returns the name of the tag format declared in the matching grammar for this interpretation. The tag format determines the semantic interpretation scheme.
Function
const char* LVInterpretation_GetTagFormat(H_SI hsi)
Parameters
hsi
An interpretation handle.
See Also
LVInterpretation::TagFormat (C++ API)
LVSemanticData_Release
Releases the memory used by an H_SI_DATA handle.
Function
void LVSemanticData_Release(H_SI_DATA h_si_data)
Parameters
h_si_data
Semantic Data Handle.
LVSemanticData_CreateFromCopy
Copies the contents of another handle into a new handle and returns the new handle. This function allocates memory for the new handle, so the user is required to release the new handle.
H_SI_DATA LVSemanticData_CreateFromCopy(H_SI_DATA h_si_data)
Return Value
Non-zero
Successful.
NULL
Copying failed.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_Print
Returns a string describing the contents of a semantic data handle. The function can return XML or ECMAScript formatted text.
const char* LVSemanticData_Print(H_SI_DATA h_si_data, int format)
Return Values
A pointer to the string containing the printed representation.
Parameters
h_si_data
Semantic data handle.
format
The format type.
Remark
The string contents are stored with the semantic data handle, and will be released when the handle is released.
LVSemanticData_GetType
Returns the underlying data type of a given H_SI_DATA handle.
Function
int LVSemanticData_GetType(H_SI_DATA h_si_data)
Return Value
One of seven semantic data types.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_GetString
Returns the string contained in a given handle. This function assumes that the handle is of type SI_TYPE_STRING. If the user passes in a handle of any other type, this function always returns NULL.
Function
const char* LVSemanticData_GetString(H_SI_DATA h_si_data)
Return Values
NULL
Either the handle is not of type SI_TYPE_STRING, or some error occurred.
Other
A pointer to a buffer containing the string.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_GetDouble
Returns the double-precision floating point value contained in the given semantic data handle. This function assumes that the handle is of type SI_TYPE_DOUBLE. If the user passes in a handle of any other type, this function always returns 0.0.
Function
double LVSemanticData_GetDouble(H_SI_DATA h_si_data)
Return Values
A double.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_GetInt
Returns the integer value contained in a given semantic data handle. This function assumes that the handle is of type SI_TYPE_INT. If the user passes in a handle of any other type, this function always returns 0.
Function
int LVSemanticData_GetInt(H_SI_DATA h_si_data)
Return Values
An integer value.
Parameters
h_si_data
Semantic data handle.
LVSemanticData_GetBool
Returns an integer value contained in a given handle. A non-zero value represents true, and zero represents false. This function assumes that the semantic data handle being passed in is of type SI_TYPE_BOOL. If the user passes in a handle of any other type, this function always returns false.
Function
int LVSemanticData_GetBool(H_SI_DATA h_si_data)
Return Values
An integer value.
Parameters
h_si_data
Semantic data handle.
LVSemanticObject_GetNumberOfProperties
If a semantic data handle is of type SI_TYPE_OBJECT, this function returns the number of elements in the object. Otherwise, it returns -1.
Function
int LVSemanticObject_GetNumberOfProperties(H_SI_DATA h_si_data)
Return Value
The number of elements in the object.
Parameters
h_si_data
Semantic data handle.
LVSemanticObject_GetPropertyName
If a handle is of type SI_TYPE_OBJECT, this function returns the name of a property of the object. Otherwise this function returns NULL. Usually, the user obtains the number of properties by calling LVSemanticObject_GetNumberOfProperties, then gets each property name in sequence.
Function
const char* LVSemanticObject_GetPropertyName(H_SI_DATA h_si_data, int index)
Return Values
Non-NULL pointer
A pointer to a buffer containing the name of the property specified by index.
NULL
Either the handle is not of SI_TYPE_OBJECT type, or the index exceeds the total number of properties in this object.
Parameters
h_si_data
Semantic data handle.
index
The index of the property you are inspecting. Indexing begins at 0. If the index is greater than or equal to the value returned by LVSemanticObject_GetNumberOfProperties, this function will return NULL.
LVSemanticObject_GetPropertyValue
If the handle is of SI_TYPE_OBJECT type, this function returns the handle to the semantic data associated with the property name in the object. If the handle is not of SI_TYPE_OBJECT type, this function always returns NULL. This function does not allocate memory for the returned handle, so do not try to release it.
Function
H_SI_DATA LVSemanticObject_GetPropertyValue(H_SI_DATA h_si_data, const char *property_name)
Return Values
Non-zero value
A handle to the semantic data associated with the property name in the object.
NULL
The property name does not exist in the object, or the handle is not of SI_TYPE_OBJECT type.
Parameters
h_si_data
Semantic data handle.
property_name
A string containing the property name.
LVSemanticObject_PropertyExists
If a handle is of SI_TYPE_OBJECT type, this function returns a boolean value indicating whether the property name exists in the object. If the handle is not of SI_TYPE_OBJECT type, this function always returns false.
Function
int LVSemanticObject_PropertyExists(H_SI_DATA h_si_data, const char *property_name)
Return Values
1
The property name exists in the object.
0
The property name does not exist in the object, or the handle is not of SI_TYPE_OBJECT type.
Parameters
h_si_data
A semantic data handle.
property_name
A string containing the property name.
LVSemanticArray_GetSize
If a handle is of SI_TYPE_ARRAY type, this function returns the number of elements in the array. Otherwise this function returns -1.
Function
int LVSemanticArray_GetSize(H_SI_DATA h_si_data)
Return Values
Non-negative value
The number of elements in the array.
-1
Either the handle is not of SI_TYPE_ARRAY type, or some error occurred.
Parameters
h_si_data
Semantic data handle.
LVSemanticArray_GetElement
If the handle is of SI_TYPE_ARRAY type, this function returns a handle to the semantic data specified by the index. If the handle is not of SI_TYPE_ARRAY type, this function always returns NULL. This function does not allocate memory for the new handle, so do not try to release it.
Function
H_SI_DATA LVSemanticArray_GetElement(H_SI_DATA h_si_data, int index)
Return Values
Non-zero value
The handle to the semantic data specified by the index in the array.
0
The index exceeds the number of elements, or the handle is not of SI_TYPE_ARRAY type.
Parameters
h_si_data
Semantic data handle.
index
The index begins at 0. If the index is greater than or equal to the value returned by LVSemanticArray_GetSize, this function will return NULL.
LVParseTree C API functions
API Functions
Creating, Copying and Releasing a LVParseTree Handle
LVParseTree objects are fully copyable and assignable.
Functions
H_PARSE_TREE LVParseTree_Create()
H_PARSE_TREE LVParseTree_CreateFromCopy(H_PARSE_TREE Other)
void LVParseTree_Copy (H_PARSE_TREE Tree, H_PARSE_TREE Other)
void LVParseTree_Release (H_PARSE_TREE Tree)
Parameters
Tree
A handle to a parse tree being released or copied into.
Other
A handle to a parse tree being copied.
Remarks
CreateFromCopy and Copy both perform deep copies on the handles in question. Both handles will have to be released after either function call to release all allocated memory. Tree handles given to the user via LV_SRE_CreateParseTree must also be released.
Example
HPORT Port;
// open the port and do a decode
// ...
// when the decode is finished, grab a parse tree handle
H_PARSE_TREE Tree = LV_SRE_CreateParseTree(Port, voicechannel, index);
// start using the tree
// ...
// When you are done with it, release it.
LVParseTree_Release(Tree);
See Also
Constructing, Copying and Destroying an LVParseTree Object (C++ API)
LVParseTree_GetGrammarLabel
Returns the name of the grammar that generated this tree.
Function
const char* LVParseTree_GetGrammarLabel (H_PARSE_TREE Tree)
Parameters
Tree
A handle to the parse tree.
Remarks
LVParseTree_GetGrammarLabel( ) will always return the name of one of the grammars you activated for decode. If the active grammar had an integer label, then the returned label will be a string representation of that integer.
See Also
LVParseTree::GrammarLabel ( C++ API )
LVParseTree_GetLanguage
Returns the language identifier of the grammar that generated this tree.
Function
const char* LVParseTree_GetLanguage(H_PARSE_TREE Tree)
Parameters
Tree
A handle to a parse tree.
Returns
An RFC 3066 language identifier, such as "en-US" for United States English, or "fr" for French.
See Also
LVParseTree::Language ( C++ API )
LVParseTree_GetMode
Returns the interaction mode that created the tree.
Function
const char* LVParseTree_GetMode(H_PARSE_TREE Tree)
Parameters
Tree
A handle to a parse tree.
Returns
"voice" or "dtmf"
See Also
LVParseTree::Mode (C++ API)
LVParseTree_GetTagFormat
Returns the name of the tag format declared in the matching grammar for this tree.
Function
const char* LVParseTree_GetTagFormat(H_PARSE_TREE Tree)
Parameters
Tree
A handle to a parse tree
See Also
LVParseTree::TagFormat (C++ API)
LVParseTree_GetRoot
Gets the root parse tree node.
Function
H_PARSE_TREE_NODE LVParseTree_GetRoot(H_PARSE_TREE Tree);
Parameters
Tree
Handle to a parse tree.
Return Values
An H_PARSE_TREE_NODE handle representing the top-level rule of the matching grammar.
Remarks
This node will always be a rule node (i.e., it will always satisfy LVParseTree_Node_IsRule(root) == 1). If the matching grammar specified a root rule, then this node will always represent that rule.
See Also
LVParseTree::Root ( C++ API )
LVParseTree_CreateIteratorBegin and LVParseTree_CreateIteratorEnd
LVParseTree_CreateIteratorBegin and LVParseTree_CreateIteratorEnd provide iterators for visiting every node in the tree in a top-to-bottom, left-to-right descent. They are also the basis for the Tag and Terminal iterators.
Functions
H_PARSE_TREE_ITR LVParseTree_CreateIteratorBegin(H_PARSE_TREE Tree)
H_PARSE_TREE_ITR LVParseTree_CreateIteratorEnd(H_PARSE_TREE Tree)
Parameters
Tree
Handle to a parse tree.
Example
The following code prints out every node in a parse tree.
H_PARSE_TREE_ITR Itr;
H_PARSE_TREE_ITR End;
H_PARSE_TREE_NODE Node;

Itr = LVParseTree_CreateIteratorBegin(Tree);
End = LVParseTree_CreateIteratorEnd(Tree);

while (!LVParseTree_Iterator_AreEqual(Itr, End))
{
    Node = LVParseTree_Iterator_GetNode(Itr);
    for (int i = 0; i < LVParseTree_Node_GetLevel(Node); ++i)
        printf("\t");
    if (LVParseTree_Node_IsRule(Node))
        printf("$%s:\n", LVParseTree_Node_GetRuleName(Node));
    if (LVParseTree_Node_IsTag(Node))
        printf("{%s}\n", LVParseTree_Node_GetText(Node));
    if (LVParseTree_Node_IsTerminal(Node))
        printf("\"%s\"\n", LVParseTree_Node_GetText(Node));
    LVParseTree_Iterator_Advance(Itr);
}
LVParseTree_Iterator_Release(Itr);
LVParseTree_Iterator_Release(End);

/* Note: Node handles don't get released; they are part of the tree,
   and the tree releases them when it gets released. */
If the grammar was the top level navigation example grammar, and the engine recognized "go back", then the above code would print out:
$directive:
	"go"
	"back"
	{$ = "APPLICATION_BACK"}
See Also
LVParseTree::Begin and LVParseTree::End (C++ API)
LVParseTree_CreateTerminalIteratorBegin and LVParseTree_CreateTerminalIteratorEnd
LVParseTree_CreateTerminalIteratorBegin and LVParseTree_CreateTerminalIteratorEnd provide access to the "terminals" of the tree. Terminals are the words and phrases in your grammar, so a TerminalIterator gives you access to the exact words the SRE heard a speaker say to match a grammar, in the order that the SRE heard those words.
Functions
H_PARSE_TREE_TERMINAL_ITR LVParseTree_CreateTerminalIteratorBegin(H_PARSE_TREE Tree)
H_PARSE_TREE_TERMINAL_ITR LVParseTree_CreateTerminalIteratorEnd(H_PARSE_TREE Tree)
Parameters
Tree
Handle to a parse tree.
Example
The following code prints out the sentence the SRE heard, with a word-level confidence score attached to each word.
H_PARSE_TREE_TERMINAL_ITR Itr;
H_PARSE_TREE_TERMINAL_ITR End;
H_PARSE_TREE_NODE Node;

Itr = LVParseTree_CreateTerminalIteratorBegin(Tree);
End = LVParseTree_CreateTerminalIteratorEnd(Tree);

while (!LVParseTree_TerminalIterator_AreEqual(Itr, End))
{
    Node = LVParseTree_TerminalIterator_GetNode(Itr);
    printf("\"%s\":(%i)\n", LVParseTree_Node_GetText(Node),
           LVParseTree_Node_GetScore(Node));
    LVParseTree_TerminalIterator_Advance(Itr);
}
printf("\n");
LVParseTree_TerminalIterator_Release(Itr);
LVParseTree_TerminalIterator_Release(End);

/* Note: Node handles don't get released; they are part of the tree,
   and the tree releases them when it gets released. */
So if the grammar being used was the top level navigation example grammar, and the SRE recognized "go back", then the output of the above code might look like:
"go":(850) "back":(901)
See Also
LVParseTree::TerminalsBegin and LVParseTree::TerminalsEnd (C++ API)
LVParseTree_CreateTagIteratorBegin and LVParseTree_CreateTagIteratorEnd
LVParseTree_CreateTagIteratorBegin and LVParseTree_CreateTagIteratorEnd provide iterators for visiting the tags in the tree's body.
Functions
H_PARSE_TREE_TAG_ITR LVParseTree_CreateTagIteratorBegin(H_PARSE_TREE Tree)
H_PARSE_TREE_TAG_ITR LVParseTree_CreateTagIteratorEnd(H_PARSE_TREE Tree)
Parameters
Tree
Handle to a parse tree.
Example
The following code prints out every tag in a parse tree.
H_PARSE_TREE_TAG_ITR Itr;
H_PARSE_TREE_TAG_ITR End;
H_PARSE_TREE_NODE Node;

Itr = LVParseTree_CreateTagIteratorBegin(Tree);
End = LVParseTree_CreateTagIteratorEnd(Tree);

while (!LVParseTree_TagIterator_AreEqual(Itr, End))
{
    Node = LVParseTree_TagIterator_GetNode(Itr);
    printf("%s;\n", LVParseTree_Node_GetText(Node));
    LVParseTree_TagIterator_Advance(Itr);
}

LVParseTree_TagIterator_Release(Itr);
LVParseTree_TagIterator_Release(End);

/* Note: Node handles don't get released; they are part of the tree,
   and the tree releases them when it gets released. */
If the grammar was the top level navigation example grammar, and the engine recognized "go back", then the above code would print out:
$ = "APPLICATION_BACK";
Remark
The TagIterator does not visit the tags in a tree's header. Use LVParseTree_GetHeaderTag to access the contents of those tags.
See Also
LVParseTree::TagsBegin and LVParseTree::TagsEnd (C++ API)
Related APIs
LVParseTree_Node API
An LVParseTree is made out of Node objects. Each node represents a word, rule, or tag that was seen by the engine as it decoded an utterance against the matching grammar.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function
H_PARSE_TREE_NODE LVParseTree_Node_GetParent (H_PARSE_TREE_NODE Node)
H_PARSE_TREE_CHILDREN_ITR LVParseTree_Node_CreateChildrenIteratorBegin(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_CHILDREN_ITR LVParseTree_Node_CreateChildrenIteratorEnd(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_ITR LVParseTree_Node_CreateIteratorBegin(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_ITR LVParseTree_Node_CreateIteratorEnd(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_TERMINAL_ITR LVParseTree_Node_CreateTerminalIteratorBegin(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_TERMINAL_ITR LVParseTree_Node_CreateTerminalIteratorEnd(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_TAG_ITR LVParseTree_Node_CreateTagIteratorBegin(H_PARSE_TREE_NODE Node)
H_PARSE_TREE_TAG_ITR LVParseTree_Node_CreateTagIteratorEnd(H_PARSE_TREE_NODE Node)
int LVParseTree_Node_IsRule (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_IsTerminal (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_IsTag (H_PARSE_TREE_NODE Node)
const char* LVParseTree_Node_GetText (H_PARSE_TREE_NODE Node)
const char* LVParseTree_Node_GetPhonemes (H_PARSE_TREE_NODE Node)
const char* LVParseTree_Node_GetRuleName (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_GetScore (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_GetStartTime (H_PARSE_TREE_NODE Node)
int LVParseTree_Node_GetEndTime (H_PARSE_TREE_NODE Node)
LVParseTree_Iterator C API
An LVParseTree_Iterator object traverses a parse tree in a top-to-bottom, left-to-right fashion (sometimes called a pre-order or LL traversal). You can get an iterator over a subtree rooted at a Node by calling:
LVParseTree_Node_CreateIteratorBegin(H_PARSE_TREE_NODE Node)
LVParseTree_Node_CreateIteratorEnd(H_PARSE_TREE_NODE Node)
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
H_PARSE_TREE_ITR LVParseTree_Iterator_Create(void)
Creates a blank Iterator; it does not point at anything.
H_PARSE_TREE_ITR LVParseTree_Iterator_CreateFromCopy(H_PARSE_TREE_ITR Other)
Creates a new Iterator from another. Both Iterators will need to be released when no longer needed.
void LVParseTree_Iterator_Copy(H_PARSE_TREE_ITR Iterator, H_PARSE_TREE_ITR Other)
Copies the data from one handle into another.
void LVParseTree_Iterator_Release(H_PARSE_TREE_ITR Iterator)
Releases the memory allocated to the Iterator handle.
void LVParseTree_Iterator_Advance(H_PARSE_TREE_ITR Iterator) Advances the Iterator one position.
H_PARSE_TREE_NODE LVParseTree_Iterator_GetNode(H_PARSE_TREE_ITR Iterator)
Provides access to a node in the parse tree.
int
LVParseTree_Iterator_AreEqual(H_PARSE_TREE_ITR Iterator1, H_PARSE_TREE_ITR Iterator2)
Tests equality with another Iterator. Two Iterators are equal if they are pointing to the same node in a parse tree.
LVParseTree_ChildrenIterator C API
An LVParseTree_ChildrenIterator object traverses the immediate children of a rule node, from left to right. You get a ChildrenIterator object from a Node by calling
LVParseTree_Node_CreateChildrenIteratorBegin(H_PARSE_TREE_NODE Node)
LVParseTree_Node_CreateChildrenIteratorEnd(H_PARSE_TREE_NODE Node)
With these iterators, you can traverse the immediate children of Node.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function
H_PARSE_TREE_CHILDREN_ITR LVParseTree_ChildrenIterator_Create(void)
H_PARSE_TREE_CHILDREN_ITR LVParseTree_ChildrenIterator_CreateFromCopy (H_PARSE_TREE_CHILDREN_ITR Other)
void LVParseTree_ChildrenIterator_Copy(H_PARSE_TREE_CHILDREN_ITR Itr, H_PARSE_TREE_CHILDREN_ITR Other)
void LVParseTree_ChildrenIterator_Release(H_PARSE_TREE_CHILDREN_ITR Itr)
void LVParseTree_ChildrenIterator_Advance(H_PARSE_TREE_CHILDREN_ITR Itr)
H_PARSE_TREE_NODE LVParseTree_ChildrenIterator_GetNode(H_PARSE_TREE_CHILDREN_ITR Itr)
int
LVParseTree_ChildrenIterator_AreEqual(H_PARSE_TREE_CHILDREN_ITR Itr1, H_PARSE_TREE_CHILDREN_ITR Itr2)
LVParseTree_TerminalIterator C API
An LVParseTree_TerminalIterator object is an adaptation of the standard LVParseTree_Iterator. It only visits the nodes in a tree that are terminals. You can get a TerminalIterator from a Node by calling:
LVParseTree_Node_CreateTerminalIteratorBegin(H_PARSE_TREE_NODE Node)
LVParseTree_Node_CreateTerminalIteratorEnd(H_PARSE_TREE_NODE Node)
With these iterators, you can visit all of the terminal nodes in the subtree rooted by Node.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function
H_PARSE_TREE_TERMINAL_ITR LVParseTree_TerminalIterator_Create(void)
H_PARSE_TREE_TERMINAL_ITR LVParseTree_TerminalIterator_CreateFromCopy (H_PARSE_TREE_TERMINAL_ITR Other)
void LVParseTree_TerminalIterator_Copy(H_PARSE_TREE_TERMINAL_ITR Itr, H_PARSE_TREE_TERMINAL_ITR Other)
void LVParseTree_TerminalIterator_Release(H_PARSE_TREE_TERMINAL_ITR Itr)
void LVParseTree_TerminalIterator_Advance(H_PARSE_TREE_TERMINAL_ITR Itr)
H_PARSE_TREE_NODE LVParseTree_TerminalIterator_GetNode(H_PARSE_TREE_TERMINAL_ITR Itr)
int
LVParseTree_TerminalIterator_AreEqual(H_PARSE_TREE_TERMINAL_ITR Itr1, H_PARSE_TREE_TERMINAL_ITR Itr2)
LVParseTree_TagIterator C API
An LVParseTree_TagIterator object is an adaptation of the standard LVParseTree_Iterator. It only visits the nodes in a tree that are tags. You can get a tag iterator from a Node by calling:
LVParseTree_Node_CreateTagIteratorBegin(H_PARSE_TREE_NODE Node)
LVParseTree_Node_CreateTagIteratorEnd(H_PARSE_TREE_NODE Node)
With these iterators, you can traverse all of the tags in the subtree rooted by Node.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
H_PARSE_TREE_TAG_ITR LVParseTree_TagIterator_Create(void)
Creates a blank iterator; it does not point at anything.
H_PARSE_TREE_TAG_ITR LVParseTree_TagIterator_CreateFromCopy (H_PARSE_TREE_TAG_ITR Other)
Creates a new iterator from another. Both iterators will need to be released when no longer needed.
void LVParseTree_TagIterator_Copy(H_PARSE_TREE_TAG_ITR Itr, H_PARSE_TREE_TAG_ITR Other)
Copies the data from one handle into another.
void LVParseTree_TagIterator_Release(H_PARSE_TREE_TAG_ITR Itr) Releases the memory allocated to the iterator handle.
void LVParseTree_TagIterator_Advance(H_PARSE_TREE_TAG_ITR Itr) Advances the iterator one position.
H_PARSE_TREE_NODE LVParseTree_TagIterator_GetNode(H_PARSE_TREE_TAG_ITR Itr)
Provides access to a node in the parse tree.
int
LVParseTree_TagIterator_AreEqual(H_PARSE_TREE_TAG_ITR Itr1, H_PARSE_TREE_TAG_ITR Itr2)
Tests equality with another iterator. Two iterators are equal if they are pointing to the same node in a parse tree.
LVParseTree Class
The following C API is exported from "LV_SRE_ParseTree.h". An LVParseTree class that wraps this API is available for C++ programmers.
See Also Using the Parse Tree Tutorial
Return Type Function Description
H_PARSE_TREE LVParseTree_Create(void) Constructs an LVParseTree object.
H_PARSE_TREE LVParseTree_CreateFromCopy(H_PARSE_TREE Other)
Copy constructor
void LVParseTree_Copy(H_PARSE_TREE Tree, H_PARSE_TREE Other)
Assignment operator
void LVParseTree_Release (H_PARSE_TREE Tree) Destroys the LVParseTree object
H_PARSE_TREE_NODE LVParseTree_GetRoot (H_PARSE_TREE Tree)
Provides access to the root node of the parse tree.
H_PARSE_TREE_ITR LVParseTree_CreateIteratorBegin (H_PARSE_TREE Tree)
Provides an iterator that walks each node in the tree in a top-to-bottom, left-to-right fashion
H_PARSE_TREE_ITR LVParseTree_CreateIteratorEnd (H_PARSE_TREE Tree)
Marks the end of traversal for the parse tree iterator
H_PARSE_TREE_TERMINAL_ITR LVParseTree_CreateTerminalIteratorBegin (H_PARSE_TREE Tree)
Traverses the terminals of the parse tree (words).
H_PARSE_TREE_TERMINAL_ITR LVParseTree_CreateTerminalIteratorEnd (H_PARSE_TREE Tree)
Marks the end of traversal for the TerminalIterator.
H_PARSE_TREE_TAG_ITR LVParseTree_CreateTagIteratorBegin (H_PARSE_TREE Tree)
Traverses the tags in the parse tree (semantic data).
H_PARSE_TREE_TAG_ITR LVParseTree_CreateTagIteratorEnd (H_PARSE_TREE Tree)
Marks the end of traversal for the TagIterator
const char* LVParseTree_GetTagFormat (H_PARSE_TREE Tree)
Returns the tag format, as described by the grammar that this tree matched (e.g. "lumenvox/1.0" or "semantics/1.0")
int LVParseTree_GetNumberOfTagsInHeader (H_PARSE_TREE Tree)
Returns the number of tags (semantic data) that were defined in the matching grammar's header.
const char* LVParseTree_GetHeaderTag (H_PARSE_TREE Tree, int i)
Returns the ith header tag from the matching grammar.
const char* LVParseTree_GetGrammarLabel (H_PARSE_TREE Tree)
Returns the name of the matching grammar that was provided to the speech port when it was loaded.
const char* LVParseTree_GetMode (H_PARSE_TREE Tree)
Returns the mode of the utterance decode that created this tree: "voice" or "dtmf"
const char* LVParseTree_GetLanguage (H_PARSE_TREE Tree )
Returns the language of the matching grammar (e.g. "en-US" or "es-MX")
LVGrammar C API Functions
LVGrammar Summary
The LVGrammar API allows you to manipulate a context-free grammar object that can be used in the engine to recognize speech.
Use <LVSpeechPort.h> or <LV_SRE_Grammar.h>
Return Type Function Description
HGRAMMAR LVGrammar_Create() Constructs a grammar object.
HGRAMMAR LVGrammar_CreateFromCopy (HGRAMMAR other)
Constructs a grammar object by copying an existing one.
void LVGrammar_Copy (HGRAMMAR hgram, HGRAMMAR other)
Copies the object referenced by other into the object referenced by hgram.
void LVGrammar_Release (HGRAMMAR hgram)
Destroys the grammar object.
int LVGrammar_Reset (HGRAMMAR hgram) Resets a grammar object.
void LVGrammar_RegisterLoggingCallback (HGRAMMAR hgram, GrammarLogCB Log, void* UserData)
Registers a callback so the object can report warnings and errors to the grammar author.
int LVGrammar_SaveCompiledGrammar (HGRAMMAR hgram, const char* filename)
Saves the grammar object to a binary file.
int LVGrammar_LoadCompiledGrammar (HGRAMMAR hgram, const char* filename)
Loads the grammar object from a binary file.
int LVGrammar_LoadGrammar (HGRAMMAR hgram, const char* uri)
Loads a grammar from a location specified by the "uri" argument.
int LVGrammar_LoadGrammarFromBuffer (HGRAMMAR hgram, const char* buffer)
Loads a grammar from a null terminated string containing the contents of the grammar.
int LVGrammar_AddRule (HGRAMMAR hgram, const char* left_hand_side, const char* right_hand_side)
Inserts a new rule into the grammar.
int LVGrammar_RemoveRule (HGRAMMAR hgram, const char* left_hand_side)
Removes a rule from the grammar.
int LVGrammar_SetRoot (HGRAMMAR hgram, const char* root)
Sets a starting rule for the grammar.
void LVGrammar_SetMode (HGRAMMAR hgram, const char* mode)
Declares the mode of the grammar (the style of decode to be processed). Legal arguments are "voice" or "dtmf".
const char* LVGrammar_GetMode (HGRAMMAR hgram)
Retrieves the mode of the grammar.
void LVGrammar_SetLanguage (HGRAMMAR hgram, const char* language)
Specify the language of this grammar as a language/country code pair. Legal arguments include "en-US" and "es-MX".
const char* LVGrammar_GetLanguage (HGRAMMAR hgram)
Retrieve the language setting of the grammar.
int
LVGrammar_SetTagFormat (HGRAMMAR hgram, const char* tag_format)
Identify the tag format of the grammar. To use the LumenVox semantic interpretation, the tag format must be "lumenvox/1.0" or "semantics/1.0".
const char* LVGrammar_GetTagFormat (HGRAMMAR hgram)
Retrieve the tag format of the grammar.
int LVGrammar_GetNumberOfMetaData (HGRAMMAR hgram)
Retrieves the number of metadata entries in the grammar.
const char* LVGrammar_GetMetaDataKey (HGRAMMAR hgram, int index)
Returns the key of the metadata entry indicated by the index.
const char* LVGrammar_GetMetaDataValue (HGRAMMAR hgram, int index)
Returns the value of the metadata entry indicated by the index.
int
LVGrammar_ParseSentence (HGRAMMAR hgram, const char* sentence)
Uses the grammar to parse a sentence.
int LVGrammar_GetNumberOfParses (HGRAMMAR hgram)
Returns the number of parses created by the most recent LVGrammar_ParseSentence call.
H_PARSE_TREE LVGrammar_CreateParseTree (HGRAMMAR hgram, int index)
Returns the parse tree handle indicated by the index.
int LVGrammar_InterpretParses (HGRAMMAR hgram)
Generates interpretations from the parse trees created by the most recent LVGrammar_ParseSentence call.
int LVGrammar_GetNumberOfInterpretations (HGRAMMAR hgram)
Returns the number of interpretations created by the most recent LVGrammar_InterpretParses call.
H_SI LVGrammar_CreateInterpretation (HGRAMMAR hgram, int index)
Returns the semantic interpretation handle indicated by the index
API Functions
LVGrammar_AddRule
Adds a rule to a grammar object.
Function
int LVGrammar_AddRule(HGRAMMAR hgram, const char* rule_name, const char* rule_definition)
Parameters
hgram
A handle to the grammar.
rule_name
The name of the rule.
rule_definition
The definition of the rule.
Return Values
LV_SUCCESS
No errors; the rule has been successfully added.
LV_GRAMMAR_SYNTAX_WARNING
The new rule was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The new rule was not understandable to the grammar compiler. You will not be able to decode with this grammar.
Example
LVGrammar_AddRule(hgram, "foo", "hello [world]");
Is the same as writing a rule:
$foo = hello [world];
Remarks
New rules must be written in ABNF notation. Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar_RemoveRule
LVGrammar::AddRule (C++ API)
LVGrammar_SetRoot
Identifies one of the grammar rules as the root rule. The root rule is where the engine starts its search.
Function
int LVGrammar_SetRoot(HGRAMMAR hgram, const char* rule_name)
Parameters
hgram
A handle to the grammar.
rule_name
The name of the rule.
Example
LVGrammar_SetRoot(hgram, "foo");
Is the same as writing in a grammar:
root $foo;
See Also
LVGrammar::SetRoot (C++ API)
LVGrammar_SetMode
Sets the mode property for the grammar.
Function
int LVGrammar_SetMode(HGRAMMAR hgram, const char* mode)
Parameters
hgram
A handle to the grammar.
mode
The interaction mode of the grammar.
Example
LVGrammar_SetLanguage(hgram, "en-US");
LVGrammar_SetMode(hgram, "voice");
LVGrammar_SetTagFormat(hgram, "lumenvox/1.0");
Is the same as writing in your grammar:
language "en-US; mode "voice"; tag-format <lumenvox/1.0>;
See Also
LVGrammar_GetMode
LVGrammar::SetMode (C++API)
LVGrammar_Create
Creates an empty grammar object, and returns the handle.
Function
HGRAMMAR LVGrammar_Create()
Parameters
Return Values
A handle to the created grammar object.
Remarks
The memory referenced by the returned handle will not be released until the user explicitly calls LVGrammar_Release.
See Also
LVGrammar_Release
LVGrammar_CreateFromCopy
Creates a grammar object by copying another one, and returns the handle.
Function
HGRAMMAR LVGrammar_CreateFromCopy(HGRAMMAR another)
Parameters
another
The grammar object to copy from.
Return Values
A handle to the created grammar object.
Remarks
The memory referenced by the returned handle will not be released until the user explicitly calls LVGrammar_Release.
See Also
LVGrammar_Release
LVGrammar_Copy
Copy one grammar object to another.
Function
int LVGrammar_Copy (HGRAMMAR hgram, HGRAMMAR other)
Parameters
hgram
Destination grammar object handle.
other
Source grammar object handle.
Return Values
LV_SUCCESS
LV_FAILURE
Remarks
This function does not create a new object for the destination handle, so no memory will be allocated. It is the user's responsibility to make sure that the object referenced by the destination handle has already been created before calling this function.
See Also
LVGrammar::operator = (C++ API)
LVGrammar_Reset
Reset a grammar object.
Function
int LVGrammar_Reset (HGRAMMAR hgram)
Parameters
hgram
The handle to the grammar object to be reset.
Return Values
LV_SUCCESS
LV_FAILURE
See Also
LVGrammar::Reset (C++ API)
LVGrammar_Release
Destroy a grammar object.
Function
void LVGrammar_Release (HGRAMMAR hgram)
Parameters
hgram
The handle to the grammar object to be released.
Remarks
Grammar objects created by LVGrammar_Create and LVGrammar_CreateFromCopy must be explicitly destroyed by calling LVGrammar_Release.
See Also
LVGrammar_Create
LVGrammar_CreateFromCopy
LVGrammar_RegisterLoggingCallback
Registers a callback so the object can report warnings and errors to the grammar author via the callback function.
Function
void LVGrammar_RegisterLoggingCallback (HGRAMMAR hgram, GrammarLogCB log, void* userData)
Parameters
hgram
The handle to the grammar object.
log
The logging callback function pointer.
userData
A pointer to user-defined data associated with the grammar object referenced by hgram. It will be passed into the callback function.
Remarks
The callback function must have the signature defined by GrammarLogCB.
See Also
LVGrammar::RegisterLoggingCallback (C++ API)
LVGrammar_SaveCompiledGrammar
Save a grammar object to a binary file.
Function
int LVGrammar_SaveCompiledGrammar (HGRAMMAR hgram, const char* filename)
Parameters
hgram
The handle to a grammar object.
filename
File name.
Return Values
LV_SUCCESS
LV_FAILURE
Remarks
The saved compiled grammar can be later loaded into a grammar object with LVGrammar_LoadCompiledGrammar.
See Also
LVGrammar_LoadCompiledGrammar
LVGrammar::SaveCompiledGrammar (C++ API)
LVGrammar_LoadCompiledGrammar
Load a grammar object from a binary file previously saved by LVGrammar_SaveCompiledGrammar.
Function
int LVGrammar_LoadCompiledGrammar (HGRAMMAR hgram, const char* filename)
Parameters
hgram
The handle to a grammar object.
filename
The name of the file to load the compiled grammar from.
Return Values
LV_SUCCESS
LV_FAILURE
See Also
LVGrammar_SaveCompiledGrammar
LVGrammar::LoadCompiledGrammar (C++ API)
LVGrammar_LoadGrammar
Loads a grammar from a local file, or from a remote file via HTTP or FTP. The grammar can be written in ABNF or XML notation.
Function
int LVGrammar_LoadGrammar(HGRAMMAR hgram, const char* grammar_location)
Parameters
hgram
Handle to a grammar object.
grammar_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar::LoadGrammar (C++ API)
LVGrammar_LoadGrammarFromBuffer
Loads a grammar from a null-terminated string buffer. The grammar can be written in ABNF or XML notation.
Function
int LVGrammar_LoadGrammarFromBuffer(HGRAMMAR hgram, const char* grammar_contents);
Parameters
hgram
Handle to a grammar object.
grammar_contents
A null-terminated string containing the contents of a valid SRGS grammar.
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar::LoadGrammarFromBuffer (C++ API)
LVGrammar_RemoveRule
Remove a rule from a grammar object.
Function
int LVGrammar_RemoveRule(HGRAMMAR hgram, const char* rule_name)
Parameters
hgram
A handle to the grammar.
rule_name
The name of the rule to remove.
Return Values
LV_SUCCESS
No errors; the rule has been successfully removed.
LV_GRAMMAR_SYNTAX_WARNING
The grammar was not fully conforming after the rule was removed, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar was not understandable to the grammar compiler after the rule was removed. You will not be able to decode with this grammar.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar_AddRule
LVGrammar::RemoveRule (C++ API)
LVGrammar_SetLanguage
Set the language for the grammar.
Function
int LVGrammar_SetLanguage(HGRAMMAR hgram, const char* language)
Parameters
hgram
A handle to the grammar.
language
The language identifier for the grammar.
Example
LVGrammar_SetLanguage(hgram, "en-US");
LVGrammar_SetMode(hgram, "voice");
LVGrammar_SetTagFormat(hgram, "lumenvox/1.0");
Is the same as writing in your grammar:
language en-US;
mode voice;
tag-format <lumenvox/1.0>;
See Also
LVGrammar_GetLanguage
LVGrammar::SetLanguage (C++ API)
LVGrammar_SetTagFormat
Set the interpretation tag format of the grammar.
Function
int LVGrammar_SetTagFormat(HGRAMMAR hgram, const char* tag_format)
Parameters
hgram
A handle to the grammar.
tag_format
The grammar's tag format.
Example
LVGrammar_SetLanguage(hgram, "en-US");
LVGrammar_SetMode(hgram, "voice");
LVGrammar_SetTagFormat(hgram, "lumenvox/1.0");
Is the same as writing in your grammar:
language en-US;
mode voice;
tag-format <lumenvox/1.0>;
See Also
LVGrammar_GetTagFormat
LVGrammar::SetTagFormat (C++ API)
LVGrammar_GetMode
Return the mode setting for the grammar.
Function
const char* LVGrammar_GetMode(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
The interaction mode of the grammar.
See Also
LVGrammar_SetMode
LVGrammar::GetMode (C++ API)
LVGrammar_GetLanguage
Return the language setting for the grammar.
Function
const char* LVGrammar_GetLanguage(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
The language identifier of the grammar.
See Also
LVGrammar_SetLanguage
LVGrammar::GetLanguage (C++ API)
LVGrammar_GetTagFormat
Return the interpretation tag format setting for the grammar.
Function
const char* LVGrammar_GetTagFormat(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
The tag format of the grammar.
See Also
LVGrammar_SetTagFormat
LVGrammar::GetTagFormat (C++ API)
LVGrammar_GetNumberOfMetaData
Return the number of meta data entries contained in the grammar.
Function
int LVGrammar_GetNumberOfMetaData(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Example
If the grammar contains the following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = LVGrammar_GetNumberOfMetaData(grammar); // returns 2
const char* key = LVGrammar_GetMetaDataKey(grammar, 0); // returns "description"
const char* value = LVGrammar_GetMetaDataValue(grammar, 1); // returns "05/12/2005"
See Also
LVGrammar_GetMetaDataKey
LVGrammar_GetMetaDataValue
LVGrammar::GetNumberOfMetaData (C++ API)
LVGrammar_GetMetaDataKey
Return the key of the meta data entry indicated by the index.
Function
const char* LVGrammar_GetMetaDataKey(HGRAMMAR hgram, int index)
Parameters
hgram
A handle to the grammar.
index
Index of the meta data. It should be in the range [0, LVGrammar_GetNumberOfMetaData).
Return Values
null
The index is not valid.
non-null
A pointer to the key string.
Example
If the grammar has the following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = LVGrammar_GetNumberOfMetaData(grammar); // returns 2
const char* key = LVGrammar_GetMetaDataKey(grammar, 0); // returns "description"
const char* value = LVGrammar_GetMetaDataValue(grammar, 1); // returns "05/12/2005"
See Also
LVGrammar_GetNumberOfMetaData
LVGrammar_GetMetaDataValue
LVGrammar::GetMetaDataKey (C++ API)
LVGrammar_GetMetaDataValue
Return the value of the meta data entry indicated by the index.
Function
const char* LVGrammar_GetMetaDataValue(HGRAMMAR hgram, int index)
Parameters
hgram
A handle to the grammar.
index
Index of the meta data. It should be in the range [0, LVGrammar_GetNumberOfMetaData).
Return Values
null
The index is not valid.
non-null
A pointer to the value string.
Example
If the grammar has the following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = LVGrammar_GetNumberOfMetaData(grammar); // returns 2
const char* key = LVGrammar_GetMetaDataKey(grammar, 0); // returns "description"
const char* value = LVGrammar_GetMetaDataValue(grammar, 1); // returns "05/12/2005"
See Also
LVGrammar_GetNumberOfMetaData
LVGrammar_GetMetaDataKey
LVGrammar::GetMetaDataValue (C++ API)
LVGrammar_ParseSentence
Use a loaded grammar object to parse a sentence.
Function
int LVGrammar_ParseSentence(HGRAMMAR hgram, const char* sentence)
Parameters
hgram
A handle to the grammar.
sentence
The sentence to parse.
Return Values
0
The sentence is not covered by the grammar.
non-0
The number of distinct parses.
Example
Assume a grammar was defined as:
root $yes_no;
$yes_no = $yes | $no;
$yes = yes [please];
$no = no [thank you];
You can use this grammar to validate sentences as follows:
int count = LVGrammar_ParseSentence(grammar, "no thank you"); // returns 1
count = LVGrammar_ParseSentence(grammar, "no thanks"); // returns 0
Remarks
With this function, you can identify how well a grammar covers your targeted transcript set.
See Also
LVGrammar_GetNumberOfParses
LVGrammar_CreateParseTree
LVGrammar::ParseSentence (C++ API)
LVGrammar_GetNumberOfParses
Return the number of parses created by the most recent call to LVGrammar_ParseSentence.
Function
int LVGrammar_GetNumberOfParses(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
0
The sentence is not covered by the grammar.
non-0
The number of distinct parses.
Remarks
This function can be used after a call to LVGrammar_ParseSentence. It is merely a convenience, as it returns the same value as LVGrammar_ParseSentence.
See Also
LVGrammar_ParseSentence
LVGrammar_CreateParseTree
LVGrammar::NumberOfParses (C++ API)
LVGrammar_CreateParseTree
Return the parse tree handle with the specified index.
Function
H_PARSE_TREE LVGrammar_CreateParseTree(HGRAMMAR hgram, int index)
Parameters
hgram
A handle to the grammar.
index
The index of the parse tree handle to be returned. It should be in the range [0, LVGrammar_GetNumberOfParses).
Return Values
null
The index is not valid.
non-null
The parse tree handle.
Remarks
This function should be used after a call to LVGrammar_ParseSentence.
If the returned handle is not null, you must call LVParseTree_Release to destroy the parse tree object pointed to by the handle.
See Also
LVGrammar_ParseSentence
LVGrammar_GetNumberOfParses
LVGrammar::GetParseTree (C++ API)
LVGrammar_InterpretParses
Generate semantic interpretation results from the parse trees generated by the previous call to LVGrammar_ParseSentence.
Function
int LVGrammar_InterpretParses(HGRAMMAR hgram)
Parameters
hgram
A handle to a grammar.
Return Values
integer (>=0)
Number of available interpretations.
Remarks
Before passing a grammar object handle to this function, you should call LVGrammar_ParseSentence with that handle. Otherwise, the handle does not contain any parse tree information.
See Also
LVGrammar_ParseSentence
LVGrammar_GetNumberOfInterpretations
LVGrammar_CreateInterpretation
LVGrammar::InterpretParses (C++ API)
LVGrammar_GetNumberOfInterpretations
Return the number of semantic interpretations created by the most recent call to LVGrammar_InterpretParses.
Function
int LVGrammar_GetNumberOfInterpretations(HGRAMMAR hgram)
Parameters
hgram
A handle to the grammar.
Return Values
integer (>=0)
Number of available interpretations.
Remarks
This function can be used after a call to LVGrammar_InterpretParses. It is merely a convenience, as the return value of LVGrammar_InterpretParses provides the same information.
See Also
LVGrammar_InterpretParses
LVGrammar_CreateInterpretation
LVGrammar::GetNumberOfInterpretations (C++ API)
LVGrammar_CreateInterpretation
Returns the semantic interpretation handle indicated by the index.
Function
H_SI LVGrammar_CreateInterpretation (HGRAMMAR hgram, int index)
Parameters
hgram
A handle to the grammar.
index
The index of the interpretation handle to be returned. It should be in the range [0, LVGrammar_GetNumberOfInterpretations).
Return Values
null
The index is not valid.
non-null
The interpretation handle.
Remarks
This function should be used after a call to LVGrammar_InterpretParses. A non-null interpretation handle must be released by calling LVInterpretation_Release when you are done using it.
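The full parse-and-interpret sequence, including the mandatory release, can be sketched as follows (error handling omitted; hgram is a loaded grammar handle, and the sentence is borrowed from the LVGrammar_ParseSentence example):

```
if (LVGrammar_ParseSentence(hgram, "no thank you") > 0) {
    int count = LVGrammar_InterpretParses(hgram);
    for (int i = 0; i < count; ++i) {
        H_SI si = LVGrammar_CreateInterpretation(hgram, i);
        if (si == NULL)
            continue;                  // invalid index
        // ...examine the interpretation here...
        LVInterpretation_Release(si);  // mandatory cleanup
    }
}
```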
See Also
LVGrammar_InterpretParses
LVGrammar_GetNumberOfInterpretations
LVGrammar::GetInterpretation (C++ API)
LVSpeechPort Class
class LVSpeechPort
An LVSpeechPort object represents one Speech Recognition Port and processes its sound data into text; all port instances can process their data in parallel. If the client application is multi-threaded, every thread that needs to process audio data should have its own LVSpeechPort.
Each port has multiple voice channels and grammar sets.
Each voice channel holds raw audio data. Before processing any data, the client application must call LoadVoiceChannel to load the channel. The channel keeps its own copy of this sound data, so the client application can free its copy after the call to LoadVoiceChannel. The voice channel will store the data until the client application loads new data into the channel. This allows the client application to decode the same sound data against different grammars without reloading the data.
The Decode method processes a voice channel against a grammar set, returning the concepts from the grammar set recognized in the channel’s audio data. Multiple voice channels are provided as a convenience, but only one voice channel can decode concurrently per port.
Use <LVSpeechPort.h>
Constructor/Destructors
LVSpeechPort Constructs an LVSpeechPort object.
~LVSpeechPort Closes the speech port object and releases its resources.
Functions
OpenPort Opens the speech port and initializes the SRE.
ClosePort Closes the port, and releases its resources.
Decode Processes the voice channel audio data against the active grammar.
ReturnErrorString Returns a description of an error code.
SetProperty Sets various properties on the port.
SetPropertyEx Sets various properties on various scopes.
SetClientPropertyEx Sets various properties on client process level. (static)
WaitForDecode Blocks the client application until the decode is finished.
WaitForEngineToIdle Blocks the client application until the port is idle (not decoding).
AddPhrase Adds a phrase to a new or existing concept.
GetConcept Returns one concept found in the last call to Decode.
GetConceptScore Returns the confidence score of a concept found in the last call to Decode.
GetNumberOfConceptsReturned Returns the number of concepts found in the last call to Decode.
GetPhonemesDecoded Returns the actual phonemes found in the last call to Decode.
GetPhraseDecoded Returns the decoded phrase (with BNF formatting) found in the last call to Decode.
GetRawTextDecoded Returns the decoded raw text (without BNF formatting) found in the last call to Decode.
GetVoiceChannelData Returns the (original) preprocessed audio data for the voice channel.
LoadStandardGrammar Loads a standard, pre-defined grammar to easily recognize and format numbers, monetary figures or digits.
LoadVoiceChannel Loads the audio data into the specified voice channel prior to a call to Decode (which decodes the audio data).
RemoveConcept Removes a concept and all of its phrases.
ResetGrammar Removes all concepts from a grammar.
StreamStart Sets up a new stream.
StreamSendData Sends a buffer of sound data to the stream.
StreamGetStatus Returns status of stream.
StreamGetLength Returns length of sound data in stream buffer.
StreamSetStateChangeCallBack Sets up a callback to receive state change notifications for a stream.
StreamStop Stops the stream and loads the sound channel with the streamed data.
StreamCancel Stops the stream; the sound data is discarded.
StreamSetParameter Sets a new value for a stream property.
StreamGetParameter Gets the current value of a stream property.
StreamSetParameterToDefault Sets a stream property to its default value.
LoadGrammar functions Loads and compiles an SRGS grammar
UnloadGrammar functions Unloads a grammar from the speech port.
IsGrammarLoaded Checks if a grammar has already been compiled and loaded into the port.
ActivateGrammar functions Activates an SRGS grammar for decoding
DeactivateGrammar functions Removes a grammar from the active grammar set.
GetNumberOfParses Returns the number of parses generated by the decode, according to the active grammars.
GetParseTree Returns a Parse Tree result.
GetParseTreeString Returns a string representation of the parse tree.
GetNumberOfInterpretations Returns the number of interpretations generated by the decode + semantic interpretation process.
GetInterpretation Returns an interpretation result.
GetInterpretationString Returns an XML snippet representation of the interpretation result.
GetNumberOfNBestAlternatives Returns number of n-best alternatives found by the engine.
SwitchToNBestAlternative Set the n-best alternative that is viewable.
Constants
Error Codes Error codes returned by methods.
Properties Property settings for the port.
Sound Formats Sound data format constants.
Standard Grammars Built-in grammar constants.
Methods
LVSpeechPort::LVSpeechPort
Constructs an LVSpeechPort object.
LVSpeechPort(void);
Remarks
Does not automatically open the port.
LVSpeechPort::~LVSpeechPort
Closes the speech port object and releases its resources.
~LVSpeechPort(void)
See Also
ClosePort
LVSpeechPort::OpenPort
Opens the speech port and initializes the Speech Engine.
int OpenPort(ExportLogMsg Log, void* p, int verbosity);
Return Values
LV_SUCCESS
No errors; the port initialized successfully.
LV_FAILURE
Licensing has been exceeded. There are too many LVSpeechPorts active.
LV_SYSTEM_ERROR
The port is already open.
Parameters
Log
Pointer to a function which receives logging information from the LVSpeechPort instance.
p
A pointer to client application-defined data.
verbosity
range: 0 - 6
0 - minimal logging info
6 - maximum logging info
Remarks
This method activates the speech port object. The recognition engine will begin initializing when this function is called. Control will return to the application immediately.
p is passed into the ExportLogMsg function to enable client-application-defined behavior.
See Also
Logging Callback Function
ClosePort
LV_SRE_OpenPort
LVSpeechPort::GetOpenPortStatus
Returns a detailed code about the results of opening the speech port.
int GetOpenPortStatus(void);
Return Values
LV_SUCCESS
The port opened successfully
LV_NO_SERVER_RESPONDING or LV_OPEN_PORT_FAILED__PRIMARY_SERVER_NOT_RESPONDING
The client could not find a server to request a licensed port from.
LV_OPEN_PORT_FAILED__LICENSES_EXCEEDED
The primary server has too many ports connected for the number of licenses it has to give out.
See Also
OpenPort
ClosePort
LV_SRE_OpenPort
LVSpeechPort::ClosePort
Closes the port, and releases its resources.
int ClosePort(void);
Return Values
LV_SUCCESS
No errors; the port has successfully shut down.
LV_FAILURE
The port was unable to shut down.
LV_INVALID_HPORT
The port was never successfully opened, or was already closed.
Note:
Closing a port frees it from counting against the number of ports allowed by your license. Close every port that is no longer needed.
See Also
OpenPort
LV_SRE_ClosePort
LVSpeechPort::Decode
Processes the voice channel audio data against the active grammar.
int Decode(int VoiceChannel, int grammarset, unsigned int flags = 0);
Return Values
Zero (0) or greater indicates success.
A negative result indicates a specific error.
Parameters
VoiceChannel
The voice channel to process.
grammarset
The grammar set to process.
flags (bitwise OR of flags to set desired options)
LV_DECODE_BLOCK - Decode will not return until it has finished.
LV_DECODE_GENDER_MALE - Gender identifier.
LV_DECODE_GENDER_FEMALE - Gender identifier.
LV_DECODE_FIRST_TIME_USER - Reset caller weights in the Recognition Engine (not implemented).
LV_DECODE_USE_OOV - Use the Out-Of-Vocabulary filter (OOV) during decode.
Remarks
If LV_DECODE_BLOCK is set, Decode will not return until it has finished processing the data.
If LV_DECODE_BLOCK is not set, Decode returns immediately (but continues processing the data on a separate thread), and the client application can continue its own work. Calling other LVSpeechPort methods may block until the Decode is finished. Once the client application is ready to check for results, call either 1) GetNumberOfConceptsReturned, or 2) WaitForEngineToIdle and then GetNumberOfConceptsReturned. WaitForEngineToIdle waits only for a specified time and returns regardless of whether Decode is finished, whereas GetNumberOfConceptsReturned blocks until Decode is finished.
LV_DECODE_GENDER_FEMALE and LV_DECODE_GENDER_MALE identify which gender acoustic model to use. If these flags are not specified, the engine automatically decodes each audio file against both gender models. While this slows the engine by requiring two decodes, evaluating against both models has a very significant positive effect on recognition accuracy. Since the engine is multithreaded, unless CPU loads are a serious issue, do not use these flags.
On an error, call ReturnErrorString with the negative result from Decode to get a description of the error.
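The non-blocking workflow above can be sketched as follows. This is a sketch only: the channel and timeout values are placeholders, and the exact signatures of WaitForEngineToIdle, GetNumberOfConceptsReturned, and ReturnErrorString are assumptions here; consult their individual reference pages.

```
// Start the decode without LV_DECODE_BLOCK; control returns immediately.
int result = port.Decode(channel, LV_ACTIVE_GRAMMAR_SET);
if (result < 0)
    printf("Decode failed: %s\n", port.ReturnErrorString(result));

// ...do other work on this thread...

// Bound the wait, then fetch results (GetNumberOfConceptsReturned itself
// blocks until the decode is finished):
port.WaitForEngineToIdle(timeout);
int concepts = port.GetNumberOfConceptsReturned(channel);
```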
See Also
LV_SRE_Decode
LoadGrammar functions
Before you can use a grammar, you must load it into the speech port's collection of grammars, or you must load it into the collection of application-level (global) grammars. When you load a grammar, it is compiled for use in the LumenVox Speech Engine.
These functions load an SRGS grammar that will be usable by a single speech port object.
Functions
int LoadGrammar(const char* gram_name, const char* gram_location);
int LoadGrammar(int gram_name, const char* gram_location);
int LoadGrammarFromBuffer(const char* gram_name, const char* gram_contents);
int LoadGrammarFromBuffer(int gram_name, const char* gram_contents);
int LoadGrammarFromObject(const char* gram_name, LVGrammar& gram_obj);
int LoadGrammarFromObject(int gram_name, LVGrammar& gram_obj);
Parameters
gram_name
The identifier for the grammar being loaded. Whenever you activate, deactivate, or unload, this is the identifier you will use. This can be a string, or an integer ID. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
gram_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
gram_contents
A null terminated string containing the contents of a valid SRGS grammar file.
gram_obj
An LVGrammar object.
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
See Also
LVSpeechPort::UnloadGrammar functions
LVSpeechPort::IsGrammarLoaded functions
LVSpeechPort::LoadGlobalGrammar functions
LV_SRE_LoadGrammar functions (C API)
LoadGlobalGrammar functions
When a global grammar is loaded, it is sent to the server. All subsequent decode requests then contain only global grammar IDs instead of the actual grammars, which avoids network transport overhead for large grammars.
A global grammar is associated with the client process that loads it. All speech ports that belong to that client process have access to that global grammar. However, different client processes do not share global grammars with each other.
Generally, the lifetime of a global grammar is controlled by the load and unload functions. However, in case a user terminates a client process without unloading its global grammars, the server periodically checks whether each client process is still alive so that unused global grammars can be released. Once the server detects that a client process has been inactive for more than 10 minutes, it removes all grammars associated with that process.
In a multi-threaded program, it is safe to access global grammars in a read-only fashion on multiple threads simultaneously; for instance, querying whether a global grammar is loaded, or calling decode with global grammars. When loading or unloading takes place, such as unloading a global grammar while another thread is decoding with that grammar, it is the user's responsibility to prevent race conditions.
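A typical process-level lifecycle might look like this sketch (the label "phone" is illustrative; the URI is taken from the parameter description in this section):

```
// Load once per client process; every speech port in the process can use it.
LVSpeechPort::LoadGlobalGrammar("phone", "http://www.gramsRus.com/phonenumber.gram");

// ...any port in this process may now decode against "phone"...

// Unload when the process no longer needs the grammar.
LVSpeechPort::UnloadGlobalGrammar("phone");
```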
Functions
static int LoadGlobalGrammar (const char* gram_name, const char* gram_location);
static int LoadGlobalGrammarFromBuffer (const char* gram_name, const char* gram_contents);
static int LoadGlobalGrammarFromObject (const char* gram_name, LVGrammar& gram_obj);
Parameters
gram_name
The identifier for the grammar being loaded. Whenever you activate, deactivate, or unload, this is the identifier you will use.
gram_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2".
gram_contents
A null terminated string containing the contents of a valid SRGS grammar file.
gram_obj
An LVGrammar object.
Return Values
LV_SUCCESS
No errors; this grammar is now ready to use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready for use.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
LV_GLOBAL_GRAMMAR_TRANSACTION_ERROR
Failed to send the grammar to any of the servers.
LV_GLOBAL_GRAMMAR_TRANSACTION_PARTIAL_ERROR
Failed to send the grammar to some of the servers.
Remarks
Detailed error and warning messages are sent to the LVSpeechPort application-level logging callback function at priorities 0 and 1, respectively.
Users can load the same grammar under different labels; only one instance of that grammar is created on the server.
See Also
LVSpeechPort::LoadGrammar functions
LVSpeechPort::IsGlobalGrammarLoaded functions
LVSpeechPort::UnloadGlobalGrammar functions
LV_SRE_LoadGlobalGrammar functions (C API)
UnloadGrammar functions
These functions remove a loaded grammar from a speech port object. The last function removes all loaded grammars from the speech port.
Functions
int UnloadGrammar(const char* gram_name);
int UnloadGrammar(int gram_name);
void UnloadGrammars();
Parameters
gram_name
The identifier for the grammar being unloaded. This is the same identifier you gave the grammar when you loaded it. It can be a null terminated string, or an integer.
Return Values
LV_SUCCESS
No errors; this grammar is removed.
LV_FAILURE
The grammar was not present. Nothing was removed.
Remarks
Grammars that were activated and then unloaded are still active; they must be explicitly deactivated.
See Also
LVSpeechPort::IsGrammarLoaded functions
LVSpeechPort::UnloadGlobalGrammar functions
LVSpeechPort::LoadGrammar functions
LV_SRE_UnloadGrammar functions (C API)
UnloadGlobalGrammar functions
These functions remove a loaded grammar from the application-level set of grammars. The second function removes all application-level grammars.
Functions
static int UnloadGlobalGrammar(const char* gram_name);
static void UnloadGlobalGrammars( );
Parameters
gram_name
The identifier for the grammar being unloaded. This is the same identifier you gave the grammar when you loaded it.
Return Values
LV_SUCCESS
No errors; this grammar is removed.
LV_FAILURE
The grammar was not present. Nothing was removed.
Remarks
A global grammar is unloaded from the server only when the unload functions have been called for every label associated with that grammar.
See Also
LVSpeechPort::UnloadGrammar functions
LVSpeechPort::IsGlobalGrammarLoaded functions
LVSpeechPort::LoadGlobalGrammar functions
LV_SRE_UnloadGlobalGrammar functions (C API)
IsGrammarLoaded functions
Functions
bool IsGrammarLoaded(const char* gram_name);
bool IsGrammarLoaded(int gram_name);
Parameters
gram_name
The identifier for the grammar being queried. This is the same identifier you gave the grammar when you loaded it.
Return Values
true if a grammar with the label gram_name was found in the speech port's set of loaded grammars; false otherwise.
Remarks
Note: This function only tells you if a grammar with the name gram_name is loaded. It does not tell you if there are two identical grammar bodies loaded.
See Also
LVSpeechPort::UnloadGrammar functions
LVSpeechPort::IsGlobalGrammarLoaded functions
LVSpeechPort::LoadGrammar functions
LV_SRE_IsGrammarLoaded functions (C API)
IsGlobalGrammarLoaded
Function
static bool IsGlobalGrammarLoaded(const char* gram_name);
Parameters
gram_name
The identifier for the grammar being queried. This is the same identifier you gave the grammar when you loaded it.
Return Values
true if a grammar was found with the label gram_name in the space of application-level grammars; false otherwise.
Remarks
Note: This function only tells you if a grammar with the name gram_name is loaded. It does not tell you if there are two identical grammar bodies loaded.
See Also
LVSpeechPort::UnloadGlobalGrammar functions
LVSpeechPort::IsGrammarLoaded functions
LVSpeechPort::LoadGlobalGrammar functions
LV_SRE_IsGlobalGrammarLoaded (C API)
ActivateGrammar functions
If you wish to use a speech port's loaded SRGS grammar for a decode, you need to activate it. Activating a grammar puts it in the multi-grammar grammar set called LV_ACTIVE_GRAMMAR_SET. Activated grammars can then be used for a decode by specifying LV_ACTIVE_GRAMMAR_SET as the grammarset parameter in a call to Decode, or by setting STREAM_PARM_GRAMMAR_SET equal to LV_ACTIVE_GRAMMAR_SET before calling StreamStart. This mechanism exists to maintain backward compatibility with previous APIs.
When ActivateGrammar is called, the grammar is first searched for among the speech port's loaded grammars. If it cannot be found there, the collection of application-level grammars is searched. If you wish to explicitly activate an application-level grammar, use ActivateGlobalGrammar.
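A minimal load-activate-decode sequence might look like this sketch (error handling omitted; voice channel 1 is illustrative, and the audio is assumed to have already been loaded with LoadVoiceChannel):

```
port.LoadGrammar("pizza", "c:/grammars/pizza.grxml");
port.ActivateGrammar("pizza");    // joins LV_ACTIVE_GRAMMAR_SET
port.Decode(1, LV_ACTIVE_GRAMMAR_SET, LV_DECODE_BLOCK);
port.DeactivateGrammar("pizza");  // deactivate when no longer needed
```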
Functions
int ActivateGrammar(const char* gram_name);
int ActivateGrammar(int gram_name);
Parameters
gram_name
The identifier for the grammar being activated. This is the same identifier that was given to the grammar when it was loaded. This can be a string, or an integer ID. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
Return Values
LV_SUCCESS
No errors; this grammar is now active.
LV_GRAMMAR_LOADING_ERROR
This grammar could not be activated, because it was not found in the speech port's set of loaded grammars.
Remarks
Detailed error and warning messages are sent to the speech port's logging callback function at priorities 0 and 1, respectively.
See Also
LVSpeechPort::DeactivateGrammar functions
LVSpeechPort::ActivateGlobalGrammar
LV_SRE_ActivateGrammar functions (C API)
DeactivateGrammar functions
These functions remove a grammar from the set of active grammars. The last function clears the active grammar set.
Functions
int DeactivateGrammar(const char* gram_name);
int DeactivateGrammar(int gram_name);
int DeactivateGrammars();
Parameters
gram_name
The identifier for the grammar being deactivated. This is the same identifier that was given to the grammar when it was loaded. This can be a string, or an integer ID. The string "123" and the integer 123 are identical labels. Integer names are provided for backward compatibility.
Return Values
LV_SUCCESS
No errors; this grammar is no longer active.
LV_FAILURE
This grammar could not be deactivated, because it was never successfully activated.
See Also
LVSpeechPort::ActivateGrammar functions
LV_SRE_DeactivateGrammar functions (C API)
GetNumberOfInterpretations
Returns the number of semantic interpretation results that were generated by the previous decode.
Function
int GetNumberOfInterpretations(int voicechannel)
Parameters
voicechannel
The audio channel holding the decoded audio.
See Also
LVSpeechPort::GetInterpretation
LVSpeechPort::GetInterpretationString
LV_SRE_GetNumberOfInterpretations (C API)
GetInterpretation
Returns an LVInterpretation object representing the results of the semantic interpretation process.
Function
LVInterpretation GetInterpretation (int voicechannel, int index)
Parameters
voicechannel
The channel that the decode took place on.
index
An utterance could give rise to multiple interpretations, particularly if the grammars involved are ambiguous. index ranges from 0 to GetNumberOfInterpretations - 1.
Return Value
The return type is an interpretation object. The object is a representation of the ECMAScript object made by the matching grammar, using the Semantic Interpretation for Speech Recognition process. It also contains additional information such as the confidence score, matching grammar label, and the input sentence.
See Also
LVSpeechPort::GetNumberOfInterpretations
LVSpeechPort::GetInterpretationString
LVInterpretation C++ API
LV_SRE_CreateInterpretation (C API)
LVSpeechPort::GetNumberOfParses
Returns the number of parse trees that were generated by the previous decode.
Function
int GetNumberOfParses(int voicechannel)
Parameters
voicechannel
The audio channel holding the decoded audio.
See Also
LVSpeechPort::GetParseTree
LVSpeechPort::GetParseTreeString
Parse Tree Introduction
LV_SRE_GetNumberOfParses (C API)
LVSpeechPort::GetParseTree
Provides the user with an LVParseTree object representing the sentence structure of what was decoded by the Speech Engine, according to the active grammars.
Function
LVParseTree GetParseTree(int voicechannel, int index)
Parameters
voicechannel
The audio channel containing the input audio
index
It is possible to have more than one parse tree for an utterance (for instance, if the grammar is ambiguous); this is the index of the tree.
Return Value
A parse tree.
Remark
Logically, a parse tree and the parse string returned to the user are the same. However, an LVParseTree object makes it easy to search the parse tree for useful information.
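A retrieval loop might look like the following sketch, assuming a completed decode on channel vc and an open port named Port (both names are illustrative):

```cpp
// Walk every parse tree produced by the previous decode.
int total = Port.GetNumberOfParses(vc);
for (int i = 0; i < total; ++i)
{
    LVParseTree tree = Port.GetParseTree(vc, i);
    // search the tree for useful information here
}
```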
See Also
LVSpeechPort::GetNumberOfParses
LVSpeechPort::GetParseTreeString
Parse Tree Introduction
LVParseTree C++ API
LV_SRE_CreateParseTree (C API)
LVSpeechPort::GetParseTreeString
Provides the user with a string representation of a speech parse tree.
Function
const char* GetParseTreeString(int voicechannel, int index)
Parameters
voicechannel
The audio channel containing the input audio
index
It is possible to have more than one parse tree possibility (for instance, if the grammar is ambiguous); this is the index of the tree.
Remark
Logically, a speech parse tree and the parse string returned to the user are the same. However, a speech parse tree makes it easy to search the parse tree for useful information. The parse tree string is based on the examples provided by the W3C SRGS specification
See Also
LVSpeechPort::GetNumberOfParses
LVSpeechPort::GetParseTree
Parse Tree Introduction
LV_SRE_GetParseTreeString (C API)
LVSpeechPort::GetNumberOfConceptsReturned
Returns the number of concepts found in the last call to Decode.
int GetNumberOfConceptsReturned(int VoiceChannel);
Return values
The number of concepts found for this voice channel.
Parameters
VoiceChannel
The voice channel processed by Decode.
See Also
LV_SRE_GetNumberOfConceptsReturned
LVSpeechPort::GetConcept
Returns one concept found in the last call to Decode.
const char* GetConcept(int VoiceChannel, int Index);
Return Values
A null-terminated string representing the matched concept.
NULL indicates that Index was outside the possible range.
Parameters
VoiceChannel
The voice channel processed by Decode.
Index
The recognition position of the concept, between 0 and (GetNumberOfConceptsReturned - 1), inclusive.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted concepts.
See Also
LV_SRE_GetConcept
LVSpeechPort::GetConceptScore
Returns the confidence score of a concept found in the last call to Decode.
int GetConceptScore(int VoiceChannel, int Index);
Return Values
The confidence score of the matched concept. The range of possible values is 0 to 1000.
Parameters
VoiceChannel
The voice channel processed by Decode.
Index
The recognition position of the concept, between 0 and (GetNumberOfConceptsReturned - 1), inclusive.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted scores.
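The concept strings and their confidence scores can be retrieved together; a sketch, assuming a completed Decode on channel vc and a port named Port (illustrative names):

```cpp
int n = Port.GetNumberOfConceptsReturned(vc);
for (int i = 0; i < n; ++i)
{
    const char* concept = Port.GetConcept(vc, i);      // matched concept
    int score           = Port.GetConceptScore(vc, i); // 0 to 1000
    // use concept and score here
}
```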
See Also
LV_SRE_GetConceptScore
LVSpeechPort::GetPhonemesDecoded
Returns the actual phonemes found in a call to Decode.
const char* GetPhonemesDecoded(int VoiceChannel, int Index);
Return Values
A null-terminated static string of the decoded phonemes.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded phonemes.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted phonemes.
See Also
GetPhraseDecoded
GetRawTextDecoded
LV_SRE_GetPhonemes
LVSpeechPort::GetPhraseDecoded
Returns the decoded phrase (with BNF formatting) found in the last call to Decode.
const char* GetPhraseDecoded(int VoiceChannel, int Index);
Return Values
A null-terminated string representing the decoded string.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded phrase.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted phrases.
The main difference between LVSpeechPort::GetPhraseDecoded and LVSpeechPort::GetRawTextDecoded is in BNF formatting. LVSpeechPort::GetPhraseDecoded returns the decoded phrase as it was entered into the grammar. If the phrase contains BNF formatting, with selections, options, grouping, etc., then the return value preserves that formatting. LVSpeechPort::GetRawTextDecoded returns the decoded phrase after BNF formatting has been removed. Thus, LVSpeechPort::GetRawTextDecoded will return the phrase as a list of the words actually recognized, rather than the phrase as it was entered into the grammar.
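The difference can be seen side by side in this sketch (Port, vc, and index 0 are illustrative):

```cpp
// Same recognition result, two formats.
const char* phrase = Port.GetPhraseDecoded(vc, 0);  // keeps BNF formatting
const char* raw    = Port.GetRawTextDecoded(vc, 0); // plain recognized words
```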
See Also
GetPhonemesDecoded
GetRawTextDecoded
LV_SRE_GetPhraseDecoded
LVSpeechPort::GetRawTextDecoded
Returns the decoded raw text (without BNF formatting) found in the last call to Decode.
const char* GetRawTextDecoded(int VoiceChannel, int Index);
Return Values
A null-terminated string representing the decoded raw text.
Parameters
VoiceChannel
The voice channel to process.
Index
The recognition position of the decoded raw text.
Remarks
Assuming the speaker said "Violet" and the grammar contained the concepts under Concept, and the grammar under Phrase, the Speech Engine might return the highlighted raw text.
The main difference between GetPhraseDecoded and GetRawTextDecoded is in BNF formatting. GetPhraseDecoded returns the decoded phrase as it was entered into the grammar. If the phrase contains BNF formatting, with selections, options, grouping, etc., then the return value preserves that formatting. GetRawTextDecoded returns the decoded phrase after BNF formatting has been removed. Thus, GetRawTextDecoded will return the phrase as a list of the words actually recognized, rather than the phrase as it was entered into the grammar.
See Also
GetPhonemesDecoded
GetPhraseDecoded
LV_SRE_GetRawTextDecoded
LVSpeechPort::GetVoiceChannelData
Sets the pointers to the voice channel's copy of the original preprocessed audio data.
int GetVoiceChannelData(int VoiceChannel, short** PCM, unsigned int* Samples);
Return Values
LV_SUCCESS
No errors; PCM and Samples have been successfully set.
LV_SOUND_CHANNEL_OUT_OF_RANGE
The voice channel specified is outside the valid range; possible values are 0-63, inclusive.
LV_BAD_HPORT
The Speech Engine is no longer running. This is the result of a ClosePort call or an unrecoverable Speech Engine error.
Parameters
VoiceChannel
The voice channel to process.
PCM
A pointer to a pointer to set to the post-processed audio data.
Samples
A pointer to an integer to set the number of samples.
See Also
LV_SRE_GetVoiceChannelData
LVSpeechPort::LoadStandardGrammar
Standard Grammars are deprecated in favor of SRGS built-in grammars
Loads a standard, pre-defined grammar to easily recognize and format numbers, monetary figures or digits.
int LoadStandardGrammar(int GrammarSet, int StdGrammar);
Return Values
LV_SUCCESS
No errors; the standard grammar is loaded.
LV_STANDARD_GRAMMAR_OUT_OF_RANGE
The standard grammar value is not a recognized grammar type.
LV_GRAMMAR_SET_OUT_OF_RANGE
The GrammarSet value is out of range.
Parameters
GrammarSet
Which grammar set this phrase is being added to. Possible value range 0 - 63.
StdGrammar
The standard grammars are:
1. GRAMMAR_DIGITS String of single digits like a phone number or pin code.
2. GRAMMAR_MONEY Monetary value (only implemented for SRGS decodes).
3. GRAMMAR_NUMERIC Numeric value (like 12,000, 24.45, or 35).
4. GRAMMAR_SPELLING Alphabet letters for spelling (not implemented).
5. GRAMMAR_ALPHA_NUMERIC (Not implemented).
6. GRAMMAR_DATE Date values (only implemented for SRGS decodes).
7. GRAMMAR_NONE Clears out the standard grammar, without clearing out any phrases that were added. ResetGrammar( ) will clear out the entire grammar.
Remarks
The client application can load only one standard grammar, but can add any number of concepts with AddPhrase. This is not true, however, if you use SRGS grammars. The correct way to augment a standard grammar in the SRGS setting is to load a grammar to a different location and then activate both. When a standard grammar is loaded, the decoder will return the number, dollar amount, or digit string as either a single concept or a single interpretation string, depending on whether SRGS is used.
As an example, the client application loads GRAMMAR_NUMERIC and also adds the concept and phrase "Widgets". If the sound data contained the speech "twelve widgets", the decoder will return two concepts: the first is the string "12" and the second the string "Widgets". If the speech was "one thousand one hundred and twenty nine Widgets seven point two Widgets", the decoder would return four concepts: "1129", "Widgets", "7.2" and "Widgets".
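The non-SRGS example above might be set up as follows (a sketch; grammar set 0, the port name, and channel vc are illustrative):

```cpp
// Load the standard numeric grammar and add the "Widgets" concept.
Port.LoadStandardGrammar(0, GRAMMAR_NUMERIC);
Port.AddPhrase(0, "Widgets", "Widgets");
// ... load audio containing "twelve widgets" into channel vc and Decode ...
// Per the example above, GetConcept(vc, 0) would return "12" and
// GetConcept(vc, 1) would return "Widgets".
```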
However, if you use SRGS, this is not what happens. To get this sort of functionality in the SRGS setting, you would create a grammar that looks like the following:
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $how_many_widgets;
$how_many_widgets = $<builtin:grammar/number> widgets {$=$$;};
In this case you wouldn't bother using LoadStandardGrammar() at all, since the standard number grammar is loaded when you load this grammar. The result would be an interpretation string representing the number that was recognized, like "1129" or "7.2". The word "widgets" would not be returned by this grammar.
See Also
Standard Grammars
LV_SRE_LoadStandardGrammar
LVSpeechPort::LoadVoiceChannel
Loads the audio data into the specified voice channel prior to a call to Decode (which decodes the audio data).
int LoadVoiceChannel(int VoiceChannel, void* M, int Length, SOUND_FORMAT Format = ULAW_8KHZ);
Return Values
LV_SUCCESS
No errors; the voice channel audio successfully loaded.
LV_BAD_HPORT
The engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
LV_FAILURE
Sound format was incorrectly specified.
Parameters
VoiceChannel
Accepted values 0 through 63.
M
Pointer to audio data.
Length
Memory size in bytes of the audio data.
Format
The audio data sound format.
Remarks
Each LVSpeechPort supports 64 separate voice channels. Each channel has its own separate storage for decode data, so once the call is made, the client application can release its own copy. LoadVoiceChannel will accept the audio data and prepare it for decoding.
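The load-then-decode sequence might look like this sketch (the buffer, its length, and the elided Decode call are placeholders):

```cpp
// audio_buf holds 8 kHz u-law audio; audio_len is its size in bytes.
int res = Port.LoadVoiceChannel(0, audio_buf, audio_len, ULAW_8KHZ);
if (res == LV_SUCCESS)
{
    // The channel now holds its own copy, so the application's buffer
    // can be released before the decode.
    free(audio_buf);
    // ... call Decode on voice channel 0 ...
}
```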
See Also
LV_SRE_LoadVoiceChannel
LVSpeechPort::AddPhrase
Adds a phrase to a new or existing concept.
int AddPhrase(int GrammarSet, const char* Concept , const char* Phrase);
Return Values
LV_SUCCESS
No errors; the phrase was added to the concept.
LV_BAD_HPORT
The engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set is out of range.
LV_GRAMMAR_SYNTAX_ERROR or LV_GRAMMAR_SYNTAX_WARNING
The phrase entered has bad syntax, such as mismatched parentheses.
Parameters
GrammarSet
Which grammar set to add the phrase to. Integer value between 0 and 63, inclusive.
Concept
Which concept to add the phrase to. Null-terminated string.
Phrase
The new phrase.
Remarks
The concept can be new or existing; if it is new, the call will automatically add the new concept with the single phrase.
See Also
Phrase Formats
Phonemes
LV_SRE_AddPhrase
LVSpeechPort::RemoveConcept
Removes a concept and all of its phrases.
int RemoveConcept(int GrammarSet, const char* Concept);
Return Values
LV_SUCCESS
No errors; the concept and all phrases are removed from the grammar set.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set specified is outside the valid range.
LV_BAD_HPORT
The engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
Parameters
GrammarSet
Which grammar set to remove the concept from. Possible value range 0 - 63.
Concept
Existing concept to remove. Null-terminated string.
See Also
LV_SRE_RemoveConcept
LVSpeechPort::ResetGrammar
Removes all concepts from a grammar.
int ResetGrammar(int GrammarSet);
Return Values
LV_SUCCESS
No errors; grammar reset.
LV_GRAMMAR_SET_OUT_OF_RANGE
The grammar set value is out of expected range (0-63).
See Also
LV_SRE_ResetGrammar
LVSpeechPort::ReturnErrorString
Returns a description of an error code.
const char* ReturnErrorString(int ReturnCode);
Return Values
A null-terminated static string describing the error code.
Parameters
ReturnCode
The error code.
Remarks
If the error code is an invalid error code, "Invalid Error Code" is returned.
See Also
LV_SRE_ReturnErrorString
LVSpeechPort::SetProperty
SetProperty is deprecated in favor of using SetPropertyEx.
Sets various properties on the port.
int SetProperty(PROPERTIES Property, int Value);
Return Values
LV_SUCCESS
No errors; Property is set to Value.
LV_NOT_A_VALID_PROPERTY_VALUE
The property value is not valid for the designated property.
Parameters
Property
Which property to modify.
Value
Property-dependent.
Remarks
Currently, only PROP_SAVE_SOUND_FILES is implemented; setting Value to 1 will cause the port to save request and answer files to disk; setting Value to 0 turns this feature off. The request and answer files are invaluable for troubleshooting and tuning applications, but will quickly fill up a hard drive.
See Also
Properties
LV_SRE_SetProperty
SetPropertyEx
LVSpeechPort::SetPropertyEx
Sets various properties for a port, client, soundchannel, or grammar.
int SetPropertyEx(int propertyname, int valuetype, void* pvalue, int target = PROP_EX_TARGET_PORT, int index = 0 );
Return Values
LV_SUCCESS
No errors; property is set to the value pointed to by pvalue.
LV_INVALID_PROPERTY
The property does not exist.
LV_INVALID_PROPERTY_VALUE
The property value is invalid for the designated property (e.g. out of range).
LV_INVALID_PROPERTY_TARGET
The property cannot be set for the specified target.
LV_INVALID_PROPERTY_VALUE_TYPE
The property's type is incompatible with the declared type.
LV_INVALID_PROPERTY_TARGET_IDX
The target's index (grammar set, voicechannel) is out of range for this property.
Note: If more than one error occurs, which error code is returned is undefined.
Parameters
propertyname
Which property to modify.
valuetype
The value type of the property being set. Legal values are:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
PROP_EX_VALUE_TYPE_STRING
PROP_EX_VALUE_TYPE_FLOAT_PTR
Each property has a set of legal value types. See Properties.
pvalue
A pointer to the new value for propertyname. pvalue will be reinterpreted according to the value type provided.
target
The portion of the API that this property is set for. Legal values are:
PROP_EX_TARGET_PORT -- pvalue affects an entire speech port object
PROP_EX_TARGET_CHANNEL -- pvalue affects one voice channel in the speech port. The channel is specified by index.
PROP_EX_TARGET_GRAMMAR -- pvalue affects one grammar set in the speech port. The set is specified by index.
PROP_EX_TARGET_CLIENT -- pvalue is global, and affects all ports on the client.
Remarks
See Properties for a list of modifiable properties.
You can use this function only after opening a port; calling it before opening a port will result in failure. To set a client-scope property, use the static function LVSpeechPort::SetClientPropertyEx.
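A sketch of a port-scope call, using PROP_EX_DECODE_TIMEOUT (a property mentioned elsewhere in this documentation) and assuming it accepts an integer value at port scope; check Properties for the property's actual legal value types:

```cpp
// Set a decode timeout on the whole port (value and type are assumptions).
int timeout_ms = 5000;
int res = Port.SetPropertyEx(PROP_EX_DECODE_TIMEOUT,
                             PROP_EX_VALUE_TYPE_INT,
                             &timeout_ms,
                             PROP_EX_TARGET_PORT);
```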
See Also
Properties
LV_SRE_SetPropertyEx
(static) LVSpeechPort::SetClientPropertyEx
LVSpeechPort::StreamStart
Sets up a new stream.
int StreamStart();
Return Values
LV_SUCCESS
Stream set up.
LV_FAILURE
Parameters incorrectly set.
Remarks
Call this function to set up a new stream. You need to call this function after calling StreamStop, StreamCancel or after end-of-speech has been detected on previous utterance.
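The overall streaming sequence might be sketched as follows (the audio source, buffer, and port name are placeholders):

```cpp
// Direct the stream into voice channel 0, then stream buffers until done.
Port.StreamSetParameter(STREAM_PARM_VOICE_CHANNEL, 0);
Port.StreamStart();
while (have_more_audio())                    // placeholder audio source
{
    Port.StreamSendData(buffer, buffer_len); // copied internally, returns fast
}
Port.StreamStop();                           // loads channel 0 with the data
```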
See Also
StreamSetParameter
StreamStop
StreamCancel
LVSpeechPort::StreamSendData
Send data buffer of sound data to stream.
int StreamSendData(void* SoundData, int SoundDataLength);
Return Values
LV_SUCCESS
Data accepted
LV_FAILURE
Stream not active or NULL sound data.
Parameters
SoundData
Pointer to the memory buffer containing sound data.
SoundDataLength
Length in bytes of sound data.
Remarks
Used to do the actual streaming. Call this function with each sound data buffer. This call copies the sound data to an internal buffer and returns immediately. Processing of the sound data takes place on a background thread.
See Also
StreamSetStateChangeCallBack
StreamGetStatus
LVSpeechPort::StreamGetStatus
Returns status of stream.
int StreamGetStatus();
Return Values
Returns a stream status define. See Stream Status.
Remarks
Called to check the current state of the stream.
See Also
StreamSetStateChangeCallBack
LVSpeechPort::StreamGetLength
Returns length of sound data in stream buffer.
int StreamGetLength();
Return Values
Number of bytes in internal buffer for sound stream.
Remarks
This is the total number of bytes streamed. It does not include bytes sent before barge-in was detected (if STREAM_PARM_DETECT_BARGE_IN is active). This can be useful if the application wants to stop a post-barge-in stream after a certain amount of time (for example, to limit user speech to 10 seconds).
See Also
StreamSetStateChangeCallBack
LVSpeechPort::StreamSetStateChangeCallBack
Sets up a callback to receive state change notifications for a stream.
int StreamSetStateChangeCallBack(LV_SRE_StreamStateChangeFn* fn, void* UserData);
Return Values
LV_SUCCESS
Parameters
fn
Pointer to callback function to receive state change updates. See Stream Callback.
UserData
Application defined data sent back in callback.
Remarks
Each time a stream's status changes, this callback will be called.
See Also
LV_SRE_StreamStateChangeFn
StreamGetStatus
LVSpeechPort::StreamStop
Stops stream and loads sound channel with streamed data.
int StreamStop();
Return Values
LV_SUCCESS
LV_FAILURE Stream not active.
Remarks
This function ends streaming and puts streamed data into the voice channel defined with the STREAM_PARM_VOICE_CHANNEL parameter. If the STREAM_PARM_AUTO_DECODE parameter is active, the decode will begin (non-blocking) when this function is called.
See Also
StreamSetParameter
StreamCancel
Stream Parameters
LVSpeechPort::StreamCancel
Stops stream, sound data is discarded.
int StreamCancel();
Return Values
LV_SUCCESS
LV_FAILURE Stream not active.
Remarks
This kills the stream. It can be called to cancel a stream (particularly an auto-decode type stream) in order to start a new stream.
See Also
StreamStop
LVSpeechPort::StreamSetParameter
Sets a new value for a stream property.
int StreamSetParameter(int StreamParameter, unsigned long StreamParameterValue);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY StreamParameter does not exist.
LV_INVALID_PROPERTY_VALUE StreamParameterValue is out of range for the stream parameter.
Parameters
StreamParameter
Stream parameter to change. See Stream Parameters.
StreamParameterValue
New stream parameter value.
Remarks
Sets a stream parameter value.
See Also
StreamGetParameter
StreamSetParameterToDefault
Stream Parameters
LVSpeechPort::StreamGetParameter
Gets the current value of a stream property.
int StreamGetParameter(int StreamParameter, unsigned long* StreamParameterValue);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY StreamParameter does not exist.
Parameters
StreamParameter
Stream parameter to query. See Stream Parameters.
StreamParameterValue
Pointer that receives the current value of the stream parameter.
Remarks
Gets the current value of a stream parameter.
See Also
StreamSetParameter
StreamSetParameterToDefault
Stream Parameters
LVSpeechPort::StreamSetParameterToDefault
Sets a stream property to its default value.
int StreamSetParameterToDefault(int StreamParameter);
Return Values
LV_SUCCESS
LV_INVALID_PROPERTY Stream parameter does not exist.
Parameters
StreamParameter
Stream parameter to reset. See Stream Parameters.
Remarks
Sets a stream parameter value back to its default setting.
See Also
StreamGetParameter
StreamSetParameter
Stream Parameters
LVSpeechPort::WaitForEngineToIdle
(Deprecated in favor of LVSpeechPort::WaitForDecode.)
Blocks the client application until the port is idle (not decoding).
int WaitForEngineToIdle(int MillisecondsToWait, int VoiceChannel = -1);
Return Values
LV_SUCCESS
No errors or timeout; the engine is now idle.
LV_TIME_OUT
WaitForEngineToIdle's timeout was reached before the engine became idle.
Parameters
MillisecondsToWait
The number of milliseconds to wait before returning if the Speech Port does not become idle.
VoiceChannel
Which VoiceChannel to wait on, -1 waits on all voice channels for the port.
Remarks
This function is deprecated in favor of LVSpeechPort::WaitForDecode. To achieve the same behavior as LVSpeechPort::WaitForDecode, use property PROP_EX_DECODE_TIMEOUT, and set MillisecondsToWait to TIMEOUT_INFINITE.
Some of the LVSpeechPort methods run asynchronously, in particular Decode. WaitForEngineToIdle is primarily useful when Decode is called without LV_DECODE_BLOCK. In this case, Decode returns immediately, but continues processing the voice channel's audio data in a separate thread. Since client applications will eventually need the results, they need a way to query the port to see if Decode has finished. WaitForEngineToIdle will wait the specified time for the engine to idle; check the return value to ensure the engine is idle, indicating that decode results are available.
WaitForEngineToIdle is also useful to ensure the LVSpeechPort has finished initializing, prior to calls to Decode.
See Also
Decode
LVSpeechPort::WaitForDecode
LV_SRE_WaitForEngineToIdle
LVSpeechPort::GetNumberOfNBestAlternatives
Returns the number of n-best alternatives found by the engine.
int GetNumberOfNBestAlternatives(int voicechannel);
Return Values
Number of n-best alternatives. It will always be less than or equal to the value set for PROP_EX_MAX_NBEST_RETURNED.
Parameters
voicechannel
The channel containing the decoded audio.
See Also
PROP_EX_MAX_NBEST_RETURNED
LVSpeechPort::SwitchToNBestAlternative
LV_SRE_GetNumberOfNBestAlternatives
LVSpeechPort::SwitchToNBestAlternative
Switches the n-best alternative that is viewable. After this call, subsequent result retrieval functions, such as LVSpeechPort::GetInterpretation, will be bound to this n-best alternative.
int SwitchToNBestAlternative(int voicechannel, int index);
Return Values
LV_SUCCESS
LV_FAILURE The index is not valid.
Parameters
voicechannel
The channel containing the decoded audio.
index
The index of the n-best alternative to switch to. It may be any value in the range [0, LVSpeechPort::GetNumberOfNBestAlternatives).
Remarks
Each alternative represents a distinct sentence. However, since some sentences can have multiple interpretations or multiple parses, some alternatives may return multiple parse tree or interpretation objects. For this reason, it is recommended to retrieve all results as follows:
int nbest_count;
int nbest_total = port.GetNumberOfNBestAlternatives(vc);
int interp_count;
for (nbest_count = 0; nbest_count < nbest_total; ++nbest_count)
{
    port.SwitchToNBestAlternative(vc, nbest_count);
    int interp_total = port.GetNumberOfInterpretations(vc);
    for (interp_count = 0; interp_count < interp_total; ++interp_count)
    {
        LVInterpretation interp = port.GetInterpretation(vc, interp_count);
        /* do something with the interp */
    }
}
Even though more than one interpretation can live in a single n-best result, the same interpretation will not appear in more than one n-best result; the lower-scoring duplicates are pruned out.
See Also
LVSpeechPort::GetNumberOfNBestAlternatives
LV_SRE_SwitchToNBestAlternative
LVSpeechPort::WaitForDecode
Blocks the client application until the decode is finished.
int WaitForDecode(int VoiceChannel);
Return Values
LV_SUCCESS
No errors or timeout; the decode interaction is finished.
LV_TIME_OUT
The timeout value associated with PROP_EX_DECODE_TIMEOUT was exceeded before a result was returned from the Speech Engine. The decode was dropped from the Engine, and the LVSpeechPort may now start a new decode request.
Parameters
VoiceChannel
Which voice channel to wait on. Setting VoiceChannel equal to -1 causes a wait on all the voice channels for the port.
Remarks
Some of the API functions run asynchronously, in particular LVSpeechPort::Decode. LVSpeechPort::WaitForDecode is primarily useful when LVSpeechPort::Decode is called without LV_DECODE_BLOCK. In this case, LVSpeechPort::Decode returns immediately, but continues processing the voice channel's audio data in a separate thread. Since client applications will eventually need the results, they need a way to query the port to see if LVSpeechPort::Decode has finished. LVSpeechPort::WaitForDecode will wait the specified time (determined by the set value of PROP_EX_DECODE_TIMEOUT) for the engine to idle; check the return value to ensure the decode interaction is finished before attempting to retrieve answers from the speech port.
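A sketch of the non-blocking pattern (the Decode parameter list is abbreviated, and Port and vc are illustrative names):

```cpp
// Start a non-blocking decode (no LV_DECODE_BLOCK flag), do other work,
// then wait for the result before reading answers.
Port.Decode(vc, LV_ACTIVE_GRAMMAR_SET /* , flags without LV_DECODE_BLOCK */);
// ... other work while the decode runs on a background thread ...
if (Port.WaitForDecode(vc) == LV_SUCCESS)
{
    int n = Port.GetNumberOfInterpretations(vc);
    // retrieve interpretations here
}
```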
See Also
PROP_EX_DECODE_TIMEOUT
LVSpeechPort::Decode
LV_SRE_WaitForDecode
LVSpeechPort::SetClientPropertyEx
Sets various properties at the scope of the client process.
static int SetClientPropertyEx(int propertyname, int valuetype, void* pvalue);
Return Values
LV_SUCCESS
No errors; property is set to the value pointed to by pvalue.
LV_INVALID_PROPERTY
The property does not exist.
LV_INVALID_PROPERTY_VALUE
The property value is invalid for the designated property (e.g. out of range).
LV_INVALID_PROPERTY_TARGET
The property cannot be set for the specified target.
LV_INVALID_PROPERTY_VALUE_TYPE
The property's type is incompatible with the declared type.
Note: If more than one error occurs, which error code is returned is undefined.
Parameters
propertyname
Which property to modify.
valuetype
The value type of the property being set. Legal values are:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
PROP_EX_VALUE_TYPE_STRING
PROP_EX_VALUE_TYPE_FLOAT_PTR
Each property has a set of legal value types. See Properties.
pvalue
A pointer to the new value for propertyname. pvalue will be reinterpreted according to the value type provided.
Remarks
See Properties for a list of modifiable properties.
A client property can be modified by calling this function even before opening a port.
See Also
Properties
LV_SRE_SetPropertyEx
LVInterpretation Class
Intro To LVInterpretation
Use <LVSpeechPort.h> or <LV_SRE_Semantic.h>
Return Type Function Description
LVInterpretation (void) Constructs an LVInterpretation object.
LVInterpretation (const LVInterpretation& other) Copy constructor
LVInterpretation & operator= (const LVInterpretation& other) Assignment operator
~LVInterpretation(void) Destroys the LVInterpretation object
LVSemanticData & ResultData (void)
The result object, representing the end product of the semantic interpretation process.
const char* ResultName (void)
Returns the name of the result data for this interpretation.
const char* GrammarLabel (void)
Returns the name of the grammar as it was provided to the speech port.
const char* Mode (void) Returns the interaction mode for this answer.
const char* Language (void) Returns the language identifier for this answer.
const char* InputSentence (void) The sentence that generated this interpretation.
int Score (void) Confidence score for this interpretation.
const char* TagFormat (void) The tag format that created the Data object.
LVInterpretation: Constructing and Copying
LVInterpretation objects are fully copyable.
Functions
LVInterpretation(void)
LVInterpretation(const LVInterpretation& other_si)
LVInterpretation& operator=(const LVInterpretation& other_si)
~LVInterpretation()
Parameters
other_si
The interpretation object whose contents are being copied.
Remarks
Example
LVSpeechPort Port;
// open the port and do a decode
// ...
// when the decode is finished, grab an interpretation object
LVInterpretation Interp = Port.GetInterpretation(voicechannel, index);
// start using the interpretation data
// ...
See Also
Creating, Copying and Releasing an LVInterpretation Handle (CAPI)
ResultData
Returns a semantic data object generated by the user input and a matching grammar.
Function
const LVSemanticData& LVInterpretation::ResultData( )
Returns
An object representing the results of the semantic interpretation process.
See Also
LVSemanticData C++ API
LVInterpretation_GetResultData (C API)
ResultName
Returns the name of the result data for this interpretation. The result name is usually the root rule of the matching grammar for this interpretation.
Function
const char* LVInterpretation::ResultName ( )
See Also
LVInterpretation_GetResultName (C API)
Language
Returns the language identifier of the grammar that generated this interpretation.
Function
const char* LVInterpretation::Language( )
Returns
An RFC 3066 language identifier, such as "en-US" for United States English, or "fr" for French.
See Also
LVInterpretation_GetLanguage ( C API )
Mode
Returns the interaction mode that produced this interpretation.
Function
const char* LVInterpretation::Mode()
Returns
"voice" or "dtmf"
See Also
LVInterpretation_GetMode (C API)
TagFormat
Returns the name of the semantic process that created this interpretation.
Function
const char* LVInterpretation::TagFormat()
Returns
tag format identifier
See Also
LVInterpretation_GetTagFormat (C API)
InputSentence
Returns the input that was fed to the matching grammar to create this interpretation. It may represent the speech the Speech Engine recognized, or a dtmf sequence.
Function
const char* LVInterpretation::InputSentence()
See Also
LVInterpretation_GetInputSentence (CAPI)
GrammarLabel
Returns the name of the grammar that generated this interpretation.
Function
const char* LVInterpretation::GrammarLabel ()
Remarks
GrammarLabel will always return the name of one of the grammars you activated for decode. If the active grammar had an integer label, then the returned label will be a string representation of that integer.
See Also
LVInterpretation_GetGrammarLabel ( C API )
Score
Returns the confidence score for this interpretation.
Function
int LVInterpretation::Score()
Returns
A number between 0 and 1000. Higher numbers indicate more confidence by the speech port in this interpretation.
See Also
LVInterpretation_GetScore (C API)
LVSemanticData Class
LVSemanticData
LVSemanticData is the C++ class representing semantic data. Think of an LVSemanticData object as a container holding one of the following items:
A boolean
An integer
A floating point number
A string
A composite object
An array
Return Value Function Description
LVSemanticData( ) Constructor
LVSemanticData (const LVSemanticData& other)
Copy constructor
LVSemanticData& operator= (const LVSemanticData& other)
Assignment operator
~LVSemanticData ( ) Destructor
int Type ( ) Returns the semantic data type contained in this object.
bool GetBool ( ) If the data in this object is of type SI_TYPE_BOOL, returns the boolean value.
int GetInt ( ) If the data in this object is of type SI_TYPE_INT, returns the integer value
double GetDouble ( ) If the data in this object is of type SI_TYPE_DOUBLE, returns the floating point value.
const char* GetString ( ) If the data in this object is of type SI_TYPE_STRING, returns the string value.
LVSemanticObject GetSemanticObject ( ) If the data in this object is of type SI_TYPE_OBJECT, returns the semantic object value.
LVSemanticArray GetSemanticArray ( ) If the data in this object is of type SI_TYPE_ARRAY, returns the semantic array value.
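Putting these accessors together, a caller typically checks Type() before extracting a value. The following is a minimal sketch; the PrintValue helper is hypothetical, and it assumes the SI_TYPE_* constants from the engine headers and the accessors listed above:

```cpp
//Hypothetical helper: prints the value held by an LVSemanticData container.
//Dispatches on Type() so the matching accessor is always called.
void PrintValue(LVSemanticData Data)
{
    switch (Data.Type())
    {
    case SI_TYPE_BOOL:   cout << (Data.GetBool() ? "true" : "false") << endl; break;
    case SI_TYPE_INT:    cout << Data.GetInt() << endl;    break;
    case SI_TYPE_DOUBLE: cout << Data.GetDouble() << endl; break;
    case SI_TYPE_STRING: cout << Data.GetString() << endl; break;
    default: break; //objects and arrays require recursive handling
    }
}
```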
Type
Returns the data type contained in a given LVSemanticData object.
Function
int LVSemanticData::Type( )
Return Value
One of the seven semantic data types: SI_TYPE_NULL, SI_TYPE_BOOL, SI_TYPE_INT, SI_TYPE_DOUBLE, SI_TYPE_STRING, SI_TYPE_OBJECT, or SI_TYPE_ARRAY.
GetBool
Returns a boolean value contained in an LVSemanticData object. This function assumes that the object contains data of type SI_TYPE_BOOL. If the user calls this function when its type is not SI_TYPE_BOOL, the function always returns false.
Function
bool LVSemanticData::GetBool( )
Return Values
A boolean value.
GetInt
Returns the integer value contained in a given semantic data object. This function assumes that the data contained is of type SI_TYPE_INT. If it is not, this function always returns 0.
Function
int LVSemanticData::GetInt( )
Return Values
An integer value.
GetDouble
Returns a double precision floating point value contained in the given semantic data object. This function assumes that the contained data is of type SI_TYPE_DOUBLE . If it is not, this function always returns 0.0.
Function
double LVSemanticData::GetDouble( )
Return Values
A double.
GetString
Returns the string contained in a given LVSemanticData object. This function assumes that the contained data is of type SI_TYPE_STRING. If it is not, this function always returns NULL.
Function
const char* LVSemanticData::GetString( )
Return Values
NULL
Either the contained data is not of type SI_TYPE_STRING, or some error occurred.
Other
A pointer to a buffer containing the string.
GetSemanticObject
If the LVSemanticData object contains an element of type SI_TYPE_OBJECT, this function returns the composite object. Otherwise, it returns an empty object.
Function
LVSemanticObject LVSemanticData::GetSemanticObject ( );
Returns
A semantic object
See Also
LVSemanticObject C++ API
GetSemanticArray
If the LVSemanticData object contains an element of type SI_TYPE_ARRAY, this function returns the array object. Otherwise, it returns an empty array object.
Function
LVSemanticArray LVSemanticData::GetSemanticArray ( );
Returns
A semantic array
See Also
LVSemanticArray C++ API
LVSemanticObject Class
LVSemanticObject
LVSemanticObject represents a composite object. The user can get an LVSemanticObject by calling LVSemanticData::GetSemanticObject().
Return Types Functions Description
LVSemanticObject() Constructor
LVSemanticObject(const LVSemanticObject & other)
Copy constructor
~LVSemanticObject() Destructor
LVSemanticObject& operator = (const LVSemanticObject & other)
Assignment operator
int NumberOfProperties() Returns the number of properties in this object.
const char* PropertyName (int index)
Returns the property name corresponding to index.
LVSemanticData
PropertyValue(const char* property_name) PropertyValue(int index)
Returns the semantic data corresponding to property_name or index.
bool PropertyExists(const char* property_name)
If this object has a property named property_name, this method returns true, otherwise false.
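As a sketch of enumerating a composite object (assuming Data is an LVSemanticData whose Type() is SI_TYPE_OBJECT):

```cpp
LVSemanticObject Obj = Data.GetSemanticObject();
for (int i = 0; i < Obj.NumberOfProperties(); ++i)
{
    //print each property name; check the value's Type() before reading it
    cout << Obj.PropertyName(i) << endl;
    LVSemanticData Value = Obj.PropertyValue(i);
}
```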
NumberOfProperties
Returns the number of properties in this LVSemanticObject
Function
int LVSemanticObject::NumberOfProperties ( )
PropertyName
Returns the name of the ith property (member data) in this object.
Function
const char* LVSemanticObject::PropertyName(int i)
Parameter
i
An index between 0 and NumberOfProperties - 1
PropertyValue
Returns a property (member data) of this object.
Functions
LVSemanticData LVSemanticObject::PropertyValue(const char *property_name)
LVSemanticData LVSemanticObject::PropertyValue(int property_index)
Return Values
Returns a semantic data object. The first returns the object named property_name. The second returns the object corresponding to PropertyName(property_index)
Parameters
property_index
A number between 0 and NumberOfProperties - 1
property_name
A string containing the property name.
PropertyExists
Function
bool LVSemanticObject::PropertyExists(const char *property_name)
Return Values
Returns true if there exists a property of this object named property_name.
Parameters
property_name
A property name.
LVSemanticArray Class
LVSemanticArray
LVSemanticArray represents an array type. You can get an array out of a data type container by calling LVSemanticData::GetSemanticArray().
Return Values Functions Description
LVSemanticArray() Constructor
LVSemanticArray(const LVSemanticArray& other) Copy constructor
LVSemanticArray&
operator=(const LVSemanticArray& other) Assignment Operator
~LVSemanticArray() Destructor
int Size() Returns the number of elements in this array.
LVSemanticData operator [] (int Index)
Return the semantic data indicated by the index. If the Index does not exist, the returned semantic data will have type SI_TYPE_NULL.
LVSemanticData
At(int Index)
Return the semantic data indicated by the index. If the Index does not exist, the returned semantic data will have type SI_TYPE_NULL.
Size
Returns the size of an LVSemanticArray.
Function
int LVSemanticArray::Size( )
Operator [ ] or At
Access elements in an LVSemanticArray the way you would a conventional array.
Functions
LVSemanticData LVSemanticArray::operator [] (int index)
LVSemanticData LVSemanticArray::At(int index)
Example
LVSemanticData myData = myArray[6];
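A loop over the whole array can be sketched with Size() and At() (assuming myArray is a populated LVSemanticArray):

```cpp
for (int i = 0; i < myArray.Size(); ++i)
{
    //At returns data of type SI_TYPE_NULL for an out-of-range index,
    //but this loop stays within bounds
    LVSemanticData Element = myArray.At(i);
}
```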
LVParseTree Class
LVParseTree Class
An LVParseTree object represents the results of a decode using a context free grammar.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
See Also Using the Parse Tree Tutorial
Return Type Function Description
LVParseTree(void) Constructs an LVParseTree object.
LVParseTree(const LVParseTree& other) Copy constructor
LVParseTree& operator=(const LVParseTree& other) Assignment operator
~LVParseTree(void) Destroys the LVParseTree object
LVParseTree::Node Root (void) Provides access to the root node of the parse tree.
LVParseTree::Iterator Begin (void)
Provides an iterator that walks each node in the tree in a top-to-bottom, left-to-right fashion
LVParseTree::Iterator End (void) Marks the end of traversal for the parse tree iterator
LVParseTree::TerminalIterator TerminalsBegin (void) Traverses the terminals of the parse tree (words).
LVParseTree::TerminalIterator TerminalsEnd (void) Marks the end of traversal for the TerminalIterator.
LVParseTree::TagIterator TagsBegin (void) Traverses the tags in the parse tree (semantic data).
LVParseTree::TagIterator TagsEnd (void) Marks the end of traversal for the TagIterator.
const char* TagFormat (void)
Returns the tag format, as described by the grammar that this tree matched (e.g. "lumenvox/1.0" or "semantics/1.0")
int NumberOfTagsInHeader (void)
Returns the number of tags (semantic data) that were defined in the matching grammar's header.
const char* HeaderTag (int i) Returns the ith header tag from the matching grammar.
const char* GrammarLabel (void) Returns the name of the grammar as it was provided to the speech port.
const char* Mode (void) "voice" or "dtmf"
const char* Language (void)
Returns the language of the matching grammar (e.g. "en-US" or "es-MX")
Methods
LVParseTree Construction, Assignment and Destruction
LVParseTree objects are fully copyable and assignable.
Functions
LVParseTree()
LVParseTree(const LVParseTree& Other)
LVParseTree& operator = (const LVParseTree& Other)
~LVParseTree()
Parameters
Other
The LVParseTree object being copied
Remarks
You shouldn't have to worry too much about construction or destruction of an LVParseTree object. When you declare an LVParseTree, an empty tree is created. Just set it equal to the results of a decode, and begin using it.
Example
LVSpeechPort Port;
//open the port and do a decode
//...
//when the decode is finished, grab a parse tree from the speech port
LVParseTree Tree = Port.GetParseTree(voicechannel, index);
//start using the tree. It is valid as long as it is in scope.
See Also
Creating and Releasing an LVParseTree Handle (C API)
LVParseTree::GrammarLabel
Returns the name of the grammar that generated this tree.
Function
const char* GrammarLabel( )
Remarks
GrammarLabel( ) will always return the name of one of the grammars you activated for decode. It will be the name of the grammar that matched the speaker's input, according to the engine. If the active grammar had an integer label, then the returned label will be a string representation of that integer.
See Also
LVParseTree_GetGrammarLabel ( C API )
LVParseTree::Language
Returns the language identifier of the grammar that generated this tree.
Function
const char* Language()
Returns
An RFC 3066 language identifier, such as "en-US" for United States English, or "fr" for French.
See Also
LVParseTree_GetLanguage ( C API )
LVParseTree::Mode
Returns the interaction mode that created the tree.
Function
const char* Mode(void)
Returns
"voice" or "dtmf"
See Also
LVParseTree_GetMode (C API)
LVParseTree::TagFormat
Returns the name of the tag format declared in the matching grammar for this tree.
Function
const char* TagFormat(void)
See Also
LVParseTree_GetTagFormat (C API)
LVParseTree::Root
Gets the root parse tree node.
Function
LVParseTree::Node Root();
Return Values
An LVParseTree::Node object representing the top-level rule of the matching grammar.
Remarks
This node will always be a rule node (i.e. it will always satisfy Tree.Root().IsRule() == true). If the matching grammar specified a root rule, then this node will always represent that rule.
See Also
LVParseTree_GetRoot ( C API )
LVParseTree::Begin and LVParseTree::End
Begin and End provide iterators for visiting every node in the tree in a top-to-bottom, left-to-right descent. This traversal is the basis for the Tag and Terminal iterators.
Functions
LVParseTree::Iterator Begin ()
LVParseTree::Iterator End ()
Example
The following code prints out every node in a parse tree.
LVParseTree::Iterator Itr = Tree.Begin();
LVParseTree::Iterator End = Tree.End();
for (; Itr != End; Itr++)
{
    for (int i = 0; i < Itr->Level(); ++i)
        cout << "\t";
    if (Itr->IsRule())
        cout << "$" << Itr->RuleName() << ":" << endl;
    if (Itr->IsTag())
        cout << "{" << Itr->Text() << "}" << endl;
    if (Itr->IsTerminal())
        cout << "\"" << Itr->Text() << "\"" << endl;
}
If the grammar was the top level navigation example grammar, and the engine recognized "go back", the above code would print out:
$directive:
    "go"
    "back"
    {$ = "APPLICATION_BACK"}
See Also
LVParseTree_GetIteratorBegin and LVParseTree_GetIteratorEnd (C API)
LVParseTree::TerminalsBegin and LVParseTree::TerminalsEnd
TerminalsBegin and TerminalsEnd provide access to the "terminals" of the tree. Terminals are the words and phrases in your grammar, so a TerminalIterator gives you access to the exact words the engine heard a speaker say to match a grammar, in the order that the engine heard those words.
Functions
LVParseTree::TerminalIterator TerminalsBegin()
LVParseTree::TerminalIterator TerminalsEnd()
Example
The following code prints out the sentence the engine heard, with a word-level confidence score attached to each word.
LVParseTree::TerminalIterator Itr = Tree.TerminalsBegin();
LVParseTree::TerminalIterator End = Tree.TerminalsEnd();
for (; Itr != End; ++Itr)
{
    cout << "\"" << Itr->Text() << "\":(" << Itr->Score() << ") ";
}
cout << endl;
So if the grammar being used was the top level navigation example grammar, and the engine recognized "go back", then the output of the above code might look like:
"go":(850) "back":(901)
See Also
LVParseTree_GetTerminalIteratorBegin and LVParseTree_GetTerminalIteratorEnd (C API)
LVParseTree::TagsBegin and LVParseTree::TagsEnd
TagsBegin and TagsEnd provide iterators for visiting the tags in the tree's body.
Functions
LVParseTree::TagIterator TagsBegin ()
LVParseTree::TagIterator TagsEnd ()
Example
The following code prints out every tag in a parse tree.
LVParseTree::TagIterator Itr = Tree.TagsBegin();
LVParseTree::TagIterator End = Tree.TagsEnd();
for (; Itr != End; Itr++)
{
    cout << Itr->Text() << ";" << endl;
}
If the grammar was the top level navigation example grammar, and the engine recognized "go back", the above code would print out:
$ = "APPLICATION_BACK";
Remark
The TagIterator does not visit the tags in a tree's header. Use LVParseTree::HeaderTag to access the contents of those tags.
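The header tags can be read separately with NumberOfTagsInHeader and HeaderTag, as in this sketch (assuming Tree is a populated LVParseTree):

```cpp
//Print the tags declared in the matching grammar's header.
for (int i = 0; i < Tree.NumberOfTagsInHeader(); ++i)
{
    cout << Tree.HeaderTag(i) << endl;
}
```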
See Also
LVParseTree_GetTagIteratorBegin and LVParseTree_GetTagIteratorEnd (C API)
LVParseTree Inner Classes
LVParseTree::Node
An LVParseTree is made out of Node objects. Each node represents a word, rule, or tag that was seen by the engine as it decoded an utterance against the matching grammar.
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
Node(void) Constructs an empty node.
Node(const Node& other)
Copy constructor
LVParseTree::Node& operator=(const Node& other)
Assignment operator
~Node(void) destructor
LVParseTree::Node Parent (void)
Provides access to the parent node of this node. Note: the tree's root node has an empty parent.
LVParseTree::ChildrenIterator ChildrenBegin (void)
Traverses the immediate children of this node.
LVParseTree::ChildrenIterator ChildrenEnd (void) Marks the end of traversal for the ChildrenIterator.
LVParseTree::Iterator SubTreeBegin (void)
Provides an iterator that walks each node in the sub tree rooted by this node in a top-to-bottom, left-to-right fashion.
LVParseTree::Iterator SubTreeEnd (void) Marks the end of traversal for the parse tree iterator
LVParseTree::TerminalIterator TerminalsBegin (void)
Traverses the terminals(words) of the subtree rooted by this node.
LVParseTree::TerminalIterator TerminalsEnd (void)
Marks the end of traversal for the TerminalIterator.
LVParseTree::TagIterator TagsBegin (void)
Traverses the tags (semantic data) in the subtree rooted by this node.
LVParseTree::TagIterator TagsEnd (void) Marks the end of traversal for the TagIterator.
bool IsRule (void)
Returns true if this node represents a matched rule in a grammar. Note: rule nodes are the only nodes that can have children. The children of a rule node match the right hand side of the grammar rule that is represented by this node.
bool IsTerminal (void)
Returns true if this node represents a terminal (word) in a grammar. Note: the parent of a terminal node is always a rule in the matching grammar that contains this terminal.
bool IsTag (void)
Returns true if this node represents a tag (semantic data) in a grammar. Note: the parent of a tag node is always a rule in the matching grammar that contains this tag.
const char* Text (void)
For a rule node, this is the partial sentence that caused the rule to match. For a terminal node, this is the word that the node represents. For a tag node, this is the tag data.
const char* Phonemes (void)
For a rule node, this is the phonetic pronunciation of the partial sentence that caused the rule to match. For a terminal node, this is the phonetic pronunciation of the word that was spoken. For a tag node, this is empty.
const char* RuleName (void)
For a rule node, this is the name of the rule being represented. For a tag or terminal node, this is the name of the node's parent.
int Score (void)
For a rule node, this is the confidence of the rule being matched. For a terminal node, this is the confidence of the word being spoken. For a tag node, this is the parent rule's score.
int StartTime (void)
For a rule node, this is the start time of the first word that matched this rule (elapsed time from the start of the utterance, in milliseconds). For a terminal node, this is the start time of the word. For a tag node, this is the start time of the first word after the tag (equivalently, the end time of the last word before the tag).
int EndTime (void)
For a rule node, this is the end time of the last word that matched this rule (elapsed time from the start of the utterance, in milliseconds). For a terminal node, this is the end time of the word. For a tag node, this is the start time of the first word after the tag (equivalently, the end time of the last word before the tag).
LVParseTree::Iterator
An LVParseTree::Iterator object traverses a parse tree in a top-to-bottom, left-to-right fashion (sometimes called a pre-order traversal).
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
Iterator(void) Constructs a blank Iterator; it does not point at anything.
Iterator(const Iterator& other)
Copy constructor.
LVParseTree::Iterator& operator=(const Iterator& other)
Assignment operator.
~Iterator(void) Destructor.
LVParseTree::Iterator& operator ++ (void) pre-increments the iterator (++itr).
LVParseTree::Iterator operator ++ (int) post-increments the iterator (itr++).
const LVParseTree::Node* operator -> (void)
provides pointer-like access to the node the iterator is currently over ( e.g const char* text = itr->Text( ) )
const LVParseTree::Node& operator * (void)
provides access to the node the iterator is currently over (e.g. LVParseTree::Node n = *itr )
bool operator == (const Iterator& other)
Tests equality with another Iterator. Two Iterators are equal if they are pointing to the same node in the same tree.
bool operator != (const Iterator& other)
returns true if and only if the equality operator returns false.
LVParseTree::ChildrenIterator
An LVParseTree::ChildrenIterator Object traverses the immediate children of a rule node, from left to right. You get a ChildrenIterator object from a Node by calling
LVParseTree::Node::ChildrenBegin( )
and
LVParseTree::Node::ChildrenEnd( )
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
ChildrenIterator(void) Constructs a blank ChildrenIterator; it does not point at anything.
ChildrenIterator(const ChildrenIterator& other) Copy constructor.
LVParseTree::ChildrenIterator& operator=(const ChildrenIterator& other) Assignment operator.
~ChildrenIterator(void) Destructor.
LVParseTree::ChildrenIterator& operator ++ (void) pre-increments the iterator (++itr).
LVParseTree::ChildrenIterator operator ++ (int) post-increments the iterator (itr++).
const LVParseTree::Node* operator -> (void) provides pointer-like access to the node the iterator is currently over ( e.g const char* text = itr->Text( ) )
const LVParseTree::Node& operator * (void)
provides access to the node the iterator is currently over (e.g. LVParseTree::Node n = *itr )
bool operator==(const ChildrenIterator& other)
Tests equality with another ChildrenIterator. Two ChildrenIterators are equal if they are pointing to the same node in the same tree. (e.g if itr1 == itr2 do something)
bool operator!=(const ChildrenIterator& other)
returns true if and only if the equality operator returns false.
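As a sketch (assuming Tree is a populated LVParseTree as in the earlier examples), the immediate children of the root rule can be visited like this:

```cpp
LVParseTree::Node Root = Tree.Root();
LVParseTree::ChildrenIterator Itr = Root.ChildrenBegin();
LVParseTree::ChildrenIterator End = Root.ChildrenEnd();
for (; Itr != End; ++Itr)
{
    //each child is a word, tag, or sub-rule on the right hand side
    //of the root rule
    cout << Itr->Text() << endl;
}
```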
LVParseTree::TerminalIterator
An LVParseTree::TerminalIterator object is an adaptation of the standard LVParseTree::Iterator. It only visits the nodes in a tree that are terminals. You get a TerminalIterator by calling:
LVParseTree::Node::TerminalsBegin( ) LVParseTree::Node::TerminalsEnd( )
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
TerminalIterator(void)
Constructs a blank TerminalIterator; it does not point at anything.
TerminalIterator(const TerminalIterator& other)
Copy constructor.
LVParseTree::TerminalIterator& operator=(const TerminalIterator& other)
Assignment operator.
~TerminalIterator(void) Destructor.
LVParseTree::TerminalIterator& operator ++ (void) pre-increments the iterator (++itr).
LVParseTree::TerminalIterator operator ++ (int) post-increments the iterator (itr++).
const LVParseTree::Node* operator -> (void)
provides pointer-like access to the node the iterator is currently over ( e.g const char* text = itr->Text( ) )
const LVParseTree::Node& operator * (void)
provides access to the node the iterator is currently over (e.g. LVParseTree::Node n = *itr )
bool operator==(const TerminalIterator& other)
Tests equality with another TerminalIterator. Two TerminalIterators are equal if they are pointing to the same node in the same tree. (e.g if itr1 == itr2 do something)
bool operator!=(const TerminalIterator& other)
returns true if and only if the equality operator returns false.
LVParseTree::TagIterator
An LVParseTree::TagIterator object is an adaptation of the standard LVParseTree::Iterator. It only visits the nodes in a tree that are tags. You get a TagIterator by calling:
LVParseTree::Node::TagsBegin( ) LVParseTree::Node::TagsEnd( )
Use <LVSpeechPort.h> or <LV_SRE_ParseTree.h>
Return Type Function Description
TagIterator(void) Constructs a blank TagIterator; it does not point at anything.
TagIterator(const TagIterator& other)
Copy constructor.
LVParseTree::TagIterator& operator=(const TagIterator& other)
Assignment operator.
~TagIterator(void) Destructor.
LVParseTree::TagIterator& operator ++ (void) pre-increments the iterator (++itr).
LVParseTree::TagIterator operator ++ (int) post-increments the iterator (itr++).
const LVParseTree::Node* operator -> (void) provides pointer-like access to the node the iterator is currently over ( e.g const char* text = itr->Text( ) )
const LVParseTree::Node&
operator * (void)
provides access to the node the iterator is currently over (e.g. LVParseTree::Node n = *itr )
bool operator==(const TagIterator& other)
Tests equality with another TagIterator. Two TagIterators are equal if they are pointing to the same node in the same tree. (e.g if itr1 == itr2 do something)
bool operator!=(const TagIterator& other)
returns true if and only if the equality operator returns false.
LVGrammar Class
class LVGrammar
An LVGrammar object represents a context-free grammar that can be used in the Speech Engine to recognize speech. An LVGrammar object can also be used to test the functionality of a grammar by processing transcripts.
Use <LVSpeechPort.h> or <LV_SRE_Grammar.h>
Return Type Function Description
LVGrammar (void) Constructs an LVGrammar object.
LVGrammar (GrammarLogCB log, void* userdata)
Constructs an LVGrammar object, with an initial logging function.
LVGrammar (const LVGrammar& other)
Copy constructor.
~LVGrammar (void) Destroys the LVGrammar object.
LVGrammar& operator = (const LVGrammar& other) Assignment operator
void
RegisterLoggingCallback (GrammarLogCB log, void* userdata)
Registers a callback so the object can report warnings and errors to the grammar author.
int Reset (void) Resets a grammar object.
int SaveCompiledGrammar (const char* filename)
Save the grammar object to a binary file.
int LoadCompiledGrammar (const char* filename)
Load the grammar object from a binary file
HGRAMMAR GetHGrammar (void) Returns the underlying object handle.
int LoadGrammar (const char* location)
Loads a grammar from the location specified by the location argument.
int LoadGrammarFromBuffer (const char* contents)
Loads a grammar from a null terminated string containing the contents of the grammar.
int AddRule (const char* rulename, const char* definition)
Inserts a new rule into the grammar.
int RemoveRule (const char* rulename)
Removes a rule from the grammar.
int SetRoot (const char* rulename) Sets a starting rule for the grammar.
void SetMode (const char* mode)
Declare the mode of grammar (the style of decode to be processed). Legal arguments are "voice" or "dtmf".
const char* GetMode (void) Return the interaction mode of the grammar.
void SetLanguage (const char* language)
Specify the language of this grammar as a language/country code pair. Legal arguments include "en-US" and "es-MX".
const char* GetLanguage (void)
Return the language setting of the grammar.
void SetTagFormat (const char* tag_format)
Identify the tag format of the grammar. To use the LumenVox semantic interpretation, the tag format must be "lumenvox/1.0" or "semantics/1.0".
const char* GetTagFormat (void)
Return the tag format setting of the grammar.
int GetNumberOfMetaData (void)
Returns the number of meta data entries in the grammar.
const char* GetMetaDataKey (int index)
Returns the key of the meta data entry with the specified index.
const char* GetMetaDataValue (int index)
Returns the value of the meta data entry with the specified index.
int ParseSentence (const char* sentence)
Use the grammar to parse a sentence.
int NumberOfParses (void)
Returns the number of parses created by the most recent ParseSentence call.
LVParseTree GetParseTree (int index)
Returns the parse tree object created with a specified index
int InterpretParses (void)
Generates interpretations from the parse trees created by the most recent ParseSentence call.
int GetNumberOfInterpretations (void)
Returns the number of interpretations created by the most recent InterpretParses call.
LVInterpretation GetInterpretation (int index)
Returns the semantic interpretation with the specified index
Methods
LVGrammar Constructor/Destructor
Functions
LVGrammar()
LVGrammar(GrammarLogCB log, void* userdata)
LVGrammar(const LVGrammar& other)
~LVGrammar()
Parameters
log
Error/warning reporting callback function pointer.
userdata
A pointer to user-defined data that will be passed into the callback function.
other
Existing grammar object.
Remarks
The callback function must have the signature defined by GrammarLogCB.
See Also
LVGrammar_Create (C API)
LVGrammar_CreateFromCopy (C API)
LVGrammar_Release (C API)
LVGrammar::operator =
Assignment operator.
Function
LVGrammar& operator = (const LVGrammar& other)
Parameters
other
Existing grammar object.
See Also
LVGrammar_Copy (C API)
LVGrammar::RegisterLoggingCallback
Registers a callback so the object can report warnings and errors to the grammar author via the callback function.
Function
void RegisterLoggingCallback (GrammarLogCB log, void* userData)
Parameters
log
The logging callback function pointer.
userdata
A pointer to user-defined data associated with the grammar object. It will be passed into the callback function.
Remarks
The callback function must have the signature defined by GrammarLogCB.
See Also
LVGrammar_RegisterLoggingCallback (C API)
LVGrammar::Reset
Reset a grammar object.
Function
int Reset (void)
Return Values
LV_SUCCESS
LV_FAILURE
See Also
LVGrammar_Reset (C API)
LVGrammar::SaveCompiledGrammar
Save a grammar object to a binary file.
Function
int SaveCompiledGrammar (const char* filename)
Parameters
filename
File name.
Return Values
LV_SUCCESS
LV_FAILURE
Remarks
The saved compiled grammar can later be loaded into a grammar object with LVGrammar::LoadCompiledGrammar.
See Also
LVGrammar::LoadCompiledGrammar
LVGrammar_SaveCompiledGrammar (C API)
LVGrammar::LoadCompiledGrammar
Load a grammar object from a binary file previously saved by LVGrammar::SaveCompiledGrammar.
Function
int LoadCompiledGrammar (const char* filename)
Parameters
filename
File name.
Return Values
LV_SUCCESS
LV_FAILURE
See Also
LVGrammar::SaveCompiledGrammar
LVGrammar_LoadCompiledGrammar (C API)
LVGrammar::GetHGrammar
Returns the underlying grammar object handle.
Function
HGRAMMAR GetHGrammar (void)
Return Values
A pointer to the underlying grammar object.
Remarks
The LVGrammar class is a thin wrapper around the grammar object handle HGRAMMAR.
See Also
HGRAMMAR
LVGrammar::LoadGrammar
Loads a grammar from a local file, or from a remote file via HTTP or FTP. The grammar can be written in ABNF or XML notation.
Function
int LoadGrammar(const char* grammar_location)
Parameters
grammar_location
A file path or URI that points to a valid SRGS grammar file, such as "c:/grammars/pizza.grxml", "http://www.gramsRus.com/phonenumber.gram", or "builtin:dtmf/boolean?y=1;n=2"
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar_LoadGrammar (C API)
LVGrammar::LoadGrammarFromBuffer
Loads a grammar from a null-terminated string buffer. The grammar can be written in ABNF or XML notation.
Function
int LoadGrammarFromBuffer(const char* grammar_contents);
Parameters
grammar_contents
A null-terminated string containing the contents of a valid SRGS grammar.
Return Values
LV_SUCCESS
No errors; this grammar is now ready for use.
LV_GRAMMAR_SYNTAX_WARNING
The grammar file was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The grammar file was not understandable to the grammar compiler. You will not be able to decode with this grammar.
LV_GRAMMAR_LOADING_ERROR
The grammar compiler was unable to find the location of the grammar you loaded.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar_LoadGrammarFromBuffer (C API)
LVGrammar::AddRule
Add a rule to a grammar object.
Function
int AddRule(const char* rule_name, const char* rule_definition)
Parameters
rule_name
The name of the rule
rule_definition
The definition of the rule
Return Values
LV_SUCCESS
No errors; the rule has been successfully added.
LV_GRAMMAR_SYNTAX_WARNING
The new rule was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The new rule was not understandable to the grammar compiler. You will not be able to decode with this grammar.
Example
grammar.AddRule("foo", "hello [world]");
Is the same as writing a rule:
$foo = hello [world];
Remarks
New rules must be written in ABNF notation. Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar::RemoveRule
LVGrammar_AddRule (C API)
LVGrammar::RemoveRule
Remove a rule from a grammar object.
Function
int RemoveRule(const char* rule_name)
Parameters
rule_name
The name of the rule
Return Values
LV_SUCCESS
No errors; the rule has been successfully removed.
LV_GRAMMAR_SYNTAX_WARNING
The resulting grammar was not fully conforming, but it was understandable and is now ready to be used.
LV_GRAMMAR_SYNTAX_ERROR
The resulting grammar was not understandable to the grammar compiler. You will not be able to decode with this grammar.
Remarks
Detailed error and warning messages are sent to the grammar object's logging callback function.
See Also
LVGrammar::AddRule
LVGrammar_RemoveRule (C API)
LVGrammar::SetRoot
Identifies one of the grammar rules as the root rule. The root rule is where the engine starts its search.
Function
int SetRoot(const char* rule_name)
Parameters
rule_name
The name of the rule.
Example
grammar.SetRoot("foo");
Is the same as writing in a grammar:
root $foo;
See Also
LVGrammar_SetRoot (C API)
LVGrammar::SetMode
Set the mode property for the grammar.
Function
int SetMode(const char* mode)
Parameters
mode
The interaction mode of the grammar.
Example
grammar.SetLanguage("en-US");
grammar.SetMode("voice");
grammar.SetTagFormat("lumenvox/1.0");
Is the same as writing in your grammar:
language "en-US";
mode "voice";
tag-format <lumenvox/1.0>;
See Also
LVGrammar::GetMode
LVGrammar_SetMode (C API)
LVGrammar::SetLanguage
Set the language for the grammar.
Function
int SetLanguage(const char* language)
Parameters
language
The language identifier for the grammar
Example
grammar.SetLanguage("en-US");
grammar.SetMode("voice");
grammar.SetTagFormat("lumenvox/1.0");
Is the same as writing in your grammar:
language "en-US";
mode "voice";
tag-format <lumenvox/1.0>;
See Also
LVGrammar::GetLanguage
LVGrammar_SetLanguage (C API)
LVGrammar::SetTagFormat
Set the interpretation tag format of the grammar.
Function
int SetTagFormat(const char* tag_format)
Parameters
tag_format
The grammar's tag format.
Example
grammar.SetLanguage("en-US");
grammar.SetMode("voice");
grammar.SetTagFormat("lumenvox/1.0");
Is the same as writing in your grammar:
language "en-US";
mode "voice";
tag-format <lumenvox/1.0>;
See Also
LVGrammar::GetTagFormat
LVGrammar_SetTagFormat (C API)
LVGrammar::GetMode
Return the mode setting for the grammar.
Function
const char* GetMode(void)
Return Values
The interaction mode of the grammar.
See Also
LVGrammar::SetMode
LVGrammar_GetMode (C API)
LVGrammar::GetLanguage
Return the language setting for the grammar.
Function
const char* GetLanguage(void)
Return Values
The language identifier of the grammar.
See Also
LVGrammar::SetLanguage
LVGrammar_GetLanguage (C API)
LVGrammar::GetTagFormat
Return the interpretation tag format setting for the grammar.
Function
const char* GetTagFormat(void)
Return Values
The tag format of the grammar.
See Also
LVGrammar::SetTagFormat
LVGrammar_GetTagFormat (C API)
LVGrammar::GetNumberOfMetaData
Return the number of metadata entries contained in the grammar.
Function
int GetNumberOfMetaData(void)
Example
If the grammar has following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = grammar.GetNumberOfMetaData(); // returns 2
const char* key = grammar.GetMetaDataKey(0); // returns "description"
const char* value = grammar.GetMetaDataValue(1); // returns "05/12/2005"
See Also
LVGrammar::GetMetaDataKey
LVGrammar::GetMetaDataValue
LVGrammar_GetNumberOfMetaData (C API)
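Assuming only the three calls documented above, a bounds-checked loop over all metadata entries might look like the following sketch. MockGrammar is a hypothetical stand-in used so the example is self-contained; a real application would call these methods on a loaded LVGrammar from the SDK.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an LVGrammar that has parsed two meta
// declarations; it only mirrors the three documented calls.
struct MockGrammar {
    std::vector<std::pair<std::string, std::string>> meta{
        {"description", "example grammar"}, {"date", "05/12/2005"}};
    int GetNumberOfMetaData() const { return static_cast<int>(meta.size()); }
    const char* GetMetaDataKey(int i) const {
        return (i >= 0 && i < GetNumberOfMetaData()) ? meta[i].first.c_str() : nullptr;
    }
    const char* GetMetaDataValue(int i) const {
        return (i >= 0 && i < GetNumberOfMetaData()) ? meta[i].second.c_str() : nullptr;
    }
};

// Collect every key/value pair, staying inside [0, GetNumberOfMetaData).
std::vector<std::pair<std::string, std::string>> AllMetaData(const MockGrammar& g) {
    std::vector<std::pair<std::string, std::string>> out;
    for (int i = 0; i < g.GetNumberOfMetaData(); ++i)
        out.emplace_back(g.GetMetaDataKey(i), g.GetMetaDataValue(i));
    return out;
}
```

Keeping the index inside the documented half-open range avoids the null returns described for GetMetaDataKey and GetMetaDataValue.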
LVGrammar::GetMetaDataKey
Return the key of the metadata entry indicated by the index.
Function
const char* GetMetaDataKey(int index)
Parameters
index
Index of the metadata entry. It should be in the range [0, LVGrammar::GetNumberOfMetaData).
Return Values
null
The index is not valid.
non-null
A pointer to the key string.
Example
If the grammar has following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = grammar.GetNumberOfMetaData(); // returns 2
const char* key = grammar.GetMetaDataKey(0); // returns "description"
const char* value = grammar.GetMetaDataValue(1); // returns "05/12/2005"
See Also
LVGrammar::GetNumberOfMetaData
LVGrammar::GetMetaDataValue
LVGrammar_GetMetaDataKey (C API)
LVGrammar::GetMetaDataValue
Return the value of the metadata entry indicated by the index.
Function
const char* GetMetaDataValue(int index)
Parameters
index
Index of the metadata entry. It should be in the range [0, LVGrammar::GetNumberOfMetaData).
Return Values
null
The index is not valid.
non-null
A pointer to the value string.
Example
If the grammar has following lines:
meta 'description' is 'example grammar';
meta 'date' is '05/12/2005';
You can access meta data as follows:
int count = grammar.GetNumberOfMetaData(); // returns 2
const char* key = grammar.GetMetaDataKey(0); // returns "description"
const char* value = grammar.GetMetaDataValue(1); // returns "05/12/2005"
See Also
LVGrammar::GetNumberOfMetaData
LVGrammar::GetMetaDataKey
LVGrammar_GetMetaDataValue (C API)
LVGrammar::ParseSentence
Use a loaded grammar object to parse a sentence.
Function
int ParseSentence(const char* sentence)
Parameters
sentence
The sentence to parse.
Return Values
0
The sentence is not covered by the grammar.
non-0
The number of distinct parses.
Example
Assume a grammar was defined as:
root $yes_no;
$yes_no = $yes | $no;
$yes = yes [please];
$no = no [thank you];
You can use this grammar to validate sentences as follows:
int count1 = grammar.ParseSentence("no thank you"); // returns 1
int count2 = grammar.ParseSentence("no thanks"); // returns 0
Remarks
With this function, you can identify how well a grammar covers your targeted transcript set.
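The coverage check described above can be sketched as a loop over a transcript set. YesNoGrammar here is a hypothetical stand-in that hard-codes the yes/no grammar from the example, so the sketch is self-contained; a real application would call ParseSentence on a loaded LVGrammar.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded yes/no grammar; the real
// LVGrammar::ParseSentence is provided by the SDK.
struct YesNoGrammar {
    int ParseSentence(const std::string& s) const {
        return (s == "yes" || s == "yes please" ||
                s == "no"  || s == "no thank you") ? 1 : 0;
    }
};

// Fraction of transcripts the grammar covers (parse count > 0).
double Coverage(const YesNoGrammar& g, const std::vector<std::string>& transcripts) {
    if (transcripts.empty()) return 0.0;
    int covered = 0;
    for (const auto& t : transcripts)
        if (g.ParseSentence(t) > 0) ++covered;
    return static_cast<double>(covered) / transcripts.size();
}
```

A low coverage number usually means the grammar needs more optional fillers or alternate phrasings for the utterances callers actually produce.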
See Also
LVGrammar::GetNumberOfParses
LVGrammar::GetParseTree
LVGrammar_ParseSentence (C API)
LVGrammar::GetNumberOfParses
Return the number of parses created by the most recent call of LVGrammar::ParseSentence.
Function
int GetNumberOfParses(void)
Return Values
0
The sentence is not covered by the grammar.
non-0
The number of distinct parses.
Remarks
This function can be used after a call to LVGrammar::ParseSentence. It is provided as a convenience; it returns the same value as LVGrammar::ParseSentence.
See Also
LVGrammar::ParseSentence
LVGrammar::GetParseTree
LVGrammar_GetNumberOfParses (C API)
LVGrammar::GetParseTree
Return the parse tree object with the specified index.
Function
LVParseTree GetParseTree(int index)
Parameters
index
The index of the parse tree handle to be returned. It should be in the range [0, LVGrammar::GetNumberOfParses).
Return Values
null
The index is not valid.
non-null
The parse tree handle.
Remarks
This function should be used after a call to LVGrammar::ParseSentence.
See Also
LVGrammar::ParseSentence
LVGrammar::GetNumberOfParses
LVGrammar_CreateParseTree (C API)
LVGrammar::InterpretParses
Generate semantic interpretation results from parses created by previous calls to LVGrammar::ParseSentence.
Function
int InterpretParses(void)
Return Values
integer (>=0)
Number of available interpretations.
Remarks
Before calling this function, you must first call LVGrammar::ParseSentence on the grammar object; otherwise, the grammar object does not contain any parse tree information.
See Also
LVGrammar::ParseSentence
LVGrammar::GetNumberOfInterpretations
LVGrammar::GetInterpretation
LVGrammar_InterpretParses (C API)
LVGrammar::GetNumberOfInterpretations
Return the number of semantic interpretations created by the most recent call to LVGrammar::InterpretParses.
Function
int GetNumberOfInterpretations(void)
Return Values
integer (>=0)
Number of available interpretations.
Remarks
This function can be used after a call to LVGrammar::InterpretParses. It is provided as a convenience; it returns the same value as LVGrammar::InterpretParses.
See Also
LVGrammar::InterpretParses
LVGrammar::GetInterpretation
LVGrammar_GetNumberOfInterpretations (C API)
LVGrammar::GetInterpretation
Returns the semantic interpretation handle indicated by the index.
Function
LVInterpretation GetInterpretation (int index)
Parameters
index
The index of the interpretation handle to be returned. It should be in the range [0, LVGrammar::GetNumberOfInterpretations).
Return Values
null
The index is not valid.
non-null
The interpretation handle.
Remarks
This function should be used after a call to LVGrammar::InterpretParses.
See Also
LVGrammar::InterpretParses
LVGrammar::GetNumberOfInterpretations
LVGrammar_CreateInterpretation (C API)
Callback Functions
Logging Callback Function
typedef void (*ExportLogMsg)(const char* String, void* p)
The callback function is called by the speech port with informational and error messages. It is the second parameter to LV_SRE_OpenPort and LV_SRE_RegisterAppLogMsg, and the first parameter to LVSpeechPort::OpenPort.
p is a pointer to a user-defined object that can be used to customize behavior when the engine sends logging messages to the callback.
See Also
LV_SRE_OpenPort
LV_SRE_RegisterAppLogMsg
LVSpeechPort::OpenPort
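The user-data pattern described above can be sketched as follows. The typedef is copied from this page; AppLogger and OnLogMsg are hypothetical application-side names chosen for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Signature as documented above.
typedef void (*ExportLogMsg)(const char* String, void* p);

// Hypothetical application-side log sink; the engine never sees this type,
// it only carries the void* back to the callback.
struct AppLogger {
    std::vector<std::string> lines;
};

// Free function with the required signature; recovers the logger from p.
void OnLogMsg(const char* String, void* p) {
    static_cast<AppLogger*>(p)->lines.emplace_back(String);
}
```

The address of an AppLogger would be passed as the user-data argument when opening the port, and every message the engine logs then lands in that instance.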
Streaming Callback Function
typedef void (*LV_SRE_StreamStateChangeFn)(long NewState, unsigned long TotalBytes, unsigned long RecordedBytes, void* UserData)
The callback function is called by the speech port each time the stream status changes. Primarily this is used with streams performing barge-in detection and/or end-of-speech detection, to notify the hardware to stop playing a prompt (barge-in) or to stop recording the user (end-of-speech).
Parameters
NewState
New state of stream. See Stream Status.
TotalBytes
Total bytes streamed (at the point of the stream status change); more sound data may still be in the internal unprocessed queue.
RecordedBytes
Total bytes minus data discarded before barge-in was detected.
UserData
Pointer to application-defined data.
See Also
LV_SRE_StreamSendData
LV_SRE_StreamGetStatus
Grammar Logging Callback Function
typedef void (*GrammarLogCB)(const char* message, int error_level, void* user_data)
The callback function is called by the LVGrammar object when an error or warning is generated during the grammar compilation process. The types of errors which can be passed through the callback via the error_level parameter are:
LV_GRAMMAR_LOADING_ERROR -- the grammar could not be loaded from the location provided.
LV_GRAMMAR_SYNTAX_ERROR -- one or more rules or statements in the grammar were badly formed. The message parameter provides more detailed information.
LV_GRAMMAR_SYNTAX_WARNING -- one or more statements in the grammar were either missing or not strictly conforming to specifications, but the grammar builder was able to recover. The message parameter provides more detailed information.
user_data is a pointer to a user-defined object that can be used to customize behavior when the LVGrammar object sends logging messages through the callback.
See Also
LVGrammar_RegisterLoggingCallback
LVGrammar::RegisterLoggingCallback
Constants
Decoder Flags
The engine accepts several different flags for use when calling LV_SRE_Decode (C API) and LVSpeechPort::Decode (C++ API). The flags can be bitwise OR'd ( "|" ) to customize behavior.
LV_DECODE_BLOCK
Normally, calls to the decode function/method will immediately return to allow the client application to continue working on other tasks while the engine processes the data. This flag blocks the client application until the engine has finished.
LV_DECODE_GENDER_MALE
LV_DECODE_GENDER_FEMALE
LV_DECODE_GENDER_MALE and LV_DECODE_GENDER_FEMALE identify which gender's acoustic model to use during decode. If neither flag is specified, the engine automatically decodes each audio file against both gender models. While this slows the engine by requiring two decodes, evaluating against both models has a very significant positive effect on recognition accuracy. Since the engine is multi-threaded, do not use these flags unless CPU load is a serious issue.
LV_DECODE_FIRST_TIME_USER
Reset caller weights in Recognition Engine (not implemented).
LV_DECODE_USE_OOV
Use the Out-Of-Vocabulary (OOV) filter during decode. The OOV filter, when set, processes each audio file against both the grammar specified by the client application and a special grammar which detects words not in the grammar. If the engine detects OOV words, it will not return them. Generally, the OOV filter slows the engine down without a large gain in accuracy, so client applications should use the filter only if OOV words seem to be a problem.
LV_DECODE_RETURN_EACH_DIGIT
When using standard grammars, a string of digits, a monetary value, etc. is passed back as a single concept. If this flag is used, each digit comes back as a separate concept. (Since each concept has a confidence score, this can be useful for identifying poorly recognized individual digits.)
LV_DECODE_SRGS_GRAMMAR
Normally, you do not need to use this flag. But if you want to use a concept-phrase grammar as an SRGS grammar, and are not using the LV_ACTIVE_GRAMMAR_SET, this flag is necessary.
LV_DECODE_SEMANTIC_INTERPRETATION
This flag tells the decoder to process the parse tree return type for semantic information in the tree's tags.
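The flags above are combined with bitwise OR before being passed to the decode call. The documentation does not list the numeric flag values, so the bit positions below are illustrative assumptions, not the SDK's real constants; in a real application the values come from the LumenVox header.

```cpp
#include <cassert>

// Assumed bit positions, for illustration only; the real values are
// defined by the LumenVox SDK header.
enum DecodeFlag : unsigned {
    LV_DECODE_BLOCK                   = 1u << 0,
    LV_DECODE_GENDER_MALE             = 1u << 1,
    LV_DECODE_GENDER_FEMALE           = 1u << 2,
    LV_DECODE_USE_OOV                 = 1u << 3,
    LV_DECODE_RETURN_EACH_DIGIT       = 1u << 4,
    LV_DECODE_SEMANTIC_INTERPRETATION = 1u << 5,
};

// Flags are combined with bitwise OR ("|")...
unsigned MakeFlags() {
    return LV_DECODE_BLOCK | LV_DECODE_SEMANTIC_INTERPRETATION;
}

// ...and tested with bitwise AND ("&").
bool HasFlag(unsigned flags, DecodeFlag f) { return (flags & f) != 0; }
```

The combined value would then be passed as the flags argument of LV_SRE_Decode or LVSpeechPort::Decode.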
Error Codes
0 LV_SUCCESS No errors.
-1 LV_FAILURE General failure.
-2 LV_SYSTEM_ERROR The speech recognition engine is no longer running. This is the result of a ClosePort call or an unrecoverable engine error.
-4 LV_BAD_SOUND_DATA There was a problem with the sound data.
-5 LV_INVALID_SOUND_FORMAT The sound format value is not one of the allowable formats.
-6 LV_TIME_OUT WaitForEngineToIdle's timeout was reached before the engine became idle. Losing the connection to an engine server during a decode may also return this error code.
-7 LV_GRAMMAR_SET_OUT_OF_RANGE The grammar set value is out of expected range (0-63).
-8 LV_SOUND_CHANNEL_OUT_OF_RANGE The sound channel value is out of the expected range.
-9 LV_STANDARD_GRAMMAR_ALREADY_LOADED Only one standard grammar can be loaded for a grammar set.
-10 LV_STANDARD_GRAMMAR_OUT_OF_RANGE The standard grammar value is not a recognized grammar type.
-11 LV_NOT_A_VALID_PROPERTY_VALUE The property value is not valid for the designated property.
-12 LV_BAD_HPORT The specified port handle is not valid.
-13 LV_NOT_IMPLEMENTED The action was not implemented in the current version.
-14 LV_SOCKETS_ERROR General network communication error.
-15 LV_INVALID_PROPERTY_TARGET The target type used in a call to LV_SRE_SetPropertyEx() is invalid for the property given.
-16 LV_INVALID_PROPERTY_VALUE_TYPE The value type used in a call to LV_SRE_SetPropertyEx() is invalid for the property given.
-17 LV_INVALID_PROPERTY The property supplied in a call to LV_SRE_SetPropertyEx() or LV_SRE_SetProperty() is invalid.
-18 LV_INVALID_PROPERTY_TARGET_NDX When calling LV_SRE_SetPropertyEx() with a target type of PROP_EX_TARGET_CHANNEL or PROP_EX_TARGET_GRAMMAR, the index value was out of range.
-19 LV_STREAM_NOT_ACCEPTED A stream function was called on a stopped stream.
-20 LV_FUNCTION_NOT_FOUND LVSpeechPort_stdcall.dll is a wrapper DLL around LVSpeechPort.dll. If a newer version of the standard call DLL is used, it may not find a function in LVSpeechPort.dll.
-21 LV_STRING_BUFFER_TOO_SMALL The application supplied string buffer was too small.
-22 LV_NO_SERVER_AVAILABLE No engine servers were found to connect to.
-23 LV_GRAMMAR_SYNTAX_WARNING The grammar contained a syntax warning in one or more of its rules or declarations. A specific message from the grammar builder has been logged. The grammar was successfully built, despite the warning.
-24 LV_GRAMMAR_SYNTAX_ERROR The grammar contained a syntax error in one or more of its rules or declarations. A specific message from the grammar builder has been logged. The grammar was not built.
-25 LV_GRAMMAR_LOADING_ERROR The grammar could not be loaded because the specified URL was invalid.
-26 LV_OPEN_PORT_FAILED__LICENSE_EXCEEDED Cannot open the port because the number of ports allowed by the license has been exceeded.
-31 LV_GLOBAL_GRAMMAR_TRANSACTION_PARTIAL_ERROR Global grammar operation failed on some of the servers.
-32 LV_GLOBAL_GRAMMAR_TRANSACTION_ERROR Global grammar operation failed on all servers.
Note:
Not all the error codes are implemented.
Properties
#define PROP_EX_SAVE_SOUND_FILES 2
#define PROP_EX_LANGUAGE 3
#define PROP_EX_SRE_SERVERS 4
#define PROP_EX_CHOOSE_MODEL 8
#define PROP_EX_SET_SERVER_IP 10
#define PROP_EX_SET_SERVER_PORT 11
#define PROP_EX_SEARCH_BEAM_WIDTH 12
#define PROP_EX_CONCEPT_REPETITION_MIN 13
#define PROP_EX_CONCEPT_REPETITION_MAX 14
#define PROP_EX_ENABLE_LATTICE_CONFIDENCE_SCORE 15
#define PROP_EX_MAX_NBEST_RETURNED 16
#define PROP_EX_DECODE_TIMEOUT 17
#define PROP_EX_MOD_SEL_LOW_THLD 18
#define PROP_EX_MOD_SEL_HIGH_THLD 19
PROP_EX_SAVE_SOUND_FILES
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_PORT
Default Value: 1
Save request and answer files to disk.
Setting this property to 1 saves the request and answer files for each call to Decode to LVLANG\Responses (Win32) or LVRESPONSES/Responses (Linux). Setting it to 0 stops saving the files. Turning this property on can quickly fill up a hard drive, but it is invaluable for troubleshooting and tuning the application.
PROP_EX_LANGUAGE
Value Types:
PROP_EX_VALUE_TYPE_STRING
Targets: PROP_EX_TARGET_PORT
Default Value: "AmericanEnglish"
The language model to use for decodes.
PROP_EX_SRE_SERVERS
Value Types:
PROP_EX_VALUE_TYPE_STRING
Targets: PROP_EX_TARGET_CLIENT
Default Value: "127.0.0.1:5000"
The list of Speech Engine servers that will handle decodes for this client: a comma (or semicolon) delimited list of IP addresses (and ports) the client will attempt to connect to. Use a colon to separate IPs and ports. 5000 is the default port.
Example: "127.0.0.1;10.0.0.1:5001;10.10.0.1" The client will attempt to attach to the local machine, port 5000; IP address "10.0.0.1", port 5001; and IP address "10.10.0.1", port 5000.
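The list format above can be sketched as a small parser. This is an application-side illustration of the "ip[:port]" convention with the default port of 5000; the client library does its own parsing internally.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Parse a PROP_EX_SRE_SERVERS-style list ("ip[:port]" entries separated
// by ',' or ';') into host/port pairs, defaulting the port to 5000.
std::vector<std::pair<std::string, int>> ParseServerList(const std::string& list) {
    std::vector<std::pair<std::string, int>> servers;
    std::string entry;
    auto flush = [&] {
        if (entry.empty()) return;
        std::string::size_type colon = entry.find(':');
        if (colon == std::string::npos)
            servers.emplace_back(entry, 5000);  // no port given: default 5000
        else
            servers.emplace_back(entry.substr(0, colon),
                                 std::stoi(entry.substr(colon + 1)));
        entry.clear();
    };
    for (char c : list) {
        if (c == ',' || c == ';') flush();
        else entry += c;
    }
    flush();
    return servers;
}
```

Applied to the documented example string, this yields the three host/port pairs described in the text.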
PROP_EX_SEARCH_BEAM_WIDTH
Value Types:
PROP_EX_VALUE_TYPE_FLOAT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 1e-6
The beam controls how thorough the Speech Engine's search is. Legal values range from 0.0 to 1.0. The smaller the value, the more thorough the search, leading to potentially more accurate but also more time-intensive searches. Use the default at first, and only experiment with this value while tuning your application for speed and accuracy. Make small changes only: for instance, try going from 1e-6 to 1e-9, but not 1e-30.
PROP_EX_CONCEPT_REPETITION_MIN
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_GRAMMAR
Default Value: 1
PROP_EX_CONCEPT_REPETITION_MAX
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_GRAMMAR
Default Value: -1 (infinity)
PROP_EX_CONCEPT_REPETITION_MIN and PROP_EX_CONCEPT_REPETITION_MAX control the repeat count of concepts in a concept/phrase grammar. They have no effect on SRGS grammars. A grammar such as:
concept "topping" = "pepperoni | olives | sausage | onions | peppers"
With MIN=1 MAX=5, is equivalent to an SRGS grammar
root $toppings;
$toppings = $topping<1-5>;
$topping = (pepperoni | olives | sausage | onions | peppers);
PROP_EX_ENABLE_LATTICE_CONFIDENCE_SCORE
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 1
The lattice-based confidence score is slightly slower, but more accurate. Set this property to 0 to turn off the score.
PROP_EX_CHOOSE_MODEL
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
Default Value: 0
If this property is set to 1, then the client will decide which acoustic model is most appropriate for the server to use, based on a frequency analysis of the speaker's voice. Otherwise, two decodes will be done simultaneously, and an answer will be selected based on which model had better "coverage" for the speaker's voice.
PROP_EX_MOD_SEL_LOW_THLD
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 135 Hz
PROP_EX_MOD_SEL_HIGH_THLD
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 155 Hz
When the PROP_EX_CHOOSE_MODEL property is set to 1, the engine uses the pitch of the input audio to determine which acoustic model to use. If the pitch is lower than PROP_EX_MOD_SEL_LOW_THLD, the low pitch model is used, while a pitch higher than PROP_EX_MOD_SEL_HIGH_THLD selects the high pitch model. Any value that falls in between causes the engine to use both models.
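The selection rule above can be restated as a small function, using the documented defaults (low threshold 135 Hz, high threshold 155 Hz). The real decision is made inside the engine; this sketch only restates the rule for clarity.

```cpp
#include <cassert>
#include <string>

// Model selection as described: below the low threshold use the low pitch
// model, above the high threshold use the high pitch model, otherwise both.
std::string SelectModel(int pitch_hz, int low_thld = 135, int high_thld = 155) {
    if (pitch_hz < low_thld)  return "low";
    if (pitch_hz > high_thld) return "high";
    return "both";
}
```

Note that the "both" band between the two thresholds trades some speed for accuracy, just as decoding against both gender models does.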
PROP_EX_MAX_NBEST_RETURNED
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 1
The maximum number of n-best results the engine can return. This property must be an integer greater than or equal to 1.
PROP_EX_DECODE_TIMEOUT
Value Types:
PROP_EX_VALUE_TYPE_INT
PROP_EX_VALUE_TYPE_INT_PTR
Targets: PROP_EX_TARGET_CLIENT
PROP_EX_TARGET_PORT
PROP_EX_TARGET_CHANNEL
Default Value: 1
The timeout value used by the LV_SRE_WaitForDecode and LVSpeechPort::WaitForDecode functions.
Sound Formats
enum SOUND_FORMAT {
    UNK_FORMAT = 0,
    ULAW_8KHZ,
    PCM_8KHZ,
    PCM_16KHZ,
    ALAW_8KHZ,
};
ULAW_8KHZ
µ-law format at 8000 samples per second, 1 byte per sample. One minute of sound occupies approximately 0.5 MB of memory. This is the standard domestic telephone format.
PCM_8KHZ
Pulse code modulated audio at 8000 samples per second, 2 bytes per sample. One minute of sound occupies approximately 1 MB of memory.
PCM_16KHZ
Pulse code modulated audio at 16000 samples per second, 2 bytes per sample. One minute of sound occupies approximately 2 MB of memory. This is the native format of the SRE.
ALAW_8KHZ
A-law format at 8000 samples per second, 1 byte per sample. One minute of sound occupies approximately 0.5 MB of memory. This is the standard international telephone format.
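The per-minute memory figures above follow directly from sample rate and sample width: samples per second x bytes per sample x 60 seconds.

```cpp
#include <cassert>

// Approximate memory use for one minute of audio in a given format.
unsigned long BytesPerMinute(unsigned sample_rate, unsigned bytes_per_sample) {
    return static_cast<unsigned long>(sample_rate) * bytes_per_sample * 60;
}
```

For example, ULAW_8KHZ and ALAW_8KHZ (8000 x 1) come to 480,000 bytes per minute (about 0.5 MB), PCM_8KHZ (8000 x 2) to 960,000 (about 1 MB), and PCM_16KHZ (16000 x 2) to 1,920,000 (about 2 MB), matching the figures quoted above.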
Note:
Support for more formats, in particular the standard Windows wave format, will be added in the near future.
Standard Grammars
These grammars are deprecated in favor of built-in SRGS grammars.
The standard grammars are built-in grammars predefined by LumenVox. Using these grammars will return a single concept, formatted appropriately. Only one standard grammar can be active at a time, and no concepts can be removed from a standard grammar. The client application can, however, add concepts to and remove concepts from the voice channel grammar, which will coexist with the standard grammar.
1 GRAMMAR_DIGITS
A string of single digits, like a phone number or PIN code. In version 4.0, digits use a separate acoustic model and so only the words one, two, three, four, five, six, seven, eight, nine, zero, and oh are recognized. It ignores the application-supplied grammar and cannot currently recognize things like "twenty-five" or "seventeen". This allowed us to obtain an extremely low error rate. The number grammar can be used to mix an application grammar with digit recognition.
2 GRAMMAR_MONEY
Monetary value.
3 GRAMMAR_NUMBER
Numeric value like 12,000, 24.45 or 35.
4 GRAMMAR_LETTERS
Letters of alphabet for spelling (not implemented).
5 GRAMMAR_DATE
Date values (not implemented).
Semantic Data Type
There are seven semantic data types. They are defined as macros in <LV_SRE_Semantic.h>.
SI_TYPE_BOOL
SI_TYPE_INT
SI_TYPE_DOUBLE
SI_TYPE_STRING
SI_TYPE_OBJECT
SI_TYPE_ARRAY
SI_TYPE_NULL
Note: SI_TYPE_NULL is a special type which usually indicates that some error occurred.
Semantic Data Print Format
These macros are used in the SI_DATA_Print() function to specify the printing format.
SI_FORMAT_XML: primitive data types are printed as string literals; objects and arrays are printed as a collection of XML key/value pairs.
SI_FORMAT_ECMA: primitive data types are printed as string literals; objects and arrays are printed as ECMAScript objects.
Stream Parameters
STREAM_PARM_SOUND_FORMAT
Sound format of the stream; uses the SOUND_FORMAT enum.
Default: ULAW_8KHZ

STREAM_PARM_VOICE_CHANNEL
Voice channel to load streamed sound data to.
No default; the application must set this.

STREAM_PARM_GRAMMAR_SET
Grammar set to use with auto-decode type streams.
No default; the application must set this if STREAM_PARM_AUTO_DECODE is active.

STREAM_PARM_DECODE_FLAGS
Decode flags to send with auto-decode type streams.
No default; the application must set this if STREAM_PARM_AUTO_DECODE is active.

STREAM_PARM_USE_COMPRESSION
Use compression internally for sound data. Data sent to the Speech Engine and data stored to disk will be compressed to approximately 10% of normal size; this adds a small amount of load to the CPU.
Default: 0 (off)

STREAM_PARM_DETECT_BARGE_IN
If active, the speech port will discard stream data until barge-in is detected.
Default: 0 (off)

STREAM_PARM_DETECT_END_OF_SPEECH
If active, the port will stop accepting stream data once end-of-speech is detected, and change the stream status to STREAM_STATUS_END_SPEECH. If auto-decode is also active, decoding will begin immediately as well.
Default: 0 (off)

STREAM_PARM_AUTO_DECODE
If active, decoding will start immediately on end-of-speech detection or a call to StopStream(); otherwise the application needs to call Decode to begin decoding.
Default: 0 (off)
STREAM_PARM_BARGE_IN_TIMEOUT
The streaming interface will flag STREAM_STATUS_BARGE_IN_TIMEOUT if no speech was detected in the time frame specified by this property.

STREAM_PARM_END_OF_SPEECH_TIMEOUT
After barge-in, the streaming interface will flag STREAM_STATUS_END_SPEECH_TIMEOUT if it did not detect end-of-speech in the time frame specified by this property.

STREAM_PARM_USE_FREQ_VAD
The LumenVox Speech Engine API provides two Voice Activity Detection (VAD) algorithms: Time-domain VAD (TVAD) and Frequency-domain VAD (FVAD). While TVAD is faster, FVAD has better performance and more flexibility. Set this parameter to 1 to enable FVAD, or 0 to use TVAD. The default value is 1. Note: each algorithm has its own set of parameters; make sure to use the correct parameters in your code. Each VAD parameter below is tagged with the algorithm it works with.

STREAM_PARM_BARGE_IN_BEGIN_DELAY <TVAD>
The number of 1/8-second intervals at the beginning of the stream during which barge-in is limited; during this period a much higher energy level is required to trigger barge-in. This can be useful when echo-cancelled data streamed to the port needs time for convergence.
Default = 4 (0.5 seconds)

STREAM_PARM_BARGE_IN_NOISE_COUNT_LOW_THRESHOLD <TVAD>
Adjusts the signal strength required to trigger barge-in (and end-of-speech); a lower number will trigger barge-in at a lower volume. If dynamic barge-in adjustment is used, this is the initial value.
Default = 55 (optimal for telephony applications)

STREAM_PARM_BARGE_IN_DYNAMIC_ADJUST <TVAD>
Adjusts the volume trigger for barge-in dynamically. This works best when the audio data sent to a port is from the same source, and works better still if the EVENT_START_DECODE_SEQ and EVENT_END_DECODE_SEQ events are sent to the port to signify a change of audio source (for example, a new telephony call beginning).
Default = 1 (on)
STREAM_PARM_VAD_BARGEIN_LVL <FVAD> This is Signal-Noise-Ratio (SNR) threshold. An audio frame will be considered for voice activity only when the SNR metric is higher than this threshold. Lower this parameter for noisy channel, so that it is easier to barge in. The default value is 30. Note: this value is not a measurement in dB. It is just a relative value compared to an internal standard.
STREAM_PARM_VAD_EOS_DELAY <FVAD> End-of-speech delay in ms. The default value is 800ms.
STREAM_PARM_VAD_INIT_TIME <FVAD> The FVAD needs to be initialized properly to optimize the performance. The parameter sets the duration of initialization time at the beginning at each audio stream. The default value is 100ms.
STREAM_PARM_VAD_NOISE_FLOOR <FVAD> An audio frame will be considered for voice activity only when the average energy is higher than this threshold. The default value is 0. This parameter is particularly useful when the echo canceler doesn't work very well. When channel noise, background noise or residual echo causes false barge-in, try to raise this threshold to prevent low energy signal from triggering barge-in. The range is from 0 to 999, but in practice you probably won't need to set it above 200.
STREAM_PARM_VAD_WIND_BACK <FVAD> The length of audio to be wound back at the beginning of voice activity. It helps in the situation of weak speech onset. The resolution of this parameter is 1/8 sec, i.e. 125ms, which means setting this value to 249ms is same as setting it to 125ms. The default value is 250ms.
STREAM_PARM_VAD_BURST_THLD <FVAD> The FVAD algorithm triggers barge-in only after it has observed the duration of voice longer than this threshold. This threshold helps preventing bursting noise from triggering barge-in. The default value is 100ms.
STREAM_PARM_VAD_P2A_THLD <FVAD> An audio frame will be considered for voice activity only when the ratio of peak frequency-band energy to average energy is higher than this threshold. This is a fine-tuning parameter; most users will not need to modify it. The valid range is [0,1000]. The default value is 100.
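As a sketch of how the noise-related parameters above interact, the following tunes a noisy channel using a hypothetical stand-in setter (the real LumenVox call for applying stream parameters is not shown here, and the parameter IDs below are illustrative placeholders):

```c
/* Hypothetical parameter IDs and setter, standing in for the real API. */
enum { PARM_VAD_BARGEIN_LVL, PARM_VAD_NOISE_FLOOR, PARM_COUNT };

static int g_parms[PARM_COUNT];

static void set_stream_param(int parm, int value)
{
    g_parms[parm] = value;  /* a real application would call the SDK here */
}

/* For a noisy channel: lower the SNR threshold so it is easier to barge
 * in, and raise the noise floor so low-energy noise does not trigger
 * false barge-in. */
static void tune_for_noisy_channel(void)
{
    set_stream_param(PARM_VAD_BARGEIN_LVL, 20);  /* default is 30 */
    set_stream_param(PARM_VAD_NOISE_FLOOR, 150); /* default 0; rarely above 200 */
}
```

The two adjustments pull in opposite directions deliberately: the lower SNR threshold keeps barge-in responsive while the raised noise floor filters out low-energy noise.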
Printed Documentation
Stream Status
STREAM_STATUS_NOT_READY
LV_SRE_StreamStart has not been called for this port.
STREAM_STATUS_READY
Stream is ready to accept data.
STREAM_STATUS_BARGE_IN
Only returned if the STREAM_PARM_DETECT_BARGE_IN stream type is set. The engine has determined that speech has started, and stream data is now being stored. (Hardware can stop playing audio when this state is reached.)
STREAM_STATUS_END_SPEECH
Only returned if the STREAM_PARM_DETECT_END_OF_SPEECH stream type is set. The engine has determined that speech has stopped. If the STREAM_PARM_AUTO_DECODE stream type has been set, decoding of the audio data has begun. (Hardware can stop recording audio when this state is reached.)
STREAM_STATUS_STOPPED
Stream has stopped. Call LV_SRE_StreamStart to reset stream.
STREAM_STATUS_BARGE_IN_TIMEOUT
Barge-in was not triggered before timeout. No audio will be sent for decode.
STREAM_STATUS_END_SPEECH_TIMEOUT
End-of-speech was not detected before timeout. Note that streaming will not stop until you call StreamStop or StreamCancel.
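The hardware actions suggested in the status descriptions above can be sketched as a simple dispatch. The enum values below are illustrative stand-ins for the real constants in the LumenVox headers:

```c
/* Illustrative mirrors of the stream statuses described above. */
typedef enum {
    STATUS_READY,
    STATUS_BARGE_IN,
    STATUS_END_SPEECH,
    STATUS_STOPPED
} stream_status;

typedef enum {
    ACTION_NONE,
    ACTION_STOP_PLAYBACK,   /* caller barged in over the prompt */
    ACTION_STOP_RECORDING   /* utterance finished */
} stream_action;

/* Map a status transition to the hardware action the text suggests. */
static stream_action on_status(stream_status s)
{
    switch (s) {
    case STATUS_BARGE_IN:   return ACTION_STOP_PLAYBACK;
    case STATUS_END_SPEECH: return ACTION_STOP_RECORDING;
    default:                return ACTION_NONE;
    }
}
```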
Environment Variables
LV_SRE_CLIENT_CONNECT_IP
A comma (or semicolon) delimited list of IP addresses (and ports) the client will attempt to connect to. If this variable does not exist, the client defaults to IP 127.0.0.1 (the local machine) and port 5000. Use a colon to separate an IP address from its port.
Example: "127.0.0.1;10.0.0.1:5001;10.10.0.1" Client will attempt to attach to the local machine, port 5000; IP address "10.0.0.1" port 5001; and IP address "10.10.0.1" port 5000.
Win32
The following environment variables need to be set up for the LVSpeechPort.Dll to function. The installation program creates these variables.
LVLANG
Location of the dictionary and language files, stored in two subdirectories: Dict and Responses.
LVBIN
Location of LVSpeechPort.Dll.
The following optional environment variables are set up for creating applications with the LVSpeechPort.DLL. See the LVSpeechPortConsole example program.
LVLIB
Location of LVSpeechPort.Lib
LVINCLUDE
Location of LVSpeechPort.h
Linux
The following environment variables can be used to override the default locations used by LVSpeechPort.so and BNF_Dict.so.
LVLANG
Location of the dictionary files, stored in the Dict sub-directory. Default location "/usr/LumenVox".
LVRESPONSE
Location of the answer and response files created at run-time, stored in the Responses sub-directory. Default location "/var/LumenVox".
FAQs
Please email your questions to [email protected].
I cannot get the engine to recognize correctly, or my results have a low confidence.
A good speech recognition application depends on a well-designed grammar. A grammar that contains very similar words (like "bit" and "pit") is inefficient and will hurt accuracy and speed. The engine takes longer as it tests the competing words against the audio, and the resulting match will have a lower confidence because of the very similar alternatives.
What do the confidence scores mean?
The confidence score is a rough measure of how closely the speech matched the phrases in the grammar. The score ranges from 0 - 1000. The higher the score, the higher the estimated probability that the result is correct. Typically, an application designer will use the confidence score to make decisions about the quality of a recognition result. For instance, results over 600 might always be accepted, results between 599 and 200 might trigger a confirmation, and results below 200 might be rejected outright. The thresholds to use depend largely on the grammar that is being used. Along with the grammars themselves, an application's confidence thresholds should be one of the first things to tune.
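Those illustrative thresholds can be expressed as a small decision helper. The cutoffs are the examples from the text, not engine requirements, and should be tuned per grammar:

```c
typedef enum { DECISION_ACCEPT, DECISION_CONFIRM, DECISION_REJECT } decision;

/* Map a 0-1000 confidence score to an application decision, using the
 * example thresholds from the text above. */
static decision classify_confidence(int score)
{
    if (score >= 600) return DECISION_ACCEPT;   /* high confidence */
    if (score >= 200) return DECISION_CONFIRM;  /* ask the caller to confirm */
    return DECISION_REJECT;                     /* re-prompt the caller */
}
```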
Do I need a Dialogic card?
Our engine is hardware-independent, so if the client application can collect the audio and put it into a buffer, the engine can decode it. Which hardware a particular client application needs depends only on the client application.
How much memory does the Speech Engine need?
The memory requirement for running the Speech Engine is mainly determined by the maximum number of decoder threads. The start-up memory usage is about 160MB, including one thread for each acoustic model. After that, each additional thread requires about 20MB. The maximum number of threads is determined by the number of processors. The more processors you have, the more simultaneous threads you can run, and consequently the more memory you
need. In the future, we shall allow users to set the maximum number of threads on the server. Currently, the typical memory requirements for running the engine are:
One processor with one acoustic model and 2 threads: 207MB.
Dual processors with one acoustic model and 4 threads: 247MB.
Quad processors with one acoustic model and 8 threads: 327MB.
How fast does the computer need to be?
This is dependent on the expected density of your application. The Speech Engine can perform about 14 recognitions per minute per 100 megahertz of processor speed. This calculation is based on a 50-item single-word grammar.
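That rule of thumb can be written out as a quick capacity estimate. It is an approximation only, tied to the 50-item single-word grammar measurement above:

```c
/* Approximate recognitions per minute: ~14 per 100 MHz of processor
 * speed, per the rule of thumb stated in the text. */
static int recognitions_per_minute(int cpu_mhz)
{
    return 14 * cpu_mhz / 100;
}
```

For example, a 1 GHz processor would be expected to sustain roughly 140 recognitions per minute on such a grammar.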
What are some ways to increase the recognition accuracy?
Smaller grammars always work better. The practical phrase limit is about 2000, but depending on how easily the words in the grammar can be confused, or the number of branches at any point in the grammar, that number could be anywhere from 1,000 to 10,000.
Longer phrases also work better. When you need to recognize a phrase like "How do I" or "transfer me to", put it in as a single phrase, not individual words. Except when recognizing a single word (like "Yes" or "No"), avoid small single words.
You can use the ABNF format to cover several variations of small words:
"How (do | would | could) (I | we | you)"
Also, attempt to cover all the words you believe a user will speak. If a word or phrase is not in the grammar, the engine will not be able to identify it.
Will the engine handle proper names?
The internal dictionary has thousands of common names. (Around half of the 120,000 words are names). If a name is not in the dictionary, the decoder will use basic rules to phonetically spell any name.
For unknown names, enter the phonetic spelling of the name if the phonetic speller is unable to come up with a good pronunciation. This has been shown to work in the vast majority of cases. The phonetic spelling can be directly
entered as the phrase, if necessary, by enclosing the phoneme characters in curly braces "{ }". See Phonemes.
Can I ask for ticker symbols with your recognition engine?
Speaker-independent recognition systems have a hard time with open spelling, because many letters sound very similar. For example, b, c, d, e, g, p, t, v and z all end with the sound of 'e'. Dictation software allows spelling because it trains on a single person's voice; many of those products also supply a phonetic alphabet system ("Alpha" for A, "Beta" for B, etc.).
In addition, there are more than sixteen thousand ticker symbols. Many of the symbols are very similar in the way they sound when being spelled out, and thus are hard to correct for:
eeee is the symbol for eMachines, Inc.
cccc is the symbol for Concord Career Colleges Inc.
How can I get around this problem?
Limit the tickers you support.
Break the stocks down by category to make grammars smaller. First ask which stock exchange, then ask for the symbol. Have a strategy available to disambiguate symbols until the proper answer is found.
What are the languages currently supported?
We currently support North American English. Spanish is the next language planned.
Does/Can LumenVox support language X?
The short answer is that, yes, LumenVox can localize/customize the products to the extent that we can add in different languages for speech recognition. There are two ways to do it:
The first option is very fast and easy to implement. Phonetically spell the (for example) Spanish words using the English phone set. For example, the Spanish word mañana can be entered {M AO N Y AO N AE}. See Phrases and Phonemes for more information on entering raw phonemes as phrases.
The second option requires a couple of items and more time. Basically, LumenVox needs:
- Lots of audio data in the target language; the amount can vary from 10 hours for male and 10 for female speakers (20 total) for small vocabularies (10-15 words), to as much as can be collected.
- The same audio data, transcribed as text.
- A machine-readable dictionary in the target language.
The first option is quite easy to implement, but loses some accuracy across very large vocabularies because the target language's sound inventory still differs from the English inventory. The second option takes more time and energy to produce, but is quite a bit more accurate.
As a first step, phonetically spell each word so that your organization can test and deploy the application. Then, once you have collected enough audio data, LumenVox can train native language models and quit using the English models entirely.
With some work, LumenVox could adapt the Speech Engine itself so that it displays in a different language, but that is a special case situation.
Why does the engine occasionally recognize my speech in the Female model when I am male?
First, some notes about the "male" and "female" models. The models are entirely statistical: one model encodes a speaker of type 1, and the other encodes a speaker of type 2. It happens that a very useful distinction lies along gender lines (owing mainly to pitch differences between males and females), but there are men who sound like women and women who sound like men. In addition, it is possible that the particular utterance involved simply had better examples in the other model, so the "wrong" model did a better job of recognizing the speech. Because we trained the two separate models using data divided by gender, we named the models according to their gender as a convenience. In fact, the recognizer has no knowledge of which gender the speaker is, only of which model had the best match.
Do not use the engine to classify speakers according to their sex; the engine is not designed or intended to be used to categorize speakers according to personal characteristics, whether the characteristic is age, sex, dialect, or any other attribute. LumenVox takes NO responsibility for issues arising from using the engine in such a manner.
Why does the engine always do two decodes, one in a male model, and one in a female model?
Suppose we have two models, a generic male (MM) and generic female model (MF), as well as a Speaker (S1). S1 says something, and the decoder runs two decodes, one against each model, MM and MF. The results break down as follows (for our purposes, correct means "got the right thing" whether the result is the actual string of words, or the right concept):
Case a: MM has the highest score and the correct answer; MF may or may not return the correct answer.
Case b: MF has the highest score and the correct answer; MM may or may not return the correct answer.
Case c: MM has the highest score but returned the incorrect answer, while MF had the lower score but returned the correct answer.
Case d: MF has the highest score but returned the incorrect answer, while MM had the lower score but returned the correct answer.
Case e: Neither is correct, regardless of score.
For case e, since neither model got the right answer, all we can do is try to make the models better and the system tighter. Cases c and d are the worst case performance; we try to avoid these :). Cases a and b are the hoped-for result, since we get a correct answer. Notice that we never specify which is the "correct model" only the "correct answer". Also, note that for all cases "correct" requires some outside knowledge about which answer was correct. The engine has no such information, and is forced to choose the best answer based on highest score.
The potentially bad results are cases c and d; there, the recognizer picks the wrong answer when it could have gotten the right one had it more knowledge. Fortunately, c and d rarely happen; instead, what we have found is that in cases a and b, the speaker's gender does not always match the gender model that had the best answer. But it doesn't actually matter, since we obtain the correct answer anyway (and we are looking for the answer, not the gender).
Running two decodes (ignoring decode history) allows us to capture each case where, for some reason, the mismatched gender model gets the right answer and the matched one blows up. There are several reasons this might happen: the mismatched model may have better coverage on the acoustics in question,
the speaker's voice could crack, or the speaker could be sucking on helium, etc. Since some people will waffle between the two models, given the above, we are better off running two decodes. If we were to select a particular model based on previous history, we would lose the accuracy gain from running two models and letting the system pick the best result.
In addition, the incidence rate for mismatched, but correct answers is quite a bit higher than the incidence rate for mismatched, incorrect answers, which means running two decodes and picking the best result yields a net gain, even with occasional incorrect answers.
That said, one plausible scenario where a client application might want to cut the second decode is load balancing. If all 48 ports go active at once (or the system is on a slow machine), it might be better to sacrifice some accuracy to handle more customers more quickly. For the systems LumenVox deploys, we haven't had a problem with running two decodes yet; the load-balancing feature is in the short-term pipeline and should be online soon.
If the client application wants to track decode models for a caller, there is no restriction against doing so; load-balancing becomes an issue of deciding how many double decodes the application can handle, and then picking a permanent model for that caller/speaker. One thing not to do is to make the decision after only one utterance; let the double decodes continue for a few rounds (at least three or five) and then pick the model which had the highest score the most (the application will also need to take into account whether the decodes were correct). The gender model flags (LV_DECODE_GENDER_MALE, LV_DECODE_GENDER_FEMALE) for LV_SRE_Decode() and LVSpeechPort::Decode() tell the recognizer which model to use for the decode, thus disabling the dual decodes.
Because there is an accuracy gain from doing both decodes, we recommend letting the system do both decodes for most applications. If load becomes a serious issue, then disable the double-decode system and pick the model the application should use.
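The per-caller model-selection strategy described above can be sketched as a tally: run dual decodes for at least a few rounds, record which model won each time, and only then fix a permanent model. The values 0 and 1 below are illustrative stand-ins for the male/female model flags:

```c
#define MIN_ROUNDS 5  /* never decide after a single utterance */

/* winners[i] is 0 or 1, recording which model scored highest on round i.
 * Returns the model to fix permanently, or -1 if more rounds are needed. */
static int pick_permanent_model(const int winners[], int rounds)
{
    int counts[2] = { 0, 0 };

    if (rounds < MIN_ROUNDS)
        return -1;  /* keep running dual decodes */
    for (int i = 0; i < rounds; i++)
        counts[winners[i]]++;
    return counts[0] >= counts[1] ? 0 : 1;
}
```

As the text notes, a real application should also weigh whether each winning decode was actually correct, not just which model scored higher.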
What is n-best?
Instead of hypothesizing only one sentence, the engine hypothesizes several sentences about what it heard. Usually the top sentence is the highest-scoring sentence; the others are the top alternative sentences, which scored lower. N-best results can be used to craft more intelligent confirmations.
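For example, an application might confirm only when the runner-up hypothesis scores close to the best one. The struct below is an illustrative container, not the engine's actual n-best interface:

```c
/* Illustrative n-best entry; the real API exposes alternatives through
 * its own calls. Results are assumed sorted best-first. */
struct nbest_entry {
    const char *text;
    int score;  /* 0-1000 confidence */
};

/* Confirm when the second-best hypothesis is within `margin` points of
 * the best, i.e. when the engine was nearly as sure about an alternative. */
static int needs_confirmation(const struct nbest_entry *results, int n, int margin)
{
    if (n < 2)
        return 0;  /* no alternative to worry about */
    return results[0].score - results[1].score < margin;
}
```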
Why does the API appear to cause a memory leak?
A common cause of growing memory usage is repeatedly loading grammars without unloading them. A good practice is to unload grammars that will not be used for a while.
Also, please exercise caution when using the C API. Most of the handles created by the API, such as H_SI, H_GRAMMAR, and HPORT, need to be explicitly released after you are done using them.
How to Contact LumenVox LLC
Web site: www.LumenVox.com
Email: [email protected]
Sales: [email protected]
Support: [email protected]
Phone: (858) 707-0707
Fax: (858) 707-7072
LumenVox LLC 3615 Kearny Villa Road, Suite # 202 San Diego, CA 92123
Copyright Information
Copyright 2001, 2002, 2003, 2004, 2005 LumenVox LLC. All rights reserved.
Glossary
C
Concept: The string value returned by the decoder. The decoder can return multiple concepts. A concept represents words or phrases grouped together under a single "heading".
P
Phrase: A word or series of words. Can also include BNF-formatted words and/or pure phonemes.
S
SISR: Semantic Interpretation for Speech Recognition; a companion to SRGS grammars, this working draft describes a process for turning sentences recognized by an ASR into data objects usable by an application.
SRGS: Speech Recognition Grammar Specification; a W3C recommendation for the format of grammars used in a speech recognizer.