Wake-Up-Word Speech Recognition: A Missing Link to Natural Language Understanding Dr. Veton Këpuska ECE Department [email protected]

Post on 31-Dec-2015

Page 1: Wake-Up-Word  Speech Recognition:

Wake-Up-Word Speech Recognition:

A Missing Link to Natural Language Understanding

Dr. Veton Këpuska, ECE Department, [email protected]

Page 2: Wake-Up-Word  Speech Recognition:

April 19, 2023 Dr. Veton Këpuska Slide 2

What is: Wake-Up-Word Recognition

Wake-Up-Word (WUW) Speech/Voice Recognition (SR): the Automatic Speech Recognition task of identifying a single word/phrase in continuous free speech – Correct Recognition, e.g.:

<HAL> – Arthur C. Clarke’s “2001: A Space Odyssey”,

<Computer> – Capt. Picard’s computer on the starship “Enterprise” in Star Trek, or

<Operator> – Capt. Këpuska’s WUW-SR System

& more importantly

Automatic recognition that any other noise/sound/word/phrase, etc. is NOT that WUW – Correct Rejection.

Page 3: Wake-Up-Word  Speech Recognition:

WUW-SR

WUW-SR Requires Continuous Monitoring of Speech

WUW can be used to: Get Attention, Provide/Change Context, Resynchronize Communication.

It mimics human-to-human interaction and communication in a way that is currently not possible, and provides a significantly more efficient solution (memory and CPU) than any Natural Language Understanding system.

It is a mode of communication that would enable more natural interaction of man and machine.

Page 4: Wake-Up-Word  Speech Recognition:

Natural Language Understanding (NLU) Task

Massachusetts Institute of Technology’s (MIT’s) Spoken Language Systems Laboratory’s mission statement states:

“Our goal is both simple and ambitious – create technology that makes it possible for everyone in the world to interact with computers via natural spoken language. Conversational interfaces will enable us to converse with machines in much the same way that we communicate with one another and will play a fundamental role in facilitating our move toward an information-based society”.

To achieve this goal, the SR and NLU communities implicitly position the solution to the WUW problem within the context of solving the overall natural language understanding problem.

The implicit assumption: once a system that can understand the whole language is developed, the WUW problem will be solved.

Page 5: Wake-Up-Word  Speech Recognition:

Natural Language Understanding Task - Problem

There are two major problems with the approach that requires solving the WUW problem within a general framework of the speech and natural language understanding system:

It is an expensive solution (CPU, memory, etc.).

It does not exist yet because it is very difficult to achieve.

Even if it becomes possible to develop NLU systems close to human capabilities, WUW is still needed (see slide 3).

Page 6: Wake-Up-Word  Speech Recognition:

WUW-SR Acoustic-Linguistic Context

The current implementation of WUW recognizes the word when it is used the way a speaker would intuitively use a proper name to get attention:

It does not respond to other contexts where the same word (e.g., “OPERATOR”) is used for other purposes.

What are other WUW contexts?

Page 7: Wake-Up-Word  Speech Recognition:

Wizard of Oz Experiment (NSF 05-551 Proposal)

Study possible uses of WUW in human-to-human communication.

Collaboration with:

Dr. Deborah Carstens – Human Machine Interface Specialist (FIT - Management Information Systems)

Dr. Ron Wallace – Bio-Behavioral Anthropology and English Language (UCF)

Department of Psychology – Behavior Analysis Laboratory

Page 8: Wake-Up-Word  Speech Recognition:

History of Wake-Up-Word Speech Recognition

Wildfire of Waltham, Massachusetts: introduced a rudimentary capability for Wake-Up-Word (WUW) Recognition through its Personal Assistant application in the mid-1990s.

At that time the problem was neither recognized nor developed as a WUW-SR problem.

Application was restricted to a specific word: “Wildfire”

This custom solution did not perform sufficiently well, and thus Wildfire no longer exists.

Page 9: Wake-Up-Word  Speech Recognition:

History of Wake-Up-Word Speech Recognition (cont.)

Këpuska generalized and introduced a novel way of performing WUW Recognition while at ThinkEngine Networks, Marlborough, MA (2001-2003).

Recognition performance of the patented solution allows practical application of WUW for any suitable word (e.g., Verizon’s “IOBI” project).

Demonstration uses a fixed-point DSP implementation simulated on the Windows platform.

New generation of the WUW-SR system, using a floating-point C++ implementation, is almost ready for prime time.

Simulations of the floating-point system indicate significant improvement over the fixed-point implementation.

Page 10: Wake-Up-Word  Speech Recognition:

Wake-Up-Word Speech Recognition Technology

~26,000 lines of fixed-point C code and model data.

Uses the Dynamic Time Warping (DTW) algorithm for pattern matching.

Features are based on Mel-Frequency Cepstral Coefficients (MFCC) plus deltas and second-order deltas.

Uses single Speaker Independent Model.

Achieves high density on DSP
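The feature stack named above (static coefficients plus deltas and second-order deltas) can be illustrated with a short Python sketch. It is not the deck's fixed-point C code: the regression formula is the commonly used one, and the function names `delta` and `add_deltas`, the window of 2 frames, and the edge handling (repeating the first and last frame) are all assumptions made for illustration.

```python
def delta(frames, window=2):
    """Regression-based time derivative of a list of feature vectors.

    Assumption: frames beyond either end are replaced by the first/last frame.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append deltas and second-order deltas to each static feature vector."""
    first_order = delta(frames, window)
    second_order = delta(first_order, window)
    return [c + d1 + d2 for c, d1, d2 in zip(frames, first_order, second_order)]
```

With 13 static coefficients per frame, for example, `add_deltas` would return 39 values per frame.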

Page 11: Wake-Up-Word  Speech Recognition:

WUW-SR System: Initial Development

ThinkEngine Networks, Marlborough, MA

84 simultaneous channels of WUW Recognition on each fixed-point TI TMS320C205 DSP

200 MHz

Memory Space:

64K Byte Program

64K Byte Data

2M Byte External Data

Total of 672 channels with a farm of 8 DSPs

Recognition Rate >95% with ~0% False Acceptance.

Page 12: Wake-Up-Word  Speech Recognition:

Solution: 3 Patented Inventions

Fundamental Contribution to Pattern Recognition: Patent Application 13323-009001 - 10/152,095: “Dynamic Time Warping (DTW) Matching”

Extended DTW Matching: Patent Application 13323-010001 - 10/152,447: “Rescoring using Distribution Distortion Measurements of Dynamic Time Warping Match”

Feature Based Voice Activity Detector (VAD): Patent Application 13323-011001 - 10/144,248: “Voice Activity Detection Based on Cepstral Features”

Page 13: Wake-Up-Word  Speech Recognition:

WUW Fixed-Point System Performance

[Figure: Distribution Plot of Confidence Scores for WUW "Operator". Axes: Confidence Score (0-100)% against a 0 to 1.0 scale. Curves: INV, INV-CUMULATIVE, OOV, OOV-CUMULATIVE. Marked points: Operating Threshold, Equal Error Rate.]
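The relationship between the INV and OOV score distributions, the operating threshold, and the equal error rate can be sketched in Python as follows. The function names are hypothetical and the scores in the usage note are made up for illustration; none of this is data from the plot. The sketch assumes a sample is accepted when its confidence score is at or above the threshold.

```python
def error_rates(inv_scores, oov_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at one threshold.

    Assumption: a sample is accepted when score >= threshold.
    """
    false_rejections = sum(1 for s in inv_scores if s < threshold)
    false_acceptances = sum(1 for s in oov_scores if s >= threshold)
    return false_rejections / len(inv_scores), false_acceptances / len(oov_scores)


def equal_error_point(inv_scores, oov_scores, thresholds=range(0, 101)):
    """Find the first threshold where the two error rates are closest."""
    best = None
    for threshold in thresholds:
        fr, fa = error_rates(inv_scores, oov_scores, threshold)
        gap = abs(fr - fa)
        if best is None or gap < best[0]:
            best = (gap, threshold, fr, fa)
    _, threshold, fr, fa = best
    return threshold, (fr + fa) / 2.0
```

For the made-up lists `inv = [90, 85, 80, 70, 40]` and `oov = [10, 20, 30, 35, 75]`, `equal_error_point(inv, oov)` returns a threshold of 41 with both error rates at 0.2. A deployed system may choose an operating threshold away from this point, for instance to push false acceptances toward zero.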

Page 14: Wake-Up-Word  Speech Recognition:

WUW-SR Development Status

Implemented C++ ETSI-MFCC Front End:

Extraction of Mel-Filtered Cepstral Coefficients

Standard processing technique to be used as a baseline

C++ framework and applied implementation emphasize modularity to facilitate research.

Implemented Dynamic Time Warping (DTW) as a Back-End of the recognition system.

Integrated Perl scripts to automate model building and accuracy testing procedures.

Includes automatic graph generation
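For readers unfamiliar with the back-end technique, a textbook DTW match can be sketched in a few lines of Python. This is the classic unconstrained recurrence, not the patented or extended DTW matching referred to in this deck. The Euclidean frame distance, the normalization by the combined sequence length, and the function names are assumptions.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, utterance, dist=euclidean):
    """Length-normalized DTW cost between two sequences of feature vectors."""
    n, m = len(template), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j]: cheapest accumulated distance aligning template[:i] with utterance[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(template[i - 1], utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Assumption: normalize by n + m so scores are comparable across lengths.
    return cost[n][m] / (n + m)
```

A slowly spoken copy of the template, such as `[[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]` against `[[0.0], [1.0], [2.0]]`, aligns with zero cost, which is the property that makes DTW tolerant of speaking-rate variation.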

Page 15: Wake-Up-Word  Speech Recognition:

Current Architecture of WUW-SR System

[Figure: block diagram with blocks labeled Front-End, VAD, and Back-End.]

Page 16: Wake-Up-Word  Speech Recognition:

Performance of WUW-SR Floating Point System

Page 17: Wake-Up-Word  Speech Recognition:

WUW-SR System Performance

How is it possible to achieve this performance? Considering:

Single Speaker Independent Model for WUW

No Additional Modeling for other acoustic events: noise/tone/sound/word/phrase

Clever use of Two-Pass Scoring

Page 18: Wake-Up-Word  Speech Recognition:

Usual Recognition Scoring: First Score

Standard “First” Recognition Score Performance

Lowest Score of an OOV Sample

Page 19: Wake-Up-Word  Speech Recognition:

“Second” Score is NOT Independent of the “First” Score

Distribution of Second Score as Function of First Score

Lowest Score of an OOV Sample
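Why a second score helps can be illustrated with a toy decision rule in Python. This is explicitly not the proprietary method, which the slides do not disclose. It only shows the general idea that a boundary in the (first score, second score) plane can reject samples that a single threshold on the first score would accept. The function names, the linear form of the boundary, and every weight and threshold value are hypothetical.

```python
def accept_first_only(first, threshold=50.0):
    """Baseline: accept on the first recognition score alone."""
    return first >= threshold


def accept_two_pass(first, second, w_first=1.0, w_second=1.0, bias=-100.0):
    """Toy two-score rule: accept on one side of a line in the score plane.

    Assumption: a linear boundary; all coefficients are illustrative only.
    """
    return w_first * first + w_second * second + bias >= 0.0
```

Under these made-up values, a sample with a first score of 60 and a second score of 20 passes the single-score test but is rejected by the two-score rule, while a sample scoring 55 and 70 is accepted by both.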

Page 20: Wake-Up-Word  Speech Recognition:

How to Obtain “Second” Score?

All modern Speech Recognition Systems use multiple scoring techniques:

Re-scoring N-best hypotheses to improve Correct Recognition, based on:

A more elaborate recognition algorithm: Baum-Welch Forward-Backward HMM scoring vs. Viterbi scoring

Different features: MFCC (Mel-scale Filtered Cepstral Coefficients), RASTA-PLP (Relative Spectral Transform - Perceptual Linear Prediction), other proprietary front ends

Re-scoring using additional models (of non-WUWs) to improve Correct Rejection (“Garbage Models”)

Page 21: Wake-Up-Word  Speech Recognition:

WUW-SR System

Uses a proprietary solution that:

Does not require additional “Garbage Models” to increase robustness and Correct Rejection Rate,

Is model independent, and even

Is matching-algorithm independent (DTW, HMM, Graphical Modeling, or any other paradigm).

Page 22: Wake-Up-Word  Speech Recognition:

What Next?

WUW-SR: Useful technology for numerous applications:

“Voice Activated” Car Navigation System: current solutions apply mixed interfaces; the driver must press a button while speaking to the system.

Dictation Systems: require launching the application and “informing” the system when dictation is “on” and when it is “off”.

PDA – removing the stylus as a necessary interface tool.

Keyboard-less laptop computers.

“Smart Rooms”

Page 23: Wake-Up-Word  Speech Recognition:

Smart Room Application

[Figure: smart-room diagram showing a microphone array and the Wake-Up-Word Speech Recognition System. Example exchange: the user says “COMPUTER! Play Todd Agnew CD”; the system responds “Yes Master” <Percolating Sound>.]

Page 24: Wake-Up-Word  Speech Recognition:

Microphone Arrays

Applied Perception Laboratory CE313

Page 25: Wake-Up-Word  Speech Recognition:

Noise Removal

First place at the UML-ADI Competition, June 2005.

Developed Wiener-filter noise removal and implemented it on an Analog Devices “SHARC” DSP:
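The core of a frequency-domain Wiener filter is a per-bin gain. The Python sketch below shows one common formulation, which may differ from the competition implementation. It assumes per-bin power estimates for the noisy frame and for the noise are already available. The clean-speech power is estimated by subtraction, and the `floor` parameter (a minimum gain intended to limit musical-noise artifacts) is an assumption, as is the function name.

```python
def wiener_gains(noisy_power, noise_power, floor=0.05):
    """Per-bin Wiener gain S / (S + N), with S estimated as max(Y - N, 0).

    noisy_power: power spectrum of the current noisy frame (Y per bin)
    noise_power: estimated noise power spectrum (N per bin)
    floor: assumed minimum gain applied to every bin
    """
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        p_clean = max(p_noisy - p_noise, 0.0)
        denom = p_clean + p_noise
        gain = p_clean / denom if denom > 0.0 else 0.0
        gains.append(max(gain, floor))
    return gains
```

Each gain multiplies the corresponding bin of the noisy spectrum before the frame is transformed back to the time domain. Bins dominated by noise are scaled down toward the floor, and bins dominated by speech pass nearly unchanged.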

Page 26: Wake-Up-Word  Speech Recognition:

Speech Processing and Recognition System Architecture

[Diagram labels: Host PC, EZ-Kit Lite, SHARC Processor AD21161N, Microphone, Speakers]

48 kHz to 8 kHz down-sampling with 70-tap FIR filter

Wiener-filter-based noise removal: switch-controlled activation of the de-noising algorithm

Automatic Gain Control: switch-controlled activation of the algorithm

LEDs indicate the processing state of the system

Wake-Up-Word Speech Recognition Software

•~26,000 lines of Speech Recognition Engine code & model data in C.

•~5,000 lines of embedded C code
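The 48 kHz to 8 kHz down-sampling step corresponds to low-pass filtering followed by keeping every sixth sample. Below is a minimal Python sketch of that idea, not the embedded C code. The 70-tap length comes from the slide. The windowed-sinc design, the Hamming window, the 3.6 kHz cutoff, and the function names are assumptions.

```python
import math


def design_lowpass(num_taps=70, cutoff_hz=3600.0, sample_rate_hz=48000.0):
    """Windowed-sinc low-pass FIR (Hamming window), normalized to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - center
        if x == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate(samples, taps, factor=6):
    """Filter and keep every `factor`-th sample, computing only the kept outputs."""
    out = []
    for start in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = start - k
            if idx < 0:
                break  # samples before the start of the signal are treated as zero
            acc += tap * samples[idx]
        out.append(acc)
    return out
```

Computing the filter only at the retained output positions keeps the cost at one 70-tap dot product per 8 kHz sample, which is the usual reason to combine filtering and decimation on a DSP.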

Page 27: Wake-Up-Word  Speech Recognition:

Experimental Results Windows PC

Noisy test file:

After de-noise:

Page 28: Wake-Up-Word  Speech Recognition:

Experimental Results Windows PC

Footloose:

Not Footloose:

Page 29: Wake-Up-Word  Speech Recognition:

Results: why didn’t this work?

Hair dryer:

Still there?!?!:

Page 30: Wake-Up-Word  Speech Recognition:

Experimental Results Windows PC

Hair dryer:

Gone:

Page 31: Wake-Up-Word  Speech Recognition:

Experimental Results on DSP

Brown Noise Example:

Page 32: Wake-Up-Word  Speech Recognition:

Experimental Results on DSP

Drill Test

Page 33: Wake-Up-Word  Speech Recognition:

Experimental Results on DSP

Closer Drill Noise

Page 34: Wake-Up-Word  Speech Recognition:

Experimental Results on DSP

Brown Noise + Drill

Page 35: Wake-Up-Word  Speech Recognition:

Research: Tools Development

MATLAB (NSF EMD-MLR), Perl, gnuplot

Page 36: Wake-Up-Word  Speech Recognition:

What is missing?

In need of more highly motivated students.

No news there!

Business opportunities and ventures need to be considered.

Help, advice, … welcome.