Gestures and Lip Shape Integration for Cued Speech Recognition

Seminar By:

Mohammed Musfir

ECE-B, 08104131

Seminar Coordinator:

Mr. Rino P. C., Assistant Professor, ECE

Seminar Guide:

Mr. Edet Bijoy K., Assistant Professor, ECE

02/12/2011

Overview of Presentation

Objective

Introduction

ASR Techniques

Lip Reading – AVSR

Cued Speech

Integrated Recognition

Conclusion


Objective

Developments in ASR techniques

AVSR as an accessibility solution

Lip detection

Cued Speech detection

Integration of lip and Cued Speech features


INTRODUCTION


Briefing ASR

First successful system in 1970

Consists of two subsystems

ASR: transcribes speech into text

SU (speech understanding): interprets the transcription

Knowledge-intensive


ASR TECHNIQUES


ASR Industry

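Industry pioneers: Nuance, NTT Labs, AT&T Labs

Open source (MIT and GPL licences): VoxForge, Gvoice

Desktop dictation: 1990

Types of ASR

DVI (direct voice input): word or phrase spotting

LVCSR (large-vocabulary continuous speech recognition): vocabularies of several thousand words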


Techniques

Speech as a sequence of sounds

ASR involves:

Acquisition: recording the speech signal

Feature extraction: spectral analysis

Pattern matching and decoding
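The acquisition and spectral-analysis steps above can be illustrated with a short-time power spectrum. This is a minimal sketch, not the front end used in the seminar: the 25 ms frame, 10 ms hop, Hamming window and 512-point FFT are common but assumed choices, and `frame_signal` and `power_spectrum` are hypothetical names.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def power_spectrum(x, sr, frame_ms=25.0, hop_ms=10.0, n_fft=512):
    """Short-time power spectrum: one row per frame, one column per frequency bin (assumed settings)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Taper each frame with a Hamming window to reduce spectral leakage.
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop) * np.hamming(frame_len)
    # Zero-pad each frame to n_fft points and keep the non-negative frequencies.
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return spec, freqs
```

For a 1 s, 1 kHz tone sampled at 16 kHz, this gives 98 frames of 257 bins with the peak in the 1000 Hz bin. Features such as MFCCs are usually derived from a spectrum of this kind.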


Techniques


Approaches

Template-based

Knowledge-based

Statistical

Learning-based

Artificial intelligence
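Of the approaches listed, template-based recognition is the easiest to show in a few lines. A common technique for it is dynamic time warping (DTW), which aligns an input feature sequence to each stored template despite differences in speaking rate. The sketch assumes Euclidean frame distances and the basic three-move recursion; `dtw_distance` and `recognize` are hypothetical names, not taken from the seminar.

```python
import numpy as np


def dtw_distance(a, b):
    """DTW distance between two feature sequences of shape (frames,) or (frames, dims)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # Local cost: Euclidean distance between every pair of frames.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j]: cheapest alignment of a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance under DTW."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

Here `recognize([0, 0, 1, 2, 2, 3], {"up": [0, 1, 2, 3], "down": [3, 2, 1, 0]})` returns `"up"`, because the slower utterance warps onto the rising template at zero cost.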


LIP READING


Lip Reading - AVSR

Front-end lip detection


Localisation and Tracking

ROI determination: Sobel edge filtering

Tracking: Kalman filter

Feature coefficients: principal component analysis (PCA)

Audio features: mel-frequency cepstral coefficients (MFCCs)
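The tracking step can be sketched with a textbook linear Kalman filter. This is an illustration under stated assumptions, not the configuration used in the seminar: it tracks only the lip ROI centre, assumes a constant-velocity motion model with one time step per video frame, and uses hypothetical noise variances (`process_var`, `meas_var`).

```python
import numpy as np


class ConstantVelocityKalman:
    """Linear Kalman filter for a 2-D point (here: the lip ROI centre).

    State x = [px, py, vx, vy]; measurement z = [px, py].
    Assumed model: constant velocity, one time step per frame.
    """

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=4.0):
        # State transition: position advances by velocity * dt.
        self.F = np.array([[1.0, 0.0, dt, 0.0],
                           [0.0, 1.0, 0.0, dt],
                           [0.0, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
        # Only the position is observed.
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])
        self.Q = process_var * np.eye(4)   # hypothetical process noise
        self.R = meas_var * np.eye(2)      # hypothetical measurement noise (pixels^2)
        self.x = np.zeros(4)
        self.P = 1e3 * np.eye(4)           # large initial uncertainty

    def predict(self):
        """Propagate the state one frame ahead; returns the predicted centre."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a measured centre z = (px, py)."""
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R            # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]


# Usage: feed one measured ROI centre per frame (here a synthetic straight-line track).
tracker = ConstantVelocityKalman()
for frame in range(50):
    tracker.predict()
    tracker.update([2.0 * frame, 1.0 * frame])
```

In a real front end each `update` would receive the ROI centre found by the edge-based detector, and `predict` would supply a search position for the next frame when a detection is weak or missing.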


CUED SPEECH


Overview of Cued Speech


INTEGRATION


Steps

Lip feature extraction

Synchronization of audio with the image sequence

Multistream HMM fusion (state-synchronous)

Decision

Automatic image processing to record the cues

Lip width, aperture, area, upper pinch and lower pinch

Modeling: 8 lip parameters and 10 hand parameters
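The state-synchronous multistream fusion and decision steps can be sketched as follows. Both streams share one HMM state sequence, and each state's log emission score is a weighted sum of its lip-stream and hand-stream log-likelihoods, $\log b_j(O_t) = \lambda_L \log b_j^L(O_t^L) + \lambda_H \log b_j^H(O_t^H)$ with $\lambda_L + \lambda_H = 1$ (notation assumed here); Viterbi decoding then gives the decision. The single diagonal Gaussian per state and stream, the weight `w_lip = 0.5`, and all function names are assumptions for illustration. Practical systems usually use Gaussian mixtures and tune the stream weights on held-out data.

```python
import numpy as np


def diag_gauss_loglik(obs, means, variances):
    """Per-frame, per-state log-likelihood under diagonal Gaussians.

    obs: (T, d); means, variances: (N, d). Returns an array of shape (T, N).
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2.0 * np.pi * variances)[None, :, :]
                         + diff ** 2 / variances[None, :, :], axis=2)


def multistream_loglik(obs_lip, obs_hand, lip_params, hand_params, w_lip=0.5):
    """State-synchronous fusion: weighted sum of the two streams' log-likelihoods.

    lip_params and hand_params are (means, variances) pairs with one row per state;
    the hand-stream weight is 1 - w_lip (hypothetical default: equal weights).
    """
    ll_lip = diag_gauss_loglik(obs_lip, *lip_params)
    ll_hand = diag_gauss_loglik(obs_hand, *hand_params)
    return w_lip * ll_lip + (1.0 - w_lip) * ll_hand


def viterbi(log_emis, log_trans, log_init):
    """Most likely state sequence for (T, N) log emission scores."""
    T, N = log_emis.shape
    delta = log_init + log_emis[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        # scores[i, j]: best score ending in state i at t-1, then moving to state j.
        scores = delta[:, None] + log_trans
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(N)] + log_emis[t]
    # Trace the best predecessors back from the best final state.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With two well-separated states, `viterbi` returns `[0, 0, 1, 1]` for a four-frame input whose lip and hand features both jump from around 0 to around 5. Because the streams share a state sequence, a confident hand stream can compensate for an ambiguous lip frame, and vice versa, in proportion to the weights.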


Fusion

Feature fusion: concatenation

$O_t^{LH} = \left[ (O_t^{L})^{\top}, (O_t^{H})^{\top} \right]^{\top} \in \mathbb{R}^{D}$

$O_t^{LH}$: lip-hand feature vector

$O_t^{L}$: lip shape feature vector

$O_t^{H}$: hand feature vector

$D$: dimensionality of the fused vector
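Feature-level fusion by concatenation is a single array operation once both streams are sampled at the same frame rate. The sketch is illustrative: it assumes the per-frame lip and hand vectors are already synchronised and stored as NumPy arrays of shape (T, D_L) and (T, D_H), so that D = D_L + D_H (notation assumed here); `fuse_features` is a hypothetical name.

```python
import numpy as np


def fuse_features(lip_feats, hand_feats):
    """Concatenate lip and hand vectors frame by frame: O_t^LH = [O_t^L; O_t^H].

    lip_feats: (T, D_L), hand_feats: (T, D_H), both synchronised to the same
    frame rate. Returns an array of shape (T, D_L + D_H).
    """
    lip_feats = np.asarray(lip_feats, dtype=float)
    hand_feats = np.asarray(hand_feats, dtype=float)
    if lip_feats.shape[0] != hand_feats.shape[0]:
        raise ValueError("lip and hand streams must have the same number of frames")
    return np.concatenate([lip_feats, hand_feats], axis=1)
```

With the 8 lip and 10 hand parameters listed under the integration steps, each fused frame would have D = 18 components. Concatenation keeps the model simple but cannot weight the two streams separately, which is one reason stream-weighted multistream HMMs are also used.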


Conclusion

Cued Speech Recognition – 80% accuracy

Outperforms ASR in normal environments

Visual mode: supports education of the hearing impaired

Successful phoneme recognition

Another product over SIRI




THANK YOU

