Gestures and Lip Shape Integration for Cued Speech Recognition
Seminar By:
Mohammed Musfir
ECE-B, 08104131
Seminar Coordinator:
Mr. Rino P. C., Assistant Professor, ECE
Seminar Guide:
Mr. Edet Bijoy K., Assistant Professor, ECE
02/12/2011 2
Overview of Presentation
Objective
Introduction
ASR Techniques
Lip Reading – AVSR
Cued Speech
Integrated Recognition
Conclusion
Objective
Developments in ASR techniques
AVSR as an accessibility solution
Lip detection
Cued Speech detection
Integration of both
INTRODUCTION
ASR in Brief
First successful system in 1970
Consists of two subsystems:
ASR – transcribes speech into text
SU (speech understanding) – interprets the transcription
Knowledge-intensive
ASR TECHNIQUES
ASR Industry
Industry pioneers – Nuance, NTT Labs, AT&T Labs
MIT and GPL – VoxForge, Gvoice
Desktop dictation – 1990
Types of ASR
DVI (direct voice input) – word or phrase spotting
LVCSR (large-vocabulary continuous speech recognition) – vocabularies of several thousand words
Techniques
Speech as a sequence of sounds
ASR involves:
Acquisition – recording
Feature extraction – spectral analysis
Pattern matching and decoding
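The acquisition and feature-extraction steps can be sketched as follows. This is a minimal illustration, not the system described in the seminar: it assumes the recorded signal is already available as a list of floats, cuts it into overlapping Hamming-windowed frames, and computes a naive DFT magnitude spectrum per frame. The function names, frame length and hop size are hypothetical choices; a practical front end would use an FFT and usually continue on to MFCCs.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a short tail is dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitude of a naive O(N^2) DFT, bins 0..N/2 (the rest mirror them for real input)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def extract_features(samples, frame_len=256, hop=128):
    """Recorded samples -> windowed frames -> one magnitude spectrum per frame."""
    window = hamming(frame_len)
    return [magnitude_spectrum([s * w for s, w in zip(frame, window)])
            for frame in frame_signal(samples, frame_len, hop)]
```

For example, a 1 kHz tone sampled at 8 kHz and cut into 64-sample frames has a bin spacing of 8000 / 64 = 125 Hz, so every frame's spectrum peaks in bin 8; the pattern-matching stage then works on these per-frame vectors rather than on raw samples.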
Approaches
Template-based
Knowledge-based
Statistical
Learning-based
Artificial intelligence
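As an illustration of the template-based approach, an unknown utterance can be compared with stored reference templates and given the label of the closest one. The sketch below uses dynamic time warping (DTW) as the distance; that is one common choice and an assumption here, since the slides do not name a matching method. Each utterance is assumed to be a list of feature vectors (tuples of floats), and the template labels are made up.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],       # stretch the template
                                 cost[i][j - 1],       # stretch the utterance
                                 cost[i - 1][j - 1])   # advance both
    return cost[n][m]


def recognise(utterance, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

The warping is what makes templates usable despite differences in speaking rate: a slowly spoken five-frame "yes" such as `[(0,), (0,), (1,), (1,), (2,)]` matches a three-frame template `[(0,), (1,), (2,)]` at zero cost.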
LIP READING
Lip Reading – AVSR (Audio-Visual Speech Recognition)
Front end: lip detection
Localisation and Tracking
ROI determination – Sobel edge filtering
Kalman filter – tracking
Principal component analysis – feature coefficients
Audio features – MFCC (Mel-frequency cepstral coefficients)
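The Kalman tracking step can be sketched as below. It assumes the ROI stage (for instance the Sobel-based detector) outputs a lip centre per frame, or None when detection fails, and that the lips move with roughly constant velocity between frames. The class names and the noise values q and r are hypothetical tuning choices, and each image axis is filtered independently for simplicity.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one image coordinate; state = [position, velocity]."""

    def __init__(self, position, q=1e-2, r=4.0, dt=1.0):
        self.x = [position, 0.0]                # state estimate, velocity starts at 0
        self.P = [[10.0, 0.0], [0.0, 10.0]]     # state covariance (initial uncertainty)
        self.q, self.r, self.dt = q, r, dt      # process noise, measurement noise, frame step

    def predict(self):
        """x <- F x and P <- F P F^T + q I, with F = [[1, dt], [0, 1]]."""
        dt = self.dt
        (p00, p01), (p10, p11) = self.P
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                        # innovation variance
        k0, k1 = p00 / s, p10 / s               # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],   # P <- (I - K H) P
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]


class LipCentreTracker:
    """Tracks the (x, y) centre of the lip ROI with one filter per axis."""

    def __init__(self, x, y):
        self.fx = ConstantVelocityKalman1D(x)
        self.fy = ConstantVelocityKalman1D(y)

    def step(self, measured_xy):
        """Predict, then correct with the detector's centre if one was found."""
        px, py = self.fx.predict(), self.fy.predict()
        if measured_xy is None:                 # detection failed: coast on the prediction
            return px, py
        return self.fx.update(measured_xy[0]), self.fy.update(measured_xy[1])
```

Fed a centre that drifts 2 px per frame to the right, the estimate converges on the true track, and a frame with a missed detection is bridged by the prediction alone, which keeps the ROI from jumping or being lost.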
CUED SPEECH
Overview of Cued Speech
INTEGRATION
Steps
Lip feature extraction
Synchronization of the audio with the image
Multistream HMM fusion – state-synchronous decision
Automatic image processing to record the cues
Lip width, aperture, area, upper pinch and lower pinch
Modeling – 8 lip parameters and 10 hand parameters
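State-synchronous multistream fusion can be sketched as follows: the lip and hand streams share one set of HMM states, and each state's emission log-likelihood is a weighted sum of a lip term and a hand term, $\log b_j(O_t) = \lambda_L \log b_j^{L}(O_t^{L}) + \lambda_H \log b_j^{H}(O_t^{H})$ with $\lambda_L + \lambda_H = 1$. The sketch assumes one diagonal Gaussian per stream per state and a fixed lip weight of 0.6 (so $\lambda_H = 0.4$); these values, the dictionary layout of a state and all function names are hypothetical, not taken from the seminar. Decoding is ordinary Viterbi over the fused emissions.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def multistream_log_emission(lip_obs, hand_obs, state, w_lip=0.6):
    """Fused emission: weighted sum of the per-stream log-likelihoods of one state."""
    w_hand = 1.0 - w_lip
    return (w_lip * log_gauss_diag(lip_obs, *state["lip"]) +
            w_hand * log_gauss_diag(hand_obs, *state["hand"]))


def viterbi(lip_seq, hand_seq, states, log_trans, log_init, w_lip=0.6):
    """Most likely state path when both streams are tied to the same HMM states."""
    n = len(states)
    delta = [log_init[j] + multistream_log_emission(lip_seq[0], hand_seq[0], states[j], w_lip)
             for j in range(n)]
    back = []
    for lip_t, hand_t in zip(lip_seq[1:], hand_seq[1:]):
        # best predecessor of every state j, using the previous frame's scores
        prev = [max(range(n), key=lambda i: delta[i] + log_trans[i][j]) for j in range(n)]
        delta = [delta[prev[j]] + log_trans[prev[j]][j] +
                 multistream_log_emission(lip_t, hand_t, states[j], w_lip)
                 for j in range(n)]
        back.append(prev)
    path = [max(range(n), key=delta.__getitem__)]
    for prev in reversed(back):                 # backtrack from the best final state
        path.append(prev[path[-1]])
    return path[::-1]
```

Each state here is a dictionary such as `{"lip": (mean, var), "hand": (mean, var)}`. Because the weights scale log-likelihoods, setting `w_lip=1.0` ignores the hand stream entirely, which makes the weight a simple knob for trusting one modality more than the other.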
Fusion
Feature Fusion – Concatenation
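$$O_t^{LH} = \left[ \left(O_t^{L}\right)^{\mathsf T}, \left(O_t^{H}\right)^{\mathsf T} \right]^{\mathsf T} \in \mathbb{R}^{D}$$
$O_t^{LH}$ – lip-hand feature vector
$O_t^{L}$ – lip shape feature vector
$O_t^{H}$ – hand feature vector
$D$ – dimensionality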
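Feature fusion by concatenation amounts to stacking the two vectors frame by frame, so $D = D_L + D_H$. A minimal sketch, assuming both streams have already been synchronised to the same frame rate; the function name is hypothetical. If the 8 lip and 10 hand parameters were used directly, without derivative features (an assumption), each fused vector would have $D = 18$ components.

```python
def fuse_features(lip_frames, hand_frames):
    """Feature fusion by concatenation: O_t^LH = [O_t^L ; O_t^H] for every frame t.

    Assumes both streams were already synchronised to one frame rate.
    """
    if len(lip_frames) != len(hand_frames):
        raise ValueError("lip and hand streams must have the same number of frames")
    return [list(lip) + list(hand) for lip, hand in zip(lip_frames, hand_frames)]
```

The fused sequence can then be modelled by a single-stream HMM, which is simpler than weighting the streams separately but gives both modalities fixed, equal influence through one joint density.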
Conclusion
Cued Speech recognition – 80% accuracy
Outperforms ASR in normal environments
Visual mode – education of the hearing impaired
Successful phoneme recognition
Potential for another product like Siri
THANK YOU