July 2010 eNTERFACE
Application: Surveillance in trains
Video and audio processing, sound localization, pattern recognition
Lip reading
Facial expression recognition
Automatic recognition of facial expressions and lip reading using vector flow
Model-based approach
What makes visual speech recognition so hard?
Visemes
Smaller word separability
Speech info in audio > speech info in video
Lip-reading by Humans
People recognize speech better when the signal is both auditory and visual
The difference in recognition rates grows with the level of noise in the environment
[Figure: % correct responses (0 to 100) versus S/N (dB), from noisy to clear, for audio only (A) and audio plus video (A+V)]
Inspiration
In Stanley Kubrick's 1968 film 2001: A Space Odyssey, the computer reads the conversation of two astronauts from their lip movements.
Thirty years later, automated lip-reading became a significant part of research in speech recognition systems.
New speech corpus
AV speech corpus
Databases of different quality and resolution
Recording a new speech corpus
AV speech corpus
New speech corpus
Dutch
Recorded at high speed: 100 fps
Front and profile views included
70 people: 49 male, 21 female
Students, professors, secretaries, friends
Utterances: sentences, digits, spelling, conversation starters/endings, open questions
Normal, fast, whispering
ISFER Workbench: Examples (continued)
Active Contours
Internal and external energies
Internal energy forces the contour to shrink
Locally defined external energy forces the contour to stop at the edge of the mouth
Computationally cheap
Sensitivity to initial setting of the contour
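The interplay of the two energies on the Active Contours slide can be illustrated with a minimal greedy snake. This is only a sketch under stated assumptions, not the implementation behind the slides: the function names `edge_map` and `greedy_snake`, the weights `alpha` and `beta`, and the choice of gradient magnitude as the external energy are all hypothetical. The internal term (squared distance to both neighbours) makes the contour shrink; the external term rewards sitting on a strong edge and so stops it there.

```python
import numpy as np

def edge_map(image):
    """Gradient magnitude scaled to [0, 1]; used here as the external energy source."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

def greedy_snake(image, points, alpha=0.02, beta=1.0, max_iter=200):
    """Move each contour point to the 3x3 neighbour with the lowest local energy.

    points: (N, 2) array of (x, y) positions forming a closed contour.
    alpha weights the internal (shrinking) energy, beta the external (edge) energy.
    """
    edges = edge_map(image)
    h, w = edges.shape
    pts = np.rint(np.asarray(points, dtype=float)).astype(int)
    n = len(pts)
    moves = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

    def energy(p, prev, nxt):
        internal = np.sum((p - prev) ** 2) + np.sum((p - nxt) ** 2)
        external = -edges[p[1], p[0]]  # strong edges lower the energy
        return alpha * internal + beta * external

    for _ in range(max_iter):
        moved = False
        for i in range(n):
            prev, nxt = pts[i - 1], pts[(i + 1) % n]
            best, best_e = pts[i], energy(pts[i], prev, nxt)
            for dx, dy in moves:
                cand = pts[i] + (dx, dy)
                if not (0 <= cand[0] < w and 0 <= cand[1] < h):
                    continue
                e = energy(cand, prev, nxt)
                if e < best_e - 1e-12:  # accept strict improvements only
                    best, best_e = cand, e
            if not np.array_equal(best, pts[i]):
                pts[i] = best
                moved = True
        if not moved:  # converged: no point can lower its energy
            break
    return pts
```

Because every move looks only at a 3x3 neighbourhood, each step is cheap, but the result depends on where the contour starts, which matches the sensitivity noted on the slide.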
Template Matching
Internal and external energies
Internal energy forces the template to maintain its geometry
Globally defined external energy forces appropriate placement on the picture
Better results than with snakes
Integration of energy functions at each step can be very time consuming
Model
Goal: lip-reading
Needed: an accurate description of the visible parts of the articulatory system
Accurate description of the shape of the mouth:
measurements of the distance from the lip to the center of the mouth
measurements of the thickness of the visible part of the lips
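Data processing (continued)
Filtered image: intensity distribution $I(x, y)$, center of mouth $(EX, EY)$
Image in polar coordinates: $\hat{I}(\alpha, r) = I(EX + r\sin\alpha,\ EY + r\cos\alpha)$
Conditional distribution: $\hat{I}(r \mid \alpha) \approx \mathrm{Gauss}(r;\ m_\alpha, \sigma_\alpha)$
Mean and variance functions: $M(\alpha) = m_\alpha$, $V(\alpha) = \sigma_\alpha^2$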
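The processing chain on this slide (intensity distribution, center of mouth, polar resampling, mean and variance per direction) can be sketched roughly as follows. The function name `lip_geometry_features`, the nearest-pixel sampling, the default of 18 directions, and the convention that the angle is measured so that the x offset is r sin(alpha) and the y offset is r cos(alpha) are assumptions made for illustration, not details confirmed by the slides.

```python
import numpy as np

def lip_geometry_features(image, n_angles=18):
    """Mean and variance of the radial intensity distribution per direction.

    image: 2-D array of non-negative filter responses (high where lips are).
    Returns the center (ex, ey) and two arrays of length n_angles.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    total = img.sum()
    if total <= 0:
        raise ValueError("image has no intensity mass")

    # Center of mouth: intensity-weighted mean position.
    ys, xs = np.mgrid[0:h, 0:w]
    ex = (xs * img).sum() / total
    ey = (ys * img).sum() / total

    radii = np.arange(int(np.hypot(h, w)), dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    means = np.zeros(n_angles)
    variances = np.zeros(n_angles)

    for k, a in enumerate(angles):
        # Sample the image along a ray leaving the center (nearest pixel).
        x = np.rint(ex + radii * np.sin(a)).astype(int)
        y = np.rint(ey + radii * np.cos(a)).astype(int)
        inside = (x >= 0) & (x < w) & (y >= 0) & (y < h)
        r = radii[inside]
        weights = img[y[inside], x[inside]]
        mass = weights.sum()
        if mass > 0:
            # Treat the intensities along the ray as a distribution over r.
            means[k] = (r * weights).sum() / mass
            variances[k] = (((r - means[k]) ** 2) * weights).sum() / mass
    return (ex, ey), means, variances
```

Under these assumptions, the mean tracks how far the lip lies from the center in each direction and the variance tracks how thick the visible lip is there; concatenating both arrays gives one feature vector per frame.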
Data visualization
Single-frame data vector: $(m_1, \ldots, m_{18},\ \sigma_1, \ldots, \sigma_{18})$
Results of Experiments
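Feed-forward network trained with backpropagation (BP)
Example sentence (Dutch): "Vanmiddag komt de pianostemmer langs om mijn vleugel te stemmen" ("This afternoon the piano tuner is coming by to tune my grand piano")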
Tracking the face – Optical flow
Captures the apparent motion between subsequent images as a grid of motion vectors
Advantages: no lip model required; good at capturing motion
Disadvantage: slow
Face tracking
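To make the idea of a grid of motion vectors concrete, here is a minimal sketch that uses plain block matching as a stand-in. Block matching is only one simple way to estimate apparent motion and is not necessarily the optical-flow method used in this work; the function name `block_matching_flow` and the parameters `block` and `search` are hypothetical.

```python
import numpy as np

def block_matching_flow(prev, curr, block=8, search=3):
    """Estimate one (dx, dy) motion vector per grid cell by exhaustive search.

    For every block of the previous frame, the displacement within
    +/- `search` pixels that minimises the sum of squared differences
    against the current frame is taken as the motion vector.
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    h, w = prev.shape
    rows = (h - 2 * search) // block
    cols = (w - 2 * search) // block
    flow = np.zeros((rows, cols, 2), dtype=int)

    for i in range(rows):
        for j in range(cols):
            y0 = search + i * block
            x0 = search + j * block
            ref = prev[y0:y0 + block, x0:x0 + block]
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cand = curr[y0 + dy:y0 + dy + block,
                                x0 + dx:x0 + dx + block]
                    cost = np.sum((ref - cand) ** 2)
                    if best is None or cost < best:
                        best = cost
                        flow[i, j] = (dx, dy)
    return flow
```

The nested search over every block and every candidate displacement also shows why dense motion estimation tends to be slow, the disadvantage listed on the slide.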
Tracking the face – Lip Geometry Estimation
Applies color filters and captures the lip contours in polar coordinates
Advantages: no lip model required; more or less person-independent
Disadvantage: not robust to external factors
Face tracking
Tracking the face – Active Appearance Models
Point tracking according to a statistical lip model
Advantages: robust against external factors; fast
Disadvantage: requires annotated training images
Face tracking
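The statistical lip model mentioned here is learned from annotated training images. As a rough sketch of the shape part only, the code below fits a PCA model to sets of landmark points; the appearance (texture) part and the fitting procedure of a full Active Appearance Model are left out. The names `fit_shape_model`, `project` and `reconstruct` are hypothetical, and the landmarks are assumed to be already aligned.

```python
import numpy as np

def fit_shape_model(shapes, n_modes=2):
    """Learn a mean shape and the main modes of variation from landmark sets.

    shapes: sequence of (n_points, 2) arrays of annotated lip landmarks.
    """
    X = np.asarray(shapes, dtype=float).reshape(len(shapes), -1)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    modes = vt[:n_modes]
    variances = (s[:n_modes] ** 2) / max(len(X) - 1, 1)
    return mean, modes, variances

def project(shape, mean, modes):
    """Shape parameters: coordinates of a shape along each mode."""
    return modes @ (np.asarray(shape, dtype=float).ravel() - mean)

def reconstruct(params, mean, modes):
    """Rebuild landmark coordinates from shape parameters."""
    return (mean + params @ modes).reshape(-1, 2)
```

A handful of shape parameters per frame is then a compact description of the mouth, which is what makes such a model attractive as a feature source once the annotation effort has been made.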
Active Appearance Models – Design of the lip model
Face tracking
AAM model point coordinates
Face tracking
Features plotted for “F”
[Figure: three plots of feature values (-2.5 to 2.5) against time in frames (0 to 250)]
Feature extraction
5-state HMM
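How a five-state HMM scores a sequence of per-frame features can be sketched with the forward algorithm. The left-to-right topology, the single one-dimensional Gaussian per state, the fixed start in the first state, and all parameter values are assumptions chosen for brevity; the slide does not specify them. The function names are hypothetical.

```python
import numpy as np

def left_to_right_transitions(n_states=5, stay=0.6):
    """Each state either repeats or advances to the next one."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay
        A[i, i + 1] = 1.0 - stay
    A[-1, -1] = 1.0
    return A

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian, evaluated for every state at once."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def forward_log_likelihood(obs, A, means, variances):
    """Log-likelihood of an observation sequence, computed in the log domain."""
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    with np.errstate(divide="ignore"):
        log_A = np.log(A)  # forbidden transitions become -inf
    log_alpha = np.full(A.shape[0], -np.inf)
    log_alpha[0] = log_gauss(obs[0], means[0], variances[0])  # start in state 0
    for x in obs[1:]:
        # Sum over predecessor states, then apply the emission density.
        scores = log_alpha[:, None] + log_A
        log_alpha = np.logaddexp.reduce(scores, axis=0) + log_gauss(x, means, variances)
    return np.logaddexp.reduce(log_alpha)
```

In a recognizer built this way, one such model would be trained per word or viseme, and the model with the highest log-likelihood for the observed feature sequence would be chosen.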
Automatic bi-modal human emotion recognition
Automatic recognition of facial expressions using an Active Appearance Model
Model-based approach
Face localization
User-interface prototype iCat to help users in daily tasks.
M.A.E.L.I.A. Our digital cat
H.C.I. Group
Requirements in other words…
Are you out of your mind? I am sleeping!!!
Get a life! I am still sleeping!
I am so bored! I wish I had a companion!
7:00 AM, 8:00 AM, 11:00 AM, 2:00 PM
I feel so lonely!!! I am very sad and depressed.
4:00 PM
Finally I have a friend! I am so happy and I even managed to pick up the bone! Wow!!!
AIBO! Bring me my newspaper!!!
AIBO! Let’s play!!! Follow me
AIBO! Let’s play!!! Follow me
Multimodal Communication
Uh, ….
I have no time to do anything with you
Hello, would you like to chat with me?
Uh, what a nerd
I want a date
She looks nice
Multi-modal interaction
Would you like to join me for dinner?
Chat-session
A cup of tea?
Mmh, njeh, I don’t like tea.
What’s wrong with tea?
Tea makes me sick.
That’s nonsense!!
And my sister doesn’t like you too!
She is very disappointed!!
Hihi, I was joking!!!
Oh, that’s funny!!!
Chat-session
(f) A cup of tea? : - )
(m) Mmh, njeh, I don’t like tea. (: - (
(f) What’s wrong with tea? : - o
(m) Tea makes me sick. % - \
(f) That’s nonsense!! : - l l
(f) My sister doesn’t like you too! : - l l
(f) She is very disappointed!! : - (
(m) Hihi, I was joking!!! ; - )
(f) Oh, that’s funny!!! : - ]