July 2010 eNTERFACE. Application: Surveillance in trains. Video, audio processing, sound localization, pattern recognition.

Upload: kareem-fax

Post on 31-Mar-2015

Page 1:

July 2010 eNTERFACE

Application: Surveillance in trains

Video, audio processing, sound localization, pattern recognition

Page 2:

Lip reading

Facial expression recognition

Automatic recognition of facial expressions and lipreading using vector flow

Model based approach

Page 3:

What makes visual speech recognition so hard?

Visemes

Smaller word separability

Speech info in audio > speech info in video
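The loss of word separability can be illustrated with a toy example. The sketch below is not taken from the slides: the phoneme-to-viseme table and the five-word lexicon are hypothetical, chosen only because bilabials (p, b, m) and labiodentals (f, v) are commonly grouped into single visemes. Words that differ in sound but share a viseme sequence end up in the same group.

```python
# Hypothetical, heavily simplified phoneme-to-viseme table (an assumption,
# not the viseme set used in the slides).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar",
    "a": "open_vowel",
}

# Toy lexicon: word -> rough phoneme sequence (illustrative only).
LEXICON = {
    "pat": ["p", "a", "t"],
    "bat": ["b", "a", "t"],
    "mat": ["m", "a", "t"],
    "fat": ["f", "a", "t"],
    "vat": ["v", "a", "t"],
}


def viseme_sequence(phonemes):
    """Map a phoneme sequence to the sequence of visemes seen on the lips."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_appearance(lexicon):
    """Group words that are indistinguishable when only visemes are observed."""
    groups = {}
    for word, phonemes in lexicon.items():
        groups.setdefault(viseme_sequence(phonemes), []).append(word)
    return groups


groups = group_by_appearance(LEXICON)
for visemes, words in groups.items():
    print(visemes, "->", sorted(words))
```

Five acoustically distinct words collapse into two visually distinct groups, which is the sense in which video carries less speech information than audio.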

Page 4:

Lip-reading by Humans

People recognize speech better when the signal is both auditory and visual

The difference in recognition rates grows with the level of noise in the environment

[Figure: % correct responses (0 to 100) versus S/N (dB), from noisy to clear, for audio only (A) and audio plus video (A+V)]

Page 5:

Inspiration

In Stanley Kubrick's 1968 film 2001: A Space Odyssey, the computer reads the conversation of two astronauts from their lip movements.

Thirty years later, automated lip-reading has become a significant part of research in speech recognition systems.

Page 7:

New speech corpus

AV speech corpus

Page 9:

Databases of different quality and resolution

Page 10:

Recording a new speech corpus

AV speech corpus

Page 12:

New speech corpus

Dutch

Recorded at high speed: 100 fps

Front and profile views included

70 people: 49 male, 21 female

Students, professors, secretaries, friends

Utterances:

Sentences, digits, spelling, conversation starters/endings, open questions

Normal, fast, whispering

AV speech corpus

Page 13:

New speech corpus

AV speech corpus

Page 15:

ISFER Workbench

Examples (continued)

Page 16:

Active Contours

Internal and external energies

Internal energy forces the contour to shrink

Locally defined external energy forces the contour to stop at the edge of the mouth

Computationally cheap

Sensitivity to initial setting of the contour
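The two energy terms of an active contour can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the internal term is taken to be a simple elastic energy (sum of squared distances between neighbouring points), the external term is the negative gradient magnitude sampled at the nearest pixel, and the weights `w_internal` and `w_external` and the synthetic disk standing in for the mouth are hypothetical.

```python
import numpy as np


def circle_contour(center, radius, n_points=40):
    """Closed contour of n_points (x, y) samples on a circle."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.stack([center[0] + radius * np.cos(t),
                     center[1] + radius * np.sin(t)], axis=1)


def internal_energy(contour):
    """Elastic term: sum of squared distances between neighbouring points.

    It gets smaller as the contour contracts, which is what makes it shrink.
    """
    diffs = np.roll(contour, -1, axis=0) - contour
    return float(np.sum(diffs ** 2))


def external_energy(contour, edge_map):
    """Local term: negative edge strength at the pixel nearest to each point."""
    cols = np.clip(np.rint(contour[:, 0]).astype(int), 0, edge_map.shape[1] - 1)
    rows = np.clip(np.rint(contour[:, 1]).astype(int), 0, edge_map.shape[0] - 1)
    return float(-np.sum(edge_map[rows, cols]))


def snake_energy(contour, edge_map, w_internal=0.01, w_external=1.0):
    """Weighted sum of both terms (the weights are illustrative)."""
    return (w_internal * internal_energy(contour)
            + w_external * external_energy(contour, edge_map))


# Synthetic "mouth": a filled disk of radius 10 centered at (32, 32).
yy, xx = np.mgrid[0:64, 0:64]
image = ((xx - 32) ** 2 + (yy - 32) ** 2 <= 10 ** 2).astype(float)
gy, gx = np.gradient(image)
edge_map = np.hypot(gx, gy)

far = circle_contour((32, 32), 20)       # initial contour, away from the edge
on_edge = circle_contour((32, 32), 10)   # contour lying on the disk boundary

print("internal:", internal_energy(far), internal_energy(on_edge))
print("external:", external_energy(far, edge_map), external_energy(on_edge, edge_map))
```

Because the external term is sampled only at the contour points, a contour that starts far from any edge sees no image force at all, which is one way to read the sensitivity to the initial setting noted above.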

Page 17:

Template Matching

Internal and external energies

Internal energy forces the template to maintain its geometry

Globally defined external energy forces appropriate placement on the picture

Better results than with snakes

Integration of energy functions at each step can be very time consuming

Page 18:

Model

Goal: lip-reading

Needed: accurate description of the visible parts of the articulatory system

Accurate description of the shape of the mouth:

measurements of the distance of the lip to the center of the mouth

measurements of the thickness of the visible part of the lips

Page 19:

Data processing (continued)

Filtered image: intensity distribution $I(x, y)$, center of mouth $(EX, EY)$

Image in polar coordinates: $\hat{I}(\alpha, r) = I(EX + r\sin\alpha,\; EY + r\cos\alpha)$

Conditional distribution: $\hat{I}(r \mid \alpha)$, approximated by a Gaussian in $r$

Mean and variance functions: $M(\alpha)$, $V(\alpha)$

Page 20:

Data visualization

Single frame

data vector: values of the mean and variance functions at 18 angles (indices 1 to 18)
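The processing chain from filtered image to per-frame data vector can be sketched as below. This is a rough sketch under assumptions, not the original code: the filtered image is resampled along rays from the mouth center with nearest-pixel lookup, each ray is normalized so it can be treated as a distribution over the radius, and the mean and variance of that distribution are computed per angle. The function name `lip_geometry_features`, the parameters `max_radius` and `n_radii`, and the synthetic ring used as input are hypothetical; `n_angles=18` mirrors the index range 1 to 18 in the single-frame data vector.

```python
import numpy as np


def lip_geometry_features(filtered, center, n_angles=18, max_radius=30.0, n_radii=60):
    """Return (mean, variance) of the radius along each ray from the center."""
    ex, ey = center
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    radii = np.linspace(0.0, max_radius, n_radii)

    # Polar resampling, nearest pixel: I_hat(alpha, r) = I(EX + r sin(alpha), EY + r cos(alpha)).
    cols = np.rint(ex + radii[None, :] * np.sin(angles[:, None])).astype(int)
    rows = np.rint(ey + radii[None, :] * np.cos(angles[:, None])).astype(int)
    cols = np.clip(cols, 0, filtered.shape[1] - 1)
    rows = np.clip(rows, 0, filtered.shape[0] - 1)
    polar = filtered[rows, cols]                      # shape (n_angles, n_radii)

    # Normalize each ray so it acts as a conditional distribution over r
    # (assumed normalization; rays with no response keep zero weight).
    totals = polar.sum(axis=1, keepdims=True)
    weights = polar / np.where(totals > 0, totals, 1.0)

    mean = (weights * radii).sum(axis=1)
    variance = (weights * (radii - mean[:, None]) ** 2).sum(axis=1)
    return mean, variance


# Synthetic filtered image: a ring between radius 9 and 13 around (40, 30),
# standing in for the lip-colored pixels that survive the filter.
yy, xx = np.mgrid[0:60, 0:80]
dist2 = (xx - 40) ** 2 + (yy - 30) ** 2
filtered = ((dist2 >= 9 ** 2) & (dist2 <= 13 ** 2)).astype(float)

mean, variance = lip_geometry_features(filtered, center=(40, 30))
data_vector = np.concatenate([mean, variance])       # one vector per frame
print(data_vector.shape)
```

In this reading, the mean function tracks the distance from the mouth center to the lip and the variance function tracks the visible lip thickness, which matches the two measurements the model asks for.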

Page 21:

Results of Experiments

Feed-forward backpropagation (BP) network

Vanmiddag komt de pianostemmer langs om mijn vleugel te stemmen ("This afternoon the piano tuner is coming by to tune my grand piano")

Page 24:

Tracking the face – Optical flow

Capturing apparent motion of subsequent images in a grid of motion vectors

Advantages: no lip model required; good at capturing motion

Disadvantage: slow

Face tracking
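A grid of motion vectors can be estimated in several ways; the slides do not say which optical flow method was used. The sketch below assumes a block-wise least-squares solution of the brightness-constancy equation (in the style of Lucas-Kanade), written with NumPy only. The function name `grid_optical_flow`, the block size, and the synthetic test pattern are hypothetical.

```python
import numpy as np


def grid_optical_flow(frame1, frame2, block=16):
    """Estimate one (u, v) motion vector per block.

    Solves Ix * u + Iy * v = -It in the least-squares sense inside each block,
    where u is the horizontal and v the vertical displacement in pixels.
    """
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    iy, ix = np.gradient((f1 + f2) / 2.0)   # spatial derivatives (rows, cols)
    it = f2 - f1                            # temporal derivative

    n_rows, n_cols = f1.shape[0] // block, f1.shape[1] // block
    flow = np.zeros((n_rows, n_cols, 2))
    for r in range(n_rows):
        for c in range(n_cols):
            win = (slice(r * block, (r + 1) * block),
                   slice(c * block, (c + 1) * block))
            a = np.stack([ix[win].ravel(), iy[win].ravel()], axis=1)
            b = -it[win].ravel()
            flow[r, c] = np.linalg.lstsq(a, b, rcond=None)[0]
    return flow


def pattern(x, y):
    """Smooth synthetic texture with gradients in both directions."""
    return np.sin(x / 6.0) + np.cos(y / 9.0) + np.sin((x + y) / 11.0)


yy, xx = np.mgrid[0:64, 0:64].astype(float)
frame1 = pattern(xx, yy)
frame2 = pattern(xx - 1.0, yy)              # same texture moved one pixel to the right

flow = grid_optical_flow(frame1, frame2)
print(flow.shape)                           # one motion vector per 16x16 block
```

Even this small version solves a separate least-squares problem for every block of every frame pair, which gives a feel for why the approach is listed as slow.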

Page 25:

Tracking the face – Lip Geometry Estimation

Applying some color filters and capturing the lip contours in polar coordinates

Advantages: no lip model required; more or less person-independent

Disadvantage: not robust to external factors

Face tracking

Page 26:

Tracking the face – Active Appearance Models

Point tracking according to a statistical lip model

Disadvantage: requires annotated training images

Advantages: robust against external factors; fast!

Face tracking
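The statistical model behind this kind of point tracking is typically built from annotated landmark points. The sketch below covers only the shape part, as a principal component analysis of landmark coordinates; it assumes the shapes are already aligned and leaves out the texture model and the fitting procedure that a full Active Appearance Model needs. All function names and the synthetic elliptical "lip" shapes are hypothetical.

```python
import numpy as np


def build_shape_model(shapes, n_modes=2):
    """PCA shape model from annotated shapes of shape (n_samples, n_points, 2)."""
    data = shapes.reshape(len(shapes), -1)
    mean = data.mean(axis=0)
    _, singular, vt = np.linalg.svd(data - mean, full_matrices=False)
    variances = singular ** 2 / (len(shapes) - 1)
    return mean, vt[:n_modes], variances[:n_modes]


def project(shape, mean, modes):
    """Shape parameters of one landmark set under the model."""
    return modes @ (shape.ravel() - mean)


def reconstruct(params, mean, modes):
    """Landmark coordinates generated by the given shape parameters."""
    return (mean + params @ modes).reshape(-1, 2)


def synthetic_lip(opening, n_points=12, width=20.0):
    """Elliptical lip outline; stands in for a hand-annotated training image."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.stack([width * np.cos(t), opening * np.sin(t)], axis=1)


# Training set: the same outline with nine different mouth openings.
shapes = np.array([synthetic_lip(h) for h in np.linspace(2.0, 10.0, 9)])
mean, modes, variances = build_shape_model(shapes)

params = project(shapes[0], mean, modes)
rebuilt = reconstruct(params, mean, modes)
print(variances)
```

In this toy set the only thing that changes is the mouth opening, so a single mode captures essentially all the variance. Real annotated data would need more modes, and that annotation effort is the disadvantage listed above.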

Page 27:

Active Appearance Models – Design of the lip model

Face tracking

Page 28:

AAM model point coordinates

Face tracking

Page 29:

Feature extraction

[Figure: three panels of features plotted for "F"; horizontal axis: time (frames), 0 to 250; vertical axis: -2.5 to 2.5]

Page 30:

5-state HMM
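The slide gives only the number of states. As a rough illustration, the sketch below assumes a left-to-right topology (each state either repeats or moves to the next one), one-dimensional Gaussian emissions, and the forward algorithm in the log domain to score a feature sequence. The self-transition probability `stay`, the state means and variances, and the test sequences are all hypothetical.

```python
import numpy as np


def left_to_right_transitions(n_states=5, stay=0.6):
    """Transition matrix where each state either repeats or advances by one."""
    a = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        a[i, i] = stay
        a[i, i + 1] = 1.0 - stay
    a[-1, -1] = 1.0
    return a


def gaussian_log_likelihood(features, means, variances):
    """Per-frame, per-state log-likelihood for 1-D features: shape (T, n_states)."""
    diff = features[:, None] - means[None, :]
    return -0.5 * (np.log(2.0 * np.pi * variances) + diff ** 2 / variances)


def log_forward(log_emissions, transitions, initial):
    """log P(observations | model) computed with the forward algorithm."""
    with np.errstate(divide="ignore"):      # log(0) = -inf is intended here
        log_a = np.log(transitions)
        alpha = np.log(initial) + log_emissions[0]
    for frame in log_emissions[1:]:
        alpha = np.logaddexp.reduce(alpha[:, None] + log_a, axis=0) + frame
    return float(np.logaddexp.reduce(alpha))


transitions = left_to_right_transitions()
initial = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # always start in the first state
means = np.linspace(-2.0, 2.0, 5)               # illustrative state means
variances = np.full(5, 0.25)

rising = np.repeat(means, 4)                    # feature track that follows the states
falling = rising[::-1]                          # same values in reverse order

score_rising = log_forward(gaussian_log_likelihood(rising, means, variances),
                           transitions, initial)
score_falling = log_forward(gaussian_log_likelihood(falling, means, variances),
                            transitions, initial)
print(score_rising, score_falling)
```

The model scores the sequence that passes through the states in order far higher than its time-reversed copy, which is the property a recognizer relies on when comparing one HMM per word or viseme.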

Page 31:

Automatic bi-modal human emotion recognition

Automatic recognition of facial expressions using Active Appearance Models

Model-based approach

Page 32:

Face localization

Page 33:

User-interface prototype iCat to help users in daily tasks.

Page 34:

M.A.E.L.I.A. Our digital cat

H.C.I. Group

Page 37:

Requirements in other words…

Are you out of your mind? I am sleeping!!!

Get a life! I am still sleeping!

I am so bored! I wish I had a companion!

7:00 AM, 8:00 AM, 11:00 AM, 14:00

I feel so lonely!!! I am very sad and depressed.

16:00

Finally I have a friend! I am so happy and I even managed to pick up the bone! Wow!!!

AIBO! Bring me my newspaper!!!

AIBO! Let’s play!!! Follow me

AIBO! Let’s play!!! Follow me

Page 38:

Multimodal Communication

Uh, ….

I have no time to do anything with you

Hello,

do you like to chat with me ?

Uh, what a nerd

I want a date

She looks nice

Page 39:

Multi-modal interaction

Page 41:

Would you like to join me for dinner?

Page 47:

Chat-session

A cup of tea?

Mmh, njeh, I don’t like tea.

What’s wrong with tea?

Tea makes me sick.

That’s nonsense!!

And my sister doesn’t like you too!

She is very disappointed!!

Hihi, I was joking!!!

Oh, that’s funny!!!

Page 48:

Chat-session

(f) A cup of tea? : - )

(m) Mmh, njeh, I don’t like tea. (: - (

(f) What’s wrong with tea? : - o

(m) Tea makes me sick. % - \

(f) That’s nonsense!! : - l l

(f) My sister doesn’t like you too! : - l l

(f) She is very disappointed!! : - (

(m) Hihi, I was joking!!! ; - )

(f) Oh, that’s funny!!! : - ]

Page 49:

A cup of tea?

: - )

Page 50:

Mmh, njeh, I don’t like tea.

(: - (

Page 51:

What’s wrong with tea?

: - o

Page 52:

Tea makes me sick.

% - \

Page 53:

That’s nonsense!!

: - l l

Page 54:

My sister doesn’t like you too!

: - l l

Page 55:

She is very disappointed!!

: - (

Page 56:

Hihi, I was joking!!!

; - )

Page 57:

Oh, that’s funny!!!

: - ]