Signal Processing Institute, Swiss Federal Institute of Technology (EPFL), Lausanne

Slide 1: Feature selection for audio-visual speech recognition
Mihai Gurban
Slide 2: Outline
– Feature selection and extraction
  – Why select features?
  – Information-theoretic criteria
– Our approach
  – The audio-visual recognizer
  – Audio-visual integration
  – Features and selection methods
– Experimental results
– Conclusion
Slide 3: Feature selection
– Features and classification
  – Features (or attributes, properties, characteristics): different types of measures that can be taken on the same physical phenomenon.
  – An instance (or pattern, sample, example): a collection of feature values representing simultaneous measurements.
  – For classification, each sample has an associated class label.
– Feature selection
  – Finding, from the original feature set, a subset that retains most of the information relevant to the classification task.
  – This is needed because of the curse of dimensionality.
– Why dimensionality reduction?
  – The number of samples required to obtain accurate models of the data grows exponentially with the dimensionality.
  – The computing resources required also grow with the dimensionality of the data.
  – Irrelevant information can decrease performance.
Slide 4: Feature selection
– Entropy and mutual information
  – H(X), the entropy of X: the amount of uncertainty about the value of X.
  – I(X;Y), the mutual information between X and Y: the reduction in the uncertainty of X due to knowledge of Y (or vice versa).
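Both quantities can be estimated from discrete samples by plugging empirical frequencies into the definitions; a minimal sketch (not part of the slides), using the identity I(X;Y) = H(X) + H(Y) − H(X,Y):

```python
import math
from collections import Counter

def entropy(samples):
    """H(X) in bits, estimated from empirical frequencies of discrete samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# When X fully determines Y (and vice versa), I(X;Y) = H(X)
x = [0, 0, 1, 1]
print(mutual_information(x, x))   # 1.0 bit
```

With few samples these plug-in estimates are biased, but they suffice to illustrate the criteria that follow.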
– Maximum dependency
  – Mutual information with the class is one of the most frequently used criteria.
  – Pick Y_S1, ..., Y_Sm from the set Y_1, ..., Y_n of features such that I(Y_S1, Y_S2, ..., Y_Sm ; C) is maximal.
– How many subsets?
  – It is impossible to check all subsets, because the number of combinations is high: there are C(n, m) = n! / (m!(n−m)!) subsets of size m.
  – As an approximate solution, greedy algorithms are used, adding one feature at a time.
  – The number of candidate evaluations is thus reduced to n + (n−1) + ... + (n−m+1).
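The greedy approximation above (repeatedly adding the feature that most increases the joint mutual information with the class) can be sketched as follows, assuming discrete feature values so that probabilities can be estimated by counting:

```python
import math
from collections import Counter

def entropy(samples):
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def joint_mi(columns, labels):
    """I(Y_1,...,Y_k ; C): treat each sample's feature values as one tuple."""
    joint = list(zip(*columns))
    return entropy(joint) + entropy(labels) - entropy(list(zip(joint, labels)))

def greedy_select(features, labels, m):
    """Greedily pick m feature names maximizing I(selected + candidate ; C).

    features: dict mapping feature name -> list of discrete values.
    """
    selected = []
    for _ in range(m):
        remaining = [f for f in features if f not in selected]
        best = max(remaining,
                   key=lambda f: joint_mi([features[s] for s in selected + [f]],
                                          labels))
        selected.append(best)
    return selected
```

Each of the m rounds scans the remaining features once, giving the n + (n−1) + ... + (n−m+1) evaluations mentioned above; estimating the joint mutual information reliably, however, gets harder as the selected tuple grows.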
Slide 5: A simple example
– Entropies and mutual information can be represented by Venn diagrams.
– We are searching for the features Y_Si with maximum mutual information with the class label C.
– Assume the complete set of features is Y_1, ..., Y_5.
[Figure: Venn diagram of the class label C overlapping features Y_1 through Y_5]
Slide 7: A simple example
[Figure: Venn diagram of C with features Y_1 and Y_2]
Slide 8: A simple example
[Figure: Venn diagram of C with features Y_1, Y_2 and Y_3]
Slide 9: A simple example
[Figure: Venn diagram of C with features Y_1, Y_2 and Y_3]
Slide 10: Which criterion to penalize redundancy?
– Many different criteria have been proposed in the literature.
– Our criterion penalizes only relevant redundancy.
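The slides do not spell out the criterion itself, so as an illustrative stand-in here is the widely used mRMR score (relevance minus mean redundancy, after Peng et al.); note that, unlike the criterion above, plain mRMR penalizes all redundancy between features, not only the part relevant to the class:

```python
import math
from collections import Counter

def entropy(samples):
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mi(xs, ys):
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def mrmr_score(candidate, selected, labels):
    """Relevance I(Y_j;C) minus mean redundancy with already-selected features."""
    relevance = mi(candidate, labels)
    if not selected:
        return relevance
    redundancy = sum(mi(candidate, s) for s in selected) / len(selected)
    return relevance - redundancy
```

A duplicate of an already-selected feature scores zero here even if it is highly relevant, which is the behavior a redundancy penalty is meant to produce.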
Slide 11: Solutions from the literature
– "Natural" DCT ordering: zigzag scanning, as used in compression (JPEG/MPEG).
– Maximum mutual information: typically, redundancy is not taken into account.
– Linear Discriminant Analysis: a transform is applied to the features.
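The "natural" zigzag ordering can be generated as in JPEG by walking the anti-diagonals of the DCT block in alternating directions; a sketch, not tied to any particular codec implementation:

```python
def zigzag_order(n):
    """(row, col) indices of an n x n DCT block in JPEG-style zigzag order."""
    order = []
    for s in range(2 * n - 1):                # s = row + col, one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals run toward the top-right, odd ones toward the bottom-left
        order.extend(diag if s % 2 else diag[::-1])
    return order

# Keeping the first k coefficients in this order keeps the low frequencies
print(zigzag_order(3)[:4])   # [(0, 0), (0, 1), (1, 0), (2, 0)]
```

Taking the first k entries of this list is the baseline feature "selection" the experiments compare against.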
Slide 12: Our application: AVSR
[Diagram: the audio signal feeds audio feature extraction; the video signal feeds a visual front-end (face detection, mouth localization, lip tracking) followed by visual feature extraction; the two streams support audio-only and visual-only recognition, and are combined by audio-visual fusion for audio-visual recognition]
– Experiments on the CUAVE database:
  – 36 speakers, 10 words, 5 repetitions per speaker.
  – Leave-one-out cross-validation.
  – Audio features: MFCC coefficients.
  – Visual features: DCT coefficients with first and second temporal derivatives.
  – Different levels of noise added to the audio.
Slide 13: The multi-stream HMM
– Audio stream: 39 MFCCs. Video stream: DCT features.
– Audio-visual integration with multi-stream HMMs:
  – States are modeled with Gaussian mixtures.
  – Each modality is modeled separately.
  – The emission likelihood is a weighted product of the per-stream likelihoods.
  – The optimal stream weights are chosen for each SNR.
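The weighted product of stream likelihoods is a weighted sum in the log domain; a minimal one-dimensional sketch (the actual system uses Gaussian mixtures per state, omitted here):

```python
import math

def log_gaussian(x, mean, var):
    """Log-density of a 1-D Gaussian, standing in for a mixture component."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def multistream_loglik(logp_audio, logp_video, w_audio):
    """log b(o) = w * log b_a(o_a) + (1 - w) * log b_v(o_v)."""
    return w_audio * logp_audio + (1.0 - w_audio) * logp_video

# A larger audio weight suits clean audio; a smaller one suits noisy audio
la = log_gaussian(0.1, 0.0, 1.0)
lv = log_gaussian(0.5, 0.0, 1.0)
print(multistream_loglik(la, lv, 0.7))
```

Tuning the weight per SNR lets the decoder lean on the video stream exactly when the audio stream degrades.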
Slide 14: Information content of different types of features
[Figure: bar chart of the mutual information I(X;C), between 0 and 1, for five features of each type: MFCC in clean conditions, MFCC at -10 dB SNR, DCT coefficients, PCA coefficients, and optical flow coefficients]
Slide 15: Visual-only recognition rate
[Figure: word accuracy (%) versus number of features (0 to 200) for three orderings: max MI with redundancy penalized, max MI, and zigzag ordering]
Slide 16: Audio-visual performance
[Figure: word accuracy (%) versus SNR (clean down to -10 dB) for audio-only, video-only, and audio-visual recognition]
Slide 17: AV performance with clean audio
[Figure: word accuracy (%) versus number of visual features (0 to 200) for audio-visual versus audio-only recognition; accuracies between about 98.2% and 99.2%]
Slide 18: AV performance at 10 dB SNR
[Figure: word accuracy (%) versus number of visual features (0 to 200) for audio-visual versus audio-only recognition; accuracies between about 91% and 96%]
Slide 19: Noisy AV and visual-only comparison
[Figure: word accuracy (%) versus number of features (0 to 200) comparing AV performance at -10 dB SNR with video-only performance]
Slide 20: Conclusion and future work
– Feature selection for audio-visual speech recognition:
  – The visual-only recognition rate is not a good predictor of audio-visual performance, because of dimensionality effects.
  – Maximum audio-visual performance is obtained at small video dimensionalities.
  – Algorithms that improve performance at small dimensionalities are therefore needed.
– Future work:
  – Better methods to compute the amount of redundancy between features.