age and gender recognition from speech patterns … ·  · 2012-12-06age and gender recognition...

24
Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011 1 Mohamad Hasan Bahari Hugo Van hamme

Upload: dokhue

Post on 21-May-2018

224 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Age and Gender Recognition

from Speech Patterns Based on

Supervised Non-Negative Matrix Factorization

July 2011 1

Mohamad Hasan Bahari

Hugo Van hamme

Page 2: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Outline

� Introduction and Motivations

� Age and Gender Recognition

� Corpora

� Supervised Non-negative Matrix Factorization

2

� Supervised Non-negative Matrix Factorization

� Proposed Method

� Results

� Conclusions and Future Researches

Page 3: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Introduction

� Confirming the identity of individuals

� Biometric Characteristics

� Fingerprint

� Face

� Iris

3

� Iris

� Hand Geometry

� Ear Shape

� Voice pattern

� +

� Choosing a characteristic

� Availability

� Reliability

Page 4: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Motivation

� In many real world cases, only speech patterns are available(kidnapping, threatening calls, +)

� Speech patterns can include many interesting information

� Gender

� Age

4

� Age

� Dialect (original or previous regions)

� Membership of a particular social group

� +

To facilitates in identifying a criminal

To narrow down the number of suspects

Page 5: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Goal

Goal:

To extract different physical and psychological characteristics of the speaker from his/her voice patterns (Speaker Profiling).

Physical: Psychological:

5

Physical:

1. Gender

2. Age

3. Accent

4. +

Psychological:

1. Anxiousness

2. Stress

3. Confidence

4. +

Page 6: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Age and Gender Recognition

Three approaches:

I. Directly from speech signal.

II. Modeling the speech generation

6

II. Modeling the speech generation

system.

III. Modeling the hearing system.

Page 7: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

I. Directly from speech signal.

� Different acoustic features vary with age.

1) Fundamental frequency

2) Speech rate

3) Sound pressure level

Age and Gender Recognition

7

4) …

� By Finding all acoustic features varying with age and their exact relation

to the speaker age.

� Conceptually simple and computationally inexpensive

x These features are affected by many other parameters, such as weight,

height, voice quality, emotional condition, …

Page 8: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Effect of Age and Gender on speech (Fundamental frequency) [1]

Age and Gender Recognition

�Age is only one of inputs affecting

the speech and consequently acoustic

features.

8[1] W. S. Brown, R. J. Morris, H. Hollien, and E. Howell, Journal of Voice, vol. 5, pp. 310–315, 1991.

�It is impossible to estimate the age

without considering the rest of inputs

�Perceptions of gender and age have a

significant mutual impact on each

other.

Page 9: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

II. Modeling the speech generation system.

� It is an input estimation problem.

x Modeling the speech generation system of the speaker is very

difficult.

Age and Gender Recognition

9

Page 10: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Age and Gender Recognition

III. Modeling the hearing system

� To solve the speech recognition problem, the hearing system is

modeled using Hidden Markove Models (HMMs).

� Using the tools applied in speech recognition problems (HMMs) .

� Well established.

10

� Well established.

� Accurate in recognizing content.

x There exist a difference between the age of a speaker as perceived,

and their actual age.

x Computationally complex

Page 11: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Corpora

Young Young Middle Middle Senior Senior

� 555 speakers from the N-best evaluation corpus [1]

� The corpus contains live and read commentaries, news, interviews, and

reports broadcast in Belgium

�Different age groups and genders

11

Category NameYoung

Male

Young

Female

Middle

Male

Middle

Female

Senior

Male

Senior

Female

Age 18-35 18-35 36-45 36-45 46-81 46-81

Number of Speakers 85 53 160 41 191 25

[1] D. A. Van Leeuwen, J. Kessens, E. Sanders, and H. van den Heuvel, In proc. Interspeech, pp. 2571-2574, 2009.

Page 12: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

SNMF

� Non-negative Matrix Factorization (NMF) is a popular machine

learning algorithm [1]

� It is used in supervised or unsupervised modes.

� Supervised NMF or SNMF is a pattern recognition method [1]

12

� It is very effective in the case of high dimension input space.

� It is a generative classifier.

� It can directly classify patterns into multiple classes (no need to

change the problem into multiple binary classification).

[1] H. Van hamme, In proc. Interspeech, Australia, pp. 2554-2557, 2008.

Page 13: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Problem Statement:

Given a training data-set: Str= {(x1, y1), . . ., (xn, yn), . . . , (xN, yN)}

xn is a vector of observed characteristics for the data item

yn denotes a label vector which represents the class that xn belongs to

SNMF

13

Goal:

Approximation of a classifier function (g), such that ŷ=g(xtst) is as

close as possible to the true label.

xtst is an unseen observation

Page 14: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

SNMF

SNMF in Training Phase:

First step: Second step:

[ ][ ]Ntr

B

N

tr

S

xxV

yyV

L

L

1

1

=

= tr

tr

B

tr

S

tr

B

tr

Strtrtr HW

W

V

VHWV

=

tr

B

tr

Str

V

VV

Extended Kullbeck-Leibler divergence:

Multiplicative updating formula:

14

( ) ( ) ( ) ( )∑∑ +−+

=

zn

zn

tr

mn

tr

mnmn

trtr

mn

trtr

tr

mntr

mn

trtrtr

KL HVHWHW

VVHWVD ρlog

[ ][ ]

[ ][ ]

[ ][ ]

[ ][ ]trtr

trTtr

NM

Ttr

trtr

Ttr

trtr

tr

Ttr

NM

trtr

HW

VW

W

HH

HHW

V

H

WW

)(1)(

)()(1

o

o

ρ+←

×

×

Page 15: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

SNMF

SNMF in Testing Phase:

First step: Second step:

( )tsttr

B

tst

KLH

tr

S

tst HWxDWxgytst

minarg)(ˆ ==

tsttr

B

tst HWx ≈ tsttr

S

tst HWxgy == )(ˆ

Extended Kullbeck-Leibler divergence:

Multiplicative updating formula:

15

B HWx ≈ S

( ) ( ) ( ) ( )∑∑ +−+

=

z

z

tst

m

tst

mm

tsttr

B

m

tsttr

B

tst

mtst

m

tsttr

B

stt

KL HxHWHW

xxHWxD ρlog

[ ][ ]

[ ][ ]tsttr

B

tstTtr

B

M

Ttr

B

tsttst

HW

xW

W

HH )(

1)( 1

o

ρ+←

×

Page 16: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Proposed Method

1. Feature selection

2. Acoustic modeling

3. Supervector making procedure

4. Training phase

16

4. Training phase

5. Testing phase

Page 17: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Proposed Method

1. Feature selection

• MEL Spectra

• Mean normalization

• vocal tract length normalization

• Augmented with their first and second order time derivatives.

17

• Augmented with their first and second order time derivatives.

Speech Signal

Feature selection

Feature Vectors

+.

Page 18: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Proposed Method

2. Acoustic modeling

Speaker

Independent

Model

Speaker

Adaptation

Method

Model of

the

Speaker

18

Speaker independent Model:

• An HMM with a shared pool of 49740 Gaussians to model the observations in 3873 cross-word

context-dependent tied triphone states.

Adaptation Method:

• The speaker dependent mixture weights for each speaker result from a re-estimation of the

speaker independent weights based on a forced alignment of the training data for that speaker

using a speaker-independent acoustic model.

The result of this step is 555 speaker adapted models

Model Method Speaker

Page 19: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Proposed Method

3. Supervector making procedure

Gaussian Mixture Model (GMM) of each speaker adapted HMMs is:

Three type of supervectors:

),,()(1

s

j

s

jt

J

j

s

jt owosf

s

∑∆=∑=

µ

19

Three type of supervectors:

1. Means

2. Variances

3. Weights

Weights supervectors:

The result of this step is 555 supervectors for each of 555 speakers

[ ][ ]TTSTsT

n

Ts

Q

s

q

sss wwwfr

)()()( 1

1

λλλχ

λ

LL

LL

=

=

Page 20: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Proposed Method

4. Training phase

20

5. Testing phase

Page 21: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Results

Evaluation Methodology

� 5-fold cross-validation (five independent run)

� In each of five run:

� Training set is speech data of 444 speakers

� Testing set is speech data of 111 speakers

21

� Testing set is speech data of 111 speakers

TST TR TR TR TR

Database

TR TST TR TR TR

Database

.

.

.

Run 1

Run 2

Page 22: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Results

Gender recognition is 96%.

relative confusion matrix

CL

AC

YM YF MM MF SM SF

YM 13 03 58 0 26 0

YF 02 77 04 11 057 0

MM 06 01 44 01 47 0

MF 0 54 02 24 17 02

22

Age group recognition

MF 0 54 02 24 17 02

SM 03 01 19 0 76 0

SF 0 2 08 28 28 16

Category Name

Young Male Young Female Middle MaleMiddle Female

Senior Male Senior Female

Prior 15 10 29 7 34 4

Accuracy 13 77 44 24 76 16

Page 23: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

Conclusions and Future Researches

Conclusions:

1. A new age-gender recognition method based on SNMF

2. Supervectors of GMM weights were used

3. Evaluated on N-Best Corpus

4. Gender recognition accuracy is 96%

23

4. Gender recognition accuracy is 96%

5. Age group recognition accuracy is significantly higher than chance level

Future Researches:

1. Age estimation instead of age group recognition.

2. Using supervectors of GMM means and variances and combining these features

Page 24: Age and Gender Recognition from Speech Patterns … ·  · 2012-12-06Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization July 2011

g{tÇ~ lÉâ yÉÜ lÉâÜ TààxÇà|ÉÇ

24