
Page 1: Look who's talking? Project 3.1

Yannick Thimister, Han van Venrooij, Bob Verlinden
27-01-2011, DKE, Maastricht University

Page 2: Contents

Speaker recognition
Speech samples
Voice activity detection
Feature extraction
Speaker recognition
Multi-speaker recognition
Experiments and results
Discussion
Conclusion

Page 3: Speaker recognition

Speech contains several layers of information: the spoken words and the identity of the speaker
Speaker-related differences are a combination of anatomical differences and learned speaking habits

Page 4: Speech samples

Self-recorded database:
55 sentences from 11 different people (five per person: 2x2 predefined and 1 random)
Professional recording and built-in laptop microphone

Database via Voxforge.org:
610 sentences from 61 different people
Varying recording microphones and environments

Page 5: Voice activity detection

Three approaches:
Power-based
Entropy-based
Long term spectral divergence

The signal is processed in frames; the initial frames are assumed to be noise
Hangover: speech decisions are extended to avoid clipping word endings
Adaptive noise estimation
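
As a minimal illustration of the framing step, a sketch in Python (assuming numpy; the 25 ms frame length and 10 ms hop at a 16 kHz sampling rate are illustrative choices, not values from the project):

    import numpy as np

    def split_into_frames(signal, frame_len=400, hop=160):
        # 400 samples = 25 ms, 160 samples = 10 ms at 16 kHz;
        # assumes len(signal) >= frame_len
        n_frames = 1 + (len(signal) - frame_len) // hop
        return np.stack([signal[i*hop : i*hop + frame_len]
                         for i in range(n_frames)])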

Page 6: Voice activity detection

Power-based:
Assumes that the noise is normally distributed
Calculate the mean μ and standard deviation σ from the initial noise frames
For each sample n, calculate whether it deviates from μ by more than a multiple of σ
A frame j is classified as speech when the majority of its samples do
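
A sketch of this decision rule (numpy; the threshold factor c and the number of initial noise frames are illustrative assumptions, not values from the project):

    import numpy as np

    def power_vad(frames, n_noise_frames=10, c=3.0):
        # Noise statistics from the initial frames, assumed to be noise-only
        noise = frames[:n_noise_frames].ravel()
        mu, sigma = noise.mean(), noise.std()
        # A sample is active when it deviates more than c standard deviations
        active = np.abs(frames - mu) > c * sigma
        # A frame is speech when the majority of its samples are active
        return active.mean(axis=1) > 0.5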

Page 7: Voice activity detection

Entropy-based:
Scale the DFT coefficients to a probability distribution: p(k) = |X(k)|² / Σj |X(j)|²
The entropy equals H = −Σk p(k) log p(k)
Speech frames have a more structured spectrum and therefore lower entropy than noise frames
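
A sketch of the entropy decision (numpy; the normalized-entropy threshold is an illustrative assumption):

    import numpy as np

    def entropy_vad(frames, threshold=0.8):
        # Scale the DFT power coefficients to a probability distribution per frame
        power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
        p = power / (power.sum(axis=1, keepdims=True) + 1e-12)
        # Spectral entropy H = -sum_k p(k) log p(k); low for structured speech
        entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
        # Compare against the maximum possible entropy, log(number of bins)
        return entropy / np.log(p.shape[1]) < threshold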

Page 8: Voice activity detection

Long term spectral divergence:
Uses an L-frame window around each frame
Estimation of the long term spectral envelope (LTSE): the per-bin maximum of the DFT magnitudes over the window
Divergence of the envelope from the noise spectrum (LTSD): the average ratio LTSE²(k)/N²(k) over all bins k, in dB

Page 9: Voice activity detection

Long term spectral divergence:
Estimate the noise spectrum N(k) as averages of the DFT coefficients of the initial frames
Calculate the mean μ of the LTSD of the noise frames
For each frame f, calculate the LTSD and classify f as speech when LTSD > cμ
Update the noise estimate during non-speech frames
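
A compact sketch combining the two slides above (numpy; the window size, noise-frame count, and threshold factor c are illustrative, and the running noise update is left out for brevity):

    import numpy as np

    def ltsd_vad(frames, n_noise_frames=10, window=3, c=1.5):
        X = np.abs(np.fft.rfft(frames, axis=1))          # DFT magnitudes
        noise = X[:n_noise_frames].mean(axis=0) + 1e-12  # noise spectrum N(k)

        def ltsd(l):
            # Long term spectral envelope: per-bin maximum over the window
            lo, hi = max(0, l - window), min(len(X), l + window + 1)
            ltse = X[lo:hi].max(axis=0)
            # Divergence from the noise spectrum, in dB
            return 10 * np.log10((ltse ** 2 / noise ** 2).mean())

        values = np.array([ltsd(l) for l in range(len(X))])
        mu = values[:n_noise_frames].mean()  # mean LTSD of the noise frames
        return values > c * mu               # speech when LTSD exceeds c * mu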

Page 10: Feature extraction

Representation of speakers:

Mel frequency cepstral coefficients (MFCC): imitates human hearing
Linear predictive coding (LPC): models each sample as a linear function of the previous samples

Page 11: MFCC

Hamming window
FFT
Mel-scale filter bank
Log
FFT of the log energies, giving the cepstral coefficients
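
A sketch of the pipeline (numpy/scipy; the filter-bank size and sampling rate are illustrative, and the final transform is written here as a DCT, the common choice for MFCCs):

    import numpy as np
    from scipy.fft import dct

    def mel_filterbank(n_filters, n_fft, sr):
        # Triangular filters spaced linearly on the mel scale
        mel = lambda f: 2595 * np.log10(1 + f / 700.0)
        inv_mel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
        pts = inv_mel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
        bins = np.floor((n_fft + 1) * pts / sr).astype(int)
        fb = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(1, n_filters + 1):
            l, m, r = bins[i - 1], bins[i], bins[i + 1]
            fb[i - 1, l:m] = (np.arange(l, m) - l) / max(m - l, 1)  # rising edge
            fb[i - 1, m:r] = (r - np.arange(m, r)) / max(r - m, 1)  # falling edge
        return fb

    def mfcc(frame, sr=16000, n_filters=26, n_coeffs=10):
        frame = frame * np.hamming(len(frame))             # Hamming window
        power = np.abs(np.fft.rfft(frame)) ** 2            # FFT -> power spectrum
        energies = mel_filterbank(n_filters, len(frame), sr) @ power  # mel scale
        return dct(np.log(energies + 1e-12), norm='ortho')[:n_coeffs] # log + transform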

Page 12: LPC

A pth-order linear function is estimated: each sample is predicted as a weighted sum of the p previous samples, s(n) ≈ a(1)s(n−1) + ... + a(p)s(n−p)
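
A sketch of the usual autocorrelation method for estimating the coefficients (scipy; the order p = 8 mirrors the optimum reported later in the experiments):

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc(frame, p=8):
        # Autocorrelation of the frame at lags 0 .. p
        r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
        # Solve the Yule-Walker equations R a = r for the predictor weights
        return solve_toeplitz(r[:p], r[1:p + 1])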

Page 13: Speaker recognition

Nearest neighbor: Euclidean distance
Neural network: multilayer perceptron

Page 14: Nearest neighbor

Feature vectors are compared pairwise; a test sample is assigned to the speaker of the closest training sample
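
A minimal sketch of the decision (numpy; it assumes each utterance has been reduced to a single feature vector, which is an assumption rather than something the slides state):

    import numpy as np

    def nearest_neighbor(test_vec, train_vecs, train_labels):
        # Euclidean distance from the test vector to every training vector
        dists = np.linalg.norm(train_vecs - test_vec, axis=1)
        # Return the speaker label of the nearest training sample
        return train_labels[np.argmin(dists)]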

Page 15: Neural network
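
A minimal multilayer-perceptron sketch using scikit-learn (the library choice is an assumption; the 25 hidden nodes mirror the optimum reported later for the self-recorded database, and the other settings are illustrative):

    from sklearn.neural_network import MLPClassifier

    # train_x/train_y and test_x are placeholders for the extracted feature
    # vectors and speaker labels; 500 iterations is an illustrative setting.
    clf = MLPClassifier(hidden_layer_sizes=(25,), max_iter=500)
    clf.fit(train_x, train_y)
    predicted_speakers = clf.predict(test_x)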

Page 16: Multi-speaker recognition

Preprocessing using VAD
Consecutive speech frames are grouped into segments
Single-speaker recognition is applied per segment
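
A sketch of this segmentation logic (Python; classify stands for any of the single-speaker recognizers above):

    import numpy as np

    def multi_speaker(frames, speech_mask, classify):
        # speech_mask: boolean VAD decision per frame
        labels, segment = [], []
        for frame, is_speech in zip(frames, speech_mask):
            if is_speech:
                segment.append(frame)     # extend the current speech segment
            elif segment:
                labels.append(classify(np.array(segment)))  # segment ended
                segment = []
        if segment:                       # trailing segment
            labels.append(classify(np.array(segment)))
        return labels  # one speaker label per consecutive speech segment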

Page 17: Experiments: VAD

Hand-labeled samples
Measured: percentage correctly classified and false negatives

Page 18: Results: VAD

Entropy-based: correctly classified 65.3%, false negatives 9.3%
Power-based: correctly classified 76.3%, false negatives 6.2%
Long term spectral divergence: correctly classified 79.0%, false negatives 1.6%

Page 19: Experiments: feature extraction

Number of coefficients varied
MFCC, optimal: 10 coefficients, 90.9% accuracy
LPC, optimal: 8 coefficients, 77.3% accuracy

Page 20: Experiments: single-speaker recognition

Professional vs. built-in laptop microphone
Silence removal

Trained   Tested   Neural network   Nearest neighbor
Pro       Pro      90.9%            100%
Laptop    Laptop   61.1%            94.4%
Pro       Laptop   16.7%            33.3%
Laptop    Pro      9.4%             21.4%

Page 21: Experiments: neural network

Optimal number of nodes:
Self-recorded database: 25 nodes
Voxforge database: 100 nodes

Page 22: Experiments: neural network

Number of training cycles varied

Page 23: Experiments: multi-speaker recognition

Self-made samples, optimal settings used
Neural network: 66.7%
Nearest neighbor: 76.5%

Page 24: Discussion

Is nearest neighbor really better than a neural network?
A neural network may be more broadly applicable
VAD gives no improvement

Page 25: Conclusions

LTSD is the best VAD method
MFCC outperforms LPC
Training and testing with different microphones gives significantly less accuracy
Nearest neighbor works better than an optimized neural network

Page 26: Questions?