sure research poster-1-1 cave-urbano


Upload: kevin-cave

Post on 19-Jan-2017


Page 1: SURE research poster-1-1 Cave-Urbano

Speaker Recognition at a Distance

Kerly Urbano (1,4), Kevin P Cave (1,3), Joey Skufca (1,5), Mike Fowler (1,2), Joseph D Skufca (1,2), Stephanie Schuckers (1,3)

(1) CITER, (2) Department of Mathematics, (3) Department of Electrical Engineering, (4) Department of Mechanical Engineering, Clarkson University; (5) Department of Computer Science, Stony Brook University

Acknowledgements: We thank Kevin Chapman and Jonathon Bramsen for help during spring semester; Prof JJ Remus, for starting this project and developing the hardware we used.

We thank CUPO for funding support for this work.

[Figure panels: Iris; Cardio-Respiratory. Two ECG plots for Subject #1395 (Baseline ECG and Arrhythmia ECG), amplitude (V, x 10^-4) vs. time (s).]

Abstract

We consider how speaker-to-microphone distance affects Speaker Identification, focusing on the mismatch between the distance at which a speaker was enrolled into the system and the distance at which the system is tested. Previous research by others indicates that distance mismatch can significantly diminish system performance. We are creating a filter that makes a recording taken at five feet sound as if it had been recorded at thirty-four feet, and we use this filtered data to improve identification when the test sample is recorded at thirty-four feet. If successful, we can create more filters, each tailored to a different span of distances, and build a general model for filters that improve identification.

Introduction

Biometrics: the identification of human beings based on their unique characteristics and traits.

Motivation for research (National Biometrics Challenge [1]):

• Robust biometrics at distance
• Multi-modal
• Non-cooperative subjects
• Speaker Identification impacted by “… performance issues associated with the lack of comparable recording environments between the enrollment and test sample.”

Speaker Recognition is the identification of a person based on their voice signal. This research is essential for improving current identification systems that use voice as their identifier. In most cases, voice has characteristics that are unique to one person, which makes it a good marker for identification systems.

Hypothesis

If a 5 ft signal is enrolled in the database and we want to test a probe signal recorded at 34 ft, we can create a filter that makes the 5 ft data mimic 34 ft data and thereby improve the performance of the identification system.

Experimental Procedure

• Initial experiments determined the optimal placement of the microphone array with respect to the speaker (maximum response) and the primary noise source (minimum response). (See Fig 2.)
• Controlled speaker experiments were collected in Room CAMP194 at multiple distances.
• Fourier analysis was used to assess attenuation vs. distance in that environment.
• Multiple “trial” filters were developed for preprocessing of audio to simulate the effect of recording at 34 ft, based on an initial data collection at 5 ft.
• Biometric matching performance comparing filtered and unfiltered data is in progress.

Preliminary Results

[Figure residue: table with labels Genuine / Imposter and values 30, 570, 180, 20.]

[Figure residue: diagram labeled 5m - train, 5m - test, 34m - train, 34m - test, noise, 29m mismatch.]

[Figure residue: plot of Amplitude vs. frequency (Hz), 0 to 4000 Hz, with traces labeled 34ft, 34ft, 5ft, 5ft.]

[Figure residue: plot of FAR and FRR vs. threshold (-1 to 1), with Genuine and Imposter distributions.]

Measuring Performance

MATLAB Procedure

• Write MATLAB scripts to analyze our data as waveforms.
• Plot each signal as amplitude vs. frequency.
• Analyze the trend in the 5 ft data and pick points that best fit the trend. Those points are used as coordinates to filter the 34 ft signals.
• We run creatfilter.m and applyFiltter.m on the data collected last summer so we can test our filter’s performance on a larger database.
• We retrain the UBM (universal background model) on last year’s data to obtain better supervectors.
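The "pick points, then filter" step above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB scripts: the function names `interpolate_gain` and `apply_gain_filter`, the control points, and the choice of piecewise-linear interpolation between picked points are all hypothetical. It uses a naive DFT so that it needs only the standard library.

```python
import cmath
import math


def interpolate_gain(points, freq_hz):
    """Piecewise-linear gain at freq_hz.

    `points` is a list of (frequency_hz, gain) pairs with strictly
    increasing frequencies (hypothetical picked points).
    """
    if freq_hz <= points[0][0]:
        return points[0][1]
    for (f0, g0), (f1, g1) in zip(points, points[1:]):
        if freq_hz <= f1:
            return g0 + (g1 - g0) * (freq_hz - f0) / (f1 - f0)
    return points[-1][1]


def apply_gain_filter(signal, sample_rate_hz, points):
    """Scale each frequency bin of a real signal by the interpolated gain."""
    n = len(signal)
    # Forward DFT (naive O(n^2), fine for a short illustration).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]
    # Bins above n/2 mirror negative frequencies, so use min(k, n - k).
    for k in range(n):
        freq_hz = min(k, n - k) * sample_rate_hz / n
        spectrum[k] *= interpolate_gain(points, freq_hz)
    # Inverse DFT; the symmetric gain keeps the output real.
    return [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
        for t in range(n)
    ]


# Hypothetical example: a 1000 Hz tone sampled at 8000 Hz.
sample_rate_hz = 8000
n = 80
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate_hz) for t in range(n)]
picked_points = [(0, 1.0), (2000, 0.5), (4000, 0.1)]  # assumed values
filtered = apply_gain_filter(tone, sample_rate_hz, picked_points)
```

With these assumed points the gain at 1000 Hz is 0.75, so `filtered` is the input tone scaled by 0.75. A production version would use an FFT and process audio in frames.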

References

1. National Science Technology Council Report (2011).
2. M. Fowler, M. McCurry, J. Barmsen, K. Dunsin, J. Remus. ICASSP 2011 Conference Proceedings.
3. F. Bimbot et al., EURASIP Journal on Applied Signal Processing, 2004:4, 430-451.

Fig 1. Multiple modes of biometric signature.

Fig 2. Measured directionality of microphone channel 17. The array was positioned to maximize the speaker signal while minimizing interfering noise.

[Figure legend: Good, OK, Poor]

Fig 3. A mismatch in distance between training (enrollment) data and testing (probe) data results in significantly degraded performance.

Fig 4. Fowler et al. [2] showed that distance mismatch degrades performance.

Fig 5. Experiments were conducted in CAMP194 for a controlled acoustic environment. The microphone array (constructed by Mark McCurry and Prof JJ Remus) allowed for recording on 18 audio channels. Experiments recorded speech at 5 ft, 8 ft, 13 ft, 21 ft, and 34 ft. Choosing the distances from the Fibonacci sequence allows for multiple measurements at the same distance mismatch.
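As a small worked check of the Fibonacci remark in Fig 5, the sketch below enumerates every pair of recording distances and groups the pairs by their mismatch. The variable names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

# Recording distances from Fig 5, in feet.
distances_ft = [5, 8, 13, 21, 34]

# Group every (near, far) pair by its distance mismatch.
pairs_by_mismatch = defaultdict(list)
for near, far in combinations(distances_ft, 2):
    pairs_by_mismatch[far - near].append((near, far))

# Mismatches that can be measured with more than one pair.
repeated = {gap: pairs for gap, pairs in pairs_by_mismatch.items() if len(pairs) > 1}

for gap in sorted(repeated):
    print(f"{gap} ft mismatch: {repeated[gap]}")
```

Because each Fibonacci number is the sum of the two before it, an 8 ft mismatch occurs for both (5, 13) and (13, 21), and a 13 ft mismatch occurs for both (8, 21) and (21, 34).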

Fig 6. Fourier transform of signals measured at 5 ft and 34 ft, with two spectra for each distance. Note that the spectra at 34 ft are consistent with each other, and the spectra at 5 ft are consistent with each other, but the two distances are disparate.

Fig 7. Relative strength of attenuation comparing the signal at 34 ft to the signal at 5 ft. The attenuation ratio varies with frequency. The red line indicates the “filter” that was fitted to the spectral data comparison.

Attenuation Filter equation based on Fourier amplitudes at 34ft and 5 ft.
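The comparison of Fourier amplitudes at 34 ft and 5 ft can be illustrated with a short sketch. It assumes the attenuation ratio is the per-frequency quotient of the far amplitude spectrum over the near amplitude spectrum, which matches the description in Fig 7 but is not necessarily the authors' fitted filter equation. The function names and the synthetic test signals are hypothetical, and a naive DFT is used so that only the standard library is needed.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Return |DFT| for bins 0..n/2 of a real signal (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def attenuation_ratio(far_signal, near_signal, floor=1e-6):
    """Per-bin ratio of far amplitude to near amplitude.

    Bins where the near amplitude is at or below `floor` carry no usable
    information, so they are returned as None instead of dividing by ~0.
    """
    far = amplitude_spectrum(far_signal)
    near = amplitude_spectrum(near_signal)
    return [f / nr if nr > floor else None for f, nr in zip(far, near)]


# Synthetic stand-ins for a near and a far recording (assumed values):
# two tones, with the higher tone attenuated more strongly at distance.
n = 64
near = [math.sin(2 * math.pi * 4 * t / n) + math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
far = [0.5 * math.sin(2 * math.pi * 4 * t / n) + 0.1 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]

ratio = attenuation_ratio(far, near)
# ratio[4] is approximately 0.5 and ratio[10] is approximately 0.1.
```

On real recordings the ratio is noisy from bin to bin, which is presumably why a smooth curve is fitted to it rather than using the raw quotient directly.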

Fig 8. How we assess the performance of a classifier.
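The FAR and FRR measures named in the figure can be computed as in the sketch below. It assumes a similarity score in the range -1 to 1 where higher means "more likely genuine" and a trial is accepted when its score is at or above the threshold. The function name `far_frr` and the example scores are made up for illustration and are not results from this study.

```python
def far_frr(genuine_scores, imposter_scores, threshold):
    """Return (FAR, FRR) for a given decision threshold.

    FAR: fraction of imposter trials that are wrongly accepted.
    FRR: fraction of genuine trials that are wrongly rejected.
    """
    false_accepts = sum(1 for s in imposter_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return false_accepts / len(imposter_scores), false_rejects / len(genuine_scores)


# Made-up scores for illustration only.
genuine_scores = [0.9, 0.7, 0.4, 0.8]
imposter_scores = [-0.6, 0.1, 0.5, -0.2]

far, frr = far_frr(genuine_scores, imposter_scores, threshold=0.45)
```

Raising the threshold lowers FAR and raises FRR, so sweeping the threshold traces out the trade-off between the two error types.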

Fig 9. General structure of speaker identification (extracted from [3]). The blue circle indicates where we preprocess using our filter.

Fig 10. Preliminary results comparing matching performance using unfiltered data and using our developed attenuation filter. Smaller errors indicate improved performance.