i-vector based joint anti-spoofing and speaker verification

Tomi Kinnunen, Elie Khoury, Aleksandr Sizov, Zhizheng Wu, Sébastien Marcel

Contact: [email protected]


Post on 11-Aug-2019



Spoofing attacks: Achilles' heel of biometrics

• 2011: HK->Canada passenger with fake face mask
• 2013: Apple iPhone 5S Touch ID, fake fingerprints
• 2014: Samsung Galaxy S5 linked with user's PayPal account, fake fingerprint
• 2014: book on the topic


Spoofing speaker verification

• “Sneakers” (1992)

Is it relevant in real applications?


Increasing use of ASV in finance

• Barclays bank (48 million customers in 50 countries) [ http://www.computerweekly.com/news/2240179218/Barclays-streamlines-phone-banking-with-voice-biometrics ]

• Banco Santander México [ http://findbiometrics.com/road2bup-commerce-3-deployments-of-biometrics-in-finance ]

• National Australia Bank [ http://www.businessspectator.com.au/news/2012/11/21/technology/nab-speaks-loud-and-clear-voice-biometrics ]

• Australian Health Management [ http://www.zdnet.com/voice-biometrics-replaces-id-check-at-ahm-1339274060/ ]

• “Voice unlock” feature in Lenovo A586 phone [ http://hlt.i2r.a-star.edu.sg/site_media/news_articles/Digital_Life_Voiceprint_Tech_30_Jan_2013.pdf ]


Four ways to spoof ASV

• Impersonation: mimicry by a human being
• Replay: replay of a previously recorded utterance
• Text-to-Speech (TTS): generation of speech signal from text input
• Voice conversion: conversion of speaker identity of an utterance

“Tomi here, verify me!”

[ Wu et al., “Spoofing and countermeasures for automatic speaker verification: a survey”, to appear in Speech Communication ]


Voice conversion demo

• Audio samples: source, target, converted (male-to-female)
• Use ~30 seconds to train the conversion
• Convert spectrum only, retain F0


Voice conversion increases FAR

Study                  Speaker verification technique   Zero-effort FAR (%)   FAR after VC spoofing (%)
Perrot et al. 2005     GMM-UBM                          16.0                  40.0
Matrouf et al. 2006    GMM-UBM                          8.00                  100.0
Bonastre et al. 2007   GMM-UBM                          6.61                  55.0
Kinnunen et al. 2012   JFA                              3.24                  17.33
Wu et al. 2013         i-vector PLDA                    2.99                  41.25
Wu et al. 2013         Text-dep. HMM                    2.92                  21.87
Alegre et al. 2013     i-vector PLDA                    3.03                  55.00
Kons et al. 2013       Text-dep. HMM-NAP                1.00                  36.00

FAR: false acceptance rate
VC: voice conversion
GMM-UBM: Gaussian Mixture Model - Universal Background Model
JFA: Joint Factor Analysis
PLDA: Probabilistic Linear Discriminant Analysis
HMM: Hidden Markov Model
NAP: Nuisance Attribute Projection
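
As a reminder of what the FAR columns measure, here is a minimal sketch that counts how many impostor (or spoof) trials score at or above the decision threshold. The function name, the scores, and the threshold are hypothetical and are not taken from any of the studies in the table.

```python
import numpy as np

def false_acceptance_rate(impostor_scores, threshold):
    """Percentage of impostor (or spoof) trials accepted at the given threshold."""
    scores = np.asarray(impostor_scores, dtype=float)
    # A trial is (falsely) accepted when its score reaches the threshold.
    return 100.0 * np.mean(scores >= threshold)

# Hypothetical scores, for illustration only.
zero_effort_scores = [-2.1, -0.5, 0.3, -1.7, -3.0]
spoof_scores = [0.9, -0.2, 1.4, 0.1, -1.1]
threshold = 0.0

print("zero-effort FAR (%):", false_acceptance_rate(zero_effort_scores, threshold))
print("FAR after spoofing (%):", false_acceptance_rate(spoof_scores, threshold))
```

The threshold is kept fixed between the two calls, which mirrors the table: the system is tuned without spoofing in mind and then exposed to converted speech.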


Traditional approach: independent ASV and spoofing detector systems

• ASV system: MFCCs
• Spoofing detector: hand-crafted features based on knowledge of the attacks

Could we use the same MFCC i-vector front-end?
• Simpler system
• Only one threshold
• Computational savings

[ Wu, Z., Kinnunen, T., Chng, E.S., Li, H., Ambikairajah, E., 2012b. “A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case”, in: Proc. Asia-Pacific Signal Information Processing Association Annual Summit and Conference (APSIPA ASC) ]

[ Wu, Z., Li, H., 2013. “Voice conversion and spoofing attack on speaker verification systems”, in: Proc. Asia-Pacific Signal Information Processing Association Annual Summit and Conference (APSIPA ASC) ]


i-Vector extraction

Utterance -> MFCC extraction -> GMM mean supervector extraction -> i-vector

$$\mathbf{s} = \mathbf{m} + \mathbf{T}\boldsymbol{\phi}$$

• s: utterance-dependent supervector (30720 x 1)
• m: UBM supervector (30720 x 1), from the universal background model (UBM) with 512 Gaussians and 60 MFCCs (512 x 60 = 30720)
• T: low-rank matrix (30720 x 400)
• φ: i-vector (400 x 1)

[ N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-end factor analysis for speaker verification,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788–798, May 2011 ]
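
The dimensions on this slide can be checked with a small sketch of the supervector model s = m + Tφ. The slide's sizes are computed explicitly, but the demo itself uses scaled-down random stand-ins (an assumption, to keep it light). The least-squares recovery at the end is only an illustration of the linear model; actual i-vector extraction uses a posterior estimate computed from UBM statistics, which is not shown here.

```python
import numpy as np

def utterance_supervector(m, T, phi):
    """Supervector model from the slide: s = m + T @ phi."""
    return m + T @ phi

# Sizes quoted on the slide.
n_gauss, feat_dim, ivec_dim = 512, 60, 400
sv_dim = n_gauss * feat_dim          # supervector dimension
print("supervector dim:", sv_dim, "| T shape:", (sv_dim, ivec_dim))

# Scaled-down random stand-ins (hypothetical values, not a trained model).
rng = np.random.default_rng(0)
small_gauss, small_feat, small_rank = 8, 6, 4
m = rng.standard_normal(small_gauss * small_feat)                 # "UBM" supervector
T = rng.standard_normal((small_gauss * small_feat, small_rank))   # low-rank matrix
phi = rng.standard_normal(small_rank)                             # "i-vector"

s = utterance_supervector(m, T, phi)

# Illustration only: recover phi from s by least squares.
phi_hat, *_ = np.linalg.lstsq(T, s - m, rcond=None)
print("recovered:", np.allclose(phi_hat, phi))
```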


Probabilistic Linear Discriminant Analysis (PLDA) modeling of i-vectors

$$\boldsymbol{\phi}_{ij} = \mathbf{V}\mathbf{y}_i + \mathbf{U}\mathbf{x}_{ij} + \boldsymbol{\epsilon}_{ij}$$

• φ_ij: j:th i-vector of speaker i
• V: between-speaker subspace, with speaker factor y_i
• U: within-speaker subspace, with factors x_ij
• ε_ij: residual with N(0, Σ), diagonal covariance

[ S. J. D. Prince and J. H. Elder, “Probabilistic linear discriminant analysis for inferences about identity,” in IEEE ICCV, 2007, pp. 1–8 ]
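
To make the generative reading of the PLDA equation concrete, the following sketch draws i-vectors from it. It assumes standard normal priors on the speaker factor y_i and the within-speaker factors x_ij, which is the usual PLDA convention but is not stated on the slide. All matrices and sizes here are random placeholders, not trained parameters.

```python
import numpy as np

def sample_plda(V, U, sigma_diag, n_speakers, n_sessions, rng):
    """Draw i-vectors phi_ij = V y_i + U x_ij + eps_ij.

    Assumes y_i ~ N(0, I), x_ij ~ N(0, I) and eps_ij ~ N(0, diag(sigma_diag)).
    Returns an array of shape (n_speakers, n_sessions, dim).
    """
    dim, n_speaker_factors = V.shape
    n_session_factors = U.shape[1]
    ivectors = np.empty((n_speakers, n_sessions, dim))
    for i in range(n_speakers):
        y_i = rng.standard_normal(n_speaker_factors)      # shared by all sessions of speaker i
        for j in range(n_sessions):
            x_ij = rng.standard_normal(n_session_factors) # session-specific factors
            eps_ij = rng.standard_normal(dim) * np.sqrt(sigma_diag)
            ivectors[i, j] = V @ y_i + U @ x_ij + eps_ij
    return ivectors

rng = np.random.default_rng(1)
dim = 10                                   # toy i-vector dimension (the slides use 400)
V = rng.standard_normal((dim, 3))          # between-speaker subspace
U = 0.3 * rng.standard_normal((dim, 2))    # within-speaker subspace
sigma_diag = np.full(dim, 0.01)            # diagonal residual covariance

samples = sample_plda(V, U, sigma_diag, n_speakers=5, n_sessions=4, rng=rng)
print(samples.shape)
```

Because y_i is drawn once per speaker, sessions of the same speaker cluster around V y_i, which is what the verification score exploits.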


Three use cases of PLDA

1. Stand-alone speaker verification

2. Stand-alone spoofing detection

3. Joint speaker verification and anti-spoofing

Requires additional i-vectors extracted from synthetic speech


“Synthetic” i-vector generation

• Natural speech i-vectors: original utterance -> MFCC extraction -> i-vector extraction
• Vocoded speech i-vectors: original utterance -> copy-synthesis with MCEP or LPC vocoder -> MFCC extraction -> i-vector extraction

[Figure: scatter plot of natural speech i-vectors and vocoded speech i-vectors]


Examples of i-vectors (reduced to 2-d with linear discriminant analysis)

[Figure panels: before length normalization; after length normalization]
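
Length normalization, as named in the panel titles, scales each i-vector to unit Euclidean norm. A minimal sketch follows; the function name and the `eps` guard against zero vectors are assumptions, and any centering or whitening that a full system might apply beforehand is left out.

```python
import numpy as np

def length_normalize(ivectors, eps=1e-12):
    """Scale each row (one i-vector per row) to unit Euclidean length."""
    ivectors = np.atleast_2d(np.asarray(ivectors, dtype=float))
    norms = np.linalg.norm(ivectors, axis=1, keepdims=True)
    # eps avoids division by zero for an all-zero vector.
    return ivectors / np.maximum(norms, eps)

toy = np.array([[3.0, 4.0], [0.0, 2.0]])   # toy 2-d "i-vectors"
normalized = length_normalize(toy)
print(normalized)
print(np.linalg.norm(normalized, axis=1))
```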


Experiments

• Text-independent set-up
• Subset of NIST 2006 core task trials

Voice conversion spoofs:
• Non-parallel frame alignment
• Mel-cepstrum (MCEP) & linear prediction (LPC) vocoders from the SPTK toolkit
• Joint-density GMM conversion of the spectral features
• Equalization of mean and variance of log-F0 (RAPT F0 extraction)

[ T. Kinnunen, Z.-Z. Wu, K. A. Lee, F. Sedlak, E. S. Chng, H. Li, “Vulnerability of Speaker Verification Systems Against Voice Conversion Spoofing Attacks: the Case of Telephone Speech”, Proc. ICASSP 2012, pp. 4401--4404, Kyoto, Japan, March 2012 ]

[ Z. Wu, T. Kinnunen, E.S. Chng, H. Li, E. Ambikairajah, “A Study on spoofing attack in state-of-the-art speaker verification: the telephone speech case”, Proc. APSIPA ASC 2012, pp. 1--5, Hollywood, USA, December 2012 ]

[ Speech Signal Processing Toolkit (SPTK), http://sp-tk.sourceforge.net/ ]
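
The log-F0 equalization step listed above can be sketched as a mean and variance mapping in the log domain. This is a generic illustration, not the authors' implementation. The function name is hypothetical, and the convention that unvoiced frames carry F0 = 0 is an assumption about the F0 tracker output.

```python
import numpy as np

def equalize_log_f0(f0_source, src_mean, src_std, tgt_mean, tgt_std):
    """Map source F0 so its log-F0 mean and variance match the target's.

    Frames with F0 <= 0 are treated as unvoiced (assumption) and left at 0.
    The mean/std arguments are statistics of log-F0 over voiced frames.
    """
    f0 = np.asarray(f0_source, dtype=float)
    converted = np.zeros_like(f0)
    voiced = f0 > 0
    log_f0 = np.log(f0[voiced])
    converted[voiced] = np.exp((log_f0 - src_mean) / src_std * tgt_std + tgt_mean)
    return converted

# Hypothetical F0 contour in Hz; zeros mark unvoiced frames.
f0_src = np.array([0.0, 100.0, 120.0, 0.0, 140.0])
src_log = np.log(f0_src[f0_src > 0])
tgt_mean, tgt_std = np.log(200.0), src_log.std()   # hypothetical target statistics

f0_conv = equalize_log_f0(f0_src, src_log.mean(), src_log.std(), tgt_mean, tgt_std)
print(f0_conv)
```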


Database summary

                                    Male    Female   Total
Target speakers                     241     342      583
Genuine trials                      1,614   2,332    3,946
Zero-effort impostor trials         1,132   1,615    2,747
Voice conversion impostors (MCEP)   1,132   1,615    2,747
Voice conversion impostors (LPC)    1,132   1,615    2,747

Zero-effort protocol: subset of the original NIST 2006 core task


MCEP spoof protocol: voice conversion attack with Mel-cepstral features and joint-density GMM conversion


LPC spoof protocol: voice conversion attack with linear prediction vocoder and joint-density GMM conversion


Stand-alone spoof detection (% correct)

     Training samples   Attack samples   Cos    SVM    PLDA
A.   MCEP               MCEP             92.2   91.7   91.8
B.   LPC                MCEP             53.0   53.6   53.1
C.   MCEP               LPC              98.3   98.3   98.7
D.   LPC                LPC              99.3   99.4   99.4

A & B: dedicated attacker, “matched” MCEP vocoder with the recognizer features
C & D: sloppy attacker, “mismatched” vocoder with recognizer features

Take SVM as a baseline spoofing detector
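
The slides do not spell out how the “Cos” detector is built. One plausible reading, sketched below purely as an assumption, compares a test i-vector with the mean natural and mean synthetic training i-vectors by cosine similarity and picks the closer class. All names and the toy 2-d vectors are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_natural(test_ivec, natural_train, synthetic_train):
    """Assumed decision rule: closer (in cosine) to the natural mean than to the synthetic mean."""
    nat_mean = natural_train.mean(axis=0)
    syn_mean = synthetic_train.mean(axis=0)
    return cosine(test_ivec, nat_mean) > cosine(test_ivec, syn_mean)

def percent_correct(decisions, labels):
    """Share of trials (in %) where the decision matches the true natural/synthetic label."""
    decisions = np.asarray(decisions, dtype=bool)
    labels = np.asarray(labels, dtype=bool)
    return 100.0 * np.mean(decisions == labels)

# Toy training and test data (hypothetical).
natural_train = np.array([[1.0, 0.1], [0.9, 0.0]])
synthetic_train = np.array([[0.1, 1.0], [0.0, 0.8]])
test_ivecs = np.array([[1.0, 0.2], [0.2, 1.0]])
test_labels = [True, False]            # True = natural speech

decisions = [is_natural(t, natural_train, synthetic_train) for t in test_ivecs]
print(percent_correct(decisions, test_labels))
```

Rows B and C of the table correspond to training `synthetic_train` on one vocoder and testing on the other, which is where such a detector can fail.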


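“Integrated PLDA”

Expand training set:

$$\text{natural:}\quad [\boldsymbol{\phi}_{11}^{\text{nat}}, \boldsymbol{\phi}_{12}^{\text{nat}}, \ldots, \boldsymbol{\phi}_{1s}^{\text{nat}}],\ \ldots,\ [\boldsymbol{\phi}_{n1}^{\text{nat}}, \boldsymbol{\phi}_{n2}^{\text{nat}}, \ldots, \boldsymbol{\phi}_{ns}^{\text{nat}}]$$

$$\text{mcep:}\quad [\boldsymbol{\phi}_{11}^{\text{mcep}}, \boldsymbol{\phi}_{12}^{\text{mcep}}, \ldots, \boldsymbol{\phi}_{1s}^{\text{mcep}}],\ \ldots,\ [\boldsymbol{\phi}_{n1}^{\text{mcep}}, \boldsymbol{\phi}_{n2}^{\text{mcep}}, \ldots, \boldsymbol{\phi}_{ns}^{\text{mcep}}]$$

$$\text{lpc:}\quad [\boldsymbol{\phi}_{11}^{\text{lpc}}, \boldsymbol{\phi}_{12}^{\text{lpc}}, \ldots, \boldsymbol{\phi}_{1s}^{\text{lpc}}],\ \ldots,\ [\boldsymbol{\phi}_{n1}^{\text{lpc}}, \boldsymbol{\phi}_{n2}^{\text{lpc}}, \ldots, \boldsymbol{\phi}_{ns}^{\text{lpc}}]$$

Two times more ‘speakers’ to train PLDA:
• Integrated PLDA (mcep): natural + mcep
• Integrated PLDA (lpc): natural + lpc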
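
The training-set expansion can be sketched as follows: the synthetic i-vectors are appended to the natural ones, and each synthetic speaker receives a fresh label so that PLDA sees twice as many ‘speakers’. The function name and array layout are assumptions, and the PLDA training itself is not shown.

```python
import numpy as np

def expand_training_set(natural, natural_labels, synthetic, synthetic_labels):
    """Stack natural and synthetic i-vectors, relabelling synthetic speakers as new speakers.

    natural, synthetic: arrays of shape (n_ivectors, dim).
    *_labels: integer speaker indices; synthetic labels are shifted past the natural ones.
    """
    natural_labels = np.asarray(natural_labels)
    synthetic_labels = np.asarray(synthetic_labels)
    offset = natural_labels.max() + 1          # first unused speaker index
    ivectors = np.vstack([natural, synthetic])
    labels = np.concatenate([natural_labels, synthetic_labels + offset])
    return ivectors, labels

# Toy example: n = 3 speakers, s = 2 sessions each, 4-d i-vectors (hypothetical sizes).
rng = np.random.default_rng(2)
n, s, dim = 3, 2, 4
speaker_ids = np.repeat(np.arange(n), s)
natural = rng.standard_normal((n * s, dim))
mcep = rng.standard_normal((n * s, dim))       # stand-in for copy-synthesis i-vectors

train_ivecs, train_labels = expand_training_set(natural, speaker_ids, mcep, speaker_ids)
print(train_ivecs.shape, np.unique(train_labels))
```

With this labelling, the vocoded version of a speaker is a different class from the natural version, so a spoofed trial is scored like an impostor trial.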


Interaction of speaker verification and anti-spoofing, FAR (%)

System            Spoof detector training data   Zero-effort spoofs   MCEP spoofs   LPC spoofs
Baseline PLDA     --                             1.76                 6.13          10.84
Score fusion      MCEP                           1.62                 7.12          13.13
Score fusion      LPC                            1.73                 4.89          9.35
Integrated PLDA   MCEP                           1.24                 3.90          5.82
Integrated PLDA   LPC                            1.42                 5.94          2.97

• Baseline PLDA: no countermeasures
• Score fusion: SVM countermeasure + PLDA ASV, weights from natural utterances
• Integrated PLDA: expanded training set including synthetic i-vectors
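
To compare the rows, the FAR values from the table can be expressed as relative changes against the baseline PLDA row. The sketch below is plain arithmetic on the reported numbers; the dictionary layout and function name are only for illustration.

```python
# FAR (%) values copied from the table; keys are (system, spoof detector training data).
baseline = {"zero_effort": 1.76, "mcep": 6.13, "lpc": 10.84}
systems = {
    ("Score fusion", "MCEP"): {"zero_effort": 1.62, "mcep": 7.12, "lpc": 13.13},
    ("Score fusion", "LPC"): {"zero_effort": 1.73, "mcep": 4.89, "lpc": 9.35},
    ("Integrated PLDA", "MCEP"): {"zero_effort": 1.24, "mcep": 3.90, "lpc": 5.82},
    ("Integrated PLDA", "LPC"): {"zero_effort": 1.42, "mcep": 5.94, "lpc": 2.97},
}

def relative_change(far, base):
    """Relative FAR change in percent; negative values mean fewer false acceptances."""
    return 100.0 * (far - base) / base

for (system, train), fars in systems.items():
    changes = {k: round(relative_change(v, baseline[k]), 1) for k, v in fars.items()}
    print(f"{system} ({train}): {changes}")
```

A negative entry means the system accepts fewer impostors than the baseline under that attack, and a positive entry means it accepts more.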

Observations on the FAR table:
• Zero-effort FAR is not affected much, which is good...
• ...but this is not systematically helpful in reducing spoof FAR


Pooled MCEP and LPC spoofs

[Figure panels: additional i-vectors: MCEP; additional i-vectors: LPC]


Summary

What? Use of i-vectors to do speaker verification & anti-spoofing

How? A simple “integrated PLDA” recipe
1. Create synthetic i-vectors by copy-synthesis
2. Treat synthetic speakers as new “speakers”
3. Score as usual

Worth further study!
• Does not solve the problem of a wrong training vocoder
• Improvements are not as impressive as with dedicated countermeasures, but the system is much simpler

What next?
• More vocoders
• Impersonation, replay, synthesis attacks
• Other biometric modalities

[Diagram: integrated PLDA trained on natural + mcep]


Other recent topics

Foreign accent & regional dialect identification
• H. Behravan, V. Hautamäki, T. Kinnunen, “Factors Affecting i-Vector Based Foreign Accent Recognition: a Case Study in Spoken Finnish”, Speech Communication (to appear)
• H. Behravan, V. Hautamäki, S.M. Siniscalchi, T. Kinnunen, C.-H. Lee, “Introducing attribute features to foreign accent recognition”, Proc. ICASSP 2014
• H. Behravan, V. Hautamäki, S.M. Siniscalchi, E. Khoury, T. Kurki, T. Kinnunen, C.-H. Lee, “Dialect Levelling in Finnish: A Universal Speech Attribute Approach”, Proc. Interspeech 2014

Effect of human mimicry (impersonation)
• R. Gonzalez Hautamäki, T. Kinnunen, V. Hautamäki, A.-M. Laukkanen, “Comparison of human listeners and speaker verification systems using voice mimicry data”, Proc. Odyssey 2014: The Speaker & Language Recognition Workshop, pp. 137--144, Joensuu, Finland, June 2014
• R. Gonzalez Hautamäki, T. Kinnunen, V. Hautamäki, T. Leino, A.-M. Laukkanen, “I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry”, Proc. Interspeech 2013, pp. 930--934, Lyon, France, August 2013

Recording device identification from speech
• C. Hanilçi and T. Kinnunen, “Source Cell-Phone Recognition from Recorded Speech Using Non-Speech Segments”, Digital Signal Processing (to appear)

Vocal effort compensation
• J. Pohjalainen, C. Hanilçi, T. Kinnunen, P. Alku, “Mixture Linear Prediction in Speaker Verification Under Vocal Effort Mismatch”, IEEE Signal Processing Letters, 21(12): 1516--1520, December 2014


Foreign accent detection

Data: Finnish national foreign language certificate (FSD) corpus
• Finnish spoken utterances produced by foreigners

[ H. Behravan, V. Hautamäki, T. Kinnunen, “Factors Affecting i-Vector Based Foreign Accent Recognition: a Case Study in Spoken Finnish”, Speech Communication (to appear) ]


Thank you