speaker veri cation using i-vectors - da/sec
TRANSCRIPT
Speaker Verification using i-VectorsCAST-Forderpreis IT-Sicherheit 2014
Andreas Nautsch
Hochschule Darmstadt, atip GmbH, CASED, da/sec Security Research Group
Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 1/22
Outline
I Motivation with research questions
I Speaker verification and i-vectors
I Research and development
I Conclusion and future perspectives
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 2/22
Motivation
Biometric IT-security & forensic applications
I Authentication and recognition by voice
I Advantages to knowledge-/token-based approaches:I Cannot be forgotten
I Cannot be incorporated
I Application fields and scenarios e.g.:I Mobile device authentication: random PINs, short duration
I Call-center user validation: free speech, variant duration
I Suspect tracking: various contents & signal qualities
Voice reference Feature extraction
Voice probe Feature extraction
Comparison Score
Accept
Reject
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 3/22
Motivation
Assessment of Speaker Recognition
I Known technology: modeling acoustic features
! Very accurate due to detailed modeling
! Fast processing on short duration scenarios
% High computational effort on text-independent scenarios
I State-of-the-Art: identity vector (i-vector) features, 2011
! Fully text- & language-independent
! Fast computation & scoring independent of duration
% Unknown behavior in commercial voice biometric scenarios
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 4/22
Motivation
Voice biometrics: speech duration & sample completeness
I Sound unit (phoneme) distribution by duration
From [T. Hasan et al., 2013]
I Text-independent case: content varies from sample to sampleI Long duration ⇒ stable distribution
I Short duration ⇒ insufficient data
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 5/22
Motivation
Baseline speaker recognition approach
I Hidden-Markov-Models (HMMs)I State-based model
I States can represent articulation phases, phonemes, . . .
0 1 2
p00
p10
p11
p12
p22
I Detailed, but extensive computation of optimal path
2
1
0
2
1
0
2
1
0
2
1
0
2
1
0
2
1
0
2
1
0
2
1
0
2
1
0
Speaker modelm
Impostor model
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 6/22
Motivation
Research questions
1. Is the i-vector approach extensible on short duration scenarioswith applicable performances?
2. Do i-vector systems deliver new information to HMM systems?
3. Are duration-depending performance mismatchescompensable?
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 7/22
Speaker Verification
Speech processing
1. Raw speech signal as air pressure changes
2. Frequency analysis: spectral representation
3. Short-time acoustic features, e.g. Mel-Frequency CepstralCoefficients (MFCCs)
Time
Airpressure
Time
Frequency
Time
MFCCs
−50
0
50
100
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 8/22
Speaker Verification
Statistic model: i-vector extraction
4. Gaussian modeling
5. Speaker sub-spaces from Universal Background Model (UBM)
6. identity vectors (i-vectors) as characteristic offset
Mapping by total variability matrix*
MFCC-1
MFCC-2
Speaker A
Speaker B
UBM
MFCC-1MFCC-2
* iteratively in order to optimize the model fit: UBM offset 7→ i-vectors on development data
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 9/22
Speaker Verification
How to imagine i-vectors?
Speaker separation
I Relevant parametersI UBM size: detail of the acoustic spaceI # iterations: adaptation depth of total variabilityI # characteristic factors: i-vector dimension
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 10/22
Research & Development
Extending the baseline, exploring new technologies
I Research on new technologiesI Experimental evaluations on i-vectors
I Commercial and academic scenarios
I Participation in international research evaluation
I Developing more robust approachesI Extending state-of-the-art i-vector score normalization
I Implementation of speaker verification framework in Matlabaccording to ISO/IEC IS 19795-1Information technology – Biometric performance testing and reporting –
Part 1: Principles and framework ⇒ reproducible research
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 11/22
Research & Development
Commercial scenario: experimental set-up
I Aiming at research questions
1. i-vectors on short duration scenarios?
2. New information to HMMs by i-vectors?
I Short but fix duration scenarioI In-house database: 3 – 5 German digits
I Text-independent: random sequences
I 362 / 56 / 300 subjects (development, calibration, evaluation)
I 30 – 34 reference / 2 probe samples
≈ 200,000 comparisons
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 12/22
Research & Development
Examining i-vector parameters
UBM size: 128 UBM size: 256
12
510
200300
400600
2
5
IterationsFactors
EE
R(i
n%
)
12
510
200300
400600
2
5
IterationsFactors
EE
R(i
n%
)
UBM size: 512 UBM size: 1024
12
510
200300
400600
2
5
IterationsFactors
EE
R(i
n%
)
12
510
200300
400600
2
5
IterationsFactors
EE
R(i
n%
)
Equal Error Rate (EER): % impostor match = % genuine non-match → lower error = better
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 13/22
Research & Development
Performance analysis
UBM size: 128 UBM size: 256
12510200
300400
600
25
IterationsFactors
EE
R(i
n%
)
12510200
300400
600
25
IterationsFactors
EE
R(i
n%
)
UBM size: 512 UBM size: 1024
12510200
300400
600
25
IterationsFactors
EE
R(i
n%
)
12510200
300400
600
25
IterationsFactors
EE
R(i
n%
)
.1.2 .5 1 2 5 10 20 40
.1
.2
.512
5
10
20
40
False Match Rate (FMR in %)
Fa
lse
No
n-M
atc
hR
ate
(FN
MR
in%
) Detection Error Tradeoff diagram
i-vector-128 i-vector-128
i-vector-256 i-vector-256
i-vector-512 i-vector-512
30 FNMs
Equal Error Rate (EER): % impostor match = % genuine non-match → lower error = better
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 14/22
Research & Development
Information analysis: cross-entropy of system fusion
−10 −5 0 5 100
0.5
1
Bayesian thresholds η
Normalized
entrop
yHMM
−10 −5 0 5 100
0.5
1
Bayesian thresholds η
Normalized
entrop
y
i-vector-256
−10 −5 0 5 100
0.5
1
Bayesian thresholds η
Normalized
entrop
y
HMM+i-vector-256
Optimizing
minEntropy(η ≈ 4.6 = odds 1:100)
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 15/22
Research & Development
Performance gains by fusion
−10 −5 0 5 100
0.5
1
Bayesian thresholds η
Normalized
entrop
y
HMM
−10 −5 0 5 100
0.5
1
Bayesian thresholds η
Normalized
entrop
y
i-vector-256
−10 −5 0 5 100
0.5
1
Bayesian thresholds η
Normalized
entrop
y
HMM+i-vector-256
Optimizing
minEntropy(η ≈ 4.6 = odds 1:100)
.1.2 .5 1 2 5 10 20 40
.1
.2
.512
5
10
20
40
FMR (in %)
FN
MR
(in
%)
HMM
i-vector-256
HMM+i-vector-256
30 FNMs
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 16/22
Research & Development
Academic scenario: experimental set-up
I Aiming at research question
3. Compensation of duration mismatches on i-vectors?
I Variable duration scenarioI Data of 2013 – 2014 NIST i-vector Machine Learning challenge
I Text-independent, multi-lingual scenario
I 4,781 / 1,306 subjects (development, evaluation)
I 5 reference samples / 9,634 probe i-vectors
≈ 12,000,000 comparisons
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 17/22
Research & Development
Analysis of a variant duration scenario
I 10 offline 5-fold cross-validations
I NIST baseline system2048-component UBM, 60 MFCCs, 600-dim i-vectors
I Adaptive Symmetric (AS) score-normalization
S′ = 12
(S−µreferenceσreference
+S−µprobe
σprobe
), each from top-100 scores of comparisons to dev-set
5s 10s 20s 40s full
1
2
5
10
EE
R(i
n%
)
Baseline AS-norm
Baseline, all AS-norm, all
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 18/22
Research & Development
Proposing duration-invariant AS score-normalization
I Observation: independent i-vector factors⇒ poor comparisons
dev-set full 40s 20s 10s 5s
5s 141 91 66 44 010s 230 140 70 020s 246 118 040s 180 0full 0
(Student t-test)
I Idea:I On probes: dev-set duration > 60s
I On references: dev-set duration ≈ probe duration
System EER (in %) all 5s 10s 20s 40s full
Baseline 2.56 0.89 0.95 0.93 0.92 0.89 0.86AS-norm 2.49 0.17 1.18 0.41 0.18 0.08 0.05
dAS-norm 2.06 0.10 0.35 0.20 0.11 0.07 0.07
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 19/22
Conclusion and perspectives
Conclusion and future perspectives
I Research questions positively confirmed
1. i-vectors are applicable to short duration scenarios
2. Relative cross-entropy gain: 48% to HMM baseline
3. Performance mismatch compensation: 89% to baseline
I Examining probabilistic i-vector comparators e.g.,Probabilistic Linear Discriminant Analysis (PLDA)
I Analyse effects on other speech signal features
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 20/22
Conclusion and perspectives
Community impact
I XI IAPR/IEEE Int.l Summer school for advanced studies onbiometrics: Biometrics for Forensics, Security and beyond,Alghero, Italy, June 9 – 13, 2014Presentation to well-established biometric researchers
I ISCA Odyssey 2014: The Speaker and Language RecognitionWorkshop, Joensuu, Finland, June 16 – 19, 2014A. Nautsch, C. Rathgeb, C. Busch, H. Reininger, and K. Kasper:
Towards Duration Invariance of i-Vector-based Adaptive Score
Normalization, ISCA Odyssey, 2014.
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 21/22
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Andreas Nautsch, M.Sc.Doctoral Researcher | Research Area: Secure Services
CASEDMornewegstr. 3264293 Darmstadt/[email protected]
+49 6151 16-75182+49 6151 16-4321
TelefonFaxwww.cased.de
Acknowledgment: atip GmbH, associated company duringB.Sc. and M.Sc. studies, for founding and education.
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
System parameters & metricsI Variable parameters
I UBM sizeI # i-vector factorsI # training interations of
total variability matrix
I Performance metricsI Equal Error Rate (EER)I False Match Rate (FMR)I False Non-Match Rate (FNMR)I Score-cross entropy
.1.2 .5 1 2 5 10 20 40
.1
.2
.512
510
20
40
FMR (in %)
FN
MR
(in
%)
Detection Error Tradeoff diagram
System
30 FMs
30 FNMs
−10 −5 0 5 100
0.5
1
Bayesian thresholds η
Nor
m.
entr
op
y
Hdefaultnorm
Hminnorm
Htotnorm
30 FMs
30 FNMs
NISTη
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Real-time evaluation
Table: Enroll/VerifyHMM ⇔ i-vector
System Enroll Verify
HMM 206.2% 5.5%
i-vector-128 6.1% 3.2%i-vector-256 9.9% 3.1%i-vector-512 16.2% 3.3%
Baum-Welch statistics T matrix estimation
128 256 512 1024
2003004006003
3.23.4
UBM sizeFactors×
RT
in%
128 256 512 1024
2003004006000
1020
UBM sizeFactors
×R
Tin
%
Enrollments Verifications
128 256 512 1024
200300400600
2040
UBM sizeFactors
×R
Tin
%
128 256 512 1024
2003004006003
4
UBM sizeFactors×
RT
in%
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Illustration of total variability matrix
0 500 1,000 1,500 2,000
0
100
200
300
UBM offset vector
i-ve
ctor
elem
ents
−5
0
5
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Analyzing i-vector processing steps
I State-of-the-Art: spherical space projection
Raw i-vectors 7→ Spherical i-vectors
From [N. Dehak et al., 2011]
I On short-duration scenario (calibration set)
64 128 256 512 1024 20481
2
5
10
20
UBM size
Eq
ua
lE
rror
Ra
teE
ER
(in
%) Raw i-vectors
Spherical i-vectors
5% EER cut-off
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Duration-invariant Adaptive Symmetric score normalization
Development set
i-vectors
Subset
z-norm
Subset
t-norm
Probe i-vector
with duration
Λsubset
dprobe == dΛ
Λ>60
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Probabilistic Linear Discriminant Analysis (PLDA)
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Evaluation framework in Matlab
Biometric
characteristics
Presentation
Sample
Data Capture
Sensor
Voice Activity Detection VAD samples
Features
Signal Processing
Enrolment database
Comparator
Identity
claim
Threshold
Verification
outcome
Decision
Score
normalisation
Simlarity
score
Enrolment
Verification
Comparison
scores
Normalised
scoresReference
Reference
Probe
Feature extraction
Fused (calibrated)
scores
Score
fusion/calibration
Evaluation
Metrics & plots
UBM
Features
GMMs
Template
creation
Figure: Framework design according to ISO/IEC 19795-1
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
.
.
.
. .
.
.. .
.. .
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
. ....... ...........
.. .. ..... ........... .
. .......... ....... . .
......... .........
.. ..
.. ..
.. ..
.. ..
.. ..
.. ..
.. .
.. ..
. .
. .
.
Evaluation framework in Matlab
Voice Activity Detection
Signal Processing
Comparator
Threshold
Verification
outcome
Decision
Score
normalisation
Simlarity
score
Feature extraction
Template
creation
UBM
Features
Score
fusion/calibration
Evaluation
Metrics & plotsatip VAD
HTK
Matlab &
rastamat
HTK
Matlab &
Joint Factor Analysis Matlab Demo
Matlab &
Joint Factor Analysis Matlab Demo
Matlab
BOSARIS toolkit
BOSARIS toolkit
Matlab
Matlab
Matlab &
HDF5
Matlab &
HDF5
Matlab &
HDF5
Matlab &
HDF5
Raw-PCM
files
Matlab &
HDF5
Matlab &
HDF5
Figure: Framework & Toolboxes
Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014