TRANSCRIPT
Video-to-Video Face Matching: Establishing a Baseline for Unconstrained Face Recognition
Lacey Best-Rowden, Brendan Klare, Joshua Klontz, and Anil K. Jain
Biometrics: Theory, Applications, and Systems (BTAS), Washington DC, USA, September 30, 2013
Face Recognition in Video
• Abundance of video data
– Ubiquity of surveillance and mobile phone cameras
– Low-cost digital cameras
• Forensic and security applications
– 2011 London riots
– 2011 Vancouver riots
– 2013 Boston bombings
Face Recognition Scenarios
• Still-to-Still
• Still-to-Video
• Video-to-Video
Commercial-Off-The-Shelf (COTS) Performance on Still-to-Still FR
[Bar chart: TAR @ FAR = 0.1% for constrained vs. unconstrained still-to-still matching — FRGC Database: 99%; MBGC Database and LFW Database: 84% and 54%]
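For reference, the operating point used throughout these slides, TAR at a fixed FAR (here 0.1%), can be computed directly from genuine and impostor score distributions. The following is a minimal sketch in Python/NumPy; the function name, the synthetic scores, and the thresholding details are illustrative assumptions, not the slides' data or the COTS matchers' internals.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far_target=0.001):
    """True Accept Rate at a fixed False Accept Rate (hypothetical helper).

    The threshold is set so that roughly far_target of impostor scores lie
    at or above it; TAR is the fraction of genuine scores at or above it.
    """
    impostors = np.sort(np.asarray(impostor_scores))[::-1]  # descending
    # Index of the impostor score that realizes the target FAR.
    k = max(int(np.floor(far_target * len(impostors))) - 1, 0)
    threshold = impostors[k]
    tar = float(np.mean(np.asarray(genuine_scores) >= threshold))
    return tar, threshold

# Illustrative usage with synthetic scores (not the FRGC/MBGC/LFW results):
rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 5_000)
impostor = rng.normal(0.3, 0.1, 50_000)
tar, thr = tar_at_far(genuine, impostor, far_target=0.001)
print(f"TAR @ FAR = 0.1%: {tar:.2%} (threshold {thr:.3f})")
```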
Approaches to Video-based FR
• Sequence of face images with temporal ordering
– Explicitly leverage temporal dynamics between frames
– Simultaneous tracking and recognition [Zhou et al., TIP, 2004]
• Unordered set of face images
– Fuse information prior to matching
– Output a single representation or a single face image
– 3-D modeling [Park and Jain, CVPR, 2007]
– Super-resolution [Arandjelovic and Cipolla, ICCV, 2007]
– Manifold-based methods [Wang et al., CVPR, 2008]
• Fuse information after matching
– Combine match scores from static face matchers
– Frame subset selection based on face quality and/or diversity [Thomas et al., IJCV, 2007]
Video Face Databases

Motivation: Representative Baselines
• Methods that can quickly be integrated into operational environments are preferred over those demonstrated only as proof of concept
• Video matching algorithms are often compared to static frame-based matching
• We provide a baseline accuracy for unconstrained video-based face recognition using state-of-the-art commercial-off-the-shelf (COTS) face matchers
COTS Face Matcher Multi-Frame Fusion
• Face Track Extraction: two face tracks U = {u1, u2, ..., ua} and V = {v1, v2, ..., vb}
• All Frame Pairs: an a × b similarity matrix with entries s(u1, v1), ..., s(ua, vb) from the COTS face matcher
• Multi-frame Score-level Fusion (mean, median, max, or min) reduces the matrix to a single score s(U, V)
• Decision: Same if s(U, V) ≥ t, Not Same if s(U, V) < t (a code sketch of this pipeline follows)
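A minimal sketch of the multi-frame score-level fusion described above, assuming a generic `match(frame_u, frame_v)` callable standing in for a COTS matcher; the function names and data structures are placeholders, not the actual COTS API.

```python
import numpy as np

def fuse_track_scores(track_u, track_v, match, rule="mean", threshold=None):
    """Score-level fusion over all frame pairs of two face tracks.

    track_u, track_v : lists of face frames, U = {u_1..u_a} and V = {v_1..v_b}
    match            : callable returning a similarity score s(u_i, v_j)
    rule             : 'mean', 'median', 'max', or 'min' over the a x b matrix
    threshold        : optional decision threshold t
    """
    # Build the a x b similarity matrix over all frame pairs.
    S = np.array([[match(u, v) for v in track_v] for u in track_u])

    fusion = {"mean": np.mean, "median": np.median, "max": np.max, "min": np.min}
    s_uv = fusion[rule](S)  # single fused score s(U, V)

    if threshold is None:
        return s_uv
    # Decision: Same if s(U, V) >= t, Not Same otherwise.
    return s_uv, ("Same" if s_uv >= threshold else "Not Same")
```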
Fusion of Multiple Matchers
• Multi-Matcher Multi-Frame (MMMF) Fusion: combine the frame-level similarity matrices SA and SB from two matchers into SAB, then apply multi-frame fusion to obtain the final score sAB
• Multi-Frame Multi-Matcher (MFMM) Fusion: apply multi-frame fusion to SA and SB separately to obtain sA and sB, then combine these scores into sAB
(see the sketch below)
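A sketch of the two fusion orderings, using a simple tanh-style score normalization before a sum rule across matchers. The slides only indicate that an MMMF variant with tanh normalization, sum, and mean fusion was evaluated; the normalization constants and statistics below are placeholder assumptions (e.g. estimated from training scores of each matcher).

```python
import numpy as np

def tanh_normalize(S, mu, sigma):
    # Tanh-style normalization of raw scores to a common [0, 1] range;
    # mu and sigma would come from each matcher's training score statistics.
    return 0.5 * (np.tanh(0.01 * (S - mu) / sigma) + 1.0)

def mmmf(S_A, S_B, stats_A, stats_B):
    """Multi-Matcher Multi-Frame: combine the normalized per-frame similarity
    matrices of matchers A and B first, then fuse over frames."""
    S_AB = tanh_normalize(S_A, *stats_A) + tanh_normalize(S_B, *stats_B)  # sum rule
    return np.mean(S_AB)  # multi-frame fusion (mean) -> s_AB

def mfmm(S_A, S_B, stats_A, stats_B):
    """Multi-Frame Multi-Matcher: fuse each matcher's matrix over frames
    first, then combine the resulting per-matcher scores."""
    s_A = np.mean(tanh_normalize(S_A, *stats_A))
    s_B = np.mean(tanh_normalize(S_B, *stats_B))
    return s_A + s_B  # sum rule over matchers -> s_AB
```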
Experimental Details
YouTube Faces (YTF) Database [Wolf et al., CVPR 2011]:
• 1,447 subjects
• 3,226 videos
– 1 – 6 (average ≈ 2) videos per subject
– 48 – 2,157 (average ≈ 182) frames per video
• Faces detected with the Viola-Jones detector (videos at 24 fps), aligned and cropped to 300×300 pixels
• Experimental protocol: 10-fold cross-validation, pairwise tests
– 250 same and 250 not-same face pairs per split
Experimental Details
COTS Matchers:
• COTS-A, COTS-B, COTS-C
• All participated in the 2010 NIST Multiple Biometric Evaluation (MBE)
Previous Results on the YTF Database:
• Matched Background Similarity (MBGS) [Wolf et al., CVPR, 2011]
• Adaptive Probabilistic Elastic Matching (APEM) Fusion [Li et al., CVPR, 2013]
• Spatial-Temporal Face Region Descriptor + Pairwise-constrained Multiple Metric Learning (STFRD+PMML) [Cui et al., CVPR, 2013]
• Rank Aggregation [Bhatt et al., ICIP, 2013]
Experimental Results: Multi-Frame Fusion
[ROC plot for COTS-B: True Accept Rate (TAR) vs. False Accept Rate (FAR), 10^-3 to 10^-1 — all frames (mean), 30 frames (faceness), 30 frames (near-frontal), 1 frame (faceness), 1 frame (near-frontal)]
Experimental Results: Quality-based Frame Subset Selection
[ROC plots for COTS-B: TAR vs. FAR comparing quality-based frame subsets — all frames (mean), 30 frames (faceness), 30 frames (near-frontal), 1 frame (faceness), 1 frame (near-frontal); a selection sketch follows]
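A sketch of the quality-based frame subset selection compared in the plot above: keep only the k highest-quality frames of a track before matching. The `quality` callable here is a placeholder for measures such as a detector's "faceness" confidence or a near-frontal pose score; it is not the slides' specific implementation.

```python
def select_frames(track, quality, k=30):
    """Keep the k frames of a face track with the highest quality scores.

    track   : list of face frames
    quality : callable mapping a frame to a scalar quality value,
              e.g. a faceness confidence or a near-frontal pose measure
    """
    ranked = sorted(track, key=quality, reverse=True)
    return ranked[:k] if k < len(ranked) else ranked
```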
Experimental Results
[ROC plots: TAR vs. FAR comparing COTS-A (mean), COTS-B (mean), COTS-C (mean), and MMMF (tanh, sum, mean) with published YTF results — MBGS, APEM Fusion, STFRD+PMML, Rank Aggregation]
[Example genuine and impostor face pairs with high and low match scores]
Experimental Results
• Multi-matcher fusion overcomes the face enrollment problem
– Number of frames that fail to enroll (out of 587,035 total frames) differs across matchers
• Face detection and landmark localization are crucial to leveraging all available frames in a face track
Conclusions
• All three COTS face matchers outperformed previously published results on the YTF database
• Fusion of the three COTS matchers further improved performance
• Subsequent research on face matching should use COTS matchers as baselines
• Face tracks contain redundant facial information
– Quality-based key-frame selection can be used to reduce the number of frames for matching
Thank you!