TRANSCRIPT
Video-to-Video Face Matching: Establishing a Baseline for Unconstrained Face Recognition
Lacey Best-Rowden, Brendan Klare, Joshua Klontz, and Anil K. Jain
Biometrics: Theory, Applications, and Systems (BTAS), Washington DC, USA, September 30, 2013
Face Recognition in Video
• Abundance of video data
– Ubiquity of surveillance and mobile phone cameras
– Low-cost digital cameras
• Forensic and security applications
– 2011 London riots
– 2011 Vancouver riots
– 2013 Boston bombings
Face Recognition Scenarios
• Still-to-Still
• Still-to-Video
• Video-to-Video
Commercial-Off-The-Shelf (COTS) Performance on Still-to-Still FR
[Bar chart: TAR @ FAR = 0.1% for constrained vs. unconstrained still-to-still matching — FRGC Database: 99%; MBGC Database and LFW Database: 84% and 54%]
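For reference, the operating point used throughout these slides, TAR at a fixed FAR (here 0.1%), can be computed directly from genuine and impostor score distributions. The following is a minimal sketch in Python/NumPy; the function name, the synthetic scores, and the thresholding details are illustrative assumptions, not the slides' data or the COTS matchers' internals.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far_target=0.001):
    """True Accept Rate at a fixed False Accept Rate (hypothetical helper).

    The threshold is set so that roughly far_target of impostor scores lie
    at or above it; TAR is the fraction of genuine scores at or above it.
    """
    impostors = np.sort(np.asarray(impostor_scores))[::-1]  # descending
    # Index of the impostor score that realizes the target FAR.
    k = max(int(np.floor(far_target * len(impostors))) - 1, 0)
    threshold = impostors[k]
    tar = float(np.mean(np.asarray(genuine_scores) >= threshold))
    return tar, threshold

# Illustrative usage with synthetic scores (not the FRGC/MBGC/LFW results):
rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 5_000)
impostor = rng.normal(0.3, 0.1, 50_000)
tar, thr = tar_at_far(genuine, impostor, far_target=0.001)
print(f"TAR @ FAR = 0.1%: {tar:.2%} (threshold {thr:.3f})")
```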
Approaches to Video-based FR
• Sequence of face images with temporal ordering
– Explicitly leverage temporal dynamics between frames
– Simultaneous tracking and recognition [Zhou et al., TIP, 2004]
• Unordered set of face images
– Fuse information prior to matching
– Output a single representation or a single face image
– 3-D modeling [Park and Jain, CVPR, 2007]
– Super-resolution [Arandjelovic and Cipolla, ICCV, 2007]
– Manifold-based methods [Wang et al., CVPR, 2008]
• Fuse information after matching
– Combine match scores from static face matchers
– Frame subset selection based on face quality and/or diversity [Thomas et al., IJCV, 2007]
Video Face Databases

Motivation: Representative Baselines
• Methods that can quickly be integrated into operational environments are preferred over those demonstrated only as proof of concept
• Video matching algorithms are often compared to static frame-based matching
• We provide a baseline accuracy for unconstrained video-based face recognition using state-of-the-art commercial-off-the-shelf (COTS) face matchers
COTS Face Matcher Multi-Frame Fusion
• Face Track Extraction: two face tracks U = {u1, u2, ..., ua} and V = {v1, v2, ..., vb}
• All Frame Pairs: an a × b similarity matrix with entries s(u1, v1), ..., s(ua, vb) from the COTS face matcher
• Multi-frame Score-level Fusion (mean, median, max, or min) reduces the matrix to a single score s(U, V)
• Decision: Same if s(U, V) ≥ t, Not Same if s(U, V) < t (a code sketch of this pipeline follows)
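A minimal sketch of the multi-frame score-level fusion described above, assuming a generic `match(frame_u, frame_v)` callable standing in for a COTS matcher; the function names and data structures are placeholders, not the actual COTS API.

```python
import numpy as np

def fuse_track_scores(track_u, track_v, match, rule="mean", threshold=None):
    """Score-level fusion over all frame pairs of two face tracks.

    track_u, track_v : lists of face frames, U = {u_1..u_a} and V = {v_1..v_b}
    match            : callable returning a similarity score s(u_i, v_j)
    rule             : 'mean', 'median', 'max', or 'min' over the a x b matrix
    threshold        : optional decision threshold t
    """
    # Build the a x b similarity matrix over all frame pairs.
    S = np.array([[match(u, v) for v in track_v] for u in track_u])

    fusion = {"mean": np.mean, "median": np.median, "max": np.max, "min": np.min}
    s_uv = fusion[rule](S)  # single fused score s(U, V)

    if threshold is None:
        return s_uv
    # Decision: Same if s(U, V) >= t, Not Same otherwise.
    return s_uv, ("Same" if s_uv >= threshold else "Not Same")
```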
Fusion of Multiple Matchers
• Multi-Matcher Multi-Frame (MMMF) Fusion: combine the frame-level similarity matrices SA and SB from two matchers into SAB, then apply multi-frame fusion to obtain the final score sAB
• Multi-Frame Multi-Matcher (MFMM) Fusion: apply multi-frame fusion to SA and SB separately to obtain sA and sB, then combine these scores into sAB
(see the sketch below)
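A sketch of the two fusion orderings, using a simple tanh-style score normalization before a sum rule across matchers. The slides only indicate that an MMMF variant with tanh normalization, sum, and mean fusion was evaluated; the normalization constants and statistics below are placeholder assumptions (e.g. estimated from training scores of each matcher).

```python
import numpy as np

def tanh_normalize(S, mu, sigma):
    # Tanh-style normalization of raw scores to a common [0, 1] range;
    # mu and sigma would come from each matcher's training score statistics.
    return 0.5 * (np.tanh(0.01 * (S - mu) / sigma) + 1.0)

def mmmf(S_A, S_B, stats_A, stats_B):
    """Multi-Matcher Multi-Frame: combine the normalized per-frame similarity
    matrices of matchers A and B first, then fuse over frames."""
    S_AB = tanh_normalize(S_A, *stats_A) + tanh_normalize(S_B, *stats_B)  # sum rule
    return np.mean(S_AB)  # multi-frame fusion (mean) -> s_AB

def mfmm(S_A, S_B, stats_A, stats_B):
    """Multi-Frame Multi-Matcher: fuse each matcher's matrix over frames
    first, then combine the resulting per-matcher scores."""
    s_A = np.mean(tanh_normalize(S_A, *stats_A))
    s_B = np.mean(tanh_normalize(S_B, *stats_B))
    return s_A + s_B  # sum rule over matchers -> s_AB
```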
Experimental Details
YouTube Faces (YTF) Database [Wolf et al., CVPR 2011]:
• 1,447 subjects
• 3,226 videos
– 1 – 6 (average ≈ 2) videos per subject
– 48 – 2,157 (average ≈ 182) frames per video
• Faces detected with the Viola-Jones detector (videos at 24 fps), aligned and cropped to 300×300 pixels
• Experimental protocol: 10-fold cross-validation, pairwise tests
– 250 same and 250 not-same face pairs per split
Experimental Details
COTS Matchers:
• COTS-A, COTS-B, COTS-C
• All participated in the 2010 NIST Multiple Biometric Evaluation (MBE)
Previous Results on the YTF Database:
• Matched Background Similarity (MBGS) [Wolf et al., CVPR, 2011]
• Adaptive Probabilistic Elastic Matching (APEM) Fusion [Li et al., CVPR, 2013]
• Spatial-Temporal Face Region Descriptor + Pairwise-constrained Multiple Metric Learning (STFRD+PMML) [Cui et al., CVPR, 2013]
• Rank Aggregation [Bhatt et al., ICIP, 2013]
Experimental Results: Multi-Frame Fusion
[ROC plot for COTS-B: True Accept Rate (TAR) vs. False Accept Rate (FAR), 10^-3 to 10^-1 — all frames (mean), 30 frames (faceness), 30 frames (near-frontal), 1 frame (faceness), 1 frame (near-frontal)]
Experimental Results: Quality-based Frame Subset Selection
[ROC plots for COTS-B: TAR vs. FAR comparing quality-based frame subsets — all frames (mean), 30 frames (faceness), 30 frames (near-frontal), 1 frame (faceness), 1 frame (near-frontal); a selection sketch follows]
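A sketch of the quality-based frame subset selection compared in the plot above: keep only the k highest-quality frames of a track before matching. The `quality` callable here is a placeholder for measures such as a detector's "faceness" confidence or a near-frontal pose score; it is not the slides' specific implementation.

```python
def select_frames(track, quality, k=30):
    """Keep the k frames of a face track with the highest quality scores.

    track   : list of face frames
    quality : callable mapping a frame to a scalar quality value,
              e.g. a faceness confidence or a near-frontal pose measure
    """
    ranked = sorted(track, key=quality, reverse=True)
    return ranked[:k] if k < len(ranked) else ranked
```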
Experimental Results
[ROC plots: TAR vs. FAR comparing COTS-A (mean), COTS-B (mean), COTS-C (mean), and MMMF (tanh, sum, mean) with published YTF results — MBGS, APEM Fusion, STFRD+PMML, Rank Aggregation]
[Example genuine and impostor face pairs with high and low match scores]
Experimental Results
• Multi-matcher fusion overcomes the face enrollment problem
– Number of frames that fail to enroll (out of 587,035 total frames) differs across matchers
• Face detection and landmark localization are crucial to leveraging all available frames in a face track
Conclusions
• All three COTS face matchers outperformed previously published results on the YTF database
• Fusion of the three COTS matchers further improved performance
• Subsequent research on face matching should use COTS matchers as baselines
• Face tracks contain redundant facial information
– Quality-based key-frame selection can be used to reduce the number of frames for matching
Thank you!