
Machine Learning for Biometrics

Dong XU

School of Electrical and Information Engineering

University of Sydney

Outline

• Dimensionality Reduction for Tensor-based Objects
• Graph Embedding: A General Framework for Dimensionality Reduction
• Learning using Privileged Information for Face Verification and Person Re-identification

What is Dimensionality Reduction?

Examples: 2D space to 1D space (PCA, LDA)

What is Dimensionality Reduction?

Example: 3D space to 2D space (ISOMAP: Geodesic Distance Preserving, J. Tenenbaum et al., 2000)

Why Conduct Dimensionality Reduction?

[Figure: face images under pose variation and expression variation, embedded in a low-dimensional space by LPP (He et al., 2003)]

• Uncover intrinsic structure
• Visualization
• Feature Extraction
• Computation Efficiency
• Broad Applications: Face Recognition, Human Gait Recognition, CBIR

Outline

• Dimensionality Reduction for Tensor-based Objects
• Graph Embedding: A General Framework for Dimensionality Reduction
• Learning using Privileged Information for Face Verification and Person Re-identification

What is a Tensor?

Tensors are arrays of numbers which transform in certain ways under coordinate transformations.

Vector: $x \in \mathbb{R}^{m_1}$    Matrix: $X \in \mathbb{R}^{m_1 \times m_2}$    3rd-order Tensor: $\mathcal{X} \in \mathbb{R}^{m_1 \times m_2 \times m_3}$

Definition of Mode-k Product

Product of two matrices (original matrix -> new matrix):
$Y_{ij} = \sum_{k=1}^{m_2} X_{ik} U_{kj}$

Mode-k product of a tensor and a matrix (original tensor -> new tensor), notation: $\mathcal{Y} = \mathcal{X} \times_k U$

[Figure: each mode-2 fiber of the tensor $\mathcal{X} \in \mathbb{R}^{m_1(100) \times m_2(100) \times m_3(40)}$ is multiplied by the projection matrix $U$, which maps the mode-2 dimension from $m_2 = 100$ down to $m_2' = 10$, yielding $\mathcal{Y} \in \mathbb{R}^{m_1(100) \times m_2'(10) \times m_3(40)}$]

Projection: high-dimensional space -> low-dimensional space
Reconstruction: low-dimensional space -> high-dimensional space
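As an illustration, here is a minimal NumPy sketch of the mode-k product; the function name and the use of np.tensordot/np.moveaxis are my own choices, not from the slides.

```python
import numpy as np

def mode_k_product(X, U, k):
    """Mode-k product Y = X x_k U: multiply every mode-k fiber of X by U.

    X : ndarray, e.g. shape (m1, m2, m3)
    U : ndarray of shape (m_k_new, m_k), so mode k shrinks from m_k to m_k_new.
    """
    # Contract U's second axis with axis k of X, then move the new axis back to position k.
    Y = np.tensordot(U, X, axes=([1], [k]))
    return np.moveaxis(Y, 0, k)

# Example: project the second mode of a 100 x 100 x 40 tensor down to 10 dimensions.
X = np.random.rand(100, 100, 40)
U2 = np.random.rand(10, 100)          # projection matrix acting on mode 2 (axis index 1)
Y = mode_k_product(X, U2, k=1)
print(Y.shape)                        # (100, 10, 40)
```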

Data Representation in Dimensionality Reduction

Both the high-dimensional data and its low-dimensional representation can be a vector, a matrix, or a 3rd-order tensor:

• Vector: PCA, LDA
• Matrix (gray-level image): Rank-1 Decomposition, 2001 (A. Shashua and A. Levin); low-rank approximation of a matrix (J. Ye)
• 3rd-order Tensor (filtered image, video sequence): Tensorface, 2002 (M. Vasilescu and D. Terzopoulos); our work (Xu et al., 2005; Yan et al., 2005)

What are Gabor Features?

Gabor features can improve recognition performance in comparison to grayscale features (C. Liu and H. Wechsler, T-IP, 2002).

Gabor wavelet kernels: eight orientations x five scales.
Input: one grayscale image -> Output: 40 Gabor-filtered images.
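For reference, a minimal OpenCV/NumPy sketch of a 5-scale x 8-orientation Gabor filter bank applied to one grayscale image; the kernel-size, wavelength and sigma values are illustrative assumptions, not the parameters used in the cited work.

```python
import cv2
import numpy as np

def gabor_filtered_images(gray):
    """Apply a bank of 5 scales x 8 orientations of Gabor filters -> 40 filtered images."""
    outputs = []
    for scale in range(5):
        lambd = 4.0 * (2 ** (scale * 0.5))   # wavelength grows with scale (illustrative)
        sigma = 0.56 * lambd
        ksize = int(6 * sigma) | 1           # odd kernel size
        for o in range(8):
            theta = o * np.pi / 8
            # Arguments: ksize, sigma, theta, lambda, gamma, psi
            kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, 0.5, 0)
            outputs.append(cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel))
    return np.stack(outputs, axis=-1)        # shape (H, W, 40): a 3rd-order tensor

# gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)
# tensor = gabor_filtered_images(gray)       # e.g. 100 x 100 x 40
```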

Why Represent Objects as Tensors instead of Vectors?

• Natural Representation
  – Gray-level Images (2D structure)
  – Videos (3D structure)
  – Gabor-filtered Images (3D structure)
• Enhance Learnability in Real Applications
  – Curse of dimensionality (Gabor-filtered image: 100 x 100 x 40 -> Vector: 400,000 dimensions)
  – Small sample size problem
• Reduce Computation Cost

Concurrent Subspace Analysis as an Example (Criterion: Optimal Reconstruction)

[Figure: an input sample of size 100 x 100 x 40 is projected by the projection matrices U1, U2, U3 to a 10 x 10 x 10 sample in the low-dimensional space (dimensionality reduction), and then mapped back to a 100 x 100 x 40 reconstructed sample (reconstruction)]

Objective Function:
$(U_k|_{k=1}^{3})^* = \arg\min_{U_k|_{k=1}^{3}} \sum_i \| \mathcal{X}_i \times_1 (U_1 U_1^T) \times_2 (U_2 U_2^T) \times_3 (U_3 U_3^T) - \mathcal{X}_i \|^2$

D. Xu, S. Yan, H. Zhang et al., CVPR, 2005
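A rough NumPy sketch of the alternating optimization behind this kind of objective: for each mode, fix the other projections, unfold the partially projected tensors, and take the top eigenvectors of the resulting covariance. This is my own simplification under those assumptions; the exact update rule in the CVPR 2005 paper may differ in details.

```python
import numpy as np

def mode_k_product(X, U, k):
    return np.moveaxis(np.tensordot(U, X, axes=([1], [k])), 0, k)

def unfold(X, k):
    """Mode-k unfolding: rows are indexed by mode k."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def csa(samples, out_dims, n_iters=5):
    """samples: list of tensors with identical shape; out_dims: target size per mode."""
    shape = samples[0].shape
    Us = [np.eye(d)[:out_dims[k]] for k, d in enumerate(shape)]   # initial projections
    for _ in range(n_iters):
        for k in range(len(shape)):
            # Project every mode except k, then accumulate the mode-k covariance.
            C = np.zeros((shape[k], shape[k]))
            for X in samples:
                Y = X
                for j, U in enumerate(Us):
                    if j != k:
                        Y = mode_k_product(Y, U, j)
                M = unfold(Y, k)
                C += M @ M.T
            # Top eigenvectors of C give the new mode-k projection.
            w, V = np.linalg.eigh(C)
            Us[k] = V[:, ::-1][:, :out_dims[k]].T
    return Us
```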

Connection to Previous Work – Tensorface (M. Vasilescu and D. Terzopoulos, 2002)

[Figure: (a) Tensorface organizes an image ensemble along person, illumination, pose and expression modes, with each image stored as a vector; (b) CSA treats each image object itself as a multi-dimensional array]

From an algorithmic or mathematical view, CSA and Tensorface are both variants of the Rank-(R1, R2, ..., Rn) decomposition.

                                           Tensorface                           CSA
Motivation                                 Characterize external factors        Characterize internal factors
Input: Gray-level Image                    Vector                               Matrix
Input: Gabor-filtered Image                Does not address 3rd-order tensors   3rd-order Tensor
(Video Sequence)
When equal to PCA                          When the number of images per        Never
                                           person is one or a prime number
Number of Images per Person for Training   Many images per person               One image per person

Experiments: Database Description

Database                    Number of Persons (Images per Person)   Image Size (Pixels)
Simulated Video Sequence    60 (1)                                  64 x 64 x 13
ORL database                40 (10)                                 56 x 46
CMU PIE-1 sub-database      60 (10)                                 64 x 64
CMU PIE-2 sub-database      60 (10)                                 64 x 64

Experiments: Object Reconstruction (1)

Input: Gabor-filtered images (ORL database and CMU PIE-1 database)

Objective Evaluation Criterion: Root Mean Squared Error (RMSE) and Compression Ratio (CR)

[Figure: RMSE vs. CR curves on the ORL database and the CMU PIE-1 database]

Experiments: Object Reconstruction (2)

Input: Simulated video sequence

[Figure: original images, images reconstructed from PCA, and images reconstructed from CSA]

Experiments: Face Recognition

Input: Gray-level images and Gabor-filtered images (ORL database and CMU PIE database)

Algorithm                   CMU PIE-1   CMU PIE-2   ORL
PCA (Gray-level feature)    70.1%       28.3%       76.9%
PCA (Gabor feature)         80.1%       42.0%       86.6%
CSA (Ours)                  90.5%       59.4%       94.4%

Summary

• This is the first work to address dimensionality reduction with a tensor representation of arbitrary order.
• It opens a new research direction.

Bilinear and Tensor Subspace Learning (New Research Direction)

• Concurrent Subspace Analysis (CSA): CVPR 2005 and T-CSVT 2008
• Discriminant Analysis with Tensor Representation (DATER): CVPR 2005 and T-IP 2007
• Rank-one Projections with Adaptive Margins (RPAM): CVPR 2006 and T-SMC-B 2007
• Enhancing Tensor Subspace Learning by Element Rearrangement: CVPR 2007 and T-PAMI 2009
• Discriminant Locally Linear Embedding with High-Order Tensor Data (DLLE/T): T-SMC-B 2008
• Convergent 2D Subspace Learning with Null Space Analysis (NS2DLDA): T-CSVT 2008
• Semi-supervised Bilinear Subspace Learning: T-IP 2009
• Applications in Human Gait Recognition
  – CSA+DATER: T-CSVT 2006
  – Tensor Marginal Fisher Analysis (TMFA): T-IP 2007

Other researchers have also published several papers along this direction.

Human Gait Recognition: Basic Modules

Gallery videos stored in database -> Human Detection and Tracking -> Silhouette Extraction -> Feature Extraction
Probe video -> Human Detection and Tracking -> Silhouette Extraction -> Feature Extraction
Pattern Matching / Classification -> Yes or No (Verification), or ID of Top N Candidates (Identification)

(a)(d): the extracted silhouettes from one probe and one gallery video; (b)(c): the gray-level Gait Energy Images (GEI).
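A small NumPy sketch of computing a Gait Energy Image by averaging aligned binary silhouettes over a gait cycle. The size-normalization and centering step is assumed to have been done already; this is a generic illustration, not the exact preprocessing of the systems cited above.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """silhouettes: array of shape (T, H, W) with binary values {0, 1},
    already size-normalized and centered frame by frame.
    Returns the gray-level GEI of shape (H, W) with values in [0, 1]."""
    silhouettes = np.asarray(silhouettes, dtype=np.float32)
    return silhouettes.mean(axis=0)

# Example: 30 aligned 128 x 88 silhouette frames from one gait cycle.
frames = (np.random.rand(30, 128, 88) > 0.5).astype(np.float32)
gei = gait_energy_image(frames)   # gray-level Gait Energy Image
```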

Human Gait Recognition with Matrix Representation

D. Xu, S. Yan, H. Zhang et al., T-CSVT, 2006

USF HumanID probe sets:

Experiment (Probe)      # of Probes   Difference between Gallery and Probe Set
A (G, A, L, NB, M/N)    122           View
B (G, B, R, NB, M/N)    54            Shoe
C (G, B, L, NB, M/N)    54            View and Shoe
D (C, A, R, NB, M/N)    121           Surface
E (C, B, R, NB, M/N)    60            Surface and Shoe
F (C, A, L, NB, M/N)    121           Surface and View
G (C, B, L, NB, M/N)    60            Surface, Shoe, and View
H (G, A, R, BF, M/N)    120           Briefcase
I (G, B, R, BF, M/N)    60            Briefcase and Shoe
J (G, A, L, BF, M/N)    120           Briefcase and View
K (G, A/B, R, NB, N)    33            Time, Shoe, and Clothing
L (C, A/B, R, NB, N)    33            Time, Shoe, Clothing, and Surface

1. Shoe types: A or B; 2. Carrying: with or without a briefcase; 3. Time: May or November; 4. Surface: grass or concrete; 5. Viewpoint: left or right

Human Gait Recognition: Our Contributions

Top ranked results on the benchmark USF HumanID dataset:

Methods                                                                Average Rank-1 Results (%)
Our Recent Work (Ours, TIP 2012)                                       70.07
DNGR* (Sarkar's group, TPAMI 2006)                                     62.81
Image-to-Class distance (Ours, TCSVT 2010)                             61.19
GTDA (Maybank's group, TPAMI 2007)                                     60.58
Bilinear Subspace Learning method 2: MMFA (Ours, TIP 2007)             59.90
Bilinear Subspace Learning method 1: CSA + DATER (Ours, TCSVT 2006)    58.50
PCA+LDA (Bhanu's group, TPAMI 2006)                                    57.70

* The DNGR method additionally uses manually annotated silhouettes, which are not publicly available.

How to Utilize More Correlations?

Potential assumption in previous tensor-based subspace learning:
intra-tensor correlations, i.e., correlations among the features within certain tensor dimensions, such as rows, columns and Gabor features.

Pixel rearrangement turns sets of highly correlated pixels into columns of highly correlated pixels.

D. Xu, S. Yan et al., T-PAMI 2009

Problem Definition

• The task of enhancing the correlation/redundancy among 2nd-order tensors is to search for a pixel rearrangement operator R such that

$R^* = \arg\min_R \{ \min_{U, V} \sum_{i=1}^{N} \| X_i^R - U U^T X_i^R V V^T \|^2 \}$

1. $X_i^R$ is the rearranged matrix from sample $X_i$.
2. The column numbers of U and V are predefined.

After pixel rearrangement, we can use the rearranged tensors as input for Concurrent Subspace Analysis.

Solution to Pixel Rearrangement Problem

Alternating optimization (n = 1, 2, ...):

1. Initialize $U_0$, $V_0$.
2. Compute reconstructed matrices: $X_{i,n}^{Rec} = U_{n-1} U_{n-1}^T X_i^{R_{n-1}} V_{n-1} V_{n-1}^T$
3. Optimize the operator R: $R_n = \arg\min_R \sum_{i=1}^{N} \| X_i^R - X_{i,n}^{Rec} \|^2$
4. Optimize U and V: $(U_n, V_n) = \arg\min_{U, V} \sum_{i=1}^{N} \| X_i^{R_n} - U U^T X_i^{R_n} V V^T \|^2$
5. Set n = n + 1 and go to step 2.

Note: $\sum_{i=1}^{N} \| X_i^{R_n} - X_{i,n}^{Rec} \|^2 = \sum_{i=1}^{N} \| X_i^{R_n} - U_{n-1} U_{n-1}^T X_i^{R_{n-1}} V_{n-1} V_{n-1}^T \|^2$

How to Optimize R

• It is an integer programming problem:
$R^* = \arg\min_R \sum_{i=1}^{N} \| X_i^R - X_{i,n}^{Rec} \|^2$

• It can be rewritten as a transportation-style linear program between the original matrix (sender, pixel index p) and the reconstructed matrix (receiver, pixel index q):
$\min_R \sum_{p,q} c_{pq} R_{pq} \quad \text{s.t.} \quad (1)\ 0 \le R_{pq} \le 1;\ \ (2)\ \sum_p R_{pq} = 1;\ \ (3)\ \sum_q R_{pq} = 1$
where $c_{pq} = \sum_{i=1}^{N} | X_i(p) - X_{i,n}^{Rec}(q) |^2$.

1. This linear programming problem has an integer solution.
2. We constrain the rearrangement within a local neighborhood for speedup.
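A minimal SciPy sketch of this step: build the cost matrix c_pq and solve the one-to-one pixel assignment. For simplicity it uses the Hungarian solver scipy.optimize.linear_sum_assignment instead of a general LP solver, and it omits the local-neighborhood restriction mentioned above, so it is only practical for small matrices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimize_rearrangement(X_list, X_rec_list):
    """X_list, X_rec_list: lists of original and reconstructed matrices (same shapes).
    Returns a permutation `perm` such that pixel p is moved to position perm[p]."""
    X = np.stack([x.ravel() for x in X_list])          # (N, P): pixel p of sample i
    X_rec = np.stack([x.ravel() for x in X_rec_list])  # (N, P): pixel q of reconstruction
    # c[p, q] = sum_i |X_i(p) - X_i^Rec(q)|^2  (dense; illustration only)
    cost = ((X.T[:, None, :] - X_rec.T[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)           # one-to-one pixel assignment
    perm = np.empty_like(cols)
    perm[rows] = cols
    return perm
```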

Convergence Speed

Rearrangement Results

Reconstruction Visualization

Classification Accuracy

Outline

• Dimensionality Reduction for Tensor-based Objects
• Graph Embedding: A General Framework for Dimensionality Reduction
• Learning using Privileged Information for Face Verification and Person Re-identification

Representative Previous Work

• PCA, LDA
• ISOMAP: Geodesic Distance Preserving (J. Tenenbaum et al., 2000)
• LLE: Local Neighborhood Relationship Preserving (S. Roweis & L. Saul, 2000)
• LE/LPP: Local Similarity Preserving (M. Belkin, P. Niyogi et al., 2001, 2003)

Dimensionality Reduction Algorithms

Hundreds of algorithms exist: statistics-based (PCA/KPCA, LDA/KDA, ...), geometry-based (ISOMAP, LLE, LE/LPP, ...), and matrix- or tensor-based variants.

• Any common perspective to understand and explain these dimensionality reduction algorithms? Or any unified formulation that is shared by them?
• Any general tool to guide the development of new dimensionality reduction algorithms?

Our Answers

Type: Direct Graph Embedding
  Formulation: $\min_{y^T B y = 1} y^T L y$
  Examples: original PCA & LDA, ISOMAP, LLE, Laplacian Eigenmap

Type: Linearization
  Formulation: $y = X^T w$
  Examples: PCA, LDA, LPP

Type: Kernelization
  Formulation: $w = \sum_i \alpha_i \phi(x_i)$
  Examples: KPCA, KDA

Type: Tensorization
  Formulation: $y_i = \mathcal{X}_i \times_1 w^{(1)} \times_2 w^{(2)} \cdots \times_n w^{(n)}$
  Examples: CSA, DATER

S. Yan, D. Xu et al., CVPR 2005, T-PAMI 2007

Direct Graph Embedding

Data in the high-dimensional space and the low-dimensional space (assumed to be 1D here):
$X = [x_1, x_2, \ldots, x_N]$, $y = [y_1, y_2, \ldots, y_N]^T$

Intrinsic Graph: $G = \{X, S\}$;  Penalty Graph: $G^P = \{X, S^P\}$
S, S^P: similarity matrices (graph edge weights, measuring similarity in the high-dimensional space)
L, B: Laplacian matrices from S, S^P, with $L = D - S$, $D_{ii} = \sum_{j \neq i} S_{ij}, \forall i$

Direct Graph Embedding -- Continued

Criterion to preserve graph similarity:
$y^* = \arg\min_{y^T B y = 1} \sum_{i \neq j} \| y_i - y_j \|^2 S_{ij} = \arg\min_{y^T B y = 1} y^T L y$

Special case: when B is the identity matrix (scale normalization),
$y^* = \arg\min_{y^T y = 1} \sum_{i \neq j} \| y_i - y_j \|^2 S_{ij}$

Problem: direct graph embedding cannot handle new test data.
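A small NumPy/SciPy sketch of direct graph embedding: build L = D - S from a similarity matrix and solve the generalized eigenvalue problem L y = lambda B y, keeping the eigenvectors with the smallest eigenvalues. This is a generic illustration of the formulation above, not code from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def direct_graph_embedding(S, B=None, dim=2):
    """S: (N, N) symmetric similarity matrix (intrinsic graph).
    B: (N, N) constraint matrix (penalty-graph Laplacian, D, or I); identity if None.
    Returns the N x dim low-dimensional coordinates y."""
    N = S.shape[0]
    D = np.diag(S.sum(axis=1))
    L = D - S                         # graph Laplacian of the intrinsic graph
    if B is None:
        B = np.eye(N)
    # Generalized eigenproblem L y = lambda B y; small eigenvalues preserve similarity.
    w, V = eigh(L, B)
    # Skip the trivial near-constant eigenvector (eigenvalue close to 0).
    return V[:, 1:dim + 1]
```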

Linearization

Linear mapping function: $y = X^T w$

Objective function in Linearization (intrinsic graph in the objective, penalty graph in the constraint):
$w^* = \arg\min_{w^T X B X^T w = 1 \ \text{or} \ w^T w = 1} w^T X L X^T w$

Problem: a linear mapping function may not be enough to preserve the real nonlinear structure.

Kernelization

Nonlinear mapping: $\phi: x \rightarrow F$, from the original input space to a higher-dimensional Hilbert space.

Kernel matrix: $k(x, y) = \phi(x) \cdot \phi(y)$, $K_{ij} = k(x_i, x_j)$
Constraint: $w = \sum_i \alpha_i \phi(x_i)$

Objective function in Kernelization (intrinsic graph in the objective, penalty graph in the constraint):
$\alpha^* = \arg\min_{\alpha^T K B K \alpha = 1 \ \text{or} \ \alpha^T K \alpha = 1} \alpha^T K L K \alpha$

Tensorization

Low-dimensional representation:
$y_i = \mathcal{X}_i \times_1 w^{(1)} \times_2 w^{(2)} \cdots \times_n w^{(n)}$

Objective function in Tensorization:
$(w^{(1)}, \ldots, w^{(n)})^* = \arg\min_{f(w^{(1)}, \ldots, w^{(n)}) = 1} \sum_{i \neq j} \| \mathcal{X}_i \times_1 w^{(1)} \cdots \times_n w^{(n)} - \mathcal{X}_j \times_1 w^{(1)} \cdots \times_n w^{(n)} \|^2 S_{ij}$

where (intrinsic graph vs. penalty graph)
$f(w^{(1)}, \ldots, w^{(n)}) = \sum_{i=1}^{N} \| \mathcal{X}_i \times_1 w^{(1)} \cdots \times_n w^{(n)} \|^2 B_{ii}$, or
$f(w^{(1)}, \ldots, w^{(n)}) = \sum_{i \neq j} \| \mathcal{X}_i \times_1 w^{(1)} \cdots \times_n w^{(n)} - \mathcal{X}_j \times_1 w^{(1)} \cdots \times_n w^{(n)} \|^2 S^P_{ij}$

Common Formulation

S, S^P: similarity matrices (intrinsic graph, penalty graph); L, B: Laplacian matrices from S, S^P.

Direct Graph Embedding: $y^* = \arg\min_{y^T B y = 1 \ \text{or} \ y^T y = 1} y^T L y$

Linearization: $w^* = \arg\min_{w^T X B X^T w = 1 \ \text{or} \ w^T w = 1} w^T X L X^T w$

Kernelization: $\alpha^* = \arg\min_{\alpha^T K B K \alpha = 1 \ \text{or} \ \alpha^T K \alpha = 1} \alpha^T K L K \alpha$

Tensorization: $(w^{(1)}, \ldots, w^{(n)})^* = \arg\min_{f(w^{(1)}, \ldots, w^{(n)}) = 1} \sum_{i \neq j} \| \mathcal{X}_i \times_1 w^{(1)} \cdots \times_n w^{(n)} - \mathcal{X}_j \times_1 w^{(1)} \cdots \times_n w^{(n)} \|^2 S_{ij}$

A General Framework for Dimensionality Reduction

Algorithm        S & B Definition                                                                        Embedding Type
PCA/KPCA/CSA     $S_{ij} = 1/N \ (i \neq j)$;  $B = I$                                                    L / K / T
LDA/KDA/DATER    $S_{ij} = \delta_{l_i, l_j} / n_{l_i}$;  $B = I - \frac{1}{N} e e^T$                      L / K / T
ISOMAP           $S_{ij} = \tau(D_G)_{ij} \ (i \neq j)$;  $B = I$                                          D
LLE              $S = M + M^T - M^T M$;  $B = I$                                                           D
LE/LPP           $S_{ij} = \exp\{-\|x_i - x_j\|^2 / t\}$ if $\|x_i - x_j\| < \varepsilon$;  $B = D$         D / L

D: Direct Graph Embedding;  L: Linearization;  K: Kernelization;  T: Tensorization

General Framework: PCA and LDA

Principal Component Analysis: $W_{ij} = 1/N \ (i \neq j)$, $B = I$
$w^* = \arg\max_{w^T w = 1} w^T C w$, with the covariance matrix
$C = \frac{1}{N} \sum_i (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{N} X (I - \frac{1}{N} e e^T) X^T$, where $e = [1, 1, \ldots, 1]^T$

Linear Discriminant Analysis: $W_{ij} = \delta_{l_i, l_j} / n_{l_i}$, $B = I - \frac{1}{N} e e^T$
$w^* = \arg\min_w \frac{w^T S_W w}{w^T S_B w} = \arg\min_w \frac{w^T S_W w}{w^T C w}$, where
$S_W = \sum_c \sum_{i:\, l_i = c} (x_i - \bar{x}^c)(x_i - \bar{x}^c)^T = X \big(I - \sum_c \frac{1}{n_c} e^c (e^c)^T\big) X^T$
$S_B = \sum_c n_c (\bar{x}^c - \bar{x})(\bar{x}^c - \bar{x})^T = N \cdot C - S_W$

General Framework: ISOMAP (Geodesic Distance Preserving)

$W_{ij} = \tau(D_G)_{ij} \ (i \neq j)$;  $B = I$

$\tau$ converts the geodesic distance matrix $D_G$ into an inner product matrix:
$\tau(D_G) = -\frac{1}{2} H S H$, where $S_{ij} = D_{G, ij}^2$ and $H = I - \frac{1}{N} e e^T$

Multidimensional Scaling: $y^* = \arg\max_y \ y^T \tau(D_G)\, y$

General Framework: LLE (Local Neighborhood Relationship Preserving)

$S = M + M^T - M^T M \ (i \neq j)$;  $B = I$

LLE reconstructs each sample from its neighbors, $x_i \approx \sum_j M_{ij} x_j$, and then solves
$y^* = \arg\min_y \sum_i | y_i - \sum_j M_{ij} y_j |^2 = \arg\min_y \ y^T (I - M)^T (I - M)\, y$

Since $[(I - M)^T (I - M)]_{ij} = (I - M - M^T + M^T M)_{ij} = \delta_{ij} - M_{ij} - M_{ji} + \sum_k M_{ki} M_{kj}$,
this matches the graph embedding formulation with $S = M + M^T - M^T M$.

General Framework: Laplacian Eigenmap / LPP (Local Similarity Preserving)

$W_{ij} = \exp\{-\|x_i - x_j\|^2 / t\}$ if $\|x_i - x_j\|^2 < \varepsilon$, and 0 otherwise;  $B = D$

Laplacian Eigenmap:
$\min_{y^T D y = 1} \sum_{i \neq j} (y_i - y_j)^2 W_{ij} = \min_{y^T D y = 1} y^T L y$, with $L = D - W$, $D_{ii} = \sum_j W_{ij}$

LPP (the linearization $y = X^T w$): solve the generalized eigenvalue problem $X L X^T w = \lambda X D X^T w$
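A sketch of the linearized version (LPP) along these lines: build W with a heat kernel on k-nearest neighbors and solve X L X^T w = lambda X D X^T w. The neighborhood size, t, and the small regularizer are my own illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_neighbors=5, t=1.0, dim=2):
    """X: (N, d) data matrix, rows are samples. Returns the d x dim projection matrix."""
    N = X.shape[0]
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.zeros((N, N))
    knn = np.argsort(dist2, axis=1)[:, 1:n_neighbors + 1]
    for i in range(N):
        for j in knn[i]:
            W[i, j] = W[j, i] = np.exp(-dist2[i, j] / t)   # heat-kernel weight
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X.T @ L @ X
    Bmat = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])          # regularized for stability
    w, V = eigh(A, Bmat)                                     # X L X^T w = lambda X D X^T w
    return V[:, :dim]
```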

New Dimensionality Reduction Algorithm: Marginal Fisher Analysis

Important information for face recognition:
1) Label information
2) Local manifold structure (neighborhood or margin)

Intrinsic graph: $S_{ij} = 1$ if $x_i$ is among the $k_1$-nearest neighbors of $x_j$ in the same class; 0 otherwise.

Penalty graph: $S^P_{ij} = 1$ if the pair $(i, j)$ is among the $k_2$ shortest pairs in the data set; 0 otherwise.
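A rough NumPy sketch of building MFA-style intrinsic and penalty graphs from labeled data, following the two definitions above. Counting per-sample different-class neighbors instead of globally shortest pairs, and the choice of k1/k2 values, are my own simplifications.

```python
import numpy as np

def mfa_graphs(X, labels, k1=5, k2=20):
    """Build the MFA intrinsic graph S (same-class kNN) and a simplified penalty graph
    Sp (nearest different-class pairs). X: (N, d); labels: (N,)."""
    N = X.shape[0]
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.zeros((N, N))
    Sp = np.zeros((N, N))
    for i in range(N):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        diff = np.where(labels != labels[i])[0]
        # Intrinsic graph: k1 nearest neighbors within the same class.
        for j in same[np.argsort(dist2[i, same])][:k1]:
            S[i, j] = S[j, i] = 1.0
        # Penalty graph (simplified): k2 nearest samples from other classes.
        for j in diff[np.argsort(dist2[i, diff])][:k2]:
            Sp[i, j] = Sp[j, i] = 1.0
    return S, Sp
```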

Marginal Fisher Analysis: Advantage

No Gaussian distribution assumption

Experiments: Face Recognition

PIE-1 G3/P7 G4/P6

PCA+LDA (Linearization) 65.8% 80.2%

PCA+MFA (Ours) 71.0% 84.9%

KDA (Kernelization) 70.0% 81.0%

KMFA (Ours) 72.3% 85.2%

DATER-2 (Tensorization) 80.0% 82.3%

TMFA-2 (Ours) 82.1% 85.2%

ORL G3/P7 G4/P6

PCA+LDA (Linearization) 87.9% 88.3%

PCA+MFA (Ours) 89.3% 91.3%

KDA (Kernelization) 87.5% 91.7%

KMFA (Ours) 88.6% 93.8%

DATER-2 (Tensorization) 89.3% 92.0%

TMFA-2 (Ours) 95.0% 96.3%

Summary

• An optimization framework that unifies previous dimensionality reduction algorithms as special cases.
• A new dimensionality reduction algorithm: Marginal Fisher Analysis.

Outline

• Dimensionality Reduction for Tensor-based Objects
• Graph Embedding: A General Framework for Dimensionality Reduction
• Learning using Privileged Information for Face Verification and Person Re-identification

V. Vapnik and A. Vashist, A new learning paradigm: Learning using privileged information. Neural Networks, 544–557, 2009.

Privileged information: information available only in the training process (not available in the testing process).

Analogy: Training = attending classes in the classroom; Testing = taking an exam; Privileged information = the teacher's instruction.

Learning using Privileged Information

SVM+ (Primal Form): extends the primal form of SVM, using the privileged information to model the oracle (slack) function.

Main feature: available in both training and testing.
Privileged information: available only in training.
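For completeness, this is the commonly cited form of the SVM+ primal from Vapnik and Vashist (2009), reconstructed here with my own notation since the slide's formula image is not available: $x_i$ is the main feature, $x_i^*$ the privileged information, and $\phi$, $\phi^*$ the two feature mappings.

```latex
\min_{w, b,\, w^*, b^*} \;\;
  \frac{1}{2}\|w\|^2 \;+\; \frac{\gamma}{2}\|w^*\|^2
  \;+\; C \sum_{i=1}^{N} \big[ \langle w^*, \phi^*(x_i^*) \rangle + b^* \big]
\quad \text{s.t.} \quad
  y_i \big( \langle w, \phi(x_i) \rangle + b \big) \;\ge\; 1 - \big[ \langle w^*, \phi^*(x_i^*) \rangle + b^* \big],
\qquad
  \langle w^*, \phi^*(x_i^*) \rangle + b^* \;\ge\; 0 .
```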

Applications in Image and Video Recognition

Training data: web images/videos together with their surrounding textual descriptions (captions, tags, keywords, ...), e.g. "Azimut 95 Luxury Yacht at the Miami International Boat Show 2012. Azimut-Benetti Yachts sees 20 per cent gain in new luxury yacht sales."
Testing data: the visual content only.

1) Web images/videos are usually associated with additional surrounding textual descriptions (tags, captions, etc.)
2) RGB-D images from Kinect cameras contain additional depth images

Distance Metric Learning using Privileged Information

• Distance Metric Learning for Face Verification
  – Mahalanobis distance
• Distance Metric Learning using Privileged Information
  – Additional depth information is used as privileged information

[Figure: similar and dissimilar training pairs from RGB and depth images; at test time, the system decides whether a new RGB pair is similar or dissimilar]
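A tiny NumPy sketch of the squared Mahalanobis distance $d_M^2(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)$ used in this kind of metric learning, with M a learned positive semi-definite matrix. The identity matrix below is only a placeholder; learning M is exactly what ITML/ITML+ provide.

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance (xi - xj)^T M (xi - xj)."""
    d = xi - xj
    return float(d @ M @ d)

# Placeholder: with M = I this reduces to the squared Euclidean distance;
# metric learning (e.g. ITML / ITML+) replaces I with a learned PSD matrix.
x1, x2 = np.random.rand(128), np.random.rand(128)
M = np.eye(128)
print(mahalanobis_sq(x1, x2, M))
```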

Information Theoretic Distance Metric Learning (ITML)

• Distance Metric Learning
  – Given a training dataset of feature vectors together with pairwise labels:
    a pair is labeled as similar if the two samples belong to the same subject, and
    as dissimilar if the two samples are from two different subjects.
• Information Theoretic Distance Metric Learning learns a Mahalanobis distance metric from these pairwise constraints.

Information Theoretic Distance Metric Learning using Privileged Information (ITML+)

• Objective Function
  – We learn two distance metrics simultaneously: one on the main features and one on the privileged information.
• Discussion
  – For similar pairs, the learned distances should be more similar.
  – For dissimilar pairs, the learned distances should be more dissimilar.

X. Xu, W. Li and D. Xu, T-NNLS 2015

Experiments: Face Verification

• Setting:
  – Main features (i.e., gradient-LBP features) from RGB images
  – Privileged information (i.e., gradient-LBP features) from depth images

Results (AP% and AUC%) on the EUROCOM face dataset

Experiments: Person Re-identification

• Setting:
  – Main features (i.e., kernel descriptor features) from RGB images
  – Privileged information (i.e., kernel descriptor features) from depth images

Results (mean Rank-1 recognition rates %) on the BIWI RGBD-ID dataset

Summary

• A new distance metric learning method that takes advantage of additional depth images as privileged information.