Machine Learning for Biometrics
Dong XU
School of Electrical and Information Engineering
University of Sydney
Outline
Dimensionality Reduction for Tensor-based
Objects
Graph Embedding: A General Framework for
Dimensionality Reduction
Learning using Privileged Information for
Face Verification and Person Re-identification
What is Dimensionality Reduction?
Example: 3D space to 2D space
ISOMAP: Geodesic Distance Preserving (J. Tenenbaum et al., 2000)
Why Conduct Dimensionality Reduction?
Pose Variation
Expression Variation
LPP (He et al., 2003)
Uncover intrinsic structure
Visualization
Feature Extraction
Computation Efficiency
Broad Applications
Face Recognition
Human Gait Recognition
CBIR
Outline
Dimensionality Reduction for Tensor-based
Objects
Graph Embedding: A General Framework for
Dimensionality Reduction
Learning using Privileged Information for
Face Verification and Person Re-identification
What is Tensor?
Tensors are arrays of numbers which transform in
certain ways under coordinate transformations.
Vector: $x \in \mathbb{R}^{m_1}$
Matrix: $X \in \mathbb{R}^{m_1 \times m_2}$
3rd-order Tensor: $\mathcal{X} \in \mathbb{R}^{m_1 \times m_2 \times m_3}$
Definition of Mode-k Product
Product for two Matrices: $Y = XU$, i.e. $Y_{ij} = \sum_{k=1}^{m_2} X_{ik} U_{kj}$
Original Matrix $X$: $m_1 \times m_2$ (100 × 100); Projection Matrix $U$: $m_2 \times m_2'$ (100 × 10); New Matrix $Y$: $m_1 \times m_2'$ (100 × 10)
Mode-k product of a tensor and a matrix, Notation: $\mathcal{Y} = \mathcal{X} \times_k U$
Original Tensor $\mathcal{X}$: $m_1 \times m_2 \times m_3$ (100 × 100 × 40); Projection Matrix $U$: $m_2 \times m_2'$ (100 × 10); New Tensor $\mathcal{Y}$: $m_1 \times m_2' \times m_3$ (100 × 10 × 40)
Projection: high-dimensional space -> low-dimensional space
Reconstruction: low-dimensional space -> high-dimensional space
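The mode-k product can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the convention used in the matrix case (the projection matrix has shape m_k × m_k', and mode k of the tensor is contracted with its rows). The function name `mode_k_product`, the zero-based mode index and the random data are choices made here for illustration.

```python
import numpy as np

def mode_k_product(X, U, k):
    """Mode-k product Y = X x_k U (k is zero-based).

    U has shape (m_k, m_k_new); axis k of X is contracted with the rows of U.
    """
    # tensordot puts the new axis last, so move it back to position k
    Y = np.tensordot(X, U, axes=(k, 0))
    return np.moveaxis(Y, -1, k)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 100, 40))                # original tensor
U2 = np.linalg.qr(rng.standard_normal((100, 10)))[0]   # orthonormal columns

Y = mode_k_product(X, U2, 1)        # projection: 100x100x40 -> 100x10x40
X_rec = mode_k_product(Y, U2.T, 1)  # reconstruction: back to 100x100x40

# For a 2nd-order tensor the mode-2 product is the ordinary matrix product
A = rng.standard_normal((100, 100))
same_as_matmul = np.allclose(mode_k_product(A, U2, 1), A @ U2)
```

Projection followed by reconstruction equals a single mode-k product with `U2 @ U2.T`, which is the form that appears in reconstruction-based objectives.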
Data Representation in Dimensionality Reduction
Representations (each mapped from High Dimension to Low Dimension): Vector, Matrix, 3rd-order Tensor
Example data: Gray-level Image, Filtered Image, Video Sequence
Examples (listed in the order of the representations above):
PCA, LDA
Rank-1 Decomposition (A. Shashua and A. Levin, 2001); Low rank approximation of matrix (J. Ye)
Tensorface (M. Vasilescu and D. Terzopoulos, 2002)
Our Work: Xu et al., 2005; Yan et al., 2005
What are Gabor Features?
Gabor features can improve recognition performance in comparison to grayscale features (C. Liu and H. Wechsler, T-IP, 2002).
Gabor Wavelet Kernels: Eight Orientations, Five Scales
Input: Grayscale Image
Output: 40 Gabor-filtered Images
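A rough sketch of how one grayscale image becomes 40 filtered images (5 scales × 8 orientations) is given below. The kernel size, wavelengths and bandwidths are assumptions chosen for illustration, not the settings of Liu and Wechsler. The sketch also keeps only the real filter response and uses circular FFT convolution, whereas practical systems often use the magnitude of the complex response.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_theta / wavelength)

def gabor_tensor(image, n_scales=5, n_orientations=8, size=15):
    """Stack n_scales * n_orientations filtered images into a 3rd-order tensor."""
    H, W = image.shape
    F_img = np.fft.fft2(image)
    responses = []
    for s in range(n_scales):
        wavelength = 4.0 * 2 ** (s / 2)   # assumed scale spacing
        sigma = 0.5 * wavelength          # assumed bandwidth
        for o in range(n_orientations):
            theta = o * np.pi / n_orientations
            kern = gabor_kernel(size, wavelength, theta, sigma)
            # circular convolution via FFT; the kernel is zero-padded to H x W
            F_k = np.fft.fft2(kern, s=(H, W))
            responses.append(np.real(np.fft.ifft2(F_img * F_k)))
    return np.stack(responses, axis=-1)

image = np.random.default_rng(0).random((100, 100))
G = gabor_tensor(image)   # 100 x 100 x 40 tensor
```

Flattening `G` gives a 400,000-dimensional vector, which is why keeping the 100 × 100 × 40 tensor structure is attractive.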
Why Represent Objects as Tensors instead of Vectors?
Natural Representation
Gray-level Images (2D structure)
Videos (3D structure)
Gabor-filtered Images (3D structure)
Enhance Learnability in Real Applications
Curse of Dimensionality (Gabor-filtered image: 100 × 100 × 40 -> Vector: 400,000 dimensions)
Small sample size problem
Reduce Computation Cost
... ...
Concurrent Subspace Analysis as an Example
(Criterion: Optimal Reconstruction)
D. Xu, S. Yan, H. Zhang et al., CVPR 2005
Input sample: $\mathcal{X}_i$ of size 100 × 100 × 40
Dimensionality Reduction with the Projection Matrices $U_1, U_2, U_3$ (to be determined): Sample in Low-dimensional space of size 10 × 10 × 10
Reconstruction: the reconstructed sample of size 100 × 100 × 40
Objective Function:
$(U_k|_{k=1}^{3})^* = \arg\min_{U_k|_{k=1}^{3}} \sum_i \| \mathcal{X}_i \times_1 U_1 U_1^T \cdots \times_3 U_3 U_3^T - \mathcal{X}_i \|^2$
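One way to minimize a reconstruction objective of this kind is alternating optimization: fix all projection matrices except one, and update that one from the leading eigenvectors of a mode-wise scatter matrix. The sketch below follows that idea under stated assumptions (orthonormal columns, truncated-identity initialization, a fixed number of sweeps). It is an illustration, not the authors' implementation, and the helper names and small tensor sizes are chosen here only to keep it fast.

```python
import numpy as np

def mode_k_product(X, U, k):
    return np.moveaxis(np.tensordot(X, U, axes=(k, 0)), -1, k)

def unfold(X, k):
    """Mode-k unfolding: axis k becomes the rows."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def csa(samples, ranks, n_iter=5):
    """Alternately update U_1..U_n to reduce the total reconstruction error."""
    shape = samples[0].shape
    Us = [np.eye(m)[:, :r] for m, r in zip(shape, ranks)]
    for _ in range(n_iter):
        for k in range(len(shape)):
            C = np.zeros((shape[k], shape[k]))
            for X in samples:
                Z = X
                for j, U in enumerate(Us):
                    if j != k:
                        Z = mode_k_product(Z, U, j)   # project the other modes
                Zk = unfold(Z, k)
                C += Zk @ Zk.T
            # eigh sorts eigenvalues ascending; keep the leading eigenvectors
            _, vecs = np.linalg.eigh(C)
            Us[k] = vecs[:, ::-1][:, :ranks[k]]
    return Us

def reconstruction_error(samples, Us):
    err = 0.0
    for X in samples:
        Z = X
        for k, U in enumerate(Us):
            Z = mode_k_product(Z, U @ U.T, k)
        err += np.sum((Z - X) ** 2)
    return err

rng = np.random.default_rng(0)
samples = [rng.standard_normal((20, 20, 8)) for _ in range(10)]
ranks = (5, 5, 3)
Us = csa(samples, ranks)
err = reconstruction_error(samples, Us)
```

Because each `U_k` has orthonormal columns, minimizing the reconstruction error for mode k is equivalent to maximizing the energy retained after projection, which is what the eigenvector step does.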
Connection to Previous Work: Tensorface (M. Vasilescu and D. Terzopoulos, 2002)
[Figure: (a) Tensorface arranges image vectors along the modes Person, Illumination, Pose and Expression; (b) CSA stacks image objects (Image object 1, Image object 2, ...) whose own dimensions (Image Object Dim 1, Dim 2, Dim 3, Dim 4, ...) form the tensor modes]
From an algorithmic or mathematical point of view, CSA and Tensorface are
both variants of Rank-(R1,R2,…,Rn) decomposition.
Aspect | Tensorface | CSA
Motivation | Characterize external factors | Characterize internal factors
Input: Gray-level Image | Vector | Matrix
Input: Gabor-filtered Image (Video Sequence) | Not addressed | 3rd-order tensor
When equal to PCA | The number of images per person is one or a prime number | Never
Number of Images per Person for Training | Lots of images per person | One image per person
Experiments: Database Description
Database | Number of Persons (Images per person) | Image Size (Pixels)
Simulated Video Sequence | 60 (1) | 64 × 64 × 13
ORL database | 40 (10) | 56 × 46
CMU PIE-1 sub-database | 60 (10) | 64 × 64
CMU PIE-2 sub-database | 60 (10) | 64 × 64
Experiments: Object Reconstruction (1)
Input: Gabor-filtered images
ORL database
CMU PIE-1 database
Objective Evaluation Criterion:
Root Mean Squared Error (RMSE) and Compression Ratio (CR)
ORL database CMU PIE-1 database
Experiments: Object Reconstruction (2)
Original Images
Reconstructed Images from PCA
Reconstructed Images from CSA
Input: Simulated video sequence
Experiments: Face Recognition
Input: Gray-level images and Gabor-filtered images
ORL database
CMU PIE database
Algorithm CMU PIE-1 CMU PIE-2 ORL
PCA (Gray-level feature) 70.1% 28.3% 76.9%
PCA (Gabor feature) 80.1% 42.0% 86.6%
CSA (Ours) 90.5% 59.4% 94.4%
Summary
• This is the first work to address dimensionality
reduction with a tensor representation of
arbitrary order.
• Opens a new research direction.
Bilinear and Tensor Subspace Learning (New
Research Direction)
• Concurrent Subspace Analysis (CSA), CVPR 2005 and T-CSVT 2008
• Discriminant Analysis with Tensor Representation (DATER): CVPR 2005 and T-IP 2007
• Rank-one Projections with Adaptive Margins (RPAM): CVPR 2006 and T-SMC-B 2007
• Enhancing Tensor Subspace Learning by Element Rearrangement: CVPR 2007 and T-PAMI 2009
• Discriminant Locally Linear Embedding with High Order Tensor Data(DLLE/T): T-SMC-B 2008
• Convergent 2D Subspace Learning with Null Space Analysis (NS2DLDA): T-CSVT 2008
• Semi-supervised Bilinear Subspace Learning: T-IP 2009
• Applications in Human Gait Recognition
– CSA+DATER: T-CSVT 2006
– Tensor Marginal Fisher Analysis (TMFA): T-IP 2007
Other researchers have also published several papers in this direction.
Human Gait Recognition: Basic Modules
Gallery Videos: Human Detection and Tracking -> Silhouette Extraction -> Feature Extraction -> Stored in Database
Probe Video: Human Detection and Tracking -> Silhouette Extraction -> Feature Extraction -> Pattern Matching -> Classification
Output: Yes or No (Verification); ID of Top N Candidates (Identification)
(a), (d): The extracted silhouettes from one probe and gallery video;
(b), (c): The gray-level Gait Energy Images (GEI).
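A Gait Energy Image is commonly computed as the pixel-wise average of aligned binary silhouettes over a gait cycle. The following minimal sketch assumes the silhouettes are already size-normalized and centered; the function name and the toy frames are illustrative.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes (frames, H, W) into one gray-level image."""
    frames = np.asarray(silhouettes, dtype=float)
    return frames.mean(axis=0)

# Toy example: 4 frames of 2x2 silhouettes
frames = np.zeros((4, 2, 2))
frames[:, 1, 1] = 1.0    # pixel that is foreground in every frame
frames[:2, 0, 0] = 1.0   # pixel that is foreground in half of the frames
gei = gait_energy_image(frames)
```

Pixels that stay foreground throughout the cycle get values near 1, while pixels swept by moving limbs get intermediate gray levels.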
USF HumanID
Experiment (Probe) | # of Probe Sets | Difference between Gallery and Probe Set
A (G, A, L, NB, M/N) | 122 | View
B (G, B, R, NB, M/N) | 54 | Shoe
C (G, B, L, NB, M/N) | 54 | View and Shoe
D (C, A, R, NB, M/N) | 121 | Surface
E (C, B, R, NB, M/N) | 60 | Surface and Shoe
F (C, A, L, NB, M/N) | 121 | Surface and View
G (C, B, L, NB, M/N) | 60 | Surface, Shoe, and View
H (G, A, R, BF, M/N) | 120 | Briefcase
I (G, B, R, BF, M/N) | 60 | Briefcase and Shoe
J (G, A, L, BF, M/N) | 120 | Briefcase and View
K (G, A/B, R, NB, N) | 33 | Time, Shoe, and Clothing
L (C, A/B, R, NB, N) | 33 | Time, Shoe, Clothing, and Surface
1. Shoe types: A or B; 2. Carrying: with or without a briefcase; 3. Time: May or
November; 4. Surface: grass or concrete; 5. Viewpoint: left or right
Human Gait Recognition: Our Contributions
Top ranked results on the benchmark USF HumanID dataset
Methods | Average Rank-1 Results (%)
Our Recent Work (Ours, TIP 2012) | 70.07
DNGR* (Sarkar’s group, TPAMI 2006) | 62.81
Image-to-Class distance (Ours, TCSVT 2010) | 61.19
GTDA (Maybank’s group, TPAMI 2007) | 60.58
Bilinear Subspace Learning method 2: MMFA (Ours, TIP 2007) | 59.9
Bilinear Subspace Learning method 1: CSA + DATER (Ours, TCSVT 2006) | 58.5
PCA+LDA (Bhanu’s group, TPAMI 2006) | 57.70
*The DNGR method additionally uses the manually annotated silhouettes, which
are not publicly available.
How to Utilize More Correlations?
Pixel Rearrangement: sets of highly correlated pixels -> columns of highly correlated pixels
Potential Assumption in Previous Tensor-based Subspace Learning:
Intra-tensor correlations: Correlations among the features within certain
tensor dimensions, such as rows, columns and Gabor features…
D. Xu, S. Yan et al., T-PAMI 2009
Problem Definition
• The task of enhancing correlation/redundancy within 2nd-order tensors is to search for a pixel rearrangement
operator R, such that
$R^* = \arg\min_R \{ \min_{U,V} \sum_{i=1}^{N} \| X_i^R - U U^T X_i^R V V^T \|^2 \}$
1. $X_i^R$ is the rearranged matrix from sample $X_i$
2. The column numbers of U and V are predefined
After pixel rearrangement, we can use the rearranged
tensors as input for concurrent subspace analysis
Solution to Pixel Rearrangement Problem
Initialize $U_0$, $V_0$, then iterate:
Compute reconstructed matrices: $X_{i,n}^{Rec} = U_{n-1} U_{n-1}^T X_i^{R_{n-1}} V_{n-1} V_{n-1}^T$
Optimize operator R: $R_n = \arg\min_R \sum_{i=1}^{N} \| X_i^R - X_{i,n}^{Rec} \|^2$
Optimize U and V: $(U_n, V_n) = \arg\min_{U,V} \sum_{i=1}^{N} \| X_i^{R_n} - U U^T X_i^{R_n} V V^T \|^2$
n = n+1
Note: $\sum_{i=1}^{N} \| X_i^{R_n} - X_{i,n}^{Rec} \|^2 \le \sum_{i=1}^{N} \| X_i^{R_{n-1}} - U_{n-1} U_{n-1}^T X_i^{R_{n-1}} V_{n-1} V_{n-1}^T \|^2$
How to optimize R
$R^* = \arg\min_R \sum_{i=1}^{N} \| X_i^R - X_{i,n}^{Rec} \|^2$
• It is an integer programming problem, relaxed to a linear program:
$\min_R \sum_{p,q} c_{pq} R_{pq}$ s.t. 1: $0 \le R_{pq} \le 1$; 2: $\sum_p R_{pq} = 1$; 3: $\sum_q R_{pq} = 1$
where $c_{pq} = \sum_{i=1}^{N} | X_i(p) - X_{i,n}^{Rec}(q) |^2$
1. The linear programming problem has an integer solution.
2. We constrain the rearrangement within a local neighborhood for speedup.
[Figure: pixel p of the original matrix (Sender) is assigned to position q of the reconstructed matrix (Receiver) at cost $c_{pq}$]
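To make the cost matrix and the assignment constraints concrete, the sketch below builds c_pq and finds the best rearrangement by exhaustive search over permutations. Exhaustive search is a stand-in used here only because it is short and exact for a handful of pixels. It is not the linear-programming solver described above and does not scale beyond toy sizes. The function names are illustrative.

```python
import itertools
import numpy as np

def cost_matrix(originals, reconstructions):
    """c[p, q] = sum_i |X_i(p) - X_i^Rec(q)|^2 over flattened pixel positions."""
    X = np.stack([x.ravel() for x in originals])        # (N, P)
    R = np.stack([r.ravel() for r in reconstructions])  # (N, P)
    return ((X[:, :, None] - R[:, None, :]) ** 2).sum(axis=0)

def best_rearrangement(c):
    """Exhaustive search over permutations; feasible only for a few pixels."""
    P = c.shape[0]
    best = min(itertools.permutations(range(P)),
               key=lambda perm: sum(c[p, perm[p]] for p in range(P)))
    return np.array(best)   # pixel p is sent to position best[p]

original = np.array([[1.0, 2.0], [3.0, 4.0]])
reconstructed = np.array([[4.0, 3.0], [2.0, 1.0]])
c = cost_matrix([original], [reconstructed])
perm = best_rearrangement(c)

# The permutation written as a 0/1 matrix satisfies the three constraints
R = np.zeros_like(c)
R[np.arange(len(perm)), perm] = 1.0
```

Each row and column of `R` sums to one, so a permutation is a feasible point of the relaxed problem with integer entries.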
Outline
Dimensionality Reduction for Tensor-based
Objects
Graph Embedding: A General Framework for
Dimensionality Reduction
Learning using Privileged Information for
Face Verification and Person Re-identification
Representative Previous Work
ISOMAP: Geodesic
Distance Preserving
J. Tenenbaum et al., 2000
LLE: Local Neighborhood
Relationship Preserving
S. Roweis & L. Saul, 2000
LE/LPP: Local Similarity
Preserving, M. Belkin, P.
Niyogi et al., 2001, 2003
PCA LDA
Dimensionality Reduction Algorithms
Hundreds of algorithms, for example:
Statistics-based: PCA/KPCA, LDA/KDA, …
Geometry-based: ISOMAP, LLE, LE/LPP, …
Matrix / Tensor
• Is there a common perspective from which to understand and explain these
dimensionality reduction algorithms, or a unified formulation
that is shared by them?
• Is there a general tool to guide the development of new algorithms for
dimensionality reduction?
Our Answers
S. Yan, D. Xu et al., CVPR 2005, T-PAMI 2007
Type | Formulation | Example
Direct Graph Embedding | $\min_{y^T B y = 1} y^T L y$ | Original PCA & LDA, ISOMAP, LLE, Laplacian Eigenmap
Linearization | $y = X^T w$ | PCA, LDA, LPP
Kernelization | $w = \sum_i \alpha_i \phi(x_i)$ | KPCA, KDA
Tensorization | $y_i = \mathcal{X}_i \times_1 w^1 \times_2 w^2 \cdots \times_n w^n$ | CSA, DATER
Direct Graph Embedding
Data in high-dimensional space and low-dimensional space (assumed as 1D space here):
$X = [x_1, x_2, \ldots, x_N]$, $y = [y_1, y_2, \ldots, y_N]^T$
Intrinsic Graph: $G = \{x_i, S_{ij}\}$
Penalty Graph: $G^P = \{x_i, S^P_{ij}\}$
S, S^P: Similarity matrix (graph edge), measuring similarity in high dimensional space
L, B: Laplacian matrix from S, S^P: $L = D - S$, $D_{ii} = \sum_{j \ne i} S_{ij}\ \forall i$
Direct Graph Embedding -- Continued
Criterion to Preserve Graph Similarity:
$y^* = \arg\min_{y^T y = 1 \text{ or } y^T B y = 1} \sum_{i \ne j} \| y_i - y_j \|^2 S_{ij} = \arg\min_{y^T y = 1 \text{ or } y^T B y = 1} y^T L y$
Special case: B is the Identity matrix (Scale normalization)
Problem: It cannot handle new test data.
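The constrained minimization of y^T L y is a generalized eigenvalue problem, L y = λ B y. The sketch below solves it with NumPy under the assumption that B is diagonal with positive entries (for example B = I or B = D), so the problem can be rescaled into a standard symmetric eigenproblem. The function names and the two-cluster toy graph are illustrative.

```python
import numpy as np

def laplacian(S):
    """L = D - S with D_ii = sum_j S_ij."""
    return np.diag(S.sum(axis=1)) - S

def direct_graph_embedding(S, B=None, dim=1):
    """Minimize y^T L y subject to y^T B y = 1, for a positive diagonal B."""
    L = laplacian(S)
    n = S.shape[0]
    b = np.ones(n) if B is None else np.diag(B)
    # substitute y = B^{-1/2} z to obtain a standard symmetric eigenproblem
    scale = 1.0 / np.sqrt(b)
    A = scale[:, None] * L * scale[None, :]
    vals, vecs = np.linalg.eigh(A)
    Y = scale[:, None] * vecs
    # eigenvalue 0 belongs to the constant vector on a connected graph; skip it
    return Y[:, 1:dim + 1], vals[1:dim + 1]

# Toy graph: two tight clusters of three nodes, weakly linked to each other
S = np.full((6, 6), 0.01)
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
np.fill_diagonal(S, 0.0)
B = np.diag(S.sum(axis=1))    # B = D, as in Laplacian Eigenmap
Y, vals = direct_graph_embedding(S, B, dim=1)
y = Y[:, 0]
```

The 1D embedding `y` gives the two clusters opposite signs. Only the training samples receive coordinates, which illustrates why new test data cannot be handled directly.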
Linearization
Linear mapping function: $y = X^T w$
Objective function in Linearization (L from the Intrinsic Graph, B from the Penalty Graph):
$w^* = \arg\min_{w^T w = 1 \text{ or } w^T X B X^T w = 1} w^T X L X^T w$
Problem: a linear mapping function may not be enough to preserve
the real nonlinear structure.
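The linearized objective with the constraint w^T X B X^T w = 1 leads to the generalized eigenproblem X L X^T w = λ X B X^T w. A minimal NumPy sketch follows. The small ridge term `reg`, the heat-kernel similarity and the function name are assumptions made here for illustration.

```python
import numpy as np

def linear_graph_embedding(X, L, B, dim=1, reg=1e-8):
    """Solve X L X^T w = lambda X B X^T w for the smallest eigenvalues.

    X holds samples as columns (features x N), so that y = X^T w.
    """
    A = X @ L @ X.T
    C = X @ B @ X.T + reg * np.eye(X.shape[0])   # ridge keeps C positive definite
    G = np.linalg.cholesky(C)                    # C = G G^T
    G_inv = np.linalg.inv(G)
    vals, vecs = np.linalg.eigh(G_inv @ A @ G_inv.T)
    W = G_inv.T @ vecs[:, :dim]                  # map back: w = G^{-T} v
    return W, vals[:dim]

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 20))
dist2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
S = np.exp(-dist2 / 2.0)
np.fill_diagonal(S, 0.0)
D = np.diag(S.sum(axis=1))

W, vals = linear_graph_embedding(X, D - S, D, dim=2)
y_new = W.T @ rng.standard_normal(3)   # a new test sample is embedded directly
```

Unlike direct graph embedding, the learned `W` applies to unseen samples, as the last line shows.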
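Kernelization
Nonlinear mapping: $\phi: x_i \to \mathcal{F}$, from the original input space to another
higher dimensional Hilbert space.
Kernel function: $k(x, y) = \phi(x) \cdot \phi(y)$
Kernel matrix: $K_{ij} = k(x_i, x_j)$
Constraint: $w = \sum_i \alpha_i \phi(x_i)$
Objective function in Kernelization (L from the Intrinsic Graph, B from the Penalty Graph):
$\alpha^* = \arg\min_{\alpha^T K \alpha = 1 \text{ or } \alpha^T K B K \alpha = 1} \alpha^T K L K \alpha$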
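Tensorization
Low dimensional representation is obtained as:
$y_i = \mathcal{X}_i \times_1 w^1 \times_2 w^2 \cdots \times_n w^n$
Objective function in Tensorization:
$(w^1, \ldots, w^n)^* = \arg\min_{f(w^1, \ldots, w^n) = 1} \sum_{i \ne j} \| \mathcal{X}_i \times_1 w^1 \cdots \times_n w^n - \mathcal{X}_j \times_1 w^1 \cdots \times_n w^n \|^2 S_{ij}$
where (S from the Intrinsic Graph; B or S^P from the Penalty Graph)
$f(w^1, \ldots, w^n) = \sum_{i=1}^{N} \| \mathcal{X}_i \times_1 w^1 \cdots \times_n w^n \|^2 B_{ii}$, or
$f(w^1, \ldots, w^n) = \sum_{i \ne j} \| \mathcal{X}_i \times_1 w^1 \cdots \times_n w^n - \mathcal{X}_j \times_1 w^1 \cdots \times_n w^n \|^2 S^P_{ij}$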
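Common Formulation
S, S^P: Similarity matrix of the Intrinsic graph and the Penalty graph
L, B: Laplacian matrix from S, S^P
Direct Graph Embedding: $y^* = \arg\min_{y^T y = 1 \text{ or } y^T B y = 1} y^T L y$
Linearization: $w^* = \arg\min_{w^T w = 1 \text{ or } w^T X B X^T w = 1} w^T X L X^T w$
Kernelization: $\alpha^* = \arg\min_{\alpha^T K \alpha = 1 \text{ or } \alpha^T K B K \alpha = 1} \alpha^T K L K \alpha$
Tensorization: $(w^1, \ldots, w^n)^* = \arg\min_{f(w^1, \ldots, w^n) = 1} \sum_{i \ne j} \| \mathcal{X}_i \times_1 w^1 \cdots \times_n w^n - \mathcal{X}_j \times_1 w^1 \cdots \times_n w^n \|^2 S_{ij}$
where $f(w^1, \ldots, w^n) = \sum_{i=1}^{N} \| \mathcal{X}_i \times_1 w^1 \cdots \times_n w^n \|^2 B_{ii}$, or
$f(w^1, \ldots, w^n) = \sum_{i \ne j} \| \mathcal{X}_i \times_1 w^1 \cdots \times_n w^n - \mathcal{X}_j \times_1 w^1 \cdots \times_n w^n \|^2 S^P_{ij}$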
A General Framework for Dimensionality Reduction
Algorithm | S & B Definition | Embedding Type
PCA/KPCA/CSA | $S_{ij} = 1/N,\ i \ne j;\ B = I$ | L/K/T
LDA/KDA/DATER | $S_{ij} = \delta_{l_i, l_j} / n_{l_i};\ B = I - \frac{1}{N} e e^T$ | L/K/T
ISOMAP | $S_{ij} = \tau(D_G)_{ij},\ i \ne j;\ B = I$ | D
LLE | $S = M + M^T - M^T M;\ B = I$ | D
LE/LPP | $S_{ij} = \exp\{-\|x_i - x_j\|^2 / t\}$ if $\|x_i - x_j\| < \epsilon$; $B = D$ | D/L
D: Direct Graph Embedding
L: Linearization
K: Kernelization
T: Tensorization
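General Framework: PCA and LDA
(On this and the following slides, W denotes the similarity matrix S of the intrinsic graph.)
$e = [1, 1, \ldots, 1]^T$
Principal Component Analysis
$W_{ij} = 1/N,\ i \ne j;\quad B = I$
$w^* = \arg\min_{w^T w = 1} w^T C w$
$C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{N} X (I - \frac{1}{N} e e^T) X^T$
Linear Discriminant Analysis
$W_{ij} = \delta_{l_i, l_j} / n_{l_i};\quad B = I - \frac{1}{N} e e^T$
$w^* = \arg\min_w \frac{w^T S_W w}{w^T S_B w} = \arg\min_w \frac{w^T S_W w}{w^T C w}$
$S_W = \sum_{i=1}^{N} (x_i - \bar{x}^{l_i})(x_i - \bar{x}^{l_i})^T = X (I - \sum_{c=1}^{N_c} \frac{1}{n_c} e^c e^{cT}) X^T$
$S_B = \sum_{c=1}^{N_c} n_c (\bar{x}^c - \bar{x})(\bar{x}^c - \bar{x})^T = N C - S_W$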
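The claim that PCA and LDA fit the graph formulation can be checked numerically: the Laplacian of the PCA graph reproduces N times the covariance matrix, and the Laplacian of the LDA graph reproduces the within-class scatter. The data, class sizes and variable names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 12))        # features x N, samples as columns
labels = np.repeat(np.arange(3), 4)     # 3 classes, 4 samples each
N = X.shape[1]

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

# PCA graph: W_ij = 1/N for i != j
W_pca = (np.ones((N, N)) - np.eye(N)) / N
C = np.cov(X, bias=True)                # (1/N) sum_i (x_i - mean)(x_i - mean)^T
pca_ok = np.allclose(X @ laplacian(W_pca) @ X.T, N * C)

# LDA graph: W_ij = 1/n_c if samples i and j share class c, else 0
same = labels[:, None] == labels[None, :]
n_per_sample = np.bincount(labels)[labels]
W_lda = same / n_per_sample[:, None]

S_W = np.zeros((4, 4))
for c in np.unique(labels):
    Xc = X[:, labels == c]
    Xc = Xc - Xc.mean(axis=1, keepdims=True)
    S_W += Xc @ Xc.T
lda_ok = np.allclose(X @ laplacian(W_lda) @ X.T, S_W)

# Total scatter N*C splits into within-class plus between-class scatter
S_B = N * C - S_W
```

Both checks evaluate to `True`, so minimizing w^T X L X^T w with these graphs matches the PCA and LDA numerators.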
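General Framework: ISOMAP
Geodesic Distance Preserving
$W_{ij} = \tau(D_G)_{ij},\ i \ne j;\quad B = I$
Geodesic Distance Matrix $D_G$ -> Inner Product Matrix: $\tau(D_G) = -HSH/2$, where $S_{ij} = D_{ij}^2$ and $H = I - \frac{1}{N} e e^T$
Multidimensional Scaling: $y^* = \arg\max_y\ y^T \tau(D_G)\, y$
Each row of $\tau(D_G)$ sums to zero:
$\sum_j \tau(D_G)_{ij} = \sum_j (-HSH/2)_{ij} = \sum_j \left( -(I - \tfrac{1}{N} e e^T) S (I - \tfrac{1}{N} e e^T) / 2 \right)_{ij}$
$= \tfrac{1}{2} \sum_j \left( -S_{ij} + \tfrac{1}{N} \sum_{i'} S_{i'j} + \tfrac{1}{N} \sum_{j'} S_{ij'} - \tfrac{1}{N^2} \sum_{i'j'} S_{i'j'} \right)$
$= \left( -\tfrac{1}{2} \sum_j S_{ij} + \tfrac{1}{2N} \sum_j \sum_{j'} S_{ij'} \right) + \left( \tfrac{1}{2N} \sum_j \sum_{i'} S_{i'j} - \tfrac{1}{2N^2} \sum_j \sum_{i'j'} S_{i'j'} \right)$
$= 0 + 0 = 0$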
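The double-centering step can be illustrated with plain Euclidean distances standing in for geodesic distances, which is an assumption made here to keep the example self-contained. In that case τ(D) equals the Gram matrix of the centered points, its rows sum to zero, and the top eigenvectors recover the pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((10, 2))                       # points as rows
D = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)    # stand-in for D_G

N = D.shape[0]
H = np.eye(N) - np.ones((N, N)) / N
tau = -H @ (D ** 2) @ H / 2                            # inner product matrix

# Classical MDS: top eigenvectors scaled by the square roots of the eigenvalues
vals, vecs = np.linalg.eigh(tau)
Y = vecs[:, -2:] * np.sqrt(vals[-2:])
D_rec = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)

Zc = Z - Z.mean(axis=0)
gram_ok = np.allclose(tau, Zc @ Zc.T)
```

With true geodesic distances (shortest paths on a neighborhood graph) the same two lines that build `H` and `tau` apply unchanged.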
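General Framework: LLE
Local neighborhood relationship preserving
LLE in its original notation: $x_i \approx \sum_j w_{ij} x_j$, $\epsilon(y) = \sum_i | y_i - \sum_j w_{ij} y_j |^2$
$y^* = \arg\min_y\ y^T M y$, where $M_{ij} = \delta_{ij} - w_{ij} - w_{ji} + \sum_k w_{ki} w_{kj}$
Graph embedding view (from here on, M denotes the matrix of reconstruction coefficients $w_{ij}$, with $\sum_j M_{ij} = 1$):
$W = M + M^T - M^T M;\quad B = I$
Each row of $(I - M^T)(I - M)$ sums to zero:
$\sum_j [(I - M^T)(I - M)]_{ij} = \sum_j (I_{ij} - M_{ij} - M_{ji} + \sum_k M_{ki} M_{kj})$
$= 1 - \sum_j (M_{ij} + M_{ji}) + \sum_j \sum_k M_{ki} M_{kj}$
$= 1 - \sum_j (M_{ij} + M_{ji}) + \sum_k M_{ki} \sum_j M_{kj}$
$= 1 - \sum_j (M_{ij} + M_{ji}) + \sum_k M_{ki}$
$= 1 - \sum_j M_{ij} = 0$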
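A quick numerical check of the LLE graph: if the rows of the reconstruction-coefficient matrix sum to one, the graph W = M + M^T − M^T M has unit row sums, so its Laplacian D − W equals (I − M)^T (I − M). The random coefficient matrix below is a stand-in for coefficients fitted from real neighborhoods.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
M = rng.random((N, N))
np.fill_diagonal(M, 0.0)               # a point does not reconstruct itself
M = M / M.sum(axis=1, keepdims=True)   # rows sum to one

W = M + M.T - M.T @ M
D = np.diag(W.sum(axis=1))
L = D - W

I = np.eye(N)
lle_matrix = (I - M).T @ (I - M)
```

Minimizing y^T L y with B = I is therefore the same problem as minimizing the LLE embedding cost.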
General Framework: Laplacian Eigenmap / LPP
Local similarity preserving
$W_{ij} = \exp\{-\|x_i - x_j\|^2 / t\}$ if $\|x_i - x_j\| < \epsilon$; 0 otherwise. $B = D$
$L = D - W$, $D_{ii} = \sum_j W_{ij}$
Laplacian Eigenmap: $y^* = \arg\min_{y^T D y = 1} \sum_{i \ne j} (y_i - y_j)^2 W_{ij} = \arg\min_{y^T D y = 1} y^T L y$
LPP: $y = X^T w$, with $X L X^T w = \lambda X D X^T w$
New Dimensionality Reduction Algorithm:
Marginal Fisher Analysis
Important Information for face recognition:
1) Label information
2) Local manifold structure (neighborhood or margin)
Intrinsic graph: $S_{ij} = 1$ if $x_i$ is among the $k_1$-nearest neighbors of $x_j$ in the same class; 0 otherwise
Penalty graph: $S^P_{ij} = 1$ if the pair (i,j) is among the $k_2$ shortest pairs among the data set; 0 otherwise
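The two graphs can be built directly from pairwise distances. The sketch below makes two assumptions that should be read as simplifications: both graphs are symmetrized, and the k2 shortest pairs are taken globally among pairs with different labels (the margin between classes). The function name and toy data are illustrative.

```python
import numpy as np

def mfa_graphs(X, labels, k1, k2):
    """X: (N, d) samples as rows. Returns 0/1 matrices S (intrinsic), SP (penalty)."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]

    S = np.zeros((N, N))
    for j in range(N):
        # k1 nearest neighbours of x_j inside its own class, excluding x_j
        cand = np.where(same[j] & (np.arange(N) != j))[0]
        nn = cand[np.argsort(D[j, cand])[:k1]]
        S[nn, j] = 1.0
        S[j, nn] = 1.0

    SP = np.zeros((N, N))
    # k2 shortest pairs (i, j), i < j, whose labels differ (assumed here)
    iu, ju = np.where(np.triu(~same, k=1))
    order = np.argsort(D[iu, ju])[:k2]
    SP[iu[order], ju[order]] = 1.0
    SP[ju[order], iu[order]] = 1.0
    return S, SP

X = np.array([[0.0], [1.0], [3.0], [10.0], [11.0], [13.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
S, SP = mfa_graphs(X, labels, k1=1, k2=1)
```

In this toy example the penalty graph links the two samples closest to the class boundary (indices 2 and 3), while the intrinsic graph links each sample to its nearest same-class neighbor.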
Experiments: Face Recognition
PIE-1 G3/P7 G4/P6
PCA+LDA (Linearization) 65.8% 80.2%
PCA+MFA (Ours) 71.0% 84.9%
KDA (Kernelization) 70.0% 81.0%
KMFA (Ours) 72.3% 85.2%
DATER-2 (Tensorization) 80.0% 82.3%
TMFA-2 (Ours) 82.1% 85.2%
ORL G3/P7 G4/P6
PCA+LDA (Linearization) 87.9% 88.3%
PCA+MFA (Ours) 89.3% 91.3%
KDA (Kernelization) 87.5% 91.7%
KMFA (Ours) 88.6% 93.8%
DATER-2 (Tensorization) 89.3% 92.0%
TMFA-2 (Ours) 95.0% 96.3%
Summary
• Optimization framework that unifies previous
dimensionality reduction algorithms as special
cases.
• A new dimensionality reduction algorithm:
Marginal Fisher Analysis.
Outline
Dimensionality Reduction for Tensor-based
Objects
Graph Embedding: A General Framework for
Dimensionality Reduction
Learning using Privileged Information for
Face Verification and Person Re-identification
V. Vapnik and A. Vashist, A new learning paradigm: Learning using privileged information. Neural Networks, 544–557, 2009.
Privileged information:
Information available only in the training process (not available in the testing
process).
Training: attending classes in the classroom
Testing: taking an exam
Privileged Information: teacher's instruction
Learning using Privileged Information
[Equations: oracle function; primal form of SVM; SVM+ (Primal Form), defined over the main feature and the privileged information]
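For orientation, the SVM+ primal problem is commonly written as below, following Vapnik and Vashist (2009). The notation is an assumption made here: $x_i$ is the main feature, $x_i^*$ the privileged information, $y_i \in \{-1, +1\}$ the label, and $\gamma$, $C$ are trade-off parameters. The slack of the standard SVM is replaced by a correcting function of the privileged information.

```latex
\min_{w,\, b,\, w^*,\, b^*} \quad
  \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\|w^*\|^2
  + C \sum_{i=1}^{N} \left( \langle w^*, x_i^* \rangle + b^* \right)
\\
\text{s.t.} \quad
  y_i \left( \langle w, x_i \rangle + b \right)
    \ge 1 - \left( \langle w^*, x_i^* \rangle + b^* \right),
\qquad
  \langle w^*, x_i^* \rangle + b^* \ge 0,
\qquad i = 1, \ldots, N.
```

Only $w$ and $b$ are used at test time, so the privileged information $x_i^*$ is needed during training alone.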
Applications in Image and Video Recognition
[Figure: Training Data (an image with its caption, tags, keywords, ....) versus Testing Data]
Example caption: "Azimut 95 Luxury Yacht at the Miami International Boat Show 2012"; "Azimut-Benetti Yachts sees 20 per cent gain in new luxury yacht sales"
1) Web images/videos are usually associated with additional surrounding textual
descriptions (tags, captions, etc.)
2) RGB-D images from Kinect cameras contain additional depth images
Distance Metric Learning using Privileged Information
• Distance Metric Learning for Face Verification
– Mahalanobis distance:
• Distance Metric Learning using Privileged Information
– Additional depth information is used as privileged information
[Figure: training pairs labeled as Similar pair or Dissimilar pair; test pairs to be decided as similar or dissimilar]
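A Mahalanobis-type distance between two feature vectors is commonly written as d_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j) with a positive semidefinite matrix M. The sketch below uses that common form. The function names and the verification threshold are illustrative assumptions, and M is fixed by hand here rather than learned.

```python
import numpy as np

def mahalanobis_sq(x_i, x_j, M):
    """Squared Mahalanobis-type distance (x_i - x_j)^T M (x_i - x_j)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(diff @ M @ diff)

def is_similar(x_i, x_j, M, threshold):
    """Verification decision: declare a similar pair when the distance is small."""
    return mahalanobis_sq(x_i, x_j, M) <= threshold

x_i = np.array([1.0, 2.0])
x_j = np.array([0.0, 0.0])

d_euclid = mahalanobis_sq(x_i, x_j, np.eye(2))   # M = I: squared Euclidean
A = np.array([[2.0, 0.0], [0.0, 1.0]])
d_scaled = mahalanobis_sq(x_i, x_j, A.T @ A)     # M = A^T A: distance after x -> A x
```

Writing M = A^T A shows that learning M amounts to learning a linear transform of the features before measuring Euclidean distance.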
Information Theoretic Distance Metric Learning (ITML)
• Distance Metric Learning
– Given a training dataset of feature vectors and their pairwise labels:
similar: two samples belong to the same subject
dissimilar: two samples are from two different subjects
• Information Theoretic Distance Metric Learning
[Equation: ITML objective function and its constraints]
Information Theoretic Distance Metric Learning using Privileged Information (ITML+)
X. Xu, W. Li and D. Xu, T-NNLS 2015
• Objective Function
– Two sets of parameters are learned simultaneously: [Equation: ITML+ objective function]
– Discussions:
• For similar pairs, the learned distance should be more similar
• For dissimilar pairs, the learned distance should be more dissimilar
Experiments: Face Verification
• Setting:
– Main features (i.e. gradient-LBP features) from RGB images
– Privileged information (i.e. gradient-LBP features) from depth images
Results (AP% and AUC%) on the EURECOM face dataset
Experiments: Person Re-identification
• Setting:
– Main features (i.e. kernel descriptor features) from RGB images
– Privileged information (i.e. kernel descriptor features) from depth images
Results (mean Rank-1 recognition rates%) on the BIWI RGBD-ID dataset
Summary
• A new distance metric learning method to
take advantage of additional depth images
as privileged information