Dimensionality Reduction Chapter 3 (Duda et al.) – Section 3.8 CS479/679 Pattern Recognition Dr. George Bebis


Page 1:

Dimensionality Reduction

Chapter 3 (Duda et al.) – Section 3.8

CS479/679 Pattern Recognition
Dr. George Bebis

Page 2:

Data Dimensionality

• From a theoretical point of view, increasing the number of features should lead to better performance.

• In practice, beyond a certain point, adding more features leads to worse performance (i.e., the curse of dimensionality).

• The number of training examples needed grows exponentially with the dimensionality.

Page 3:

Dimensionality Reduction

• Significant improvements can be achieved by first mapping the data into a lower-dimensional space.

• Dimensionality can be reduced by:
− Combining features (linearly or non-linearly)
− Selecting a subset of features (i.e., feature selection)

• We will focus on feature combinations first.

Page 4:

Dimensionality Reduction (cont’d)

• Linear combinations are particularly attractive because they are simple to compute and analytically tractable.

• Given x ∈ R^N, the goal is to find an N x K transformation matrix U such that:

y = U^T x ∈ R^K, where K << N
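As a quick illustration (not from the slides), here is a minimal NumPy sketch of the projection y = U^T x; the matrix U below is a random placeholder rather than one learned by PCA or LDA:

```python
import numpy as np

N, K = 100, 5                # original and reduced dimensionality (K << N)
x = np.random.randn(N)       # an example data vector in R^N
U = np.random.randn(N, K)    # placeholder N x K transformation matrix
y = U.T @ x                  # reduced representation in R^K
print(y.shape)               # (5,)
```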


Page 5:

Dimensionality Reduction (cont’d)

• Idea: find a set of basis vectors in a lower dimensional space.

(1) Higher-dimensional space representation:

x = a1v1 + a2v2 + ... + aNvN, where v1, ..., vN is a basis of the N-dimensional space

(2) Lower-dimensional sub-space representation:

x̂ = b1u1 + b2u2 + ... + bKuK, where u1, ..., uK is a basis of the K-dimensional sub-space

Page 6:

Dimensionality Reduction (cont’d)

• Two classical approaches for finding optimal linear transformations are:

− Principal Components Analysis (PCA): Seeks a projection that preserves as much information in the data as possible (in a least-squares sense).

− Linear Discriminant Analysis (LDA): Seeks a projection that best separates the data (in a least-squares sense).


Page 7:

Principal Component Analysis (PCA)

• Dimensionality reduction implies information loss; PCA preserves as much information as possible by minimizing the reconstruction error ||x − x̂|| (defined on a later slide).

• How should we determine the “best” lower dimensional space?

The “best” low-dimensional space can be determined by the “best” eigenvectors of the covariance matrix of the data (i.e., the eigenvectors corresponding to the “largest” eigenvalues – also called “principal components”).

Page 8:

PCA - Steps

− Suppose x1, x2, ..., xM are N x 1 vectors

Step 1: compute the sample mean: x̄ = (1/M) Σ_{i=1}^{M} xi

Step 2: subtract the mean (i.e., center at zero): Φi = xi − x̄

Step 3: compute the covariance matrix C of the centered data and its eigenvalues λi and eigenvectors ui
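A minimal NumPy sketch of these steps (my own illustration, not code from the course); X holds one sample per row and the helper name pca is hypothetical:

```python
import numpy as np

def pca(X, K):
    """PCA on M samples of dimension N stacked as rows of X (M x N)."""
    x_bar = X.mean(axis=0)                 # Step 1: sample mean
    Phi = X - x_bar                        # Step 2: center at zero
    C = (Phi.T @ Phi) / X.shape[0]         # covariance matrix (N x N)
    lam, vecs = np.linalg.eigh(C)          # eigenvalues/eigenvectors (C is symmetric)
    order = np.argsort(lam)[::-1]          # sort by decreasing eigenvalue
    lam, vecs = lam[order], vecs[:, order]
    return x_bar, lam, vecs[:, :K]         # keep the K principal components

# toy usage
X = np.random.randn(200, 10)
x_bar, lam, U = pca(X, K=3)
Y = (X - x_bar) @ U                        # M x K reduced representation
```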

Page 9:

PCA – Steps (cont’d)

− Since C is symmetric, its eigenvectors u1, u2, ..., uN form an orthogonal basis, so any centered vector can be written as a linear combination of them:

x − x̄ = b1u1 + b2u2 + ... + bNuN = Σ_{i=1}^{N} bi ui

where bi = (x − x̄)·ui / (ui·ui)

Page 10:

PCA – Linear Transformation

• The linear transformation R^N → R^K that performs the dimensionality reduction keeps only the first K coefficients:

y = [b1, b2, ..., bK]^T = U^T (x − x̄), where U = [u1 u2 ... uK] and bi = (x − x̄)·ui / (ui·ui)

• If ui has unit length: bi = (x − x̄)·ui
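A short sketch of this projection step, assuming the mean x_bar and the eigenvector matrix U (columns ui) are already available; the general formula divides by ui·ui, which is a no-op for unit-length eigenvectors:

```python
import numpy as np

def project(x, x_bar, U):
    """b_i = (x - x_bar).u_i / (u_i.u_i) for each column u_i of U."""
    norms2 = np.sum(U * U, axis=0)       # u_i . u_i per column
    return (U.T @ (x - x_bar)) / norms2
```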

Page 11:

Geometric interpretation

• PCA projects the data along the directions where the data varies the most.

• These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.

• The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.

Page 12:

How to choose K?

• Choose K as the smallest value satisfying the following criterion:

(Σ_{i=1}^{K} λi) / (Σ_{i=1}^{N} λi) > T, where T is a threshold (e.g., 0.9 or 0.95)

• In this case, we say that we “preserve” 90% or 95% of the information in the data.

• If K=N, then we “preserve” 100% of the information in the data.
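A small sketch of this criterion, assuming the eigenvalues from the PCA sketch above; the threshold is the fraction of the information to preserve (e.g., 0.9 or 0.95):

```python
import numpy as np

def choose_K(eigvals, threshold=0.95):
    """Smallest K with sum(lambda_1..K) / sum(lambda_1..N) >= threshold."""
    lam = np.sort(eigvals)[::-1]
    ratio = np.cumsum(lam) / np.sum(lam)
    return int(np.searchsorted(ratio, threshold) + 1)
```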

Page 13:

Error due to dimensionality reduction

• The original vector x can be reconstructed using its principal components:

x̂ = x̄ + Σ_{i=1}^{K} bi ui

• PCA minimizes the reconstruction error: e = ||x − x̂||

• It can be shown that the average reconstruction error is determined by the eigenvalues of the discarded components: e ∝ Σ_{i=K+1}^{N} λi
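A sketch of the reconstruction and of the error estimate (my own illustration); the discarded-eigenvalue sum is stated here only up to a constant factor:

```python
import numpy as np

def reconstruct(b, x_bar, U):
    """Rebuild an approximation of x from its K coefficients b."""
    return x_bar + U @ b

def reconstruction_error(x, x_hat):
    return np.linalg.norm(x - x_hat)

def expected_error(eigvals, K):
    """Error due to dropping components K+1..N (up to a constant factor)."""
    lam = np.sort(eigvals)[::-1]
    return np.sum(lam[K:])
```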

Page 14:

Normalization

• The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.

• Data should always be normalized prior to using PCA.

• A common normalization method is to transform all the data to have zero mean and unit standard deviation: x' = (x − μ) / σ (applied per feature)
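A one-function sketch of this normalization, applied column-wise to a data matrix:

```python
import numpy as np

def zscore(X):
    """Scale each feature (column) to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```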

Page 15:

Application to Faces

• Computation of the low-dimensional basis (i.e., eigenfaces):

Page 16:

Application to Faces

• Computation of the eigenfaces – cont.

− Form the matrix A = [Φ1 Φ2 ... ΦM] from the mean-subtracted faces and compute the covariance matrix:

C = (1/M) Σ_{n=1}^{M} Φn Φn^T = A A^T (an N x N matrix)

Page 17:

Application to Faces

• Computation of the eigenfaces – cont.

− The matrix A A^T is very large (N x N), so compute the eigenvectors vi of the much smaller M x M matrix A^T A instead:

A^T A vi = μi vi

− The eigenvectors ui of A A^T then follow from:

A A^T ui = λi ui, with ui = A vi and λi = μi
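A sketch of this trick, assuming Phi is the N x M matrix of mean-subtracted face vectors (N >> M); the function name eigenfaces is my own:

```python
import numpy as np

def eigenfaces(Phi, K):
    """Eigenvectors of A A^T obtained via the small M x M matrix A^T A."""
    A = Phi                              # N x M
    mu, V = np.linalg.eigh(A.T @ A)      # A^T A v_i = mu_i v_i  (M x M problem)
    order = np.argsort(mu)[::-1]
    mu, V = mu[order], V[:, order]
    U = A @ V[:, :K]                     # u_i = A v_i are eigenvectors of A A^T
    U /= np.linalg.norm(U, axis=0)       # normalize to unit length
    return U, mu[:K]
```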

Page 18:

Application to Faces

• Computation of the eigenfaces – cont.

− Keeping only the K eigenvectors with the largest eigenvalues (i.e., using A^T A as above), each face Φi can be represented as follows:

Φi ≈ Σ_{j=1}^{K} wj uj, where wj = uj^T Φi

Page 19:

Example

Training images

Page 20:

Example (cont’d)

Top eigenvectors: u1, ..., uK

Mean: μ

Page 21:

Application to Faces

• Representing faces in this basis

− Face reconstruction:

Φ̂ = Σ_{j=1}^{K} wj uj, where wj = uj^T Φ (and ||uj|| = 1)

Page 22:

Eigenfaces

• Case Study: Eigenfaces for Face Detection/Recognition

− M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

• Face Recognition

− The simplest approach is to think of it as a template matching problem.

− Problems arise when performing recognition in a high-dimensional space.

− Significant improvements can be achieved by first mapping the data into a lower-dimensional space.

Page 23:

Eigenfaces

• Face Recognition Using Eigenfaces

− Given an unknown face image, subtract the mean face to obtain Φ and compute its weights wi = ui^T Φ (where ||ui|| = 1), i = 1, ..., K.

− Find the face class l that minimizes the distance:

er = ||w − w^l||, where ||w − w^l||^2 = Σ_{i=1}^{K} (wi − wi^l)^2

− The distance er is called distance in face space (difs).
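A sketch of these recognition steps, assuming mean_face, the eigenface matrix U (unit-length columns), and gallery_weights (one stored weight vector per known face) were computed beforehand; all names are placeholders:

```python
import numpy as np

def recognize(face, mean_face, U, gallery_weights):
    """Return the closest stored face class and the distance in face space (difs)."""
    w = U.T @ (face - mean_face)                        # weights of the probe
    difs = np.linalg.norm(gallery_weights - w, axis=1)  # ||w - w^l|| for each class l
    l = int(np.argmin(difs))
    return l, difs[l]
```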

Page 24:

Face detection and recognition

Detection → Recognition → “Sally”

Page 25:

Eigenfaces

• Face Detection Using Eigenfaces

− Given an image, subtract the mean face to obtain Φ and reconstruct it from the eigenfaces: Φ̂ = Σ_{i=1}^{K} wi ui, where wi = ui^T Φ (and ||ui|| = 1)

− The distance ed = ||Φ − Φ̂|| is called distance from face space (dffs)
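A matching sketch for detection, with the same assumed inputs as the recognition sketch above:

```python
import numpy as np

def distance_from_face_space(image, mean_face, U):
    """dffs = ||Phi - Phi_hat||, the reconstruction error of the probe."""
    Phi = image - mean_face
    Phi_hat = U @ (U.T @ Phi)    # projection onto the K eigenfaces (unit length)
    return np.linalg.norm(Phi - Phi_hat)
```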

Page 26:

Eigenfaces

Reconstructed face looks like a face.

Reconstructed non-face looks like a face again!

(Figure: input images and their reconstructions)

Page 27:

Eigenfaces

• Face Detection Using Eigenfaces – cont.

Case 1: in face space AND close to a given face
Case 2: in face space but NOT close to any given face
Case 3: not in face space AND close to a given face
Case 4: not in face space and NOT close to any given face

Page 28:

Reconstruction using partial information

• Robust to partial face occlusion.

(Figure: occluded input images and their reconstructions)

Page 29:

Eigenfaces

• Face detection, tracking, and recognition

Visualize dffs:

Page 30:

Limitations

• Background changes cause problems
− De-emphasize the outside of the face (e.g., by multiplying the input image by a 2D Gaussian window centered on the face).

• Light changes degrade performance
− Light normalization might help.

• Performance decreases quickly with changes to face size
− Scale the input image to multiple sizes.
− Multi-scale eigenspaces.

• Performance decreases with changes to face orientation (but not as fast as with scale changes)
− Plane rotations are easier to handle.
− Out-of-plane rotations are more difficult to handle.
− Multi-orientation eigenspaces.

Page 31:

Limitations (cont’d)

• Not robust to misalignment.

Page 32:

Limitations (cont’d)

• PCA is not always an optimal dimensionality-reduction technique for classification purposes.

Page 33:

Linear Discriminant Analysis (LDA)

• What is the goal of LDA?

− Perform dimensionality reduction “while preserving as much of the class discriminatory information as possible”.

− Seeks directions along which the classes are best separated, taking into account both the within-class scatter and the between-class scatter.

Page 34:

Case of C classes

• Let Mi be the number of samples and μi the mean of class i (i = 1, ..., C), and let μ be the overall mean.

• Within-class scatter matrix:

Sw = Σ_{i=1}^{C} Σ_{j=1}^{Mi} (xj − μi)(xj − μi)^T

• Between-class scatter matrix:

Sb = Σ_{i=1}^{C} Mi (μi − μ)(μi − μ)^T
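A sketch of both scatter matrices for labeled data X with one sample per row; the between-class formula follows the usual multi-class LDA convention:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (Sw) and between-class (Sb) scatter matrices."""
    labels = np.asarray(labels)
    mu = X.mean(axis=0)
    N = X.shape[1]
    Sw, Sb = np.zeros((N, N)), np.zeros((N, N))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        D = Xc - mu_c
        Sw += D.T @ D                     # sum_j (xj - mu_i)(xj - mu_i)^T
        d = (mu_c - mu).reshape(-1, 1)
        Sb += Xc.shape[0] * (d @ d.T)     # Mi (mu_i - mu)(mu_i - mu)^T
    return Sw, Sb
```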

Page 35:

Case of C classes (cont’d)

• Suppose the desired projection transformation is given by:

y = U^T x

• Suppose the scatter matrices of the projected data y are:

S̃b = U^T Sb U and S̃w = U^T Sw U

• LDA seeks projections that maximize the between-class scatter while minimizing the within-class scatter:

max |S̃b| / |S̃w| = max |U^T Sb U| / |U^T Sw U|

Page 36:

Case of C classes (cont’d)

• This is equivalent to the following generalized eigenvector problem:

Sb uk = λk Sw uk

• The columns of the matrix U in the linear transformation y = U^T x are the eigenvectors (called Fisherfaces) corresponding to the largest eigenvalues.

• Important: Sb has rank at most C − 1; therefore, at most C − 1 eigenvectors have non-zero eigenvalues (i.e., the maximum dimensionality of the sub-space is C − 1).

Page 37:

Case of C classes (cont’d)

• If Sw is non-singular, the generalized problem Sb uk = λk Sw uk can be converted into a conventional eigenvalue problem (see the sketch below):

Sw^{-1} Sb uk = λk uk

• In practice, Sw is singular since the data are image vectors with large dimensionality while the size of the data set is much smaller (M << N).
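A sketch of solving the conventional problem directly (only meaningful when Sw is non-singular; see the PCA+LDA fix on the next slide). The eigenvalues of a non-symmetric matrix may come back with tiny imaginary parts, hence the .real:

```python
import numpy as np

def lda_directions(Sw, Sb, num_dirs):
    """Leading eigenvectors of Sw^{-1} Sb (at most C-1 are useful)."""
    lam, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(lam.real)[::-1]
    return vecs[:, order[:num_dirs]].real
```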

Page 38:

Does Sw^{-1} always exist? (cont’d)

• To alleviate this problem, PCA can be used first:

1) PCA is first applied to the data set to reduce its dimensionality.

2) LDA is then applied to the PCA-reduced data to find the most discriminative directions (a combined sketch follows below).
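A sketch of this two-stage pipeline, reusing the hypothetical pca, scatter_matrices, and lda_directions helpers from the earlier sketches; K_pca is assumed small enough (e.g., at most M − C) for Sw of the reduced data to become non-singular:

```python
def pca_then_lda(X, labels, K_pca, K_lda):
    """1) PCA to K_pca dimensions, 2) LDA on the reduced data."""
    x_bar, _, U_pca = pca(X, K_pca)          # PCA basis (see the PCA - Steps sketch)
    Y = (X - x_bar) @ U_pca                  # reduced representation
    Sw, Sb = scatter_matrices(Y, labels)     # scatter in the reduced space
    U_lda = lda_directions(Sw, Sb, K_lda)    # most discriminative directions
    return x_bar, U_pca, U_lda
```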

Page 39:

Case Study I

− D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.

• Content-based image retrieval:

− Application: query-by-example content-based image retrieval

− Question: how to select a good set of image features for content-based image retrieval?

Page 40:

Case Study I (cont’d)

• Assumptions

− "Well-framed" images are required as input for training and query-by-example test probes.

− Only a small variation in the size, position, and orientation of the objects in the images is allowed.

Page 41:

Case Study I (cont’d)

• Terminology
− Most Expressive Features (MEF): features obtained using PCA.
− Most Discriminating Features (MDF): features obtained using LDA.

• Numerical instabilities
− When computing the eigenvalues/eigenvectors of Sw^{-1} Sb uk = λk uk numerically, the computations can be unstable since Sw^{-1} Sb is not always symmetric; see the paper for more information.

Page 42:

Case Study I (cont’d)

• Comparing MEF with MDF:
− MEF vectors show the tendency of PCA to capture major variations in the training set, such as lighting direction.
− MDF vectors discount those factors unrelated to classification.

Page 43:

Case Study I (cont’d)

• Clustering effect

Page 44:

Case Study I (cont’d)

• Methodology (a retrieval sketch follows below)

1) Represent each training image in terms of MEFs/MDFs.

2) Represent a query image in terms of MEFs/MDFs.

3) Find the k closest neighbors for retrieval (e.g., using Euclidean distance).
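A sketch of step 3 as simple Euclidean nearest-neighbor retrieval over stored MEF/MDF feature vectors; the names are placeholders:

```python
import numpy as np

def retrieve(query_feat, db_feats, k=5):
    """Indices of the k database images closest to the query."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]
```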

Page 45:

Case Study I (cont’d)

• Experiments and results

Face images
− A set of face images was used, with 2 expressions and 3 lighting conditions.

− Testing was performed using a disjoint set of images.

Page 46:

Case Study I (cont’d)

Page 47:

Case Study I (cont’d)

− Examples of correct search probes

Page 48:

Case Study I (cont’d)

− Example of a failed search probe

Page 49:

Case Study II

− A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.

• Is LDA always better than PCA?

− There has been a tendency in the computer vision community to prefer LDA over PCA.

− This is mainly because LDA deals directly with discrimination between classes while PCA does not pay attention to the underlying class structure.

Page 50:

Case Study II (cont’d)

AR database

Page 51:

Case Study II (cont’d)

LDA is not always better when the training set is small

Page 52:

Case Study II (cont’d)

LDA outperforms PCA when the training set is large