Chapter 2 Dimensionality Reduction. Linear Methods

Page 1: Chapter 2 Dimensionality Reduction. Linear Methods

Chapter 2: Dimensionality Reduction. Linear Methods

Page 2: Chapter 2 Dimensionality Reduction. Linear Methods

2.1 Introduction

• Dimensionality reduction – the process of finding a suitable lower-dimensional space in which to represent the original data.
• Goal:
  – Explore high-dimensional data
  – Visualize the data using 2-D or 3-D plots
  – Analyze the data using statistical methods, such as clustering and smoothing

Page 3: Chapter 2 Dimensionality Reduction. Linear Methods

Possible methods

• Just select subsets of the variables for processing.
• An alternative would be to create new variables that are functions of the original variables.
• The methods we describe in this book are of the second type.

Page 4: Chapter 2 Dimensionality Reduction. Linear Methods

Example 2.1

• A projection will be in the form of a matrix that takes the data from the original space to a lower-dimensional one.
• Here we project 2-D data onto a line that is θ radians from the horizontal or x axis.
• The mapping is performed by a projection matrix P (see the sketch below).
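The matrix P itself is not in the transcript. For projection onto the line through the origin spanned by the unit vector $\mathbf{u} = (\cos\theta, \sin\theta)^T$, the standard projection matrix (presumably the P of this example) is

$$\mathbf{P} = \mathbf{u}\mathbf{u}^T = \begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix},$$

and the projected data are obtained as Px for each 2-D observation x.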

Page 5: Chapter 2 Dimensionality Reduction. Linear Methods
Page 6: Chapter 2 Dimensionality Reduction. Linear Methods

Example 2.1

Page 7: Chapter 2 Dimensionality Reduction. Linear Methods

2.2 Principal Component Analysis (PCA)

• Aim: PCA reduces the dimensionality from p to d, where d < p, while at the same time accounting for as much of the variation in the original data set as possible.
• PCA gives a new set of coordinates or variables that are linear combinations of the original variables.
• The observations in the new principal component space are uncorrelated.

Page 8: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.1 PCA Using the Sample Covariance Matrix

• We start with the centered data matrix Xc, which has dimension n × p.
• Variable definitions:
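The definitions themselves are not in the transcript; a standard set consistent with the text is: n is the number of observations, p is the number of original variables, and the sample covariance matrix computed from the centered data is

$$\mathbf{S} = \frac{1}{n-1}\,\mathbf{X}_c^T\mathbf{X}_c.$$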

Page 9: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.1 PCA Using the Sample Covariance Matrix

• The next step is to calculate the eigenvectors and eigenvalues of the matrix S, subject to the condition that the set of eigenvectors is orthonormal.
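In symbols, this step solves the usual eigenvalue problem (a standard statement, not transcribed from the slide):

$$\mathbf{S}\,\mathbf{a}_j = l_j\,\mathbf{a}_j, \qquad \mathbf{a}_j^T\mathbf{a}_j = 1, \quad j = 1, \ldots, p.$$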

Page 10: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.1 PCA Using the Sample Covariance Matrix

• A major result in matrix algebra shows that any square, symmetric, nonsingular matrix can be transformed to a diagonal matrix (see the relation below).
• The columns of A contain the eigenvectors of S, and L is a diagonal matrix with the eigenvalues along the diagonal.
• By convention, the eigenvalues are ordered in descending order.
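The relation referred to above is not transcribed; given an orthogonal matrix A whose columns are the eigenvectors of S, it is presumably the diagonalization

$$\mathbf{A}^T\mathbf{S}\mathbf{A} = \mathbf{L}, \qquad \mathbf{L} = \mathrm{diag}(l_1, \ldots, l_p), \quad l_1 \ge l_2 \ge \cdots \ge l_p.$$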

Page 11: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.1 PCA Using the Sample Covariance Matrix

• use the eigenvectors of S to obtain new variables called principal components (PCs)

• Equation 2.2 shows that the PCs are linear combinations of the original variables.

• Scaling the eigenvectors: using the scaled eigenvectors wj in the transformation yields PCs that are uncorrelated with unit variance.
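Equation 2.2 and the scaled eigenvectors are not reproduced in the transcript; a standard form consistent with the surrounding text (x̄ the sample mean, a_j and l_j the j-th eigenvector and eigenvalue) would be

$$z_j = \mathbf{a}_j^T(\mathbf{x} - \bar{\mathbf{x}}), \qquad \mathbf{w}_j = \frac{\mathbf{a}_j}{\sqrt{l_j}},$$

so each PC z_j has variance l_j, while using w_j in place of a_j rescales it to unit variance.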

Page 12: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.1 PCA Using the Sample Covariance Matrix

• transform the observations to the PC coordinate system via the following equation

• The matrix Z contains the principal component scores

• To summarize: the transformed variables are the PCs and the individual transformed data values are the PC scores.
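The transformation equation mentioned above is not in the transcript; the usual matrix form, with Xc the centered data and A the matrix of eigenvectors, is

$$\mathbf{Z} = \mathbf{X}_c\mathbf{A},$$

so row i of Z holds the PC scores of observation i.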

Page 13: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.1 PCA Using the Sample Covariance Matrix

• linear algebra theorem: the sum of the variances of the original variables is equal to the sum of the eigenvalues

• The idea of dimensionality reduction with PCA is that one could include in the analysis only those PCs that have the highest eigenvalues

• Reduce the dimensionality to d with the transformation written out after this slide, where Ad contains the first d eigenvectors, i.e., the first d columns of A.
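In the notation above, the reduction is presumably

$$\mathbf{Z}_d = \mathbf{X}_c\mathbf{A}_d.$$

A minimal NumPy sketch of the whole procedure, assuming only the definitions above (the function and variable names are illustrative, not from the book):

```python
import numpy as np

def pca_cov(X, d):
    """PCA via eigendecomposition of the sample covariance matrix.

    X : (n, p) data matrix, one observation per row.
    d : number of principal components to keep (d < p).
    Returns the (n, d) matrix of PC scores and all p eigenvalues.
    """
    Xc = X - X.mean(axis=0)               # centered data matrix
    S = (Xc.T @ Xc) / (X.shape[0] - 1)    # sample covariance matrix
    eigvals, A = np.linalg.eigh(S)        # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]     # descending eigenvalue order
    eigvals, A = eigvals[order], A[:, order]
    Zd = Xc @ A[:, :d]                    # PC scores, reduced to d dimensions
    return Zd, eigvals

# Example: reduce 5-dimensional data to 2 dimensions
X = np.random.default_rng(0).normal(size=(100, 5))
Zd, eigvals = pca_cov(X, d=2)
```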

Page 14: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.2 PCA Using the Sample Correlation Matrix

• We can scale the data first to have standard units

• The standardized data x* are then treated as observations in the PCA process.

• The covariance matrix of the standardized data is the sample correlation matrix R.
• The correlation matrix should be used for PCA when the variances along the original dimensions are very different.
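The scaling itself is not written out in the transcript; the usual form, with x̄_j and s_j the sample mean and standard deviation of the j-th variable, is

$$x^*_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}.$$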

Page 15: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.2 PCA Using the Sample Correlation Matrix

• Something should be noted:
  – Methods for statistical inference based on the sample PCs from covariance matrices are easier and are available in the literature.
  – The PCs obtained from the correlation and covariance matrices do not provide equivalent information.

Page 16: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.3 How Many Dimensions Should We Keep?

• Possible ways to address this question:
  – Cumulative Percentage of Variance Explained
  – Scree Plot
  – The Broken Stick
  – Size of Variance
• Example 2.2 – We show how to perform PCA using the yeast cell cycle data set.

Page 17: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.3 How Many Dimensions Should We Keep?

• Cumulative Percentage of Variance Explained
• The idea is to select those d PCs that contribute a specified cumulative percentage of total variation in the data (see the expression below).
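The percentage itself is not in the transcript; the usual criterion is to choose the smallest d for which

$$t_d = 100\,\frac{\sum_{j=1}^{d} l_j}{\sum_{j=1}^{p} l_j}$$

exceeds a chosen threshold, commonly in the range of 90 to 95 percent.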

Page 18: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.3 How Many Dimensions Should We Keep?

• Scree Plot
• A graphical way to decide the number of PCs.
• The original idea: a plot of l_k (the eigenvalue) versus k (the index of the eigenvalue).
  – In some cases, we might plot the log of the eigenvalues when the first eigenvalues are very large.
• Look for the elbow in the curve, or the place where the curve levels off and becomes almost flat.
• The value of k at this elbow is an estimate of how many PCs to retain (a small plotting sketch follows this list).
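A minimal matplotlib sketch of such a plot; the eigenvalues here are illustrative placeholders, not values from the book:

```python
import matplotlib.pyplot as plt

# Eigenvalues in descending order (illustrative values only)
eigvals = [4.2, 1.8, 0.6, 0.3, 0.1]

k = range(1, len(eigvals) + 1)
plt.plot(k, eigvals, 'o-')
plt.xlabel('k (index of the eigenvalue)')
plt.ylabel('l_k (eigenvalue)')
plt.title('Scree plot')
plt.show()
```

The elbow, where the curve flattens out, suggests how many PCs to keep.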

Page 19: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.3 How Many Dimensions Should We Keep?

• The Broken Stick
• Choose the number of PCs based on the size of the eigenvalue, or the proportion of the variance explained by the individual PC.
• If we take a line segment and randomly divide it into p segments, the expected length of the k-th longest segment is g_k (given below). If the proportion of the variance explained by the k-th PC is greater than g_k, that PC is kept.
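The expression for g_k is not transcribed; the standard broken-stick value is

$$g_k = \frac{1}{p}\sum_{i=k}^{p}\frac{1}{i}.$$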

Page 20: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.3 How Many Dimensions Should We Keep?

• Size of Variance
• We would keep PCs whose eigenvalues exceed a cutoff (a commonly used form of the criterion is sketched below).
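The inequality and the definition following "where" are not in the transcript. A commonly used version of this criterion (Jolliffe's rule, which may be what is intended here) keeps the k-th PC if

$$l_k \ge 0.7\,\bar{l}, \qquad \bar{l} = \frac{1}{p}\sum_{j=1}^{p} l_j,$$

where l̄ is the average eigenvalue; the exact cutoff used in the book may differ.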

Page 21: Chapter 2 Dimensionality Reduction. Linear Methods

2.2.3 How Many Dimensions Should We Keep?

• Example 2.2
• The yeast cell cycle data contain 384 genes corresponding to five phases, measured at 17 time points.

Page 22: Chapter 2 Dimensionality Reduction. Linear Methods

2.3 Singular Value Decomposition (SVD)

• SVD provides a way to find the PCs without explicitly calculating the covariance matrix.
• The technique is valid for an arbitrary matrix.
  – We use the noncentered form of the data matrix in the explanation that follows.
• The SVD of X is given by the decomposition written out below.
  – U is an n × n matrix, D is a diagonal matrix with n rows and p columns, and V has dimensions p × p.
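The decomposition itself is not in the transcript; written out, the full SVD consistent with these dimensions is

$$\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^T,$$

where the diagonal entries of D are the singular values and the columns of U and V are the left and right singular vectors. A quick NumPy check of the shapes (the matrix here is an arbitrary illustration):

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(6, 4))   # n = 6 observations, p = 4 variables
U, s, Vt = np.linalg.svd(X, full_matrices=True)    # s holds the singular values

# U is 6 x 6, Vt is 4 x 4; D is the 6 x 4 matrix with s on its diagonal
D = np.zeros(X.shape)
np.fill_diagonal(D, s)
assert np.allclose(X, U @ D @ Vt)                  # X = U D V^T
```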

Page 23: Chapter 2 Dimensionality Reduction. Linear Methods

2.3 Singular Value Decomposition (SVD)

Page 24: Chapter 2 Dimensionality Reduction. Linear Methods

2.3 Singular Value Decomposition (SVD)

• The first r columns of U form an orthonormal basis for the column space of X, where r is the rank of X.
• The first r columns of V form an orthonormal basis for the row space of X.
• As with PCA, we order the singular values in decreasing order and impose the same order on the columns of U and V.
• An approximation to the original matrix X is obtained by keeping only the largest k singular values and the corresponding columns of U and V (see below).
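The truncated form is not written out in the transcript; the usual rank-k approximation, with U_k and V_k the first k columns of U and V and D_k the leading k × k block of D, is

$$\mathbf{X} \approx \mathbf{X}_k = \mathbf{U}_k\mathbf{D}_k\mathbf{V}_k^T.$$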

Page 25: Chapter 2 Dimensionality Reduction. Linear Methods

2.3 Singular Value Decomposition (SVD)

• Example 2.3
• SVD applied to information retrieval (IR).
• Start with a data matrix, where each row corresponds to a term and each column corresponds to a document in the corpus.
• A query is likewise given by a column vector over the terms.

Page 26: Chapter 2 Dimensionality Reduction. Linear Methods

Example 2.3

Page 27: Chapter 2 Dimensionality Reduction. Linear Methods

Example 2.3

• Method to find the most relevant documents:
  – Compute the cosine of the angle between the query vectors and the columns of the term-document matrix (written out below).
  – Use a cutoff value of 0.5.
  – The second query matches the first book, but misses the fourth one.
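The cosine itself is not written out in the transcript; for a query q and document column x_j it is the usual

$$\cos\theta_j = \frac{\mathbf{q}^T\mathbf{x}_j}{\|\mathbf{q}\|\,\|\mathbf{x}_j\|},$$

and documents whose cosine exceeds the cutoff are returned.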

Page 28: Chapter 2 Dimensionality Reduction. Linear Methods

Example 2.3

• The idea is that some of the dimensions represented by the full term-document matrix are noise and that documents will have closer semantic structure after dimensionality reduction using SVD

• Find the representation of the query vector in the reduced space given by the first k columns of U.

Page 29: Chapter 2 Dimensionality Reduction. Linear Methods

Example 2.3

• Why?
• Consider Equation 2.6, the rank-k approximation X ≈ U_k D_k V_k^T.
• Note that the columns of U and V are orthonormal, so U_k^T U_k = I.
• Left-multiplying Equation 2.6 by U_k^T gives

$$\mathbf{U}_k^T(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \mathbf{D}_k(\mathbf{v}_1^T, \ldots, \mathbf{v}_n^T),$$

where x_j is the j-th document (column of X) and v_j is the j-th row of V_k. The columns of D_k V_k^T are thus the documents in the reduced space, and U_k^T q plays the same role for a query q.

Page 30: Chapter 2 Dimensionality Reduction. Linear Methods

Example 2.3

Using a cutoff value of 0.5, we now correctly have documents 1 and 4 as being relevant to our queries on baking bread and baking.
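A minimal NumPy sketch of the reduced-space matching described in this example; the term-document matrix, query vector, and function name below are illustrative placeholders, not the book's actual data:

```python
import numpy as np

def lsi_scores(X, q, k):
    """Cosine scores of query q against the documents (columns of X)
    after rank-k reduction with the SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    docs_k = np.diag(s[:k]) @ Vt[:k, :]        # D_k V_k^T: documents in k-D space
    q_k = U[:, :k].T @ q                       # query in the same reduced space
    num = docs_k.T @ q_k
    den = np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k)
    return num / den

# Illustrative 5-term x 4-document matrix and a 5-term query vector
X = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 1]], dtype=float)
q = np.array([1, 0, 0, 0, 1], dtype=float)

scores = lsi_scores(X, q, k=2)
relevant = np.where(scores > 0.5)[0] + 1       # document numbers above the 0.5 cutoff
```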

Page 31: Chapter 2 Dimensionality Reduction. Linear Methods

Thanks