machine learning for data science (cs4786) …pca: variance maximization pick directions along which...

80
PCA and Random Projections Machine Learning for Data Science (CS4786) Lecture 4

Upload: others

Post on 12-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

Machine Learning for Data Science (CS4786)Lecture 10

PCA and Random Projections

Course Webpage :http://www.cs.cornell.edu/Courses/cs4786/2017fa/

Machine Learning for Data Science (CS4786)

Lecture 4

Clustering

Course Webpage :

http://www.cs.cornell.edu/Courses/cs4786/2017fa/

Page 2: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

DIM REDUCTION: LINEAR TRANSFORMATION

Pick a low dimensional subspace

Project linearly to this subspace

Subspace retains as much informationX

d

n Yn

K

K

Wd =⇥

Page 3: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

DIM REDUCTION: LINEAR TRANSFORMATION

Pick a low dimensional subspace

Project linearly to this subspace

Subspace retains as much informationX

d

n Yn

K

K

Wd =⇥

y>i = x>

i W

Page 4: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: VARIANCE MAXIMIZATION

Pick directions along which data varies the mostFirst principal component:

w1 = arg maxw∶�w�2=1

1n

n�t=1�w�xt − 1

n

n�t=1

w�xt�2

= arg maxw∶�w�2=1

1n

n�t=1�w�(xt − µ)�2

= arg maxw∶�w�2=1

1n

n�t=1

w�(xt − µ)(xt − µ)�w= arg max

w∶�w�2=1w�⌃w

⌃ is the covariance matrix

PCA: VARIANCE MAXIMIZATION

Pick directions along which data varies the mostFirst principal component:

w1 = arg maxw∶�w�2=1

1n

n�t=1�w�xt − 1

n

n�t=1

w�xt�2

= arg maxw∶�w�2=1

1n

n�t=1�w�(xt − µ)�2

= arg maxw∶�w�2=1

1n

n�t=1

w�(xt − µ)(xt − µ)�w= arg max

w∶�w�2=1w�⌃w

⌃ is the covariance matrix

Spread =1

n

nX

t=1

dist2 yt,

1

n

nX

t=1

yt

!

<latexit sha1_base64="LS7mndCO6K0hRJR1bmZGPQIvwjE=">AAACU3icdVFBaxQxFM5Mq62r1VWPXkIXoYIsySLt9lAo9uKxotsWdrZDJvNmN22SGZI34jLMfyxCD/4RLx40065gxT4IfHzf+3jvfckqrTwy9j2K19YfPNzYfNR7/GTr6bP+8xcnvqydhIksdenOMuFBKwsTVKjhrHIgTKbhNLs86vTTL+C8Ku1nXFYwM2JuVaGkwECl/YsE4Ss603zqbDlt6QFNCidkw9vGtjTxtUkbPODtuaWJEbgIvXnYqz0fJRoK3Fmm+PZ+S1Bp4tR8gW/S/oANGWOcc9oBvrfLAtjfH4/4mPJOCjUgqzpO+9+SvJS1AYtSC++nnFU4a4RDJTW0vaT2UAl5KeYwDdAKA37W3GTS0teByWlRuvAs0hv2b0cjjPdLk4XO7ir/r9aR/9OmNRbjWaNsVSNYeTuoqDXFknYB01w5kKiXAQjpVNiVyoUI8WD4hl4I4c+l9H5wMhpyNuQf3w0O36/i2CSvyDbZIZzskUPygRyTCZHkivwgvyISXUc/4zhev22No5XnJblT8dZvnGG07g==</latexit><latexit sha1_base64="LS7mndCO6K0hRJR1bmZGPQIvwjE=">AAACU3icdVFBaxQxFM5Mq62r1VWPXkIXoYIsySLt9lAo9uKxotsWdrZDJvNmN22SGZI34jLMfyxCD/4RLx40065gxT4IfHzf+3jvfckqrTwy9j2K19YfPNzYfNR7/GTr6bP+8xcnvqydhIksdenOMuFBKwsTVKjhrHIgTKbhNLs86vTTL+C8Ku1nXFYwM2JuVaGkwECl/YsE4Ss603zqbDlt6QFNCidkw9vGtjTxtUkbPODtuaWJEbgIvXnYqz0fJRoK3Fmm+PZ+S1Bp4tR8gW/S/oANGWOcc9oBvrfLAtjfH4/4mPJOCjUgqzpO+9+SvJS1AYtSC++nnFU4a4RDJTW0vaT2UAl5KeYwDdAKA37W3GTS0teByWlRuvAs0hv2b0cjjPdLk4XO7ir/r9aR/9OmNRbjWaNsVSNYeTuoqDXFknYB01w5kKiXAQjpVNiVyoUI8WD4hl4I4c+l9H5wMhpyNuQf3w0O36/i2CSvyDbZIZzskUPygRyTCZHkivwgvyISXUc/4zhev22No5XnJblT8dZvnGG07g==</latexit><latexit sha1_base64="LS7mndCO6K0hRJR1bmZGPQIvwjE=">AAACU3icdVFBaxQxFM5Mq62r1VWPXkIXoYIsySLt9lAo9uKxotsWdrZDJvNmN22SGZI34jLMfyxCD/4RLx40065gxT4IfHzf+3jvfckqrTwy9j2K19YfPNzYfNR7/GTr6bP+8xcnvqydhIksdenOMuFBKwsTVKjhrHIgTKbhNLs86vTTL+C8Ku1nXFYwM2JuVaGkwECl/YsE4Ss603zqbDlt6QFNCidkw9vGtjTxtUkbPODtuaWJEbgIvXnYqz0fJRoK3Fmm+PZ+S1Bp4tR8gW/S/oANGWOcc9oBvrfLAtjfH4/4mPJOCjUgqzpO+9+SvJS1AYtSC++nnFU4a4RDJTW0vaT2UAl5KeYwDdAKA37W3GTS0teByWlRuvAs0hv2b0cjjPdLk4XO7ir/r9aR/9OmNRbjWaNsVSNYeTuoqDXFknYB01w5kKiXAQjpVNiVyoUI8WD4hl4I4c+l9H5wMhpyNuQf3w0O36/i2CSvyDbZIZzskUPygRyTCZHkivwgvyISXUc/4zhev22No5XnJblT8dZvnGG07g==</latexit><latexit sha1_base64="LS7mndCO6K0hRJR1bmZGPQIvwjE=">AAACU3icdVFBaxQxFM5Mq62r1VWPXkIXoYIsySLt9lAo9uKxotsWdrZDJvNmN22SGZI34jLMfyxCD/4RLx40065gxT4IfHzf+3jvfckqrTwy9j2K19YfPNzYfNR7/GTr6bP+8xcnvqydhIksdenOMuFBKwsTVKjhrHIgTKbhNLs86vTTL+C8Ku1nXFYwM2JuVaGkwECl/YsE4Ss603zqbDlt6QFNCidkw9vGtjTxtUkbPODtuaWJEbgIvXnYqz0fJRoK3Fmm+PZ+S1Bp4tR8gW/S/oANGWOcc9oBvrfLAtjfH4/4mPJOCjUgqzpO+9+SvJS1AYtSC++nnFU4a4RDJTW0vaT2UAl5KeYwDdAKA37W3GTS0teByWlRuvAs0hv2b0cjjPdLk4XO7ir/r9aR/9OmNRbjWaNsVSNYeTuoqDXFknYB01w5kKiXAQjpVNiVyoUI8WD4hl4I4c+l9H5wMhpyNuQf3w0O36/i2CSvyDbZIZzskUPygRyTCZHkivwgvyISXUc/4zhev22No5XnJblT8dZvnGG07g==</latexit>

=KX

j=1

w>j ⌃wj

<latexit sha1_base64="yR0tj84VyWvNrIFWMMlQ+7zX+CI=">AAACHXicdVBNSwMxEM36bf2qevQSLIKnkpSi9VAoehG8KFotdNslm2bbtMnukmSVsvSPePGvePGgiAcv4r8xqxWs6IOBx3szzMzzY8G1QejdmZqemZ2bX1jMLS2vrK7l1zcudZQoyuo0EpFq+EQzwUNWN9wI1ogVI9IX7MofHGX+1TVTmkfhhRnGrCVJN+QBp8RYycuXq9DVifTSfhWP2ifQlcT0/CC9GXn9tmuiGLrnvCvJhOHlC6iIEMIYw4zg/T1kycFBpYQrEGeWRQGMcerlX91ORBPJQkMF0bqJUWxaKVGGU8FGOTfRLCZ0QLqsaWlIJNOt9PO7EdyxSgcGkbIVGvip/pxIidR6KH3bmd2of3uZ+JfXTExQaaU8jBPDQvq1KEgENBHMooIdrhg1YmgJoYrbWyHtEUWosYHmbAjfn8L/yWWpiFERn5ULtcNxHAtgC2yDXYDBPqiBY3AK6oCCW3APHsGTc+c8OM/Oy1frlDOe2QQTcN4+ABWTopU=</latexit><latexit sha1_base64="yR0tj84VyWvNrIFWMMlQ+7zX+CI=">AAACHXicdVBNSwMxEM36bf2qevQSLIKnkpSi9VAoehG8KFotdNslm2bbtMnukmSVsvSPePGvePGgiAcv4r8xqxWs6IOBx3szzMzzY8G1QejdmZqemZ2bX1jMLS2vrK7l1zcudZQoyuo0EpFq+EQzwUNWN9wI1ogVI9IX7MofHGX+1TVTmkfhhRnGrCVJN+QBp8RYycuXq9DVifTSfhWP2ifQlcT0/CC9GXn9tmuiGLrnvCvJhOHlC6iIEMIYw4zg/T1kycFBpYQrEGeWRQGMcerlX91ORBPJQkMF0bqJUWxaKVGGU8FGOTfRLCZ0QLqsaWlIJNOt9PO7EdyxSgcGkbIVGvip/pxIidR6KH3bmd2of3uZ+JfXTExQaaU8jBPDQvq1KEgENBHMooIdrhg1YmgJoYrbWyHtEUWosYHmbAjfn8L/yWWpiFERn5ULtcNxHAtgC2yDXYDBPqiBY3AK6oCCW3APHsGTc+c8OM/Oy1frlDOe2QQTcN4+ABWTopU=</latexit><latexit sha1_base64="yR0tj84VyWvNrIFWMMlQ+7zX+CI=">AAACHXicdVBNSwMxEM36bf2qevQSLIKnkpSi9VAoehG8KFotdNslm2bbtMnukmSVsvSPePGvePGgiAcv4r8xqxWs6IOBx3szzMzzY8G1QejdmZqemZ2bX1jMLS2vrK7l1zcudZQoyuo0EpFq+EQzwUNWN9wI1ogVI9IX7MofHGX+1TVTmkfhhRnGrCVJN+QBp8RYycuXq9DVifTSfhWP2ifQlcT0/CC9GXn9tmuiGLrnvCvJhOHlC6iIEMIYw4zg/T1kycFBpYQrEGeWRQGMcerlX91ORBPJQkMF0bqJUWxaKVGGU8FGOTfRLCZ0QLqsaWlIJNOt9PO7EdyxSgcGkbIVGvip/pxIidR6KH3bmd2of3uZ+JfXTExQaaU8jBPDQvq1KEgENBHMooIdrhg1YmgJoYrbWyHtEUWosYHmbAjfn8L/yWWpiFERn5ULtcNxHAtgC2yDXYDBPqiBY3AK6oCCW3APHsGTc+c8OM/Oy1frlDOe2QQTcN4+ABWTopU=</latexit><latexit sha1_base64="yR0tj84VyWvNrIFWMMlQ+7zX+CI=">AAACHXicdVBNSwMxEM36bf2qevQSLIKnkpSi9VAoehG8KFotdNslm2bbtMnukmSVsvSPePGvePGgiAcv4r8xqxWs6IOBx3szzMzzY8G1QejdmZqemZ2bX1jMLS2vrK7l1zcudZQoyuo0EpFq+EQzwUNWN9wI1ogVI9IX7MofHGX+1TVTmkfhhRnGrCVJN+QBp8RYycuXq9DVifTSfhWP2ifQlcT0/CC9GXn9tmuiGLrnvCvJhOHlC6iIEMIYw4zg/T1kycFBpYQrEGeWRQGMcerlX91ORBPJQkMF0bqJUWxaKVGGU8FGOTfRLCZ0QLqsaWlIJNOt9PO7EdyxSgcGkbIVGvip/pxIidR6KH3bmd2of3uZ+JfXTExQaaU8jBPDQvq1KEgENBHMooIdrhg1YmgJoYrbWyHtEUWosYHmbAjfn8L/yWWpiFERn5ULtcNxHAtgC2yDXYDBPqiBY3AK6oCCW3APHsGTc+c8OM/Oy1frlDOe2QQTcN4+ABWTopU=</latexit>

MaximizeKX

j=1

w>j ⌃wj

s.t. 8j, kwjk22 = 1 & wj ? wi<latexit sha1_base64="9znnfXwqIpAuOh3YSN/2Etryh8A=">AAACi3icdVFdaxNBFJ1dv2K0muqjLxeDxQdZdlIxabFQFEEQoaJpC5lkmZ3MppPO7A4zs2rcbn6MP8k3/42zaQqN6IWBwznnfsy9qZbCujj+HYQ3bt66fad1t33v/taDh53tR8e2KA3jQ1bIwpym1HIpcj50wkl+qg2nKpX8JD1/2+gnX7mxosi/uIXmY0VnucgEo85TSecnUdSdGVV9pN+FEj/4EuodILZUSTU/wPXkA6wcaVZ9q5P5hLhCA/ksZopuCIS0ryrZyEXLpkhWGColzF8AudjwXiS9SQ8OAMOS7Cw36gDR3OjrlEg63TiK4xhjDA3A/VexB3t7gx4eAG4kH120jqOk84tMC1YqnjsmqbUjHGs3rqhxgklet0lpuabsnM74yMOcKm7H1WqXNTzzzBT86P7lDlbs9YyKKmsXKvXOZkb7t9aQ/9JGpcsG40rkunQ8Z5eNslKCK6A5DEyF4czJhQeUGeFnBXZGDWXOn6/tl3D1U/g/OO5FOI7wp5fdwzfrdbTQE/QUPUcY9dEheo+O0BCxoBVEQT8YhFvhbrgfvr60hsE65zHaiPDdH+vexos=</latexit><latexit sha1_base64="9znnfXwqIpAuOh3YSN/2Etryh8A=">AAACi3icdVFdaxNBFJ1dv2K0muqjLxeDxQdZdlIxabFQFEEQoaJpC5lkmZ3MppPO7A4zs2rcbn6MP8k3/42zaQqN6IWBwznnfsy9qZbCujj+HYQ3bt66fad1t33v/taDh53tR8e2KA3jQ1bIwpym1HIpcj50wkl+qg2nKpX8JD1/2+gnX7mxosi/uIXmY0VnucgEo85TSecnUdSdGVV9pN+FEj/4EuodILZUSTU/wPXkA6wcaVZ9q5P5hLhCA/ksZopuCIS0ryrZyEXLpkhWGColzF8AudjwXiS9SQ8OAMOS7Cw36gDR3OjrlEg63TiK4xhjDA3A/VexB3t7gx4eAG4kH120jqOk84tMC1YqnjsmqbUjHGs3rqhxgklet0lpuabsnM74yMOcKm7H1WqXNTzzzBT86P7lDlbs9YyKKmsXKvXOZkb7t9aQ/9JGpcsG40rkunQ8Z5eNslKCK6A5DEyF4czJhQeUGeFnBXZGDWXOn6/tl3D1U/g/OO5FOI7wp5fdwzfrdbTQE/QUPUcY9dEheo+O0BCxoBVEQT8YhFvhbrgfvr60hsE65zHaiPDdH+vexos=</latexit><latexit sha1_base64="9znnfXwqIpAuOh3YSN/2Etryh8A=">AAACi3icdVFdaxNBFJ1dv2K0muqjLxeDxQdZdlIxabFQFEEQoaJpC5lkmZ3MppPO7A4zs2rcbn6MP8k3/42zaQqN6IWBwznnfsy9qZbCujj+HYQ3bt66fad1t33v/taDh53tR8e2KA3jQ1bIwpym1HIpcj50wkl+qg2nKpX8JD1/2+gnX7mxosi/uIXmY0VnucgEo85TSecnUdSdGVV9pN+FEj/4EuodILZUSTU/wPXkA6wcaVZ9q5P5hLhCA/ksZopuCIS0ryrZyEXLpkhWGColzF8AudjwXiS9SQ8OAMOS7Cw36gDR3OjrlEg63TiK4xhjDA3A/VexB3t7gx4eAG4kH120jqOk84tMC1YqnjsmqbUjHGs3rqhxgklet0lpuabsnM74yMOcKm7H1WqXNTzzzBT86P7lDlbs9YyKKmsXKvXOZkb7t9aQ/9JGpcsG40rkunQ8Z5eNslKCK6A5DEyF4czJhQeUGeFnBXZGDWXOn6/tl3D1U/g/OO5FOI7wp5fdwzfrdbTQE/QUPUcY9dEheo+O0BCxoBVEQT8YhFvhbrgfvr60hsE65zHaiPDdH+vexos=</latexit><latexit sha1_base64="9znnfXwqIpAuOh3YSN/2Etryh8A=">AAACi3icdVFdaxNBFJ1dv2K0muqjLxeDxQdZdlIxabFQFEEQoaJpC5lkmZ3MppPO7A4zs2rcbn6MP8k3/42zaQqN6IWBwznnfsy9qZbCujj+HYQ3bt66fad1t33v/taDh53tR8e2KA3jQ1bIwpym1HIpcj50wkl+qg2nKpX8JD1/2+gnX7mxosi/uIXmY0VnucgEo85TSecnUdSdGVV9pN+FEj/4EuodILZUSTU/wPXkA6wcaVZ9q5P5hLhCA/ksZopuCIS0ryrZyEXLpkhWGColzF8AudjwXiS9SQ8OAMOS7Cw36gDR3OjrlEg63TiK4xhjDA3A/VexB3t7gx4eAG4kH120jqOk84tMC1YqnjsmqbUjHGs3rqhxgklet0lpuabsnM74yMOcKm7H1WqXNTzzzBT86P7lDlbs9YyKKmsXKvXOZkb7t9aQ/9JGpcsG40rkunQ8Z5eNslKCK6A5DEyF4czJhQeUGeFnBXZGDWXOn6/tl3D1U/g/OO5FOI7wp5fdwzfrdbTQE/QUPUcY9dEheo+O0BCxoBVEQT8YhFvhbrgfvr60hsE65zHaiPDdH+vexos=</latexit>

Page 5: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PRINCIPAL COMPONENT ANALYSIS (PCA)

Eigen Face:

Write down each data point as a linear combination of smallnumber of basis vectors

Data specific compression scheme

One of the early successes: in face recognition: classification basedon nearest neighbor in the reduced dimension space

Turk & Pentland’91

μ = Mean face

w1 w2 w3

yt[1] = yt[2] = yt[3] =

xt

Page 6: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PRINCIPAL COMPONENT ANALYSIS (PCA)

Eigen Face:

Write down each data point as a linear combination of smallnumber of basis vectors

Data specific compression scheme

One of the early successes: in face recognition: classification basedon nearest neighbor in the reduced dimension space

Turk & Pentland’91

• Each xt (each row of X) is a face image (vectorized version)

μ = Mean face

w1 w2 w3

yt[1] = yt[2] = yt[3] =

xt

Page 7: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PRINCIPAL COMPONENT ANALYSIS (PCA)

Eigen Face:

Write down each data point as a linear combination of smallnumber of basis vectors

Data specific compression scheme

One of the early successes: in face recognition: classification basedon nearest neighbor in the reduced dimension space

Turk & Pentland’91

• Each xt (each row of X) is a face image (vectorized version)

• Each yt is the set of coefficients we multiply to the eigen face

μ = Mean face

w1 w2 w3

yt[1] = yt[2] = yt[3] =

xt

Page 8: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PRINCIPAL COMPONENT ANALYSIS (PCA)

Eigen Face:

Write down each data point as a linear combination of smallnumber of basis vectors

Data specific compression scheme

One of the early successes: in face recognition: classification basedon nearest neighbor in the reduced dimension space

Turk & Pentland’91

• Each xt (each row of X) is a face image (vectorized version)

• Each yt is the set of coefficients we multiply to the eigen face

• wi’s are orthogonal to each other and of unit length

μ = Mean face

w1 w2 w3

yt[1] = yt[2] = yt[3] =

xt

Page 9: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

ORTHONORMAL PROJECTIONS

(Centered) Data-points as linear combination of someorthonormal basis, i.e.

xt = µ + d�j=1

yt[j]wj

where w1, . . . ,wd ∈ Rd are the orthonormal basis and µ = 1n ∑n

t=1 xt.Represent data as linear combination of just K orthonormal basis,

xt = µ + K�j=1

yt[j]wj + µ

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 10: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

ORTHONORMAL PROJECTIONS

(Centered) Data-points as linear combination of someorthonormal basis, i.e.

xt = µ + d�j=1

yt[j]wj

where w1, . . . ,wd ∈ Rd are the orthonormal basis and µ = 1n ∑n

t=1 xt.Represent data as linear combination of just K orthonormal basis,

xt = µ + K�j=1

yt[j]wj + µ

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 11: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

ORTHONORMAL PROJECTIONS

(Centered) Data-points as linear combination of someorthonormal basis, i.e.

xt = µ + d�j=1

yt[j]wj

where w1, . . . ,wd ∈ Rd are the orthonormal basis and µ = 1n ∑n

t=1 xt.Represent data as linear combination of just K orthonormal basis,

xt = µ + K�j=1

yt[j]wj + µ

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 12: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 13: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 14: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 15: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 16: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 17: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 18: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 19: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 20: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

1n

n�t=1�xt − xt�22 = 1

n

n�t=1

d�j=k+1

yt[j]2�wj�22 (but �wj� = 1)= 1

n

n�t=1

d�j=k+1

yt[j]2 (now yj =w�j (xt − µ))= 1

n

n�t=1

d�j=k+1(w�j (xt − µ))2

= 1n

n�t=1

d�j=k+1

w�j (xt − µ)(xt − µ)�wj

= d�j=k+1

w�j ⌃wj

Page 21: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

1n

n�t=1�xt − xt�22 = 1

n

n�t=1

d�j=k+1

yt[j]2�wj�22 (but �wj� = 1)= 1

n

n�t=1

d�j=k+1

yt[j]2 (now yj =w�j (xt − µ))= 1

n

n�t=1

d�j=k+1(w�j (xt − µ))2

= 1n

n�t=1

d�j=k+1

w�j (xt − µ)(xt − µ)�wj

= d�j=k+1

w�j ⌃wj

Page 22: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

1n

n�t=1�xt − xt�22 = 1

n

n�t=1

d�j=k+1

yt[j]2�wj�22 (but �wj� = 1)= 1

n

n�t=1

d�j=k+1

yt[j]2 (now yj =w�j (xt − µ))= 1

n

n�t=1

d�j=k+1(w�j (xt − µ))2

= 1n

n�t=1

d�j=k+1

w�j (xt − µ)(xt − µ)�wj

= d�j=k+1

w�j ⌃wj

Page 23: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

1n

n�t=1�xt − xt�22 = 1

n

n�t=1

d�j=k+1

yt[j]2�wj�22 (but �wj� = 1)= 1

n

n�t=1

d�j=k+1

yt[j]2 (now yj =w�j (xt − µ))= 1

n

n�t=1

d�j=k+1(w�j (xt − µ))2

= 1n

n�t=1

d�j=k+1

w�j (xt − µ)(xt − µ)�wj

= d�j=k+1

w�j ⌃wj

Page 24: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

1n

n�t=1�xt − xt�22 = 1

n

n�t=1

d�j=k+1

yt[j]2�wj�22 (but �wj� = 1)= 1

n

n�t=1

d�j=k+1

yt[j]2 (now yj =w�j (xt − µ))= 1

n

n�t=1

d�j=k+1(w�j (xt − µ))2

= 1n

n�t=1

d�j=k+1

w�j (xt − µ)(xt − µ)�wj

= d�j=k+1

w�j ⌃wj

Page 25: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

Page 26: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

Goal: find the basis that minimizes reconstruction error,

n�t=1�xt − xt�22 = n�

t=1

������������K�

j=1yt[j]wj + µ − xt

������������2

2

= n�t=1

������������K�

j=1yt[j]wj + µ − d�

j=1yt[j]wj − µ

������������2

2

= n�t=1

������������d�

j=K+1yt[j]wj

������������2

2

(but �a + b�22 = �a�22 + �b�22 + 2a�b)

= n�t=1

��

d�j=K+1

yt[j]2�wj�22 + 2d�

j=K+1

d�i=j+1

yt[j]yt[i]w�j wi��

= n�t=1

d�j=k+1

yt[j]2�wj�22 (last step because wj ⊥wi)

MinimizedX

j=K+1

w>j ⌃wj

s.t. 8j, kwjk22 = 1 & wj ? wi<latexit sha1_base64="OMY48mNzMpj0YqslzE376szcVHk=">AAACjXicdVFdb9MwFHXC1wgfK/DIyxUVExIositgLWJoggeQENIQdJtUt5HjOp07O4lsB1Sy9Nfwi3jj3+B0nbQiuJKlo3PO/fC9aamkdRj/DsIrV69dv7F1M7p1+87d7c69+4e2qAwXQ16owhynzAolczF00ilxXBrBdKrEUXr6rtWPvgljZZF/dYtSjDWb5TKTnDlPJZ2fVDN3YnT9SeZSyx9iCc0OUFvppJ7vfXxKmskUVp40q783yXxCXVEC/SJnmm0IlEYXtWzs4mVbJisMUwrmz4CebXjPkt6kB3tAYEl3lht1gJbClJcpmXS6OMYYE0KgBWT3JfZgMOj3SB9IK/noonUcJJ1fdFrwSovcccWsHRFcunHNjJNciSailRUl46dsJkYe5kwLO65X22zgsWem4Ef3L3ewYi9n1Exbu9Cpd7Yz2r+1lvyXNqpc1h/XMi8rJ3J+3iirFLgC2tPAVBrBnVp4wLiRflbgJ8ww7vwBI7+Ei5/C/8FhLyY4Jp+fd/ffrtexhR6iR+gJImgX7aMP6AANEQ+iAAeD4FW4Hb4IX4dvzq1hsM55gDYifP8HbbLHLA==</latexit><latexit sha1_base64="OMY48mNzMpj0YqslzE376szcVHk=">AAACjXicdVFdb9MwFHXC1wgfK/DIyxUVExIositgLWJoggeQENIQdJtUt5HjOp07O4lsB1Sy9Nfwi3jj3+B0nbQiuJKlo3PO/fC9aamkdRj/DsIrV69dv7F1M7p1+87d7c69+4e2qAwXQ16owhynzAolczF00ilxXBrBdKrEUXr6rtWPvgljZZF/dYtSjDWb5TKTnDlPJZ2fVDN3YnT9SeZSyx9iCc0OUFvppJ7vfXxKmskUVp40q783yXxCXVEC/SJnmm0IlEYXtWzs4mVbJisMUwrmz4CebXjPkt6kB3tAYEl3lht1gJbClJcpmXS6OMYYE0KgBWT3JfZgMOj3SB9IK/noonUcJJ1fdFrwSovcccWsHRFcunHNjJNciSailRUl46dsJkYe5kwLO65X22zgsWem4Ef3L3ewYi9n1Exbu9Cpd7Yz2r+1lvyXNqpc1h/XMi8rJ3J+3iirFLgC2tPAVBrBnVp4wLiRflbgJ8ww7vwBI7+Ei5/C/8FhLyY4Jp+fd/ffrtexhR6iR+gJImgX7aMP6AANEQ+iAAeD4FW4Hb4IX4dvzq1hsM55gDYifP8HbbLHLA==</latexit><latexit sha1_base64="OMY48mNzMpj0YqslzE376szcVHk=">AAACjXicdVFdb9MwFHXC1wgfK/DIyxUVExIositgLWJoggeQENIQdJtUt5HjOp07O4lsB1Sy9Nfwi3jj3+B0nbQiuJKlo3PO/fC9aamkdRj/DsIrV69dv7F1M7p1+87d7c69+4e2qAwXQ16owhynzAolczF00ilxXBrBdKrEUXr6rtWPvgljZZF/dYtSjDWb5TKTnDlPJZ2fVDN3YnT9SeZSyx9iCc0OUFvppJ7vfXxKmskUVp40q783yXxCXVEC/SJnmm0IlEYXtWzs4mVbJisMUwrmz4CebXjPkt6kB3tAYEl3lht1gJbClJcpmXS6OMYYE0KgBWT3JfZgMOj3SB9IK/noonUcJJ1fdFrwSovcccWsHRFcunHNjJNciSailRUl46dsJkYe5kwLO65X22zgsWem4Ef3L3ewYi9n1Exbu9Cpd7Yz2r+1lvyXNqpc1h/XMi8rJ3J+3iirFLgC2tPAVBrBnVp4wLiRflbgJ8ww7vwBI7+Ei5/C/8FhLyY4Jp+fd/ffrtexhR6iR+gJImgX7aMP6AANEQ+iAAeD4FW4Hb4IX4dvzq1hsM55gDYifP8HbbLHLA==</latexit><latexit sha1_base64="OMY48mNzMpj0YqslzE376szcVHk=">AAACjXicdVFdb9MwFHXC1wgfK/DIyxUVExIositgLWJoggeQENIQdJtUt5HjOp07O4lsB1Sy9Nfwi3jj3+B0nbQiuJKlo3PO/fC9aamkdRj/DsIrV69dv7F1M7p1+87d7c69+4e2qAwXQ16owhynzAolczF00ilxXBrBdKrEUXr6rtWPvgljZZF/dYtSjDWb5TKTnDlPJZ2fVDN3YnT9SeZSyx9iCc0OUFvppJ7vfXxKmskUVp40q783yXxCXVEC/SJnmm0IlEYXtWzs4mVbJisMUwrmz4CebXjPkt6kB3tAYEl3lht1gJbClJcpmXS6OMYYE0KgBWT3JfZgMOj3SB9IK/noonUcJJ1fdFrwSovcccWsHRFcunHNjJNciSailRUl46dsJkYe5kwLO65X22zgsWem4Ef3L3ewYi9n1Exbu9Cpd7Yz2r+1lvyXNqpc1h/XMi8rJ3J+3iirFLgC2tPAVBrBnVp4wLiRflbgJ8ww7vwBI7+Ei5/C/8FhLyY4Jp+fd/ffrtexhR6iR+gJImgX7aMP6AANEQ+iAAeD4FW4Hb4IX4dvzq1hsM55gDYifP8HbbLHLA==</latexit>

Page 27: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n
Page 28: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

Maximize Total Spread

1

n

nX

t=1

dist2 yt,

1

n

nX

t=1

yt

!

<latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit>

Page 29: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

Maximize Total Spread

1

n

nX

t=1

dist2 yt,

1

n

nX

t=1

yt

!

<latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit>

MaximizeKX

j=1

w>j ⌃wj

s.t. 8j, kwjk22 = 1 & wj ? wi<latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit>

()

<latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit>

Page 30: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

Maximize Total Spread Minimize Reconstruction Error

1

n

nX

t=1

dist2 yt,

1

n

nX

t=1

yt

!

<latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit>

1

n

nX

t=1

kxt � xtk22<latexit sha1_base64="z1FiZF6Kq1Gyp9iwSlo+5Tlfb28=">AAACK3icbZDLSgMxFIYzXmu9VV26CRbBjWWmCLoRpG5cVrBV6LRDJs3YYCYzJGfEEud93PgqLnThBbe+h5laUKs/BH6+cw455w9TwTW47qszNT0zOzdfWigvLi2vrFbW1ts6yRRlLZqIRF2ERDPBJWsBB8EuUsVIHAp2Hl4dF/Xza6Y0T+QZDFPWjcml5BGnBCwKKg0/UoQaLzcyx77O4sDAoZf3JPZv/ZjAIIzMTR4A3sX+gID5Zhb6t0G9Vw8qVbfmjoT/Gm9sqmisZlB59PsJzWImgQqidcdzU+gaooBTwfKyn2mWEnpFLlnHWkliprtmdGuOty3p4yhR9knAI/pzwpBY62Ec2s5iVT1ZK+B/tU4G0UHXcJlmwCT9+ijKBIYEF8HhPleMghhaQ6jidldMB8SGBzbesg3Bmzz5r2nXa55b8073qkeNcRwltIm20A7y0D46QieoiVqIojv0gJ7Ri3PvPDlvzvtX65QzntlAv+R8fALndKit</latexit><latexit sha1_base64="z1FiZF6Kq1Gyp9iwSlo+5Tlfb28=">AAACK3icbZDLSgMxFIYzXmu9VV26CRbBjWWmCLoRpG5cVrBV6LRDJs3YYCYzJGfEEud93PgqLnThBbe+h5laUKs/BH6+cw455w9TwTW47qszNT0zOzdfWigvLi2vrFbW1ts6yRRlLZqIRF2ERDPBJWsBB8EuUsVIHAp2Hl4dF/Xza6Y0T+QZDFPWjcml5BGnBCwKKg0/UoQaLzcyx77O4sDAoZf3JPZv/ZjAIIzMTR4A3sX+gID5Zhb6t0G9Vw8qVbfmjoT/Gm9sqmisZlB59PsJzWImgQqidcdzU+gaooBTwfKyn2mWEnpFLlnHWkliprtmdGuOty3p4yhR9knAI/pzwpBY62Ec2s5iVT1ZK+B/tU4G0UHXcJlmwCT9+ijKBIYEF8HhPleMghhaQ6jidldMB8SGBzbesg3Bmzz5r2nXa55b8073qkeNcRwltIm20A7y0D46QieoiVqIojv0gJ7Ri3PvPDlvzvtX65QzntlAv+R8fALndKit</latexit><latexit sha1_base64="z1FiZF6Kq1Gyp9iwSlo+5Tlfb28=">AAACK3icbZDLSgMxFIYzXmu9VV26CRbBjWWmCLoRpG5cVrBV6LRDJs3YYCYzJGfEEud93PgqLnThBbe+h5laUKs/BH6+cw455w9TwTW47qszNT0zOzdfWigvLi2vrFbW1ts6yRRlLZqIRF2ERDPBJWsBB8EuUsVIHAp2Hl4dF/Xza6Y0T+QZDFPWjcml5BGnBCwKKg0/UoQaLzcyx77O4sDAoZf3JPZv/ZjAIIzMTR4A3sX+gID5Zhb6t0G9Vw8qVbfmjoT/Gm9sqmisZlB59PsJzWImgQqidcdzU+gaooBTwfKyn2mWEnpFLlnHWkliprtmdGuOty3p4yhR9knAI/pzwpBY62Ec2s5iVT1ZK+B/tU4G0UHXcJlmwCT9+ijKBIYEF8HhPleMghhaQ6jidldMB8SGBzbesg3Bmzz5r2nXa55b8073qkeNcRwltIm20A7y0D46QieoiVqIojv0gJ7Ri3PvPDlvzvtX65QzntlAv+R8fALndKit</latexit><latexit sha1_base64="z1FiZF6Kq1Gyp9iwSlo+5Tlfb28=">AAACK3icbZDLSgMxFIYzXmu9VV26CRbBjWWmCLoRpG5cVrBV6LRDJs3YYCYzJGfEEud93PgqLnThBbe+h5laUKs/BH6+cw455w9TwTW47qszNT0zOzdfWigvLi2vrFbW1ts6yRRlLZqIRF2ERDPBJWsBB8EuUsVIHAp2Hl4dF/Xza6Y0T+QZDFPWjcml5BGnBCwKKg0/UoQaLzcyx77O4sDAoZf3JPZv/ZjAIIzMTR4A3sX+gID5Zhb6t0G9Vw8qVbfmjoT/Gm9sqmisZlB59PsJzWImgQqidcdzU+gaooBTwfKyn2mWEnpFLlnHWkliprtmdGuOty3p4yhR9knAI/pzwpBY62Ec2s5iVT1ZK+B/tU4G0UHXcJlmwCT9+ijKBIYEF8HhPleMghhaQ6jidldMB8SGBzbesg3Bmzz5r2nXa55b8073qkeNcRwltIm20A7y0D46QieoiVqIojv0gJ7Ri3PvPDlvzvtX65QzntlAv+R8fALndKit</latexit>

MaximizeKX

j=1

w>j ⌃wj

s.t. 8j, kwjk22 = 1 & wj ? wi<latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit>

()

<latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit>

Page 31: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

Maximize Total Spread Minimize Reconstruction Error

1

n

nX

t=1

dist2 yt,

1

n

nX

t=1

yt

!

<latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit><latexit sha1_base64="e1/rvfAR+WciZpMt99IJzMTpcNM=">AAACQHicdVDLSgMxFM34rPVVdekmWAQFKTMi6EYQ3bisYB/QaYdMmmlDk8yQ3BHKMJ/mxk9w59qNC0XcujJ9LLTqgcDhnHvIvSdMBDfguk/O3PzC4tJyYaW4ura+sVna2q6bONWU1WgsYt0MiWGCK1YDDoI1E82IDAVrhIOrkd+4Y9rwWN3CMGFtSXqKR5wSsFJQaviRJjTz8kzl2DepDDI49/KOwr4k0Ncy69ol8s6xL1gEB8MAjvC/EetiX/NeHw6DUtmtuGPg38SbkjKaohqUHv1uTFPJFFBBjGl5bgLtjGjgVLC86KeGJYQOSI+1LFVEMtPOxgXkeN8qXRzF2j4FeKx+T2REGjOUoZ0cXWVmvZH4l9dKITprZ1wlKTBFJx9FqcAQ41GbuMs1oyCGlhCqud0V0z6x9YDtvGhL8GZP/k3qxxXPrXg3J+WLy2kdBbSL9tAB8tApukDXqIpqiKJ79Ixe0Zvz4Lw4787HZHTOmWZ20A84n1/BwLEC</latexit>

1

n

nX

t=1

kxt � xtk22<latexit sha1_base64="z1FiZF6Kq1Gyp9iwSlo+5Tlfb28=">AAACK3icbZDLSgMxFIYzXmu9VV26CRbBjWWmCLoRpG5cVrBV6LRDJs3YYCYzJGfEEud93PgqLnThBbe+h5laUKs/BH6+cw455w9TwTW47qszNT0zOzdfWigvLi2vrFbW1ts6yRRlLZqIRF2ERDPBJWsBB8EuUsVIHAp2Hl4dF/Xza6Y0T+QZDFPWjcml5BGnBCwKKg0/UoQaLzcyx77O4sDAoZf3JPZv/ZjAIIzMTR4A3sX+gID5Zhb6t0G9Vw8qVbfmjoT/Gm9sqmisZlB59PsJzWImgQqidcdzU+gaooBTwfKyn2mWEnpFLlnHWkliprtmdGuOty3p4yhR9knAI/pzwpBY62Ec2s5iVT1ZK+B/tU4G0UHXcJlmwCT9+ijKBIYEF8HhPleMghhaQ6jidldMB8SGBzbesg3Bmzz5r2nXa55b8073qkeNcRwltIm20A7y0D46QieoiVqIojv0gJ7Ri3PvPDlvzvtX65QzntlAv+R8fALndKit</latexit><latexit sha1_base64="z1FiZF6Kq1Gyp9iwSlo+5Tlfb28=">AAACK3icbZDLSgMxFIYzXmu9VV26CRbBjWWmCLoRpG5cVrBV6LRDJs3YYCYzJGfEEud93PgqLnThBbe+h5laUKs/BH6+cw455w9TwTW47qszNT0zOzdfWigvLi2vrFbW1ts6yRRlLZqIRF2ERDPBJWsBB8EuUsVIHAp2Hl4dF/Xza6Y0T+QZDFPWjcml5BGnBCwKKg0/UoQaLzcyx77O4sDAoZf3JPZv/ZjAIIzMTR4A3sX+gID5Zhb6t0G9Vw8qVbfmjoT/Gm9sqmisZlB59PsJzWImgQqidcdzU+gaooBTwfKyn2mWEnpFLlnHWkliprtmdGuOty3p4yhR9knAI/pzwpBY62Ec2s5iVT1ZK+B/tU4G0UHXcJlmwCT9+ijKBIYEF8HhPleMghhaQ6jidldMB8SGBzbesg3Bmzz5r2nXa55b8073qkeNcRwltIm20A7y0D46QieoiVqIojv0gJ7Ri3PvPDlvzvtX65QzntlAv+R8fALndKit</latexit><latexit sha1_base64="z1FiZF6Kq1Gyp9iwSlo+5Tlfb28=">AAACK3icbZDLSgMxFIYzXmu9VV26CRbBjWWmCLoRpG5cVrBV6LRDJs3YYCYzJGfEEud93PgqLnThBbe+h5laUKs/BH6+cw455w9TwTW47qszNT0zOzdfWigvLi2vrFbW1ts6yRRlLZqIRF2ERDPBJWsBB8EuUsVIHAp2Hl4dF/Xza6Y0T+QZDFPWjcml5BGnBCwKKg0/UoQaLzcyx77O4sDAoZf3JPZv/ZjAIIzMTR4A3sX+gID5Zhb6t0G9Vw8qVbfmjoT/Gm9sqmisZlB59PsJzWImgQqidcdzU+gaooBTwfKyn2mWEnpFLlnHWkliprtmdGuOty3p4yhR9knAI/pzwpBY62Ec2s5iVT1ZK+B/tU4G0UHXcJlmwCT9+ijKBIYEF8HhPleMghhaQ6jidldMB8SGBzbesg3Bmzz5r2nXa55b8073qkeNcRwltIm20A7y0D46QieoiVqIojv0gJ7Ri3PvPDlvzvtX65QzntlAv+R8fALndKit</latexit><latexit sha1_base64="z1FiZF6Kq1Gyp9iwSlo+5Tlfb28=">AAACK3icbZDLSgMxFIYzXmu9VV26CRbBjWWmCLoRpG5cVrBV6LRDJs3YYCYzJGfEEud93PgqLnThBbe+h5laUKs/BH6+cw455w9TwTW47qszNT0zOzdfWigvLi2vrFbW1ts6yRRlLZqIRF2ERDPBJWsBB8EuUsVIHAp2Hl4dF/Xza6Y0T+QZDFPWjcml5BGnBCwKKg0/UoQaLzcyx77O4sDAoZf3JPZv/ZjAIIzMTR4A3sX+gID5Zhb6t0G9Vw8qVbfmjoT/Gm9sqmisZlB59PsJzWImgQqidcdzU+gaooBTwfKyn2mWEnpFLlnHWkliprtmdGuOty3p4yhR9knAI/pzwpBY62Ec2s5iVT1ZK+B/tU4G0UHXcJlmwCT9+ijKBIYEF8HhPleMghhaQ6jidldMB8SGBzbesg3Bmzz5r2nXa55b8073qkeNcRwltIm20A7y0D46QieoiVqIojv0gJ7Ri3PvPDlvzvtX65QzntlAv+R8fALndKit</latexit>

MaximizeKX

j=1

w>j ⌃wj

s.t. 8j, kwjk22 = 1 & wj ? wi<latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit><latexit sha1_base64="i8hQjWLAcH3Nh2WjFcyda5/AvXU=">AAACi3icbVFdaxNBFJ1dv9JoNdrHvlwMFh9k2U2FFrFQFEEQoaJpC5lkmZ3cTSed2R1mZtV0u/kx/iTf/DdO0iBN64WBwznnfsy9mZbCujj+E4R37t67/6C10X74aPPxk87TZ8e2rAzHPi9laU4zZlGKAvtOOImn2iBTmcST7Pz9Qj/5jsaKsvjmZhqHik0KkQvOnKfSzq8dqpg7M6r+zH4KJS5wDg1QW6m0nh4kzegTLA1ZXv9o0umIulID/Somiq0JlLb/VbKRi+a+SF4aJiVMXwG9XPNepr1RDw4ggTndma/VAarR6OuUSDvdOIqXAbdBsgJdsoqjtPObjkteKSwcl8zaQRJrN6yZcYJLbNq0sqgZP2cTHHhYMIV2WC932cALz4zBj+5f4WDJXs+ombJ2pjLvXMxob2oL8n/aoHL5/rAWha4cFvyqUV5JcCUsDgNjYZA7OfOAcSP8rMDPmGHc+fO1/RKSm1++DY57URJHyZfX3cN3q3W0yDZ5Tl6ShOyRQ/KRHJE+4UEriIK9YD/cDHfDN+HbK2sYrHK2yFqEH/4ClnHGVQ==</latexit>

()

<latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit> ()

<latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit><latexit sha1_base64="9Ebf9Kg0015jEAhnDMZyIJ/H+xE=">AAAB/HicbVBNS8NAEN34WetXtEcvi0XwVBIR9Fj04sFDBfsBbSib7aZdutkNuxMlhPpXvHhQxKs/xJv/xm2bg7Y+GHi8N8PMvDAR3IDnfTsrq2vrG5ulrfL2zu7evntw2DIq1ZQ1qRJKd0JimOCSNYGDYJ1EMxKHgrXD8fXUbz8wbbiS95AlLIjJUPKIUwJW6ruV3q2SQ8Ei0Hw4AqK1euy7Va/mzYCXiV+QKirQ6LtfvYGiacwkUEGM6fpeAkFONHAq2KTcSw1LCB2TIetaKknMTJDPjp/gE6sMcKS0LQl4pv6eyElsTBaHtjMmMDKL3lT8z+umEF0GOZdJCkzS+aIoFRgUniaBB1wzCiKzhFDN7a2YjogmFGxeZRuCv/jyMmmd1Xyv5t+dV+tXRRwldISO0Sny0QWqoxvUQE1EUYae0St6c56cF+fd+Zi3rjjFTAX9gfP5A3bLlUk=</latexit>

MinimizedX

j=K+1

w>j ⌃wj

s.t. 8j, kwjk22 = 1 & wj ? wi<latexit sha1_base64="dkyLv/HbvJY/QSia7FBehz/XNKc=">AAACjXicbVFdaxQxFM2MVetU7VYffbl0sQjKMLNUtGKl6IOFIlTqtoXN7pDJZrbZJpmQZJR1Ovtr/EV9678xu12k23ohcDjn3I/cm2vBrUuSqyC8t3L/wcPVR9Ha4ydP11sbz45tWRnKurQUpTnNiWWCK9Z13Al2qg0jMhfsJD//MtNPfjJjeal+uIlmfUlGihecEueprPVnC0vizoysv3HFJf/NptAAtpXM6vHuweu0GQxhbsmL+leTjQfYlRrwER9JsiRgHP2rZWMXT32ZojRECBi/AXyx5L3IOoMO7EIKU7w1XaoDWDOjb1I8a7WTOJkH3AXpArTRIg6z1iUelrSSTDkqiLW9NNGuXxPjOBWsiXBlmSb0nIxYz0NFJLP9er7NBl56Zgh+dP+Ugzl7M6Mm0tqJzL1zNqO9rc3I/2m9yhXv+zVXunJM0etGRSXAlTA7DQy5YdSJiQeEGu5nBXpGDKHOHzDyS0hvf/kuOO7EaRKn37fbe58X61hFL9AmeoVS9A7toX10iLqIBlGQBDvBh3A9fBt+DD9dW8NgkfMcLUX49S8YR8b2</latexit><latexit sha1_base64="dkyLv/HbvJY/QSia7FBehz/XNKc=">AAACjXicbVFdaxQxFM2MVetU7VYffbl0sQjKMLNUtGKl6IOFIlTqtoXN7pDJZrbZJpmQZJR1Ovtr/EV9678xu12k23ohcDjn3I/cm2vBrUuSqyC8t3L/wcPVR9Ha4ydP11sbz45tWRnKurQUpTnNiWWCK9Z13Al2qg0jMhfsJD//MtNPfjJjeal+uIlmfUlGihecEueprPVnC0vizoysv3HFJf/NptAAtpXM6vHuweu0GQxhbsmL+leTjQfYlRrwER9JsiRgHP2rZWMXT32ZojRECBi/AXyx5L3IOoMO7EIKU7w1XaoDWDOjb1I8a7WTOJkH3AXpArTRIg6z1iUelrSSTDkqiLW9NNGuXxPjOBWsiXBlmSb0nIxYz0NFJLP9er7NBl56Zgh+dP+Ugzl7M6Mm0tqJzL1zNqO9rc3I/2m9yhXv+zVXunJM0etGRSXAlTA7DQy5YdSJiQeEGu5nBXpGDKHOHzDyS0hvf/kuOO7EaRKn37fbe58X61hFL9AmeoVS9A7toX10iLqIBlGQBDvBh3A9fBt+DD9dW8NgkfMcLUX49S8YR8b2</latexit><latexit sha1_base64="dkyLv/HbvJY/QSia7FBehz/XNKc=">AAACjXicbVFdaxQxFM2MVetU7VYffbl0sQjKMLNUtGKl6IOFIlTqtoXN7pDJZrbZJpmQZJR1Ovtr/EV9678xu12k23ohcDjn3I/cm2vBrUuSqyC8t3L/wcPVR9Ha4ydP11sbz45tWRnKurQUpTnNiWWCK9Z13Al2qg0jMhfsJD//MtNPfjJjeal+uIlmfUlGihecEueprPVnC0vizoysv3HFJf/NptAAtpXM6vHuweu0GQxhbsmL+leTjQfYlRrwER9JsiRgHP2rZWMXT32ZojRECBi/AXyx5L3IOoMO7EIKU7w1XaoDWDOjb1I8a7WTOJkH3AXpArTRIg6z1iUelrSSTDkqiLW9NNGuXxPjOBWsiXBlmSb0nIxYz0NFJLP9er7NBl56Zgh+dP+Ugzl7M6Mm0tqJzL1zNqO9rc3I/2m9yhXv+zVXunJM0etGRSXAlTA7DQy5YdSJiQeEGu5nBXpGDKHOHzDyS0hvf/kuOO7EaRKn37fbe58X61hFL9AmeoVS9A7toX10iLqIBlGQBDvBh3A9fBt+DD9dW8NgkfMcLUX49S8YR8b2</latexit><latexit sha1_base64="dkyLv/HbvJY/QSia7FBehz/XNKc=">AAACjXicbVFdaxQxFM2MVetU7VYffbl0sQjKMLNUtGKl6IOFIlTqtoXN7pDJZrbZJpmQZJR1Ovtr/EV9678xu12k23ohcDjn3I/cm2vBrUuSqyC8t3L/wcPVR9Ha4ydP11sbz45tWRnKurQUpTnNiWWCK9Z13Al2qg0jMhfsJD//MtNPfjJjeal+uIlmfUlGihecEueprPVnC0vizoysv3HFJf/NptAAtpXM6vHuweu0GQxhbsmL+leTjQfYlRrwER9JsiRgHP2rZWMXT32ZojRECBi/AXyx5L3IOoMO7EIKU7w1XaoDWDOjb1I8a7WTOJkH3AXpArTRIg6z1iUelrSSTDkqiLW9NNGuXxPjOBWsiXBlmSb0nIxYz0NFJLP9er7NBl56Zgh+dP+Ugzl7M6Mm0tqJzL1zNqO9rc3I/2m9yhXv+zVXunJM0etGRSXAlTA7DQy5YdSJiQeEGu5nBXpGDKHOHzDyS0hvf/kuOO7EaRKn37fbe58X61hFL9AmeoVS9A7toX10iLqIBlGQBDvBh3A9fBt+DD9dW8NgkfMcLUX49S8YR8b2</latexit>

Page 32: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

Claim

Page 33: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

Maximize Total Spread = Minimize Reconstruction Error

Claim

Page 34: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

1n

n�t=1�xt − xt�22 = 1

n

n�t=1

d�j=k+1

yt[j]2�wj�22 (but �wj� = 1)= 1

n

n�t=1

d�j=k+1

yt[j]2 (now yj =w�j (xt − µ))= 1

n

n�t=1

d�j=k+1(w�j (xt − µ))2

= 1n

n�t=1

d�j=k+1

w�j (xt − µ)(xt − µ)�wj

= d�j=k+1

w�j ⌃wj

Claim:dX

j=1

w>j ⌃wj = Constant =

1

n

nX

t=1

kxt � µk22<latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit>

Page 35: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

1n

n�t=1�xt − xt�22 = 1

n

n�t=1

d�j=k+1

yt[j]2�wj�22 (but �wj� = 1)= 1

n

n�t=1

d�j=k+1

yt[j]2 (now yj =w�j (xt − µ))= 1

n

n�t=1

d�j=k+1(w�j (xt − µ))2

= 1n

n�t=1

d�j=k+1

w�j (xt − µ)(xt − µ)�wj

= d�j=k+1

w�j ⌃wj

Claim:dX

j=1

w>j ⌃wj = Constant =

1

n

nX

t=1

kxt � µk22<latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit>

Recall that:1

n

nX

t=1

kxt � xtk22 =dX

j=K+1

w>j ⌃wj

<latexit sha1_base64="q3UkfRWSVcUTeyIiwbQdLRULGbM=">AAACenicdVHLahsxFNVM+kjdl5suS0HE9EWIGZnQOAVDaDeFbtKHk4BlDxpZYyuRNIN0p41R5iP6a931S7rpoprYfaS0FwRH59yDrs7NSiUdJMnXKF67cvXa9fUbrZu3bt+52763ceiKynIx5IUq7HHGnFDSiCFIUOK4tILpTImj7PRVox99FNbJwnyARSnGms2MzCVnEKi0/ZmCOAOr/TvBmVIY5gxe4JrmlnFPam9q6iqdehiQemIwPaehwVPNYJ7l/qyuU8Db+Pc9BXqe9iY9PMBL48ngzVawTn/1fKrTkwmFosT0vZxpdkmgtJW2O0k3SRJCCG4A2X2eBLC31++RPiaNFKqDVnWQtr/QacErLQxwxZwbkaSEsWcWJFeibtHKiZLxUzYTowAN08KN/UV0NX4UmCnOCxuOAXzB/unwTDu30FnobMZ0f2sN+S9tVEHeH3tpygqE4cuH8ioEXOBmD3gqreCgFgEwbmWYFfM5C6lD2FYTws+f4v+Dw16XJF3ydqez/3IVxzp6gDbRU0TQLtpHr9EBGiKOvkUPo8fRk+h7vBk/i7eWrXG08txHlyre+QEKKcJz</latexit><latexit sha1_base64="q3UkfRWSVcUTeyIiwbQdLRULGbM=">AAACenicdVHLahsxFNVM+kjdl5suS0HE9EWIGZnQOAVDaDeFbtKHk4BlDxpZYyuRNIN0p41R5iP6a931S7rpoprYfaS0FwRH59yDrs7NSiUdJMnXKF67cvXa9fUbrZu3bt+52763ceiKynIx5IUq7HHGnFDSiCFIUOK4tILpTImj7PRVox99FNbJwnyARSnGms2MzCVnEKi0/ZmCOAOr/TvBmVIY5gxe4JrmlnFPam9q6iqdehiQemIwPaehwVPNYJ7l/qyuU8Db+Pc9BXqe9iY9PMBL48ngzVawTn/1fKrTkwmFosT0vZxpdkmgtJW2O0k3SRJCCG4A2X2eBLC31++RPiaNFKqDVnWQtr/QacErLQxwxZwbkaSEsWcWJFeibtHKiZLxUzYTowAN08KN/UV0NX4UmCnOCxuOAXzB/unwTDu30FnobMZ0f2sN+S9tVEHeH3tpygqE4cuH8ioEXOBmD3gqreCgFgEwbmWYFfM5C6lD2FYTws+f4v+Dw16XJF3ydqez/3IVxzp6gDbRU0TQLtpHr9EBGiKOvkUPo8fRk+h7vBk/i7eWrXG08txHlyre+QEKKcJz</latexit><latexit sha1_base64="q3UkfRWSVcUTeyIiwbQdLRULGbM=">AAACenicdVHLahsxFNVM+kjdl5suS0HE9EWIGZnQOAVDaDeFbtKHk4BlDxpZYyuRNIN0p41R5iP6a931S7rpoprYfaS0FwRH59yDrs7NSiUdJMnXKF67cvXa9fUbrZu3bt+52763ceiKynIx5IUq7HHGnFDSiCFIUOK4tILpTImj7PRVox99FNbJwnyARSnGms2MzCVnEKi0/ZmCOAOr/TvBmVIY5gxe4JrmlnFPam9q6iqdehiQemIwPaehwVPNYJ7l/qyuU8Db+Pc9BXqe9iY9PMBL48ngzVawTn/1fKrTkwmFosT0vZxpdkmgtJW2O0k3SRJCCG4A2X2eBLC31++RPiaNFKqDVnWQtr/QacErLQxwxZwbkaSEsWcWJFeibtHKiZLxUzYTowAN08KN/UV0NX4UmCnOCxuOAXzB/unwTDu30FnobMZ0f2sN+S9tVEHeH3tpygqE4cuH8ioEXOBmD3gqreCgFgEwbmWYFfM5C6lD2FYTws+f4v+Dw16XJF3ydqez/3IVxzp6gDbRU0TQLtpHr9EBGiKOvkUPo8fRk+h7vBk/i7eWrXG08txHlyre+QEKKcJz</latexit><latexit sha1_base64="q3UkfRWSVcUTeyIiwbQdLRULGbM=">AAACenicdVHLahsxFNVM+kjdl5suS0HE9EWIGZnQOAVDaDeFbtKHk4BlDxpZYyuRNIN0p41R5iP6a931S7rpoprYfaS0FwRH59yDrs7NSiUdJMnXKF67cvXa9fUbrZu3bt+52763ceiKynIx5IUq7HHGnFDSiCFIUOK4tILpTImj7PRVox99FNbJwnyARSnGms2MzCVnEKi0/ZmCOAOr/TvBmVIY5gxe4JrmlnFPam9q6iqdehiQemIwPaehwVPNYJ7l/qyuU8Db+Pc9BXqe9iY9PMBL48ngzVawTn/1fKrTkwmFosT0vZxpdkmgtJW2O0k3SRJCCG4A2X2eBLC31++RPiaNFKqDVnWQtr/QacErLQxwxZwbkaSEsWcWJFeibtHKiZLxUzYTowAN08KN/UV0NX4UmCnOCxuOAXzB/unwTDu30FnobMZ0f2sN+S9tVEHeH3tpygqE4cuH8ioEXOBmD3gqreCgFgEwbmWYFfM5C6lD2FYTws+f4v+Dw16XJF3ydqez/3IVxzp6gDbRU0TQLtpHr9EBGiKOvkUPo8fRk+h7vBk/i7eWrXG08txHlyre+QEKKcJz</latexit>

Page 36: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PCA: MINIMIZING RECONSTRUCTION ERROR

1n

n�t=1�xt − xt�22 = 1

n

n�t=1

d�j=k+1

yt[j]2�wj�22 (but �wj� = 1)= 1

n

n�t=1

d�j=k+1

yt[j]2 (now yj =w�j (xt − µ))= 1

n

n�t=1

d�j=k+1(w�j (xt − µ))2

= 1n

n�t=1

d�j=k+1

w�j (xt − µ)(xt − µ)�wj

= d�j=k+1

w�j ⌃wj

Claim:dX

j=1

w>j ⌃wj = Constant =

1

n

nX

t=1

kxt � µk22<latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit><latexit sha1_base64="FkJtxwBhWY3ip9y16Wt63EzefGU=">AAACb3icdVHLbhMxFPUMrxIeDWXBoghdESHBgmgcIZoiVarIhmURpK2USUYex5O69WNk34FG09nygez4Bzb8AZ42kSiCK1k6Ouf6+N7jvFTSY5L8iOIbN2/dvrNxt3Pv/oOHm91HW4feVo6LMbfKuuOceaGkEWOUqMRx6QTTuRJH+dmo1Y++COelNZ9xWYqpZgsjC8kZBirrfktRnKPT9Ugxqd9BA6mvdFaf7tFmNodUMzzJi/prk53OUrQlpJ/kQrNrAuzB2gVG1nhkBoNRYAvHeE2b2qxtsbU1kF6cZwivg0uVXmSD2SDr9pJ+kiSUUmgB3XmbBLC7OxzQIdBWCtUjqzrIut/TueWVFga5Yt5PaFLitGYOJVei6aSVFyXjZ2whJgEapoWf1pd5NfAiMHMorAsnjHrJ/nmjZtr7pc5DZ7um/1tryX9pkwqL4bSWpqxQGH71UFEpQAtt+DCXTnBUywAYdzLMCvyEhZAwfFEnhLDeFP4PDgd9mvTpxze9/ferODbINnlOXhJKdsg++UAOyJhw8jPairajp9Gv+En8LIar1jha3XlMrlX86jewzryk</latexit>

Recall that:1

n

nX

t=1

kxt � xtk22 =dX

j=K+1

w>j ⌃wj

<latexit sha1_base64="q3UkfRWSVcUTeyIiwbQdLRULGbM=">AAACenicdVHLahsxFNVM+kjdl5suS0HE9EWIGZnQOAVDaDeFbtKHk4BlDxpZYyuRNIN0p41R5iP6a931S7rpoprYfaS0FwRH59yDrs7NSiUdJMnXKF67cvXa9fUbrZu3bt+52763ceiKynIx5IUq7HHGnFDSiCFIUOK4tILpTImj7PRVox99FNbJwnyARSnGms2MzCVnEKi0/ZmCOAOr/TvBmVIY5gxe4JrmlnFPam9q6iqdehiQemIwPaehwVPNYJ7l/qyuU8Db+Pc9BXqe9iY9PMBL48ngzVawTn/1fKrTkwmFosT0vZxpdkmgtJW2O0k3SRJCCG4A2X2eBLC31++RPiaNFKqDVnWQtr/QacErLQxwxZwbkaSEsWcWJFeibtHKiZLxUzYTowAN08KN/UV0NX4UmCnOCxuOAXzB/unwTDu30FnobMZ0f2sN+S9tVEHeH3tpygqE4cuH8ioEXOBmD3gqreCgFgEwbmWYFfM5C6lD2FYTws+f4v+Dw16XJF3ydqez/3IVxzp6gDbRU0TQLtpHr9EBGiKOvkUPo8fRk+h7vBk/i7eWrXG08txHlyre+QEKKcJz</latexit><latexit sha1_base64="q3UkfRWSVcUTeyIiwbQdLRULGbM=">AAACenicdVHLahsxFNVM+kjdl5suS0HE9EWIGZnQOAVDaDeFbtKHk4BlDxpZYyuRNIN0p41R5iP6a931S7rpoprYfaS0FwRH59yDrs7NSiUdJMnXKF67cvXa9fUbrZu3bt+52763ceiKynIx5IUq7HHGnFDSiCFIUOK4tILpTImj7PRVox99FNbJwnyARSnGms2MzCVnEKi0/ZmCOAOr/TvBmVIY5gxe4JrmlnFPam9q6iqdehiQemIwPaehwVPNYJ7l/qyuU8Db+Pc9BXqe9iY9PMBL48ngzVawTn/1fKrTkwmFosT0vZxpdkmgtJW2O0k3SRJCCG4A2X2eBLC31++RPiaNFKqDVnWQtr/QacErLQxwxZwbkaSEsWcWJFeibtHKiZLxUzYTowAN08KN/UV0NX4UmCnOCxuOAXzB/unwTDu30FnobMZ0f2sN+S9tVEHeH3tpygqE4cuH8ioEXOBmD3gqreCgFgEwbmWYFfM5C6lD2FYTws+f4v+Dw16XJF3ydqez/3IVxzp6gDbRU0TQLtpHr9EBGiKOvkUPo8fRk+h7vBk/i7eWrXG08txHlyre+QEKKcJz</latexit><latexit sha1_base64="q3UkfRWSVcUTeyIiwbQdLRULGbM=">AAACenicdVHLahsxFNVM+kjdl5suS0HE9EWIGZnQOAVDaDeFbtKHk4BlDxpZYyuRNIN0p41R5iP6a931S7rpoprYfaS0FwRH59yDrs7NSiUdJMnXKF67cvXa9fUbrZu3bt+52763ceiKynIx5IUq7HHGnFDSiCFIUOK4tILpTImj7PRVox99FNbJwnyARSnGms2MzCVnEKi0/ZmCOAOr/TvBmVIY5gxe4JrmlnFPam9q6iqdehiQemIwPaehwVPNYJ7l/qyuU8Db+Pc9BXqe9iY9PMBL48ngzVawTn/1fKrTkwmFosT0vZxpdkmgtJW2O0k3SRJCCG4A2X2eBLC31++RPiaNFKqDVnWQtr/QacErLQxwxZwbkaSEsWcWJFeibtHKiZLxUzYTowAN08KN/UV0NX4UmCnOCxuOAXzB/unwTDu30FnobMZ0f2sN+S9tVEHeH3tpygqE4cuH8ioEXOBmD3gqreCgFgEwbmWYFfM5C6lD2FYTws+f4v+Dw16XJF3ydqez/3IVxzp6gDbRU0TQLtpHr9EBGiKOvkUPo8fRk+h7vBk/i7eWrXG08txHlyre+QEKKcJz</latexit><latexit sha1_base64="q3UkfRWSVcUTeyIiwbQdLRULGbM=">AAACenicdVHLahsxFNVM+kjdl5suS0HE9EWIGZnQOAVDaDeFbtKHk4BlDxpZYyuRNIN0p41R5iP6a931S7rpoprYfaS0FwRH59yDrs7NSiUdJMnXKF67cvXa9fUbrZu3bt+52763ceiKynIx5IUq7HHGnFDSiCFIUOK4tILpTImj7PRVox99FNbJwnyARSnGms2MzCVnEKi0/ZmCOAOr/TvBmVIY5gxe4JrmlnFPam9q6iqdehiQemIwPaehwVPNYJ7l/qyuU8Db+Pc9BXqe9iY9PMBL48ngzVawTn/1fKrTkwmFosT0vZxpdkmgtJW2O0k3SRJCCG4A2X2eBLC31++RPiaNFKqDVnWQtr/QacErLQxwxZwbkaSEsWcWJFeibtHKiZLxUzYTowAN08KN/UV0NX4UmCnOCxuOAXzB/unwTDu30FnobMZ0f2sN+S9tVEHeH3tpygqE4cuH8ioEXOBmD3gqreCgFgEwbmWYFfM5C6lD2FYTws+f4v+Dw16XJF3ydqez/3IVxzp6gDbRU0TQLtpHr9EBGiKOvkUPo8fRk+h7vBk/i7eWrXG08txHlyre+QEKKcJz</latexit>

Take K = 0 so that xt = µ<latexit sha1_base64="4zaJm2waa1qbJxUUa4sNE2jILEQ=">AAACHXicdVBBSxtBFJ6NrU2j1a099jI0CJ7CjASNh4LUS6GXFIwK2SXMTt6aITO7y8zbYlj2j/TSv9KLh4r00Iv4bzqrEVTaD2b4+L73eO99SaGVQ8Zug9bKi5err9qvO2vrbzY2w7dbJy4vrYSRzHVuzxLhQKsMRqhQw1lhQZhEw2kyP2r8029gncqzY1wUEBtxnqlUSYFemoT9COECramOxRzoF/qRMupyijOBtI78X0VG4CxJq4u6nqD3I1NOwi7rMcY457QhfH+PeXJwMNjlA8oby6NLlhhOwj/RNJelgQylFs6NOSswroRFJTXUnah0UAg5F+cw9jQTBlxc3V1X022vTGmaW/8ypHfq445KGOcWJvGVzaruudeI//LGJaaDuFJZUSJk8n5QWmqK/n4fFZ0qCxL1whMhrfK7UjkTVkj0gXZ8CA+X0v+Tk90eZz3+td89/LSMo03ekw9kh3CyTw7JZzIkIyLJd/KT/CJXwY/gMrgOft+XtoJlzzvyBMHNX98Zoc4=</latexit><latexit sha1_base64="4zaJm2waa1qbJxUUa4sNE2jILEQ=">AAACHXicdVBBSxtBFJ6NrU2j1a099jI0CJ7CjASNh4LUS6GXFIwK2SXMTt6aITO7y8zbYlj2j/TSv9KLh4r00Iv4bzqrEVTaD2b4+L73eO99SaGVQ8Zug9bKi5err9qvO2vrbzY2w7dbJy4vrYSRzHVuzxLhQKsMRqhQw1lhQZhEw2kyP2r8029gncqzY1wUEBtxnqlUSYFemoT9COECramOxRzoF/qRMupyijOBtI78X0VG4CxJq4u6nqD3I1NOwi7rMcY457QhfH+PeXJwMNjlA8oby6NLlhhOwj/RNJelgQylFs6NOSswroRFJTXUnah0UAg5F+cw9jQTBlxc3V1X022vTGmaW/8ypHfq445KGOcWJvGVzaruudeI//LGJaaDuFJZUSJk8n5QWmqK/n4fFZ0qCxL1whMhrfK7UjkTVkj0gXZ8CA+X0v+Tk90eZz3+td89/LSMo03ekw9kh3CyTw7JZzIkIyLJd/KT/CJXwY/gMrgOft+XtoJlzzvyBMHNX98Zoc4=</latexit><latexit sha1_base64="4zaJm2waa1qbJxUUa4sNE2jILEQ=">AAACHXicdVBBSxtBFJ6NrU2j1a099jI0CJ7CjASNh4LUS6GXFIwK2SXMTt6aITO7y8zbYlj2j/TSv9KLh4r00Iv4bzqrEVTaD2b4+L73eO99SaGVQ8Zug9bKi5err9qvO2vrbzY2w7dbJy4vrYSRzHVuzxLhQKsMRqhQw1lhQZhEw2kyP2r8029gncqzY1wUEBtxnqlUSYFemoT9COECramOxRzoF/qRMupyijOBtI78X0VG4CxJq4u6nqD3I1NOwi7rMcY457QhfH+PeXJwMNjlA8oby6NLlhhOwj/RNJelgQylFs6NOSswroRFJTXUnah0UAg5F+cw9jQTBlxc3V1X022vTGmaW/8ypHfq445KGOcWJvGVzaruudeI//LGJaaDuFJZUSJk8n5QWmqK/n4fFZ0qCxL1whMhrfK7UjkTVkj0gXZ8CA+X0v+Tk90eZz3+td89/LSMo03ekw9kh3CyTw7JZzIkIyLJd/KT/CJXwY/gMrgOft+XtoJlzzvyBMHNX98Zoc4=</latexit><latexit sha1_base64="4zaJm2waa1qbJxUUa4sNE2jILEQ=">AAACHXicdVBBSxtBFJ6NrU2j1a099jI0CJ7CjASNh4LUS6GXFIwK2SXMTt6aITO7y8zbYlj2j/TSv9KLh4r00Iv4bzqrEVTaD2b4+L73eO99SaGVQ8Zug9bKi5err9qvO2vrbzY2w7dbJy4vrYSRzHVuzxLhQKsMRqhQw1lhQZhEw2kyP2r8029gncqzY1wUEBtxnqlUSYFemoT9COECramOxRzoF/qRMupyijOBtI78X0VG4CxJq4u6nqD3I1NOwi7rMcY457QhfH+PeXJwMNjlA8oby6NLlhhOwj/RNJelgQylFs6NOSswroRFJTXUnah0UAg5F+cw9jQTBlxc3V1X022vTGmaW/8ypHfq445KGOcWJvGVzaruudeI//LGJaaDuFJZUSJk8n5QWmqK/n4fFZ0qCxL1whMhrfK7UjkTVkj0gXZ8CA+X0v+Tk90eZz3+td89/LSMo03ekw9kh3CyTw7JZzIkIyLJd/KT/CJXwY/gMrgOft+XtoJlzzvyBMHNX98Zoc4=</latexit>

Page 37: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

MinimizedX

j=K+1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@dX

j=1

w>j ⌃wj �

KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@ 1

n

nX

t=1

kxt � µk22 �KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@�KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() MaximizeKX

j=1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

<latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit>

Maximize Total Spread = Minimize Reconstruction Error

Page 38: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

MinimizedX

j=K+1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@dX

j=1

w>j ⌃wj �

KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@ 1

n

nX

t=1

kxt � µk22 �KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@�KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() MaximizeKX

j=1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

<latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit>

Maximize Total Spread = Minimize Reconstruction Error

Page 39: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

MinimizedX

j=K+1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@dX

j=1

w>j ⌃wj �

KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@ 1

n

nX

t=1

kxt � µk22 �KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@�KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() MaximizeKX

j=1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

<latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit>

Maximize Total Spread = Minimize Reconstruction Error

Page 40: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

MinimizedX

j=K+1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@dX

j=1

w>j ⌃wj �

KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@ 1

n

nX

t=1

kxt � µk22 �KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@�KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() MaximizeKX

j=1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

<latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit>

Maximize Total Spread = Minimize Reconstruction Error

Page 41: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

MinimizedX

j=K+1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@dX

j=1

w>j ⌃wj �

KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@ 1

n

nX

t=1

kxt � µk22 �KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() Minimize

0

@�KX

j=1

w>j ⌃wj

1

A s.t. kwjk2 = 1,wj ? wk

() MaximizeKX

j=1

w>j ⌃wj s.t. kwjk2 = 1,wj ? wk

<latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit><latexit sha1_base64="V6CeD8oacDItagUgMbqD7wdBRr8=">AAAFpHic1ZTfb9MwEMe9rYFRfnXwyIuhYhoCqrhCrHuYNMEDSBtSp9Ft0txGjuu0XhMnsi+sJUv/MP4M3vhvcNoO2gHSkBDSTrJ0uu+d/TlbPj8JpQHX/ba0vFJybtxcvVW+fefuvfuVtQeHJk41Fy0eh7E+9pkRoVSiBRJCcZxowSI/FEf+4G2hH30S2shYfYRRItoR6ykZSM7Ahry1lS/rmEYM+jrKPkglI/lZ5JiaNPKy0+3d5yTvdKcJfpCd5d5ph0KcYHogexFbEPB4PKYghmB3MjWo4Zyez+v03KvjbfJisYgmQifzoQGlZYu0F6teKALQstcHpnV89ismLRI2fsD+BerL+aLdqxXRCcqz/98mnvZJA814RvJMXbwPFPQK/zx/mHtQ9BalBUWnfi37vBbMbHjpp1yd9d9BepWqW3NdlxCCC4dsvnats7XVqJMGJoVkrYpm1vQqX2k35mkkFPCQGXNC3ATaGdMgeSjyMk2NSBgfsJ44sa5ikTDtbDJkcvzURro4iLVdCvAkOl+RsciYUeTbzILSXNaK4O+0kxSCRjuTKklBKD49KEhDDDEuJhbuSi04hCPrMK6lZcW8z+xnADvXyvYSLjrFf3YO6zXi1sj+q+rOm9l1rKJH6AnaQARtoh30HjVRC/HS49K7UrO076w7e86B05qmLi/Nah6iBXM63wFqsfU3</latexit>

Maximize Total Spread = Minimize Reconstruction Error

Page 42: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PRINCIPAL COMPONENT ANALYSIS

Eigenvectors of the covariance matrix are the principalcomponents

Top K principal components are the eigenvectors with K largesteigenvalues

Projection = Data × Top Keigenvectors

Reconstruction = Projection × Transpose of top K eigenvectors

Independently discovered by Pearson in 1901 and Hotelling in1933.

⌃ =cov X !

1.

eigs= ⌃ ,K( )W2.

3. Y = W⇥X�µ

Page 43: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

RECONSTRUCTION

4. Y= ⇥bX W>+µ

Page 44: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PRINCIPAL COMPONENT ANALYSIS: DEMO

Page 45: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHEN d >> n

If d >> n then ⌃ is largeBut we only need top K eigen vectors.Idea: use SVD

X − µ = UDV�Then note that, ⌃ = (X − µ)�(X − µ) = VD2V

Hence, matrix V is the same as matrix W got from eigendecomposition of ⌃, eigenvalues are diagonal elements of D2

Alternative algorithm:

[U,V] = SVD(X − µ,K) W = V

Page 46: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHEN d >> n

If d >> n then ⌃ is largeBut we only need top K eigen vectors.Idea: use SVD

X − µ = UDV�Then note that, ⌃ = (X − µ)�(X − µ) = VD2V

Hence, matrix V is the same as matrix W got from eigendecomposition of ⌃, eigenvalues are diagonal elements of D2

Alternative algorithm:

[U,V] = SVD(X − µ,K) W = V

Page 47: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHEN d >> n

If d >> n then ⌃ is largeBut we only need top K eigen vectors.Idea: use SVD

X − µ = UDV�Then note that, ⌃ = (X − µ)�(X − µ) = VD2V

Hence, matrix V is the same as matrix W got from eigendecomposition of ⌃, eigenvalues are diagonal elements of D2

Alternative algorithm:

[U,V] = SVD(X − µ,K) W = V

Page 48: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHEN d >> n

If d >> n then ⌃ is largeBut we only need top K eigen vectors.Idea: use SVD

X − µ = UDV�Then note that, ⌃ = (X − µ)�(X − µ) = VD2V

Hence, matrix V is the same as matrix W got from eigendecomposition of ⌃, eigenvalues are diagonal elements of D2

Alternative algorithm:

[U,V] = SVD(X − µ,K) W = V

U U = ITV V = IT

Page 49: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHEN d >> n

If d >> n then ⌃ is largeBut we only need top K eigen vectors.Idea: use SVD

X − µ = UDV�Then note that, ⌃ = (X − µ)�(X − µ) = VD2V

Hence, matrix V is the same as matrix W got from eigendecomposition of ⌃, eigenvalues are diagonal elements of D2

Alternative algorithm:

[U,V] = SVD(X − µ,K) W = V

U U = ITV V = IT

Page 50: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHEN d >> n

If d >> n then ⌃ is largeBut we only need top K eigen vectors.Idea: use SVD

X − µ = UDV�Then note that, ⌃ = (X − µ)�(X − µ) = VD2V

Hence, matrix V is the same as matrix W got from eigendecomposition of ⌃, eigenvalues are diagonal elements of D2

Alternative algorithm:

[U,V] = SVD(X − µ,K) W = V

U U = ITV V = IT

Page 51: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHEN d >> n

If d >> n then ⌃ is largeBut we only need top K eigen vectors.Idea: use SVD

X − µ = UDV�Then note that, ⌃ = (X − µ)�(X − µ) = VD2V

Hence, matrix V is the same as matrix W got from eigendecomposition of ⌃, eigenvalues are diagonal elements of D2

Alternative algorithm:

[U,V] = SVD(X − µ,K) W = V

U U = ITV V = IT

Page 52: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

The Tall, THE FAT AND THE UGLY

Xn

d

Page 53: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

The Tall, THE FAT AND THE UGLY

Xn

d

X>d

n⇥ = ⌃

d

dn

Page 54: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

The Tall, THE FAT AND THE UGLY

Xn

d

K

Wd = Eigs( )⌃ ,K

X>d

n⇥ = ⌃

d

dn

Page 55: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

THE TALL, the Fat AND THE UGLY

Xn

d

Page 56: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

THE TALL, the Fat AND THE UGLY

Xn

d

SVD(X)

⇥ ⇥

d

n

nU

d VW>

Page 57: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

THE TALL, the Fat AND THE UGLY

Xn

d

SVD(X)

⇥ ⇥

d

n

nU

d VW>

K

Page 58: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

THE TALL, THE FAT AND the Ugly

d and n so large we can’t even store in memoryOnly have time to be linear in size(X) = n × d

I there any hope?

X

Page 59: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PICK A RANDOM W

Page 60: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PICK A RANDOM W

Y = X ⇥

2

666666664

+1 . . . �1�1 . . . +1+1 . . . �1

···

+1 . . . �1

3

777777775

d

K

pK

Page 61: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Consider any vector x ∈ Rd and let y =W�x. Note that

y[j]2 = ��d�

i=1W[i, j] ⋅ x[i]��

2

=�i,i ′(W[i, j] ⋅ x[i]) ⋅ �W[i ′, j] ⋅ x[i ′]�

=�i,i ′�W[i, j] ⋅W[i ′, j]� ⋅ �x[i] ⋅ x[i ′]�

Page 62: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

RANDOM PROJECTION

What does “it works” even mean?

Distances between all pairs of data-points in low dim. projectionis roughly the same as their distances in the high dim. space.

That is, when K is “large enough”, with “high probability”, for allpairs of data points i, j ∈ {1, . . . ,n},

(1 − ✏) �yi − yj�2 ≤ �xi − xj�2 ≤ (1 + ✏) �yi − yj�2

Page 63: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

RANDOM PROJECTION

What does “it works” even mean?

Distances between all pairs of data-points in low dim. projectionis roughly the same as their distances in the high dim. space.

That is, when K is “large enough”, with “high probability”, for allpairs of data points i, j ∈ {1, . . . ,n},

(1 − ✏) �yi − yj�2 ≤ �xi − xj�2 ≤ (1 + ✏) �yi − yj�2

Page 64: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

RANDOM PROJECTION

What does “it works” even mean?

Distances between all pairs of data-points in low dim. projectionis roughly the same as their distances in the high dim. space.

That is, when K is “large enough”, with “high probability”, for allpairs of data points i, j ∈ {1, . . . ,n},

(1 − ✏) �yi − yj�2 ≤ �xi − xj�2 ≤ (1 + ✏) �yi − yj�2

Page 65: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Say K = 1. Consider any vector x ∈ Rd and let y = x W. Note that

y2 = ��d�

i=1W[i,1] ⋅ x[i]��

2

= d�i=1(W[i,1] ⋅ x[i])2 + 2�

i ′>i(W[i,1] ⋅ x[i]) �W[i ′,1] ⋅ x[i ′]�

= d�i=1

W2[i,1]x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

However W2[i,1] = 1�K = 1 when K = 1

= d�i=1

x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

• Lets start with a one dimensional projection (K = 1)

• What is the expected value of:

1. yt � ys?

2. (yt � ys)2?

<latexit sha1_base64="DoOoJHWiQBgHTwtYKIInIMX1Knw=">AAACFHicdZDLSgMxFIYzXmu9VV26CRZLRTokRWy7atGNywr2Am0tmTTThmYuJBmhlPoObnwVNy4UcevCnW9jpq2gogcSfr7/HJLzO6HgSiP0YS0sLi2vrCbWkusbm1vbqZ3dugoiSVmNBiKQTYcoJrjPapprwZqhZMRzBGs4w/PYb9wwqXjgX+lRyDoe6fvc5ZRog7qp4wy2b03BUVfDnLkVLLfbyUx+RrMG5ww8us7DcjeVRjZCCGMMY4ELp8iIUqmYx0WIY8tUGsyr2k29t3sBjTzmayqIUi2MQt0ZE6k5FWySbEeKhYQOSZ+1jPSJx1RnPF1qAg8N6UE3kOb4Gk7p94kx8ZQaeY7p9IgeqN9eDP/yWpF2i50x98NIM5/OHnIjAXUA44Rgj0tGtRgZQajk5q+QDogkVJsckyaEr03h/6KetzGy8eVJunI2jyMB9sEByAIMCqACLkAV1AAFd+ABPIFn6956tF6s11nrgjWf2QM/ynr7BHYgm/M=</latexit><latexit sha1_base64="DoOoJHWiQBgHTwtYKIInIMX1Knw=">AAACFHicdZDLSgMxFIYzXmu9VV26CRZLRTokRWy7atGNywr2Am0tmTTThmYuJBmhlPoObnwVNy4UcevCnW9jpq2gogcSfr7/HJLzO6HgSiP0YS0sLi2vrCbWkusbm1vbqZ3dugoiSVmNBiKQTYcoJrjPapprwZqhZMRzBGs4w/PYb9wwqXjgX+lRyDoe6fvc5ZRog7qp4wy2b03BUVfDnLkVLLfbyUx+RrMG5ww8us7DcjeVRjZCCGMMY4ELp8iIUqmYx0WIY8tUGsyr2k29t3sBjTzmayqIUi2MQt0ZE6k5FWySbEeKhYQOSZ+1jPSJx1RnPF1qAg8N6UE3kOb4Gk7p94kx8ZQaeY7p9IgeqN9eDP/yWpF2i50x98NIM5/OHnIjAXUA44Rgj0tGtRgZQajk5q+QDogkVJsckyaEr03h/6KetzGy8eVJunI2jyMB9sEByAIMCqACLkAV1AAFd+ABPIFn6956tF6s11nrgjWf2QM/ynr7BHYgm/M=</latexit><latexit sha1_base64="DoOoJHWiQBgHTwtYKIInIMX1Knw=">AAACFHicdZDLSgMxFIYzXmu9VV26CRZLRTokRWy7atGNywr2Am0tmTTThmYuJBmhlPoObnwVNy4UcevCnW9jpq2gogcSfr7/HJLzO6HgSiP0YS0sLi2vrCbWkusbm1vbqZ3dugoiSVmNBiKQTYcoJrjPapprwZqhZMRzBGs4w/PYb9wwqXjgX+lRyDoe6fvc5ZRog7qp4wy2b03BUVfDnLkVLLfbyUx+RrMG5ww8us7DcjeVRjZCCGMMY4ELp8iIUqmYx0WIY8tUGsyr2k29t3sBjTzmayqIUi2MQt0ZE6k5FWySbEeKhYQOSZ+1jPSJx1RnPF1qAg8N6UE3kOb4Gk7p94kx8ZQaeY7p9IgeqN9eDP/yWpF2i50x98NIM5/OHnIjAXUA44Rgj0tGtRgZQajk5q+QDogkVJsckyaEr03h/6KetzGy8eVJunI2jyMB9sEByAIMCqACLkAV1AAFd+ABPIFn6956tF6s11nrgjWf2QM/ynr7BHYgm/M=</latexit><latexit sha1_base64="DoOoJHWiQBgHTwtYKIInIMX1Knw=">AAACFHicdZDLSgMxFIYzXmu9VV26CRZLRTokRWy7atGNywr2Am0tmTTThmYuJBmhlPoObnwVNy4UcevCnW9jpq2gogcSfr7/HJLzO6HgSiP0YS0sLi2vrCbWkusbm1vbqZ3dugoiSVmNBiKQTYcoJrjPapprwZqhZMRzBGs4w/PYb9wwqXjgX+lRyDoe6fvc5ZRog7qp4wy2b03BUVfDnLkVLLfbyUx+RrMG5ww8us7DcjeVRjZCCGMMY4ELp8iIUqmYx0WIY8tUGsyr2k29t3sBjTzmayqIUi2MQt0ZE6k5FWySbEeKhYQOSZ+1jPSJx1RnPF1qAg8N6UE3kOb4Gk7p94kx8ZQaeY7p9IgeqN9eDP/yWpF2i50x98NIM5/OHnIjAXUA44Rgj0tGtRgZQajk5q+QDogkVJsckyaEr03h/6KetzGy8eVJunI2jyMB9sEByAIMCqACLkAV1AAFd+ABPIFn6956tF6s11nrgjWf2QM/ynr7BHYgm/M=</latexit>

yt = x>t u where each u[i] = random± 1

<latexit sha1_base64="ir6bOzIv9gOL8FEF+OFcpR4OKWo=">AAACSnicdVBNbxMxFPSGAm34SuHI5YkIiVNkV6hND5UquPRYJNJWyi4rr/O2sbr2ruy30Gi1+XtcOPXGj+DCAYR6wduGT8FIlkYz8+znyapCe+L8Y9S7sXbz1u31jf6du/fuPxhsPjzyZe0UTlRZlO4kkx4LbXFCmgo8qRxKkxV4nJ297Pzjt+i8Lu1rWlSYGHlqda6VpCClA7lICfYgNpLmWd6ctym9iamsfip1C8vlEmLCc3KmeTdHh4BSzaH9FZnqpLtklQEn7aw0LcSVAZEOhnzEORdCQEfEzjYPZHd3vCXGIDorYMhWOEwHF/GsVLVBS6qQ3k8FryhppCOtCmz7ce2xkupMnuI0UCsN+qS5qqKFp0GZQV66cCzBlfr7RCON9wuThWS3vf/b68R/edOa8nHSaFvVhFZdP5TXBVAJXa8w0w4VFYtApHI67ApqLp1UFNrvhxJ+/BT+T462RoKPxKvnw/0XqzrW2WP2hD1jgu2wfXbADtmEKfaefWJf2NfoQ/Q5+hZdXkd70WrmEfsDvbXvDzy0qA==</latexit><latexit sha1_base64="ir6bOzIv9gOL8FEF+OFcpR4OKWo=">AAACSnicdVBNbxMxFPSGAm34SuHI5YkIiVNkV6hND5UquPRYJNJWyi4rr/O2sbr2ruy30Gi1+XtcOPXGj+DCAYR6wduGT8FIlkYz8+znyapCe+L8Y9S7sXbz1u31jf6du/fuPxhsPjzyZe0UTlRZlO4kkx4LbXFCmgo8qRxKkxV4nJ297Pzjt+i8Lu1rWlSYGHlqda6VpCClA7lICfYgNpLmWd6ctym9iamsfip1C8vlEmLCc3KmeTdHh4BSzaH9FZnqpLtklQEn7aw0LcSVAZEOhnzEORdCQEfEzjYPZHd3vCXGIDorYMhWOEwHF/GsVLVBS6qQ3k8FryhppCOtCmz7ce2xkupMnuI0UCsN+qS5qqKFp0GZQV66cCzBlfr7RCON9wuThWS3vf/b68R/edOa8nHSaFvVhFZdP5TXBVAJXa8w0w4VFYtApHI67ApqLp1UFNrvhxJ+/BT+T462RoKPxKvnw/0XqzrW2WP2hD1jgu2wfXbADtmEKfaefWJf2NfoQ/Q5+hZdXkd70WrmEfsDvbXvDzy0qA==</latexit><latexit sha1_base64="ir6bOzIv9gOL8FEF+OFcpR4OKWo=">AAACSnicdVBNbxMxFPSGAm34SuHI5YkIiVNkV6hND5UquPRYJNJWyi4rr/O2sbr2ruy30Gi1+XtcOPXGj+DCAYR6wduGT8FIlkYz8+znyapCe+L8Y9S7sXbz1u31jf6du/fuPxhsPjzyZe0UTlRZlO4kkx4LbXFCmgo8qRxKkxV4nJ297Pzjt+i8Lu1rWlSYGHlqda6VpCClA7lICfYgNpLmWd6ctym9iamsfip1C8vlEmLCc3KmeTdHh4BSzaH9FZnqpLtklQEn7aw0LcSVAZEOhnzEORdCQEfEzjYPZHd3vCXGIDorYMhWOEwHF/GsVLVBS6qQ3k8FryhppCOtCmz7ce2xkupMnuI0UCsN+qS5qqKFp0GZQV66cCzBlfr7RCON9wuThWS3vf/b68R/edOa8nHSaFvVhFZdP5TXBVAJXa8w0w4VFYtApHI67ApqLp1UFNrvhxJ+/BT+T462RoKPxKvnw/0XqzrW2WP2hD1jgu2wfXbADtmEKfaefWJf2NfoQ/Q5+hZdXkd70WrmEfsDvbXvDzy0qA==</latexit><latexit sha1_base64="ir6bOzIv9gOL8FEF+OFcpR4OKWo=">AAACSnicdVBNbxMxFPSGAm34SuHI5YkIiVNkV6hND5UquPRYJNJWyi4rr/O2sbr2ruy30Gi1+XtcOPXGj+DCAYR6wduGT8FIlkYz8+znyapCe+L8Y9S7sXbz1u31jf6du/fuPxhsPjzyZe0UTlRZlO4kkx4LbXFCmgo8qRxKkxV4nJ297Pzjt+i8Lu1rWlSYGHlqda6VpCClA7lICfYgNpLmWd6ctym9iamsfip1C8vlEmLCc3KmeTdHh4BSzaH9FZnqpLtklQEn7aw0LcSVAZEOhnzEORdCQEfEzjYPZHd3vCXGIDorYMhWOEwHF/GsVLVBS6qQ3k8FryhppCOtCmz7ce2xkupMnuI0UCsN+qS5qqKFp0GZQV66cCzBlfr7RCON9wuThWS3vf/b68R/edOa8nHSaFvVhFZdP5TXBVAJXa8w0w4VFYtApHI67ApqLp1UFNrvhxJ+/BT+T462RoKPxKvnw/0XqzrW2WP2hD1jgu2wfXbADtmEKfaefWJf2NfoQ/Q5+hZdXkd70WrmEfsDvbXvDzy0qA==</latexit>

Page 66: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Hence,

E��y�2� = �x�22If we let x = xs − xt then

y = xW = xsW − xtW = ys − yt

Hence for any s, t ∈ {1, . . . ,n},E��ys − yt�2� = �xs − xt�22

Lets try this in Matlab . . .…

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Say K = 1. Consider any vector x ∈ Rd and let y = x W. Note that

y2 = ��d�

i=1W[i,1] ⋅ x[i]��

2

= d�i=1(W[i,1] ⋅ x[i])2 + 2�

i ′>i(W[i,1] ⋅ x[i]) �W[i ′,1] ⋅ x[i ′]�

= d�i=1

W2[i,1]x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

However W2[i,1] = 1�K = 1 when K = 1

= d�i=1

x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

Page 67: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Hence,

E��y�2� = �x�22If we let x = xs − xt then

y = xW = xsW − xtW = ys − yt

Hence for any s, t ∈ {1, . . . ,n},E��ys − yt�2� = �xs − xt�22

Lets try this in Matlab . . .…

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Say K = 1. Consider any vector x ∈ Rd and let y = x W. Note that

y2 = ��d�

i=1W[i,1] ⋅ x[i]��

2

= d�i=1(W[i,1] ⋅ x[i])2 + 2�

i ′>i(W[i,1] ⋅ x[i]) �W[i ′,1] ⋅ x[i ′]�

= d�i=1

W2[i,1]x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

However W2[i,1] = 1�K = 1 when K = 1

= d�i=1

x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

Law of large numbers says that average over multiple draws is close to expectation

Page 68: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Say K = 1. Consider any vector x ∈ Rd and let y = x W. Note that

y2 = ��d�

i=1W[i,1] ⋅ x[i]��

2

= d�i=1(W[i,1] ⋅ x[i])2 + 2�

i ′>i(W[i,1] ⋅ x[i]) �W[i ′,1] ⋅ x[i ′]�

= d�i=1

W2[i,1]x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

However W2[i,1] = 1�K = 1 when K = 1

= d�i=1

x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

• Like repeating the experiment K times and averaging

yt[k] = x>t uk/

pK where each uk[i] = random± 1

<latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit>

Page 69: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Say K = 1. Consider any vector x ∈ Rd and let y = x W. Note that

y2 = ��d�

i=1W[i,1] ⋅ x[i]��

2

= d�i=1(W[i,1] ⋅ x[i])2 + 2�

i ′>i(W[i,1] ⋅ x[i]) �W[i ′,1] ⋅ x[i ′]�

= d�i=1

W2[i,1]x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

However W2[i,1] = 1�K = 1 when K = 1

= d�i=1

x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

• Like repeating the experiment K times and averaging

yt[k] = x>t uk/

pK where each uk[i] = random± 1

<latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit>

(ys[k]� yt[k])2 =

�x>t uk � x>

s uk

�2/K

<latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit>

Page 70: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Say K = 1. Consider any vector x ∈ Rd and let y = x W. Note that

y2 = ��d�

i=1W[i,1] ⋅ x[i]��

2

= d�i=1(W[i,1] ⋅ x[i])2 + 2�

i ′>i(W[i,1] ⋅ x[i]) �W[i ′,1] ⋅ x[i ′]�

= d�i=1

W2[i,1]x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

However W2[i,1] = 1�K = 1 when K = 1

= d�i=1

x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

• Like repeating the experiment K times and averaging

yt[k] = x>t uk/

pK where each uk[i] = random± 1

<latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit>

(ys[k]� yt[k])2 =

�x>t uk � x>

s uk

�2/K

<latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit>

kyt � ysk22 =KX

k=1

(ys[k]� yt[k])2 =

1

K

KX

k=1

�x>t uk � x>

s uk

�2

<latexit sha1_base64="3YP9ejGsZib0Dby1VX1W49YzWpo=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYGeXpRzmUVwt6tCOkwA/wWsMOObxpIcHmGkZw6X5zEkTBnlxaS7rMFnLdQb7u4EZNZuDu/DpkLEWO1+vtFnYsvOwtyxsyzSskgGtJ0P8z93GhouK1tWw3sj/v1OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H6BFBW0=</latexit><latexit sha1_base64="3YP9ejGsZib0Dby1VX1W49YzWpo=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYGeXpRzmUVwt6tCOkwA/wWsMOObxpIcHmGkZw6X5zEkTBnlxaS7rMFnLdQb7u4EZNZuDu/DpkLEWO1+vtFnYsvOwtyxsyzSskgGtJ0P8z93GhouK1tWw3sj/v1OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H6BFBW0=</latexit><latexit sha1_base64="3YP9ejGsZib0Dby1VX1W49YzWpo=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYGeXpRzmUVwt6tCOkwA/wWsMOObxpIcHmGkZw6X5zEkTBnlxaS7rMFnLdQb7u4EZNZuDu/DpkLEWO1+vtFnYsvOwtyxsyzSskgGtJ0P8z93GhouK1tWw3sj/v1OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H6BFBW0=</latexit><latexit sha1_base64="3YP9ejGsZib0Dby1VX1W49YzWpo=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYGeXpRzmUVwt6tCOkwA/wWsMOObxpIcHmGkZw6X5zEkTBnlxaS7rMFnLdQb7u4EZNZuDu/DpkLEWO1+vtFnYsvOwtyxsyzSskgGtJ0P8z93GhouK1tWw3sj/v1OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H6BFBW0=</latexit>

Page 71: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

Say K = 1. Consider any vector x ∈ Rd and let y = x W. Note that

y2 = ��d�

i=1W[i,1] ⋅ x[i]��

2

= d�i=1(W[i,1] ⋅ x[i])2 + 2�

i ′>i(W[i,1] ⋅ x[i]) �W[i ′,1] ⋅ x[i ′]�

= d�i=1

W2[i,1]x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

However W2[i,1] = 1�K = 1 when K = 1

= d�i=1

x2[i] +�i ′>i�W[i,1] ⋅W[i ′,1]� ⋅ �x[i] ⋅ x[i ′]�

• Like repeating the experiment K times and averaging

yt[k] = x>t uk/

pK where each uk[i] = random± 1

<latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit><latexit sha1_base64="piWkdkN7kX4Tj0AJldD+oTvTVHQ=">AAACY3icdVFBa9RAFJ5ErXWrNq3eRHi4CJ7WTCl2exCKXgQvFdy2kMQwmX3pDptJ0pkX7RqyP9KbNy/+DyfbrVTRBwMf3/e9N2++yepCWQrD755/6/adjbub9wZb9x883A52dk9s1RiJE1kVlTnLhMVClTghRQWe1QaFzgo8zeZve/30MxqrqvIjLWpMtDgvVa6kIEelwddYC5plebvoUormCbyGa+bSMZ9iqurfTNOlc3gZ2wtD7ftuuVxCTHhJRrdfZmgQUMgZdDfdkVpNXLvAiHJa6Q7iWgNPg2E4CsOQcw494AevQgcOD8d7fAy8l1wN2bqO0+BbPK1ko7EkWQhrIx7WlLTCkJIFdoO4sVgLORfnGDlYCo02aVcZdfDcMVPIK+NOSbBib3a0Qlu70Jlz9vvbv7We/JcWNZSPk1aVdUNYyquL8qYAqqAPHKbKoKRi4YCQRrldQc6EEZLctwxcCNcvhf+Dk70RD0f8w/7w6M06jk32hD1jLxhnB+yIvWPHbMIk++FteNte4P30t/xd//GV1ffWPY/YH+U//QVm1bm9</latexit>

(ys[k]� yt[k])2 =

�x>t uk � x>

s uk

�2/K

<latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit><latexit sha1_base64="Y6QD7CTG3GWQ+5SlaXNAjAosE5o=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYJelHOZRXC3q0I6TAD/Baww45vGkhweYaRnDpfnMSRMGeXFpLuswWct1Bvu7gRk1m4O78OmQsdYOO18vtVnZsvOwt6xsyzSskgGtJ0P8z+3GhouK1tWw3sj/v2OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H7xCBW0=</latexit>

kyt � ysk22 =KX

k=1

(ys[k]� yt[k])2 =

1

K

KX

k=1

�x>t uk � x>

s uk

�2

<latexit sha1_base64="3YP9ejGsZib0Dby1VX1W49YzWpo=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYGeXpRzmUVwt6tCOkwA/wWsMOObxpIcHmGkZw6X5zEkTBnlxaS7rMFnLdQb7u4EZNZuDu/DpkLEWO1+vtFnYsvOwtyxsyzSskgGtJ0P8z93GhouK1tWw3sj/v1OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H6BFBW0=</latexit><latexit sha1_base64="3YP9ejGsZib0Dby1VX1W49YzWpo=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYGeXpRzmUVwt6tCOkwA/wWsMOObxpIcHmGkZw6X5zEkTBnlxaS7rMFnLdQb7u4EZNZuDu/DpkLEWO1+vtFnYsvOwtyxsyzSskgGtJ0P8z93GhouK1tWw3sj/v1OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H6BFBW0=</latexit><latexit sha1_base64="3YP9ejGsZib0Dby1VX1W49YzWpo=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYGeXpRzmUVwt6tCOkwA/wWsMOObxpIcHmGkZw6X5zEkTBnlxaS7rMFnLdQb7u4EZNZuDu/DpkLEWO1+vtFnYsvOwtyxsyzSskgGtJ0P8z93GhouK1tWw3sj/v1OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H6BFBW0=</latexit><latexit sha1_base64="3YP9ejGsZib0Dby1VX1W49YzWpo=">AAADJ3icrVLLbhMxFPUMrxJeKSzZWESVyoJgR4imi0gVbJCyKRJpK8WTkcfxJNZ4HrLvVI2m8zds+BU2SIAQLPkTPE2oEh4SC65k6eicc30fdlRoZYGQb55/5eq16ze2brZu3b5z9157+/6RzUsj5EjkOjcnEbdSq0yOQIGWJ4WRPI20PI6Sl41+fCqNVXn2BhaFDFI+y1SsBAdHhdveYGeXpRzmUVwt6tCOkwA/wWsMOObxpIcHmGkZw6X5zEkTBnlxaS7rMFnLdQb7u4EZNZuDu/DpkLEWO1+vtFnYsvOwtyxsyzSskgGtJ0P8z93GhouK1tWw3sj/v1OE7Q7pEkIopbgBdO85cWB/v9+jfUwbyUUHreIwbH9k01yUqcxAaG7tmJICgoobUELLusVKKwsuEj6TYwcznkobVBfvXOMdx0xxnBt3MsAX7HpGxVNrF2nknE2z9letIf+kjUuI+0GlsqIEmYllobjUGHLcfBo8VUYK0AsHuDDK9YrFnLsFg/taLbeEn5Piv4OjXpeSLn39rHPwYrWOLfQQPUK7iKI9dIBeoUM0QsJ76733Pnmf/Xf+B/+L/3Vp9b1VzgO0Ef73H6BFBW0=</latexit>

This is an average over K trials

Page 72: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PICK A RANDOM W

Page 73: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

PICK A RANDOM W

Y = X ⇥

2

666666664

+1 . . . �1�1 . . . +1+1 . . . �1

···

+1 . . . �1

3

777777775

d

K

pK

Page 74: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

For large K, not only true in expectation but also with high probability

For any ✏ > 0, if K ≈ log (n��) �✏2, with probability 1 − � over draw ofW, for all pairs of data points i, j ∈ {1, . . . ,n},

(1 − ✏) �yi − yj�2 ≤ �xi − xj�2 ≤ (1 + ✏) �yi − yj�2

Lets try on Matlab . . .

This is called the Johnson-Lindenstrauss lemma or JL lemma for short.

2 2

Page 75: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

For large K, not only true in expectation but also with high probability

For any ✏ > 0, if K ≈ log (n��) �✏2, with probability 1 − � over draw ofW, for all pairs of data points i, j ∈ {1, . . . ,n},

(1 − ✏) �yi − yj�2 ≤ �xi − xj�2 ≤ (1 + ✏) �yi − yj�2

Lets try on Matlab . . .

This is called the Johnson-Lindenstrauss lemma or JL lemma for short.

2 2

Page 76: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY SHOULD RANDOM PROJECTIONS EVEN WORK?!

For large K, not only true in expectation but also with high probability

For any ✏ > 0, if K ≈ log (n��) �✏2, with probability 1 − � over draw ofW, for all pairs of data points i, j ∈ {1, . . . ,n},

(1 − ✏) �yi − yj�2 ≤ �xi − xj�2 ≤ (1 + ✏) �yi − yj�2

Lets try on Matlab . . .

This is called the Johnson-Lindenstrauss lemma or JL lemma for short.

2 2

Page 77: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY IS THIS SO RIDICULOUSLY MAGICAL?

n= 1000

d = 1000

Page 78: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY IS THIS SO RIDICULOUSLY MAGICAL?

n= 1000

d = 1000

If we take K = 69.1/✏2, with probability

0.99 distances are preserved to accuracy ✏

Page 79: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY IS THIS SO RIDICULOUSLY MAGICAL?

n= 1000

d = 10000

If we take K = 69.1/✏2, with probability

0.99 distances are preserved to accuracy ✏

Page 80: Machine Learning for Data Science (CS4786) …PCA: VARIANCE MAXIMIZATION Pick directions along which data varies the most First principal component: w 1 = arg max w∶ w 2 =1 1 n n

WHY IS THIS SO RIDICULOUSLY MAGICAL?

n= 1000

d = 1000000

If we take K = 69.1/✏2, with probability

0.99 distances are preserved to accuracy ✏