
Machine Learning: Unsupervised Learning

Konstantin Tretyakov (http://kt.era.ee)

AACIMP 2012, August 2012


So far…

• Optimization: Fermat's theorem, gradient methods, batch & on-line learning
• Probability theory: MLE, MAP, Bayesian estimation, risk optimization
• Reasoning by analogy: k-NN, kernel methods
• Methods covered: k-NN, decision tree learning, linear regression (OLS, RR), linear classification (SVM, Perceptron), Bayes & naïve Bayes, kernel-<xxx>

Today

Unsupervised Learning


Data Mining

Raw data → patterns, deviations

Today

Clustering, Decomposition

Quiz

Why would one need clustering?

Hierarchical clustering

Linkage strategies for deciding which clusters to merge (see the SciPy sketch below):

• Complete linkage
• Single linkage
• Average linkage
• Ward linkage (merge the pair that increases the within-cluster variance σ² the least)
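For reference, the same linkage strategies are available in SciPy. A minimal sketch, assuming a small toy dataset X and three clusters (not part of the original slides):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)                         # toy data: 20 points in 2D
Z = linkage(X, method='ward')                     # also: 'single', 'complete', 'average'
labels = fcluster(Z, t=3, criterion='maxclust')   # cut the dendrogram into 3 clusters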

Partitional vs Hierarchical

(Slide © M. Kull, http://courses.cs.ut.ee/2012/ml/)

K-means

Objective:

$\operatorname*{argmin}_{\mathbf{c}_1,\dots,\mathbf{c}_K} \sum_i \left\| \mathbf{x}_i - \mathbf{c}_{\mathrm{closest\_to}(i)} \right\|^2$

• Need to find cluster centers $\mathbf{c}_k$: $\mathbf{c}_1 = ?,\ \mathbf{c}_2 = ?,\ \dots,\ \mathbf{c}_K = ?$
• Introduce latent variables (one for each $\mathbf{x}_i$): $a_i = \mathrm{closest\_cluster\_center}(i)$, with $a_1 = ?,\ a_2 = ?,\ \dots,\ a_n = ?$
• For fixed $\mathbf{c}_k$ we can find the optimal $a_i$.
• For fixed $a_i$ we can find the optimal $\mathbf{c}_k$.
• Iterate to convergence (a code sketch of this alternating scheme follows below).
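A minimal NumPy sketch of this alternating scheme, assuming a data matrix X, a cluster count K, and a simple random initialization; a sketch for illustration, not the lecture's reference implementation:

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.RandomState(seed)
    # Initialize centers as K randomly chosen data points
    c = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        # Step 1: for fixed centers c_k, assign each x_i to its closest center
        d = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        a = d.argmin(axis=1)
        # Step 2: for fixed assignments a_i, recompute each center as the cluster mean
        new_c = np.array([X[a == k].mean(axis=0) if np.any(a == k) else c[k]
                          for k in range(K)])
        if np.allclose(new_c, c):     # converged
            break
        c = new_c
    return c, a

# usage: centers, labels = kmeans(X, K=3)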

Fuzzy vs Hard

(Slide © M. Kull, http://courses.cs.ut.ee/2012/ml/)

Gaussian Mixture Modeling

$\mathbf{X} \sim N(\boldsymbol{\mu}_1, \sigma_1^2) \ \text{or} \ N(\boldsymbol{\mu}_2, \sigma_2^2)$

Given $\mathbf{X}$, estimate $\boldsymbol{\mu}_i, \sigma_i^2$.

MLE → Expectation-Maximization (EM)
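As a sketch, fitting such a two-component mixture by EM with scikit-learn; in current versions the class is sklearn.mixture.GaussianMixture, and the toy 1-D data below is made up for the example:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(-2, 1.0, 300),
                    rng.normal(3, 0.5, 300)]).reshape(-1, 1)   # toy 1-D data

gmm = GaussianMixture(n_components=2).fit(X)        # EM under the hood
print(gmm.means_.ravel(), gmm.covariances_.ravel()) # estimates of mu_i, sigma_i^2
labels = gmm.predict(X)                             # hard cluster assignment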

SKLearn's Clustering

from sklearn.cluster import Ward, KMeans, DBSCAN, MeanShift, SpectralClustering, AffinityPropagation

Some of these estimators work on feature vectors, others on a precomputed distance (affinity) matrix.
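A minimal usage sketch with one of these estimators (toy feature vectors assumed for illustration; the other classes follow the same fit / labels_ pattern):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)            # toy feature vectors
km = KMeans(n_clusters=3).fit(X)      # fit on feature vectors
print(km.cluster_centers_)            # learned cluster centers
print(km.labels_[:10])                # cluster index of each point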

Quiz

• Fuzzy clustering means that _______
• K-means finds a set of cluster centers which have the smallest ______________
• K-means can get stuck in a local minimum (Y/N)?

Today

Clustering, Decomposition

Canonical basis

$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \end{pmatrix}$

Alternative basis

$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \alpha \begin{pmatrix} 0.9 \\ -0.1 \end{pmatrix} + \beta \begin{pmatrix} 0.1 \\ 0.9 \end{pmatrix}$

Alternative basis

$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \alpha \begin{pmatrix} 0.4 \\ -0.6 \end{pmatrix} + \beta \begin{pmatrix} 0.6 \\ 0.4 \end{pmatrix}$

Linear Decomposition

In the canonical basis:

$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_{100000} \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \dots + \alpha_m \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$

In an alternative basis:

$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_{100000} \end{pmatrix} = \alpha_1 \begin{pmatrix} 0.0 \\ 0.1 \\ 0.1 \\ 0.2 \\ \vdots \\ 0.0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 0.3 \\ 0.2 \\ 0.2 \\ 0.1 \\ \vdots \\ 0.3 \end{pmatrix} + \dots + \alpha_m \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.0 \\ \vdots \\ 0.0 \end{pmatrix}$

Linear Decomposition

$\mathbf{x} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \dots + \alpha_m \mathbf{v}_m$

In matrix form, with the basis vectors as the columns of $\mathbf{V}$:

$\mathbf{x} = \mathbf{V}\boldsymbol{\alpha}, \qquad \boldsymbol{\alpha} = \mathbf{V}^{+}\mathbf{x}$

where $\mathbf{V}^{+}$ is the pseudoinverse of $\mathbf{V}$.
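A small NumPy illustration of recovering the coefficients via the pseudoinverse; the basis vectors and the point below are made up for the example:

import numpy as np

V = np.array([[0.9, 0.1],          # columns are the basis vectors v_1, v_2
              [-0.1, 0.9]])
x = np.array([1.0, 2.0])

alpha = np.linalg.pinv(V).dot(x)     # alpha = V^+ x
print(np.allclose(V.dot(alpha), x))  # reconstruction: x = V alpha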

How do we find a good basis?


Linear projection

Idea: Maximize projection variance

For a point $\mathbf{x}_i$ and a unit basis vector $\mathbf{v}$, the length of the projection of $\mathbf{x}_i$ onto $\mathbf{v}$ is

$p = \langle \mathbf{v}, \mathbf{x}_i \rangle = \mathbf{v}^T \mathbf{x}_i$

Projection variance

$p_i = \mathbf{v}^T \mathbf{x}_i$

$\sigma_{\mathbf{v}}^2 = \frac{1}{n} \sum_i (p_i - \bar{p})^2$

We look for $\mathbf{v} = \operatorname*{argmax}_{\mathbf{v}} \sigma_{\mathbf{v}}^2$.

Pre-center the data, so that $\bar{p} = \mathbf{v}^T \bar{\mathbf{x}} = 0$. Then

$\sigma_{\mathbf{v}}^2 = \frac{1}{n} \sum_i p_i^2 = \frac{1}{n} \|\mathbf{p}\|^2 = \frac{1}{n} \|\mathbf{X}\mathbf{v}\|^2 = \frac{1}{n} (\mathbf{X}\mathbf{v})^T (\mathbf{X}\mathbf{v}) = \frac{1}{n} \mathbf{v}^T \mathbf{X}^T \mathbf{X} \mathbf{v} = \mathbf{v}^T \mathbf{\Sigma} \mathbf{v}$

where $\mathbf{\Sigma} = \frac{1}{n} \mathbf{X}^T \mathbf{X}$ is the data covariance matrix.

Objective function

$\operatorname*{argmax}_{\mathbf{v}} \mathbf{v}^T \mathbf{\Sigma} \mathbf{v} \quad \text{s.t.} \quad \|\mathbf{v}\|^2 = 1$

Optimization

$\operatorname*{argmax}_{\mathbf{v}} \mathbf{v}^T \mathbf{\Sigma} \mathbf{v} \quad \text{s.t.} \quad \|\mathbf{v}\|^2 = 1$

Method of Lagrange multipliers: setting the gradient of $\mathbf{v}^T \mathbf{\Sigma} \mathbf{v} - \lambda(\mathbf{v}^T \mathbf{v} - 1)$ to zero gives

$\mathbf{\Sigma} \mathbf{v} = \lambda \mathbf{v}$

i.e. $\mathbf{v}$ is an eigenvector of $\mathbf{\Sigma}$ and $\lambda$ the corresponding eigenvalue.

Example

from numpy import mean
from numpy.linalg import eigh

n = X.shape[0]                  # number of data points
Xc = X - mean(X, axis=0)        # center the data
Sigma = Xc.T.dot(Xc) / n        # covariance matrix (= cov(Xc, rowvar=0))
lambdas, vs = eigh(Sigma)       # eigenvalues and eigenvectors of Sigma

Example

$\lambda_1 = 0.8, \qquad \lambda_2 = 28.9$

$\sigma_i^2 = \mathbf{v}_i^T \mathbf{\Sigma} \mathbf{v}_i = \mathbf{v}_i^T \lambda_i \mathbf{v}_i = \lambda_i \|\mathbf{v}_i\|^2 = \lambda_i$

Principal Components Analysis

Principal components are the eigenvectors of the covariance matrix:

$\mathbf{V}, \boldsymbol{\lambda} = \operatorname{eig}(\mathbf{\Sigma})$

For each PC, the corresponding eigenvalue $\lambda_i$ shows the amount of variance explained by the component (the eigenvalue spectrum of $\mathbf{\Sigma}$).

Principal Components Analysis

Data projection onto PC $i$: $\mathbf{p} = \mathbf{X}\mathbf{v}_i$

Data projection onto multiple PCs: $\mathbf{X}_{\text{proj}} = \mathbf{X}\mathbf{V}_*$

Data reconstruction from PC coordinates: $\mathbf{X}_{\text{proj}}\mathbf{V}_*^T = \mathbf{X}$ (exact when all PCs are kept, approximate otherwise)
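Continuing the NumPy example above (Xc, vs), a sketch of projecting onto the top components and reconstructing; note that eigh returns eigenvalues in ascending order, so the last columns of vs are the top PCs:

V_top = vs[:, -2:]            # top-2 principal components as columns
X_proj = Xc.dot(V_top)        # project the centered data onto them
X_rec = X_proj.dot(V_top.T)   # reconstruct; exact only if all PCs are kept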

SKLearn's PCA

from sklearn.decomposition import PCA

model = PCA(n_components=2)     # keep the top 2 principal components
model.fit(X)
X_t = model.transform(X)        # project the data onto them
model.components_[1,:]          # the second principal component
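A possible follow-up, not shown on the slides: the fitted PCA object also exposes the normalized eigenvalue spectrum, which helps choose n_components:

model = PCA().fit(X)                      # keep all components
print(model.explained_variance_ratio_)    # fraction of variance explained by each PC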

SKLearn's PCA

from sklearn.decomposition import PCA, SparsePCA, ProbabilisticPCA, KernelPCA, FastICA, NMF, DictionaryLearning, ...

PCA: Geometric intuition

$\mathbf{X} \sim N(0, 1)$

$\mathbf{X}' = \mathbf{X} \begin{pmatrix} 5 & 0 \\ 0 & 0.9 \end{pmatrix}$

$\mathbf{X}'' = \mathbf{X}' \begin{pmatrix} \cos 0.8 & -\sin 0.8 \\ \sin 0.8 & \cos 0.8 \end{pmatrix}$

PCA: Geometric intuition

$\mathbf{X} \sim N(0, 1), \qquad \mathbf{X}'' = \mathbf{X} \cdot \mathbf{D} \cdot \mathbf{R}$

$(\mathbf{X}'')^T \mathbf{X}'' = (\mathbf{X}\mathbf{D}\mathbf{R})^T (\mathbf{X}\mathbf{D}\mathbf{R}) = \mathbf{R}^T \mathbf{D}^T \mathbf{X}^T \mathbf{X} \mathbf{D} \mathbf{R} = \mathbf{R}^T \mathbf{D}^2 \mathbf{R}$

(the last step uses $\mathbf{X}^T\mathbf{X} \approx \mathbf{I}$, up to a factor of $n$, since $\mathbf{X}$ has identity covariance)

$\mathbf{\Sigma} = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^T \qquad (\mathbf{V} = \mathbf{R}^T, \ \mathbf{\Lambda} = \mathbf{D}^2)$

$\mathbf{X}'' \mathbf{R}^T = \mathbf{X}\mathbf{D}, \qquad \text{i.e.} \quad \mathbf{X}'' \mathbf{V} = \mathbf{X}_{\text{proj}}$
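A small numerical check of this construction; a sketch, with the scaling and rotation values taken from the slides:

import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(10000, 2))                    # X ~ N(0, 1)
D = np.diag([5.0, 0.9])                            # scaling
t = 0.8
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])            # rotation
X2 = X.dot(D).dot(R)                               # X'' = X D R

Sigma = X2.T.dot(X2) / len(X2)                     # sample covariance of X''
lambdas, V = np.linalg.eigh(Sigma)                 # eigen-decomposition
print(np.sqrt(lambdas))                            # approx [0.9, 5.0] -- the diagonal of D
X_proj = X2.dot(V)                                 # PCA projection of X''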

Quiz

• Principal components are ___________ of the ___________ matrix.
• The eigenvalue spectrum shows how much ___________ is explained by each ______.
• If $\mathbf{\Sigma} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^T$, then $\mathbf{X}_{\text{proj}}$ = _______

Conclusion

• Statistical Learning Theory, PAC-theory
• Semi-supervised learning, Active learning, Reinforcement learning, Multi-instance learning, Deep learning
• Graphical models, Neural networks, Fuzzy sets, Association rules
• Computer vision, Natural language processing, Information Retrieval, Music & Video processing, Bioinformatics, Physics, Robotics, Finance & Economics, …

Where to go next

Books:
• "The Elements of Statistical Learning" (Hastie, Tibshirani & Friedman)
• "Pattern Recognition and Machine Learning" (Bishop)
• "Kernel Methods for Pattern Analysis" (Shawe-Taylor & Cristianini)

On-line materials:
• http://videolectures.net
• Coursera, Udacity, edX

Tools:
• Python, R, RapidMiner, Weka, Matlab, Mathematica, …

Where to go next

http://kt.era.ee/aacimp



Science

Observation → Analysis → Generalization

Why?

Contemporary Science

Observation → Analysis → Generalization — with CS & EE and ML taking over ever larger parts of this pipeline.

Thank You!
