machine learning: unsupervised learningkt.era.ee/lectures/aacimp2012/slides5.pdf · today aacimp...
TRANSCRIPT
Machine Learning:
Unsupervised Learning
Konstantin Tretyakov http://kt.era.ee
AACIMP 2012
August 2012
So far…
AACIMP Summer School.
August, 2012
So far…
AACIMP Summer School.
August, 2012
Optimization
Probability theory
Reasoning by analogy
So far…
AACIMP Summer School.
August, 2012
Optimization Reasoning by analogy
Fermat’s theorem
Gradient methods
Batch & On-line
Probability theory
MLE,
MAP,
Bayesian Estimation
Risk Optimization
k-NN,
Kernel methods
So far…
AACIMP Summer School.
August, 2012
Optimization Reasoning by analogy
Fermat’s theorem
Gradient methods
Batch & On-line
Probability theory
MLE,
MAP,
Bayesian Estimation
Risk Optimization
k-NN,
Kernel methods k-NN, Decision tree learning
Linear regression (OLS, RR),
Linear classification (SVM, Perceptron),
Bayes & Naïve Bayes
Kernel-<xxx>
Today
AACIMP Summer School.
August, 2012
Optimization Reasoning by analogy
Fermat’s theorem
Gradient methods
Batch & On-line
Probability theory
MLE,
MAP,
Bayesian Estimation
Risk Optimization
k-NN,
Kernel methods k-NN, Decision tree learning
Linear regression (OLS, RR),
Linear classification (SVM, Perceptron),
Bayes & Naïve Bayes
Kernel-<xxx>
Unsupervised
Learning
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
Data Mining
AACIMP Summer School.
August, 2012
Raw data
Pattern
Deviations
Today
AACIMP Summer School.
August, 2012
Clustering Decomposition
Quiz
Why would one need clustering?
AACIMP Summer School.
August, 2012
Hierarchical clustering
AACIMP Summer School.
August, 2012
Hierarchical clustering
AACIMP Summer School.
August, 2012
Hierarchical clustering
AACIMP Summer School.
August, 2012
Hierarchical clustering
AACIMP Summer School.
August, 2012
Hierarchical clustering
AACIMP Summer School.
August, 2012
Complete linkage
Hierarchical clustering
AACIMP Summer School.
August, 2012
Single linkage
Hierarchical clustering
AACIMP Summer School.
August, 2012
Average linkage
Hierarchical clustering
AACIMP Summer School.
August, 2012
Ward linkage
𝜎2
Hierarchical clustering
AACIMP Summer School.
August, 2012
Hierarchical clustering
AACIMP Summer School.
August, 2012
Hierarchical clustering
AACIMP Summer School.
August, 2012
Partitional vs Hierarchical
AACIMP Summer School.
August, 2012 Slide © M.Kull, http://courses.cs.ut.ee/2012/ml/
K-means
AACIMP Summer School.
August, 2012
K-means
AACIMP Summer School.
August, 2012
K-means
AACIMP Summer School.
August, 2012
K-means
AACIMP Summer School.
August, 2012
K-means
AACIMP Summer School.
August, 2012
K-means
AACIMP Summer School.
August, 2012
K-means
AACIMP Summer School.
August, 2012
argmin𝒄𝟏,…,𝒄𝑲 𝒙𝒊 − 𝒄𝐜𝐥𝐨𝐬𝐞𝐬𝐭_𝐭𝐨 (𝒊)𝟐
𝒊
K-means
AACIMP Summer School.
August, 2012
argmin𝒄𝟏,…,𝒄𝑲 𝒙𝒊 − 𝒄𝐜𝐥𝐨𝐬𝐞𝐬𝐭_𝐭𝐨(𝒊)𝟐
𝒊
• Need to find cluster centers 𝒄𝒌. 𝒄𝟏 = ? , 𝒄𝟐 = ? , … , 𝒄𝑲 =?
K-means
AACIMP Summer School.
August, 2012
argmin𝒄𝟏,…,𝒄𝑲 𝒙𝒊 − 𝒄𝐜𝐥𝐨𝐬𝐞𝐬𝐭_𝐭𝐨(𝒊)𝟐
𝒊
• Need to find cluster centers 𝒄𝒌. 𝒄𝟏 = ? , 𝒄𝟐 = ? , … , 𝒄𝑲 =?
• Introduce latent variables (one for each 𝒙𝒊) 𝒂𝒊 = 𝐜𝐥𝐨𝐬𝐞𝐬𝐭_𝐜𝐥𝐮𝐬𝐭𝐞𝐫_𝐜𝐞𝐧𝐭𝐞𝐫(𝒊) 𝒂𝟏 =? , 𝒂𝟐 =? , 𝒂𝟑 =? ,… , 𝒂𝒏 =?
K-means
AACIMP Summer School.
August, 2012
argmin𝒄𝟏,…,𝒄𝑲 𝒙𝒊 − 𝒄𝐜𝐥𝐨𝐬𝐞𝐬𝐭_𝐭𝐨(𝒊)𝟐
𝒊
• For fixed 𝒄𝒌 we can find optimal 𝒂𝒊
• For fixed 𝒂𝒊 we can find optimal 𝒄𝒌.
• Iterate to convergence.
Fuzzy vs Hard
AACIMP Summer School.
August, 2012 Slide © M.Kull, http://courses.cs.ut.ee/2012/ml/
Gaussian Mixture Modeling
AACIMP Summer School.
August, 2012
𝑿 ∼ [𝑁 𝝁1, 𝜎12 or 𝑁 𝝁2, 𝜎2
2 ]
Given 𝑿, estimate 𝝁𝒊, 𝝈𝒊𝟐
Gaussian Mixture Modeling
AACIMP Summer School.
August, 2012
𝑿 ∼ [𝑁 𝝁1, 𝜎12 or 𝑁 𝝁2, 𝜎2
2 ]
Given 𝑿, estimate 𝝁𝒊, 𝝈𝒊𝟐
MLE
Gaussian Mixture Modeling
AACIMP Summer School.
August, 2012
𝑿 ∼ [𝑁 𝝁1, 𝜎12 or 𝑁 𝝁2, 𝜎2
2 ]
Given 𝑿, estimate 𝝁𝒊, 𝝈𝒊𝟐
MLE
Expectation-Maximization (EM)
SKLearn’s Clustering
from sklearn.cluster
import
Ward,
KMeans,
DBScan,
MeanShift,
SpectralClustering,
AffinityPropagation
AACIMP Summer School.
August, 2012
SKLearn’s Clustering
from sklearn.cluster
import
Ward,
KMeans,
DBScan,
MeanShift,
SpectralClustering,
AffinityPropagation
AACIMP Summer School.
August, 2012
Use feature vectors
Use distance matrix
Quiz
Fuzzy clustering means that _______
K-means finds a set of cluster centers, which
have the smallest ______________
K-means can get stuck in a local minimum
(Y/N)?
AACIMP Summer School.
August, 2012
Today
AACIMP Summer School.
August, 2012
Clustering Decomposition
Canonical basis
𝑥1𝑥2= 𝛼10+ 𝛽01
AACIMP Summer School.
August, 2012
Alternative basis
𝑥1𝑥2= 𝛼
0.9−0.1
+ 𝛽0.10.9
AACIMP Summer School.
August, 2012
Alternative basis
𝑥1𝑥2= 𝛼
0.4−0.6
+ 𝛽0.60.4
AACIMP Summer School.
August, 2012
Linear Decomposition
𝑥1𝑥2𝑥3𝑥4…𝑥100000
= 𝛼1
1000⋮0
+ 𝛼2
0100⋮0
+⋯+ 𝛼𝑚
0000⋮1
AACIMP Summer School.
August, 2012
Linear Decomposition
𝑥1𝑥2𝑥3𝑥4…𝑥100000
= 𝛼1
1000⋮0
+ 𝛼2
0100⋮0
+⋯+ 𝛼𝑚
0000⋮1
AACIMP Summer School.
August, 2012
Linear Decomposition
𝑥1𝑥2𝑥3𝑥4…𝑥100000
= 𝛼1
0.00.10.10.2⋮0.0
+ 𝛼2
0.30.20.20.1⋮0.3
+ 𝛼𝑚
0.10.10.10.0⋮0.0
AACIMP Summer School.
August, 2012
Linear Decomposition
𝑥1𝑥2𝑥3𝑥4…𝑥100000
= 𝛼1
0.00.10.10.2⋮0.0
+ 𝛼2
0.30.20.20.1⋮0.3
+ 𝛼𝑚
0.10.10.10.0⋮0.0
AACIMP Summer School.
August, 2012
Linear Decomposition
𝑥1𝑥2𝑥3𝑥4…𝑥100000
= 𝛼1
1000⋮0
+ 𝛼2
0100⋮0
+⋯+ 𝛼𝑚
0000⋮1
AACIMP Summer School.
August, 2012
Linear Decomposition
𝑥1𝑥2𝑥3𝑥4…𝑥100000
= 𝛼1
0.00.10.10.2⋮0.0
+ 𝛼2
0.30.20.20.1⋮0.3
+ 𝛼𝑚
0.10.10.10.0⋮0.0
AACIMP Summer School.
August, 2012
Linear Decomposition
𝑥1𝑥2𝑥3𝑥4…𝑥100000
= 𝛼1
0.00.10.10.2⋮0.0
+ 𝛼2
0.30.20.20.1⋮0.3
+ 𝛼𝑚
0.10.10.10.0⋮0.0
AACIMP Summer School.
August, 2012
Linear Decomposition
𝒙 = 𝛼1𝒗𝟏 + 𝛼2𝒗𝟐 +⋯+ 𝛼𝑚𝒗𝒎
AACIMP Summer School.
August, 2012
Linear Decomposition
𝒙 = 𝛼1𝒗𝟏 + 𝛼2𝒗𝟐 +⋯+ 𝛼𝑚𝒗𝒎
𝒙 =⋮ ⋮ … ⋮𝒗𝟏⋮𝒗𝟐⋮…𝒗𝒎⋮
𝛼1𝛼2⋮𝛼𝑚
AACIMP Summer School.
August, 2012
Linear Decomposition
𝒙 = 𝛼1𝒗𝟏 + 𝛼2𝒗𝟐 +⋯+ 𝛼𝑚𝒗𝒎
𝒙 = 𝑽𝜶
AACIMP Summer School.
August, 2012
Linear Decomposition
𝒙 = 𝛼1𝒗𝟏 + 𝛼2𝒗𝟐 +⋯+ 𝛼𝑚𝒗𝒎
𝒙 = 𝑽𝜶 𝜶 = ?
AACIMP Summer School.
August, 2012
Linear Decomposition
𝒙 = 𝛼1𝒗𝟏 + 𝛼2𝒗𝟐 +⋯+ 𝛼𝑚𝒗𝒎
𝒙 = 𝑽𝜶 𝜶 = 𝑽+𝒙
AACIMP Summer School.
August, 2012
How do we find a good basis?
AACIMP Summer School.
August, 2012
Linear projection
AACIMP Summer School.
August, 2012
Linear projection
AACIMP Summer School.
August, 2012
Linear projection
AACIMP Summer School.
August, 2012
Linear projection
AACIMP Summer School.
August, 2012
Idea: Maximize projection variance
For a point 𝒙𝑖 and a unit basis vector 𝒗 the
length of projection of 𝒙𝑖 onto 𝒗 is given by
𝑝 = 𝒗, 𝒙𝑖 = 𝒗𝑇𝒙𝑖
AACIMP Summer School.
August, 2012
𝒗
𝒙
𝑝
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
AACIMP Summer School.
August, 2012
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 =1
𝑛 𝑝𝑖 − 𝑝
2
𝑖
AACIMP Summer School.
August, 2012
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 =1
𝑛 𝑝𝑖 − 𝑝
2
𝑖
𝒗 = argmax𝒗 𝜎𝒗2
AACIMP Summer School.
August, 2012
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 =1
𝑛 𝑝𝑖 − 𝑝
2
𝑖
AACIMP Summer School.
August, 2012
Pre-center data, so that 𝑝 = 𝒗𝑇𝒙 = 0
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 =1
𝑛 𝑝𝑖
2
𝑖
AACIMP Summer School.
August, 2012
Pre-center data, so that 𝑝 = 𝒗𝑇𝒙 = 0
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 =1
𝑛 𝑝𝑖
2
𝑖
=1
𝑛𝒑 2
AACIMP Summer School.
August, 2012
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 =1
𝑛 𝑝𝑖
2
𝑖
=1
𝑛𝒑 2
=1
𝑛𝑿𝒗 2
AACIMP Summer School.
August, 2012
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 = ⋯
=1
𝑛𝑿𝒗 2 =
1
𝑛𝑿𝒗 𝑇 𝑿𝒗
=1
𝑛𝒗𝑇𝑿𝑇𝑿𝒗 = 𝒗𝑇𝚺𝒗
AACIMP Summer School.
August, 2012
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 = 𝒗𝑇𝚺𝒗
AACIMP Summer School.
August, 2012
Projection variance
𝑝𝑖 = 𝒗𝑇𝒙𝑖
𝜎𝒗2 = 𝒗𝑇𝚺𝒗
AACIMP Summer School.
August, 2012
Data covariance matrix
𝑿𝑻𝑿
Objective function
argmax𝒗 𝒗𝑇𝚺𝒗
𝑠. 𝑡. 𝒗 2 = 1
AACIMP Summer School.
August, 2012
Optimization
argmax𝒗 𝒗𝑇𝚺𝒗
𝑠. 𝑡. 𝒗 2 = 1
𝚺𝒗 = 𝜆𝒗
AACIMP Summer School.
August, 2012
Method of Lagrange multipliers…
Optimization
argmax𝒗 𝒗𝑇𝚺𝒗
𝑠. 𝑡. 𝒗 2 = 1
𝚺𝒗 = 𝜆𝒗
AACIMP Summer School.
August, 2012
Method of Lagrange multipliers…
Eigenvector of 𝚺 Eigenvalue
Example
Xc = X – mean(X, axis=0)
Sigma = Xc.T * Xc / n
(= cov(Xc, rowvar=0))
lambdas, vs = eigh(Sigma)
AACIMP Summer School.
August, 2012
Example
AACIMP Summer School.
August, 2012
𝜆1 = 0.8
𝜆2 = 28.9
Example
AACIMP Summer School.
August, 2012
𝜆1 = 0.8
𝜆2 = 28.9
𝜎𝑖2 = 𝒗𝑖
𝑇𝚺𝒗𝑖 = 𝒗𝑖𝑇𝜆𝑖𝒗𝑖 = 𝜆𝑖 𝒗𝑖
2 = 𝜆𝑖
Example
AACIMP Summer School.
August, 2012
𝜆2 = 28.9
Example
AACIMP Summer School.
August, 2012
𝜆2 = 28.9
Example
AACIMP Summer School.
August, 2012
𝜆1 = 0.8
𝜆2 = 28.9
Principal Components Analysis
AACIMP Summer School.
August, 2012
Principal components are the
eigenvectors of the covariance matrix.
𝑽, 𝝀 = 𝐞𝐢𝐠(𝚺)
Principal Components Analysis
AACIMP Summer School.
August, 2012
Principal components are the
eigenvectors of the covariance matrix.
𝑽, 𝝀 = 𝐞𝐢𝐠(𝚺)
For each PC, the corresponding eigenvalue
𝜆𝑖 shows the amount of variance
explained by the component.
Principal Components Analysis
AACIMP Summer School.
August, 2012
Eigenvalue spectrum of 𝚺
Principal Components Analysis
AACIMP Summer School.
August, 2012
Data projection onto PC 𝑖: 𝒑 = 𝑿𝒗𝑖
Data reconstruction from PC coordinates:
𝑿proj𝑽∗𝑇 = 𝑿
Data projection onto multiple PCs:
𝑿proj = 𝑿𝑽∗
SKLearn’s PCA
from sklearn.decomposition
import PCA
model = PCA(n_components=2)
model.fit(X)
X_t = model.transform(X)
model.components_[1,:]
AACIMP Summer School.
August, 2012
SKLearn’s PCA
from sklearn.decomposition
import
PCA,
SparsePCA,
ProbabilisticPCA,
KernelPCA,
FastICA,
NMF,
DictionaryLearning,
...
AACIMP Summer School.
August, 2012
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
AACIMP Summer School.
August, 2012
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
𝑿′ = 𝑿5 00 0.9
AACIMP Summer School.
August, 2012
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
𝑿′ = 𝑿5 00 0.9
𝑿′′
= 𝑿′cos 0.8 − sin 0.8sin 0.8 cos 0.8
AACIMP Summer School.
August, 2012
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
𝑿′′ = 𝑿 ⋅ 𝑫 ⋅ 𝑹
AACIMP Summer School.
August, 2012
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
𝑿′′ = 𝑿 ⋅ 𝑫 ⋅ 𝑹
𝑿′′ 𝑇 𝑿′′
= 𝑿𝑫𝑹 𝑇(𝑿𝑫𝑹)
AACIMP Summer School.
August, 2012
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
𝑿′′ = 𝑿 ⋅ 𝑫 ⋅ 𝑹
𝑿′′ 𝑇 𝑿′′
= 𝑿𝑫𝑹 𝑇(𝑿𝑫𝑹) = 𝑹𝑇𝑫𝑇𝑿𝑇𝑿𝑫𝑹
AACIMP Summer School.
August, 2012
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
𝑿′′ = 𝑿 ⋅ 𝑫 ⋅ 𝑹
𝑿′′ 𝑇 𝑿′′
= 𝑿𝑫𝑹 𝑇(𝑿𝑫𝑹) = 𝑹𝑇𝑫𝑇𝑿𝑇𝑿𝑫𝑹 = 𝑹𝑇𝑫2𝑹
AACIMP Summer School.
August, 2012
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
𝑿′′ = 𝑿 ⋅ 𝑫 ⋅ 𝑹
𝑿′′ 𝑇 𝑿′′
= 𝑿𝑫𝑹 𝑇(𝑿𝑫𝑹) = 𝑹𝑇𝑫𝑇𝑿𝑇𝑿𝑫𝑹 = 𝑹𝑇𝑫2𝑹
AACIMP Summer School.
August, 2012
𝚺 = 𝑽𝚲𝑽𝑇 (𝑽 = 𝑹𝑻, 𝚲 = 𝐃2)
PCA: Geometric intuition
𝑿 ∼ 𝑁 0,1
𝑿′′ = 𝑿 ⋅ 𝑫 ⋅ 𝑹
𝑿′′ 𝑇 𝑿′′
= 𝑿𝑫𝑹 𝑇(𝑿𝑫𝑹) = 𝑹𝑇𝑫𝑇𝑿𝑇𝑿𝑫𝑹 = 𝑹𝑇𝑫2𝑹
AACIMP Summer School.
August, 2012
𝚺 = 𝑽𝚲𝑽𝑇 (𝑽 = 𝑹𝑻, 𝚲 = 𝐃2)
𝑿′′𝑹𝑇 = 𝑿𝑫
𝑿′′𝑽 = 𝑿𝐩𝐫𝐨𝐣
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
Quiz
Principal components are ___________ of
the ___________ matrix.
Eigenvalue spectrum shows how much
___________ is explained by each ______.
If 𝚺 = 𝑽𝚲𝑽𝑇, then
𝑿𝐩𝐫𝐨𝐣 = _______
AACIMP Summer School.
August, 2012
Conclusion
AACIMP Summer School.
August, 2012
Conclusion
AACIMP Summer School.
August, 2012
Statistical Learning Theory,
PAC-theory
Semi-supervised learning,
Active learning,
Reinforcement learning,
Multi-instance learning,
Deep learning
Graphical models,
Neural networks,
Fuzzy sets,
Association rules
Computer vision,
Natural language processing,
Information Retrieval,
Music & Video processing,
Bioinformatics,
Physics,
Robotics,
Finance & Economics,
…
Where to go next
Books: “The Elements of Statistical Learning” (Hastie & Tibshirani)
“Pattern Recognition and Machine Learning” (Bishop)
“Kernel Methods for Pattern Analysis” (Shawe-Taylor &
Cristianini)
On-line materials:
http://videolectures.net
+ Coursera, Udacity, edX
Tools:
Python, R, RapidMiner, Weka, Matlab, Mathematica, …
AACIMP Summer School.
August, 2012
Where to go next
http://kt.era.ee/aacimp
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
Science
AACIMP Summer School.
August, 2012
Observation
Analysis
Generalization
AACIMP Summer School.
August, 2012
Why?
AACIMP Summer School.
August, 2012
Why?
AACIMP Summer School.
August, 2012
Contemporary Science
AACIMP Summer School.
August, 2012
Observation
Analysis
Generalization
Contemporary Science
AACIMP Summer School.
August, 2012
Analysis
Generalization
CS & EE
Contemporary Science
AACIMP Summer School.
August, 2012
Contemporary Science
AACIMP Summer School.
August, 2012
Contemporary Science
AACIMP Summer School.
August, 2012
Contemporary Science
AACIMP Summer School.
August, 2012
Analysis
Generalization
CS & EE
Contemporary Science
AACIMP Summer School.
August, 2012
Generalization
CS & EE
ML
Contemporary Science
AACIMP Summer School.
August, 2012
CS & EE
ML
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
AACIMP Summer School.
August, 2012
Thank You!