Lecture 6: Data Analysis
Analysis of High-Dimensional Data
Leif Kobbelt
Visual Computing Institute | Prof. Dr. Leif Kobbelt
Computer Graphics and Multimedia
Data Analysis and Visualization
Motivation
• Given: n samples in d-dimensional space
X = (x_1, …, x_n) ∈ ℝ^{d×n}
Motivation
• Given: n samples in d-dimensional space
• Decrease d → dimensionality reduction: PCA, MDS

X = (x_1, …, x_n) ∈ ℝ^{d×n}
Principal Component Analysis
• Idea: Compute an orthogonal linear transformation that maps the data into a new coordinate system such that the greatest variance lies on the first coordinate axis, the second greatest variance on the second axis, etc.
• Optimal transform for a given data set in the least squares sense
• Dimensionality reduction: project the data into the lower-dimensional space spanned by the first principal components
Principal Component Analysis
Given: n samples scattered in d-dimensional space, written as a matrix
X = (x_1, x_2, …, x_n) ∈ ℝ^{d×n}
compute the centered covariance matrix
C = (X − X̄)(X − X̄)^T ∈ ℝ^{d×d}
(interpretation as a map from ℝ^d to ℝ^d)
Principal Component Analysis
computation of C with the “centering matrix” J:
C = XJ(XJ)^T = XJJ^TX^T,   J = I − (1/n) 1·1^T ∈ ℝ^{n×n}
principal component(s): eigenvector(s) v_i to the largest eigenvalue(s) λ_i of C (low-rank approximation)
Principal Component Analysis
eigendecomposition of the covariance matrix:
C = V D V^T = (v_1 … v_d) diag(λ_1, …, λ_d) (v_1 … v_d)^T
projection onto the first q principal components:
X* := (v_1 … v_q)^T X J ∈ ℝ^{q×n}
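The whole pipeline above (centering matrix, covariance, eigendecomposition, projection) fits in a short NumPy sketch. The function name `pca_project` and the toy data are illustrative choices, not part of the lecture:

```python
import numpy as np

def pca_project(X, q):
    """Project a d x n data matrix X onto its first q principal components.

    Follows the slides: center with J = I - (1/n) 1 1^T, build the
    covariance C = (XJ)(XJ)^T, take the eigenvectors of the largest
    eigenvalues, and map the centered data into their span.
    """
    d, n = X.shape
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    XJ = X @ J                            # centered samples
    C = XJ @ XJ.T                         # d x d (unnormalized) covariance
    lam, V = np.linalg.eigh(C)            # eigenvalues in ascending order
    V_q = V[:, ::-1][:, :q]               # eigenvectors of the q largest eigenvalues
    return V_q.T @ XJ                     # X* in R^{q x n}

# toy data: points near a line in R^2, so one component dominates
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.vstack([t, 0.9 * t + 0.05 * rng.normal(size=100)])
Y = pca_project(X, 1)
```

Because the data is centered before projection, each row of the embedding Y sums to zero.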
Relation to SVD
• singular value decomposition
XJ = UΣV^T
C = (XJ)(XJ)^T = UΣV^T VΣU^T = UΣ²U^T
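This identity is easy to verify numerically; the matrix sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 8))                        # d = 3, n = 8
J = np.eye(8) - np.ones((8, 8)) / 8                # centering matrix
XJ = X @ J

U, s, Vt = np.linalg.svd(XJ, full_matrices=False)  # XJ = U diag(s) V^T
lam = np.linalg.eigvalsh(XJ @ XJ.T)[::-1]          # eigenvalues of C, descending
```

The eigenvalues of C are exactly the squared singular values of XJ, and the columns of U are the principal directions.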
… for very large dimension d
C = (XJ)(XJ)^T ∈ ℝ^{d×d}
C̃ = (XJ)^T(XJ) ∈ ℝ^{n×n}
if C̃ w = λ w, then v = XJ w satisfies
C v = XJ(XJ)^T XJ w = XJ C̃ w = λ XJ w = λ v
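A numerical sketch of this trick (the dimensions are chosen arbitrarily): the eigenvector w of the small n × n matrix is lifted to an eigenvector v = XJw of the large d × d covariance with the same eigenvalue, so the d × d matrix never has to be formed.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 1000, 20                      # many dimensions, few samples
X = rng.normal(size=(d, n))
J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
XJ = X @ J

C_small = XJ.T @ XJ                  # n x n instead of d x d
lam, W = np.linalg.eigh(C_small)     # ascending eigenvalues
w = W[:, -1]                         # eigenvector of the largest eigenvalue
v = XJ @ w                           # lift to R^d: C v = lambda v
v /= np.linalg.norm(v)
```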
Example
10 points in ℝ²
Example
10 points in ℝ²
C = [ 0.617  0.615
      0.615  0.717 ]
e_1 ≈ (0.68, 0.74)^T,   e_2 ≈ (−0.74, 0.68)^T
Multi-Dimensional Scaling
Given: For n unknown samples in high-dimensional space
X = (x_1, …, x_n) ∈ ℝ^{d×n}
we are given a matrix of pairwise (squared) distances:
D_{i,j} = ‖x_i − x_j‖²,   D ∈ ℝ^{n×n}
Multi-Dimensional Scaling
samples in some abstract space A:
X = (x_1, …, x_n),   x_i ∈ A
matrix of pairwise abstract distances:
D_{i,j},   D ∈ ℝ^{n×n}
Multi-Dimensional Scaling
Goal: find an embedding X̂ of X in a low-dimensional space such that the pairwise distances (or variations thereof) are preserved:
Φ(D, D̂) = ‖J(D − D̂)J^T‖²_F
other measures Φ(D, D̂) are possible, but they cannot be solved as easily.
Multi-Dimensional Scaling
closed-form solution:
the first q eigenvectors v_1, …, v_q of the matrix
−(1/2) J D J^T ∈ ℝ^{n×n}
define the coordinates of a q-dimensional embedding:
X' = (√λ_1 v_1, …, √λ_q v_q)^T ∈ ℝ^{q×n}
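The closed-form solution above can be sketched directly in NumPy. The helper name `classical_mds` and the round-trip check are illustrative; for squared Euclidean distances the embedding reproduces the input distances exactly (up to rotation and translation of the point set):

```python
import numpy as np

def classical_mds(D2, q):
    """Embed n points in R^q from an n x n matrix of squared distances D2.

    Closed form from the slides: eigenvectors of -1/2 J D J^T, scaled by
    the square root of their eigenvalues, give the coordinates.
    """
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J.T
    lam, V = np.linalg.eigh(B)                    # ascending eigenvalues
    lam, V = lam[::-1][:q], V[:, ::-1][:, :q]     # keep the q largest
    return (V * np.sqrt(np.maximum(lam, 0))).T    # q x n coordinates

# sanity check: rebuild a 2D point set from its squared distances
rng = np.random.default_rng(3)
P = rng.normal(size=(2, 10))
D2 = ((P[:, :, None] - P[:, None, :]) ** 2).sum(axis=0)
Y = classical_mds(D2, 2)
D2_rec = ((Y[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
```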
Motivation
• Given: n samples in d-dimensional space
• Decrease n → clustering: k-means, EM, mean shift, spectral clustering, hierarchical clustering

X = (x_1, …, x_n) ∈ ℝ^{d×n}
Cluster Analysis
• Task: Given a set of observations / data samples, assign them to clusters so that observations in the same cluster are similar.
k-means Clustering
• Idea: partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
• Given: data samples x_1, …, x_n ∈ ℝ^d
• Goal: partition the n samples into k sets (k ≤ n) S_1, S_2, …, S_k such that
Σ_{i=1}^k Σ_{x_j ∈ S_i} ‖x_j − μ_i‖²
is minimized, where μ_i is the mean of the points in S_i.
k-means Clustering
• Two-step algorithm:
Assignment step: Assign each sample to the cluster with the closest mean (Voronoi diagram):
S_i^t = { x_j : ‖x_j − m_i^t‖ ≤ ‖x_j − m_{i*}^t‖ for all i* = 1, …, k }
Update step: Calculate the new means as the centroids of the observations in each cluster:
m_i^{t+1} = (1/|S_i^t|) Σ_{x_j ∈ S_i^t} x_j
Iterate until convergence (assignments no longer change)
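The two steps above translate almost line-for-line into NumPy. This is a minimal sketch (function name and toy blobs are illustrative; it assumes no cluster ever becomes empty):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's two-step k-means on samples X (n x d), as on the slides."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # init from samples
    for _ in range(iters):
        # assignment step: closest mean (Voronoi cell) for each sample
        labels = np.argmin(((X[:, None, :] - means[None, :, :]) ** 2).sum(-1), axis=1)
        # update step: centroid of each cluster (assumes no empty cluster)
        new_means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_means, means):   # assignments no longer change
            break
        means = new_means
    return labels, means

# two well-separated blobs
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.1, size=(50, 2)),
               rng.normal(5, 0.1, size=(50, 2))])
labels, means = kmeans(X, 2)
```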
k-means Clustering - Comments
• Advantages:
Efficient
Always converges to a solution
• Drawbacks:
Not necessarily a globally optimal solution
#clusters k is an input parameter
Sensitive to the initial clusters
Cluster model: data is split halfway between cluster means
Clustering Results
EM Algorithm
• Expectation Maximization (EM)
• Probabilistic assignments to clusters instead of
deterministic assignments
• Multivariate Gaussian distributions instead of
means
EM Algorithm
• Given: data samples X = (x_1, …, x_n),   x_i ∈ ℝ^d
• Assumption: the data was generated by k Gaussians
• Goal: Fit a Gaussian mixture model (GMM) to the data X:
Find the means μ_j,
the covariances Σ_j of the Gaussians,
and the probabilities (weights) π_j that the samples come from Gaussian j,
for j = 1, …, k
EM Algorithm – Example (1D)
• Three samples drawn from each mixture component
• means: μ_1 = −2, μ_2 = 2
EM Algorithm – Example (2D)
EM Algorithm
1. Initialization: Choose initial estimates π_j^0, μ_j^0, Σ_j^0, j = 1, …, k, and compute the initial log-likelihood
L^0 = (1/n) Σ_{i=1}^n log Σ_{j=1}^k π_j^0 φ(x_i; μ_j^0, Σ_j^0)
where φ denotes the Gaussian density.
2. E-step: Compute
γ_ij^m = π_j^m φ(x_i; μ_j^m, Σ_j^m) / Σ_{l=1}^k π_l^m φ(x_i; μ_l^m, Σ_l^m),   i = 1, …, n,   j = 1, …, k
and
n_j^m = Σ_{i=1}^n γ_ij^m,   j = 1, …, k
EM Algorithm
3. M-step: Compute new estimates (j = 1, …, k):
π_j^{m+1} = n_j^m / n
μ_j^{m+1} = (1/n_j^m) Σ_{i=1}^n γ_ij^m x_i
Σ_j^{m+1} = (1/n_j^m) Σ_{i=1}^n γ_ij^m (x_i − μ_j^{m+1})(x_i − μ_j^{m+1})^T
4. Convergence check: Compute the new log-likelihood
L^{m+1} = (1/n) Σ_{i=1}^n log Σ_{j=1}^k π_j^{m+1} φ(x_i; μ_j^{m+1}, Σ_j^{m+1})
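The four steps above can be sketched for the 1D case (scalar variances instead of covariance matrices, which keeps the code short; the function name and toy mixture are illustrative):

```python
import numpy as np

def gaussian(x, mu, var):
    """1D Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, k, iters=200, seed=0):
    """EM for a 1D Gaussian mixture, following steps 1-4 on the slides."""
    rng = np.random.default_rng(seed)
    n = len(x)
    pi = np.full(k, 1.0 / k)                   # weights pi_j
    mu = rng.choice(x, size=k, replace=False)  # means mu_j, init from data
    var = np.full(k, x.var())                  # variances (1D "covariances")
    log_lik = -np.inf
    for _ in range(iters):
        # E-step: responsibilities gamma_ij and effective counts n_j
        p = pi * gaussian(x[:, None], mu, var)        # n x k
        gamma = p / p.sum(axis=1, keepdims=True)
        nj = gamma.sum(axis=0)
        # M-step: new weights, means, variances
        pi = nj / n
        mu = (gamma * x[:, None]).sum(axis=0) / nj
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nj
        # convergence check on the log-likelihood
        new_log_lik = np.log(p.sum(axis=1)).mean()
        if new_log_lik - log_lik < 1e-8:
            break
        log_lik = new_log_lik
    return pi, mu, var

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 700)])
pi, mu, var = em_gmm_1d(x, 2)
```

On this well-separated mixture the estimates recover the means near −2 and 2 and weights near 0.3 / 0.7.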
Example (2D)
• Ground truth: means, covariance matrices, weights
• Input to EM-algorithm: 1000 samples
Initial Estimate
Initial density estimation:
(centroids of k-means result)
1st Iteration
2nd Iteration
3rd Iteration
Estimates after three iterations:
Mean Shift Clustering
• Non-parametric clustering technique
• No prior knowledge of #clusters
• No constraints on shape of clusters
Mean Shift Clustering - Idea
• Interpret the points in feature space as samples from an empirical probability density function
• Dense regions in feature space correspond to local maxima of the underlying distribution
• For each sample: run a gradient ascent procedure on the locally estimated density until convergence
• Stationary points = maxima of the distribution
• Samples associated with the same stationary point are considered to be in the same cluster
Mean Shift Clustering
• Given: data samples x_1, …, x_n ∈ ℝ^d
• Multivariate kernel density estimate with radially symmetric kernel K(x) and window radius h:
f(x) = 1/(n h^d) Σ_{i=1}^n K((x − x_i)/h)
• The radially symmetric kernel is defined as
K(x) = c_{k,d} k(‖x‖²)
where c_{k,d} is a normalization constant
• Modes of the density function are located at the zeros of the gradient function: ∇f(x) = 0
Mean Shift Clustering
Gradient of the density estimator:
∇f(x) = (2 c_{k,d} / (n h^{d+2})) · [ Σ_{i=1}^n g(‖(x − x_i)/h‖²) ] · [ (Σ_{i=1}^n x_i g(‖(x − x_i)/h‖²)) / (Σ_{i=1}^n g(‖(x − x_i)/h‖²)) − x ]
where g(x) = −k′(x) denotes the (negative) derivative of the kernel profile k(x)
Mean Shift Clustering
Gradient of the density estimator:
∇f(x) = (2 c_{k,d} / (n h^{d+2})) · [ Σ_{i=1}^n g(‖(x − x_i)/h‖²) ] · m_h(x)
the first factor is proportional to the density estimate at x; the second factor
m_h(x) = (Σ_{i=1}^n x_i g(‖(x − x_i)/h‖²)) / (Σ_{i=1}^n g(‖(x − x_i)/h‖²)) − x
is the mean shift vector, which points toward the direction of maximum increase in the density.
Mean Shift Clustering
Mean shift procedure for sample x_i:
1. Compute the mean shift vector m(x_i^t)
2. Translate the density estimation window: x_i^{t+1} = x_i^t + m(x_i^t)
Iterate 1. and 2. until convergence, i.e., ∇f(x_i) = 0
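The procedure above can be sketched with a Gaussian kernel, for which the weights g reduce to exp(−‖x − x_i‖² / (2h²)). Function name, toy data, and the final grouping of modes are illustrative:

```python
import numpy as np

def mean_shift(X, h, iters=100, tol=1e-6):
    """Run the two-step mean shift procedure from the slides for every
    sample (Gaussian kernel, bandwidth h) and return the stationary points."""
    modes = X.copy()
    for i in range(len(modes)):
        x = modes[i]
        for _ in range(iters):
            # kernel weights g(||(x - x_i)/h||^2) for a Gaussian kernel
            w = np.exp(-((X - x) ** 2).sum(axis=1) / (2 * h ** 2))
            m = (w[:, None] * X).sum(axis=0) / w.sum() - x  # mean shift vector
            x = x + m                                       # translate the window
            if np.linalg.norm(m) < tol:                     # gradient ~ 0
                break
        modes[i] = x
    return modes

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.2, size=(40, 2)),
               rng.normal(4, 0.2, size=(40, 2))])
modes = mean_shift(X, h=0.5)
# samples converging to the same mode belong to the same cluster
labels = (np.linalg.norm(modes - modes[0], axis=1) < 0.5).astype(int)
```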
(figures: mean shift iterates x_i^0, x_i^1, x_i^2, x_i^3, …, x_i^n)
Mean Shift - Comments
• Advantages:
No prior knowledge of #clusters
No constraints on the shape of clusters
• Drawbacks:
Computationally expensive:
Run the algorithm for every sample
Identification of the sample neighborhood requires multi-dimensional range search
How to choose the bandwidth parameter h?
Summary
• Given: n samples in d-dimensional space X = (x_1, …, x_n) ∈ ℝ^{d×n}
• Decrease d → dimensionality reduction: PCA, MDS
• Decrease n → clustering: k-means, EM, mean shift, spectral clustering, hierarchical clustering
Spectral Clustering
• Model the similarity between data points as a graph
• Clustering: Find connected components in the graph
Spectral Clustering
• Model the similarity between data points as a graph
• (weighted) Adjacency Matrix W:
• Degree Matrix D:
Spectral Clustering
• Graphs:
Similarity graph: fully connected, models local neighborhood relations
Gaussian kernel similarity function: w_{ij} = exp(−‖x_i − x_j‖² / (2σ²))
k-nearest neighbour graph
𝜀-neighbourhood graph
Spectral Clustering
• Model the similarity between data points as a graph
• (weighted) Adjacency Matrix W:
• Degree Matrix D:
• Graph Laplacian L = D – W:
Spectral Clustering
• Properties of the Graph Laplacian L:
For every vector f ∈ ℝ^n: f^T L f = (1/2) Σ_{i,j=1}^n w_{ij} (f_i − f_j)²
L is symmetric and positive semi-definite
The smallest eigenvalue of L is 0
The corresponding eigenvector is the constant one vector 1
L has n non-negative, real-valued eigenvalues
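These properties are easy to confirm numerically on a small random graph (the weights below are arbitrary, the quadratic-form identity is the one stated above):

```python
import numpy as np

rng = np.random.default_rng(8)
# random symmetric weight matrix with zero diagonal (fully connected graph)
W = rng.uniform(size=(5, 5))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W

f = rng.normal(size=5)
quad = f @ L @ f                        # f^T L f
pairwise = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2
                     for i in range(5) for j in range(5))
```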
Spectral Clustering
• The multiplicity k of the eigenvalue 0 of L equals the number of connected components in the graph
Consider k = 1. Assume f is an eigenvector with eigenvalue 0:
0 = f^T L f = (1/2) Σ_{i,j} w_{ij} (f_i − f_j)²
The sum only vanishes if all terms vanish
If two vertices are connected (their edge weight > 0), then f_i = f_j
f needs to be constant for all vertices which can be connected by a path
All vertices of a connected component in an undirected graph can be connected by a path:
f needs to be constant on the whole connected component
Spectral Clustering
• The Laplacian of a graph with 1 connected component has one constant eigenvector with eigenvalue 0
• For k > 1: W.l.o.g. assume that the vertices are ordered according to connected components; then L is block-diagonal with blocks L_1, …, L_k
• Each L_i is a graph Laplacian of a fully connected graph:
Each L_i has one eigenvalue 0 with the constant one vector on the i-th connected component
• The spectrum of L is given by the union of the spectra of the L_i
Spectral Clustering
• Graph:
• Graph Laplacian
• Eigenvectors for eigenvalues
Spectral Clustering
• Graph:
• Project the vertices into the subspace spanned by the first k eigenvectors
• Projected vertices:
• K-means clustering recovers the connected components
The embedding is the same regardless of the data ordering
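This pipeline can be sketched on a graph with two connected components (two triangles); the helper name is illustrative and the sketch uses the unnormalized Laplacian L = D − W. The rows of the embedding are constant on each component, so any clustering of the rows recovers the components:

```python
import numpy as np

def spectral_embedding(W, k):
    """Rows of the first k eigenvectors of L = D - W:
    one k-dimensional point per graph vertex."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    lam, V = np.linalg.eigh(L)   # ascending eigenvalues
    return V[:, :k]              # eigenvectors of the k smallest eigenvalues

# graph with two connected components: two triangles
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
E = spectral_embedding(W, 2)
```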
Spectral Clustering
• Similarity Graph:
• W =
Spectral Clustering
• Similarity Graph:
• L =
• Eigenvalues: 0, 0.4, 2, 2
• Eigenvectors:
Spectral Clustering
• Similarity Graph:
• For a fully connected graph we want to find the Min-Cut:
Partition the graph into 2 sets of vertices such that the weight of the edges connecting them is minimal:
Vertices in each set should be similar to vertices in the same set, but dissimilar to vertices from the other set
Partitions are often not balanced: isolated vertices
Spectral Clustering
• Similarity Graph:
• For a fully connected graph we want to find the Normalized Cut:
Partition the graph into 2 sets of vertices such that the weight of the edges connecting them is minimal
Partitions should have similar size
Spectral Clustering
• Min-Cut: minimize cut(A, B) = Σ_{i∈A, j∈B} w_{ij}
• Normalized Cut: minimize Ncut(A, B) = cut(A, B) · (1/vol(A) + 1/vol(B))
the balancing factor is minimal if vol(A) = vol(B)
Spectral Clustering
• Reformulate with Graph Laplacian
• Construct f:
Spectral Clustering
• Reformulate Ncut:
• Minimize f^T L f subject to the constraints on f
Partition (cluster) assignment by thresholding f at 0
NP-hard to compute since f is discrete
Relax the problem by allowing f to take arbitrary real values
Solution: second eigenvector of L′ (normalized Graph Laplacian)
• For k > 2 we can similarly construct indicator vectors like f and relax the problem for minimization:
Project the vertices into the subspace spanned by the first k eigenvectors of L′
Clustering the embedded vertices yields the solution
• Spectral clustering (with the normalized Graph Laplacian) approximates Ncut
Spectral Clustering
(comparison figure: Mean Shift, Spectral Clustering, K-Means)
Spectral Clustering
• Summary:
Useful for non-convex clustering problems
Computationally intensive because of the eigenvalue computation (for large matrices)
Choice of k necessary:
A heuristic can be used that tries to find jumps in the eigenvalues (eigengap)
Similarity has to be defined for the graph construction:
Size of the Gaussian kernel?
Size of the neighbourhood?
Hierarchical Clustering
• Bottom up:
Each data point is its own cluster
Greedily merge clusters according to some criterion
Hierarchical Clustering
• Requirements:
Metric: distance between data points
Linkage: distance between sets of data points:
Maximum linkage:
Average linkage:
Ward linkage:
Hierarchical Clustering
• Algorithm:
Start out with a cluster for each data point
Merge the two clusters that result in the least increase in the linkage criterion
Repeat until k clusters remain
• Maximum linkage: minimizes the maximal distance of data points in each cluster
• Average linkage: minimizes the average distance of data points in each cluster
• Ward linkage: minimizes the within-cluster variance
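The bottom-up algorithm above can be sketched naively (O(n³)-ish, fine for small n; function name, linkage options, and toy blobs are illustrative):

```python
import numpy as np

def agglomerative(X, k, linkage="average"):
    """Bottom-up clustering sketch: start with singleton clusters and
    greedily merge the pair with the smallest linkage distance."""
    clusters = [[i] for i in range(len(X))]        # one cluster per point
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # all pairwise distances between the two clusters
                d = np.linalg.norm(X[clusters[a]][:, None, :] -
                                   X[clusters[b]][None, :, :], axis=2)
                dist = d.max() if linkage == "maximum" else d.mean()
                if dist < best:
                    best, pair = dist, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)             # merge the closest pair
    return clusters

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.1, size=(10, 2)),
               rng.normal(3, 0.1, size=(10, 2))])
clusters = agglomerative(X, 2)
```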
Hierarchical Clustering
• We can add connectivity constraints that restrict which clusters may be merged
Hierarchical Clustering
• Summary:
Flexible: any pairwise distance can be used
Choice of k, distance, and linkage necessary
Instead of specifying k we can use a heuristic which stops cluster merging if the linkage increases too much
Given connectivity constraints, hierarchical clustering scales well for large numbers of data points
How do we choose the connectivity constraints?
k-nearest neighbour graph
𝜀-neighbourhood graph