Clustering
CSL465/603 - Fall 2016
Narayanan C Krishnan
Supervised vs Unsupervised Learning
• Supervised learning – Given $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, learn a function $f: X \rightarrow Y$
  • Categorical output – classification
  • Continuous output – regression
• Unsupervised learning – Given $\{\mathbf{x}_i\}_{i=1}^{N}$, can we infer the structure of the data?
  • Learning without a teacher
Why Unsupervised Learning?
• Unlabeled data is cheap
• Labeled data is expensive – cumbersome to collect
• Exploratory data analysis
• Preprocessing step for supervised learning algorithms
• Analysis of data in high-dimensional spaces
Cluster Analysis
• Discover groups such that samples within a group are more similar to each other than samples across groups
Applications of Clustering (1)
• Unsupervised image segmentation

Applications of Clustering (2)
• Image compression

Applications of Clustering (3)
• Social network clustering

Applications of Clustering (4)
• Recommendation systems
Components of Clustering
• A dissimilarity (similarity) function
  • Measures the distance/dissimilarity between examples
• A loss function
  • Evaluates the clusters
• An algorithm that optimizes this loss function
Proximity Matrices
• Data is directly represented in terms of proximity between pairs of objects
• Subjectively judged dissimilarities are seldom distances in the strict sense (they do not necessarily satisfy the properties of a distance measure)
• Replace the proximity matrix $D$ by $(D + D^T)/2$ to enforce symmetry
Dissimilarity Based on Attributes (1)
• Data point $\mathbf{x}_i$ has $D$ features
• Attributes are real-valued
• Euclidean distance between the data points
$$D(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\sum_{d=1}^{D} (x_{id} - x_{jd})^2}$$
• Resulting clusters are invariant to rotation and translation, but not to scaling
• If features have different scales – standardize the data
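A minimal sketch of this distance and the standardization step (assuming NumPy; the data values are illustrative):

```python
import numpy as np

def standardize(X):
    """Scale each feature to zero mean and unit variance,
    so no single feature dominates the distance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def euclidean(xi, xj):
    """D(x_i, x_j) = sqrt(sum_d (x_id - x_jd)^2)."""
    return np.sqrt(np.sum((xi - xj) ** 2))

# Second feature has a much larger scale; standardize first
X = standardize(np.array([[1.0, 200.0], [2.0, 180.0], [8.0, 10.0]]))
print(euclidean(X[0], X[1]), euclidean(X[0], X[2]))
```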
Dissimilarity Based on Attributes (2)
• Data point $\mathbf{x}_i$ has $D$ features
• Attributes are real-valued
• Any $\mathcal{L}_p$ norm
$$D(\mathbf{x}_i, \mathbf{x}_j) = \left( \sum_{d=1}^{D} |x_{id} - x_{jd}|^p \right)^{1/p}$$
• Cosine similarity between the data points (the corresponding cosine distance is one minus this quantity)
$$s(\mathbf{x}_i, \mathbf{x}_j) = \frac{\sum_{d=1}^{D} x_{id} x_{jd}}{\sqrt{\sum_{d=1}^{D} x_{id}^2} \sqrt{\sum_{d=1}^{D} x_{jd}^2}}$$
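A similar sketch for the $\mathcal{L}_p$ and cosine measures (again assuming NumPy):

```python
import numpy as np

def lp_distance(xi, xj, p):
    """L_p norm distance: (sum_d |x_id - x_jd|^p)^(1/p)."""
    return np.sum(np.abs(xi - xj) ** p) ** (1.0 / p)

def cosine_similarity(xi, xj):
    """Cosine of the angle between xi and xj; distance = 1 - similarity."""
    return xi @ xj / (np.linalg.norm(xi) * np.linalg.norm(xj))

xi, xj = np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 1.0])
print(lp_distance(xi, xj, p=1), lp_distance(xi, xj, p=2))
print(1 - cosine_similarity(xi, xj))   # cosine distance
```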
Dissimilarity Based on Attributes (3)
• Data point $\mathbf{x}_i$ has $D$ features
• Attributes are ordinal
  • Grades – A, B, C, D
  • Answers to a survey question – strongly agree, agree, neutral, disagree
• Replace the ordinal values by quantitative representations
$$\frac{m - 1/2}{M}, \quad m = 1, \ldots, M$$
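A small sketch of this ordinal encoding (plain Python; the grade scale below is just an example):

```python
def ordinal_encode(values, ordered_levels):
    """Map the m-th of M ordered levels to (m - 1/2) / M."""
    M = len(ordered_levels)
    rank = {level: m for m, level in enumerate(ordered_levels, start=1)}
    return [(rank[v] - 0.5) / M for v in values]

# Grades ordered from lowest to highest
print(ordinal_encode(["A", "C", "D"], ordered_levels=["D", "C", "B", "A"]))
# [0.875, 0.375, 0.125]
```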
Dissimilarity Based on Attributes (4)
• Data point $\mathbf{x}_i$ has $D$ features
• Attributes are categorical
  • Values of an attribute are unordered
• Define an explicit difference between the values
$$\begin{pmatrix} d_{11} & \cdots & d_{1M} \\ \vdots & \ddots & \vdots \\ d_{M1} & \cdots & d_{MM} \end{pmatrix}$$
• Often
  • For identical values – $d_{m,m'} = 0$ if $m = m'$
  • For different values – $d_{m,m'} = 1$ if $m \neq m'$
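A sketch of that default 0/1 dissimilarity matrix (assuming NumPy; the attribute values are illustrative):

```python
import numpy as np

def categorical_dissimilarity(levels):
    """Default 0/1 dissimilarity: d[m, m'] = 0 iff m == m'."""
    M = len(levels)
    return np.ones((M, M)) - np.eye(M)

print(categorical_dissimilarity(["red", "green", "blue"]))
```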
Loss Function for Clustering (1)
• Assign each observation to a cluster without regard to the probability model describing the data
• Let $K$ be the number of clusters and $k$ index the clusters
• Each observation is assigned to one and only one cluster
  • View the assignment as a function $C(i) = k$
• Loss function
$$W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} d(\mathbf{x}_i, \mathbf{x}_{i'})$$
• Characterizes the extent to which observations assigned to the same cluster tend to be close to one another
  • Within-cluster distance/scatter
Loss Function for Clustering (2)
• Consider the function
$$T = \frac{1}{2} \sum_{i=1}^{N} \sum_{i'=1}^{N} d_{ii'}$$
  • Total point scatter
• This can be divided as
$$T = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \left( \sum_{C(i')=k} d_{ii'} + \sum_{C(i') \neq k} d_{ii'} \right)$$
$$T = W(C) + B(C)$$
Loss Function for Clustering (3)
• The function $B(C)$
$$B(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i') \neq k} d_{ii'}$$
  • Between-cluster distance/scatter
• Since $T$ is constant for a given dataset, minimizing $W(C)$ is equivalent to maximizing $B(C)$
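A quick numeric check of the decomposition $T = W(C) + B(C)$ (a sketch assuming NumPy, with arbitrary data and assignments):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))          # 10 points in 2-D
C = rng.integers(0, 3, size=10)       # arbitrary assignment to K=3 clusters
d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances

T = 0.5 * d.sum()                     # total point scatter
same = C[:, None] == C[None, :]       # mask of same-cluster pairs
W = 0.5 * d[same].sum()               # within-cluster scatter
B = 0.5 * d[~same].sum()              # between-cluster scatter
print(np.isclose(T, W + B))           # True
```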
Combinatorial Clustering
• Minimize $W$ over all possible assignments of $N$ data points to $K$ clusters
• Unfortunately feasible only for very small data sets
• The number of distinct assignments is
$$S(N, K) = \frac{1}{K!} \sum_{k=1}^{K} (-1)^{K-k} \binom{K}{k} k^N$$
  • $S(10, 4) = 34{,}105$
  • $S(19, 4) \approx 10^{10}$
• Not a practical clustering algorithm
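A short sketch evaluating this count (plain Python; `math.comb` is in the standard library):

```python
from math import comb, factorial

def num_assignments(N, K):
    """Stirling number of the second kind: distinct ways to
    partition N points into K non-empty clusters."""
    return sum((-1) ** (K - k) * comb(K, k) * k ** N
               for k in range(1, K + 1)) // factorial(K)

print(num_assignments(10, 4))   # 34105
print(num_assignments(19, 4))   # ~1.1e10
```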
K-Means Clustering (1)
• Most popular iterative descent clustering method
• Suppose all variables/features are real-valued and we use squared Euclidean distance as the dissimilarity measure
$$d(\mathbf{x}_i, \mathbf{x}_{i'}) = \| \mathbf{x}_i - \mathbf{x}_{i'} \|^2$$
• The within-cluster scatter can be written as
$$W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} \| \mathbf{x}_i - \mathbf{x}_{i'} \|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \| \mathbf{x}_i - \bar{\mathbf{x}}_k \|^2$$
K-Means Clustering (2)
• Find
$$C^* = \min_{C} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \| \mathbf{x}_i - \bar{\mathbf{x}}_k \|^2$$
• Note that for a set $S$
$$\bar{\mathbf{x}}_S = \arg\min_{\mathbf{m}} \sum_{i \in S} \| \mathbf{x}_i - \mathbf{m} \|^2$$
• So find
$$C^* = \min_{C,\, \{\mathbf{m}_k\}_{k=1}^{K}} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \| \mathbf{x}_i - \mathbf{m}_k \|^2$$
K-Means Clustering (3)
• Find the optimal solution using Expectation Maximization
• Iterative procedure consisting of two steps
  • Expectation step (E step) – Fix the mean vectors $\{\mathbf{m}_k\}_{k=1}^{K}$ and find the optimal $C^*$
  • Maximization step (M step) – Fix the cluster assignments $C$ and find the optimal mean vectors $\{\mathbf{m}_k\}_{k=1}^{K}$
• Each step of this procedure reduces the loss function value
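A minimal NumPy sketch of these two alternating steps (initialization and stopping are simplified; empty clusters are not handled):

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), K, replace=False)]  # init from data points
    for _ in range(iters):
        # E step: assign each point to its nearest mean
        d = np.linalg.norm(X[:, None] - means[None, :], axis=-1)
        C = d.argmin(axis=1)
        # M step: recompute each mean from its assigned points
        # (a real implementation would handle empty clusters)
        new_means = np.array([X[C == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return C, means
```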
K-Means Clustering Illustrations (1)–(10)
[Figures: successive snapshots of the K-means iterations on a 2-D dataset]
• Blue points – Expectation step
• Red points – Maximization step
How to Choose K?
• Similar to choosing $K$ in kNN
• The loss function generally decreases with $K$, so it cannot simply be minimized over $K$; a common heuristic is to pick the "elbow" where the decrease flattens (see the sketch below)
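A sketch of that elbow heuristic (assuming scikit-learn is available; the synthetic data and the range of $K$ are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs in 2-D
X = np.vstack([rng.normal(c, 0.3, size=(50, 2))
               for c in ((0, 0), (3, 3), (0, 3))])

for k in range(1, 7):
    loss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(loss, 1))
# The loss keeps decreasing with k, but the drop flattens sharply
# after k = 3 -- the "elbow" suggests three clusters.
```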
Limitations of K-Means Clustering
• Hard assignments are susceptible to noise/outliers
• Assumes spherical (convex) clusters with a uniform prior on the clusters
• Clusters can change arbitrarily for different $K$ and initializations
K-Medoids
• K-Means is suitable only when using Euclidean distance
  • Susceptible to outliers
  • A challenge when the centroid of a cluster is not a valid data point
• Generalizes K-Means to arbitrary distance measures
  • Replace the mean calculation by a medoid calculation – the cluster member minimizing total dissimilarity to the other members (see the sketch below)
  • Ensures the cluster center is a medoid – always a valid data point
  • Increases computation, as we now have to find the medoid
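A sketch of K-medoids over a precomputed dissimilarity matrix (a minimal illustration under those assumptions, not an optimized PAM implementation):

```python
import numpy as np

def medoid(d, members):
    """Index (within `members`) of the point minimizing total
    dissimilarity to all other points in the cluster."""
    sub = d[np.ix_(members, members)]
    return members[sub.sum(axis=1).argmin()]

def k_medoids(d, K, iters=100, seed=0):
    """d: (N, N) precomputed dissimilarity matrix (any measure).
    Empty clusters are not handled in this sketch."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(len(d), K, replace=False)
    for _ in range(iters):
        C = d[:, centers].argmin(axis=1)   # assign to nearest medoid
        new = np.array([medoid(d, np.where(C == k)[0]) for k in range(K)])
        if np.array_equal(np.sort(new), np.sort(centers)):
            break
        centers = new
    return C, centers
```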
Soft K-Means as Gaussian Mixture Models (1)
• Probabilistic clusters
• Each cluster is associated with a Gaussian distribution – $\mathcal{N}(\mu_k, \Sigma_k)$
• Each cluster also has a prior probability – $\pi_k$
• Then the likelihood of a data point drawn from the $K$ clusters will be
$$P(x) = \sum_{k=1}^{K} \pi_k P(x \mid \mu_k, \Sigma_k)$$
  • where $\sum_{k=1}^{K} \pi_k = 1$
Soft K-Means as Gaussian Mixture Models (2–4)
• Given $N$ iid data points, the likelihood function $P(x_1, \ldots, x_N)$ is
$$P(x_1, \ldots, x_N) = \prod_{i=1}^{N} P(x_i) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$$
• Let us take the log likelihood
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$$
Soft K-Means as Gaussian Mixture Models (5)
• Problem with maximum likelihood
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$$
  • The sum over the components appears inside the log, thus coupling all parameters
  • Can lead to singularities (e.g., a component collapsing onto a single data point)
Soft K-Means as Gaussian Mixture Models (6)
• Latent variables
• Each data point $x_i$ is associated with a latent variable – $z_i = (z_{i1}, \ldots, z_{iK})$
  • where $z_{ik} \in \{0, 1\}$, $\sum_{k=1}^{K} z_{ik} = 1$ and $P(z_{ik} = 1) = \pi_k$
• Given the complete data $(X, Z)$, we look at maximizing $P(X, Z \mid \pi_k, \mu_k, \Sigma_k)$
Soft K-Means as Gaussian Mixture Models (7)
• Let the posterior probability $P(z_{ik} = 1 \mid x_i)$ be denoted as $\gamma(z_{ik})$
• From Bayes' theorem
$$\gamma(z_{ik}) = P(z_{ik} = 1 \mid x_i) = \frac{P(z_{ik} = 1)\, P(x_i \mid z_{ik} = 1)}{P(x_i)}$$
• The marginal distribution $P(x_i) = \sum_{z_i} P(x_i, z_i) = \sum_{k=1}^{K} P(z_{ik} = 1)\, P(x_i \mid z_{ik} = 1)$
Soft K-Means as Gaussian Mixture Models (8)
• Now, $P(z_{ik} = 1) = \pi_k$ and $P(x_i \mid z_{ik} = 1) = \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$
• Therefore
$$\gamma(z_{ik}) = P(z_{ik} = 1 \mid x_i) = \frac{\pi_k \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{k'=1}^{K} \pi_{k'} \mathcal{N}(x_i \mid \mu_{k'}, \Sigma_{k'})}$$
Estimating the Mean $\mu_k$ (1)
• Begin with the log-likelihood function
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$$
• Take the derivative with respect to $\mu_k$ and equate it to 0
Estimating the Mean $\mu_k$ (2)
$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, x_i$$
• where $N_k = \sum_{i=1}^{N} \gamma(z_{ik})$
  • Effective number of points assigned to cluster $k$
• So the mean of the $k$-th Gaussian component is the weighted mean of all the points in the dataset
  • where the weight of the $i$-th data point is the posterior probability that component $k$ was responsible for generating $x_i$
Estimating the Covariance $\Sigma_k$
• Begin with the log-likelihood function
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$$
• Taking the derivative with respect to $\Sigma_k$ and equating it to 0
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, (x_i - \mu_k)(x_i - \mu_k)^T$$
• Similar to the result for a single Gaussian fit to the dataset, but with each data point weighted by the corresponding posterior probability
Estimating the Mixing Coefficients $\pi_k$
• Begin with the log-likelihood function
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$$
• Maximize the log-likelihood with respect to $\pi_k$
  • subject to the condition that $\sum_{k=1}^{K} \pi_k = 1$
• Use a Lagrange multiplier $\lambda$ and maximize
$$\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$$
• Solving this results in $\pi_k = \frac{N_k}{N}$
Soft K-Means as Gaussian Mixture Models (9)
• In summary
  • $\pi_k = \frac{N_k}{N}$
  • $\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, x_i$
  • $\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, (x_i - \mu_k)(x_i - \mu_k)^T$
• But then what if $z_{ik}$ is unknown?
  • Use the EM algorithm!
EM for GMM
• First choose initial values for $\pi_k, \mu_k, \Sigma_k$
• Alternate between Expectation and Maximization steps
  • Expectation step (E) – Given the current parameter values, compute the posterior probabilities $\gamma(z_{ik})$
  • Maximization step (M) – Given the posterior probabilities, update $\pi_k, \mu_k, \Sigma_k$
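A compact sketch of this EM loop (a minimal illustration assuming SciPy for the Gaussian density; no convergence check or singularity safeguards):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, iters=50, seed=0):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                      # uniform mixing weights
    mu = X[rng.choice(N, K, replace=False)]       # means from random points
    Sigma = np.stack([np.cov(X.T) for _ in range(K)])
    for _ in range(iters):
        # E step: responsibilities gamma(z_ik) via Bayes' theorem
        g = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                      for k in range(K)], axis=1)
        g /= g.sum(axis=1, keepdims=True)
        # M step: weighted updates of pi_k, mu_k, Sigma_k
        Nk = g.sum(axis=0)
        pi = Nk / N
        mu = (g.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mu[k]
            Sigma[k] = (g[:, k, None] * Xc).T @ Xc / Nk[k]
    return pi, mu, Sigma, g
```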
EM for GMM Illustrations (1)–(6)
[Figures: successive snapshots of EM iterations fitting a Gaussian mixture to a 2-D dataset]
Practical Issues with EM for GMM
• Takes many more iterations than K-Means
  • Each iteration requires more computation
• Run K-Means first, and then EM for GMM
  • The covariance can be initialized to the covariance of the clusters obtained from K-Means
• EM is not guaranteed to find the global maximum of the log-likelihood function
• Check for convergence
  • The log likelihood does not change significantly between two iterations
Hierarchical Clustering (1)
• Organize clusters in a hierarchical fashion
• Produces a rooted binary tree (dendrogram)
Hierarchical Clustering (2)
• Bottom-up (agglomerative): recursively merge the two groups with the smallest between-cluster dissimilarity
• Top-down (divisive): recursively split the least coherent cluster
• Users can choose a cut through the hierarchy to represent the most natural division into clusters
Hierarchical Clustering (3)
• Both approaches share a monotonicity property
  • The dissimilarity between merged clusters increases monotonically with the level of the merger
• Cophenetic correlation coefficient
  • Correlation between the $N(N-1)/2$ pairwise observation dissimilarities and the cophenetic dissimilarities derived from the dendrogram
  • Cophenetic dissimilarity – the inter-group dissimilarity at which two observations are first joined together in the same cluster
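A short sketch of agglomerative clustering and the cophenetic correlation coefficient using SciPy (synthetic data; the linkage choice is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, size=(20, 2)) for c in ((0, 0), (4, 4))])

d = pdist(X)                       # condensed pairwise dissimilarities
Z = linkage(d, method="average")   # merge tree ("single", "complete", ...)
c, _ = cophenet(Z, d)              # cophenetic correlation coefficient
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(round(c, 3), labels)
```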
Agglomerative Clustering (1)
• Single linkage – distance between the two most similar points in $G$ and $H$
$$D_{SL}(G, H) = \min_{i \in G,\, j \in H} D(i, j)$$
• Also referred to as nearest-neighbor linkage
• Results in extended clusters through chaining
• May violate the compactness property (large diameter)
Agglomerative Clustering (2)
• Complete linkage – distance between the two most dissimilar points in $G$ and $H$
$$D_{CL}(G, H) = \max_{i \in G,\, j \in H} D(i, j)$$
• Furthest-neighbor technique
• Forces spherical clusters with consistent diameter
• May violate the closeness property
Agglomerative Clustering (3)
• Average linkage (group average) – average dissimilarity between the groups
$$D_{GA}(G, H) = \frac{1}{N_G N_H} \sum_{i \in G} \sum_{j \in H} d(i, j)$$
• Less affected by outliers
Agglomerative Clustering (4)
[Figure 14.13, Hastie et al., The Elements of Statistical Learning: dendrograms from agglomerative hierarchical clustering of human tumor microarray data, under average, complete and single linkage]

From the accompanying text: the methods work well when the observations within clusters are relatively close together (small dissimilarities) compared with observations in different clusters; to the extent this is not the case, their results differ.

Single linkage only requires that a single dissimilarity $d_{ii'}$, $i \in G$ and $i' \in H$, be small for two groups $G$ and $H$ to be considered close, irrespective of the other observation dissimilarities between the groups. It therefore tends to combine, at relatively low thresholds, observations linked by a series of close intermediate observations. This phenomenon, referred to as chaining, is often considered a defect of the method. The clusters produced by single linkage can violate the "compactness" property that all observations within each cluster tend to be similar to one another. If we define the diameter $D_G$ of a group of observations as the largest dissimilarity among its members,
$$D_G = \max_{i \in G,\, i' \in G} d_{ii'},$$
then single linkage can produce clusters with very large diameters.

Complete linkage represents the opposite extreme: two groups $G$ and $H$ are considered close only if all of the observations in their union are relatively similar. It tends to produce compact clusters with small diameters, but can violate the "closeness" property: observations assigned to a cluster can be much closer to members of other clusters than to some members of their own cluster.
Density-Based Clustering (1) (Extra Topic)
• DBSCAN – Density-Based Spatial Clustering of Applications with Noise
  • Proposed by Ester, Kriegel, Sander and Xu (KDD 1996)
  • KDD 2014 Test of Time Award winner
• Basic idea – clusters are dense regions in the data space, separated by regions of lower object density
• Discovers clusters of arbitrary shape in spatial databases with noise
Density-Based Clustering (2)
• Why density-based clustering?
[Figure: results of a k-medoid algorithm for k = 4]
Density-Based Clustering (3)
• Principle
  • For any point in a cluster, the local point density around that point has to exceed some threshold
  • The set of points from one cluster is spatially connected
• DBSCAN defines two parameters
  • $\epsilon$ – radius for the neighborhood of point $p$: $N_\epsilon(p) = \{q \in X \mid d(p, q) \leq \epsilon\}$
  • $MinPts$ – minimum number of points in the given neighborhood $N_\epsilon(p)$
$\epsilon$-Neighborhood
• $\epsilon$-neighborhood – objects within a radius of $\epsilon$ from an object: $N_\epsilon(p) = \{q \in X \mid d(p, q) \leq \epsilon\}$
• High density – the $\epsilon$-neighborhood of an object contains at least $MinPts$ objects
[Figure: points $p$ and $q$ with their $\epsilon$-neighborhoods]
Core, Border and Outlier Points
• Example: $\epsilon = 1$, $MinPts = 5$
• Given $\epsilon$ and $MinPts$, categorize objects into three exclusive groups
  • Core point – has at least $MinPts$ points within its $\epsilon$-neighborhood (interior point of a cluster)
  • Border point – has fewer than $MinPts$ points within its $\epsilon$-neighborhood, but is in the neighborhood of a core point
  • Noise/outlier – any point that is neither a core nor a border point
[Figure: core, border and outlier points illustrated]
Density-Reachability (1)
• Directly density-reachable
  • An object $q$ is directly density-reachable from object $p$ if $p$ is a core object and $q$ is in $p$'s $\epsilon$-neighborhood
  • Direct density-reachability is asymmetric
[Figure: $q$ directly density-reachable from $p$; $MinPts = 4$]
Density-Reachability (2)
• Density-reachable (directly and indirectly)
  • A point $p$ is directly density-reachable from $p_2$
  • $p_2$ is directly density-reachable from $p_1$
  • $p_1$ is directly density-reachable from $q$
  • $p \leftarrow p_2 \leftarrow p_1 \leftarrow q$ form a chain
  • $p$ is indirectly density-reachable from $q$
[Figure: chain of points $q, p_1, p_2, p$]
Density-Connectivity
• Density-reachability is not symmetric
  • Not good enough to describe clusters
• Density-connected
  • A pair of points $p$ and $q$ are density-connected if they are commonly density-reachable from a point $o$
  • Density-connectivity is symmetric
[Figure: $p$ and $q$ density-connected through $o$]
Cluster in DBSCAN
• Given a dataset $X$, parameter $\epsilon$ and threshold $MinPts$
• A cluster $C$ is a subset of objects satisfying two criteria
  • Connected: $\forall p, q \in C$, $p$ and $q$ are density-connected
  • Maximal: $\forall p, q \in X$, if $p \in C$ and $q$ is density-reachable from $p$, then $q \in C$
DBSCAN – Algorithm
• Input – dataset $X$, parameters $\epsilon$, $MinPts$
• For each object $p \in X$
  • If $p$ is a core object and not processed, then
    • $C$ = retrieve all objects density-reachable from $p$
    • Mark all objects in $C$ as processed
    • Report $C$ as a cluster
  • Else mark $p$ as an outlier
• If $p$ is a border point, no points are density-reachable from $p$, and the algorithm visits the next point in $X$
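A compact sketch of this procedure (a minimal illustration assuming NumPy; real implementations use spatial indexes for the neighborhood queries):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Returns a cluster id per point, or -1 for noise/outliers."""
    N = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    neighbors = [np.where(d[i] <= eps)[0] for i in range(N)]
    labels = np.full(N, -1)
    cluster = 0
    for p in range(N):
        if labels[p] != -1 or len(neighbors[p]) < min_pts:
            continue                    # already clustered, or not a core point
        # Grow a new cluster: everything density-reachable from p
        labels[p] = cluster
        frontier = list(neighbors[p])
        while frontier:
            q = frontier.pop()
            if labels[q] == -1:
                labels[q] = cluster     # border or core point joins the cluster
                if len(neighbors[q]) >= min_pts:
                    frontier.extend(neighbors[q])  # expand only from core points
        cluster += 1
    return labels
```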
DBSCAN Algorithm – Illustrations (1)–(3)
• $\epsilon = 2$, $MinPts = 3$
[Figures: the algorithm growing clusters step by step on a small 2-D dataset]
DBSCAN – Example (1)
• Where it works
[Figure: original points]

DBSCAN – Example (2)
• Where it does not work
  • Varying densities
[Figure: original points]
Summary
• Unsupervised learning
• K-means clustering
  • Expectation Maximization for discovering the clusters
  • K-medoids clustering
• Gaussian Mixture Models
  • Expectation Maximization for estimating the parameters of the Gaussian mixtures
• Hierarchical clustering
  • Agglomerative clustering
• Density-based clustering
  • DBSCAN