DMTM Lecture 14: Density-Based Clustering
Prof. Pier Luca Lanzi
Density Based Clustering
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
What is density-based clustering?
• Clustering based on density (local cluster criterion), such as density-connected points
• Major features:
  - Discover clusters of arbitrary shape
  - Handle noise
  - One scan
  - Need density parameters as termination condition
• Several interesting studies:
  - DBSCAN: Ester, et al. (KDD’96)
  - OPTICS: Ankerst, et al. (SIGMOD’99)
  - DENCLUE: Hinneburg & Keim (KDD’98)
  - CLIQUE: Agrawal, et al. (SIGMOD’98) (more grid-based)
DBSCAN: Basic Concepts
• The neighborhood within a radius ε of a given object is called the ε-neighborhood of the object
• If the ε-neighborhood of an object contains at least MinPts objects, then the object is a core object
• An object p is directly density-reachable from object q if p is within the ε-neighborhood of q and q is a core object
• An object p is density-reachable from object q if there is a chain of objects p1, …, pn, where p1 = q and pn = p, such that pi+1 is directly density-reachable from pi
• An object p is density-connected to q with respect to ε and MinPts if there is an object o such that both p and q are density-reachable from o
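The definitions on this slide can be made concrete with a small sketch. The Python code below is an illustration only and not part of the lecture material: the function names and the five-point data set `pts` are hypothetical, and it assumes Euclidean distance and that an object counts as a member of its own ε-neighborhood.

```python
from math import dist  # Euclidean distance, Python 3.8+

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A core object has at least min_pts objects in its eps-neighborhood."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

def directly_reachable(points, p, q, eps, min_pts):
    """True if p is directly density-reachable from q (q must be a core object)."""
    return is_core(points, q, eps, min_pts) and dist(points[p], points[q]) <= eps

def reachable_from(points, q, eps, min_pts):
    """Set of indices that are density-reachable from q."""
    reached, frontier = set(), [q]
    while frontier:
        cur = frontier.pop()
        if not is_core(points, cur, eps, min_pts):
            continue  # a chain can only be extended from a core object
        for j in eps_neighborhood(points, cur, eps):
            if j not in reached:
                reached.add(j)
                frontier.append(j)
    return reached

def density_connected(points, p, q, eps, min_pts):
    """True if some object o exists from which both p and q are density-reachable."""
    return any(
        p in r and q in r
        for r in (reachable_from(points, o, eps, min_pts) for o in range(len(points)))
    )

# Hypothetical data: four points on a line plus one far-away point.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]

print(directly_reachable(pts, 0, 1, eps=1.0, min_pts=3))  # True: point 1 is a core object
print(directly_reachable(pts, 1, 0, eps=1.0, min_pts=3))  # False: point 0 is not a core object
print(density_connected(pts, 0, 3, eps=1.0, min_pts=3))   # True: both are reachable from point 1
```

The second call shows that direct density-reachability is not symmetric: point 0 lies in the neighborhood of the core object 1, but point 0 has too few neighbors to be a core object itself.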
DBSCAN: Basic Concepts
• Density = number of points within a specified radius (Eps)
• A core point has at least MinPts points within Eps
• A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point
• A noise point is any point that is neither a core point nor a border point
• A density-based cluster is a set of density-connected objects that is maximal with respect to density-reachability
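As a rough illustration of the three point types, the sketch below labels each point of a tiny data set as core, border, or noise. It is not taken from the lecture: `classify_points` and the sample data are hypothetical, and it assumes Euclidean distance, a neighborhood that includes the point itself, and a core point threshold of at least MinPts neighbors.

```python
from math import dist  # Euclidean distance, Python 3.8+

def classify_points(points, eps, min_pts):
    """Return 'core', 'border', or 'noise' for every point."""
    n = len(points)
    # Neighborhood of each point within radius eps (the point itself included).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors[i]):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical data: a unit square, one point attached to it, one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
point_types = classify_points(sample, eps=1.0, min_pts=3)
print(point_types)
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here (2, 1) has only two points within Eps, so it is not a core point, but it lies in the neighborhood of the core point (1, 1) and is therefore a border point.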
Density-Reachable & Density-Connected
• Directly density-reachable
• Density-reachable
• Density-connected
[Figure: diagrams of the three relations on points labelled p, q, p1, and o; MinPts = 5, Eps = 1 cm]
DBSCAN: Core, Border, and Noise Points
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
• The algorithm:
  - Arbitrarily select a point p
  - Retrieve all points density-reachable from p given Eps and MinPts
  - If p is a core point, a cluster is formed
  - If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database
  - Continue the process until all of the points have been processed
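The algorithm outlined on this slide can be sketched in a few lines of Python. This is a minimal, quadratic-time illustration under stated assumptions, not the reference implementation from the DBSCAN paper or from any library: the names `dbscan`, `region_query`, and `NOISE` are chosen here for illustration, distance is Euclidean, a point belongs to its own neighborhood, and points are visited in list order rather than in a truly arbitrary order.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used for points that end up in no cluster

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region_query(i):
        # All points within eps of points[i]; a linear scan, so O(n) per query.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for p in range(n):
        if labels[p] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(p)
        if len(neighbors) < min_pts:
            labels[p] = NOISE  # may be relabelled later as a border point
            continue
        # p is a core point: start a new cluster and collect everything
        # that is density-reachable from p.
        labels[p] = cluster_id
        seeds = list(neighbors)
        while seeds:
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cluster_id  # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster_id
            q_neighbors = region_query(q)
            if len(q_neighbors) >= min_pts:
                seeds.extend(q_neighbors)  # q is a core point too, keep expanding
        cluster_id += 1
    return labels

# Hypothetical data: two compact groups and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (5, 5)]
result = dbscan(data, eps=1.5, min_pts=3)
print(result)
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Practical implementations usually replace the linear scan in `region_query` with a spatial index, which is what makes the neighborhood lookups efficient on larger data sets.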
Core, Border and Noise Points
Eps = 10, MinPts = 4
[Figure: two panels, "Original Points" and "Point types: core, border and noise"]
When DBSCAN Works Well
• Resistant to noise
• Can handle clusters of different shapes and sizes
[Figure: two panels, "Original Points" and "Clusters"]
When DBSCAN May Fail?
• Varying densities
• High-dimensional data
[Figure: "Original Points" and the clusterings obtained with (MinPts=4, Eps=9.75) and (MinPts=4, Eps=9.92)]
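The varying-densities problem can be reproduced on a toy example. The sketch below is an assumption-laden illustration rather than the data shown on the slide: it uses a compact, hypothetical one-dimensional DBSCAN (`dbscan_1d`) and made-up integer coordinates with two dense groups close to each other and one sparse group far away.

```python
def dbscan_1d(xs, eps, min_pts):
    """Compact DBSCAN for 1-D data; returns one label per point, -1 for noise."""
    n = len(xs)
    neighbors = [[j for j in range(n) if abs(xs[i] - xs[j]) <= eps] for i in range(n)]
    labels = [-1] * n
    cluster_id = 0
    for p in range(n):
        if labels[p] != -1 or len(neighbors[p]) < min_pts:
            continue  # already clustered, or not a core point
        labels[p] = cluster_id
        stack = [p]  # only core points are ever pushed on the stack
        while stack:
            q = stack.pop()
            for j in neighbors[q]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    if len(neighbors[j]) >= min_pts:
                        stack.append(j)
        cluster_id += 1
    return labels

def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    return len(set(labels) - {-1}), labels.count(-1)

dense_a = list(range(0, 10))         # spacing 1
dense_b = list(range(17, 27))        # spacing 1, gap of 8 after dense_a
sparse_c = list(range(100, 150, 10)) # spacing 10
xs = dense_a + dense_b + sparse_c

small_eps = summarize(dbscan_1d(xs, eps=3, min_pts=3))
large_eps = summarize(dbscan_1d(xs, eps=10, min_pts=3))
print(small_eps)  # (2, 5): both dense groups found, the sparse group is all noise
print(large_eps)  # (2, 0): the sparse group is found, but the dense groups merge
```

In this toy data, separating the two dense groups needs Eps below 8 while recognizing the sparse group needs Eps of at least 10, so no single Eps value recovers all three groups.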
Run the Python notebook on density-based clustering
Examples using R
Density-Based Clustering in R
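library(fpc)  # provides dbscan()
set.seed(665544)  # make the random data reproducible
n <- 600
# 10 random cluster centers in [0, 10] x [0, 10], recycled over the n points,
# each perturbed by Gaussian noise with standard deviation 0.2
x <- cbind(runif(10, 0, 10)+rnorm(n, sd=0.2), runif(10, 0, 10)+rnorm(n,sd=0.2))
par(bg="grey40")
# eps = 0.2; MinPts is left at the function's default
ds <- dbscan(x, 0.2, showplot=1)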
Density-Based Clustering in R
library(fpc)
set.seed(665544)
# 63 x values; x and y are recycled 10 times below, giving 630 noisy points
x <- seq(0,6.28,0.1)
y <- sin(x)
xd <- x+rnorm(630,sd=0.2)
yd <- y+rnorm(630,sd=0.2)
plot(xd,yd)
par(bg="grey40")
d <- cbind(xd,yd)
# this works nicely since the epsilon is
# the same size as the standard deviation (0.2)
# used to generate the data
ds <- dbscan(d, 0.2, showplot=1)
# this does not work so nicely: the epsilon (0.1) is
# smaller than the standard deviation of the noise
ds <- dbscan(d, 0.1, showplot=1)
Clustering Comparisons on Sin Data
[Figure: two panels, "hierarchical clustering" and "kmeans clustering"]
Clustering Comparisons on Sin Data (k-means with 10 clusters)
Software Packages
http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Density-Based_Clustering