3.4 density and grid methods
TRANSCRIPT
![Page 1: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/1.jpg)
Clustering: Density and Grid Based
![Page 2: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/2.jpg)
Density based methods
- Clusters: dense regions of objects
- Low density regions: noise
- DBSCAN: Density Based Spatial Clustering of Applications with Noise
- OPTICS: Ordering Points To Identify the Clustering Structure
- DENCLUE: DENsity Based CLUstEring
![Page 3: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/3.jpg)
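DBSCAN
- Cluster: a maximal set of density-connected points
- Grows regions with sufficiently high density into clusters
- ε-neighborhood: the objects within a radius ε of an object
- MinPts and core object: an object is a core object if its ε-neighborhood contains at least MinPts objects
- Directly density reachable: an object p is directly density reachable from object q if p is within the ε-neighborhood of q and q is a core object
[Figure: points p and q, with MinPts = 5]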
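The definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the brute-force neighborhood search, and the convention that a point counts as a member of its own neighborhood are all assumptions made for the example.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A point is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(region_query(points, i, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if points[p] is directly density reachable from points[q]."""
    return is_core(points, q, eps, min_pts) and math.dist(points[p], points[q]) <= eps

# Hypothetical data: five points packed together, one on the edge, one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (0.05, 0.05), (0.3, 0.05), (2.0, 2.0)]

print(directly_density_reachable(points, 5, 4, eps=0.3, min_pts=5))  # True
print(directly_density_reachable(points, 4, 5, eps=0.3, min_pts=5))  # False
```

The two calls show that the relation is not symmetric: the edge point lies in the neighborhood of a core object, but is not a core object itself.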
![Page 4: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/4.jpg)
DBSCAN Density Reachable
An object p is density reachable from q if there is a chain of objects p_1, …, p_n with p_1 = q and p_n = p such that p_{i+1} is directly density reachable from p_i.
[Figure: points q, p1, and p]
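Density reachability is the transitive closure of direct density reachability, so it can be computed with a breadth-first search that only expands core objects. The sketch below assumes a brute-force neighborhood search and counts a point as part of its own neighborhood; the function names are illustrative.

```python
import math
from collections import deque

def neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def density_reachable_from(points, q, eps, min_pts):
    """Set of indices density reachable from points[q].

    The set is empty when points[q] is not a core object, because no chain
    can start from it.
    """
    reached = set()
    queue = deque([q])
    while queue:
        cur = queue.popleft()
        nbrs = neighbors(points, cur, eps)
        if len(nbrs) < min_pts:
            continue  # not a core object: the chain cannot continue through it
        for j in nbrs:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

# Hypothetical data: a chain of five points plus one isolated point.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (10, 0)]
print(density_reachable_from(chain, 1, eps=1.0, min_pts=3))  # {0, 1, 2, 3, 4}
print(density_reachable_from(chain, 0, eps=1.0, min_pts=3))  # set()
```

The end points of the chain are reachable from the interior core objects, but nothing is reachable from them.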
![Page 5: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/5.jpg)
DBSCAN Density Connected
An object p is density connected to object q if there is an object o such that both p and q are density reachable from o.
[Figure: points p, q, and o]
![Page 6: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/6.jpg)
DBSCAN
- Arbitrarily select a point p.
- Retrieve all points density-reachable from p.
- If p is a core point, a cluster is formed.
- If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
- Continue the process until all of the points have been processed.
- Complexity: O(n log n) with a spatial index, O(n²) without one
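The procedure above can be sketched as follows. This is a minimal version under stated assumptions: neighborhoods are found by brute force (so it runs in O(n²), with no spatial index), a point counts as a member of its own neighborhood, and the label `-1` for noise is a convention chosen for the example.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks points in no cluster."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)      # None means "not visited yet"
    cluster = -1
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        seeds = neighbors(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE          # may be relabeled later as a border point
            continue
        cluster += 1                   # p is a core point: start a new cluster
        labels[p] = cluster
        queue = list(seeds)
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster    # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is also core: keep growing
    return labels

# Hypothetical data: a chain of five points plus one isolated point.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (10, 0)]
print(dbscan(chain, eps=1.0, min_pts=3))  # [0, 0, 0, 0, 0, -1]
```

The first point is initially marked as noise, then absorbed into the cluster as a border point once a neighboring core point is expanded.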
![Page 7: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/7.jpg)
OPTICS: A Cluster-Ordering Method
- OPTICS: Ordering Points To Identify the Clustering Structure
- Produces a special order of the database with respect to its density-based clustering structure
- Good for both automatic and interactive cluster analysis, including finding intrinsic clustering structure
- Can be represented graphically or using visualization techniques
![Page 8: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/8.jpg)
OPTICS
- In DBSCAN, for a constant MinPts value, density-based clusters with respect to a higher density (a lower value of ε) are completely contained in clusters with respect to a lower density.
- DBSCAN is extended so that objects are processed in a specific order.
  - Selects an object that is density-reachable with respect to the lowest ε value
- Core distance of an object p: the smallest ε' value that makes p a core object (undefined if p is not a core object with respect to ε)
- Reachability distance of an object q with respect to p: max(core distance of p, d(p, q)) (undefined if p is not a core object)
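The two distances can be computed directly from their definitions. The sketch below is illustrative only: it uses a brute-force scan, counts p as a member of its own neighborhood, and returns `None` for the undefined case. Those conventions and the function names are assumptions.

```python
import math

def core_distance(points, p, eps, min_pts):
    """Smallest radius that makes points[p] a core object, or None if undefined."""
    # Sorted distances from p to every point; the first entry is 0.0 (p itself).
    dists = sorted(math.dist(points[p], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None  # p is not a core object within eps
    return dists[min_pts - 1]

def reachability_distance(points, q, p, eps, min_pts):
    """Reachability distance of points[q] with respect to points[p], or None."""
    cd = core_distance(points, p, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[p], points[q]))

# Hypothetical one-dimensional data embedded in the plane.
pts = [(0, 0), (1, 0), (2, 0), (5, 0)]
print(core_distance(pts, 0, eps=3.0, min_pts=3))             # 2.0
print(reachability_distance(pts, 1, 0, eps=3.0, min_pts=3))  # 2.0
print(core_distance(pts, 3, eps=3.0, min_pts=3))             # None
```

The second call shows why the maximum is taken: the point at distance 1 is closer than the core distance, so its reachability distance is raised to the core distance.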
![Page 9: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/9.jpg)
OPTICS
Complexity: O(n log n) with a spatial index
![Page 10: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/10.jpg)
OPTICS
[Figure: reachability plot. Axes: reachability-distance (with "undefined" marked for objects that have none) against the cluster-order of the objects]
![Page 11: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/11.jpg)
DENCLUE: using density functions
- DENsity-based CLUstEring
- Major features
  - Solid mathematical foundation
  - Good for data sets with large amounts of noise
  - Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
  - Significantly faster than existing algorithms (faster than DBSCAN by a factor of up to 45)
  - But needs a large number of parameters
![Page 12: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/12.jpg)
DENCLUE
- Influence function: describes the impact of a data point within its neighborhood.
  - x, y: objects in F^d, a d-dimensional input space
  - Influence of object y on x is:
$$f_B^y(x) = f_B(x, y)$$
  - Can be determined by distance, for example with a square wave or a Gaussian influence function:
$$f_{\text{square}}(x, y) = \begin{cases} 0 & \text{if } d(x, y) > \sigma \\ 1 & \text{otherwise} \end{cases}$$
$$f_{\text{Gaussian}}(x, y) = e^{-\frac{d(x, y)^2}{2\sigma^2}}$$
- Overall density of the data space can be calculated as the sum of the influence functions of all data points.
- Clusters can be determined mathematically by identifying density attractors. Density attractors are local maxima of the overall density function.
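The influence functions and the overall density translate almost directly into code. The sketch below assumes Euclidean distance for d(x, y) and a single width parameter `sigma`; the function names are illustrative.

```python
import math

def square_influence(x, y, sigma):
    """Square wave influence: 1 inside radius sigma, 0 outside."""
    return 0.0 if math.dist(x, y) > sigma else 1.0

def gaussian_influence(x, y, sigma):
    """Gaussian influence: decays smoothly with the squared distance."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma, influence=gaussian_influence):
    """Overall density at x: the sum of the influences of all data points."""
    return sum(influence(x, y, sigma) for y in data)

# Hypothetical data: three points close together and one far away.
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0)]
near = density((0.1, 0.1), data, sigma=0.5)
far = density((2.5, 2.5), data, sigma=0.5)
print(near > far)  # True
```

Passing `influence=square_influence` to `density` gives a plain count of the data points within `sigma` of x.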
![Page 13: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/13.jpg)
DENCLUE
- Density attractor: a local maximum of the overall density function
- A point x is said to be density attracted to a density attractor x* if there exists a sequence of points x_0, x_1, …, x_k such that x_0 = x, x_k = x*, and the gradient of the density function at x_{i-1} points in the direction of x_i
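One way to find the density attractor of a point is gradient hill climbing on the overall density. The sketch below assumes a Gaussian influence function with width `sigma`, a fixed step length, and a stopping rule of "stop when the density no longer increases"; the step size, iteration cap, and function names are choices made for the example.

```python
import math

def density(x, data, sigma):
    """Overall density at x under a Gaussian influence function."""
    return sum(math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2)) for y in data)

def gradient(x, data, sigma):
    """Gradient of the Gaussian density at x."""
    grad = [0.0] * len(x)
    for y in data:
        w = math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2)) / sigma ** 2
        for k in range(len(x)):
            grad[k] += w * (y[k] - x[k])
    return grad

def climb_to_attractor(x, data, sigma, step=0.05, max_iter=1000):
    """Follow the gradient from x in fixed-length steps until density stops rising."""
    x = list(x)
    for _ in range(max_iter):
        g = gradient(x, data, sigma)
        norm = math.hypot(*g)
        if norm == 0.0:
            break
        nxt = [xk + step * gk / norm for xk, gk in zip(x, g)]
        if density(nxt, data, sigma) <= density(x, data, sigma):
            break  # x is, up to the step length, a local maximum
        x = nxt
    return tuple(x)

# Hypothetical data: two small groups of points.
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
print(climb_to_attractor((1.0, 1.0), data, sigma=0.5))
print(climb_to_attractor((4.0, 4.0), data, sigma=0.5))
```

Points whose climbs end at the same attractor belong to the same center-defined cluster; the returned position is only accurate to about one step length.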
![Page 14: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/14.jpg)
DENCLUE
- Center-defined cluster
  - For a density attractor x*: the subset of points that are density attracted by x*, where the density function at x* is no less than a threshold ξ
  - Other points are outliers
- Arbitrary-shape cluster
  - A set of density attractors and a set of center-defined clusters (Cs)
  - There should be a path from each density attractor to another along which the density function value at each point is no less than ξ
![Page 15: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/15.jpg)
DENCLUE
![Page 16: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/16.jpg)
Grid Based Methods
- Uses a multi-resolution grid data structure
- Quantizes space into a finite number of cells that form a grid structure
- Fast processing time
- STING
- WaveCluster
- CLIQUE: CLustering In QUEst
![Page 17: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/17.jpg)
STING
- STatistical Information Grid
- Spatial area is divided into rectangular cells
- Several levels of cells, at different levels of resolution
- A high-level cell is partitioned into several lower-level cells
- Statistical attributes are stored in each cell
  - Mean, maximum, minimum
![Page 18: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/18.jpg)
STING
![Page 19: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/19.jpg)
STING
- Parameters of higher-level cells are computed from those at lower levels
- To answer queries
  - Identify the level at which to start
  - Estimate each cell's relevance to the query
  - Process relevant cells at lower levels
  - Continue to the lowest level
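The bottom-up computation of cell parameters can be illustrated with a small sketch. It assumes each cell stores a count alongside the mean, minimum, and maximum (the count is needed to weight the means), and that every cell is non-empty; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CellStats:
    count: int
    mean: float
    minimum: float
    maximum: float

def leaf_stats(values):
    """Statistics of a lowest-level cell, computed from its raw values."""
    return CellStats(len(values), sum(values) / len(values), min(values), max(values))

def merge(children):
    """Statistics of a higher-level cell, computed only from its child cells."""
    n = sum(c.count for c in children)
    mean = sum(c.count * c.mean for c in children) / n  # count-weighted mean
    return CellStats(n, mean,
                     min(c.minimum for c in children),
                     max(c.maximum for c in children))

# Hypothetical attribute values falling into two neighboring leaf cells.
left = leaf_stats([1.0, 2.0, 3.0])
right = leaf_stats([10.0, 20.0])
print(merge([left, right]))
```

Because `merge` never touches the raw values, the hierarchy can be built in one pass over the data and then reused for every query.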
![Page 20: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/20.jpg)
STING
- Computation is query independent
- Parallel processing is supported
- Data is processed in a single pass
- Quality depends on granularity
![Page 21: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/21.jpg)
WaveCluster
- A multi-resolution clustering approach which applies a wavelet transform to the feature space
  - A wavelet transform is a signal processing technique that decomposes a signal into different frequency sub-bands.
- Both grid-based and density-based
- Input parameters:
  - Number of grid cells for each dimension
  - The wavelet, and the number of applications of the wavelet transform
![Page 22: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/22.jpg)
WaveCluster
- Using wavelet transform to find clusters
  - Summarises the data by imposing a multidimensional grid structure onto the data space
  - These multidimensional spatial data objects are represented in an n-dimensional feature space
  - Apply the wavelet transform on the feature space to find the dense regions in the feature space
  - Apply the wavelet transform multiple times, which results in clusters at different scales from fine to coarse
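The steps above can be illustrated in one dimension. This is a deliberately simplified sketch, not the WaveCluster algorithm as published: it assumes a single feature, an even number of grid cells, one level of the Haar wavelet, and a hand-picked density threshold. All function names are hypothetical.

```python
import math

def quantize(values, lo, hi, n_cells):
    """Count how many values fall into each of n_cells equal-width cells."""
    counts = [0] * n_cells
    width = (hi - lo) / n_cells
    for v in values:
        idx = min(int((v - lo) / width), n_cells - 1)
        counts[idx] += 1
    return counts

def haar_step(signal):
    """One level of the 1-D Haar transform: (approximation, detail) sub-bands."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in pairs]
    return approx, detail

def dense_runs(approx, threshold):
    """Maximal runs of adjacent cells whose coefficient is at least threshold."""
    runs, start = [], None
    for i, a in enumerate(approx):
        if a >= threshold and start is None:
            start = i
        elif a < threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(approx) - 1))
    return runs

# Hypothetical data: two groups of values and one stray value at 4.5.
values = [1.1, 1.2, 1.5, 1.8, 2.1, 2.2, 4.5, 6.9, 7.1, 7.2, 7.5, 7.9]
counts = quantize(values, lo=0.0, hi=8.0, n_cells=8)
approx, _ = haar_step(counts)
print(counts)                    # [0, 4, 2, 0, 1, 0, 1, 4]
print(dense_runs(approx, 1.0))   # [(0, 1), (3, 3)]
```

The approximation sub-band is a smoothed, half-resolution view of the grid: the stray value falls below the threshold and is dropped, while each group survives as a run of dense cells. Applying `haar_step` again to `approx` would give a coarser view.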
![Page 23: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/23.jpg)
Quantization
![Page 24: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/24.jpg)
Transformation
![Page 25: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/25.jpg)
WaveCluster
- Reasons for using wavelet transformation in clustering
  - Unsupervised clustering: it uses filters to emphasize regions where points cluster, while simultaneously suppressing weaker information at their boundaries
  - Effective removal of outliers
  - Multi-resolution
  - Cost efficiency
- Major features:
  - Complexity O(N)
  - Detects arbitrarily shaped clusters at different scales
  - Not sensitive to noise, not sensitive to input order
  - Only applicable to low-dimensional data
![Page 26: 3.4 density and grid methods](https://reader038.vdocuments.net/reader038/viewer/2022110314/55cb304fbb61ebb4248b47aa/html5/thumbnails/26.jpg)
CLIQUE (Clustering In QUEst)
- Automatically identifies subspaces of a high-dimensional data space that allow better clustering than the original space
- CLIQUE can be considered as both density-based and grid-based
  - It partitions each dimension into the same number of equal-length intervals
  - It partitions an m-dimensional data space into non-overlapping rectangular units
  - A unit is dense if the fraction of total data points contained in the unit exceeds the input model parameter
  - A cluster is a maximal set of connected dense units within a subspace
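The dense-unit test can be sketched as below. The example only finds the dense units of one chosen subspace; it does not perform CLIQUE's bottom-up subspace search or the grouping of connected units into clusters. The parameter names (`n_intervals` for the number of intervals per dimension, `tau` for the density threshold) and the assumption that all values lie in [lo, hi] are choices made for the illustration.

```python
from collections import Counter

def dense_units(points, dims, n_intervals, tau, lo=0.0, hi=1.0):
    """Dense units of the subspace spanned by the dimensions in `dims`.

    Each unit is a tuple of interval indices, one per dimension in `dims`.
    A unit is dense if it holds more than a fraction tau of all points.
    """
    width = (hi - lo) / n_intervals
    counts = Counter()
    for p in points:
        unit = tuple(min(int((p[d] - lo) / width), n_intervals - 1) for d in dims)
        counts[unit] += 1
    return {u for u, c in counts.items() if c / len(points) > tau}

# Hypothetical 3-D data: tightly grouped in dimensions 0 and 1, spread out in 2.
points = [(0.10, 0.10, 0.05), (0.15, 0.12, 0.35), (0.12, 0.18, 0.65),
          (0.18, 0.15, 0.95), (0.90, 0.50, 0.50)]
print(dense_units(points, dims=(0, 1), n_intervals=4, tau=0.5))  # {(0, 0)}
print(dense_units(points, dims=(2,), n_intervals=4, tau=0.5))    # set()
```

The subspace of dimensions 0 and 1 contains a dense unit while dimension 2 alone does not, which is the kind of subspace structure CLIQUE is meant to surface.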