Spatial Data Mining
2
Introduction
• Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets
  – E.g. co-location patterns of water pumps and cholera
  – Determining hotspots: unusual locations
• Spatial Data Mining Tasks
  – Classification/Prediction
  – Co-location Mining
  – Clustering
• Recap of special properties of Spatial Data
  – Spatial autocorrelation
  – Spatial heterogeneity
  – Implicit Spatial Relations
3
Spatial Relations
• Spatial databases do not store spatial relations explicitly
  – Additional functionality required to compute them
• Three types of spatial relations specified by the OGC reference model
  – Distance relations
    • Euclidean distance between two spatial features
  – Direction relations
    • Ordering of spatial features in space
  – Topological relations
    • Characterise the type of intersection between spatial features
4
Distance relations
• If dist is a distance function and c is some real number, three distance relations are possible
  1. dist(A,B) > c
  2. dist(A,B) < c
  3. dist(A,B) = c
[Figure: three sketches of features A and B]
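The three distance relations can be illustrated with a minimal Python sketch. It assumes each feature is reduced to a single point and that dist is the Euclidean distance; the function name distance_relation and the tolerance tol are illustrative choices, not part of the OGC model.

```python
import math

def distance_relation(a, b, c, tol=1e-9):
    """Return which of the three relations holds between dist(A,B) and c.

    a, b: (x, y) points standing in for the two features (an assumption).
    c: the real-valued threshold.
    tol: hypothetical tolerance used when testing for equality.
    """
    d = math.dist(a, b)  # Euclidean distance between the two points
    if math.isclose(d, c, abs_tol=tol):
        return "dist(A,B) = c"
    return "dist(A,B) > c" if d > c else "dist(A,B) < c"

# Points (0, 0) and (3, 4) are exactly 5 units apart.
print(distance_relation((0, 0), (3, 4), 5))
print(distance_relation((0, 0), (3, 4), 4))
```

Equality is tested with a tolerance because computed distances are floating-point values.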
5
Direction relations
• If directions of B and C are required with respect to A
• Define a representative point, rep(A)
• rep(A) defines the origin of a virtual coordinate system
• The quadrants and half planes define the direction relations
• B can have two values {northeast, east}
• Exact direction relation is northeast
[Figure: features A, B and C with rep(A) marked; C north A, B northeast A]
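A minimal sketch of the quadrant and half-plane idea follows. It assumes the other feature is also reduced to one representative point, and it treats points lying exactly on an axis through rep(A) as north, south, east or west; the function name direction is hypothetical.

```python
def direction(rep_a, p):
    """Direction of point p relative to the origin rep_a.

    rep_a, p: (x, y) tuples. Points strictly inside a quadrant get a
    compound direction such as "northeast"; points on an axis get a
    single cardinal direction.
    """
    dx = p[0] - rep_a[0]
    dy = p[1] - rep_a[1]
    north_south = "north" if dy > 0 else "south" if dy < 0 else ""
    east_west = "east" if dx > 0 else "west" if dx < 0 else ""
    return (north_south + east_west) or "same position"

rep_a = (0, 0)
print(direction(rep_a, (0, 5)))  # a point straight above rep(A)
print(direction(rep_a, (4, 3)))  # a point in the upper-right quadrant
```

An extended feature that spans more than one quadrant or half plane would yield several candidate values, which is why a single exact direction has to be chosen.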
6
Topological Relations
• Topological relations describe how geometries intersect spatially
• Simple geometry types
  – Point, 0-dimension
  – Line, 1-dimension
  – Polygon, 2-dimension
• Each geometry represented in terms of
  – boundary (B): geometry of the lower dimension
  – interior (I): points of the geometry when boundary is removed
  – exterior (E): points not in the interior or boundary
• Examples for simple geometries
  – For a point, I = {point}, B = {} and E = {points not in I and B}
  – For a line, I = {points except boundary points}, B = {two end points} and E = {points not in I and B}
  – For a polygon, I = {points within the boundary}, B = {the boundary} and E = {points not in I and B}
7
DE-9IM
• Topological relations are defined using any one of the following models
  – 4IM, four intersection model (only B and I considered)
  – 9IM, nine intersection model (B, I, and E)
  – DE-9IM, dimensionally extended 9 intersection model
• DE-9IM is an OGC compliant model
• Dim is the dimension function applied to each intersection
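For reference, the standard DE-9IM matrix of two geometries A and B can be written with the dimension function as below. Rows refer to A and columns to B. An empty intersection is conventionally given dimension -1, which is shown as F in pattern strings.

```latex
\mathrm{DE9IM}(A,B) =
\begin{bmatrix}
\dim(I(A)\cap I(B)) & \dim(I(A)\cap B(B)) & \dim(I(A)\cap E(B)) \\
\dim(B(A)\cap I(B)) & \dim(B(A)\cap B(B)) & \dim(B(A)\cap E(B)) \\
\dim(E(A)\cap I(B)) & \dim(E(A)\cap B(B)) & \dim(E(A)\cap E(B))
\end{bmatrix}
```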
8
Example
• Consider two polygons
  – A - POLYGON ((10 10, 15 0, 25 0, 30 10, 25 20, 15 20, 10 10))
  – B - POLYGON ((20 10, 30 0, 40 10, 30 20, 20 10))
9
9-Intersection Matrix of example geometries
[Figure: 3 x 3 matrix with rows I(A), B(A), E(A) and columns I(B), B(B), E(B)]
10
DE-9IM for the example geometries
        I(B)  B(B)  E(B)
I(A)     2     1     2
B(A)     1     0     1
E(A)     2     1     2
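The matrix is usually exchanged as a nine-character string read row by row. A minimal sketch of that flattening follows; it assumes an empty intersection is stored as -1 and written as F, and the function name to_de9im_string is hypothetical.

```python
def to_de9im_string(matrix):
    """Flatten a 3 x 3 DE-9IM matrix row by row into a 9-character string.

    matrix: rows I(A), B(A), E(A); columns I(B), B(B), E(B).
    A negative entry stands for an empty intersection and becomes "F".
    """
    return "".join("F" if value < 0 else str(value)
                   for row in matrix for value in row)

# Values taken from the table for the example geometries.
example = [
    [2, 1, 2],  # I(A)
    [1, 0, 1],  # B(A)
    [2, 1, 2],  # E(A)
]
print(to_de9im_string(example))  # prints 212101212
```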
11
Relationships using DE-9IM
• Different geometries may give rise to different numbers in the DE-9IM
• For a specific type of relationship we are only interested in certain values in certain positions
  – That is, we are interested in patterns in the matrix rather than actual values
• Actual values are replaced by wild cards
  – T: value is "true" - non empty - any dimension >= 0
  – F: value is "false" - empty - dimension < 0
  – *: Don't care what the value is
  – 0: value is exactly zero
  – 1: value is exactly one
  – 2: value is exactly two
A overlaps B
I(B) B(B) E(B)
I(A) T * T
B(A) * * *
E(A) T * *
12
Topological Relations
• x.Disjoint(y)
  – FF*FF****
• x.Touches(y)
  – FT******* Area/Area, Line/Line, Line/Area, Point/Area
  – F**T***** Not Point/Point
  – F***T****
• x.Crosses(y)
  – T*T****** Point/Line, Point/Area, Line/Area
  – 0******** Line/Line
• x.Within(y)
  – T*F**F***
• x.Overlaps(y)
  – T*T***T** Point/Point, Area/Area
  – 1*T***T** Line/Line
• DE-9IM string for example geometries was ‘212101212’ (from earlier slide)
  – The string matches the Crosses pattern T*T******, but that pattern is listed only for Point/Line, Point/Area, Line/Area
  – A overlaps B (T*T***T**, Area/Area)
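Checking a DE-9IM string against these wild-card patterns is a character-by-character comparison. The sketch below is a minimal illustration: the names matches, PATTERNS and related are hypothetical, only some of the patterns listed above are included, and the code checks patterns only, not which geometry-type combinations a pattern applies to.

```python
def matches(de9im, pattern):
    """True if a 9-character DE-9IM string satisfies a wild-card pattern."""
    for value, wanted in zip(de9im, pattern):
        if wanted == "*":
            continue  # don't care
        if wanted == "T":
            if value not in ("0", "1", "2"):  # non-empty: any dimension >= 0
                return False
        elif value != wanted:  # "F", "0", "1" or "2" must match exactly
            return False
    return True

# A subset of the patterns listed above (illustrative, not exhaustive).
PATTERNS = {
    "disjoint": ["FF*FF****"],
    "touches": ["FT*******", "F**T*****", "F***T****"],
    "crosses": ["T*T******"],
    "overlaps": ["T*T***T**"],
}

def related(de9im, name):
    """True if the string matches any pattern stored under the given name."""
    return any(matches(de9im, p) for p in PATTERNS[name])

example = "212101212"  # DE-9IM string of the example geometries
for name in PATTERNS:
    print(name, related(example, name))
```

For the example string, the disjoint and touches patterns fail because the first position is 2 rather than F, while the crosses and overlaps patterns both match.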
13
Approaches to Spatial Data Mining
• Materialize spatial features and use Weka
  – Required features are added as additional attributes to the main feature
  – To create a flat file of data
• Use special data mining techniques that take spatial dependency into account
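The first approach can be sketched in a few lines of Python. Everything in the example is hypothetical: the two themes (houses and pumps), their coordinates, the attribute names and the nearness threshold are made up for illustration, and features are reduced to points. The output is CSV text, a flat format that Weka can load.

```python
import csv
import io
import math

# Hypothetical themes: the main feature (houses) and a second theme (pumps).
houses = {"h1": (0.0, 0.0), "h2": (10.0, 0.0)}
pumps = {"p1": (3.0, 4.0), "p2": (10.0, 1.0)}

def materialize(main, other, near):
    """Add spatial relations to each main feature as ordinary attributes."""
    rows = []
    for name, position in main.items():
        nearest = min(math.dist(position, q) for q in other.values())
        rows.append({
            "id": name,
            "dist_to_nearest_pump": round(nearest, 3),  # distance relation
            "pump_nearby": nearest < near,              # dist(A,B) < c
        })
    return rows

def to_csv(rows):
    """Serialise the materialized rows as a flat CSV table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = materialize(houses, pumps, near=2.0)
print(to_csv(rows))
```

Once the relations are stored as plain columns, a non-spatial mining tool can use them like any other attribute.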
14
Materializing features- Example
15
Materializing features- Example (2)
16
Spatial Data Mining Architecture
• Retrieve data belonging to multiple themes
• Preprocess spatial data to materialize spatial features
  – Select the required features
  – Use the methods to compute spatial relations to create a flat file of data
• Use a Weka-like tool to perform data mining
[Figure: architecture diagram with components labelled Multiple Themes, OGC Compliant Spatial DBMS, Feature Selection & OGC compliant methods to compute relations, Flat File, Weka]
17
Spatial Clustering
• Also called spatial segmentation
• Input
  – A table of area names and their corresponding attributes such as population density, number of adult illiterates etc.
  – Information about the neighbourhood relationships among the areas
  – A list of categories/classes of the attributes
• Output
  – Grouped (segmented) areas where each group has areas with similar attribute values
• Census Website has plenty of examples
  – http://www.statistics.gov.uk/census2001/censusmaps/index.html
18
Similarity with image segmentation
• Spatial segmentation is performed in image processing
  – Identify regions (areas) of an image that have similar colour (or other image attributes).
  – Many image segmentation techniques are available
    • E.g. region-growing technique
[Figure: image grid segmented into two regions, with cells labelled 1 and 2]
19
Region Growing Technique
• There are many flavours of this technique
• One of them is described below:
  – Assign seed areas to each of the segments (classes of the attribute)
  – Add neighbouring areas to these segments if the incoming areas have similar values of attributes
  – Repeat the above step until all the regions are allocated to one of the segments
• Functionality to compute spatial relations (neighbours) assumed
[Figure: map of areas, each labelled with segment 1 or 2]
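The steps above can be sketched as follows. This is one possible flavour under stated assumptions, not a definitive implementation: each area has a single numeric attribute, "similar" means closest to the current mean of a neighbouring segment, and the neighbour relation is supplied as a ready-made dictionary. The function name region_grow and the sample data are hypothetical.

```python
def region_grow(values, neighbours, seeds):
    """Grow segments from seed areas over a neighbourhood graph.

    values: {area: attribute value}
    neighbours: {area: set of adjacent areas} (assumed precomputed)
    seeds: {segment label: seed area}
    Returns {area: segment label}.
    """
    segment_of = {area: label for label, area in seeds.items()}
    members = {label: [area] for label, area in seeds.items()}
    while len(segment_of) < len(values):
        best = None  # (difference, area, label) of the most similar candidate
        for area in values:
            if area in segment_of:
                continue
            for nb in neighbours[area]:
                label = segment_of.get(nb)
                if label is None:
                    continue  # this neighbour is not allocated yet
                mean = sum(values[m] for m in members[label]) / len(members[label])
                diff = abs(values[area] - mean)
                if best is None or diff < best[0]:
                    best = (diff, area, label)
        if best is None:
            break  # remaining areas touch no segment
        _, area, label = best
        segment_of[area] = label
        members[label].append(area)
    return segment_of

# Hypothetical chain of six areas: a-b-c-d-e-f.
values = {"a": 10, "b": 11, "c": 12, "d": 30, "e": 31, "f": 29}
neighbours = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
              "d": {"c", "e"}, "e": {"d", "f"}, "f": {"e"}}
result = region_grow(values, neighbours, seeds={1: "a", 2: "f"})
print(result)
```

In this example areas a, b and c end up in segment 1 and areas d, e and f in segment 2. Adding the single most similar area per iteration keeps the result independent of the order in which areas are listed, at the cost of rescanning all unallocated areas each time.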
20
Summary
• Spatial data storage available as extensions of RDBMS
• Visualization of spatial data available in GIS
• Spatial Data Mining requires functionality to compute spatial relations
• OGC specifications provide the standards for all the above resources
• MySQL provides spatial data storage
  – But only partially provides the functionality for computing relations
• Several open-source systems provide all the above resources for spatial data
  – OpenJump, GeoTools