spatial data mining. 2 introduction spatial data mining is the process of discovering interesting,...

Spatial Data Mining

Upload: bruce-briggs

Post on 17-Dec-2015


TRANSCRIPT

Page 1: Spatial Data Mining. 2 Introduction Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets

Spatial Data Mining

Introduction

• Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets
  – E.g. co-location patterns of water pumps and cholera
  – Determining hotspots: unusual locations
• Spatial Data Mining Tasks
  – Classification/Prediction
  – Co-location Mining
  – Clustering
• Recap of special properties of Spatial Data
  – Spatial autocorrelation
  – Spatial heterogeneity
  – Implicit Spatial Relations

Spatial Relations

• Spatial databases do not store spatial relations explicitly
  – Additional functionality required to compute them
• Three types of spatial relations specified by the OGC reference model
  – Distance relations
    • Euclidean distance between two spatial features
  – Direction relations
    • Ordering of spatial features in space
  – Topological relations
    • Characterise the type of intersection between spatial features

Distance relations

• If dist is a distance function and c is some real number, three distance relations are possible:
  1. dist(A,B) > c
  2. dist(A,B) < c
  3. dist(A,B) = c

[Figure: three sketches of features A and B]
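The three distance relations can be sketched in a few lines of Python. This is only an illustration: it assumes A and B are point features given as (x, y) tuples, and the function names and the tolerance used for the equality case are assumptions, not part of the slides.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def distance_relation(a, b, c, tol=1e-9):
    """Return which of the three distance relations holds for threshold c.

    Equality is tested with a small tolerance (an assumption of this sketch),
    because exact comparison of floating point distances is fragile.
    """
    d = euclidean(a, b)
    if math.isclose(d, c, abs_tol=tol):
        return "dist(A,B) = c"
    return "dist(A,B) > c" if d > c else "dist(A,B) < c"
```

For line or polygon features, dist would typically be the minimum distance between the two geometries rather than a point-to-point distance.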

Direction relations

• If directions of B and C are required with respect to A
• Define a representative point, rep(A)
• rep(A) defines the origin of a virtual coordinate system
• The quadrants and half planes define the direction relations
• B can have two values {northeast, east}
• Exact direction relation is northeast

[Figure: A with rep(A) as the origin of the virtual coordinate system; C north A, B northeast A]
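One simple reading of this scheme, for point features only, is sketched below. The assumptions are that rep(A) and the other feature are (x, y) tuples, that a point lying exactly on an axis gets the axis name, and that a point in a quadrant gets the quadrant name. The slides do not say how extended features that span several regions are resolved, so that case is left out.

```python
def direction(rep_a, p):
    """Direction of point p relative to rep_a, the origin of the virtual axes.

    Returns a quadrant name (e.g. "northeast") when p is off both axes,
    an axis name (e.g. "north") when p lies on one axis, and "same" when
    p coincides with rep_a.
    """
    dx = p[0] - rep_a[0]
    dy = p[1] - rep_a[1]
    north_south = "north" if dy > 0 else "south" if dy < 0 else ""
    east_west = "east" if dx > 0 else "west" if dx < 0 else ""
    return (north_south + east_west) or "same"
```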

Topological Relations

• Topological relations describe how geometries intersect spatially
• Simple geometry types
  – Point, 0-dimension
  – Line, 1-dimension
  – Polygon, 2-dimension
• Each geometry represented in terms of
  – boundary (B) – geometry of the lower dimension
  – interior (I) – points of the geometry when boundary is removed
  – exterior (E) – points not in the interior or boundary
• Examples for simple geometries
  – For a point, I = {point}, B = {} and E = {points not in I and B}
  – For a line, I = {points except boundary points}, B = {two end points} and E = {points not in I and B}
  – For a polygon, I = {points within the boundary}, B = {the boundary} and E = {points not in I and B}
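The interior / boundary / exterior split for a polygon can be illustrated with a small point-location sketch. It assumes a simple polygon without holes, given as a list of (x, y) vertices, and uses the standard ray-casting test. The function names are illustrative only.

```python
def on_segment(p, a, b):
    """True if point p lies on the closed segment from a to b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross != 0:
        return False
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def locate(p, ring):
    """Classify p as 'interior', 'boundary' or 'exterior' of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]
        if on_segment(p, a, b):
            return "boundary"
        # Count edges crossed by a horizontal ray going right from p.
        if (a[1] > p[1]) != (b[1] > p[1]):
            x_cross = a[0] + (p[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if x_cross > p[0]:
                inside = not inside
    return "interior" if inside else "exterior"
```

With a square `[(0, 0), (4, 0), (4, 4), (0, 4)]`, the point (2, 2) falls in I, (4, 2) in B and (5, 2) in E.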

DE-9IM

• Topological relations are defined using any one of the following models
  – 4IM, four intersection model (only B and I considered)
  – 9IM, nine intersection model (B, I, and E)
  – DE-9IM, dimensionally extended 9 intersection model
• DE-9IM is an OGC compliant model
• Dim is the dimension function: each cell of the DE-9IM matrix holds the dimension of the corresponding intersection, e.g. Dim(I(A) ∩ I(B))

Example

• Consider two polygons
  – A - POLYGON ((10 10, 15 0, 25 0, 30 10, 25 20, 15 20, 10 10))
  – B - POLYGON ((20 10, 30 0, 40 10, 30 20, 20 10))

9-Intersection Matrix of example geometries

[Figure: 3 x 3 matrix with rows I(A), B(A), E(A) and columns I(B), B(B), E(B)]

DE-9IM for the example geometries

        I(B)  B(B)  E(B)
I(A)    2     1     2
B(A)    1     0     1
E(A)    2     1     2
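A DE-9IM matrix is commonly written as a nine-character string by reading it row by row. The sketch below shows that flattening for the matrix above. Encoding an empty intersection as -1 (printed as F) is an assumption of this sketch, as is the function name.

```python
def to_de9im_string(matrix):
    """Flatten a 3 x 3 matrix of intersection dimensions, row by row.

    Each entry is 0, 1 or 2, or a negative number for an empty
    intersection, which is written as 'F'.
    """
    def cell(dim):
        return "F" if dim < 0 else str(dim)

    return "".join(cell(dim) for row in matrix for dim in row)


# Rows are I(A), B(A), E(A); columns are I(B), B(B), E(B).
example = [
    [2, 1, 2],
    [1, 0, 1],
    [2, 1, 2],
]
print(to_de9im_string(example))
```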

Relationships using DE-9IM

• Different geometries may give rise to different numbers in the DE-9IM
• For a specific type of relationship we are only interested in certain values in certain positions
  – That is, we are interested in patterns in the matrix rather than actual values
• Actual values are replaced by wild cards
  – T: value is "true" - non empty - any dimension >= 0
  – F: value is "false" - empty - dimension < 0
  – *: Don't care what the value is
  – 0: value is exactly zero
  – 1: value is exactly one
  – 2: value is exactly two

A overlaps B

        I(B)  B(B)  E(B)
I(A)    T     *     T
B(A)    *     *     *
E(A)    T     *     *
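The wild-card rules translate directly into a character-by-character comparison. The sketch below assumes the matrix and the pattern are both written as nine-character strings read row by row, so the overlaps pattern above becomes T*T***T**. The function name is illustrative.

```python
def matches(de9im, pattern):
    """Check a nine-character DE-9IM string against a wild-card pattern."""
    if len(de9im) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM strings and patterns have nine characters")
    for value, wanted in zip(de9im, pattern):
        if wanted == "*":
            continue  # don't care what the value is
        if wanted == "T":
            if value not in "012":  # T means non-empty, any dimension >= 0
                return False
        elif value != wanted:  # F, 0, 1 and 2 must match exactly
            return False
    return True
```

For instance, `matches("212101212", "T*T***T**")` holds, while the same string does not match `FF*FF****`.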

Topological Relations

• x.Disjoint(y)
  – FF*FF****
• x.Touches(y)
  – FT*******  Area/Area, Line/Line, Line/Area, Point/Area
  – F**T*****  Not Point/Point
  – F***T****
• x.Crosses(y)
  – T*T******  Point/Line, Point/Area, Line/Area
  – 0********  Line/Line

• x.Within(y)
  – T*F**F***

• x.Overlaps(y)
  – T*T***T**  Point/Point, Area/Area
  – 1*T***T**  Line/Line

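• DE-9IM string for example geometries was ‘212101212’ (from earlier slide)
  – It matches the Crosses pattern T*T******, although that pattern is listed above only for Point/Line, Point/Area and Line/Area
  – It matches the Area/Area Overlaps pattern T*T***T**, so A overlaps B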
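The pattern list can be treated as a lookup table keyed by relation name, with each pattern restricted to the geometry-type pairs it is listed for. The sketch below does this for Disjoint, Crosses and Overlaps only. The table layout, the "P"/"L"/"A" codes and the function names are assumptions of this sketch. Note that the example string also matches T*T****** textually, but that Crosses pattern is listed only for Point/Line, Point/Area and Line/Area, so the sketch does not report Crosses for two polygons.

```python
ANY = None  # marks a pattern that applies to every geometry-type pair

# (pattern, geometry-type pairs it is listed for); P = point, L = line, A = area
PATTERNS = {
    "Disjoint": [("FF*FF****", ANY)],
    "Crosses": [
        ("T*T******", {("P", "L"), ("P", "A"), ("L", "A")}),
        ("0********", {("L", "L")}),
    ],
    "Overlaps": [
        ("T*T***T**", {("P", "P"), ("A", "A")}),
        ("1*T***T**", {("L", "L")}),
    ],
}


def matches(de9im, pattern):
    """Wild-card match: * is anything, T is 0/1/2, other symbols are exact."""
    return all(
        w == "*" or (w == "T" and v in "012") or v == w
        for v, w in zip(de9im, pattern)
    )


def relations(de9im, kinds):
    """Names of the relations whose applicable patterns match the string."""
    found = []
    for name, rules in PATTERNS.items():
        for pattern, applies_to in rules:
            if (applies_to is ANY or kinds in applies_to) and matches(de9im, pattern):
                found.append(name)
                break
    return found


print(relations("212101212", ("A", "A")))
```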

Approaches to Spatial Data Mining

• Materialize spatial features and use Weka
  – Required features are added as additional attributes to the main feature
  – To create a flat file of data
• Use special data mining techniques that take spatial dependency into account
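The first approach, materializing spatial features into a flat file, might look roughly like the sketch below. The house and pump data, the attribute names and the threshold are all hypothetical. The only idea taken from the slides is that a computed spatial relation (here, distance to the nearest pump) becomes an ordinary column.

```python
import csv
import io
import math


def nearest_distance(point, others):
    """Distance from point to the closest of the other points."""
    return min(math.hypot(point[0] - o[0], point[1] - o[1]) for o in others)


def materialize(houses, pumps, threshold):
    """Turn a distance relation into plain attributes, one row per house."""
    rows = []
    for name, xy in houses.items():
        d = nearest_distance(xy, pumps)
        rows.append({
            "house": name,
            "dist_to_pump": round(d, 2),
            "pump_nearby": d <= threshold,
        })
    return rows


def to_csv(rows):
    """Write the materialized rows as CSV text (the flat file)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Once the relation is a column, a general-purpose mining tool no longer needs any spatial functionality of its own.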

Materializing features- Example

Materializing features- Example (2)

Spatial Data Mining Architecture

• Retrieve data belonging to multiple themes
• Preprocess spatial data to materialize spatial features
  – Select the required features
  – Use the methods to compute spatial relations to create a flat file of data
• Use a Weka-like tool to perform data mining

[Figure: architecture diagram with components: OGC Compliant Spatial DBMS; Multiple Themes; Feature Selection & OGC compliant methods to compute relations; Flat File; Weka]
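For the last stage of this architecture, the flat file could be written in Weka's ARFF text format. That choice is an assumption of this sketch (the slides only say "flat file"), and the relation name, attribute names and rows are hypothetical.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either a type
    name such as "numeric" or a list of nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(kind) + "}"  # nominal attribute
        lines.append("@attribute " + name + " " + kind)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


print(to_arff(
    "houses",
    [("dist_to_pump", "numeric"), ("cholera", ["yes", "no"])],
    [(5.0, "yes"), (8.06, "no")],
))
```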

Spatial Clustering

• Also called spatial segmentation
• Input
  – A table of area names and their corresponding attributes such as population density, number of adult illiterates etc.
  – Information about the neighbourhood relationships among the areas
  – A list of categories/classes of the attributes
• Output
  – Grouped (segmented) areas where each group has areas with similar attribute values
• Census Website has plenty of examples
  – http://www.statistics.gov.uk/census2001/censusmaps/index.html

Similarity with image segmentation

• Spatial segmentation is performed in image processing
  – Identify regions (areas) of an image that have similar colour (or other image attributes)
  – Many image segmentation techniques are available
    • E.g. region-growing technique

[Figure: pixel grid whose cells are labelled 1 or 2, forming two regions of similar values]
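On an image, region growing from a single seed pixel can be sketched as a breadth-first search over 4-connected neighbours. The grid, the tolerance parameter and the choice of 4-connectivity are assumptions of this sketch. It shows one flavour of the idea, not a particular published algorithm.

```python
from collections import deque


def grow_region(image, seed, tolerance=0):
    """Collect the pixels connected to seed whose values are within
    tolerance of the seed pixel's value (4-connected neighbours)."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


image = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
left = grow_region(image, (0, 0))   # grows over the pixels valued 1
right = grow_region(image, (0, 3))  # grows over the pixels valued 2
```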

Region Growing Technique

• There are many flavours of this technique
• One of them is described below:
  – Assign seed areas to each of the segments (classes of the attribute)
  – Add neighbouring areas to these segments if the incoming areas have similar values of attributes
  – Repeat the above step until all the regions are allocated to one of the segments
• Functionality to compute spatial relations (neighbours) assumed

[Figure: map of areas, each labelled with its segment, 1 or 2]
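The steps above can be sketched for areas with a single numeric attribute. Several simplifications are assumptions of this sketch: similarity is measured against the already-assigned neighbour's value rather than a segment average, the neighbour lists are supplied by hand instead of being computed from geometry, and the loop stops when no further area can be added, so areas with no similar neighbour stay unassigned.

```python
def region_growing(values, neighbours, seeds, tolerance):
    """Grow segments outwards from seed areas.

    values:     area name -> attribute value
    neighbours: area name -> list of adjacent area names
    seeds:      area name -> segment label for the seed areas
    """
    segment_of = dict(seeds)
    changed = True
    while changed:
        changed = False
        for area in sorted(values):
            if area in segment_of:
                continue
            best = None  # (difference, segment) of the most similar neighbour
            for nb in neighbours[area]:
                if nb in segment_of:
                    diff = abs(values[area] - values[nb])
                    if diff <= tolerance and (best is None or diff < best[0]):
                        best = (diff, segment_of[nb])
            if best is not None:
                segment_of[area] = best[1]
                changed = True
    return segment_of


# Hypothetical areas arranged in a chain a-b-c-d-e-f.
values = {"a": 10, "b": 11, "c": 12, "d": 30, "e": 31, "f": 29}
neighbours = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
segments = region_growing(values, neighbours, {"a": 1, "e": 2}, tolerance=2)
```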

Summary

• Spatial data storage is available as extensions of RDBMS
• Visualization of spatial data is available in GIS
• Spatial Data Mining requires functionality to compute spatial relations
• OGC specifications provide the standards for all the above resources
• MySQL provides spatial data storage
  – But only partially provides the functionality for computing relations
• Several open source systems provide all the above resources for spatial data
  – OpenJUMP, GeoTools