spatial mining

40
Εξόρυξη Χωρικών Δεδομένων Βασίλειος Μεγαλοοικονόμου, Χρήστος Μακρής (βασισμένο σε σημειώσεις της Μ. Dunham)

Upload: tommy96

Post on 20-Jun-2015

1.204 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Spatial Mining

Εξόρυξη Χωρικών Δεδομένων

Βασίλειος Μεγαλοοικονόμου, Χρήστος Μακρής

(βασισμένο σε σημειώσεις της Μ. Dunham)

Page 2: Spatial Mining

Spatial Mining Outline

Goal: Provide an introduction to some spatial mining techniques.

• Introduction• Spatial Data Overview • Spatial Data Mining Primitives• Generalization/Specialization• Spatial Rules• Spatial Classification• Spatial Clustering

Page 3: Spatial Mining

Χωρικό Αντικείμενο (Spatial Object)

• Contains both spatial and nonspatial attributes.

• Must have location type attributes:– Latitude/longitude– Zip code– Street address

• May retrieve object using either (or both) spatial or nonspatial attributes.

Page 4: Spatial Mining

Εφορμογές Εξόρυξης Χωρικών Δεδομένων

• Geology• GIS Systems• Environmental Science• Agriculture• Medicine• Robotics• May involved both spatial and temporal

aspects

Page 5: Spatial Mining

Χωρικά (Spatial) Queries• Spatial selection may involve specialized selection

comparison operations:– Near– North, South, East, West– Contained in– Overlap/intersect

• Region (Range) Query – find objects that intersect a given region.

• Nearest Neighbor Query – find object close to identified object.

• Distance Scan – find object within a certain distance of an identified object where distance is made increasingly larger.

Page 6: Spatial Mining

Χωρικές Δομές Δεδομένων• Data structures designed specifically to store or index

spatial data.• Often based on B-tree or Binary Search Tree• Cluster data on disk based on geographic location.• May represent complex spatial structure by placing the

spatial object in a containing structure of a specific geographic shape.

• Techniques:– Quad Tree– R-Tree– k-D Tree

Page 7: Spatial Mining

MBR

• Minimum Bounding Rectangle

• Smallest rectangle that completely contains the object

Page 8: Spatial Mining

Παραδείγματα MBR

Page 9: Spatial Mining

Quad Tree

• Hierarchical decomposition of the space into quadrants (MBRs)

• Each level in the tree represents the object as the set of quadrants which contain any portion of the object.

• Each level is a more exact representation of the object.

• The number of levels is determined by the degree of accuracy desired.

Page 10: Spatial Mining

Παράδειγμα Quad Tree

Page 11: Spatial Mining

R-Tree

• As with Quad Tree the region is divided into successively smaller rectangles (MBRs).

• Rectangles need not be of the same size or number at each level.

• Rectangles may actually overlap.• Lowest level cell has only one object.• Tree maintenance algorithms similar to

those for B-trees.

Page 12: Spatial Mining

Παράδειγμα R-Tree

Page 13: Spatial Mining

K-D Tree

• Designed for multi-attribute data, not necessarily spatial

• Variation of binary search tree• Each level is used to index one of the

dimensions of the spatial object.• Lowest level cell has only one object• Divisions not based on MBRs but

successive divisions of the dimension range.

Page 14: Spatial Mining

Παράδειγμα k-D Tree

Page 15: Spatial Mining

Τοπολογικές Συσχετίσεις

Spatial region:• Disjoint

• Overlaps or Intersects

• Equals

• Covered by or inside or contained in

• Covers or contains

Page 16: Spatial Mining

Distance Between Objects• Euclidean• Manhattan• Extensions:

Page 17: Spatial Mining

Προοδευτική Βελτίωση (Progressive Refinement)

• Make approximate answers prior to more accurate ones.

• Filter out data not part of answer

• Hierarchical view of data based on spatial relationships

• Coarse predicate recursively refined

Page 18: Spatial Mining

Χωρική Ιεραρχία: Progressive Refinement – Προοδευτική Βελτίωση

Page 19: Spatial Mining

Spatial Data Dominant Algorithm – Γενίκευση Χωρικής Τάξης

Page 20: Spatial Mining

STING

• STatistical Information Grid-based

• Hierarchical technique to divide area into rectangular cells

• Grid data structure contains summary information about each cell

• Hierarchical clustering technique

• Similar to quad tree

Page 21: Spatial Mining

STING

Page 22: Spatial Mining

STING Build Algorithm

Page 23: Spatial Mining

STING Algorithm

Page 24: Spatial Mining

Spatial Rules

• Characteristic Rule

The average family income in Dallas is $50,000.• Discriminant Rule

The average family income in Dallas is $50,000, while in Plano the average income is $75,000.

• Association Rule

The average family income in Dallas for families living near White Rock Lake is $100,000.

Page 25: Spatial Mining

Spatial Association Rules

• Either antecedent or consequent must contain spatial predicates.

• View underlying database as set of spatial objects.

• May create using a type of progressive refinement

Page 26: Spatial Mining

Spatial Association Rule Algorithm

Page 27: Spatial Mining

Spatial Classification

• Partition spatial objects

• May use nonspatial attributes and/or spatial attributes

• Generalization and progressive refinement may be used.

Page 28: Spatial Mining

ID3 Extension

• Neighborhood Graph (Γράφοι γειτνίασης)– Nodes – objects

– Edges – connect neighbors

• Definition of neighborhood varies (απόσταση μικρότερη κάποιου κατωφλίου, ικανοποίηση μιας τοπολογικής σχέσης μεταξύ των αντικειμένων, κ.α)

• ID3 considers nonspatial attributes of all objects in a neighborhood (not just one) for classification.

Page 29: Spatial Mining

Spatial Decision Tree

• Approach similar to that used for spatial association rules.

• Spatial objects can be described based on objects close to them – Buffer (ενδιάμεση ζώνη).

• Description of class based on aggregation of nearby objects.

Page 30: Spatial Mining

Spatial Decision Tree Algorithm

Page 31: Spatial Mining

Spatial Clustering

• Detect clusters of irregular shapes

• Use of centroids and simple distance approaches may not work well.

• Clusters should be independent of order of input.

Page 32: Spatial Mining

Spatial Clustering

Page 33: Spatial Mining

CLARANS Extensions

• Remove main memory assumption of CLARANS.• Use spatial index techniques.• Use sampling by employing R*-trees to identify

central objects.• Change cost calculations by reducing the number

of objects examined – Αντί να εξετάζεται όλη η βάση, εξετάζονται μόνο τα

αντικείμενα στις συστάδες που επηρεάζονται κατά την αλλαγή ενός medoid.

• Η ανάκτηση των αντικειμένων σε μια δοθείσα συστάδα βασίζεται στην κατασκευή ενός διαγράμματος Voronoi

Page 34: Spatial Mining

Voronoi

Page 35: Spatial Mining

SD(CLARANS)

• Spatial Dominant (SD)• Πρώτα συσταδοποιεί τις χωρικές συνιστώσες και

έπειτα εξετάζει τα μη χωρικά γνωρίσματα εντός κάθε συστάδας για να εξάγει την περιγραφή της

• First clusters spatial components using CLARANS

• Then iteratively replaces medoids, but limits number of pairs to be searched.

• Uses generalization• Uses a learning to derive description of cluster.

Page 36: Spatial Mining

SD(CLARANS) Algorithm

Page 37: Spatial Mining

DBCLASD

• Distribution Based Clustering of LArge Spatial Databases

• Extension of DBSCAN

• Assumes items in cluster are uniformly distributed.

• Identifies distribution satisfied by distances between nearest neighbors.

• Objects added if distribution is uniform.

Page 38: Spatial Mining

DBCLASD Algorithm

Page 39: Spatial Mining

Aggregate Proximity (Συναθροιστική Εγγύτητα)

• Aggregate Proximity – measure of how close a cluster is to a feature.

• Aggregate proximity relationship finds the k closest features to a cluster.

• The CRH Algorithm – uses different shapes:– Encompassing Circle– Isothetic Rectangle– Convex Hull

• Μια προσέγγιση φιλτραρίσματος των χαρακτηριστικών που χρησιμοποιεί πρώτα τον περικλείοντα κύκλο, μετά το ισοθετικό ορθογώνιο και τέλος το κυρτό περίβλημα

Page 40: Spatial Mining

CRH