sdm_han.ppt

Page 1: sdm_han.ppt

April 10, 2023 Spatial Data Mining 1

Data Mining in Spatial Databases: A Multi-Disciplinary Promise

Jiawei Han

Database Systems Research Lab.

Department of Computing Science

University of Illinois at Urbana-Champaign

http://www.cs.uiuc.edu/~hanj

Page 2: sdm_han.ppt

Outline

Why geo-spatial data mining?

Spatial data mining: major progress

Spatial OLAP

Spatial association

Spatial classification

Spatial clustering and outlier analysis

Research challenges in spatial data mining

Page 3: sdm_han.ppt

Why Geo-Spatial Data Mining?

Spatial data mining: mining interesting knowledge and patterns from huge amounts of spatial data

Necessity is the mother of invention

Data explosion problem: data is overwhelming and everywhere (automated data collection, satellite images, remote sensing, GPS, mobile computing and network technology, WWW, etc.)

Putting data to use: data mining may lead to important discoveries

Page 4: sdm_han.ppt

Spatial Data Mining vs. Traditional Spatial Data Analysis

Scalability and performance: handle gigabytes of data, interactive exploration, multi-dimensional drilling/rolling, visualization, ...

Tight integration of database systems and GIS

Most spatial and aspatial data are stored in relational database systems (e.g., Oracle, MS SQL Server, DB2, Informix), GIS (e.g., ArcInfo, MapInfo), or data warehouses

Tight coupling and seamless integration: data cleaning, data integration, and data consolidation

New methods and functionalities: association, sequential patterns, classification methods, ...

Page 5: sdm_han.ppt

Spatial Data Mining: Confluence of Multiple Disciplines

[Figure: Spatial Data Mining at the confluence of Spatial DB Systems, Statistics, Mobile Computing, Geography, Machine Learning (AI), Visualization, and Remote Sensing]

Page 6: sdm_han.ppt

Outline

Why geo-spatial data mining?

Spatial data mining: major progress

Spatial OLAP

Spatial association

Spatial classification

Spatial clustering and outlier analysis

Research challenges in spatial data mining

Page 7: sdm_han.ppt

Spatial Data Mining—Major Progress

Geo-spatial data warehouse and spatial OLAP

Spatial data classification/predictive modeling

Spatial clustering/segmentation

Spatial association and correlation analysis

Spatial regression analysis

Spatio-temporal pattern analysis

Many more to be explored

Page 8: sdm_han.ppt

Spatial Data Warehousing

Spatial data warehouse: an integrated, subject-oriented, time-variant, and nonvolatile spatial data repository for data analysis

Spatial data integration: a big issue

Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)

Vendor-specific formats (ESRI, MapInfo, Intergraph, etc.)

Spatial data cube: a multidimensional spatial database in which both dimensions and measures may contain spatial components

Page 9: sdm_han.ppt

Star Schema of the BC Weather Warehouse

Dimensions: region_name, time, temperature, precipitation

Measurements: region_map, area, count

[Figure: the fact table at the center of the spatial data warehouse, linked to the dimension tables]
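The star schema can be made concrete with a small roll-up. The Python sketch below is illustrative only: the base regions r1 to r3, the hypothetical `REGION_PARENT` hierarchy, and all numbers are invented, and the spatial measure region_map is modeled as a set of merged region identifiers rather than real map geometry. Rolling up sums area and count and aggregates away the time dimension.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    region_name: str    # dimension (base region)
    time: str           # dimension
    temperature: str    # dimension, already generalized (e.g. "cold")
    precipitation: str  # dimension, already generalized (e.g. "wet")
    area: float         # numeric measure
    count: int          # numeric measure

# Hypothetical concept hierarchy: base region -> larger zone.
REGION_PARENT = {"r1": "north", "r2": "north", "r3": "south"}

def roll_up(facts, parent):
    """Roll region_name up one level and aggregate over all time values.

    area and count are summed; the spatial measure region_map is kept as
    the set of merged base regions (a stand-in for a polygon union).
    """
    cube = defaultdict(lambda: {"region_map": set(), "area": 0.0, "count": 0})
    for f in facts:
        cell = cube[(parent[f.region_name], f.temperature, f.precipitation)]
        cell["region_map"].add(f.region_name)
        cell["area"] += f.area
        cell["count"] += f.count
    return dict(cube)

facts = [
    Fact("r1", "2001-01", "cold", "wet", 120.0, 3),
    Fact("r2", "2001-02", "cold", "wet", 80.0, 2),
    Fact("r3", "2001-01", "mild", "dry", 50.0, 1),
]
cube = roll_up(facts, REGION_PARENT)
```

In a real spatial warehouse, combining region_map entries would be a geometric merge, which is the costly part of a spatial roll-up.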

Page 10: sdm_han.ppt

Spatial OLAP—OLAP on Map Data

Page 11: sdm_han.ppt

Dynamic Merging of Spatial Objects?

Materialize (precompute) all merged objects? Too much storage space

Merge on-line? Slow and expensive

A better way: object-based, selective (partial) materialization
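Selective materialization can be pictured as a budgeted choice of which merged regions to precompute. This sketch uses a simple greedy rule (benefit per unit of storage, with benefit taken as access frequency times on-line merge cost); the rule, the hypothetical `Cell` fields, and the numbers are assumptions for illustration, not the selection method behind this slide.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    access_freq: float  # how often queries ask for this merged region
    merge_cost: float   # cost of merging it on-line
    size: float         # storage needed to keep the merged result

def select_to_materialize(cells, budget):
    """Greedy heuristic: take cells in order of benefit per unit of storage."""
    ranked = sorted(cells, key=lambda c: c.access_freq * c.merge_cost / c.size,
                    reverse=True)
    chosen, used = [], 0.0
    for c in ranked:
        if used + c.size <= budget:
            chosen.append(c.name)
            used += c.size
    return chosen

cells = [
    Cell("north/cold", access_freq=50, merge_cost=4.0, size=10.0),  # ratio 20
    Cell("south/dry", access_freq=5, merge_cost=2.0, size=10.0),    # ratio 1
    Cell("all/all", access_freq=30, merge_cost=10.0, size=25.0),    # ratio 12
]
chosen = select_to_materialize(cells, budget=35.0)
```

Here "north/cold" and "all/all" fit the budget and are precomputed, while "south/dry" would be merged on-line when requested.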

Page 12: sdm_han.ppt

Spatial Association and Correlation Mining

FIND SPATIAL ASSOCIATION RULE DESCRIBING "Golf Course"
FROM Washington_Golf_courses, Washington
WHERE CLOSE_TO(Washington_Golf_courses.Obj, Washington.Obj, "3 km")
  AND Washington.CFCC <> "D81"
IN RELEVANCE TO Washington_Golf_courses.Obj, Washington.Obj, CFCC
SET SUPPORT THRESHOLD 0.5

What kind of objects are usually located close to golf courses?
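The query above asks which feature classes tend to lie within 3 km of golf courses. A minimal Python sketch of that support computation, assuming objects are reduced to points with planar coordinates in kilometres: support for a class is the fraction of golf courses with at least one object of that class nearby, objects with CFCC "D81" are excluded, and classes below the 0.5 threshold are dropped. The function name and the sample data (including the codes "A41" and "H11") are hypothetical.

```python
import math

def mine_close_to(golf_courses, features, dist_km=3.0, min_support=0.5,
                  excluded=("D81",)):
    """Return {cfcc: support} for feature classes found close to golf courses.

    support(cfcc) = fraction of golf courses with at least one feature of
    class cfcc within dist_km (Euclidean distance, planar km coordinates).
    """
    counts = {}
    for g in golf_courses:
        nearby = {cfcc for cfcc, loc in features
                  if cfcc not in excluded and math.dist(g, loc) <= dist_km}
        for cfcc in nearby:
            counts[cfcc] = counts.get(cfcc, 0) + 1
    n = len(golf_courses)
    return {c: k / n for c, k in counts.items() if k / n >= min_support}

# Hypothetical data: golf course locations and (CFCC code, location) pairs.
golf = [(0, 0), (10, 0), (20, 0), (30, 0)]
features = [("A41", (1, 0)), ("A41", (11, 1)), ("A41", (21, 0)),
            ("H11", (2, 2)), ("D81", (0, 1)), ("D81", (10, 1))]
rules = mine_close_to(golf, features)
```

Three of the four courses have an "A41" object within 3 km, so only that class reaches the 0.5 threshold.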

Page 13: sdm_han.ppt

Efficient Mining of Spatial Associations

Progressive refinement: a hierarchy of spatial relationships

g_close_to: near_by, touch, intersect, contain, etc.

First search for a rough relationship, then refine it

Rough spatial computation (as a filter): use MBRs or an R-tree for rough estimation

Detailed spatial algorithm (as refinement): apply only to those objects that have passed the rough spatial association test (support no less than min_support)

Micro-clustering and join indexing methods
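The filter-and-refine idea can be sketched as follows, assuming each object is a small list of vertices. The rough pass compares minimum bounding rectangles (MBRs); because every point lies inside its MBR, the MBR distance never exceeds the real distance, so the filter loses no true matches. Only if the rough support reaches min_support are the surviving pairs checked with the costlier test. The vertex-to-vertex distance used as the "exact" test is a simplification, and all names are hypothetical.

```python
import math
from itertools import product

def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point set."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def mbr_distance(a, b):
    """Smallest distance between two rectangles (0 if they overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)

def exact_distance(p, q):
    """Smallest vertex-to-vertex distance; a stand-in for an exact geometry test."""
    return min(math.dist(u, v) for u, v in product(p, q))

def close_to_support(targets, features, d, min_support):
    """Return (rough_support, refined_support) of close_to(target, feature, d).

    refined_support is None when the rough pass already falls below min_support.
    """
    n = len(targets)
    boxes = [mbr(f) for f in features]
    candidates = []
    for t in targets:
        tb = mbr(t)
        # Rough filter: keeps every truly close pair, plus some false candidates.
        candidates.append([f for f, fb in zip(features, boxes)
                           if mbr_distance(tb, fb) <= d])
    rough = sum(1 for c in candidates if c) / n
    if rough < min_support:
        return rough, None  # pruned without any detailed computation
    # Refinement: detailed test only on the surviving candidate pairs.
    refined = sum(1 for t, c in zip(targets, candidates)
                  if any(exact_distance(t, f) <= d for f in c)) / n
    return rough, refined

# MBRs are 1.0 apart, but the closest vertices are about 1.41 apart.
result = close_to_support([[(0, 0), (1, 1)]], [[(2, 0), (3, -1)]], 1.2, 0.5)
```

In the example the rough test passes (support 1.0) but refinement rejects the pair (support 0.0), which is exactly the false candidate the detailed step exists to remove.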

Page 14: sdm_han.ppt

Spatial Classification and Model Construction

Generalization- or clustering-based induction

Interactive classification

Page 15: sdm_han.ppt

Can Typical Classification Methods Be Applied to Spatial Classification?

Decision-tree classification: entropy-based information gain vs. Gini index vs. MDL

Tree pruning methods

Boosting/bagging

Naïve Bayesian classifier + boosting

Bayesian belief networks

Neural networks

Genetic programming

Nearest neighbor and case-based reasoning

Support vector machines

Association-based multi-dimensional classification
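To show the difference between the first two split criteria, this Python sketch scores one candidate split with entropy-based information gain and with the Gini index reduction. The labels (H/L, as in the house example) and the split itself are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    return impurity(parent) - sum(len(ch) / n * impurity(ch) for ch in children)

# Hypothetical split of 8 houses (H = high value, L = low value).
parent = ["H"] * 4 + ["L"] * 4
left, right = ["H", "H", "H", "L"], ["H", "L", "L", "L"]
info_gain = split_gain(parent, [left, right], entropy)
gini_gain = split_gain(parent, [left, right], gini)
```

Both criteria rank splits by how much purer the children are than the parent; they differ only in the impurity function plugged into `split_gain`.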

Page 16: sdm_han.ppt

What Kind of Houses Are Highly Valued? Associative Classification

[Figure: map of houses labeled H (high value) and L (low value), grouped into micro-clusters C01 through C10, with a highway and a lake]

Page 17: sdm_han.ppt

Grouping and Associating Spatial Features for Classification

House_ID | MCluster_ID | Spatial Features | Yrs | Sqr_ft | Class
H01 | C05 | close_to(como lake), next_to(Futureshop), ... | 16 | 2300 | H
H03 | C08 | close_to(Lougheed_Hwy), next_to(Austin_elmntary), ... | 32 | 2500 | L
H45 | C09 | next_to(QueenEliz_park), next_to(Cambie_road), ... | 20 | 3100 | H
H82 | C05 | close_to(como lake), next_to(Futureshop), ... | 18 | 3400 | H
... | ... | ... | ... | ... | ...
H1857 | c18 | inside(east_Vancouver), close_to(Fraser_st), close_to(sky_train_station) | 41 | 2100 | L
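One simple way to connect spatial features to classes is to score single-feature rules such as close_to(como lake) => H by support and confidence. The sketch below does this over the five rows visible in the table, ignoring Yrs, Sqr_ft, the cluster IDs, and the elided "..." features. It is a toy illustration of rule scoring, not the associative classification algorithm used in the original work.

```python
from itertools import product

# Rows visible in the table: (spatial features, class). Yrs, Sqr_ft, the
# cluster IDs and the elided "..." features are not included.
ROWS = [
    ({"close_to(como lake)", "next_to(Futureshop)"}, "H"),          # H01
    ({"close_to(Lougheed_Hwy)", "next_to(Austin_elmntary)"}, "L"),  # H03
    ({"next_to(QueenEliz_park)", "next_to(Cambie_road)"}, "H"),     # H45
    ({"close_to(como lake)", "next_to(Futureshop)"}, "H"),          # H82
    ({"inside(east_Vancouver)", "close_to(Fraser_st)",
      "close_to(sky_train_station)"}, "L"),                        # H1857
]

def rule_stats(rows, feature, label):
    """Support and confidence of the rule: feature => label."""
    covered = [cls for feats, cls in rows if feature in feats]
    hits = sum(1 for cls in covered if cls == label)
    confidence = hits / len(covered) if covered else 0.0
    return hits / len(rows), confidence

def mine_rules(rows, min_support, min_confidence):
    """Enumerate all single-feature rules and keep those above both thresholds."""
    features = sorted(set().union(*(feats for feats, _ in rows)))
    labels = sorted({cls for _, cls in rows})
    rules = []
    for feature, label in product(features, labels):
        support, confidence = rule_stats(rows, feature, label)
        if support >= min_support and confidence >= min_confidence:
            rules.append((feature, label, support, confidence))
    return rules

rules = mine_rules(ROWS, min_support=0.4, min_confidence=0.8)
```

On these rows, only the two features shared by H01 and H82 (both in micro-cluster C05) yield rules with support 0.4 and confidence 1.0 for class H.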

Page 18: sdm_han.ppt

Spatial Classification: Typical Examples

Mining volcanoes on Venus: training set provided by experts; the constructed model can be used for prediction

Finding stars in galaxies (JPL’96)

QuakeFinder: find earthquakes related to spatial information

Page 19: sdm_han.ppt

Spatial Trend Analysis

Function: detect changes and trends along a spatial dimension; study how non-spatial or spatial data change with space

Application examples:

Observe the trend of changes in climate or vegetation with increasing distance from an ocean

Changes in crime rate or unemployment rate with regard to city geo-distribution
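A trend along one spatial dimension can be summarized by regressing a measure on distance. This sketch fits an ordinary least-squares line to hypothetical rainfall readings taken at increasing distance from an ocean; the data and names are invented for illustration.

```python
def linear_trend(distances, values):
    """Ordinary least-squares fit values ~ a + b * distance; returns (a, b)."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical rainfall readings (mm) at increasing distance (km) from an ocean.
distance_km = [0, 10, 20, 30]
rainfall_mm = [100, 90, 80, 70]
intercept, slope = linear_trend(distance_km, rainfall_mm)
```

The negative slope (here -1 mm per km) expresses a decreasing trend with distance from the coast.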

Page 20: sdm_han.ppt

Spatial Cluster Analysis

Mining clusters—k-means, k-medoids, hierarchical, density-based, etc.

Analysis of distinct features of the clusters
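A minimal k-means (Lloyd's algorithm) sketch in Python for 2-D points. For reproducibility it seeds the centers with farthest-first traversal instead of random sampling; that seeding choice, the function names, and the sample points are assumptions for illustration.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: new center = mean of assigned points (keep old if empty).
        new_centers = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, 2)
```

After clustering, each group's distinct features can be analyzed, for example by comparing the attributes of its members.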

Page 21: sdm_han.ppt

Density-Based Cluster analysis: OPTICS & Its Applications

Page 22: sdm_han.ppt

Clustering and Distribution Density Functions: Density Attractor
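The density-attractor idea (as in DENCLUE) is that each point climbs the kernel density surface to a local maximum, and points reaching the same maximum form a cluster. This sketch uses a Gaussian kernel and finds each attractor with the mean-shift fixed-point update, a swapped-in substitute for the gradient hill climbing DENCLUE describes. Clusters whose attractor density falls below a threshold xi are marked as noise (-1). Parameter names and data are hypothetical.

```python
import math

def kernel(x, p, sigma):
    """Gaussian influence of data point p at location x."""
    return math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma):
    return sum(kernel(x, p, sigma) for p in data)

def find_attractor(x, data, sigma, tol=1e-6, max_iter=500):
    """Move x uphill on the kernel density with mean-shift steps until it stops."""
    for _ in range(max_iter):
        w = [kernel(x, p, sigma) for p in data]
        total = sum(w)
        nxt = tuple(sum(wi * p[d] for wi, p in zip(w, data)) / total
                    for d in range(len(x)))
        if math.dist(nxt, x) < tol:
            return nxt
        x = nxt
    return x

def density_attractor_clusters(data, sigma, xi, merge_tol=1e-2):
    attractors, labels = [], []
    for p in data:
        a = find_attractor(p, data, sigma)
        for i, b in enumerate(attractors):
            if math.dist(a, b) < merge_tol:  # same attractor -> same cluster
                labels.append(i)
                break
        else:
            attractors.append(a)
            labels.append(len(attractors) - 1)
    # Attractors whose density is below xi are treated as noise (label -1).
    strong = [density(a, data, sigma) >= xi for a in attractors]
    return [lab if strong[lab] else -1 for lab in labels], attractors

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels, attractors = density_attractor_clusters(data, sigma=1.0, xi=1.5)
```

The isolated point (50, 50) is its own attractor, but its density (about 1.0) is below xi, so it is labeled as noise.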

Page 23: sdm_han.ppt

Center-Defined and Arbitrary Shaped

Page 24: sdm_han.ppt

STING: A Statistical Information Grid Approach

Wang, Yang and Muntz (VLDB’97)

Each cell stores the statistical distribution of the measure at the low level

Multi-level resolution
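The STING idea of precomputed per-cell statistics can be sketched with two grid levels. Each bottom cell stores count, mean, standard deviation, minimum, and maximum of the measure, and a parent cell's statistics are derived from its four children alone, without revisiting the raw points. The 2 x 2 to 1 x 1 layout, the use of population standard deviation, and the sample points are assumptions for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class CellStats:
    n: int
    mean: float
    std: float
    lo: float
    hi: float

EMPTY = CellStats(0, 0.0, 0.0, math.inf, -math.inf)

def leaf_stats(values):
    """Statistics of the measure for one bottom-level cell."""
    n = len(values)
    if n == 0:
        return EMPTY
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    return CellStats(n, m, math.sqrt(var), min(values), max(values))

def merge(children):
    """Derive a parent cell's statistics from its children only."""
    n = sum(c.n for c in children)
    if n == 0:
        return EMPTY
    m = sum(c.n * c.mean for c in children) / n
    # Parent E[x^2] from each child's std^2 + mean^2, then var = E[x^2] - m^2.
    ex2 = sum(c.n * (c.std ** 2 + c.mean ** 2) for c in children) / n
    return CellStats(n, m, math.sqrt(max(ex2 - m * m, 0.0)),
                     min(c.lo for c in children), max(c.hi for c in children))

def build_level(points, k):
    """Bucket (x, y, value) points in [0,1)^2 into a k x k grid of leaf stats."""
    buckets = [[[] for _ in range(k)] for _ in range(k)]
    for x, y, v in points:
        buckets[int(y * k)][int(x * k)].append(v)
    return [[leaf_stats(b) for b in row] for row in buckets]

def coarsen(grid):
    """Next-higher level: each parent covers a 2 x 2 block of children."""
    k = len(grid) // 2
    return [[merge([grid[2 * i][2 * j], grid[2 * i][2 * j + 1],
                    grid[2 * i + 1][2 * j], grid[2 * i + 1][2 * j + 1]])
             for j in range(k)] for i in range(k)]

points = [(0.1, 0.1, 1), (0.2, 0.2, 2), (0.3, 0.1, 3),
          (0.7, 0.1, 4), (0.2, 0.8, 5), (0.4, 0.9, 6)]
bottom = build_level(points, 2)
top = coarsen(bottom)
```

The single top cell reproduces the statistics of all six values even though `coarsen` only reads the four child summaries.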

Page 25: sdm_han.ppt

WaveCluster

G. Sheikholeslami et al. (1998)

Multiple wavelet transformation-based cluster analysis

Page 26: sdm_han.ppt

Constraint-Based Clustering

Constraints on individual objects: simple selection of such objects before clustering

Clustering parameters as constraints: k in k-means; radius and minimum number of points in density-based clustering

Constraints imposed by physical obstacles: clustering with obstructed distance

Constraints specified on clusters using SQL aggregates, e.g.:

Sum of the profits in each cluster > $1 million

Average sales in each cluster > $20 million

Number of golden customers in each cluster > 1000
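The SQL-aggregate constraints above can be checked on a candidate cluster as in this sketch. Each constraint is written as (aggregate, attribute, lower bound). The attribute names, the hypothetical `COUNT_TRUE` aggregate for counting golden customers, reading "average sales" as the mean over a cluster's members, and the sample clusters are all assumptions.

```python
from statistics import mean

AGGREGATES = {
    "SUM": sum,
    "AVG": mean,
    "COUNT_TRUE": lambda xs: sum(1 for x in xs if x),  # hypothetical helper
}

# The slide's example constraints as (aggregate, attribute, lower bound).
CONSTRAINTS = [
    ("SUM", "profit", 1_000_000),
    ("AVG", "sales", 20_000_000),
    ("COUNT_TRUE", "golden", 1000),
]

def violated(cluster_rows, constraints=CONSTRAINTS):
    """Return the constraints a cluster fails (the aggregate must be strictly greater)."""
    failed = []
    for agg, attr, bound in constraints:
        value = AGGREGATES[agg]([row[attr] for row in cluster_rows])
        if not value > bound:
            failed.append((agg, attr, bound))
    return failed

cluster_a = [{"profit": 1000, "sales": 25_000_000, "golden": True}] * 1200
cluster_b = [{"profit": 50_000, "sales": 30_000_000, "golden": True}] * 10
```

A constraint-based clustering method could use a check like this to reject, split, or merge candidate clusters; how that search is organized is not shown here.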

Page 27: sdm_han.ppt

Constraint-Based Clustering: Planning ATM Locations

[Figure: spatial data with obstacles (a mountain and a river crossed by a bridge); clusters C1, C2, C3, and C4 found by clustering without taking obstacles into consideration]

Page 28: sdm_han.ppt

Clustering with Spatial Obstacles

[Figure: two panels comparing cluster results, taking obstacles into account vs. not taking obstacles into account]
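Obstructed distance can be approximated on a raster by counting steps along the shortest path that avoids obstacle cells. This sketch runs breadth-first search on a 4-connected grid where '#' marks an obstacle, here a hypothetical river with a one-cell bridge. The grid, the coordinates, and the function name are invented for illustration.

```python
from collections import deque

def obstructed_distances(grid, start):
    """BFS shortest-path lengths (in cell steps) from start on a 4-connected grid.

    grid[r][c] == '#' marks an obstacle cell; unreachable cells are absent.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#"
                    and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

GRID = [
    ".....",
    "##.##",  # hypothetical river with a one-cell bridge at column 2
    ".....",
]
d = obstructed_distances(GRID, (0, 0))
```

Cells (0, 0) and (2, 0) on opposite banks are 2 steps apart if the river is ignored but 6 steps apart via the bridge; a clustering algorithm could use such distances in place of Euclidean ones.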

Page 29: sdm_han.ppt

Towards Spatial Data Mining System: An Architecture

[Figure: system architecture with components: Graphic User Interface; Geo-OLAP Analyzer, Geo-Classifier, Geo-Predictor, Geo-Clustor, Geo-Associator, and future modules; metadata (hierarchies); Spatial DB and Non-Spatial DB; Spatial Database and Warehouse Server]

Page 30: sdm_han.ppt

Outline

Why geo-spatial data mining?

Spatial data mining: major progress

Spatial OLAP

Spatial association

Spatial classification

Spatial clustering and outlier analysis

Research challenges in spatial data mining

Page 31: sdm_han.ppt

Research Challenges in Spatial Data Mining

Mining temporal spatial data

Mining spatial-related stream data

Spatial data mining applications (land use, bio-medical)

Page 32: sdm_han.ppt

Conclusions

Spatial data mining vs. traditional spatial analysis

Scalability, architecture, functions, methods

Good progress has been made on spatial data mining

OLAP, association, clustering, classification, outlier analysis, etc.

Still lots to be done! A young and promising direction

Joint efforts (from multiple disciplines) lead to joyous promises!

Page 33: sdm_han.ppt

http://www.cs.uiuc.edu/~hanj

Thank you!
