Spatial Data Mining
Satoru Hozumi
CS 157B

Upload: angela-peters

Post on 13-Jan-2016


Page 1: Spatial Data Mining

Satoru Hozumi

CS 157B

Page 2: Learning Objectives

- Understand the concept of spatial data mining
- Learn techniques for finding spatial patterns

Page 3: Examples of Spatial Patterns

- 1854 Asiatic cholera outbreak in London: a water pump was identified as the source.
- Cancer clusters, investigated to find health hazards.
- Crime hot spots, used to plan police patrol routes.
- Effects on US weather caused by unusual warming of the Pacific Ocean (El Niño).

Page 4: What is a Spatial Pattern?

- What is not a pattern?
  - Random, haphazard, chance, stray, accidental, unexpected.
  - Without definite direction, trend, rule, method, design, aim, purpose.
- What is a pattern?
  - A frequent arrangement, configuration, composition, regularity.
  - A rule, law, method, design, description.
  - A major direction, trend, prediction.

Page 5: Defining Spatial Data Mining

- Search for spatial patterns.
- Non-trivial search, as “automated” as possible.
  - Large search space of plausible hypotheses.
  - Ex. Asiatic cholera: candidate causes include water, food, air, insects.
- Interesting, useful, and unexpected spatial patterns.
  - Useful in a certain application domain.
    - Ex. Shutting off the identified water pump => saved human lives.
  - May provide a new understanding of the world.
    - Ex. The water pump and cholera connection led to the “germ” theory.

Page 6: What is NOT Spatial Data Mining

- Simple querying of spatial data
  - Finding neighbors of Canada given names and boundaries of all countries (search space is not large)
- Uninteresting or obvious patterns
  - Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul (10 miles apart).
  - Common knowledge: nearby places have similar rainfall.
- Mining of non-spatial data
  - Diaper sales and beer sales are correlated in the evenings.

Page 7: Families of Spatial Data Mining Patterns

- Location prediction
  - Where will a phenomenon occur?
- Spatial interactions
  - Which subsets of spatial phenomena interact?
- Hot spots
  - Which locations are unusual or share commonalities?

Page 8: Location Prediction

- Where will a phenomenon occur?
- Which spatial events are predictable?
- How can a spatial event be predicted from other spatial events?
- Examples
  - Where will an endangered bird nest?
  - Which areas are prone to fire given maps of vegetation and drought?
  - What should be recommended to a traveler in a given location?

Page 9: Spatial Interactions

- Which spatial events are related to each other?
- Which spatial phenomena depend on other phenomena?
- Examples
  - Earth science: climate and disturbance => {wild fires, hot, dry, lightning}
  - Epidemiology: disease type and environmental events => {West Nile disease, stagnant water source, dead birds, mosquitoes}

Page 10: Hot Spots

- Is a phenomenon spatially clustered?
- Which spatial entities are unusual or share common characteristics?
- Examples
  - Crime hot spots to plan police patrols

Page 11: Spatial Queries

- Spatial range queries
  - Find all cities within 50 miles of Paris
  - Query has an associated region (location, boundary)
  - Answer includes overlapping or contained data regions
- Nearest-neighbor queries
  - Find the 10 cities nearest to Paris
  - Results must be ordered by proximity
- Spatial join queries
  - Find all cities near a lake
  - Join condition involves regions and proximity
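
The three query types on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a real spatial database: the place names, the planar coordinates (in miles), the distance threshold standing in for "near", and the helper names `range_query`, `nearest_neighbors`, and `spatial_join` are all assumptions made for the example. A production system would normally use a spatial index instead of these linear scans.

```python
import math

# Hypothetical places on a flat plane, coordinates in miles (invented values).
cities = {"A": (0.0, 0.0), "B": (30.0, 40.0), "C": (60.0, 80.0), "D": (3.0, 4.0)}
lakes = {"Lake1": (0.0, 10.0), "Lake2": (200.0, 200.0)}


def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def range_query(points, center, radius):
    """Range query: names of all points within `radius` of `center`."""
    return sorted(name for name, p in points.items() if dist(p, center) <= radius)


def nearest_neighbors(points, center, k):
    """Nearest-neighbor query: the k closest names, ordered by proximity."""
    return sorted(points, key=lambda name: dist(points[name], center))[:k]


def spatial_join(left, right, max_dist):
    """Spatial join: all (left, right) pairs whose distance is at most `max_dist`."""
    return sorted(
        (a, b)
        for a, pa in left.items()
        for b, pb in right.items()
        if dist(pa, pb) <= max_dist
    )


within_50 = range_query(cities, (0.0, 0.0), 50.0)
two_nearest = nearest_neighbors(cities, (0.0, 0.0), 2)
near_a_lake = spatial_join(cities, lakes, 10.0)
```

Note that only the nearest-neighbor result depends on ordering by distance; the range query and the join return sets of matches, sorted here by name only to make the output stable.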

Page 12: Unique Properties of Spatial Patterns

- Items in traditional data are independent of each other, whereas properties of locations on a map are often “auto-correlated” (patterns exist).
- Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex.
- Items in traditional data describe discrete objects, whereas spatial data is continuous.

Page 13: Association Rules

- Support = how often a rule shows up in a database (a count, or a fraction of records as in the example)
- Confidence = conditional probability of Y given X
- Example
  - (Bedrock type = limestone), (soil depth < 50 ft) => (sink hole risk = high)
  - Support = 20%, confidence = 0.8
  - Interpretation: locations with limestone bedrock and low soil depth have a high risk of sink hole formation.
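
To make the two measures concrete, here is a small sketch that computes support and confidence for the sink hole rule. The 20 location records are invented so that the numbers match the slide (support 20%, confidence 0.8), and the function name `support_and_confidence` is an assumption for illustration. Support is taken here as the fraction of all records where both X and Y hold.

```python
# Hypothetical location records, constructed so the rule has
# support 4/20 and confidence 4/5. The values are invented.
records = (
    [{"bedrock": "limestone", "soil_depth_ft": 30, "sinkhole_risk": "high"}] * 4
    + [{"bedrock": "limestone", "soil_depth_ft": 40, "sinkhole_risk": "low"}] * 1
    + [{"bedrock": "granite", "soil_depth_ft": 30, "sinkhole_risk": "low"}] * 10
    + [{"bedrock": "limestone", "soil_depth_ft": 80, "sinkhole_risk": "low"}] * 5
)


def support_and_confidence(rows, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    support    = fraction of all rows where antecedent and consequent both hold
    confidence = fraction of antecedent rows where the consequent also holds
    """
    n_x = sum(1 for r in rows if antecedent(r))
    n_xy = sum(1 for r in rows if antecedent(r) and consequent(r))
    support = n_xy / len(rows)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


def is_shallow_limestone(r):
    return r["bedrock"] == "limestone" and r["soil_depth_ft"] < 50


def is_high_risk(r):
    return r["sinkhole_risk"] == "high"


support, confidence = support_and_confidence(records, is_shallow_limestone, is_high_risk)
```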

Page 14: Apriori Algorithm to Mine Association Rules

- Key challenge
  - Very large search space
- Key assumption
  - Few associations have support above a given threshold
  - Associations with low support are not interesting
- Key insight
  - If an association item set has high support, then so do all its subsets
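
The key insight is what lets Apriori prune the search space: a candidate item set of size k is only worth counting if every one of its (k-1)-item subsets is already known to be frequent. The following is a minimal sketch of the frequent item set phase under that idea, not an optimized implementation. The transactions, the item names, and the function name `apriori` are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical transactions, one per location (invented values).
transactions = [
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "deep_soil"},
    {"granite", "shallow_soil"},
]
frequent = apriori(transactions, min_support=0.5)
```

Rules such as X => Y would then be generated from the frequent item sets by checking confidence, which this sketch leaves out.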

Page 15: Association Rules Example

Page 16: Techniques for Association Mining

- Classical method
  - Association rules given item types and transactions
  - Assumes spatial data can be decomposed into transactions
  - Such decomposition may alter spatial patterns
- New spatial methods
  - Spatial association rules
  - Spatial co-location

Page 17: Associations, Spatial Associations, Co-location

Page 18: Associations, Spatial Associations, Co-location

Page 19: Co-location Rules

- For point data in space
- Does not need transactions; works directly with continuous space
- Uses a neighborhood definition and spatial joins
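
As a rough sketch of how a neighborhood definition and a spatial join replace transactions, the code below joins point events to themselves by distance and then asks what fraction of instances of one feature have at least one instance of another feature nearby. The events, the radius, and this particular fraction are assumptions chosen for illustration; the slides do not specify which interest measure is used.

```python
import math

# Hypothetical point events: (feature type, x, y). The values are invented.
events = [
    ("stagnant_water", 0.0, 0.0),
    ("mosquito", 0.5, 0.0),
    ("stagnant_water", 10.0, 10.0),
    ("mosquito", 10.0, 10.8),
    ("stagnant_water", 50.0, 50.0),
    ("dead_bird", 0.0, 0.6),
]


def neighbor_pairs(points, radius):
    """Spatial self-join: index pairs (i, j), i < j, at most `radius` apart."""
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][1] - points[j][1]
            dy = points[i][2] - points[j][2]
            if math.hypot(dx, dy) <= radius:
                pairs.append((i, j))
    return pairs


def colocation_fraction(points, feature_a, feature_b, radius):
    """Fraction of `feature_a` instances with a `feature_b` instance in range."""
    has_neighbor = set()
    for i, j in neighbor_pairs(points, radius):
        fi, fj = points[i][0], points[j][0]
        if fi == feature_a and fj == feature_b:
            has_neighbor.add(i)
        if fj == feature_a and fi == feature_b:
            has_neighbor.add(j)
    total = sum(1 for p in points if p[0] == feature_a)
    return len(has_neighbor) / total if total else 0.0


water_near_mosquito = colocation_fraction(events, "stagnant_water", "mosquito", 1.0)
mosquito_near_water = colocation_fraction(events, "mosquito", "stagnant_water", 1.0)
```

The measure is not symmetric: every mosquito sighting here has stagnant water nearby, but not every stagnant water source has a mosquito sighting.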

Page 20: Co-location Rules

Page 21: Clustering

- Process of discovering groups in large databases
- Spatial view: rows in a database = points in a multi-dimensional space
- Visualization may reveal interesting groups

Page 22: Clustering

- Hierarchical
  - Start with all points in one cluster
  - Split and merge until a stop criterion is reached
- Partitional
  - Start with random central points
  - Assign points to the nearest central point
  - Update the central points
  - Approach with statistical rigor
- Density
  - Find clusters based on density of regions
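
The partitional steps listed above (pick central points, assign, update) match the loop of k-means, one common partitional method. The slides do not name a specific algorithm, so treating it as k-means is an assumption, as are the sample points, the fixed iteration count, and the function name `k_means`. A minimal sketch:

```python
import math
import random


def k_means(points, k, iterations=20, seed=0):
    """Cluster `points` (tuples of floats) into k groups with plain k-means."""
    rng = random.Random(seed)
    # Start with random central points drawn from the data.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest central point.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update each central point to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, labels


# Two hypothetical, well-separated groups of points (invented values).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = k_means(points, k=2)
```

Real implementations usually stop when assignments no longer change instead of running a fixed number of iterations, and the result can depend on the random starting points when groups are less clearly separated.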

Page 23: Outliers

- Observations inconsistent with the rest of the dataset
- Observations inconsistent with their neighborhoods
- A local instability or discontinuity

Page 24: Variogram Cloud

- Create a variogram cloud by plotting (attribute difference, distance) for each pair of points
- Select points common to many outlying pairs
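
A rough sketch of this idea: compute the distance and attribute difference for every pair of sites, treat pairs that are close together but very different as outlying, and count how often each site appears in such pairs. The sites, both thresholds (`max_dist`, `min_diff`), and the function names are assumptions for illustration; in practice the outlying pairs are usually picked by inspecting the plot.

```python
import math
from collections import Counter
from itertools import combinations


def variogram_cloud(sites):
    """sites: list of (x, y, value). Returns (i, j, distance, abs difference)."""
    cloud = []
    for (i, a), (j, b) in combinations(enumerate(sites), 2):
        distance = math.hypot(a[0] - b[0], a[1] - b[1])
        difference = abs(a[2] - b[2])
        cloud.append((i, j, distance, difference))
    return cloud


def suspect_sites(sites, max_dist, min_diff):
    """Count, per site, the nearby pairs whose values differ by at least min_diff."""
    counts = Counter()
    for i, j, distance, difference in variogram_cloud(sites):
        if distance <= max_dist and difference >= min_diff:
            counts[i] += 1
            counts[j] += 1
    return counts


# Hypothetical sites along a line; the third site has an unusual value.
sites = [(0, 0, 10), (1, 0, 11), (2, 0, 50), (3, 0, 12), (4, 0, 11)]
counts = suspect_sites(sites, max_dist=1.0, min_diff=20)
most_suspect = counts.most_common(1)[0][0]
```

The site that is common to the most outlying pairs is the strongest outlier candidate.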

Page 25: Moran Scatter Plot

- For each location, plot the normalized attribute value against the weighted average in its neighborhood
- Select points in the upper left and lower right quadrants
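
The quadrant test can be sketched as follows. Values are normalized to z-scores, the neighborhood average uses equal weights for all neighbors, and a point falls in the upper left or lower right quadrant exactly when its z-score and its neighborhood average have opposite signs. The equal weights, the chain-shaped neighborhood, the sample values, and the function names are assumptions for illustration.

```python
import statistics


def moran_scatter(values, neighbors):
    """Return (z-score, neighborhood average of z-scores) for each location."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    z = [(v - mean) / std for v in values]
    # Equal-weight average over each location's neighbors (an assumption).
    lag = [statistics.fmean([z[j] for j in neighbors[i]]) for i in range(len(z))]
    return list(zip(z, lag))


def moran_outliers(values, neighbors):
    """Indices in the upper left (low value, high neighbors) or
    lower right (high value, low neighbors) quadrant."""
    scatter = moran_scatter(values, neighbors)
    return [i for i, (zi, li) in enumerate(scatter) if zi * li < 0]


# Hypothetical values along a chain of six locations (invented).
values = [10, 11, 50, 12, 11, 10]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
flagged = moran_outliers(values, neighbors)
```

In this sample the unusual location (index 2) is flagged, but so are its two immediate neighbors, because their neighborhood average is pulled up by it. Flagged points are best read as candidates to inspect.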

Page 26: Scatter Plot

- For each location, plot the normalized attribute value against the weighted average in its neighborhood
- Fit a linear regression line
- Select points which are unusually far from the regression line
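
A minimal sketch of these three steps, using an ordinary least-squares line and flagging points whose residual is more than a chosen number of standard deviations from the line. The equal-weight neighborhood average, the threshold of 2.0, the chain-shaped neighborhood, the sample values, and the function names are assumptions for illustration.

```python
import statistics


def chain_neighbors(n):
    """Neighbors on a 1-D chain: each location touches the ones beside it."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def scatterplot_outliers(values, neighbors, threshold=2.0):
    """Indices whose neighborhood average is unusually far from the fitted line."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # x: normalized attribute value; y: equal-weight average over neighbors.
    x = [(v - mean) / std for v in values]
    y = [statistics.fmean([x[j] for j in neighbors[i]]) for i in range(len(x))]

    # Least-squares fit of y = intercept + slope * x.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx

    # Residuals have mean zero, so dividing by their spread standardizes them.
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) / spread > threshold]


# Hypothetical values with a steady trend and one disturbed location (index 4).
values = [1, 2, 3, 4, 9, 6, 7, 8, 9, 10]
flagged = scatterplot_outliers(values, chain_neighbors(len(values)))
```

The threshold controls how far from the line a point must be before it counts as "unusually far"; a lower value flags more points.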

Page 27: Conclusion

- Patterns are the opposite of random
- Common spatial patterns:
  - Location prediction
  - Feature interaction
  - Hot spots
- Spatial patterns may be discovered using techniques like associations, clustering, and outlier detection