Mining Spatial Data: Opportunities and Challenges of a Relational Approach Donato Malerba Department of Computer Science University of Bari, Italy August 30th - September 1st, 2007 - AVEIRO, PORTUGAL


Mining Spatial Data: Opportunities and Challenges of a Relational Approach

Donato Malerba
Department of Computer Science
University of Bari, Italy

August 30th - September 1st, 2007 - AVEIRO, PORTUGAL


Spatial Data Exploration: A Historical Example

1848: An epidemic of ‘Asiatic cholera’ hit London.

John Snow observed the distribution of deaths throughout the city and hypothesized that river water contaminated by cholera evacuations explained the spatial variations in mortality throughout London.

John Snow


Spatial Data Exploration: A Historical Example

August 1854: the cholera epidemic hit an area of North London.

J. Snow obtained the names and addresses listed on 83 death certificates from the Registry Office.

He marked the cholera cases on a map.


Spatial Data Exploration: A Historical Example

He also inventoried potential sources of contamination (pumps) and combined this information on the map.

He observed that nearly all the deaths had taken place within a short distance of the pump in Broad Street.


Spatial Data Exploration: A Historical Example

Snow persuaded the parish council to remove the pump handle.

Not easy: the water provided by this pump was held in such high esteem that people came from neighboring streets for it.

Result: the epidemic subsided.


Spatial Data Exploration: A Historical Example

The council did not really believe Snow, so a curate repeated Snow’s work and considered other factors (cleanliness/filthiness of houses).

The curate, who was initially biased against Snow’s theory, located 700 deaths within a 250-yard radius and showed that the use of water from the Broad Street pump was strongly correlated with death from Asiatic cholera.


Spatial Data Exploration: A Historical Example

A curiosity: Snow’s theory was supported by two pieces of ‘negative data’:
• No infection in the workhouse (it had its own well)
• No cases in the Lion Brewery (workers drank the beer)


Lessons Learned

Key elements of this success story:
• Identification of relevant spatial objects
  • Reference spatial objects (buildings where cholera cases occurred)
  • Task-relevant spatial objects (water pumps, wells, etc.)
• Identification of the properties of, and relationships between, relevant spatial objects (distance of buildings from water pumps, presence of wells)


Spatial Data Mining

The goal of spatial data mining is to automate the discovery of such correlations, which can then be examined by specialists for further validation and verification.


Agenda
• Modeling spatial information
• Spatial pattern
• Spatial data mining: main issues
• Opportunities for a relational approach
• A case study: spatial model trees
• Challenges for a relational approach
• Summary


Modeling Spatial Information

Two major approaches to conceptual modeling of space:
• Field-based model
• Object-based model


Field-based Model

The world is seen as a continuous surface over which features vary.

Spatial variation is defined by a number of field functions:

$f : \mathbb{R}^n \to \text{Attribute Domain}$

Examples: elevation, temperature, precipitation

Field operations. Examples: addition ($+$) and composition ($\circ$):

$f + g : x \mapsto f(x) + g(x)$

$f \circ g : x \mapsto f(g(x))$
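The two field operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fields are modeled as plain functions of a point, and the names `add_fields`, `compose`, `elevation`, `snow_depth` and `temperature_at_height`, as well as all coefficients, are hypothetical and do not come from the slides.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def add_fields(f: Callable, g: Callable) -> Callable:
    """Pointwise addition: (f + g)(p) = f(p) + g(p)."""
    return lambda p: f(p) + g(p)


def compose(f: Callable, g: Callable) -> Callable:
    """Composition: (f o g)(p) = f(g(p))."""
    return lambda p: f(g(p))


# Hypothetical field: elevation in metres at location (x, y).
def elevation(p: Point) -> float:
    x, y = p
    return 100.0 + 10.0 * x + 5.0 * y


# Hypothetical field: constant snow depth in metres.
def snow_depth(p: Point) -> float:
    return 0.5


# Hypothetical function on the attribute domain: temperature at a given height.
def temperature_at_height(height: float) -> float:
    return 20.0 - 0.0065 * height


surface_height = add_fields(elevation, snow_depth)      # f + g
temperature = compose(temperature_at_height, elevation)  # f o g
```

Both results are again field functions, so they can be combined further with the same operations.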


Object-based Model

The world is seen as a surface littered with distinct, identifiable and relevant things or entities, called objects, which exist independently of their locations.

Objects can be:
• Zero-dimensional or punctual
• One-dimensional or linear
• Two-dimensional or surfacic

Operations on spatial objects:
• Topological
• Directional
• Distance-based


Field-based vs. Object-based

(a) [Figure: map of three forest stands (Pine, Fir, Oak) with axis ticks (0,0), (2,0), (4,0), (0,2), (0,4)]

(b) Object Viewpoint of Forest Stands

Area-ID   Dominant Tree Species   Area/Boundary
FS1       Pine                    [(0,2),(4,2),(4,4),(0,4)]
FS2       Fir                     [(0,0),(2,0),(2,2),(0,2)]
FS3       Oak                     [(2,0),(4,0),(4,2),(2,2)]

(c) Field Viewpoint of Forest Stands

$f(x,y) = \begin{cases} \text{``Pine''}, & 0 \le x \le 4;\ 2 \le y \le 4 \\ \text{``Fir''}, & 0 \le x \le 2;\ 0 \le y \le 2 \\ \text{``Oak''}, & 2 \le x \le 4;\ 0 \le y \le 2 \end{cases}$
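The two viewpoints of the forest stands can be put side by side in a short Python sketch. It assumes that the Pine stand spans 0 ≤ x ≤ 4, as the FS1 boundary indicates, and that every stand is an axis-aligned rectangle; the names `STANDS`, `field_species` and `stand_containing` are hypothetical.

```python
# Object viewpoint: each stand is an identifiable object with an attribute
# (dominant species) and a boundary given as a list of vertices.
STANDS = {
    "FS1": ("Pine", [(0, 2), (4, 2), (4, 4), (0, 4)]),
    "FS2": ("Fir", [(0, 0), (2, 0), (2, 2), (0, 2)]),
    "FS3": ("Oak", [(2, 0), (4, 0), (4, 2), (2, 2)]),
}


def field_species(x, y):
    """Field viewpoint: a function from a location to the dominant species."""
    if 0 <= x <= 4 and 2 <= y <= 4:
        return "Pine"
    if 0 <= x <= 2 and 0 <= y <= 2:
        return "Fir"
    if 2 <= x <= 4 and 0 <= y <= 2:
        return "Oak"
    raise ValueError("location outside the mapped area")


def stand_containing(x, y):
    """Object viewpoint query: find the stand whose rectangle contains (x, y).

    Works only because the boundaries here are axis-aligned rectangles.
    """
    for stand_id, (species, boundary) in STANDS.items():
        xs = [vx for vx, _ in boundary]
        ys = [vy for _, vy in boundary]
        if min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys):
            return stand_id, species
    return None
```

For interior points the two representations agree, for example `field_species(1, 1)` and `stand_containing(1, 1)[1]` both give the Fir stand. Points on shared boundaries are ambiguous in both views and are resolved here simply by the order of the checks.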


Spatial Pattern (Field-based view)

A function obtained as a combination of field functions according to field operations.

$f(x,y)$ = precipitation in $(x,y)$

$\frac{1}{\mathrm{size}(A)} \int_A f(x,y)\,dx\,dy > \frac{1}{\mathrm{size}(B)} \int_B f(x,y)\,dx\,dy$
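A field-based pattern of this kind, "average precipitation over A exceeds average precipitation over B", can be checked numerically. The sketch below assumes rectangular regions and uses a midpoint rule; the precipitation function, the regions and the helper name `mean_over_rectangle` are hypothetical.

```python
def mean_over_rectangle(f, x0, x1, y0, y1, n=100):
    """Approximate (1/size(R)) * integral of f over the rectangle R
    = [x0, x1] x [y0, y1] with the midpoint rule on an n x n grid."""
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            total += f(x, y)
    # sum(f) * dx * dy / ((x1 - x0) * (y1 - y0)) simplifies to sum(f) / n^2
    return total / (n * n)


# Hypothetical precipitation field: wetter towards larger x.
def precipitation(x, y):
    return 50.0 + 10.0 * x


region_a = (2.0, 4.0, 0.0, 2.0)  # x0, x1, y0, y1
region_b = (0.0, 2.0, 0.0, 2.0)

mean_a = mean_over_rectangle(precipitation, *region_a)
mean_b = mean_over_rectangle(precipitation, *region_b)
pattern_holds = mean_a > mean_b
```

For irregular regions the same idea applies, but the sampling points would have to be restricted to cells that fall inside each region.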


Spatial Pattern (Object-based view)

It expresses a spatial relationship among spatial objects.
• If the wetlands area is near open water, then there is a nest of a red-winged blackbird (classification rule, specifically a location prediction rule)
• The price of a house near the river is 2000*Size + 5000 (regression rule)
• Trajectories of monitored cars group along the direction from downtown to residential suburbs (cluster)
• A country that is adjacent to the Mediterranean Sea is a wine exporter (association rule)


Spatial Data Mining

Spatial data mining: extraction of interesting and useful but implicit spatial patterns (adapted from the definition of KDD).

Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: an overview. In: Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (Eds.): Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press (1996) 1-35.


What’s Special about Spatial Data Mining? (1/4)

The formulation of a spatial data mining method must take into account the logical representation of spatial information.

Two modes:
• Tessellation
• Vector


Tessellation Mode

Regular or irregular

Field-based data in tessellation mode

Object-based data in tessellation mode: an object is represented by the list of cells it covers, e.g., ⟨2,3,6,7,8,12,13,14,18,19⟩


Vector Mode

Ordered sets of xy-coordinates defining points, lines, or polygons

Field-based data in vector mode

Object-based data in vector mode


Tessellation vs. Vector

Vector
Pros
• Storage efficient
• Easy to scale (though some operations are complex)
• Supported by spatial DBMS
Cons
• Difficult to convert remote sensing images into this format
• Difficult to check a large number of constraints (is a polygon convex?)

Tessellation
Pros
• Large volumes of spatial data (e.g., remote sensing images) are available
Cons
• Large memory space required
• Operations on objects are time consuming


Hybrid Modes also Possible

Spatial objects in a regular grid of cells


Tessellation

Single pixels are classified

Image processing operators are needed

[Figure label: Industrial area]


Vector

Images are transformed

Spatial relationships are computed

[Figure label: Industrial area]


Representation of Spatial Data in a Spatial DBMS

Spatial information is represented in different layers, one for each type of spatial object.

Layer: a database relation $R_i$ with a number of elementary attributes $A^i_1, \ldots, A^i_{m_i}$ and possibly a geometry attribute $G_i$

Vector representation for $G_i$, which can be a point, a line or a polygon

A reference system defines the coordinates of single points and of the vertices of lines and polygons


Layer

A spatial object or a field function is represented by one or more tuples of a …

Layer: a database relation $R_i$ with a number of elementary attributes $A^i_1, \ldots, A^i_{m_i}$ and possibly a geometry attribute $G_i$

The geometry attribute is represented in vector mode.

wells
ID     Location   Depth   Type
2357   (15,22)    20      drilled


What’s Special about Spatial Data Mining? (2/4)

Different types of spatial objects: several layers in a spatial DB

wells
ID     Location   Depth   Type
2357   (15,22)    20      drilled

buildings
ID     Surface                Size   Owner
AD18   (geometry not shown)   250    Smith


What’s Special about Spatial Data Mining? (3/4)

Spatial objects have a locational property which implicitly defines spatial relationships between objects.

• Topological
• Distance-based
• Directional


Topological Relations

Invariant under homeomorphisms (e.g., rotation, translation and scaling)

Semantics defined by the 9-intersection model

For regions: disjoint, meet, overlaps, equal, contains, inside, covers, covered by


Distance Relations

Metric
• Euclidean distance between two points
• For polygons it is an aggregate function (e.g., minimum)

Non-metric
• Typically defined on the basis of a cost function (e.g., drive time)
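The metric case can be sketched in Python: Euclidean distance between points, and the minimum distance between two polygons as an aggregate over vertex-to-edge distances. The sketch assumes the polygons do not overlap or touch (otherwise the true distance is zero and this function would not detect it); all function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def point_segment_distance(p, a, b):
    """Distance from point p to the segment with endpoints a and b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return euclidean(p, a)  # degenerate segment
    # Projection parameter of p on the line through a and b, clamped to [0, 1].
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return euclidean(p, (ax + t * dx, ay + t * dy))


def edges(polygon):
    """Consecutive vertex pairs, closing the ring."""
    return [(polygon[i], polygon[(i + 1) % len(polygon)])
            for i in range(len(polygon))]


def polygon_distance(poly_p, poly_q):
    """Minimum distance between two non-intersecting polygons.

    For non-intersecting segments the minimum is attained at an endpoint of
    one of them, so vertex-to-edge distances in both directions suffice.
    """
    candidates = [point_segment_distance(v, a, b)
                  for v in poly_p for a, b in edges(poly_q)]
    candidates += [point_segment_distance(v, a, b)
                   for v in poly_q for a, b in edges(poly_p)]
    return min(candidates)
```

Replacing `min` with another aggregate (for instance the distance between centroids) gives a different, equally legitimate polygon distance; a non-metric relation such as drive time would need a cost model instead of geometry.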


Directional Relations

Based on an angle α

Based on the extension of Allen’s algebra


Spatial Relations

In a spatial DB, different spatial relations ρ implicitly define spatial joins between two layers $R_i$ and $R_j$: $R_i \bowtie_\rho R_j$

Too many spatial joins are implicitly defined, so efficient computation of spatial relations is a must when developing spatial data systems


Abstraction from Physical Representation

Interest in properties that do not depend on the physical representation

Well-defined semantics (e.g., the 9-intersection model) is not enough.

Example: Two roads can cross each other, run parallel, or be confluent, regardless of whether they are represented as “lines” or “regions”


What’s Special about Spatial Data Mining? (4/4)

Spatial (positive) autocorrelation: The values of a given property are highly uniform among similar spatial objects in the neighborhood.

Tobler’s first law of geography: “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970)


What’s Special about Spatial Data Mining? (4/4)

Spatial autocorrelation: The value of a property observed at a location depends on the values of properties observed at neighboring locations.
• positive autocorrelation: neighboring values are more similar
• negative autocorrelation: neighboring values are less similar
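One common way to quantify positive versus negative autocorrelation is Moran's I. The statistic is not mentioned in the slides; it is used here only as a standard illustration, and the four-cell chain and its values are hypothetical. The function assumes the values are not all equal.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square weight matrix.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


# Four cells in a row; each cell is a neighbor of the adjacent ones.
CHAIN = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = [1.0, 2.0, 3.0, 4.0]       # neighbors have similar values
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbors have dissimilar values

i_smooth = morans_i(smooth, CHAIN)
i_alternating = morans_i(alternating, CHAIN)
```

Here `i_smooth` is positive and `i_alternating` is negative, matching the two cases listed above.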


Tobler’s First Law of Geography

“everything is related to everything else, but near things are more related than distant things” (Tobler, 1970)


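Spatial Error and Spatial Lag

Two primary types of autocorrelation:
• Spatial error
• Spatial lag
  • Spatially lagged explanatory variables
  • Spatially lagged response variables

[Figure: two dependency diagrams, one for spatial error and one for spatial lag, over the variables $x_i$, $x_j$, $y_i$, $y_j$, $\varepsilon_i$, $\varepsilon_j$]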


Violated Assumptions

Spatial error violates the assumption that error terms are uncorrelated.

Spatial lag violates the assumption that observations are independent (as well as the assumption that error terms are uncorrelated).

“Anyone seriously interested in prediction when the sample data exhibit spatial dependence should consider a spatial model” (LeSage & Pace, 2001)


Limits of Traditional Data Mining Methods

• Attributes are of numeric and discrete types only (no geometry)
• Observations cannot be of different types (e.g., wells, buildings, etc.)
• Spatial relationships between observations are neither represented nor considered


Spatial Statistics

Spatial dependence is typically modeled by linear models of the form

$y = X\alpha + \beta D y + \gamma D X + \varepsilon$

• $y$: vector of observations of the dependent variable
• $X$: matrix of observations of the independent variables
• $\alpha$: strength of local influence
• $\beta$: strength of spatial dependence on response variables
• $\gamma$: strength of spatial dependence on explanatory variables
• $D$: spatial weight matrix (or neighborhood matrix)
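The terms of this model can be computed directly, which makes the role of D concrete: Dy is the spatially lagged response and DX the spatially lagged explanatory variables. The sketch below makes one assumption that the slide leaves open: γ is treated as a vector with one coefficient per column of X, so that (DX)γ is a vector. The data, the coefficients and the helper names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def mat_mat(left, right):
    """Matrix-matrix product for lists of lists."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in left]


def systematic_part(X, y, D, alpha, beta, gamma):
    """Return X*alpha + beta*(D*y) + (D*X)*gamma, i.e. the model without
    the error term."""
    lagged_y = mat_vec(D, y)   # spatially lagged response
    lagged_x = mat_mat(D, X)   # spatially lagged explanatory variables
    local = mat_vec(X, alpha)
    neighbor_x = mat_vec(lagged_x, gamma)
    return [l + beta * ly + nx
            for l, ly, nx in zip(local, lagged_y, neighbor_x)]


# Three sites in a row; D is row-normalized (each row sums to 1).
D = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
X = [[1.0], [2.0], [3.0]]
y = [10.0, 20.0, 30.0]

fitted = systematic_part(X, y, D, alpha=[1.0], beta=0.5, gamma=[2.0])
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```

Setting `beta` and `gamma` to zero recovers the ordinary, non-spatial linear model.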


Spatial Weight Matrix

• Contains a ‘d’ term for every combination of observations in the data set
• ‘d’ may be the inverse distance between observations, or a 0/1 indicator of whether they share a border and/or vertex
• The choice of spatial weight matrix is often made ad hoc and a priori
• Provides the ‘structure’ of assumed spatial relationships
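Both choices of ‘d’ can be sketched as follows. The example assumes point observations at distinct locations for the inverse-distance case and cells of a regular grid for the 0/1 case; the function names and the `include_vertices` switch are hypothetical.

```python
import math


def inverse_distance_weights(points):
    """d_ij = 1 / distance(i, j) for i != j, and 0 on the diagonal.

    Assumes all points are distinct (otherwise the distance is zero).
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = math.hypot(points[i][0] - points[j][0],
                                  points[i][1] - points[j][1])
                weights[i][j] = 1.0 / dist
    return weights


def contiguity_weights(cells, include_vertices=False):
    """0/1 weights for grid cells given as (row, col) pairs.

    With include_vertices=False two cells are neighbors if they share a
    border; with include_vertices=True sharing a single vertex is enough.
    """
    n = len(cells)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_row = abs(cells[i][0] - cells[j][0])
            d_col = abs(cells[i][1] - cells[j][1])
            if include_vertices:
                neighbors = max(d_row, d_col) == 1
            else:
                neighbors = d_row + d_col == 1
            weights[i][j] = 1 if neighbors else 0
    return weights
```

In practice the matrix is often row-normalized afterwards so that each row sums to one; that step is omitted here.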


Problems with Spatial Models

• D has to be carefully defined
• How can D express the contribution of different spatial relationships?
• Spatial dependencies are all handled in a pre-processing or feature extraction step
• All spatial objects involved in the spatial phenomena (rows of X) are uniformly represented by the same set of attributes
• No clear difference between reference (target) and task-relevant objects


Reference vs. Task-relevant Objects

Reference objects: the main subject of analysis

Task-relevant objects: objects in the neighborhood that can contribute to explaining the spatial variation

Example: find associations involving large towns of Apulia (Italy)
• Reference objects: large towns
• Task-relevant objects: water bodies, roads, province boundaries


Towards a Multi-Relational Representation

In spatial data mining the units of analysis are typically composed of several spatial objects with different properties.

Their spatial structure cannot be accommodated in a classical double-entry table.

A better representation: a set of relations $R_1, \ldots, R_j$, some of which are layers.

Foreign key constraints and spatial relations define the possible joins.


Example

Problem: investigate social effects of public transportation in a British city

Spatial data set:

ED
ID         Area
03bsfc01   (geometry not shown)

BL
Name   Line                   Type
15a    (geometry not shown)   main

CE
Attributes: ED, #households no car, #households 1 car, #households ≥2 cars
Sample tuple: 03bsfc01 80 67143


Example

A unit of analysis corresponds to an ED (reference object) described in terms of #cars per household and crossing bus lines (task-relevant objects)

Relational pattern:

“the enumeration districts with a high percentage of households which own fewer than two cars are served by at least two bus lines, one of which is a main bus line”


Multi-relational Data Mining

MRDM tools can be applied directly to data distributed over several relations to find relational patterns which involve multiple relations.

Relational patterns can be expressed in SQL but also in first-order logic


Relational Patterns in SQL and Logic

SELECT DISTINCT ED.ID
FROM ED, CE, BL AS BL1, BL AS BL2
WHERE CE.ED = ED.ID
  AND (HHNOCARS + HH1CAR) / (HHNOCARS + HH1CAR + HH2CARS) * 100 > 60
  AND INTERSECTS(ED.AREA, BL1.LINE)
  AND INTERSECTS(ED.AREA, BL2.LINE)
  AND BL1.NAME <> BL2.NAME
  AND BL1.TYPE = 'main'

ed(X), ce(X, HHNOCARS, HH1CAR, HH2CARS), bl(BL1), bl(BL2), (HHNOCARS + HH1CAR) / (HHNOCARS + HH1CAR + HH2CARS) * 100 > 60, intersects(X, BL1), intersects(X, BL2), BL1 ≠ BL2, main(BL1)
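To make the semantics of the pattern concrete, the following Python sketch evaluates it over a toy database. Only the schema (ED, CE, BL) comes from the slides; every tuple, the precomputed `INTERSECTS` relation and the function name `matches_pattern` are hypothetical. The percentage is computed over households with fewer than two cars, following the verbal statement of the pattern.

```python
# Hypothetical toy data; only the relation names come from the slides.
ED = ["ed1", "ed2", "ed3"]

# ED id -> (#households no car, #households 1 car, #households >= 2 cars)
CE = {
    "ed1": (60, 30, 10),
    "ed2": (10, 20, 70),
    "ed3": (50, 40, 10),
}

# Bus line name -> type
BL = {"15a": "main", "22": "secondary", "7": "main"}

# Stand-in for the spatial predicate intersects(ED.area, BL.line); in a real
# system this would be computed by the spatial DBMS from the geometries.
INTERSECTS = {
    ("ed1", "15a"), ("ed1", "22"),
    ("ed2", "15a"), ("ed2", "7"),
    ("ed3", "22"),
}


def matches_pattern(ed, threshold=60.0):
    """True if the ED has a high share of households with fewer than two
    cars and is crossed by at least two bus lines, one of them a main line."""
    no_car, one_car, two_plus = CE[ed]
    share = 100.0 * (no_car + one_car) / (no_car + one_car + two_plus)
    if share <= threshold:
        return False
    lines = [name for name in BL if (ed, name) in INTERSECTS]
    return any(b1 != b2 and BL[b1] == "main"
               for b1 in lines for b2 in lines)


selected = sorted(ed for ed in ED if matches_pattern(ed))
```

The nested `any` over pairs of distinct lines plays the role of the self-join `BL AS BL1, BL AS BL2` in the SQL formulation.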


Two Settings for MRDM

Finding relational patterns within units of analysis represented as sets of tuples

Each unit of analysis includes a single reference object and is represented by a sub-database of the original one

Finding patterns within the whole database


Individual-Centered Representations

Several advantages:
• Positive PAC-learnability results
• Methods working under the single-table assumption are easier to upgrade (e.g., the notion of unit of analysis simplifies the sampling)
• More efficient (process one unit of analysis at a time)


Individual-Centered Representations

But …
• Units of analysis with a single reference object might not be easy to define
• In spatial data mining, the unit of analysis should be carefully selected so that
  • autocorrelation is considered
  • the size of the neighborhood is limited


Example of Autocorrelation

Spatially lagged explanatory variables: there is no communal establishment (schools, hospitals) in an ED, but many of them are located in the nearby EDs

[Figure: a reference ED described by $(x_{1,i}, \ldots, x_{k,i}, x_{r,j})$, surrounded by neighboring EDs labeled $X_{r,j}$]

Reference object: ED

Task-relevant objects: communal establishments in the nearby EDs


Example of Autocorrelation

Spatially lagged response variables: the price level for a good at a retail outlet in a city depends on the price for the same good in the nearby outlets

[Figure: a reference ED described by $(x_{1,i}, \ldots, x_{k,i}, y_j)$, surrounded by neighboring EDs labeled $X_{r,j}$ and $Y_j$]

Reference object: ED

Task-relevant objects: EDs in the neighborhood


A Recipe for MRDM Systems

Start from a well-known data mining system working on the classical double-entry table representation

Upgrade
• Generality order of patterns (e.g., θ-subsumption)
• Generalization/specialization operators
• Similarity measure, …
to deal with several relations

Build a new system, retaining as much as possible from the original one


Additional Ingredients for a Relational Approach to Spatial Data Mining

• Define a representation of spatial objects
• Define operators for spatial joins
• Optimize the computation of spatial joins with spatial indexes
• Distinguish reference from task-relevant objects
• Visualize spatial patterns (e.g., on a map)


Relational Systems for Spatial DM

Spatial Association Rule Discovery

SPADA system (Malerba & Lisi, ILP 2001)

Task-relevant data organized hierarchically

Spatial patterns are found at different granularity levels

[Figure: three hierarchies with levels labeled 1 to 3. Road net: road_net, with Motorway, A_road, B_road, PrimaryRoad. Water net: water_net, with canal, river, water. Rail net: rail_net, with rail.]


Relational Systems for Spatial DM

Spatial Association Rule Discovery

• Spatial pattern: conjunction of first-order logic atoms
• The space of spatial patterns is ordered by θ-subsumption
• Monotonicity of support w.r.t. θ-subsumption allows pruning of patterns at the same granularity level in the candidate generation phase
• Monotonicity of pattern frequency w.r.t. granularity level allows pruning of patterns at different granularity levels in the candidate generation phase


Relational Systems for Spatial DM

Spatial Association Rule Discovery

• Efficiency improvement of pattern evaluation by caching support objects for each stored pattern
• Definition of a declarative bias to filter out rules on the basis of users’ preferences; efficiency improvement is a byproduct
• Integration of SPADA in the ARES system, which interfaces a spatial DB (Oracle Spatial)


Relational Systems for Spatial DM

Spatial Classification

• Based on associative classification (Ceci et al., ECML/PKDD 2004)
• SPADA is used to extract strong multi-level spatial association rules, with exactly one literal representing the class label in the consequent
• Strong rules are then used to build a relational Naive Bayesian classifier


Relational Systems for Spatial DM

Spatial Clustering

CORSO (presented at this conference)
• Units of analysis are described by several relations
• A relational distance measure is used to cluster them
• Units of analysis are themselves spatially related


Relational Systems for Spatial DM

Spatial Clustering

Discrete Spatial Structure: a directed graph where
• nodes correspond to units of analysis
• links correspond to spatial relations between units of analysis

[Figure: “Neighbouring regions” relation among Italian regions: Apulia, Molise, Basilicata, Campania, Calabria, Abruzzo, Latium, Tuscany, Sicily, …]
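A discrete spatial structure of this kind can be held in a plain adjacency map. The sketch below is an illustration only: the arcs of the original figure are not recoverable, so the border list is an assumption based on general geography (land borders among the listed regions), and the names `REGIONS`, `BORDERS` and `build_directed_graph` are hypothetical.

```python
REGIONS = [
    "Apulia", "Molise", "Basilicata", "Campania", "Calabria",
    "Abruzzo", "Latium", "Tuscany", "Sicily",
]

# Assumed land borders between the listed regions (not taken from the slide).
BORDERS = [
    ("Apulia", "Molise"), ("Apulia", "Campania"), ("Apulia", "Basilicata"),
    ("Molise", "Campania"), ("Molise", "Abruzzo"), ("Molise", "Latium"),
    ("Basilicata", "Campania"), ("Basilicata", "Calabria"),
    ("Campania", "Latium"), ("Abruzzo", "Latium"), ("Latium", "Tuscany"),
]


def build_directed_graph(nodes, borders):
    """Nodes are units of analysis; each border yields two directed links,
    because the 'neighbouring regions' relation is symmetric."""
    graph = {node: set() for node in nodes}
    for a, b in borders:
        graph[a].add(b)
        graph[b].add(a)
    return graph


graph = build_directed_graph(REGIONS, BORDERS)
neighbors_of_apulia = sorted(graph["Apulia"])
```

Under this assumption Sicily is an isolated node, since it shares no land border with the other regions; a different spatial relation (for example a distance threshold) would produce a different set of links over the same nodes.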


Relational Systems for Spatial DM

Spatial Clustering

CORSO combines
• graph-based partitioning
with
• multi-relational clustering

Clusters are described by means of a logical theory

[Figure: graph of Italian regions: Apulia, Molise, Basilicata, Campania, Calabria, Abruzzo, Latium, Tuscany, Sicily, …]


Relational Systems for Spatial DM

Spatial Clustering

Related work: GDBSCAN (Sander et al., 1998)

Pros
• Spatial relations between the objects to be clustered are considered

Cons
• Data stored in a single double-entry table
• No description of clusters

Page 68

Relational Systems for Spatial DM: Spatial Regression

Mrs-SMOTI (Malerba et al., ECML/PKDD 2005)
It generates relational model trees from a collection of tables (some of which are layers). It extends its predecessor SMOTI:
• tight integration with a spatial database to mine spatial relationships and properties implicit in the data
• search strategy modified to capture the implicit relational structure of spatial data (intra-layer and inter-layer)
• intra-layer relationships make available the spatially lagged response in addition to spatially lagged explanatory attributes

Page 69

Classical Regression Problem

Given
• m independent (or explanatory) attributes Xi (both continuous and discrete)
• a continuous dependent (or response) attribute Y to be predicted
• a set of n training cases (x1, x2, …, xm, y)

Build
• a function y = g(x) that correctly predicts the value of the response attribute for each m-tuple (x1, x2, …, xm)

Page 70

Problem 1: Spatial Arrangement

If spatial heterogeneity of the response is anticipated, then allow
• the constant
• one or more of the other regression parameters
to vary spatially

Example: residential areas have a higher number of migrants

Yi = β0 + β1x1,i + … + βkxk,i + γDi + ei

Di is a dummy variable: 1 if site i is in a residential area, 0 otherwise

Page 71

Problem 1: Spatial Arrangement

Yi = β0 + (β1 + γDi)x1,i + … + βkxk,i + ei

In this case the slope parameter associated with variable x1 varies.

Issues:
• clumsy generalization to more than two influence areas
• difficult to establish a priori which variables (if any) are actually affected by spatial arrangement
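Substituting the two values of the dummy variable makes the effect of each model explicit. The block below is only a worked restatement of the two equations on these slides: in the first model the intercept differs between the two areas, in the second the slope of x1 does.

```latex
% Intercept shift: Y_i = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_k x_{k,i} + \gamma D_i + e_i
Y_i =
\begin{cases}
(\beta_0 + \gamma) + \beta_1 x_{1,i} + \dots + \beta_k x_{k,i} + e_i & \text{if } D_i = 1 \\
\beta_0 + \beta_1 x_{1,i} + \dots + \beta_k x_{k,i} + e_i & \text{if } D_i = 0
\end{cases}

% Slope shift: Y_i = \beta_0 + (\beta_1 + \gamma D_i) x_{1,i} + \dots + \beta_k x_{k,i} + e_i
Y_i =
\begin{cases}
\beta_0 + (\beta_1 + \gamma) x_{1,i} + \dots + \beta_k x_{k,i} + e_i & \text{if } D_i = 1 \\
\beta_0 + \beta_1 x_{1,i} + \dots + \beta_k x_{k,i} + e_i & \text{if } D_i = 0
\end{cases}
```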

Page 72

Model Tree

• A tree structure is generated according to a top-down strategy
  • partitioning of the training set
  • local regression models

[Figure: model tree with split node X1 ≤ 3 and a leaf Y = 3 + 2X1]

a set of n training cases (x1, x2, …, xm, y)

Page 73

Spatial Regression Problem

• The response attribute Y (e.g., number of migrants) is associated with a location (e.g., an ED)
• Explanatory variables Xi are also associated with locations

In standard model tree learning methods:
• arrangement properties of spatial objects are disregarded
• observations are assumed independent

Page 74

Model Trees: State of the Art

Statistics
• Ciampi (1991): RECPAM
• Siciliano & Mola (1994)
• …

Data Mining
• Karalic (1992): RETIS
• Quinlan (1992): M5
• Wang & Witten (1997): M5’
• Lubinsky (1994): TSIR
• Torgo (1997): HTL
• Malerba et al. (2004): SMOTI
• …

No state-of-the-art method tries to mine model trees that deal with spatial structure!

Page 75

Dealing with local & global effects

Some explanatory attributes can have a spatially global effect on the response attribute, while others have only a spatially local effect.

[Figure: model tree with leaves Y = 0.9, Y = 3 + 1.1X1 and Y = 3X1 + 1.1X2]

• The model tree does not reveal the possibly global effect of X1

Page 76

Dealing with local & global effects

A tree structure with splitting and regression nodes

• Splitting nodes perform a Boolean test.

[Figure: splitting node t with children tL and tR and leaf models Y = a + bXu and Y = c + dXw; the test is Xi ≤ α for a continuous variable, or Xi ∈ {xi1, …, xih} for a discrete variable]

• Regression nodes compute only a straight-line regression. They have only one child.

[Figure: regression node t (Y = a + bXi) whose single child t’ is a splitting node with test X’j ≤ α, children t’L and t’R, and leaf models Y = c + dX’u and Y = e + fX’w; labels nL and nR; X’j = Xj − (aj + bjXi)]

Page 77

Dealing with local & global effects

SMOTI: Stepwise Model Tree Induction (Malerba et al., IEEE Trans. Pattern Analysis & Mach. Intell. 2004)

• Leaves are associated with a straight-line regression function
• The multiple regression model associated with a leaf is the composition of the straight-line regression functions found along the path from the root to the leaf

[Figure: model tree T with nodes numbered 0 to 7: regression node 0 (Y = a + bX1), splitting node 1 (X’3 ≤ α), further splitting nodes X’2 ≤ β and X’4 ≤ γ, and regression functions Y’ = c + dX’3, Y’ = e + fX’2, Y’ = g + hX’3, Y’ = i + lX’4]
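The composition of straight-line regressions along a path can be illustrated with two regression steps and no splits. This is a minimal sketch, not the SMOTI implementation: the function names and the data are hypothetical, and the only thing taken from the slides is the residual transformation X’j = Xj − (aj + bjXi), applied to both the response and the remaining explanatory attribute.

```python
def fit_line(xs, ys):
    """Least-squares straight line ys ~ a + b*xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def compose_two_steps(x1, x2, y):
    """Two regression steps along one path; returns (intercept, coef_x1, coef_x2)."""
    a, b = fit_line(x1, y)                                  # step 1: Y = a + b*X1
    a2, b2 = fit_line(x1, x2)                               # effect of X1 on X2
    x2_res = [v - (a2 + b2 * u) for u, v in zip(x1, x2)]    # X'2 = X2 - (a2 + b2*X1)
    y_res = [v - (a + b * u) for u, v in zip(x1, y)]        # Y' = Y - (a + b*X1)
    c, d = fit_line(x2_res, y_res)                          # step 2: Y' = c + d*X'2
    # Substitute X'2 and Y' back to obtain a model on the original attributes.
    return a + c - d * a2, b - d * b2, d

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]   # hypothetical noise-free data
coefficients = compose_two_steps(x1, x2, y)
```

On this noise-free data the composed coefficients coincide, up to rounding, with those of the generating model (1, 2, 3), which is why a chain of simple regressions on residuals can stand in for one multiple regression.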

Page 78

Dealing with Spatial Autocorrelation

• Augment ED information by exploiting intra-layer relationships (e.g., neighborhood)

[Table: ED attributes. Extracted header: ED, #MigrInWards, #Establishments, #Employees on 10% sample population, #Migrants. Extracted values (column alignment unclear): Italy 3 1 9 4382 …; 03BSFA18 10 1 45; 03BSFN01 18 5 73; …]

Reference ED    Neighbouring ED
03BSFA04        03BSFA05
03BSFA04        03BSFB18
03BSFA04        03BSFQ01
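A neighbour table like the one above is enough to derive spatially lagged attributes. The sketch below averages an attribute over the neighbours of each reference ED; the ED codes come from the table, while the attribute values and the names `migrants` and `spatial_lag` are hypothetical, and the mean is only one possible aggregate.

```python
from statistics import mean

# (reference ED, neighbouring ED) pairs as in the table above.
neighbour_pairs = [
    ("03BSFA04", "03BSFA05"),
    ("03BSFA04", "03BSFB18"),
    ("03BSFA04", "03BSFQ01"),
]

# Hypothetical attribute values, for illustration only.
migrants = {"03BSFA04": 12, "03BSFA05": 10, "03BSFB18": 20, "03BSFQ01": 30}

def spatial_lag(values, pairs):
    """Average the attribute over the neighbours of each reference unit."""
    by_reference = {}
    for reference, neighbour in pairs:
        by_reference.setdefault(reference, []).append(values[neighbour])
    return {ref: mean(vals) for ref, vals in by_reference.items()}

lagged_migrants = spatial_lag(migrants, neighbour_pairs)
```

Applied to the response attribute, the same aggregation gives a spatially lagged response; applied to explanatory attributes, it gives spatially lagged explanatory attributes.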

Page 79

Dealing with Spatial Autocorrelation

Other spatial objects, different from the areas where Y is measured, can easily be accommodated in this framework through inter-layer relationships.

Example: [figure not recovered]

Page 80

Model Trees + Spatial Data Structure = Mrs-SMOTI

Mrs-SMOTI is the spatial extension of SMOTI (Stepwise Model Tree Induction)

INPUT: spatial objects, possibly belonging to separate layers, stored in a spatial database S
• reference objects (main subject of analysis)
• task-relevant objects

OUTPUT: a spatial model tree T, built by
• partitioning training spatial data according to intra-layer and inter-layer relationships
• associating different regression models with disjoint spatial areas

Page 81

Split Node

Binary split nodes involve:
1. Boolean tests on spatial relationships (either intra-layer or inter-layer)

Example: partitioning EDs by the presence/absence of roads.
• EDs crossed by some road
• EDs not crossed by any road

An extra layer is added to the spatial model

Page 82

Split Node

2. Boolean tests on
• thematic attributes of a layer
• spatial properties implicitly defined by the geometry of a layer (e.g., area for polygons, extension for lines)

Boolean tests involve only layers already included in the model

Page 83

Spatial Regression Node

It performs a straight-line regression on either a continuous thematic attribute or a continuous spatial property
• the response attribute and the continuous explanatory attributes are replaced with residuals (stepwise regression)
• the regression attribute comes from a layer already included in the model
• when a new layer is added to the model, continuous thematic and spatial attributes are replaced with the corresponding residuals

Page 84

Spatial Database Integration

How?
• Object-relational data representation (Oracle Spatial)
• Spatial patterns associated with splitting and regression nodes are expressed by spatial queries

Example

SELECT *
FROM EDs x, ROADS y
WHERE SDO_GEOM.RELATE(x.geometry, 'ANYINTERACT', y.geometry, 0.001) = 'TRUE'

Page 85

Mining Stockport Census Data

GOAL: investigate social phenomena related to unemployment
SPATIAL DATASET: Stockport (Greater Manchester, UK)
• Reference objects: 578 EDs in Stockport
• Task-relevant objects:
  • shopping areas (53 objects)
  • employment areas (30 objects)
  • housing areas (9 objects)

Page 86

Mining Stockport Census Data

Two experimental settings:
• B0 is obtained by considering only the ED layer
• B1 is obtained by considering all layers (L1+L2)

Page 87

Mining Stockport Census Data

Portion of the spatial model tree built by Mrs-SMOTI on the entire dataset (B1 setting)

-- split on EDs’ number of migrants [≤ 47] (578 EDs)
---- regression on EDs’ area (458 EDs)
------ split on EDs-Shopping areas spatial relationship (94 EDs)
…
------ split on EDs’ number of migrants (364 EDs)
…
---- split on EDs’ area (120 EDs)
------ leaf on EDs’ area (22 EDs)
------ regression on EDs’ area

Figure annotations: Boolean test on a thematic attribute; test on an inter-layer relationship; Boolean test on a spatial property

Page 88

Mining Stockport Census Data

• 10-fold cross-validation
• Average Mean Square Error (Avg.MSE)
• Systems: Mrs-SMOTI vs. SMOTI, M5’
• Two transformations of the original multi-relational data into a classical double-entry table, obtained by computing:
  • P1: spatial joins according to all possible intra-layer and inter-layer relationships; multiple tuples are generated for the same reference object
  • P2: average values for continuous attributes; one tuple for each reference object

Page 89

Mining Stockport Census Data

Mrs-SMOTI vs SMOTI and M5’: average (avg) and standard deviation (std) of the mean square error (mse) and number of leaves (#L) of the learned models

Mrs-SMOTI always performs better (in terms of Avg.MSE) than SMOTI and M5’

Page 90

Agenda
• Modeling spatial information
• Spatial pattern
• Spatial data mining: main issues
• Opportunities for a relational approach
• A case study: spatial model trees
• Challenges for a relational approach
• Summary

Page 91

Spatial Relationships Not Explicitly Modeled

• MRDM methods take advantage of information on the data model reported in the DB schema (e.g., foreign keys) to guide the search process.
• But … spatial relationships are not explicitly modeled in the schema of a spatial DB.

Page 92

Spatial Relationships Not Explicitly Modeled

Pre-compute spatial relationships
• Spatial weight matrix D for spatial linear models
• Single DB relation in GeoMiner (Han et al., 1997)
• Materialize distance, direction, and topological relations (Ester et al., 1999)
• Extract spatial relations and represent them as first-order predicates (Appice et al., 2005)

Page 93

Spatial Relationships Not Explicitly Modeled

Pros
• Spatial DBs are rather static

Cons
• Very large number of spatial relationships between two layers
• Some of them might be unnecessarily extracted

Dynamically compute spatial relationships, but which of them?

Page 94

Feature Selection Bias

Concentrated linkage (Jensen & Neville, 2002)
• High concentration of objects linked to a common neighbor

[Figure: linkage on a scale from 0 to 1, illustrated with ED and BL objects]

Page 95

Feature Selection Bias

Relational autocorrelation
• The values of a given attribute are highly uniform among objects that share a common neighbor

[Figure: autocorrelation on a scale from 0 to 1, illustrated with EDs linked to a common BL and labelled + or −]

Page 96

Feature Selection Bias

• High linkage and autocorrelation are frequent in true spatial phenomena
• They decrease the effective sample size
• This increases the variance of the estimated scores
• Bias increases as variance increases

Feature selection algorithms are biased in favor of features with large variance (even when they are not related to the class attribute).

Page 97

Feature Selection Bias

• The χ2-test for independence fails to discard uninformative features (it is based on the i.i.d. assumption)
• Most MRDM algorithms do not account for this bias
• Exception: relational probability tree learning uses a randomization test to adjust for feature selection bias (Neville et al., 2003)
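The general idea of a randomization test can be sketched as follows: the score of a feature is compared with the scores obtained after shuffling the class labels, instead of with a theoretical distribution that assumes i.i.d. data. This is a plain permutation test with a hypothetical score (`mean_gap`) and hypothetical data; it is not the procedure of Neville et al., which has to take the relational structure into account.

```python
import random

def mean_gap(feature, labels):
    """Score: absolute gap between the feature means of the two classes."""
    pos = [f for f, y in zip(feature, labels) if y == 1]
    neg = [f for f, y in zip(feature, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def randomization_p_value(feature, labels, n_permutations=200, seed=0):
    """Share of label permutations scoring at least as high as the observed data."""
    rng = random.Random(seed)
    observed = mean_gap(feature, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # break any link between feature and class
        if mean_gap(feature, shuffled) >= observed:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_permutations + 1)

labels = [0] * 10 + [1] * 10
informative = [float(y) for y in labels]   # hypothetical feature equal to the class
p_informative = randomization_p_value(informative, labels)
```

A feature is kept only if its p-value is small; because the reference distribution is built from the data at hand, a suitable permutation scheme can reflect dependencies that the χ2 approximation ignores.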

Page 98

Page 99

Page 100

Use Unlabelled Data

• In a spatial domain the (semi-supervised) smoothness assumption is implied by positive autocorrelation of high-density regions
• The transductive setting is appropriate for spatial classification and regression
• Currently only one work on the transductive relational setting (Ceci et al., 2007)
• Promising results for spatial domains (Appice et al., 2007)

Page 101

Collective Inference

In predictive data mining tasks, patterns may take the form:

yi = f(xi, xN(i), yN(i))

where yi is the dependent variable in space i and yN(i) is the dependent variable in space N(i).

Both yi and yN(i) have to be inferred collectively.

Page 102

Collective Inference

A possible approach:
• Locally learned individual inference models
+
• Joint inference procedure (e.g., relaxation labelling)

Example: iterative classification (Neville & Jensen, 2000).
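The combination of a local model with a joint inference loop can be sketched as below. This is only an illustration of the iterative scheme, not the algorithm of Neville & Jensen as published: a fixed scoring rule (a hypothetical weighted sum of the site's own attribute and the share of positively labelled neighbours) stands in for a learned local model, and all sites, values and links are hypothetical.

```python
def iterative_classification(x, neighbours, weight=0.5, max_iter=10):
    """Label each site from its own attribute and its neighbours' current labels."""
    # Bootstrap: a local decision that ignores the neighbours.
    labels = {site: int(value >= 0.5) for site, value in x.items()}
    for _ in range(max_iter):
        updated = {}
        for site, value in x.items():
            linked = neighbours[site]
            share = sum(labels[n] for n in linked) / len(linked) if linked else 0.0
            score = (1 - weight) * value + weight * share
            updated[site] = int(score >= 0.5)
        if updated == labels:      # stop once no label changes
            break
        labels = updated
    return labels

# Hypothetical sites, attribute values and neighbourhood links.
x = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1, "e": 0.0}
neighbours = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
              "d": ["c", "e"], "e": ["d"]}
labels = iterative_classification(x, neighbours)
```

Here site c would be labelled 0 on its own attribute alone, but ends up labelled 1 because most of its neighbours are positive, which is the effect collective inference is meant to capture.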

Page 103

Collective Inference

Joint relational model: estimates the joint probability distribution over the variables in both i and N(i), and then jointly infers the values of both yi and yN(i).
• Probabilistic relational models (Getoor et al., 2001; Neville & Jensen, 2003)
• Autocorrelation is exploited to improve predictions

This inference procedure should be investigated in the context of spatial data mining

Page 104

Hierarchies of Spatial Objects

• Spatial objects are often organized in hierarchies
• A hierarchy of areal objects may also be induced by the spatial relationship of containment

[Figure: a spatial hierarchy for UK census data: County; District1, District2, …, Districtn; Ward1, Ward2, …, Wardn under each district]

Page 105

Hierarchies of Spatial Objects

Spatial patterns involving the most abstract spatial objects are well supported, but less confident.

Spatial data mining methods should be able to explore the search space at different granularity levels.

Page 106

Hierarchies of Spatial Objects

Naive approach:
• Level-by-level analysis
• Information on patterns found at one level is not used to make the search more efficient at a higher/lower level

More sophisticated approaches:
• GeoAssociator (Koperski & Han, 1995)
• SPADA (Malerba & Lisi, 2001)

Page 107

Knowledge Rich Data Mining

• Knowledge is available on spatial phenomena
• In geography, there are many natural geographic dependencies
  • e.g., a port is adjacent to a water body
• These produce many non-novel and uninteresting patterns with very high support and confidence
• Use known dependencies to prune uninteresting patterns (SPADA)

Page 108

Knowledge Rich Data Mining

Stockport (UK): characterising the area served by the M63 motorway
• 12,466 strong association rules
• Many pure spatial patterns

ed_on_M63(X), can_reach(X,Y) → is_a(Y,ward_on_m63_ED) (90.0 %, 100.0 %)

Page 109

Embedding Spatial Reasoning

Spatial reasoning: the process by which information about objects in space is used to arrive at valid conclusions regarding the objects’ relationships.

Recursive definition of site accessibility
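The definition shown on the slide is not recoverable from the transcript. A common way to state accessibility recursively, assumed here, is: Y can be reached from X if Y is adjacent to X, or if some Z adjacent to X can reach Y. The sketch below computes the same relation iteratively; `adjacent` and the site names are hypothetical.

```python
def can_reach(site, adjacent):
    """Sites reachable from `site`: directly adjacent ones, plus whatever they reach."""
    reached = set()
    frontier = [site]
    while frontier:
        current = frontier.pop()
        for nxt in adjacent.get(current, ()):
            if nxt not in reached:     # visit each site once, so cycles terminate
                reached.add(nxt)
                frontier.append(nxt)
    return reached

# Hypothetical direct links between sites.
adjacent = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
reachable_from_a = can_reach("a", adjacent)
```

In a relational setting the same knowledge would be supplied as background rules rather than code, so that derived facts such as accessibility become available to the pattern search.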

Page 110

Embedding Spatial Reasoning

Quantitative approach:
• Based on coordinates and distances
• More akin to machine reasoning

Qualitative approach (Freksa, 1991)
• Abstract representations (‘northwest’, ‘far’, …)
• Closely related to human reasoning
• Efficient
• Deals with imprecision, uncertainty and incompleteness

Page 111

Embedding Spatial Reasoning

Embedding spatial inference engines in spatial data mining systems: promising, but still unexplored.

SPADA: supports a limited form of spatial inference if rules of spatial reasoning are reported in the background knowledge

Page 112

Agenda
• Modeling spatial information
• Spatial pattern
• Spatial data mining: main issues
• Opportunities for a relational approach
• A case study: spatial model trees
• Challenges for a relational approach
• Summary

Page 113

Summary

Spatial data mining presents several issues
• Spatial objects have a geometry
• They are related
• They are of different types
• Autocorrelation affects spatial phenomena

Solutions offered in spatial statistics are limited
• Double-entry table representation
• The choice of neighborhood matrix is critical
• Spatial dependencies handled in pre-processing

Page 114

Summary

The relational approach is the most appropriate
• Several methods have already been proposed for different tasks

But there are still many challenges
• Dynamic handling of spatial dependencies & scalability
• Bias caused by autocorrelation
• Transductive inference
• Collective inference
• Hierarchies of objects
• Use of spatial knowledge
• Use of specific spatial reasoners
• …

Page 115

Outlook

To develop effective solutions for spatial data analysis, it is necessary to develop synergies between researchers working on different research topics:
• Spatial statistics
• Multi-relational data mining
• Spatial databases and GIS
• Visualization

Will this happen?
Motivation for optimism: real applications (e.g., sales prediction for individual shops, urban data analysis, location-based services) demand this collaboration.

Page 116

Spatial Data Mining @ CS Dept., Univ. of Bari
• Annalisa Appice
• Michelangelo Ceci
• Antonietta Lanza
• Antonio Turi
• Antonio Varlaro

Thanks to them for their valuable contribution to this research topic.