Geographical analysis

Geographical analysis Overlay, cluster analysis, auto-correlation, trends, models, network analysis, spatial data mining

Upload: gretchen-medina

Post on 03-Jan-2016


TRANSCRIPT

Page 1: Geographical analysis

Geographical analysis

Overlay, cluster analysis, auto-correlation, trends, models, network analysis, spatial data mining

Page 2: Geographical analysis

Geographical analysis

• Combination of different geographic data sets or themes by overlay or statistics
• Discovery of patterns, dependencies
• Discovery of trends, changes (time)
• Development of models
• Interpolation, extrapolation, prediction
• Spatial decision support, planning
• Consequence analysis (What if?)

Page 3: Geographical analysis

Example overlay

• Two subdivisions with labeled regions

Soil layer: soil type 1, soil type 2, soil type 3, soil type 4

Vegetation layer: birch forest, beech forest, mixed forest

Overlay of soil and vegetation gives, for example: birch forest on soil type 2

Page 4: Geographical analysis

Kinds of overlay

• Two subdivisions with the same boundaries
  - nominal and nominal: religion and voting per municipality
  - nominal and ratio: voting and income per municipality
  - ratio and ratio: average income and age of employees

• Two subdivisions with different boundaries: soil type and vegetation

• Subdivision and elevation model: vegetation and precipitation

Page 5: Geographical analysis

Kinds of overlay, cont’d

• Subdivision and point set: quarters in a city, occurrences of violence on the street

• Two elevation models: elevation and precipitation

• Elevation model and point set: elevation and epicenters of earthquakes

• Two point sets: money machines, street robbery locations

• Network and subdivision, other network, elevation model

Page 6: Geographical analysis

Result of overlay

• New subdivision or map layer, e.g. for further processing
• Table with combined data
• Count, surface area

Soil    Vegetation  Area   #patches
Type 1  Beech       30 ha  2
Type 2  Birch       15 ha  2
Type 3  Mixed       8 ha   1
Type 4  Beech       2 ha   1
…       …

Page 7: Geographical analysis

Buffer and overlay

• Neighborhood analysis: data of a theme within a given distance (buffer) of objects of another theme

Example: sightings of nesting locations of the great blue heron (point set); rivers, with a buffer of width 500 m around each river

Overlay result: nesting locations of the great blue heron near a river
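The buffer-and-overlay example can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular GIS: it assumes planar coordinates in meters, represents the river as a polyline, and selects the points within the buffer width by a point-to-segment distance test. The function names and the sample coordinates are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def points_within_buffer(points, polyline, width):
    """Return the points lying within `width` of the polyline (list of vertices)."""
    segments = list(zip(polyline, polyline[1:]))
    return [p for p in points
            if any(point_segment_distance(p, a, b) <= width for a, b in segments)]

# Hypothetical data: one river and four nesting locations (meters).
river = [(0, 0), (1000, 0), (2000, 1000)]
nests = [(500, 300), (500, 800), (2400, 1000), (1500, 400)]
near_river = points_within_buffer(nests, river, 500)
```

For large data sets a spatial index would replace the loop over all segments; the brute-force test is only meant to show what the overlay selects.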

Page 8: Geographical analysis

Overlay: ways of combination

• Combination (join) of attributes
• One layer as selection for the other
  – Vegetation types only for soil type 2
  – Land use within 1 km of a river

Page 9: Geographical analysis

Overlay in raster

• Pixel-wise operation, if the rasters have the same coordinate (reference) system

Example: raster “forest” pixel-wise AND raster “population increase above 2% per year” gives raster “both”
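A pixel-wise AND of two rasters can be sketched as follows. The sketch assumes both rasters are Boolean grids of identical dimensions that are already aligned in the same reference system; the function name and the tiny sample grids are hypothetical.

```python
def raster_and(a, b):
    """Pixel-wise AND of two Boolean rasters given as lists of rows."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have the same dimensions")
    return [[pa and pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical 2 x 3 rasters.
forest = [[True, True, False],
          [True, False, False]]
growth = [[False, True, True],   # population increase above 2% per year
          [True, True, False]]
both = raster_and(forest, growth)
```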

Page 10: Geographical analysis

Overlay in vector

• E.g. the plane sweep algorithm as given in Computational Geometry (line segment intersection), to get the overlay in a topological structure

• Using R-trees as an indexing structure to find intersections of boundaries

Page 11: Geographical analysis

Combined (multi-way) overlays

• Site planning, new construction sites depending on multiple criteria

• Another example (earth sciences): parametric land classification, a partitioning of the land based on chosen, classified themes

Page 12: Geographical analysis

Figures: elevation; annual precipitation

Page 13: Geographical analysis

Figures: types of rock; overlay (partitioning based on the three themes)

Page 14: Geographical analysis

Analysis point set

• Points in an attribute space: statistics, e.g. regression, principal component analysis, dendrograms

(area, population, #crimes)
(12, 34000, 34)
(14, 45000, 31)
(15, 41000, 14)
(17, 63000, 82)
(17, 66000, 79)
…

Figure: scatter plot of #crimes against population

Page 15: Geographical analysis

Analysis point set

• Points in geographical space without associated value: clusters, patterns, regularity, spread

Compare the actual average nearest-neighbor distance with the expected average nearest-neighbor distance for this number of points in the region

For example: volcanoes in a region; crimes in a city
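The comparison of actual and expected average nearest-neighbor distance can be sketched as below. The expected value used here, 0.5 / sqrt(n / area), is the standard result for n points placed uniformly at random and ignores edge effects; whether the slides use exactly this reference value is an assumption. A ratio well below 1 suggests clustering, a ratio well above 1 suggests a regular spread. Function names and sample points are hypothetical.

```python
import math

def average_nn_distance(points):
    """Average distance from each point to its nearest other point (brute force)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def expected_nn_distance(n, area):
    """Expected average NN distance for n uniformly random points (no edge correction)."""
    return 0.5 / math.sqrt(n / area)

def nn_ratio(points, area):
    """Observed / expected average nearest-neighbor distance."""
    return average_nn_distance(points) / expected_nn_distance(len(points), area)

# Hypothetical point sets in a 100 x 100 region.
clustered = [(10, 10), (11, 10), (10, 11), (11, 11)]
regular = [(25, 25), (75, 25), (25, 75), (75, 75)]
ratio_clustered = nn_ratio(clustered, 100 * 100)
ratio_regular = nn_ratio(regular, 100 * 100)
```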

Page 16: Geographical analysis

Analysis point set

• Points in geographical space with value: up to what distance are measured values “similar” (or correlated)?

Figure: point set with measured values 20, 14, 13, 10, 12, 11, 16, 18, 21, 15, 17, 16, 22, 21, 19, 12

Page 17: Geographical analysis

Analysis point set

• Temperature at location x and 5 km away from x is expected to be nearly the same

• Elevation (in Switzerland) at location x and 5 km away from x is not expected to be related (even over 1 km), but it is expected to be nearly the same 100 meters away

• Other examples:
  – depth to groundwater
  – soil humidity
  – nitrate concentration in the soil

Page 18: Geographical analysis

Analysis point set

• Points in geographical space with value: auto-correlation (~ up to what distance are measured values “similar”, or correlated)

Figure: point set with measured values 20, 14, 13, 10, 12, 11, 16, 18, 21, 15, 17, 16, 22, 21, 19, 12

n points give (n choose 2) pairs; each pair has a distance and a difference in value

Page 19: Geographical analysis

Figure: scatter plot of difference² against distance for all pairs

Classify distances and determine average per class

Figure: average difference² per distance class (observed), compared with the expected difference²

Page 20: Geographical analysis

Figures: observed variogram and model variogram (linear), each plotting average difference² against distance with the expected difference² marked; the model variogram is annotated with nugget, sill and range

Smaller distances: more correlation, smaller variance
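The construction of the observed variogram (all pairs, classified by distance, averaged per class) can be sketched as follows. The sketch averages the squared value difference per distance class, as the slides describe; note that the common semivariance convention uses half of this average. The function name, the class width and the sample data are hypothetical.

```python
import math
from itertools import combinations

def observed_variogram(samples, class_width):
    """samples: list of (x, y, value) tuples.

    Returns {distance class index: average squared value difference},
    where class k covers distances in [k * class_width, (k + 1) * class_width).
    """
    sums, counts = {}, {}
    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) // class_width)
        sums[k] = sums.get(k, 0.0) + (v1 - v2) ** 2
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical samples along a line, one unit apart.
samples = [(0, 0, 10), (1, 0, 12), (2, 0, 13), (3, 0, 17)]
variogram = observed_variogram(samples, class_width=1)
```

In this toy data the average squared difference grows with distance, which is the behavior a variogram shows below the range.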

Page 21: Geographical analysis

Importance auto-correlation

• Descriptive statistic of a data set: describes the distance-dependency of auto-correlation

• Interpolation based on data further away than the range is nonsense

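Figure: point set with measured values 20, 14, 13, 10, 12, 11, 16, 18, 21, 15, 17, 16, 22, 21, 19, 12; the range is drawn, and a location marked “??” has to be interpolated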

Page 22: Geographical analysis

Importance auto-correlation

• If the range of a geographic variable is small, more sample point measurements are needed to obtain a good representation of the geographic variable through spatial interpolation

This influences the cost of an analysis or decision procedure, and the quality of the outcome of the analysis

Page 23: Geographical analysis

Analysis subdivision

• Nominal subdivision: auto-correlation (~ clustering of equivalent classes)

• Ratio subdivision: auto-correlation

Figures: two maps of a subdivision with classes PvdA, CDA and VVD, one with auto-correlation and one with no auto-correlation

Page 24: Geographical analysis

Auto-correlation nominal subdivision

• 22 neighbor relations (adjacencies) among 12 provinces

• Pr(province A = VVD and province B = VVD) = 4/12 * 3/11

• E(VVD adj. VVD) = 22 * 12/132 = 2
• Reality: 4 times

• E(CDA adj. PvdA) = 4/12 * 4/11 * 2 * 22 = 5.33; reality once

Figure: map of the 12 provinces with classes PvdA, CDA and VVD

The comparison of expected and observed adjacency counts is the join count statistic
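The expected join counts above can be checked with exact fractions. The sketch assumes, as the slide's fractions imply, that each of the three parties is the largest in 4 of the 12 provinces and that labels are assigned at random without replacement. The function names are hypothetical; `observed_joins` shows how the "reality" count would be taken from an adjacency list.

```python
from fractions import Fraction

def expected_joins(count_a, count_b, n_regions, n_adjacencies, same_class):
    """Expected number of adjacencies joining class A and class B."""
    if same_class:
        p = Fraction(count_a, n_regions) * Fraction(count_a - 1, n_regions - 1)
    else:
        # Factor 2: either region of the adjacent pair can be the class A one.
        p = 2 * Fraction(count_a, n_regions) * Fraction(count_b, n_regions - 1)
    return n_adjacencies * p

def observed_joins(adjacencies, label, class_a, class_b):
    """Count adjacencies (u, v) whose two labels are exactly class_a and class_b."""
    return sum(1 for u, v in adjacencies if {label[u], label[v]} == {class_a, class_b})

e_vvd_vvd = expected_joins(4, 4, 12, 22, same_class=True)
e_cda_pvda = expected_joins(4, 4, 12, 22, same_class=False)
```

Here `e_vvd_vvd` equals 2 and `e_cda_pvda` equals 16/3, about 5.33, matching the values on the slide.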

Page 25: Geographical analysis

Geographical models

• Properties of (geographical) models:
  – selective (simplification, more ideal)
  – approximative
  – analogous (resembles reality)
  – structured (usable, analyzable, transformable)
  – suggestive
  – re-usable (usable in related situations)

Page 26: Geographical analysis

Geographical models

• Functions of models:
  – psychological (for understanding, visualization)
  – organizational (framework for definitions)
  – explanatory
  – constructive (beginning of theories, laws)
  – communicative (transfer scientific ideas)
  – predictive

Page 27: Geographical analysis

Example: forest fire

• Is the Kröller-Müller museum well enough protected against (forest) fire?

• Data: proximity of fire dept., burning properties of land cover, wind, origin of fire

• Model for: fire spread

b * ws * (1 - sh) * (0.2 + cos α)

b = burn factor
ws = wind speed
α = angle between wind and direction of pixel
sh = soil humidity

Time neighbor pixel on fire: [1.41 *]
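The spread expression can be evaluated per pixel and per direction as sketched below. The slide's expression for the time until a neighbor pixel burns is incomplete in this transcript, so the timing rule here is an assumption: the time is taken as inversely proportional to the spread value, multiplied by 1.41 for diagonal neighbors, and directions with a non-positive spread value are treated as never ignited. Function names are hypothetical.

```python
import math

def spread_value(burn_factor, wind_speed, soil_humidity, angle):
    """b * ws * (1 - sh) * (0.2 + cos(angle)), with the angle in radians."""
    return burn_factor * wind_speed * (1.0 - soil_humidity) * (0.2 + math.cos(angle))

def time_to_ignite(burn_factor, wind_speed, soil_humidity, angle, diagonal=False):
    """ASSUMED timing rule: step length divided by the spread value."""
    value = spread_value(burn_factor, wind_speed, soil_humidity, angle)
    if value <= 0.0:
        return math.inf  # assumption: no spread in this direction
    step = 1.41 if diagonal else 1.0
    return step / value

# Forest (burn factor 0.8), wind speed 3, dry soil, neighbor straight downwind.
downwind = time_to_ignite(0.8, 3, 0.0, 0.0)
upwind = time_to_ignite(0.8, 3, 0.0, math.pi)
```

With such per-neighbor times, arrival times over the whole raster could be obtained with a shortest-path computation from the origin pixel.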

Page 28: Geographical analysis

Forest fire

Figure (land cover): forest, burn factor 0.8; heath, burn factor 0.6; road, burn factor 0.2; museum

Figure (soil humidity)

Figure (fire spread): origin; reached in < 3 minutes, < 6 minutes, < 9 minutes, > 9 minutes

Wind, speed 3

Page 29: Geographical analysis

Forest fire model

• Selective: only surface cover, humidity and wind; no temperature, seasonal differences, …

• Approximative: surface cover in 4 classes; no distinction in forest type, etc., pixel based so direction discretized

• Structured: pixels; relations between pixels are simple to define

• Re-usable: approach/model also applies to other locations (and other spread processes)

Page 30: Geographical analysis

Network analysis

• When distance or travel time on a network (graph) is considered

• Dijkstra’s shortest path algorithm
• Reachability measure for a destination: potential value

$\mathrm{potential}(i) = \sum_j w_j \, c_{ij}^{-\beta}$

$w_j$ = weight of origin j
$\beta$ = distance decay parameter
$c_{ij}$ = distance cost between origin j and destination i
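A small sketch of both ingredients: Dijkstra's algorithm for the network costs, and a potential value computed from them. The potential is assumed to have the common form potential(i) = sum over origins j of w_j * c_ij^(-beta), with beta the distance decay parameter; the exact formula on the slide is garbled in this transcript. The graph is assumed undirected with non-negative costs, and all names and sample values are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, cost), ...]}; returns {node: shortest cost from source}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def potential(graph, destination, weights, beta):
    """Sum of w_j * c_ij ** (-beta) over all reachable origins j other than i."""
    # Undirected graph assumed, so costs from the destination equal costs to it.
    cost = dijkstra(graph, destination)
    return sum(w * cost[j] ** (-beta)
               for j, w in weights.items() if j != destination and j in cost)

# Hypothetical network with travel costs, and origin weights (e.g. population).
graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("A", 2), ("C", 2)],
    "C": [("A", 5), ("B", 2)],
}
costs_from_a = dijkstra(graph, "A")
potential_a = potential(graph, "A", {"B": 100, "C": 200}, beta=1)
```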

Page 31: Geographical analysis

Example reachability

• Law Ambulance Transport: every location must be reachable within 15 minutes (from origin of ambulance)

Page 32: Geographical analysis

Example reachability

• Physician’s practice:
  - optimal practice size: 2350 (minimum: 800)
  - minimize distance to practice
  - improve current situation with as few changes as possible

Page 33: Geographical analysis

Current situation: 16 practices, 30,000 people, average 1875 per practice

Computed, improved situation: 13 practices

Page 34: Geographical analysis

Example in table

                                      Original  New
Number of practices                   16        13
Number of practice locations          9         7
Number of practices with size < 800   2         0
Number of people more than 3 km away  3957      4624
Average travel distance (km)          0.9       1.2
Largest distance (km)                 5.2       5.4

Page 35: Geographical analysis

Analysis elevation model

• Landscape shape recognition:
  - peaks and pits
  - valleys and ridges
  - convexity, concavity

• Water flow, erosion, watershed regions, landslides, avalanches
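Peak and pit detection on a gridded elevation model can be sketched as a local comparison. This is one simple definition, assumed here for illustration: an interior cell is a peak if it is strictly higher than all 8 neighbors and a pit if it is strictly lower. Border cells are skipped. The function name and the sample grid are hypothetical.

```python
def peaks_and_pits(grid):
    """Return (peaks, pits) as lists of (row, col) for interior cells of a raster."""
    peaks, pits = [], []
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            neighbors = [grid[r + dr][c + dc]
                         for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                         if (dr, dc) != (0, 0)]
            if grid[r][c] > max(neighbors):
                peaks.append((r, c))
            elif grid[r][c] < min(neighbors):
                pits.append((r, c))
    return peaks, pits

# Hypothetical elevation raster with one peak (9) and one pit (1).
elevation = [[5, 5, 5, 5, 5],
             [5, 9, 5, 1, 5],
             [5, 5, 5, 5, 5]]
peaks, pits = peaks_and_pits(elevation)
```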

Page 36: Geographical analysis

Spatial data mining

• Finding spatial patterns in large spatial data sets
  – within one spatial data set
  – across two or more data sets

• With time: spatio-temporal data mining

Page 37: Geographical analysis

Spatial data mining and computation

• “Geographic data mining involves the application of computational tools to reveal interesting patterns in objects and events distributed in geographic space and across time” (Miller & Han, 2001)

• Large data sets: attempt to carefully define interesting patterns (to avoid finding non-interesting patterns); advanced algorithms are needed for efficiency

Page 38: Geographical analysis
Page 39: Geographical analysis

Clustering?

• Are the people clustered in this room? How do we define a cluster?

• In spatial data mining we have objects/ entities with a location given by coordinates

• Cluster definitions involve distance between locations

Page 40: Geographical analysis

Clustering - options

• Determine whether clustering occurs
• Determine the degree of clustering
• Determine the clusters
• Determine the largest cluster
• Determine the outliers

Page 41: Geographical analysis
Page 42: Geographical analysis
Page 43: Geographical analysis

Co-location

• Are the men clustered?
• Are the women clustered?

• Is there a co-location of men and women?

co-location pattern

Page 44: Geographical analysis
Page 45: Geographical analysis

Co-location

• Like before, we may be interested in
  – is there co-location?
  – the degree of co-location
  – the largest co-location
  – the co-locations themselves
  – the objects not involved in co-location

Page 46: Geographical analysis
Page 47: Geographical analysis

Spatio-temporal data

• Locations have a time stamp
• Interesting patterns involve space and time
• Example here: time-stamped point set

Page 48: Geographical analysis

Trajectory data

• Entities with a trajectory (time-stamped motion path)

• Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ...

• n entities = trajectories; n = 10 – 100,000
• t time steps; t = 10 – 100,000 (input size is nt)
• m size of subgroup (unknown); m = 10 – 100,000

Page 49: Geographical analysis

Trajectory data

• Migration patterns of animals
• Trajectories of tornadoes
• Tracking of (suspect) individuals for security
• Lifelines of people for social behavior

Page 50: Geographical analysis

Example pattern in trajectories

• What is the location visited by most entities?

location = circular region of specified radius

Page 51: Geographical analysis

Example pattern in trajectories

• What is the location visited by most entities?

location = circular region of specified radius

4 entities

Page 52: Geographical analysis

Example pattern in trajectories

• What is the location visited by most entities?

location = circular region of specified radius

3 entities

Page 53: Geographical analysis

Example pattern in trajectories

• Compute buffer of each trajectory

Page 54: Geographical analysis

Example pattern in trajectories

• Compute buffer of each trajectory

• Compute the arrangement of the buffers and the cover count of each cell

Figure: arrangement of the buffers, each cell labeled with its cover count (0, 1 or 2 in the example)

Page 55: Geographical analysis

Example pattern in trajectories

• One trajectory has t time stamps; its buffer can be computed in O(t log t) time

• All buffers can be computed in O(nt log t) time
• The arrangement can be computed in O(nt log (nt) + k) time, where k = O((nt)^2) is the complexity of the arrangement

• Cell cover counts are determined in O(k) time

Page 56: Geographical analysis

Example pattern in trajectories

• Total: O(nt log (nt) + k) time
• If the most visited location is visited by m entities, this is O(nt log (nt) + ntm)

• Note: input size is nt; n entities, each with location at t moments
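The question "which location is visited by most entities" can be illustrated with a brute-force sketch. This is not the buffer-arrangement algorithm analyzed above: it only evaluates a given list of candidate centers (for instance grid points, an assumption of this sketch) and counts the trajectories that come within the radius of each. Trajectories are polylines with at least two vertices; all names and sample data are hypothetical.

```python
import math

def _distance_to_segment(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def visits(trajectory, center, radius):
    """True if the polyline passes within `radius` of `center`."""
    return any(_distance_to_segment(center, a, b) <= radius
               for a, b in zip(trajectory, trajectory[1:]))

def most_visited(trajectories, candidates, radius):
    """Return (best candidate center, number of trajectories visiting it)."""
    counts = [sum(visits(t, c, radius) for t in trajectories) for c in candidates]
    best = max(range(len(candidates)), key=counts.__getitem__)
    return candidates[best], counts[best]

# Hypothetical trajectories and candidate locations.
trajectories = [[(0, 0), (10, 0)], [(5, -5), (5, 5)], [(0, 10), (10, 10)]]
candidates = [(5, 0), (5, 10), (20, 20)]
best_center, best_count = most_visited(trajectories, candidates, radius=1)
```

The running time is proportional to the number of candidates times nt, so the quality of the answer depends entirely on how the candidates are chosen.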

Page 57: Geographical analysis

Patterns in entity data

Spatial data
• n points (locations)
• Distance is important
  – clustering pattern
• Presence of attributes (e.g. male/female):
  – co-location patterns

Spatio-temporal data
• n trajectories, each has t time steps
• Distance is time-dependent
  – flock pattern
  – meet pattern
• Heading and speed are important and are also time-dependent

Page 58: Geographical analysis

Patterns in trajectories

• n trajectories, each with t time steps: n polygonal lines with t vertices

Page 59: Geographical analysis

Patterns in trajectories

• Flock and meet patterns: large enough subset that has same “character” during a time interval
  – close to each other
  – same direction of motion
  – ...
  Flock: changing location
  Meet: fixed location

• Determine the longest duration pattern

Page 60: Geographical analysis

Patterns in trajectories

• Longest flock: given a radius r and subset size m, determine the longest time interval for which the m entities were within each other’s proximity (circle radius r)

Figure: trajectories with time stamps 0 to 8; for m = 3, the longest flock is in [1.9, 6.4]
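For a fixed subset of entities, the flock condition can be checked step by step, as in the sketch below. It is a simplification under stated assumptions: positions are only tested at the discrete time steps (not in between), and "within a circle of radius r" is replaced by the conservative test that every entity lies within r of the subset's centroid. Finding the best subset, which is the hard part of the problem, is not attempted. Names and sample data are hypothetical.

```python
import math

def longest_flock_interval(trajectories, radius):
    """trajectories: equal-length position lists, one per entity of a fixed subset.

    Returns (first_step, last_step) of the longest run of consecutive time
    steps at which all entities lie within `radius` of their centroid,
    or None if there is no such step.
    """
    steps = len(trajectories[0])
    best, start = None, None
    for t in range(steps + 1):  # one extra iteration closes a run ending at the last step
        together = False
        if t < steps:
            pts = [traj[t] for traj in trajectories]
            cx = sum(x for x, _ in pts) / len(pts)
            cy = sum(y for _, y in pts) / len(pts)
            together = all(math.hypot(x - cx, y - cy) <= radius for x, y in pts)
        if together and start is None:
            start = t
        elif not together and start is not None:
            if best is None or (t - 1 - start) > (best[1] - best[0]):
                best = (start, t - 1)
            start = None
    return best

# Hypothetical subset of m = 3 entities over 5 time steps.
a = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
b = [(5, 5), (1, 1), (2, 1), (3, 1), (9, 9)]
c = [(-5, 0), (1, -1), (2, -1), (3, -1), (4, -9)]
interval = longest_flock_interval([a, b, c], radius=1.5)
```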

Page 61: Geographical analysis

Patterns in trajectories

• Computing the longest flock is NP-hard
• This remains true for radius cr approximations with c < 2
• A radius 2 approximation of the longest flock can be computed in time O(n^2 t log n)

... meaning: if the longest flock for radius r has duration T, then we surely find a flock of duration at least T for radius 2r

Page 62: Geographical analysis

Patterns in trajectories

Figure: a flock and a meet, each for a fixed subset (m = 3) and a fixed radius

Page 63: Geographical analysis

Patterns in trajectories

• Go into 3D (space-time) for algorithms

Figure: trajectories drawn in space-time (time axis 0 to 4), showing a flock and a meet

Page 64: Geographical analysis

Patterns in trajectories

Exact radius results

flock: NP-hard
meet: O(n^2 t^2 (n^2 log n + t))

Page 65: Geographical analysis

Patterns in trajectories

Approximate radius results

flock: O(n^2 t log n), factor 2
meet: O((n^2 t log n) / (m ε^2)), factor 1+ε

Page 66: Geographical analysis

Patterns in trajectories

• Flock and meet patterns require algorithms in 3-dimensional space (space-time)

• Exact algorithms are inefficient, so they are only suitable for smaller data sets

• Approximation can reduce running time by an order of magnitude

Page 67: Geographical analysis

Summary

• There are many types of geographical analysis; it is the main task of a GIS
• Overlay and buffer analysis are most important
• Statistics is also very important
• Spatial and spatio-temporal data mining gives new types of analysis of geographic data