Geographical analysis
Overlay, cluster analysis, auto-correlation, trends, models, network analysis, spatial data mining
Geographical analysis
• Combination of different geographic data sets or themes by overlay or statistics
• Discovery of patterns, dependencies
• Discovery of trends, changes (time)
• Development of models
• Interpolation, extrapolation, prediction
• Spatial decision support, planning
• Consequence analysis (What if?)
Example overlay
• Two subdivisions with labeled regions
[Figure: a soil map (soil types 1 to 4) and a vegetation map (birch, beech and mixed forest); their overlay yields regions such as "birch forest on soil type 2"]
Kinds of overlay
• Two subdivisions with the same boundaries
  - nominal and nominal: religion and voting per municipality
  - nominal and ratio: voting and income per municipality
  - ratio and ratio: average income and age of employees
• Two subdivisions with different boundaries: soil type and vegetation
• Subdivision and elevation model: vegetation and precipitation
Kinds of overlay, cont’d
• Subdivision and point set: quarters in a city, occurrences of violence on the street
• Two elevation models: elevation and precipitation
• Elevation model and point set: elevation and epicenters of earthquakes
• Two point sets: money machines, street robbery locations
• Network and subdivision, other network, elevation model
Result of overlay
• New subdivision or map layer, e.g. for further processing
• Table with combined data
• Count, surface area

Soil     Vegetation   Area    #patches
Type 1   Beech        30 ha   2
Type 2   Birch        15 ha   2
Type 3   Mixed        8 ha    1
Type 4   Beech        2 ha    1
…        …
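A table of this kind can be derived from two rasters on the same grid by counting pixels per combination of classes. The sketch below is a minimal illustration under assumptions: the two small rasters, the pixel area and the function name `overlay_table` are made up for the example, and the number of patches (which needs connected-component labeling) is left out.

```python
from collections import Counter

# Hypothetical rasters on the same grid: soil type and vegetation per pixel.
soil = [
    [1, 1, 2, 2],
    [1, 2, 2, 3],
    [4, 2, 3, 3],
]
vegetation = [
    ["beech", "beech", "birch", "birch"],
    ["beech", "birch", "birch", "mixed"],
    ["beech", "beech", "mixed", "mixed"],
]
PIXEL_AREA_HA = 1.0  # assumed area of one pixel, in hectares


def overlay_table(layer_a, layer_b, pixel_area):
    """Return {(class_a, class_b): total area} for two aligned rasters."""
    counts = Counter(
        (a, b)
        for row_a, row_b in zip(layer_a, layer_b)
        for a, b in zip(row_a, row_b)
    )
    return {combo: n * pixel_area for combo, n in counts.items()}


table = overlay_table(soil, vegetation, PIXEL_AREA_HA)
for (soil_type, veg), area in sorted(table.items()):
    print(f"Type {soil_type}  {veg:<6} {area:5.1f} ha")
```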
Buffer and overlay
• Neighborhood analysis: data of a theme within a given distance (buffer) of objects of another theme
Example: sightings of nesting locations of the great blue heron (point set) and rivers, with a buffer of width 500 m around each river
Overlay result: nesting locations of the great blue heron near a river
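For a point set and a line theme, the buffer-and-overlay step amounts to a distance test: a point lies in the buffer if its distance to the nearest river segment is at most the buffer width. The sketch below assumes the river is a polyline in metric coordinates and reads the 500 m as the maximum distance from the river; the coordinates and function names are illustrative only.

```python
import math


def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the line segment ab."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(points, polyline, width):
    """Select the points within distance `width` of the polyline."""
    segments = list(zip(polyline, polyline[1:]))
    return [
        p for p in points
        if min(dist_point_segment(p, a, b) for a, b in segments) <= width
    ]


# Hypothetical data, coordinates in meters.
river = [(0, 0), (1000, 0), (2000, 1000)]
nests = [(500, 300), (500, 800), (1500, 400), (2600, 1000)]
near_river = within_buffer(nests, river, 500)
print(near_river)
```

A GIS would normally build the buffer polygon once and use a spatial index; the direct distance test is only meant to show what the overlay selects.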
Overlay: ways of combination
• Combination (join) of attributes
• One layer as selection for the other
  – Vegetation types only for soil type 2
  – Land use within 1 km of a river
Overlay in raster
• Pixel-wise operation, if the rasters have the same coordinate (reference) system
[Figure: raster "Forest" and raster "Population increase above 2% per year" combined by a pixel-wise AND into raster "Both"]
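The pixel-wise AND can be sketched in a few lines. The two boolean rasters below are made-up values, and the sketch assumes both rasters already share the same grid (extent, resolution and coordinate reference system).

```python
# Hypothetical boolean rasters on the same grid (1 = true, 0 = false).
forest = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
pop_growth_above_2pct = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]


def raster_and(a, b):
    """Pixel-wise AND of two rasters with identical dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have identical dimensions")
    return [[int(x and y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


both = raster_and(forest, pop_growth_above_2pct)
for row in both:
    print(row)
```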
Overlay in vector
• E.g. the plane sweep algorithm as given in Computational Geometry (line segment intersection), to get the overlay in a topological structure
• Using R-trees as an indexing structure to find intersections of boundaries
Combined (multi-way) overlays
• Site planning, new construction sites depending on multiple criteria
• Another example (earth sciences): parametric land classification, i.e. partitioning of the land based on chosen, classified themes
[Figure: three themes (elevation, annual precipitation, types of rock) and their overlay: a partitioning based on the three themes]
Analysis point set
• Points in an attribute space: statistics, e.g. regression, principal component analysis, dendrograms

(area, population, #crimes)
(12, 34,000, 34)
(14, 45,000, 31)
(15, 41,000, 14)
(17, 63,000, 82)
(17, 66,000, 79)
…

[Figure: scatter plot of #crimes against population]
Analysis point set
• Points in geographical space without associated value: clusters, patterns, regularity, spread
Compare the actual average nearest-neighbor distance with the expected average nearest-neighbor distance for this number of points in the region
For example: volcanoes in a region; crimes in a city
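The comparison can be sketched as follows. As an assumption not stated on the slide, the expected value is taken for a completely random (Poisson) pattern without edge effects, for which the mean nearest-neighbor distance is 1 / (2 * sqrt(n / area)); a ratio of observed to expected well below 1 suggests clustering, a ratio well above 1 suggests regularity. The point coordinates and the region area are made up.

```python
import math


def average_nn_distance(points):
    """Mean distance from each point to its nearest other point (brute force)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def expected_nn_distance(n, area):
    """Expected mean NN distance for n random points in a region of given area.

    Assumes complete spatial randomness and ignores edge effects.
    """
    return 0.5 / math.sqrt(n / area)


# Hypothetical point set in a 10 x 10 region.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (9, 8)]
area = 100.0

observed = average_nn_distance(points)
expected = expected_nn_distance(len(points), area)
ratio = observed / expected
print(observed, expected, ratio)
```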
Analysis point set
• Points in geographical space with value: up to what distance are measured values “similar” (or correlated)?
[Figure: points in the plane, each labeled with its measured value: 20, 14, 13, 10, 12, 11, 16, 18, 21, 15, 17, 16, 22, 21, 19, 12]
Analysis point set
• Temperature at location x and 5 km away from x is expected to be nearly the same
• Elevation (in Switzerland) at location x and 5 km away from x is not expected to be related (even over 1 km), but it is expected to be nearly the same 100 meters away
• Other examples:
  – depth to groundwater
  – soil humidity
  – nitrate concentration in the soil
Analysis point set
• Points in geographical space with value: auto-correlation (~ up to what distance are measured values “similar”, or correlated)
[Figure: point set with measured values]
n points give (n choose 2) pairs; each pair has a distance and a difference in value
[Figure: scatter plot of difference against distance for all pairs]
Classify distances and determine the average per class
[Figure: observed variogram and (linear) model variogram; horizontal axis: distance; vertical axis: average difference² (observed) and expected difference² (model); the model is annotated with nugget, sill (σ²) and range]
Smaller distances: more correlation, smaller variance
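The observed variogram described above (all pairs, distances grouped into classes, average squared difference per class) can be sketched as below. The sample values, the class width and the function name are assumptions for illustration. Note that many texts define the semivariogram as half of this average; the sketch follows the slide and does not halve.

```python
import math
from itertools import combinations


def empirical_variogram(samples, class_width):
    """Average squared value difference per distance class.

    samples: list of (x, y, value).
    Returns {class_index: mean squared difference}, where class k covers
    distances in [k * class_width, (k + 1) * class_width).
    """
    sums, counts = {}, {}
    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) // class_width)
        sums[k] = sums.get(k, 0.0) + (v1 - v2) ** 2
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Hypothetical measurements along a line: (x, y, value).
samples = [(0, 0, 10), (1, 0, 11), (2, 0, 13), (3, 0, 16)]
variogram = empirical_variogram(samples, class_width=1.5)
for k, avg in variogram.items():
    print(f"distance class {k}: average squared difference {avg:.2f}")
```

In this toy data the average squared difference grows with distance, which is the pattern the model variogram summarizes with nugget, sill and range.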
Importance auto-correlation
• Descriptive statistic of a data set: describes the distance-dependency of auto-correlation
• Interpolation based on data further away than the range is nonsense
[Figure: the point set with measured values, with the range drawn around an unknown location marked "??"]
Importance auto-correlation
• If the range of a geographic variable is small, more sample point measurements are needed to obtain a good representation of the geographic variable through spatial interpolation
• This influences the cost of an analysis or decision procedure, and the quality of the outcome of the analysis
Analysis subdivision
• Nominal subdivision: auto-correlation (~ clustering of equivalent classes)
• Ratio subdivision: auto-correlation
[Figure: two maps with regions labeled by party (PvdA, CDA, VVD), one showing auto-correlation and one showing no auto-correlation]
Auto-correlation nominal subdivision
• 22 neighbor relations (adjacencies) among 12 provinces
• Pr(province A = VVD and province B = VVD) = 4/12 * 3/11
• E(VVD adj. VVD) = 22 * 12/132 = 2; reality: 4 times
• E(CDA adj. PvdA) = 4/12 * 4/11 * 2 * 22 = 5.33; reality: once
• Join count statistic: compare the expected with the observed number of adjacencies
[Figure: map of the provinces labeled by party (PvdA, CDA, VVD)]
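The expected join counts can be checked with exact fractions. The sketch assumes, as the fractions 4/12, 3/11 and 4/11 on the slide imply, that each of the three parties holds 4 of the 12 provinces and that classes are assigned to provinces at random; the function name is illustrative.

```python
from fractions import Fraction


def expected_joins(n_a, n_b, n_regions, n_adjacencies, same_class=False):
    """Expected number of adjacencies between class A and class B regions.

    Assumes classes are assigned to regions at random, without replacement.
    """
    if same_class:
        # Both regions of an adjacency have class A.
        p = Fraction(n_a, n_regions) * Fraction(n_a - 1, n_regions - 1)
    else:
        # One region has class A and the other class B, in either order.
        p = 2 * Fraction(n_a, n_regions) * Fraction(n_b, n_regions - 1)
    return p * n_adjacencies


e_vvd_vvd = expected_joins(4, 4, 12, 22, same_class=True)
e_cda_pvda = expected_joins(4, 4, 12, 22)
print(e_vvd_vvd, float(e_cda_pvda))
```

With 4 observed VVD-VVD adjacencies against 2 expected, and 1 observed CDA-PvdA adjacency against about 5.33 expected, the map shows more clustering of equal classes than a random assignment would.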
Geographical models
• Properties of (geographical) models:
  – selective (simplification, more ideal)
  – approximative
  – analogous (resembles reality)
  – structured (usable, analyzable, transformable)
  – suggestive
  – re-usable (usable in related situations)
Geographical models
• Functions of models:
  – psychological (for understanding, visualization)
  – organizational (framework for definitions)
  – explanatory
  – constructive (beginning of theories, laws)
  – communicative (transfer scientific ideas)
  – predictive
Example: forest fire
• Is the Kröller-Müller museum well enough protected against (forest)fire?
• Data: proximity fire dept., burning properties of land cover, wind, origin of fire
• Model for: fire spread
b * ws * (1 - sh) * (0.2 + cos α)
b = burn factor
ws = wind speed
α = angle between the wind direction and the direction to the pixel
sh = soil humidity
Time neighbor pixel on fire: [1.41 *]
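A pixel-based spread model of this kind can be simulated with a shortest-path style computation over the grid. The sketch below rests on several assumptions that the slide does not confirm: the expression b * ws * (1 - sh) * (0.2 + cos α) is treated as a spread rate, the time to ignite a neighbor is taken as the step length (1, or about 1.41 for diagonal neighbors) divided by that rate, the burn factor and humidity of the receiving pixel are used, and directions with a non-positive rate do not spread at all. All names and the tiny grid are hypothetical.

```python
import heapq
import math


def ignition_times(burn, humidity, origin, wind_dir, wind_speed):
    """Earliest ignition time per pixel, computed Dijkstra-style.

    burn, humidity: 2D lists with burn factor and soil humidity per pixel.
    origin: (row, col) where the fire starts.
    wind_dir: unit vector (dcol, drow) pointing where the wind blows toward.
    """
    rows, cols = len(burn), len(burn[0])
    wx, wy = wind_dir
    times = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > times[(r, c)]:
            continue  # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                step = math.hypot(dr, dc)  # 1, or about 1.41 on diagonals
                cos_a = (dc * wx + dr * wy) / step
                # Assumed spread rate toward the neighbor pixel.
                rate = burn[nr][nc] * wind_speed * (1 - humidity[nr][nc]) * (0.2 + cos_a)
                if rate <= 0:
                    continue  # assumption: no spread in this direction
                new_t = t + step / rate
                if new_t < times.get((nr, nc), math.inf):
                    times[(nr, nc)] = new_t
                    heapq.heappush(heap, (new_t, (nr, nc)))
    return times


# Hypothetical strip of forest (burn factor 0.8), dry soil, wind along the row.
burn = [[0.8, 0.8, 0.8]]
humidity = [[0.0, 0.0, 0.0]]
times = ignition_times(burn, humidity, origin=(0, 0), wind_dir=(1.0, 0.0), wind_speed=3.0)
print(times)
```

Thresholding the resulting times (for example at 3, 6 and 9 time units) would give spread classes like those in the map legend.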
Forest fire
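[Figure: land cover map with legend]
Forest; burn factor 0.8
Heath; burn factor 0.6
Road; burn factor 0.2
Museum
[Figure: soil humidity map]
[Figure: fire spread from the origin in classes < 3 minutes, < 6 minutes, < 9 minutes, > 9 minutes; wind, speed 3]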
Forest fire model
• Selective: only surface cover, humidity and wind; no temperature, seasonal differences, …
• Approximative: surface cover in 4 classes; no distinction in forest type, etc., pixel based so direction discretized
• Structured: pixels, so relations between pixels are simple to define
• Re-usable: approach/model also applies to other locations (and other spread processes)
Network analysis
• When distance or travel time on a network (graph) is considered
• Dijkstra’s shortest path algorithm
• Reachability measure for a destination: potential value

potential(i) = Σ_j w_j * c_ij^(-β)

w_j = weight of origin j
β = distance decay parameter
c_ij = distance cost between origin j and destination i
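Both ingredients of this slide can be sketched together: Dijkstra's algorithm gives the network cost between origins and a destination, and the potential value sums the origin weights discounted by that cost. The sketch assumes the potential has the common form of a sum over origins j of w_j * c_ij^(-β), an undirected network (so costs are symmetric), and made-up node names, edge costs and weights.

```python
import heapq
import math


def dijkstra(graph, source):
    """Shortest-path cost from source to every reachable node.

    graph: {node: [(neighbor, edge_cost), ...]} with non-negative costs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def potential(graph, destination, weights, beta):
    """Potential of a destination: sum of w_j * c_ij ** (-beta) over origins j.

    Assumed form of the reachability measure; origins that coincide with
    the destination or cannot reach it are skipped.
    """
    cost = dijkstra(graph, destination)
    return sum(
        w * cost[j] ** -beta
        for j, w in weights.items()
        if j != destination and j in cost
    )


# Hypothetical undirected network with travel times in minutes.
graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("A", 2), ("C", 1)],
    "C": [("A", 5), ("B", 1)],
}
weights = {"B": 100, "C": 300}  # e.g. population at each origin

costs_from_a = dijkstra(graph, "A")
pot_a = potential(graph, "A", weights, beta=1)
reachable_within_3 = sorted(v for v, d in costs_from_a.items() if d <= 3)
print(costs_from_a, pot_a, reachable_within_3)
```

The same distance table answers threshold questions such as which locations are reachable within a given travel time.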
Example reachability
• Law Ambulance Transport: every location must be reachable within 15 minutes (from origin of ambulance)
Example reachability
• Physician’s practice:
  - optimal practice size: 2350 (minimum: 800)
  - minimize distance to practice
  - improve current situation with as few changes as possible
Current situation: 16 practices, 30,000 people, average 1875 per practice
Computed, improved situation: 13 practices
Example in table
                                  Original   New
Number of practices               16         13
Number of practice locations      9          7
Number of practices < 800 size    2          0
Number of people > 3 km           3957       4624
Average travel distance (km)      0.9        1.2
Largest distance (km)             5.2        5.4
Analysis elevation model
• Landscape shape recognition:
  - peaks and pits
  - valleys and ridges
  - convexity, concavity
• Water flow, erosion, watershed regions, landslides, avalanches
Spatial data mining
• Finding spatial patterns in large spatial data sets
  – within one spatial data set
  – across two or more data sets
• With time: spatio-temporal data mining
Spatial data mining and computation
• “Geographic data mining involves the application of computational tools to reveal interesting patterns in objects and events distributed in geographic space and across time” (Miller & Han, 2001)
• Large data sets: attempt to carefully define interesting patterns (to avoid finding non-interesting patterns); advanced algorithms are needed for efficiency
Clustering?
• Are the people clustered in this room? How do we define a cluster?
• In spatial data mining we have objects/entities with a location given by coordinates
• Cluster definitions involve distance between locations
Clustering - options
• Determine whether clustering occurs
• Determine the degree of clustering
• Determine the clusters
• Determine the largest cluster
• Determine the outliers
Co-location
• Are the men clustered?
• Are the women clustered?
• Is there a co-location of men and women?
[Figure: point set of men and women with a co-location pattern marked]
Co-location
• Like before, we may be interested in
  – is there co-location?
  – the degree of co-location
  – the largest co-location
  – the co-locations themselves
  – the objects not involved in co-location
Spatio-temporal data
• Locations have a time stamp
• Interesting patterns involve space and time
• Example here: time-stamped point set
Trajectory data
• Entities with a trajectory (time-stamped motion path)
• Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ...
• n entities = trajectories; n = 10 – 100,000
• t time steps; t = 10 – 100,000
• So the input size is nt
• m = size of the subgroup (unknown); m = 10 – 100,000
Trajectory data
• Migration patterns of animals
• Trajectories of tornadoes
• Tracking of (suspect) individuals for security
• Lifelines of people for social behavior
Example pattern in trajectories
• What is the location visited by most entities?
location = circular region of specified radius
[Figure: two candidate circular regions on a set of trajectories, one visited by 4 entities and one visited by 3 entities]
Example pattern in trajectories
• Compute buffer of each trajectory
• Compute the arrangement of the buffers and the cover count of each cell
[Figure: arrangement of trajectory buffers; each cell is labeled with its cover count (0, 1, 2, ...)]
Example pattern in trajectories
• One trajectory has t time stamps; its buffer can be computed in O(t log t) time
• All buffers can be computed in O(nt log t) time
• The arrangement can be computed in O(nt log (nt) + k) time, where k = O((nt)²) is the complexity of the arrangement
• Cell cover counts are determined in O(k) time
Example pattern in trajectories
• Total: O(nt log (nt) + k) time
• If the most visited location is visited by m entities, this is O(nt log (nt) + ntm)
• Note: input size is nt; n entities, each with location at t moments
Patterns in entity data
Spatial data
• n points (locations)
• Distance is important
  – clustering pattern
• Presence of attributes (e.g. male/female):
  – co-location patterns

Spatio-temporal data
• n trajectories, each has t time steps
• Distance is time-dependent
  – flock pattern
  – meet pattern
• Heading and speed are important and are also time-dependent
Patterns in trajectories
• n trajectories, each with t time steps: n polygonal lines with t vertices
Patterns in trajectories
• Flock and meet patterns: large enough subset that has same “character” during a time interval
  – close to each other
  – same direction of motion
  – ...
  Flock: changing location
  Meet: fixed location
• Determine the longest duration pattern
Patterns in trajectories
• Longest flock: given a radius r and subset size m, determine the longest time interval for which the m entities were within each other’s proximity (circle radius r)
[Figure: trajectories with time stamps 0 to 8; for m = 3 the longest flock is in the time interval [1.9, 6.4]]
Patterns in trajectories
• Computing the longest flock is NP-hard
• This remains true for radius cr approximations with c < 2
• A radius 2 approximation of the longest flock can be computed in time O(n² t log n)
... meaning: if the longest flock for radius r has duration T, then we surely find a flock of duration at least T for radius 2r
Patterns in trajectories
[Figure: a flock and a meet pattern for a fixed subset size m = 3 and a fixed radius]
Patterns in trajectories
• Go into 3D (space-time) for algorithms
[Figure: trajectories drawn in space-time, with time on the vertical axis (0 to 4), showing a flock and a meet]
Patterns in trajectories
Exact radius results
flock: NP-hard
meet: O(n² t² (n² log n + t))
Patterns in trajectories
Approximate radius results
flock: O(n² t log n), factor 2
meet: O((n² t log n) / (m ε²)), factor 1+ε
Patterns in trajectories
• Flock and meet patterns require algorithms in 3-dimensional space (space-time)
• Exact algorithms are inefficient and therefore only suitable for smaller data sets
• Approximation can reduce the running time by an order of magnitude
Summary
• There are many types of geographical analysis; it is the main task of a GIS
• Overlay and buffer analysis are most important
• Statistics is also very important
• Spatial and spatio-temporal data mining gives new types of analysis of geographic data