1 ACM GIS 2007

An Interactive Framework for Raster Data Spatial Joins

Wan Bae (Computer Science, University of Denver)

Petr Vojtěchovský (Mathematics, University of Denver)

Shayma Alkobaisi (Computer Science, University of Denver)

Scott T. Leutenegger (Computer Science, University of Denver)

Seon Ho Kim (Computer Science, University of Denver)

Outline

Introduction

Issues and Problems

Probabilistic Joins

Sampling Joins

Interactive Framework

Experiments

Conclusion

Geographic Information Systems

GIS

• Collect

• Store

• Retrieve

• Integration of georeferenced data

• Spatial queries

• Complex spatial data analysis & modeling for decision support

[Figure: diagram with the labels Users, Web application, GIS, and data]

Raster Data Model

[Figure: (a) Satellite Image; (b) Raster Model, a grid with rows and columns numbered 0 to 9 whose cells carry the labels R, T, or H]

• A great portion of georeferenced data

• Simple data structure but greater storage space

• Continuously changing data

Continuously Changing Data

Raster Data Spatial Joins

[Figure: panels (a) and (b)]

“Find the regions where rainfall rate is greater than 1.0 and wind speed is greater than 50”
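As a point of reference, the exact (non-approximate) form of this query can be sketched as a cell-by-cell comparison of two aligned rasters. This is a minimal sketch, not the authors' implementation: the function name `raster_join`, the list-of-lists raster layout, and the sample values are assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]

def raster_join(rainfall: Grid, wind: Grid,
                rain_min: float = 1.0,
                wind_min: float = 50.0) -> List[Tuple[int, int]]:
    """Return (row, col) of every cell where both predicates hold.

    Hypothetical helper: both rasters are assumed to cover the same area
    at the same resolution, so cell (i, j) means the same place in each.
    """
    if len(rainfall) != len(wind) or any(
            len(r) != len(w) for r, w in zip(rainfall, wind)):
        raise ValueError("both rasters must have the same shape")
    return [
        (i, j)
        for i, (rain_row, wind_row) in enumerate(zip(rainfall, wind))
        for j, (r, w) in enumerate(zip(rain_row, wind_row))
        if r > rain_min and w > wind_min
    ]

# Illustrative 2 x 2 rasters (made-up values).
rainfall = [[0.2, 1.5],
            [2.0, 0.9]]
wind = [[60.0, 55.0],
        [40.0, 70.0]]
matches = raster_join(rainfall, wind)
```

Every cell of both rasters is visited, which is why the exact answer becomes expensive for large data sets.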

Issues for User-driven Data Exploration

Fast Query response time

– Time consuming for exact answers due to large size of data sets

– Time intensive GIS decision support queries

– Lack of optimization and approximation techniques for raster data joins

Interactive query processing

– Lack of interactivity in traditional GIS

– No user control over query processing

Visualization increases the utility of the GIS

Our Approach

For faster and more effective decision support queries:

Fast approximation of query results

1. probabilistic join

2. sampling join

Visualize intermediate results

1. “big picture” of query result

2. partial result: non-blocking joins

Allow users to control query processing

Our Approximations

1. What is the probability that R joins S?

R (8/16), S (9/16): they must join! (8 + 9 > 16, so at least one cell is occupied in both.)

2. Can the result of joining a subset of the data cells be used for the final answer?

1 join / 2 cells → ? joins / 16 cells

Augmented Quad-trees

Both data sets are indexed using Quad-trees

[Figure: two quad-trees, one per data set, each node divided into NW, NE, SW, and SE quadrants]
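The slides do not spell out what the augmentation is. The fractions used later (for example 8/16 and 2/4) suggest that each node records how many of its cells satisfy the query predicate, so the sketch below assumes exactly that. `QuadNode`, `build`, and the sample grid are hypothetical names and data, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class QuadNode:
    size: int                                    # side length of the region, in cells
    count: int                                   # qualifying cells in the region (assumed augmentation)
    children: Optional[List["QuadNode"]] = None  # order NW, NE, SW, SE; None for a leaf

    @property
    def fraction(self) -> float:
        """Share of the region's cells that qualify."""
        return self.count / (self.size * self.size)

def build(grid, row: int = 0, col: int = 0, size: Optional[int] = None) -> QuadNode:
    """Build a count-augmented quad-tree over a square 0/1 grid whose side is a power of two."""
    if size is None:
        size = len(grid)
    if size == 1:
        return QuadNode(1, int(bool(grid[row][col])))
    half = size // 2
    kids = [build(grid, row, col, half),                # NW
            build(grid, row, col + half, half),         # NE
            build(grid, row + half, col, half),         # SW
            build(grid, row + half, col + half, half)]  # SE
    total = sum(k.count for k in kids)
    if total == 0 or total == size * size:
        return QuadNode(size, total)                    # uniform region collapses to a leaf
    return QuadNode(size, total, kids)

# Illustrative 4 x 4 grid with 8 of 16 qualifying cells.
grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 1]]
root = build(grid)
```

Storing the count in every node lets a join stop at any level and still know what share of each region qualifies, without reading the leaves.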

Join Probability

Let X = [0, 1], and let m and n be randomly chosen intervals in X of lengths a and b, respectively. What is the probability p that m ∩ n ≠ ∅?

Join probability: p(m ∩ n ≠ ∅) = ?

1-d Join Probability

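$$p(a, b) = \frac{1}{(1-a)(1-b)} \int_{0}^{1-b} \Big( \min\{x + b,\ 1 - a\} - \max\{x - a,\ 0\} \Big)\, dx$$

Here x is the left endpoint of n, so n = [x, x + b] with x ∈ [0, 1 − b]. The integrand is the length of the range of left endpoints of m, within [0, 1 − a], for which m overlaps n.

[Figure: intervals m = [a1, a2] of length a and n = [b1, b2] of length b on [0, 1], with the overlapped region marked]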
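The 1-d join probability can be checked numerically. The sketch below assumes both intervals are placed uniformly at random inside [0, 1]. It evaluates the integral over the left endpoint x of n with a midpoint rule and compares it with a closed form, 1 − (1 − a − b)² / ((1 − a)(1 − b)) for a + b < 1 and 1 otherwise. That closed form was derived for this sketch and does not appear on the slides; `p1_closed` and `p1_integral` are hypothetical names.

```python
def p1_closed(a: float, b: float) -> float:
    """Closed form of the 1-d overlap probability (derived for this sketch)."""
    if a + b >= 1.0:
        return 1.0          # two intervals this long cannot avoid each other
    return 1.0 - (1.0 - a - b) ** 2 / ((1.0 - a) * (1.0 - b))

def p1_integral(a: float, b: float, steps: int = 10000) -> float:
    """Midpoint-rule evaluation of the integral over x, the left endpoint of n."""
    if a >= 1.0 or b >= 1.0:
        return 1.0
    h = (1.0 - b) / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        # Length of the set of left endpoints of m for which m meets [x, x + b].
        total += min(x + b, 1.0 - a) - max(x - a, 0.0)
    return total * h / ((1.0 - a) * (1.0 - b))

pairs = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.5), (0.6, 0.5)]
gaps = [abs(p1_closed(a, b) - p1_integral(a, b)) for a, b in pairs]
```

Agreement between the two functions is a consistency check on the integral under the uniform-placement assumption, not a statement about how the authors computed their values.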

2-d Join Probability

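$$p_2(a, b) = \frac{1}{(1-a)(1-b)} \int_{a}^{1} \int_{b}^{1} p_1(a_1, b_1)\, p_1\!\left(\frac{a}{a_1}, \frac{b}{b_1}\right) db_1\, da_1$$

Here a and b are the areas of the rectangles m and n, a_1 and a_2 = a / a_1 are the side lengths of m, b_1 and b_2 = b / b_1 are the side lengths of n, and p_1 is the 1-d join probability.

[Figure: rectangle m with sides a1, a2 and area a; rectangle n with sides b1, b2 and area b]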

Look-up table for 2-d Join Probability

P      0.1      0.2      0.3      0.4      0.5
0.1    0.4636   0.6228   0.7414   0.8317   0.8997
0.2    0.6228   0.7683   0.8640   0.9277   0.9681
0.3    0.7414   0.8640   0.9343   0.9738   0.9930
0.4    0.8317   0.9277   0.9738   0.9937   0.9995
0.5    0.8997   0.9681   0.9930   0.9995   1.0
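A table like this one can be generated by numerical integration. The sketch below assumes the 2-d probability is the average, over side lengths a1 ∈ [a, 1] and b1 ∈ [b, 1], of the product of two 1-d overlap probabilities, one per axis, with the second sides fixed by the areas (a / a1 and b / b1). The closed-form `p1`, the midpoint rule, and the names `p2` and `steps` are assumptions of this sketch, not the authors' code.

```python
def p1(a: float, b: float) -> float:
    """1-d overlap probability for uniformly placed intervals (closed form, assumed)."""
    if a + b >= 1.0:
        return 1.0
    return 1.0 - (1.0 - a - b) ** 2 / ((1.0 - a) * (1.0 - b))

def p2(a: float, b: float, steps: int = 200) -> float:
    """2-d join probability for rectangles of areas a and b, by a midpoint rule."""
    if a <= 0.0 or b <= 0.0:
        return 0.0
    if a >= 1.0 or b >= 1.0:
        return 1.0
    ha = (1.0 - a) / steps
    hb = (1.0 - b) / steps
    total = 0.0
    for i in range(steps):
        a1 = a + (i + 0.5) * ha          # first side of m; the second side is a / a1
        for j in range(steps):
            b1 = b + (j + 0.5) * hb      # first side of n; the second side is b / b1
            total += p1(a1, b1) * p1(a / a1, b / b1)
    return total * ha * hb / ((1.0 - a) * (1.0 - b))

# Rebuild the 5 x 5 look-up table for areas 0.1 .. 0.5.
areas = [0.1, 0.2, 0.3, 0.4, 0.5]
table = [[p2(a, b) for b in areas] for a in areas]
```

Precomputing the table once means a join only needs a look-up per node pair. The entries of `table` can be compared against the values on the slide to judge whether the assumed integral matches the authors' definition.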

Probabilistic Join (PJ)

[Figure: example join probabilities computed from quad-tree node fractions: p(2/4, 2/4) and p(8/16, 9/16)]
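One way to read the examples above is that, at a chosen quad-tree level, the fractions of qualifying cells in two co-located nodes are plugged into the join probability. The sketch below follows that reading. For brevity it computes the per-node fractions directly from small 0/1 grids instead of reading them from stored quad-tree nodes, and it uses an assumed numerical form of the 2-d probability. All names (`p1`, `p2`, `block_fractions`, `probabilistic_join`) and the two grids are hypothetical.

```python
def p1(a, b):
    """1-d overlap probability (closed form assumed for this sketch)."""
    if a + b >= 1.0:
        return 1.0
    return 1.0 - (1.0 - a - b) ** 2 / ((1.0 - a) * (1.0 - b))

def p2(a, b, steps=60):
    """Assumed 2-d join probability for node fractions a and b (midpoint rule)."""
    if a <= 0.0 or b <= 0.0:
        return 0.0                       # an empty node cannot join
    if a >= 1.0 or b >= 1.0:
        return 1.0                       # a full node joins any non-empty node
    ha, hb = (1.0 - a) / steps, (1.0 - b) / steps
    total = 0.0
    for i in range(steps):
        a1 = a + (i + 0.5) * ha
        for j in range(steps):
            b1 = b + (j + 0.5) * hb
            total += p1(a1, b1) * p1(a / a1, b / b1)
    return total * ha * hb / ((1.0 - a) * (1.0 - b))

def block_fractions(grid, level):
    """Fraction of qualifying cells in each node at the given quad-tree level."""
    blocks = 2 ** level
    side = len(grid) // blocks
    return [[sum(grid[r][c]
                 for r in range(bi * side, (bi + 1) * side)
                 for c in range(bj * side, (bj + 1) * side)) / (side * side)
             for bj in range(blocks)]
            for bi in range(blocks)]

def probabilistic_join(grid_r, grid_s, level):
    """Join probability for every pair of co-located nodes at one level."""
    fr = block_fractions(grid_r, level)
    fs = block_fractions(grid_s, level)
    return [[p2(a, b) for a, b in zip(row_r, row_s)]
            for row_r, row_s in zip(fr, fs)]

R = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]   # 8 of 16 cells
S = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1], [1, 0, 1, 1]]   # 9 of 16 cells
level0 = probabilistic_join(R, S, 0)   # one probability for the whole area
level1 = probabilistic_join(R, S, 1)   # one probability per quadrant
```

Deeper levels give a finer picture at the cost of visiting more nodes, which is the trade-off the level-by-level results are meant to expose.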

Probabilistic Join Result

[Figure: (a) data set Q (65536 × 65536); (b) data set S (65536 × 65536); (c) 2nd-level joins; (d) 3rd-level joins; (e) 4th-level joins]

Incremental Stratified Sampling Join (ISSJ)

Uses stratified random sampling over the quad-trees of the two data sets R and S

Data randomization: Acceptance/Rejection method

1. Sampling step: sample data from outer data set R

2. Spatial joining step: joins with the corresponding data cell on inner data set S

3. Refining step: running estimates and confidence intervals

4. Visualization: display partial results (actual join results)
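The four steps above can be sketched as a generator that yields a running estimate after every sampling round, so a caller can display partial results and decide when to stop. This is a simplified sketch under stated assumptions: the four quadrants of the grid serve as strata, `random.shuffle` stands in for the acceptance/rejection method named on the slide, and the interval uses a normal approximation with a finite-population correction. The name `issj`, its parameters, and the grids are hypothetical.

```python
import math
import random

def issj(grid_r, grid_s, batch=4, z=1.96, seed=0):
    """Yield (estimate, half_width, partial_result) after each sampling round.

    estimate       running estimate of the fraction of cells that join
    half_width     z * standard error (z = 1.96 for roughly 95% confidence)
    partial_result cells found to join so far (actual join results)
    """
    rng = random.Random(seed)
    half = len(grid_r) // 2
    size = half * half                      # cells per stratum
    strata = []
    for r0 in (0, half):                    # strata order: NW, NE, SW, SE
        for c0 in (0, half):
            cells = [(r, c) for r in range(r0, r0 + half)
                            for c in range(c0, c0 + half)]
            rng.shuffle(cells)              # random visiting order within the stratum
            strata.append(cells)
    hits = [0, 0, 0, 0]
    partial = []
    taken = 0                               # cells sampled so far from each stratum
    while taken < size:
        step = min(batch, size - taken)
        for h, cells in enumerate(strata):
            for r, c in cells[taken:taken + step]:   # 1. sample from outer set R
                if grid_r[r][c] and grid_s[r][c]:    # 2. join with the same cell of S
                    hits[h] += 1
                    partial.append((r, c))
        taken += step
        # 3. refine: equal-weight stratified estimate and its variance
        props = [hit / taken for hit in hits]
        estimate = sum(props) / 4
        fpc = 1.0 - taken / size            # shrinks to 0 once every cell is seen
        variance = sum(p * (1.0 - p) / taken for p in props) * fpc / 16
        yield estimate, z * math.sqrt(variance), list(partial)   # 4. show partial result

R = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
S = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1], [1, 0, 1, 1]]
results = list(issj(R, S, batch=1))
```

Because the generator never blocks on the full join, a caller can break out of the loop as soon as the half-width is small enough for the decision at hand. With very few samples the normal approximation is unreliable (a round with no hits reports a zero half-width), so an early stop should also require a minimum sample size.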

Stratified Random Sampling

[Figure: strata ST1, ST2, ST3, and ST4]

Estimates and Confidence Interval

Population proportion: the fraction of the population that has a particular property of interest

Estimated value: the statistic computed from the sample to estimate the population proportion

Confidence interval: a range of values that contains the population parameter with a specified probability

Confidence level: the specified probability
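For a simple random sample, these four terms fit together as in the sketch below, which uses the standard normal-approximation (Wald) interval for a proportion. The slides do not state which interval formula is used, so this choice, the function name `proportion_ci`, and the sample numbers are assumptions.

```python
import math

def proportion_ci(hits: int, n: int, z: float = 1.96):
    """Return (estimate, lower, upper) for a sample proportion.

    hits / n is the estimated value; z = 1.96 corresponds to a confidence
    level of roughly 95% under the normal approximation.
    """
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need 0 <= hits <= n and n > 0")
    p_hat = hits / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Illustrative numbers: 30 of 100 sampled cells satisfy the join.
estimate, lower, upper = proportion_ci(30, 100)
```

The half-width shrinks roughly with the square root of the sample size, so halving the interval takes about four times as many samples.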

Incremental Sampling Join Result

[Figure: (a) Estimated result; (b) Partial result. Map labels: IA, NE, WI, CO, KS, MI. Progress: 10% done.]

[Table: columns state, airports, confidence interval. Airports values: 13, 22, 19, 15, 11, 8. Every row shows 0.05 and 95.]

Interactive Join Framework

Experiments

PJ and ISSJ compared to full Quad-tree join.

Confidence level set to 95% in ISSJ

Varied buffer size and data set size.

Data sets:

– Synthetic: U E, E U, U U

(65536 × 65536 and 262144 × 262144)

– Real: 6 mineral resource data sets for each of the states AZ, CO, OR, and WY, from the U.S. Geological Survey

(65536 × 65536)

Actual joins vs. 2-d PJ

sample size   actual joins   2-d PJ (error)
5%            54             48 (0.1060)
10%           109            99 (0.0917)
20%           218            197 (0.0963)
50%           545            494 (0.0936)
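The error column appears to be a relative error, |actual − PJ| / actual. That reading is an assumption, checked below against the printed counts. It reproduces the 10%, 20%, and 50% rows to four decimals. The 5% row gives about 0.1111 from the printed integers rather than 0.1060, which may simply mean the displayed counts were rounded.

```python
# (sample size, actual joins, 2-d PJ estimate, error printed on the slide)
rows = [("5%", 54, 48, 0.1060),
        ("10%", 109, 99, 0.0917),
        ("20%", 218, 197, 0.0963),
        ("50%", 545, 494, 0.0936)]

def relative_error(actual: float, estimate: float) -> float:
    """Assumed definition of the error column: |actual - estimate| / actual."""
    return abs(actual - estimate) / actual

recomputed = {label: round(relative_error(actual, pj), 4)
              for label, actual, pj, _ in rows}
differs = [label for label, _, _, printed in rows
           if abs(recomputed[label] - printed) > 5e-5]
```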

Accuracy of Estimates of ISSJ

[Figure: estimates vs. exact value for real data sets; x-axis: number of processed cells]

Time for Confidence Interval of ISSJ

[Figure: confidence interval and I/Os for real data sets; series: sampling join, full quad-tree join]

ISSJ vs. PJ vs. Actual joins

[Figure: (a) ISSJ w/ 10% CI; (b) ISSJ w/ 5% CI; (c) Actual join; (d) PJ]

Time for Confidence Intervals

I/Os of PJ, ISSJ and the full quad-tree join for Colorado

Conclusion

A novel spatial join for raster data, the Probabilistic Join, which provides a “big picture” visualization of the query answer

An interactive raster spatial join algorithm, the Incremental Refining Spatial Join, which provides estimated query answers bounded by confidence intervals

Thank you!