ICDE-2006: Closest-Point-of-Approach Join for Moving Object Histories

Subramanian Arumugam, Christopher Jermaine, Department of Computer Science, University of Florida. 22nd International Conference on Data Engineering


Page 1: ICDE-2006 Subramanian Arumugam Christopher Jermaine Department of Computer Science University of Florida 22nd International Conference on Data Engineering

ICDE-2006

Closest-Point-of-Approach Join for Moving Object Histories

Subramanian Arumugam, Christopher Jermaine

Department of Computer Science, University of Florida

22nd International Conference on Data Engineering

Page 2:

CPA-Join Is Useful For Analysis Of Spatiotemporal Data

Commercial airliners R, single engine planes S

“Find all commercial airliners that approached within 1000 vertical feet and 0.5 miles of a single engine plane in the BOS/JFK/EWR/LGA corridor C in the first three months of last year”

SELECT DISTINCT (r, s)
FROM R AS r, S AS s, TIME t
WHERE dist(r, s, t) < 0.5
  AND (r(t).altd - s(t).altd) ≥ -1000
  AND (r(t).altd - s(t).altd) ≤ 1000
  AND s(t) ∈ C AND r(t) ∈ C
  AND t ≥ 'JAN-1-2005' AND t ≤ 'MAR-31-2005'

Page 3:

Challenges
• 3-dimensional space + time
• Large # of objects
• Massive amount of data

Page 4:

CPA Illustration for Straight Line Trajectories

[Figure: straight-line trajectories of Object p and Object q]

CPA: the position at which two dynamically moving objects attain their closest possible distance
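For straight-line motion the CPA has a closed form, which a short sketch can make concrete. The following assumes each object moves with constant velocity over a shared time interval; the function name `cpa` and its argument layout are illustrative choices, not taken from the slides.

```python
import math

def cpa(p0, vp, q0, vq, t0, t1):
    """Return (t_cpa, dist_cpa) for two objects moving linearly on [t0, t1].

    p0, q0 are positions at time t0; vp, vq are constant velocities.
    """
    # Relative position and relative velocity of q with respect to p.
    w = [b - a for a, b in zip(p0, q0)]
    dv = [b - a for a, b in zip(vp, vq)]
    dv2 = sum(c * c for c in dv)
    if dv2 == 0.0:
        # Same velocity: the distance never changes, so any time works.
        t = t0
    else:
        # Minimize |w + dv * (t - t0)|^2, then clamp to the interval.
        t = t0 - sum(a * b for a, b in zip(w, dv)) / dv2
        t = min(max(t, t0), t1)
    d = math.sqrt(sum((a + b * (t - t0)) ** 2 for a, b in zip(w, dv)))
    return t, d
```

For example, `cpa((0, 0), (1, 0), (10, 5), (-1, 0), 0, 10)` describes two objects passing each other on parallel tracks 5 units apart; the clamp matters when the unconstrained minimum falls outside the interval being examined.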

Page 5:

[Figure, two panels with x and y axes: "Moving Object Trajectories" and "Sampled Positions" with a "Polyline approximation"; positions are labeled with time ticks 0 to 5, and the CPA distance dist_cpa is marked]

[Table: Time (0 to 5), Object P, Object Q; extracted position values: 40,32 38,18 51,27 49,12 5,32 6,26 15,39 59,18 27,38 11,49 5,32 24,65]

Page 6:

Simple CPA-Join

Find all object pairs (p ∈ P, q ∈ Q) from relations P and Q such that CPA-distance(p, q) ≤ d

Procedure CPA (Object P, Object Q, distance d)
1. List result = {};
2. for each pair of segments (p ∈ P, q ∈ Q)
3.     if CPA_distance(p, q) ≤ d
4.         result += (p, q);
5. return result;

Need to compare only those segments whose time intervals overlap

Plane sweep
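The procedure above, together with the time-overlap restriction, can be sketched as a nested loop. This is a minimal illustration under assumptions not stated on the slide: each segment is a tuple `(t0, t1, start_pos, end_pos)` with `t1 > t0`, motion within a segment is linear, and relations are dictionaries from object id to segment list. The threshold test uses `<=`, which is one reading of the comparison on the slide.

```python
import math

def cpa_distance(seg_p, seg_q):
    """CPA distance of two segments (t0, t1, start_pos, end_pos); None if no time overlap."""
    lo, hi = max(seg_p[0], seg_q[0]), min(seg_p[1], seg_q[1])
    if lo > hi:
        return None

    def motion(seg):
        # Velocity of the segment and its interpolated position at time lo.
        t0, t1, a, b = seg
        v = [(y - x) / (t1 - t0) for x, y in zip(a, b)]
        start = [x + vx * (lo - t0) for x, vx in zip(a, v)]
        return start, v

    (p, vp), (q, vq) = motion(seg_p), motion(seg_q)
    w = [b - a for a, b in zip(p, q)]
    dv = [b - a for a, b in zip(vp, vq)]
    dv2 = sum(c * c for c in dv)
    # Offset from lo that minimizes the distance, clamped to the overlap.
    s = 0.0 if dv2 == 0 else min(max(-sum(a * b for a, b in zip(w, dv)) / dv2, 0.0), hi - lo)
    return math.sqrt(sum((a + b * s) ** 2 for a, b in zip(w, dv)))

def simple_cpa_join(P, Q, d):
    """P, Q map object id -> list of segments; returns the set of qualifying (p_id, q_id)."""
    result = set()
    for pid, psegs in P.items():
        for qid, qsegs in Q.items():
            for sp in psegs:
                for sq in qsegs:
                    dist = cpa_distance(sp, sq)
                    if dist is not None and dist <= d:
                        result.add((pid, qid))
    return result
```

The loop still visits every segment pair and merely skips the distance computation when the time intervals are disjoint; avoiding the visits themselves is what the plane sweep is for.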

Page 7:

CPA-Join using Simple Plane Sweep
- First sort the segments in P and Q along the time dimension (external sort)
- While there is still some unprocessed data
  - Read in enough segments from P and Q to fill the main memory buffer
  - Next, sweep a vertical line along the time dimension
  - Maintain a sweepline data structure which keeps track of all active segments that intersect the sweep line
  - As the sweep line progresses, the sweepline data structure is updated with insertions (new segments that became active) and deletions (segments whose time period has expired)
  - During updates to the sweepline structure, an all-pairs comparison returns valid results
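The in-memory part of the sweep described above can be sketched as follows. This is an illustration under assumptions: segments are tuples `(obj_id, t_start, t_end, payload)`, the whole input fits in memory (the external sort and buffer refills are omitted), and the distance routine `cpa_distance` is a hypothetical function supplied by the caller.

```python
def sweep_join(P_segs, Q_segs, d, cpa_distance):
    """Plane sweep along time.

    P_segs, Q_segs: lists of (obj_id, t_start, t_end, payload).
    cpa_distance(seg_a, seg_b) is supplied by the caller (hypothetical helper).
    """
    # Merge both inputs into one event stream ordered by start time.
    events = sorted(
        [(s[1], 0, s) for s in P_segs] + [(s[1], 1, s) for s in Q_segs],
        key=lambda e: (e[0], e[1]),
    )
    active = ([], [])  # active[0]: segments of P, active[1]: segments of Q
    result = set()
    for t, side, seg in events:
        other = 1 - side
        # Deletions: drop segments of the other relation that ended before t.
        active[other][:] = [s for s in active[other] if s[2] >= t]
        # Insertion: compare the new segment against all active ones.
        for s in active[other]:
            p, q = (seg, s) if side == 0 else (s, seg)
            if cpa_distance(p, q) <= d:
                result.add((p[0], q[0]))
        active[side].append(seg)
    return result
```

Expired segments are purged lazily, only when a segment of the opposite relation arrives; a production version would more likely keep each active list ordered by end time.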

Page 8:

CPA-Join using Plane Sweep

[Figure: sweep line over segments, with regions labeled "memory" and "disk"]

Sweep line has to pause at every new sample point encountered. Processing a multi-gigabyte data set can take a long time

Page 9:

Group segments using a bounding box approximation

[Figure: segments grouped into bounding boxes, with regions labeled "memory" and "disk"]

In the best case, just 1 comparison is needed

Page 10:

Algorithm: Layered Plane Sweep

While there is still some unprocessed data on disk
- Read in data from relations P and Q to fill the buffer
- Construct an MBR for the trajectory of every object in the buffer
- Sort MBRs along one of the spatial dimensions and do a plane sweep along it to identify qualifying MBR pairs
- Expand the MBRs to obtain the individual segments
- Sort segments along the time dimension and do a plane sweep along time to obtain the actual results
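The first layer of the algorithm above, the MBR filter, might look like the sketch below. It assumes 2-D sample points, sweeps along x, and treats an MBR pair as qualifying when the minimum distance between the two boxes is at most d; that qualification rule and the names `mbr`, `box_distance`, and `candidate_pairs` are assumptions made for illustration.

```python
import math

def mbr(points):
    """Axis-aligned bounding box ((xmin, ymin), (xmax, ymax)) of 2-D sample points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys)), (max(xs), max(ys))

def box_distance(a, b):
    """Minimum distance between two boxes (0 if they overlap)."""
    gaps = [max(b[0][i] - a[1][i], a[0][i] - b[1][i], 0.0) for i in range(2)]
    return math.hypot(*gaps)

def candidate_pairs(P, Q, d):
    """Sweep object MBRs along x and keep (p_id, q_id) pairs within distance d.

    P, Q map object id -> list of (x, y) sample points held in the buffer.
    """
    events = sorted(
        [(mbr(pts), 0, oid) for oid, pts in P.items()]
        + [(mbr(pts), 1, oid) for oid, pts in Q.items()],
        key=lambda e: e[0][0][0],  # order by xmin
    )
    active = ([], [])
    pairs = set()
    for box, side, oid in events:
        other = 1 - side
        # Keep only boxes whose x-extent, grown by d, still reaches the sweep line.
        active[other][:] = [(b, o) for b, o in active[other] if b[1][0] + d >= box[0][0]]
        for b, o in active[other]:
            if box_distance(box, b) <= d:
                pairs.add((oid, o) if side == 0 else (o, oid))
        active[side].append((box, oid))
    return pairs
```

Only the objects that appear in a returned pair need to be expanded into segments for the second, time-based sweep, which is where the savings come from when objects interact lightly.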

Page 11:

Layered Plane-Sweep Example

But one size doesn’t fit all!

Page 12:

A Note on Indexing
- Indexes can be used to do CPA-Join
- But (almost) all indexes use MBR approximation
- And MBRs impose predefined granularities

[Figure: objects p and q; axes x, y, z]

Page 13:

Layered Plane Sweep: what is the problem?
• Layered Plane Sweep always processes the entire fraction of data held in memory buffers
• When objects interact heavily, such an approach may lead to no pruning at all

In the best case, just one comparison is needed

Though less buffer is processed initially, overall efficiency can be better

Efficiency of the layered technique is not tied to the amount of data processed, but to choosing a granularity that minimizes the # of distance computations

Page 14:

Cost to Process Data in Memory Buffer
• Cost can be approximated as a function of distance computations (which dominate execution time):

  $cost = n_{seg} + n_{MBR}$

  where $n_{seg}$ is the # of segment-level comparisons and $n_{MBR}$ is the # of bounding box comparisons

• In general, cost for a fraction α (0 ≤ α ≤ 1) of the buffer:

  $cost = (n_{seg} + n_{MBR}) \cdot (1/\alpha)$
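A small numeric sketch may help show why a smaller fraction can win despite the 1/α scaling. The function name `buffer_cost` and the comparison counts used below are hypothetical, chosen only for illustration; the function rejects α = 0 because the formula divides by α.

```python
def buffer_cost(n_seg, n_mbr, alpha):
    """Projected cost of processing the whole buffer at fraction alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return (n_seg + n_mbr) * (1.0 / alpha)

# Hypothetical counts: the full buffer needs many segment-level comparisons,
# while a quarter of the buffer prunes far better.
full = buffer_cost(n_seg=9000, n_mbr=1000, alpha=1.0)
quarter = buffer_cost(n_seg=1500, n_mbr=500, alpha=0.25)
```

With these made-up counts the quarter-buffer plan is cheaper overall even though its cost is multiplied by four, because its comparison counts are more than four times smaller.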

Page 15:

What we have

Layered Plane Sweep
• processes large fraction (α is large)
• good when there is light interaction
• bad when there is heavy interaction

Simple Plane Sweep
• processes tiny fraction (α is small)
• good when there is heavy interaction
• bad when there is light interaction

What we want

An Adaptive Algorithm
• processes a fraction that maximizes performance (α varies)
• tunes to the characteristics of the underlying data
• provides superior performance under all scenarios

Page 16:

Algorithm: Adaptive Plane Sweep

While there is still some unprocessed data on disk
- Read in data from relations P and Q to fill the buffer
- Choose a fraction of the data that maximizes performance
- Process the chosen fraction of data using Layered Plane Sweep
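The loop above can be sketched as a small driver. Everything here is an assumption made for illustration: the three callables are hypothetical hooks, the buffer is a time-sorted list, and the fraction is applied to the number of segments rather than to the time range, which sidesteps the handling of segments that straddle the cut.

```python
def adaptive_plane_sweep(read_buffer, estimate_cost, layered_sweep, fractions):
    """Driver loop; all four arguments are hypothetical caller-supplied inputs.

    read_buffer(carry) -> carry plus newly read segments, sorted by time
                          (empty when the input is exhausted)
    estimate_cost(buffer, alpha) -> estimated cost of processing fraction alpha
    layered_sweep(segments) -> iterable of result pairs
    fractions -> candidate values of alpha to evaluate
    """
    results = set()
    carry = []
    while True:
        buffer = read_buffer(carry)
        if not buffer:
            return results
        # Choose the candidate fraction with the lowest estimated cost.
        alpha = min(fractions, key=lambda a: estimate_cost(buffer, a))
        # Always process at least one segment so the loop makes progress.
        cut = max(1, int(len(buffer) * alpha))
        results.update(layered_sweep(buffer[:cut]))
        # Unprocessed data stays in the buffer for the next round.
        carry = buffer[cut:]
```

Keeping the unprocessed remainder in `carry` means that choosing a small fraction does not discard data; it only postpones it to the next iteration, where a different fraction may be chosen.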

Page 17:

Goal: Choose a fraction of data that maximizes performance

“Evaluate increasing buffer fractions from 0 to 1 and choose the fraction with the minimum cost”

How many fractions should we consider?

How to estimate the cost for a given fraction α?

Page 18:

How to estimate Cost for a given fraction α
• Exact cost is known only after the fact!
• To know the cost associated with a given α, we need to actually execute the join (layered plane sweep) at that granularity

Estimate cost using a simple online sampling algorithm [HH97]

Page 19:

How to estimate Cost for a given fraction (Contd.)

Cost Estimation through sampling

Given: Relations P and Q and α
- Consider segments within α
- Construct MBRs for the objects in P
- Until the estimate of cost is accurate to within +/- 10%
  - Pick randomly an object q1 from Q and construct an MBR for its trajectory
  - Join q1 with all objects in P
  - Compute n_MBR,q1 and n_seg,q1
  - Estimate cost
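The sampling loop above can be sketched as follows. The slide does not spell out how "accurate to within +/- 10%" is decided, so the stopping rule here (a 95% normal-approximation confidence half-width) is an assumption, as are the function name, the `min_samples` floor, and the `count_comparisons` hook that stands in for joining one sampled object with all of P.

```python
import math
import random

def estimate_cost(q_ids, count_comparisons, alpha, rel_err=0.10, min_samples=30, seed=0):
    """Estimate (n_seg + n_MBR) * (1 / alpha) by sampling objects of Q.

    count_comparisons(q_id) -> (n_mbr, n_seg) for joining that object with all of P;
    it is a hypothetical hook standing in for the per-object join.
    """
    order = list(q_ids)
    if not order:
        return 0.0
    random.Random(seed).shuffle(order)  # random order = sampling without replacement
    n = total = total_sq = 0
    for q in order:
        n_mbr, n_seg = count_comparisons(q)
        x = n_mbr + n_seg
        n, total, total_sq = n + 1, total + x, total_sq + x * x
        if n >= min_samples:
            mean = total / n
            var = max(total_sq / n - mean * mean, 0.0)
            # Assumed stopping rule: 95% normal-approximation half-width within rel_err.
            if 1.96 * math.sqrt(var / n) <= rel_err * mean:
                break
    # Scale the per-object mean to all of Q, then to the whole buffer.
    return (total / n) * len(order) / alpha
```

The estimate is the mean per-object comparison count scaled up to all of Q and divided by α, so it is directly comparable across candidate fractions.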

Page 20:

How many fractions to consider?

“Evaluate increasing buffer fractions from 0 to 1 and choose the fraction with the minimum cost”

- Computing the cost for all α is not practical: it will offset any benefit that we gain from the adaptive technique. We need a strategy to limit the # of fractions that we process

Page 21:

How many fractions to consider?

The α vs cost graph is not linear; it exhibits convexity

[Figure: cost (millions), y-axis ticks 21 to 91, versus fraction considered, x-axis ticks 0 to 1 in steps of 0.1]

Convex region represents the candidate region with the minimum cost

We can get away with evaluating the cost for a small number k of fractions α

Page 22:

How to choose the k fractions?

k = 10; t_start = 32; t_end = 53

[Figure: plot with y-axis ticks 21 to 91 and x-axis ticks 0 to 1 in steps of 0.1; a group of table rows is marked "Acceptable candidates"]

Fraction        Time range      Cost
α_1 = 0.11      [32-33.27]      90
α_2 = 0.14      [32-33.61]      71
α_3 = 0.18      [32-34.05]      52
α_4 = 0.23      [32-34.60]      37
α_5 = 0.30      [32-35.31]      31
α_6 = 0.38      [32-36.21]      35
α_7 = 0.48      [32-37.35]      41
α_8 = 0.61      [32-38.80]      52
α_9 = 0.78      [32-40.65]      59
α_10 = 1.0      [32.0-53.0]     71

$r = t_{end} - t_{start}$

$\alpha_1 = r^{1/k} / r$

$\alpha_i = (r \cdot \alpha_1)^i / r$

Fraction chosen can be fine-tuned through recursive calls
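The fraction schedule can be sketched in a few lines, reading the slide's formulas as α_1 = r^(1/k) / r and α_i = (r · α_1)^i / r, which is an interpretation of the flattened exponents. The function names are illustrative, r is assumed to be greater than 1, and the values produced are not meant to reproduce the table's illustrative numbers.

```python
def candidate_fractions(t_start, t_end, k):
    """Fractions alpha_1..alpha_k with alpha_1 = r**(1/k) / r and alpha_i = (r * alpha_1)**i / r."""
    r = t_end - t_start
    alpha_1 = r ** (1.0 / k) / r
    # (r * alpha_1) equals r**(1/k), so the covered time span grows geometrically.
    return [(r * alpha_1) ** i / r for i in range(1, k + 1)]

def best_fraction(fractions, cost):
    """Return the fraction with the smallest (estimated) cost."""
    return min(fractions, key=cost)
```

Because the fractions grow geometrically, the candidates are dense near 0 and sparse near 1, and the last one always covers the whole buffer. A recursive refinement would call `candidate_fractions` again on the time range around the winner.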

Page 23:

Putting it all together

Input: Relations R, S; distance d; parameter k

[Flowchart]
1. Fill Buffer: read from relations R and S
2. Optimizer: evaluate k fractions, choose best
3. Layered Plane Sweep: process join on best fraction
4. More data? If so, return to Fill Buffer

Page 24:

Benchmarking
• Code: Implemented and tested the various alternatives in C/C++
  - R-Trees, Simple Sweep, Layered Sweep, Adaptive Sweep with various parameter settings
• Workload: 2 relations, 100,000 objects (50 GB)
  - Physics-based simulation data set
  - Synthetic data set
• Hardware: Linux, 2.4 GHz Pentium Xeon, 1 GB main memory, 2 IDE drives (15,000 rpm)
• Setup: 64 KB page size, buffer size 10,000 pages

Page 25:

Collision Data Set

100,000 objects, collision occurs during time range [1500 - 2500]

Snapshot at timetick 1500

Page 26:

Results - Execution Time for different Strategies

[Figure: execution time (seconds, 0 to 12000) versus % of join completed (0 to 100) for R-tree, simple sweep, layered sweep, and adaptive sweep with K=20, K=10, and K=5]

Page 27:

Buffer Choices made by the optimizer

[Figure: fraction of buffer chosen (y-axis ticks 30 to 100) versus virtual time line in the data set (x-axis ticks 0 to 2800)]

Page 28:

Discussion
• R-trees couldn’t do enough pruning to make a difference
• Simple plane-sweep works well when there is heavy interaction among objects
• Layered plane-sweep works well when there is light interaction
• Adaptive version transitions smoothly between these extremes
• Recursive call to fine-tune candidate region doesn’t seem to help much

Page 29:

Conclusion
• CPA-Join for spatiotemporal relations
• Proposed a novel adaptive join algorithm for moving object histories based on an extension of the plane sweep
• Many practical applications

Page 30:

Questions?

Thank You!

Subramanian ([email protected])