Approximate querying about the Past, the Present, and the Future in Spatio-Temporal Databases Jimeng Sun, Dimitris Papadias, Yufei Tao, Bin Liu

Post on 06-Jan-2016


Page 1: Approximate querying about the Past, the Present, and the Future in Spatio-Temporal Databases

Approximate querying about the Past, the Present, and the Future in Spatio-Temporal Databases

Jimeng Sun, Dimitris Papadias, Yufei Tao, Bin Liu

Page 2

Motivation

• Spatio-temporal databases vs. data streams
• Monitoring applications
  – Traffic supervision
  – Mobile user monitoring
  – Weather forecasting
• Example: find the number of vehicles in the city center now
• The challenge is to provide fast query responses in an environment with highly intensive updates

Page 3

Problems and methods

• Problems:
  – How to efficiently store/summarize the spatio-temporal information?
  – How to approximately answer queries about the past, the present, and the future?
• Methods:
  – Adaptive multi-dimensional histogram (AMH)
  – Historical synopsis
  – Stochastic prediction method

Page 4

Related work

• Histograms
  – Static multi-dimensional histograms
    • Equi-depth, Mhist, Minskew, Genhist, SQ
  – Query-adaptive multi-dimensional histograms
    • STGrid, STHoles, SASH
• Other approximation methods
  – DCT, Wavelet, Sketch
• Spatio-temporal databases
  – Historical retrieval
  – Future prediction

Page 5

Outline

• Introduction
• Problem and proposed methods
  – Adaptive multi-dimensional histogram
  – Historical synopsis
  – Prediction model
• Experiment
• Conclusion

Page 6

Query types

• Present Time (PT) queries
• Historical Time (HT) queries
• Future Time (FT) queries

[Figure: queries drawn on location vs. time axes; the time axis is divided into past, current, and future.]
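As a rough illustration of the three query types, the sketch below labels a count query by comparing its time interval with the current time. The `CountQuery` type, its field names, and the rule for intervals that contain the current time are assumptions made for this example; the slides only name the three categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountQuery:
    # Hypothetical query shape: a spatial window plus a time interval.
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    t_start: float
    t_end: float


def query_type(q: CountQuery, now: float) -> str:
    """Label a query as 'HT', 'PT', or 'FT' relative to the current time."""
    if q.t_end < now:
        return "HT"  # interval lies entirely in the past
    if q.t_start > now:
        return "FT"  # interval lies entirely in the future
    return "PT"      # interval contains the current time (assumed rule)


print(query_type(CountQuery(0, 10, 0, 10, 5, 6), now=10))
```

A query such as "find the number of vehicles in the city center now" would fall in the PT case under this rule.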

Page 7

System Overview

[Figure: system overview. Components: spatio-temporal updates, AMH, Past Index, Historical Synopsis, and Prediction Model; incoming queries are of type PT, HT, and FT.]

Page 8

Histogram

• Partition the space into buckets
• Data within a bucket are summarized by the mean
• Properties of a good histogram:
  – Uniformity within each bucket
  – Incrementally updatable

[Figure: two partitionings of the same data space (both axes from 0 to 100), one labeled "bad" and one labeled "good".]
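Why uniformity within a bucket matters can be seen from how a histogram is typically used to answer a count query. The sketch below assumes each bucket stores only its rectangle and a total count, and that objects are spread evenly inside it; the `Bucket` class and `estimate_count` function are hypothetical names for this example, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    count: float  # number of objects summarized by this bucket

    def area(self) -> float:
        return (self.x_hi - self.x_lo) * (self.y_hi - self.y_lo)


def overlap_area(b: Bucket, qx_lo, qx_hi, qy_lo, qy_hi) -> float:
    """Area of the intersection between a bucket and the query window."""
    w = min(b.x_hi, qx_hi) - max(b.x_lo, qx_lo)
    h = min(b.y_hi, qy_hi) - max(b.y_lo, qy_lo)
    return max(w, 0.0) * max(h, 0.0)


def estimate_count(buckets, qx_lo, qx_hi, qy_lo, qy_hi) -> float:
    """Each bucket contributes its count scaled by the covered fraction of its area."""
    return sum(
        b.count * overlap_area(b, qx_lo, qx_hi, qy_lo, qy_hi) / b.area()
        for b in buckets
    )


buckets = [Bucket(0, 50, 0, 100, 100), Bucket(50, 100, 0, 100, 20)]
print(estimate_count(buckets, 25, 75, 0, 100))
```

The estimate is exact only if the data really are uniform inside every bucket, which is why a partitioning with skewed buckets gives larger errors.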

Page 9

Adaptive Multi-dimensional Histogram (AMH)

• Objective: minimize WVS = Σ_i (area_i ∙ var_i) (Minskew [Acharya, Poosala, Ramaswamy 99])

[Figure: the data space divided into regular cells, each holding a count, and grouped into buckets b1 to b6; a tree labeled BPT with intermediate nodes n1 to n5 has the buckets b1 to b6 as its leaves.]
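The WVS objective can be made concrete with a small sketch. It assumes that the area of a bucket is measured in cells and that its variance is the population variance of the cell counts it covers; the grid values and the bucket encoding `(row_start, row_end, col_start, col_end)` are invented for illustration and do not reproduce the slide's figure.

```python
def bucket_term(grid, r0, r1, c0, c1):
    """Return area * variance for the cells in rows r0..r1-1 and columns c0..c1-1."""
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    mean = sum(cells) / len(cells)
    variance = sum((v - mean) ** 2 for v in cells) / len(cells)
    return len(cells) * variance


def wvs(grid, buckets):
    """Sum of area * variance over all buckets of a partitioning."""
    return sum(bucket_term(grid, *b) for b in buckets)


# Illustrative 2 x 4 grid of cell counts: left half sparse, right half dense.
grid = [[1, 1, 5, 5],
        [1, 1, 5, 5]]

good = [(0, 2, 0, 2), (0, 2, 2, 4)]  # buckets follow the density boundary
bad = [(0, 1, 0, 4), (1, 2, 0, 4)]   # buckets cut across it

print(wvs(grid, good), wvs(grid, bad))
```

With the same number of buckets, the partitioning whose buckets are internally uniform has the lower WVS.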

Page 10

Dynamic Maintenance of AMH

• Our scheme: record the information during construction and modify the structure as needed.
  1. Information update
     • Update the bucket count
  2. Bucket reorganization
     • Merge: to reclaim buckets
     • Split: to reduce WVS

Page 11

Information update of AMH

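[Figure: the BPT (nodes n1 to n5, buckets b1 to b6) next to the bucket list; an incoming update is mapped to bucket b1, and the labels b1, n2, and n1 appear alongside the mapping.]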
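One way an information update could work is sketched below. It assumes a per-cell count array, a cell-to-bucket mapping, and per-bucket running sums (count and sum of squared cell counts) so that the variance never has to be recomputed from scratch. All names here (`BucketStats`, `apply_update`, the `"b1"` key) are hypothetical, and the slides do not spell out this bookkeeping.

```python
class BucketStats:
    """Running statistics of one bucket (assumed representation)."""

    def __init__(self, n_cells):
        self.n_cells = n_cells  # bucket area, measured in cells
        self.count = 0          # sum of the cell counts
        self.sum_sq = 0         # sum of the squared cell counts (2nd moment)

    def variance(self):
        mean = self.count / self.n_cells
        return self.sum_sq / self.n_cells - mean * mean


def apply_update(cell_counts, cell_to_bucket, stats, cell, delta):
    """Add delta objects to one cell and refresh the statistics of its bucket."""
    old = cell_counts[cell]
    new = old + delta
    cell_counts[cell] = new
    bucket = stats[cell_to_bucket[cell]]
    bucket.count += delta
    bucket.sum_sq += new * new - old * old  # incremental 2nd-moment update


cell_counts = [0, 0, 0, 0]
cell_to_bucket = ["b1", "b1", "b1", "b1"]
stats = {"b1": BucketStats(4)}

for cell in (0, 0, 1):  # three objects arrive
    apply_update(cell_counts, cell_to_bucket, stats, cell, +1)
print(stats["b1"].count, stats["b1"].variance())
```

An object that moves between cells would be handled as one update with `delta=-1` on the old cell and one with `delta=+1` on the new cell.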

Page 12

Bucket reorganization - Merge

[Figure: the BPT and the bucket list before and after a merge. Buckets b3, b4, and b6 (under node n4) are merged into a single bucket b*, leaving the buckets b1, b2, b*, and b5.]

Bucket info:
1. Region: [x-, x+][y-, y+]
2. Frequency: count/area
3. 2nd moment (for variance calculation)

• Merge the subtree that leads to the minimal WVS increase
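The bucket information listed above is enough to evaluate a merge without revisiting the cells. The sketch below assumes each bucket keeps its area in cells, the sum of its cell counts, and the sum of its squared cell counts; then area ∙ variance equals `sum_sq - count**2 / n_cells`. The `Stats` class and the candidate pairs are illustrative; in AMH the candidates would be subtrees of the BPT.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    n_cells: int    # bucket area, measured in cells
    count: float    # sum of the cell counts
    sum_sq: float   # sum of the squared cell counts (2nd moment)

    def wvs_term(self) -> float:
        # area * variance = sum_sq - count^2 / n_cells
        return self.sum_sq - self.count ** 2 / self.n_cells


def merge(a: Stats, b: Stats) -> Stats:
    """All three statistics are additive, so merging is a component-wise sum."""
    return Stats(a.n_cells + b.n_cells, a.count + b.count, a.sum_sq + b.sum_sq)


def merge_cost(a: Stats, b: Stats) -> float:
    """WVS increase caused by replacing a and b with their union."""
    return merge(a, b).wvs_term() - a.wvs_term() - b.wvs_term()


sparse_1 = Stats(2, 2, 2)    # two cells holding 1 object each
sparse_2 = Stats(2, 2, 2)    # two cells holding 1 object each
dense = Stats(2, 10, 50)     # two cells holding 5 objects each

candidates = {"sparse_1+sparse_2": (sparse_1, sparse_2),
              "sparse_1+dense": (sparse_1, dense)}
best = min(candidates, key=lambda name: merge_cost(*candidates[name]))
print(best, merge_cost(*candidates[best]))
```

Merging the two similar buckets costs nothing in WVS, while merging a sparse bucket with a dense one raises it.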

Page 13

Bucket reorganization - Split

[Figure: the BPT before and after a split. Bucket b* is split further; new nodes n4 and n5 and new buckets labeled b*1 to b*4 appear in the tree.]

• Split the bucket that leads to the maximal WVS decrease
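A possible way to evaluate splits is sketched below: for one bucket, try every axis-parallel cut between cells and keep the cut with the largest WVS decrease, then pick the bucket whose best cut gains the most. This assumes access to the cell counts inside each bucket and measures area in cells; the function names and the example grids are made up for illustration.

```python
def wvs_term(cells):
    """area * variance for a list of cell counts (area = number of cells)."""
    mean = sum(cells) / len(cells)
    return sum((v - mean) ** 2 for v in cells)


def best_split(grid):
    """Return (wvs_decrease, axis, position) of the best axis-parallel cut."""
    rows, cols = len(grid), len(grid[0])
    whole = wvs_term([v for row in grid for v in row])
    candidates = []
    for pos in range(1, rows):   # horizontal cuts between rows
        first = [v for row in grid[:pos] for v in row]
        second = [v for row in grid[pos:] for v in row]
        candidates.append(("row", pos, first, second))
    for pos in range(1, cols):   # vertical cuts between columns
        first = [v for row in grid for v in row[:pos]]
        second = [v for row in grid for v in row[pos:]]
        candidates.append(("col", pos, first, second))
    best = (0.0, None, None)     # no cut if nothing reduces WVS
    for axis, pos, first, second in candidates:
        decrease = whole - wvs_term(first) - wvs_term(second)
        if decrease > best[0]:
            best = (decrease, axis, pos)
    return best


def bucket_to_split(bucket_grids):
    """Pick the bucket whose best cut yields the maximal WVS decrease."""
    return max(bucket_grids, key=lambda name: best_split(bucket_grids[name])[0])


bucket_grids = {"b1": [[1, 1, 5, 5], [1, 1, 5, 5]],
                "b2": [[3, 3], [3, 3]]}
print(bucket_to_split(bucket_grids), best_split(bucket_grids["b1"]))
```

The uniform bucket gains nothing from any cut, so the skewed bucket is the one chosen for splitting.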

Page 14

Features of AMH

• Bucket information is updated as new data arrive
• Bucket extents continuously adapt to changes in the data distribution
• Maintenance does not affect normal query processing
  – It can be interrupted at any time
  – It is performed during CPU idle time

Page 15

Outline

• Introduction
• Problem and proposed methods
  – Adaptive multi-dimensional histogram
  – Historical synopsis
  – Prediction model
• Experiment
• Conclusion

Page 16

Historical Synopsis

• AMH maintains the current buckets.
• The past index stores the obsolete buckets.
• Past index:
  – Packed B-tree
  – 3D R-tree

[Figure: incoming streams update the current cells and the AMH. Current and recent buckets are kept in main memory; old buckets are stored on disk in the past index.]
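To illustrate the idea of archiving obsolete buckets for historical queries, here is a deliberately simplified stand-in for the past index. It is neither a packed B-tree nor a 3D R-tree: it is an in-memory list ordered by the time each bucket became obsolete, searched with `bisect`. The `ArchivedBucket` fields and the lifespan convention `[t_start, t_end)` are assumptions made for this sketch.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedBucket:
    t_start: float  # time the bucket was created
    t_end: float    # time it became obsolete (exclusive)
    region: tuple   # (x_lo, x_hi, y_lo, y_hi)
    count: float


class PastIndex:
    """Time-ordered archive of obsolete buckets (simplified stand-in)."""

    def __init__(self):
        self._end_times = []
        self._buckets = []

    def archive(self, bucket: ArchivedBucket) -> None:
        # Buckets become obsolete in time order, so appending keeps the list sorted.
        if self._end_times and bucket.t_end < self._end_times[-1]:
            raise ValueError("buckets must be archived in order of t_end")
        self._end_times.append(bucket.t_end)
        self._buckets.append(bucket)

    def alive_at(self, t: float):
        """Return the archived buckets whose lifespan [t_start, t_end) contains t."""
        first = bisect.bisect_right(self._end_times, t)  # skip buckets with t_end <= t
        return [b for b in self._buckets[first:] if b.t_start <= t]


index = PastIndex()
index.archive(ArchivedBucket(0, 5, (0, 50, 0, 50), 12))
index.archive(ArchivedBucket(2, 8, (50, 100, 0, 50), 7))
index.archive(ArchivedBucket(6, 9, (0, 50, 0, 50), 9))
print([b.count for b in index.alive_at(7)])
```

An HT query for a past timestamp would first collect the buckets alive at that time and then estimate the count from them, as with the current histogram.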

Page 17

Prediction Model

• Prediction based on velocity doesn't work!
  – It is not realistic to assume that velocity remains constant between the current time and the query time
  – Velocity is highly dynamic
• We suggest using only past and present location information for prediction.

Page 18

Prediction Model (cont.)

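[Figure: an FT query enters the Prediction Model, which (in a step labeled "Parse") issues HT and PT queries to the Historical Synopsis and receives their results; an example time series is plotted with x axis 0 to 80 and y axis 0 to 10.]

• Forecast the future using any time series prediction method; we use AR.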
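As a minimal sketch of forecasting from past and present results, the code below fits a first-order autoregressive model x[t] = c + phi * x[t-1] by ordinary least squares and iterates it forward. The slides only say that AR is used; the order (1), the fitting method, and the function names are assumptions for this example. The input series stands for the answers to the same window query at consecutive past timestamps.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        return mean_curr, 0.0  # constant history: predict the mean
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi


def forecast(series, steps):
    """Iterate the fitted model `steps` timestamps into the future."""
    c, phi = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Counts for one query window at six consecutive past timestamps (made-up values).
history = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
print(forecast(history, 2))
```

Because the model uses only observed counts, it does not depend on object velocities staying constant between the current time and the query time.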

Page 19

Outline

• Introduction
• Related work
• Problem and proposed methods
  – Adaptive multi-dimensional histogram
  – Historical synopsis
  – Prediction model
• Experiment
• Conclusion

Page 20

Experiment settings

• Datasets
  – 2.5M updates for each dataset
  – spatial: 50K mobile objects from 2 spatial datasets
  – road: from a spatio-temporal generator (described in [Brinkhoff 2002])

[Figure: the road network, and the data distribution at the initial, median, and final timestamps.]

Page 21

Robustness with time

[Figure: error rate vs. number of location updates (5k to 2.5M), one plot for the spatial dataset and one for the road dataset.]

• Query: qlength = 6% of the data space; 25K queries uniformly distributed over space and time

Page 22

Comparison with conventional histogram

• Minskew (a static spatial histogram) is rebuilt every 50k location updates
• tp (time proportion) is the ratio of the cost of AMH to that of Minskew
• The reorganization operations of AMH are uniformly distributed among the 50k location updates.

[Figure: error rate vs. time proportion (0.001, 0.01, 0.1, 1) for Minskew and AMH, one plot for the spatial dataset and one for the road dataset.]

Page 23

The effect of update intensity

• B-tree performs better at high update rates.
• R-tree provides much faster query response.
• In general, when the query/update ratio is large (>30%), R-tree performs better.

[Figure: plots of CPU time (msec) and error rate, with axes for query type (PT, HT, FT) and update rate (1k, 10k, 100k), for the spatial and road datasets; legend: 3D R-tree, B-tree.]

Page 24

Conclusion

• We present a comprehensive approach for processing queries that refer to any time in history.

• The proposed architecture maintains
  – an incremental multi-dimensional histogram;
  – a past index structure for storing the outdated buckets.

• Future queries are answered by a stochastic method that uses the recent history to predict the future.

Page 25

Q+A

Page 26

Summary

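[Figure: system diagram showing the AMH, Past Index, Historical Synopsis, and Prediction Model, with a label "Old buckets" between the AMH and the Past Index and the annotations below.]

• AMH
  0. Goal: min(WVS)
  1. Info update
  2. Reorganization happens when the CPU is idle
• Historical Synopsis
  1. Recent buckets are kept in memory
  2. Old buckets are dumped to disk
• Prediction Model: forecast based on the present and the past.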

Page 27

Related work

• Static multi-dimensional histograms
• Query-adaptive multi-dimensional histograms
• Other multi-dimensional approximation methods
• Spatio-temporal prediction methods
• Spatio-temporal aggregation methods

Page 28

Evaluation over different query types

[Figure: error rate vs. query length qL (2% to 10%), one plot for the spatial dataset and one for the road dataset.]

Page 29

Motivation (cont.)

• Spatio-temporal database (STDB) research:
  – historical retrieval
  – future prediction

Page 30

Bucket reorganization - Split

[Figure: the BPT and the bucket list before and after a split. Before the split the buckets are b1, b2, b*, and b5; afterwards new nodes n4 and n5 and new buckets labeled b*1 to b*4 appear.]