Approximate Querying about the Past, the Present, and the Future in Spatio-Temporal Databases


Jimeng Sun, Dimitris Papadias, Yufei Tao, Bin Liu

2

Motivation

• Spatio-temporal databases vs. data streams
• Monitoring applications
  – Traffic supervision
  – Mobile user monitoring
  – Weather forecasting
• Example: find the number of vehicles in the city center now
• The challenge is to provide fast query responses in a highly update-intensive environment

3

Problems and methods

• Problems:
  – How to efficiently store/summarize the spatio-temporal information?
  – How to approximately answer queries about the past, the present, and the future?
• Methods:
  – Adaptive multi-dimensional histogram (AMH)
  – Historical synopsis
  – Stochastic prediction method

4

Related work

• Histograms
  – Static multi-dimensional histograms
    • Equi-depth, Mhist, Minskew, Genhist, SQ
  – Query-adaptive multi-dimensional histograms
    • STGrid, STHoles, SASH
• Other approximation methods
  – DCT, Wavelet, Sketch
• Spatio-temporal databases
  – Historical retrieval
  – Future prediction

5

Outline

• Introduction
• Problem and proposed methods
  – Adaptive multi-dimensional histogram
  – Historical synopsis
  – Prediction model
• Experiment
• Conclusion

6

Query types

[Figure: query types on a location-versus-time plot. Historical Time (HT) queries refer to the past, Present Time (PT) queries to the current time, and Future Time (FT) queries to the future.]

7

System Overview

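[Figure: system overview. Spatio-temporal updates feed the AMH, which together with the Past Index forms the Historical Synopsis. PT, HT, and FT queries are posed against the system; FT queries go through the Prediction Model.]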

8

Histogram

• Partition the space into buckets
• Data within a bucket are summarized by the mean
• Properties of a good histogram:
  – Uniformity within each bucket
  – Incrementally updatable
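To make the role of within-bucket uniformity concrete, here is a minimal sketch of how a range-count query could be estimated from bucket summaries. The `Bucket` class, its field names, and `estimate_count` are hypothetical names chosen for illustration; the slides do not prescribe an implementation. Each bucket contributes its count scaled by the fraction of its area that the query rectangle covers, which is only accurate if the data really are close to uniform inside the bucket.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """A rectangular bucket summarized by a single count (illustrative only)."""
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    count: float  # number of objects currently inside the bucket

    @property
    def area(self) -> float:
        return (self.x_hi - self.x_lo) * (self.y_hi - self.y_lo)


def overlap_area(b: Bucket, qx_lo: float, qx_hi: float,
                 qy_lo: float, qy_hi: float) -> float:
    """Area of the intersection between bucket b and the query rectangle."""
    w = min(b.x_hi, qx_hi) - max(b.x_lo, qx_lo)
    h = min(b.y_hi, qy_hi) - max(b.y_lo, qy_lo)
    return max(w, 0.0) * max(h, 0.0)


def estimate_count(buckets, qx_lo, qx_hi, qy_lo, qy_hi) -> float:
    """Estimate the number of objects in the query rectangle.

    Assumes objects are spread uniformly inside each bucket, so a bucket
    contributes count * (overlapping area / bucket area).
    """
    total = 0.0
    for b in buckets:
        total += b.count * overlap_area(b, qx_lo, qx_hi, qy_lo, qy_hi) / b.area
    return total


# Two buckets covering a 100 x 100 space; the query covers half of each.
buckets = [Bucket(0, 50, 0, 100, 200), Bucket(50, 100, 0, 100, 40)]
est = estimate_count(buckets, 25, 75, 0, 100)
```

If a bucket mixes dense and sparse areas, the same formula can be far off, which is why uniformity within each bucket is listed as a property of a good histogram.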

[Figure: two example bucket partitionings of a 100 × 100 data space, one labeled "bad" and one labeled "good".]

9

Adaptive Multi-dimensional Histogram (AMH)

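[Figure: a grid of regular cells with per-cell counts, partitioned into buckets b1 to b6, next to the BPT, a tree with internal nodes n1 to n5 whose leaves are the buckets.]

• Objective: minimize the weighted variance sum $\mathrm{WVS} = \sum_i (\mathrm{area}_i \cdot \mathrm{var}_i)$, as in Minskew [Acharya, Poosala, Ramaswamy 99]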
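The objective can be illustrated with a small sketch that evaluates WVS for a given partitioning of a cell grid. The grid values, the `(r0, r1, c0, c1)` bucket encoding, and the function names are assumptions made for this example, and the area of a bucket is taken to be its number of cells.

```python
def bucket_stats(grid, r0, r1, c0, c1):
    """Return (area, variance) of the cell counts in rows r0..r1-1, cols c0..c1-1."""
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    area = len(cells)
    mean = sum(cells) / area
    var = sum((v - mean) ** 2 for v in cells) / area
    return area, var


def wvs(grid, buckets):
    """Weighted variance sum: sum over buckets of area_i * var_i."""
    total = 0.0
    for r0, r1, c0, c1 in buckets:
        area, var = bucket_stats(grid, r0, r1, c0, c1)
        total += area * var
    return total


# A made-up 2 x 4 grid with a sparse left half and a dense right half.
grid = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
one_bucket = [(0, 2, 0, 4)]                  # mixes sparse and dense cells
two_buckets = [(0, 2, 0, 2), (0, 2, 2, 4)]   # each bucket is uniform

wvs_one = wvs(grid, one_bucket)
wvs_two = wvs(grid, two_buckets)
```

A lower WVS means the cells inside each bucket are closer to their bucket mean, so the uniform-bucket partitioning is preferred here.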

10

Dynamic Maintenance of AMH

• Our scheme: record information during construction and modify the structure as needed.
  – 1. Information update
    • Update the bucket count
  – 2. Bucket reorganization
    • Merge: to reclaim buckets
    • Split: to reduce WVS

11

Information update of AMH

[Figure: information update. An incoming update is mapped through the BPT (n1, then n2) to its bucket b1, whose information is then updated.]
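As a rough sketch of the information update, the code below walks a small partition tree to find the bucket that owns a cell and then adjusts that bucket's running sums. The `Node` layout (split axis and position), the `BucketStats` fields, and the per-cell dictionary are assumptions for illustration; the slides only state that the update is mapped to a bucket and that the bucket count is updated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BucketStats:
    n_cells: int
    total: float = 0.0     # sum of the cell counts in the bucket
    sq_total: float = 0.0  # sum of squared cell counts (second moment)

    def update_cell(self, old: float, new: float) -> None:
        """Adjust the sums when one cell's count changes from old to new."""
        self.total += new - old
        self.sq_total += new * new - old * old

    @property
    def variance(self) -> float:
        mean = self.total / self.n_cells
        return self.sq_total / self.n_cells - mean * mean


@dataclass
class Node:
    axis: int = 0                          # 0: split on x, 1: split on y
    split: float = 0.0                     # cells with coordinate < split go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    bucket: Optional[BucketStats] = None   # set only on leaves


def locate(node: Node, cell) -> BucketStats:
    """Descend from the root to the leaf bucket that contains the cell."""
    while node.bucket is None:
        node = node.left if cell[node.axis] < node.split else node.right
    return node.bucket


def apply_update(root: Node, cells: dict, cell, delta: int) -> None:
    """Change a cell's count by delta and propagate it to the owning bucket."""
    old = cells.get(cell, 0)
    new = old + delta
    cells[cell] = new
    locate(root, cell).update_cell(old, new)


# A 4 x 2 grid split at x = 2 into two buckets of four cells each.
b_left, b_right = BucketStats(4), BucketStats(4)
root = Node(axis=0, split=2, left=Node(bucket=b_left), right=Node(bucket=b_right))
cells = {}
apply_update(root, cells, (0, 0), +3)
apply_update(root, cells, (3, 1), +1)
apply_update(root, cells, (0, 0), -1)
```

Under this sketch, an object moving between cells would be handled as a decrement on the old cell followed by an increment on the new one, and each update costs one tree descent plus constant work.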

12

Bucket reorganization: Merge

[Figure: merge example. The subtree under n4, containing buckets b3, b4, and b6, is merged into a single bucket b*, so the bucket list changes from b1 to b6 into b1, b2, b*, b5.]

Bucket info:
1. Region: [x-, x+][y-, y+]
2. Frequency: count/area
3. 2nd moment (for variance calculation)

• Merge the subtree whose merging leads to the minimal WVS increase
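One way to see why the second moment is kept per bucket: with the number of cells n, the sum of cell counts S, and the sum of squared cell counts Q, a bucket's WVS term is area · var = Q − S²/n, and the three quantities simply add up when buckets are merged. The sketch below uses that identity to pick the cheapest subtree to merge. The `(n, S, Q)` tuple layout, the candidate dictionary, and the numbers are hypothetical.

```python
def wvs_term(n, s, q):
    """area * variance of a bucket with n cells, sum s, and sum of squares q."""
    return q - s * s / n


def merge_cost(leaves):
    """WVS increase caused by merging all leaf buckets of one subtree.

    leaves: list of (n, S, Q) tuples, one per leaf bucket in the subtree.
    """
    n = sum(leaf[0] for leaf in leaves)
    s = sum(leaf[1] for leaf in leaves)
    q = sum(leaf[2] for leaf in leaves)
    return wvs_term(n, s, q) - sum(wvs_term(*leaf) for leaf in leaves)


def best_merge(candidates):
    """Return the name of the subtree whose merge increases WVS the least."""
    return min(candidates, key=lambda name: merge_cost(candidates[name]))


# Hypothetical subtrees: n4 holds two similar buckets, n2 two very different ones.
candidates = {
    "n4": [(2, 8, 32), (2, 10, 50)],    # cells (4, 4) and (5, 5)
    "n2": [(4, 4, 4), (4, 40, 400)],    # cells (1, 1, 1, 1) and (10, 10, 10, 10)
}
choice = best_merge(candidates)
```

Because a subtree of the partition tree always covers a rectangle, merging all of its leaves yields a valid bucket, which is presumably why candidates are subtrees rather than arbitrary bucket pairs.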

13

Bucket reorganization: Split

[Figure: split example. Bucket b* is split into sub-buckets (b*1 to b*4), and new internal nodes n4 and n5 are added to the BPT.]

• Split the bucket whose split leads to the maximal WVS decrease
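The split criterion can be sketched as an exhaustive search over axis-aligned cut positions inside one bucket, keeping the cut that lowers the bucket's WVS term the most. This is a brute-force illustration over a small made-up grid, not necessarily how the authors search for the split position; the function names and the `(r0, r1, c0, c1)` encoding are assumptions.

```python
def wvs_term(cells):
    """area * variance of a list of cell counts (0.0 for an empty list)."""
    n = len(cells)
    if n == 0:
        return 0.0
    mean = sum(cells) / n
    return sum((v - mean) ** 2 for v in cells)


def region(grid, r0, r1, c0, c1):
    """Cell counts in rows r0..r1-1 and columns c0..c1-1."""
    return [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]


def best_split(grid, r0, r1, c0, c1):
    """Return (WVS decrease, axis, position) of the best cut, or None.

    None is returned when the bucket is a single cell and cannot be split.
    """
    before = wvs_term(region(grid, r0, r1, c0, c1))
    best = None
    for r in range(r0 + 1, r1):  # horizontal cuts between rows
        after = (wvs_term(region(grid, r0, r, c0, c1))
                 + wvs_term(region(grid, r, r1, c0, c1)))
        if best is None or before - after > best[0]:
            best = (before - after, "row", r)
    for c in range(c0 + 1, c1):  # vertical cuts between columns
        after = (wvs_term(region(grid, r0, r1, c0, c))
                 + wvs_term(region(grid, r0, r1, c, c1)))
        if best is None or before - after > best[0]:
            best = (before - after, "col", c)
    return best


grid = [
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
result = best_split(grid, 0, 2, 0, 4)
```

To choose which bucket to split, the same function could be run on every bucket and the one with the largest decrease selected.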

14

Features of AMH

• Bucket information is updated as new data arrive
• Bucket extents continuously adapt to changes in the data distribution
• Maintenance does not affect normal query processing
  – It is interruptible at any time
  – It is performed during CPU idle time

15

Outline

• Introduction
• Problem and proposed methods
  – Adaptive multi-dimensional histogram
  – Historical synopsis
  – Prediction model
• Experiment
• Conclusion

16

Historical Synopsis

• AMH maintains the current buckets.

• Past index stores the obsolete buckets.

• Past index:
  – Packed B-tree
  – 3D R-tree

[Figure: historical synopsis. Incoming streams update the current cells and the AMH; current and recent buckets are held in main memory, while old buckets are stored on disk in the past index.]
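The idea of archiving obsolete buckets and querying them by time can be sketched with a sorted in-memory list standing in for the disk-based index. Everything here is an assumption made for illustration: the `BucketVersion` fields, the half-open lifespan `[t_start, t_end)`, and the use of `bisect` in place of a packed B-tree or 3D R-tree.

```python
import bisect
from dataclasses import dataclass


@dataclass
class BucketVersion:
    """An obsolete bucket together with the time interval in which it was valid."""
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    freq: float     # objects per unit area while the bucket was valid
    t_start: float
    t_end: float    # exclusive


class PastIndex:
    """Append-only store of obsolete buckets, kept sorted by end time."""

    def __init__(self):
        self._end_times = []
        self._versions = []

    def archive(self, v: BucketVersion) -> None:
        # Buckets are assumed to become obsolete in time order, so
        # appending keeps both lists sorted by t_end.
        if self._end_times and v.t_end < self._end_times[-1]:
            raise ValueError("buckets must be archived in end-time order")
        self._end_times.append(v.t_end)
        self._versions.append(v)

    def estimate(self, t, qx_lo, qx_hi, qy_lo, qy_hi) -> float:
        """Estimate the object count in the query rectangle at past time t."""
        first = bisect.bisect_right(self._end_times, t)  # skip t_end <= t
        total = 0.0
        for v in self._versions[first:]:
            if v.t_start <= t:
                w = min(v.x_hi, qx_hi) - max(v.x_lo, qx_lo)
                h = min(v.y_hi, qy_hi) - max(v.y_lo, qy_lo)
                total += v.freq * max(w, 0.0) * max(h, 0.0)
        return total


idx = PastIndex()
idx.archive(BucketVersion(0, 10, 0, 10, freq=2.0, t_start=0, t_end=5))
idx.archive(BucketVersion(0, 10, 0, 10, freq=3.0, t_start=5, t_end=9))
at_4 = idx.estimate(4, 0, 5, 0, 5)
at_5 = idx.estimate(5, 0, 5, 0, 5)
```

This sketch only covers archived buckets; in the architecture described above, buckets that are still current would be answered from the AMH instead.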

17

Prediction Model

• Prediction based on velocity doesn't work!
  – It is not realistic to assume that velocity remains constant between the current time and the query time
  – Velocity is highly dynamic
• We suggest using only past and present location information for prediction.

18

Prediction Model (cont.)

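[Figure: an FT query is parsed by the prediction model into HT and PT queries against the historical synopsis, and the results are used to forecast the future; an example time-series plot is shown.]

• Forecast the future using any time-series prediction method; we use an autoregressive (AR) model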
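As a minimal sketch of forecasting from past and present results only, the code below fits a first-order autoregressive model x_t = c + φ·x_{t−1} by least squares and iterates it forward. AR(1) is chosen here purely for brevity; the slides say AR without giving the order, and the function names and sample series are made up.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev, curr = series[:-1], series[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    sxy = sum((p - mean_prev) * (x - mean_curr) for p, x in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, steps):
    """Predict the next `steps` values by iterating the fitted model."""
    c, phi = fit_ar1(series)
    out = []
    x = series[-1]
    for _ in range(steps):
        x = c + phi * x
        out.append(x)
    return out


# Counts for one query region at past timestamps, generated by
# x_t = 2 + 0.5 * x_{t-1} so the fit can be checked by hand.
history = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
c, phi = fit_ar1(history)
next_two = forecast(history, 2)
```

In the setting above, `history` would be the sequence of answers that the historical synopsis returns for the query region at recent timestamps, and the forecast for the query's future timestamp would be read off the iterated values.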

19

Outline

• Introduction
• Related work
• Problem and proposed methods
  – Adaptive multi-dimensional histogram
  – Historical synopsis
  – Prediction model
• Experiment
• Conclusion

20

Experiment settings

• Datasets
  – 2.5M updates for each dataset
  – spatial: 50K mobile objects from 2 spatial datasets
  – road: from a spatio-temporal generator (described in [Brinkhoff 2002])

[Figure: road network and data distribution at the initial, median, and final timestamps.]

21

Robustness with time

[Figure: error rate versus number of location updates (5k to 2.5M) for the spatial and road datasets.]

Query: qlength = 6% of the data space; 25K queries uniformly distributed over space and time

22

Comparison with conventional histogram

• Minskew (a static spatial histogram) is rebuilt every 50k location updates

• tp is the ratio of the cost of AMH to that of Minskew

• The re-organization operations of AMH are uniformly distributed among the 50k location updates.

[Figure: error rate versus time proportion tp (0.001 to 1) for Minskew and AMH on the spatial and road datasets.]

23

The effect of update intensity

• The B-tree performs better at high update rates.

• The R-tree provides much faster query response.

• In general, when the query/update ratio is large (>30%), the R-tree performs better.

[Figure: plots with axes CPU time (msec), error rate, update rate (1k to 100k), and query type (PT, HT, FT), comparing the 3D R-tree and the B-tree on the spatial and road datasets.]

24

Conclusion

• We present a comprehensive approach for processing queries that refer to any time in history.

• The proposed architecture maintains
  – an incremental multi-dimensional histogram;
  – a past index structure for storing the outdated buckets.

• Future queries are answered by a stochastic method that uses the recent history to predict the future.

25

Q+A

26

Summary

• AMH
  – Goal: minimize WVS
  – Information update
  – Reorganization happens when the CPU is idle
• Historical synopsis (AMH and past index)
  – Recent buckets are kept in memory
  – Old buckets are dumped to disk
• Prediction model
  – Forecast based on the present and the past

27

Related work

• Static multi-dimensional histograms
• Query-adaptive multi-dimensional histograms
• Other multi-dimensional approximation methods
• Spatio-temporal prediction methods
• Spatio-temporal aggregation methods

28

Evaluation over different query types

[Figure: error rate versus query length qL (2% to 10%) for the spatial and road datasets.]

29

Motivation (cont.)

• Spatio-temporal database (STDB) research:
  – historical retrieval
  – future prediction

30

Bucket reorganization: Split

[Figure: step-by-step split example. Bucket b* is split into sub-buckets (b*1 and b*2, and then b*3 and b*4), and new nodes n4 and n5 are added to the BPT; the bucket list is updated accordingly.]
