Approximate Querying about the Past, the Present, and the Future in Spatio-Temporal Databases

Jimeng Sun, Dimitris Papadias, Yufei Tao, Bin Liu

Motivation

• Spatio-temporal databases vs. data streams

• Monitoring applications

– Traffic supervision

– Mobile user monitoring

– Weather forecasting

• Example: find the number of vehicles in the city center now

• The challenge is to provide fast query responses in a highly update-intensive environment

Problems and methods

• Problems:

– How to efficiently store and summarize the spatio-temporal information?

– How to approximately answer queries about the past, the present, and the future?

• Methods:

– Adaptive multi-dimensional histogram (AMH)

– Historical synopsis

– Stochastic prediction method

Related work

• Histograms

– Static multi-dimensional histograms: Equi-depth, Mhist, Minskew, Genhist, SQ

– Query-adaptive multi-dimensional histograms: STGrid, STHoles, SASH

• Other approximation methods: DCT, Wavelet, Sketch

• Spatio-temporal databases

– Historical retrieval

– Future prediction

Outline

• Introduction

• Problem and proposed methods

– Adaptive multi-dimensional histogram

– Prediction model

• Experiment

• Conclusion

Query types

• Present Time (PT)

• Historical Time (HT)

• Future Time (FT)

[Figure: queries plotted by location against a time axis running from past through current to future]
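The three query types can be told apart by comparing the query timestamp with the current time. The sketch below is only an illustration under that assumption; the function name `classify_query` and the use of a single timestamp per query are hypothetical and not taken from the slides.

```python
def classify_query(t_query, now):
    """Label a query by its timestamp relative to the current time.

    Assumption for illustration: each query refers to a single timestamp.
    """
    if t_query < now:
        return "HT"  # historical time: answered from stored history
    if t_query > now:
        return "FT"  # future time: answered by prediction
    return "PT"      # present time: answered from the current histogram


print(classify_query(5, now=10), classify_query(10, now=10), classify_query(15, now=10))
```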

System Overview

[Figure: system overview with the components Spatio-temporal updates, Historical Synopsis, Past Index, Prediction Model, and Queries]

Histogram

• Partition the space into buckets

• Data within a bucket is summarized by the mean

• Properties of a good histogram:

– Uniformity within each bucket

– Incrementally updatable
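Because each bucket is summarized by its mean, a count query can be approximated by assuming objects are spread uniformly inside every bucket. The following is a minimal sketch of that estimate; the `Bucket` class, the query tuple layout `(x_lo, x_hi, y_lo, y_hi)`, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    count: float  # number of objects currently inside the bucket

    def area(self) -> float:
        # Assumes a non-degenerate rectangle (positive area).
        return (self.x_hi - self.x_lo) * (self.y_hi - self.y_lo)


def overlap_area(bucket, query):
    """Area of the intersection between a bucket and a query rectangle."""
    qx_lo, qx_hi, qy_lo, qy_hi = query
    width = min(bucket.x_hi, qx_hi) - max(bucket.x_lo, qx_lo)
    height = min(bucket.y_hi, qy_hi) - max(bucket.y_lo, qy_lo)
    return max(width, 0.0) * max(height, 0.0)


def estimate_count(buckets, query):
    """Sum each bucket's count scaled by the fraction of its area covered."""
    return sum(b.count * overlap_area(b, query) / b.area() for b in buckets)


buckets = [Bucket(0, 10, 0, 10, 100), Bucket(10, 20, 0, 10, 20)]
print(estimate_count(buckets, (5, 15, 0, 10)))
```

The estimate is exact only when a bucket really is uniform, which is why uniformity within each bucket is listed as a property of a good histogram.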

Adaptive Multi-dimensional Histogram (AMH)

• Objective: minimize $\mathrm{WVS} = \sum_i \mathrm{area}_i \cdot \mathrm{var}_i$ (Minskew [Acharya, Poosala, Ramaswamy 99])

[Figure: a grid of regular cells with per-cell frequencies, grouped into buckets]
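To make the objective concrete, the sketch below evaluates WVS for a given grouping of cells into buckets. It assumes that the area of a bucket is its number of regular cells and that its variance is the population variance of the cell frequencies it covers; the cell values are made up for the example.

```python
def bucket_variance(cells):
    """Population variance of the cell frequencies inside one bucket."""
    n = len(cells)
    mean = sum(cells) / n
    return sum((c - mean) ** 2 for c in cells) / n


def wvs(buckets):
    """Sum over buckets of area (number of cells) times variance."""
    return sum(len(cells) * bucket_variance(cells) for cells in buckets)


# Two buckets: one perfectly uniform, one with frequencies 3 and 5.
print(wvs([[1, 1, 1, 1], [3, 5]]))
# The same six cells lumped into a single bucket give a larger WVS.
print(wvs([[1, 1, 1, 1, 3, 5]]))
```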

Dynamic Maintenance of AMH

• Our scheme: record information during construction and modify the structure as needed.

– 1. Information update

• Update the bucket count

– 2. Bucket reorganization

• Merge: to reclaim buckets

• Split: to reduce WVS

Information update of AMH

[Figure: cells mapped to their buckets, e.g. b1]
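An information update can be sketched as follows: each cell maps to one bucket, and a location update adjusts the bucket's count and second moment incrementally, so the variance never has to be recomputed from scratch. The names `BucketStats`, `apply_delta`, and `move_object`, and the dictionary-based cell-to-bucket mapping, are assumptions for illustration.

```python
class BucketStats:
    def __init__(self, n_cells):
        self.area = n_cells  # number of regular cells in the bucket
        self.count = 0       # sum of the cell frequencies
        self.sq_sum = 0      # sum of squared cell frequencies (2nd moment)

    def variance(self):
        mean = self.count / self.area
        return self.sq_sum / self.area - mean * mean


def apply_delta(cells, cell_to_bucket, buckets, cell, delta):
    """Change one cell's frequency by delta and patch its bucket's statistics."""
    old = cells[cell]
    new = old + delta
    stats = buckets[cell_to_bucket[cell]]
    stats.count += delta
    stats.sq_sum += new * new - old * old
    cells[cell] = new


def move_object(cells, cell_to_bucket, buckets, old_cell, new_cell):
    """A location update: one object leaves old_cell and enters new_cell."""
    apply_delta(cells, cell_to_bucket, buckets, old_cell, -1)
    apply_delta(cells, cell_to_bucket, buckets, new_cell, +1)


cells = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
cell_to_bucket = {(0, 0): "b1", (0, 1): "b1", (1, 0): "b2", (1, 1): "b2"}
buckets = {"b1": BucketStats(2), "b2": BucketStats(2)}
for _ in range(3):
    apply_delta(cells, cell_to_bucket, buckets, (0, 0), +1)
move_object(cells, cell_to_bucket, buckets, (0, 0), (1, 0))
print(buckets["b1"].count, buckets["b2"].count)
```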

Bucket reorganization -Merge

Bucket info:

1. Region: [x-, x+] [y-, y+]

2. Frequency: count/area

3. 2nd moment (for variance calculation)

• Merge the subtree that leads to minimal WVS increase
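With area, count, and second moment stored per bucket, the WVS increase of a merge can be computed without touching individual cells, since area times variance equals the second moment minus count squared over area. The sketch below picks the cheapest merge among candidate pairs; treating candidates as pairs of sibling buckets, and the names `Stats`, `merge_cost`, and `best_merge`, are assumptions for illustration.

```python
from collections import namedtuple

# area: number of cells, count: sum of frequencies, sq_sum: sum of squared frequencies
Stats = namedtuple("Stats", "area count sq_sum")


def bucket_wvs(s):
    """area * variance, expressed through the stored moments."""
    return s.sq_sum - s.count ** 2 / s.area


def merge(a, b):
    return Stats(a.area + b.area, a.count + b.count, a.sq_sum + b.sq_sum)


def merge_cost(a, b):
    """WVS increase caused by replacing buckets a and b with their union."""
    return bucket_wvs(merge(a, b)) - bucket_wvs(a) - bucket_wvs(b)


def best_merge(candidate_pairs):
    """Pick the pair whose merge increases WVS the least."""
    return min(candidate_pairs, key=lambda pair: merge_cost(*pair))


b1 = Stats(2, 2, 2)    # cells [1, 1]
b2 = Stats(2, 2, 2)    # cells [1, 1]
b3 = Stats(2, 10, 50)  # cells [5, 5]
print(merge_cost(b1, b2), merge_cost(b1, b3))
```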

Bucket reorganization -Split

[Figure: splitting a bucket; labels b2, b5, b*, b*3, b*4]

• Split the bucket that leads to maximal WVS decrease
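For a single bucket, the benefit of a split can be sketched by trying every axis-parallel cut through its cells and measuring how much WVS drops. This is a brute-force illustration under the assumption that a bucket is a rectangular block of regular cells; `best_split` and its return format are hypothetical.

```python
def cells_wvs(cells):
    """area * variance for a flat list of cell frequencies."""
    if not cells:
        return 0.0
    mean = sum(cells) / len(cells)
    return sum((c - mean) ** 2 for c in cells)


def best_split(grid):
    """Return (wvs_decrease, axis, position) of the best axis-parallel cut.

    grid is a list of rows of cell frequencies covering one bucket.
    Returns (0.0, None, None) when no cut reduces WVS.
    """
    rows, cols = len(grid), len(grid[0])
    total = cells_wvs([c for row in grid for c in row])
    best = (0.0, None, None)
    for pos in range(1, rows):  # cut between row pos-1 and row pos
        top = [c for row in grid[:pos] for c in row]
        bottom = [c for row in grid[pos:] for c in row]
        decrease = total - cells_wvs(top) - cells_wvs(bottom)
        if decrease > best[0]:
            best = (decrease, "row", pos)
    for pos in range(1, cols):  # cut between column pos-1 and column pos
        left = [c for row in grid for c in row[:pos]]
        right = [c for row in grid for c in row[pos:]]
        decrease = total - cells_wvs(left) - cells_wvs(right)
        if decrease > best[0]:
            best = (decrease, "col", pos)
    return best


print(best_split([[1, 1, 5, 5], [1, 1, 5, 5]]))
```

Running the same routine over all buckets and splitting the one with the largest decrease matches the rule on the slide.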

Features of AMH

• Bucket information is updated as new data arrive

• Bucket extents continuously adapt to changes in the data distribution

• Maintenance does not affect normal query processing

– It can be interrupted at any moment

– It is performed during CPU idle time


Historical Synopsis

• AMH maintains the current buckets.

• Past index stores the obsolete buckets.

• Past index:

– Packed B-tree

– 3D R-tree

[Figure: incoming streams update the current cells and current buckets; recent buckets stay in main memory, while old buckets are stored on disk in the past index T]
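The division between recent buckets in main memory and old buckets on disk can be sketched with a bounded in-memory buffer that spills its oldest entries. This is only an illustration: the class `HistoricalSynopsis`, its methods, the per-bucket lifespan `[t_start, t_end)`, and the plain list standing in for the disk-based past index (a packed B-tree or 3D R-tree on the slide) are all assumptions.

```python
from collections import deque


class HistoricalSynopsis:
    def __init__(self, memory_capacity):
        self.capacity = memory_capacity
        self.recent = deque()  # obsolete buckets still held in main memory
        self.disk = []         # stand-in for the disk-based past index

    def retire(self, bucket, t_start, t_end):
        """Record a bucket that was valid during [t_start, t_end)."""
        self.recent.append((t_start, t_end, bucket))
        while len(self.recent) > self.capacity:
            # Oldest recent bucket is dumped to disk.
            self.disk.append(self.recent.popleft())

    def buckets_at(self, t):
        """Buckets whose lifespan contains timestamp t (linear scan for brevity)."""
        entries = list(self.disk) + list(self.recent)
        return [b for (start, end, b) in entries if start <= t < end]


synopsis = HistoricalSynopsis(memory_capacity=2)
synopsis.retire("a", 0, 5)
synopsis.retire("b", 5, 9)
synopsis.retire("c", 9, 12)
print(len(synopsis.disk), len(synopsis.recent), synopsis.buckets_at(3))
```

A real past index would replace the linear scan with a lookup on time (and space), which is the role of the index structures named above.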

Prediction Model

• Prediction based on velocity doesn’t work!

– It is unrealistic to assume that velocity remains constant between the current time and the query time

– Velocity is highly dynamic

• We suggest using only the past and present location information for prediction.

Prediction Model (cont.)

• Forecast the future using any time-series prediction method; we use AR

[Figure: the Prediction Model takes the Historical Synopsis as input and produces the results]
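As an illustration of forecasting from history alone, the sketch below fits a first-order autoregressive model x_t = c + phi * x_{t-1} by least squares to a series of past values (for example, one bucket's counts over time) and iterates it forward. The slides only state that AR is used; the order 1, the intercept term, and the function names are assumptions.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        # Constant history: fall back to predicting the mean.
        return mean_curr, 0.0
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi


def forecast(series, steps):
    """Predict the next `steps` values by iterating the fitted model."""
    c, phi = fit_ar1(series)
    value = series[-1]
    predictions = []
    for _ in range(steps):
        value = c + phi * value
        predictions.append(value)
    return predictions


print(forecast([1, 2, 4, 8, 16], steps=2))
```

Note that the model uses no velocity information, only past and present values, in line with the argument on the previous slide.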


Experiment settings

• Datasets

– 2.5M updates for each dataset

– spatial: 50K mobile objects from 2 spatial datasets

– road: from a spatio-temporal generator (described in [Brinkhoff 2002])

[Figure: road network and data distribution; panels labeled initial, median, and final]

Robustness with time

[Figure: error rate vs. number of location updates (0.5M to 2.5M); panels include spatial]

• Query: qlength = 6% of the data space; 25K queries uniformly distributed over space and time

Comparison with conventional histogram

• Minskew (a static spatial histogram) is rebuilt every 50k location updates

• tp is the ratio of the cost of AMH to that of Minskew

• The re-organization operations of AMH are uniformly distributed among the 50k location updates.

[Figure: error rate vs. time proportion (0.001 to 1); panels include spatial; legend includes minskew]

The effect of update intensity

• The B-tree performs better at high update rates.

• The R-tree provides much faster query response.

• In general, when the query/update ratio is large (>30%), the R-tree performs better.

[Figure: CPU time (msec) by query type (PT, HT, FT) for the 3D R-tree and the B-tree; error rate vs. update rate (1k to 100k) on the spatial and road datasets]

Conclusion

• We present a comprehensive approach for processing queries that refer to any time in history.

• The proposed architecture maintains

– an incremental multi-dimensional histogram;

– a past index structure for storing the outdated buckets.

• Future queries are answered by a stochastic method that uses the recent history to predict the future.

Summary

• AMH: (0) goal: min(WVS); (1) information update; (2) reorganization happens when the CPU is idle

• Historical Synopsis and Past Index: (1) recent buckets are kept in memory; (2) old buckets are dumped to disk

• Prediction Model: forecast based on the present and past

Related work

• Static multi-dimensional histograms

• Query-adaptive multi-dimensional histograms

• Other multi-dimensional approximation methods

• Spatio-temporal prediction methods

• Spatio-temporal aggregation methods

Evaluation over different query types

[Figure: error rate vs. L (2% to 10%); panels include spatial]

Motivation (cont.)

• Spatio-temporal database (STDB) research:

– Historical retrieval

– Future prediction

