BMQ-Index: Shared and Incremental Processing of Border Monitoring Queries over Data Streams
Jinwon Lee, Y. Lee, S. Kang, S. Lee, H. Jin, B. Kim and J. Song
(Korea Advanced Institute of Science and Technology)
Outline
• Border Monitoring Query (BMQ)
• BMQ-Index
• Experiments
• Related work
• Conclusion
Emerging Computing Environment
GPSs, Sensors → Data stream monitoring
Data stream: 11 10 12 13 12 14
Continuous range queries: Q1: 10 < value; Q2: 11 < value < 13; …
Application areas
– Remote Medical Service
– Disaster Prevention
• Flood Warning
• Earthquake Prediction
• Building Monitoring
• Traffic light control
– Automatic Home
• Automatic Ventilation
• Automatic Temperature Control
• Automatic Humidity Control
– Logistics
• Management
• Thief-proofing
• Catalog
• Advertisement
– Location-based Service
• Tracking (Friends, Employee)
• Vehicle Monitoring
• Intelligent Transportation
Motivating Service Scenario #1
Stock trading
[Figure: SAMSUNG stock price during 23 days from Nov. 16th to Dec. 23rd, 2005. x-axis: Time; y-axis: Data stream value ($), 580 to 660. Two borders are marked, "Expensive !! ( > $640)" and "Cheap !! ( < $600)", with "buy" and "sell" annotations along the price curve.]
Monitor stock data streams crossing the borders !!
Motivating Service Scenario #2
Location-based advertisement
Send a special lunch menu to people within 1km during lunch time !!
[Figure: people "Coming into" and "Going out" of the 1km range; other labels: Coupon, Pet-Care.]
Monitor location data streams crossing the borders !!
Border Monitoring Query
To monitor data streams crossing the borders
– Essential concern in many practical applications
• Users’ main interest
• Useful to automatically trigger or stop relevant actions
BMQ (Border Monitoring Query)
– A new type of continuous range query !!
– It reports only data crossing the borders of a query range (= coming into or going out from the query range)
RMQ (Region Monitoring Query)
– Conventional continuous range query
– It reports all matching data within a query range
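The difference between the two query types can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names `rmq` and `bmq` are hypothetical, the stream and the query Q2 (11 < value < 13) are the example values from the earlier slide, and the sketch assumes that the first tuple of a stream produces no border report because it has no previous value.

```python
from typing import Callable, List, Sequence, Tuple

Predicate = Callable[[float], bool]


def rmq(stream: Sequence[float], inside: Predicate) -> List[Tuple[int, float]]:
    """Region monitoring: report every tuple that lies within the query range."""
    return [(t, v) for t, v in enumerate(stream) if inside(v)]


def bmq(stream: Sequence[float], inside: Predicate) -> List[Tuple[int, float, str]]:
    """Border monitoring: report only tuples that come into (+) or go out of (-) the range."""
    events = []
    prev_inside = None  # assumption: the first tuple has no predecessor, so nothing is reported
    for t, v in enumerate(stream):
        now_inside = inside(v)
        if prev_inside is not None and now_inside != prev_inside:
            events.append((t, v, "+" if now_inside else "-"))
        prev_inside = now_inside
    return events


stream = [11, 10, 12, 13, 12, 14]   # example data stream from the slides
q2 = lambda v: 11 < v < 13          # Q2: 11 < value < 13

print(rmq(stream, q2))  # [(2, 12), (4, 12)]
print(bmq(stream, q2))  # [(2, 12, '+'), (3, 13, '-'), (4, 12, '+'), (5, 14, '-')]
```

Evaluating `bmq` separately for every registered query at every update corresponds to the naïve approach; the BMQ-Index is meant to avoid exactly that per-query work.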
Problem: Scalability !!
A large number of BMQs can be issued
• Millions of stock investors will register their own queries
• Millions of stores will register their own queries
+ A huge volume of data streams is rapidly incoming
+ Fast response is also essential for users
How can we process BMQs over data streams efficiently?
– (1) Naïve approach
• Individual BMQ processing at each data update → Lack of scalability !!
– (2) Based on existing mechanisms for RMQ evaluation
• Shared RMQ processing by indexing queries → Costly post-processing !!
Solution Approach: BMQ-Index
Shared processing
– By query indexing approach
• BMQ-Index is built on registered BMQs
• Upon a data arrival, only border-crossed queries are quickly searched for
• Achieves a high level of scalability !!
[Figure: a data tuple (14) is fed to the BMQ-Index, which is built on the registered BMQs (Q1: 10 < value; Q2: 11 < value < 13; …); output: Q1, Q2 (border-crossed queries).]
Solution Approach: BMQ-Index
Incremental processing
– By incremental access method
• Use previous search step for the next search → Successive searches are significantly accelerated !!
• Keep only the information needed for incremental search → Low storage cost !!
[Figure: a series of data tuples (10 12 13 12 14), labeled "Locality of data streams !!", is fed to the BMQ-Index, which is built on the registered BMQs (Q1: 10 < value; Q2: 11 < value < 13; …); output: Q1, Q2 (border-crossed queries).]
One-dimensional BMQ-Index (Example)
Notify me whenever the IBM stock price is coming into or going out from my reasonable price range !! (reasonable price range: $10 to $30)
Registered BMQs (unit: $)
Q1 0 10
Q2 0 25
Q3 5 20
Q4 15 30
Q5 35 45
[Figure: the BMQ-Index as a linked list of border nodes (0, 5, 10, 15, 20, 25, 30, 35, 45, plus an ∞ end node), each node annotated with the queries whose range starts (+Q) or ends there. A Stream Table maps each Stream_ID (e.g. IBM) to a node pointer into the linked list.]
Search Operation in One-dimension (Example)
Stream Table: Stream_ID IBM → node pointer; previous data value (vt-1) = 21
Case 1) 21 → 23: No border-crossed query; no node traversal
Case 2) 21 → 37: -Q2, -Q4, +Q5; traverse BMQ-Index to the right
Case 3) 21 → 8: +Q3, -Q4, +Q1; traverse BMQ-Index to the left
[Figure: the linked list of the one-dimensional example, with markers for the previous data value (vt-1) and the current data value (vt).]
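The three cases can be reproduced with a small Python sketch of a one-dimensional index. It is a minimal illustration under stated assumptions, not the authors' data structure: a sorted Python list with an integer position stands in for the linked list and its node pointer, query ranges are treated as half-open (lo <= value < hi), the first tuple of a stream only initializes the pointer, and a query that is both entered and left within one step is cancelled. The class name `BMQIndex1D` and its methods are hypothetical; the query ranges are those read from the example figure.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


class BMQIndex1D:
    def __init__(self, queries: Dict[str, Tuple[float, float]]):
        # Each border value collects the queries that start (+) or end (-) there.
        deltas = defaultdict(list)
        for qid, (lo, hi) in queries.items():
            deltas[lo].append(("+", qid))
            deltas[hi].append(("-", qid))
        self.borders = sorted(deltas)                      # stands in for the linked list
        self.dqsets = [deltas[b] for b in self.borders]
        self.stream_table: Dict[str, int] = {}             # stream_id -> position ("node pointer")

    def update(self, stream_id: str, value: float) -> List[str]:
        """Return the border-crossed queries for the new value of a stream."""
        if stream_id not in self.stream_table:
            # Assumption: the first tuple only positions the pointer (one full search).
            self.stream_table[stream_id] = bisect_right(self.borders, value)
            return []
        p = self.stream_table[stream_id]
        crossed: Dict[str, str] = {}

        def record(sign: str, qid: str) -> None:
            if qid in crossed:
                del crossed[qid]        # entered and left in the same step: cancel
            else:
                crossed[qid] = sign

        # Traverse to the right: borders are crossed in increasing order.
        while p < len(self.borders) and self.borders[p] <= value:
            for sign, qid in self.dqsets[p]:
                record(sign, qid)
            p += 1
        # Traverse to the left: the meaning of each border is reversed.
        while p > 0 and self.borders[p - 1] > value:
            for sign, qid in self.dqsets[p - 1]:
                record("-" if sign == "+" else "+", qid)
            p -= 1
        self.stream_table[stream_id] = p
        return [sign + qid for qid, sign in crossed.items()]


queries = {"Q1": (0, 10), "Q2": (0, 25), "Q3": (5, 20), "Q4": (15, 30), "Q5": (35, 45)}
index = BMQIndex1D(queries)
for sid in ("a", "b", "c"):
    index.update(sid, 21)               # previous data value 21 for three test streams

print(index.update("a", 23))            # []
print(index.update("b", 37))            # ['-Q2', '-Q4', '+Q5']
print(index.update("c", 8))             # ['+Q3', '-Q4', '+Q1']
```

Only the borders lying between the previous and the current value are visited, which is why small fluctuations (Case 1) cost almost nothing.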
Multi-dimensional BMQ-Index
Stream Table
StreamID V PX PY
s1 (vX1, vY1) RS-X2 RS-Y2
s2 (vX2, vY2) RS-X3 RS-Y5
s3 (vX3, vY3) RS-X5 RS-Y4
Query Table
QueryID Range
Q1 (bX1, bX3, bY1, bY4)
Q2 (bX2, bX6, bY2, bY6)
Q3 (bX4, bX5, bY3, bY5)
[Figure: queries Q1, Q2 and Q3 drawn as rectangles over the borders bX0 to bX7 and bY0 to bY7, together with the stream values v(s1), v(s2) and the successive values v1(s3), v2(s3), v3(s3). The RS-X List holds the segments RS-X1 to RS-X7 and the RS-Y List holds RS-Y1 to RS-Y7; each segment carries a +DQSet and a -DQSet (one of {Q1}, {Q2}, {Q3} or {}).]
Search Operation in Multi-dimension
Overall flow
– Per-dimension search: for the input (xc, yc), RS-X list.search() on xc yields ±XQSet and RS-Y list.search() on yc yields ±YQSet
– Validation through cross-check: ±XQSet is cross-checked with the Y-dimension, giving ±XBMQSet; ±YQSet is cross-checked with the X-dimension, giving ±YBMQSet
– Union of per-dimension results: ±XBMQSet and ±YBMQSet are merged into ±QSet
Performance Analysis (d-dimension)
– Search performance: ((d–1)d) × one-dimensional search time
– Storage cost: d × one-dimensional storage cost
Experiments
Workload generation
– Stock trading scenario (one-dimensional case)
• Data stream generation (Korea stock market[9])
  – Fluctuation level (FL): 0.01% ~ 0.1%
  – 2000 stream sources, 1000 tuples in each stream
• Query generation
  – Lower bound: randomly chosen (1 ~ 10^6)
  – Width of queries: 1 ~ 10 times larger than FL
  – Number of queries: 10,000 ~ 100,000
Comparisons
– An approach based on state-of-the-art RMQ indexes (CEI [CIKM’05] and IS-list [Information Systems’96])
Performance metrics
– Average search time per data tuple (milliseconds)
– Index storage size (MB)
Search performance
[Figure, two panels, each comparing BMQ-Index, CEI-based and IS-list-based.
Panel 1: Effects of the number of queries (W=0.1%, FL=0.01%); x-axis: Number of queries (0 to 100000); y-axis: Average search time (ms), 0 to 100.
Panel 2: Effects of the widths of queries (N=100000, FL=0.01%); x-axis: Width of queries (0 to 0.1); y-axis: Average search time (ms), 0 to 100.]
Storage cost
• BMQ-Index: twice
• IS-list: log (# of queries) times
• CEI: all grids covered by a query range
[Figure, two panels, each comparing BMQ-Index, CEI-based and IS-list-based.
Panel 1: Effects of the number of queries (W=0.1%); x-axis: Number of queries (0 to 100000); y-axis: Index storage size (MB), 0 to 80.
Panel 2: Effects of the widths of queries (N=100000); x-axis: Width of queries (0 to 0.1); y-axis: Index storage size (MB), 0 to 80.]
Related Work
Semantics
– CQL (Continuous Query Language developed by STREAM project)
• General concept to transform a Relation to a Stream
• BMQ is a specific class of continuous range query
Shared and Incremental Processing (previous research: difference)
Data stream processing
– Tree-based (1-D: [2][4][5][14]): O(log N) search performance; O(N log N) storage cost
– Grid-based (1-D: [17], 2-D: [6][13]): better search performance than tree-based; requires more storage cost
Spatio-temporal database
– SINA[11] (shared and incremental): disk-based algorithm; not purely incremental access method
– GPAC[12] (incremental): not for shared processing
Generally not feasible for BMQs !!
Conclusion
Summary
– Characterize a new type of continuous range query
• Border Monitoring Query (BMQ)
• Useful and practical in many emerging applications
– One- and multi-dimensional BMQ-Index
• Evaluates a large number of BMQs in a shared and incremental manner, thereby achieving excellent search performance and low storage cost
Thank you
Questions?
Backup slide
Performance Analysis
1-dimensional BMQ-Index
– Search performance: (2 Nq FL)
– Storage cost: (2Nq + Nd)
d-dimensional BMQ-Index
– Search performance: (((d–1)d) 2Nq FL), only 2 times the one-dimensional case when d=2
– Storage cost: (d(2Nq + Nd) + Nq)
Nq = Number of queries
Nd = Number of data streams
FL = Fluctuation level
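As a rough worked example, the expressions above can be evaluated for the largest workload of the experiments (Nq = 100,000 queries, Nd = 2,000 streams, FL = 0.01%). The sketch below only plugs numbers into the slide's terms. The function names are hypothetical, and two readings are assumptions rather than statements from the slides: FL is taken as a fraction (0.0001), and the results are counted in abstract units (border visits for search, stored entries for storage).

```python
def search_cost_1d(nq: int, fl: float) -> float:
    """Slide term 2 * Nq * FL for the one-dimensional index."""
    return 2 * nq * fl


def storage_cost_1d(nq: int, nd: int) -> int:
    """Slide term 2 * Nq + Nd: two borders per query plus one table entry per stream."""
    return 2 * nq + nd


def search_cost(d: int, nq: int, fl: float) -> float:
    """Slide term (d - 1) * d * 2 * Nq * FL, intended for d >= 2."""
    return (d - 1) * d * search_cost_1d(nq, fl)


def storage_cost(d: int, nq: int, nd: int) -> int:
    """Slide term d * (2 * Nq + Nd) + Nq."""
    return d * storage_cost_1d(nq, nd) + nq


nq, nd, fl = 100_000, 2_000, 0.0001     # FL = 0.01% read as a fraction (assumption)

print(storage_cost_1d(nq, nd))          # 202000
print(storage_cost(2, nq, nd))          # 504000
print(search_cost(2, nq, fl) / search_cost_1d(nq, fl))   # ratio for d = 2
```

Under these readings the one-dimensional search term is about 20 per tuple, and the two-dimensional term is twice that, matching the "only 2 times" remark.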
Cross checking
Algorithm
– For +XQSet: check whether vt is located between the Y predicates
– For –XQSet: check whether vt-1 is located between the Y predicates
– ±YQSet is checked with the X-dimension in a similar manner
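The cross-check can be sketched for the two-dimensional case as follows. This is an illustrative sketch, not the authors' code: a linear scan over the queries stands in for the RS-X and RS-Y list searches, ranges are treated as half-open, and the numeric borders (bXi = i, bYi = i) are made-up values chosen to mirror the Query Table. All function names are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Rect = Tuple[float, float, float, float]     # (x_lo, x_hi, y_lo, y_hi), as in the Query Table
Point = Tuple[float, float]


def in_x(q: Rect, p: Point) -> bool:
    return q[0] <= p[0] < q[1]


def in_y(q: Rect, p: Point) -> bool:
    return q[2] <= p[1] < q[3]


def per_dimension_search(queries: Dict[str, Rect], prev: Point, cur: Point,
                         inside: Callable[[Rect, Point], bool]) -> Tuple[Set[str], Set[str]]:
    """Stand-in for RS-X / RS-Y list.search(): queries entered (+) or left (-) in one dimension."""
    plus, minus = set(), set()
    for qid, q in queries.items():
        was, now = inside(q, prev), inside(q, cur)
        if now and not was:
            plus.add(qid)
        elif was and not now:
            minus.add(qid)
    return plus, minus


def cross_check(queries: Dict[str, Rect], plus: Set[str], minus: Set[str], prev: Point,
                cur: Point, inside_other: Callable[[Rect, Point], bool]) -> Tuple[Set[str], Set[str]]:
    """Keep +queries whose other dimension contains vt, and -queries whose other dimension contained vt-1."""
    plus_ok = {qid for qid in plus if inside_other(queries[qid], cur)}
    minus_ok = {qid for qid in minus if inside_other(queries[qid], prev)}
    return plus_ok, minus_ok


def search_2d(queries: Dict[str, Rect], prev: Point, cur: Point) -> Tuple[Set[str], Set[str]]:
    px, mx = per_dimension_search(queries, prev, cur, in_x)
    py, my = per_dimension_search(queries, prev, cur, in_y)
    pxb, mxb = cross_check(queries, px, mx, prev, cur, in_y)   # X results validated with Y
    pyb, myb = cross_check(queries, py, my, prev, cur, in_x)   # Y results validated with X
    return pxb | pyb, mxb | myb                                # union of per-dimension results


# Hypothetical numeric borders: bXi = i and bYi = i.
queries = {"Q1": (1, 3, 1, 4), "Q2": (2, 6, 2, 6), "Q3": (4, 5, 3, 5)}

print(search_2d(queries, (2.5, 2.5), (4.5, 4.5)))   # ({'Q3'}, {'Q1'})
print(search_2d(queries, (0.5, 5.5), (2.5, 5.5)))   # ({'Q2'}, set())
```

In the second call Q1 is entered along X, but the current value lies outside Q1's Y predicates, so the cross-check discards it.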