Evaluating Window Joins over Punctuated Streams
Many slides taken from a talk by Luping Ding and Elke A. Rundensteiner, CIKM04, Database Systems Research Group, Worcester Polytechnic Institute. Topics include stream data processing, online transaction management, and sensor network monitoring.
![Page 1: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/1.jpg)
23/4/19 CIKM'04 1
Evaluating Window Joins over Punctuated Streams
Many slides taken from a talk by Luping Ding and Elke A. Rundensteiner, CIKM04
Database Systems Research Group, Worcester Polytechnic Institute
![Page 2: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/2.jpg)
Stream Data Processing
[Figure: continuous queries are registered with a stream query engine, which consumes streaming data and emits streaming results.]
• Network Usage Analysis
• Online Transaction Management
• Sensor Network Monitoring
• Online Auction
![Page 3: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/3.jpg)
New Challenges in Stream Context
Potentially infinite data streams vs. stateful operators, e.g., join, distinct, …
Problem: potentially unbounded state
Reason: no hint on which data is no longer useful
![Page 4: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/4.jpg)
Example: Symmetric Hash Join [WA93]
Memory overflow resolution: state relocation
Examples: XJoin [UF00], Hash-Merge Join [MLA04]
Problems:
• Join state still grows without bound
• Delivery of some join results may be highly deferred
[Figure: symmetric hash join. Each tuple from stream A or B is inserted into its own in-memory state (SA or SB) and probes the opposite state; a memory overflow is marked on the states.]
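The insert-then-probe behaviour of a symmetric hash join can be sketched in a few lines of Python. This is only an illustration, assuming tuples are (key, payload) pairs and that all state stays in memory; the class and method names are made up for this sketch and do not come from the talk.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy in-memory symmetric hash join (illustrative names, not from the talk)."""

    def __init__(self):
        # One hash table per input stream: join key -> list of stored payloads.
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}

    def on_tuple(self, side, key, payload):
        """Insert the tuple into its own state, then probe the opposite state."""
        other = "B" if side == "A" else "A"
        self.state[side][key].append(payload)
        matches = self.state[other].get(key, [])
        if side == "A":
            return [(key, payload, m) for m in matches]
        return [(key, m, payload) for m in matches]

    def state_size(self):
        """Total number of tuples held in both states."""
        return sum(len(v) for table in self.state.values() for v in table.values())


join = SymmetricHashJoin()
results = []
arrivals = [("A", 175, 20.0), ("B", 175, 80.0), ("B", 2, 63.0), ("A", 175, 100.0)]
for side, key, payload in arrivals:
    results.extend(join.on_tuple(side, key, payload))
```

Nothing is ever removed from either table in this sketch, so `state_size()` only grows; that is the unbounded-state problem this slide describes.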
![Page 5: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/5.jpg)
Avoiding Unbounded State
Solution: exploit constraints to detect no-longer-useful data
• Sliding window [MWA+03]: identify a bounded set of input data based on time
• K-constraint [BW04]: models clustered or ordered data arrival patterns
• Punctuation [TMS+03]: dynamically announce the termination of certain values
![Page 6: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/6.jpg)
Sliding Window [KNV03]
[Figure: Stream A and Stream B on a shared timeline, with window Wa over Stream A and window Wb over Stream B.]
![Page 7: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/7.jpg)
Punctuation
Meta-knowledge embedded inside data streams
An ordered set of patterns corresponding to attributes of tuples
Pattern types: wildcard (*), constant (9), list ({1,2,3}), range ([1, 20]), empty ()
Semantics: tuples arriving after a punctuation p will NOT match p
Example (Bid stream):
180 Marlie 820.00 Nov-13-03 11:02:00
182 Ultrasale 1000.00 Nov-13-03 11:05:00
180 Jocelyn 850.00 Nov-13-03 11:14:00
180 * * * (punctuation: no more tuples will contain Item_id 180)
…
181 pcfan 50.00 Nov-13-03 11:36:00
…
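The five pattern types can be made concrete with a small matcher. The sketch below is an assumption about representation rather than the authors' implementation: a wildcard is the string "*", a list pattern is a Python set, a range is an inclusive (low, high) tuple, the empty pattern is None, and anything else is treated as a constant. The function names are hypothetical.

```python
WILDCARD = "*"


def pattern_matches(pattern, value):
    """Return True if one attribute value matches one pattern."""
    if pattern is None:                        # empty pattern: matches nothing
        return False
    if isinstance(pattern, (set, frozenset)):  # list pattern, e.g. {1, 2, 3}
        return value in pattern
    if isinstance(pattern, tuple):             # range pattern (low, high), inclusive
        low, high = pattern
        return low <= value <= high
    if pattern == WILDCARD:                    # wildcard: matches any value
        return True
    return pattern == value                    # constant pattern


def matches(tup, punctuation):
    """A tuple matches a punctuation if every attribute matches its pattern."""
    return len(tup) == len(punctuation) and all(
        pattern_matches(p, v) for p, v in zip(punctuation, tup)
    )


punct_180 = (180, WILDCARD, WILDCARD, WILDCARD)
bid_jocelyn = (180, "Jocelyn", 850.00, "Nov-13-03 11:14:00")
bid_pcfan = (181, "pcfan", 50.00, "Nov-13-03 11:36:00")
```

With these values, `matches(bid_jocelyn, punct_180)` holds and `matches(bid_pcfan, punct_180)` does not, so the pcfan bid may legally arrive after the punctuation while another bid on item 180 may not.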
![Page 8: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/8.jpg)
Punctuation-Aware Join [DMR+04]
[Figure: join on item_id between Stream A and Stream B, with join states SA and SB. Sample rows include (181, 50.00), (175, 20.00), (180, 135.00), (158, 310.00), (2, 63.00), (175, 80.00), (1, 200.00) and (175, 100.00). A punctuation (175, *) carries the note "No more tuple will have A = 175", and the rows with value 175 are shown in the join states.]
![Page 9: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/9.jpg)
Features of Punctuation
Purge rule. For any tuple ta from stream A, if there exists a punctuation pb that has already been received from stream B such that match(ta, pb), then ta will not join with any future tuples arriving from stream B, so ta does not need to be maintained in the A state after being processed.
Propagation rule. The join operator can also propagate punctuations to the output stream in order to help downstream operators.
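A minimal sketch of the purge rule follows. It assumes punctuations are constant patterns on the join attribute only, so the punctuations received from stream B can be summarised as a set of punctuated join values; the function name and tuple layout are illustrative.

```python
def purge_a_state(a_state, punctuated_b_values):
    """Split the A state into tuples to keep and tuples the purge rule removes.

    a_state: list of (join_value, payload) tuples from stream A.
    punctuated_b_values: join values already punctuated by stream B (assumed
    representation; real punctuations may use richer patterns).
    """
    kept = [t for t in a_state if t[0] not in punctuated_b_values]
    purged = [t for t in a_state if t[0] in punctuated_b_values]
    return kept, purged


a_state = [(181, 50.0), (175, 20.0), (180, 135.0)]
kept, purged = purge_a_state(a_state, punctuated_b_values={175})
```

Here only the tuple with join value 175 is purged, because stream B has promised that no further tuple with that value will arrive.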
![Page 10: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/10.jpg)
Based on punctuation semantics, we derive the following theorem as the foundation of our punctuation propagation algorithm.
Theorem 3.1. Let pa and pb be punctuations retrieved from streams A and B at times TSa and TSb respectively, both specifying the same punctuated value val of the join attribute att. Then no output tuples with val as the value of attribute att will be generated after time max(TSa, TSb).
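Theorem 3.1 suggests a simple bookkeeping scheme: remember when each stream punctuated a value, and emit an output punctuation once both have. The tracker below is a hypothetical sketch of that idea, not the propagation algorithm from the paper; it assumes constant punctuations on the join attribute.

```python
class PropagationTracker:
    """Tracks per-stream punctuation times for each join value (illustrative)."""

    def __init__(self):
        self.seen = {"A": {}, "B": {}}  # stream -> {join value: timestamp}

    def on_punctuation(self, side, value, ts):
        """Record a punctuation; return (value, max(TSa, TSb)) once both streams
        have punctuated the value, otherwise None."""
        self.seen[side].setdefault(value, ts)
        other = "B" if side == "A" else "A"
        if value in self.seen[other]:
            return value, max(self.seen["A"][value], self.seen["B"][value])
        return None


tracker = PropagationTracker()
first = tracker.on_punctuation("A", 175, 10)
second = tracker.on_punctuation("B", 180, 12)
third = tracker.on_punctuation("B", 175, 14)
```

Only the third call returns a result, `(175, 14)`: after time 14 no output tuple with join value 175 can be produced, so a punctuation on 175 can safely be sent downstream.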
![Page 11: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/11.jpg)
Sliding Window Join
Suppose Ta and Tb are the time windows for streams A and B respectively. We define the rule for invalidating tuples from the join state based on the sliding window:
Let tuple ta be the latest tuple from stream A that has been processed, with timestamp TSa. A tuple in the B state with timestamp TSb such that TSb + Tb < TSa is called a time-expired tuple and can be invalidated. The same invalidation rule applies to tuples in the A state.
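The invalidation rule maps directly onto a queue ordered by arrival time. The sketch below assumes the B state is a `collections.deque` of (timestamp, payload) pairs with the oldest tuple at the head; the function name is illustrative.

```python
from collections import deque


def invalidate(b_state, ts_a, window_b):
    """Remove time-expired tuples (TSb + Tb < TSa) from the head of the B state."""
    expired = []
    while b_state and b_state[0][0] + window_b < ts_a:
        expired.append(b_state.popleft())
    return expired


b_state = deque([(1, "x"), (4, "y"), (5, "w"), (9, "z")])
expired = invalidate(b_state, ts_a=10, window_b=5)
```

Because the inequality is strict, the tuple with timestamp 5 survives when TSa = 10 and Tb = 5; only the tuples with timestamps 1 and 4 are invalidated.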
![Page 12: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/12.jpg)
Basic Window Join
[Figure: Stream A and Stream B on a shared timeline with windows Ta and Tb. The points TSa, TSb, TSa-Tb and TSb-Ta are marked on the timeline.]
![Page 13: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/13.jpg)
Optimization Opportunities
• Maintain smaller state than either pure window join or pure punctuation-exploiting join
  • Bid tuples that have been joined don’t need to be maintained in state (Punctuation)
• Drop tuples without affecting precision of result
  • Bid tuples out of the 24-hour window of the corresponding Auction tuple don’t need to be processed
• Aggregate result for some Auction tuples can be produced in less than 24 hours
![Page 14: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/14.jpg)
Features of PWJoin Algorithm
Punctuation-exploiting Window Join (PWJoin) is composed of three operations:
• Probing state to find matching tuples for producing join results.
• Purging no-longer-joining tuples by punctuations.
• Invalidating expired tuples by windows.
![Page 15: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/15.jpg)
Window and Punctuation Occur Simultaneously
SELECT A.item_id, Count(*)
FROM Auction [Range 24 Hours] A, Bid B
WHERE A.item_id = B.item_id
GROUP BY A.item_id
[Figure: query plan. The Auction stream and the Bid stream feed Join on item_id, producing Out1 (item_id); Group-by item_id (count(*)) then produces Out2 (item_id, count). Annotations: "Contains punctuations on item_id" and "Applies a 24-hour window on Auction stream".]
![Page 16: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/16.jpg)
PWJoin Basics and Issue
Issue: how to design the PWJoin state to facilitate all search-based operations?
• Invalidate conducts time-based search
• Probe and Purge need value-based search
Processing flow:
• Receive a new tuple ta from stream A, invalidate tuples from the B state, probe the B state, then insert ta into the A state
• Receive a new punctuation pa from stream A, purge tuples from the B state, then insert pa into the A state
![Page 17: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/17.jpg)
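PWJoin State with Two-dimensional Index
[Figure: the PWJoin state. An I-Node index (hash table) holds one I-Node per join value with fields Key, Head, Tail and PunctFlag (for example, key 8 with flag "none" and key 10 with flag "punctuated"). A time list of T-Nodes is bounded by WindowBegin and WindowEnd; each T-Node holds a tuple plus NextTimeListTNode and NextValueListTNode pointers. A punctuation time list stores (Punctuation, Timestamp) entries: (p1, T1), (p2, T2), …]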
![Page 18: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/18.jpg)
PWJoin Algorithm
Invalidate: once a new tuple t is retrieved from stream A, its timestamp is used to invalidate expired tuples from the head of the time list of the B state.
Probe: probe the I-Node index and join with the tuples in the value list of the matching I-Node. After invalidation is done, the join value of t is used to probe the I-Node index of the B state. If a matching I-Node iNode is found, the corresponding value list is located by following the Head pointer of iNode. Tuple t then joins with all tuples in this value list by following the NextValueListTNode pointer of each T-Node.
Finally, the PunctFlag of iNode is checked. If it is “punctuated”, t is discarded. If it is “none”, t is inserted into the A state.
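The tuple-processing path (invalidate, probe, then insert or discard) can be sketched as follows. This is a simplified illustration under stated assumptions: Python deques and a dict stand in for the linked T-Node lists and the hash-based I-Node index, tuples are (timestamp, join value, payload) triples, and a tuple with no matching I-Node is simply inserted, a case the slide does not spell out. All names are hypothetical.

```python
from collections import deque


class INode:
    def __init__(self):
        self.values = deque()     # value list: tuples sharing one join value, oldest first
        self.punct_flag = "none"  # "none" or "punctuated"


class JoinState:
    def __init__(self, window):
        self.window = window
        self.time_list = deque()  # all stored tuples (ts, key, payload), oldest first
        self.index = {}           # I-Node index: join value -> INode

    def invalidate(self, now):
        """Drop tuples with ts + window < now from the head of the time list."""
        while self.time_list and self.time_list[0][0] + self.window < now:
            _, key, _ = self.time_list.popleft()
            node = self.index[key]
            node.values.popleft()  # oldest tuple overall is also oldest in its value list
            if not node.values and node.punct_flag == "none":
                del self.index[key]

    def insert(self, tup):
        self.time_list.append(tup)
        self.index.setdefault(tup[1], INode()).values.append(tup)


def process_tuple(t, own, other):
    """Invalidate the opposite state, probe it, then insert or discard t."""
    other.invalidate(t[0])
    node = other.index.get(t[1])
    results = [(t, match) for match in node.values] if node else []
    if node is None or node.punct_flag == "none":
        own.insert(t)  # assumption: tuples with no matching I-Node are kept
    return results


a_state, b_state = JoinState(window=5), JoinState(window=5)
for tup in [(1, 175, 80.0), (8, 175, 100.0), (9, 2, 63.0)]:
    b_state.insert(tup)
results1 = process_tuple((10, 175, 20.0), a_state, b_state)
b_state.index[2].punct_flag = "punctuated"  # pretend stream B punctuated value 2
results2 = process_tuple((11, 2, 5.0), a_state, b_state)
```

The first call expires the B tuple with timestamp 1, joins with the remaining 175 tuple and stores the new A tuple. The second call still produces its join result, but the A tuple is discarded because stream B has already punctuated value 2.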
![Page 19: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/19.jpg)
PWJoin Algorithm
Purge: probe the I-Node index and delete the tuples in the value list of the matching I-Node. When a new punctuation p is retrieved from stream A, p is used to probe the I-Node index of the B state. If a matching I-Node iNode is found, all tuples in the corresponding value list are deleted, and iNode is removed from the I-Node index as well. If the PunctFlag of iNode is “punctuated”, p is discarded. If iNode is not found or iNode’s PunctFlag is “none”, p is used to probe the I-Node index of the A state, and the PunctFlag of the matching I-Node iNodea is set to “punctuated”. If iNodea does not exist, a new I-Node is created with its PunctFlag set to “punctuated” and inserted into the I-Node index of the A state.
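The purge path can be sketched in the same spirit. The example below assumes each I-Node index is a plain dict from join value to a small record, ignores the time list for brevity, and reads the flag of the opposite I-Node before removing it; the helper names are illustrative rather than taken from the paper.

```python
def new_inode(flag="none"):
    """Create an empty I-Node record (assumed representation)."""
    return {"values": [], "punct_flag": flag}


def process_punctuation(value, own_index, other_index):
    """Handle a punctuation on join value `value` from the 'own' stream.

    Returns the tuples purged from the opposite state.
    """
    purged = []
    node = other_index.get(value)
    other_punctuated = node is not None and node["punct_flag"] == "punctuated"
    if node is not None:
        purged = node["values"]
        del other_index[value]  # the I-Node goes away together with its value list
    if other_punctuated:
        return purged           # the punctuation itself is discarded
    # Otherwise record the punctuation in the own state's I-Node index,
    # creating the I-Node if it does not exist yet.
    own_index.setdefault(value, new_inode())["punct_flag"] = "punctuated"
    return purged


a_index = {175: {"values": [(10, 175, 20.0)], "punct_flag": "none"}}
b_index = {
    175: {"values": [(8, 175, 100.0)], "punct_flag": "none"},
    2: {"values": [(9, 2, 63.0)], "punct_flag": "punctuated"},
}
purged1 = process_punctuation(175, a_index, b_index)  # flags A's I-Node for 175
purged2 = process_punctuation(2, a_index, b_index)    # B already punctuated 2
purged3 = process_punctuation(99, a_index, b_index)   # no I-Node on either side
```

A full implementation would also unlink the purged tuples from the time list; that step is omitted here to keep the sketch short.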
![Page 20: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/20.jpg)
Punctuation Propagation [CIKM04]
An operator may propagate punctuations to benefit downstream operators
[Figure: query plan. The Auction stream and the Bid stream feed Join on item_id, which feeds Group-by item_id (count(*)). The join propagates punctuations on item_id, for example (Item_id, Bidder_id, Bid_price) = (180, *, *), and the group-by can be unblocked by punctuations propagated by the join operator.]
![Page 21: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/21.jpg)
Optimizations Enabled by Combined Constraints
Early Punctuation Propagation
[Figure: Stream S1 and Stream S2 carrying tuples with join values a1, a2, a3, a4, a6, a7, a8 and a10, a separately marked a3, and two marked positions labelled propagation point 1 and propagation point 2.]
Tuple Dropping
[Figure: the same Stream S1 and Stream S2 with a separately marked a3, illustrating tuple dropping.]
![Page 22: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/22.jpg)
Achieving Optimizations by Combined Constraints
Early propagation
• Invalidate punctuations in the punctuation time list while invalidating tuples
• Expired punctuations can be propagated
Tuple dropping
• When early propagation happens, set the PunctFlag of the matching I-Node to “propagated”
• Drop new tuples that match an I-Node whose PunctFlag is “propagated”
![Page 23: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/23.jpg)
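Memory Cost Analysis
$|S_b|_T = |S_b|_T^{\text{insert}} - |S_b|_T^{\text{purge}} = |S_b|_T^{\text{arrive}} - |S_b|_T^{\text{purge}} = b\,T_b - b\,T_b\left(p_a T / NK_{b,T}\right)$
The first term, $b\,T_b$, is the state of the window join; the second term, $b\,T_b\left(p_a T / NK_{b,T}\right)$, is the saving by punctuation.
b: tuple input rate of stream B
pa: punctuation input rate of stream A
NKb,T: number of distinct join values that have occurred in stream B up to the T’th time unit
Tb: time window on stream B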
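To get a feel for the estimate bTb - bTb(paT/NKb,T), it helps to plug in numbers. The helper below is a direct transcription of that expression; the parameter names and the sample values are invented for illustration and are not measurements from the talk.

```python
def pwjoin_state_size(b_rate, window_b, punct_rate_a, t, distinct_values_b):
    """Estimated size of the B state at time unit t.

    b_rate: tuple input rate of stream B (b)
    window_b: time window on stream B (Tb)
    punct_rate_a: punctuation input rate of stream A (pa)
    distinct_values_b: distinct join values seen in stream B up to t (NKb,T)
    """
    window_join_size = b_rate * window_b  # state of a plain window join
    saving = window_join_size * (punct_rate_a * t / distinct_values_b)
    return window_join_size - saving


# Hypothetical workload: 100 tuples per time unit, a window of 10 time units,
# 2 punctuations per time unit, and 50 distinct join values after 5 time units.
estimate = pwjoin_state_size(100, 10, 2, 5, 50)
```

With these made-up numbers the plain window join would hold 1000 tuples, and punctuations covering 10 of the 50 distinct values remove about a fifth of them, leaving roughly 800.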
![Page 24: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/24.jpg)
PWJoin vs. WJoin – Memory and Tuple Output Rate
[Figure: number of tuples in join state (0 to 2500) vs. sampling step (per 2 seconds), for WJoin-1, PWJoin-1, WJoin-5, PWJoin-5, WJoin-15 and PWJoin-15.]
[Figure: number of tuples output (0 to 600000) vs. sampling step (per 2 seconds), for WJoin-5, PWJoin-5, WJoin-15 and PWJoin-15.]
Stream A, B: punct-asc-100-40
![Page 25: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/25.jpg)
PWJoin vs. PJoin – Punctuation Output Rate
[Figure: number of punctuations output (0 to 500) vs. sampling step (per 1 second), for PJoin and PWJoin.]
Stream A: punct-asc-100-40, Stream B: punct-random-30-40
Window: 1 second
![Page 26: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/26.jpg)
Conclusion
• PWJoin algorithm
• Designed storage structure for PWJoin state
• Memory cost analysis of PWJoin
![Page 27: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/27.jpg)
Thanks
WPI Database Research Group
many slides are from davis.wpi.edu/~dsrg/CAPE/slides
![Page 28: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/28.jpg)
References
[CIKM04] L. Ding and E. A. Rundensteiner. Evaluating Window Joins over Punctuated Streams. CIKM’04.
[KNV03] J. Kang, J. F. Naughton and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. ICDE’03.
[UF00] T. Urhan and M. Franklin. XJoin: A Reactively Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, 23(2), 2000.
[HH99] P. Haas and J. Hellerstein. Ripple Joins for Online Aggregation. SIGMOD’99.
[GO03] L. Golab and M. T. Ozsu. Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams. VLDB’03.
[GGO04] L. Golab, S. Garg and M. T. Ozsu. On Indexing Sliding Windows over On-line Data Streams. EDBT’04.
[RDS+04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. VLDB Demo, 2004.
[BW04] S. Babu and J. Widom. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams.
[TMS+03] P. A. Tucker, D. Maier, T. Sheard and L. Fegaras. Exploiting Punctuation Semantics in Continuous Data Streams. TKDE, 15(3), 2003.
[DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman. Joining Punctuated Streams. EDBT’04.
[MWA+03] R. Motwani, J. Widom, A. Arasu et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. CIDR’03.
![Page 29: Evaluating Window Joins over Punctuated Streams](https://reader035.vdocuments.net/reader035/viewer/2022062321/56812f8a550346895d9507e2/html5/thumbnails/29.jpg)
Thanks!