Dynamic Plan Migration for Continuous Query over Data Streams
Yali Zhu, Elke Rundensteiner and George Heineman
Database System Research Group, WPI, Massachusetts, USA
SIGMOD 2004
*Research partly supported by the RDC grant 2003-04 on "On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond."

Stream Query Optimization
Differences with Traditional Query Optimization?

Stream Query Optimization
- New classes of operators (windows) may mean new rewrites
- New execution modes (continuous/pipelining)
- More dynamic fluctuations in statistics
  - Compile-time optimization not possible
- Global optimization not practical, as query networks are huge
- Adaptive optimization
- Other cost models taking memory into account
- Query optimization and load shedding

Motivation of 'Query Migration'
- Continuous query over streams
  - Statistics unknown before start
  - Statistics changing during execution: stream rates, arrival pattern, distribution, etc.
- Need for dynamic adaptation
  - Plan re-optimization: change the shape of the query plan tree

Run-time Plan Re-Optimization
- Step 1: Decide when to optimize (Statistics Monitoring)
- Step 2: Generate new query plan (Query Optimization)
- Step 3: Replace current plan by new plan (Plan Migration)

Naïve Plan Migration Strategy
Migration Steps
- Pause execution of old plan
- Drain out all tuples inside old plan
- Replace old plan by new plan
- Resume execution of new plan
[Figure: two query plans over input streams A, B, C, each built from join operators AB and BC]
Problem: Works for stateless operators only

Stateful Operator in CQ
Why stateful?
- Need non-blocking operators in CQ
- Operator needs to output partial results
- State data structure keeps received tuples
Example: Symmetric NL join w/ window constraints
[Figure: join operator AB over streams A and B; State A holds tuple ax, State B holds tuples b1 to b5; outputs shown include (ax, b2) and (ax, b3)]
Key Observation: The purge of tuples in states relies on processing of new tuples.
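
The key observation above can be illustrated with a minimal sketch of a symmetric nested-loop window join. This is not the CAPE implementation; the class name, the purge-probe-insert order, the `window` and `predicate` parameters, and the assumption that timestamps arrive in non-decreasing order are all choices made for illustration.

```python
from collections import deque


class SymmetricWindowJoin:
    """Sliding-window symmetric nested-loop join over two streams, A and B."""

    def __init__(self, window, predicate):
        self.window = window          # window length, same unit as timestamps
        self.predicate = predicate    # predicate(a_value, b_value) -> bool
        # One state per input; entries are (timestamp, value) in arrival order.
        self.state = {"A": deque(), "B": deque()}

    def process(self, side, ts, value):
        """Handle one arriving tuple and return the join results it produces."""
        other = "B" if side == "A" else "A"
        opposite = self.state[other]
        # 1. Purge: expired tuples leave the opposite state only now,
        #    because a new tuple arrived and triggered the check.
        while opposite and ts - opposite[0][0] > self.window:
            opposite.popleft()
        # 2. Probe: nested-loop scan of the opposite state.
        results = []
        for _, other_value in opposite:
            a, b = (value, other_value) if side == "A" else (other_value, value)
            if self.predicate(a, b):
                results.append((a, b))
        # 3. Insert: keep the new tuple for future probes.
        self.state[side].append((ts, value))
        return results


join = SymmetricWindowJoin(window=10, predicate=lambda a, b: True)
join.process("A", 0, "ax")
out_b2 = join.process("B", 2, "b2")      # joins with ax
out_b3 = join.process("B", 3, "b3")      # joins with ax
size_before = len(join.state["A"])       # ax is still stored
join.process("B", 20, "b9")              # this arrival purges the expired ax
size_after = len(join.state["A"])
```

In this sketch, ax stays in State A indefinitely unless another B tuple arrives. With input paused, a state like this cannot empty itself.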
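
Naïve Migration Strategy Revisited
Steps
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan
[Figure: plan with join operators AB and BC over streams A, B, C, annotated "(2) All tuples drained", "(3) Old replaced by new", "(4) Processing resumed"]
Problem: Deadlock waiting. States are purged only when new tuples are processed, but execution is paused, so step (2) cannot complete for stateful operators.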

Problem Definition: Dynamic Plan Migration
- Input (two migration boxes)
  - One contains old plan
  - One contains new plan
  - Have same input and output queues
- Result
  - Old box is replaced by new box
- Valid Migration
  - No missing tuples
  - No duplicates
[Figure: two migration boxes, each reading input queues QA, QB, QC, QD and writing output queue QABCD. One box has operators AB, BC, CD with states SA, SB, SAB, SC, SABC, SD; the other has operators BC, CD, AB with states SB, SC, SBC, SD, SA, SBCD.]
Key points:
- Involved plans contain stateful operators
- Need to migrate yet still retain useful states and discard useless states.

State of the Art
- "Efficient mid-query re-optimization of sub-optimal query execution plans" [Kabra, DeWitt 1998]
  - Only migrates unprocessed portion
- Query plan competing model [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]
  - Generate several candidate query plans before start
  - Execute all, choose one after a while

Outline
- Problem Motivation and Definition
- Dynamic Migration Strategies
  - Moving State Strategy
  - Parallel Track Strategy
- Experimental Results

Moving State Strategy
- Basic idea
  - Share common states between two migration boxes
- Key steps
  - State Matching: match states based on IDs
  - State Moving: create new pointers for matched states in new box
- What's left?
  - Unmatched states in new box
[Figure: Old Box (operator AB with states SA, SB; BC with SAB, SC; CD with SABC, SD) and New Box (operator BC with states SB, SC; CD with SBC, SD; AB with SA, SBCD); both have input queues QA, QB, QC, QD and output queue QABCD]
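
A minimal sketch of state matching and state moving, assuming each state is identified by a string ID such as "SA" and stored as a Python list. The function name `match_states` and the dictionary layout are hypothetical, not taken from CAPE.

```python
def match_states(old_states, new_state_ids):
    """Split the new box's states into shared (matched by ID) and unmatched."""
    shared = {}
    unmatched = []
    for state_id in new_state_ids:
        if state_id in old_states:
            # State moving: the new box points at the same object, no copy.
            shared[state_id] = old_states[state_id]
        else:
            unmatched.append(state_id)
    return shared, unmatched


# States of the old box (AB, BC, CD) with made-up contents.
old_states = {
    "SA": ["a1"], "SB": ["b1", "b2"], "SC": ["c1"], "SD": ["d1"],
    "SAB": [("a1", "b1")], "SABC": [("a1", "b1", "c1")],
}
# State IDs required by the new box (BC, CD, AB).
new_state_ids = ["SA", "SB", "SC", "SD", "SBC", "SBCD"]

shared, unmatched = match_states(old_states, new_state_ids)
```

Here SA, SB, SC and SD are shared, SBC and SBCD remain unmatched, and the old-only states SAB and SABC are simply not carried over.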

Unmatched States
- State Recomputing
  - Recursively recompute unmatched SBC and SBCD from bottom up
- Why always possible?
  - Old and new boxes have same input queues
  - The states associated with input queues always match
- Why necessary?
[Figure: new box with operators BC (states SB, SC), CD (states SBC, SD) and AB (states SA, SBCD); input queues QA, QB, QC, QD; output queue QABCD]
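
The bottom-up recomputation can be sketched as a short recursion. This is an illustration under assumptions: tuples are Python tuples, joining concatenates them, the `predicate` stands in for the real join condition, and window constraints are ignored. The names `recompute` and `inputs_of` are hypothetical.

```python
def recompute(state_id, states, inputs_of, predicate):
    """Return the contents of state_id, recomputing it from its inputs if missing."""
    if state_id in states:
        # Matched state (or already recomputed): use it as is.
        return states[state_id]
    left_id, right_id = inputs_of[state_id]
    left = recompute(left_id, states, inputs_of, predicate)
    right = recompute(right_id, states, inputs_of, predicate)
    # Nested-loop join of the two input states.
    states[state_id] = [l + r for l in left for r in right if predicate(l, r)]
    return states[state_id]


# Which two states feed each unmatched state in the new box.
inputs_of = {"SBC": ("SB", "SC"), "SBCD": ("SBC", "SD")}
# Matched states, which always include those tied to the input queues.
states = {"SB": [("b1",), ("b2",)], "SC": [("c1",)], "SD": [("d1",)]}

recompute("SBCD", states, inputs_of, predicate=lambda l, r: True)
```

The recursion always terminates at states tied to the input queues, which is why recomputation is always possible.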

Terms on Tuples
- New/Old tuples
  - Old: tuples already in old box when migration starts
  - New: tuples that do not exist in old box when migration starts
- Sub-tuples
  - Tuple ABCD is the result of joining tuples A, B, C and D
  - Tuples A, B, C and D are sub-tuples of tuple ABCD
  - Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples
[Figure: plan over streams A, B, C, D with operators AB, BC, CD; input queues QA, QB, QC, QD; states SA, SB, SAB, SC, SABC, SD; output queue QABCD]

Why Recompute Unmatched States
- To get the complete results of ABCD, we need all 16 old/new combinations
- If SBC is not recomputed, we will miss results with both B and C as OLD
[Figure: new box with operators AB, CD, BC; input queues QA, QB, QC, QD; states SA, SB, SC, SD, SBC, SBCD; legend "Old Tuple / New Tuple" with example ABCD tuples]
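
The counting argument can be checked with a short enumeration. The labels "old" and "new" and the variable names are illustrative only.

```python
from itertools import product

STREAMS = ("A", "B", "C", "D")

# Every way to label the four sub-tuples of an ABCD result as old or new.
combos = [dict(zip(STREAMS, flags)) for flags in product(("old", "new"), repeat=4)]

# Old B and old C tuples already sit in SB and SC and never arrive again,
# so a BC pair with both parts old exists only if SBC is recomputed.
missed = [c for c in combos if c["B"] == "old" and c["C"] == "old"]

print(len(combos), len(missed))
```

Of the 16 combinations, 4 have both B and C old (A and D vary freely), and those are the results lost when SBC starts out empty.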

Cost Estimation of MS Migration
- Cost of MS consists of
  - Cost of state matching: ID comparison (negligible)
  - Cost of state moving: create pointers (negligible)
  - Cost of state recomputing: majority of cost
- Affecting parameters
  - Operator selectivities
  - # of tuples in states, estimated as (input rate x window size)
- See paper for detailed cost models
- One cost model conclusion: cost of MS has a polynomial relation to window size
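
A back-of-the-envelope sketch of why recomputation cost grows polynomially with window size. This is not the paper's cost model: it assumes nested-loop recomputation of SBC and SBCD, counts only tuple comparisons, and uses the slide's estimate of state size as input rate x window size. All function and parameter names are hypothetical.

```python
def base_state_size(rate, window):
    """Slide estimate: tuples in a base state = input rate x window size."""
    return rate * window


def recompute_comparisons(rate_b, rate_c, rate_d, sel_bc, window):
    """Comparisons needed to rebuild SBC, then SBCD, by nested loops (assumed)."""
    sb = base_state_size(rate_b, window)
    sc = base_state_size(rate_c, window)
    sd = base_state_size(rate_d, window)
    bc_comparisons = sb * sc               # grows with window^2
    sbc = sel_bc * bc_comparisons          # expected size of recomputed SBC
    bcd_comparisons = sbc * sd             # grows with window^3
    return bc_comparisons + bcd_comparisons


small = recompute_comparisons(1, 1, 1, sel_bc=0.5, window=10)
large = recompute_comparisons(1, 1, 1, sel_bc=0.5, window=20)
```

Under these assumptions, doubling the window from 10 to 20 raises the comparison count from 600 to 4400, which is the kind of polynomial growth the slide refers to.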

MS Migration Pros and Cons
- Pros
  - Fast when # of tuples in states is small: low input rates, low selectivity or small window
- Cons
  - Output silence during entire migration stage
- Can the query produce output even during migration?
  - Motivation for Parallel Track Strategy

Parallel Track Strategy
- Basic idea
  - Execute both plans in parallel and gradually "push" old tuples out of old box by purging
- Key steps
  - Connect boxes
  - Execute in parallel until old box "expired" (no old tuple or sub-tuple)
  - Disconnect old box
  - Start executing new box only
[Figure: old box (operator AB with states SA, SB; BC with SAB, SC; CD with SABC, SD) and new box (operator BC with states SB, SC; CD with SBC, SD; AB with SA, SBCD), both connected to input queues QA, QB, QC, QD and output queue QABCD]

Potential Duplicates
- Tuple ABCD
  - 2^4 = 16 possible old/new sub-tuple combinations
  - The same case must not be generated by both boxes, otherwise we may have duplicates
- In new box
  - All states start empty
  - Only generates ABCD as (new,new,new,new)
- In old box
  - May generate all 16 cases
  - Would duplicate the case of (new,new,new,new)
[Figure: old box with operators AB, BC, CD; input queues QA, QB, QC, QD; states SA, SB, SAB, SC, SABC, SD; output queue QABCD]
Duplicate Prevention
- At root op in old box: if both to-be-joined tuples have all-new sub-tuples, don't join.
- Other ops in old box: proceed as normal
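
The duplicate-prevention rule can be sketched by tracking only the old/new label of each sub-tuple. This is a simplified model, not CAPE code: the root operator of the old box is taken to join an ABC tuple with a D tuple, and the function names are hypothetical.

```python
from itertools import product


def all_new(flags):
    """True if every sub-tuple label is 'new'."""
    return all(flag == "new" for flag in flags)


def old_box_root_join(abc_flags, d_flag):
    """Root operator of the old box: skip the pair the new box will also produce."""
    if all_new(abc_flags) and d_flag == "new":
        return None  # left to the new box, avoids a duplicate
    return abc_flags + (d_flag,)


# The old box may see every old/new combination at its root operator.
old_out = []
for abc_flags in product(("old", "new"), repeat=3):
    for d_flag in ("old", "new"):
        result = old_box_root_join(abc_flags, d_flag)
        if result is not None:
            old_out.append(result)

# The new box starts with empty states, so it only ever joins new tuples.
new_out = [("new", "new", "new", "new")]

combined = old_out + new_out
```

In this model the old box contributes 15 combinations and the new box contributes the remaining one, so all 16 cases appear exactly once.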

Estimation of PT Migration
Estimation Formula: T_PT ≈ 2W
[Figure: timeline T from TM-start to TM-end covering a 1st W and a 2nd W, marking old and new tuples; beside it the old box with window W (operators AB, BC, CD; input queues QA, QB, QC, QD; states SA, SB, SAB, SC, SABC, SD)]

PT Migration Duration
- Given enough system computing resources
  - New tuples processed right away
  - PT migration duration ≈ 2W
- If not enough system resources
  - New tuples accumulate in queues
  - PT migration duration > 2W

Cost Estimation of PT Migration
- Cost of PT = cost of processing 2W worth of tuples in old box + cost of processing 2W worth of tuples in new box
- Parameters: input rates, window size, selectivity (similar to MS strategy)

PT Migration Pros and Cons
- Pros
  - Keeps producing results even during migration (MS produces no results during migration)
- Cons
  - Migration duration is at least 2W
  - MS may be faster depending on # of tuples in states

Outline
- Problem Definition and Motivation
- Dynamic Migration Strategies
  - Moving State Strategy
  - Parallel Track Strategy
- Experimental Results

Experimental Setup
- Embedded in the CAPE system
  - CAPE = Continuous Adaptive Processing Engine
  - A streaming query engine developed at DSRG, WPI
  - VLDB'04 demo
- Layers of Adaptations
  - Punctuation exploring
  - Adaptive scheduling
  - Query migration
  - Dynamic distribution
- Input Streams
  - By stream generator of CAPE
  - Poisson arrival pattern
- Experiments on migration duration
  - Vary window size
[Figure: CAPE Runtime Engine, with components Operator Configurator, QoS Inspector, Operator Scheduler, Plan Migrator, Execution Engine, Storage Manager, Stream Receiver, Distribution Manager, Query Plan Generator, Stream / Query Registration GUI and Stream Provider; labeled flows: Queries, Results]

Migration Duration vs. Window Size
[Figure, panel 1: migration duration (ms) vs. global window size W (ms); series: Measured T_PT, Estimated T_PT]
[Figure, panel 2: migration duration (ms) vs. global window size W (ms); series: Measured T_MS, Poly. (Measured T_MS)]
[Figure, panel 3: migration duration vs. window size (ms); series: T_MS, T_PT]

Conclusions
- Identify problem of migration for stateful operators
- First solutions for continuous query migration
  - Moving state strategy
  - Parallel track strategy
- Embed both strategies into stream system
- Cost model and experimental evaluation
  - Cost model confirmed by experiments
  - Identify performance trade-off of the two strategies
Thank You
For more information, check the CAPE website @:
http://davis.wpi.edu/~dsrg/CAPE/