Adaptively Processing Remote Data
Zachary G. Ives
University of Pennsylvania
CIS 650 – Database & Information Systems
February 28, 2005
2
Administrivia
Next reading assignment: Doan et al. – LSD
Recall that the midterm will be due 3/16
You can go ahead and choose a topic – let me know which one
3
Today’s Trivia Question
4
Sources of Query Answering Cost
Regular computation (this has a minimum cost)
But we can get held up by:
Inflexible query plans – delays: we “stall” waiting for I/O
  Remedies: query scrambling, pipelined hash joins
Bad query plans – bad estimation of intermediate result sizes
  The focus of the Kabra and DeWitt paper
Insufficient source information
  Eddies and ADP; exploration vs. exploitation; extrapolating performance
5
Kabra and DeWitt Recap
Provides significant potential for improvement without adding much overhead
Biased towards exploitation, with very limited information-gathering
A great way to retrofit an existing system – in SIGMOD 2004, IBM had a paper that did this in DB2
But not appropriate for remote data:
  Relies on us knowing the (rough) cardinalities of the sources
  Query plans aren’t pipelined
6
A Second Issue: Delays
We may have a very computationally inexpensive plan, but slow sources – the query plan might get held up waiting for data
Solution 1: query scrambling
  Rescheduling – while delayed, find a non-executing part of the query plan and start it
  Operator synthesis – when nothing can be rescheduled, we might tinker with the original plan
We want to do this in a cost-based way
7
Cost-Based Query Scrambling
Divide the plan into runnable subtrees based on the schedule; schedule those with at least 75% efficiency
  Each may be run out of order if it materializes
  Cost = materialization write + processing + materialization read
  Efficiency = (M – MR) / (P + MW): savings / cost
When there are no more runnable subtrees, we need to do something
  Operator synthesis: try to find a computation that will mask the delay – but how much should we do?
  PAIR: only do a single join over a pair of relations
  IN (include delay): chooses to defer the delayed access to the end
  ED (estimated delay): progressively increases the delay estimate
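The efficiency test above can be sketched numerically. This is a toy sketch: the cost values and the 75% threshold interpretation are illustrative assumptions, not figures from the paper’s experiments.

```python
# Sketch of the runnable-subtree efficiency test from cost-based query
# scrambling. M is the delay masked by running the subtree now, MR/MW are
# the costs to read/write its materialized result, and P is its processing
# cost. All values are hypothetical, in abstract cost units.
def efficiency(masked_delay, mat_read, mat_write, processing):
    """Savings per unit of extra work: (M - MR) / (P + MW)."""
    return (masked_delay - mat_read) / (processing + mat_write)

def worth_scrambling(masked_delay, mat_read, mat_write, processing,
                     threshold=0.75):
    # Only reschedule subtrees whose efficiency clears the threshold.
    return efficiency(masked_delay, mat_read, mat_write, processing) >= threshold

# A subtree that masks 100 units of delay, costs 10 to re-read its result,
# 20 to write it, and 80 to process: efficiency = (100 - 10) / (80 + 20) = 0.9
print(worth_scrambling(100, 10, 20, 80))  # True
```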
8
Works Pretty Well for Large Memory
9
But Not As Well for Limited Memory
10
Overall Assessment
Hard to estimate how much work to do!
A trade-off between aggressive and conservative strategies
If there are enough resources, it doesn’t matter…
11
Solution 2: Pipelined Hash Joins
They can dynamically adapt to delays, alleviating the need for techniques like query scrambling (except between different pipeline stages)
Disadvantages:
  Memory – what happens if we run out? The focus of [Ives+99], [Urhan+00]
  Cost – a query plan with a sub-optimal order will still be very expensive!
The central piece of the next two techniques we’ll look at…
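A pipelined (symmetric) hash join can be sketched as follows. This is a minimal model, assuming tuples arrive as one interleaved event stream; the stream format and key functions are illustrative, not any particular system’s API.

```python
from collections import defaultdict

def pipelined_hash_join(stream, key_r, key_s):
    """Symmetric hash join: consumes an interleaved stream of
    ('R', tuple) / ('S', tuple) events, so a delayed source never
    blocks the other. Yields joined pairs as soon as a match exists."""
    table_r = defaultdict(list)   # hash table on R's join key
    table_s = defaultdict(list)   # hash table on S's join key
    for side, tup in stream:
        if side == 'R':
            k = key_r(tup)
            table_r[k].append(tup)       # insert into own table...
            for match in table_s[k]:     # ...then probe the other side
                yield (tup, match)
        else:
            k = key_s(tup)
            table_s[k].append(tup)
            for match in table_r[k]:
                yield (match, tup)

# Tuples may arrive in any interleaving; a slow R only delays R-side output.
events = [('R', (1, 'a')), ('S', (1, 'x')), ('S', (2, 'y')), ('R', (2, 'b'))]
print(list(pipelined_hash_join(events,
                               key_r=lambda t: t[0], key_s=lambda t: t[0])))
# [((1, 'a'), (1, 'x')), ((2, 'b'), (2, 'y'))]
```

Note the memory disadvantage from the slide: both hash tables grow without bound, which is exactly what [Ives+99] and [Urhan+00] address.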
12
Extreme Adaptivity: Eddies
The basic idea: query processing consists of sending tuples through a series of operators
Why not treat it like a routing problem?
Rely on “back-pressure” (i.e., queue overflow) to tell us where to send tuples
Part of the ongoing Telegraph project at Berkeley – a large-scale federated, shared-nothing data stream engine:
  Variations in data transfer rates
  Little knowledge of data sources
13
Telegraph Architecture
A simple “pre-optimizer” generates the initial plan
Creates operators, e.g.:
  Select: predicate(sourceA)
  Join: predicate(sourceA, sourceB)
  (No support for aggregation, union, etc.)
Chooses implementations – generally relies on pipelined hash joins
Goal: dataflow-driven scheduling of operations
  A tuple comes into the system
  It is adaptively routed through operators in “eddies”
  It may be combined, discarded, etc.
14
The Eddy
Represents the set of possible orderings of a subplan
Each tuple may “flow” through a different ordering (which may be constrained)
An n-ary module consisting of query operators – the basic unit of adaptivity
A subplan with select, project, join operators
15
Example Join Subplan Alternatives
[Figure: several alternative join trees over R1, R2, R3, combining the predicates R2.x = R1.x, R2.x = R3.x, and R1.x = R3.x in different orders and shapes – e.g., joining R2 with the result of Join over R1.x = R3.x of (R1, R3)]
16
Naïve Eddy
Given a tuple, route it to whichever operator is ready
Analogous to fluid dynamics
Adjusts to operator costs
Ignores operator selectivity
17
Adapting to Variable-Cost Selection
18
But Selectivity is Ignored
19
Lottery-Based Eddy
Need to favor more selective operators
A ticket is given per input tuple and returned per output tuple
Lottery scheduling based on the number of tickets
Now handles both selectivity and cost
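The ticket policy can be sketched as a toy model. The class and its interface are hypothetical, not Telegraph’s actual API, and it assumes selection-only operators with a simple weighted lottery.

```python
import random

class LotteryEddy:
    """Toy lottery-scheduled eddy over selection operators (hypothetical
    interface). An operator is handed a ticket when it receives a tuple
    and gives it back when the tuple survives, so selective operators
    accumulate tickets and win the routing lottery more often."""
    def __init__(self, operators, seed=0):
        self.ops = list(operators)                 # (name, predicate) pairs
        self.tickets = {name: 1 for name, _ in self.ops}
        self.rng = random.Random(seed)

    def route(self, tup):
        """Route one tuple through every operator; None if filtered."""
        done = set()
        while len(done) < len(self.ops):
            pending = [(n, p) for n, p in self.ops if n not in done]
            weights = [self.tickets[n] for n, _ in pending]
            name, pred = self.rng.choices(pending, weights=weights)[0]
            self.tickets[name] += 1                # ticket per input tuple
            if not pred(tup):
                return None                        # filtered: ticket kept
            self.tickets[name] -= 1                # ticket returned per output
            done.add(name)
        return tup                                 # survived all operators

# More selective operators keep tickets and get probed earlier on average,
# saving the cost of running less selective predicates on doomed tuples.
eddy = LotteryEddy([('sel_a', lambda t: t % 2 == 0),
                    ('sel_b', lambda t: t < 50)])
survivors = [t for t in range(100) if eddy.route(t) is not None]
print(len(survivors))  # 25 (even numbers below 50)
```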
20
Enhancing Adaptivity: Sliding Window
Tickets were for the entire query
Weighted sliding-window approach:
  “Escrow” tickets during the current window
  “Banked” tickets from a previous window
21
What about Delays?
Problems here: we don’t know when the join buffers vs. “discards” tuples
[Figure: join of S with a slow source R, then with T]
22
Eddy Pros and Cons
A mechanism for adaptively re-routing queries
  Makes the optimizer’s task simpler
  Can do nearly as well as a well-optimized plan in some cases
  Handles variable costs and variable selectivities
But doesn’t really handle joins very well – follow-up work attempts to address this:
  STeMs – break a join into separate data structures; requires re-computation at each step
  STAIRs – create intermediate state and shuffle it back and forth
Open questions: pre-optimization? Distribution (DeWitt et al.)? Beyond joins?
23
Generalizing Adaptive Query Processing
We’ve seen a range of different adaptive techniques
How do they fit together?
Can we choose points between eddies and mid-query re-optimization?
24
Types of Adaptive Query Processing
Adaptive scheduling (query scrambling [UF98], dynamic pipeline scheduling [UF01], XJoin [UF00], pipelined hash join [RS86][I+99], ripple join [HH99])
  Changes CPU scheduling to improve feedback or reduce delays
Redundant computation (competitive execution [AZ96])
  Compares two or more ways of executing the query
Plan partitioning ([S+76][KD98][I+99][M+04])
  Breaks the plan into stages; re-optimizes future stages as necessary
Adaptive info passing ([IT05 sub.])
  Passes data between parts of an executing plan
Adaptive data partitioning
  Breaks the data into subsets; uses a different plan for each subset
  The only way to reduce overall computation with fine granularity
  The first (and only) implementation has been eddies [AH00][R+03][DH04]
25
Eddies Combine Adaptive Scheduling and Data Partitioning Decisions
Intuitively, each tuple gets its own query plan
  Routed to the next operator based on the speed and selectivity of each operator
  Elegant and simple to implement
But performing a join creates sub-results at the next level!
  Local and greedy choices may result in state that needs to join with all future data!
Consider the long-term effects of decisions before making them – separate CPU scheduling from plan selection
26
Focusing Purely on Adaptive Data Partitioning
Use adaptively scheduled operators to “fill CPU cycles”
Now a query optimizer problem: choose a plan that minimizes long-term cost (in CPU cycles)
To allow multiple plans, distribute union through join (and select, project, etc.):
If R1 = R1^1 ∪ R1^2 and R2 = R2^1 ∪ R2^2, then:
  R1 ⋈ R2 = (R1^1 ∪ R1^2) ⋈ (R2^1 ∪ R2^2)
          = (R1^1 ⋈ R2^1) ∪ (R1^2 ⋈ R2^2) ∪ (R1^1 ⋈ R2^2) ∪ (R1^2 ⋈ R2^1)
[Figure: 2×2 grid of the partition combinations of R1 and R2]
This generalizes to n-way joins and other SPJ + GU operators…
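The union-through-join identity can be checked mechanically on small relations. This is a toy natural join on a single attribute x; the partitions and values are made up for illustration.

```python
from itertools import product

# Toy natural join on the first attribute x; tuples are (x, payload).
def join(r, s):
    return {(t1, t2) for t1, t2 in product(r, s) if t1[0] == t2[0]}

# Partition each relation into two arbitrary subsets.
R1a, R1b = {(1, 'p'), (2, 'q')}, {(3, 'r')}
R2a, R2b = {(1, 'u')}, {(2, 'v'), (3, 'w')}

# Joining the whole relations equals the union of the four partition joins.
whole = join(R1a | R1b, R2a | R2b)
pieces = (join(R1a, R2a) | join(R1a, R2b) |
          join(R1b, R2a) | join(R1b, R2b))
print(whole == pieces)  # True: union distributes through join
```

This is the property that lets each data subset be run through a different plan while still producing the correct overall answer.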
27
Adaptive Data Partitioning: Routing Data across Different Plans
R ⋈ S ⋈ T
[Figure: the query evaluated in phases over partitions R0/R1, S0/S1, T0/T1 – each later plan excludes results already produced, e.g. “exclude R0S0”, then “exclude R0S0T0, R1S1T1”]
Options for combining across phases:
  New results always injected into the old plan
  Old results injected into the new plan
  Wait until the end – a “stitch-up” plan based on the best statistics
28
Special Architectural Features for ADP
A monitoring and re-optimization thread runs alongside execution:
  System-R-like optimizer with aggregation support; uses the most current selectivity estimates
  Periodic monitoring and re-optimization revises selectivity estimates and recomputes expected costs
Query execution with “smart router” operators
Special support for efficient stitch-up plans:
  Uses intermediate results from previous plans (a special case of answering queries using views [H01])
  Join-over-union (“stitch-up-join”) operator that excludes certain results
29
ADP Application 1: Correcting Cost Mis-estimates
Goal: react to plans that are obviously bad
  Don’t spend cycles searching for a slightly better plan
  Try to avoid paths that are unlikely to be promising
The monitor/reoptimizer thread watches the cardinalities of sub-results
  Re-estimates plan cost and compares it to the projected costs of alternatives, using several techniques and heuristics (see paper)
  Our experiments: re-estimate every 1 sec.
The “smart router” operator does the following:
  Waits for the monitor/reoptimizer to suggest a replacement plan
  Re-routes source data into the new plan
  The new plan’s output is unioned with the output of the previous plan; this is fed into any final aggregation operations
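The smart-router loop can be sketched as follows. Every interface here – the monitor, the reoptimizer, the plan callables – is a hypothetical stand-in for illustration, not the actual system’s API.

```python
def adp_execute(source, initial_plan, monitor, reoptimize):
    """Sketch of ADP plan switching (all interfaces are hypothetical).
    The router drains source tuples into the current plan; whenever the
    monitor flags a cost mis-estimate, remaining tuples are re-routed to
    a freshly optimized plan. The answer is the union of all plan outputs,
    each source tuple having been processed by exactly one plan."""
    output = []
    plan = initial_plan
    for tup in source:
        if monitor(tup):             # observed cardinalities look wrong
            plan = reoptimize()      # replacement plan for future tuples
        output.extend(plan(tup))
    return output                    # union across plans, fed to final agg

# Hypothetical demo: switch from a plan tagging tuples 'old' to one
# tagging them 'new' once values exceed the expected range.
src = [1, 2, 50, 51]
out = adp_execute(
    src,
    initial_plan=lambda t: [('old', t)],
    monitor=lambda t: t > 10,
    reoptimize=lambda: (lambda t: [('new', t)]),
)
print(out)  # [('old', 1), ('old', 2), ('new', 50), ('new', 51)]
```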
30
Correcting for Unexpected Selectivities
[Chart: running time (sec) per query–data set combination (3A, 10, 10A, 5 – uniform and skewed), comparing Static – No Statistics, Static – Cardinalities, Adaptive – No Statistics, Adaptive – Cardinalities, and Plan Partitioning – No Stats. Pentium IV 3.06 GHz, Windows XP]
31
ADP Application 2: Optimizing for Order
The most general ADP approach:
  Pre-generate plans for the general case and for each “interesting order”
  The “smart router” sends each tuple to the plan whose ordering constraint the tuple follows
  But with multiple joins, MANY plans
Instead: do ADP at the operator level
  A “complementary join pair” does its own stitch-up internally – easier to optimize for!
Can also do “partial sorting” at the router (priority queue)
[Figure: routers send h(R), h(S) partitions of R and S to either a merge join or a hash join, each with its own output queue]
32
Exploiting Partial Order in the Data
[Chart: running time (sec) per data set (uniform; skewed; uniform and skewed with 1%, 10%, and 50% reordered), comparing a pipelined hash join, complementary joins, and complementary joins with a 1024-tuple priority queue. Pentium IV 3.06 GHz, Windows XP]
33
ADP Over “Windows”: Optimizing for Aggregation
Group-by optimization [CS94]: we may be able to “pre-aggregate” some tuples before joining
  Why: aggregates can be applied over a union
But once we insert pre-aggregation, we’re stuck (and it’s not pipelined)
Our solution: “adjustable window pre-aggregation”
  Change the window size depending on how effectively we can aggregate
  Also allows data to propagate through the plan – better information for adaptivity, early answers
[Figure: plan with SUM(T.y) GROUP BY T.x, T.joinAttrib below the join of T and R, feeding SUM(T.y sums) GROUP BY T.x above, vs. a plan that joins T and R and then applies a single SUM(T.y) GROUP BY T.x]
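Windowed pre-aggregation can be sketched for SUM. This is a toy model: the window size is fixed here, where the real system adjusts it adaptively, and the join with R is reduced to a membership test; all names and data are illustrative.

```python
from collections import defaultdict

def windowed_preaggregate(tuples, window):
    """Pre-aggregate SUM(T.y) GROUP BY (T.x, T.joinAttrib) within a
    bounded window, flushing partial sums so data keeps propagating
    through the pipeline. Correct because SUM can be re-aggregated
    over a union of partial sums; the window size is the knob an
    adaptive executor would tune."""
    partial = defaultdict(int)
    for i, (x, join_attr, y) in enumerate(tuples, 1):
        partial[(x, join_attr)] += y
        if i % window == 0:          # window full: emit partial groups
            yield from partial.items()
            partial.clear()
    yield from partial.items()       # flush the tail

def final_aggregate(pre, joinable):
    """Join partials on joinAttrib (reduced to a membership test as a
    stand-in for the join with R), then SUM the partial sums by x."""
    out = defaultdict(int)
    for (x, join_attr), s in pre:
        if join_attr in joinable:
            out[x] += s
    return dict(out)

T = [('a', 1, 10), ('a', 1, 5), ('b', 2, 7), ('a', 2, 3)]
pre = windowed_preaggregate(T, window=2)
print(final_aggregate(pre, joinable={1, 2}))  # {'a': 18, 'b': 7}
```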
34
Pre-Aggregation Comparison
[Chart: running time (sec) for queries 3A, 10, 10A, and 5 (uniform), comparing single aggregation, traditional pre-aggregation, and adjustable-window pre-aggregation]
35
Adaptive QP in Summary
A variety of different techniques, focusing on:
  Scheduling
  Comparison & competition
  Information passing
  Data + plan partitioning
A field that is still fairly open – missing:
  Effective exploration methods
  A true theory!
    What’s possible? What kinds of queries make sense to adapt?
    Guarantees of optimality and convergence (perhaps under certain assumptions)