Adaptively Processing Remote Data
Zachary G. Ives
University of Pennsylvania
CIS 650 – Database & Information Systems
February 28, 2005
2
Administrivia
Next reading assignment: Doan et al. – LSD
Recall that the midterm will be due 3/16
You can go ahead and choose a topic – let me know which one
3
Today’s Trivia Question
4
Sources of Query Answering Cost
Regular computation (this has a minimum cost)
But we can get held up by:
Inflexible query plans – delays: we “stall” waiting for I/O
  Remedies: query scrambling, pipelined hash joins
Bad query plans – bad estimation of intermediate result sizes
  The focus of the Kabra and DeWitt paper
Insufficient source information
  Eddies and ADP; exploration vs. exploitation; extrapolating performance
5
Kabra and DeWitt Recap
Provides significant potential for improvement without adding much overhead
Biased towards exploitation, with very limited information-gathering
A great way to retrofit an existing system – in SIGMOD 2004, IBM had a paper that did this in DB2
But not appropriate for remote data:
  Relies on us knowing the (rough) cardinalities of the sources
  Query plans aren’t pipelined
6
A Second Issue: Delays
We may have a very computationally inexpensive plan, but slow sources – the query plan might get held up waiting for data
Solution 1: query scrambling
  Rescheduling – while delayed, find a non-executing part of the query plan and start it
  Operator synthesis – when nothing can be rescheduled, we might tinker with the original plan
We want to do this in a cost-based way
7
Cost-Based Query Scrambling
Divide the plan into runnable subtrees based on the schedule; schedule those with at least 75% efficiency
  Each may be run out of order if it materializes
  Cost = materialization write + processing + materialization read
  Efficiency = (M – MR) / (P + MW): savings / cost
When there are no more runnable subtrees, we need to do something
  Operator synthesis: try to find a computation that will mask the delay – but how much should we do?
  PAIR: only do a single join over a pair of relations
  IN (include delay): chooses to defer the delayed access to the end
  ED (estimated delay): progressively increases the delay estimate
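The efficiency test above can be sketched numerically. This is a toy sketch: the cost values and the 75% threshold interpretation are illustrative assumptions, not figures from the paper’s experiments.

```python
# Sketch of the runnable-subtree efficiency test from cost-based query
# scrambling. M is the delay masked by running the subtree now, MR/MW are
# the costs to read/write its materialized result, and P is its processing
# cost. All values are hypothetical, in abstract cost units.
def efficiency(masked_delay, mat_read, mat_write, processing):
    """Savings per unit of extra work: (M - MR) / (P + MW)."""
    return (masked_delay - mat_read) / (processing + mat_write)

def worth_scrambling(masked_delay, mat_read, mat_write, processing,
                     threshold=0.75):
    # Only reschedule subtrees whose efficiency clears the threshold.
    return efficiency(masked_delay, mat_read, mat_write, processing) >= threshold

# A subtree that masks 100 units of delay, costs 10 to re-read its result,
# 20 to write it, and 80 to process: efficiency = (100 - 10) / (80 + 20) = 0.9
print(worth_scrambling(100, 10, 20, 80))  # True
```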
8
Works Pretty Well for Large Memory
9
But Not As Well for Limited Memory
10
Overall Assessment
Hard to estimate how much work to do!
A trade-off between aggressive and conservative strategies
If there are enough resources, it doesn’t matter…
11
Solution 2: Pipelined Hash Joins
They can dynamically adapt to delays, alleviating the need for techniques like query scrambling (except between different pipeline stages)
Disadvantages:
  Memory – what happens if we run out? The focus of [Ives+99], [Urhan+00]
  Cost – a query plan with a sub-optimal order will still be very expensive!
The central piece of the next two techniques we’ll look at…
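A pipelined (symmetric) hash join can be sketched as follows. This is a minimal model, assuming tuples arrive as one interleaved event stream; the stream format and key functions are illustrative, not any particular system’s API.

```python
from collections import defaultdict

def pipelined_hash_join(stream, key_r, key_s):
    """Symmetric hash join: consumes an interleaved stream of
    ('R', tuple) / ('S', tuple) events, so a delayed source never
    blocks the other. Yields joined pairs as soon as a match exists."""
    table_r = defaultdict(list)   # hash table on R's join key
    table_s = defaultdict(list)   # hash table on S's join key
    for side, tup in stream:
        if side == 'R':
            k = key_r(tup)
            table_r[k].append(tup)       # insert into own table...
            for match in table_s[k]:     # ...then probe the other side
                yield (tup, match)
        else:
            k = key_s(tup)
            table_s[k].append(tup)
            for match in table_r[k]:
                yield (match, tup)

# Tuples may arrive in any interleaving; a slow R only delays R-side output.
events = [('R', (1, 'a')), ('S', (1, 'x')), ('S', (2, 'y')), ('R', (2, 'b'))]
print(list(pipelined_hash_join(events,
                               key_r=lambda t: t[0], key_s=lambda t: t[0])))
# [((1, 'a'), (1, 'x')), ((2, 'b'), (2, 'y'))]
```

Note the memory disadvantage from the slide: both hash tables grow without bound, which is exactly what [Ives+99] and [Urhan+00] address.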
12
Extreme Adaptivity: Eddies
The basic idea: query processing consists of sending tuples through a series of operators
Why not treat it like a routing problem?
Rely on “back-pressure” (i.e., queue overflow) to tell us where to send tuples
Part of the ongoing Telegraph project at Berkeley – a large-scale federated, shared-nothing data stream engine:
  Variations in data transfer rates
  Little knowledge of data sources
13
Telegraph Architecture
A simple “pre-optimizer” generates the initial plan
Creates operators, e.g.:
  Select: predicate(sourceA)
  Join: predicate(sourceA, sourceB)
  (No support for aggregation, union, etc.)
Chooses implementations – generally relies on pipelined hash joins
Goal: dataflow-driven scheduling of operations
  A tuple comes into the system
  It is adaptively routed through operators in “eddies”
  It may be combined, discarded, etc.
14
The Eddy
Represents the set of possible orderings of a subplan
Each tuple may “flow” through a different ordering (which may be constrained)
An n-ary module consisting of query operators – the basic unit of adaptivity
A subplan with select, project, join operators
15
Example Join Subplan Alternatives
[Figure: several alternative join trees over R1, R2, R3, combining the predicates R2.x = R1.x, R2.x = R3.x, and R1.x = R3.x in different orders and shapes – e.g., joining R2 with the result of Join over R1.x = R3.x of (R1, R3)]
16
Naïve Eddy
Given a tuple, route it to whichever operator is ready
Analogous to fluid dynamics
Adjusts to operator costs
Ignores operator selectivity
17
Adapting to Variable-Cost Selection
18
But Selectivity is Ignored
19
Lottery-Based Eddy
Need to favor more selective operators
A ticket is given per input tuple and returned per output tuple
Lottery scheduling based on the number of tickets
Now handles both selectivity and cost
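The ticket policy can be sketched as a toy model. The class and its interface are hypothetical, not Telegraph’s actual API, and it assumes selection-only operators with a simple weighted lottery.

```python
import random

class LotteryEddy:
    """Toy lottery-scheduled eddy over selection operators (hypothetical
    interface). An operator is handed a ticket when it receives a tuple
    and gives it back when the tuple survives, so selective operators
    accumulate tickets and win the routing lottery more often."""
    def __init__(self, operators, seed=0):
        self.ops = list(operators)                 # (name, predicate) pairs
        self.tickets = {name: 1 for name, _ in self.ops}
        self.rng = random.Random(seed)

    def route(self, tup):
        """Route one tuple through every operator; None if filtered."""
        done = set()
        while len(done) < len(self.ops):
            pending = [(n, p) for n, p in self.ops if n not in done]
            weights = [self.tickets[n] for n, _ in pending]
            name, pred = self.rng.choices(pending, weights=weights)[0]
            self.tickets[name] += 1                # ticket per input tuple
            if not pred(tup):
                return None                        # filtered: ticket kept
            self.tickets[name] -= 1                # ticket returned per output
            done.add(name)
        return tup                                 # survived all operators

# More selective operators keep tickets and get probed earlier on average,
# saving the cost of running less selective predicates on doomed tuples.
eddy = LotteryEddy([('sel_a', lambda t: t % 2 == 0),
                    ('sel_b', lambda t: t < 50)])
survivors = [t for t in range(100) if eddy.route(t) is not None]
print(len(survivors))  # 25 (even numbers below 50)
```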
20
Enhancing Adaptivity: Sliding Window
Tickets were for the entire query
Weighted sliding-window approach:
  “Escrow” tickets during the current window
  “Banked” tickets from a previous window
21
What about Delays?
Problems here: we don’t know when the join buffers vs. “discards” tuples
[Figure: join of S with a slow source R, then with T]
22
Eddy Pros and Cons
A mechanism for adaptively re-routing queries
  Makes the optimizer’s task simpler
  Can do nearly as well as a well-optimized plan in some cases
  Handles variable costs and variable selectivities
But doesn’t really handle joins very well – follow-up work attempts to address this:
  STeMs – break a join into separate data structures; requires re-computation at each step
  STAIRs – create intermediate state and shuffle it back and forth
Open questions: pre-optimization? Distribution (DeWitt et al.)? Beyond joins?
23
Generalizing Adaptive Query Processing
We’ve seen a range of different adaptive techniques
How do they fit together?
Can we choose points between eddies and mid-query re-optimization?
24
Types of Adaptive Query Processing
Adaptive scheduling (query scrambling [UF98], dynamic pipeline scheduling [UF01], XJoin [UF00], pipelined hash join [RS86][I+99], ripple join [HH99])
  Changes CPU scheduling to improve feedback or reduce delays
Redundant computation (competitive execution [AZ96])
  Compares two or more ways of executing the query
Plan partitioning ([S+76][KD98][I+99][M+04])
  Breaks the plan into stages; re-optimizes future stages as necessary
Adaptive info passing ([IT05 sub.])
  Passes data between parts of an executing plan
Adaptive data partitioning
  Breaks the data into subsets; uses a different plan for each subset
  The only way to reduce overall computation with fine granularity
  The first (and only) implementation has been eddies [AH00][R+03][DH04]
25
Eddies Combine Adaptive Scheduling and Data Partitioning Decisions
Intuitively, each tuple gets its own query plan
  Routed to the next operator based on the speed and selectivity of each operator
  Elegant and simple to implement
But performing a join creates sub-results at the next level!
  Local and greedy choices may result in state that needs to join with all future data!
Consider the long-term effects of decisions before making them – separate CPU scheduling from plan selection
26
Focusing Purely on Adaptive Data Partitioning
Use adaptively scheduled operators to “fill CPU cycles”
Now a query optimizer problem: choose a plan that minimizes long-term cost (in CPU cycles)
To allow multiple plans, distribute union through join (and select, project, etc.):
If R1 = R1^1 ∪ R1^2 and R2 = R2^1 ∪ R2^2, then:
  R1 ⋈ R2 = (R1^1 ∪ R1^2) ⋈ (R2^1 ∪ R2^2)
          = (R1^1 ⋈ R2^1) ∪ (R1^2 ⋈ R2^2) ∪ (R1^1 ⋈ R2^2) ∪ (R1^2 ⋈ R2^1)
[Figure: 2×2 grid of the partition combinations of R1 and R2]
This generalizes to n-way joins and other SPJ + GU operators…
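The union-through-join identity can be checked mechanically on small relations. This is a toy natural join on a single attribute x; the partitions and values are made up for illustration.

```python
from itertools import product

# Toy natural join on the first attribute x; tuples are (x, payload).
def join(r, s):
    return {(t1, t2) for t1, t2 in product(r, s) if t1[0] == t2[0]}

# Partition each relation into two arbitrary subsets.
R1a, R1b = {(1, 'p'), (2, 'q')}, {(3, 'r')}
R2a, R2b = {(1, 'u')}, {(2, 'v'), (3, 'w')}

# Joining the whole relations equals the union of the four partition joins.
whole = join(R1a | R1b, R2a | R2b)
pieces = (join(R1a, R2a) | join(R1a, R2b) |
          join(R1b, R2a) | join(R1b, R2b))
print(whole == pieces)  # True: union distributes through join
```

This is the property that lets each data subset be run through a different plan while still producing the correct overall answer.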
27
Adaptive Data Partitioning: Routing Data across Different Plans
R ⋈ S ⋈ T
[Figure: the query evaluated in phases over partitions R0/R1, S0/S1, T0/T1 – each later plan excludes results already produced, e.g. “exclude R0S0”, then “exclude R0S0T0, R1S1T1”]
Options for combining across phases:
  New results always injected into the old plan
  Old results injected into the new plan
  Wait until the end – a “stitch-up” plan based on the best statistics
28
Special Architectural Features for ADP
A monitoring and re-optimization thread runs alongside execution:
  System-R-like optimizer with aggregation support; uses the most current selectivity estimates
  Periodic monitoring and re-optimization revises selectivity estimates and recomputes expected costs
Query execution with “smart router” operators
Special support for efficient stitch-up plans:
  Uses intermediate results from previous plans (a special case of answering queries using views [H01])
  Join-over-union (“stitch-up-join”) operator that excludes certain results
29
ADP Application 1: Correcting Cost Mis-estimates
Goal: react to plans that are obviously bad
  Don’t spend cycles searching for a slightly better plan
  Try to avoid paths that are unlikely to be promising
The monitor/reoptimizer thread watches the cardinalities of sub-results
  Re-estimates plan cost and compares it to the projected costs of alternatives, using several techniques and heuristics (see paper)
  Our experiments: re-estimate every 1 sec.
The “smart router” operator does the following:
  Waits for the monitor/reoptimizer to suggest a replacement plan
  Re-routes source data into the new plan
  The new plan’s output is unioned with the output of the previous plan; this is fed into any final aggregation operations
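The smart-router loop can be sketched as follows. Every interface here – the monitor, the reoptimizer, the plan callables – is a hypothetical stand-in for illustration, not the actual system’s API.

```python
def adp_execute(source, initial_plan, monitor, reoptimize):
    """Sketch of ADP plan switching (all interfaces are hypothetical).
    The router drains source tuples into the current plan; whenever the
    monitor flags a cost mis-estimate, remaining tuples are re-routed to
    a freshly optimized plan. The answer is the union of all plan outputs,
    each source tuple having been processed by exactly one plan."""
    output = []
    plan = initial_plan
    for tup in source:
        if monitor(tup):             # observed cardinalities look wrong
            plan = reoptimize()      # replacement plan for future tuples
        output.extend(plan(tup))
    return output                    # union across plans, fed to final agg

# Hypothetical demo: switch from a plan tagging tuples 'old' to one
# tagging them 'new' once values exceed the expected range.
src = [1, 2, 50, 51]
out = adp_execute(
    src,
    initial_plan=lambda t: [('old', t)],
    monitor=lambda t: t > 10,
    reoptimize=lambda: (lambda t: [('new', t)]),
)
print(out)  # [('old', 1), ('old', 2), ('new', 50), ('new', 51)]
```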
30
Correcting for Unexpected Selectivities
[Chart: running time (sec) per query–data set combination (3A, 10, 10A, 5 – uniform and skewed), comparing Static – No Statistics, Static – Cardinalities, Adaptive – No Statistics, Adaptive – Cardinalities, and Plan Partitioning – No Stats. Pentium IV 3.06 GHz, Windows XP]
31
ADP Application 2: Optimizing for Order
The most general ADP approach:
  Pre-generate plans for the general case and for each “interesting order”
  The “smart router” sends each tuple to the plan whose ordering constraint the tuple follows
  But with multiple joins, MANY plans
Instead: do ADP at the operator level
  A “complementary join pair” does its own stitch-up internally – easier to optimize for!
Can also do “partial sorting” at the router (priority queue)
[Figure: routers send h(R), h(S) partitions of R and S to either a merge join or a hash join, each with its own output queue]
32
Exploiting Partial Order in the Data
[Chart: running time (sec) per data set (uniform; skewed; uniform and skewed with 1%, 10%, and 50% reordered), comparing a pipelined hash join, complementary joins, and complementary joins with a 1024-tuple priority queue. Pentium IV 3.06 GHz, Windows XP]
33
ADP Over “Windows”: Optimizing for Aggregation
Group-by optimization [CS94]: we may be able to “pre-aggregate” some tuples before joining
  Why: aggregates can be applied over a union
But once we insert pre-aggregation, we’re stuck (and it’s not pipelined)
Our solution: “adjustable window pre-aggregation”
  Change the window size depending on how effectively we can aggregate
  Also allows data to propagate through the plan – better information for adaptivity, early answers
[Figure: plan with SUM(T.y) GROUP BY T.x, T.joinAttrib below the join of T and R, feeding SUM(T.y sums) GROUP BY T.x above, vs. a plan that joins T and R and then applies a single SUM(T.y) GROUP BY T.x]
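Windowed pre-aggregation can be sketched for SUM. This is a toy model: the window size is fixed here, where the real system adjusts it adaptively, and the join with R is reduced to a membership test; all names and data are illustrative.

```python
from collections import defaultdict

def windowed_preaggregate(tuples, window):
    """Pre-aggregate SUM(T.y) GROUP BY (T.x, T.joinAttrib) within a
    bounded window, flushing partial sums so data keeps propagating
    through the pipeline. Correct because SUM can be re-aggregated
    over a union of partial sums; the window size is the knob an
    adaptive executor would tune."""
    partial = defaultdict(int)
    for i, (x, join_attr, y) in enumerate(tuples, 1):
        partial[(x, join_attr)] += y
        if i % window == 0:          # window full: emit partial groups
            yield from partial.items()
            partial.clear()
    yield from partial.items()       # flush the tail

def final_aggregate(pre, joinable):
    """Join partials on joinAttrib (reduced to a membership test as a
    stand-in for the join with R), then SUM the partial sums by x."""
    out = defaultdict(int)
    for (x, join_attr), s in pre:
        if join_attr in joinable:
            out[x] += s
    return dict(out)

T = [('a', 1, 10), ('a', 1, 5), ('b', 2, 7), ('a', 2, 3)]
pre = windowed_preaggregate(T, window=2)
print(final_aggregate(pre, joinable={1, 2}))  # {'a': 18, 'b': 7}
```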
34
Pre-Aggregation Comparison
[Chart: running time (sec) for queries 3A, 10, 10A, and 5 (uniform), comparing single aggregation, traditional pre-aggregation, and adjustable-window pre-aggregation]
35
Adaptive QP in Summary
A variety of different techniques, focusing on:
  Scheduling
  Comparison & competition
  Information passing
  Data + plan partitioning
A field that is still fairly open – missing:
  Effective exploration methods
  A true theory!
    What’s possible? What kinds of queries make sense to adapt?
    Guarantees of optimality and convergence (perhaps under certain assumptions)