dynamically optimizing queries over large scale data platforms

Dynamically Optimizing Queries over Large Scale Data Platforms

Konstantinos Karanasos♯, Andrey Balmin§, Marcel Kutsch♣, Fatma Özcan*, Vuk Ercegovac◊, Chunyang Xia♦, Jesse Jackson♦

Inria SaclayNovember 26, 2014

♯Microsoft *IBM Research §Platfora ♣Apple ◊Google ♦IBM

[Work done at IBM Almaden Research Center]

2

Big Data Platforms

HiveQL

Pig

DryadLINQ

Spark

Stratosphere

Jaql

SQL

unstructured

semi-structurednested

structured

data streamsrelational

Languages

HadoopHive/Stinger

Jaql

StratosphereSpark

Impala

HAWQDryad

HadaptPolybase

DrillNeed for efficient Big Data management

Challenging due to size and heterogeneity of data, variety of applications

The Big Data Landscape

Query optimization is crucial

3

Query Optimization in Large Scale Data Platforms

• Existing challenges• Exponential error propagation in joins• Correlations between predicates

• “New” challenges• Prominent use of UDFs• Complex data types (arrays, maps, structs)• Poor statistics (do we own the data?)• Bad plans over Big data may be disastrous• Exploit cluster’s resources (parallel execution)

Traditional static techniques are not sufficient

We introduce dynamic techniques that are:• at least as good as and • up to 2x (4x) better than the best hand-written left-deep Jaql (Hive) plans

4

SELECT <projection list> FROM ( SELECT <projection list> FROM "PART", "SUPPLIER", "LINEITEM", "PARTSUPP", "ORDERS", "NATION" WHERE <join conditions> AND "PART"."p_name" LIKE '%green%' AND "ORDERS"."o_orderdate" BETWEEN '1995-01-01' AND '1995-07-01' AND "ORDERS"."o_orderstatus"='P' AND udf("PARTSUPP"."ps_partkey") < 0.001 AND <udf list>) "PROFIT"GROUP BY "PROFIT"."NATION", "PROFIT"."order_YEAR"ORDER BY "PROFIT"."NATION" ASC, "PROFIT"."order_YEAR" DESC;

5-way join

external UDFs

correlated predicates

Example: TPCH Q9’

5

“SQL” Processing in Large Scale Platforms• Relational operators -> MapReduce jobs• Two join algorithms:

• Repartition join (RJ) – 1MR job (default)• Memory join (MJ) – map-only job

• Optimizations based on rewrite rules and hints• RJ -> MJ• Chain MJs (multiple joins in one map job)

• Left-deep plans• This is the picture for Jaql (and Hive)

6

Limitations• No selectivity estimation for predicates/UDFs

• Conservative application of memory joins• No cost-based join enumeration

• Rely on order of relations in FROM clause• Left-deep plans

• Often close to optimal for centralized settings• Not sufficient for distributed query processing

7

p

l

Best left-deep hand-written Jaql plan

RJ

o

RJ ps

RJ

Best relational optimizer plan

MJ

s nudf(ps)

udf(o)

udf(p)

udf(o,l)

RJ

s

RJ l

RJ

RJ

RJ

udf(o,l)

p

udf(p)

n

MJ

o

udf(o)

ps

udf(ps)

TPCH Q9’: Execution Plans

8

Dynamic Optimization

• Key idea: alter execution plan at runtime• Studied in the relational setting

• Both centralized and distributed• Basic concern: when to break the pipeline?• No emphasis on UDFs and correlated predicates

• Increasingly being used in large scale platforms(e.g., Scope, Shark, Hive)

Goal: dynamic optimization techniques for large scale data platforms (implemented in Jaql)

9

IBM BigInsights Jaql

Key differentiators• Functions:

reusability + abstraction• Physical Transparency:

precise control when needed• Data model:

semi-structured based on JSON

Dataflows for conceptual JSON data

Flexible scripting language

Scalable map-reduce runtime

Fault Tolerant DFS

Jaql

Map

Jaql

Reduce

Jaql

MapJaql

Reduce

Jaql

Map

10

Jaql Script: Example

transform group by writeread

read(hdfs("reviews")) -> transform { pid: $.placeid, rev: sentAn($.review) }-> group by p = ($.pid) as r into { pid: p, revs: r.rev } -> write(hdfs("group-reviews"))

[ { pid: 12, revs: [ 3*, 4*, … ] }, { pid: 19, revs: [ 2*, 1*, … ] }]

Query Data

Group user reviews by place

11

Jaql to MapReduce

mapReduce( input: { type: hdfs, location: "reviews" }, output: { type: hdfs, location: "group-reviews" }, map: fn($mapIn) ( $mapIn -> transform { pid: $.placeid, rev: sentAn($.review) } -> transform [ $.placeid, $.rev ] ), reduce: fn($p, $r) ( [ pid: $p, revs: $r ] ) )

• Functions as parameters• Rewritten script is valid

Jaql!

read(hdfs("reviews")) -> transform { pid: $.placeid, rev: sentAn($.review) }-> group by p = ($.pid) as r into { pid: p, revs: r.rev } -> write(hdfs("group-reviews"))

Rewrite Engine

12

Outline

• Introduction• System Architecture

• Pilot Runs• Adaptation of Execution Plans

• Experiments• Conclusion

13

DynO Architecture

Query

Query result

Jaql plan

Optimizer (join enumeration)

Jaql compiler

Jaql runtime

MapReduce

Statistics DFS

join query

blocks

execute part of the plan

best plan

pilot runs

remaining plan

1

2

3

4

8

5

6

7

14

• PilR algorithm:• Push-down selections/UDFs• Get leaf expressions (scans + local predicates)• Transform them to map-only jobs• Execute them over random splits of each relation• Until k tuples are output• Collect statistics during execution

• Parallel execution of pilot runs (~4.5x speedup)• Approx. 3% overhead to the execution• Performance speedup of up to 2x (4x) for Jaql (Hive)

Pilot Runs

15

p

l

Best left-deep hand-written Jaql plan

RJ

o

RJ ps

RJ


MJ

s nudf(ps)

udf(o)

udf(p)

udf(o,l)

RJ

s

RJ l

RJ

RJ

RJ

udf(o,l)

p

udf(p)

n

MJ

o

udf(o)

ps

udf(ps)

p

l

MJo

ps

MJ

MJ

s n

udf(ps)

udf(o)

udf(p)

udf(o,l)

MJ

MJ

DynO plan

Up to 2x speedup(4x when applied to Hive)

TPCH Q9’: Impact of Pilot Runs

16

• Collected statistics:• #tuples, min/max, #distinct values• add more if the optimizer can support them

• Statistics reusability• Optimization for selective (and expensive) predicates• Shortcomings:

• Non-local predicates• Non primary/foreign key joins• Join correlations

Pilot Runs: Details

Runtime adaptation of execution plans

17

• Cost-based optimizer• Based on Columbia (top-down) optimizer• Focuses on join enumeration• Accurate statistics from pilot runs and/or previous executions• Bushy plans (intra-query parallelization)

• Online statistics collection• Re-optimization points (natural in MR)• Execution strategies: choosing leaf jobs

• Degree of parallelization, cost/uncertainty of jobs

Adaptation of Execution Plans

18

sp Best left-deep hand-written Jaql plan

RJ l

RJ o

RJ c

RJ

n1

MJ n2

MJ r

MJ

o

RJ

s

RJ n2

MJ

RJ

l

RJ

p

udf(o,c)

r

MJ

c

MJ

n1

udf(o,c)


TPCH Q8’: Impact of Execution Plan Adaptation

19

o

s

RJ n2

MJ

RJ

l

RJ

p

r

MJ

c

MJ

n1

udf(o,c)

RJt1s

MJ n2

MJ

RJ

l

RJ

t1

p

t2

RJ s

RJ n2

MJ

p t2

t3

MJ

n2

MJ

s

t3Speedup up to 2x without any initial statistics(despite the added overhead)

TPCH Q8’: Impact of Execution Plan Adaptation

20

Outline

• Introduction• System Architecture

• Pilot Runs• Adaptation of Execution Plans

• Experiments• Conclusion

21

Experimental Setup

• 15-node cluster, 10 GbE• Each machine:

• 12-cores, 96 GB RAM (2GB to each MR slot), 12*2TB disks• 10 map/8 reduce slots

• Hadoop 1.1.1• ZooKeeper for coordination (in statistics collection)• TPCH data, SF = {100, 300, 1000}• TPCH queries (with additional UDFs)

22

Execution times comparison

• At least as good as the best left-deep hand-written plans• Benefits from bushy plans (Q2)• Benefits from pilot runs due to many UDFs (Q9’)• Benefits from re-optimization due to UDF on join result (Q8’)• Biggest benefit is brought by the pilot runs

23

Benefits of our Approach on Hive

• Similar performance trends with Jaql

• Bigger speedup (up to 4x) due to implementation of broadcast joins (Hive 0.12 exploits DistributedCache)

24

Overhead of Dynamic Optimization

• Pilot runs overhead 2.5-6.5%

• Stats collection overhead 0.1-2.8%

• Overall overhead 7-10%

25

Conclusion

• Pilot runs to account for UDFs• Dynamic adaptation of execution plans• Traditional optimizer for join ordering (bushy plans)• Online statistics collection (no need for initial statistics)• Execution strategies• At least as good plans as the left-deep hand-written ones

• Up to 2x faster (4x for Hive)• Applicability to other systems (e.g., Hive)

26

Perspectives

• Broader range of applications (e.g., ML)• Other runtimes (e.g., Tez)• Adaptive operators• Extend optimizer to support grouping, ordering

Thank you!

dynamically optimizing queries over large scale data platforms

Science